[jira] [Commented] (COMPRESS-132) Add support for unix dump files

2011-08-15 Thread Stefan Bodewig (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085020#comment-13085020
 ] 

Stefan Bodewig commented on COMPRESS-132:
-

svn revision 1157769 contains a repackaged version of the main tree of your 
code.

Things I've changed:

* repackaged to live in org.apache.commons land

* removed all @author tags and instead added you to the POM as contributor, 
hope this is OK with you (we don't do @author tags).  Should this be a problem 
for you then I'll simply remove the code again.

* merged POSIXArchiveEntry into DumpArchiveEntry for now

* renamed getModTime to getLastModifiedDate as your class didn't implement that 
method (it was added in Compress 1.1)

Missing for me in order to close this are tests - will add some once I have 
access to a machine that has dump installed - and initial documentation for the 
site.  I'll take care of that as well.

 Add support for unix dump files
 ---

 Key: COMPRESS-132
 URL: https://issues.apache.org/jira/browse/COMPRESS-132
 Project: Commons Compress
  Issue Type: New Feature
  Components: Archivers
Reporter: Bear Giles
Priority: Minor
 Fix For: 1.3

 Attachments: dump-20110722.zip, dump.zip, test-z.dump, test.dump


 I'm submitting a series of patches to the ext2/3/4 dump utility and noticed 
 that the commons-compress library doesn't have an archiver for it. It's as 
 old as tar and fills a similar niche but the latter has become much more 
 widely used. Dump includes support for sparse files, extended attributes, Mac 
 OS Finder info, SELinux labels (I think), and more. Incremental dumps can 
 capture that files have been deleted.
 I should have initial support for a decoder this weekend. I can read the 
 directory entries and inode information (file permissions, etc.) but need a 
 bit more work on extracting the content as an InputStream.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MATH-621) BOBYQA is missing in optimization

2011-08-15 Thread Dr. Dietmar Wolz (JIRA)

 [ 
https://issues.apache.org/jira/browse/MATH-621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dr. Dietmar Wolz updated MATH-621:
--

Attachment: BOBYQAOptimizer0.4.zip

No changes from the Perl-generated code besides the ones necessary to get 
INDEX_OFFSET=0 working. I introduced INDEX_OFFSET where possible, but many 
other adaptations were necessary (just compare the Perl-generated code with the 
attachment). Version 0.3 had some useful additional minor changes/refactorings 
that are missing here (see remarks below), but the main work for 0.3 was the 
index change, and that is included here again. 

Remarks:

1) The perl script has damaged the for loop indentation

2) n, npt and nptm should be global variables and not set separately
in each method

3) System generated locals: Declare variables in the scope they are needed and
not method-globally if not necessary

4) testDiagonalRosen() is a copy/paste leftover from CMAES, should be removed

5) We should think about removing rescue as proposed by Mike Powell. 



 BOBYQA is missing in optimization
 -

 Key: MATH-621
 URL: https://issues.apache.org/jira/browse/MATH-621
 Project: Commons Math
  Issue Type: New Feature
Affects Versions: 3.0
Reporter: Dr. Dietmar Wolz
 Fix For: 3.0

 Attachments: BOBYQA.math.patch, BOBYQA.v02.math.patch, 
 BOBYQAOptimizer0.4.zip, bobyqa.zip, bobyqa_convert.pl, 
 bobyqaoptimizer0.4.zip, bobyqav0.3.zip

   Original Estimate: 8h
  Remaining Estimate: 8h

 During experiments with space flight trajectory optimizations I recently
 observed that the direct optimization algorithm BOBYQA
 http://plato.asu.edu/ftp/other_software/bobyqa.zip
 from Mike Powell is significantly better than the simple Powell algorithm
 already in commons.math. It needs significantly fewer function calls and is
 more reliable for high dimensional problems. You can replace CMA-ES in many
 more application cases by BOBYQA than by the simple Powell optimizer.
 I would like to contribute a Java port of the algorithm.
 I maintained the structure of the original FORTRAN code, so the
 code is fast but not very nice.
 License status: Michael Powell has sent the agreement via snail mail
 - it hasn't arrived yet.
 Progress: The attached patch relative to the trunk contains both the
 optimizer and the related unit tests - which are all green now.  
 Performance:
 Performance difference (number of function evaluations)
 PowellOptimizer / BOBYQA for different test functions (taken from
 the unit test of BOBYQA, dimension=13 for most of the
 tests). 
 Rosen = 9350 / 1283
 MinusElli = 118 / 59
 Elli = 223 / 58
 ElliRotated = 8626 / 1379
 Cigar = 353 / 60
 TwoAxes = 223 / 66
 CigTab = 362 / 60
 Sphere = 223 / 58
 Tablet = 223 / 58
 DiffPow = 421 / 928
 SsDiffPow = 614 / 219
 Ackley = 757 / 97
 Rastrigin = 340 / 64
 The number for DiffPow should be discussed with Michael Powell,
 I will send him the details. 
 Open Problems:
 Some checkstyle violations because of the original Fortran source:
 - Original method comments were copied - doesn't follow javadoc standard
 - Multiple variable declarations in one line as in the original source
 - Problems related to goto conversions:
   gotos not convertible into loops were translated into a finite automaton 
 (switch statement), so the checkstyle complaints about
   no default in switch
   fall through from previous case in switch
   which usually indicate bad style make no sense here.
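
The goto conversion described above can be illustrated with a small sketch. It is not taken from the BOBYQA port: the class name, the method and the label numbers 10/20/30/40 are hypothetical. It only shows why a loop around a switch, with deliberate fall-through and no default branch, is a natural way to mimic Fortran labels.

```java
/** Hypothetical illustration of goto-style control flow expressed as a switch-based state machine. */
final class GotoAsSwitch {

    /** Sums 1..n; each case stands for an invented Fortran label. */
    static int sumTo(int n) {
        int i = 0;
        int sum = 0;
        int state = 10;                 // start at "label 10"
        while (true) {
            switch (state) {
            case 10:                    // label 10: initialise
                i = 1;
                // deliberate fall through to label 20, as the Fortran flow would
            case 20:                    // label 20: loop test
                if (i > n) {
                    state = 40;         // GOTO 40
                    break;
                }
                // deliberate fall through to label 30
            case 30:                    // label 30: loop body
                sum += i;
                i++;
                state = 20;             // GOTO 20
                break;
            case 40:                    // label 40: exit
                return sum;
            }
            // no default branch: state only ever holds one of the labels above
        }
    }

    public static void main(String[] args) {
        System.out.println(sumTo(10));
    }
}
```

Each `break` leaves the switch and re-enters it at the label stored in `state`, which plays the role of the goto target. Adding a default branch or removing the fall-through would only obscure the correspondence with the original labels.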





[jira] [Commented] (MATH-621) BOBYQA is missing in optimization

2011-08-15 Thread Gilles (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085061#comment-13085061
 ] 

Gilles commented on MATH-621:
-

Thanks for the work.
However, if I change the INDEX_OFFSET constant (setting it back to 1), the 
tests fail.
I see that you hard-coded the offset in most places instead of using 
INDEX_OFFSET. I still think that this place-holder would be useful to keep 
track of places where the index variables might have been set to fit with the 
Fortran 1-based counting... Don't you?

{quote}
The perl script has damaged the for loop indentation
{quote}
Sorry, I didn't see that. But that's easy to fix. I'll do it after the issue 
with INDEX_OFFSET is settled.

{quote}
n, npt and nptm should be global variables and not set separately
in each method
{quote}
Yes, I agree. But there are probably many other variables for which this is 
true (zmat, bmat, etc).

{quote}
System generated locals: Declare variables in the scope they are needed [...]
{quote}
Agreed, of course. I had started to do that mainly with d__1; then there are 
many cases where the same variable was reused whereas we would prefer to create 
yet another one with a more explicit name.

{quote}
testDiagonalRosen() is a copy/paste leftover from CMAES, should be removed
{quote}
OK, I'll do it in the next commit.

{quote}
We should think about removing rescue as proposed by Mike Powell.
{quote}
I'm all for anything that leads to removing unnecessary lines of code :)
If you are indeed confident that, in most cases, the added complexity is not 
worth it, I'll just delete it.







[jira] [Commented] (MATH-621) BOBYQA is missing in optimization

2011-08-15 Thread Dr. Dietmar Wolz (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085074#comment-13085074
 ] 

Dr. Dietmar Wolz commented on MATH-621:
---

{quote}
I see that you hard-coded the offset in most places instead of using 
INDEX_OFFSET. I still think that this place-holder would be useful to keep 
track of places where the index variables might have been set to fit with the 
Fortran 1-based counting... Don't you?
{quote}

I am not convinced yet. I thought of INDEX_OFFSET as a tool to support the 
conversion. If you don't use INDEX_OFFSET in the for loops 
(for (int i = INDEX_OFFSET; ...)), I don't see why we should introduce it 
artificially in other places. The final aim should be to get rid of the 
Fortran arrays/matrices and have 0-based access. I don't see it as essential to 
maintain INDEX_OFFSET as a kind of back reference to the old Fortran code in 
the future. We have the unit tests as regression tests. 

Just try to convert one method, let's say prelim, the way you want to have it. 
The working 0-based version 0.4 should make this easy. Then let's have a look 
at it. I suspect it will become rather ugly using INDEX_OFFSET in all places. 
But then we should also convert the for loops to 
(for (int i = INDEX_OFFSET; ...)) so that the code runs again with 
INDEX_OFFSET=1. If you then really think it is better this way, I will help to 
convert the other methods. 
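
To make the two positions concrete, here is a minimal sketch of the same loop written both ways. It is not code from the BOBYQA port: the class and both methods are hypothetical, and only the name INDEX_OFFSET is borrowed from the discussion.

```java
/** Hypothetical comparison of an offset-based loop with its plain 0-based form. */
final class IndexOffsetSketch {

    /** 1 mimics Fortran's 1-based counting; setting it to 0 gives plain Java indexing. */
    static final int INDEX_OFFSET = 1;

    /** Sums the first n elements; the loop bounds and the access both carry the offset. */
    static double sumWithOffset(double[] a, int n) {
        double sum = 0;
        for (int i = INDEX_OFFSET; i < n + INDEX_OFFSET; i++) {
            sum += a[i - INDEX_OFFSET];   // translate the logical index to the Java index
        }
        return sum;
    }

    /** The same computation once the offset has been removed entirely. */
    static double sumZeroBased(double[] a, int n) {
        double sum = 0;
        for (int i = 0; i < n; i++) {
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 2.0, 3.0};
        System.out.println(sumWithOffset(data, 3));
        System.out.println(sumZeroBased(data, 3));
    }
}
```

The offset version keeps working when INDEX_OFFSET is switched between 0 and 1, but only because the constant appears in the loop bounds and in every array access. That is the clutter the comment above expects if the constant is kept everywhere.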







[jira] [Commented] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Gary D. Gregory (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085104#comment-13085104
 ] 

Gary D. Gregory commented on CODEC-127:
---

Sebb:

I get errors when I try your perl script on Windows with the latest perl (64 
bit) from ActiveState. Rather than use this space to figure out why, can you 
please run it again and check if we are done with this ticket? 

Thank you,
Gary

 Non-ascii characters in source files
 

 Key: CODEC-127
 URL: https://issues.apache.org/jira/browse/CODEC-127
 Project: Commons Codec
  Issue Type: Bug
Reporter: Sebb

 Some of the test cases include characters in a native encoding (possibly 
 UTF-8), rather than using Unicode escapes.
 This can cause a problem for IDEs if they don't know the encoding (e.g. cause 
 compilation errors, which is how I found the issue), and possibly some 
 transformations may corrupt the contents, e.g. fixing EOL.
 I think we should have a rule of using Unicode escapes for all such non-ascii 
 characters.
 It's particularly important for non-ISO-8859-1 characters.
 Some example classes with non-ascii characters:
 {code}
 binary\Base64Test.java:96 byte[] decode = 
 b64.decode(SGVsbG{´┐¢´┐¢´┐¢´┐¢´┐¢´┐¢}8gV29ybGQ=);
 language\ColognePhoneticTest.java:110 {m├Ânchengladbach, 
 664645214},
 language\ColognePhoneticTest.java:130 String[][] data = 
 {{bergisch-gladbach, 174845214}, {M├╝ller-L├╝denscheidt, 65752682}};
 language\ColognePhoneticTest.java:137 {Meyer, M├╝ller},
 language\ColognePhoneticTest.java:143 {ganz, Gänse},
 language\DoubleMetaphoneTest.java:1222 
 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, S);
 language\DoubleMetaphoneTest.java:1227 
 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, N);
 language\SoundexTest.java:367 if (Character.isLetter('´┐¢')) {
 language\SoundexTest.java:369 Assert.assertEquals(´┐¢000, 
 this.getSoundexEncoder().encode(´┐¢));
 language\SoundexTest.java:375 Assert.assertEquals(, 
 this.getSoundexEncoder().encode(´┐¢));
 language\SoundexTest.java:387 if (Character.isLetter('´┐¢')) {
 language\SoundexTest.java:389 Assert.assertEquals(´┐¢000, 
 this.getSoundexEncoder().encode(´┐¢));
 language\SoundexTest.java:395 Assert.assertEquals(, 
 this.getSoundexEncoder().encode(´┐¢));
 {code}
 The characters are probably not correct above, because I used a crude perl 
 script to find them:
 {code}
 perl ne $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; 
 */*.java
 {code}
 language\SoundexTest.java:367 in particular is incorrect, because it's 
 supposed to be a single character.
 Now one might think that native2ascii -encoding UTF-8 would fix that, but it 
 gives:
 if (Character.isLetter('\ufffd'))
 which is an unknown character.
 Similarly for binary\Base64Test.java:96.
 It's not all that clear what the Unicode escapes should be in these cases, 
 but probably not the unknown character.
 [Possibly the characters got mangled at some point, or maybe they have always 
 been wrong]
 The ColognePhoneticTest.java cases are less serious, as the characters are 
 valid ISO-8859-1 (accented German), but given that the rest of the file uses 
 Unicode escapes, I think they should be changed too (but add comments to say 
 what they are, e.g. o-umlaut, u-umlaut)
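
The rule proposed above (Unicode escapes for every non-ASCII character) can be sketched in Java as follows. This is not the perl script from the ticket and not part of Commons Codec: the class and method names are hypothetical, and the input is assumed to have been decoded correctly into a String already.

```java
import java.util.Locale;

/** Hypothetical helper that detects non-ASCII characters and rewrites them as Unicode escapes. */
final class NonAsciiEscaper {

    /** Returns true if the line contains any character outside the 7-bit ASCII range. */
    static boolean hasNonAscii(String line) {
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) > 0x7F) {
                return true;
            }
        }
        return false;
    }

    /** Replaces every non-ASCII UTF-16 unit with a backslash, the letter u and four hex digits. */
    static String escape(String line) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c > 0x7F) {
                sb.append(String.format(Locale.ROOT, "\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The input literal already uses escapes, so this source file is pure ASCII.
        System.out.println(escape("M\u00fcller-L\u00fcdenscheidt"));
    }
}
```

Working on UTF-16 units means a character outside the Basic Multilingual Plane comes out as two escapes (a surrogate pair), which is the form Java source expects.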





[jira] [Commented] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085110#comment-13085110
 ] 

Sebb commented on CODEC-127:


What error do you get? Just curious.

I now get:

{code}
commons-codec-generics/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:110
  {m├Ânchengladbach, 664645214},
commons-codec-generics/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:130
  String[][] data = {{bergisch-gladbach, 174845214}, 
{M├╝ller-L├╝denscheidt, 65752682}};
commons-codec-generics/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:137
 {Meyer, M├╝ller},
commons-codec-generics/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:143
 {ganz, Gänse},
commons-codec-generics/src/test/org/apache/commons/codec/language/DoubleMetaphoneTest.java:1222
 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, S);
commons-codec-generics/src/test/org/apache/commons/codec/language/DoubleMetaphoneTest.java:1227
 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, N);
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/BeiderMorseEncoderTest.java:93
 String[] names = { ácz, átz, Ignácz, Ignátz, Ignác };
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:47
   { Nu├▒ez, spanish, EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:49
   { ─îapek, czech, EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:52
   { Küçük, turkish, EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:55
   { Ceauşescu, romanian, EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:57
   { ╬æ╬│╬│╬Á╬╗¤î¤Ç╬┐¤à╬╗╬┐¤é, greek, EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:58
   { ðƒÐâÐêð║ð©ð¢, cyrillic, EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:59
   { ÎøÎö΃, hebrew, EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:60
   { ácz, any, EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:61
   { átz, any, EXACT } });
{code}

and

{code}
commons-codec/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:110
 {m├Ânchengladbach, 664645214},
commons-codec/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:130
   String[][] data = {{bergisch-gladbach, 174845214}, 
{M├╝ller-L├╝denscheidt, 65752682}};
commons-codec/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:137
  {Meyer, M├╝ller},
commons-codec/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:143
  {ganz, Gänse},
commons-codec/src/test/org/apache/commons/codec/language/DoubleMetaphoneTest.java:1227
  this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, S);
commons-codec/src/test/org/apache/commons/codec/language/DoubleMetaphoneTest.java:1232
  this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, N);
commons-codec/src/test/org/apache/commons/codec/language/bm/BeiderMorseEncoderTest.java:93
  String[] names = { ácz, átz, Ignácz, Ignátz, Ignác };
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:47
   { Nu├▒ez, spanish, EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:49
   { ─îapek, czech, EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:52
   { Küçük, turkish, EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:55
   { Ceauşescu, romanian, EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:57
   { ╬æ╬│╬│╬Á╬╗¤î¤Ç╬┐¤à╬╗╬┐¤é, greek, EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:58
   { ðƒÐâÐêð║ð©ð¢, cyrillic, EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:59
   { ÎøÎö΃, hebrew, EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:60
   { ácz, any, EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:61
   { átz, any, EXACT } });
{code}

This was using an updated version of the script that uses File::Find to process 
directory traversal better.
(Some lines shortened above by manually removing leading spaces)

I think all the actual errors have now been 

[jira] [Commented] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Gary D. Gregory (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085115#comment-13085115
 ] 

Gary D. Gregory commented on CODEC-127:
---

That sounds good. Today, the code is not editable/maintainable.

There does not seem to be anything I can do in Eclipse to fix this just for 
viewing the chars correctly.

If the comments are left mangled, then they are not maintainable. If you change 
the code, then the comment should match. So I would not leave the comments 
mangled.
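
One plausible cause of the mangling seen in this ticket (not confirmed anywhere in the thread) is that bytes written in one encoding are being decoded as another. The sketch below reproduces both directions with the JDK only. The class and method names are hypothetical, and ISO-8859-1 is used purely as an example of a single-byte encoding.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demonstration of what an encoding mismatch does to an accented character. */
final class ReplacementCharDemo {

    /** Encodes as ISO-8859-1, then decodes as UTF-8: invalid byte sequences become U+FFFD. */
    static String decodeLatin1BytesAsUtf8(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    /** Encodes as UTF-8, then decodes as ISO-8859-1: each accented letter turns into two characters. */
    static String decodeUtf8BytesAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String broken = decodeLatin1BytesAsUtf8("M\u00fcller");
        // Print code points in hex so the result does not depend on the console encoding.
        for (int i = 0; i < broken.length(); i++) {
            System.out.print(Integer.toHexString(broken.charAt(i)) + " ");
        }
        System.out.println();
        System.out.println(decodeUtf8BytesAsLatin1("\u00fc").length());
    }
}
```

The first direction loses the original character for good, which would explain a tool emitting the replacement character instead of a usable escape. The second direction is reversible as long as the wrongly decoded text has not been saved again.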





[jira] [Commented] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Gary D. Gregory (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085116#comment-13085116
 ] 

Gary D. Gregory commented on CODEC-127:
---

If I run the command as is, I get:
{quote}
Can't open perl script ne: No such file or directory
{quote}





[jira] [Updated] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Sebb (JIRA)

 [ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebb updated CODEC-127:
---

Description: 
Some of the test cases include characters in a native encoding (possibly 
UTF-8), rather than using Unicode escapes.

This can cause a problem for IDEs if they don't know the encoding (e.g. cause 
compilation errors, which is how I found the issue), and possibly some 
transformations may corrupt the contents, e.g. fixing EOL.

I think we should have a rule of using Unicode escapes for all such non-ascii 
characters.
It's particularly important for non-ISO-8859-1 characters.

Some example classes with non-ascii characters:

{code}
binary\Base64Test.java:96 byte[] decode = 
b64.decode(SGVsbG{´┐¢´┐¢´┐¢´┐¢´┐¢´┐¢}8gV29ybGQ=);
language\ColognePhoneticTest.java:110 {m├Ânchengladbach, 
664645214},
language\ColognePhoneticTest.java:130 String[][] data = 
{{bergisch-gladbach, 174845214}, {M├╝ller-L├╝denscheidt, 65752682}};
language\ColognePhoneticTest.java:137 {Meyer, M├╝ller},
language\ColognePhoneticTest.java:143 {ganz, Gänse},
language\DoubleMetaphoneTest.java:1222 
this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, S);
language\DoubleMetaphoneTest.java:1227 
this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, N);
language\SoundexTest.java:367 if (Character.isLetter('´┐¢')) {
language\SoundexTest.java:369 Assert.assertEquals(´┐¢000, 
this.getSoundexEncoder().encode(´┐¢));
language\SoundexTest.java:375 Assert.assertEquals(, 
this.getSoundexEncoder().encode(´┐¢));
language\SoundexTest.java:387 if (Character.isLetter('´┐¢')) {
language\SoundexTest.java:389 Assert.assertEquals(´┐¢000, 
this.getSoundexEncoder().encode(´┐¢));
language\SoundexTest.java:395 Assert.assertEquals(, 
this.getSoundexEncoder().encode(´┐¢));
{code}

The characters are probably not correct above, because I used a crude perl 
script to find them:

{code}
perl -ne "$.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV;" 
*/*.java
{code}

language\SoundexTest.java:367 in particular is incorrect, because it's supposed 
to be a single character.

Now one might think that native2ascii -encoding UTF-8 would fix that, but it 
gives:

if (Character.isLetter('\ufffd'))

which is an unknown character.

Similarly for binary\Base64Test.java:96.

It's not all that clear what the Unicode escapes should be in these cases, but 
probably not the unknown character.

[Possibly the characters got mangled at some point, or maybe they have always 
been wrong]

The ColognePhoneticTest.java cases are less serious, as the characters are 
valid ISO-8859-1 (accented German), but given that the rest of the file uses 
Unicode escapes, I think they should be changed too (but add comments to say 
what they are, e.g. o-umlaut, u-umlaut).

  was:
Some of the test cases include characters in a native encoding (possibly 
UTF-8), rather than using Unicode escapes.

This can cause a problem for IDEs if they don't know the encoding (e.g. cause 
compilation errors, which is how I found the issue), and possibly some 
transformations may corrupt the contents, e.g. fixing EOL.

I think we should have a rule of using Unicode escapes for all such non-ascii 
characters.
It's particularly important for non-ISO-8859-1 characters.

Some example classes with non-ascii characters:

{code}
binary\Base64Test.java:96 byte[] decode = 
b64.decode(SGVsbG{´┐¢´┐¢´┐¢´┐¢´┐¢´┐¢}8gV29ybGQ=);
language\ColognePhoneticTest.java:110 {m├Ânchengladbach, 
664645214},
language\ColognePhoneticTest.java:130 String[][] data = 
{{bergisch-gladbach, 174845214}, {M├╝ller-L├╝denscheidt, 65752682}};
language\ColognePhoneticTest.java:137 {Meyer, M├╝ller},
language\ColognePhoneticTest.java:143 {ganz, Gänse},
language\DoubleMetaphoneTest.java:1222 
this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, S);
language\DoubleMetaphoneTest.java:1227 
this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, N);
language\SoundexTest.java:367 if (Character.isLetter('´┐¢')) {
language\SoundexTest.java:369 Assert.assertEquals(´┐¢000, 
this.getSoundexEncoder().encode(´┐¢));
language\SoundexTest.java:375 Assert.assertEquals(, 
this.getSoundexEncoder().encode(´┐¢));
language\SoundexTest.java:387 if (Character.isLetter('´┐¢')) {
language\SoundexTest.java:389 Assert.assertEquals(´┐¢000, 
this.getSoundexEncoder().encode(´┐¢));
language\SoundexTest.java:395 Assert.assertEquals(, 
this.getSoundexEncoder().encode(´┐¢));
{code}

The characters are probably not correct above, because I used a crude perl 
script to find them:

{code}
perl ne $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; 
*/*.java
{code}

language\SoundexTest.java:367 in particular 

[jira] [Commented] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085128#comment-13085128
 ] 

Sebb commented on CODEC-127:


If you change Eclipse to set the container / resource / text file encoding to 
UTF-8 (since that is what the POM says), the files should display correctly, 
assuming they really are UTF-8.

 Non-ascii characters in source files
 

 Key: CODEC-127
 URL: https://issues.apache.org/jira/browse/CODEC-127
 Project: Commons Codec
  Issue Type: Bug
Reporter: Sebb

 Some of the test cases include characters in a native encoding (possibly 
 UTF-8), rather than using Unicode escapes.
 This can cause a problem for IDEs if they don't know the encoding (e.g. cause 
 compilation errors, which is how I found the issue), and possibly some 
 transformations may corrupt the contents, e.g. fixing EOL.
 I think we should have a rule of using Unicode escapes for all such non-ascii 
 characters.
 It's particularly important for non-ISO-8859-1 characters.
 Some example classes with non-ascii characters:
 {code}
 binary\Base64Test.java:96 byte[] decode = 
 b64.decode(SGVsbG{´┐¢´┐¢´┐¢´┐¢´┐¢´┐¢}8gV29ybGQ=);
 language\ColognePhoneticTest.java:110 {m├Ânchengladbach, 
 664645214},
 language\ColognePhoneticTest.java:130 String[][] data = 
 {{bergisch-gladbach, 174845214}, {M├╝ller-L├╝denscheidt, 65752682}};
 language\ColognePhoneticTest.java:137 {Meyer, M├╝ller},
 language\ColognePhoneticTest.java:143 {ganz, Gänse},
 language\DoubleMetaphoneTest.java:1222 
 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, S);
 language\DoubleMetaphoneTest.java:1227 
 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, N);
 language\SoundexTest.java:367 if (Character.isLetter('´┐¢')) {
 language\SoundexTest.java:369 Assert.assertEquals(´┐¢000, 
 this.getSoundexEncoder().encode(´┐¢));
 language\SoundexTest.java:375 Assert.assertEquals(, 
 this.getSoundexEncoder().encode(´┐¢));
 language\SoundexTest.java:387 if (Character.isLetter('´┐¢')) {
 language\SoundexTest.java:389 Assert.assertEquals(´┐¢000, 
 this.getSoundexEncoder().encode(´┐¢));
 language\SoundexTest.java:395 Assert.assertEquals(, 
 this.getSoundexEncoder().encode(´┐¢));
 {code}
 The characters are probably not correct above, because I used a crude perl 
 script to find them:
 {code}
 perl -ne "$.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV;" */*.java
 {code}
 language\SoundexTest.java:367 in particular is incorrect, because it's 
 supposed to be a single character.
 Now one might think that native2ascii -encoding UTF-8 would fix that, but it 
 gives:
 if (Character.isLetter('\ufffd'))
 which is an unknown character.
 Similarly for binary\Base64Test.java:96.
 It's not all that clear what the Unicode escapes should be in these cases, 
 but probably not the unknown character.
 [Possibly the characters got mangled at some point, or maybe they have always 
 been wrong]
 The ColognePhoneticTest.java cases are less serious, as the characters are 
 valid ISO-8859-1 (accented German), but given that the rest of the file uses 
 Unicode escapes, I think they should be changed too (but add comments to say 
 what they are, e.g. o-umlaut, u-umlaut).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Gary D. Gregory (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085134#comment-13085134
 ] 

Gary D. Gregory commented on CODEC-127:
---

All better with the test source folder set to UTF-8, which I thought I had 
done, but obviously not.

I am now a lot less worried about maintenance because the files are editable 
given the right editor settings. I am inclined to leave things as is.

Perhaps each file needs a prominent Javadoc about using UTF-8 in editors.





[jira] [Commented] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085135#comment-13085135
 ] 

Sebb commented on CODEC-127:


See my fix to ColognePhoneticTest in trunk.

That now shows native comments for all Unicode escapes.

Two of the otherwise lowercase names were previously converted to the Unicode 
escapes for upper-case umlauts; I wonder if that was a mistake?
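
A minimal sketch of the difference between the lower-case and upper-case escapes, not taken from ColognePhoneticTest itself. The class name and constants are hypothetical, and the comments describe the characters in ASCII words rather than in native form so that the snippet compiles under any source encoding.

```java
public class EscapedUmlauts {

    // Unicode escapes keep the source pure ASCII; the comments say what they are.
    public static final String LOWER = "m\u00f6nchengladbach"; // o-umlaut, lower case
    public static final String UPPER = "m\u00d6nchengladbach"; // O-umlaut, upper case

    public static void main(String[] args) {
        // The two strings differ only in the case of the umlaut
        System.out.println(LOWER.equals(UPPER));           // false
        System.out.println(LOWER.equalsIgnoreCase(UPPER)); // true
    }
}
```

Whether the case difference matters depends on how the encoder under test treats case, so an upper-case escape in an otherwise lowercase name may or may not change the test result.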





[jira] [Commented] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Gary D. Gregory (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085137#comment-13085137
 ] 

Gary D. Gregory commented on CODEC-127:
---

If I run:

{quote}
perl -n -e "$.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV;" */*.java
{quote}

I get:
{quote}
Can't open */*.java: Invalid argument.
{quote}






[jira] [Issue Comment Edited] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Gary D. Gregory (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085137#comment-13085137
 ] 

Gary D. Gregory edited comment on CODEC-127 at 8/15/11 3:51 PM:


If I run:

{noformat}
perl -n -e "$.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV;" */*.java
{noformat}

I get:
{noformat}
Can't open */*.java: Invalid argument.
{noformat}


  was (Author: garydgregory):
If I run:

{quote}
perl -n -e $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if 
m/\P{ASCII}/;$s=$ARGV; */*.java
{quote}

I get:
{quote}
Can't open */*.java: Invalid argument.
{quote}

  




[jira] [Commented] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Gary D. Gregory (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085139#comment-13085139
 ] 

Gary D. Gregory commented on CODEC-127:
---

WRT:
{noformat}
Author: sebb
Date: Mon Aug 15 15:47:42 2011
New Revision: 1157892

URL: http://svn.apache.org/viewvc?rev=1157892&view=rev
Log:
CODEC-127 Convert to use Unicode in strings, but add comments in native 
encoding (utf-8)
{noformat}

I am having second thoughts here. If you cannot edit UTF-8, you cannot edit and 
maintain the files because if you change the Unicode escape in the code, you 
must change the comment to match. So now, I am favoring leaving the code as it 
was before...

Thoughts?






[jira] [Commented] (POOL-99) Test for idle time exceeded in borrowObject

2011-08-15 Thread Rob Eamon (JIRA)

[ 
https://issues.apache.org/jira/browse/POOL-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085143#comment-13085143
 ] 

Rob Eamon commented on POOL-99:
---

In some cases for object pools, when the object idle time exceeds a threshold 
it is no longer a valid/usable object (e.g. a DB connection). Pool clients need 
to be able to determine if an object has been idle for more than X seconds so 
that such objects will not be used (they are no longer valid and will cause 
exceptions to be thrown). Either the pool itself should enforce this via settings 
or it should provide the information necessary for the pool client to do it in 
testOnBorrow.
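
A rough sketch of the client-side workaround, i.e. carrying the idle timestamp alongside the pooled object. IdleStamped is a hypothetical wrapper, not part of Commons Pool, and the caller is assumed to invoke markIdle whenever the object is returned to the pool.

```java
public class IdleStamped<T> {

    private final T value;
    private volatile long idleSinceMillis;

    /** Wraps a value and records the time at which it first became idle. */
    public IdleStamped(T value, long nowMillis) {
        this.value = value;
        this.idleSinceMillis = nowMillis;
    }

    public T get() {
        return value;
    }

    /** Call when the object goes back into the pool. */
    public void markIdle(long nowMillis) {
        this.idleSinceMillis = nowMillis;
    }

    /** True if the object has been idle for more than maxIdleMillis. */
    public boolean idleLongerThan(long maxIdleMillis, long nowMillis) {
        return nowMillis - idleSinceMillis > maxIdleMillis;
    }
}
```

A validation hook could then call idleLongerThan(maxIdle, System.currentTimeMillis()) and reject the object when it returns true. The drawback noted in the issue remains: the pool already tracks this time internally, so the wrapper duplicates that bookkeeping.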

 Test  for idle time exceeded in borrowObject
 

 Key: POOL-99
 URL: https://issues.apache.org/jira/browse/POOL-99
 Project: Commons Pool
  Issue Type: Improvement
Affects Versions: 1.3
Reporter: Rob Eamon
Priority: Minor
 Fix For: 2.0


 For GenericObjectPool, the evictor thread performs a calculation to determine 
 if an idle object has expired. If it has, the object is destroyed.
 Would like borrowObject to perform the same test and destroy behavior.
 I explored using the testOnBorrow facility but the time that the object went 
 idle is not available. Only the pool has access to the ObjectTimestampPair 
 object that is used to record the time that the object was placed in the 
 pool. I explored placing a timestamp in the pooled object and can do that but 
 it would seem better if the pool managed that test itself.





[jira] [Commented] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085145#comment-13085145
 ] 

Sebb commented on CODEC-127:


Sorry, forgot I was using a local module which handles DOS wildcards, see

http://docs.activestate.com/activeperl/5.14/lib/pods/perlwin32.html#command_line_wildcard_expansion

Either pass each file in separately, or create Wild.pm and use:

{code}
perl -MWild -ne "$.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV;" */*.java
{code}

Wild.pm only works for one level of directories.
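
For anyone without the Perl setup, the same scan can be sketched in Java, walking the source tree recursively so the one-level limitation does not apply. This is an illustrative stand-in for the one-liner, not a tool from the project; the class name NonAsciiFinder and its methods are made up for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class NonAsciiFinder {

    /** Returns true if the line contains any character outside 7-bit ASCII. */
    public static boolean hasNonAscii(String line) {
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) > 0x7F) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Start directory: first argument, or the current directory
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> sources;
        try (Stream<Path> paths = Files.walk(root)) {
            sources = paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(".java"))
                    .collect(Collectors.toList());
        }
        for (Path file : sources) {
            // ISO-8859-1 maps every byte to exactly one char, so reading never
            // fails on malformed input and any byte >= 0x80 is a char above 0x7F
            List<String> lines = Files.readAllLines(file, StandardCharsets.ISO_8859_1);
            for (int i = 0; i < lines.size(); i++) {
                if (hasNonAscii(lines.get(i))) {
                    System.out.println(file + ":" + (i + 1) + " " + lines.get(i));
                }
            }
        }
    }
}
```

As with the Perl version, the printed characters may look wrong on the console; only the file names and line numbers are reliable.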





[jira] [Commented] (POOL-99) Test for idle time exceeded in borrowObject

2011-08-15 Thread Mark Thomas (JIRA)

[ 
https://issues.apache.org/jira/browse/POOL-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085147#comment-13085147
 ] 

Mark Thomas commented on POOL-99:
-

In that scenario, simply execute a validation query (which is good practise 
anyway for DB connections which can fail for all sorts of reasons).





[jira] [Commented] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085149#comment-13085149
 ] 

Sebb commented on CODEC-127:


It's not that one cannot edit UTF-8; the problem is that it is easy to mangle 
non-ASCII characters by mistake.

The safest approach is to use only ASCII, i.e. Unicode escapes, which are valid 
in UTF-8, ISO-8859-1 and all likely default encodings.

However, they are difficult to read, hence the comments on the lines.
If the comments get mangled, it will be obvious, because they won't look right; 
and it's relatively easy to fix them from the Unicode.

I don't think it's an option to use native characters in the non-comment code, 
because we already know they can get corrupted, and the corruption won't 
necessarily cause errors.

I don't see the harm in translating the code into comments; after all, the 
translation can be done again.
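
To illustrate how mechanical the translation is, here is a small sketch that turns native characters into Unicode escapes, roughly what native2ascii does for a correctly decoded file. AsciiEscaper and toUnicodeEscapes are hypothetical names, and the sketch works on UTF-16 code units, so characters outside the BMP come out as two escapes.

```java
public class AsciiEscaper {

    /** Replaces every char above 0x7F with its backslash-u escape; ASCII is left untouched. */
    public static String toUnicodeEscapes(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0x7F) {
                // Four lower-case hex digits, zero padded
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input is written with escapes so this file itself stays pure ASCII
        System.out.println(toUnicodeEscapes("M\u00fcller-L\u00fcdenscheidt"));
    }
}
```

Going the other way (escape to native comment) is equally mechanical, so a mangled comment can be regenerated from the escapes at any time.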

 Non-ascii characters in source files
 

 Key: CODEC-127
 URL: https://issues.apache.org/jira/browse/CODEC-127
 Project: Commons Codec
  Issue Type: Bug
Reporter: Sebb

 Some of the test cases include characters in a native encoding (possibly 
 UTF-8), rather than using Unicode escapes.
 This can cause a problem for IDEs if they don't know the encoding (e.g. cause 
 compilation errors, which is how I found the issue), and possibly some 
 transformations may corrupt the contents, e.g. fixing EOL.
 I think we should have a rule of using Unicode escapes for all such non-ascii 
 characters.
 It's particularly important for non-ISO-8859-1 characters.
 Some example classes with non-ascii characters:
 {code}
 binary\Base64Test.java:96 byte[] decode = 
 b64.decode("SGVsbG{´┐¢´┐¢´┐¢´┐¢´┐¢´┐¢}8gV29ybGQ=");
 language\ColognePhoneticTest.java:110 {"m├Ânchengladbach", 
 "664645214"},
 language\ColognePhoneticTest.java:130 String[][] data = 
 {{"bergisch-gladbach", "174845214"}, {"M├╝ller-L├╝denscheidt", "65752682"}};
 language\ColognePhoneticTest.java:137 {"Meyer", "M├╝ller"},
 language\ColognePhoneticTest.java:143 {"ganz", "Gänse"},
 language\DoubleMetaphoneTest.java:1222 
 this.getDoubleMetaphone().isDoubleMetaphoneEqual("´┐¢", "S");
 language\DoubleMetaphoneTest.java:1227 
 this.getDoubleMetaphone().isDoubleMetaphoneEqual("´┐¢", "N");
 language\SoundexTest.java:367 if (Character.isLetter('´┐¢')) {
 language\SoundexTest.java:369 Assert.assertEquals("´┐¢000", 
 this.getSoundexEncoder().encode("´┐¢"));
 language\SoundexTest.java:375 Assert.assertEquals("", 
 this.getSoundexEncoder().encode("´┐¢"));
 language\SoundexTest.java:387 if (Character.isLetter('´┐¢')) {
 language\SoundexTest.java:389 Assert.assertEquals("´┐¢000", 
 this.getSoundexEncoder().encode("´┐¢"));
 language\SoundexTest.java:395 Assert.assertEquals("", 
 this.getSoundexEncoder().encode("´┐¢"));
 {code}
 The characters are probably not correct above, because I used a crude perl 
 script to find them:
 {code}
 perl -ne "$.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if 
 m/\P{ASCII}/;$s=$ARGV;" */*.java
 {code}
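
For readers without Perl, roughly the same check can be sketched in Java. This is an illustration only and not part of the issue: the class and method names are hypothetical, and the method inspects lines that have already been decoded into Strings, so the result still depends on the encoding used to read the files.

```java
import java.util.ArrayList;
import java.util.List;

final class NonAsciiScanner {
    private NonAsciiScanner() {
        // static helper only
    }

    // Returns "name:lineNumber line" for each line that contains a character
    // outside the ASCII range, mirroring the output format of the one-liner.
    static List<String> scan(String fileName, List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.chars().anyMatch(c -> c > 0x7F)) {
                hits.add(fileName + ":" + (i + 1) + " " + line);
            }
        }
        return hits;
    }
}
```

Line numbers start at 1 for each file, which is what resetting `$.` achieves in the Perl version.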
 language\SoundexTest.java:367 in particular is incorrect, because it's 
 supposed to be a single character.
 Now one might think that native2ascii -encoding UTF-8 would fix that, but it 
 gives:
 if (Character.isLetter('\ufffd'))
 which is an unknown character.
 Similarly for binary\Base64Test.java:96.
 It's not all that clear what the Unicode escapes should be in these cases, 
 but probably not the unknown character.
 [Possibly the characters got mangled at some point, or maybe they have always 
 been wrong]
 The ColognePhoneticTest.java cases are less serious, as the characters are 
 valid ISO-8859-1 (accented German), but given that the rest of the file uses 
 Unicode escapes, I think they should be changed too (but add comments to say 
 what they are, e.g. o-umlaut, u-umlaut).
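
One way a replacement character such as `\ufffd` can end up in a source file is that ISO-8859-1 bytes were decoded as UTF-8 at some point and the result was saved. This is an assumption; the issue does not establish what actually happened. The sketch below, with hypothetical names, shows the effect in isolation.

```java
import java.nio.charset.StandardCharsets;

final class ReplacementCharDemo {
    private ReplacementCharDemo() {
        // static helper only
    }

    // Encodes the text as ISO-8859-1, then (wrongly) decodes the bytes as UTF-8.
    // Bytes that are not valid UTF-8 come back as U+FFFD, the replacement character.
    static String misdecode(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }
}
```

Once the original byte has been replaced this way, the intended character cannot be recovered from the file alone, which fits the observation that the correct escapes are not obvious.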





[jira] [Commented] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Gary D. Gregory (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085153#comment-13085153
 ] 

Gary D. Gregory commented on CODEC-127:
---

Roger that. I'm sold then.





[jira] [Created] (CHAIN-53) Global Update of Chain - Generics, JDK 1.5, Update Dependency Versions

2011-08-15 Thread Elijah Zupancic (JIRA)
Global Update of Chain - Generics, JDK 1.5, Update Dependency Versions
--

 Key: CHAIN-53
 URL: https://issues.apache.org/jira/browse/CHAIN-53
 Project: Commons Chain
  Issue Type: Improvement
Reporter: Elijah Zupancic


As posted on the mailing list, I've done this work outside of an official branch.

Here is the source:
http://elijah.zupancic.name/projects/commons-chain-v2-proof-of-concept.tar.gz

And here is a diff:

http://elijah.zupancic.name/projects/uber-diff

In this patch:
* Global upgrade to the JDK 1.5
* Added @Override annotations
* Upgraded to the Servlet 2.5 API
* Upgraded to the Faces 2.1 API
* Upgraded to the Portlet 2.0 API
* Upgraded the Maven Parent POM version
* Added generics support to Command so that Command's API looks like:

public interface Command<T extends Context> {
...
   boolean execute(T context) throws Exception;
}
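
To show what the generified signature buys a caller, here is a small self-contained sketch. Only the shape of `Command<T extends Context>` comes from this issue; the `Context` stand-in, `OrderContext` and `ValidateOrder` are hypothetical and are not the real Chain types.

```java
import java.util.HashMap;
import java.util.Map;

final class GenericCommandSketch {
    private GenericCommandSketch() {
        // container for the sketch types
    }

    // Assumed minimal stand-in; not the real org.apache.commons.chain.Context.
    interface Context extends Map<String, Object> { }

    // Shape proposed in the issue: the context type is a type parameter.
    interface Command<T extends Context> {
        boolean execute(T context) throws Exception;
    }

    // Hypothetical application-specific context with a typed accessor.
    static final class OrderContext extends HashMap<String, Object> implements Context {
        private static final long serialVersionUID = 1L;

        int quantity() {
            return (Integer) get("quantity");
        }
    }

    // The command receives OrderContext directly, so no cast is needed.
    static final class ValidateOrder implements Command<OrderContext> {
        @Override
        public boolean execute(OrderContext context) {
            context.put("valid", context.quantity() > 0);
            return false; // by Chain convention, false lets processing continue
        }
    }
}
```

With the raw, pre-generics interface the command would take a plain `Context` and cast it before calling `quantity()`.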

I'm very much new to the ASF and I was advised to file a bug in order to get 
the process started for these changes to be integrated.





[jira] [Commented] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Gary D. Gregory (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085156#comment-13085156
 ] 

Gary D. Gregory commented on CODEC-127:
---

Perl:

I did all that and I get:

{noformat}
C:\svn\org\apache\commons\trunks-proper\codec>perl -MWild -ne "$.=1 if $s ne 
$ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; */*.java"
syntax error at -e line 1, near "*."
Execution of -e aborted due to compilation errors.
{noformat}

I also have:

PERL5OPT=-MWild

in my environment.

Gary





[jira] [Commented] (POOL-99) Test for idle time exceeded in borrowObject

2011-08-15 Thread Rob Eamon (JIRA)

[ 
https://issues.apache.org/jira/browse/POOL-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085158#comment-13085158
 ] 

Rob Eamon commented on POOL-99:
---

The DB case was just an example. You're right that the testOnBorrow could do a 
simple validation query. But why do a validation query when one can know up 
front that the pool object is stale?

IMO, there is no reason for the pool not to at least provide the time at which 
the object went idle, so that the pool client can determine for itself whether 
or not the object is valid. The pool client developer can decide what is 
expensive and what isn't.

I understand the view that the idle notion of the pool is intended to avoid 
holding on to objects that are unlikely to be used, or at least not used for 
considerable time.

But "unlikely to be used" is awfully close to "shouldn't be used". Given that 
the test is the same, why not leverage the idle time facilities?

 Test for idle time exceeded in borrowObject
 

 Key: POOL-99
 URL: https://issues.apache.org/jira/browse/POOL-99
 Project: Commons Pool
  Issue Type: Improvement
Affects Versions: 1.3
Reporter: Rob Eamon
Priority: Minor
 Fix For: 2.0


 For GenericObjectPool, the evictor thread performs a calculation to determine 
 if an idle object has expired. If it has, the object is destroyed.
 I would like borrowObject to apply the same test-and-destroy behavior.
 I explored using the testOnBorrow facility but the time that the object went 
 idle is not available. Only the pool has access to the ObjectTimestampPair 
 object that is used to record the time that the object was placed in the 
 pool. I explored placing a timestamp in the pooled object and can do that but 
 it would seem better if the pool managed that test itself.
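
The workaround mentioned in the description, placing a timestamp in the pooled object, could look roughly like the sketch below. `IdleStamped` and its methods are hypothetical names, and the wiring into the pool is left out. With Commons Pool 1.x one would presumably stamp the object in `PoolableObjectFactory.passivateObject` and check it in `validateObject` with testOnBorrow enabled, but that wiring is an assumption and is not shown.

```java
// Hypothetical wrapper that remembers when the wrapped object went idle.
final class IdleStamped<T> {
    private final T delegate;
    private long idleSinceMillis = -1L; // -1 means "currently in use"

    IdleStamped(T delegate) {
        this.delegate = delegate;
    }

    T get() {
        return delegate;
    }

    // Call when the object is returned to the pool.
    void markIdle(long nowMillis) {
        idleSinceMillis = nowMillis;
    }

    // Call when the object is handed out again.
    void markActive() {
        idleSinceMillis = -1L;
    }

    // True if the object is idle and has been idle for longer than maxIdleMillis.
    boolean idleTooLong(long nowMillis, long maxIdleMillis) {
        return idleSinceMillis >= 0 && nowMillis - idleSinceMillis > maxIdleMillis;
    }
}
```

The drawback is the one raised in the issue: every pooled type needs this bookkeeping, whereas the pool already records the same timestamp internally.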





[jira] [Issue Comment Edited] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085145#comment-13085145
 ] 

Sebb edited comment on CODEC-127 at 8/15/11 4:55 PM:
-

Sorry, forgot I was using a local module which handles DOS wildcards, see

http://docs.activestate.com/activeperl/5.14/lib/pods/perlwin32.html#command_line_wildcard_expansion

Either pass each file in separately, or create Wild.pm and use:

{code}
perl -MWild -ne "$.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if 
m/\P{ASCII}/;$s=$ARGV;" */*.java
{code}

Wild.pm only works for one level of directories.

  was (Author: s...@apache.org):
Sorry, forgot I was using a local module which handles DOS wildcards, see

http://docs.activestate.com/activeperl/5.14/lib/pods/perlwin32.html#command_line_wildcard_expansion

Either pass each file in separately, or create Wild.pm and use:

{code}
perl -MWild -ne "$.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if 
m/\P{ASCII}/;$s=$ARGV; */*.java"
{code}

Wild.pm only works for one level of directories.
  




[jira] [Commented] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085165#comment-13085165
 ] 

Sebb commented on CODEC-127:


Sorry, the closing " was in the wrong place; it should have been before the 
file name params.





[jira] [Commented] (CHAIN-53) Global Update of Chain - Generics, JDK 1.5, Update Dependency Versions

2011-08-15 Thread Matt Benson (JIRA)

[ 
https://issues.apache.org/jira/browse/CHAIN-53?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085184#comment-13085184
 ] 

Matt Benson commented on CHAIN-53:
--

Hello again Elijah,
  I have looked over the diff; here are some comments:

* diffs should be attached/uploaded in JIRA, with the grant/feather radio 
button checked indicating your intent that the patch be licensed to the ASF (I 
know you sent the ICLA, but a. it wouldn't have been processed yet, and b. just 
humor us ;)  )
* I don't see anything in the Faces-related changes to warrant upgrading to JSF 
2.x.  MyFaces in particular makes every attempt to continue to support JSF 1.x 
versions, so in the spirit of good inter-ASF cooperation, we should probably 
just leave the API levels of the JSF dependency wherever they stood previously.
* At Commons we often repackage components when their APIs change incompatibly. 
 The changes you have submitted are overwhelmingly backward-compatible once 
type erasure has been taken into account.  What I particularly notice as being 
backward-incompatible are the {{Map}} implementations.  Since most of these 
have gone from raw {{Map}} to {{Map<String, ?>}}, their {{put()}} methods now 
have different signatures.  In all cases except for 
{{oac.chain.web.servlet.ServletApplicationScopeMap}} these keys are required to 
be {{String}} instances at runtime anyway, so there is quite a minimal chance 
that code currently using these wouldn't recompile against these binaries.  In 
the last case, {{null}} keys are rejected and other objects are converted to 
{{String}} if necessary.  Once again, it seems rather unlikely that existing 
code would be utilizing this conversion code path.

The {{Map}} concerns are the only potential point of contention I see with 
regard to backward compatibility.  It would seem to me that [chain] is likely 
to sit rather high in the architecture of a given application, with little 
chance of multiple consumers competing at runtime.  For this reason my personal 
opinion is that the incompatibilities introduced in the process of generifying 
the provided {{Map}} implementations are small enough to consider the component 
backward-compatible _enough_ and accept this patch directly onto [chain]'s 
trunk.  I point the situation out here, however, in case other members of the 
community, particularly those with actual _experience_ with [chain], have 
conflicting opinions.
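
The {{put()}} concern can be illustrated with a small sketch. `StringKeyMap` is a hypothetical class, not one of Chain's map implementations. After generification the compiler adds a bridge method `put(Object, Object)` that casts the key to `String`, so legacy raw-typed callers still link and work with `String` keys but fail with a `ClassCastException` for any other key type.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class GenerifiedMapSketch {
    private GenerifiedMapSketch() {
        // container for the sketch types
    }

    // Hypothetical generified map; stands in for a Map<String, ?> implementation.
    static final class StringKeyMap extends AbstractMap<String, Object> {
        private final Map<String, Object> backing = new HashMap<>();

        @Override
        public Object put(String key, Object value) {
            return backing.put(key, value);
        }

        @Override
        public Set<Map.Entry<String, Object>> entrySet() {
            return backing.entrySet();
        }
    }

    // Simulates legacy, pre-generics caller code that uses the raw Map type.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean legacyPut(Map rawMap, Object key, Object value) {
        try {
            rawMap.put(key, value);
            return true;
        } catch (ClassCastException e) {
            return false; // the compiler-generated bridge method casts the key to String
        }
    }
}
```

Under these assumptions, a caller that relied on non-String keys being converted would no longer reach that conversion path, which is the rather unlikely case discussed above.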

Thanks for your interest!

 Global Update of Chain - Generics, JDK 1.5, Update Dependency Versions
 --

 Key: CHAIN-53
 URL: https://issues.apache.org/jira/browse/CHAIN-53
 Project: Commons Chain
  Issue Type: Improvement
Reporter: Elijah Zupancic
  Labels: newbie, patch





[jira] [Commented] (CHAIN-53) Global Update of Chain - Generics, JDK 1.5, Update Dependency Versions

2011-08-15 Thread Elijah Zupancic (JIRA)

[ 
https://issues.apache.org/jira/browse/CHAIN-53?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085189#comment-13085189
 ] 

Elijah Zupancic commented on CHAIN-53:
--

Thanks for the comments Matt.

* I will revert back to the MyFaces 1.0 API.
* I could add put methods that support Object, Object and then cast them to 
the K, V types.
* I will upload the diff to the bug once I have reverted the MyFaces changes.
* Do we want to update the version to 2.0? It seems like it would make sense 
because we are supporting a newer JDK. Or, since it is backwards-compatible, 
would a minor upgrade be sufficient?





[jira] [Commented] (CHAIN-53) Global Update of Chain - Generics, JDK 1.5, Update Dependency Versions

2011-08-15 Thread Matt Benson (JIRA)

[ 
https://issues.apache.org/jira/browse/CHAIN-53?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085234#comment-13085234
 ] 

Matt Benson commented on CHAIN-53:
--

I seem to recall that the upgrade to generics alone, and hence the change in 
required Java version, justifies a major version bump.  It's not a big deal 
at the moment, however.





[jira] [Commented] (CHAIN-53) Global Update of Chain - Generics, JDK 1.5, Update Dependency Versions

2011-08-15 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/CHAIN-53?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085237#comment-13085237
 ] 

Sebb commented on CHAIN-53:
---

A major version bump is not required when changing the minimum Java version 
(though it would be sensible if making a major jump).

http://commons.apache.org/releases/versioning.html





[jira] [Commented] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085242#comment-13085242
 ] 

Sebb commented on CODEC-127:


Actually, DoubleMetaphoneTest is still corrupt; fixing now.

 Non-ascii characters in source files
 

 Key: CODEC-127
 URL: https://issues.apache.org/jira/browse/CODEC-127
 Project: Commons Codec
  Issue Type: Bug
Reporter: Sebb

 Some of the test cases include characters in a native encoding (possibly 
 UTF-8), rather than using Unicode escapes.
 This can cause a problem for IDEs if they don't know the encoding (e.g. cause 
 compilation errors, which is how I found the issue), and possibly some 
 transformations may corrupt the contents, e.g. fixing EOL.
 I think we should have a rule of using Unicode escapes for all such non-ascii 
 characters.
 It's particularly important for non-ISO-8859-1 characters.
 Some example classes with non-ascii characters:
 {code}
 binary\Base64Test.java:96 byte[] decode = 
 b64.decode(SGVsbG{´┐¢´┐¢´┐¢´┐¢´┐¢´┐¢}8gV29ybGQ=);
 language\ColognePhoneticTest.java:110 {m├Ânchengladbach, 
 664645214},
 language\ColognePhoneticTest.java:130 String[][] data = 
 {{bergisch-gladbach, 174845214}, {M├╝ller-L├╝denscheidt, 65752682}};
 language\ColognePhoneticTest.java:137 {Meyer, M├╝ller},
 language\ColognePhoneticTest.java:143 {ganz, Gänse},
 language\DoubleMetaphoneTest.java:1222 
 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, S);
 language\DoubleMetaphoneTest.java:1227 
 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, N);
 language\SoundexTest.java:367 if (Character.isLetter('´┐¢')) {
 language\SoundexTest.java:369 Assert.assertEquals(´┐¢000, 
 this.getSoundexEncoder().encode(´┐¢));
 language\SoundexTest.java:375 Assert.assertEquals(, 
 this.getSoundexEncoder().encode(´┐¢));
 language\SoundexTest.java:387 if (Character.isLetter('´┐¢')) {
 language\SoundexTest.java:389 Assert.assertEquals(´┐¢000, 
 this.getSoundexEncoder().encode(´┐¢));
 language\SoundexTest.java:395 Assert.assertEquals(, 
 this.getSoundexEncoder().encode(´┐¢));
 {code}
 The characters are probably not correct above, because I used a crude perl 
 script to find them:
 {code}
 perl -ne $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if 
 m/\P{ASCII}/;$s=$ARGV; */*.java
 {code}
 language\SoundexTest.java:367 in particular is incorrect, because it's 
 supposed to be a single character.
 Now one might think that native2ascii -encoding UTF-8 would fix that, but it 
 gives:
 if (Character.isLetter('\ufffd'))
 which is an unknown character.
 Similarly for binary\Base64Test.java:96.
 It's not all that clear what the Unicode escapes should be in these cases, 
 but probably not the unknown character.
 [Possibly the characters got mangled at some point, or maybe they have always 
 been wrong]
 The ColognePhoneticTest.java cases are less serious, as the characters are 
 valid ISO-8859-1 (accented German), but given that the rest of the file uses 
 Unicode escapes, I think they should be changed too (but add comments to say 
 what they are, e.g. o-umlaut, u-umlaut).
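
 As an illustration of the proposed rule, here is a minimal sketch of replacing every non-ASCII character with a Unicode escape, which is roughly the conversion native2ascii performs. The class and method names (UnicodeEscapes, escapeNonAscii) are hypothetical and not part of Commons Codec.

```java
final class UnicodeEscapes {

    /** Replaces every char outside 7-bit ASCII with a backslash-u escape. */
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                // Works per UTF-16 code unit, so a supplementary character
                // is emitted as two escapes (its surrogate pair).
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The literals are written with escapes themselves, so this file is pure ASCII.
        System.out.println(escapeNonAscii("m\u00f6nchengladbach"));          // o-umlaut
        System.out.println(escapeNonAscii("M\u00fcller-L\u00fcdenscheidt")); // u-umlaut
    }
}
```

 The trailing comments follow the suggestion above of saying what each escaped character is.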

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MATH-621) BOBYQA is missing in optimization

2011-08-15 Thread Gilles (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085243#comment-13085243
 ] 

Gilles commented on MATH-621:
-

OK. Keeping INDEX_OFFSET might be more work than it is worth. I'll remove it 
as well.

 BOBYQA is missing in optimization
 -

 Key: MATH-621
 URL: https://issues.apache.org/jira/browse/MATH-621
 Project: Commons Math
  Issue Type: New Feature
Affects Versions: 3.0
Reporter: Dr. Dietmar Wolz
 Fix For: 3.0

 Attachments: BOBYQA.math.patch, BOBYQA.v02.math.patch, 
 BOBYQAOptimizer0.4.zip, bobyqa.zip, bobyqa_convert.pl, 
 bobyqaoptimizer0.4.zip, bobyqav0.3.zip

   Original Estimate: 8h
  Remaining Estimate: 8h

 During experiments with space flight trajectory optimizations I recently
 observed that the direct optimization algorithm BOBYQA
 http://plato.asu.edu/ftp/other_software/bobyqa.zip
 from Mike Powell is significantly better than the simple Powell algorithm
 already in commons.math. It uses significantly fewer function calls and is
 more reliable for high dimensional problems. You can replace CMA-ES in many
 more application cases by BOBYQA than by the simple Powell optimizer.
 I would like to contribute a Java port of the algorithm.
 I maintained the structure of the original FORTRAN code, so the
 code is fast but not very nice.
 License status: Michael Powell has sent the agreement via snail mail
 - it hasn't arrived yet.
 Progress: The attached patch relative to the trunk contains both the
 optimizer and the related unit tests - which are all green now.  
 Performance:
 Performance difference (number of function evaluations)
 PowellOptimizer / BOBYQA for different test functions (taken from
 the unit test of BOBYQA, dimension=13 for most of the
 tests):
 Rosen = 9350 / 1283
 MinusElli = 118 / 59
 Elli = 223 / 58
 ElliRotated = 8626 / 1379
 Cigar = 353 / 60
 TwoAxes = 223 / 66
 CigTab = 362 / 60
 Sphere = 223 / 58
 Tablet = 223 / 58
 DiffPow = 421 / 928
 SsDiffPow = 614 / 219
 Ackley = 757 / 97
 Rastrigin = 340 / 64
 The number for DiffPow should be discussed with Michael Powell;
 I will send him the details.
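
 For readers who want to reproduce this kind of comparison, evaluation counts can be collected by wrapping the objective function in a counter. The sketch below is only an assumption about how such a count might be taken, not the code from the attached unit tests. CountingFunction is a hypothetical name, and the sphere function is assumed to be the plain sum of squares.

```java
import java.util.function.ToDoubleFunction;

/** Wraps an objective function and counts how often it is evaluated. */
final class CountingFunction implements ToDoubleFunction<double[]> {

    private final ToDoubleFunction<double[]> delegate;
    private int evaluations;

    CountingFunction(ToDoubleFunction<double[]> delegate) {
        this.delegate = delegate;
    }

    @Override
    public double applyAsDouble(double[] point) {
        evaluations++;                          // one more function evaluation
        return delegate.applyAsDouble(point);
    }

    int getEvaluations() {
        return evaluations;
    }

    /** Assumed definition of the "Sphere" test function: sum of squares. */
    static double sphere(double[] x) {
        double sum = 0;
        for (double v : x) {
            sum += v * v;
        }
        return sum;
    }

    public static void main(String[] args) {
        CountingFunction f = new CountingFunction(CountingFunction::sphere);
        f.applyAsDouble(new double[13]);        // dimension 13, as in most of the tests
        f.applyAsDouble(new double[13]);
        System.out.println(f.getEvaluations());
    }
}
```

 An optimizer handed the wrapper instead of the raw function leaves the count available afterwards via getEvaluations().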
 Open Problems:
 Some checkstyle violations because of the original Fortran source:
 - Original method comments were copied - doesn't follow javadoc standard
 - Multiple variable declarations in one line as in the original source
 - Problems related to goto conversions:
   gotos not convertible in loops were translated into a finite automaton 
 (switch statement)
   no default in switch
   fall through from previous case in switch
   Checks against these, which usually indicate bad style, make no sense here.
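
 To show what such a goto conversion can look like, here is a small illustrative sketch. It is not taken from the BOBYQA port. A routine with four Fortran-style labels becomes a loop around a switch on a state variable, with deliberate fall-through and no default case. The class name and the example routine are hypothetical.

```java
final class GotoAsSwitch {

    /**
     * Sums 1..n with the control flow of a label-based routine:
     *   10: initialise
     *   20: if i > n goto 40
     *   30: add i, increment, goto 20
     *   40: return
     */
    static int sumTo(int n) {
        int i = 0;
        int sum = 0;
        int state = 10;                 // plays the role of the current label
        for (;;) {
            switch (state) {
            case 10:
                i = 1;
                sum = 0;
                // intentional fall through to label 20
            case 20:
                if (i > n) {
                    state = 40;         // GOTO 40
                    break;              // leaves the switch, not the loop
                }
                // intentional fall through to label 30
            case 30:
                sum += i;
                i++;
                state = 20;             // GOTO 20
                break;
            case 40:
                return sum;
            // no default: state only ever holds one of the labels above
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(sumTo(4));
    }
}
```

 In this shape the fall-through and the missing default are part of the translation scheme rather than oversights, which is why the corresponding checkstyle rules are a poor fit.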





[jira] [Commented] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Gary D. Gregory (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085261#comment-13085261
 ] 

Gary D. Gregory commented on CODEC-127:
---

Arg:
{noformat}
C:\svn\org\apache\commons\trunks-proper\codec>perl -MWild -ne $.=1 if $s ne 
$ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; */*.java
Can't open */*.java: Invalid argument.
{noformat}
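
A check that does not depend on the shell or on Perl modules to expand wildcards could be sketched in Java as below. This is a rough equivalent of the one-liner, not something from the Codec tree, and NonAsciiFinder and its methods are hypothetical names. Files are read as ISO-8859-1 so that every byte maps to exactly one char and no decoding error can occur, which also means the reported characters may look as mangled as in the Perl output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class NonAsciiFinder {

    /** Returns "name:lineNumber text" for every line containing a char above 0x7F. */
    static List<String> findNonAscii(String fileName, List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.chars().anyMatch(c -> c > 0x7F)) {
                hits.add(fileName + ":" + (i + 1) + " " + line);
            }
        }
        return hits;
    }

    /** Walks the tree below root and checks every regular file ending in .java. */
    static List<String> scan(Path root) throws IOException {
        List<Path> sources;
        try (Stream<Path> walk = Files.walk(root)) {
            sources = walk.filter(Files::isRegularFile)
                          .filter(p -> p.toString().endsWith(".java"))
                          .sorted()
                          .collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path file : sources) {
            // ISO-8859-1 never fails to decode: one byte, one char
            List<String> lines = Files.readAllLines(file, StandardCharsets.ISO_8859_1);
            hits.addAll(findNonAscii(file.toString(), lines));
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        scan(root).forEach(System.out::println);
    }
}
```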


 Non-ascii characters in source files
 

 Key: CODEC-127
 URL: https://issues.apache.org/jira/browse/CODEC-127
 Project: Commons Codec
  Issue Type: Bug
Reporter: Sebb

 Some of the test cases include characters in a native encoding (possibly 
 UTF-8), rather than using Unicode escapes.
 This can cause a problem for IDEs if they don't know the encoding (e.g. cause 
 compilation errors, which is how I found the issue), and possibly some 
 transformations may corrupt the contents, e.g. fixing EOL.
 I think we should have a rule of using Unicode escapes for all such non-ascii 
 characters.
 It's particularly important for non-ISO-8859-1 characters.
 Some example classes with non-ascii characters:
 {code}
 binary\Base64Test.java:96 byte[] decode = 
 b64.decode(SGVsbG{´┐¢´┐¢´┐¢´┐¢´┐¢´┐¢}8gV29ybGQ=);
 language\ColognePhoneticTest.java:110 {m├Ânchengladbach, 
 664645214},
 language\ColognePhoneticTest.java:130 String[][] data = 
 {{bergisch-gladbach, 174845214}, {M├╝ller-L├╝denscheidt, 65752682}};
 language\ColognePhoneticTest.java:137 {Meyer, M├╝ller},
 language\ColognePhoneticTest.java:143 {ganz, Gänse},
 language\DoubleMetaphoneTest.java:1222 
 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, S);
 language\DoubleMetaphoneTest.java:1227 
 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, N);
 language\SoundexTest.java:367 if (Character.isLetter('´┐¢')) {
 language\SoundexTest.java:369 Assert.assertEquals(´┐¢000, 
 this.getSoundexEncoder().encode(´┐¢));
 language\SoundexTest.java:375 Assert.assertEquals(, 
 this.getSoundexEncoder().encode(´┐¢));
 language\SoundexTest.java:387 if (Character.isLetter('´┐¢')) {
 language\SoundexTest.java:389 Assert.assertEquals(´┐¢000, 
 this.getSoundexEncoder().encode(´┐¢));
 language\SoundexTest.java:395 Assert.assertEquals(, 
 this.getSoundexEncoder().encode(´┐¢));
 {code}
 The characters are probably not correct above, because I used a crude perl 
 script to find them:
 {code}
 perl -ne $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if 
 m/\P{ASCII}/;$s=$ARGV; */*.java
 {code}
 language\SoundexTest.java:367 in particular is incorrect, because it's 
 supposed to be a single character.
 Now one might think that native2ascii -encoding UTF-8 would fix that, but it 
 gives:
 if (Character.isLetter('\ufffd'))
 which is an unknown character.
 Similarly for binary\Base64Test.java:96.
 It's not all that clear what the Unicode escapes should be in these cases, 
 but probably not the unknown character.
 [Possibly the characters got mangled at some point, or maybe they have always 
 been wrong]
 The ColognePhoneticTest.java cases are less serious, as the characters are 
 valid ISO-8859-1 (accented German), but given that the rest of the file uses 
 Unicode escapes, I think they should be changed too (but add comments to say 
 what they are, e.g. o-umlaut, u-umlaut).
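
 One plausible way for the unknown character to appear is that a byte written in a single-byte encoding was later decoded as UTF-8. This is only an assumption, since the thread does not establish what actually happened to these files. The sketch below shows the effect with the JDK's own decoder; ReplacementCharDemo is a hypothetical class name.

```java
import java.nio.charset.StandardCharsets;

final class ReplacementCharDemo {

    /** Decodes raw bytes as UTF-8; malformed input becomes U+FFFD. */
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    /** Decodes raw bytes as ISO-8859-1; every byte maps to one char. */
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "Gaense" with a-umlaut stored as the single ISO-8859-1 byte 0xE4
        byte[] raw = { 'G', (byte) 0xE4, 'n', 's', 'e' };

        String asLatin1 = decodeAsLatin1(raw);
        String asUtf8 = decodeAsUtf8(raw);

        // The Latin-1 reading keeps the umlaut ...
        System.out.println(asLatin1.charAt(1) == '\u00e4');
        // ... while the UTF-8 reading contains the replacement character,
        // because 0xE4 followed by 'n' is not a valid UTF-8 sequence.
        System.out.println(asUtf8.indexOf('\uFFFD') >= 0);
    }
}
```

 Once a file contains U+FFFD the original character cannot be recovered from it, so the correct escape would have to come from an earlier revision or from what the test is meant to exercise.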





[jira] [Commented] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085266#comment-13085266
 ] 

Sebb commented on CODEC-127:


Tried it here; works fine.

Probably an error in your Wild.pm, because I see the same if I omit the -MWild 
option.

 Non-ascii characters in source files
 

 Key: CODEC-127
 URL: https://issues.apache.org/jira/browse/CODEC-127
 Project: Commons Codec
  Issue Type: Bug
Reporter: Sebb





[jira] [Commented] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Gary D. Gregory (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085269#comment-13085269
 ] 

Gary D. Gregory commented on CODEC-127:
---

Can you post your .pm here or email to ggregory at apache dot org? 

 Non-ascii characters in source files
 

 Key: CODEC-127
 URL: https://issues.apache.org/jira/browse/CODEC-127
 Project: Commons Codec
  Issue Type: Bug
Reporter: Sebb





[jira] [Updated] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Sebb (JIRA)

 [ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebb updated CODEC-127:
---

Comment: was deleted

(was: Sebb:

I get errors when I try your perl script on Windows with the latest perl (64 
bit) from ActiveState. Rather than use this space to figure out why, can you 
please run it again and check if we are done with this ticket? 

Thank you,
Gary)

 Non-ascii characters in source files
 

 Key: CODEC-127
 URL: https://issues.apache.org/jira/browse/CODEC-127
 Project: Commons Codec
  Issue Type: Bug
Reporter: Sebb





[jira] [Updated] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Sebb (JIRA)

 [ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebb updated CODEC-127:
---

Comment: was deleted

(was: Sorry, closing quote was in the wrong place; it should have been before the 
file name params)

 Non-ascii characters in source files
 

 Key: CODEC-127
 URL: https://issues.apache.org/jira/browse/CODEC-127
 Project: Commons Codec
  Issue Type: Bug
Reporter: Sebb





[jira] [Updated] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Sebb (JIRA)

 [ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebb updated CODEC-127:
---

Description: 
Some of the test cases include characters in a native encoding (possibly 
UTF-8), rather than using Unicode escapes.

This can cause a problem for IDEs if they don't know the encoding (e.g. cause 
compilation errors, which is how I found the issue), and possibly some 
transformations may corrupt the contents, e.g. fixing EOL.

I think we should have a rule of using Unicode escapes for all such non-ascii 
characters.
It's particularly important for non-ISO-8859-1 characters.

Some example classes with non-ascii characters:

{code}
binary\Base64Test.java:96 byte[] decode = 
b64.decode(SGVsbG{´┐¢´┐¢´┐¢´┐¢´┐¢´┐¢}8gV29ybGQ=);
language\ColognePhoneticTest.java:110 {m├Ânchengladbach, 
664645214},
language\ColognePhoneticTest.java:130 String[][] data = 
{{bergisch-gladbach, 174845214}, {M├╝ller-L├╝denscheidt, 65752682}};
language\ColognePhoneticTest.java:137 {Meyer, M├╝ller},
language\ColognePhoneticTest.java:143 {ganz, Gänse},
language\DoubleMetaphoneTest.java:1222 
this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, S);
language\DoubleMetaphoneTest.java:1227 
this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, N);
language\SoundexTest.java:367 if (Character.isLetter('´┐¢')) {
language\SoundexTest.java:369 Assert.assertEquals(´┐¢000, 
this.getSoundexEncoder().encode(´┐¢));
language\SoundexTest.java:375 Assert.assertEquals(, 
this.getSoundexEncoder().encode(´┐¢));
language\SoundexTest.java:387 if (Character.isLetter('´┐¢')) {
language\SoundexTest.java:389 Assert.assertEquals(´┐¢000, 
this.getSoundexEncoder().encode(´┐¢));
language\SoundexTest.java:395 Assert.assertEquals(, 
this.getSoundexEncoder().encode(´┐¢));
{code}

The characters are probably not correct above, because I used a crude perl 
script to find them:

{code}
perl -ne $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; 
.java
{code}

language\SoundexTest.java:367 in particular is incorrect, because it's supposed 
to be a single character.

Now one might think that native2ascii -encoding UTF-8 would fix that, but it 
gives:

if (Character.isLetter('\ufffd'))

which is an unknown character.

Similarly for binary\Base64Test.java:96.

It's not all that clear what the Unicode escapes should be in these cases, but 
probably not the unknown character.

[Possibly the characters got mangled at some point, or maybe they have always 
been wrong]

The ColognePhoneticTest.java cases are less serious, as the characters are 
valid ISO-8859-1 (accented German), but given that the rest of the file uses 
Unicode escapes, I think they should be changed too (but add comments to say 
what they are, e.g. o-umlaut, u-umlaut).

  was:
Some of the test cases include characters in a native encoding (possibly 
UTF-8), rather than using Unicode escapes.

This can cause a problem for IDEs if they don't know the encoding (e.g. cause 
compilation errors, which is how I found the issue), and possibly some 
transformations may corrupt the contents, e.g. fixing EOL.

I think we should have a rule of using Unicode escapes for all such non-ascii 
characters.
It's particularly important for non-ISO-8859-1 characters.

Some example classes with non-ascii characters:

{code}
binary\Base64Test.java:96 byte[] decode = 
b64.decode(SGVsbG{´┐¢´┐¢´┐¢´┐¢´┐¢´┐¢}8gV29ybGQ=);
language\ColognePhoneticTest.java:110 {m├Ânchengladbach, 
664645214},
language\ColognePhoneticTest.java:130 String[][] data = 
{{bergisch-gladbach, 174845214}, {M├╝ller-L├╝denscheidt, 65752682}};
language\ColognePhoneticTest.java:137 {Meyer, M├╝ller},
language\ColognePhoneticTest.java:143 {ganz, Gänse},
language\DoubleMetaphoneTest.java:1222 
this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, S);
language\DoubleMetaphoneTest.java:1227 
this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, N);
language\SoundexTest.java:367 if (Character.isLetter('´┐¢')) {
language\SoundexTest.java:369 Assert.assertEquals(´┐¢000, 
this.getSoundexEncoder().encode(´┐¢));
language\SoundexTest.java:375 Assert.assertEquals(, 
this.getSoundexEncoder().encode(´┐¢));
language\SoundexTest.java:387 if (Character.isLetter('´┐¢')) {
language\SoundexTest.java:389 Assert.assertEquals(´┐¢000, 
this.getSoundexEncoder().encode(´┐¢));
language\SoundexTest.java:395 Assert.assertEquals(, 
this.getSoundexEncoder().encode(´┐¢));
{code}

The characters are probably not correct above, because I used a crude perl 
script to find them:

{code}
perl -ne $.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV; 
*/*.java
{code}

language\SoundexTest.java:367 in 

[jira] [Updated] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Sebb (JIRA)

 [ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebb updated CODEC-127:
---

Comment: was deleted

(was: If I run the command as is, I get:
{quote}
Can't open perl script ne: No such file or directory
{quote})

 Non-ascii characters in source files
 

 Key: CODEC-127
 URL: https://issues.apache.org/jira/browse/CODEC-127
 Project: Commons Codec
  Issue Type: Bug
Reporter: Sebb





[jira] [Updated] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Sebb (JIRA)

 [ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebb updated CODEC-127:
---

Comment: was deleted

(was: Can you post your .pm here or email to ggregory at apache dot org? )

 Non-ascii characters in source files
 

 Key: CODEC-127
 URL: https://issues.apache.org/jira/browse/CODEC-127
 Project: Commons Codec
  Issue Type: Bug
Reporter: Sebb





[jira] [Issue Comment Edited] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085110#comment-13085110
 ] 

Sebb edited comment on CODEC-127 at 8/15/11 8:07 PM:
-

I now get:

{code}
commons-codec-generics/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:110
  {m├Ânchengladbach, 664645214},
commons-codec-generics/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:130
  String[][] data = {{bergisch-gladbach, 174845214}, 
{M├╝ller-L├╝denscheidt, 65752682}};
commons-codec-generics/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:137
 {Meyer, M├╝ller},
commons-codec-generics/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:143
 {ganz, Gänse},
commons-codec-generics/src/test/org/apache/commons/codec/language/DoubleMetaphoneTest.java:1222
 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, S);
commons-codec-generics/src/test/org/apache/commons/codec/language/DoubleMetaphoneTest.java:1227
 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, N);
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/BeiderMorseEncoderTest.java:93
 String[] names = { ácz, átz, Ignácz, Ignátz, Ignác };
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:47
   { Nu├▒ez, spanish, EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:49
   { ─îapek, czech, EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:52
   { Küçük, turkish, EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:55
   { Ceauşescu, romanian, EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:57
   { ╬æ╬│╬│╬Á╬╗¤î¤Ç╬┐¤à╬╗╬┐¤é, greek, EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:58
   { ðƒÐâÐêð║ð©ð¢, cyrillic, EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:59
   { ÎøÎö΃, hebrew, EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:60
   { ácz, any, EXACT },
commons-codec-generics/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:61
   { átz, any, EXACT } });
{code}

and

{code}
commons-codec/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:110
 {m├Ânchengladbach, 664645214},
commons-codec/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:130
   String[][] data = {{bergisch-gladbach, 174845214}, 
{M├╝ller-L├╝denscheidt, 65752682}};
commons-codec/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:137
  {Meyer, M├╝ller},
commons-codec/src/test/org/apache/commons/codec/language/ColognePhoneticTest.java:143
  {ganz, Gänse},
commons-codec/src/test/org/apache/commons/codec/language/DoubleMetaphoneTest.java:1227
  this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, S);
commons-codec/src/test/org/apache/commons/codec/language/DoubleMetaphoneTest.java:1232
  this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, N);
commons-codec/src/test/org/apache/commons/codec/language/bm/BeiderMorseEncoderTest.java:93
  String[] names = { ácz, átz, Ignácz, Ignátz, Ignác };
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:47
   { Nu├▒ez, spanish, EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:49
   { ─îapek, czech, EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:52
   { Küçük, turkish, EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:55
   { Ceauşescu, romanian, EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:57
   { ╬æ╬│╬│╬Á╬╗¤î¤Ç╬┐¤à╬╗╬┐¤é, greek, EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:58
   { ðƒÐâÐêð║ð©ð¢, cyrillic, EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:59
   { ÎøÎö΃, hebrew, EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:60
   { ácz, any, EXACT },
commons-codec/src/test/org/apache/commons/codec/language/bm/LanguageGuessingTest.java:61
   { átz, any, EXACT } });
{code}

This was produced by an updated version of the script that uses File::Find to 
handle directory traversal better.
(Some lines above were shortened by manually removing leading spaces.)
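
For readers without Perl, the same kind of scan can be sketched in Java. This is only an illustration, not the script used above: the class name NonAsciiScanner and its methods are made up for this example, and the output format merely imitates the "file:line text" listing shown in this comment. Files are read as ISO-8859-1 so that every byte decodes to exactly one char and the scan never fails on malformed input.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Commons Codec.
final class NonAsciiScanner {

    /** Returns true if the line contains any character outside 7-bit ASCII. */
    static boolean hasNonAscii(String line) {
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) > 0x7F) {
                return true;
            }
        }
        return false;
    }

    /** Collects "path:lineNumber text" for every offending line in *.java files under root. */
    static List<String> scan(Path root) throws IOException {
        List<Path> sources;
        try (Stream<Path> walk = Files.walk(root)) {
            sources = walk.filter(Files::isRegularFile)
                          .filter(p -> p.toString().endsWith(".java"))
                          .sorted()
                          .collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path source : sources) {
            // ISO-8859-1 maps each byte to one char, so any byte >= 0x80
            // shows up as a non-ASCII char regardless of the real encoding.
            List<String> lines = Files.readAllLines(source, StandardCharsets.ISO_8859_1);
            for (int i = 0; i < lines.size(); i++) {
                if (hasNonAscii(lines.get(i))) {
                    hits.add(source + ":" + (i + 1) + " " + lines.get(i).trim());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        scan(root).forEach(System.out::println);
    }
}
```

Because the bytes are decoded as ISO-8859-1, multi-byte UTF-8 characters will print as several odd-looking characters, much like the listing above; the sketch only locates the lines, it does not tell you what the intended character was.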

I think all the actual errors have now 

[jira] [Updated] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Sebb (JIRA)

 [ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebb updated CODEC-127:
---

Comment: was deleted

(was: Typo - missing hyphen for flags)

 Non-ascii characters in source files
 

 Key: CODEC-127
 URL: https://issues.apache.org/jira/browse/CODEC-127
 Project: Commons Codec
  Issue Type: Bug
Reporter: Sebb

 Some of the test cases include characters in a native encoding (possibly 
 UTF-8), rather than using Unicode escapes.
 This can cause problems for IDEs that don't know the encoding (e.g. 
 compilation errors, which is how I found the issue), and some 
 transformations, e.g. fixing EOL, may corrupt the contents.
 I think we should have a rule of using Unicode escapes for all such non-ASCII 
 characters.
 It's particularly important for non-ISO-8859-1 characters.
 Some example classes with non-ASCII characters:
 {code}
 binary\Base64Test.java:96 byte[] decode = 
 b64.decode(SGVsbG{´┐¢´┐¢´┐¢´┐¢´┐¢´┐¢}8gV29ybGQ=);
 language\ColognePhoneticTest.java:110 {m├Ânchengladbach, 
 664645214},
 language\ColognePhoneticTest.java:130 String[][] data = 
 {{bergisch-gladbach, 174845214}, {M├╝ller-L├╝denscheidt, 65752682}};
 language\ColognePhoneticTest.java:137 {Meyer, M├╝ller},
 language\ColognePhoneticTest.java:143 {ganz, Gänse},
 language\DoubleMetaphoneTest.java:1222 
 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, S);
 language\DoubleMetaphoneTest.java:1227 
 this.getDoubleMetaphone().isDoubleMetaphoneEqual(´┐¢, N);
 language\SoundexTest.java:367 if (Character.isLetter('´┐¢')) {
 language\SoundexTest.java:369 Assert.assertEquals(´┐¢000, 
 this.getSoundexEncoder().encode(´┐¢));
 language\SoundexTest.java:375 Assert.assertEquals(, 
 this.getSoundexEncoder().encode(´┐¢));
 language\SoundexTest.java:387 if (Character.isLetter('´┐¢')) {
 language\SoundexTest.java:389 Assert.assertEquals(´┐¢000, 
 this.getSoundexEncoder().encode(´┐¢));
 language\SoundexTest.java:395 Assert.assertEquals(, 
 this.getSoundexEncoder().encode(´┐¢));
 {code}
 The characters are probably not correct above, because I used a crude perl 
 script to find them:
 {code}
 perl -ne "$.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV;" .java
 {code}
 language\SoundexTest.java:367 in particular is incorrect, because it's 
 supposed to be a single character.
 Now one might think that native2ascii -encoding UTF-8 would fix that, but it 
 gives:
 if (Character.isLetter('\ufffd'))
 which is an unknown character.
 Similarly for binary\Base64Test.java:96.
 It's not all that clear what the Unicode escapes should be in these cases, 
 but probably not the unknown character.
 [Possibly the characters got mangled at some point, or maybe they have always 
 been wrong]
 The ColognePhoneticTest.java cases are less serious, as the characters are 
 valid ISO-8859-1 (accented German), but given that the rest of the file uses 
 Unicode escapes, I think they should be changed too (with comments added to 
 say what they are, e.g. o-umlaut, u-umlaut).
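
 One way a '\ufffd' can arise is sketched below. It is an assumption that this is what happened to the files in question; the thread does not confirm it. The sketch shows that when a byte sequence that is not valid UTF-8 (here the single ISO-8859-1 byte for a-umlaut) is decoded as UTF-8, Java substitutes U+FFFD, the replacement character. The class name ReplacementCharDemo is made up for this example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Commons Codec.
final class ReplacementCharDemo {

    /** Decodes the bytes as UTF-8; malformed input is replaced by U+FFFD. */
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // a-umlaut as a single ISO-8859-1 byte (0xE4): not valid UTF-8 on its own.
        byte[] latin1 = "\u00e4".getBytes(StandardCharsets.ISO_8859_1);
        // a-umlaut as proper UTF-8 (0xC3 0xA4).
        byte[] utf8 = "\u00e4".getBytes(StandardCharsets.UTF_8);

        // The ISO-8859-1 byte decodes to the replacement character: prints true.
        System.out.println(decodeAsUtf8(latin1).charAt(0) == 0xFFFD);
        // The UTF-8 bytes round-trip cleanly: prints true.
        System.out.println(decodeAsUtf8(utf8).equals("\u00e4"));
    }
}
```

 If the files already contain the bytes of U+FFFD itself, the original character has been lost and has to be recovered from the test's intent or from version history rather than by re-encoding.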





[jira] [Updated] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Sebb (JIRA)

 [ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebb updated CODEC-127:
---

Comment: was deleted

(was: Tried it here; works fine.

Probably an error in your Wild.pm, because I see the same if I omit the -MWild 
option.)





[jira] [Updated] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Sebb (JIRA)

 [ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebb updated CODEC-127:
---

Comment: was deleted

(was: Perl:

I did all that and I get:

{noformat}
C:\svn\org\apache\commons\trunks-proper\codec>perl -MWild -ne "$.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV;" */*.java
syntax error at -e line 1, near "*."
Execution of -e aborted due to compilation errors.
{noformat}

I also have:

PERL5OPT=-MWild

in my environment.

Gary)





[jira] [Updated] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Sebb (JIRA)

 [ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebb updated CODEC-127:
---

Comment: was deleted

(was: Arg:
{noformat}
C:\svn\org\apache\commons\trunks-proper\codec>perl -MWild -ne "$.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV;" */*.java
Can't open */*.java: Invalid argument.
{noformat}
)





[jira] [Updated] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Sebb (JIRA)

 [ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebb updated CODEC-127:
---

Comment: was deleted

(was: Sorry, forgot I was using a local module which handles DOS wildcards, see

http://docs.activestate.com/activeperl/5.14/lib/pods/perlwin32.html#command_line_wildcard_expansion

Either pass each file in separately, or create Wild.pm and use:

{code}
perl -MWild -ne "$.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV;" */*.java
{code}

Wild.pm only works for one level of directories.)





[jira] [Updated] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Sebb (JIRA)

 [ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebb updated CODEC-127:
---

Comment: was deleted

(was: If I run:

{noformat}
perl -n -e "$.=1 if $s ne $ARGV;print qq($ARGV:$. $_) if m/\P{ASCII}/;$s=$ARGV;" */*.java
{noformat}

I get:
{noformat}
Can't open */*.java: Invalid argument.
{noformat}
)





[jira] [Commented] (CODEC-127) Non-ascii characters in source files

2011-08-15 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/CODEC-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085301#comment-13085301
 ] 

Sebb commented on CODEC-127:


I think all the files are now fixed so that the code uses Unicode escapes; the 
only non-ASCII characters are now in comments.
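
As an illustration of the convention (a sketch only, not the tool used for the actual fix), the helper below rewrites a string so that every non-ASCII char becomes a \uXXXX escape suitable for Java source. The class name UnicodeEscaper and the method escapeNonAscii are made up for this example. It works on UTF-16 chars, so a character outside the Basic Multilingual Plane comes out as two escapes (a surrogate pair), which is what Java source expects.

```java
// Hypothetical helper, not part of Commons Codec.
final class UnicodeEscaper {

    /** Replaces every char above 0x7F with its four-digit Unicode escape. */
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0x7F) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // u-umlaut: the output is the letter M, a backslash, "u00fc", then "ller".
        System.out.println(escapeNonAscii("M\u00fcller"));
    }
}
```

In a test file the escaped literal would then carry a comment naming the character (for example "u-umlaut"), so that the readable form survives only in comments.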





[jira] [Commented] (MATH-646) Unmodifiable views of RealVector

2011-08-15 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MATH-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085303#comment-13085303
 ] 

Sébastien Brisard commented on MATH-646:


{quote}
Rather than an issue of a large source file, the issue is whether this class 
should be part of the public API.
Personally I think that it shouldn't.
{quote}
I agree, that's the reason why I suggested we make this class private. No 
problem, I'll make it a nested, anonymous class within the 
{{unmodifiableRealVector()}} method.
{quote}
I'm suspicious that it is possible to call setIndex on the supposedly 
unmodifiable entry. Maybe that it is harmless?
{quote}
I have checked that calling {{setIndex}} is indeed harmless while iterating 
over the vector in question. However, in my view, this method should not be 
visible.

Thanks for your detailed review of the code. I'll have these errors corrected 
by the end of this week, if that's OK with you.
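
The nested, anonymous-class approach mentioned above can be sketched as follows. This is not the attached patch and does not use the Commons Math API: the reduced {{Vec}} interface, {{arrayVec}} and {{unmodifiable}} are hypothetical names chosen for the illustration. The point is only that the wrapper type never appears in the public API, reads delegate to the wrapped vector, and writes are rejected.

```java
// Hypothetical sketch, not the MATH-646 patch.
final class UnmodifiableViewSketch {

    /** Heavily reduced stand-in for a vector type. */
    interface Vec {
        int getDimension();
        double getEntry(int index);
        void setEntry(int index, double value);
    }

    /** Simple array-backed vector used to exercise the view. */
    static Vec arrayVec(double... data) {
        final double[] values = data.clone();
        return new Vec() {
            @Override public int getDimension() { return values.length; }
            @Override public double getEntry(int index) { return values[index]; }
            @Override public void setEntry(int index, double value) { values[index] = value; }
        };
    }

    /** Returns a read-only view; the anonymous class is invisible to callers. */
    static Vec unmodifiable(final Vec v) {
        return new Vec() {
            @Override public int getDimension() { return v.getDimension(); }
            @Override public double getEntry(int index) { return v.getEntry(index); }
            @Override public void setEntry(int index, double value) {
                throw new UnsupportedOperationException("read-only view");
            }
        };
    }

    public static void main(String[] args) {
        Vec base = arrayVec(1.0, 2.0);
        Vec view = unmodifiable(base);
        base.setEntry(0, 5.0);                 // changes show through the view
        System.out.println(view.getEntry(0));
    }
}
```

Since it is a view rather than a copy, later changes to the underlying vector remain visible through it; only modification through the view itself is blocked.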

 Unmodifiable views of RealVector
 

 Key: MATH-646
 URL: https://issues.apache.org/jira/browse/MATH-646
 Project: Commons Math
  Issue Type: New Feature
Affects Versions: 3.0
Reporter: Sébastien Brisard
  Labels: linear, vector
 Attachments: MATH-646.patch


 The issue has been discussed on the [mailing 
 list|http://mail-archives.apache.org/mod_mbox/commons-dev/201108.mbox/CAGRH7HqxUb2y1HmFt9VJ-kxsXwipk_MdO0D=rnuazmgpnot...@mail.gmail.com].
  Please find attached a proposal for a new class {{UnmodifiableRealVector}}. 
 I chose not to nest it in {{AbstractRealVector}} because it would make the 
 corresponding file huge. Therefore, {{UnmodifiableRealVector}} is {{final}}. 
 Maybe you'd like it to be {{private}} as well? A static method is provided in 
 {{AbstractRealVector}} to build an {{UnmodifiableRealVector}} from any 
 {{RealVector}}.
 Tests are also provided. Since iterating through different implementations of 
 {{RealVector}} is actually different, a test is provided for 
 {{UnmodifiableRealVector}} built on {{ArrayRealVector}} and 
 {{OpenMapRealVector}}. These tests both derive from the same abstract test 
 class. Hope everything works fine.





[jira] [Resolved] (CONFIGURATION-460) reloadStrategy does not work for files inside additional tag using DefaultConfigurationBuilder

2011-08-15 Thread Oliver Heger (JIRA)

 [ 
https://issues.apache.org/jira/browse/CONFIGURATION-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Oliver Heger resolved CONFIGURATION-460.


   Resolution: Fixed
Fix Version/s: 1.7

A fix was applied in SVN revision 1157982. Thank you for reporting this.

 reloadStrategy does not work for files inside additional tag using 
 DefaultConfigurationBuilder
 

 Key: CONFIGURATION-460
 URL: https://issues.apache.org/jira/browse/CONFIGURATION-460
 Project: Commons Configuration
  Issue Type: Bug
  Components: File reloading
Affects Versions: 1.6
 Environment: Linux x86_64
Reporter: Azfar Kazmi
Assignee: Oliver Heger
 Fix For: 1.7


 In the configuration file that DefaultConfigurationBuilder reads to build a 
 CombinedConfiguration, it's possible to include configuration files inside 
 either override or additional XML elements.
 Each such file declaration allows a reloadingStrategy to be specified 
 (see example below). It appears that the reload occurs only for the files 
 inside override and not for the ones inside additional.
 Example:
 <configuration>
   <header>
     <result forceReloadCheck="true">
       <expressionEngine
           config-class="org.apache.commons.configuration.tree.xpath.XPathExpressionEngine"/>
     </result>
   </header>
   <override>
     <properties fileName="user.properties" config-optional="true">
       <reloadingStrategy refreshDelay="100"
           config-class="org.apache.commons.configuration.reloading.FileChangedReloadingStrategy"/>
     </properties>
   </override>
   <additional>
     <properties fileName="application.properties">
       <reloadingStrategy refreshDelay="100"
           config-class="org.apache.commons.configuration.reloading.FileChangedReloadingStrategy"/>
     </properties>
   </additional>
 </configuration>
 In the above example, both user.properties and application.properties are 
 supposed to reload upon change. However, as tested by the following code, only 
 user.properties gets reloaded:
   DefaultConfigurationBuilder dcb = new DefaultConfigurationBuilder("example.xml");
   Configuration conf = dcb.getConfiguration();
   System.out.println("user: " + conf.getBoolean("user"));
   System.out.println("application: " + conf.getBoolean("application"));
   System.out.println("Change files and then press <Enter> to continue...");
   System.in.read();

   System.out.println("user: " + conf.getBoolean("user"));
   System.out.println("application: " + conf.getBoolean("application"));
 Output from the above code:
 user: true
 application: true
 Change files and then press <Enter> to continue...
 0 [main] INFO org.apache.commons.configuration.PropertiesConfiguration  - 
 Reloading configuration. URL is file:snipped/user.properties
 user: false
 application: true





[jira] [Commented] (MATH-646) Unmodifiable views of RealVector

2011-08-15 Thread Gilles (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085329#comment-13085329
 ] 

Gilles commented on MATH-646:
-

Each Java source file can contain at most one public top-level class.
But it can contain additional top-level classes (without access qualifier, 
i.e. with package access), not necessarily nested. Thus, in AbstractRealVector.java:
{code}
public class AbstractRealVector implements RealVector {
  // ...

  public static RealVector unmodifiableRealVector(RealVector v) {
    return new UnmodifiableRealVector(v);
  }
}

class UnmodifiableRealVector implements RealVector {
  // ...
}
{code}

This makes for slightly less cluttered code.






[jira] [Created] (JCI-67) Dubious use of mkdirs() return code

2011-08-15 Thread Sebb (JIRA)
Dubious use of mkdirs() return code
---

 Key: JCI-67
 URL: https://issues.apache.org/jira/browse/JCI-67
 Project: Commons JCI
  Issue Type: Bug
Reporter: Sebb
Priority: Minor


FileRestoreStore.java uses mkdirs() as follows:

{code}
final File parent = file.getParentFile();
if (!parent.exists()) {
    if (!parent.mkdirs()) {
        throw new IOException("could not create" + parent);
    }
}
{code}

Now mkdirs() returns true *only* if the method actually created the 
directories; it's theoretically possible for the directory to be created in the 
window between the exists() and mkdirs() invocations.

Also, the initial exists() call is redundant, because that's what mkdirs() does 
anyway (in the reference implementation, at least).

I suggest the following instead:

{code}
final File parent = file.getParentFile();
if (!parent.mkdirs() && !parent.exists()) {
    throw new IOException("could not create" + parent);
}
{code}

If mkdirs() returns false, the code then checks to see if the directory exists, 
so the exception will only be thrown if the parent really cannot be 
created.

The same code also appears in AbstractTestCase and 
FilesystemAlterationMonitorTestCase.
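
Since the same pattern shows up in several classes, one option would be to pull it into a small helper. The following is only a sketch under assumptions: the class name {{ParentDirs}} and the method {{ensureParentExists}} are made up for illustration and do not exist in JCI, and the null check for files without a parent is an addition not present in the snippets above.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper, not part of JCI.
final class ParentDirs {
    private ParentDirs() {
    }

    // Creates the parent directory of the given file if it is missing.
    static void ensureParentExists(File file) throws IOException {
        final File parent = file.getParentFile();
        if (parent == null) {
            return; // no parent component in the path, nothing to create
        }
        // mkdirs() returns false both on failure and when the directory is
        // already there, so only fail if it still does not exist afterwards.
        if (!parent.mkdirs() && !parent.exists()) {
            throw new IOException("could not create " + parent);
        }
    }
}
```

Calling the helper twice in a row for the same file is harmless: the second call sees mkdirs() return false, finds the directory present and returns normally.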





[jira] [Commented] (MATH-621) BOBYQA is missing in optimization

2011-08-15 Thread Gilles (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085332#comment-13085332
 ] 

Gilles commented on MATH-621:
-

1-based indexing issue solved in revision 1158015.


 BOBYQA is missing in optimization
 -

 Key: MATH-621
 URL: https://issues.apache.org/jira/browse/MATH-621
 Project: Commons Math
  Issue Type: New Feature
Affects Versions: 3.0
Reporter: Dr. Dietmar Wolz
 Fix For: 3.0

 Attachments: BOBYQA.math.patch, BOBYQA.v02.math.patch, 
 BOBYQAOptimizer0.4.zip, bobyqa.zip, bobyqa_convert.pl, 
 bobyqaoptimizer0.4.zip, bobyqav0.3.zip

   Original Estimate: 8h
  Remaining Estimate: 8h

 During experiments with space flight trajectory optimizations I recently
 observed that the direct optimization algorithm BOBYQA
 http://plato.asu.edu/ftp/other_software/bobyqa.zip
 from Mike Powell is significantly better than the simple Powell algorithm
 already in commons.math. It uses significantly fewer function calls and is
 more reliable for high dimensional problems. You can replace CMA-ES in many
 more application cases by BOBYQA than by the simple Powell optimizer.
 I would like to contribute a Java port of the algorithm.
 I maintained the structure of the original FORTRAN code, so the
 code is fast but not very nice.
 License status: Michael Powell has sent the agreement via snail mail
 - it hasn't arrived yet.
 Progress: The attached patch relative to the trunk contains both the
 optimizer and the related unit tests - which are all green now.  
 Performance:
 Performance difference (number of function evaluations)
 PowellOptimizer / BOBYQA for different test functions (taken from
 the unit test of BOBYQA, dimension=13 for most of the
 tests):
 Rosen = 9350 / 1283
 MinusElli = 118 / 59
 Elli = 223 / 58
 ElliRotated = 8626 / 1379
 Cigar = 353 / 60
 TwoAxes = 223 / 66
 CigTab = 362 / 60
 Sphere = 223 / 58
 Tablet = 223 / 58
 DiffPow = 421 / 928
 SsDiffPow = 614 / 219
 Ackley = 757 / 97
 Rastrigin = 340 / 64
 The number for DiffPow should be discussed with Michael Powell;
 I will send him the details.
 Open Problems:
 Some checkstyle violations because of the original Fortran source:
 - Original method comments were copied - doesn't follow javadoc standard
 - Multiple variable declarations in one line as in the original source
 - Problems related to goto conversions:
   gotos not convertible in loops were translated into a finite automaton 
 (switch statement)
   no default in switch
   fall through from previous case in switch
   These are usually bad style, but the checks make no sense here.
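
To illustrate the goto conversion mentioned above, here is a minimal sketch of the general technique. It is not code from the BOBYQA port: the class {{GotoStateMachine}}, the method {{sumTo}} and the label numbers 10/20/30 are invented for this example. Each Fortran label becomes a case, "goto N" becomes "state = N; break;", and falling off the end of one labelled block into the next becomes a fall-through.

```java
// Hypothetical example of emulating Fortran-style gotos with a switch inside a loop.
final class GotoStateMachine {
    private GotoStateMachine() {
    }

    // Sums the integers 1..n using an explicit state variable instead of a loop.
    static int sumTo(int n) {
        int state = 10;
        int i = 0;
        int total = 0;
        for (;;) {
            switch (state) {
            case 10: // label 10: initialisation
                i = 1;
                total = 0;
                // deliberate fall through into label 20
            case 20: // label 20: loop body
                if (i > n) {
                    state = 30; // goto 30
                    break;
                }
                total += i;
                i++;
                state = 20; // goto 20
                break;
            case 30: // label 30: done
                return total;
            // no default: every value assigned to state has a case above
            }
        }
    }
}
```

In this shape the missing default and the fall-through are consequences of the translation scheme, which is why checkstyle flags them even though they are intentional.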





[jira] [Commented] (MATH-621) BOBYQA is missing in optimization

2011-08-15 Thread Gilles (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085334#comment-13085334
 ] 

Gilles commented on MATH-621:
-

Removed testDiagonalRosen unit test in revision 1158017.





[jira] [Commented] (JCI-67) Dubious use of mkdirs() return code

2011-08-15 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/JCI-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085339#comment-13085339
 ] 

Sebb commented on JCI-67:
-

Safer would be the following, as it checks the path is actually a directory:

{code}
final File parent = file.getParentFile();
if (!parent.mkdirs() && !parent.isDirectory()) {
    throw new IOException("could not create" + parent);
}
{code}
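
The difference between the two checks shows up when the path already exists but is a regular file. A small sketch, with the hypothetical names {{DirCheckDemo}}, {{acceptedByExistsCheck}} and {{acceptedByIsDirectoryCheck}} introduced only for this illustration:

```java
import java.io.File;

// Hypothetical demo class comparing the two conditions.
final class DirCheckDemo {
    private DirCheckDemo() {
    }

    // Earlier suggestion: anything that exists is accepted, even a regular file.
    static boolean acceptedByExistsCheck(File parent) {
        return parent.mkdirs() || parent.exists();
    }

    // Safer variant: the path has to be a directory.
    static boolean acceptedByIsDirectoryCheck(File parent) {
        return parent.mkdirs() || parent.isDirectory();
    }
}
```

If parent happens to name an existing regular file, mkdirs() returns false, the exists() variant still accepts the path, and the problem only surfaces later when a file is created underneath it. The isDirectory() variant rejects it straight away.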





[jira] [Commented] (MATH-621) BOBYQA is missing in optimization

2011-08-15 Thread Gilles (JIRA)

[ 
https://issues.apache.org/jira/browse/MATH-621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085340#comment-13085340
 ] 

Gilles commented on MATH-621:
-

Commenting out rescue (line 671) makes the testRescue test fail, as 
expected. So, if I also remove the test, we are fine. However, do you know 
whether I can also remove the whole case 190 (lines 667-697) as well as any 
code that references that state (e.g. lines 791-796, 846-851, 2597-2599, 
etc.)?






[jira] [Created] (IO-280) Dubious use of mkdirs() return code

2011-08-15 Thread Sebb (JIRA)
Dubious use of mkdirs() return code
---

 Key: IO-280
 URL: https://issues.apache.org/jira/browse/IO-280
 Project: Commons IO
  Issue Type: Bug
Reporter: Sebb
Priority: Minor


FileUtils.openOutputStream() has the following code:

{code}
File parent = file.getParentFile();
if (parent != null && parent.exists() == false) {
    if (parent.mkdirs() == false) {
        throw new IOException("File '" + file + "' could not be created");
    }
}
{code}

Now mkdirs() returns true only if the method actually created the directories; 
it's theoretically possible for the directory to be created in the window 
between the exists() and mkdirs() invocations. [Indeed the class actually 
checks for this in the forceMkdir() method]

It would be safer to use:

{code}
File parent = file.getParentFile();
if (parent != null && !parent.mkdirs() && !parent.isDirectory()) {
    throw new IOException("Directory '" + parent + "' could not be created"); // note changed text
}
{code}

Similarly elsewhere in the class where mkdirs() is used.
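
For comparison only: on Java 7 or later, java.nio.file.Files.createDirectories() avoids interpreting a boolean return value altogether. It does nothing if the directory already exists and throws an IOException otherwise (including when the path exists but is not a directory). This is a different technique from the java.io.File code above and would raise the minimum Java version, so treat it as a sketch; the class name {{NioParentDirs}} and the method {{ensureParentExists}} are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical helper, not part of Commons IO.
final class NioParentDirs {
    private NioParentDirs() {
    }

    // Creates the parent directory of the given file if it is missing.
    static void ensureParentExists(File file) throws IOException {
        File parent = file.getParentFile();
        if (parent != null) {
            // No-op when the directory exists; throws IOException on failure.
            Files.createDirectories(parent.toPath());
        }
    }
}
```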





[jira] [Commented] (COMPRESS-132) Add support for unix dump files

2011-08-15 Thread Stefan Bodewig (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085525#comment-13085525
 ] 

Stefan Bodewig commented on COMPRESS-132:
-

Later revisions have fixed some issues detected by findbugs, made some methods 
less public or fixed javadocs, so it has changed quite a bit.

My initial attempt to run your testcase resulted in Java spinning in an 
infinite loop; I'll investigate this further.

I tried to create a dump file on my Linux box - preferably one that has the 
same contents as src/test/resources/bla.* in Compress' trunk source tree - but 
have failed so far.  A cursory reading of the manual page is obviously not 
enough to make it work.  Right now I don't know what to make of

{noformat}
stefan@birdy:~/cc$ sudo dump -v -f bla.dump test1.xml test2.xml 
  DUMP: Date of this level 0 dump: Tue Aug 16 06:34:18 2011
  DUMP: Dumping /dev/sda6 (/home (dir /stefan/cc/test1.xml)) to bla.dump
  DUMP: Excluding inode 8 (journal inode) from dump
  DUMP: Excluding inode 7 (resize inode) from dump
  DUMP: Label: none
  DUMP: Writing 10 Kilobyte records
  DUMP: mapping (Pass I) [regular files]
/dev/sda6: File not found by ext2_lookup while translating .xml
{noformat}






[jira] [Commented] (DAEMON-213) procun log rotation support

2011-08-15 Thread viola.lu (JIRA)

[ 
https://issues.apache.org/jira/browse/DAEMON-213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13085536#comment-13085536
 ] 

viola.lu commented on DAEMON-213:
-

But jsvc (https://issues.apache.org/jira/browse/DAEMON-95) also supports log 
rotation by catching the USR1 signal; can we try the same approach for procrun on Windows?

 procun log rotation support
 ---

 Key: DAEMON-213
 URL: https://issues.apache.org/jira/browse/DAEMON-213
 Project: Commons Daemon
  Issue Type: Improvement
  Components: Procrun
Affects Versions: 1.0.4, 1.0.5, 1.0.6
 Environment: os: winxp
Reporter: viola.lu
Priority: Minor
 Fix For: Nightly Builds


 Currently, procrun doesn't support log rotation. An option for it should be added.





[jira] [Closed] (DBUTILS-79) fillStatement doesn't complain when there are too few parameters

2011-08-15 Thread Henri Yandell (JIRA)

 [ 
https://issues.apache.org/jira/browse/DBUTILS-79?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henri Yandell closed DBUTILS-79.


Resolution: Fixed

Resolved in r1158109 per your patch in DBUTILS-78.

 fillStatement doesn't complain when there are too few parameters
 

 Key: DBUTILS-79
 URL: https://issues.apache.org/jira/browse/DBUTILS-79
 Project: Commons DbUtils
  Issue Type: Bug
Affects Versions: 1.3
Reporter: William R. Speirs
 Fix For: 1.4


 Unless I'm reading the code incorrectly, it appears that the fillStatement 
 function does not complain if you provide too few parameters. For example, if 
 you supply an SQL statement like "select * from blah where ? = ?;" but only 
 provide a single parameter "test", fillStatement returns without issue. 
 However, only the first ? is actually set.
 Granted, this will almost always cause an exception to be thrown by the 
 driver, but since there is already a check for too many parameters, why not 
 check for too few as well?
 (FYI: I came across this bug, and a few others in my AsyncQueryRunner 
 implementation, while re-writing the unit tests to use Mockito.)
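
The kind of check asked for here could look roughly like the sketch below. It is not the DbUtils code: the class {{ParamCheck}}, the method {{checkParameterCount}} and the message text are hypothetical. In real code the expected count would presumably come from the JDBC call PreparedStatement.getParameterMetaData().getParameterCount(); it is passed in as a plain int here so the example runs without a database.

```java
import java.sql.SQLException;

// Hypothetical helper, not part of DbUtils.
final class ParamCheck {
    private ParamCheck() {
    }

    // Fails when the number of supplied parameters differs from the number of
    // placeholders, covering both too many and too few.
    static void checkParameterCount(int expected, Object... params) throws SQLException {
        int actual = (params == null) ? 0 : params.length;
        if (actual != expected) {
            throw new SQLException("Wrong number of parameters: expected "
                    + expected + ", was given " + actual);
        }
    }
}
```

With the statement from the example above (two placeholders) and the single parameter "test", checkParameterCount(2, "test") throws instead of silently setting only the first placeholder.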
