[jira] [Commented] (COMPRESS-327) Support in-memory processing for ZipFile
[ https://issues.apache.org/jira/browse/COMPRESS-327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570949#comment-15570949 ] Stefan Bodewig commented on COMPRESS-327: - The past months have been very quiet for compress so this is the only bigger change since 1.12. There are a few other open bugs where we are waiting for feedback. I can't promise anything but wouldn't expect the next release to be more than a few weeks away. > Support in-memory processing for ZipFile > > > Key: COMPRESS-327 > URL: https://issues.apache.org/jira/browse/COMPRESS-327 > Project: Commons Compress > Issue Type: New Feature >Reporter: Brett Kail >Priority: Minor > Fix For: 1.13 > > Attachments: > 0001-Add-a-SeekableInputStream-and-some-subclasses-that-Z.patch > > > ZipFile (and SevenZFile) currently require a File argument, but it would be > nice to support in-memory byte buffers rather than requiring temp files. > Perhaps create a new SeekableInputStream class (or SeekableDataInput > interface) and add corresponding constructors. > For convenience, perhaps also add a utility class that wraps a ByteBuffer > and/or byte[] and implements the new interface. > (The sevenz package appears to have a similar limitation, so it might make > sense to add the support there at the same time, but I personally don't have > a need for that.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COMPRESS-327) Support in-memory processing for ZipFile
[ https://issues.apache.org/jira/browse/COMPRESS-327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570033#comment-15570033 ] Peter B. commented on COMPRESS-327: --- [~bodewig] this is great news! Is there a rough timeline you could share for the release that will contain this feature? > Support in-memory processing for ZipFile > > > Key: COMPRESS-327 > URL: https://issues.apache.org/jira/browse/COMPRESS-327 > Project: Commons Compress > Issue Type: New Feature >Reporter: Brett Kail >Priority: Minor > Fix For: 1.13 > > Attachments: > 0001-Add-a-SeekableInputStream-and-some-subclasses-that-Z.patch > > > ZipFile (and SevenZFile) currently require a File argument, but it would be > nice to support in-memory byte buffers rather than requiring temp files. > Perhaps create a new SeekableInputStream class (or SeekableDataInput > interface) and add corresponding constructors. > For convenience, perhaps also add a utility class that wraps a ByteBuffer > and/or byte[] and implements the new interface. > (The sevenz package appears to have a similar limitation, so it might make > sense to add the support there at the same time, but I personally don't have > a need for that.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (COMPRESS-361) Add support for SeekableByteChannel
[ https://issues.apache.org/jira/browse/COMPRESS-361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Bodewig resolved COMPRESS-361. - Resolution: Duplicate duplicate of COMPRESS-327 > Add support for SeekableByteChannel > --- > > Key: COMPRESS-361 > URL: https://issues.apache.org/jira/browse/COMPRESS-361 > Project: Commons Compress > Issue Type: New Feature >Reporter: Gary Gregory > > Support {{SeekableByteChannel}} to make {{ZipFile}} and {{SevenZFile}} usable > for non-Files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COMPRESS-327) Support in-memory processing for ZipFile
[ https://issues.apache.org/jira/browse/COMPRESS-327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569249#comment-15569249 ] Stefan Bodewig commented on COMPRESS-327: - zip should now work as well. I'll close this ticket once I've updated the docs and optimized 7z at least a little. > Support in-memory processing for ZipFile > > > Key: COMPRESS-327 > URL: https://issues.apache.org/jira/browse/COMPRESS-327 > Project: Commons Compress > Issue Type: New Feature >Reporter: Brett Kail >Priority: Minor > Fix For: 1.13 > > Attachments: > 0001-Add-a-SeekableInputStream-and-some-subclasses-that-Z.patch > > > ZipFile (and SevenZFile) currently require a File argument, but it would be > nice to support in-memory byte buffers rather than requiring temp files. > Perhaps create a new SeekableInputStream class (or SeekableDataInput > interface) and add corresponding constructors. > For convenience, perhaps also add a utility class that wraps a ByteBuffer > and/or byte[] and implements the new interface. > (The sevenz package appears to have a similar limitation, so it might make > sense to add the support there at the same time, but I personally don't have > a need for that.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (COMPRESS-327) Support in-memory processing for ZipFile
[ https://issues.apache.org/jira/browse/COMPRESS-327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Bodewig updated COMPRESS-327: Fix Version/s: 1.13 > Support in-memory processing for ZipFile > > > Key: COMPRESS-327 > URL: https://issues.apache.org/jira/browse/COMPRESS-327 > Project: Commons Compress > Issue Type: New Feature >Reporter: Brett Kail >Priority: Minor > Fix For: 1.13 > > Attachments: > 0001-Add-a-SeekableInputStream-and-some-subclasses-that-Z.patch > > > ZipFile (and SevenZFile) currently require a File argument, but it would be > nice to support in-memory byte buffers rather than requiring temp files. > Perhaps create a new SeekableInputStream class (or SeekableDataInput > interface) and add corresponding constructors. > For convenience, perhaps also add a utility class that wraps a ByteBuffer > and/or byte[] and implements the new interface. > (The sevenz package appears to have a similar limitation, so it might make > sense to add the support there at the same time, but I personally don't have > a need for that.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (COMPRESS-327) Support in-memory processing for ZipFile
[ https://issues.apache.org/jira/browse/COMPRESS-327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569010#comment-15569010 ] Stefan Bodewig commented on COMPRESS-327: - Current git master's 7z package is able to deal with {{SeekableByteChannel}} and doesn't require {{RandomAccessFile}} anymore. I plan to do the same for the zip package. The current code could be improved to use cached {{ByteBuffer}}s in order to reduce memory consumption. We may want to add an implementation of {{SeekableByteChannel}} wrapped around a {{byte[]}}. > Support in-memory processing for ZipFile > > > Key: COMPRESS-327 > URL: https://issues.apache.org/jira/browse/COMPRESS-327 > Project: Commons Compress > Issue Type: New Feature >Reporter: Brett Kail >Priority: Minor > Fix For: 1.13 > > Attachments: > 0001-Add-a-SeekableInputStream-and-some-subclasses-that-Z.patch > > > ZipFile (and SevenZFile) currently require a File argument, but it would be > nice to support in-memory byte buffers rather than requiring temp files. > Perhaps create a new SeekableInputStream class (or SeekableDataInput > interface) and add corresponding constructors. > For convenience, perhaps also add a utility class that wraps a ByteBuffer > and/or byte[] and implements the new interface. > (The sevenz package appears to have a similar limitation, so it might make > sense to add the support there at the same time, but I personally don't have > a need for that.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
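To illustrate the last point of the comment above, here is a minimal sketch of a read-only {{SeekableByteChannel}} wrapped around a {{byte[]}}. The class name ReadOnlyByteArrayChannel is hypothetical and chosen only for this example; it is not the implementation that was committed to Commons Compress, and it assumes the caller never needs to write to the channel.

```java
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.NonWritableChannelException;
import java.nio.channels.SeekableByteChannel;

/** Hypothetical read-only channel over a byte[]; illustration only. */
final class ReadOnlyByteArrayChannel implements SeekableByteChannel {
    private final byte[] data;
    private int position;
    private boolean open = true;

    ReadOnlyByteArrayChannel(byte[] data) {
        this.data = data;
    }

    @Override
    public int read(ByteBuffer dst) throws ClosedChannelException {
        ensureOpen();
        if (position >= data.length) {
            return -1; // nothing left in the backing array
        }
        // Copy as much as fits into dst, starting at the current position.
        int n = Math.min(dst.remaining(), data.length - position);
        dst.put(data, position, n);
        position += n;
        return n;
    }

    @Override
    public int write(ByteBuffer src) {
        throw new NonWritableChannelException();
    }

    @Override
    public long position() throws ClosedChannelException {
        ensureOpen();
        return position;
    }

    @Override
    public SeekableByteChannel position(long newPosition) throws ClosedChannelException {
        ensureOpen();
        if (newPosition < 0 || newPosition > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("position out of range: " + newPosition);
        }
        // Seeking past the end is allowed; the next read then reports end of data.
        position = (int) newPosition;
        return this;
    }

    @Override
    public long size() throws ClosedChannelException {
        ensureOpen();
        return data.length;
    }

    @Override
    public SeekableByteChannel truncate(long size) {
        throw new NonWritableChannelException();
    }

    @Override
    public boolean isOpen() {
        return open;
    }

    @Override
    public void close() {
        open = false;
    }

    private void ensureOpen() throws ClosedChannelException {
        if (!open) {
            throw new ClosedChannelException();
        }
    }
}
```

A channel like this could be handed to any code that accepts a {{SeekableByteChannel}}, which is what removes the need for a temp file when the archive bytes are already in memory.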
[jira] [Commented] (LANG-1196) Provide way to set random number generator on RandomStringUtils to enable repeatable test execution
[ https://issues.apache.org/jira/browse/LANG-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15568822#comment-15568822 ] Gilles commented on LANG-1196: -- bq. easy fix would be to pass the Random implementation in the constructor {{RandomStringUtils}} is a "_utility_" class: all methods are static. Passing an argument to the constructor would require removing the _static_ keyword, which is not a compatible change. It would be better to define a specific interface (as suggested by Jochen on the ML). Even better would be to have a dedicated Commons component (for random-related utilities). There is a willingness to create one based on the RNG interface defined in http://commons.apache.org/proper/commons-rng/apidocs/org/apache/commons/rng/UniformRandomProvider.html Please have a look at the "dev" ML archive for recent discussions about the new "RNG" component. > Provide way to set random number generator on RandomStringUtils to enable > repeatable test execution > --- > > Key: LANG-1196 > URL: https://issues.apache.org/jira/browse/LANG-1196 > Project: Commons Lang > Issue Type: Improvement > Components: lang.* >Affects Versions: 3.4 > Environment: java version "1.8.0_66" > Java(TM) SE Runtime Environment (build 1.8.0_66-b17) > Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode) > Linux 4.0.5 #3 SMP Mon Sep 14 12:41:09 BST 2015 x86_64 Intel(R) Core(TM) > i7-4710MQ CPU @ 2.50GHz GenuineIntel GNU/Linux >Reporter: Gus Power >Priority: Minor > > Hi, > I'm using [Sham > |http://search.maven.org/#artifactdetails%7Corg.shamdata%7Csham%7C0.3%7Cjar] > to generate realistic looking test data for both parameterized tests and user > acceptance testing. We log the seed that is used for each run so that if > there is an issue we can recreate exactly the same test data.
I would also > like to use some of the commons-lang RandomStringUtils functionality but > notice that the implementation provides no way of setting the random number > generator to be used. > {code}private static final Random RANDOM = new Random();{code} > A way to configure this would be really useful. If there is an alternative > way to do this then that would be great. If you think it's a good idea and it > requires a patch I'm happy to supply one. > Cheers, > Gus. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
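One way to picture the request while keeping the methods static is to accept the generator as a method argument, so a logged seed reproduces the same data. The sketch below uses only the JDK; the class SeededStrings and its method are hypothetical names for illustration and are not part of Commons Lang or of any proposed patch.

```java
import java.util.Random;

/** Hypothetical helper for illustration; not part of Commons Lang. */
final class SeededStrings {
    private static final char[] ALPHABET = "abcdefghijklmnopqrstuvwxyz".toCharArray();

    private SeededStrings() {
        // static utility, no instances
    }

    /** Builds a lowercase string of the given length from the supplied generator. */
    static String randomAlphabetic(int count, Random random) {
        if (count < 0) {
            throw new IllegalArgumentException("count must not be negative: " + count);
        }
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append(ALPHABET[random.nextInt(ALPHABET.length)]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long seed = 42L; // in a test run this would be the logged seed
        String first = randomAlphabetic(12, new Random(seed));
        String second = randomAlphabetic(12, new Random(seed));
        // Same seed, same sequence of draws, so both strings are equal.
        System.out.println(first.equals(second));
    }
}
```

Because the generator is a parameter rather than a private static field, the caller decides the seed and no existing static signature has to change.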
[jira] [Resolved] (MATH-1389) Runtime Improvement for getSubMatrix in Array2DRowRealMatrix
[ https://issues.apache.org/jira/browse/MATH-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Emmanuel Bourg resolved MATH-1389. -- Resolution: Fixed Fix Version/s: 3.7 4.0 I applied the patch on the 4.x and 3.x branches (with a minor optimization for the instantiation of the sub matrix). Thank you very much Christoph! > Runtime Improvement for getSubMatrix in Array2DRowRealMatrix > > > Key: MATH-1389 > URL: https://issues.apache.org/jira/browse/MATH-1389 > Project: Commons Math > Issue Type: Improvement >Reporter: Christoph Dibak >Priority: Trivial > Fix For: 4.0, 3.7 > > Attachments: 0001-faster-getSubMatrix-for-Array2DRowRealMatrix.patch, > MatrixBenchmark.java, RuntimeTestGetSubMatrix.java > > > Using System.arraycopy() for creating sub-matrices in the getSubMatrix() > method of Array2DRowRealMatrix improves the runtime. Tested for a matrix > with dimension 50x50, the execution time was 16 times faster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
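The idea behind the patch can be sketched with plain arrays: each requested row segment is copied in one {{System.arraycopy()}} call instead of element by element. This is a simplified illustration rather than the committed code; the class name SubMatrixCopy is hypothetical, and the inclusive start/end indices are an assumption made for this example.

```java
/** Hypothetical illustration of row-wise copying with System.arraycopy. */
final class SubMatrixCopy {
    private SubMatrixCopy() {
    }

    /** Copies rows startRow..endRow and columns startColumn..endColumn (all inclusive). */
    static double[][] getSubMatrix(double[][] data,
                                   int startRow, int endRow,
                                   int startColumn, int endColumn) {
        int rows = endRow - startRow + 1;
        int cols = endColumn - startColumn + 1;
        double[][] out = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            // One bulk copy per row replaces an inner loop over the columns.
            System.arraycopy(data[startRow + i], startColumn, out[i], 0, cols);
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] m = {
            {1, 2, 3},
            {4, 5, 6},
            {7, 8, 9}
        };
        double[][] sub = getSubMatrix(m, 1, 2, 0, 1);
        System.out.println(java.util.Arrays.deepToString(sub));
    }
}
```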
[jira] [Commented] (LANG-1094) Javadoc is not encoding spaces correctly
[ https://issues.apache.org/jira/browse/LANG-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15568284#comment-15568284 ] Bruno P. Kinoshita commented on LANG-1094: -- Hmmm, interesting. Just tried `mvn javadoc:javadoc` for [lang] master branch in my local environment, and the links now seem to be slightly different. {noformat} file:///../commons-lang/target/site/apidocs/org/apache/commons/lang3/time/DateUtils.html#isSameDay-java.util.Calendar-java.util.Calendar- {noformat} So now links contain the arguments separated by a single dash. Couldn't find any reference to this change in Java 8 release notes. Could someone else try it as well, please? If someone else confirms it is working now, I can try to chase the change in JDK. {noformat} $ java -version java version "1.8.0_101" Java(TM) SE Runtime Environment (build 1.8.0_101-b13) Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode) {noformat} > Javadoc is not encoding spaces correctly > > > Key: LANG-1094 > URL: https://issues.apache.org/jira/browse/LANG-1094 > Project: Commons Lang > Issue Type: Bug > Components: General >Reporter: Duncan Jones >Priority: Minor > > I've noticed the Javadocs include links to methods with spaces incorrectly > encoded. For example, the Javadocs for > [DateUtils|http://commons.apache.org/proper/commons-lang/javadocs/api-release/org/apache/commons/lang3/time/DateUtils.html] > describes a method: > {code:java} > public static boolean isSameDay(Calendar cal1, Calendar cal2) > {code} > The link to this is: > {noformat} > [...]/DateUtils.html#isSameDay(java.util.Calendar, java.util.Calendar) > {noformat} > whereas it should be: > {noformat} > [...]/DateUtils.html#isSameDay(java.util.Calendar,%20java.util.Calendar) > {noformat} > Not sure what's causing this problem. But it certainly hinders efforts to > link to our docs from other sites (like Stack Overflow). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] commons-lang issue #194: add isAllBlank,isNotAllBlank method for String "nul...
Github user kinow commented on the issue: https://github.com/apache/commons-lang/pull/194 Likewise @wangdongxun :-) are you able to close this pull request? Otherwise I believe there is some integration in our infrastructure to let us close it. Please do not hesitate to submit other pull requests, issues or comment on our mailing lists. Thank you Bruno --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (LANG-1188) lang3/StringUtils.java:3302: warning: [unchecked] Possible heap pollution from parameterized vararg type T
[ https://issues.apache.org/jira/browse/LANG-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15568178#comment-15568178 ] Bruno P. Kinoshita commented on LANG-1188: -- +1 [~s...@apache.org] for adding comments now in the code. Never used @SafeVarargs, but sounds like a good plan when we have >java 7 [~pascalschumacher] > lang3/StringUtils.java:3302: warning: [unchecked] Possible heap pollution > from parameterized vararg type T > -- > > Key: LANG-1188 > URL: https://issues.apache.org/jira/browse/LANG-1188 > Project: Commons Lang > Issue Type: Bug > Components: lang.* >Affects Versions: 3.4 > Environment: javac 1.8.0_25 >Reporter: Simon KRAMER >Priority: Minor > Original Estimate: 1h > Remaining Estimate: 1h > > commons-lang3-3.4-src/src/main/java/org/apache/commons/lang3/StringUtils.java:3302: > warning: [unchecked] Possible heap pollution from parameterized vararg type T > public static <T> String join(final T... elements) { > ^ > usage: String.join(" ", stringarray) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
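For readers who have not used @SafeVarargs either, here is a minimal sketch of how the annotation is applied to a generic varargs method. The class VarargsJoin is a hypothetical stand-in, not the StringUtils source; the annotation is only appropriate because the method merely reads the array and never stores or writes into it.

```java
/** Hypothetical stand-in for a generic varargs join; illustration only. */
final class VarargsJoin {
    private VarargsJoin() {
    }

    // Safe: the varargs array is only iterated, never stored, returned or modified,
    // so no heap pollution can leak out of this method.
    @SafeVarargs
    static <T> String join(final T... elements) {
        if (elements == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (T element : elements) {
            if (element != null) {
                sb.append(element);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join("a", "b", "c"));
    }
}
```

The annotation is permitted here because the method is static; on instance methods it additionally requires the method to be final (or private on newer Java versions).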
[jira] [Commented] (LANG-1196) Provide way to set random number generator on RandomStringUtils to enable repeatable test execution
[ https://issues.apache.org/jira/browse/LANG-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15568166#comment-15568166 ] Bruno P. Kinoshita commented on LANG-1196: -- Hi Gus, Having a patch or pull request would definitely help reviewing your use case. I had a quick look at RandomStringUtils, and one easy fix would be to pass the Random implementation in the constructor. But java.util.Random inherits only from Object. Perhaps another object that wraps different random number generators? Feel free to update the ticket with comments or a patch or link to a pull request or repository. Cheers Bruno > Provide way to set random number generator on RandomStringUtils to enable > repeatable test execution > --- > > Key: LANG-1196 > URL: https://issues.apache.org/jira/browse/LANG-1196 > Project: Commons Lang > Issue Type: Improvement > Components: lang.* >Affects Versions: 3.4 > Environment: java version "1.8.0_66" > Java(TM) SE Runtime Environment (build 1.8.0_66-b17) > Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode) > Linux 4.0.5 #3 SMP Mon Sep 14 12:41:09 BST 2015 x86_64 Intel(R) Core(TM) > i7-4710MQ CPU @ 2.50GHz GenuineIntel GNU/Linux >Reporter: Gus Power >Priority: Minor > > Hi, > I'm using [Sham > |http://search.maven.org/#artifactdetails%7Corg.shamdata%7Csham%7C0.3%7Cjar] > to generate realistic looking test data for both parameterized tests and user > acceptance testing. We log the seed that is used for each run so that if > there is an issue we can recreate exactly the same test data. I would also > like to use some of the commons-lang RandomStringUtils functionality but > notice that the implementation provides no way of setting the random number > generator to be used. > {code}private static final Random RANDOM = new Random();{code} > A way to configure this would be really useful. If there is an alternative > way to do this then that would be great.
If you think it's a good idea and it > requires a patch I'm happy to supply one. > Cheers, > Gus. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (LANG-1196) Provide way to set random number generator on RandomStringUtils to enable repeatable test execution
[ https://issues.apache.org/jira/browse/LANG-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruno P. Kinoshita updated LANG-1196: - Description: Hi, I'm using [Sham |http://search.maven.org/#artifactdetails%7Corg.shamdata%7Csham%7C0.3%7Cjar] to generate realistic looking test data for both parameterized tests and user acceptance testing. We log the seed that is used for each run so that if there is an issue we can recreate exactly the same test data. I would also like to use some of the commons-lang RandomStringUtils functionality but notice that the implementation provides no way of setting the random number generator to be used. {code}private static final Random RANDOM = new Random();{code} A way to configure this would be really useful. If there is an alternative way to do this then that would be great. If you think it's a good idea and it requires a patch I'm happy to supply one. Cheers, Gus. was: Hi, I'm using [Sham |http://search.maven.org/#artifactdetails%7Corg.shamdata%7Csham%7C0.3%7Cjar] to generate realistic looking test data for both parameterized tests and user acceptance testing. We log the seed that is used for each run so that if there is an issue we can recreate exactly the same test data. I would also like to use some of the commons-lang RandomStringUtils functionality but notice that the implementation provides no way of setting the random number generator to be used. {{ private static final Random RANDOM = new Random(); }} A way to configure this would be really useful. If there is an alternative way to do this then that would be great. If you think it's a good idea and it requires a patch I'm happy to supply one. Cheers, Gus. 
> Provide way to set random number generator on RandomStringUtils to enable > repeatable test execution > --- > > Key: LANG-1196 > URL: https://issues.apache.org/jira/browse/LANG-1196 > Project: Commons Lang > Issue Type: Improvement > Components: lang.* >Affects Versions: 3.4 > Environment: java version "1.8.0_66" > Java(TM) SE Runtime Environment (build 1.8.0_66-b17) > Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode) > Linux 4.0.5 #3 SMP Mon Sep 14 12:41:09 BST 2015 x86_64 Intel(R) Core(TM) > i7-4710MQ CPU @ 2.50GHz GenuineIntel GNU/Linux >Reporter: Gus Power >Priority: Minor > > Hi, > I'm using [Sham > |http://search.maven.org/#artifactdetails%7Corg.shamdata%7Csham%7C0.3%7Cjar] > to generate realistic looking test data for both parameterized tests and user > acceptance testing. We log the seed that is used for each run so that if > there is an issue we can recreate exactly the same test data. I would also > like to use some of the commons-lang RandomStringUtils functionality but > notice that the implementation provides no way of setting the random number > generator to be used. > {code}private static final Random RANDOM = new Random();{code} > A way to configure this would be really useful. If there is an alternative > way to do this then that would be great. If you think it's a good idea and it > requires a patch I'm happy to supply one. > Cheers, > Gus. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (LANG-1231) StringUtils#indexOfAny() methods with start position argument
[ https://issues.apache.org/jira/browse/LANG-1231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15568151#comment-15568151 ] Bruno P. Kinoshita commented on LANG-1231: -- You are right that there is no such method [~gsavinov]. So the following code {code} String s = "paralalelepipedo"; int i = StringUtils.indexOfAny(s, "p"); System.out.println(i); {code} would print 0, the first occurrence of 'p'. >Well, java.lang.String.substring(int beginIndex) can be used, so maybe it's Ok >to pass substring as an argument. That's correct as well. {code} String s = "paralalelepipedo"; int i = StringUtils.indexOfAny(s.substring(1), "p"); System.out.println(i); {code} would print 9, the first occurrence of 'p' in the substring (i.e. aralalelepipedo). You may have to account for the substring offset later (i.e. in the original string, the index would be 10, not 9). But I think it would be better to use the substring for now. What do you think? If you have other use cases for this, feel free to add a comment or send a pull request or patch :-) Cheers > StringUtils#indexOfAny() methods with start position argument > - > > Key: LANG-1231 > URL: https://issues.apache.org/jira/browse/LANG-1231 > Project: Commons Lang > Issue Type: New Feature > Components: lang.* >Affects Versions: 3.4 >Reporter: Guram Savinov >Priority: Minor > Labels: string > > There is no StringUtils#indexOfAny() methods with start position argument, > which would search for specified characters from the specified position. > Please add it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
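The requested behaviour, including the offset bookkeeping mentioned in the comment, can be sketched with the JDK alone. The class IndexOfAnyFrom and its method are hypothetical and only illustrate what a start-position overload might do; they are not an existing StringUtils method, and surrogate pairs are not treated specially in this sketch.

```java
/** Hypothetical sketch of an indexOfAny variant with a start position. */
final class IndexOfAnyFrom {
    private IndexOfAnyFrom() {
    }

    /**
     * Returns the index, in the original string, of the first character at or
     * after start that appears in searchChars, or -1 if there is none.
     */
    static int indexOfAny(String str, int start, String searchChars) {
        if (str == null || searchChars == null || searchChars.isEmpty()) {
            return -1;
        }
        // Scanning the original string avoids both the substring copy and
        // the need to add the offset back afterwards.
        for (int i = Math.max(start, 0); i < str.length(); i++) {
            if (searchChars.indexOf(str.charAt(i)) >= 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String s = "paralalelepipedo";
        System.out.println(indexOfAny(s, 0, "p")); // 0
        System.out.println(indexOfAny(s, 1, "p")); // 10, already relative to s
    }
}
```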
[jira] [Resolved] (JEXL-219) Blacklist by default in sandbox
[ https://issues.apache.org/jira/browse/JEXL-219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henri Biestro resolved JEXL-219. Resolution: Fixed Adding explicit white/black listing flag for default behavior of sandbox. src/main/java/org/apache/commons/jexl3/introspection/JexlSandbox.java src/test/java/org/apache/commons/jexl3/introspection/SandboxTest.java Committed revision 1764408. > Blacklist by default in sandbox > --- > > Key: JEXL-219 > URL: https://issues.apache.org/jira/browse/JEXL-219 > Project: Commons JEXL > Issue Type: Improvement >Affects Versions: 3.0 >Reporter: Henri Biestro >Assignee: Henri Biestro >Priority: Minor > Fix For: 3.1 > > > Originally a question from Wayne Robinson: > http://apache-commons.680414.n4.nabble.com/jexl-Blacklist-by-default-in-sandbox-td4690316.html > There is no way today to make a sandbox a blackbox by default; adding a flag > to determine whether the sandbox should consider that no explicit list (white > or black) on a given class means blacklisting or whitelisting it. Making it > explicit solves the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
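The effect of such a flag can be pictured with a small stand-alone model. This is a conceptual sketch only: the class DefaultPolicySandbox and its methods are hypothetical and do not reflect the JexlSandbox API or the committed change.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of a sandbox with an explicit default policy. */
final class DefaultPolicySandbox {
    private final boolean allowByDefault;
    private final Set<String> whiteList = new HashSet<>();
    private final Set<String> blackList = new HashSet<>();

    /** @param allowByDefault policy for classes that appear on neither list */
    DefaultPolicySandbox(boolean allowByDefault) {
        this.allowByDefault = allowByDefault;
    }

    DefaultPolicySandbox allow(String className) {
        whiteList.add(className);
        return this;
    }

    DefaultPolicySandbox deny(String className) {
        blackList.add(className);
        return this;
    }

    boolean isAllowed(String className) {
        if (blackList.contains(className)) {
            return false;
        }
        if (whiteList.contains(className)) {
            return true;
        }
        // No explicit entry: the flag decides, which is the point of the issue.
        return allowByDefault;
    }

    public static void main(String[] args) {
        DefaultPolicySandbox strict = new DefaultPolicySandbox(false).allow("java.lang.String");
        System.out.println(strict.isAllowed("java.lang.String")); // explicitly allowed
        System.out.println(strict.isAllowed("java.lang.System")); // falls back to the default
    }
}
```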
[jira] [Created] (JEXL-219) Blacklist by default in sandbox
Henri Biestro created JEXL-219: -- Summary: Blacklist by default in sandbox Key: JEXL-219 URL: https://issues.apache.org/jira/browse/JEXL-219 Project: Commons JEXL Issue Type: Improvement Affects Versions: 3.0 Reporter: Henri Biestro Assignee: Henri Biestro Priority: Minor Fix For: 3.1 Originally a question from Wayne Robinson: http://apache-commons.680414.n4.nabble.com/jexl-Blacklist-by-default-in-sandbox-td4690316.html There is no way today to make a sandbox a blackbox by default; adding a flag to determine whether the sandbox should consider that no explicit list (white or black) on a given class means blacklisting or whitelisting it. Making it explicit solves the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (LANG-1238) Overload Operations in StringUtils that take a regex to take precompiled Pattern
[ https://issues.apache.org/jira/browse/LANG-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567995#comment-15567995 ] Bruno P. Kinoshita commented on LANG-1238: -- >It would be good to first identify how many methods we are talking about here. Here's what I could find (6) searching in Eclipse for 'regular' and 'regex' in StringUtils: - removeAll(:String, :String) - removeFirst(:String, :String) - replacePattern(:String, :String, :String) - removePattern(:String, :String) - replaceAll(:String, :String, :String) - replaceFirst(:String, :String, :String) > Overload Operations in StringUtils that take a regex to take precompiled > Pattern > > > Key: LANG-1238 > URL: https://issues.apache.org/jira/browse/LANG-1238 > Project: Commons Lang > Issue Type: Improvement > Components: lang.* >Affects Versions: 3.4 >Reporter: Christopher Cordeiro >Priority: Minor > > For performance reasons, it would be nice if the operations in StringUtils > that take a regular expression (removePattern/replacePattern) were overloaded > to optionally take a precompiled Pattern. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
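As a rough picture of what {{Pattern}}-taking overloads could look like, the sketch below uses only java.util.regex. The class PatternOverloads is hypothetical and the null handling (return the input unchanged) is an assumption for this example, not a statement about how StringUtils behaves.

```java
import java.util.regex.Pattern;

/** Hypothetical overloads taking a precompiled Pattern; illustration only. */
final class PatternOverloads {
    private PatternOverloads() {
    }

    /** Removes every match of regex from text. */
    static String removeAll(String text, Pattern regex) {
        if (text == null || regex == null) {
            return text;
        }
        return regex.matcher(text).replaceAll("");
    }

    /** Replaces the first match of regex in text with replacement. */
    static String replaceFirst(String text, Pattern regex, String replacement) {
        if (text == null || regex == null || replacement == null) {
            return text;
        }
        return regex.matcher(text).replaceFirst(replacement);
    }

    public static void main(String[] args) {
        // Compiled once and reused, instead of recompiling the regex on every call.
        Pattern digits = Pattern.compile("\\d+");
        System.out.println(removeAll("a1b22c", digits));
        System.out.println(replaceFirst("a1b22c", digits, "#"));
    }
}
```

The performance argument in the issue is that the String-based variants have to compile the expression on each call, while a caller holding a {{Pattern}} pays that cost once.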
[jira] [Comment Edited] (TEXT-21) Have a clear distinction between Edit Distance, String Similarity, Score, Metric, etc
[ https://issues.apache.org/jira/browse/TEXT-21?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567970#comment-15567970 ] Bruno P. Kinoshita edited comment on TEXT-21 at 10/12/16 7:59 AM: -- Some useful papers on the topic, and other resources too. * [A Comparison of Personal Name Matching: Techniques and Practical Issues (PDF)|http://users.cecs.anu.edu.au/~Peter.Christen/publications/tr-cs-06-02.pdf] * [A Comparison of String Distance Metrics for Name-Matching Tasks (PDF)|http://www.cs.cmu.edu/~wcohen/postscript/ijcai-ws-2003.pdf] * [A New Edit Distance Method for Finding Similarity in Dna Sequence |http://waset.org/publications/7178/a-new-edit-distance-method-for-finding-similarity-in-dna-sequence] * [Edit distance on Wikipedia|https://en.wikipedia.org/wiki/Edit_distance] was (Author: kinow): Some useful papers on the topic, and other resources too. * [A Comparison of Personal Name Matching: Techniques and Practical Issues (PDF)|http://users.cecs.anu.edu.au/~Peter.Christen/publications/tr-cs-06-02.pdf] * [A Comparison of String Distance Metrics for Name-Matching Tasks (PDF)|http://www.cs.cmu.edu/~wcohen/postscript/ijcai-ws-2003.pdf] * [Edit distance on Wikipedia|https://en.wikipedia.org/wiki/Edit_distance] > Have a clear distinction between Edit Distance, String Similarity, Score, > Metric, etc > - > > Key: TEXT-21 > URL: https://issues.apache.org/jira/browse/TEXT-21 > Project: Commons Text > Issue Type: Improvement >Reporter: Bruno P. Kinoshita >Assignee: Bruno P. Kinoshita > > From LANG-1269. > A user reported a nomenclature issue in [lang], which occurs in [text] as > well. > Currently we have an interface called EditDistance, with the following > implementations: > * CosineDistance > * HammingDistance > * JaroWinklerDistance > * and LevenshteinDistance > JaroWinkler is actually a similarity score, and not a distance. We have > other classes in the oact.similarity package too.
> * CosineSimilarity > * FuzzyScore > We need to provide users a clear distinction on what we call an edit > distance, similarity or score. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEXT-21) Have a clear distinction between Edit Distance, String Similarity, Score, Metric, etc
[ https://issues.apache.org/jira/browse/TEXT-21?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567970#comment-15567970 ] Bruno P. Kinoshita edited comment on TEXT-21 at 10/12/16 8:00 AM: -- Some useful papers on the topic, and other resources too. * [A Comparison of Personal Name Matching: Techniques and Practical Issues (PDF)|http://users.cecs.anu.edu.au/~Peter.Christen/publications/tr-cs-06-02.pdf] * [A Comparison of String Distance Metrics for Name-Matching Tasks (PDF)|http://www.cs.cmu.edu/~wcohen/postscript/ijcai-ws-2003.pdf] * [A New Edit Distance Method for Finding Similarity in Dna Sequence (PDF)|http://waset.org/publications/7178/a-new-edit-distance-method-for-finding-similarity-in-dna-sequence] * [Edit distance on Wikipedia|https://en.wikipedia.org/wiki/Edit_distance] was (Author: kinow): Some useful papers on the topic, and other resources too. * [A Comparison of Personal Name Matching: Techniques and Practical Issues (PDF)|http://users.cecs.anu.edu.au/~Peter.Christen/publications/tr-cs-06-02.pdf] * [A Comparison of String Distance Metrics for Name-Matching Tasks (PDF)|http://www.cs.cmu.edu/~wcohen/postscript/ijcai-ws-2003.pdf] * [A New Edit Distance Method for Finding Similarity in Dna Sequence |http://waset.org/publications/7178/a-new-edit-distance-method-for-finding-similarity-in-dna-sequence] * [Edit distance on Wikipedia|https://en.wikipedia.org/wiki/Edit_distance] > Have a clear distinction between Edit Distance, String Similarity, Score, > Metric, etc > - > > Key: TEXT-21 > URL: https://issues.apache.org/jira/browse/TEXT-21 > Project: Commons Text > Issue Type: Improvement >Reporter: Bruno P. Kinoshita >Assignee: Bruno P. Kinoshita > > From LANG-1269. > A user reported a nomenclature issue in [lang], which occurs in [text] as > well. 
> Currently we have an interface called EditDistance, with the following > implementations: > * CosineDistance > * HammingDistance > * JaroWinklerDistance > * and LevenshteinDistance > JaroWinkler is actually a similarity score, and not a distance. We have > other classes in the oact.similarity package too. > * CosineSimilarity > * FuzzyScore > We need to provide users a clear distinction on what we call an edit > distance, similarity or score. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEXT-21) Have a clear distinction between Edit Distance, String Similarity, Score, Metric, etc
[ https://issues.apache.org/jira/browse/TEXT-21?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567973#comment-15567973 ] Bruno P. Kinoshita commented on TEXT-21: Similar libraries with distance/similarity/metrics. * https://github.com/Simmetrics/simmetrics * https://github.com/tdebatty/java-string-similarity/ > Have a clear distinction between Edit Distance, String Similarity, Score, > Metric, etc > - > > Key: TEXT-21 > URL: https://issues.apache.org/jira/browse/TEXT-21 > Project: Commons Text > Issue Type: Improvement >Reporter: Bruno P. Kinoshita >Assignee: Bruno P. Kinoshita > > From LANG-1269. > A user reported a nomenclature issue in [lang], which occurs in [text] as > well. > Currently we have an interface called EditDistance, with the following > implementations: > * CosineDistance > * HammingDistance > * JaroWinklerDistance > * and LevenshteinDistance > JaroWinkler is actually a similarity score, and not a distance. We have > other classes in the oact.similarity package too. > * CosineSimilarity > * FuzzyScore > We need to provide users a clear distinction on what we call an edit > distance, similarity or score. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TEXT-21) Have a clear distinction between Edit Distance, String Similarity, Score, Metric, etc
[ https://issues.apache.org/jira/browse/TEXT-21?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567970#comment-15567970 ] Bruno P. Kinoshita edited comment on TEXT-21 at 10/12/16 7:59 AM: -- Some useful papers on the topic, and other resources too. * [A Comparison of Personal Name Matching: Techniques and Practical Issues (PDF)|http://users.cecs.anu.edu.au/~Peter.Christen/publications/tr-cs-06-02.pdf] * [A Comparison of String Distance Metrics for Name-Matching Tasks (PDF)|http://www.cs.cmu.edu/~wcohen/postscript/ijcai-ws-2003.pdf] * [Edit distance on Wikipedia|https://en.wikipedia.org/wiki/Edit_distance] was (Author: kinow): Some useful papers on the topic, and other resources too. * [A Comparison of Personal Name Matching: Techniques and Practical Issues (PDF)|http://users.cecs.anu.edu.au/~Peter.Christen/publications/tr-cs-06-02.pdf] * [A Comparison of String Distance Metrics for Name-Matching Tasks (PDF)|http://www.cs.cmu.edu/~wcohen/postscript/ijcai-ws-2003.pdf] * [Edit distance on Wikipedia](https://en.wikipedia.org/wiki/Edit_distance) > Have a clear distinction between Edit Distance, String Similarity, Score, > Metric, etc > - > > Key: TEXT-21 > URL: https://issues.apache.org/jira/browse/TEXT-21 > Project: Commons Text > Issue Type: Improvement >Reporter: Bruno P. Kinoshita >Assignee: Bruno P. Kinoshita > > From LANG-1269. > A user reported a nomenclature issue in [lang], which occurs in [text] as > well. > Currently we have an interface called EditDistance, with the following > implementations: > * CosineDistance > * HammingDistance > * JaroWinklerDistance > * and LevenshteinDistance > JaroWinkler is actually a similarity score, and not a distance. We have > other classes in the oact.similarity package too. > * CosineSimilarity > * FuzzyScore > We need to provide users a clear distinction on what we call an edit > distance, similarity or score. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEXT-21) Have a clear distinction between Edit Distance, String Similarity, Score, Metric, etc
[ https://issues.apache.org/jira/browse/TEXT-21?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567970#comment-15567970 ] Bruno P. Kinoshita commented on TEXT-21: Some useful papers on the topic, and other resources too. * [A Comparison of Personal Name Matching: Techniques and Practical Issues (PDF)|http://users.cecs.anu.edu.au/~Peter.Christen/publications/tr-cs-06-02.pdf] * [A Comparison of String Distance Metrics for Name-Matching Tasks (PDF)|http://www.cs.cmu.edu/~wcohen/postscript/ijcai-ws-2003.pdf] * [Edit distance on Wikipedia](https://en.wikipedia.org/wiki/Edit_distance) > Have a clear distinction between Edit Distance, String Similarity, Score, > Metric, etc > - > > Key: TEXT-21 > URL: https://issues.apache.org/jira/browse/TEXT-21 > Project: Commons Text > Issue Type: Improvement >Reporter: Bruno P. Kinoshita >Assignee: Bruno P. Kinoshita > > From LANG-1269. > A user reported a nomenclature issue in [lang], which occurs in [text] as > well. > Currently we have an interface called EditDistance, with the following > implementations: > * CosineDistance > * HammingDistance > * JaroWinklerDistance > * and LevenshteinDistance > JaroWinkler is actually a similarity score, and not a distance. We have > other classes in the oact.similarity package too. > * CosineSimilarity > * FuzzyScore > We need to provide users a clear distinction on what we call an edit > distance, similarity or score.
[jira] [Created] (TEXT-21) Have a clear distinction between Edit Distance, String Similarity, Score, Metric, etc
Bruno P. Kinoshita created TEXT-21: -- Summary: Have a clear distinction between Edit Distance, String Similarity, Score, Metric, etc Key: TEXT-21 URL: https://issues.apache.org/jira/browse/TEXT-21 Project: Commons Text Issue Type: Improvement Reporter: Bruno P. Kinoshita Assignee: Bruno P. Kinoshita From LANG-1269. A user reported a nomenclature issue in [lang], which occurs in [text] as well. Currently we have an interface called EditDistance, with the following implementations: * CosineDistance * HammingDistance * JaroWinklerDistance * and LevenshteinDistance JaroWinkler is actually a similarity score, and not a distance. We have other classes in the oact.similarity package too. * CosineSimilarity * FuzzyScore We need to provide users a clear distinction on what we call an edit distance, similarity or score.
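To make the distinction raised in TEXT-21 concrete, here is a minimal sketch of two separate contracts, one for distances and one for similarity scores. The names SimilaritySketch, Distance, Similarity, HAMMING and HAMMING_SIMILARITY are hypothetical and are not the [text] API; the Hamming distance is used only because it is short to write down.

```java
import java.util.Objects;

/** Hypothetical sketch: separate contracts for distances and similarity scores. */
final class SimilaritySketch {

    /** A distance: 0 means the inputs are identical, larger means more different. */
    interface Distance {
        int apply(CharSequence left, CharSequence right);
    }

    /** A similarity score: 1.0 means identical, 0.0 means nothing in common. */
    interface Similarity {
        double apply(CharSequence left, CharSequence right);
    }

    /** Hamming distance: number of positions at which two equal-length inputs differ. */
    static final Distance HAMMING = (left, right) -> {
        Objects.requireNonNull(left, "left");
        Objects.requireNonNull(right, "right");
        if (left.length() != right.length()) {
            throw new IllegalArgumentException("Inputs must have the same length");
        }
        int differences = 0;
        for (int i = 0; i < left.length(); i++) {
            if (left.charAt(i) != right.charAt(i)) {
                differences++;
            }
        }
        return differences;
    };

    /** Similarity derived from the Hamming distance, normalised to the range [0, 1]. */
    static final Similarity HAMMING_SIMILARITY = (left, right) -> {
        int distance = HAMMING.apply(left, right);
        // Two empty inputs are identical, so treat them as fully similar.
        return left.length() == 0 ? 1.0 : 1.0 - (double) distance / left.length();
    };

    private SimilaritySketch() {
    }
}
```

With this split, equal strings give a distance of 0 and a similarity of 1.0, which is the property the issue says the current naming obscures.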
[jira] [Commented] (LANG-1269) Wrong name or result of StringUtils::getJaroWinklerDistance
[ https://issues.apache.org/jira/browse/LANG-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567947#comment-15567947 ] Bruno P. Kinoshita commented on LANG-1269: -- Indeed Benedikt. I'll file a ticket for [text]. If methods like this are going to be deprecated (and maybe removed in the 4.x release?) in [lang], then I think we should just add the @Deprecated annotation to the method. Otherwise, I'd be inclined to leave the method name as-is (so we keep binary compatibility), return 1 - currentResult as suggested by [~jmkeil], and maybe update the Javadocs as well. > Wrong name or result of StringUtils::getJaroWinklerDistance > --- > > Key: LANG-1269 > URL: https://issues.apache.org/jira/browse/LANG-1269 > Project: Commons Lang > Issue Type: Bug >Affects Versions: 3.3, 3.4 >Reporter: Jan Martin Keil >Assignee: Bruno P. Kinoshita >Priority: Minor > > The name of the method StringUtils::getJaroWinklerDistance is misleading. > Currently for equal strings {{1}} is returned, for completely different > strings {{0}} is returned. That is a measure of similarity, not of a > distance. A distance must be {{0}} for equal strings. I read in the issues > LANG-591 and LANG-944 that it was decided to have a similar name to > StringUtils::getLevenshteinDistance, but that also requires changing the > method's result. > Could you please (1) rename the method to > StringUtils::getJaroWinklerSimilarity or (2) change the method to return {{1 > - currentResult}}? > The first option has the disadvantage of losing the similar naming of the similar > methods; the second option carries the risk of introducing unnoticed bugs in > dependent code. So I think it is preferable to use the first option.
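Option (2) from the report, returning 1 - currentResult, can be sketched as a small adapter that turns any similarity in the range [0, 1] into a distance. This is only an illustration under stated assumptions: SimilarityToDistance, asDistance and exactMatchSimilarity are hypothetical names, and the exact-match function is a stand-in for a real similarity such as the value currently returned by StringUtils::getJaroWinklerDistance.

```java
import java.util.function.ToDoubleBiFunction;

/** Hypothetical sketch: deriving a distance from a similarity score in [0, 1]. */
final class SimilarityToDistance {

    /** Wraps a similarity so that equal inputs yield 0.0 instead of 1.0. */
    static ToDoubleBiFunction<CharSequence, CharSequence> asDistance(
            ToDoubleBiFunction<CharSequence, CharSequence> similarity) {
        return (left, right) -> 1.0 - similarity.applyAsDouble(left, right);
    }

    /** Stand-in similarity (an assumption for this sketch): 1.0 if equal, else 0.0. */
    static double exactMatchSimilarity(CharSequence left, CharSequence right) {
        return left.toString().contentEquals(right) ? 1.0 : 0.0;
    }

    private SimilarityToDistance() {
    }
}
```

Calling asDistance(SimilarityToDistance::exactMatchSimilarity) gives a function that returns 0.0 for equal strings, which is what callers would expect from a method named "distance". The risk named in the report still applies: existing callers that rely on the old value would silently change behaviour.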
[jira] [Commented] (LANG-1160) StringUtils.abbreviate() to support "custom ellipses" parameter
[ https://issues.apache.org/jira/browse/LANG-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567880#comment-15567880 ] Bruno P. Kinoshita commented on LANG-1160: -- Looks easy to implement. I can submit a pull request for it this weekend. [~britter] abbreviations will probably be included in [text] too. StringUtils has 8822 LOC at the moment. Do you reckon it is all right to add this new method to [lang], or would methods like this on StringUtils be good candidates to be moved to [text]? I'd be inclined to not add any more features to StringUtils if [text] is going to have similar code. Or should I drop an e-mail to the mailing list about it? > StringUtils.abbreviate() to support "custom ellipses" parameter > --- > > Key: LANG-1160 > URL: https://issues.apache.org/jira/browse/LANG-1160 > Project: Commons Lang > Issue Type: Improvement > Components: lang.* >Affects Versions: 3.4 >Reporter: Hendy Irawan >Priority: Trivial > > {{abbreviateMiddle()}} supports custom replacement string. > {{abbreviate()}} needs to also support this, for example to use the "…" > Unicode character instead of three "..."
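A rough sketch of what the requested overload could look like, assuming a simplified contract without the offset variant. The class AbbreviateSketch and this three-argument signature are hypothetical and are not taken from [lang]; the sketch also cuts on UTF-16 code units, so it could split a surrogate pair.

```java
/** Hypothetical sketch of abbreviate() with a caller-supplied marker. */
final class AbbreviateSketch {

    /**
     * Shortens str to at most maxWidth characters, ending with marker when
     * anything was cut off. Returns str unchanged if it already fits.
     */
    static String abbreviate(String str, String marker, int maxWidth) {
        if (str == null || marker == null) {
            return str;
        }
        // Need room for at least one character of the original plus the marker.
        if (maxWidth < marker.length() + 1) {
            throw new IllegalArgumentException(
                    "Minimum abbreviation width is " + (marker.length() + 1));
        }
        if (str.length() <= maxWidth) {
            return str;
        }
        return str.substring(0, maxWidth - marker.length()) + marker;
    }

    private AbbreviateSketch() {
    }
}
```

With a width of 6, abbreviate("abcdefghij", "...", 6) keeps three characters of the input, while the single-character marker "\u2026" keeps five, which is the practical benefit of the custom ellipsis.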