TEXT-62: Fixing typographical errors, starting userguide
Project: http://git-wip-us.apache.org/repos/asf/commons-text/repo
Commit: http://git-wip-us.apache.org/repos/asf/commons-text/commit/82542979
Tree: http://git-wip-us.apache.org/repos/asf/commons-text/tree/82542979
Diff: http://git-wip-us.apache.org/repos/asf/commons-text/diff/82542979

Branch: refs/heads/release
Commit: 825429791a58976b6d6b7d03dec441d79d1409ae
Parents: 212288b
Author: Rob Tompkins <chtom...@gmail.com>
Authored: Fri Jan 27 21:24:09 2017 -0500
Committer: Rob Tompkins <chtom...@gmail.com>
Committed: Fri Jan 27 21:24:09 2017 -0500

----------------------------------------------------------------------
 RELEASE-NOTES.txt                           |  18 +-
 pom.xml                                     |   4 +-
 src/assembly/src.xml                        |   6 +
 src/changes/changes.xml                     |   1 +
 .../text/beta/similarity/package-info.java  |   1 +
 src/site/xdoc/userguide.xml                 | 331 ++++++++++++++++++-
 6 files changed, 347 insertions(+), 14 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/commons-text/blob/82542979/RELEASE-NOTES.txt
----------------------------------------------------------------------
diff --git a/RELEASE-NOTES.txt b/RELEASE-NOTES.txt
index 73c3a17..c2a44fb 100644
--- a/RELEASE-NOTES.txt
+++ b/RELEASE-NOTES.txt
@@ -13,6 +13,18 @@ Java environment.
 Apache Commons Text is a library focused on algorithms working on strings.
+A NOTE ON THE HISTORY OF THE CODE
+=================================
+
+The codebase began in the fall of 2014 as a location for housing algorithms for
+operating on Strings that seemed to have a more complex nature than those which
+would be considered a needed extension to java.lang. Thus, a new component,
+different from Apache Commons Lang was warranted. As the project evolved, it was
+noticed that Commons Lang had considerable more text manipulation tools than
+the average Java application developer would need or even want. So, we have
+decided to move the more esoteric String processing algorithms out of Commons
+Lang into Commons Text.
+
 JAVA 9 SUPPORT
 ==============
@@ -46,6 +58,10 @@ o TEXT-9: Incorporate String algorithms from Commons Lang Thanks to britter.
 FIXED BUGS
 ==========
+Note. We recognize the curoisity of a new component having "fixed bugs," but a
+considerable number of files were migrated over from Commons Lang, some of which
+needed fixes.
+
 o TEXT-60: Upgrading Jacoco for Java 9-ea compatibility. Thanks to Lee Adcock.
 o TEXT-52: Possible attacks through StringEscapeUtils.escapeEcmaScrip better javadoc
@@ -93,4 +109,4 @@ Apache Commons Text website:
 http://commons.apache.org/text/
 Have fun!
--Apachje Commons Text team
\ No newline at end of file
+-Apache Commons Text team
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/commons-text/blob/82542979/pom.xml
----------------------------------------------------------------------
diff --git a/pom.xml b/pom.xml
index 55a4a95..ddeb5c0 100644
--- a/pom.xml
+++ b/pom.xml
@@ -124,8 +124,8 @@
     <maven.compiler.target>1.7</maven.compiler.target>
     <commons.componentid>text</commons.componentid>
-    <!-- Current 3.x release series -->
-    <commons.release.version>1.0-beta-1</commons.release.version>
+
+    <commons.release.version>1.0</commons.release.version>
     <commons.release.desc>(Java 7+)</commons.release.desc>
     <commons.jira.id>TEXT</commons.jira.id>

http://git-wip-us.apache.org/repos/asf/commons-text/blob/82542979/src/assembly/src.xml
----------------------------------------------------------------------
diff --git a/src/assembly/src.xml b/src/assembly/src.xml
index ab1da0d..48a8c7c 100644
--- a/src/assembly/src.xml
+++ b/src/assembly/src.xml
@@ -24,10 +24,16 @@
   <fileSets>
     <fileSet>
       <includes>
+        <include>checkstyle.xml</include>
+        <include>checkstyle-supressions.xml</include>
+        <include>CONTRIBUTING.md</include>
+        <include>fb-excludes.xml</include>
         <include>LICENSE.txt</include>
+        <include>license-header.txt</include>
         <include>NOTICE.txt</include>
         <include>pom.xml</include>
         <include>PROPOSAL.html</include>
+        <include>README.md</include>
         <include>RELEASE-NOTES.txt</include>
       </includes>
     </fileSet>

http://git-wip-us.apache.org/repos/asf/commons-text/blob/82542979/src/changes/changes.xml
----------------------------------------------------------------------
diff --git a/src/changes/changes.xml b/src/changes/changes.xml
index 0f8ca6f..116a93d 100644
--- a/src/changes/changes.xml
+++ b/src/changes/changes.xml
@@ -46,6 +46,7 @@ The <action> type attribute can be add,update,fix,remove.
   <body>
   <release version="1.0-beta-1" date="2017-01-25" description="First release (beta) of Commons Text">
+    <action issue="TEXT-62" type="fix" dev="chtompki">Incorporate suggestions from RC2 into 1.0 release</action>
     <action issue="TEXT-61" type="update" dev="chtompki" due-to="Lee Adcock">Naming packages org.apache.commons.text.beta</action>
     <action issue="TEXT-60" type="fix" dev="chtompki" due-to="Lee Adcock">Upgrading Jacoco for Java 9-ea compatibility.</action>
     <action issue="TEXT-58" type="update" dev="chtompki">Refactor EntityArrays to have unmodifiableMaps in leu of String[][]</action>

http://git-wip-us.apache.org/repos/asf/commons-text/blob/82542979/src/main/java/org/apache/commons/text/beta/similarity/package-info.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/commons/text/beta/similarity/package-info.java b/src/main/java/org/apache/commons/text/beta/similarity/package-info.java
index 914e45a..957901c 100644
--- a/src/main/java/org/apache/commons/text/beta/similarity/package-info.java
+++ b/src/main/java/org/apache/commons/text/beta/similarity/package-info.java
@@ -30,6 +30,7 @@
  * <li>{@link org.apache.commons.text.beta.similarity.HammingDistance Hamming Distance}</li>
  * <li>{@link org.apache.commons.text.beta.similarity.JaroWinklerDistance Jaro-Winkler Distance}</li>
  * <li>{@link org.apache.commons.text.beta.similarity.LevenshteinDistance Levenshtein Distance}</li>
+ * <li>{@link org.apache.commons.text.beta.similarity.LongestCommonSubsequenceDistance Longest Commons Subsequence Distance}</li>
  * </ul>
  *
  * <p>The {@link org.apache.commons.text.beta.similarity.CosineDistance Cosine Distance}

http://git-wip-us.apache.org/repos/asf/commons-text/blob/82542979/src/site/xdoc/userguide.xml
----------------------------------------------------------------------
diff --git a/src/site/xdoc/userguide.xml b/src/site/xdoc/userguide.xml
index 27e4a7d..1c93b2d 100644
--- a/src/site/xdoc/userguide.xml
+++ b/src/site/xdoc/userguide.xml
@@ -6,27 +6,336 @@
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License. You may obtain a copy of the License at
-
 http://www.apache.org/licenses/LICENSE-2.0
-
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
 -->
+
 <document>
- <properties>
-  <title>Commons Text - User guide</title>
-  <author email="d...@commons.apache.org">Commons Documentation Team</author>
- </properties>
+  <properties>
+    <title>Commons Text - User guide</title>
+    <author email="d...@commons.apache.org">Commons Documentation Team</author>
+  </properties>
+
+  <body>
+    <!-- $Id$ -->
+
+    <section name='User guide for Commons "Text"'>
+      <div align="center">
+        <h1>The Commons <em>Text</em> Package
+        </h1>
+        <h2>Users Guide</h2>
+        <br/>
+        <a href="#Description">[Description]</a>
+        <a href="#text.beta.">[text.beta.*]</a>
+        <a href="#text.beta.diff.">[text.beta.diff.*]</a>
+        <a href="#text.beta.similarity.">[text.beta.similarity.*]</a>
+        <a href="#text.beta.translate.">[text.beta.translate.*]</a>
+        <br/>
+        <br/>
+      </div>
+    </section>
+
+    <section name="Description">
+      <p>The Commons Text library provides additions to the standard JDK's
+        java.lang package. Very generic, very reusable components for everyday
+        use.
+      </p>
+      <p>The text package was added in Commons Lang 2.2. It provides, amongst
+        other classes, a replacement for StringBuffer named <code>
+        StrBuilder</code>, a class for substituting variables within a String
+        named <code>StrSubstitutor</code> and a replacement for StringTokenizer
+        named <code>StrTokenizer</code>. While somewhat ungainly, the <code>
+        Str
+        </code> prefix has been used to ensure we don't clash with any current
+        or future standard Java classes.
+      </p>
+    </section>
+
+    <section name="text.beta.*">
+      <!--
+        AlphabetConverter
+        Builder
+        CharacterPredicate
+        CharacterPredicates
+        CompositeFormat
+        ExtendedMessageFormat
+        FormatFactory
+        FormattableUtils
+        StrLookup
+        StrSubstitutor
+        StrBuilder
+        StrMatcher
+        StrTokenizer
+        StringEscapeUtils
+      -->
+      <p>Originally the text package was added in Commons Lang 2.2. However, its
+        new home is here. It provides, amongst other
+        classes, a replacement for <code>StringBuffer</code> named <code>
+        StrBuilder</code>, a class for substituting variables within a String
+        named <code>StrSubstitutor</code> and a replacement for StringTokenizer
+        named <code>StrTokenizer</code>. While somewhat ungainly, the <code>
+        Str
+        </code> prefix has been used to ensure we don't clash with any current
+        or future standard Java classes.
+      </p>
+
+      <subsection name="String manipulation - StringEscapeUtils">
+        <p>Text has a series of String utilities. The first is StringUtils,
+          oodles and oodles of functions which tweak, transform, squeeze and
+          cuddle java.lang.Strings. In addition to StringUtils, there are a
+          series of other String manipulating classes; RandomStringUtils,
+          StringEscapeUtils and Tokenizer. RandomStringUtils speaks for itself.
+          It's provides ways in which to generate pieces of text, such as might
+          be used for default passwords. StringEscapeUtils contains methods to
+          escape and unescape Java, JavaScript, HTML, XML and SQL. Tokenizer is
+          an improved alternative to java.util.StringTokenizer.
+        </p>
+        <p>These are ideal classes to start using if you're looking to get into
+          Text. StringUtils' capitalize, substringBetween/Before/After, split
+          and join are good methods to begin with. If you use
+          java.sql.Statements a lot, StringEscapeUtils.escapeSql might be of
+          interest.
+        </p>
+        <p>In addition to these classes, WordUtils is another String
+          manipulator. It works on Strings at the word level, for example
+          WordUtils.capitalize will capitalize every word in a piece of text.
+          WordUtils also contains methods to wrap text.
+        </p>
+      </subsection>
+
+      <subsection
+          name="Character handling - CharSetUtils, CharSet, CharRange, CharUtils">
+        <p>In addition to dealing with Strings, it's also important to deal with
+          chars and Characters. CharUtils exists for this purpose, while
+          CharSetUtils exists for set-manipulation of Strings. Be careful,
+          although CharSetUtils takes an argument of type String, it is only as
+          a set of characters. For example, <code>
+          CharSetUtils.delete("testtest", "tr")
+          </code> will remove all t's and all r's from the String, not just the
+          String "tr".
+        </p>
+        <p>CharRange and CharSet are both used internally by CharSetUtils, and
+          will probaby rarely be used.
+        </p>
+      </subsection>
+
+      <subsection name="JVM interaction - SystemUtils, CharEncoding">
+        <p>SystemUtils is a simple little class which makes it easy to find out
+          information about which platform you are on. For some, this is a
+          necessary evil. It was never something I expected to use myself until
+          I was trying to ensure that Commons Text itself compiled under JDK
+          1.2. Having pushed out a few JDK 1.3 bits that had slipped in (<code>
+          Collections.EMPTY_MAP
+          </code> is a classic offender), I then found that one of the Unit
+          Tests was dying mysteriously under JDK 1.2, but ran fine under JDK
+          1.3. There was no obvious solution and I needed to move onwards, so
+          the simple solution was to wrap that particular test in a <code>
+          if(SystemUtils.isJavaVersionAtLeast(1.3f)) {</code>, make a note and
+          move on.
+        </p>
+        <p>The CharEncoding class is also used to interact with the Java
+          environment and may be used to see which character encodings are
+          supported in a particular environment.
+        </p>
+      </subsection>
+
+      <subsection
+          name="Serialization - SerializationUtils, SerializationException">
+        <p>Serialization doesn't have to be that hard! A simple util class can
+          take away the pain, plus it provides a method to clone an object by
+          unserializing and reserializing, an old Java trick.
+        </p>
+      </subsection>
+
+      <subsection
+          name="Assorted functions - ObjectUtils, ClassUtils, ArrayUtils, BooleanUtils">
+        <p>Would you believe it, ObjectUtils contains handy functions for
+          Objects, mainly null-safe implementations of the methods on
+          java.lang.Object.
+        </p>
+        <p>ClassUtils is largely a set of helper methods for reflection. Of
+          special note are the comparators hidden away in ClassUtils, useful for
+          sorting Class and Package objects by name; however they merely sort
+          alphabetically and don't understand the common habit of sorting <code>
+          java
+          </code> and <code>javax</code> first.
+        </p>
+        <p>Next up, ArrayUtils. This is a big one with many methods and many
+          overloads of these methods so it is probably worth an in depth look
+          here. Before we begin, assume that every method mentioned is
+          overloaded for all the primitives and for Object. Also, the short-hand
+          'xxx' implies a generic primitive type, but usually also includes
+          Object.
+        </p>
+        <ul>
+          <li>ArrayUtils provides singleton empty arrays for all the basic
+            types. These will largely be of use in the Collections API with its
+            toArray methods, but also will be of use with methods which want to
+            return an empty array on error.
+          </li>
+          <li>
+            <code>add(xxx[], xxx)</code>
+            will add a primitive type to an array, resizing the array as you'd
+            expect. Object is also supported.
+          </li>
+          <li>
+            <code>clone(xxx[])</code>
+            clones a primitive or Object array.
+          </li>
+          <li>
+            <code>contains(xxx[], xxx)</code>
+            searches for a primitive or Object in a primitive or Object array.
+          </li>
+          <li>
+            <code>getLength(Object)</code>
+            returns the length of any array or an IllegalArgumentException if
+            the parameter is not an array. <code>hashCode(Object)</code>, <code>
+            equals(Object, Object)</code>,
+            <code>toString(Object)</code>
+          </li>
+          <li>
+            <code>indexOf(xxx[], xxx)</code>
+            and <code>indexOf(xxx[], xxx, int)</code> are copies of the classic
+            String methods, but this time for primitive/Object arrays. In
+            addition, a lastIndexOf set of methods exists.
+          </li>
+          <li>
+            <code>isEmpty(xxx[])</code>
+            lets you know if an array is zero-sized or null.
+          </li>
+          <li>
+            <code>isSameLength(xxx[], xxx[])</code>
+            returns true if the arrays are the same length.
+          </li>
+          <li>Along side the add methods, there are also remove methods of two
+            types. The first type remove the value at an index, <code>
+            remove(xxx[], int)</code>, while the second type remove the first
+            value from the array, <code>remove(xxx[], xxx)</code>.
+          </li>
+          <li>Nearing the end now. The <code>reverse(xxx[])</code> method turns
+            an array around.
+          </li>
+          <li>The <code>subarray(xxx[], int, int)</code> method splices an array
+            out of a larger array.
+          </li>
+          <li>Primitive to primitive wrapper conversion is handled by the <code>
+            toObject(xxx[])
+            </code> and <code>toPrimitive(Xxx[])</code> methods.
+          </li>
+        </ul>
+        <p>Lastly, <code>ArrayUtils.toMap(Object[])</code> is worthy of special
+          note. It is not a heavily overloaded method for working with arrays,
+          but a simple way to create Maps from literals.
+        </p>
+        <h5>Using toMap</h5>
+        <source>
+          Map colorMap = MapUtils.toMap(new String[][] {{
+          {"RED", "#FF0000"},
+          {"GREEN", "#00FF00"},
+          {"BLUE", "#0000FF"}
+          });
+        </source>
+
+        <p>Our final util class is BooleanUtils. It contains various Boolean
+          acting methods, probably of most interest is the <code>
+          BooleanUtils.toBoolean(String)
+          </code> method which turns various positive/negative Strings into a
+          Boolean object, and not just true/false as with Boolean.valueOf.
+        </p>
+      </subsection>
+
+
+    </section>
+
+    <section name="text.beta.diff.*">
+      <!--
+        CommandVisitor
+        DeleteCommand
+        EditCommand
+        EditScript
+        InsertCommand
+        KeepCommand
+        ReplacementsFinder
+        ReplacementsHandler
+        StringsComparator
+      -->
+      <p>Provides algorithms for diff between strings.</p>
+      <p>The initial implementation of the Myers algorithm was adapted from the
+        commons-collections sequence package.
+      </p>
+    </section>
+
+    <section name="text.beta.similarity.*">
+      <!--
+        Enum
+        EnumUtils
+        ValuedEnum
+      -->
+      <p>Provides algorithms for string similarity.</p>
+
+      <p>The algorithms that implement the EditDistance interface follow the
+        same
+        simple principle: the more similar (closer) strings are, lower is the
+        distance.
+        For example, the words house and hose are closer than house and
+        trousers.
+      </p>
+
+      <p>The following algorithms are available at the moment:</p>
+
+      <ul>
+        <li>
+          <code>CosineDistance</code>
+        </li>
+        <li>
+          <code>CosineSimilarity</code>
+        </li>
+        <li>
+          <code>FuzzyScore</code>
+        </li>
+        <li>
+          <code>HammingDistance</code>
+        </li>
+        <li>
+          <code>JaroWinklerDistance</code>
+        </li>
+        <li>
+          <code>LevenshteinDistance</code>
+        </li>
+        <li>
+          <code>LongestCommonSubsequenceDistance</code>
+        </li>
+      </ul>
- <body>
+      <p>The <code>CosineDistance</code> utilises a
+        <code>RegexTokenizer</code>
+        regular expression tokenizer (\w+). And the <code>
+        LevenshteinDistance</code>'s
+        behaviour can be changed to take into consideration a maximum
+        throughput.
+      </p>
+    </section>
- <section name='User guide for Commons "Text"'>
-  TODO
- </section>
+    <section name="text.translate.*">
+      <!--
+        ExceptionUtils
+        Nestable
+        NestableDelegate
+        NestableError
+        NestableException
+        NestableRuntimeException
+      -->
+      <p>An API for creating text translation routines from a set of smaller
+        building blocks. Initially created to make it possible for the user to
+        customize the rules in the StringEscapeUtils class.
+      </p>
+      <p>These classes are immutable, and therefore thread-safe.</p>
+    </section>
-</body>
+  </body>
 </document>
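
Reviewer note on the text.beta.similarity.* section of the new userguide: the stated principle (the more similar two strings are, the lower the distance; house and hose are closer than house and trousers) can be checked with a small, self-contained sketch. This is a textbook Levenshtein edit distance written only for illustration. It is not the Commons Text implementation, and the class name EditDistanceSketch and its distance method are hypothetical names that do not exist in the library.

```java
// Illustrative sketch only: a textbook Levenshtein edit distance.
// EditDistanceSketch and distance(...) are hypothetical names, not Commons Text API.
final class EditDistanceSketch {

    private EditDistanceSketch() { }

    /** Minimum number of single-character inserts, deletes and substitutions. */
    static int distance(CharSequence left, CharSequence right) {
        int[] previous = new int[right.length() + 1];
        int[] current = new int[right.length() + 1];
        for (int j = 0; j <= right.length(); j++) {
            previous[j] = j; // building the first j chars of right from "" takes j inserts
        }
        for (int i = 1; i <= left.length(); i++) {
            current[0] = i; // reducing the first i chars of left to "" takes i deletes
            for (int j = 1; j <= right.length(); j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(
                        Math.min(current[j - 1] + 1, previous[j] + 1),
                        previous[j - 1] + cost);
            }
            // The row just computed becomes the previous row for the next character.
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[right.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("house", "hose"));     // prints 1
        System.out.println(distance("house", "trousers")); // prints 4
    }
}
```

Under this metric house/hose differ by one deletion while house/trousers need four edits, which matches the ordering the userguide paragraph describes. Other algorithms in the list (for example the cosine or Jaro-Winkler measures) use different scales, so the numbers here apply only to this edit-distance sketch.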