[jira] [Commented] (TEXT-19) Add alphabet converter
[ https://issues.apache.org/jira/browse/TEXT-19?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502568#comment-15502568 ]

Eyal Allweil commented on TEXT-19:
--

I opened [a new pull request|https://github.com/apache/commons-text/pull/5]. Some of Rob's comments are addressed there:
- I removed the doNotEncodeMap data member (the only cost is a slightly more expensive check when decoding).
- I added a null check to the equals method.
- I added a usage example to the javadoc.
- I took care of the stylistic differences he mentioned.

I didn't address the following, which are open for discussion:
- Do we want to accommodate non-invertible or non-decodable encodings (e.g. new AlphabetConverter(['a','b','c','d'],['a','e','f','e'],['a']))?
- Do we want to accommodate alphabets over concatenated chars (e.g. new AlphabetConverter(['ab','c','d','e'],['a','k','hi','z'],[]))?
- The name of the class.

> Add alphabet converter
> --
>
> Key: TEXT-19
> URL: https://issues.apache.org/jira/browse/TEXT-19
> Project: Commons Text
> Issue Type: New Feature
> Reporter: Eyal Allweil
> Fix For: 1.0
>
>
> (as described in [the mailing list|http://mail-archives.apache.org/mod_mbox/commons-dev/201609.mbox/%3c289983494.3057706.1472720010...@mail.yahoo.com%3e])
> This is a utility class I wrote for converting from one alphabet to another,
> for example from Unicode to Latin, without using some of the chars in Latin.
> The usage looks like this:
> {code}
> Set originals; // a, b, c, d
> Set encoding; // 0, 1, d
> Set doNotEncode; // d
> AlphabetConverter ac = AlphabetConverter.createConverter(originals, encoding, doNotEncode);
> ac.encode("a"); // 00
> ac.encode("b"); // 01
> ac.encode("c"); // 0d
> ac.encode("d"); // d
> ac.encode("abcd"); // 00010dd
> {code}
> Of course, x.equals(ac.decode(ac.encode(x))) should always be true.
> The provided implementation gives every encoded char a code of the same fixed
> length, except for the "do not encode" chars, which remain as they are (length one).
> In addition, to make it easier to preserve the encoding scheme, I've
> added a human-readable toString implementation, and a static factory method that can
> recreate an AlphabetConverter from the encoding map, such that:
> {code}
> AlphabetConverter ac;
> ac.equals(AlphabetConverter.createConverterFromMap(ac.getOriginalToEncoded()));
> // should always be true
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
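The fixed-length scheme described in the issue can be illustrated with a small Java sketch. This is a hypothetical class (`FixedLengthConverterSketch`, with a constructor taking ordered `List`s), not the code from the pull request or the Commons Text `AlphabetConverter`. It assumes two things the issue does not spell out: codes are assigned to the original chars in list order, and a code may never start with a "do not encode" char, since such a char can still appear later inside a code (c → 0d), and the decoder must be able to tell a pass-through char from the start of a code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of the fixed-length scheme; NOT the code from the pull request.
public class FixedLengthConverterSketch {
    private final Map<Character, String> originalToEncoded = new LinkedHashMap<>();
    private final Map<String, Character> encodedToOriginal = new HashMap<>();
    private final Set<Character> doNotEncode;
    private final int codeLength;

    // Assumption: every doNotEncode char also appears in both originals and encoding.
    public FixedLengthConverterSketch(List<Character> originals, List<Character> encoding,
                                      Set<Character> doNotEncode) {
        this.doNotEncode = new HashSet<>(doNotEncode);
        // A code must not start with a do-not-encode char; otherwise decode() could not
        // tell a pass-through char from the first char of a code.
        List<Character> leading = encoding.stream()
                .filter(c -> !doNotEncode.contains(c)).collect(Collectors.toList());
        long toEncode = originals.stream().filter(c -> !doNotEncode.contains(c)).count();
        // Smallest length L with leading.size() * encoding.size()^(L-1) >= toEncode.
        int length = 1;
        long capacity = leading.size();
        while (capacity < toEncode) {
            if (leading.isEmpty() || encoding.size() < 2) {
                throw new IllegalArgumentException("Encoding alphabet is too small");
            }
            length++;
            capacity *= encoding.size();
        }
        this.codeLength = length;
        int index = 0;
        for (Character c : originals) {
            String code = doNotEncode.contains(c)
                    ? String.valueOf(c) : codeFor(index++, length, leading, encoding);
            originalToEncoded.put(c, code);
            encodedToOriginal.put(code, c);
        }
    }

    // Writes index in mixed radix, most significant position first: the first position
    // uses the restricted "leading" chars, the remaining positions use the full encoding.
    private static String codeFor(int index, int length, List<Character> leading,
                                  List<Character> encoding) {
        char[] code = new char[length];
        for (int pos = length - 1; pos > 0; pos--) {
            code[pos] = encoding.get(index % encoding.size());
            index /= encoding.size();
        }
        code[0] = leading.get(index);
        return new String(code);
    }

    public String encode(String original) {
        StringBuilder sb = new StringBuilder();
        for (char c : original.toCharArray()) {
            String code = originalToEncoded.get(c);
            if (code == null) {
                throw new IllegalArgumentException("Not in the original alphabet: " + c);
            }
            sb.append(code);
        }
        return sb.toString();
    }

    public String decode(String encoded) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < encoded.length()) {
            char c = encoded.charAt(i);
            if (doNotEncode.contains(c)) {
                sb.append(c); // passed through unchanged by encode()
                i++;
                continue;
            }
            // Any other char starts a code of exactly codeLength chars.
            Character original = i + codeLength <= encoded.length()
                    ? encodedToOriginal.get(encoded.substring(i, i + codeLength)) : null;
            if (original == null) {
                throw new IllegalArgumentException("Invalid code at index " + i);
            }
            sb.append(original.charValue());
            i += codeLength;
        }
        return sb.toString();
    }

    public Map<Character, String> getOriginalToEncoded() {
        return Collections.unmodifiableMap(originalToEncoded);
    }
}
```

With originals a, b, c, d, encoding 0, 1, d and doNotEncode d, this sketch produces the mapping from the issue's example (a → 00, b → 01, c → 0d, d → d), so `encode("abcd")` returns `00010dd` and `decode` of that string returns `abcd`. Three chars need codes but only 0 and 1 may lead, so one position gives 2 codes and two positions give 2 × 3 = 6, which is why the code length is 2.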
[jira] [Commented] (TEXT-19) Add alphabet converter
[ https://issues.apache.org/jira/browse/TEXT-19?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501262#comment-15501262 ]

ASF GitHub Bot commented on TEXT-19:

Github user eyala closed the pull request at:

https://github.com/apache/commons-lang/pull/188
[jira] [Commented] (TEXT-19) Add alphabet converter
[ https://issues.apache.org/jira/browse/TEXT-19?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501221#comment-15501221 ]

ASF GitHub Bot commented on TEXT-19:

Github user britter commented on the issue:

https://github.com/apache/commons-lang/pull/188

The issue has been moved to https://issues.apache.org/jira/browse/TEXT-19
Please reference TEXT-19 in your PR against the Commons Text repository. Thank you!