[jira] [Commented] (TEXT-19) Add alphabet converter

2016-09-19 Thread Eyal Allweil (JIRA)

[ 
https://issues.apache.org/jira/browse/TEXT-19?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502568#comment-15502568
 ] 

Eyal Allweil commented on TEXT-19:
--

I opened [a new pull request|https://github.com/apache/commons-text/pull/5]. 
Some of Rob's comments are addressed there:

- I removed the doNotEncodeMap data member (the only cost is a slightly more 
expensive check when decoding).
- I added a null check to the equals method.
- I added a usage example to the javadoc.
- I took care of the stylistic differences he mentioned.

I didn't address the following, which can be discussed:

- Do we want to accommodate non-invertible or non-decodable encodings (e.g. new 
AlphabetConverter(['a','b','c','d'],['a','e','f','e'],['a']))?
- Do we want to accommodate alphabets over concatenated chars (e.g. new 
AlphabetConverter(['ab','c','d','e'],['a','k','hi','z'],[]))?
- The name of the class.
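
To make the first open question concrete, here is a minimal sketch of a check for a non-invertible mapping. It assumes the example is read positionally (a->a, b->e, c->f, d->e), which is my interpretation rather than something stated in the issue. InvertibilityCheck and isInvertible are hypothetical names and are not part of the proposed class.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of the proposed AlphabetConverter. */
final class InvertibilityCheck {

    /**
     * Returns true if no two original characters share the same encoded string.
     * This only checks the per-character map; it does not prove that a
     * concatenation of variable-length codes can be decoded unambiguously.
     */
    static boolean isInvertible(Map<Character, String> originalToEncoded) {
        Map<String, Character> encodedToOriginal = new HashMap<>();
        for (Map.Entry<Character, String> entry : originalToEncoded.entrySet()) {
            // put returns the previous key if this encoding was already taken
            if (encodedToOriginal.put(entry.getValue(), entry.getKey()) != null) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed positional reading of the example: a->a, b->e, c->f, d->e
        Map<Character, String> mapping = new HashMap<>();
        mapping.put('a', "a");
        mapping.put('b', "e");
        mapping.put('c', "f");
        mapping.put('d', "e");
        System.out.println(isInvertible(mapping)); // false: 'b' and 'd' both map to "e"
    }
}
```

Under that reading, decode("e") could not tell 'b' from 'd', so x.equals(ac.decode(ac.encode(x))) would not hold for every x.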

> Add alphabet converter
> --
>
> Key: TEXT-19
> URL: https://issues.apache.org/jira/browse/TEXT-19
> Project: Commons Text
> Issue Type: New Feature
> Reporter: Eyal Allweil
> Fix For: 1.0
>
>
> (as described in [the mailing 
> list|http://mail-archives.apache.org/mod_mbox/commons-dev/201609.mbox/%3c289983494.3057706.1472720010...@mail.yahoo.com%3e])
> This is a utility class I wrote for converting from one alphabet to another - 
> for example, from Unicode to Latin, without using some of the Latin chars. 
> The usage looks like this:
> {code}
> Set originals; // a, b, c, d
> Set encoding; // 0, 1, d
> Set doNotEncode; // d
> AlphabetConverter ac = AlphabetConverter.createConverter(originals, encoding, 
> doNotEncode);
> ac.encode("a"); // 00
> ac.encode("b"); // 01
> ac.encode("c"); // 0d
> ac.encode("d"); // d
> ac.encode("abcd"); // 00010dd
> {code}
> Of course, x.equals(ac.decode(ac.encode(x))) should always be true.
> The implementation provided makes the encodings of fixed length, other than 
> the "do not encode" chars, which remain as they are (length one).
> In addition, in order to make it easier to preserve the encoding scheme, I've 
> added a human-readable toString implementation, and a constructor that can 
> recreate an AlphabetConverter from the encoding map, such that:
> {code}
> AlphabetConverter ac;
> ac.equals(AlphabetConverter.createConverterFromMap(ac.getOriginalToEncoded()));
>  // should always be true
> {code}
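
The fixed-length scheme described in the issue can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from the pull request: the class name FixedLengthAlphabetSketch is hypothetical, alphabets are passed as ordered strings of distinct BMP chars (the issue uses Set), and the first char of every code is assumed never to be a "do not encode" char, so the decoder can tell a literal from the start of a code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a fixed-length alphabet encoding; not the PR's implementation. */
final class FixedLengthAlphabetSketch {
    private final Map<Character, String> originalToEncoded = new HashMap<>();
    private final Map<String, Character> encodedToOriginal = new HashMap<>();
    private final String doNotEncode;
    private final int codeLength;

    FixedLengthAlphabetSketch(String originals, String encoding, String doNotEncode) {
        this.doNotEncode = doNotEncode;
        // Chars allowed in the first position of a code: encoding minus doNotEncode.
        StringBuilder first = new StringBuilder();
        for (char c : encoding.toCharArray()) {
            if (doNotEncode.indexOf(c) < 0) first.append(c);
        }
        int toEncode = 0;
        for (char c : originals.toCharArray()) {
            if (doNotEncode.indexOf(c) < 0) toEncode++;
        }
        if (first.length() == 0 || encoding.length() < 2) {
            throw new IllegalArgumentException("Encoding alphabet is too small");
        }
        // Smallest length whose code space covers every char that must be encoded.
        int length = 1;
        long capacity = first.length();
        while (capacity < toEncode) {
            capacity *= encoding.length();
            length++;
        }
        this.codeLength = length;
        int index = 0;
        for (char c : originals.toCharArray()) {
            String code = doNotEncode.indexOf(c) >= 0
                    ? String.valueOf(c) // "do not encode" chars stay as they are
                    : codeFor(index++, length, first, encoding);
            originalToEncoded.put(c, code);
            encodedToOriginal.put(code, c);
        }
    }

    /** Builds the index-th code: the first char comes from firstChars, the rest from encoding. */
    private static String codeFor(int index, int length, CharSequence firstChars, String encoding) {
        char[] code = new char[length];
        for (int pos = length - 1; pos >= 1; pos--) {
            code[pos] = encoding.charAt(index % encoding.length());
            index /= encoding.length();
        }
        code[0] = firstChars.charAt(index);
        return new String(code);
    }

    String encode(String original) {
        StringBuilder sb = new StringBuilder();
        for (char c : original.toCharArray()) {
            String code = originalToEncoded.get(c);
            if (code == null) throw new IllegalArgumentException("Not in original alphabet: " + c);
            sb.append(code);
        }
        return sb.toString();
    }

    String decode(String encoded) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < encoded.length()) {
            char c = encoded.charAt(i);
            if (doNotEncode.indexOf(c) >= 0) { // literal of length one
                sb.append(c);
                i++;
                continue;
            }
            if (i + codeLength > encoded.length()) {
                throw new IllegalArgumentException("Truncated code at index " + i);
            }
            Character original = encodedToOriginal.get(encoded.substring(i, i + codeLength));
            if (original == null) throw new IllegalArgumentException("Unknown code at index " + i);
            sb.append(original.charValue());
            i += codeLength;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        FixedLengthAlphabetSketch ac = new FixedLengthAlphabetSketch("abcd", "01d", "d");
        System.out.println(ac.encode("abcd"));            // 00010dd
        System.out.println(ac.decode(ac.encode("abcd"))); // abcd
    }
}
```

With originals a, b, c, d, encoding 0, 1, d and doNotEncode d, this sketch reproduces the values in the usage example above. Whether the pull request assigns codes in the same order is not confirmed by this thread.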



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEXT-19) Add alphabet converter

2016-09-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TEXT-19?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501262#comment-15501262
 ] 

ASF GitHub Bot commented on TEXT-19:


GitHub user eyala closed the pull request at:

https://github.com/apache/commons-lang/pull/188




[jira] [Commented] (TEXT-19) Add alphabet converter

2016-09-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TEXT-19?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501221#comment-15501221
 ] 

ASF GitHub Bot commented on TEXT-19:


GitHub user britter commented on the issue:

https://github.com/apache/commons-lang/pull/188
  
The issue has been moved to https://issues.apache.org/jira/browse/TEXT-19 
Please reference TEXT-19 in your PR against the Commons Text repository. Thank 
you!

