Re: Map Lexicoder

2015-12-29 Thread Keith Turner
Do you want all maps with the same keys to sort together?  If so doing
abc123 would make that happen.

The map data below

  { a : 1 , b : 2, c : 3 }
  { a : 1 , x : 8, y : 9 }
  { a : 4 , b : 5, c : 6 }

Would sort like the following if encoding key values pairs

  a1b2c3
  a1x8y9
  a4b5c6

If encoding all key and then all values, it would sort as follows

  abc123
  abc456
  axy189



On Tue, Dec 29, 2015 at 4:53 PM, Adam J. Shook  wrote:

> Agreed, I came to the same conclusion while implementing.  The final
> result that I have is a SortedMapLexicoder to avoid any comparisons going
> haywire.  Additionally, would it be best to encode the map as an array of
> keys followed by an array of values, or encode all key value pairs
> back-to-back:
>
> { a : 1 , b : 2, c : 3 } encoded as
>
> a1b2c3
> -or-
> abc123
>
> Feels like I should be encoding a list of keys, then the list of values,
> and then concatenating these two encoded byte arrays.  I think the end
> solution will be to support both?  I'm having a hard time reconciling which
> method is better, if any.  Hard to find some good examples of people who
> are sorting a list of maps.
>
> On Tue, Dec 29, 2015 at 2:47 PM, Keith Turner  wrote:
>
>>
>>
>> On Mon, Dec 28, 2015 at 11:47 AM, Adam J. Shook 
>> wrote:
>>
>>> Hello all,
>>>
>>> Any suggestions for using a Map Lexicoder (or implementing one)?  I am
>>> currently using a new ListLexicoder(new PairLexicoder(some lexicoder, some
>>> lexicoder), which is working for single maps.  However, when one of the
>>> lexicoders in the Pair is itself a Map (and therefore another
>>> ListLexicoder(PairLexicoder)), an exception is being thrown because
>>> ArrayList is not Comparable.
>>>
>>
>>
>> Since maps do not have a well defined order of keys and values,
>> comparison is tricky.   The purpose of Lexicoders is encode things in such
>> a way that the lexicographical comparison of the serialized data is
>> possible.  With a hashmap if I add the same data in the same order to two
>> different hash map instances, its possible that when iterating over those
>> maps I could see the data in different orders.   This could lead to two
>> maps constructed in the same way at different times (like different JVMs
>> with different implementations of HashMap) generating different data that
>> compare as different.  Ideally comparison of the two would yield equality.
>>
>> Something like LinkedHashMap does not have this problem for the same
>> insertion order.  If you want things to be comparable regardless of
>> insertion order (which I think is more intuitive), then SortedMap seems
>> like it would be a good candidate.  So maybe a SortedMapLexicoder would be
>> a better thing to offer?
>>
>>
>>> Regards,
>>> --Adam
>>>
>>
>>
>


Re: Map Lexicoder

2015-12-29 Thread Keith Turner
On Mon, Dec 28, 2015 at 11:47 AM, Adam J. Shook 
wrote:

> Hello all,
>
> Any suggestions for using a Map Lexicoder (or implementing one)?  I am
> currently using a new ListLexicoder(new PairLexicoder(some lexicoder, some
> lexicoder), which is working for single maps.  However, when one of the
> lexicoders in the Pair is itself a Map (and therefore another
> ListLexicoder(PairLexicoder)), an exception is being thrown because
> ArrayList is not Comparable.
>


Since maps do not have a well defined order of keys and values, comparison
is tricky.   The purpose of Lexicoders is encode things in such a way that
the lexicographical comparison of the serialized data is possible.  With a
hashmap if I add the same data in the same order to two different hash map
instances, its possible that when iterating over those maps I could see the
data in different orders.   This could lead to two maps constructed in the
same way at different times (like different JVMs with different
implementations of HashMap) generating different data that compare as
different.  Ideally comparison of the two would yield equality.

Something like LinkedHashMap does not have this problem for the same
insertion order.  If you want things to be comparable regardless of
insertion order (which I think is more intuitive), then SortedMap seems
like it would be a good candidate.  So maybe a SortedMapLexicoder would be
a better thing to offer?


> Regards,
> --Adam
>


Re: Map Lexicoder

2015-12-29 Thread Matthew Hall
On Tue, Dec 29, 2015 at 04:53:47PM -0500, Adam J. Shook wrote:
> a1b2c3

Most map serializations use this approach to maximize locality of reference in 
the CPU cache when serializing and deserializing.

Matthew.


Re: Map Lexicoder

2015-12-29 Thread Adam J. Shook
Agreed, I came to the same conclusion while implementing.  The final result
that I have is a SortedMapLexicoder to avoid any comparisons going
haywire.  Additionally, would it be best to encode the map as an array of
keys followed by an array of values, or encode all key value pairs
back-to-back:

{ a : 1 , b : 2, c : 3 } encoded as

a1b2c3
-or-
abc123

Feels like I should be encoding a list of keys, then the list of values,
and then concatenating these two encoded byte arrays.  I think the end
solution will be to support both?  I'm having a hard time reconciling which
method is better, if any.  Hard to find some good examples of people who
are sorting a list of maps.

On Tue, Dec 29, 2015 at 2:47 PM, Keith Turner  wrote:

>
>
> On Mon, Dec 28, 2015 at 11:47 AM, Adam J. Shook 
> wrote:
>
>> Hello all,
>>
>> Any suggestions for using a Map Lexicoder (or implementing one)?  I am
>> currently using a new ListLexicoder(new PairLexicoder(some lexicoder, some
>> lexicoder), which is working for single maps.  However, when one of the
>> lexicoders in the Pair is itself a Map (and therefore another
>> ListLexicoder(PairLexicoder)), an exception is being thrown because
>> ArrayList is not Comparable.
>>
>
>
> Since maps do not have a well defined order of keys and values, comparison
> is tricky.   The purpose of Lexicoders is encode things in such a way that
> the lexicographical comparison of the serialized data is possible.  With a
> hashmap if I add the same data in the same order to two different hash map
> instances, its possible that when iterating over those maps I could see the
> data in different orders.   This could lead to two maps constructed in the
> same way at different times (like different JVMs with different
> implementations of HashMap) generating different data that compare as
> different.  Ideally comparison of the two would yield equality.
>
> Something like LinkedHashMap does not have this problem for the same
> insertion order.  If you want things to be comparable regardless of
> insertion order (which I think is more intuitive), then SortedMap seems
> like it would be a good candidate.  So maybe a SortedMapLexicoder would be
> a better thing to offer?
>
>
>> Regards,
>> --Adam
>>
>
>