Aha, makes sense. Thanks!

On Fri, Jul 12, 2019 at 9:26 AM Lukasz Cwik <[email protected]> wrote:

> TreeMapCoder.of(StringUtf8Coder.of(), ListCoder.of(VarIntCoder.of()));
>
> On Fri, Jul 12, 2019 at 10:22 AM Shannon Duncan <
> [email protected]> wrote:
>
>> So I have my custom coder created for TreeMap and I'm ready to set it...
>>
>> So my Type is "TreeMap<String, ArrayList<Integer>>"
>>
>> What do I put for ".setCoder(TreeMapCoder.of(???, ???))"
>>
>> On Thu, Jul 11, 2019 at 8:21 PM Rui Wang <[email protected]> wrote:
>>
>>> Hi Shannon, [1] is a good starting point on coders in the Java SDK.
>>>
>>>
>>> [1]
>>> https://beam.apache.org/documentation/programming-guide/#data-encoding-and-type-safety
>>>
>>> Rui
>>>
>>> On Thu, Jul 11, 2019 at 3:08 PM Shannon Duncan <
>>> [email protected]> wrote:
>>>
>>>> Was able to get it to use ArrayList by doing List<List<Integer>> result
>>>> = new ArrayList<List<Integer>>();
>>>>
>>>> Then storing my keys in a separate array that I'll pass in as a side
>>>> input to key for the list of lists.
>>>>
>>>> Thanks for the help. Let me know more in the future about how coders
>>>> work and are instantiated; I'd love to help contribute by adding some
>>>> new coders.
>>>>
>>>> - Shannon
>>>>
>>>> On Thu, Jul 11, 2019 at 4:59 PM Shannon Duncan <
>>>> [email protected]> wrote:
>>>>
>>>>> Will do. Thanks. A new coder for deterministic Maps would be great in
>>>>> the future. Thank you!
>>>>>
>>>>> On Thu, Jul 11, 2019 at 4:58 PM Rui Wang <[email protected]> wrote:
>>>>>
>>>>>> I think Mike is referring to ListCoder
>>>>>> <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/ListCoder.java>,
>>>>>> which is deterministic if its element coder is deterministic. Maybe you
>>>>>> can search the repo for examples of ListCoder?
>>>>>>
>>>>>>
>>>>>> -Rui
>>>>>>
>>>>>> On Thu, Jul 11, 2019 at 2:55 PM Shannon Duncan <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> So ArrayList doesn't work either, so just a standard List?
>>>>>>>
>>>>>>> On Thu, Jul 11, 2019 at 4:53 PM Rui Wang <[email protected]> wrote:
>>>>>>>
>>>>>>>> Shannon, I agree with Mike that a List is a good workaround if the
>>>>>>>> elements within the list are deterministic and you are eager to get
>>>>>>>> your new pipeline working.
>>>>>>>>
>>>>>>>>
>>>>>>>> Let me send some pointers on adding a new coder later.
>>>>>>>>
>>>>>>>>
>>>>>>>> -Rui
>>>>>>>>
>>>>>>>> On Thu, Jul 11, 2019 at 2:45 PM Shannon Duncan <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> I just started learning Java today in an attempt to convert our
>>>>>>>>> Python pipelines to Java and take advantage of key features that Java
>>>>>>>>> has. I have no idea how I would create a new coder and include it so
>>>>>>>>> that Beam recognizes it.
>>>>>>>>>
>>>>>>>>> If you can point me in the right direction of where it hooks
>>>>>>>>> together, I might be able to figure that out. I can duplicate MapCoder
>>>>>>>>> and try to make changes, but how will Beam know to pick up that coder
>>>>>>>>> for a GroupByKey?
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>> Shannon
>>>>>>>>>
>>>>>>>>> On Thu, Jul 11, 2019 at 4:42 PM Rui Wang <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> It should be straightforward to create a SortedMapCoder for
>>>>>>>>>> TreeMap: add checks on the map instances and then change
>>>>>>>>>> verifyDeterministic [1].
>>>>>>>>>>
>>>>>>>>>> If this is a common need, we could submit it to the Beam repo.
>>>>>>>>>>
>>>>>>>>>> [1]:
>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java#L146
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 11, 2019 at 2:26 PM Mike Pedersen <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> There isn't a coder for deterministic maps in Beam, so even if
>>>>>>>>>>> your data structure is deterministic, Beam will assume the
>>>>>>>>>>> serialized bytes aren't deterministic.
>>>>>>>>>>>
>>>>>>>>>>> You could make one using the MapCoder as a guide:
>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java
>>>>>>>>>>> Just change it so that the exception in verifyDeterministic is
>>>>>>>>>>> removed and so that decoding instantiates a TreeMap (or similar)
>>>>>>>>>>> instead of a HashMap.
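The idea behind such a coder can be sketched without Beam at all. The following is only a standard-library illustration, not Beam's Coder API: the class name SortedMapEncodingSketch and the wire layout (entry count, then each key followed by its list) are assumptions made up for this example. It shows why a map that always iterates in sorted key order can be given a deterministic encoding, and that decoding rebuilds a TreeMap rather than a HashMap.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Beam: encodes a TreeMap in ascending key order.
final class SortedMapEncodingSketch {

    // Layout (an assumption of this sketch): entry count, then for each entry
    // the key, the list length, and the list elements.
    static byte[] encode(TreeMap<String, List<Integer>> map) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(map.size());
            // TreeMap iterates in sorted key order, so equal maps give equal bytes.
            for (Map.Entry<String, List<Integer>> entry : map.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeInt(entry.getValue().size());
                for (int value : entry.getValue()) {
                    out.writeInt(value);
                }
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the layout written by encode and rebuilds a TreeMap.
    static TreeMap<String, List<Integer>> decode(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            TreeMap<String, List<Integer>> map = new TreeMap<>();
            int entries = in.readInt();
            for (int i = 0; i < entries; i++) {
                String key = in.readUTF();
                int length = in.readInt();
                List<Integer> values = new ArrayList<>(length);
                for (int j = 0; j < length; j++) {
                    values.add(in.readInt());
                }
                map.put(key, values);
            }
            return map;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Two TreeMaps with the same contents encode to the same bytes regardless of insertion order, which is the property GroupByKey needs from a key. A real Beam coder would presumably delegate to the key and value coders instead of hard-coding String and Integer, and would only be deterministic if those coders are.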
>>>>>>>>>>>
>>>>>>>>>>> Alternatively, you could just represent your key as a sorted
>>>>>>>>>>> list of KV pairs. Lookups could be done using binary search if
>>>>>>>>>>> necessary.
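A minimal sketch of that alternative, using only the Java standard library: Map.Entry stands in for Beam's KV so the example stays self-contained, and the names SortedPairsSketch, toSortedPairs, and lookup are hypothetical. The map is flattened into a list of pairs sorted by key, and lookups binary-search that list.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: a map represented as a key-sorted list of pairs.
final class SortedPairsSketch {

    // Copies the map's entries into a list and sorts the list by key, so the
    // result no longer depends on the source map's iteration order.
    static List<Map.Entry<String, Integer>> toSortedPairs(Map<String, Integer> map) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            pairs.add(new SimpleImmutableEntry<>(entry.getKey(), entry.getValue()));
        }
        pairs.sort(Map.Entry.comparingByKey());
        return pairs;
    }

    // Binary search over the key-sorted list; returns null when the key is absent.
    static Integer lookup(List<Map.Entry<String, Integer>> pairs, String key) {
        int low = 0;
        int high = pairs.size() - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int cmp = pairs.get(mid).getKey().compareTo(key);
            if (cmp == 0) {
                return pairs.get(mid).getValue();
            }
            if (cmp < 0) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return null;
    }
}
```

Because the list is sorted before it is used as a key, two maps with the same contents yield the same list, whichever order their entries were inserted in.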
>>>>>>>>>>>
>>>>>>>>>>> Mike
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jul 11, 2019 at 10:41 PM Shannon Duncan <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> So I'm working on essentially doing a word-count on a complex
>>>>>>>>>>>> data structure.
>>>>>>>>>>>>
>>>>>>>>>>>> I tried just using a HashMap as the Structure, but that didn't
>>>>>>>>>>>> work because it is non-deterministic.
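To illustrate what "non-deterministic" means here, the sketch below compares iteration order for two maps with equal contents built in different insertion orders. The class IterationOrderDemo and its methods are hypothetical. The keys 1 and 17 are chosen on the assumption that they land in the same bucket of a default-sized OpenJDK HashMap; HashMap's contract promises no iteration order at all, so this is observed implementation behavior rather than something to rely on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical demo class for comparing map iteration order.
final class IterationOrderDemo {

    // Inserts the keys into the given (empty) map in the order supplied.
    static Map<Integer, String> fill(Map<Integer, String> map, int... keys) {
        for (int key : keys) {
            map.put(key, "v" + key);
        }
        return map;
    }

    // Returns the keys in the order the map iterates over them.
    static List<Integer> keysInIterationOrder(Map<Integer, String> map) {
        return new ArrayList<>(map.keySet());
    }
}
```

With a HashMap, filling with 1 then 17 and with 17 then 1 gives maps that are equal yet may iterate in different orders, so a serializer that writes entries in iteration order can emit different bytes for the same logical key. With a TreeMap both fills iterate as 1, 17.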
>>>>>>>>>>>>
>>>>>>>>>>>> However, when given a LinkedHashMap or TreeMap, which is
>>>>>>>>>>>> deterministic, the SDK complains that it's non-deterministic when
>>>>>>>>>>>> I try to use it as a key for GroupByKey.
>>>>>>>>>>>>
>>>>>>>>>>>> What would be an appropriate Map-style data structure that
>>>>>>>>>>>> would be deterministic enough for Apache Beam to accept it as a
>>>>>>>>>>>> key?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Shannon
>>>>>>>>>>>>
>>>>>>>>>>>
