Aha, makes sense. Thanks!

On Fri, Jul 12, 2019 at 9:26 AM Lukasz Cwik <[email protected]> wrote:
> TreeMapCoder.of(StringUtf8Coder.of(), ListCoder.of(VarIntCoder.of()));
>
> On Fri, Jul 12, 2019 at 10:22 AM Shannon Duncan <[email protected]> wrote:
>
>> So I have my custom coder created for TreeMap and I'm ready to set it...
>>
>> So my type is "TreeMap<String, ArrayList<Integer>>".
>>
>> What do I put for ".setCoder(TreeMapCoder.of(???, ???))"?
>>
>> On Thu, Jul 11, 2019 at 8:21 PM Rui Wang <[email protected]> wrote:
>>
>>> Hi Shannon, [1] is a good starting point on coders in the Java SDK.
>>>
>>> [1] https://beam.apache.org/documentation/programming-guide/#data-encoding-and-type-safety
>>>
>>> Rui
>>>
>>> On Thu, Jul 11, 2019 at 3:08 PM Shannon Duncan <[email protected]> wrote:
>>>
>>>> I was able to get it to use ArrayList by doing List<List<Integer>> result = new ArrayList<List<Integer>>();
>>>>
>>>> I then store my keys in a separate array that I'll pass in as a side input, to serve as the keys for the list of lists.
>>>>
>>>> Thanks for the help! Let me know more in the future about how coders work and are instantiated; I'd love to help contribute some new coders.
>>>>
>>>> - Shannon
>>>>
>>>> On Thu, Jul 11, 2019 at 4:59 PM Shannon Duncan <[email protected]> wrote:
>>>>
>>>>> Will do. Thanks. A new coder for deterministic maps would be great in the future. Thank you!
>>>>>
>>>>> On Thu, Jul 11, 2019 at 4:58 PM Rui Wang <[email protected]> wrote:
>>>>>
>>>>>> I think Mike is referring to ListCoder <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/ListCoder.java>, which is deterministic if its element coder is deterministic. Maybe you can search the repo for examples of ListCoder?
>>>>>>
>>>>>> -Rui
>>>>>>
>>>>>> On Thu, Jul 11, 2019 at 2:55 PM Shannon Duncan <[email protected]> wrote:
>>>>>>
>>>>>>> So ArrayList doesn't work either, so just a standard List?
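An aside on Lukasz's answer: TreeMapCoder.of(StringUtf8Coder.of(), ListCoder.of(VarIntCoder.of())) builds the map coder from a key coder and a value coder, and the value coder is itself a list coder built from an element coder. The plain-Java sketch below imitates that composition without depending on Beam. It is an illustration under stated assumptions, not Beam's API: Codec, STRING, INT, listOf, and treeMapOf are hypothetical names, and the byte layout (a size prefix, then the entries) is not byte-compatible with Beam's StringUtf8Coder, VarIntCoder, or ListCoder. What it shows is that a TreeMap iterates in sorted key order, so two maps with the same contents encode to identical bytes regardless of insertion order, and decoding rebuilds a TreeMap rather than a HashMap.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TreeMapCodecSketch {
  // Hypothetical stand-in for a Beam Coder; this is plain Java, not the Beam API.
  interface Codec<T> {
    void write(T value, DataOutputStream out) throws IOException;
    T read(DataInputStream in) throws IOException;
  }

  // Plays the role of StringUtf8Coder (but uses DataOutputStream.writeUTF's layout).
  static final Codec<String> STRING = new Codec<String>() {
    public void write(String v, DataOutputStream out) throws IOException { out.writeUTF(v); }
    public String read(DataInputStream in) throws IOException { return in.readUTF(); }
  };

  // Plays the role of VarIntCoder (but writes a fixed 4-byte int, not a varint).
  static final Codec<Integer> INT = new Codec<Integer>() {
    public void write(Integer v, DataOutputStream out) throws IOException { out.writeInt(v); }
    public Integer read(DataInputStream in) throws IOException { return in.readInt(); }
  };

  // Plays the role of ListCoder.of(elem): a size prefix, then each element in list order.
  static <E> Codec<List<E>> listOf(Codec<E> elem) {
    return new Codec<List<E>>() {
      public void write(List<E> v, DataOutputStream out) throws IOException {
        out.writeInt(v.size());
        for (E e : v) {
          elem.write(e, out);
        }
      }

      public List<E> read(DataInputStream in) throws IOException {
        int n = in.readInt();
        List<E> result = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
          result.add(elem.read(in));
        }
        return result;
      }
    };
  }

  // Plays the role of a custom TreeMapCoder.of(key, value): a size prefix, then the
  // entries in the TreeMap's sorted key order, so equal maps give equal bytes.
  static <K extends Comparable<K>, V> Codec<TreeMap<K, V>> treeMapOf(Codec<K> key, Codec<V> value) {
    return new Codec<TreeMap<K, V>>() {
      public void write(TreeMap<K, V> v, DataOutputStream out) throws IOException {
        out.writeInt(v.size());
        for (Map.Entry<K, V> e : v.entrySet()) {
          key.write(e.getKey(), out);
          value.write(e.getValue(), out);
        }
      }

      public TreeMap<K, V> read(DataInputStream in) throws IOException {
        int n = in.readInt();
        TreeMap<K, V> result = new TreeMap<>(); // decode into a TreeMap, not a HashMap
        for (int i = 0; i < n; i++) {
          K k = key.read(in);
          result.put(k, value.read(in));
        }
        return result;
      }
    };
  }

  static <T> byte[] encode(Codec<T> codec, T value) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try {
      codec.write(value, new DataOutputStream(bytes));
    } catch (IOException e) {
      throw new UncheckedIOException(e); // not expected for an in-memory stream
    }
    return bytes.toByteArray();
  }

  static <T> T decode(Codec<T> codec, byte[] data) {
    try {
      return codec.read(new DataInputStream(new ByteArrayInputStream(data)));
    } catch (IOException e) {
      throw new UncheckedIOException(e); // e.g. EOFException on truncated input
    }
  }
}
```

Here treeMapOf(STRING, listOf(INT)) stands in for the coder in Lukasz's answer. One caveat the sketch ignores: the decoder always builds a naturally ordered TreeMap, so a map created with a custom Comparator would come back ordered differently.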
>>>>>>>
>>>>>>> On Thu, Jul 11, 2019 at 4:53 PM Rui Wang <[email protected]> wrote:
>>>>>>>
>>>>>>>> Shannon, I agree with Mike that List is a good workaround if the elements within the list are deterministic and you are eager to get your new pipeline working.
>>>>>>>>
>>>>>>>> Let me send back some pointers on adding a new coder later.
>>>>>>>>
>>>>>>>> -Rui
>>>>>>>>
>>>>>>>> On Thu, Jul 11, 2019 at 2:45 PM Shannon Duncan <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> I just started learning Java today in an attempt to convert our Python pipelines to Java and take advantage of key features that Java has. I have no idea how I would create a new coder and hook it in so that Beam recognizes it.
>>>>>>>>>
>>>>>>>>> If you can point me in the right direction of where it hooks together, I might be able to figure that out. I can duplicate MapCoder and try to make changes, but how will Beam know to pick up that coder for a GroupByKey?
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>> Shannon
>>>>>>>>>
>>>>>>>>> On Thu, Jul 11, 2019 at 4:42 PM Rui Wang <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> It could be straightforward to create a SortedMapCoder for TreeMap. Just add checks on map instances and then change verifyDeterministic [1].
>>>>>>>>>>
>>>>>>>>>> If this is a common need, we could just submit it to the Beam repo.
>>>>>>>>>>
>>>>>>>>>> [1]: https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java#L146
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 11, 2019 at 2:26 PM Mike Pedersen <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> There isn't a coder for deterministic maps in Beam, so even if your data structure is deterministic, Beam will assume the serialized bytes aren't deterministic.
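Mike's point can be demonstrated in plain Java. Map.equals compares contents, but a LinkedHashMap iterates in insertion order, so any encoder that simply walks entrySet() can produce different bytes for two equal maps. The sketch below uses a hypothetical naiveEncode helper that follows the same general size-then-entries shape as a map coder; it is not Beam's MapCoder. Since a runner groups keys by their encoded form, equal maps that encode differently could end up in different groups, which is why the thread describes MapCoder's verifyDeterministic as throwing an exception.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

final class NaiveMapEncoding {
  // Hypothetical helper: writes the size, then every entry in whatever order
  // the map's own iterator returns them. It never sorts.
  static byte[] naiveEncode(Map<String, Integer> map) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    try {
      out.writeInt(map.size());
      for (Map.Entry<String, Integer> entry : map.entrySet()) {
        out.writeUTF(entry.getKey());
        out.writeInt(entry.getValue());
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
    return bytes.toByteArray();
  }

  // Same contents as insertedBA(), but "a" is inserted first.
  static Map<String, Integer> insertedAB() {
    Map<String, Integer> map = new LinkedHashMap<>();
    map.put("a", 1);
    map.put("b", 2);
    return map;
  }

  // Same contents as insertedAB(), but "b" is inserted first.
  static Map<String, Integer> insertedBA() {
    Map<String, Integer> map = new LinkedHashMap<>();
    map.put("b", 2);
    map.put("a", 1);
    return map;
  }
}
```

The two maps compare equal, yet their encodings differ. Copying both into a TreeMap before encoding, for example naiveEncode(new TreeMap<>(insertedAB())), makes the bytes match; that sorted iteration order is the property a TreeMap-specific coder would rely on.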
>>>>>>>>>>>
>>>>>>>>>>> You could make one using MapCoder as a guide:
>>>>>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java
>>>>>>>>>>> Just change it so that the exception in verifyDeterministic is removed and, when decoding, it instantiates a TreeMap or similar instead of a HashMap.
>>>>>>>>>>>
>>>>>>>>>>> Alternatively, you could just represent your key as a sorted list of KV pairs. Lookups could be done using binary search if necessary.
>>>>>>>>>>>
>>>>>>>>>>> Mike
>>>>>>>>>>>
>>>>>>>>>>> On Thu, 11 Jul 2019 at 22:41, Shannon Duncan <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> So I'm working on essentially doing a word count on a complex data structure.
>>>>>>>>>>>>
>>>>>>>>>>>> I tried just using a HashMap as the structure, but that didn't work because it is non-deterministic.
>>>>>>>>>>>>
>>>>>>>>>>>> However, when given a LinkedHashMap or TreeMap, which is deterministic, the SDK complains that it's non-deterministic when I try to use it as a key for GroupByKey.
>>>>>>>>>>>>
>>>>>>>>>>>> What would be an appropriate Map-style data structure that would be deterministic enough for Apache Beam to accept it as a key?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Shannon
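Mike's alternative, a sorted list of key/value pairs with binary-search lookups, might look like the plain-Java sketch below. SortedPairs, fromMap, and lookup are hypothetical names, and Integer values are used for brevity where Shannon's structure has lists. Sorting by key gives equal maps the same list regardless of insertion order, and Collections.binarySearch with Map.Entry.comparingByKey() finds a key in O(log n) time. In a Beam pipeline such a list would presumably be stored as KV elements with a coder along the lines of ListCoder.of(KvCoder.of(StringUtf8Coder.of(), VarIntCoder.of())); that wiring is an assumption and is not shown here.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class SortedPairs {
  // Orders pairs by key only; values are ignored when comparing.
  private static final Comparator<Map.Entry<String, Integer>> BY_KEY = Map.Entry.comparingByKey();

  // Copies a map into an immutable-entry list sorted by key, so equal maps give equal lists.
  static List<Map.Entry<String, Integer>> fromMap(Map<String, Integer> map) {
    List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
    for (Map.Entry<String, Integer> e : map.entrySet()) {
      pairs.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
    }
    pairs.sort(BY_KEY);
    return pairs;
  }

  // Binary search on the key; returns null when the key is absent.
  static Integer lookup(List<Map.Entry<String, Integer>> pairs, String key) {
    int i = Collections.binarySearch(
        pairs, new AbstractMap.SimpleImmutableEntry<>(key, (Integer) null), BY_KEY);
    return i >= 0 ? pairs.get(i).getValue() : null;
  }
}
```

The list must stay sorted for the lookup to be valid, so any code that builds or modifies it should go through fromMap (or re-sort afterwards).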
