And if you have a DoFn<X, List<ABC>> you can follow this with Flatten.iterables <https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/transforms/Flatten.Iterables.html> to turn the output PCollection<List<ABC>> into a PCollection<ABC>. In some cases you may want to follow this with a Reshuffle <https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/transforms/Reshuffle.html#viaRandomKey--> so that the outputs from a single X get distributed among multiple machines.
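
A minimal sketch of that chain (DoFn, then Flatten.iterables, then Reshuffle), in case it helps. It assumes String stands in for both X and ABC, and the names ListToPCollectionSketch, expand and ExpandFn are hypothetical, not from your code. The coder is set explicitly so the sketch does not rely on coder inference.

```java
import java.util.Arrays;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.ListCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.Flatten;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.Reshuffle;
import org.apache.beam.sdk.values.PCollection;

public class ListToPCollectionSketch {

  // Hypothetical stand-in for whatever builds a List<ABC> from one X.
  static List<String> expand(String x) {
    return Arrays.asList(x.split(","));
  }

  // A DoFn<X, List<ABC>>: each input element emits one whole list.
  static class ExpandFn extends DoFn<String, List<String>> {
    @ProcessElement
    public void processElement(ProcessContext c) {
      c.output(expand(c.element()));
    }
  }

  static PCollection<String> build(Pipeline p) {
    PCollection<List<String>> lists =
        p.apply(Create.of("a,b,c", "d,e"))
            .apply(ParDo.of(new ExpandFn()))
            .setCoder(ListCoder.of(StringUtf8Coder.of()));

    return lists
        // PCollection<List<String>> -> PCollection<String>
        .apply(Flatten.<String>iterables())
        // Optional: redistribute the elements that came from a single input.
        .apply(Reshuffle.<String>viaRandomKey());
  }

  public static void main(String[] args) {
    // Assumes a runner (e.g. the direct runner) is on the classpath.
    Pipeline p = Pipeline.create();
    build(p);
    p.run().waitUntilFinish();
  }
}
```

Any transform applied to the PCollection returned by build() then processes each element individually. The FlatMapElements approach quoted below covers the DoFn and Flatten steps in a single transform.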

On Thu, Jun 7, 2018 at 8:19 AM Marián Dvorský <[email protected]> wrote:

> If you have a function which given X returns a List<ABC>, you can use
> FlatMapElements
> <https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/transforms/FlatMapElements.html>
> transform on PCollection<X> to get a PCollection<ABC>.
>
> On Thu, Jun 7, 2018 at 8:16 AM S. Sahayaraj <[email protected]> wrote:
>
>> In case if we could return List<ABC> from DoFn<> then we could use the
>> code as suggested in section 3.1.2 and mentioned by you below., but the
>> return type of DoFn<> is always PCollection<> in where I could not have the
>> list of ABC objects which further will be fed as input for parallel
>> computation. Is there any possibility to convert List<ABC> to
>> PCollection<ABC> in DoFn<> itself? OR can DoFn<> return List<ABC> objects?
>>
>> Cheers,
>>
>> S. Sahayaraj
>>
>> *From:* Robert Bradshaw [mailto:[email protected]]
>> *Sent:* Wednesday, June 6, 2018 9:40 PM
>> *To:* [email protected]
>> *Subject:* [EXTERNAL] - Re: Create PCollection<ABC> from List<ABC>in
>> DoFn<>
>>
>> You can use the Create transform to do this, e.g.
>>
>> Pipeline p = ...
>> List<ABC> inMemoryObjects = ...
>> PCollection<ABC> pcollectionOfObject = p.apply(Create.of(inMemoryObjects));
>> result = pcollectionOfObject.apply(ParDo.of(SomeDoFn...));
>>
>> See section 3.1.2 at
>> https://beam.apache.org/documentation/programming-guide/#pcollections
>>
>> On Wed, Jun 6, 2018 at 8:34 AM S. Sahayaraj <[email protected]> wrote:
>>
>> Hello,
>>
>> I have created a java class which extends DoFn<>, there
>> are list of objects of type ABC (List<ABC>) populated in
>> processElement(ProcessContext c) at runtime and would like to generate
>> respective PCollection<ABC> from List<ABC> so that the subsequent
>> transformation can do parallel execution on each ABC object in
>> PCollection<ABC>. How do we create PCollection from in-memory object
>> created in DoFn<>? OR How do we get pipeline object being in DoFn<>? OR is
>> there any SDK guidelines to refer?
>>
>> Thanks,
>>
>> S. Sahayaraj
