Also, remember that a DoFn can return multiple results:

@ProcessElement
void process(...) {
  for (...) {
    c.output(...);
  }
}

On Thu, Jun 7, 2018 at 9:27 AM Robert Bradshaw <[email protected]> wrote:

> And if you have a DoFn<X, List<ABC>> you can follow this with
> Flatten.iterables
> <https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/transforms/Flatten.Iterables.html>
>  to
> turn the output PCollection<List<ABC>> into a PCollection<ABC>. In some
> cases you may want to follow this with a Reshuffle
> <https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/transforms/Reshuffle.html#viaRandomKey-->
>  so
> that the outputs from a single X get distributed among multiple machines.
>
> On Thu, Jun 7, 2018 at 8:19 AM Marián Dvorský <[email protected]> wrote:
>
>> If you have a function which given X returns a List<ABC>, you can use
>> FlatMapElements
>> <https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/transforms/FlatMapElements.html>
>>  transform
>> on PCollection<X> to get a PCollection<ABC>.
>>
>> On Thu, Jun 7, 2018 at 8:16 AM S. Sahayaraj <[email protected]> wrote:
>>
>>> In case if we could return List<ABC> from DoFn<> then we could use the
>>> code as suggested in section 3.1.2 and mentioned by you below., but the
>>> return type of DoFn<> is always PCollection<> in where I could not have the
>>> list of ABC objects which further will be fed as input for parallel
>>> computation. Is there any possibility to convert List<ABC> to
>>> PCollection<ABC> in DoFn<> itself? OR can DoFn<> return List<ABC> objects?
>>>
>>>
>>>
>>>
>>>
>>> Cheers,
>>>
>>> S. Sahayaraj
>>>
>>>
>>>
>>> *From:* Robert Bradshaw [mailto:[email protected]]
>>> *Sent:* Wednesday, June 6, 2018 9:40 PM
>>> *To:* [email protected]
>>> *Subject:* [EXTERNAL] - Re: Create PCollection<ABC> from List<ABC>in
>>> DoFn<>
>>>
>>>
>>>
>>> You can use the Create transform to do this, e.g.
>>>
>>>
>>>
>>>   Pipeline p = ...
>>>
>>>   List<ABC> inMemoryObjects = ...
>>>
>>>   PCollection<ABC> pcollectionOfObject = p.apply(Create.of(
>>> inMemoryObjects));
>>>
>>>   result = pcollectionOfObject.apply(ParDo.of(SomeDoFn...));
>>>
>>>
>>>
>>> See section 3.1.2 at
>>> https://beam.apache.org/documentation/programming-guide/#pcollections
>>>
>>>
>>>
>>> On Wed, Jun 6, 2018 at 8:34 AM S. Sahayaraj <[email protected]>
>>> wrote:
>>>
>>> Hello,
>>>
>>>                 I have created a java class which extends DoFn<>, there
>>> are list of objects of type ABC (List<ABC>) populated in
>>> processElement(ProcessContext c) at runtime and would like to generate
>>> respective PCollection<ABC> from List<ABC> so that the subsequent
>>> transformation can do parallel execution on each ABC object in
>>> PCollection<ABC>. How do we create PCollection from in-memory object
>>> created in DoFn<>? OR How do we get pipeline object being in DoFn<>? OR is
>>> there any SDK guidelines to refer?
>>>
>>>
>>>
>>>
>>>
>>> Thanks,
>>>
>>> S. Sahayaraj
>>>
>>>

Reply via email to