Hi Ken,

Thank you for your answer!



On Fri, Feb 17, 2017 at 7:51 AM, Tobias Feldhaus <[email protected]> wrote:

My intention is to get a

        PCollectionView<Long> numOfDays

which holds the number of elements (days) in a given PCollection<DateTime>
dates, computed as a sum of counts (as a singleton view).

In my head, this corresponds to something like the following (possibly *wrong*):

        dates.apply("Count", Count.globally().asSingletonView());

Seems about right.

It seems about right to me too, but apparently I have a misconception about
what is happening here:

        PCollection<ItLogLine> logLines = p.apply("Read logfile", TextIO.Read.from(bucket))
                .apply("Repartition", Repartition.of())
                .apply("Parse JSON", ParDo.of(new ReadObjects()))
                .apply("Extract timestamp", ParDo.of(new ExtractTimestamps()));

        PCollection<DateTime> dates = logLines
                .apply("Get Dates", ParDo.of(new GetDateFunction()))
                .apply("Get distinct Dates", Distinct.<DateTime>create());

        final PCollectionView<Long> numOfDays =
                dates.apply("Count", Count.globally().asSingletonView());

This ends up in a type mismatch. Is Count only applicable to certain types like
String, Long, Integer, or TableRow?
Looking at the code, that would be counterintuitive, as everything in the SDK
seems to be implemented with generic types.
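A likely explanation is Java type inference rather than a restriction of Count: when Count.globally() is only the receiver of the chained .asSingletonView() call, javac has no target type to infer T from and falls back to Object. Because generics are invariant, a transform over PCollection<Object> does not fit a PCollection<DateTime>. If that is the cause, a type witness (as already used for Distinct.<DateTime>create()) should fix it: Count.<DateTime>globally().asSingletonView(). The plain-Java sketch below reproduces the effect without Beam; Coll, Transform, Globally, AsView and count() are hypothetical stand-ins that only mimic the shape of PCollection.apply and Count.globally(), they are not the Beam classes.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;

public class InferenceDemo {
    // Hypothetical stand-in for PCollection<T>; NOT the Beam class.
    static final class Coll<T> {
        final List<T> items;
        Coll(List<T> items) { this.items = items; }

        // Same parameter shape as PCollection.apply: the transform must accept a
        // supertype of Coll<T>. Coll<Object> is NOT a supertype of Coll<LocalDate>.
        <OutT> OutT apply(Transform<? super Coll<T>, OutT> t) { return t.expand(this); }
    }

    // Hypothetical stand-in for PTransform<InputT, OutputT>.
    interface Transform<InT, OutT> { OutT expand(InT input); }

    // Hypothetical stand-in for Combine.Globally<T, Long>.
    static final class Globally<T> implements Transform<Coll<T>, Long> {
        public Long expand(Coll<T> input) { return (long) input.items.size(); }
        AsView<T> asSingletonView() { return new AsView<>(); }
    }

    // Hypothetical stand-in for the "as singleton view" transform; it returns
    // the count directly instead of a view to keep the sketch short.
    static final class AsView<T> implements Transform<Coll<T>, Long> {
        public Long expand(Coll<T> input) { return (long) input.items.size(); }
    }

    // Hypothetical stand-in for Count.globally(): generic in T, no arguments.
    static <T> Globally<T> count() { return new Globally<>(); }

    static long countDates(List<LocalDate> days) {
        Coll<LocalDate> dates = new Coll<>(days);

        // Does NOT compile: count() is only the receiver of .asSingletonView(),
        // so javac has no target type and infers T = Object.
        // long n = dates.apply(count().asSingletonView());

        // Compiles: the explicit type witness pins T = LocalDate.
        return dates.apply(InferenceDemo.<LocalDate>count().asSingletonView());
    }

    public static void main(String[] args) {
        List<LocalDate> days = Arrays.asList(
                LocalDate.of(2017, 2, 15), LocalDate.of(2017, 2, 16), LocalDate.of(2017, 2, 17));
        System.out.println(countDates(days)); // prints 3
    }
}
```

The same rule is why Distinct.<DateTime>create() in the pipeline above needs its witness: a generic factory method called with no arguments and immediately chained gives the compiler nothing to infer from.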

I am still getting used to the Beam programming model, and it still confuses me
from time to time, sorry.
 
Is it possible to access the side input outside of a DoFn?

You can access a side input from a DoFn and now also from a 
CombineFnWithContext (a bit of an advanced feature).

So to get a single calculated value (the number of days, in this case) that I
can use to partition a PCollection, should I go through the extractOutput [0]
method?
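For context, a CombineFn (with or without context) goes through four phases: createAccumulator, addInput, mergeAccumulators and extractOutput. extractOutput turns the merged accumulator into the single final value, which is what a singleton view built from Count.globally() holds. This is also where the "sum of counts" comes from: each bundle can be counted separately and the partial counts are added. The plain-Java sketch below simulates those phases; CountFn, countInBundles and the two-bundle split are hypothetical illustration, not Beam's implementation or the behavior of any particular runner.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CountFnSketch {
    // Hypothetical plain-Java mirror of the four CombineFn phases; NOT the Beam class.
    // The accumulator is a one-element long[] used as a mutable counter.
    static final class CountFn<T> {
        long[] createAccumulator() { return new long[] {0L}; }

        long[] addInput(long[] acc, T input) {
            acc[0]++;
            return acc;
        }

        // Partial counts from different bundles are summed.
        long[] mergeAccumulators(Iterable<long[]> accs) {
            long[] merged = createAccumulator();
            for (long[] a : accs) {
                merged[0] += a[0];
            }
            return merged;
        }

        // The single final value; with asSingletonView this is what the view holds.
        Long extractOutput(long[] acc) { return acc[0]; }
    }

    // Simulates one possible execution: count each bundle separately,
    // merge the partial accumulators, then extract the output once.
    static <T> long countInBundles(List<List<T>> bundles) {
        CountFn<T> fn = new CountFn<>();
        List<long[]> partials = new ArrayList<>();
        for (List<T> bundle : bundles) {
            long[] acc = fn.createAccumulator();
            for (T element : bundle) {
                acc = fn.addInput(acc, element);
            }
            partials.add(acc);
        }
        return fn.extractOutput(fn.mergeAccumulators(partials));
    }

    public static void main(String[] args) {
        List<List<LocalDate>> bundles = Arrays.asList(
                Arrays.asList(LocalDate.of(2017, 2, 15), LocalDate.of(2017, 2, 16)),
                Arrays.asList(LocalDate.of(2017, 2, 17)));
        System.out.println(countInBundles(bundles)); // prints 3
    }
}
```

Note that extractOutput only produces the value; whatever consumes it (for example, code deciding which partition an element goes to) runs after the combine has finished.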

Best,
Tobi

[0] 
https://beam.apache.org/documentation/sdks/javadoc/0.5.0/org/apache/beam/sdk/transforms/CombineWithContext.CombineFnWithContext.html#extractOutput-AccumT-org.apache.beam.sdk.transforms.CombineWithContext.Context-

