Of course, this functionality can be implemented manually in a DoFn -- this wouldn't be a primitive. That said, an argument could be made that a Beam SDK should provide a composite that logs elements in a PCollection. This can further be augmented with proposals like "log_every_n_seconds" or "log_every_n_elements", etc. There's no such concept in the Java SDK right now, but can be easily constructed by users.
Now, there are several counter-arguments as well: - An approach where we provide a logging PTransform would be less flexible / powerful than a DoFn + direct usage of a logging library. Regardless, one might want to optimize the common, simple case. - A runner could choose to sample elements of all PCollections to provide this automatically, perhaps without any intervention from the pipeline writer (and/or enable this behavior on-demand, while the pipeline is running). Regarding runner portability, our thinking is that logging ought to work across runners, regardless of the choice of logging library. This is important to be able to use other, non-Beam related libraries, that neither we or the pipeline developers can control. Basically, a runner can set up logging bridges between major logging frameworks and standard streams. Then, the runner gets all logs from Beam SDK, user code, any random libraries that users may choose, etc. -- it all works transparently. On Sun, Mar 20, 2016 at 10:55 AM, Dan Halperin <[email protected]> wrote: > Sorry for short phone email. Check out this Transform from a pull request. > > > https://github.com/elibixby/DataflowJavaSDK/blob/51ee732964b3425c6c4a8677d135c41765d9bcdc/contrib/firebaseio/src/main/java/contrib/LogElements.java > > The key subtleties are around making sure you don't log too often--logging > per element is very expensive. > > Happy to discuss more if you have questions, and again sorry I'm on a > phone so can't provide a more comprehensive response. > > On Sun, Mar 20, 2016 at 10:21 Dan Halperin <[email protected]> wrote: > >> Hi Ismael, >> >> I think you can just do this with a normal DoFn. Why do you think this >> needs to be a new primitive? >> On Sun, Mar 20, 2016 at 10:20 Dan Halperin <[email protected]> wrote: >> >>> Hi is >>> On Sun, Mar 20, 2016 at 08:18 Jean-Baptiste Onofré <[email protected]> >>> wrote: >>> >>>> Hi, >>>> >>>> thanks for the update. >>>> >>>> IMHO, I would name Debug transform as Log: >>>> >>>> .apply(Log.withLevel("DEBUG")) >>>> .apply(Log.withLevel("INFO").withPattern("%d %m ...")) >>>> .apply(Log.withLevel("WARN").withMessage("Foo").withStream("System.out") >>>> >>>> It would more flexible and related to the actual behavior. >>>> >>>> I would mimic a bit the Camel log component for instance. >>>> >>>> If you don't mind, I will do it with you. >>>> >>>> Thanks >>>> Regards >>>> JB >>>> >>>> On 03/20/2016 12:07 PM, Ismaël Mejía wrote: >>>> > Hi, >>>> > >>>> > The code of the transform is here in a playground for Beam >>>> experiments I >>>> > created (it is a bit alpha for the moment, and it does not have >>>> comments): >>>> > >>>> > >>>> https://github.com/iemejia/beam-playground/blob/master/src/main/java/org/apache/beam/transforms/Debug.java >>>> > >>>> > Since my initial goal was more of a test scenario in the >>>> > DirectPipelineRunner I haven't considered yet more advanced logging >>>> > capabilities and the possible issues of distribution (serialization, >>>> in >>>> > particular of dependencies, as well as exceptions, etc), but of course >>>> > it is something I expect to improve if there is interest. Do you see >>>> > some immediate things to improve to try it with the distributed >>>> runners >>>> > (I want to do this, as a excuse also to try the FlinkRunner). >>>> > >>>> > Best, >>>> > -Ismael >>>> > >>>> > >>>> > On Sun, Mar 20, 2016 at 11:13 AM, Jean-Baptiste Onofré < >>>> [email protected] >>>> > <mailto:[email protected]>> wrote: >>>> > >>>> > By the way, for the "Integration" DSL, in addition of explicit >>>> debug >>>> > transform, it would make sense to have an implicit "Tracer". It's >>>> > something that I planned: it would allow us to have sampling on >>>> > PCollection if the pipeline tracer is enabled (like we do in a >>>> Camel >>>> > route with the tracer). >>>> > >>>> > Regards >>>> > JB >>>> > >>>> > On 03/20/2016 10:14 AM, Ismaël Mejía wrote: >>>> > >>>> > Hello, >>>> > >>>> > I just started playing with Beam and I wanted to debug what >>>> happens >>>> > between transforms in pipelines. I wrote a simple 'Debug' >>>> > transform for >>>> > this. >>>> > The idea is to apply a function based on a predicate to any >>>> > element in a >>>> > collection without changing the collection, or in other >>>> words, a >>>> > transform that >>>> > does not transform but produces side effects. >>>> > >>>> > The idea is better illustrated with this simple example: >>>> > >>>> > .apply(FlatMapElements.via((String text) -> >>>> > Arrays.asList(text.split(" "))) >>>> > .withOutputType(new TypeDescriptor<String>() { >>>> > })) >>>> > .apply(Debug >>>> > .when((String s) -> s.startsWith("A")) >>>> > .with((String s) -> { >>>> > System.out.println(s); >>>> > return null; >>>> > })); >>>> > .apply(Filter.byPredicate((String text) -> >>>> text.length() > 5)) >>>> > .apply(Debug.print()); // sugared method, same as above >>>> > >>>> > I think this can be useful (at least for debugging purposes), >>>> is >>>> > there >>>> > something >>>> > like this already in the SDK ? If this is not the case, can >>>> you >>>> > please >>>> > give me some >>>> > feedback/ideas to improve my transform. >>>> > >>>> > Thanks, >>>> > -Ismael >>>> > >>>> > ps. You can find the code of the first version of the >>>> transform >>>> > here: >>>> > >>>> https://github.com/iemejia/beam-playground/blob/master/src/main/java/org/apache/beam/transforms/Debug.java >>>> > >>>> > >>>> > >>>> > -- >>>> > Jean-Baptiste Onofré >>>> > [email protected] <mailto:[email protected]> >>>> > http://blog.nanthrax.net >>>> > Talend - http://www.talend.com >>>> > >>>> > >>>> >>>> -- >>>> Jean-Baptiste Onofré >>>> [email protected] >>>> http://blog.nanthrax.net >>>> Talend - http://www.talend.com >>>> >>>
