Re: Are there plans for removing joda-time from the beam java SDK?

2018-09-27 Thread Jeff Klukas
(eg. some class internals, not exposed through class >> interface). I'm not sure if adding methods with alternative signatures only >> because of the time is the way to go (lots of duplication, low gain?). I >> think we should wait with places like this until the 3.0 version. >&g

Re: [Proposal] Add exception handling option to MapElements

2018-10-05 Thread Jeff Klukas
I've posted a full PR for the Java exception handling API that's ready for review: https://github.com/apache/beam/pull/6586 It implements new WithErrors nested classes on MapElements, FlatMapElements, Filter, AsJsons, and ParseJsons. On Wed, Oct 3, 2018 at 7:55 PM Jeff Klukas wrote: > J

Re: [Proposal] Add exception handling option to MapElements

2018-10-05 Thread Jeff Klukas
k whether we can implement this on ParDo as well, > though that might be a bit trickier since it involves hooking into > DoFnInvoker. > > Reuven > > On Fri, Oct 5, 2018 at 10:33 AM Jeff Klukas wrote: > >> I've posted a full PR for the Java exception handling API that's rea

Re: [Proposal] Add exception handling option to MapElements

2018-10-11 Thread Jeff Klukas
constructor and thus isn't AvroCoder-compatible. I'm currently handling this through early failure with context about how to choose a different exception type. On Fri, Oct 5, 2018 at 3:59 PM Jeff Klukas wrote: > It would be ideal to have some higher-level way of wrapping a PTransform > to

Re: 2 tier input

2018-10-22 Thread Jeff Klukas
Chaim - If the full list of IDs is able to fit comfortably in memory and the Mongo collection is small enough that you can read the whole collection, you may want to fetch the IDs into a Java collection using the BigQuery API directly, then turn them into a Beam PCollection using

Re: [Proposal] Add exception handling option to MapElements

2018-10-23 Thread Jeff Klukas
code for exception handling is nearly identical between the three classes and could be consolidated without altering the current public interfaces. Are there concerns with adding such a base class ? On Thu, Oct 11, 2018 at 4:44 PM Jeff Klukas wrote: > The PR (https://github.com/apache/beam/p

Re: New Edit button on beam.apache.org pages

2018-10-25 Thread Jeff Klukas
Max - The website source was indeed merged into the main beam repository a few weeks ago, separate from this change. The edit button is a great idea! On Thu, Oct 25, 2018 at 7:37 AM Maximilian Michels wrote: > Cool! > > I guess the underlying change is that the website can now be edited >

Re: New Edit button on beam.apache.org pages

2018-10-25 Thread Jeff Klukas
On Thu, Oct 25, 2018 at 3:17 PM Kenneth Knowles wrote: > What I haven't figured out is how to get GitHub to create the branch for > the PR on your fork. > GitHub does that work for you. I don't have commit access to apache/beam, so when I hit the link, it gives me a banner at the top of the

Evolving a Coder for an added field

2018-11-02 Thread Jeff Klukas
I'm adding a new lastModifiedMillis field to MatchResult.Metadata [0] which requires also updating MetadataCoder, but it's not clear to me whether there are guidelines to follow when evolving a type when that changes the encoding. Is a user allowed to update Beam library versions as part of

Re: Evolving a Coder for an added field

2018-11-02 Thread Jeff Klukas
Lukasz - Thanks for those links. That's very helpful context. It sounds like there's no explicit user contract about evolving Coder classes in the Java SDK and users might reasonably assume Coders to be stable between SDK versions. Thus, users of the Dataflow or Flink runners might reasonably

FileSystems should retrieve lastModified time

2018-10-29 Thread Jeff Klukas
I just wrote up a JIRA issues proposing that FileSystem implementations retrieve lastModified time of the files they list: https://issues.apache.org/jira/browse/BEAM-5910 Any immediate concerns? I'm not intimately familiar with HDFS, but I'm otherwise confident that GCS, S3, and local filesystems

Re: Design review for supporting AutoValue Coders and conversions to Row

2018-11-09 Thread Jeff Klukas
Anton - Thanks for reading and commenting. I've gone as far as creating a skeleton AutoValue extension to better understand how that API works, but I don't yet have a working prototype for either of these proposed additions. I'll move on to prototyping the Coder generation for AutoValue classes

Design review for supporting AutoValue Coders and conversions to Row

2018-11-09 Thread Jeff Klukas
Hi all - I'm looking for some review and commentary on a proposed design for providing built-in Coders for AutoValue classes. There's existing discussion in BEAM-1891 [0] about using AvroCoder, but that's blocked on incompatibility between AutoValue and Avro's reflection machinery that don't look

Re: Design review for supporting AutoValue Coders and conversions to Row

2018-11-12 Thread Jeff Klukas
to turn it into a Row if a > schema is registered). > > We already do this for POJOs and basic JavaBeans. I'm happy to help do > this for AutoValue. > > Reuven > > On Sat, Nov 10, 2018 at 5:50 AM Jeff Klukas wrote: > >> Hi all - I'm looking for some review and commentary

Re: Design review for supporting AutoValue Coders and conversions to Row

2018-11-13 Thread Jeff Klukas
. Those types would define the schema, and the fromRow impementation would call the discovered constructor. On Mon, Nov 12, 2018 at 10:02 AM Reuven Lax wrote: > > > On Mon, Nov 12, 2018 at 11:38 PM Jeff Klukas wrote: > >> Reuven - A SchemaProvider makes sense. It's not c

Re: Evolving a Coder for an added field

2018-11-12 Thread Jeff Klukas
ated > >> >>> Matadata object) ? In this case where is the right place to identify > >> >>> and decide what coder to use? > >> >>> > >> >>> Other ideas... ? > >> >>> > >> >>> Last thing, th

Re: [Proposal] Add exception handling option to MapElements

2018-10-03 Thread Jeff Klukas
n > > On Tue, Oct 2, 2018 at 4:49 PM Jeff Klukas wrote: > >> I've seen a few Beam users mention the need to handle errors in their >> transforms by using a try/catch and routing to different outputs based on >> whether an exception was thrown. This was particularly nicely wr

Re: [Proposal] Add exception handling option to MapElements

2018-10-03 Thread Jeff Klukas
anges are related. Please open a JIRA for tracking. >> >> Have you though about introducing a similar change to Python SDK ? >> (doesn't have to be the same PR). >> >> - Cham >> >> On Wed, Oct 3, 2018 at 8:31 AM Jeff Klukas wrote: >> >

[Proposal] Add exception handling option to MapElements

2018-10-02 Thread Jeff Klukas
I've seen a few Beam users mention the need to handle errors in their transforms by using a try/catch and routing to different outputs based on whether an exception was thrown. This was particularly nicely written up in a post by Vallery Lancey:

Can we allow SimpleFunction and SerializableFunction to throw Exception?

2018-10-03 Thread Jeff Klukas
I'm working on https://issues.apache.org/jira/browse/BEAM-5638 to add exception handling options to single message transforms in the Java SDK. MapElements' via() method is overloaded to accept either a SimpleFunction, a SerializableFunction, or a Contextful, all of which are ultimately stored as

Re: [Proposal] Add exception handling option to MapElements

2018-10-03 Thread Jeff Klukas
Jira issues for adding exception handling in Java and Python SDKs: https://issues.apache.org/jira/browse/BEAM-5638 https://issues.apache.org/jira/browse/BEAM-5639 I'll plan to have a complete PR for the Java SDK put together in the next few days. On Wed, Oct 3, 2018 at 1:29 PM Jeff Klukas

Re: [Proposal] Add a static PTransform.compose() method for composing transforms in a lambda expression

2018-09-19 Thread Jeff Klukas
? > * can't supply display data? > > +u...@beam.apache.org , do users think that the > provided API would be useful enough for it to be added to the core SDK or > would the addition of the method provide noise/detract from the existing > API? > > On Mon, Sep 17, 2018 at 12:57 P

Re: Are there plans for removing joda-time from the beam java SDK?

2018-09-26 Thread Jeff Klukas
ava.time >>> with joda-time. See the giant schema PR here: >>> https://github.com/apache/beam/pull/4964 I don't think this was ever >>> discussed on the list though. >>> >>> Andrew >>> >>> On Wed, Sep 26, 2018 at 9:21 AM Jeff Klukas <

Why does Beam not use the google-api-client libraries?

2019-01-02 Thread Jeff Klukas
I'm building a high-volume Beam pipeline using PubsubIO and running into some concerns over performance and delivery semantics, prompting me to want to better understand the implementation. Reading through the library, PubsubIO appears to be a completely separate implementation of Pubsub client

Re: Why does Beam not use the google-api-client libraries?

2019-01-02 Thread Jeff Klukas
html > [5] > https://github.com/apache/beam/blob/2e759fecf63d62d110f29265f9438128e3bdc8ab/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubJsonClient.java#L130 > > The latter seems to be the older library, so I would assume it's for > legacy reasons. > > Reg

Re: Confusing sentence in Windowing section in Beam programming guide

2019-01-17 Thread Jeff Klukas
How about: "Once the watermark progresses past the end of a window, any further elements that arrive with a timestamp in that window are considered late data." On Thu, Jan 17, 2019 at 1:43 PM Rui Wang wrote: > Hi Community, > > In Beam programming guide [1], there is a sentence: "Data that

Re: Confusing sentence in Windowing section in Beam programming guide

2019-01-17 Thread Jeff Klukas
t; Though it's not tied to window. You could be in the global window, so the > watermark never advances past the end of the window, yet still get late > data. > > On Thu, Jan 17, 2019, 11:14 AM Jeff Klukas >> How about: "Once the watermark progresses past the end of

Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-01-23 Thread Jeff Klukas
add a comment to > make the test more clear. I am assuming the modified times get copied, not > re-timestamped on copy, which is why your method works. > Otherwise looks good to me > > On Wed, Jan 23, 2019 at 5:49 AM Jeff Klukas wrote: > >> Posted https://github.com/apache/b

Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-01-23 Thread Jeff Klukas
to Eugene's suggestion #1. On Wed, Jan 23, 2019 at 8:12 AM Jeff Klukas wrote: > Suggestion #4: Create source files outside the writer thread, and then > copy them from a source directory to the watched directory. That should > atomically write the file with the already known lastModific

Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-01-23 Thread Jeff Klukas
. >> 2) Implement a Matcher that ignores last modified time and use >> that >> >> Jeff - your option #3 is unfortunately also race-prone, because the code >> may match the files after they have been written but before >> setLastModifiedTime was called. >> >>

Re: FileIOTest.testMatchWatchForNewFiles flakey in java presubmit

2019-01-23 Thread Jeff Klukas
Suggestion #4: Create source files outside the writer thread, and then copy them from a source directory to the watched directory. That should atomically write the file with the already known lastModificationTime. On Wed, Jan 23, 2019 at 7:37 AM Jeff Klukas wrote: > I'll work on getting a

Re: Can we allow SimpleFunction and SerializableFunction to throw Exception?

2018-12-17 Thread Jeff Klukas
Thanks to Kenn Knowles for picking up the review here. The PR is merged, so the new interface ProcessFunction and abstract class InferableFunction class (both of which declare `throws Exception`) are now available in master. On Fri, Dec 14, 2018 at 12:18 PM Jeff Klukas wrote: > Check

Re: help with right transform to read tgz file

2018-12-26 Thread Jeff Klukas
The general approach in your example looks reasonable to me. I don't think there's anything built in to Beam to help with parsing the tar file format and I don't know how robust the method of replacing "^@" and then splitting on newlines will be. I'd likely use Apache's commons-compress library

Re: Evolving a Coder for an added field

2018-12-27 Thread Jeff Klukas
using it either opt-in or, if a good enough case can be made, possibly even >> opt-out could get this unblocked. >> >> On Mon, Nov 26, 2018 at 3:05 PM Jeff Klukas wrote: >> >>> Lukasz - Were you able to get any more context on the possibility of >>> versioni

Re: Can we allow SimpleFunction and SerializableFunction to throw Exception?

2018-12-14 Thread Jeff Klukas
for developers to be familiar with rather than 2. On Fri, Dec 7, 2018 at 6:54 AM Robert Bradshaw wrote: > How should we move forward on this? The idea looks good, and even > comes with a PR to review. Any objections to the names? > On Wed, Dec 5, 2018 at 12:48 PM Jeff Klukas wrote: > &g

Re: TextIO setting file dynamically issue

2018-11-28 Thread Jeff Klukas
You can likely achieve what you want using FileIO with dynamic destinations, which is described in the "Advanced features" section of the TextIO docs [0]. Your case might look something like: PCollection events = ...; events.apply(FileIO.writeDynamic() .by(event ->

Re: Can we allow SimpleFunction and SerializableFunction to throw Exception?

2018-11-28 Thread Jeff Klukas
introduce this giving time for people to >> migrate, would it make sense to do that now and deprecate the alternatives >> that it replaces? >> >> On Mon, Nov 26, 2018 at 5:59 AM Jeff Klukas wrote: >> >>> Picking up this thread again. Based on the

Re: Can we allow SimpleFunction and SerializableFunction to throw Exception?

2018-12-05 Thread Jeff Klukas
Reminder that I'm looking for review on https://github.com/apache/beam/pull/7160 On Thu, Nov 29, 2018, 11:48 AM Jeff Klukas I created a JIRA and a PR for this: > > https://issues.apache.org/jira/browse/BEAM-6150 > https://github.com/apache/beam/pull/7160 > > On naming

Re: Can we allow SimpleFunction and SerializableFunction to throw Exception?

2018-11-29 Thread Jeff Klukas
mes. This has > everything I love. > > Kenn > > On Wed, Nov 28, 2018 at 6:43 PM Jeff Klukas wrote: > >> It looks like we can add make the new interface a superinterface for the >> existing SerializableFunction while maintaining binary compatibility [0]. >&g

Re: Can we allow SimpleFunction and SerializableFunction to throw Exception?

2018-11-26 Thread Jeff Klukas
kedin.com/in/rmannibucau> | Book >>>>>>> <https://www.packtpub.com/application-development/java-ee-8-high-performance> >>>>>>> >>>>>>> >>>>>>> Le dim. 14 oct. 2018 à 10:49, Reuven Lax a >>>>>>> écrit : >>&

Re: Evolving a Coder for an added field

2018-11-26 Thread Jeff Klukas
a new > snapshot with the encodings modified. > > Reuven > > On Tue, Nov 13, 2018 at 3:16 AM Jeff Klukas wrote: > >> Conversation here has fizzled, but sounds like there's basically a >> consensus here on a need for a new concept of Coder versioning that's >> accessible

Re: Design review for supporting AutoValue Coders and conversions to Row

2018-11-26 Thread Jeff Klukas
ing constructor support >>> to this right now. >>> >>> On Wed, Nov 14, 2018 at 12:29 AM Jeff Klukas >>> wrote: >>> >>>> Sounds, then, like we need to a define a new `AutoValueSchema extends >>>> SchemaProvider` and users

Re: Query expressions for schema fields

2019-01-07 Thread Jeff Klukas
There is also JMESPath (http://jmespath.org/) which is quite similar to JsonPath, but does have a spec and lacks the leading $ character. The AWS CLI uses JMESPath for defining queries. On Mon, Jan 7, 2019 at 1:05 PM Reuven Lax wrote: > > > On Mon, Jan 7, 2019 at 1:44 AM Robert Bradshaw >

Re: Possible to use GlobalWindows for writing unbounded input to files?

2019-01-11 Thread Jeff Klukas
t; do with Flattening two PCollections together) with their original trigger. >> Without this, we also know that you can have three PCollections with >> identical triggering and you can CoGroupByKey them together but you cannot >> do this three-way join as a sequence of binary joins. >&

Re: Possible to use GlobalWindows for writing unbounded input to files?

2019-01-11 Thread Jeff Klukas
Lax wrote: > Ah, > > numShards = 0 is explicitly not supported in unbounded mode today, for the > reason mentioned above. If FileIO doesn't reject the pipeline in that case, > we should fix that. > > Reuven > > On Fri, Jan 11, 2019 at 9:23 AM Jeff Klukas

[Proposal] Add a static PTransform.compose() method for composing transforms in a lambda expression

2018-09-14 Thread Jeff Klukas
Hello all, I'm a data engineer at Mozilla working on a first project using Beam. I've been impressed with the usability of the API as there are good built-in solutions for handling many simple transformation cases with minimal code, and wanted to discuss one bit of ergonomics that seems to be

Re: [Proposal] Add a static PTransform.compose() method for composing transforms in a lambda expression

2018-09-17 Thread Jeff Klukas
I've gone ahead and filed a JIRA Issue and GitHub PR to follow up on this suggestion and make it more concrete: https://issues.apache.org/jira/browse/BEAM-5413 https://github.com/apache/beam/pull/6414 On Fri, Sep 14, 2018 at 1:42 PM Jeff Klukas wrote: > Hello all, I'm a data engin

Re: Apache Beam, Spark, Hadoop and AWS cross accounts

2019-03-25 Thread Jeff Klukas
I don't think I'm fully understanding the input part of your pipeline since it looks like it's using some custom IO, but I am fairly confident that FileIO.write() will never produce empty files, so the behavior you describe sounds expected if you're reading in an empty file. FileIO doesn't copy

Add exception handling to MapElements

2019-02-08 Thread Jeff Klukas
I'm looking for feedback on a new attempt at implementing an exception handling interface for map transforms as previously discussed on this list [0] and documented in JIRA [1]. I'd like for users to be able to pass a function into MapElements, FlatMapElements, etc. that potentially raises an

Re: Add exception handling to MapElements

2019-02-12 Thread Jeff Klukas
here each step can either be "fulfilled" or "rejected". The > promises can then be chained together like above. > > > On Mon, Feb 11, 2019 at 1:47 PM Jeff Klukas wrote: > >> Vallery Lancey's post is definitely one of the viewpoints incorporated >> in

Re: Add exception handling to MapElements

2019-02-11 Thread Jeff Klukas
like to add another option that adds A+ Promise spec into the > apply method. This makes exception handling more general than with only the > Map method. > > On Fri, Feb 8, 2019 at 9:53 AM Jeff Klukas wrote: > >> I'm looking for feedback on a new attempt at implementing an except

Re: pipeline steps

2019-02-07 Thread Jeff Klukas
I haven't needed to do this with Beam before, but I've definitely had similar needs in the past. Spark, for example, provides an input_file_name function that can be applied to a dataframe to add the input file as an additional column. It's not clear to me how that's implemented, though. Perhaps

Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-01-30 Thread Jeff Klukas
I would prefer we move towards option [2]. I just tried the following refactor in my own code from: return input .apply(TextIO.read().from(fileSpec)); to: return input .apply(FileIO.match().filepattern(fileSpec)) .apply(FileIO.readMatches())

Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-01-30 Thread Jeff Klukas
ses > each chunk in a ParDo. TextIO.readFiles currently chops up each file into > 64mb chunks (hardcoded), and then processes each chunk in a ParDo. > > Reuven > > > On Wed, Jan 30, 2019 at 9:18 AM Jeff Klukas wrote: > >> I would prefer we move towards option [2]. I

Schemas for classes with type parameters

2019-02-04 Thread Jeff Klukas
I've started experimenting with Beam schemas in the context of creating custom AutoValue-based classes and using AutoValueSchema to generate schemas and thus coders. AFAICT, schemas need to have types fully specified, so it doesn't appear to be possible to define an AutoValue class with a type

Re: GSOC - Implement an S3 filesystem for Python SDK

2019-05-01 Thread Jeff Klukas
For getting started reading data, there are some public S3 buckets on which Amazon hosts data used in tutorials. For example, you should be able to access s3://awssampledbuswest2 which is referenced in Redshift tutorials [0]. Amazon also has a free tier for S3 for the first year an account is

Review needed: BigQuery clustering support for Beam Java API

2019-06-28 Thread Jeff Klukas
Beam devs - I'm looking for review on https://github.com/apache/beam/pull/8945 which builds on a previous PR by Wout Sheepers that was never merged due to concerns over evolving the coder for TableDestination. The new PR defaults to the existing TableDestinationCoderV2 and ensures the new

Re: Prevent Shuffling on Writing Files

2019-09-18 Thread Jeff Klukas
What you propose with a writer per bundle is definitely possible, but I expect the blocker is that in most cases the runner has control of bundle sizes and there's nothing exposed to the user to control that. I've wanted to do similar, but found average bundle sizes in my case on Dataflow to be so

Re: Version Beam Website Documentation

2019-12-04 Thread Jeff Klukas
The API reference docs (Java and Python at least) are versioned, so we have a durable reference there and it's possible to link to particular sections of API docs for particular versions. For the major bits of introductory documentation (like the Beam Programming Guide), I think it's a good thing

Re: [VOTE] Beam's Mascot will be the Firefly (Lampyridae)

2019-12-13 Thread Jeff Klukas
+1 (non-binding) On Thu, Dec 12, 2019 at 11:58 PM Kenneth Knowles wrote: > Please vote on the proposal for Beam's mascot to be the Firefly. This > encompasses the Lampyridae family of insects, without specifying a genus or > species. > > [ ] +1, Approve Firefly being the mascot > [ ] -1,

Reviewer needed for clustering bug fix in Java BigQueryIO.Write

2019-10-22 Thread Jeff Klukas
I've had a one-line bug fix PR open since last week that I'd love to get merged for 2.17. Would a committer be willing to take a look at it? https://github.com/apache/beam/pull/9784

Re: Support for LZO compression.

2019-10-09 Thread Jeff Klukas
Sameer - Also note that there's some prior art for optional components, particularly compression algorithms. We added support for zstd compression, but document that it's the user's responsibility to make sure the relevant library is available on the classpath [0]. There was some interest in a

Re: Subclassing MapElements

2020-01-28 Thread Jeff Klukas
Jason - I was going to suggest the same approach that you arrived at. I think using a static method to a MapElements instance is an elegant solution. The one drawback I can think of with that approach is that you lose control over default naming. From your example usage:

Re: Discrete Transforms vs One Single transform

2020-02-21 Thread Jeff Klukas
Also note that runners in many cases will fuse discrete transforms together for efficiency, so while you might worry about performance degradation from breaking your logic into many discrete transforms, that likely won't be an issue in practice. Also note that you have the option of defining

Review needed for BEAM-8745 - exposing BigQueryIO.Write#withMaxBytesPerPartition

2020-01-09 Thread Jeff Klukas
Per the Contribution Guide, I'm bringing a PR that's been idle for more than 3 days to the dev list to find a reviewer. This change doesn't add any new code, but makes the existing withMaxBytesPerPartition method public and adds documentation concerning its use.

Re: [RESULT][VOTE] Accept the Firefly design donation as Beam Mascot - Deadline Mon April 6

2020-04-17 Thread Jeff Klukas
I personally like the sound of "Datum" as a name. I also like the idea of not assigning them a gender. As a counterpoint on the naming side, one of the slide decks provided while iterating on the design mentions: > Mascot can change colors when it is “full of data” or has a “batch of data” to

Re: [QUESTION] Reading Snappy Compressed Text Files

2020-04-22 Thread Jeff Klukas
Beam is able to infer compression from file extensions for a variety of formats, but snappy is not among them currently: https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/Compression.java Although ParquetIO and AvroIO each look to have support for

Re: [BEAM-10587] Support Maps in BigQuery #12389

2020-10-09 Thread Jeff Klukas
It's definitely desirable to be able to get back Map types from BQ, and it's nice that BQ is consistent in representing maps as repeated key/value structs. Inferring maps from that specific structure is preferable to inventing some new naming convention for the fields, which would hinder

Re: BigQueryIO create dataset

2020-12-03 Thread Jeff Klukas
> I don't think BigQuery offers a way to automatically create datasets when writing. This exactly, and it also makes sense from the standpoint of the BQ permissions model. In BigQuery, table creation requires that the service account have privileges at the dataset level, but dataset creation

Re: Use case for reading to dynamic Pub/Sub subscriptions?

2020-12-09 Thread Jeff Klukas
I have certainly had use cases in the past where I've made use of the Kafka consumer library's ability to consume from a set of topics based on a regular expression. Specifically, I worked with a microservices architecture where each service had a Postgres DB with a logical decoding client for

Re: Upgrade instruction from TimerDataCoder to TimerDataCoderV2

2020-11-06 Thread Jeff Klukas
Ke - You are correct that generally data encoded with a previous coder version cannot be read with an updated coder. The formats have to match exactly. As far as I'm aware, it's necessary to flush a job and start with fresh state in order to upgrade coders. On Fri, Nov 6, 2020 at 2:13 PM Ke Wu

Re: How to flush window when draining a Dataflow pipeline?

2021-05-21 Thread Jeff Klukas
ri, May 21, 2021 at 5:10 AM Jeff Klukas wrote: > >> Beam users, >> >> We're attempting to write a Java pipeline that uses Count.perKey() to >> collect event counts, and then flush those to an HTTP API every ten minutes >> based on processing time.

Re: [Question] How does one test Counters?

2021-09-01 Thread Jeff Klukas
We have a few transform-level tests where we check counter behavior in Mozilla's telemetry pipeline codebase. See: https://github.com/mozilla/gcp-ingestion/blob/fa98ac0c8fa09b5671a961062e6cf0985ec48b0e/ingestion-beam/src/test/java/com/mozilla/telemetry/decoder/GeoIspLookupTest.java#L77-L83 But