Re: Testing an updating side input on global window

2018-05-29 Thread Kenneth Knowles
After the first value for a triggered side input, that side input is always "ready" for the particular window, so there is no longer any synchronization with the main input. You probably want to have tests with both the old and the updated value. I would expect that your steps 1-4 would work with TestStream.
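
A minimal, untested sketch of what such a triggered side input on the global window might look like in Java (the input collection, its element type, and the helper name are assumptions, not from the thread):

    import org.apache.beam.sdk.transforms.View;
    import org.apache.beam.sdk.transforms.windowing.AfterPane;
    import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
    import org.apache.beam.sdk.transforms.windowing.Repeatedly;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.PCollectionView;

    // Hypothetical helper: turn a stream of schema snapshots into a
    // triggered side input. After the first pane fires, the view is
    // "ready" for the global window and stays ready thereafter.
    static PCollectionView<String> asTriggeredSideInput(PCollection<String> schemaUpdates) {
      return schemaUpdates
          .apply(Window.<String>into(new GlobalWindows())
              // Re-fire on every new element so later values replace v1.
              .triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(1)))
              .discardingFiredPanes())
          .apply(View.asSingleton());
    }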

Re: Testing an updating side input on global window

2018-05-29 Thread Pablo Estrada
As far as I know, that behavior is not specified. I do not think that Dataflow streaming supports this sort of updating of side inputs, though I've added Slava, who might have more to add. If updating side inputs is really not supported in Dataflow, you may be able to use a LoadingCache, like so: …
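
The linked example above is cut off; as an untested sketch of the LoadingCache idea (using Guava), with hypothetical names throughout: a DoFn keeps a transient cache of BQ schemas and refreshes entries after a TTL instead of relying on an updating side input:

    import com.google.common.cache.CacheBuilder;
    import com.google.common.cache.CacheLoader;
    import com.google.common.cache.LoadingCache;
    import java.util.concurrent.TimeUnit;
    import org.apache.beam.sdk.transforms.DoFn;

    class SchemaLookupFn extends DoFn<String, String> {
      private transient LoadingCache<String, String> schemaCache;

      @Setup
      public void setup() {
        schemaCache = CacheBuilder.newBuilder()
            .expireAfterWrite(5, TimeUnit.MINUTES) // refresh interval is an assumption
            .build(new CacheLoader<String, String>() {
              @Override
              public String load(String tableName) {
                return fetchSchemaFromBigQuery(tableName); // hypothetical lookup
              }
            });
      }

      @ProcessElement
      public void processElement(ProcessContext c) throws Exception {
        // Treat the element as a table name; emit its (possibly cached) schema.
        c.output(schemaCache.get(c.element()));
      }

      private static String fetchSchemaFromBigQuery(String tableName) {
        // Placeholder for a real BigQuery schema lookup.
        return "{}";
      }
    }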

Re: Testing an updating side input on global window

2018-05-29 Thread Carlos Alonso
Hi Lukasz, many thanks for your responses. I'm actually using it, but I don't think I'm able to synchronise the following steps:
1: The side input gets its first value (v1)
2: The main stream gets that side input applied and finds that v1 value
3: The side input gets updated (v2)
4: The main stream gets the side input applied again and finds that v2 value

Re: Testing an updating side input on global window

2018-05-29 Thread Lukasz Cwik
Your best bet is to use TestStream [1], as it is used to validate window/triggering behavior. Note that the transform requires special runner-based execution and currently only works with the DirectRunner. All examples are marked with the JUnit category "UsesTestStream"; see, for example, [2]. 1: https://…
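
A minimal sketch of a TestStream-based test (untested here; assumes JUnit and the DirectRunner, and the element values are invented for illustration):

    import org.apache.beam.sdk.coders.StringUtf8Coder;
    import org.apache.beam.sdk.testing.PAssert;
    import org.apache.beam.sdk.testing.TestPipeline;
    import org.apache.beam.sdk.testing.TestStream;
    import org.apache.beam.sdk.testing.UsesTestStream;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.TimestampedValue;
    import org.joda.time.Instant;
    import org.junit.Rule;
    import org.junit.Test;
    import org.junit.experimental.categories.Category;

    public class UpdatingStreamTest {
      @Rule public final transient TestPipeline p = TestPipeline.create();

      @Test
      @Category(UsesTestStream.class)
      public void emitsOldAndUpdatedValues() {
        TestStream<String> stream = TestStream.create(StringUtf8Coder.of())
            .advanceWatermarkTo(new Instant(0))
            .addElements(TimestampedValue.of("v1", new Instant(0)))    // first value
            .advanceWatermarkTo(new Instant(1000))
            .addElements(TimestampedValue.of("v2", new Instant(1000))) // updated value
            .advanceWatermarkToInfinity();

        PCollection<String> values = p.apply(stream);
        PAssert.that(values).containsInAnyOrder("v1", "v2");
        p.run().waitUntilFinish();
      }
    }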

Testing an updating side input on global window

2018-05-29 Thread Carlos Alonso
Hi all!! Basically, that's what I'm trying to do. I'm building a pipeline that has a refreshing multimap side input (BQ schemas) that I then apply to the main stream of data (records that are ultimately saved to the corresponding BQ table). My job, although streaming in nature, runs on the…
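
What follows is an untested sketch of one common shape for such a refreshing multimap side input (polling on a timer with GenerateSequence, re-triggering the global window, and materializing with View.asMultimap); the refresh interval and the schema-fetching DoFn are placeholders:

    import java.util.Map;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.GenerateSequence;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.transforms.View;
    import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
    import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
    import org.apache.beam.sdk.transforms.windowing.Repeatedly;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollectionView;
    import org.joda.time.Duration;

    static PCollectionView<Map<String, Iterable<String>>> refreshingSchemas(Pipeline p) {
      return p
          // Tick every 5 minutes (the interval is an assumption).
          .apply(GenerateSequence.from(0).withRate(1, Duration.standardMinutes(5)))
          .apply(ParDo.of(new DoFn<Long, KV<String, String>>() {
            @ProcessElement
            public void processElement(ProcessContext c) {
              // Placeholder: fetch the current BQ schemas and emit
              // one (table name, schema) pair per table.
              c.output(KV.of("my_table", "{}"));
            }
          }))
          .apply(Window.<KV<String, String>>into(new GlobalWindows())
              .triggering(Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane()))
              .discardingFiredPanes())
          .apply(View.asMultimap());
    }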

Re: Support for graph processing

2018-05-29 Thread Lukasz Cwik
This has been a long-requested feature within Apache Beam: https://issues.apache.org/jira/browse/BEAM-106 The short story is that this requires a lot of support from execution engines, since watermarks become a concept of time + loop iteration (possibly multiple loop iterations if they are nested).

Re: Java based AWS IO connector

2018-05-29 Thread Alexey Romanenko
Hi Sahayaraj, Yes, there is a module “beam-sdks-java-io-amazon-web-services” which can help with this. Also, I'd suggest you take a look at this example, which reads data from an S3 bucket: https://github.com/jbonofre/beam-samples/blob/master/amazon-web-services/src/main/java/org/apache/beam/sa…
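
A minimal sketch of reading from S3 with TextIO (untested; assumes a Beam version where the AWS module is available, and the bucket, path, and region are placeholders):

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.io.aws.options.S3Options;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class S3ReadExample {
      public static void main(String[] args) {
        S3Options options = PipelineOptionsFactory.fromArgs(args).as(S3Options.class);
        options.setAwsRegion("us-east-1"); // placeholder region

        Pipeline p = Pipeline.create(options);
        // Anything that works with files accepts s3:// paths once the
        // AWS filesystem module is on the classpath.
        p.apply(TextIO.read().from("s3://my-bucket/input/*.json")); // placeholder path
        p.run().waitUntilFinish();
      }
    }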

Re: Java based AWS IO connector

2018-05-29 Thread Eugene Kirpichov
Beam works with S3 out of the box; you can provide s3://... paths to anything that works with files. I don't remember if it's already available in 2.4 or only starting with 2.5. Does it currently not work for you?

Java based AWS IO connector

2018-05-29 Thread S. Sahayaraj
Hello All, The data source for our Beam pipeline is an S3 bucket. Is there any built-in I/O connector available, with Java samples? If so, can you please guide me on how to integrate with it? I am using the Beam SDK for Java version 2.4.0 and the Spark runner in cluster…

RE: How to process data

2018-05-29 Thread maxime.dejeux
Hi Jean-Baptiste, Yes, I want to process the data with the timestamp available in each record (event time). For the use case, I have a dataset (JSON) with a timestamp in the past available inside each record. This dataset is pushed to a Kafka topic. With Beam, I want to detect some records…
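
An untested sketch of attaching event time from a timestamp embedded in each record read via KafkaIO (servers, topic, and payload format are assumptions; withTimestampFn was the 2.x-era hook and has since been superseded by timestamp policies):

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.kafka.KafkaIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.joda.time.Instant;

    public class KafkaEventTimeExample {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        p.apply(KafkaIO.<String, String>read()
            .withBootstrapServers("localhost:9092") // placeholder
            .withTopic("events")                    // placeholder
            .withKeyDeserializer(StringDeserializer.class)
            .withValueDeserializer(StringDeserializer.class)
            // For brevity, assume the value itself is an epoch-millis string;
            // a real pipeline would parse the JSON and read its timestamp field.
            .withTimestampFn(kv -> new Instant(Long.parseLong(kv.getValue())))
            .withoutMetadata());

        p.run().waitUntilFinish();
      }
    }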