Re: [DISCUSSION] Consistent use of loggers

2017-03-21 Thread Aviem Zur
+1 to what JB said. It will just have to be documented well, since if we provide no binding there will be no logging out of the box unless the user adds a binding. On Wed, Mar 22, 2017 at 6:24 AM Jean-Baptiste Onofré wrote: > Hi Aviem, > Good point. > I think, in our dependencies set, we should ju

Re: [DISCUSSION] Consistent use of loggers

2017-03-21 Thread Jean-Baptiste Onofré
Hi Aviem, Good point. I think, in our dependency set, we should just depend on slf4j-api and let the user provide the binding they want (slf4j-log4j12, slf4j-simple, whatever). We define a binding only with test scope in our modules. Regards JB On 03/22/2017 04:58 AM, Aviem Zur wrote: Hi

[DISCUSSION] Consistent use of loggers

2017-03-21 Thread Aviem Zur
Hi all, There have been a few reports lately (on JIRA [1] and on Slack) from users regarding inconsistent loggers used across Beam's modules. While we use SLF4J, different modules use a different logger behind it (JUL, log4j, etc.). So when people add a log4j.properties file to their classpath for

Re: First IO IT Running!

2017-03-21 Thread Jean-Baptiste Onofré
Awesome!!! Great news! Thanks guys for that! I started to implement ITs in the JMS, MQTT, Redis, and Cassandra IOs. I'll keep you posted. Regards JB On 03/21/2017 11:01 PM, Stephen Sisk wrote: I'm really excited to see these tests are running! These Jdbc tests are testing against a postgres instance

Re: [DISCUSSION] using NexMark for Beam

2017-03-21 Thread Kenneth Knowles
This is great! Having a variety of realistic-ish pipelines running on all runners complements the validation suite and IO IT work. If I recall, some of these involve heavy and esoteric uses of state, so definitely give me a ping if you hit any trouble. Kenn On Tue, Mar 21, 2017 at 9:38 AM, Etien

Re: First IO IT Running!

2017-03-21 Thread Kenneth Knowles
This is a really exciting development! I would definitely like to help out. Still ingesting the docs and JIRAs. On Tue, Mar 21, 2017 at 3:01 PM, Stephen Sisk wrote: > I'm really excited to see these tests are running! > > These Jdbc tests are testing against a postgres instance - that instance i

Re: Kafka Offset handling for Restart/failure scenarios.

2017-03-21 Thread Raghu Angadi
Expanding a bit more on what Dan wrote: - In Dataflow, there are two modes of restarting a job: a regular stop and then start, and an *update*. The checkpoint is carried over only in the case of update. - Update is the only way to keep 'exactly-once' semantics. - If the requirements are not

Re: Kafka Offset handling for Restart/failure scenarios.

2017-03-21 Thread Dan Halperin
[We should keep the user list involved if that's where the discussion originally was :)] Jins George's original question was a good one. The right way to resume from the previous offset here is what we're already doing: use the KafkaCheckpointMark. In Beam, the runner maintains the state and not the

Re: Style: how much testing for transform builder classes?

2017-03-21 Thread Robert Bradshaw
On Tue, Mar 21, 2017 at 5:14 PM, Dan Halperin wrote: > https://github.com/apache/beam/commit/b202548323b4d59b11bbdf06c99d0f99e6a947ef is one example where tests of feature Bar exist but did not discover bugs that could be introduced by builders. True, though one would need to test the f

Re: Style: how much testing for transform builder classes?

2017-03-21 Thread Dan Halperin
https://github.com/apache/beam/commit/b202548323b4d59b11bbdf06c99d0f99e6a947ef is one example where tests of feature Bar exist but did not discover bugs that could be introduced by builders. AutoValue likely alleviates many, but not all, of these concerns, as Ismael points out. On Tue, Mar 21, 2

Re: [PROPOSAL] "Requires deterministic input"

2017-03-21 Thread Ben Chambers
Allowing an annotation on DoFns that produce deterministic output could be added in the future, but doesn't seem like a great option. 1. It is a correctness issue to assume a DoFn is deterministic and be wrong, so we would need to assume all transform outputs are non-deterministic unless annotate
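Ben's first point (treat every output as non-deterministic unless it is explicitly marked) can be sketched in plain Java. This is a hypothetical illustration, not a Beam API: the `@DeterministicOutput` annotation and the `UpperCaseFn`, `RandomKeyFn`, and `DeterminismCheck` classes are invented names, and a real runner would make this decision very differently.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical marker annotation (not part of the Beam SDK). */
@Retention(RetentionPolicy.RUNTIME) // must be RUNTIME so reflection can see it
@Target(ElementType.TYPE)
@interface DeterministicOutput {}

/** Hypothetical DoFn-like class whose output depends only on its input. */
@DeterministicOutput
class UpperCaseFn {
  String process(String input) {
    return input.toUpperCase();
  }
}

/** Hypothetical DoFn-like class with pseudorandom output; deliberately left unannotated. */
class RandomKeyFn {
  int process(String input) {
    return ThreadLocalRandom.current().nextInt(10);
  }
}

class DeterminismCheck {
  /** Conservative default: only explicitly annotated classes are treated as deterministic. */
  static boolean assumeDeterministic(Class<?> fnClass) {
    return fnClass.isAnnotationPresent(DeterministicOutput.class);
  }

  public static void main(String[] args) {
    System.out.println(assumeDeterministic(UpperCaseFn.class)); // true
    System.out.println(assumeDeterministic(RandomKeyFn.class)); // false
  }
}
```

The safe failure mode is the point: forgetting the annotation only costs performance, whereas wrongly assuming determinism would cost correctness.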

Re: Docker image dependencies

2017-03-21 Thread Stephen Sisk
Hey Ismael, I definitely agree with you that we want something that developers will actually be able to/want to use. In my experience *all* the container orchestration engines are non-trivial to set up. When I started examining solutions for Beam hosting, I did installs of mesos, kubernetes and d

Re: [PROPOSAL] "Requires deterministic input"

2017-03-21 Thread Kenneth Knowles
Good points & questions. I'll try to be more clear. > On 21 March 2017 at 13:52, Stephen Sisk wrote: > > Hey Kenn- this seems important, but I don't have all the context on what the problem is. Can you explain this sentence "Specifically, there is pseudorandom data g

Re: First IO IT Running!

2017-03-21 Thread Stephen Sisk
I'm really excited to see these tests are running! These Jdbc tests are testing against a postgres instance - that instance is running on the kubernetes cluster I've set up for beam IO ITs as discussed in the "Hosting data stores for IO transform testing" thread[0]. I set up that postgres instance

Re: Kafka Offset handling for Restart/failure scenarios.

2017-03-21 Thread Mingmin Xu
In SparkRunner, the default checkpoint storage is TmpCheckpointDirFactory. Can it restore during a job restart? (I haven't tested the runner in streaming for some time.) Regarding data completeness, I would use at-most-once when a little missing data (mostly from task-node failures) is tolerable, compared to the perf
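The trade-off Mingmin describes comes down to when the offset is committed relative to processing. The following minimal simulation assumes a single consumer that crashes once between the two steps and restarts from its last committed offset; `DeliverySemantics` and `run` are hypothetical names, and this is not how the Spark runner or KafkaIO implement checkpointing.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical simulation (not runner or KafkaIO code): one consumer reads records by
 * offset, crashes once between its two steps, and restarts from its committed offset.
 */
class DeliverySemantics {
  /** Returns the records that reached the sink. */
  static List<Integer> run(int[] records, int crashAt, boolean commitFirst) {
    List<Integer> sink = new ArrayList<>();
    int committed = 0; // next offset to read after a restart
    boolean crashed = false;
    int offset = 0;
    while (offset < records.length) {
      // Step 1: at-most-once commits first; at-least-once emits first.
      if (commitFirst) {
        committed = offset + 1;
      } else {
        sink.add(records[offset]);
      }
      // Simulated crash between the two steps, only once.
      if (offset == crashAt && !crashed) {
        crashed = true;
        offset = committed; // restart from the last committed offset
        continue;
      }
      // Step 2: the other half of the pair.
      if (commitFirst) {
        sink.add(records[offset]);
      } else {
        committed = offset + 1;
      }
      offset++;
    }
    return sink;
  }

  public static void main(String[] args) {
    int[] records = {10, 20, 30};
    System.out.println(run(records, 1, true));  // [10, 30]: record 20 lost (at-most-once)
    System.out.println(run(records, 1, false)); // [10, 20, 20, 30]: record 20 duplicated (at-least-once)
  }
}
```

With a crash at offset 1, committing first loses record 20, while processing first emits it twice; which failure is acceptable is exactly the data-completeness versus performance choice discussed above.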

First IO IT Running!

2017-03-21 Thread Jason Kuster
Hi all, Exciting news! As of yesterday, we have checked in the Jenkins configuration for our first continuously running IO Integration Test! You can check it out in Jenkins here[1]. We’re also publishing results to a database, and we’ve turned up a basic dashboarding system where you can see the r

Re: [PROPOSAL] "Requires deterministic input"

2017-03-21 Thread vikas rk
+1 for the general idea of runners handling it over a hard-coded implementation strategy. For the Write transform, I believe you are talking about ApplyShardingKey

Re: [PROPOSAL] "Requires deterministic input"

2017-03-21 Thread Stephen Sisk
Hey Kenn- this seems important, but I don't have all the context on what the problem is. Can you explain this sentence "Specifically, there is pseudorandom data generated and once it has been observed and used to produce a side effect, it cannot be regenerated without erroneous results." ? Where

Re: Kafka Offset handling for Restart/failure scenarios.

2017-03-21 Thread Amit Sela
On Tue, Mar 21, 2017 at 7:26 PM Mingmin Xu wrote: > Move discuss to dev-list > Savepoint in Flink, also checkpoint in Spark, should be good enough to handle this case. > When people don't enable these features, for example only need at-most-once > The Spark runner forces checkpointing on an

[PROPOSAL] "Requires deterministic input"

2017-03-21 Thread Kenneth Knowles
Problem: I will drop all nuance and say that the `Write` transform as it exists in the SDK is incorrect until we add some specification and APIs. We can't keep shipping an SDK with an unsafe transform in it, and IMO this certainly blocks a stable release. Specifically, there is pseudorandom data
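To make the problem concrete: if a step draws a pseudorandom shard key and a retry draws again, the retry can disagree with a side effect already produced from the first draw. A minimal sketch, assuming a runner-managed checkpoint that records the first value so that retries replay it; the `ShardingReplay` class and its in-memory map are hypothetical stand-ins for durable runner state, not Beam's `Write` implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/**
 * Hypothetical illustration (not Beam code): assigns pseudorandom shard keys,
 * either freshly on every attempt or replayed from a recorded "checkpoint".
 */
class ShardingReplay {
  private final Random random;
  private final int numShards;
  // Stand-in for durable, runner-managed state; here only an in-memory map.
  private final Map<String, Integer> checkpointed = new HashMap<>();

  ShardingReplay(long seed, int numShards) {
    this.random = new Random(seed);
    this.numShards = numShards;
  }

  /** Unstable: each attempt (e.g. a retry) may produce a different shard. */
  int assignShardUnstable(String elementId) {
    return random.nextInt(numShards);
  }

  /** Stable: the first draw is recorded, and retries replay it. */
  int assignShardStable(String elementId) {
    return checkpointed.computeIfAbsent(elementId, id -> random.nextInt(numShards));
  }

  public static void main(String[] args) {
    ShardingReplay step = new ShardingReplay(42L, 10);
    int firstAttempt = step.assignShardStable("record-1");
    int retry = step.assignShardStable("record-1");
    System.out.println(firstAttempt == retry); // true
  }
}
```

Recording the value before it is used for a side effect is what lets a retry reproduce the original result; redrawing it is what makes the transform unsafe.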

Re: Style: how much testing for transform builder classes?

2017-03-21 Thread Robert Bradshaw
On Wed, Mar 15, 2017 at 2:11 AM, Ismaël Mejía wrote: > +1 to Vikas point maybe the right place to enforce things correct build tests is in the validate and like this reduce the test boilerplate and only test the validate, but I wonder if this totally covers both cases (the buildsCorrectly a

Re: Kafka Offset handling for Restart/failure scenarios.

2017-03-21 Thread Mingmin Xu
Moving the discussion to the dev list. Savepoints in Flink, and checkpoints in Spark, should be good enough to handle this case. When people don't enable these features, for example when they only need at-most-once semantics, each unbounded IO should try its best to restore from the last offset, although CheckpointMark is nul
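Restoring from the last offset, with a fallback when no checkpoint mark is available, might look like this. The sketch assumes a per-partition map of the last consumed offset; `ResumeSketch`, `OffsetCheckpoint`, and `startOffset` are hypothetical names and do not mirror the actual `KafkaCheckpointMark` API.

```java
import java.util.HashMap;
import java.util.Map;

class ResumeSketch {
  /** Hypothetical record of the last consumed offset per partition. */
  static final class OffsetCheckpoint {
    private final Map<Integer, Long> offsets = new HashMap<>();

    void record(int partition, long offset) {
      offsets.put(partition, offset);
    }

    Long lastConsumed(int partition) {
      return offsets.get(partition);
    }
  }

  /**
   * Chooses where a reader starts for one partition: right after the last consumed
   * offset if the checkpoint knows it, otherwise a configured fallback. The checkpoint
   * itself may be null, e.g. after a plain stop-and-start.
   */
  static long startOffset(OffsetCheckpoint checkpoint, int partition, long fallbackOffset) {
    if (checkpoint != null) {
      Long last = checkpoint.lastConsumed(partition);
      if (last != null) {
        return last + 1;
      }
    }
    return fallbackOffset;
  }

  public static void main(String[] args) {
    OffsetCheckpoint checkpoint = new OffsetCheckpoint();
    checkpoint.record(0, 41L);
    System.out.println(startOffset(checkpoint, 0, 0L)); // 42
    System.out.println(startOffset(checkpoint, 1, 0L)); // 0 (partition not in checkpoint)
    System.out.println(startOffset(null, 0, 0L));       // 0 (no checkpoint at all)
  }
}
```

Where the fallback comes from (earliest, latest, or offsets committed back to the broker) is the open question in this thread; the sketch simply takes it as a parameter.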

Re: Beam spark 2.x runner status

2017-03-21 Thread Ted Yu
I have done some work over in HBASE-16179 where compatibility modules are created to isolate changes in Spark 2.x API so that code in hbase-spark module can be reused. FYI

Re: [DISCUSSION] using NexMark for Beam

2017-03-21 Thread Jean-Baptiste Onofré
Hi Etienne, That's great news, and good job! By "having Nexmark on Beam", I guess you mean the translation of the NEXMark queries in Beam, not NEXMark itself, right? If you mean the latter, I'm not sure, as NEXMark is not Beam-related (it's more generic) and it could be tricky in terms of l

Re: [DISCUSSION] using NexMark for Beam

2017-03-21 Thread Dan Halperin
Not a deep response, but this is awesome! We'd really like to have some good benchmarks, and I'm excited you're updating Nexmark. This will be great! On Tue, Mar 21, 2017 at 9:38 AM, Etienne Chauchot wrote: > Hi all, > > Ismael and I are working on upgrading the Nexmark implementation for Beam.

[DISCUSSION] using NexMark for Beam

2017-03-21 Thread Etienne Chauchot
Hi all, Ismael and I are working on upgrading the Nexmark implementation for Beam. See https://github.com/iemejia/beam/tree/BEAM-160-nexmark and https://issues.apache.org/jira/browse/BEAM-160. We are continuing the work done by Mark Shields. See https://github.com/apache/beam/pull/366 for the

Re: why Source#validate() is not declared to throw any exception

2017-03-21 Thread Ted Yu
Looks like JIRA notification is temporarily not working. I have logged BEAM-1773 FYI On Mon, Mar 20, 2017 at 11:26 PM, Eugene Kirpichov <kirpic...@google.com.invalid> wrote: > I think it would make sense to allow the validate method to throw Exception. > On Mon, Mar 20, 2017, 11:21 PM Jean
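Eugene's suggestion is to let `validate()` throw `Exception`. A minimal sketch of what that enables, using a hypothetical `ValidatingSource` base class rather than Beam's actual `Source`: an implementation can let a checked exception such as `IOException` propagate instead of wrapping it in a `RuntimeException`. `FileBackedSource` and `ValidateDemo` are likewise invented for illustration.

```java
import java.io.IOException;

/** Hypothetical base class (not Beam's Source): validate() may throw checked exceptions. */
abstract class ValidatingSource {
  public abstract void validate() throws Exception;
}

/** Hypothetical source that needs a non-empty input path. */
class FileBackedSource extends ValidatingSource {
  private final String path;

  FileBackedSource(String path) {
    this.path = path;
  }

  // An override may declare a narrower checked exception than the base method.
  @Override
  public void validate() throws IOException {
    if (path == null || path.isEmpty()) {
      // Propagated as-is, without wrapping in a RuntimeException.
      throw new IOException("No input path configured");
    }
  }
}

class ValidateDemo {
  public static void main(String[] args) {
    try {
      new FileBackedSource("").validate();
    } catch (IOException e) {
      System.out.println(e.getClass().getSimpleName() + ": " + e.getMessage());
      // prints "IOException: No input path configured"
    }
  }
}
```

Declaring `throws Exception` on the base method only widens what implementations may throw; callers that hold the concrete type still see the narrower `IOException`.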

Re: why Source#validate() is not declared to throw any exception

2017-03-21 Thread Jean-Baptiste Onofré
Got it. Regarding Jira notification, we changed the notification scheme to use comm...@beam.apache.org (it was using comm...@beam.incubator.apache.org), but I don't think it's related; I think it's a Jira service/mail issue (even if status.apache.org doesn't show anything). I'm going to ping Infra

Re: why Source#validate() is not declared to throw any exception

2017-03-21 Thread Jean-Baptiste Onofré
I just talked with Daniel (from Infra): we've had a global networking issue affecting mail and LDAP. That's why the Jira notifications are queued for now. Regards JB On 03/21/2017 03:13 PM, Ted Yu wrote: Looks like JIRA notification is temporarily not working. I have logged BEAM-1773

Re: [ANNOUNCEMENT] New committers, March 2017 edition!

2017-03-21 Thread Stephan Ewen
Welcome :-) On Mon, Mar 20, 2017 at 11:17 PM, Ismaël Mejía wrote: > Thanks everyone, feels great to be part of the team. > Congratulations to the other new committers! > -Ismaël > On Mon, Mar 20, 2017 at 2:50 PM, Tyler Akidau wrote: > > Welcome! > > On Mon, Mar 20, 2017, 02:25 Jean-B

Jenkins build is back to normal : beam_Release_NightlySnapshot #364

2017-03-21 Thread Apache Jenkins Server
See