Memory consumption & ValueProvider.RuntimeValueProvider#optionsMap

2017-11-16 Thread Stas Levin
Hi all, I'm investigating a memory consumption issue (leak, so it seems) and was wondering if it could be related to the way runtime options are handled. In particular, upon deserializing a PipelineOptions object, ProxyInvocationHandler.Deserializer calls
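The concern raised here can be illustrated in plain Java. This is a hypothetical sketch, not Beam's `RuntimeValueProvider` code. `OptionsRegistry`, `register` and `release` are illustrative names. It shows how a static map that gains one entry per deserialized options object keeps every entry reachable unless something explicitly removes it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a process-wide registry of deserialized options.
// NOT Beam's actual RuntimeValueProvider implementation.
final class OptionsRegistry {
  // Static map: entries live as long as the class itself stays loaded.
  private static final Map<Long, Map<String, String>> OPTIONS = new ConcurrentHashMap<>();
  private static final AtomicLong NEXT_ID = new AtomicLong();

  private OptionsRegistry() {}

  // Called once per "deserialization"; returns the id the options were stored under.
  static long register(Map<String, String> options) {
    long id = NEXT_ID.incrementAndGet();
    OPTIONS.put(id, options);
    return id;
  }

  static Map<String, String> lookup(long id) {
    return OPTIONS.get(id);
  }

  // Without an explicit release, every registered entry stays reachable forever.
  static void release(long id) {
    OPTIONS.remove(id);
  }

  static int size() {
    return OPTIONS.size();
  }
}
```

Whether Beam's map actually behaves this way was the open question in the thread. The sketch only shows the general pattern to look for in a heap dump: one retained entry per deserialization.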

Re: [DISCUSS] Capability Matrix revamp

2017-08-30 Thread Stas Levin
+1 for having plain English feature descriptions. Nitpick: the capability matrix uses the "~" symbol, the meaning of which is not entirely clear from the context. I think a legend would be helpful given things have gone beyond ✘ and ✓. -Stas On Mon, Aug 28, 2017 at 7:23 PM Lukasz Cwik

Re: Pipeline termination in the unified Beam model

2017-04-18 Thread Stas Levin
Ted, the timeout is needed mostly for testing purposes. AFAIK there is no easy way to express the fact that a source is "done" in a Spark native streaming application. Moreover, the Spark streaming "native" flow can either "awaitTermination()" or "awaitTerminationOrTimeout(...)". If you
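As a plain-Java analogy (not Spark code), a test harness for a stream that never signals completion can bound its wait with a timeout. This mirrors the choice between `awaitTermination()` and `awaitTerminationOrTimeout(...)` mentioned in this message. `StreamHarness` and its methods are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical harness: a "streaming job" that only terminates if stop() is called.
final class StreamHarness {
  private final CountDownLatch terminated = new CountDownLatch(1);

  void stop() {
    terminated.countDown();
  }

  // Blocks until stop() is called; for a never-ending stream this never returns.
  void awaitTermination() throws InterruptedException {
    terminated.await();
  }

  // Returns true if the job terminated, false if the timeout elapsed first.
  boolean awaitTerminationOrTimeout(long timeoutMillis) throws InterruptedException {
    return terminated.await(timeoutMillis, TimeUnit.MILLISECONDS);
  }
}
```

In a test, the boolean result lets the harness move on to assertions after a bounded wait, even though the source itself never reports that it is "done".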

Re: [DISCUSS] Change "RunnableOnService" To A More Intuitive Name

2017-03-29 Thread Stas Levin
t will already be available. The one wrinkle here is that NeedsRunner is used in Stas's abandoned node detection in TestPipeline, a very useful piece of functionality. My suggestion here would be to just turn on abandoned node detection by default; tests whic
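The idea behind abandoned node detection can be sketched with a toy pipeline. It records each applied transform and fails at the end of a test if transforms were applied but `run()` was never called. `ToyPipeline` and `verifyNoAbandonedNodes` are hypothetical names. Beam's `TestPipeline` performs its own checks as part of its JUnit rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy pipeline used only to illustrate abandoned-node detection.
final class ToyPipeline {
  private final List<String> appliedTransforms = new ArrayList<>();
  private boolean hasRun = false;

  ToyPipeline apply(String transformName) {
    appliedTransforms.add(transformName);
    return this;
  }

  void run() {
    hasRun = true;
  }

  // Intended to be called after each test: building a pipeline without running it
  // usually means the test silently asserts nothing.
  void verifyNoAbandonedNodes() {
    if (!hasRun && !appliedTransforms.isEmpty()) {
      throw new IllegalStateException(
          "Pipeline was built but never run; abandoned transforms: " + appliedTransforms);
    }
  }
}
```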

Re: [DISCUSS] Change "RunnableOnService" To A More Intuitive Name

2017-03-28 Thread Stas Levin
Jason, I can add the write-up in https://github.com/apache/beam/pull/2077#issuecomment-282273273 to the testing section as part of the upcoming doc updates in light of "RunnableOnService" becoming "NeedsRunner". -Stas On Tue, Mar 28, 2017 at 12:38

Re: splitIntoBundles vs. generateInitialSplits

2017-03-21 Thread Stas Levin
Dataflow runner). "generateInitialSplits" assumes that this splitting happens only "initially", i.e. at job startup time. This is currently true in practice for all existing runners, but it doesn't

Re: [ANNOUNCEMENT] New committers, March 2017 edition!

2017-03-18 Thread Stas Levin
Congrats to the new committers! On Sat, Mar 18, 2017 at 3:44 PM Aviem Zur wrote: Thanks all! Very excited to join. Congratulations to other new committers! On Sat, Mar 18, 2017 at 2:17 AM Thomas Weise wrote: Congrats! On Fri, Mar 17, 2017

Re: Pipeline termination in the unified Beam model

2017-03-02 Thread Stas Levin
+1! I think it's a very cool way to abstract away the batch vs. streaming dissonance from the Beam model. It does require that practitioners are *educated* to think this way as well. I believe that nowadays the terms "batch" and "streaming" are so deeply rooted, that they play a key role in the

Re: Metrics for Beam IOs.

2017-02-18 Thread Stas Levin
o be sent (and the corresponding configuration to connect, like Kafka or Elasticsearch location) 2. The format of the metric data (for instance, json format). In Apache Karaf, I created something similar named Decanter:

Re: Metrics for Beam IOs.

2017-02-15 Thread Stas Levin
+1 to making the IO metrics (e.g. producers, consumers) available as part of the Beam pipeline metrics tree for debugging and visibility. As it has already been mentioned, many IO clients have a metrics mechanism in place, so in these cases I think it could be beneficial to mirror their metrics
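Here is a minimal sketch of what mirroring could look like. It assumes the IO client exposes its metrics as a name-to-value map, which many clients do in some form. `MetricsMirror`, the metric names and the `namespace/name` key format are hypothetical. This is not the Beam Metrics API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical: copies an IO client's own metrics into a namespaced view so they can
// sit alongside pipeline metrics. Not Beam's Metrics API.
final class MetricsMirror {
  private final Map<String, Long> mirrored = new TreeMap<>();

  // Prefix each client metric with a namespace (e.g. the IO name) to avoid collisions.
  void mirror(String namespace, Map<String, Long> clientMetrics) {
    for (Map.Entry<String, Long> e : clientMetrics.entrySet()) {
      mirrored.put(namespace + "/" + e.getKey(), e.getValue());
    }
  }

  // Returns a copy so callers cannot mutate the mirrored state.
  Map<String, Long> snapshot() {
    return new TreeMap<>(mirrored);
  }
}
```

Namespacing by IO keeps, for example, a Kafka consumer's counters distinct from an Elasticsearch writer's, even when the underlying clients use the same metric names.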

Re: [ANNOUNCEMENT] New committers, January 2017 edition!

2017-01-27 Thread Stas Levin
Regards, JB. On Jan 27, 2017, 01:27, at 01:27, Davor Bonaci <da...@apache.org> wrote: Please

Re: splitIntoBundles vs. generateInitialSplits

2017-01-11 Thread Stas Levin
both should be called simply "split", or "splitIntoSubSources". On Mon, Jan 9, 2017 at 2:12 PM Stas Levin <stasle...@gmail.com> wrote: Definitely seems like the formatting got lost in translation, sorry about that :) I guess both cases (methods) create splits,

Re: splitIntoBundles vs. generateInitialSplits

2017-01-09 Thread Stas Levin
use splitIntoBundles during job startup to be able to split up the work before creating readers rather than after creating readers and waiting to use splitAtFraction. S On Sun, Jan 8, 2017 at 6:06 AM Stas Levin <stasle...@gmail.com> wrote: H

splitIntoBundles vs. generateInitialSplits

2017-01-08 Thread Stas Levin
Hi, A short terminology question regarding "bundle", and particularly splitIntoBundles vs. generateInitialSplits. In *BoundedSource* we have: List<? extends BoundedSource<T>> *splitIntoBundles*(...) In *UnboundedSource* we have: List<? extends UnboundedSource<OutputT, CheckpointMarkT>> *generateInitialSplits*(...) I was wondering if the names were intentionally made
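A toy illustration makes the naming question concrete. It uses hypothetical helpers, not Beam classes. The bounded split is driven by a desired bundle size in bytes, the role played by `splitIntoBundles`' `desiredBundleSizeBytes` parameter. The unbounded split is driven by a desired number of splits, the role of `generateInitialSplits`' `desiredNumSplits`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy helpers illustrating two splitting contracts; not Beam classes.
final class ToySplits {
  private ToySplits() {}

  // Bounded: split the byte range [start, end) into chunks of at most desiredBundleSizeBytes.
  static List<long[]> splitBySize(long start, long end, long desiredBundleSizeBytes) {
    if (desiredBundleSizeBytes <= 0) {
      throw new IllegalArgumentException("desiredBundleSizeBytes must be positive");
    }
    List<long[]> bundles = new ArrayList<>();
    for (long s = start; s < end; s += desiredBundleSizeBytes) {
      bundles.add(new long[] {s, Math.min(s + desiredBundleSizeBytes, end)});
    }
    return bundles;
  }

  // Unbounded: assign partitions round-robin to at most desiredNumSplits splits.
  static List<List<Integer>> splitByCount(List<Integer> partitions, int desiredNumSplits) {
    if (desiredNumSplits <= 0) {
      throw new IllegalArgumentException("desiredNumSplits must be positive");
    }
    int n = Math.min(desiredNumSplits, partitions.size());
    List<List<Integer>> splits = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      splits.add(new ArrayList<>());
    }
    for (int i = 0; i < partitions.size(); i++) {
      splits.get(i % n).add(partitions.get(i));
    }
    return splits;
  }
}
```

Either way the result is a list of smaller sub-sources. This is why the thread suggests a single name such as "split" could cover both methods.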

Re: Testing Metrics

2017-01-02 Thread Stas Levin
I see. Just to make sure I get it right, in (2), by sinks I mean various metrics backends (e.g., Graphite). So it boils down to having integration tests as part of Beam (runners?) that go beyond testing the SDK layer (i.e., asserting over pipeline.metrics()) and actually test the specific metrics
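For a backend such as Graphite, an integration test could capture what the sink actually sends. Graphite's plaintext protocol uses one line per data point in the form `<metric path> <value> <timestamp in epoch seconds>`. `GraphitePoint` and the metric path used in a test are hypothetical. This shows the kind of round-trip assertion such a test might make; it is not Beam's Graphite sink.

```java
// Hypothetical value class for one line of Graphite's plaintext protocol:
// "<metric path> <metric value> <timestamp>\n".
final class GraphitePoint {
  final String path;
  final double value;
  final long epochSeconds;

  GraphitePoint(String path, double value, long epochSeconds) {
    this.path = path;
    this.value = value;
    this.epochSeconds = epochSeconds;
  }

  // Serializes the point as a single newline-terminated plaintext line.
  String toLine() {
    return path + " " + value + " " + epochSeconds + "\n";
  }

  // Parses a captured line; throws if the line does not have exactly three fields.
  static GraphitePoint parse(String line) {
    String[] parts = line.trim().split("\\s+");
    if (parts.length != 3) {
      throw new IllegalArgumentException("Expected 3 fields, got: " + line);
    }
    return new GraphitePoint(parts[0], Double.parseDouble(parts[1]), Long.parseLong(parts[2]));
  }
}
```

A test along these lines would run a pipeline against a fake Graphite endpoint, parse every captured line and assert on the paths and values. That is exactly what asserting over pipeline.metrics() alone cannot cover.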

Re: Running a Specific Test

2016-12-29 Thread Stas Levin
-pl runners/direct-java -am integration-test Note that this is an `integration-test`, not a `test`, because it tests the integration of the SDK with the DirectRunner: https://github.com/apache/beam/blob/master/runners/direct-java/pom.xml#L64 Dan On Thu, Dec 29,

Re: Running a Specific Test

2016-12-29 Thread Stas Levin
P.S. You can also do this from the main directory (without cd-ing into the direct-runner): "mvn test -Dtest=RegexTest -DdependenciesToScan=org.apache.beam:beam-sdks-java-core -pl runners/direct-java" On Thu, Dec 29, 2016 at 8:50 PM Stas Levin <stasle...@gmail.com> wrote:

Re: Running a Specific Test

2016-12-29 Thread Stas Levin
Once you "cd" into "runners/direct-java", you can use: "mvn test -Dtest=RegexTest -DdependenciesToScan=org.apache.beam:beam-sdks-java-core" -Stas On Thu, Dec 29, 2016 at 8:27 PM Jesse Anderson wrote: I tried that one already. It gives a "no tests run" error. If you