Re: One-to-many mapping between unbounded input source and pipelines with session windows

2016-12-21 Thread Lukasz Cwik
Do the records have another attribute Z which joins them all together? Are the set of attributes A, B, C, X, Y, K, L are from a fixed set of values like enums or can be mapped onto a certain number of states (like an attribute A > 50 can be mapped onto a state "exceeds threshold")? For your

Re: Event-time based in-window trigger

2016-12-01 Thread Lukasz Cwik
Can you provide more details about the problem your trying to solve with some examples showing input and the expected output? On Wed, Nov 30, 2016 at 11:08 PM, Manu Zhang wrote: > Hi, > > Recently I’m addressing a problem where users want to trigger after > watermark

Re: Support for sorting output in Beam?

2016-11-23 Thread Lukasz Cwik
ould also work with certain string > representations of a timestamp. > > Feel free to ping me with any questions about the sorter. > > Thanks, > Mitch > > On Tue, Nov 22, 2016 at 7:55 AM, Lukasz Cwik <lc...@google.com> wrote: > >> Since the only guarantee for a uni

Re: Support for sorting output in Beam?

2016-11-22 Thread Lukasz Cwik
fice if I could partition the output with a custom > partition function (for example daywise)… > > > > Thanks, Rico. > > > > *Von:* Lukasz Cwik [mailto:lc...@google.com] > *Gesendet:* Dienstag, 22. November 2016 16:00 > *An:* user@beam.inc

Re: Support for sorting output in Beam?

2016-11-22 Thread Lukasz Cwik
There is not explicit support for sorting in the Beam model today because the problem space is large and typically the usecases people have generally suffice to do a global combine and sort in memory or do a combine per key with a radix like scheme and sort each radix individually. Can you give

Re: GroupByKey and CombineFn: internals

2016-11-22 Thread Lukasz Cwik
; Cheers! > > Matthias > > On Tue, Nov 22, 2016 at 2:14 PM, Lukasz Cwik <lc...@google.com> wrote: > >> The javadoc goes through a lengthy explanation: >> http://beam.incubator.apache.org/documentation/sdks/javadoc/ >> 0.3.0-incubating/org/apache/beam/sdk/tran

Re: is beam's core stable now? or when will the beam's first stable version release?

2016-11-10 Thread Lukasz Cwik
Are you asking about python or java? On Thu, Nov 10, 2016 at 3:56 AM, 陈竞 wrote: > i want to use beam in my company, i want to know that whether beam's core > is stable, > and when will the beam's first stable version release? > > thanks. >

Re: Why does `Combine.perKey(SerializableFunction)` require same input and output type

2016-10-31 Thread Lukasz Cwik
of them have the > requirements ? > > On Sat, Oct 29, 2016 at 12:06 AM Lukasz Cwik <lc...@google.com> wrote: > >> For it to be considered a combiner, the function needs to be associative >> and commutative. >> >> The issue is that from an API perspective it would be

Re: Runtime Windows/Aggregation Computations in Beam

2016-09-22 Thread Lukasz Cwik
For 2) Depending on the runner, the Iterable inside the KV may not be materialized in memory. For example, in Dataflow, the Iterable is implemented as a pointer which you walk which loads and caches blocks of data to prevent the out of memory issue that is common for keys with lots of

Re: Spark TransformEvaluator Error

2016-09-19 Thread Lukasz Cwik
The Spark runner currently only supports a limited set of sources because there is no support in the Spark runner implementation which handles the Read PTransform directly. They are able to support TextIO, AvroIO and a few others by manually providing a few direct implementations. The supported

Re: TextIO().Read pipeline implementation question

2016-08-24 Thread Lukasz Cwik
Amir, it seems like your attempting to build a network simulation ( https://en.wikipedia.org/wiki/Network_simulation). Are you sure Apache Beam is the right tool for this? On Wed, Aug 24, 2016 at 3:54 PM, Thomas Groh wrote: > The Beam model generally is agnostic to the rate at

Re: Example: pass Runner at command line

2016-07-26 Thread Lukasz Cwik
I was under the impression that we had several @RunnableOnService integration tests that executed across runners. Also, doesn't WordCount works on the DirectRunner, Flink and Dataflow? (

Re: Java 1.8 Stream vs Beam

2016-07-13 Thread Lukasz Cwik
Java 8 streams conceptually follow parts of the big data processing model that came out of map reduce. Java 8 streams are limited to in memory transformations but Beam abstracts away the datasets and allows for arbitrarily large transformations, think petabytes of data so I believe that comparing

Re: Multi-threading implementation equivalence in Beam

2016-06-22 Thread Lukasz Cwik
oo.com> > > *Sent:* Wednesday, June 22, 2016 2:18 AM > > *Subject:* Re: Multi-threading implementation equivalence in Beam > > Lukasz is of course correct in assuming that Flink does nothing to > synchronize accesses to Redis (or any other external system, for that

Re: Iterative Pipelines in Beam Model

2016-06-22 Thread Lukasz Cwik
You should be able to provide a fixed number of iterations with a for like loop that is "unrolled" constructing a larger pipeline which will allow you to submit the pipeline less often. On Wed, Jun 22, 2016 at 1:10 PM, Frances Perry wrote: > Correct -- Beam pipelines currently

Re: Multi-threading implementation equivalence in Beam

2016-06-21 Thread Lukasz Cwik
both nodes > DoFn processes) & a concurrentHashMap for another set of data > I assume FlinkCluster maintains the thread safety of Redis & > concurrentHashMap objects. > Is this the right assumption? . > Thanks again. > Amir- > > > -- > *F

Re: Multi-threading implementation equivalence in Beam

2016-06-20 Thread Lukasz Cwik
g in-memory db such as Redis or Aerospike for intermediate look > ups etc. > Is this what the above statement referring to: dont use in-memory dbs? > Thanks again. > > > -- > *From:* Lukasz Cwik <lc...@google.com> > *To:* user@beam.incubator.a