Accumulators/Metrics

2015-11-11 Thread Nick Dimiduk
Hello, I'm interested in exposing metrics from my UDFs. I see FLINK-1501 exposes TaskManager metrics via a UI; it would be nice to plug into the same MetricRegistry to register my own (i.e., gauges). I don't see this exposed via the runtime context. This did lead me to discovering the Accumulators
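
As a sketch of that Accumulators route (names here are illustrative, not from the thread), registration goes through the RuntimeContext of any RichFunction:

    import org.apache.flink.api.common.accumulators.IntCounter;
    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;

    public class CountingMapper extends RichMapFunction<String, String> {
      // accumulator results are shipped back to the JobManager with the job status
      private final IntCounter recordCount = new IntCounter();

      @Override
      public void open(Configuration parameters) {
        // "record-count" is an arbitrary name chosen for this example
        getRuntimeContext().addAccumulator("record-count", recordCount);
      }

      @Override
      public String map(String value) {
        recordCount.add(1);
        return value;
      }
    }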

Re: Flink, Kappa and Lambda

2015-11-11 Thread Nick Dimiduk
The first and third points here aren't very fair -- they apply to all data systems. Systems downstream of your database can lose data in the same way: the database retention policy expires old data, a downstream system fails, and back to the tapes you must go. Likewise with the third point, a bug in any ETL system can

Re: Creating a representative streaming workload

2015-11-16 Thread Nick Dimiduk
Why not use an existing benchmarking tool -- is there one? Perhaps you'd like to build something like YCSB [0] but for streaming workloads? Apache Storm is the OSS framework that's been around the longest. Search for "apache storm benchmark" and you'll get some promising hits. Looks like

Re: Error handling

2015-11-16 Thread Nick Dimiduk
There are still some error cases that cause exceptions, apparently, outside of my UDF try block, that cause the whole streaming job to terminate. > > On 11 Nov 2015, at 21:49, Nick Dimiduk <ndimi...@gmail.com> wrote: > > > > Heya, > > > > I don't see a section
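
The try-block pattern under discussion looks roughly like the following; a sketch only, with the parse standing in for the real UDF body:

    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.util.Collector;

    public class SafeParser implements FlatMapFunction<String, Integer> {
      @Override
      public void flatMap(String value, Collector<Integer> out) {
        try {
          out.collect(Integer.parseInt(value));
        } catch (NumberFormatException e) {
          // drop the bad record; note that exceptions raised outside this
          // block (e.g. in sources or serializers) can still fail the job
        }
      }
    }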

Re: Creating a representative streaming workload

2015-11-16 Thread Nick Dimiduk
All those should apply for streaming too... On Mon, Nov 16, 2015 at 11:06 AM, Vasiliki Kalavri < vasilikikala...@gmail.com> wrote: > Hi, > > thanks Nick and Ovidiu for the links! > > Just to clarify, we're not looking into creating a generic streaming > benchmark. We have quite limited time and

Re: Accumulators/Metrics

2015-11-12 Thread Nick Dimiduk
monitoring service which uses Akka messages to query the JobManager on > > a job's status and accumulators. I'm wondering if you two could engage > > in any way. > > > > Cheers, > > Max > > > > On Wed, Nov 11, 2015 at 6:44 PM, Nick Dimiduk <ndimi...@gmail.com>

Re: Implementing samza table/stream join

2015-11-10 Thread Nick Dimiduk
html#checkpointing-local-variables > [3] https://gist.github.com/fhueske/4ea5422edb5820915fa4 > > <https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#using-the-keyvalue-state-interface> > > > > 2015-11-10 19:02 GMT+01:00 Nick Dimiduk <ndimi...@gmail.com>:

Re: IT's with POJO's

2015-11-05 Thread Nick Dimiduk
    @Override
    public void run(SourceContext<Integer> ctx) throws Exception {
      int i = 0;
      while (running) {
        ctx.collect(i++);
      }
    }

    @Override
    public void cancel() {
      running = false;
    }
    });

Let me know if yo

IT's with POJO's

2015-11-04 Thread Nick Dimiduk
Heya, I'm writing my first Flink streaming application and have a flow that passes type checking and compiles. I've written a simple end-to-end test with JUnit, using StreamExecutionEnvironment#fromElements() to provide a stream of valid and invalid test objects; the objects are POJOs. It seems
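
The setup described reads roughly as below; a sketch assuming a hypothetical Event POJO (Flink's POJO rules: public class, public no-argument constructor, public or getter/setter fields):

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class FlowIT {
      public static class Event {
        public String id;
        public boolean valid;
        public Event() {}
        public Event(String id, boolean valid) { this.id = id; this.valid = valid; }
      }

      public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment();
        env.fromElements(new Event("a", true), new Event("b", false))
           .filter(e -> e.valid)   // exercise the flow with valid and invalid objects
           .print();
        env.execute("pojo-it");
      }
    }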

Re: Published test artifacts for flink streaming

2015-11-05 Thread Nick Dimiduk
ry and replace the <type> tag by test-jar:

    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-streaming-core</artifactId>
      <version>${flink.version}</version>
      <type>test-jar</type>
      <scope>test</scope>
    </dependency>

On Thu, Nov 5, 2015 at 8:18 PM, Nick Dimiduk <ndimi...@gmail.com> wrote: >> Hello,

Re: Published test artifacts for flink streaming

2015-11-05 Thread Nick Dimiduk
use "DataStreamUtils.collect(stream)", so you need to > stop reading it once you know the stream is exhausted... > > Stephan > > > On Thu, Nov 5, 2015 at 10:38 PM, Nick Dimiduk <ndimi...@gmail.com> wrot

Re: Published test artifacts for flink streaming

2015-11-06 Thread Nick Dimiduk
of > predicates should do the job. Thus, there shouldn’t be a need to dig deeper > than the DataStream for the first version. > > Cheers, > Till > > > On Fri, Nov 6, 2015 at 3:58 AM, Nick Dimiduk <ndimi...@gmail.com> wrote: >> >> Thanks Stephan, I'll check

Re: Published test artifacts for flink streaming

2015-11-18 Thread Nick Dimiduk
DOP 1. The sink gathers the elements in a list, and the close() function > validates the result. > > Validating the results may involve sorting the list where elements were > gathered (to make the order deterministic) or using a hash set if it is only > about a distinct count. > > Hope

Re: Published test artifacts for flink streaming

2015-11-18 Thread Nick Dimiduk
A "sink in DOP 1" (sorry for the unclear terminology) is a sink with > parallelism 1, so all data is collected by the same function instance. > > Any of this helpful? > > Stephan > > > On Wed, Nov 18, 2015 at 5:13 PM, Nick Dimiduk <ndimi...@gmail.com> wrote: >
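
A sketch of that pattern: a parallelism-1 sink that buffers everything and validates on close(); the static list is a test-only convenience, and all names are illustrative:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

    public class CollectingSink extends RichSinkFunction<Integer> {
      // static so the test can reach the results after the job finishes
      static final List<Integer> collected =
          Collections.synchronizedList(new ArrayList<Integer>());

      @Override
      public void invoke(Integer value) {
        collected.add(value);
      }

      @Override
      public void close() {
        // sort here (or collect into a set) to make assertions order-insensitive
        Collections.sort(collected);
      }
    }

Attach it with stream.addSink(new CollectingSink()).setParallelism(1) so a single instance sees every element.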

Re: Using Kryo for serializing lambda closures

2015-12-08 Thread Nick Dimiduk
There are classes that you can serialize with Java serialization, but not > out of the box with Kryo (especially when immutable collections are > involved). Also, classes that have no default constructors but have checks > on invariants, etc., can fail with Kryo arbitrarily. > > > > On
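
One canonical way around such a class is a hand-written Kryo serializer registered for just that type; a sketch with hypothetical names (and note, per the follow-up below, this covers data types, not closures):

    import com.esotericsoftware.kryo.Kryo;
    import com.esotericsoftware.kryo.Serializer;
    import com.esotericsoftware.kryo.io.Input;
    import com.esotericsoftware.kryo.io.Output;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class KryoWorkaround {
      // stand-in for a Kryo-hostile class: no default constructor, invariant check
      public static class Price implements java.io.Serializable {
        public final int cents;
        public Price(int cents) {
          if (cents < 0) throw new IllegalArgumentException("negative price");
          this.cents = cents;
        }
      }

      // hand-written serializer that honors the constructor invariant
      public static class PriceSerializer extends Serializer<Price> {
        @Override
        public void write(Kryo kryo, Output output, Price price) {
          output.writeInt(price.cents);
        }
        @Override
        public Price read(Kryo kryo, Input input, Class<Price> type) {
          return new Price(input.readInt());
        }
      }

      public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.getConfig().registerTypeWithKryoSerializer(Price.class, PriceSerializer.class);
      }
    }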

Re: Using Kryo for serializing lambda closures

2015-12-08 Thread Nick Dimiduk
contain Serializable objects. > The serializer registration only applies to the data which is processed by > the Flink job. Thus, for the moment I would try to get rid of the > ColumnInfo object in your closure. > > Cheers, > Till > > On Mon, Dec 7, 2015 at 10:0

Re: Using Hadoop Input/Output formats

2015-12-04 Thread Nick Dimiduk
    val textData: DataStream[(LongWritable, Text)] = env.createInput(
      new HadoopInputFormat[LongWritable, Text](
        new TextInputFormat,
        classOf[LongWritable],
        classOf[Text],
        new JobConf()))

Re: Tiny topology shows '0' for all stats.

2015-12-15 Thread Nick Dimiduk
For my own understanding, are you suggesting that FLINK-2944 (or a subtask) is the appropriate place to implement exposure of metrics such as bytes and records in and out of streaming sources and sinks? On Tue, Dec 15, 2015 at 5:24 AM, Niels Basjes wrote: > Hi, > > @Ufuk: I added the

Re: Accumulators/Metrics

2015-12-14 Thread Nick Dimiduk
environment, I hope this will be released as open source > anytime soon, since the Otto Group believes in open source ;-) If you would > like to know more about it, feel free to ask ;-) > > Best > Christian (Kreutzfeldt) > > > Nick Dimiduk wrote > > I'm much mor

Re: Sink - Cassandra

2016-01-05 Thread Nick Dimiduk
Hi Sebastian, I've had preliminary success with a streaming job that is Kafka -> Flink -> HBase (actually, Phoenix) using the Hadoop OutputFormat adapter. A little glue was required, but it seems to work okay. My guess is it would be the same for Cassandra. Maybe that can get you started? Good

Re: Using Hadoop Input/Output formats

2015-11-24 Thread Nick Dimiduk
documentation[1] on the Flink homepage. > > [1] > https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/hadoop_compatibility.html > > > On Nov 25, 2015, at 3:04 AM, Nick Dimiduk <ndimi...@apache.org> wrote: > > > > Hello, > > > > Is it possible to use e

Using Hadoop Input/Output formats

2015-11-24 Thread Nick Dimiduk
Hello, Is it possible to use existing Hadoop Input and OutputFormats with Flink? There's a lot of existing code that conforms to these interfaces; it seems a shame to have to re-implement it all. Perhaps some adapter shim..? Thanks, Nick
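
Flink does ship such a shim in its Hadoop compatibility wrappers; a batch-side sketch using the mapred TextInputFormat, with the input path assumed for illustration:

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.hadoop.mapred.HadoopInputFormat;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;

    public class HadoopRead {
      public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // wrap the existing Hadoop format instead of re-implementing it
        HadoopInputFormat<LongWritable, Text> format = new HadoopInputFormat<>(
            new TextInputFormat(), LongWritable.class, Text.class, new JobConf());
        FileInputFormat.addInputPath(format.getJobConf(), new Path("hdfs:///tmp/input"));
        DataSet<Tuple2<LongWritable, Text>> lines = env.createInput(format);
        lines.print();
      }
    }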

Frequent exceptions killing streaming job

2016-01-15 Thread Nick Dimiduk
Hi folks, I have a streaming job that consumes from a Kafka topic. The topic is pretty active, so the local-mode single worker is obviously not able to keep up with the fire-hose. I expect the job to skip records and continue on. However, I'm getting an exception from the LegacyFetcher which

Re: Frequent exceptions killing streaming job

2016-01-17 Thread Nick Dimiduk
exception is being thrown from the consumer, not the runtime. > On Sun, Jan 17, 2016 at 12:42 AM, Nick Dimiduk <ndimi...@gmail.com> wrote: > >> This goes back to the idea that streaming applications should never

Re: OutputFormat vs SinkFunction

2016-02-09 Thread Nick Dimiduk
documentation and use the sources and sinks accordingly. It is not like >> OutputFormats are dangerous but all SinkFunctions are failure-proof. >> >> Consolidating the two interfaces would make sense. It might be a bit late >> for the 1.0 release because I see that we would ne

Re: OutputFormat vs SinkFunction

2016-02-08 Thread Nick Dimiduk
tolerance aware) so > that it can easily be used with something akin to OutputFormats. > > What do you think? > > -Aljoscha > > On 08 Feb 2016, at 19:40, Nick Dimiduk <ndimi...@apache.org> wrote: > > > > On Mon, Feb 8, 2016 at 9:51 AM, Maximilian M

Re: OutputFormat vs SinkFunction

2016-02-08 Thread Nick Dimiduk
method. > > Cheers, > Max > > On Mon, Feb 8, 2016 at 7:14 AM, Nick Dimiduk <ndimi...@apache.org> wrote: > > Heya, > > > > Is there a plan to consolidate these two interfaces? They appear to > provide > > identical functionality, differing only in lifecycle m

OutputFormat vs SinkFunction

2016-02-07 Thread Nick Dimiduk
Heya, Is there a plan to consolidate these two interfaces? They appear to provide identical functionality, differing only in lifecycle management. I found myself writing an adaptor so I can consume an OutputFormat where a SinkFunction is expected; there's not much to it. This seems like code that
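
The adaptor mentioned amounts to delegating lifecycle calls; a hedged sketch (checkpointing concerns, raised later in this thread, are ignored here):

    import org.apache.flink.api.common.io.OutputFormat;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

    public class OutputFormatSink<T> extends RichSinkFunction<T> {
      private final OutputFormat<T> format;

      public OutputFormatSink(OutputFormat<T> format) {
        this.format = format;
      }

      @Override
      public void open(Configuration parameters) throws Exception {
        format.configure(parameters);
        // translate the sink's subtask view into the format's (taskNumber, numTasks) view
        format.open(getRuntimeContext().getIndexOfThisSubtask(),
                    getRuntimeContext().getNumberOfParallelSubtasks());
      }

      @Override
      public void invoke(T value) throws Exception {
        format.writeRecord(value);
      }

      @Override
      public void close() throws Exception {
        format.close();
      }
    }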

Re: streaming using DeserializationSchema

2016-02-12 Thread Nick Dimiduk
You need to read the file as byte arrays somehow to make it work. > What read function did you use? The mapper is not hard to write, but the > byte array stuff gives me a headache. > > cheers Martin > > > > > On Fri, Feb 12, 2016 at 9:12 PM, Nick Dimiduk <ndimi...@ap
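
For context, the interface being wired up is small; a sketch of a UTF-8 schema built on Flink's AbstractDeserializationSchema base class:

    import java.nio.charset.StandardCharsets;
    import org.apache.flink.streaming.util.serialization.AbstractDeserializationSchema;

    public class Utf8Schema extends AbstractDeserializationSchema<String> {
      @Override
      public String deserialize(byte[] message) {
        // each byte[] record becomes one String element on the stream
        return new String(message, StandardCharsets.UTF_8);
      }
    }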

Re: Mixing Batch & Streaming

2016-01-28 Thread Nick Dimiduk
If the dataset is too large for a file, you can put it behind a service and have your stream operators query the service for enrichment. You can even support updates to that dataset in a style very similar to the "lambda architecture" discussed elsewhere. On Thursday, January 28, 2016, Fabian
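
That enrichment pattern sketches out as below; EnrichmentClient is a hypothetical stand-in for whatever service client applies:

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;

    public class EnrichingMapper extends RichMapFunction<String, String> {
      // hypothetical client, not a Flink API; imagine HTTP, gRPC, etc.
      public static class EnrichmentClient {
        public EnrichmentClient(String endpoint) { /* connect */ }
        public String lookup(String key) { return "unknown"; /* query the service */ }
        public void close() { /* release the connection */ }
      }

      private transient EnrichmentClient client;

      @Override
      public void open(Configuration parameters) {
        // one connection per parallel instance, not per record
        client = new EnrichmentClient("enrichment-service:8080");
      }

      @Override
      public String map(String key) {
        return key + "," + client.lookup(key);
      }

      @Override
      public void close() {
        client.close();
      }
    }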

Re: Frequent exceptions killing streaming job

2016-02-25 Thread Nick Dimiduk
about our suggested approach. > > Sadly, it seems that the Kafka 0.9 consumer API does not yet support > requesting the latest offset of a TopicPartition. I'll ask about this on > their ML. > > > > > On Sun, Jan 17, 2016 at 8:28 PM, Nick Dimiduk <ndimi...@gmail.com> wrote:

Re: Frequent exceptions killing streaming job

2016-02-26 Thread Nick Dimiduk
On Thu, Feb 25, 2016 at 6:03 PM, Nick Dimiduk <ndimi...@gmail.com> wrote: > >> For what it's worth, I dug into the TM logs and found that this exception >> was not the root cause, merely a symptom of other back

Re: Flink 0.10.1 and HBase

2016-01-25 Thread Nick Dimiduk
Hi Christophe, What HBase version are you using? Have you looked at using the shaded client jars? Those should at least isolate HBase/Hadoop's Guava version from that used by your application. -n On Monday, January 25, 2016, Christophe Salperwyck < christophe.salperw...@gmail.com> wrote: > Hi

Log4j configuration on YARN

2016-03-11 Thread Nick Dimiduk
Can anyone tell me where I must place my application-specific log4j.properties to have them honored when running on a YARN cluster? Placing it in my application jar doesn't work, and neither do the log4j files under flink/conf. My goal is to set the log level for 'com.mycompany' classes used in my flink

Re: Sink - Cassandra

2017-05-16 Thread Nick Dimiduk
Yes, the approach works fine for my use case; it has been in "production" for quite some time. My implementation has some untested scenarios around job restarting and failures, so of course your mileage may vary. On Mon, May 15, 2017 at 5:58 AM, nragon wrote: >