Hello,
I'm interested in exposing metrics from my UDFs. I see FLINK-1501 exposes
task manager metrics via a UI; it would be nice to plug into the same
MetricRegistry to register my own (i.e., gauges). I don't see this exposed
via the runtime context. This did lead me to discover the Accumulators
The first and third points here aren't very fair -- they apply to all data
systems. Systems downstream of your database can lose data in the same way:
the database retention policy expires old data, the downstream system fails,
and back to the tapes you must go. Likewise with the third, a bug in any ETL system can
Why not use an existing benchmarking tool -- is there one? Perhaps you'd
like to build something like YCSB [0] but for streaming workloads?
Apache Storm is the OSS framework that's been around the longest. Search
for "apache storm benchmark" and you'll get some promising hits. Looks like
There are still some error cases, apparently outside of my UDF try block,
that cause exceptions and terminate the whole streaming job.
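For reference, the per-record guarding I'm doing inside the UDF looks roughly like the following. This is a Flink-free sketch; the class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SafeParse {
    // Guard each record inside the UDF: a malformed record is dropped
    // rather than letting the exception escape. Failures raised outside
    // this block (e.g. in the source or the runtime itself) are not
    // caught here and still bring the job down.
    static List<Integer> parseAll(List<String> raw) {
        List<Integer> out = new ArrayList<>();
        for (String s : raw) {
            try {
                out.add(Integer.parseInt(s.trim()));
            } catch (NumberFormatException e) {
                // log and skip the bad record
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parseAll(Arrays.asList("1", "oops", "2")));
    }
}
```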
> > On 11 Nov 2015, at 21:49, Nick Dimiduk <ndimi...@gmail.com> wrote:
> >
> > Heya,
> >
> > I don't see a section
All those should apply for streaming too...
On Mon, Nov 16, 2015 at 11:06 AM, Vasiliki Kalavri <
vasilikikala...@gmail.com> wrote:
> Hi,
>
> thanks Nick and Ovidiu for the links!
>
> Just to clarify, we're not looking into creating a generic streaming
> benchmark. We have quite limited time and
> > monitoring service which uses Akka messages to query the JobManager on
> > a job's status and accumulators. I'm wondering if you two could engage
> > in any way.
> >
> > Cheers,
> > Max
> >
> > On Wed, Nov 11, 2015 at 6:44 PM, Nick Dimiduk <ndimi...@gmail.com> wrote:
html#checkpointing-local-variables
> [3] https://gist.github.com/fhueske/4ea5422edb5820915fa4
>
> <https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#using-the-keyvalue-state-interface>
>
>
>
> 2015-11-10 19:02 GMT+01:00 Nick Dimiduk <ndimi...@gmail.com>:
>    env.addSource(new SourceFunction<Integer>() {
>
>      private volatile boolean running = true;
>
>      @Override
>      public void run(SourceContext<Integer> ctx) throws Exception {
>        int i = 0;
>        while (running) {
>          ctx.collect(i++);
>        }
>      }
>
>      @Override
>      public void cancel() {
>        running = false;
>      }
>    });
>
>
> Let me know if yo
Heya,
I'm writing my first Flink streaming application and have a flow that
passes type checking and compiles. I've written a simple end-to-end
test with JUnit, using StreamExecutionEnvironment#fromElements() to
provide a stream of valid and invalid test objects; the objects are
POJOs.
It seems
ry and replace the
> <type> tag by test-jar:
>
>
> <dependency>
>   <groupId>org.apache.flink</groupId>
>   <artifactId>flink-streaming-core</artifactId>
>   <version>${flink.version}</version>
>   <type>test-jar</type>
>   <scope>test</scope>
> </dependency>
>
>
> On Thu, Nov 5, 2015 at 8:18 PM, Nick Dimiduk <ndimi...@gmail.com> wrote:
>>
>> Hello,
>>
se "DataStreamUtils.collect(stream)", so you need to
> stop reading it once you know the stream is exhausted...
>
> Stephan
>
>
> On Thu, Nov 5, 2015 at 10:38 PM, Nick Dimiduk <ndimi...@gmail.com> wrote:
of
> predicates should do the job. Thus, there shouldn’t be a need to dig deeper
> than the DataStream for the first version.
>
> Cheers,
> Till
>
>
> On Fri, Nov 6, 2015 at 3:58 AM, Nick Dimiduk <ndimi...@gmail.com> wrote:
>>
>> Thanks Stephan, I'll check
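Till's chain-of-predicates suggestion can be illustrated without any Flink dependency, using plain java.util.function.Predicate composition. The predicates themselves are made up for the example; in a real job each could back a filter() on the DataStream:

```java
import java.util.function.Predicate;

public class PredicateChain {
    // Two illustrative predicates combined into one compound check.
    static final Predicate<Integer> POSITIVE = x -> x > 0;
    static final Predicate<Integer> EVEN = x -> x % 2 == 0;
    static final Predicate<Integer> BOTH = POSITIVE.and(EVEN);

    public static void main(String[] args) {
        System.out.println(BOTH.test(4));  // true: positive and even
        System.out.println(BOTH.test(-4)); // false: not positive
    }
}
```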
> DOP 1. The sink gathers the elements in a list, and the close() function
> validates the result.
>
> Validating the results may involve sorting the list where elements were
> gathered (to make the order deterministic), or using a hash set if it is only
> about distinct count.
>
> Hope
sink in DOP 1" (sorry for the unclear terminology) is a sink with
> parallelism 1, so all data is collected by the same function instance.
>
> Any of this helpful?
>
> Stephan
>
>
> On Wed, Nov 18, 2015 at 5:13 PM, Nick Dimiduk <ndimi...@gmail.com> wrote:
>
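The close()-time validation Stephan describes, sorting the gathered list so arrival order doesn't matter, can be sketched in a Flink-free way (the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ResultValidator {
    // Compare the elements gathered by the parallelism-1 sink against
    // the expected ones after sorting both copies, so that the
    // nondeterministic order of arrival is irrelevant.
    static <T extends Comparable<T>> boolean matches(List<T> gathered, List<T> expected) {
        List<T> a = new ArrayList<>(gathered);
        List<T> b = new ArrayList<>(expected);
        Collections.sort(a);
        Collections.sort(b);
        return a.equals(b);
    }

    public static void main(String[] args) {
        // Elements arrived out of order, but the sets match.
        System.out.println(matches(Arrays.asList(3, 1, 2), Arrays.asList(1, 2, 3)));
    }
}
```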
> There are classes that you can serialize with Java Serialization, but not
> out of the box with Kryo (especially when immutable collections are
> involved). Also classes that have no default constructors, but have checks
> on invariants, etc can fail with Kryo arbitrarily.
>
>
>
> On
ntain Serializable objects.
> The serializer registration only applies to the data which is processed by
> the Flink job. Thus, for the moment I would try to get rid of the
> ColumnInfo object in your closure.
>
> Cheers,
> Till
>
>
> On Mon, Dec 7, 2015 at 10:0
>>> > val textData: DataStream[(LongWritable, Text)] = env.createInput(
>>> > new HadoopInputFormat[LongWritable, Text](
>>> > new TextInputFormat,
>>> > classOf[LongWritable],
>>> > classOf[Text],
>>> > new JobConf()
>>> >   ))
For my own understanding, are you suggesting that FLINK-2944 (or a subtask)
is the appropriate place to implement exposing metrics such as bytes and
records in and out of streaming sources and sinks?
On Tue, Dec 15, 2015 at 5:24 AM, Niels Basjes wrote:
> Hi,
>
> @Ufuk: I added the
vironment, I hope this will be released as open source
> anytime soon since the Otto Group believes in open source ;-) If you would
> like to know more about it, feel free to ask ;-)
>
> Best
> Christian (Kreutzfeldt)
>
>
> Nick Dimiduk wrote
> > I'm much mor
Hi Sebastian,
I've had preliminary success with a streaming job that is Kafka -> Flink ->
HBase (actually, Phoenix) using the Hadoop OutputFormat adapter. A little
glue was required, but it seems to work okay. My guess is it would be the
same for Cassandra. Maybe that can get you started?
Good
tation[1] in Flink homepage.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/hadoop_compatibility.html
>
> > On Nov 25, 2015, at 3:04 AM, Nick Dimiduk <ndimi...@apache.org> wrote:
> >
> > Hello,
> >
> > Is it possible to use e
Hello,
Is it possible to use existing Hadoop Input and OutputFormats with Flink?
There's a lot of existing code that conforms to these interfaces, seems a
shame to have to re-implement it all. Perhaps some adapter shim..?
Thanks,
Nick
Hi folks,
I have a streaming job that consumes from a Kafka topic. The topic is
pretty active, so the local-mode single worker is obviously not able to keep
up with the fire-hose. I expect the job to skip records and continue on.
However, I'm getting an exception from the LegacyFetcher which
xception is being thrown from the
consumer, not the runtime.
> On Sun, Jan 17, 2016 at 12:42 AM, Nick Dimiduk <ndimi...@gmail.com> wrote:
>
>> This goes back to the idea that streaming applications should never
tation and use the sources and sinks accordingly. It is not like
>> OutputFormats are dangerous but all SinkFunctions are failure-proof.
>>
>> Consolidating the two interfaces would make sense. It might be a bit late
>> for the 1.0 release because I see that we would ne
erance aware) so
> that it can easily be used with something akin to OutputFormats.
>
> What do you think?
>
> -Aljoscha
> > On 08 Feb 2016, at 19:40, Nick Dimiduk <ndimi...@apache.org> wrote:
> >
> > On Mon, Feb 8, 2016 at 9:51 AM, Maximilian M
thod.
>
> Cheers,
> Max
>
> On Mon, Feb 8, 2016 at 7:14 AM, Nick Dimiduk <ndimi...@apache.org> wrote:
> > Heya,
> >
> > Is there a plan to consolidate these two interfaces? They appear to
> provide
> > identical functionality, differing only in lifecycle m
Heya,
Is there a plan to consolidate these two interfaces? They appear to provide
identical functionality, differing only in lifecycle management. I found
myself writing an adaptor so I can consume an OutputFormat where a
SinkFunction is expected; there's not much to it. This seems like code that
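The adaptor in question is roughly the following shape. Note this is a sketch only: the two interfaces below are simplified stand-ins I've written for illustration, not Flink's actual OutputFormat/SinkFunction API, which has more lifecycle methods (configure, RuntimeContext, close hooks, etc.):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class OutputFormatAdapterDemo {

    // Simplified stand-in for Flink's OutputFormat.
    public interface MiniOutputFormat<T> {
        void open(int taskNumber, int numTasks) throws IOException;
        void writeRecord(T record) throws IOException;
        void close() throws IOException;
    }

    // Simplified stand-in for Flink's SinkFunction.
    public interface MiniSinkFunction<T> {
        void invoke(T value) throws Exception;
    }

    // Adapter: consume an OutputFormat where a SinkFunction is expected.
    public static class OutputFormatSink<T> implements MiniSinkFunction<T> {
        private final MiniOutputFormat<T> format;
        private boolean opened;

        public OutputFormatSink(MiniOutputFormat<T> format) {
            this.format = format;
        }

        @Override
        public void invoke(T value) throws Exception {
            if (!opened) {              // open lazily on the first record
                format.open(0, 1);
                opened = true;
            }
            format.writeRecord(value);
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> written = new ArrayList<>();
        MiniOutputFormat<String> fmt = new MiniOutputFormat<String>() {
            public void open(int t, int n) {}
            public void writeRecord(String r) { written.add(r); }
            public void close() {}
        };
        OutputFormatSink<String> sink = new OutputFormatSink<>(fmt);
        sink.invoke("a");
        sink.invoke("b");
        System.out.println(written); // prints [a, b]
    }
}
```

A real adapter would also wire the format's close() into the sink's shutdown path and pass the actual subtask index/parallelism instead of the hard-coded (0, 1).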
You need to read the file as byte arrays somehow to make it work.
> What read function did you use? The mapper is not hard to write but the
> byte array stuff gives me a headache.
>
> cheers Martin
>
>
>
>
> On Fri, Feb 12, 2016 at 9:12 PM, Nick Dimiduk <ndimi...@ap
If the dataset is too large for a file, you can put it behind a service and
have your stream operators query the service for enrichment. You can even
support updates to that dataset in a style very similar to the "lambda
architecture" discussed elsewhere.
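A minimal sketch of that query-for-enrichment pattern, with an in-memory map standing in for the external service (all names and data here are illustrative; a real operator would use an RPC/HTTP client, ideally with a local cache in front of it):

```java
import java.util.HashMap;
import java.util.Map;

public class Enricher {
    // Stand-in for the external service: an in-memory lookup table.
    private final Map<String, String> service = new HashMap<>();

    Enricher() {
        service.put("user-1", "premium");
    }

    // Per-record enrichment, as a map() on the stream would do it:
    // attach the service's answer (or a default) to each record.
    String enrich(String userId) {
        return userId + ":" + service.getOrDefault(userId, "unknown");
    }

    public static void main(String[] args) {
        Enricher e = new Enricher();
        System.out.println(e.enrich("user-1")); // prints user-1:premium
        System.out.println(e.enrich("user-9")); // prints user-9:unknown
    }
}
```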
On Thursday, January 28, 2016, Fabian
about our suggested approach.
>
> Sadly, it seems that the Kafka 0.9 consumer API does not yet support
> requesting the latest offset of a TopicPartition. I'll ask about this on
> their ML.
>
>
>
>
> On Sun, Jan 17, 2016 at 8:28 PM, Nick Dimiduk <ndimi...@gmail.com> wrote:
> On Thu, Feb 25, 2016 at 6:03 PM, Nick Dimiduk <ndimi...@gmail.com> wrote:
>
>> For what it's worth, I dug into the TM logs and found that this exception
>> was not the root cause, merely a symptom of other back
Hi Christophe,
What HBase version are you using? Have you looked at using the shaded
client jars? Those should at least isolate HBase/Hadoop's Guava version
from that used by your application.
-n
On Monday, January 25, 2016, Christophe Salperwyck <
christophe.salperw...@gmail.com> wrote:
> Hi
Can anyone tell me where I must place my application-specific
log4j.properties to have it honored when running on a YARN cluster? Placing
it in my application jar doesn't work, and neither does placing log4j files
under flink/conf.
My goal is to set the log level for 'com.mycompany' classes used in my
flink
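In case it helps, the entry I'm trying to get honored is just the usual per-package level override (the package name is from my own application):

```
# application-specific log level override
log4j.logger.com.mycompany=DEBUG
```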
Yes, the approach works fine for my use case; it has been in "production" for
quite some time. My implementation has some untested scenarios around job
restarts and failures, so of course your mileage may vary.
On Mon, May 15, 2017 at 5:58 AM, nragon wrote:
>