As a follow-up question, how well does the history server work for
observing a running job? I'm trying to understand whether, in the
cluster-per-job model, a user would be expected to hop from the Web UI to
the History Server once the job completed.
Thanks
On Wed, Oct 4, 2017 at 3:49 AM,
Hi,
I want to implement a Flowable (BPMN platform - www.flowable.org) <-> Flink
integration module. The motivation is to execute process simulations with
flink (simple simulation experiment example
https://gromar01.wordpress.com/2017/11/07/will-we-meet-our-kpis/). I was
able to create
Flink
Consider the watermarks that are generated by your chosen watermark
generator as an +assertion+ about the progression of time, based on domain
knowledge, observation of elements, and connector specifics. The generator
is asserting that any elements observed after a given watermark will come
later
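As a plain-Java illustration of that assertion (this sketches the idea behind a bounded-out-of-orderness generator; it is not Flink's actual class, and the names are made up):

```java
// A watermark that trails the highest timestamp seen so far by a fixed
// bound. Emitting it asserts: "no element with timestamp <= watermark
// will be observed from now on" (up to the out-of-orderness bound).
public class BoundedOutOfOrderness {
    private final long maxOutOfOrderness; // how late (ms) elements may be
    private long maxTimestampSeen = Long.MIN_VALUE;

    public BoundedOutOfOrderness(long maxOutOfOrderness) {
        this.maxOutOfOrderness = maxOutOfOrderness;
    }

    // Track element timestamps as they are observed.
    public void onElement(long timestamp) {
        maxTimestampSeen = Math.max(maxTimestampSeen, timestamp);
    }

    // The assertion about the progression of time.
    public long currentWatermark() {
        return maxTimestampSeen - maxOutOfOrderness;
    }

    public static void main(String[] args) {
        BoundedOutOfOrderness wm = new BoundedOutOfOrderness(1000);
        wm.onElement(5000);
        wm.onElement(4200); // late, but still within the 1s bound
        System.out.println(wm.currentWatermark()); // 4000
    }
}
```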
Sorry for the late response.
MapState is currently only supported as keyed state, not as operator state.
If you want to create a keyed MapState, the object should be created using a
MapStateDescriptor in the open() method via the RuntimeContext.
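As a sketch of that pattern (Flink 1.4-era APIs; the function, state name, and types below are illustrative assumptions, not from the original mail):

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Hypothetical function: keeps a per-key map of counts in keyed MapState.
public class CountingFunction extends RichFlatMapFunction<Tuple2<String, Long>, Long> {

    private transient MapState<String, Long> counts;

    @Override
    public void open(Configuration parameters) {
        // Create the keyed MapState via a MapStateDescriptor, as described above.
        MapStateDescriptor<String, Long> descriptor =
                new MapStateDescriptor<>("counts", String.class, Long.class);
        counts = getRuntimeContext().getMapState(descriptor);
    }

    @Override
    public void flatMap(Tuple2<String, Long> input, Collector<Long> out) throws Exception {
        // Keyed state is only valid on a keyed stream, i.e. after keyBy().
        Long current = counts.get(input.f0);
        long updated = (current == null ? 0L : current) + input.f1;
        counts.put(input.f0, updated);
        out.collect(updated);
    }
}
```

The function would then be applied after a keyBy(), e.g. stream.keyBy(0).flatMap(new CountingFunction()).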
2018-01-16 1:54 GMT+01:00 Boris Lublinsky
Hi,
We recently upgraded our test environment from Flink 1.3.2 to Flink
1.4.0.
We are using a high-availability setup on the JobManager. And now, often,
when I go to the job details in the web UI, the call times out and the
following error pops up in the JobManager log
Hi Jason,
I'd suggest starting with [1] and [2] to get the basics of a Flink
program.
The DataStream API basically wires operators together with streams, so
that whatever stream comes out of one operator is the input of the next.
By connecting both functions to the same Kafka stream source,
Hi Adrian,
couldn't you solve this by providing your own DeserializationSchema [1],
possibly extending JSONKeyValueDeserializationSchema and catching
the error there?
Nico
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/connectors/kafka.html#the-deserializationschema
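Stripped of the Kafka connector interfaces, the "catch the error, return null, skip the record" contract can be sketched in plain Java (the record format and parse logic here are made-up placeholders standing in for real JSON parsing):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SkippingDeserializer {

    // Mirrors the DeserializationSchema contract: return the parsed value,
    // or null so the consumer silently skips the corrupted record.
    public static Long deserialize(byte[] message) {
        try {
            return Long.parseLong(new String(message, StandardCharsets.UTF_8));
        } catch (NumberFormatException e) {
            return null; // corrupted record: swallow the error, signal "skip"
        }
    }

    public static void main(String[] args) {
        List<Long> results = new ArrayList<>();
        for (String raw : new String[] {"42", "not-a-number", "7"}) {
            Long value = deserialize(raw.getBytes(StandardCharsets.UTF_8));
            if (value != null) { // drop nulls, as the Kafka consumer would
                results.add(value);
            }
        }
        System.out.println(results); // [42, 7]
    }
}
```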
Looking at my previous mail, which mentions changes to the API, optimizer, and
runtime code of the DataSet API, this would be a major and non-trivial
effort, and it would also require that a committer spend a good amount of time
on it.
2018-01-16 10:07 GMT+01:00 Flavio Pompermaier :
>
Alternatively, you can also create a keyed MapState as
context.getKeyedStateStore().getMapState() in
CheckpointedFunction.initializeState().
2018-01-16 9:58 GMT+01:00 Fabian Hueske :
> Sorry for the late response.
>
> MapState is currently only supported as keyed state but not
IMHO, this looks like a bug, and it makes sense that you only see this
with an HA setup:
The JobFound message contains the ExecutionGraph which, however, does
not implement the Serializable interface. Without HA, when browsing the
web interface, this message is (probably) not serialized since it
Just a guess, but probably our logging initialisation changes the global
log level (see conf/log4j.properties). DataStream.collect() executes the
program along with creating a local Flink "cluster" (if you are testing
locally / in an IDE) and initializing logging, among other things.
Please
Just found a nice (but old) blog post that explains Flink's integration
with Kafka:
https://data-artisans.com/blog/kafka-flink-a-practical-how-to
I guess, the basics are still valid
Nico
On 16/01/18 11:17, Nico Kruber wrote:
> Hi Jason,
> I'd suggest to start with [1] and [2] for getting the
Hi George,
I suspect issuing a read operation for every 68 bytes incurs too much
overhead to perform as you would like it to. Instead, create a bigger
buffer (64k?) and extract single events from sub-regions of this
buffer.
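That suggestion can be sketched in plain Java NIO (the 68-byte fixed event size is taken from the question; everything else is an illustrative assumption):

```java
import java.nio.ByteBuffer;

public class EventBufferReader {
    static final int EVENT_SIZE = 68; // fixed-size events, as in the question

    // How many complete events the buffer currently holds; any trailing
    // partial event stays in the buffer for the next read.
    public static int completeEvents(ByteBuffer buffer) {
        return buffer.remaining() / EVENT_SIZE;
    }

    // Copy out the i-th event from its sub-region of the big buffer.
    public static byte[] eventAt(ByteBuffer buffer, int index) {
        byte[] event = new byte[EVENT_SIZE];
        int base = buffer.position() + index * EVENT_SIZE;
        for (int i = 0; i < EVENT_SIZE; i++) {
            event[i] = buffer.get(base + i); // absolute get, position untouched
        }
        return event;
    }

    public static void main(String[] args) {
        // One 64k read instead of hundreds of 68-byte reads.
        ByteBuffer buffer = ByteBuffer.allocate(64 * 1024);
        buffer.put(new byte[3 * EVENT_SIZE + 10]); // 3 full events + a partial one
        buffer.flip();
        System.out.println(completeEvents(buffer)); // 3
    }
}
```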
Please note, however, that then the first buffer will only be
Do you think it is that complex to support? I think we can try to implement
it if someone could give us some support (at least the big picture).
On Tue, Jan 16, 2018 at 10:02 AM, Fabian Hueske wrote:
> No, I'm not aware of anybody working on extending the Hadoop compatibility
Hi James,
In this scenario, with the restart strategy set, the job should restart
(without YARN/Mesos) as long as you have enough slots available.
Can you check with the web interface on http://:8081/ that
enough slots are available after killing one TaskManager?
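For reference, a fixed-delay restart strategy can be set cluster-wide in flink-conf.yaml; the values below are illustrative, not from the original mail:

```yaml
# flink-conf.yaml -- restart up to 3 times, waiting 10 s between attempts
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s
```

With this in place, a failed job is restarted automatically, provided enough free slots remain in the cluster.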
Can you provide JobManager and
No, I'm not aware of anybody working on extending the Hadoop compatibility
support.
I'll also have no time to work on this any time soon :-(
2018-01-13 1:34 GMT+01:00 Flavio Pompermaier :
> Any progress on this Fabian? HBase bulk loading is a common task for us
> and it's
This depends on the requirements of your application.
Using the usual watermark generation strategies, which are purely
data-driven, a stream that does not produce data would not advance its
watermarks.
Not advancing the watermarks means that the program cannot make progress.
This might also be
Hi,
I have added the code below to the start of processElement2 in
CoProcessFunction. It prints timestamps and watermarks for the first 3
elements for each new watermark. Shouldn't the timestamp always be
lower than the next watermark? The 3 timestamps before the last
watermark are all larger
I've opened https://issues.apache.org/jira/browse/FLINK-8437
Unfortunately, I doubt we can fix this properly. The proposed solution
will not work if we ever allow arbitrary functions to use side outputs.
On 16.01.2018 08:59, Juho Autio wrote:
Could someone with knowledge of the right terms
What I had in mind was a generic handling of the JsonParseException case. But you are right: the picture becomes fuzzier if we also consider messages that are parseable but invalid due to missing or invalid fields. We could imagine a deeper message-validation feature, but I think subclassing
Hi,
this indeed indicates that a REST handler is requesting the ExecutionGraph
from a JobManager which does not run in the same ActorSystem. Could you
please tell us your exact HA setup? Are you running Flink on YARN with HA,
or do you use standalone HA with standby JobManagers?
It would be
Nice, I didn't even read that far myself :P
-> turns out the API was prepared for that after all
I'm not sure about a default option for handling/skipping corrupted
messages since the handling of those is probably highly use-case
specific. If you nonetheless feel that this should be in there,
If you pass a KeyedDeserializationSchema to
new FlinkKafkaConsumer08(topic, keyedDeserializationSchema, properties),
you'll implement a method like this:

    public T deserialize(byte[] messageKey, byte[] message, String topic,
                         int partition, long offset) throws IOException {
    }

Then just
Thanks Chesnay,
So I think that to support a multi-input and multi-output model like the one
the Dataflow paper describes, Flink needs credit-based scheduling as well as
side-input support, plus a new set of DataStream APIs that is not constrained
by backwards-compatibility issues. Only then can
I think I found the issue. I'd just like to verify that my reasoning is
correct.
We had the following keys in our flink-conf.yaml
jobmanager.web.address: localhost
jobmanager.web.port: 8081
This worked on flink 1.3.2
But on flink 1.4.0 this check
Hi all,
I'm trying to debug a Python script with a Flink job using IntelliJ.
I'm using the current snapshot (1.5, cloned today). In former versions, I
could simply run org.apache.flink.python.api.PythonPlanBinder from
within the IDE. At the moment, I'm getting NoClassDefFoundErrors from
classes of the
A question came up from my colleagues about canary deploys and Flink. We had a
hard time understanding how we could do a canary deploy without constructing a
new cluster and deploying the job there. If you have a canary deploy model, how
do you do this?
Thanks for your help!
Ron
Hi all,
At first my state should not be "that" big and should fit in memory, so
FsStateBackend could be a solution for me. However, moving forward, I
envision more features, more users, and the state growing. With that in
mind, RocksDBStateBackend might be the solution.
Is there an easy "upgrade" path
Hi Gordon
Thanks a lot!
So far I used AbstractDeserializationSchema.
I will try the class you mentioned.
Regards
On 2018/01/17 2:48, Gordon Weakliem wrote:
If you pass a KeyedDeserializationSchema to
new FlinkKafkaConsumer08(topic, keyedDeserializationSchema, properties),
you'll implement a
Folks, sorry for being late on this. Can somebody with knowledge of
this code base create a JIRA issue for the above? We have seen this more
than once in production.
On Mon, Oct 9, 2017 at 10:21 AM, Aljoscha Krettek
wrote:
> Hi Vishal,
>
> Some relevant Jira issues for
Hi Nico,
Thanks a lot. I did consider that, but I had missed the clarification of the contract in the piece of documentation you pointed to: "returning null to allow the Flink Kafka consumer to silently skip the corrupted message".
I suppose it could be an improvement for
Thanks Piotr and Stefan,
The problem was the overhead in the JobManager's heap memory usage when
increasing the number of retained checkpoints. It was solved once I
reverted that value to one.
BR
That's the actual error according to the JobManager log at the time of the OOM:
2018-01-08 22:27:09,293 WARN