Ufuk & Robert,
There's a good chance you're right! On the EMR master node, where
yarn-session.sh is run, /etc/hadoop/conf/yarn-site.xml says that
"yarn.nodemanager.resource.cpu-vcores" is 4.
Meanwhile, on the core nodes, the value in that file is 8.
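For reference, the setting in question lives in /etc/hadoop/conf/yarn-site.xml as a standard Hadoop property (the value shown here is the core-node value reported above):

```xml
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
</property>
```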
Shall I submit a JIRA? This might be
I think so far getExecutionPlan() was only used for debugging purposes and
not in programs that would also be executed.
You can open a JIRA issue if you think that this would be a valuable feature.
Thanks, Fabian
2017-01-13 16:34 GMT+01:00 Robert Schmidtke :
> Just a side
Hi Kat,
I did not understand the difference between a case and a trace.
If I got it right, the goal of your first job is to assemble the individual
events into cases. Is a case here the last event for a case-id or all
events of a case-id?
If a case is the collection of all events (which I assume)
Hi Matt,
it is not possible to share an object across different tasks of the same
operator or even different operators.
This would be globally mutable state, which is in general hard to make
efficient in distributed systems.
Something that might work is to use a CoFlatMapOperator with one input
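To illustrate the idea (in plain Java, not actual Flink API — the class and method names here are only illustrative): a co-flatmap has two inputs, so one input can carry updates to the shared object while the other carries the records to process. Each parallel task keeps its own local copy of the object, which avoids globally mutable state:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the CoFlatMap idea: input 1 updates an operator-local "model",
// input 2 is processed against whatever model was last seen. Each parallel
// task instance would hold its own copy; nothing is shared across tasks.
public class CoFlatMapSketch {
    private String model = "default";               // operator-local state
    private final List<String> out = new ArrayList<>();

    // flatMap1: model updates arrive on the first input
    public void flatMap1(String newModel) {
        this.model = newModel;
    }

    // flatMap2: records arrive on the second input and are processed
    // using the current model
    public void flatMap2(String record) {
        out.add(record + "/" + model);
    }

    public List<String> results() {
        return out;
    }
}
```

Note that records seen before any model update are processed with the initial model; in a real job you would have to decide whether to buffer them instead.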
Hi,
Let's say we have multiple subtask chains and all of them are executing in
the same task manager slot (i.e. in the same JVM).
What's the point in serializing data between them?
Can it be disabled?
The reason I want to keep different chains is that some subtasks should be
executed in parallel to
Hi Andrew,
Your observations are correct. Like you mentioned, the current problem circles
around how we deal with the pending buffered requests in accordance with
Flink’s checkpointing.
I’ve filed a JIRA for this, as well as some thoughts for the solution in the
description:
One thing to add: the Flink KafkaProducer provides only at-least-once if
flush-on-checkpoint is enabled [1].
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.1/api/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaProducerBase.html#setFlushOnCheckpoint-boolean-
2017-01-13
Hi,
I am reading messages off a Kafka Topic and want to process the messages
through Flink and save them into S3. It was pointed out to me that stream
processing of the Kafka data won't be saved to S3 because S3 doesn't allow
data to be appended to a file, so I want to convert the Kafka stream
Hi Dawid,
I'll try to reproduce the error in the next couple of days. Can you also share
the value deserializer you use? Also, have you tried even smaller examples in
the meantime? Did they work?
As a side-note in general regarding the queryable state "sink" using ListState
Hi,
I'm afraid there is no functionality for this in Flink. What you can do,
however, is to not evict these elements from the window buffer but instead
ignore them when processing your elements in the WindowFunction. This way
they will be preserved for the next firing. You have to make sure to
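The "ignore instead of evict" idea can be sketched like this (plain Java, not the Flink WindowFunction API; names are illustrative). The window buffer keeps all elements, and on each firing only elements newer than the last firing are processed:

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: elements stay in the window buffer across firings; on each firing
// we process only elements with a timestamp newer than the last firing, and
// simply skip (but do not evict) the rest.
public class IgnoreInsteadOfEvict {
    private long lastFiredTimestamp = Long.MIN_VALUE;

    // bufferTimestamps: timestamps of all elements currently in the buffer.
    // Returns the subset that is actually processed in this firing.
    public List<Long> fire(List<Long> bufferTimestamps) {
        List<Long> processed = bufferTimestamps.stream()
                .filter(ts -> ts > lastFiredTimestamp)
                .collect(Collectors.toList());
        if (!processed.isEmpty()) {
            lastFiredTimestamp = processed.stream().max(Long::compare).get();
        }
        return processed;
    }
}
```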
Errata: How can an *object (such as the classifier, line 1)* be accessed by
any Flink node [...]
Just in case: the classifier itself can't be serialized, I believe; it's
part of a framework which I can't modify. In any case, even if it's
serialized, I guess the cost of moving it to one node and
Hi Fabian,
thanks for the quick and comprehensive reply. I'll have a look at the
ExecutionPlan using your suggestion to check what actually gets computed,
and I'll use the properties as well. If I stumble across something else
I'll let you know.
Many thanks again!
Robert
On Fri, Jan 13, 2017 at
Hi Fabian,
Yes, OrderA is a table of POJO.
But what confuses me is the constructor of PojoTypeInfo:
the order of the input parameter (fields) is correct, but it changes after the
sort operation, and I wonder if the sort operation can be removed?
public PojoTypeInfo(Class<T> typeClass, List<PojoField> fields) {
Hi Yuhong,
as a solution you can specify the order of your Pojo fields when
converting from DataStream to Table.
Table table = tableEnv
.fromDataSet(env.fromCollection(data), "department AS a, " +
"age AS b, " +
"salary AS c, " +
"name AS d")
.select("a, b, c, d");
Hi Shannon,
Flink is reading the number of available vcores from the local YARN
configuration. Is it possible that the YARN / Hadoop config on the machine
where you are submitting your job from sets the number of vcores to 4?
On Fri, Jan 13, 2017 at 12:51 AM, Shannon Carey
Hi,
I'm now using streaming SQL, and I have SQL like
select * FROM OrderA where user > 2
OrderA has 3 attributes (user, product, amount),
and I expect the result to keep the same field order as the input, but it has
been sorted by attribute name.
I also found the order has already been sorted when calling addSource.
I think the sorting is done for consistency reasons, i.e., that all
PojoTypeInfos for the same class behave the same.
Since this code is used in many parts of Flink and many jobs (DataSet,
DataStream, etc.) I would be very careful to change the default behavior
here.
Maybe we can add a
I have been playing around with Flink for a few weeks to try to
ascertain whether or not it meets our use cases, and also what best
practices we should be following. I have a few questions I would
appreciate answers to.
Our scenario is that we want to process a lot of event data into
cases. A
Hi Huang,
this seems to be very strange, because the JobManager’s actor system has
bound to the address 9-96-101-177 instead of 9.96.101.177. It seems as if
the dots have been replaced by dashes.
Could you maybe tell me which version of Flink you’re running and also
share the complete JobManager log with
I tested the Table API / SQL a bit.
I implemented a windowed aggregation with the streaming Table API and it
produced the same results as a DataStream API implementation.
Joining a stream with a TableFunction also seemed to work well.
Moreover, I checked the results of a bunch of TPC-H queries
Hi all,
I'm having some trouble grasping the meaning of, and the difference between,
the following concepts:
- Split
- Group
- Partition
Let me elaborate a bit on the problem I'm trying to solve here. In my tests
I'm using a 5-node cluster, on which I'm running Flink 1.1.3 in standalone
mode.
Hi Robert,
let me first describe what splits, groups, and partitions are.
* Partition: This is basically all data that goes through the same task
instance. If you have an operator with a parallelism of 80, you have 80
partitions. When you call sortPartition() you'll have 80 sorted streams, if
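The partition/sortPartition relationship described above can be sketched like this (plain Java, not the Flink API; the hash-based routing is only illustrative). Data is split across parallel task instances, and sortPartition() sorts each partition independently, so there is no global order:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: route each value to one of `parallelism` partitions (here by
// hashCode), then sort each partition on its own -- one sorted stream per
// task instance, no order across partitions.
public class SortPartitionSketch {
    public static List<List<Integer>> sortPartition(List<Integer> data, int parallelism) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int value : data) {
            partitions.get(Math.floorMod(Integer.hashCode(value), parallelism)).add(value);
        }
        for (List<Integer> p : partitions) {
            Collections.sort(p); // each partition sorted independently
        }
        return partitions;
    }
}
```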
Hi,
This is expected behaviour due to how the per-partition watermarks are designed
in the Kafka consumer, but I think it’s probably a good idea to handle idle
partitions also when the Kafka consumer itself emits watermarks. I’ve filed a
JIRA issue for this:
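The idle-partition effect follows from how per-partition watermarks are combined: the emitted watermark is the minimum over all partitions, so one partition that never receives data (and thus stays at its initial watermark) holds everything back. A minimal sketch of that combination logic (plain Java, names illustrative):

```java
import java.util.Map;

// Sketch: the consumer's emitted watermark is the minimum of the
// per-partition watermarks. An idle partition stuck at Long.MIN_VALUE
// therefore stalls the overall watermark indefinitely.
public class PerPartitionWatermarks {
    public static long emittedWatermark(Map<String, Long> partitionWatermarks) {
        return partitionWatermarks.values().stream()
                .mapToLong(Long::longValue)
                .min()
                .orElse(Long.MIN_VALUE);
    }
}
```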
Just a side note, I'm guessing there's a bug here:
https://github.com/apache/flink/blob/master/flink-clients/src/main/java/org/apache/flink/client/program/ContextEnvironment.java#L68
It should say createProgramPlan("unnamed job", false);
Otherwise I'm getting an exception complaining that no new
Hello,
If I missed the answer to this or some essential step of the documentation,
please do tell.
I am having the following problem while trying out the
org.apache.flink.graph.library.CommunityDetection algorithm of the Gelly
API (Java).
Specs: JDK 1.8.0_102 x64
Apache Flink: 1.1.4
Suppose I