A slot can run multiple tasks (different tasks of the same job); each task executes on its own thread.
ゞ野蠻遊戲χ wrote:
> Hi everyone,
>
> Can a slot run only one thread at a time, or can one slot run multiple
> threads in parallel?
>
> Thanks,
> Jiazhi
--
Sent from: http://apache-flink.147419.n8.nabble.com/
Has anyone discussed this question?
赵一旦 wrote on Mon, Nov 16, 2020 at 2:48 PM:
> To be more specific: the object returned from reduce() is emitted as the
> output of the reduce (does this involve immediate serialization?).
>
> reduce(new ReduceFunction<ObjCls>() {
>
>     @Override
>     public ObjCls reduce(ObjCls ele1, ObjCls ele2) {
>         long resultPv = ele1.getPv() + ele2.getPv();
>
>         ele1.setPv(999); //
Hi KristoffSC,
I'd strongly suggest not blocking the task thread if it involves external
services. RPC notifications cannot be processed and checkpoints are delayed
while the task thread is blocked. That's what AsyncIO is for.
If your third party library just takes a few ms to finish computation
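Flink's AsyncIO itself needs a running job to demonstrate, but the underlying idea can be sketched in plain Java: hand the blocking call to a separate pool via a CompletableFuture and consume the result in a callback, so the caller's thread never waits. The service name and latency below are invented for illustration:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncSketch {

    // Stand-in for a slow third-party/external call (name and latency invented).
    static String slowLookup(String key) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "value-for-" + key;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // The caller's thread (the "task thread") stays free: the blocking
        // work runs on the pool and the result arrives via a callback,
        // which is the pattern Flink's AsyncIO is built on.
        CompletableFuture.supplyAsync(() -> slowLookup("k1"), pool)
                .thenAccept(System.out::println)
                .join(); // joined here only so the demo prints before exiting
        pool.shutdown();
    }
}
```

In Flink the callback would complete the `ResultFuture` handed to an `AsyncFunction` instead of printing.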
You can use the docker setup from this article:
https://flink.apache.org/2020/07/28/flink-sql-demo-building-e2e-streaming-application.html
https://raw.githubusercontent.com/wuchong/flink-sql-demo/v1.11-EN/docker-compose.yml
The ts field in that container's data is in SQL timestamp format.
1. What type should a time field in the format above be parsed to in Flink SQL?
TIMESTAMP WITH LOCAL TIME ZONE; 1.12's
Hi
To unsubscribe, send an email to user-zh-unsubscr...@flink.apache.org; see the documentation for details [1]
[1]
https://flink.apache.org/community.html#how-to-subscribe-to-a-mailing-list
Best,
Congxian
回响 <939833...@qq.com> wrote on Tue, Nov 24, 2020 at 8:42 PM:
>
-help
Xev Orm wrote on Wed, Nov 25, 2020 at 12:25 PM:
> Unsubscribe
>
delete
The data source is the kafka messages from Jark's project
https://github.com/wuchong/flink-sql-submit; a user_behavior message looks like
{"user_id": "470572", "item_id":"3760258", "category_id": "1299190",
"behavior": "pv", "ts": "2017-11-26T01:00:01Z"}
As you can see, the ts value is '2017-11-26T01:00:01Z'. Now I want to define a Flink SQL source table for it, as follows:
CREATE TABLE user_log (
user_id
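One possible way to complete such a definition, sketched against Flink 1.12's JSON format (the topic and broker address are placeholders, and exact type and option support varies by version):

```sql
CREATE TABLE user_log (
  user_id STRING,
  item_id STRING,
  category_id STRING,
  behavior STRING,
  -- 'Z'-suffixed ISO-8601 instants map naturally to a local-zone timestamp
  ts TIMESTAMP(3) WITH LOCAL TIME ZONE
) WITH (
  'connector' = 'kafka',
  'topic' = 'user_behavior',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json',
  'json.timestamp-format.standard' = 'ISO-8601'
);
```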
I wrote a SinkFunction of my own that sinks to a database; it only executes the insert once a certain number of records (100) have arrived. I implemented this by keeping a List that buffers the records to be inserted.
public class SinkToJDBCWithJDBCStatementBatch extends RichSinkFunction {

    private List statementList = new ArrayList();

    @Override
    public void close() throws Exception {
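The buffering approach described above can be sketched independently of Flink as a small helper that collects records and hands a full batch to a flush callback; the names and the demo threshold here are made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchBufferDemo {

    // Minimal batch buffer: collects records and hands a copy of the batch
    // to the flush callback once `threshold` records have accumulated.
    static class BatchBuffer<T> {
        private final int threshold;
        private final Consumer<List<T>> flusher;
        private final List<T> buffer = new ArrayList<>();

        BatchBuffer(int threshold, Consumer<List<T>> flusher) {
            this.threshold = threshold;
            this.flusher = flusher;
        }

        void add(T record) {
            buffer.add(record);
            if (buffer.size() >= threshold) {
                flush();
            }
        }

        // Call this from close() as well, so a partial batch is not lost on shutdown.
        void flush() {
            if (!buffer.isEmpty()) {
                flusher.accept(new ArrayList<>(buffer));
                buffer.clear();
            }
        }
    }

    public static void main(String[] args) {
        List<String> flushed = new ArrayList<>();
        BatchBuffer<String> buf = new BatchBuffer<>(3, flushed::addAll);
        buf.add("a");
        buf.add("b");
        buf.add("c"); // third record triggers a flush
        buf.add("d");
        buf.flush();  // drain the partial batch
        System.out.println(flushed); // [a, b, c, d]
    }
}
```

In a RichSinkFunction, invoke() would call add() and close() would call flush(); note that records buffered only in memory can still be lost on failure unless they are also kept in checkpointed state.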
Yes, a slot can itself run multiple threads. But it cannot run multiple tasks of a single operator node, nor tasks of operator nodes from multiple jobs.
ゞ野蠻遊戲χ wrote on Wed, Nov 25, 2020 at 10:33 AM:
> Hi everyone,
>
> Can a slot run only one thread at a time, or can one slot run multiple
> threads in parallel?
>
> Thanks,
> Jiazhi
As the subject says: a standalone cluster, currently deployed so that every machine runs both a jobmanager (StandaloneSessionClusterEntrypoint) and a taskmanager.
The problem is that the Flink Web UI is very laggy when submitting or cancelling jobs. Sometimes it just lags and then recovers; sometimes several nodes of the cluster fail one after another (slots decrease, and sometimes come back automatically; I suspect a network issue).
(1) Is this caused by the performance of the machine the JobManager process runs on? Would running the JobManager on a dedicated machine help?
Each task will be assigned a dedicated thread for its data processing.
A slot can be shared by multiple tasks if they are in the same slot sharing
group[1].
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/concepts/runtime.html#task-slots-and-resources
Thanks,
Zhu
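For reference, the number of slots each TaskManager offers is a flink-conf.yaml setting (the value 4 below is just an example):

```yaml
# flink-conf.yaml: each TaskManager offers this many slots; tasks in the
# same slot sharing group (the default group is "default") may share a slot.
taskmanager.numberOfTaskSlots: 4
```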
ゞ野蠻遊戲χ
Hi all
Can a slot run only one thread at a time? Or can one
slot run multiple threads in parallel at the same time?
Thanks,
Jiazhi
Hi,
Could you open an issue on the community JIRA? If there is a bug, it also needs to be fixed in the 1.11 line.
Best regards
Leonard
[1] https://issues.apache.org/jira/projects/FLINK/issues/
> On Nov 24, 2020, at 01:03, macdoor wrote:
>
> Answering my own question, for others' reference.
>
> After switching to flink 1.12.0-rc1 and processing the same data with the
> same SQL, the results match what Hive computes, confirming a bug in 1.11.2
> that 1.12 appears to have fixed.
>
>
>
Hi Kevin Kwon ~
Do you want to customize only the source operator name, or all the operator
names, for the sake of state compatibility?
State compatibility is an orthogonal topic, and keeping the operator names is
one way to address it.
Kevin Kwon wrote on Wed, Nov 25, 2020 at 1:11 AM:
> For SQLs, I know that the
Unsubscribe
gfjia
Email: gfjia_t...@163.com
Thank you. This is very helpful.
On Mon, Nov 23, 2020 at 9:46 AM Till Rohrmann wrote:
> Hi George,
>
> Here is some documentation about how to deploy a stateful function job
> [1]. In a nutshell, you need to deploy a Flink cluster on which you can run
> the stateful function job. This can
Hi Arvid,
Thank you for your answer.
And what if a) would block the task's thread?
Let's say I'm ok with making the entire task thread wait on this third-party
lib.
In that case, would I be safe from this exception even though I'm not using
AsyncIO?
Hi everyone,
I have a question about how delayed messages work, I tried to dig through
some docs on it, but not sure it addresses exactly my question. Basically,
if I send a delayed message with exactly-once mode on, does Flink need to
wait until the delayed message sends to commit Kafka offsets?
Problem solved. It is because another function sends messages to myFunc
by using non hard coded ids. Thanks.
On Tue, Nov 24, 2020 at 11:24 AM Lian Jiang wrote:
> Hi,
>
> I am using statefun 2.2.0 and have below routing:
>
> downstream.forward(myFunc.TYPE, myFunc.TYPE.name(), message);
>
> I
Hi KristoffSC,
sorry for the confusing error message. In short, mailbox thread = task
thread.
Your operator a) calls collector.collect from a different thread (the one in
which the CompletableFuture is completed). However, all APIs must always be used
from the task thread.
The only way to cross thread
Hi,
I am using statefun 2.2.0 and have below routing:
downstream.forward(myFunc.TYPE, myFunc.TYPE.name(), message);
I expect this statement will create only one physical myFunc because
the id is hard coded with myFunc.TYPE.name().
This design can use the PersistedValue field in myFunc for all
Hi Timo and Dawid,
Thank you for the detailed answer; it looks like we need to reconsider the
whole job submission flow.
What is the best way to compare the new job graph? Can we use the Flink
visualizer to ensure that the new job graph shares the table as you mention?
Is that not guaranteed?
Best regards,
For SQLs, I know that the operator ID assignment is not possible now since
the query optimizer may not be backward compatible in each release
But are DDLs also affected by this?
for example,
CREATE TABLE mytable (
  id BIGINT,
  data STRING
) WITH (
  'connector' = 'kafka'
  ...
  'id' = 'mytable'
Hi Fuyao,
great that you could make progress.
2. Btw nice summary of the idleness concept. We should have that in the
docs actually.
4. By looking at tests like `IntervalJoinITCase` [1] it seems that we
also support FULL OUTER JOINs as interval joins. Maybe you can make use
of them.
5.
Hi,
I faced an issue on Flink 1.11. It has been a one-time thing so far and I
cannot reproduce it. However, I think something is lurking there...
I cannot post the full stack trace or the user code, but I will try to
describe the problem.
Setup without any resource groups with only one Operator chain
Btw, may I ask why you use the Stream API rather than Flink SQL directly?
On Wed, 25 Nov 2020 at 00:21, Jark Wu wrote:
> See the docs:
> https://github.com/ververica/flink-cdc-connectors/wiki/MySQL-CDC-Connector#setting-up-mysql-session-timeouts
>
> On Tue, 24 Nov 2020 at 23:54, yujianbo <15205029...@163.com> wrote:
>
>>
See the docs:
https://github.com/ververica/flink-cdc-connectors/wiki/MySQL-CDC-Connector#setting-up-mysql-session-timeouts
On Tue, 24 Nov 2020 at 23:54, yujianbo <15205029...@163.com> wrote:
> I. Environment:
> 1. Version: 1.11.2
> 2. Flink CDC syncing from MySQL to Kudu via the Stream API
>
> II. Observed problem:
>
Hi Patrick,
Flink supports regional failover [1] which only restarts all tasks
connected via pipelined data exchanges. Hence, either when having an
embarrassingly parallel topology or running a batch job, Flink should not
restart the whole job in case of a task failure.
However, in the case of
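The regional failover described above maps to a flink-conf.yaml setting (shown with the value that is the default in recent releases; check your version's configuration reference):

```yaml
# Restart only the failed region (the set of tasks connected via pipelined
# data exchanges) instead of the whole job; 'full' restarts everything.
jobmanager.execution.failover-strategy: region
```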
Hi,
one advice I can give you is to checkout the code and execute some of
the examples in debugging mode. Esp. within Flink's functions e.g.
MapFunction or ProcessFunction you can set a breakpoint and look at the
stack trace. This gives you a good overview about the Flink stack in
general.
I agree with Dawid.
Maybe one thing to add is that reusing parts of the pipeline is possible
via StatementSets in TableEnvironment. They allow you to add multiple
queries that consume from a common part of the pipeline (for example a
common source). But all of that is compiled into one big
Hi Flavio,
looking only at the code, then the job should first transition into a
globally terminal state before notifying the client about it. The only
possible reason I could see for this behaviour is that the
RestServerEndpoint uses an ExecutionGraphCache (DefaultExecutionGraphCache
is the
Hi all,
We are trying to set up regions so that Flink only stops the failing tasks of
a region instead of failing the entire stream.
We are using one main stream that is reading from a kafka topic and a bunch of
side outputs for processing each event from that topic differently.
For the
Just to close this thread I found the cause of the problem: looking into
the code of the mysql connector the value of
PropertyDefinitions.SYSP_disableAbandonedConnectionCleanup
is "com.mysql.cj.disableAbandonedConnectionCleanup" and not
"com.mysql.disableAbandonedConnectionCleanup" as stated in
For debugging you can also implement a simple non-parallel source using
`org.apache.flink.streaming.api.functions.source.SourceFunction`. You
would need to implement the run() method with an endless loop after
emitting all your records.
Regards,
Timo
On 24.11.20 16:07, Klemens Muthmann
Hi,
Really sorry for the late reply.
To the best of my knowledge there is no such possibility to "attach" to
a source/reader of a different job. Every job would read the source
separately.
`The GenericInMemoryCatalog is an in-memory
implementation of a catalog. All objects will be available only
I. Environment:
1. Version: 1.11.2
2. Flink CDC syncing from MySQL to Kudu via the Stream API
II. Observed problem:
1. We have already synced several MySQL tables to Kudu; each is around 30 million rows.
But one MySQL table has hit the same problem on several sync attempts. As far as I can tell it happens in the full-snapshot phase, before the incremental phase begins.
The error log is below. For now I plan to try "autoReconnect=true" to work around it, but I am not sure where to add it, and from the log it looks like treating the symptom rather than the cause. The key question is: why are no packets being sent, causing the stall?
The specific error follows:
Hi,
the best way learning Flink's source code would be to dive into it [1].
[1] https://github.com/apache/flink
Cheers,
Till
On Tue, Nov 24, 2020 at 3:43 AM 心剑 <2752980...@qq.com> wrote:
> Excuse me, I want to learn the flink source code. Do you have any good
> information and the latest
Ok, thank you all for the help!
From: David Anderson
Sent: 24 November 2020 15:16
To: Simone Cavallarin
Cc: user@flink.apache.org
Subject: Re: Print on screen DataStream content
Simone,
What you want to do is to override the toString() method on Event so
Simone,
What you want to do is to override the toString() method on Event so that
it produces a more helpful String as its result, and then use
stream.print()
in your IDE (where stream is a DataStream).
By the way, printOrTest(stream) isn't part of Flink -- that's just
something used by the
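A toString() override along those lines might look like this; the Event fields are invented for illustration:

```java
public class Event {
    private final String key;
    private final long timestamp;

    public Event(String key, long timestamp) {
        this.key = key;
        this.timestamp = timestamp;
    }

    @Override
    public String toString() {
        // Without this override, printing falls back to Object.toString(),
        // which looks like "Event@1b6d3586".
        return "Event{key=" + key + ", timestamp=" + timestamp + "}";
    }

    public static void main(String[] args) {
        // prints Event{key=sensor-1, timestamp=1606220461000}
        System.out.println(new Event("sensor-1", 1606220461000L));
    }
}
```

With that in place, stream.print() shows these readable strings in the IDE console.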
Hi,
yes, I would like to debug locally on my IDE.
This is what I tried so far, but no luck.
a) String ff = result.toString();
   System.out.print(ff);
b) printOrTest(stream);
c) stream.print();
d) System.out.println(stream.print());
This is the output and to me it looks
Hi,
Thanks for your reply. I am using processing time instead of event time,
since we do get the events in batches and some might arrive days later.
But for my current dev setup I just use a CSV dump of finite size as
input. I will hand over the pipeline to some other guys, who will need
to
When I run the test cases locally, sometimes a bunch of Scala files report errors, yet the whole project compiles fine. Could someone explain what to do in this situation? Can the Scala files be ignored?
Hi Saksham,
could you tell us a bit more about your deployement where you run Flink.
This seems to be the root exception:
2020-11-24 11:11:16,296 ERROR
org.apache.flink.runtime.rest.handler.taskmanager.TaskManagerStdoutFileHandler
[] - Failed to transfer file from TaskExecutor
I implemented approximate UV counting with HLL in a custom AggregateFunction. The problem is that HyperLogLog comes from a third-party package; how do I get Flink to recognize it? I don't know how to write the TypeInformation for it.
The code is as follows:
import io.airlift.slice.Slices;
import io.airlift.stats.cardinality.HyperLogLog;
import org.apache.flink.table.functions.AggregateFunction;
import org.slf4j.Logger;
import
Ok, thanks Benchao for the explanation!
Benchao Li wrote on Tue, Nov 24, 2020 at 7:49 PM:
> I can tell from this line of code:
>
> https://github.com/yangyichao-mango/flink-protobuf/blob/616051d74d0973136f931189fd29bd78c0e5/src/main/java/flink/formats/protobuf/ProtobufRowDeserializationSchema.java#L107
>
> The community does not yet officially support ProtoBuf
Hi Klemens,
what you are observing are reasons why event-time should be preferred
over processing-time. Event-time uses the timestamp of your data, while
processing-time is too basic for many use cases. Especially when you want to
reprocess historic data, you want to do that at full speed instead of
Hi All,
I am running a job in flink, and somehow the job is failing and the task
manager is dropping out of the pool unexpectedly.
Some heartbeat timeout exceptions are also occurring.
Thanks,
Saksham
2020-11-24 11:07:44,594 INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint[] -
Hi Simone,
if you are just executing DataStream pipelines locally in your IDE while
prototyping, you should be able to use `DataStream#print()`, which just
prints to standard out [1] (It might be hidden between the log messages).
For debugging locally, you can also just set breakpoints in
I can tell from this line of code:
https://github.com/yangyichao-mango/flink-protobuf/blob/616051d74d0973136f931189fd29bd78c0e5/src/main/java/flink/formats/protobuf/ProtobufRowDeserializationSchema.java#L107
The community does not yet officially support a ProtoBuf format, but there is already a related issue and discussion [1]
[1] https://issues.apache.org/jira/browse/FLINK-18202
Ok, that's fine by me; just add an @Internal annotation on the
RestClusterClient if it is intended only for internal use. But wouldn't it be
easier to provide some sort of client generation facility (e.g. swagger or
similar)?
On Tue, Nov 24, 2020 at 11:38, Till Rohrmann wrote:
> I see the point in
Sources and sinks defined in a YAML file are discovered via the old factory,
while the debezium format only implements the new interface, so it cannot be
found. There are also no plans to support the new interface in YAML, because
the YAML approach has been deprecated.
See this issue: https://issues.apache.org/jira/browse/FLINK-20260
Best,
Jark
On Tue, 24 Nov 2020 at 18:52, jy l wrote:
> Hi:
> flink version 1.12.0:
>
>
Hi:
Flink version 1.12.0:
I want to configure a table in sql-client-defaults.yaml, as follows:
tables:
  - name: t_users
    type: source-table
    connector:
      property-version: 1
      type: kafka
      version: universal
      topic: ods.userAnalysis.user_profile
      startup-mode: latest-offset
Hi Kevin,
I expect the 1.12.0 release to happen within the next 3 weeks.
Cheers,
Till
On Tue, Nov 24, 2020 at 4:23 AM Yang Wang wrote:
> Hi Kevin,
>
> Let me try to understand your problem. You have added the trusted keystore
> to the Flink app image(my-flink-app:0.0.1)
> and it could not be
I see the point in having a richer RestClusterClient. However, I think we
first have to make a decision whether the RestClusterClient is something
internal or not. If it is something internal, then only extending the
RestClusterClient and not adding these convenience methods to ClusterClient
could
I tried `DataStream#print()` but I don't quite understand how to use
it. Could you please give me an example? I'm using IntelliJ, so what I would
need is just to see the data on my screen.
Thanks
From: David Anderson
Sent: 24 November 2020 10:01
To:
When Flink is running on a cluster, `DataStream#print()` prints to files in
the log directory.
Regards,
David
On Tue, Nov 24, 2020 at 6:03 AM Pankaj Chand
wrote:
> Please correct me if I am wrong. `DataStream#print()` only prints to the
> screen when running from the IDE, but does not work
Hi,
I have written an Apache Flink Pipeline containing the following piece
of code (Java):
stream.window(ProcessingTimeSessionWindows.withGap(Time.seconds(50)))
    .aggregate(new CustomAggregator())
    .print();
If I run the pipeline using local execution I see the following
behavior. The
Hello everybody,
these days I have been trying to use the JobListener to implement a simple
logic in our platform that consists in calling an external service to
signal that the job has ended and, in case of failure, save the error cause.
After some problems to make it work when starting a job
Where can you tell that from? Please advise. Also, how do I make it use the schema written in the DDL?
Benchao Li wrote on Tue, Nov 24, 2020 at 4:33 PM:
> It looks like this format auto-derives the schema instead of using the
> schema written in the DDL.
>
> zilong xiao wrote on Tue, Nov 24, 2020 at 4:13 PM:
>
> > I'm using Flink 1.11, but with a format written by someone else, so I
> > guess the bug is in there:
> > https://github.com/yangyichao-mango/flink-protobuf
> >
> > Benchao Li
It looks like this format auto-derives the schema instead of using the schema
written in the DDL.
zilong xiao wrote on Tue, Nov 24, 2020 at 4:13 PM:
> I'm using Flink 1.11, but with a format written by someone else, so I guess
> the bug is in there:
> https://github.com/yangyichao-mango/flink-protobuf
>
> Benchao Li wrote on Tue, Nov 24, 2020 at 3:43 PM:
>
> > Your DDL looks fine.
> >
> > Which Flink version are you using?
> > Also, could you share a more complete stack trace?
> >
> >
Sorry, that error was a memory issue. What I meant to report is the error below.
2020-11-24 16:19:33,569 ERROR
org.apache.flink.kubernetes.kubeclient.Fabric8FlinkKubeClient [] - A Kubernetes
exception occurred.
java.net.UnknownHostException: tuiwen-flink-rest.flink: Name or service not
known
at
I'm using Flink 1.11, but with a format written by someone else, so I guess
the bug is in there:
https://github.com/yangyichao-mango/flink-protobuf
Benchao Li wrote on Tue, Nov 24, 2020 at 3:43 PM:
> Your DDL looks fine.
>
> Which Flink version are you using?
> Also, could you share a more complete stack trace?
>
> zilong xiao wrote on Tue, Nov 24, 2020 at 2:54 PM:
>
> > Hi Benchao, the image is at https://imgchr.com/i/DtoGge; looking forward
> > to your answer!
> >
> >
Hi,
This is a known issue [1][2]. In newer versions we simply disabled these functions in the hive module [3]; I'd suggest working around it with Flink's built-in functions for now.
[1] https://issues.apache.org/jira/browse/FLINK-16688
[2] https://issues.apache.org/jira/browse/FLINK-16618
[3] https://issues.apache.org/jira/browse/FLINK-18995
On Tue, Nov 24, 2020 at 11:54 AM 酷酷的浑蛋 wrote:
>
Starting Flink with -Dkubernetes.rest-service.exposed.type=ClusterIP reports the following error:
2020-11-24 15:49:19,796 INFO
org.apache.flink.configuration.GlobalConfiguration
[] - Loading configuration property: jobmanager.rpc.address,
0.0.0.0
2020-11-24 15:49:19,800 INFO