[jira] [Created] (FLINK-14538) Metrics of TaskManager Status.JVM.Memory.NonHeap.Max is always -1
Shuwen Zhou created FLINK-14538:
-----------------------------------

             Summary: Metrics of TaskManager Status.JVM.Memory.NonHeap.Max is always -1
                 Key: FLINK-14538
                 URL: https://issues.apache.org/jira/browse/FLINK-14538
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Metrics
    Affects Versions: 1.9.0
            Reporter: Shuwen Zhou
         Attachments: image-2019-10-28-12-34-53-413.png

I'm collecting TaskManager status metrics into DataDog and noticed that the TaskManager metric Status.JVM.Memory.NonHeap.Max is always -1.

!image-2019-10-28-12-34-53-413.png!

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
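For context on why -1 shows up here: Flink's JVM memory metrics are, as far as I understand, read from the JDK's MemoryMXBean, whose Javadoc specifies that MemoryUsage.getMax() returns -1 when the maximum is undefined — the default for the non-heap area unless a limit such as -XX:MaxMetaspaceSize is set. A minimal sketch (the class name is made up):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class NonHeapMaxDemo {
    public static void main(String[] args) {
        // Flink's Status.JVM.Memory.NonHeap.* metrics are (to my understanding)
        // backed by the standard MemoryMXBean. Its Javadoc states that getMax()
        // returns -1 when the maximum memory size is undefined, which is the
        // usual case for the non-heap area without an explicit limit.
        MemoryUsage nonHeap = ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage();
        System.out.println("NonHeap.Max = " + nonHeap.getMax()); // typically -1

        // MemoryUsage models "undefined" explicitly as max == -1:
        MemoryUsage undefined = new MemoryUsage(0L, 0L, 0L, -1L);
        System.out.println(undefined.getMax()); // prints -1
    }
}
```

In other words, the -1 is the JVM reporting "no configured maximum", not a collection failure; whether the metric should translate that into something friendlier for DataDog is the question this ticket raises.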
Re: [VOTE] FLIP-78: Flink Python UDF Environment and Dependency Management
Hi Max,

Thanks for your feedback. You are right, we really need a more generic
solution. I volunteer to draft an initial solution design doc and bring up
the discussion on the Beam dev@ list ASAP (maybe after the release of
Flink 1.10).

Thank you for voting.

Best,
Jincheng

Maximilian Michels wrote on Sat, Oct 26, 2019 at 1:05 AM:

> Hi Wei, hi Jincheng,
>
> +1 on the current approach.
>
> I agree it would be nice to allow for the Beam artifact staging to use
> Flink's BlobServer. However, the current implementation which uses the
> distributed file system is more generic, since the BlobServer is only
> available on the TaskManager and not necessarily inside Harness
> containers (Stage 3).
>
> So for stage 1 (Client <=> JobServer) we could certainly use the
> BlobServer. Stage 2 (Flink job submission) already uses it, and Stage 3
> (container setup) probably has to have some form of distributed file
> system or directory which has been populated with the dependencies.
>
> Thanks,
> Max
>
> On 25.10.19 03:45, Wei Zhong wrote:
> > Hi Max,
> >
> > Are there any other concerns from your side? I would appreciate it if
> > you could give some feedback and vote on this.
> >
> > Best,
> > Wei
> >
> >> On Oct 25, 2019, at 09:33, jincheng sun wrote:
> >>
> >> Hi Thomas,
> >>
> >> Thanks for your explanation. I understand your original intention and
> >> will seriously consider this issue. After I have an initial solution,
> >> I will bring up a further discussion on the Beam ML.
> >>
> >> Thanks for your voting. :)
> >>
> >> Best,
> >> Jincheng
> >>
> >> Thomas Weise wrote on Fri, Oct 25, 2019 at 7:32 AM:
> >>
> >>> Hi Jincheng,
> >>>
> >>> Yes, this topic can be further discussed on the Beam ML. The only
> >>> reason I brought it up here is that it would be desirable from the
> >>> Beam Flink runner perspective for the artifact staging mechanism
> >>> that you work on to be reusable.
> >>>
> >>> Stage 1 in Beam is also up to the runner; artifact staging is a
> >>> service discovered from the job server, and the fact that the Flink
> >>> job server currently uses DFS is not set in stone. My interest was
> >>> more regarding assumptions about the artifact structure, which may
> >>> or may not allow for a reusable implementation.
> >>>
> >>> +1 for the proposal otherwise
> >>>
> >>> Thomas
> >>>
> >>> On Mon, Oct 21, 2019 at 8:40 PM jincheng sun wrote:
> >>>
> >>>> Hi Thomas,
> >>>>
> >>>> Thanks for sharing your thoughts. I think improving and solving the
> >>>> limitations of the Beam artifact staging is a good topic (for Beam).
> >>>>
> >>>> As I understand it:
> >>>>
> >>>> For Beam (data):
> >>>>   Stage 1: BeamClient --> JobService (data is uploaded to DFS).
> >>>>   Stage 2: JobService (FlinkClient) --> FlinkJob (the operator
> >>>>     downloads the data from DFS).
> >>>>   Stage 3: Operator --> Harness (artifact staging service).
> >>>>
> >>>> For Flink (data):
> >>>>   Stage 1: FlinkClient (local data is uploaded to the BlobServer
> >>>>     using the distributed cache) --> Operator (data is downloaded
> >>>>     from the BlobServer). No dependency on DFS.
> >>>>   Stage 2: Operator --> Harness (for Docker we use the artifact
> >>>>     staging service).
> >>>>
> >>>> So Beam has to depend on DFS in Stage 1, and Stage 2 can use the
> >>>> distributed cache if we remove the dependency on DFS for Beam in
> >>>> Stage 1 (of course we need more detail here). We can bring up that
> >>>> discussion on a separate Beam dev@ thread; the current discussion
> >>>> focuses on the Flink 1.10 UDF environment and dependency management
> >>>> for Python. So I recommend voting in the current ML thread for
> >>>> Flink 1.10, and discussing Beam artifact staging improvements
> >>>> separately on Beam dev@.
> >>>>
> >>>> What do you think?
> >>>>
> >>>> Best,
> >>>> Jincheng
> >>>>
> >>>> Thomas Weise wrote on Mon, Oct 21, 2019 at 10:25 PM:
> >>>>
> >>>>> Beam artifact staging currently relies on a shared file system and
> >>>>> there are limitations, for example when running locally with
> >>>>> Docker and a local FS. It sounds like a distributed cache based
> >>>>> implementation might be a good (better?) option for artifact
> >>>>> staging even for the Beam Flink runner?
> >>>>>
> >>>>> If so, can the implementation you propose be compatible with the
> >>>>> Beam artifact staging service so that it can be plugged into the
> >>>>> Beam Flink runner?
> >>>>>
> >>>>> Thanks,
> >>>>> Thomas
> >>>>>
> >>>>> On Mon, Oct 21, 2019 at 2:34 AM jincheng sun <
> >>>>> sunjincheng...@gmail.com> wrote:
> >>>>>
> >>>>>> Hi Max,
> >>>>>>
> >>>>>> Sorry for the late reply. Regarding the issue you mentioned
> >>>>>> above, I'm glad to share my thoughts:
> >>>>>>
> >>>>>>> For process-based execution we use Flink's cache distribution
> >>>>>>> instead of Beam's artifact staging.
> >>>>>>
> >>>>>> In the current design, we use Flink's cache distribution to
> >>>>>> upload users' files from client to cluster in both
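The reason the DFS-based staging is "more generic" is that it only assumes a path every participant can reach: whoever can see the shared directory can publish or fetch an artifact by name, whether that is the JobServer, an operator, or a harness container. A toy, stdlib-only sketch of that contract (StagingSketch and its methods are invented for illustration; this is neither Flink's nor Beam's actual API):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class StagingSketch {

    // Stage 1: the client publishes a dependency under a well-known name.
    static void upload(Path sharedDir, String name, byte[] artifact) {
        try {
            Files.write(sharedDir.resolve(name), artifact);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stage 2/3: any process with access to the shared path fetches the
    // artifact by name; no point-to-point connection back to the client
    // (or to a BlobServer) is needed.
    static byte[] download(Path sharedDir, String name) {
        try {
            return Files.readAllBytes(sharedDir.resolve(name));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path sharedDir = Files.createTempDirectory("staging"); // stands in for DFS
        upload(sharedDir, "udf-deps.zip", "python deps".getBytes(StandardCharsets.UTF_8));
        byte[] fetched = download(sharedDir, "udf-deps.zip");
        System.out.println(new String(fetched, StandardCharsets.UTF_8)); // prints "python deps"
    }
}
```

The BlobServer variant replaces the shared directory with a service reachable from TaskManagers, which is exactly why Max notes it cannot cover Stage 3 (harness containers) on its own.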
[ANNOUNCE] Weekly Community Update 2019/43
Dear community,

happy to share this week's community update with updates on some exciting ongoing efforts like unaligned checkpoints, the contribution of a Pulsar connector, the introduction of Executors, a new initiative around queryable state, a couple of bugs and a bit more.

Flink Development
==

* [development process] Gary encourages everyone involved in Flink development to join the builds@f.a.o mailing list to receive updates on build stability issues (e.g. from the nightly builds). You can subscribe via builds-subscr...@flink.apache.org. [1]

* [state] Piotr has proposed a change to FLIP-76 on unaligned checkpoints. For unaligned checkpoints, in-flight data needs to be stored as part of the checkpoint. Piotr proposes to continuously persist all stream records to avoid having to store a large amount of in-flight data during checkpointing. Details in the thread. [2]

* [state] Vino Yang has revived the discussion on improving queryable state in Flink. Specifically, he proposes to introduce a new component called "QueryableStateLocationService": instead of connecting to the QueryableStateProxy of a TaskManager, a client would connect to the location service to find the correct TaskManager to serve its query. [3]

* [network] The survey on non-credit-based flow control is over: it will be dropped in the next Flink release. [4]

* [connectors] For the contribution of the Pulsar connector(s), Yijie kicked off a discussion on a design document for a corresponding FLIP (FLIP-72). After some discussion, the initial FLIP will now mostly cover the sink. The source will be part of future work (after FLIP-27) and the catalog will get its own FLIP. [5]

* [client] Kostas has started (and concluded) a discussion on FLIP-81 to introduce new configuration options for Executors (FLIP-73). FLIP-81 mainly introduces configuration options for parameters that were previously only exposed via the CLI (e.g. the path to a savepoint for initial state). The vote is currently ongoing. [6,7]

* [sql] Two weeks ago Peter Huang started a discussion on FLIP-79, which adds support for functions in Flink's DDL. The primary purpose is the dynamic registration of user-defined functions via the DDL. [8]

[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/REMINDER-Ensuring-build-stability-td34158.html
[2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-76-Unaligned-checkpoints-td33651.html
[3] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Introduce-a-location-oriented-two-stage-query-mechanism-to-improve-the-queryable-state-td34265.html
[4] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/SURVEY-Dropping-non-Credit-based-Flow-Control-tp33714.html
[5] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-72-Introduce-Pulsar-Connector-tp33283.html
[6] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/FLIP-81-Executor-related-new-ConfigOptions-tp34236.html
[7] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-81-Executor-related-new-ConfigOptions-tp34290.html
[8] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Discussion-FLIP-79-Flink-Function-DDL-Support-tp33965.html

Notable Bugs
==

* [FLINK-14074] [1.9.1] [mesos] There is a bug in the way Flink starts TaskManagers on Mesos: after the initial set of TaskManagers is started (e.g. during the first job submission), any subsequent job submission will not trigger additional TaskManagers to be launched, although they are needed to run the job. Check the ticket for a workaround. Fixed for 1.9.2 and 1.10.0. [9]

* [FLINK-14524] [1.9.1] The JDBC table sink generates an invalid PostgreSQL query for upserts. Fixed in 1.9.2 and 1.10.0. [10]

* [FLINK-14470] [1.9.1] [1.8.2] Not particularly new, but still open and confusing: the Flink WebUI might not show all watermarks for Flink jobs with many tasks. [11]

[9] https://issues.apache.org/jira/browse/FLINK-14074
[10] https://issues.apache.org/jira/browse/FLINK-14524
[11] https://issues.apache.org/jira/browse/FLINK-14470

Events, Blog Posts, Misc
===

* Alibaba's food delivery business ele.me has published a blog post about their stream processing architecture and use cases for Flink. [10]
* There will be a Flink/Spark talk at the next Chicago Big Data meetup [11] on the 7th of November. No idea what it will be about (can't join the group). :)
* At the next Athens Big Data Group on the 14th of November, *Chaoran Yu* of Lightbend will talk about Flink and Spark on Kubernetes. [12]
* There will be a full-day meetup with six talks in the Bangalore Kafka Group on the 2nd of November, including at least three Flink talks by *Timo Walther* (Ververica), *Shashank Agarwal* (Razorpay) and *Rasyid Hakim* (GoJek). [13]

[10] https://hackernoon.com/flink-or-flunk-why-ele-me-is-developing-a-taste-for-apache-flink-7d2a74e4d6c0
[11]
[jira] [Created] (FLINK-14536) Make clear the way to aggregate specified cpuCores resources
Zhu Zhu created FLINK-14536:
-------------------------------

             Summary: Make clear the way to aggregate specified cpuCores resources
                 Key: FLINK-14536
                 URL: https://issues.apache.org/jira/browse/FLINK-14536
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination
    Affects Versions: 1.10.0
            Reporter: Zhu Zhu
             Fix For: 1.10.0

I'm raising this question because I find that {{cpuCores}} in {{ResourceSpec#merge}} is aggregated with {{max()}}, while in {{ResourceProfile#merge}} it is aggregated with {{sum()}}.

This means that when calculating the resources of a vertex from its operators, {{cpuCores}} is the max value, while it is a sum (or a subtraction in {{ResourceProfile#subtract}}) when dealing with shared-slot bookkeeping and related checks.

This is confusing to me, especially when I'm considering how to generate a shared-slot resource spec merged from all vertices in it (see FLINK-14314).

I'm not sure whether we already have a precise definition for this. If there is one, we need to respect it and change {{ResourceSpec}} or {{ResourceProfile}} accordingly. If not, we need to decide on one before we can move on with fine-grained resources.

It's worth mentioning that if we take the {{max()}} approach, we need to reconsider how to implement a correct {{ResourceProfile#subtract}} regarding {{cpuCores}}, since there is no clear way to reverse a {{max()}} computation.

cc [~trohrmann] [~wuzang]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
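The inconsistency the ticket describes is easy to state in code. A toy sketch (mergeByMax and mergeBySum are invented names standing in for the max()-based ResourceSpec#merge and the sum()-based ResourceProfile#merge):

```java
public class MergeSemantics {
    // ResourceSpec#merge-style aggregation of cpuCores: take the max.
    static double mergeByMax(double a, double b) { return Math.max(a, b); }

    // ResourceProfile#merge-style aggregation of cpuCores: take the sum.
    static double mergeBySum(double a, double b) { return a + b; }

    public static void main(String[] args) {
        double a = 0.5, b = 0.25;
        System.out.println(mergeByMax(a, b)); // prints 0.5
        System.out.println(mergeBySum(a, b)); // prints 0.75

        // sum() has an inverse, which is what ResourceProfile#subtract relies on:
        System.out.println(mergeBySum(a, b) - b); // prints 0.5, recovering a

        // max() does not: from mergeByMax(a, b) == 0.5 alone there is no way
        // to tell how the result should change when b is removed, which is
        // exactly the subtract problem the ticket points out.
    }
}
```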
Re: Build error: package does not exist
Hi Hynek,

Bruce is right, you should build the Flink source code before developing, by running `mvn clean package -DskipTests` in the root directory of Flink. This may take 10 minutes or more, depending on your machine.

Best,
Jark

On Sun, 27 Oct 2019 at 20:46, yanjun qiu wrote:

> Hi Hynek,
> I think you should run the maven build first: execute mvn clean install
> -DskipTests. This is because the Flink SQL parser uses the Apache Calcite
> framework to generate the SQL parser source code.
>
> Regards,
> Bruce
>
> > On Oct 27, 2019, at 12:09 AM, Hynek Noll wrote:
> >
> > The package seems to be missing on GitHub:
> >