[jira] [Created] (FLINK-14538) Metrics of TaskManager Status.JVM.Memory.NonHeap.Max is always -1

2019-10-27 Thread Shuwen Zhou (Jira)
Shuwen Zhou created FLINK-14538:
---

 Summary: Metrics of TaskManager Status.JVM.Memory.NonHeap.Max is 
always -1
 Key: FLINK-14538
 URL: https://issues.apache.org/jira/browse/FLINK-14538
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Metrics
Affects Versions: 1.9.0
Reporter: Shuwen Zhou
 Attachments: image-2019-10-28-12-34-53-413.png

I'm collecting TaskManager status metrics into DataDog.

However, the metric Status.JVM.Memory.NonHeap.Max is always -1 for every TaskManager:

!image-2019-10-28-12-34-53-413.png!
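
For context (my assumption, not confirmed in the ticket): Flink's JVM memory
metrics are typically backed by java.lang.management.MemoryMXBean, and
MemoryUsage.getMax() is specified to return -1 when no maximum is defined,
which is common for the non-heap area. A minimal sketch to check what the JVM
itself reports:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryUsage;

    public class NonHeapMaxProbe {
        public static void main(String[] args) {
            MemoryUsage nonHeap =
                    ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage();
            // getMax() returns -1 when the maximum is undefined, so a -1 metric
            // may simply be the JVM's answer rather than a Flink bug.
            System.out.println("non-heap used: " + nonHeap.getUsed());
            System.out.println("non-heap max:  " + nonHeap.getMax());
        }
    }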





Re: [VOTE] FLIP-78: Flink Python UDF Environment and Dependency Management

2019-10-27 Thread jincheng sun
Hi Max,

Thanks for your feedback. You are right, we really need a more generic
solution. I volunteer to draft an initial solution design doc and bring up
the discussion on the Beam dev@ list ASAP (maybe after the release of Flink 1.10).

Thank you for voting.

Best,
Jincheng

Maximilian Michels wrote on Sat, Oct 26, 2019 at 1:05 AM:

> Hi Wei, hi Jincheng,
>
> +1 on the current approach.
>
> I agree it would be nice to allow the Beam artifact staging to use
> Flink's BlobServer. However, the current implementation, which uses the
> distributed file system, is more generic, since the BlobServer is only
> available on the TaskManager and not necessarily inside Harness
> containers (Stage 3).
>
> So for Stage 1 (Client <=> JobServer) we could certainly use the
> BlobServer. Stage 2 (Flink job submission) already uses it, and Stage 3
> (container setup) probably has to have some form of distributed file
> system or directory that has been populated with the dependencies.
>
> Thanks,
> Max
>
> On 25.10.19 03:45, Wei Zhong wrote:
> > Hi Max,
> >
> > Are there any other concerns from your side? I would appreciate it if you
> > could give some feedback and vote on this.
> >
> > Best,
> > Wei
> >
> >> On Oct 25, 2019, at 09:33, jincheng sun wrote:
> >>
> >> Hi Thomas,
> >>
> >> Thanks for your explanation. I understand your original intention. I will
> >> seriously consider this issue. After I have an initial solution, I will
> >> bring up a further discussion on the Beam ML.
> >>
> >> Thanks for your vote. :)
> >>
> >> Best,
> >> Jincheng
> >>
> >>
> >> Thomas Weise wrote on Fri, Oct 25, 2019 at 7:32 AM:
> >>
> >>> Hi Jincheng,
> >>>
> >>> Yes, this topic can be further discussed on the Beam ML. The only reason I
> >>> brought it up here is that it would be desirable, from the Beam Flink
> >>> runner perspective, for the artifact staging mechanism that you work on
> >>> to be reusable.
> >>>
> >>> Stage 1 in Beam is also up to the runner: artifact staging is a service
> >>> discovered from the job server, and the fact that the Flink job server
> >>> currently uses DFS is not set in stone. My interest was more in the
> >>> assumptions about the artifact structure, which may or may not allow for
> >>> a reusable implementation.
> >>>
> >>> +1 for the proposal otherwise
> >>>
> >>> Thomas
> >>>
> >>>
> >>> On Mon, Oct 21, 2019 at 8:40 PM jincheng sun wrote:
> >>>
>  Hi Thomas,
> 
>  Thanks for sharing your thoughts. I think solving the limitations of the
>  Beam artifact staging is a good topic (for Beam).
> 
>  My understanding is as follows:
> 
>  For Beam (data):
>  Stage 1: BeamClient --> JobService (data is uploaded to DFS).
>  Stage 2: JobService(FlinkClient) --> FlinkJob (operators download the
>  data from DFS).
>  Stage 3: Operator --> Harness (artifact staging service).
> 
>  For Flink (data):
>  Stage 1: FlinkClient (local data is uploaded to the BlobServer via the
>  distributed cache) --> Operator (data is downloaded from the BlobServer).
>  No dependency on DFS.
>  Stage 2: Operator --> Harness (for Docker we use the artifact staging
>  service).
> 
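A minimal sketch of the Flink distributed cache API referenced in the Flink
Stage 1 above (the file path and cache name are hypothetical):

    import java.io.File;

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.api.java.ExecutionEnvironment;

    public class DistributedCacheSketch {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // The client ships a local file to the cluster via the BlobServer;
            // no distributed file system is required.
            env.registerCachedFile("file:///path/to/dependency.txt", "dep");

            env.fromElements(1, 2, 3)
                .map(new RichMapFunction<Integer, Integer>() {
                    @Override
                    public Integer map(Integer value) throws Exception {
                        // Each operator gets a local copy from the cache.
                        File dep = getRuntimeContext()
                                .getDistributedCache().getFile("dep");
                        return value; // use `dep` here
                    }
                })
                .print(); // print() triggers execution in the DataSet API
        }
    }
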
>  So, I think Beam has to depend on DFS in Stage 1, and Stage 2 can use the
>  distributed cache if we remove Beam's dependency on DFS in Stage 1 (of
>  course we need more detail here). We can bring up that discussion on a
>  separate Beam dev@ thread; the current discussion focuses on the Flink 1.10
>  UDF environment and dependency management for Python, so I recommend
>  voting on the current thread for Flink 1.10 and discussing Beam artifact
>  staging improvements separately on Beam dev@.
> 
>  What do you think?
> 
>  Best,
>  Jincheng
> 
>  Thomas Weise wrote on Mon, Oct 21, 2019 at 10:25 PM:
> 
> > Beam artifact staging currently relies on a shared file system and there
> > are limitations, for example when running locally with Docker and a local
> > FS. It sounds like a distributed-cache-based implementation might be a good
> > (better?) option for artifact staging, even for the Beam Flink runner?
> >
> > If so, can the implementation you propose be made compatible with the Beam
> > artifact staging service so that it can be plugged into the Beam Flink
> > runner?
> >
> > Thanks,
> > Thomas
> >
> >
> > On Mon, Oct 21, 2019 at 2:34 AM jincheng sun <sunjincheng...@gmail.com> wrote:
> >
> >> Hi Max,
> >>
> >> Sorry for the late reply. Regarding the issue you mentioned above, I'm glad
> >> to share my thoughts:
> >>
> >>> For process-based execution we use Flink's cache distribution instead of
> >>> Beam's artifact staging.
> >>
> >> In the current design, we use Flink's cache distribution to upload users'
> >> files from the client to the cluster in both 

[ANNOUNCE] Weekly Community Update 2019/43

2019-10-27 Thread Konstantin Knauf
Dear community,

happy to share this week's community update with updates on some exciting
ongoing efforts like unaligned checkpoints, the contribution of a Pulsar
connector and the introduction of Executors, a new initiative around
queryable state, a couple of bugs and a bit more.

Flink Development
==

* [development process] Gary encourages everyone involved in Flink
development to join the builds@f.a.o mailing list to receive updates on
build stability issues (e.g. from the nightly builds). You can subscribe
via builds-subscr...@flink.apache.org. [1]

* [state] Piotr has proposed a change to FLIP-76 on unaligned checkpoints.
For unaligned checkpoints, in-flight data needs to be stored as part of the
checkpoint. Piotr proposes to continuously persist all stream records to
avoid having to store a large amount of in-flight data during
checkpointing. Details in the thread. [2]

* [state] Vino Yang has revived the discussion on improving queryable state
in Flink. Specifically, he proposes to introduce a new component called
"QueryableStateLocationService": instead of connecting directly to the
QueryableStateProxy of a TaskManager, a client would connect to the location
service to find the correct TaskManager to serve its query. [3]
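
For reference, a minimal sketch of how a client queries state today,
connecting directly to a known proxy (the host, port, job ID, and state name
below are hypothetical):

    import java.util.concurrent.CompletableFuture;

    import org.apache.flink.api.common.JobID;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.queryablestate.client.QueryableStateClient;

    public class QueryStateSketch {
        public static void main(String[] args) throws Exception {
            // Today the client must already know which TaskManager runs the
            // proxy; the proposed location service would resolve this instead.
            QueryableStateClient client =
                    new QueryableStateClient("taskmanager-host", 9069);

            CompletableFuture<ValueState<Long>> future = client.getKvState(
                    JobID.fromHexString("2f2b04cf8f4d5cb5328c9d447d7c2d1f"),
                    "my-query-name",  // registered via asQueryableState(...)
                    "some-key",       // the key to look up
                    Types.STRING,
                    new ValueStateDescriptor<>("count", Types.LONG));

            System.out.println(future.get().value());
            client.shutdownAndWait();
        }
    }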

* [network] The survey on non-credit-based flow control is over; the legacy
mode will be dropped in the next Flink release. [4]

* [connectors] For the contribution of the Pulsar connector(s), Yijie
kicked off a discussion on a design document for a corresponding FLIP
(FLIP-72). After some discussion, the initial FLIP will now mostly cover
the sink. The source will be part of future work (after FLIP-27) and the
catalog will get its own FLIP. [5]

* [client] Kostas has started (and concluded) a discussion on FLIP-81 to
introduce new configuration options for Executors (FLIP-73). FLIP-81 mainly
introduces configuration options for parameters that were previously only
exposed via the CLI (e.g. the path to a savepoint for initial state). The
vote is currently ongoing. [6,7]

* [sql] Two weeks ago, Peter Huang started a discussion on FLIP-79, support
for functions in Flink's DDL. Its primary purpose is the dynamic
registration of user-defined functions via the DDL. [8]
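
The kind of statement under discussion would look roughly like this (my
sketch only; the exact syntax is what FLIP-79 is deciding, the class name is
hypothetical, and it assumes a Flink version in which the FLIP has landed):

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class CreateFunctionDdlSketch {
        public static void main(String[] args) {
            TableEnvironment tEnv =
                    TableEnvironment.create(EnvironmentSettings.newInstance().build());

            // Register a user-defined function dynamically by class name,
            // instead of calling tEnv.registerFunction(...) in code.
            tEnv.sqlUpdate("CREATE FUNCTION parse_ua AS 'com.example.udf.ParseUserAgent'");
        }
    }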

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/REMINDER-Ensuring-build-stability-td34158.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-76-Unaligned-checkpoints-td33651.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Introduce-a-location-oriented-two-stage-query-mechanism-to-improve-the-queryable-state-td34265.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/SURVEY-Dropping-non-Credit-based-Flow-Control-tp33714.html
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-72-Introduce-Pulsar-Connector-tp33283.html
[6]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/FLIP-81-Executor-related-new-ConfigOptions-tp34236.html
[7]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-81-Executor-related-new-ConfigOptions-tp34290.html
[8]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Discussion-FLIP-79-Flink-Function-DDL-Support-tp33965.html

Notable Bugs
==

* [FLINK-14074] [1.9.1] [mesos] There is a bug in the way Flink starts
TaskManagers on Mesos: after the initial set of TaskManagers is started
(e.g. during the first job submission), any subsequent job submission will
not trigger additional TaskManagers to be launched, although they are needed
to run the job. Check the ticket for a workaround. Fixed for 1.9.2 and
1.10.0. [9]
* [FLINK-14524] [1.9.1] The JDBC table sink generates an invalid PostgreSQL
query for upserts. Fixed in 1.9.2 and 1.10.0. [10]
* [FLINK-14470] [1.9.1] [1.8.2] Not particularly new, but still open and
confusing. The Flink WebUI might not show all watermarks for Flink jobs
with many tasks. [11]

[9] https://issues.apache.org/jira/browse/FLINK-14074
[10] https://issues.apache.org/jira/browse/FLINK-14524
[11] https://issues.apache.org/jira/browse/FLINK-14470


Events, Blog Posts, Misc
===

* Alibaba's food delivery business ele.me has published a blog post about
their stream processing architecture and use cases for Flink. [10]

* There will be a Flink/Spark talk at the next Chicago Big Data meetup [11] on
the 7th of November. No idea what it will be about (can't join the group). :)

* At the next Athens Big Data Group on the 14th of November, *Chaoran Yu* of
Lightbend will talk about Flink and Spark on Kubernetes. [12]

* There will be a full-day meetup with six talks in the Bangalore Kafka Group
on the 2nd of November, including at least three Flink talks by *Timo Walther*
(Ververica), *Shashank Agarwal* (Razorpay) and *Rasyid Hakim* (GoJek).
[13]

[10]
https://hackernoon.com/flink-or-flunk-why-ele-me-is-developing-a-taste-for-apache-flink-7d2a74e4d6c0
[11]

[jira] [Created] (FLINK-14536) Make clear the way to aggregate specified cpuCores resources

2019-10-27 Thread Zhu Zhu (Jira)
Zhu Zhu created FLINK-14536:
---

 Summary: Make clear the way to aggregate specified cpuCores 
resources
 Key: FLINK-14536
 URL: https://issues.apache.org/jira/browse/FLINK-14536
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.10.0
Reporter: Zhu Zhu
 Fix For: 1.10.0


I'm raising this question because I find that {{cpuCores}} in
{{ResourceSpec#merge}} is aggregated with {{max()}}, while in
{{ResourceProfile#merge}} it is {{sum()}}.

This means that when calculating the resources of a vertex from its
operators, {{cpuCores}} is the max value, while it is a sum (or a
subtraction, in {{ResourceProfile#subtract}}) when dealing with shared slot
bookkeeping and related checks.
This is confusing to me, especially when I'm considering how to generate a
shared slot resource spec merged from all the vertices in it (see FLINK-14314).

I'm not sure whether we already have a precise definition for this.
If there is one, we need to respect it and change {{ResourceSpec}} or
{{ResourceProfile}} accordingly.
If not, we need to decide on one before we can move on with fine-grained
resources.

Note that if we take the {{max()}} approach, we need to reconsider how to
implement a correct {{ResourceProfile#subtract}} for {{cpuCores}}, since
there is no clear way to reverse a {{max()}} computation.
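
To illustrate the difference with plain numbers (a self-contained sketch,
not Flink's actual classes):

    // Illustrates why the two merge strategies diverge and why max() cannot
    // be reversed by a subtract(), using plain doubles for cpuCores.
    public class CpuCoresMergeSketch {

        static double mergeBySum(double a, double b) { return a + b; }
        static double mergeByMax(double a, double b) { return Math.max(a, b); }

        public static void main(String[] args) {
            double op1 = 1.0, op2 = 2.0;

            // ResourceSpec#merge style: vertex cpuCores = max over operators.
            System.out.println(mergeByMax(op1, op2)); // 2.0

            // ResourceProfile#merge style: slot cpuCores = sum over tasks.
            System.out.println(mergeBySum(op1, op2)); // 3.0

            // sum() is reversible: (a + b) - b == a, so bookkeeping can
            // subtract a released profile. max() is not: knowing
            // max(a, b) == 2.0 and b == 2.0 says nothing about a
            // (anything <= 2.0 is possible).
            System.out.println(mergeBySum(op1, op2) - op2); // recovers 1.0
        }
    }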

cc [~trohrmann] [~wuzang]






Re: Build error: package does not exist

2019-10-27 Thread Jark Wu
Hi Hynek,

Bruce is right: you should build the Flink source code first, by running
`mvn clean package -DskipTests` in the root directory of Flink, before
developing. This may take 10 minutes or more depending on your machine.

Best,
Jark

On Sun, 27 Oct 2019 at 20:46, yanjun qiu  wrote:

> Hi Hynek,
> I think you should run the Maven build first: execute mvn clean install
> -DskipTests. This is because the Flink SQL parser uses the Apache Calcite
> framework to generate the SQL parser source code.
>
> Regards,
> Bruce
>
> > On Oct 27, 2019, at 12:09 AM, Hynek Noll wrote:
> >
> > package seems to be missing on GitHub:
>
>


Re: Build error: package does not exist

2019-10-27 Thread yanjun qiu
Hi Hynek,
I think you should run the Maven build first: execute mvn clean install
-DskipTests. This is because the Flink SQL parser uses the Apache Calcite
framework to generate the SQL parser source code.

Regards,
Bruce

> On Oct 27, 2019, at 12:09 AM, Hynek Noll wrote:
> 
> package seems to be missing on GitHub: