Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-06-07 Thread Jingsong Li
Hi, thanks all for your feedback. I created a JIRA issue for bundling format jars in lib [1]. FYI. [1] https://issues.apache.org/jira/browse/FLINK-18173 Best, Jingsong Lee On Fri, Jun 5, 2020 at 3:59 PM Rui Li wrote: > +1 to adding lightweight formats to lib > > On Fri, Jun 5, 2020 at 3:28 PM

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-06-05 Thread Rui Li
+1 to adding lightweight formats to lib. On Fri, Jun 5, 2020 at 3:28 PM Leonard Xu wrote: > +1 for Jingsong’s proposal to put flink-csv, flink-json and flink-avro > under the lib/ directory. > I have heard many SQL users (mostly newcomers) complain about the out-of-the-box > experience on the mailing list. > >

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-06-05 Thread Leonard Xu
+1 for Jingsong’s proposal to put flink-csv, flink-json and flink-avro under the lib/ directory. I have heard many SQL users (mostly newcomers) complain about the out-of-the-box experience on the mailing list. Best, Leonard Xu > On Jun 5, 2020, at 14:39, Benchao Li wrote: > > +1 to include them for sql-client by default; >

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-06-05 Thread Benchao Li
+1 to including them for sql-client by default; +0 to putting them into lib and exposing them to all kinds of jobs, including DataStream. Danny Chan wrote on Fri, Jun 5, 2020 at 2:31 PM: > +1, at least we should keep an out-of-the-box SQL CLI; it’s a very poor > experience for SQL users to have to add such required format jars. > >

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-06-05 Thread Danny Chan
+1, at least we should keep an out-of-the-box SQL CLI; it’s a very poor experience for SQL users to have to add such required format jars. Best, Danny Chan On Jun 5, 2020 +0800 at 11:14 AM, Jingsong Li wrote: > Hi all, > > Considering that 1.11 will be released soon, what about my previous > proposal? Put

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-06-04 Thread Jark Wu
+1 to adding these 3 formats to the dist, under the lib/ directory. This is a step worth trying toward better usability for SQL users. They don't have *any* dependencies and are very small, so I think it's safe to add them. Best, Jark On Fri, 5 Jun 2020 at 11:14, Jingsong Li wrote: > Hi all, > >

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-06-04 Thread Jingsong Li
Hi all, Considering that 1.11 will be released soon, what about my previous proposal? Put flink-csv, flink-json and flink-avro under lib. These three formats are very small, have no third-party dependencies, and are widely used by table users. Best, Jingsong Lee On Tue, May 12, 2020 at 4:19 PM

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-05-12 Thread Jingsong Li
Thanks for your discussion. Sorry to start discussing another thing: the biggest problem I see is the variety of problems caused by users missing format dependencies. As Aljoscha said, these three formats are very small, have no third-party dependencies, and are widely used by table users.

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-05-11 Thread Chesnay Schepler
One downside would be that we're shipping more stuff when running on YARN, for example, since the entire plugins directory is shipped by default. On 17/04/2020 16:38, Stephan Ewen wrote: @Aljoscha I think that is an interesting line of thinking. The swift-fs may be used rarely enough to move it

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-05-05 Thread Thomas Weise
Great discussion! I'm also in favor of a single distribution that is optimized for the initial user experience. Most advanced users understand how to customize a distribution and many are probably already building their own. A forcing function for custom builds is the need to patch the official

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-05-05 Thread Benchao Li
Hi all, Thanks Aljoscha for bringing up this discussion, and thanks all for the wonderful discussion. In general, I think improving the user experience is a good idea, and it seems that we all agree on that. Regarding how to achieve this, I think Aljoscha has brought a good solution, which we have

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-05-05 Thread Kurt Young
The SQL client is one of the use cases. There are also use cases like submitting a SQL job to a cluster and then hitting a missing connector or format jar error. In that case, it's actually more difficult for users to understand and fix. For example, a user submits a SQL job to a running cluster with

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-05-05 Thread Aljoscha Krettek
For SQL we could leave them in opt/. The SQL client shell script already does discovery for some jars in opt/; for example, the main SQL client jar is not in lib/ but is loaded from opt/. We could do the same for the connector/format jars. @Timo or @Jark, could you confirm whether this would

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-05-05 Thread Till Rohrmann
Are you suggesting adding the SQL dependencies to opt/ or lib/? I thought the argument against opt/ was that it would not be much different from downloading the additional dependencies. In my opinion, moving it to lib/ would justify a separate release because of potential dependency conflicts for

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-05-05 Thread Aljoscha Krettek
Thanks Till for summarizing! Another alternative is also to stick to one distribution but remove one of the very heavy filesystem connectors and add all the mentioned SQL connectors/formats, which will keep the size of the distribution the same, or a bit smaller. Best, Aljoscha On 04.05.20

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-05-04 Thread Till Rohrmann
Thanks everyone for this lively discussion and all your thoughts. Let me try to summarise the current state of the discussion and then let's see how we can move it forward. To begin with, I think everyone agrees that we want to improve Flink's user experience. In particular, we want to improve

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-28 Thread Chesnay Schepler
It would be good if we could nail down what a slim/fat distribution would look like, as there are various ideas floating around in this thread. Like, what is a "slim" distribution? Are we just emptying /opt? Removing everything larger than 1 MB? Are we throwing out the Table API from /lib for

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-28 Thread Chesnay Schepler
This would likely solve the issues surrounding the SQL client, so I would go along with that. On 17/04/2020 12:16, Aljoscha Krettek wrote: I think having such tools and/or tailor-made distributions can be nice but I also think the discussion is missing the main point: The initial

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-24 Thread Kurt Young
+1 for the "slim" and "fat" solution. One comment about the fat one: I think we need to put all needed jars into /lib (or /plugins). Putting jars into /opt and relying on users to move them from /opt to /lib doesn't really improve the out-of-the-box experience. Best, Kurt On Fri, Apr 24, 2020 at 8:28 PM

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-24 Thread Aljoscha Krettek
re (1): I don't know about that, probably the people that did the metrics reporter plugin support had some thoughts about that. re (2): I agree, that's why I initially suggested to split it into "slim" and "fat" because our current "medium fat" selection of jars in Flink dist does not serve

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-17 Thread Stephan Ewen
@Aljoscha I think that is an interesting line of thinking. The swift-fs may be used rarely enough to move it to an optional download. I would still drop two more thoughts: (1) Now that we have plugins support, is there a reason to have a metrics reporter or file system in /opt instead of

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-17 Thread Aljoscha Krettek
I think having such tools and/or tailor-made distributions can be nice, but I also think the discussion is missing the main point: The initial observation/motivation is that apparently a lot of users (Kurt and I talked about this) on the Chinese DingTalk support groups, and other support

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-17 Thread Stephan Ewen
A similar issue exists for the Docker files. I have also heard the same feedback from various users, for example asking why we don't simply include all FS connectors in the images by default. I actually like the idea of having a slim and a fat/convenience Docker file. - If you build a clean production

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-16 Thread Jark Wu
Hi, I like the idea of a web tool to assemble a fat distribution. And https://code.quarkus.io/ looks very nice. All users need to do is select what they need (I think this step can't be omitted anyway). We can also provide a default fat distribution on the web which by default selects

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-16 Thread Rafi Aroch
As a reference for a nice first-experience I had, take a look at https://code.quarkus.io/ You reach this page after you click "Start Coding" at the project homepage. Rafi On Thu, Apr 16, 2020 at 6:53 PM Kurt Young wrote: > I'm not saying pre-bundle some jars will make this problem go away,

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-16 Thread Kurt Young
I'm not saying pre-bundling some jars will make this problem go away, and you're right that it only hides the problem for some users. But what if this solution can hide the problem for 90% of users? Wouldn't that be good enough for us to try? Regarding whether users following instructions would really be such

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-16 Thread Chesnay Schepler
The problem with having a distribution with "popular" stuff is that it doesn't really /solve/ a problem; it just hides it for users who fall into these particular use cases. Move outside of it and you once again run into the exact same problems outlined. This is exactly why I like the tooling

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-16 Thread Kurt Young
I'm not so sure about the web tool solution, though. The concern I have with this approach is that the final generated distribution is kind of non-deterministic. We might generate too many different combinations when users try to package different types of connectors, formats, and maybe even Hadoop

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-16 Thread Till Rohrmann
I think what Chesnay and Dawid proposed would be the ideal solution. Ideally, we would also have a nice web tool for the website which generates the corresponding distribution for download. To get things started, we could begin by only supporting downloading/creating the "fat" version with the

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-16 Thread Dawid Wysakowicz
Hi all, A few points from my side: 1. I like the idea of simplifying the experience for first-time users. As for production use cases, I share Jark's opinion that in this case I would expect users to combine their distribution manually. I think in such scenarios it is important to understand

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-16 Thread Aljoscha Krettek
I want to reinforce my opinion from earlier: This is about improving the situation both for first-time users and for experienced users that want to use a Flink dist in production. The current Flink dist is too "thin" for first-time SQL users and it is too "fat" for production users, that is

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-15 Thread wenlong.lwl
Hi all, Regarding slim and fat distributions, I think different kinds of jobs may prefer different types of distribution: For DataStream jobs, I think we may not like a fat distribution containing connectors, because users would always need to depend on the connector in user code, so it is easy to

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-15 Thread Jingsong Li
Hi, I am thinking about both "improve first experience" and "improve production experience". I'm wondering what the common usage pattern of Flink is. Streaming jobs using Kafka? Batch jobs using Hive? Hive 1.2.1 dependencies can be compatible with most Hive server versions. So Spark and Presto have built-in

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-15 Thread Jark Wu
Hi, I think we should first reach a consensus on "what problem do we want to solve?" (1) improve the first experience? or (2) improve the production experience? As far as I can see from the above discussion, what we want to solve is the "first experience". And I think the slim jar is still the

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-15 Thread Chesnay Schepler
I don't see a lot of value in having multiple distributions. The simple reality is that no fat distribution we could provide would satisfy all use-cases, so why even try. If users commonly run into issues for certain jars, then maybe those should be added to the current distribution.

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-15 Thread Kurt Young
Regarding the specific solution, I'm not sure about the "fat" and "slim" solution, though. I get the idea that we can make the slim one even more lightweight than the current distribution, but what about the "fat" one? Do you mean that we would package all connectors and formats into it? I'm not

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-15 Thread Jingsong Li
Big +1. I like "fat" and "slim". For csv and json, as Jark said, they are quite small and don't have other dependencies. They are important to the Kafka connector, and to the upcoming file system connector too. So can we move them to both "fat" and "slim"? They're so important, and they're

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-15 Thread godfrey he
Big +1. This will improve the user experience (especially for new Flink users). We have answered so many questions about "class not found". Best, Godfrey Dian Fu wrote on Wed, Apr 15, 2020 at 4:30 PM: > +1 to this proposal. > > Missing connector jars is also a big problem for PyFlink users. Currently, > after a Python

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-15 Thread Dian Fu
+1 to this proposal. Missing connector jars is also a big problem for PyFlink users. Currently, after a Python user has installed PyFlink using `pip`, he has to manually copy the connector fat jars to the PyFlink installation directory for the connectors to be used if he wants to run jobs

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-15 Thread Jark Wu
+1 to the proposal. I also found the "download additional jar" step really verbose when I prepare webinars. At least, I think flink-csv and flink-json should be in the distribution; they are quite small and don't have other dependencies. Best, Jark On Wed, 15 Apr 2020 at 15:44, Jeff Zhang

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-15 Thread Jeff Zhang
Hi Aljoscha, Big +1 for the fat Flink distribution. Where do you plan to put these connectors, opt or lib? Aljoscha Krettek wrote on Wed, Apr 15, 2020 at 3:30 PM: > Hi Everyone, > > I'd like to discuss releasing a more full-featured Flink > distribution. The motivation is that there is friction for

Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-15 Thread Kurt Young
Big +1 from my side. From my experience, missing connector & format jars are the top problem SQL users will probably run into. Similar questions are raised in Flink's DingTalk group almost every day or two, and I have personally answered dozens of such questions. Sometimes it's still not

[DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-15 Thread Aljoscha Krettek
Hi Everyone, I'd like to discuss releasing a more full-featured Flink distribution. The motivation is that there is friction for SQL/Table API users that want to use Table connectors which are not included in the current Flink distribution. For these users the workflow is currently