Re: [DISCUSS] Improve the Pyflink command line options (Adjustment to FLIP-78)

2019-12-04 Thread Wei Zhong
Hi all,

Thanks for all of your quick responses! I will start the VOTE thread.

Best,
Wei



Re: [DISCUSS] Improve the Pyflink command line options (Adjustment to FLIP-78)

2019-12-04 Thread Jark Wu
Hi Wei,

Thanks for bringing this discussion up; the changes look good to me.
Looking forward to the vote. You can also prepare the pull request at the
same time, so that it can be checked in in time.

Best,
Jark


Re: [DISCUSS] Improve the Pyflink command line options (Adjustment to FLIP-78)

2019-12-04 Thread Hequn Cheng
Hi all,

Thanks a lot for the discussion! Using "#" also makes sense to me.
And +1 to having these improvements in 1.10, as we don't want to
introduce compatibility problems later.

Looking forward to the vote!

Best, Hequn




Re: [DISCUSS] Improve the Pyflink command line options (Adjustment to FLIP-78)

2019-12-04 Thread Dian Fu
Thanks for bringing up this discussion, Wei. +1 for this proposal!

As these options were introduced in 1.10, it would be great if we could improve
them in 1.10 as well, so that they will not cause compatibility issues.

Thanks,
Dian



Re: [DISCUSS] Improve the Pyflink command line options (Adjustment to FLIP-78)

2019-12-04 Thread jincheng sun
Hi all,

Thanks for the quick responses, Aljoscha & Wei!

It seems that unifying the options is necessary, and the 1.10 code freeze is
coming. I would like to bring up the VOTE thread for this change ASAP; more
details can be discussed further in the PR.

What do you think?

Best,
Jincheng



Re: [DISCUSS] Improve the Pyflink command line options (Adjustment to FLIP-78)

2019-12-04 Thread Aljoscha Krettek
Perfect, thanks for the background info! I also found this section now, which 
mentions that it comes from Hadoop: 
https://spark.apache.org/docs/latest/running-on-yarn.html#important-notes.

I think the proposed changes are good!

Best,
Aljoscha




Re: [DISCUSS] Improve the Pyflink command line options (Adjustment to FLIP-78)

2019-12-03 Thread Wei Zhong
Hi Aljoscha,

Thanks for your reply! Before bringing up this discussion, I did some research
on commonly used separators for options that take multiple values. I considered
",", ":" and "#", and finally chose "#" as the separator for
"--pyRequirements".

The comma (",") is the most widely used separator. Many projects use it to
separate values at the same level, e.g. "-Dexcludes" in Maven, "--files" in
Spark and "-pyFiles" in Flink. But the second parameter of "--pyRequirements",
the requirements cache directory, is not at the same level as its first
parameter (the requirements file). It is secondary and is only needed when the
packages in the requirements file cannot be downloaded from the package index
server.

The colon (":") is used as a path separator in most cases, e.g. in the main
arguments of scp (secure copy), "--volume" in Docker and "-cp" in Java. But as
we accept a URI as the file path, and a URI contains ":" in most cases, ":"
cannot be used as the separator of "--pyRequirements".
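A small Python illustration of the ":" problem, assuming two hypothetical naive parsers and made-up HDFS paths (neither helper is part of Flink). Splitting at the first ":" cuts off the URI scheme, and splitting at the last ":" breaks as soon as the optional cache directory is omitted:

```python
# Hypothetical naive parsers for "<requirements file>[:<cache dir>]".
# Neither is Flink code; they only show why ":" is ambiguous for URIs.

def split_at_first_colon(value):
    """Treat everything after the first ':' as the cache directory."""
    head, _, tail = value.partition(":")
    return head, tail


def split_at_last_colon(value):
    """Treat everything after the last ':' as the cache directory."""
    head, _, tail = value.rpartition(":")
    return head, tail


with_cache = "hdfs://namenode:8020/deps/requirements.txt:/tmp/cache_dir"
without_cache = "hdfs://namenode:8020/deps/requirements.txt"

# The scheme separator is mistaken for the value separator.
print(split_at_first_colon(with_cache))
# ('hdfs', '//namenode:8020/deps/requirements.txt:/tmp/cache_dir')

# With no cache directory, the port separator is mistaken for it.
print(split_at_last_colon(without_cache))
# ('hdfs://namenode', '8020/deps/requirements.txt')
```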

The number sign ("#") is rarely used as a separator for multiple values. I only
found Spark using "#" in the options "--files" and "--archives", to separate the
file path from the target file/directory name. After some research, I found that
this usage comes from the URI fragment: a secondary resource can be appended to
a URI as its fragment, after a number sign ("#") character. As we treat user
file paths as URIs when parsing the command line, using "#" as the separator of
"--pyRequirements" makes sense to me: it means the second parameter is the
fragment of the first parameter. The definition of the URI fragment can be
found here [1].
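Treating the value as a URI means the split can be delegated to an ordinary URI library. A minimal Python sketch using the standard library's `urllib.parse.urldefrag`; the function name `parse_py_requirements` and the example paths are hypothetical, and this is not how Flink itself parses the option:

```python
from urllib.parse import urldefrag


def parse_py_requirements(value):
    """Hypothetical parser for "<requirements file URI>[#<cache dir>]".

    The optional cache directory is read as the URI fragment, i.e. the
    part after the first "#".
    """
    path, fragment = urldefrag(value)
    # An empty fragment means no cache directory was given.
    return path, fragment or None


print(parse_py_requirements(
    "hdfs://namenode:8020/deps/requirements.txt#/tmp/cache_dir"))
# ('hdfs://namenode:8020/deps/requirements.txt', '/tmp/cache_dir')
print(parse_py_requirements("/home/user/requirements.txt"))
# ('/home/user/requirements.txt', None)
```

Unlike the ":" variants, the scheme and port separators inside the URI are left untouched, and omitting the optional part is unambiguous.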

The reason for using "#" in "--pyArchives" as the separator between the file
path and the target directory name is the same.
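The same idea extends to "--pyArchives". The sketch below additionally assumes, since the thread does not spell it out, that several archives are separated by "," (values at the same level) and that an archive given without a "#" fragment is extracted into a directory named after the archive file; the function name and paths are hypothetical:

```python
import posixpath
from urllib.parse import urldefrag, urlsplit


def parse_py_archives(value):
    """Hypothetical parser for "<archive URI>[#<target dir>],...".

    Assumptions: "," separates archives, and a missing "#<target dir>"
    falls back to the archive's own file name.
    """
    archives = []
    for entry in value.split(","):
        path, target_dir = urldefrag(entry)
        if not target_dir:
            # e.g. ".../envs/py37.zip" -> "py37.zip"
            target_dir = posixpath.basename(urlsplit(path).path)
        archives.append((path, target_dir))
    return archives


print(parse_py_archives("hdfs://namenode:8020/envs/py37.zip,/data/dict.zip#dict"))
# [('hdfs://namenode:8020/envs/py37.zip', 'py37.zip'), ('/data/dict.zip', 'dict')]
```

Here "," and "#" play the two different roles described above: "," separates peers, while "#" attaches a secondary value to one entry.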

Best,
Wei

[1] https://tools.ietf.org/html/rfc3986#section-3.5

> 在 2019年12月3日,22:02,Aljoscha Krettek  写道:
> 
> Hi,
> 
> Yes, I think it’s a good idea to make the options uniform. Using ‘#’ as a 
> separator for options that take two values seems a bit strange to me, did you 
> research if any other CLI tools have this convention?
> 
> Side note: I don’t like that our options use camel-case, I think that’s very 
> non-standard. But that’s how it is now…
> 
> Best,
> Aljoscha
> 
>> On 3. Dec 2019, at 10:14, jincheng sun  wrote:
>> 
>> Thanks for bringup this discussion Wei!
>> I think this is very important for Flink User, we should contains this
>> changes in Flink 1.10.
>> +1  for the optimization from the perspective of user convenience and the
>> unified use of Flink command line parameters.
>> 
>> Best,
>> Jincheng
>> 
>> Wei Zhong  于2019年12月2日周一 下午3:26写道:
>> 
>>> Hi everyone,
>>> 
>>> I wanted to bring up the discussion of improving the Pyflink command line
>>> options.
>>> 
>>> A few command line options have been introduced in the FLIP-78 [1], i.e.
>>> "python-executable-path", "python-requirements","python-archive", etc.
>>> There are a few problems with these options, i.e. the naming style,
>>> variable argument options, etc.
>>> 
>>> We want to make some adjustment of FLIP-78 to improve the newly introduced
>>> command line options, here is the design doc:
>>> 
>>> https://docs.google.com/document/d/1R8CaDa3908V1SnTxBkTBzeisWqBF40NAYYjfRl680eg/edit?usp=sharing
>>> <
>>> https://docs.google.com/document/d/1R8CaDa3908V1SnTxBkTBzeisWqBF40NAYYjfRl680eg/edit?usp=sharing
 
>>> Looking forward to your feedback!
>>> 
>>> Best,
>>> Wei
>>> 
>>> [1]
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-78%3A+Flink+Python+UDF+Environment+and+Dependency+Management
>>> 
>>> 
> 



Re: [DISCUSS] Improve the Pyflink command line options (Adjustment to FLIP-78)

2019-12-03 Thread Aljoscha Krettek
Hi,

Yes, I think it’s a good idea to make the options uniform. Using ‘#’ as a 
separator for options that take two values seems a bit strange to me. Did you 
research whether any other CLI tools have this convention?

Side note: I don’t like that our options use camel-case; I think that’s very 
non-standard. But that’s how it is now…

Best,
Aljoscha

> On 3. Dec 2019, at 10:14, jincheng sun  wrote:
> 
> Thanks for bringing up this discussion, Wei!
> I think this is very important for Flink users, and we should include these
> changes in Flink 1.10.
> +1 for the optimization from the perspective of user convenience and the
> unified use of Flink command line parameters.
> 
> Best,
> Jincheng
> 
> Wei Zhong wrote on Mon, Dec 2, 2019 at 3:26 PM:
> 
>> Hi everyone,
>> 
>> I would like to start a discussion about improving the PyFlink command line
>> options.
>> 
>> A few command line options were introduced in FLIP-78 [1], e.g.
>> "python-executable-path", "python-requirements", "python-archive", etc.
>> There are a few problems with these options, e.g. the naming style and
>> the variable-argument options.
>> 
>> We want to make some adjustments to FLIP-78 to improve the newly introduced
>> command line options. Here is the design doc:
>> 
>> https://docs.google.com/document/d/1R8CaDa3908V1SnTxBkTBzeisWqBF40NAYYjfRl680eg/edit?usp=sharing
>> 
>> Looking forward to your feedback!
>> 
>> Best,
>> Wei
>> 
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-78%3A+Flink+Python+UDF+Environment+and+Dependency+Management
>> 
>> 



Re: [DISCUSS] Improve the Pyflink command line options (Adjustment to FLIP-78)

2019-12-03 Thread jincheng sun
Thanks for bringing up this discussion, Wei!
I think this is very important for Flink users, and we should include these
changes in Flink 1.10.
+1 for the optimization from the perspective of user convenience and the
unified use of Flink command line parameters.

Best,
Jincheng

Wei Zhong wrote on Mon, Dec 2, 2019 at 3:26 PM:

> Hi everyone,
>
> I would like to start a discussion about improving the PyFlink command line
> options.
>
> A few command line options were introduced in FLIP-78 [1], e.g.
> "python-executable-path", "python-requirements", "python-archive", etc.
> There are a few problems with these options, e.g. the naming style and
> the variable-argument options.
>
> We want to make some adjustments to FLIP-78 to improve the newly introduced
> command line options. Here is the design doc:
>
> https://docs.google.com/document/d/1R8CaDa3908V1SnTxBkTBzeisWqBF40NAYYjfRl680eg/edit?usp=sharing
>
> Looking forward to your feedback!
>
> Best,
> Wei
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-78%3A+Flink+Python+UDF+Environment+and+Dependency+Management
>
>


[DISCUSS] Improve the Pyflink command line options (Adjustment to FLIP-78)

2019-12-01 Thread Wei Zhong
Hi everyone,

I would like to start a discussion about improving the PyFlink command line 
options.

A few command line options were introduced in FLIP-78 [1], e.g. 
"python-executable-path", "python-requirements", "python-archive", etc. There 
are a few problems with these options, e.g. the naming style and the 
variable-argument options.

We want to make some adjustments to FLIP-78 to improve the newly introduced 
command line options. Here is the design doc:
https://docs.google.com/document/d/1R8CaDa3908V1SnTxBkTBzeisWqBF40NAYYjfRl680eg/edit?usp=sharing

Looking forward to your feedback!

Best,
Wei

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-78%3A+Flink+Python+UDF+Environment+and+Dependency+Management