[jira] [Commented] (FLINK-8668) Remove "hadoop classpath" from config.sh
[ https://issues.apache.org/jira/browse/FLINK-8668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371253#comment-16371253 ]

ASF GitHub Bot commented on FLINK-8668:
---------------------------------------

Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5531#discussion_r169606158

    --- Diff: docs/ops/deployment/hadoop.md ---
    @@ -0,0 +1,47 @@
    +---
    +title: "Hadoop Integration"
    +nav-title: Hadoop Integration
    +nav-parent_id: deployment
    +nav-pos: 8
    +---
    +
    +
    +* This will be replaced by the TOC
    +{:toc}
    +
    +## Configuring Flink with Hadoop Classpaths
    +
    +Flink will use the environment variable `HADOOP_CLASSPATH` to augment the
    +classpath that is used when starting Flink components such as the Client,
    +JobManager, or TaskManager. Most Hadoop distributions and cloud environments
    +will not set this variable by default so if the Hadoop classpath should be
    +picked up by Flink the environment variable must be exported on all machines
    +that are running Flink components.
    +
    +When running on YARN, this is usually not a problem because the components
    +running inside YARN will be started with the Hadoop classpaths, but it can
    +happen that the Hadoop dependencies must be in the classpath when submitting a
    +job to YARN. For this, it's usually enough to do a
    +
    +```
    +export HADOOP_CLASSPATH=`hadoop classpath`
    --- End diff --

    it's the `hadoop` binary with `classpath` as argument

> Remove "hadoop classpath" from config.sh
> ----------------------------------------
>
> Key: FLINK-8668
> URL: https://issues.apache.org/jira/browse/FLINK-8668
> Project: Flink
> Issue Type: New Feature
> Reporter: Aljoscha Krettek
> Assignee: Aljoscha Krettek
> Priority: Major
> Fix For: 1.5.0
>
> Automatically adding this when available can lead to dependency problems for
> some users and there is no way of turning off this "feature". It was added to
> make using Flink on AWS/EMR and GCE a bit easier but I think it's causing
> more harm than good.
> If users want to augment the classpath they can always {{export
> HADOOP_CLASSPATH=...}}.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (FLINK-8668) Remove "hadoop classpath" from config.sh
[ https://issues.apache.org/jira/browse/FLINK-8668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371252#comment-16371252 ]

ASF GitHub Bot commented on FLINK-8668:
---------------------------------------

Github user aljoscha closed the pull request at:

    https://github.com/apache/flink/pull/5531
[jira] [Commented] (FLINK-8668) Remove "hadoop classpath" from config.sh
[ https://issues.apache.org/jira/browse/FLINK-8668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371230#comment-16371230 ]

ASF GitHub Bot commented on FLINK-8668:
---------------------------------------

Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5531#discussion_r169600042

    --- Diff: docs/ops/deployment/hadoop.md ---
    @@ -0,0 +1,47 @@
    +---
    +title: "Hadoop Integration"
    +nav-title: Hadoop Integration
    +nav-parent_id: deployment
    +nav-pos: 8
    +---
    +
    +
    +* This will be replaced by the TOC
    +{:toc}
    +
    +## Configuring Flink with Hadoop Classpaths
    +
    +Flink will use the environment variable `HADOOP_CLASSPATH` to augment the
    +classpath that is used when starting Flink components such as the Client,
    +JobManager, or TaskManager. Most Hadoop distributions and cloud environments
    +will not set this variable by default so if the Hadoop classpath should be
    +picked up by Flink the environment variable must be exported on all machines
    +that are running Flink components.
    +
    +When running on YARN, this is usually not a problem because the components
    +running inside YARN will be started with the Hadoop classpaths, but it can
    +happen that the Hadoop dependencies must be in the classpath when submitting a
    +job to YARN. For this, it's usually enough to do a
    +
    +```
    +export HADOOP_CLASSPATH=`hadoop classpath`
    --- End diff --

    add `<` `>`?
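For illustration, the `export` line from the diff above could be guarded so that it only runs when the `hadoop` binary is actually on the PATH and the variable is not already set. This is a minimal sketch and not part of the pull request; the function name `set_hadoop_classpath` and the temporary variable `_cp` are assumptions made for the example.

```shell
# Hypothetical helper (not part of Flink or the PR): export HADOOP_CLASSPATH
# from the output of `hadoop classpath`, unless it is already set.
set_hadoop_classpath() {
  # Respect a value the user has already exported.
  if [ -n "${HADOOP_CLASSPATH:-}" ]; then
    return 0
  fi
  # `hadoop classpath` runs the `hadoop` binary with `classpath` as argument.
  if command -v hadoop >/dev/null 2>&1; then
    _cp="$(hadoop classpath)" || return 1
    export HADOOP_CLASSPATH="$_cp"
    return 0
  fi
  echo "hadoop not found on PATH; HADOOP_CLASSPATH left unset" >&2
  return 1
}
```

Sourcing this on a machine and calling `set_hadoop_classpath` before starting a Flink component would have the same effect as the plain `export` in the documentation, with an explicit message when Hadoop is not installed.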
[jira] [Commented] (FLINK-8668) Remove "hadoop classpath" from config.sh
[ https://issues.apache.org/jira/browse/FLINK-8668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369949#comment-16369949 ]

ASF GitHub Bot commented on FLINK-8668:
---------------------------------------

Github user aljoscha commented on the issue:

    https://github.com/apache/flink/pull/5531

    @zentol Yes, I struggled with where exactly to put this. I think I will just create a "Hadoop" page under "Clusters" that has only this section for now. WDYT?
[jira] [Commented] (FLINK-8668) Remove "hadoop classpath" from config.sh
[ https://issues.apache.org/jira/browse/FLINK-8668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369945#comment-16369945 ]

ASF GitHub Bot commented on FLINK-8668:
---------------------------------------

Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5531#discussion_r169287038

    --- Diff: docs/ops/config.md ---
    @@ -82,6 +82,26 @@ prefix that is checked against the fully qualified class name. By default, this
     If you want to change this setting you have to make sure to also include the default patterns in your list of patterns if you want to keep that default behaviour.
    +## Configuring Flink with Hadoop Classpaths
    +
    +Flink will use the environment variable `HADOOP_CLASSPATH` to augment the
    +classpath that is used when starting Flink components such as the Client,
    +JobManager, or TaskManager. Most Hadoop distributions and cloud environments
    +will not set this variable by default so if the Hadoop classpath should be
    +picked up by Flink the environment variable should be exported on all machines
    +that are running Flink components.
    +
    +When running on YARN, this is usually not a problem because the components
    +running inside YARN will be started with the Hadoop classpaths anyways but it
    --- End diff --

    fixing
[jira] [Commented] (FLINK-8668) Remove "hadoop classpath" from config.sh
[ https://issues.apache.org/jira/browse/FLINK-8668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369946#comment-16369946 ]

ASF GitHub Bot commented on FLINK-8668:
---------------------------------------

Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5531#discussion_r169287058

    --- Diff: docs/ops/config.md ---
    @@ -82,6 +82,26 @@ prefix that is checked against the fully qualified class name. By default, this
     If you want to change this setting you have to make sure to also include the default patterns in your list of patterns if you want to keep that default behaviour.
    +## Configuring Flink with Hadoop Classpaths
    +
    +Flink will use the environment variable `HADOOP_CLASSPATH` to augment the
    +classpath that is used when starting Flink components such as the Client,
    +JobManager, or TaskManager. Most Hadoop distributions and cloud environments
    +will not set this variable by default so if the Hadoop classpath should be
    +picked up by Flink the environment variable should be exported on all machines
    --- End diff --

    fixing
[jira] [Commented] (FLINK-8668) Remove "hadoop classpath" from config.sh
[ https://issues.apache.org/jira/browse/FLINK-8668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369903#comment-16369903 ]

ASF GitHub Bot commented on FLINK-8668:
---------------------------------------

Github user zentol commented on the issue:

    https://github.com/apache/flink/pull/5531

    We may instead want to add a whole new page under "Clusters" for hadoop related things.
[jira] [Commented] (FLINK-8668) Remove "hadoop classpath" from config.sh
[ https://issues.apache.org/jira/browse/FLINK-8668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369878#comment-16369878 ]

ASF GitHub Bot commented on FLINK-8668:
---------------------------------------

Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5531#discussion_r169269041

    --- Diff: docs/ops/config.md ---
    @@ -82,6 +82,26 @@ prefix that is checked against the fully qualified class name. By default, this
     If you want to change this setting you have to make sure to also include the default patterns in your list of patterns if you want to keep that default behaviour.
    +## Configuring Flink with Hadoop Classpaths
    +
    +Flink will use the environment variable `HADOOP_CLASSPATH` to augment the
    +classpath that is used when starting Flink components such as the Client,
    +JobManager, or TaskManager. Most Hadoop distributions and cloud environments
    +will not set this variable by default so if the Hadoop classpath should be
    +picked up by Flink the environment variable should be exported on all machines
    --- End diff --

    replace "should" with "must"?
[jira] [Commented] (FLINK-8668) Remove "hadoop classpath" from config.sh
[ https://issues.apache.org/jira/browse/FLINK-8668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369877#comment-16369877 ]

ASF GitHub Bot commented on FLINK-8668:
---------------------------------------

Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5531#discussion_r169268952

    --- Diff: docs/ops/config.md ---
    @@ -82,6 +82,26 @@ prefix that is checked against the fully qualified class name. By default, this
     If you want to change this setting you have to make sure to also include the default patterns in your list of patterns if you want to keep that default behaviour.
    +## Configuring Flink with Hadoop Classpaths
    +
    +Flink will use the environment variable `HADOOP_CLASSPATH` to augment the
    +classpath that is used when starting Flink components such as the Client,
    +JobManager, or TaskManager. Most Hadoop distributions and cloud environments
    +will not set this variable by default so if the Hadoop classpath should be
    +picked up by Flink the environment variable should be exported on all machines
    +that are running Flink components.
    +
    +When running on YARN, this is usually not a problem because the components
    +running inside YARN will be started with the Hadoop classpaths anyways but it
    --- End diff --

    remove `anyways` and replace it with a comma.
[jira] [Commented] (FLINK-8668) Remove "hadoop classpath" from config.sh
[ https://issues.apache.org/jira/browse/FLINK-8668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369869#comment-16369869 ]

ASF GitHub Bot commented on FLINK-8668:
---------------------------------------

GitHub user aljoscha opened a pull request:

    https://github.com/apache/flink/pull/5531

    [FLINK-8668] Document how to set HADOOP_CLASSPATH for Flink

    R: @zentol @StephanEwen

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aljoscha/flink jira-8668-doc-hadoop-classpath

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5531.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5531

commit 10f6a53bcd85da126746a8cdeec97cce31c013f0
Author: Aljoscha Krettek
Date: 2018-02-20T09:51:27Z

    [FLINK-8668] Document how to set HADOOP_CLASSPATH for Flink
[jira] [Commented] (FLINK-8668) Remove "hadoop classpath" from config.sh
[ https://issues.apache.org/jira/browse/FLINK-8668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369301#comment-16369301 ]

Aljoscha Krettek commented on FLINK-8668:
-----------------------------------------

[~StephanEwen] The plan was to include it in the release notes because the feature was previously undocumented and there's no obvious place to put this. There is no central documentation about how Flink works with Hadoop so I thought I could put a small section in the [config doc|https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html]. Basically just: Make sure you set {{HADOOP_CLASSPATH}} on your machines if you want Flink to pick up your Hadoop classpath.

Btw, I had this PR open: https://github.com/apache/flink/pull/4920 But by now I think the only sane approach is to require users to ensure {{HADOOP_CLASSPATH}} is set. A configuration script cannot do that for you. For example, in a standalone cluster setup the script could give the appearance of doing the configuration, but you would still have to manually copy that configuration to all machines, which people would probably forget.
[jira] [Commented] (FLINK-8668) Remove "hadoop classpath" from config.sh
[ https://issues.apache.org/jira/browse/FLINK-8668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16368960#comment-16368960 ]

Stephan Ewen commented on FLINK-8668:
-------------------------------------

This change now simply removes previously present functionality, without adding any reference, docs, or replacement. I think that will throw users off...
[jira] [Commented] (FLINK-8668) Remove "hadoop classpath" from config.sh
[ https://issues.apache.org/jira/browse/FLINK-8668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367783#comment-16367783 ]

Stephan Ewen commented on FLINK-8668:
-------------------------------------

Could you also apply this to the HBase configuration, which is magically happening?
[jira] [Commented] (FLINK-8668) Remove "hadoop classpath" from config.sh
[ https://issues.apache.org/jira/browse/FLINK-8668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367751#comment-16367751 ]

Stephan Ewen commented on FLINK-8668:
-------------------------------------

I like this, it removes some hard to understand magic.

It may be useful to add a directory {{setup}} with scripts like {{configureForHadoop.sh}} that does what the removed magic does. That way users still get help during the setup, without hard to understand things happening automagically.
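As a rough illustration of the {{configureForHadoop.sh}} idea above, such a script might print the `export` line for the user to review instead of changing the classpath implicitly. Nothing like this exists in the thread or in Flink; the function name `print_hadoop_export` and the variable `_cp` are hypothetical, and the sketch assumes the classpath contains no single quotes.

```shell
# Hypothetical sketch of a configureForHadoop.sh-style helper. Instead of
# augmenting the classpath automatically, it prints the line a user could add
# to the shell profile on each machine that runs Flink components.
print_hadoop_export() {
  if ! command -v hadoop >/dev/null 2>&1; then
    echo "hadoop not found on PATH; nothing to configure" >&2
    return 1
  fi
  _cp="$(hadoop classpath)" || return 1
  # Single quotes keep entries such as lib/* from being expanded by the shell.
  printf "export HADOOP_CLASSPATH='%s'\n" "$_cp"
}
```

A user could run `print_hadoop_export` once per machine and paste the output into a profile file. That keeps the step explicit, although, as noted elsewhere in the thread, it still has to be repeated on every machine in a standalone cluster.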
[jira] [Commented] (FLINK-8668) Remove "hadoop classpath" from config.sh
[ https://issues.apache.org/jira/browse/FLINK-8668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366746#comment-16366746 ]

Aljoscha Krettek commented on FLINK-8668:
-----------------------------------------

See also comments on FLINK-7477