[jira] [Commented] (SPARK-21994) Spark 2.2 can not read Parquet table created by itself
[ https://issues.apache.org/jira/browse/SPARK-21994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250066#comment-16250066 ]

Srinivasa Reddy Vundela commented on SPARK-21994:
-------------------------------------------------

[~srowen] That's right, it is not available in a public release yet. I just posted it for reference.

> Spark 2.2 can not read Parquet table created by itself
> ------------------------------------------------------
>
> Key: SPARK-21994
> URL: https://issues.apache.org/jira/browse/SPARK-21994
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.0
> Environment: Spark 2.2 on Cloudera CDH 5.10.1, Hive 1.1
> Reporter: Jurgis Pods
>
> This seems to be a new bug introduced in Spark 2.2, since it did not occur under Spark 2.1.
> When writing a dataframe to a table in Parquet format, Spark SQL does not write the 'path' of the table to the Hive metastore, unlike in previous versions.
> As a consequence, Spark 2.2 is not able to read the table it just created. It just outputs the table header without any row content.
> A parallel installation of Spark 1.6 at least produces an appropriate error trace:
> {code:java}
> 17/09/13 10:22:12 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.1.0
> 17/09/13 10:22:12 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
> org.spark-project.guava.util.concurrent.UncheckedExecutionException: java.util.NoSuchElementException: key not found: path
> [...]
> {code}
> h3. Steps to reproduce:
> Run the following in spark2-shell:
> {code:java}
> scala> val df = spark.sql("show databases")
> scala> df.show()
> +------------+
> |databaseName|
> +------------+
> |       mydb1|
> |       mydb2|
> |     default|
> |        test|
> +------------+
> scala> df.write.format("parquet").saveAsTable("test.spark22_test")
> scala> spark.sql("select * from test.spark22_test").show()
> +------------+
> |databaseName|
> +------------+
> +------------+
> {code}
> When manually setting the path (causing the data to be saved as an external table), it works:
> {code:java}
> scala> df.write.option("path", "/hadoop/eco/hive/warehouse/test.db/spark22_parquet_with_path").format("parquet").saveAsTable("test.spark22_parquet_with_path")
> scala> spark.sql("select * from test.spark22_parquet_with_path").show()
> +------------+
> |databaseName|
> +------------+
> |       mydb1|
> |       mydb2|
> |     default|
> |        test|
> +------------+
> {code}
> A second workaround is to update the metadata of the managed table created by Spark 2.2:
> {code}
> spark.sql("alter table test.spark22_test set SERDEPROPERTIES ('path'='hdfs://my-cluster-name:8020/hadoop/eco/hive/warehouse/test.db/spark22_test')")
> spark.catalog.refreshTable("test.spark22_test")
> spark.sql("select * from test.spark22_test").show()
> +------------+
> |databaseName|
> +------------+
> |       mydb1|
> |       mydb2|
> |     default|
> |        test|
> +------------+
> {code}
> It is kind of a disaster that we are not able to read tables created by the very same Spark version and have to manually specify the path as an explicit option.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21994) Spark 2.2 can not read Parquet table created by itself
[ https://issues.apache.org/jira/browse/SPARK-21994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250032#comment-16250032 ]

Srinivasa Reddy Vundela commented on SPARK-21994:
-------------------------------------------------

commit d5e3ba3e970c7241298db2578f0d7965b6e16ae3
Author: Srinivasa Reddy Vundela
Date: Mon Oct 9 14:25:01 2017 -0700

CDH-60037. Not able to read hive table from Cloudera version of Spark 2.2
[jira] [Comment Edited] (SPARK-9104) expose network layer memory usage
[ https://issues.apache.org/jira/browse/SPARK-9104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249904#comment-16249904 ]

Srinivasa Reddy Vundela edited comment on SPARK-9104 at 11/13/17 6:39 PM:
--------------------------------------------------------------------------

Hi [~jerryshao], thanks for the PR that exposes the Netty buffer pool memory usage; it is a very good start on exposing the unaccounted memory. But I see that these metrics are not registered with the metrics system or the Web UI. I was wondering whether you have plans to expose them to the metrics system, or would it be okay if I send a PR and you help review it? I see that you have included Netty metrics for ExternalShuffleService, and I was wondering about the other parts that use TransportServer and TransportClientFactory, such as NettyRpcEnv.

was (Author: vsr): Hi [~jerryshao], thanks for the PR that exposes the Netty buffer pool memory usage; it is a very good start on exposing the unaccounted memory. But I see that these metrics are not registered with the metrics system or the Web UI. I was wondering whether you have plans to expose them to the metrics system, or would it be okay if I send a PR and you help review it?

> expose network layer memory usage
> ---------------------------------
>
> Key: SPARK-9104
> URL: https://issues.apache.org/jira/browse/SPARK-9104
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Reporter: Zhang, Liye
> Assignee: Saisai Shao
> Fix For: 2.3.0
>
> The default network transport is Netty, and when transferring blocks for shuffle the network layer consumes a decent amount of memory; we should collect and expose the memory usage of this part.
[jira] [Commented] (SPARK-9104) expose network layer memory usage
[ https://issues.apache.org/jira/browse/SPARK-9104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249904#comment-16249904 ]

Srinivasa Reddy Vundela commented on SPARK-9104:
------------------------------------------------

Hi [~jerryshao], thanks for the PR that exposes the Netty buffer pool memory usage; it is a very good start on exposing the unaccounted memory. But I see that these metrics are not registered with the metrics system or the Web UI. I was wondering whether you have plans to expose them to the metrics system, or would it be okay if I send a PR and you help review it?
[jira] [Commented] (SPARK-21994) Spark 2.2 can not read Parquet table created by itself
[ https://issues.apache.org/jira/browse/SPARK-21994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246463#comment-16246463 ]

Srinivasa Reddy Vundela commented on SPARK-21994:
-------------------------------------------------

This issue is specific to the Cloudera distribution of Spark and was fixed there recently. We can close this JIRA.
[jira] [Created] (SPARK-22483) Exposing java.nio bufferedPool memory metrics to metrics system
Srinivasa Reddy Vundela created SPARK-22483:
--------------------------------------------

Summary: Exposing java.nio bufferedPool memory metrics to metrics system
Key: SPARK-22483
URL: https://issues.apache.org/jira/browse/SPARK-22483
Project: Spark
Issue Type: Sub-task
Components: Spark Core
Affects Versions: 2.2.0
Reporter: Srinivasa Reddy Vundela

Spark currently exposes the JVM's on-heap and off-heap memory to the metrics system. There is currently no way to know how much direct/mapped memory is allocated for the java.nio buffer pools.
[jira] [Updated] (SPARK-12717) pyspark broadcast fails when using multiple threads
[ https://issues.apache.org/jira/browse/SPARK-12717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Srinivasa Reddy Vundela updated SPARK-12717:
--------------------------------------------

Attachment: run.log

Please find attached the log, produced with the fix, for the following command:
spark2-submit --master local[20] bug_spark.py --parallelism 1000 >& run.log

> pyspark broadcast fails when using multiple threads
> ---------------------------------------------------
>
> Key: SPARK-12717
> URL: https://issues.apache.org/jira/browse/SPARK-12717
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 1.6.0
> Environment: Linux, python 2.6 or python 2.7.
> Reporter: Edward Walker
> Priority: Critical
> Attachments: run.log
>
> The following multi-threaded program that uses broadcast variables consistently throws exceptions like *Exception("Broadcast variable '18' not loaded!",)* --- even when run with "--master local[10]".
> {code:title=bug_spark.py|borderStyle=solid}
> try:
>     import pyspark
> except:
>     pass
>
> from optparse import OptionParser
>
> def my_option_parser():
>     op = OptionParser()
>     op.add_option("--parallelism", dest="parallelism", type="int", default=20)
>     return op
>
> def do_process(x, w):
>     return x * w.value
>
> def func(name, rdd, conf):
>     new_rdd = rdd.map(lambda x: do_process(x, conf))
>     total = new_rdd.reduce(lambda x, y: x + y)
>     count = rdd.count()
>     print name, 1.0 * total / count
>
> if __name__ == "__main__":
>     import threading
>     op = my_option_parser()
>     options, args = op.parse_args()
>     sc = pyspark.SparkContext(appName="Buggy")
>     data_rdd = sc.parallelize(range(0, 1000), 1)
>     confs = [sc.broadcast(i) for i in xrange(options.parallelism)]
>     threads = [threading.Thread(target=func, args=["thread_" + str(i), data_rdd, confs[i]]) for i in xrange(options.parallelism)]
>     for t in threads:
>         t.start()
>     for t in threads:
>         t.join()
> {code}
> Abridged run output:
> {code:title=abridge_run.txt|borderStyle=solid}
> % spark-submit
[jira] [Commented] (SPARK-19732) DataFrame.fillna() does not work for bools in PySpark
[ https://issues.apache.org/jira/browse/SPARK-19732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975369#comment-15975369 ]

Srinivasa Reddy Vundela commented on SPARK-19732:
-------------------------------------------------

Hi [~lenfro], I checked the documentation for fillna (Python API) and na.fill (Scala API). In both places the supported datatypes for the value are String and numeric types (int, float, double). So this is arguably not a bug, since the implementation is in sync with the documentation. Maybe this should be reclassified as an enhancement.

> DataFrame.fillna() does not work for bools in PySpark
> -----------------------------------------------------
>
> Key: SPARK-19732
> URL: https://issues.apache.org/jira/browse/SPARK-19732
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.1.0
> Reporter: Len Frodgers
>
> In PySpark, the fillna function of DataFrame inadvertently casts bools to ints, so fillna cannot be used to fill True/False.
> e.g.
> `spark.createDataFrame([Row(a=True),Row(a=None)]).fillna(True).collect()`
> yields
> `[Row(a=True), Row(a=None)]`
> It should be a=True for the second Row.
> The cause is this bit of code:
> {code}
> if isinstance(value, (int, long)):
>     value = float(value)
> {code}
> There needs to be a separate check for isinstance(bool), since in Python, bools are ints too.
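The subtype quirk behind the bug and the suggested fix can be shown without Spark at all. A minimal sketch in plain Python follows; `coerce_fill_value` is a hypothetical helper standing in for fillna's value coercion, not actual Spark code:

```python
def coerce_fill_value(value):
    # bool must be checked BEFORE int: in Python, bool is a subclass
    # of int, so isinstance(True, int) is True and without this check
    # True would silently become the float 1.0.
    if isinstance(value, bool):
        return value
    if isinstance(value, int):  # Python 2's 'long' is merged into 'int' in Python 3
        return float(value)
    return value

print(isinstance(True, int))    # True -- the root cause
print(coerce_fill_value(True))  # True, preserved instead of becoming 1.0
print(coerce_fill_value(7))     # 7.0
```

With the bool check omitted, the first branch never fires and `coerce_fill_value(True)` would return `1.0`, which is exactly why fillna cannot fill True/False today.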
[jira] [Commented] (SPARK-13341) Casting Unix timestamp to SQL timestamp fails
[ https://issues.apache.org/jira/browse/SPARK-13341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154694#comment-15154694 ]

Srinivasa Reddy Vundela commented on SPARK-13341:
-------------------------------------------------

I guess the following commit is the reason for the change: https://github.com/apache/spark/commit/9ed4ad4265cf9d3135307eb62dae6de0b220fc21
It seems HIVE-3454 was fixed in Hive 1.2.0, so customers using earlier versions of Hive will see this problem.

> Casting Unix timestamp to SQL timestamp fails
> ---------------------------------------------
>
> Key: SPARK-13341
> URL: https://issues.apache.org/jira/browse/SPARK-13341
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.0
> Reporter: William Dee
>
> The way that unix timestamp casting is handled has been broken between Spark 1.5.2 and Spark 1.6.0. This can be easily demonstrated via the spark-shell:
> {code:title=1.5.2}
> scala> sqlContext.sql("SELECT CAST(145558084 AS TIMESTAMP) as ts, CAST(CAST(145558084 AS TIMESTAMP) AS DATE) as d").show
> +--------------------+----------+
> |                  ts|         d|
> +--------------------+----------+
> |2016-02-16 00:00:...|2016-02-16|
> +--------------------+----------+
> {code}
> {code:title=1.6.0}
> scala> sqlContext.sql("SELECT CAST(145558084 AS TIMESTAMP) as ts, CAST(CAST(145558084 AS TIMESTAMP) AS DATE) as d").show
> +--------------------+----------+
> |                  ts|         d|
> +--------------------+----------+
> |48095-07-09 12:06...| 095-07-09|
> +--------------------+----------+
> {code}
> I'm not sure what exactly is causing this, but this defect has definitely been introduced in Spark 1.6.0, as jobs that relied on this functionality ran on 1.5.2 and now don't run on 1.6.0.
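The magnitude of the wrong year is consistent with a seconds-versus-milliseconds mix-up somewhere in the cast, and that hypothesis can be checked with plain Python arithmetic. This is an illustrative sketch only: the timestamp 1455580800 is an assumption (the value in the report appears truncated), chosen because it corresponds to 2016-02-16 00:00:00 UTC as in the 1.5.2 output:

```python
from datetime import datetime, timezone

ts = 1455580800  # assumed epoch seconds for 2016-02-16 00:00:00 UTC

# Interpreted as seconds (the 1.5.2 behaviour) the date is 2016-02-16:
print(datetime.fromtimestamp(ts, tz=timezone.utc).date())  # 2016-02-16

# If the same numeric value ends up scaled by 1000 (seconds fed into a
# code path expecting a different unit), the resulting year lands right
# at the buggy 1.6.0 output of 48095:
SECONDS_PER_YEAR = 31556952  # average Gregorian year
print(1970 + (ts * 1000) // SECONDS_PER_YEAR)  # 48095
```

The match with the "48095-07-09" shown above is what points at a unit-conversion regression rather than, say, an overflow at a random bit width.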
[jira] [Commented] (SPARK-11801) Notify driver when OOM is thrown before executor JVM is killed
[ https://issues.apache.org/jira/browse/SPARK-11801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034846#comment-15034846 ]

Srinivasa Reddy Vundela commented on SPARK-11801:
-------------------------------------------------

[~irashid] This is what I have observed during my testing: once the JVM gets the OOM exception, it executes the 'kill %p'. During this process it creates a new thread to execute the shutdown hooks. Since the new thread is yet to be scheduled, the OOM thread always wins the race. The only case where it may not win the race is when it gets preempted just after getting the OOM.
[jira] [Commented] (SPARK-11799) Make it explicit in executor logs that uncaught exceptions are thrown during executor shutdown
[ https://issues.apache.org/jira/browse/SPARK-11799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011080#comment-15011080 ]

Srinivasa Reddy Vundela commented on SPARK-11799:
-------------------------------------------------

Thanks [~srowen], it was added by mistake.
[jira] [Created] (SPARK-11801) Notify driver when OOM is thrown before executor JVM is killed
Srinivasa Reddy Vundela created SPARK-11801:
--------------------------------------------

Summary: Notify driver when OOM is thrown before executor JVM is killed
Key: SPARK-11801
URL: https://issues.apache.org/jira/browse/SPARK-11801
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 1.5.1
Reporter: Srinivasa Reddy Vundela
Priority: Minor

Here is some background for the issue. The customer got an OOM exception in one of the tasks, and the executor got killed with kill %p. It is unclear from the driver logs/Spark UI why the task or the executor was lost; the customer has to look into the executor logs to see that the OOM is the cause.
It would be helpful if the driver logs/Spark UI showed the reason for task failures, by making sure that the task updates the driver with the OOM.
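The improvement asked for above can be sketched in a few lines of Spark-independent Python. This is illustrative only, not Spark's actual task runner: `run_task` and `report_failure` are hypothetical names standing in for the executor's task loop and whatever channel carries status updates back to the driver:

```python
def run_task(task, report_failure):
    """Run a task; make sure an out-of-memory error is reported to the
    driver BEFORE the process goes down, so the UI can show the cause."""
    try:
        return task()
    except MemoryError as err:
        # Notify the driver first: once the JVM/process is killed there
        # is no second chance to explain why the executor was lost.
        report_failure("ExecutorLostFailure: OutOfMemoryError in task")
        raise

failures = []  # stands in for the driver-side record of task failures

def boom():
    raise MemoryError("java heap space (simulated)")

try:
    run_task(boom, failures.append)
except MemoryError:
    pass  # the executor would die here; the driver already knows why

print(failures)
```

The point of the ordering is the race described in the comment above: the report must happen on the OOM thread itself, before any shutdown hook gets a chance to tear the process down.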
[jira] [Created] (SPARK-11799) Make it explicit in executor logs that uncaught exceptions are thrown during executor shutdown
Srinivasa Reddy Vundela created SPARK-11799:
--------------------------------------------

Summary: Make it explicit in executor logs that uncaught exceptions are thrown during executor shutdown
Key: SPARK-11799
URL: https://issues.apache.org/jira/browse/SPARK-11799
Project: Spark
Issue Type: Improvement
Components: Spark Core
Affects Versions: 1.5.1
Reporter: Srinivasa Reddy Vundela
Priority: Minor

Here is some background for the issue. The customer got an OOM exception in one of the tasks, and the executor got killed with kill %p. A few shutdown hooks are registered with ShutdownHookManager to clean up the Hadoop temp directories. During this shutdown phase, other tasks throw uncaught exceptions, and the executor logs fill up with many of them.
Since it is unclear from the driver logs/Spark UI why the container was lost, the customer goes through the executor logs and sees a lot of uncaught exceptions.
It would be clearer for the customer if we prepended the uncaught exceptions with a message like [Container is in shutdown mode], so that they can be skipped.
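The suggested log prefix can be sketched with a small, Spark-independent helper. Illustrative only: `format_uncaught` and the `shutting_down` flag are hypothetical names; in Spark this would live inside the executor's uncaught-exception handler, with the flag set by the shutdown hook:

```python
import threading

# Flag flipped by the shutdown hook once 'kill %p' starts tearing things down
shutting_down = threading.Event()

def format_uncaught(exc, in_shutdown):
    # Prepend a marker so readers can skip the shutdown-phase noise
    prefix = "[Container is in shutdown mode] " if in_shutdown else ""
    return "%sUncaught exception: %r" % (prefix, exc)

print(format_uncaught(ValueError("broadcast gone"), shutting_down.is_set()))
shutting_down.set()  # shutdown begins; later exceptions get the marker
print(format_uncaught(ValueError("broadcast gone"), shutting_down.is_set()))
```

Keeping the marker a fixed prefix (rather than a separate log line) means a simple grep -v is enough to filter the shutdown noise out of the executor logs.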
[jira] [Updated] (SPARK-11484) Giving precedence to proxyBase set by spark instead of env
[ https://issues.apache.org/jira/browse/SPARK-11484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Srinivasa Reddy Vundela updated SPARK-11484:
--------------------------------------------

Description:
The customer reported a strange UI when running a Spark application through Oozie in uber mode (the issue was observed only in yarn-client mode).
When debugging the Spark UI through the Chrome developer console, I figured out that CSS files are looked up under a different application id (the Oozie MapReduce application) instead of the actual Spark application. (Please see the attached screenshot for more information.)
Looking into the live Spark UI, it seems that proxyBase is taken from APPLICATION_WEB_PROXY_BASE instead of the Spark property spark.ui.proxyBase (pointing to the actual Spark application). This fix gives precedence to the Spark property (which should be correct in most cases when it is set).

was:
Live Spark UI without CSS styling is observed when running a Spark application from Oozie in uber mode. Please see the attached screenshot for the strange UI. In the developer console we can see that the UI looks for CSS files under the MapReduce application proxy instead of the Spark application proxy.

Summary: Giving precedence to proxyBase set by spark instead of env (was: Strange Spark UI issue when running on Oozie uber mode ON)
[jira] [Updated] (SPARK-11484) Giving precedence to proxyBase set by spark instead of env
[ https://issues.apache.org/jira/browse/SPARK-11484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Srinivasa Reddy Vundela updated SPARK-11484:
--------------------------------------------

Description:
The customer reported a strange UI when running a Spark application through Oozie in uber mode (the issue was observed only in yarn-client mode).
When debugging the Spark UI through the Chrome developer console, I figured out that CSS files are looked up under a different application id (the Oozie MapReduce application) instead of the actual Spark application. (Please see the attached screenshot for more information.)
Looking into the live Spark UI, it seems that proxyBase is taken from APPLICATION_WEB_PROXY_BASE instead of the Spark property spark.ui.proxyBase (pointing to the actual Spark application). This fix gives precedence to the Spark property (which should be correct in most cases when it is set), which should fix the issue.

was:
The customer reported a strange UI when running a Spark application through Oozie in uber mode (the issue was observed only in yarn-client mode).
When debugging the Spark UI through the Chrome developer console, I figured out that CSS files are looked up under a different application id (the Oozie MapReduce application) instead of the actual Spark application. (Please see the attached screenshot for more information.)
Looking into the live Spark UI, it seems that proxyBase is taken from APPLICATION_WEB_PROXY_BASE instead of the Spark property spark.ui.proxyBase (pointing to the actual Spark application). This fix gives precedence to the Spark property (which should be correct in most cases when it is set).
[jira] [Updated] (SPARK-11484) Giving precedence to proxyBase set by spark instead of env
[ https://issues.apache.org/jira/browse/SPARK-11484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Srinivasa Reddy Vundela updated SPARK-11484:
--------------------------------------------

Description:
The customer reported a strange UI when running a Spark application through Oozie in uber mode (the issue was observed only in yarn-client mode).
When debugging the Spark UI through the Chrome developer console, I figured out that CSS files are looked up under a different application id (the Oozie MapReduce application) instead of the actual Spark application. (Please see the attached screenshot for more information.)
Looking into the live Spark UI code, it seems that proxyBase is taken from APPLICATION_WEB_PROXY_BASE instead of the Spark property spark.ui.proxyBase (pointing to the actual Spark application). This issue might be reproducible whenever the above env variable is set manually or by another job.
The fix would be to give precedence to the Spark property (which should be correct in most cases when it is set).

was:
The customer reported a strange UI when running a Spark application through Oozie in uber mode (the issue was observed only in yarn-client mode).
When debugging the Spark UI through the Chrome developer console, I figured out that CSS files are looked up under a different application id (the Oozie MapReduce application) instead of the actual Spark application. (Please see the attached screenshot for more information.)
Looking into the live Spark UI, it seems that proxyBase is taken from APPLICATION_WEB_PROXY_BASE instead of the Spark property spark.ui.proxyBase (pointing to the actual Spark application). This fix gives precedence to the Spark property (which should be correct in most cases when it is set), which should fix the issue.
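The precedence change described in this issue can be illustrated with a tiny, Spark-independent sketch. Assumptions: `resolve_proxy_base` is a hypothetical helper and the application ids are made up; the real change lives in Spark's Scala UI code:

```python
import os

def resolve_proxy_base(spark_conf, env=os.environ):
    # Prefer the explicitly set Spark property; fall back to the env var
    # that the YARN proxy (or an Oozie launcher job) may have exported.
    prop = spark_conf.get("spark.ui.proxyBase")
    if prop:
        return prop
    return env.get("APPLICATION_WEB_PROXY_BASE", "")

# The Oozie launcher exports the env var pointing at ITS OWN application:
env = {"APPLICATION_WEB_PROXY_BASE": "/proxy/application_1_0001"}

print(resolve_proxy_base({}, env))  # env fallback: the wrong (Oozie) proxy
print(resolve_proxy_base({"spark.ui.proxyBase": "/proxy/application_1_0002"}, env))  # property wins
```

With the old env-first order, the Spark UI would build CSS and resource links under the Oozie application's proxy path, which is exactly the unstyled-UI symptom in the attached screenshot.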
[jira] [Commented] (SPARK-11484) Strange Spark UI issue when running on Oozie uber mode ON
[ https://issues.apache.org/jira/browse/SPARK-11484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14988195#comment-14988195 ] Srinivasa Reddy Vundela commented on SPARK-11484: - [~sowen] The Oozie MapReduce job sets APPLICATION_WEB_PROXY_BASE to the MapReduce application's proxy base. The issue will occur whenever this environment variable is set manually or by some other job. Ideally we would use the proxyBase set by Spark (which is correct in all cases) and fall back to the environment variable only when the property is not found. I am not sure, though, why the proxyBase was read from the environment variable in the first place. > Strange Spark UI issue when running on Oozie uber mode ON > - > > Key: SPARK-11484 > URL: https://issues.apache.org/jira/browse/SPARK-11484 > Project: Spark > Issue Type: Bug > Components: Web UI >Reporter: Srinivasa Reddy Vundela >Priority: Minor > Attachments: Screen Shot 2015-10-29 at 9.06.46 AM.png > > > A live Spark UI without CSS styling is observed when running a Spark application > from Oozie in uber mode. Please see the attached screenshot for the broken > UI. In the developer console we can see that the UI requests CSS files through the > MapReduce application proxy instead of the Spark application proxy.
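The lookup order proposed in the comment above can be sketched as follows. This is a hypothetical illustration, not Spark's actual UI code; the uiRoot helper, its signature, and the application IDs are assumptions made for the example:

```scala
// Sketch of the proposed precedence: prefer the Spark property
// spark.ui.proxyBase and fall back to the APPLICATION_WEB_PROXY_BASE
// environment variable only when the property is unset.
def uiRoot(conf: Map[String, String], env: Map[String, String]): String =
  conf.get("spark.ui.proxyBase")
    .orElse(env.get("APPLICATION_WEB_PROXY_BASE"))
    .getOrElse("")

// When Oozie's MapReduce launcher sets the env var but Spark has set its
// own property, the Spark value wins, so the UI resolves CSS paths
// against the correct application proxy.
val sparkConf = Map("spark.ui.proxyBase" -> "/proxy/application_1446_0002")
val oozieEnv  = Map("APPLICATION_WEB_PROXY_BASE" -> "/proxy/application_1446_0001")
assert(uiRoot(sparkConf, oozieEnv) == "/proxy/application_1446_0002")
```

With this ordering, an environment variable injected by an unrelated job can no longer redirect the Spark UI's asset paths; the env var is consulted only when Spark never set the property itself.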
[jira] [Created] (SPARK-11484) Strange Spark UI issue when running on Oozie uber mode ON
Srinivasa Reddy Vundela created SPARK-11484: --- Summary: Strange Spark UI issue when running on Oozie uber mode ON Key: SPARK-11484 URL: https://issues.apache.org/jira/browse/SPARK-11484 Project: Spark Issue Type: Bug Reporter: Srinivasa Reddy Vundela A live Spark UI without CSS styling is observed when running a Spark application from Oozie in uber mode. Please see the attached screenshot for the broken UI. In the developer console we can see that the UI requests CSS files through the MapReduce application proxy instead of the Spark application proxy.
[jira] [Updated] (SPARK-11484) Strange Spark UI issue when running on Oozie uber mode ON
[ https://issues.apache.org/jira/browse/SPARK-11484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srinivasa Reddy Vundela updated SPARK-11484: Attachment: Screen Shot 2015-10-29 at 9.06.46 AM.png > Strange Spark UI issue when running on Oozie uber mode ON > - > > Key: SPARK-11484 > URL: https://issues.apache.org/jira/browse/SPARK-11484 > Project: Spark > Issue Type: Bug >Reporter: Srinivasa Reddy Vundela > Attachments: Screen Shot 2015-10-29 at 9.06.46 AM.png > > > A live Spark UI without CSS styling is observed when running a Spark application > from Oozie in uber mode. Please see the attached screenshot for the broken > UI. In the developer console we can see that the UI requests CSS files through the > MapReduce application proxy instead of the Spark application proxy.
[jira] [Created] (SPARK-11105) Distribute the log4j.properties files from the client to the executors
Srinivasa Reddy Vundela created SPARK-11105: --- Summary: Distribute the log4j.properties files from the client to the executors Key: SPARK-11105 URL: https://issues.apache.org/jira/browse/SPARK-11105 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.5.1 Reporter: Srinivasa Reddy Vundela Priority: Minor The log4j.properties file from the client is not distributed to the executors. This means the client's logging settings are not applied on the executors, which run with the default settings instead, hampering troubleshooting and data gathering. The workaround is to use the --files option of spark-submit to propagate the log4j.properties file.
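The --files workaround mentioned above can be sketched as a spark-submit invocation. The file path, class name, and jar name below are illustrative placeholders, not values from the report:

```shell
# Ship the client's log4j.properties to each executor's working directory
# with --files, then point the executor JVMs at the shipped copy via
# spark.executor.extraJavaOptions. All paths and names are placeholders.
spark-submit \
  --master yarn-client \
  --files /etc/spark/conf/log4j.properties \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --class com.example.MyApp \
  myapp.jar
```

Files passed via --files are placed in each YARN container's working directory, so the bare file name in -Dlog4j.configuration resolves to the shipped copy rather than the cluster default.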
[jira] [Updated] (SPARK-11105) Distribute the log4j.properties files from the client to the executors
[ https://issues.apache.org/jira/browse/SPARK-11105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Srinivasa Reddy Vundela updated SPARK-11105: Summary: Distribute the log4j.properties files from the client to the executors (was: Dsitribute the log4j.properties files from the client to the executors) > Distribute the log4j.properties files from the client to the executors > --- > > Key: SPARK-11105 > URL: https://issues.apache.org/jira/browse/SPARK-11105 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.5.1 >Reporter: Srinivasa Reddy Vundela >Priority: Minor > > The log4j.properties file from the client is not distributed to the > executors. This means the client's logging settings are not applied on the > executors, which run with the default settings instead, hampering > troubleshooting and data gathering. > The workaround is to use the --files option of spark-submit to propagate the > log4j.properties file.