[jira] [Commented] (SPARK-21994) Spark 2.2 can not read Parquet table created by itself

2017-11-13 Thread Srinivasa Reddy Vundela (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250066#comment-16250066
 ] 

Srinivasa Reddy Vundela commented on SPARK-21994:
-

[~srowen] That's right, it is not available in a public release yet. I just posted 
it for reference.

> Spark 2.2 can not read Parquet table created by itself
> --
>
> Key: SPARK-21994
> URL: https://issues.apache.org/jira/browse/SPARK-21994
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
> Environment: Spark 2.2 on Cloudera CDH 5.10.1, Hive 1.1
>Reporter: Jurgis Pods
>
> This seems to be a new bug introduced in Spark 2.2, since it did not occur 
> under Spark 2.1.
> When writing a dataframe to a table in Parquet format, Spark SQL does not 
> write the 'path' of the table to the Hive metastore, unlike in previous 
> versions.
> As a consequence, Spark 2.2 is not able to read the table it just created. It 
> just outputs the table header without any row content. 
> A parallel installation of Spark 1.6 at least produces an appropriate error 
> trace:
> {code:java}
> 17/09/13 10:22:12 WARN metastore.ObjectStore: Version information not found 
> in metastore. hive.metastore.schema.verification is not enabled so recording 
> the schema version 1.1.0
> 17/09/13 10:22:12 WARN metastore.ObjectStore: Failed to get database default, 
> returning NoSuchObjectException
> org.spark-project.guava.util.concurrent.UncheckedExecutionException: 
> java.util.NoSuchElementException: key not found: path
> [...]
> {code}
> h3. Steps to reproduce:
> Run the following in spark2-shell:
> {code:java}
> scala> val df = spark.sql("show databases")
> scala> df.show()
> ++
> |databaseName|
> ++
> |   mydb1|
> |   mydb2|
> | default|
> |test|
> ++
> scala> df.write.format("parquet").saveAsTable("test.spark22_test")
> scala> spark.sql("select * from test.spark22_test").show()
> ++
> |databaseName|
> ++
> ++{code}
> When manually setting the path (causing the data to be saved as external 
> table), it works:
> {code:java}
> scala> df.write.option("path", 
> "/hadoop/eco/hive/warehouse/test.db/spark22_parquet_with_path").format("parquet").saveAsTable("test.spark22_parquet_with_path")
> scala> spark.sql("select * from test.spark22_parquet_with_path").show()
> ++
> |databaseName|
> ++
> |   mydb1|
> |   mydb2|
> | default|
> |test|
> ++
> {code}
> A second workaround is to update the metadata of the managed table created by 
> Spark 2.2:
> {code}
> spark.sql("alter table test.spark22_test set SERDEPROPERTIES 
> ('path'='hdfs://my-cluster-name:8020/hadoop/eco/hive/warehouse/test.db/spark22_test')")
> spark.catalog.refreshTable("test.spark22_test")
> spark.sql("select * from test.spark22_test").show()
> ++
> |databaseName|
> ++
> |   mydb1|
> |   mydb2|
> | default|
> |test|
> ++
> {code}
> It is kind of a disaster that we are not able to read tables created by the 
> very same Spark version and have to manually specify the path as an explicit 
> option.
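
A quick way to confirm the symptom, assuming a spark-shell session against the same 
metastore and the table name used in the report above, is to check whether a 'path' 
entry shows up in the table metadata:
{code}
// Verification sketch only (spark-shell): look for a 'path' property on the
// managed table that Spark 2.2 just wrote.
spark.sql("DESCRIBE FORMATTED test.spark22_test").show(100, false)
spark.sql("SHOW CREATE TABLE test.spark22_test").show(false)
{code}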






[jira] [Commented] (SPARK-21994) Spark 2.2 can not read Parquet table created by itself

2017-11-13 Thread Srinivasa Reddy Vundela (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16250032#comment-16250032
 ] 

Srinivasa Reddy Vundela commented on SPARK-21994:
-

commit d5e3ba3e970c7241298db2578f0d7965b6e16ae3
Author: Srinivasa Reddy Vundela 
Date:   Mon Oct 9 14:25:01 2017 -0700

CDH-60037. Not able to read hive table from Cloudera version of Spark 2.2

> Spark 2.2 can not read Parquet table created by itself
> --
>
> Key: SPARK-21994
> URL: https://issues.apache.org/jira/browse/SPARK-21994
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
> Environment: Spark 2.2 on Cloudera CDH 5.10.1, Hive 1.1
>Reporter: Jurgis Pods
>
> This seems to be a new bug introduced in Spark 2.2, since it did not occur 
> under Spark 2.1.
> When writing a dataframe to a table in Parquet format, Spark SQL does not 
> write the 'path' of the table to the Hive metastore, unlike in previous 
> versions.
> As a consequence, Spark 2.2 is not able to read the table it just created. It 
> just outputs the table header without any row content. 
> A parallel installation of Spark 1.6 at least produces an appropriate error 
> trace:
> {code:java}
> 17/09/13 10:22:12 WARN metastore.ObjectStore: Version information not found 
> in metastore. hive.metastore.schema.verification is not enabled so recording 
> the schema version 1.1.0
> 17/09/13 10:22:12 WARN metastore.ObjectStore: Failed to get database default, 
> returning NoSuchObjectException
> org.spark-project.guava.util.concurrent.UncheckedExecutionException: 
> java.util.NoSuchElementException: key not found: path
> [...]
> {code}
> h3. Steps to reproduce:
> Run the following in spark2-shell:
> {code:java}
> scala> val df = spark.sql("show databases")
> scala> df.show()
> ++
> |databaseName|
> ++
> |   mydb1|
> |   mydb2|
> | default|
> |test|
> ++
> scala> df.write.format("parquet").saveAsTable("test.spark22_test")
> scala> spark.sql("select * from test.spark22_test").show()
> ++
> |databaseName|
> ++
> ++{code}
> When manually setting the path (causing the data to be saved as external 
> table), it works:
> {code:java}
> scala> df.write.option("path", 
> "/hadoop/eco/hive/warehouse/test.db/spark22_parquet_with_path").format("parquet").saveAsTable("test.spark22_parquet_with_path")
> scala> spark.sql("select * from test.spark22_parquet_with_path").show()
> ++
> |databaseName|
> ++
> |   mydb1|
> |   mydb2|
> | default|
> |test|
> ++
> {code}
> A second workaround is to update the metadata of the managed table created by 
> Spark 2.2:
> {code}
> spark.sql("alter table test.spark22_test set SERDEPROPERTIES 
> ('path'='hdfs://my-cluster-name:8020/hadoop/eco/hive/warehouse/test.db/spark22_test')")
> spark.catalog.refreshTable("test.spark22_test")
> spark.sql("select * from test.spark22_test").show()
> ++
> |databaseName|
> ++
> |   mydb1|
> |   mydb2|
> | default|
> |test|
> ++
> {code}
> It is kind of a disaster that we are not able to read tables created by the 
> very same Spark version and have to manually specify the path as an explicit 
> option.






[jira] [Comment Edited] (SPARK-9104) expose network layer memory usage

2017-11-13 Thread Srinivasa Reddy Vundela (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249904#comment-16249904
 ] 

Srinivasa Reddy Vundela edited comment on SPARK-9104 at 11/13/17 6:39 PM:
--

Hi [~jerryshao], thanks for the PR that exposes the Netty buffer pool memory 
usage; it's a very good start toward exposing the unaccounted memory. However, I 
see that these metrics are not registered with the metrics system or the Web UI. 
Do you have plans to expose them to the metrics system, or would it be okay if I 
send a PR and you help review it? I see that you included Netty metrics for the 
ExternalShuffleService, and I was wondering about the other parts that use 
TransportServer and TransportClientFactory, such as NettyRpcEnv.
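
For illustration, a rough sketch of this kind of wiring, assuming Netty 4.1's 
PooledByteBufAllocator.metric() API and a Dropwizard MetricRegistry supplied by the 
caller (the gauge names here are hypothetical, not Spark's actual metrics source):
{code}
// Illustrative sketch only, not Spark's actual metrics wiring.
import com.codahale.metrics.{Gauge, MetricRegistry}
import io.netty.buffer.PooledByteBufAllocator

def registerNettyMemoryGauges(registry: MetricRegistry, prefix: String): Unit = {
  val m = PooledByteBufAllocator.DEFAULT.metric()
  registry.register(s"$prefix.usedDirectMemory", new Gauge[Long] {
    override def getValue: Long = m.usedDirectMemory()
  })
  registry.register(s"$prefix.usedHeapMemory", new Gauge[Long] {
    override def getValue: Long = m.usedHeapMemory()
  })
}
{code}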


was (Author: vsr):
Hi [~jerryshao], thanks for the PR that exposes the Netty buffer pool memory 
usage; it's a very good start toward exposing the unaccounted memory. However, I 
see that these metrics are not registered with the metrics system or the Web UI. 
Do you have plans to expose them to the metrics system, or would it be okay if I 
send a PR and you help review it? 

> expose network layer memory usage
> -
>
> Key: SPARK-9104
> URL: https://issues.apache.org/jira/browse/SPARK-9104
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Zhang, Liye
>Assignee: Saisai Shao
> Fix For: 2.3.0
>
>
> The default network transport is Netty, and when transferring blocks for 
> shuffle the network layer consumes a decent amount of memory. We should 
> collect the memory usage of this part and expose it. 






[jira] [Commented] (SPARK-9104) expose network layer memory usage

2017-11-13 Thread Srinivasa Reddy Vundela (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249904#comment-16249904
 ] 

Srinivasa Reddy Vundela commented on SPARK-9104:


Hi [~jerryshao], thanks for the PR that exposes the Netty buffer pool memory 
usage; it's a very good start toward exposing the unaccounted memory. However, I 
see that these metrics are not registered with the metrics system or the Web UI. 
Do you have plans to expose them to the metrics system, or would it be okay if I 
send a PR and you help review it? 

> expose network layer memory usage
> -
>
> Key: SPARK-9104
> URL: https://issues.apache.org/jira/browse/SPARK-9104
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Zhang, Liye
>Assignee: Saisai Shao
> Fix For: 2.3.0
>
>
> The default network transport is Netty, and when transferring blocks for 
> shuffle the network layer consumes a decent amount of memory. We should 
> collect the memory usage of this part and expose it. 






[jira] [Commented] (SPARK-21994) Spark 2.2 can not read Parquet table created by itself

2017-11-09 Thread Srinivasa Reddy Vundela (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246463#comment-16246463
 ] 

Srinivasa Reddy Vundela commented on SPARK-21994:
-

This issue is related to the Cloudera version of Spark and was fixed there 
recently. We can close this JIRA.

> Spark 2.2 can not read Parquet table created by itself
> --
>
> Key: SPARK-21994
> URL: https://issues.apache.org/jira/browse/SPARK-21994
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
> Environment: Spark 2.2 on Cloudera CDH 5.10.1, Hive 1.1
>Reporter: Jurgis Pods
>
> This seems to be a new bug introduced in Spark 2.2, since it did not occur 
> under Spark 2.1.
> When writing a dataframe to a table in Parquet format, Spark SQL does not 
> write the 'path' of the table to the Hive metastore, unlike in previous 
> versions.
> As a consequence, Spark 2.2 is not able to read the table it just created. It 
> just outputs the table header without any row content. 
> A parallel installation of Spark 1.6 at least produces an appropriate error 
> trace:
> {code:java}
> 17/09/13 10:22:12 WARN metastore.ObjectStore: Version information not found 
> in metastore. hive.metastore.schema.verification is not enabled so recording 
> the schema version 1.1.0
> 17/09/13 10:22:12 WARN metastore.ObjectStore: Failed to get database default, 
> returning NoSuchObjectException
> org.spark-project.guava.util.concurrent.UncheckedExecutionException: 
> java.util.NoSuchElementException: key not found: path
> [...]
> {code}
> h3. Steps to reproduce:
> Run the following in spark2-shell:
> {code:java}
> scala> val df = spark.sql("show databases")
> scala> df.show()
> ++
> |databaseName|
> ++
> |   mydb1|
> |   mydb2|
> | default|
> |test|
> ++
> scala> df.write.format("parquet").saveAsTable("test.spark22_test")
> scala> spark.sql("select * from test.spark22_test").show()
> ++
> |databaseName|
> ++
> ++{code}
> When manually setting the path (causing the data to be saved as external 
> table), it works:
> {code:java}
> scala> df.write.option("path", 
> "/hadoop/eco/hive/warehouse/test.db/spark22_parquet_with_path").format("parquet").saveAsTable("test.spark22_parquet_with_path")
> scala> spark.sql("select * from test.spark22_parquet_with_path").show()
> ++
> |databaseName|
> ++
> |   mydb1|
> |   mydb2|
> | default|
> |test|
> ++
> {code}
> A second workaround is to update the metadata of the managed table created by 
> Spark 2.2:
> {code}
> spark.sql("alter table test.spark22_test set SERDEPROPERTIES 
> ('path'='hdfs://my-cluster-name:8020/hadoop/eco/hive/warehouse/test.db/spark22_test')")
> spark.catalog.refreshTable("test.spark22_test")
> spark.sql("select * from test.spark22_test").show()
> ++
> |databaseName|
> ++
> |   mydb1|
> |   mydb2|
> | default|
> |test|
> ++
> {code}
> It is kind of a disaster that we are not able to read tables created by the 
> very same Spark version and have to manually specify the path as an explicit 
> option.






[jira] [Created] (SPARK-22483) Exposing java.nio bufferedPool memory metrics to metrics system

2017-11-09 Thread Srinivasa Reddy Vundela (JIRA)
Srinivasa Reddy Vundela created SPARK-22483:
---

 Summary: Exposing java.nio bufferedPool memory metrics to metrics 
system
 Key: SPARK-22483
 URL: https://issues.apache.org/jira/browse/SPARK-22483
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 2.2.0
Reporter: Srinivasa Reddy Vundela


Spark currently exposes the JVM's on-heap and off-heap memory to the metrics 
system. However, there is currently no way to know how much direct/mapped memory 
is allocated for the java.nio buffer pools. 
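
For reference, these figures can be read from the standard platform MXBeans; a 
minimal sketch, assuming a JVM that exposes the usual "direct" and "mapped" pools:
{code}
// Minimal sketch: read java.nio buffer pool usage via the platform MXBeans.
import java.lang.management.{BufferPoolMXBean, ManagementFactory}
import scala.collection.JavaConverters._

val pools = ManagementFactory.getPlatformMXBeans(classOf[BufferPoolMXBean]).asScala
pools.foreach { p =>
  // Typically one bean named "direct" and one named "mapped".
  println(s"${p.getName}: used=${p.getMemoryUsed} capacity=${p.getTotalCapacity} count=${p.getCount}")
}
{code}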






[jira] [Updated] (SPARK-12717) pyspark broadcast fails when using multiple threads

2017-04-20 Thread Srinivasa Reddy Vundela (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivasa Reddy Vundela updated SPARK-12717:

Attachment: run.log

Please find the attached log, produced with the fix applied, for the following 
command:
spark2-submit --master local[20] bug_spark.py --parallelism 1000 >& run.log

> pyspark broadcast fails when using multiple threads
> ---
>
> Key: SPARK-12717
> URL: https://issues.apache.org/jira/browse/SPARK-12717
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.6.0
> Environment: Linux, python 2.6 or python 2.7.
>Reporter: Edward Walker
>Priority: Critical
> Attachments: run.log
>
>
> The following multi-threaded program that uses broadcast variables 
> consistently throws exceptions like:  *Exception("Broadcast variable '18' not 
> loaded!",)* --- even when run with "--master local[10]".
> {code:title=bug_spark.py|borderStyle=solid}
> try:
>     import pyspark
> except:
>     pass
>
> from optparse import OptionParser
>
> def my_option_parser():
>     op = OptionParser()
>     op.add_option("--parallelism", dest="parallelism", type="int", default=20)
>     return op
>
> def do_process(x, w):
>     return x * w.value
>
> def func(name, rdd, conf):
>     new_rdd = rdd.map(lambda x: do_process(x, conf))
>     total = new_rdd.reduce(lambda x, y: x + y)
>     count = rdd.count()
>     print name, 1.0 * total / count
>
> if __name__ == "__main__":
>     import threading
>     op = my_option_parser()
>     options, args = op.parse_args()
>     sc = pyspark.SparkContext(appName="Buggy")
>     data_rdd = sc.parallelize(range(0, 1000), 1)
>     confs = [sc.broadcast(i) for i in xrange(options.parallelism)]
>     threads = [threading.Thread(target=func, args=["thread_" + str(i), data_rdd, confs[i]])
>                for i in xrange(options.parallelism)]
>     for t in threads:
>         t.start()
>     for t in threads:
>         t.join()
> {code}
> Abridged run output:
> {code:title=abridge_run.txt|borderStyle=solid}
> % spark-submit 

[jira] [Commented] (SPARK-19732) DataFrame.fillna() does not work for bools in PySpark

2017-04-19 Thread Srinivasa Reddy Vundela (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975369#comment-15975369
 ] 

Srinivasa Reddy Vundela commented on SPARK-19732:
-

Hi [~lenfro], I checked the documentation for fillna (Python API) and na.fill 
(Scala API). In both places the supported datatypes for the value are either 
String or a numerical type (int, float, double). So this is surely not a bug, as 
the implementation is in sync with the documentation. Maybe this can be changed 
to an enhancement.
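
As a workaround until boolean support is added, a boolean column can be filled 
explicitly; a rough spark-shell sketch, assuming a SparkSession named spark:
{code}
// Workaround sketch: DataFrameNaFunctions has no boolean overload today, so fill
// the boolean column explicitly with coalesce instead of na.fill/fillna.
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{coalesce, col, lit}
import org.apache.spark.sql.types.{BooleanType, StructField, StructType}

val schema = StructType(Seq(StructField("a", BooleanType, nullable = true)))
val df = spark.createDataFrame(
  spark.sparkContext.parallelize(Seq(Row(true), Row(null))), schema)
df.withColumn("a", coalesce(col("a"), lit(true))).show()
{code}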

> DataFrame.fillna() does not work for bools in PySpark
> -
>
> Key: SPARK-19732
> URL: https://issues.apache.org/jira/browse/SPARK-19732
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.0
>Reporter: Len Frodgers
>
> In PySpark, the fillna function of DataFrame inadvertently casts bools to 
> ints, so fillna cannot be used to fill True/False.
> e.g. 
> `spark.createDataFrame([Row(a=True),Row(a=None)]).fillna(True).collect()` 
> yields
> `[Row(a=True), Row(a=None)]`
> It should be a=True for the second Row
> The cause is this bit of code: 
> {code}
> if isinstance(value, (int, long)):
> value = float(value)
> {code}
> There needs to be a separate check for isinstance(value, bool), since in 
> Python, bools are ints too.






[jira] [Commented] (SPARK-13341) Casting Unix timestamp to SQL timestamp fails

2016-02-19 Thread Srinivasa Reddy Vundela (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154694#comment-15154694
 ] 

Srinivasa Reddy Vundela commented on SPARK-13341:
-

I guess the following commit is the reason for the change:
https://github.com/apache/spark/commit/9ed4ad4265cf9d3135307eb62dae6de0b220fc21

It seems HIVE-3454 was fixed in Hive 1.2.0, so customers using earlier versions 
of Hive will see this problem.
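
A possible workaround, assuming the value is an epoch in seconds (the value below 
is illustrative), is to convert through from_unixtime explicitly rather than 
relying on the integer-to-timestamp cast:
{code}
// Workaround sketch (spark-shell): convert epoch seconds explicitly instead of
// relying on CAST(<int> AS TIMESTAMP), whose semantics differ across releases.
sqlContext.sql(
  "SELECT CAST(from_unixtime(1455580800) AS TIMESTAMP) AS ts, " +
  "CAST(CAST(from_unixtime(1455580800) AS TIMESTAMP) AS DATE) AS d").show()
{code}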

> Casting Unix timestamp to SQL timestamp fails
> -
>
> Key: SPARK-13341
> URL: https://issues.apache.org/jira/browse/SPARK-13341
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: William Dee
>
> The way that unix timestamp casting is handled has been broken between Spark 
> 1.5.2 and Spark 1.6.0. This can be easily demonstrated via the spark-shell:
> {code:title=1.5.2}
> scala> sqlContext.sql("SELECT CAST(145558084 AS TIMESTAMP) as ts, 
> CAST(CAST(145558084 AS TIMESTAMP) AS DATE) as d").show
> ++--+
> |  ts| d|
> ++--+
> |2016-02-16 00:00:...|2016-02-16|
> ++--+
> {code}
> {code:title=1.6.0}
> scala> sqlContext.sql("SELECT CAST(145558084 AS TIMESTAMP) as ts, 
> CAST(CAST(145558084 AS TIMESTAMP) AS DATE) as d").show
> ++--+
> |  ts| d|
> ++--+
> |48095-07-09 12:06...|095-07-09|
> ++--+
> {code}
> I'm not sure exactly what is causing this, but the defect was definitely 
> introduced in Spark 1.6.0: jobs that relied on this functionality ran on 
> 1.5.2 and no longer run on 1.6.0.






[jira] [Commented] (SPARK-11801) Notify driver when OOM is thrown before executor JVM is killed

2015-12-01 Thread Srinivasa Reddy Vundela (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034846#comment-15034846
 ] 

Srinivasa Reddy Vundela commented on SPARK-11801:
-

[~irashid] This is what I have observed during my testing. Once the JVM gets an 
OOM exception, it runs 'kill %p'. During this process it creates a new thread to 
execute the shutdown hooks. Since that new thread is yet to be scheduled, the OOM 
thread always wins the race. The only case where it may not win is when it gets 
preempted just after the OOM. 
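
To make that race visible, a minimal, hypothetical handler sketch (not Spark's 
actual executor code) that would at least record the OOM before the process-level 
kill completes:
{code}
// Hypothetical sketch, not Spark's executor handler: record an OOM from a task
// thread before the JVM-level 'kill %p' hook tears the process down.
Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler {
  override def uncaughtException(t: Thread, e: Throwable): Unit = e match {
    case oom: OutOfMemoryError =>
      // A real fix would also notify the driver here, before the JVM exits.
      System.err.println(s"Thread ${t.getName} died with OOM: ${oom.getMessage}")
    case other =>
      System.err.println(s"Uncaught exception in thread ${t.getName}: $other")
  }
})
{code}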

> Notify driver when OOM is thrown before executor JVM is killed 
> ---
>
> Key: SPARK-11801
> URL: https://issues.apache.org/jira/browse/SPARK-11801
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Srinivasa Reddy Vundela
>Priority: Minor
>
> Here is some background for the issue.
> A customer got an OOM exception in one of the tasks and the executor got killed 
> with kill %p. It is unclear from the driver logs/Spark UI why the task or 
> executor was lost; the customer has to look into the executor logs to see that 
> OOM is the cause. 
> It would be helpful if the driver logs/Spark UI showed the reason for task 
> failures by making sure the task updates the driver with the OOM. 






[jira] [Commented] (SPARK-11799) Make it explicit in executor logs that uncaught exceptions are thrown during executor shutdown

2015-11-18 Thread Srinivasa Reddy Vundela (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011080#comment-15011080
 ] 

Srinivasa Reddy Vundela commented on SPARK-11799:
-

Thanks [~srowen], it was added by mistake

> Make it explicit in executor logs that uncaught exceptions are thrown during 
> executor shutdown
> --
>
> Key: SPARK-11799
> URL: https://issues.apache.org/jira/browse/SPARK-11799
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Srinivasa Reddy Vundela
>Priority: Minor
>
> Here is some background for the issue.
> A customer got an OOM exception in one of the tasks and the executor got killed 
> with kill %p. A few shutdown hooks are registered with ShutdownHookManager to 
> clean up the Hadoop temp directories. During this shutdown phase other tasks 
> throw uncaught exceptions, and the executor logs fill up with them. 
> Since it is unclear from the driver logs/Spark UI why the container was lost, 
> the customer goes through the executor logs and sees a lot of uncaught 
> exceptions. 
> It would be clearer to the customer if we prepended the uncaught exceptions 
> with a message like [Container is in shutdown mode] so that they can be 
> skipped.






[jira] [Created] (SPARK-11801) Notify driver when OOM is thrown before executor JVM is killed

2015-11-17 Thread Srinivasa Reddy Vundela (JIRA)
Srinivasa Reddy Vundela created SPARK-11801:
---

 Summary: Notify driver when OOM is thrown before executor JVM is 
killed 
 Key: SPARK-11801
 URL: https://issues.apache.org/jira/browse/SPARK-11801
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.5.1
Reporter: Srinivasa Reddy Vundela
Priority: Minor


Here is some background for the issue.
A customer got an OOM exception in one of the tasks and the executor got killed 
with kill %p. It is unclear from the driver logs/Spark UI why the task or 
executor was lost; the customer has to look into the executor logs to see that 
OOM is the cause. 

It would be helpful if the driver logs/Spark UI showed the reason for task 
failures by making sure the task updates the driver with the OOM. 







[jira] [Created] (SPARK-11799) Make it explicit in executor logs that uncaught exceptions are thrown during executor shutdown

2015-11-17 Thread Srinivasa Reddy Vundela (JIRA)
Srinivasa Reddy Vundela created SPARK-11799:
---

 Summary: Make it explicit in executor logs that uncaught 
exceptions are thrown during executor shutdown
 Key: SPARK-11799
 URL: https://issues.apache.org/jira/browse/SPARK-11799
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.5.1
Reporter: Srinivasa Reddy Vundela
Priority: Minor


Here is some background for the issue.

A customer got an OOM exception in one of the tasks and the executor got killed 
with kill %p. A few shutdown hooks are registered with ShutdownHookManager to 
clean up the Hadoop temp directories. During this shutdown phase other tasks 
throw uncaught exceptions, and the executor logs fill up with them. 

Since it is unclear from the driver logs/Spark UI why the container was lost, the 
customer goes through the executor logs and sees a lot of uncaught exceptions. 

It would be clearer to the customer if we prepended the uncaught exceptions with 
a message like [Container is in shutdown mode] so that they can be skipped.
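
For illustration, the proposed log shape could look roughly like this (a 
hypothetical helper, not actual Spark code):
{code}
// Hypothetical helper illustrating the proposed message shape, not Spark code:
// prefix uncaught exceptions logged while the executor is shutting down.
def logUncaught(inShutdown: Boolean, t: Thread, e: Throwable): Unit = {
  val prefix = if (inShutdown) "[Container is in shutdown mode] " else ""
  System.err.println(s"${prefix}Uncaught exception in thread ${t.getName}: $e")
}
{code}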







[jira] [Updated] (SPARK-11484) Giving precedence to proxyBase set by spark instead of env

2015-11-04 Thread Srinivasa Reddy Vundela (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivasa Reddy Vundela updated SPARK-11484:

Description: 
Customer reported a strange UI when running spark application through Oozie in 
Uber mode (Issue was observed only in yarn-client mode).

When debugging the sparkUI through chrome developer console, figured out that 
CSS files are looked for in different applicationId (Oozie mapreduce 
application) instead of actual spark application. (Please see the attached 
screenshot for more information).

Looking into the live sparkUI it seems that proxyBase is taken from 
APPLICATION_WEB_PROXY_BASE instead of spark property spark.ui.proxyBase 
(Pointing to the actual spark application). 

This fix gives precedence to spark property (which should be correct in most 
cases when it was set).

  was:
Live Spark UI without CSS styling is observed when running spark application 
from Oozie in uber mode.  Please see the attached screenshot for the strange 
UI. In developer console we can see that UI is looking for css files in 
mapreduce application proxy instead of spark application proxy. 



Summary: Giving precedence to proxyBase set by spark instead of env  
(was: Strange Spark UI issue when running on Oozie uber mode ON)

> Giving precedence to proxyBase set by spark instead of env
> --
>
> Key: SPARK-11484
> URL: https://issues.apache.org/jira/browse/SPARK-11484
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Srinivasa Reddy Vundela
>Priority: Minor
> Attachments: Screen Shot 2015-10-29 at 9.06.46 AM.png
>
>
> Customer reported a strange UI when running spark application through Oozie 
> in Uber mode (Issue was observed only in yarn-client mode).
> When debugging the sparkUI through chrome developer console, figured out that 
> CSS files are looked for in different applicationId (Oozie mapreduce 
> application) instead of actual spark application. (Please see the attached 
> screenshot for more information).
> Looking into the live sparkUI it seems that proxyBase is taken from 
> APPLICATION_WEB_PROXY_BASE instead of spark property spark.ui.proxyBase 
> (Pointing to the actual spark application). 
> This fix gives precedence to spark property (which should be correct in most 
> cases when it was set).






[jira] [Updated] (SPARK-11484) Giving precedence to proxyBase set by spark instead of env

2015-11-04 Thread Srinivasa Reddy Vundela (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivasa Reddy Vundela updated SPARK-11484:

Description: 
Customer reported a strange UI when running spark application through Oozie in 
Uber mode (Issue was observed only in yarn-client mode).

When debugging the sparkUI through chrome developer console, figured out that 
CSS files are looked for in different applicationId (Oozie mapreduce 
application) instead of actual spark application. (Please see the attached 
screenshot for more information).

Looking into the live sparkUI it seems that proxyBase is taken from 
APPLICATION_WEB_PROXY_BASE instead of spark property spark.ui.proxyBase 
(Pointing to the actual spark application). 

This fix gives precedence to spark property (which should be correct in most 
cases when it was set), which should fix the issue.

  was:
Customer reported a strange UI when running spark application through Oozie in 
Uber mode (Issue was observed only in yarn-client mode).

When debugging the sparkUI through chrome developer console, figured out that 
CSS files are looked for in different applicationId (Oozie mapreduce 
application) instead of actual spark application. (Please see the attached 
screenshot for more information).

Looking into the live sparkUI it seems that proxyBase is taken from 
APPLICATION_WEB_PROXY_BASE instead of spark property spark.ui.proxyBase 
(Pointing to the actual spark application). 

This fix gives precedence to spark property (which should be correct in most 
cases when it was set).


> Giving precedence to proxyBase set by spark instead of env
> --
>
> Key: SPARK-11484
> URL: https://issues.apache.org/jira/browse/SPARK-11484
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Srinivasa Reddy Vundela
>Priority: Minor
> Attachments: Screen Shot 2015-10-29 at 9.06.46 AM.png
>
>
> Customer reported a strange UI when running spark application through Oozie 
> in Uber mode (Issue was observed only in yarn-client mode).
> When debugging the sparkUI through chrome developer console, figured out that 
> CSS files are looked for in different applicationId (Oozie mapreduce 
> application) instead of actual spark application. (Please see the attached 
> screenshot for more information).
> Looking into the live sparkUI it seems that proxyBase is taken from 
> APPLICATION_WEB_PROXY_BASE instead of spark property spark.ui.proxyBase 
> (Pointing to the actual spark application). 
> This fix gives precedence to spark property (which should be correct in most 
> cases when it was set), which should fix the issue.






[jira] [Updated] (SPARK-11484) Giving precedence to proxyBase set by spark instead of env

2015-11-04 Thread Srinivasa Reddy Vundela (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivasa Reddy Vundela updated SPARK-11484:

Description: 
Customer reported a strange UI when running spark application through Oozie in 
Uber mode (Issue was observed only in yarn-client mode).

When debugging the sparkUI through chrome developer console, figured out that 
CSS files are looked for in different applicationId (Oozie mapreduce 
application) instead of actual spark application. (Please see the attached 
screenshot for more information).

Looking into the live sparkUI code, it seems that proxyBase is taken from 
APPLICATION_WEB_PROXY_BASE instead of spark property spark.ui.proxyBase 
(Pointing to the actual spark application). This issue might be reproducible if 
the above specified env is set manually or set by other job. 

Fix would be giving precedence to spark property (which might be correct in 
most cases when it was set). 

  was:
Customer reported a strange UI when running spark application through Oozie in 
Uber mode (Issue was observed only in yarn-client mode).

When debugging the sparkUI through chrome developer console, figured out that 
CSS files are looked for in different applicationId (Oozie mapreduce 
application) instead of actual spark application. (Please see the attached 
screenshot for more information).

Looking into the live sparkUI it seems that proxyBase is taken from 
APPLICATION_WEB_PROXY_BASE instead of spark property spark.ui.proxyBase 
(Pointing to the actual spark application). 

This fix gives precedence to spark property (which should be correct in most 
cases when it was set), which should fix the issue.


> Giving precedence to proxyBase set by spark instead of env
> --
>
> Key: SPARK-11484
> URL: https://issues.apache.org/jira/browse/SPARK-11484
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Srinivasa Reddy Vundela
>Priority: Minor
> Attachments: Screen Shot 2015-10-29 at 9.06.46 AM.png
>
>
> Customer reported a strange UI when running spark application through Oozie 
> in Uber mode (Issue was observed only in yarn-client mode).
> When debugging the sparkUI through chrome developer console, figured out that 
> CSS files are looked for in different applicationId (Oozie mapreduce 
> application) instead of actual spark application. (Please see the attached 
> screenshot for more information).
> Looking into the live sparkUI code, it seems that proxyBase is taken from 
> APPLICATION_WEB_PROXY_BASE instead of spark property spark.ui.proxyBase 
> (Pointing to the actual spark application). This issue might be reproducible 
> if the above specified env is set manually or set by other job. 
> Fix would be giving precedence to spark property (which might be correct in 
> most cases when it was set). 






[jira] [Commented] (SPARK-11484) Strange Spark UI issue when running on Oozie uber mode ON

2015-11-03 Thread Srinivasa Reddy Vundela (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14988195#comment-14988195
 ] 

Srinivasa Reddy Vundela commented on SPARK-11484:
-

[~sowen] The Oozie MapReduce job is setting APPLICATION_WEB_PROXY_BASE to the 
MapReduce application's proxy base. I guess this issue will occur whenever this 
env variable is set manually or by some other job. The ideal case would be to use 
the proxyBase set by Spark (which would be correct in all cases), and to fall 
back to the env variable only when the property is not set. 

I am not sure, though, why we were using the proxyBase from the env variable in 
the first place.
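
A minimal sketch of the intended precedence (illustrative only, not the actual 
patch):
{code}
// Illustrative sketch of the precedence described above, not the actual change:
// prefer spark.ui.proxyBase, fall back to the YARN proxy env var, else empty.
import org.apache.spark.SparkConf

def uiProxyBase(conf: SparkConf): String = {
  conf.getOption("spark.ui.proxyBase")
    .orElse(sys.env.get("APPLICATION_WEB_PROXY_BASE"))
    .getOrElse("")
}
{code}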

> Strange Spark UI issue when running on Oozie uber mode ON
> -
>
> Key: SPARK-11484
> URL: https://issues.apache.org/jira/browse/SPARK-11484
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Srinivasa Reddy Vundela
>Priority: Minor
> Attachments: Screen Shot 2015-10-29 at 9.06.46 AM.png
>
>
> Live Spark UI without CSS styling is observed when running spark application 
> from Oozie in uber mode.  Please see the attached screenshot for the strange 
> UI. In developer console we can see that UI is looking for css files in 
> mapreduce application proxy instead of spark application proxy. 






[jira] [Created] (SPARK-11484) Strange Spark UI issue when running on Oozie uber mode ON

2015-11-03 Thread Srinivasa Reddy Vundela (JIRA)
Srinivasa Reddy Vundela created SPARK-11484:
---

 Summary: Strange Spark UI issue when running on Oozie uber mode ON
 Key: SPARK-11484
 URL: https://issues.apache.org/jira/browse/SPARK-11484
 Project: Spark
  Issue Type: Bug
Reporter: Srinivasa Reddy Vundela


Live Spark UI without CSS styling is observed when running spark application 
from Oozie in uber mode.  Please see the attached screenshot for the strange 
UI. In developer console we can see that UI is looking for css files in 
mapreduce application proxy instead of spark application proxy. 








[jira] [Updated] (SPARK-11484) Strange Spark UI issue when running on Oozie uber mode ON

2015-11-03 Thread Srinivasa Reddy Vundela (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivasa Reddy Vundela updated SPARK-11484:

Attachment: Screen Shot 2015-10-29 at 9.06.46 AM.png

> Strange Spark UI issue when running on Oozie uber mode ON
> -
>
> Key: SPARK-11484
> URL: https://issues.apache.org/jira/browse/SPARK-11484
> Project: Spark
>  Issue Type: Bug
>Reporter: Srinivasa Reddy Vundela
> Attachments: Screen Shot 2015-10-29 at 9.06.46 AM.png
>
>
> Live Spark UI without CSS styling is observed when running spark application 
> from Oozie in uber mode.  Please see the attached screenshot for the strange 
> UI. In developer console we can see that UI is looking for css files in 
> mapreduce application proxy instead of spark application proxy. 






[jira] [Created] (SPARK-11105) Dsitribute the log4j.properties files from the client to the executors

2015-10-14 Thread Srinivasa Reddy Vundela (JIRA)
Srinivasa Reddy Vundela created SPARK-11105:
---

 Summary: Dsitribute the log4j.properties files from the client to 
the executors
 Key: SPARK-11105
 URL: https://issues.apache.org/jira/browse/SPARK-11105
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.5.1
Reporter: Srinivasa Reddy Vundela
Priority: Minor


The log4j.properties file from the client is not distributed to the executors. 
This means that the client settings are not applied to the executors and they 
run with the default settings.
This affects troubleshooting and data gathering.
The workaround is to use the --files option for spark-submit to propagate the 
log4j.properties file






[jira] [Updated] (SPARK-11105) Disitribute the log4j.properties files from the client to the executors

2015-10-14 Thread Srinivasa Reddy Vundela (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivasa Reddy Vundela updated SPARK-11105:

Summary: Disitribute the log4j.properties files from the client to the 
executors  (was: Dsitribute the log4j.properties files from the client to the 
executors)

> Disitribute the log4j.properties files from the client to the executors
> ---
>
> Key: SPARK-11105
> URL: https://issues.apache.org/jira/browse/SPARK-11105
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.5.1
>Reporter: Srinivasa Reddy Vundela
>Priority: Minor
>
> The log4j.properties file from the client is not distributed to the 
> executors. This means that the client settings are not applied to the 
> executors and they run with the default settings.
> This affects troubleshooting and data gathering.
> The workaround is to use the --files option for spark-submit to propagate the 
> log4j.properties file


