[jira] [Assigned] (SPARK-25301) When a view uses a UDF from a non-default database, the Spark analyser throws AnalysisException

2018-08-31 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25301:


Assignee: Apache Spark

> When a view uses a UDF from a non-default database, the Spark analyser throws 
> AnalysisException
> 
>
> Key: SPARK-25301
> URL: https://issues.apache.org/jira/browse/SPARK-25301
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Vinod KC
>Assignee: Apache Spark
>Priority: Minor
>
> When a Hive view uses a UDF from a non-default database, the Spark analyser 
> throws AnalysisException.
> Steps to simulate this issue
>  -
>  In Hive
>  
>  1) CREATE DATABASE d100;
>  2) ADD JAR /usr/udf/masking.jar // masking.jar has a custom udf class 
> 'com.uzx.udf.Masking'
>  3) create function d100.udf100 as "com.uzx.udf.Masking"; // Note: udf100 is 
> created in d100
>  4) create view d100.v100 as select *d100.udf100*(name) from default.emp; // 
> Note: table default.emp has two columns 'name', 'address'
>  5) select * from d100.v100; // query on view d100.v100 gives correct result
> In Spark
>  -
>  1) spark.sql("select * from d100.v100").show
>  throws 
>  ```
>  org.apache.spark.sql.AnalysisException: Undefined function: '*d100.udf100*'. 
> This function is neither a registered temporary function nor a permanent 
> function registered in the database '*default*'
>  ```
> This is because, while parsing the SQL statement of the view 'select 
> `d100.udf100`(`emp`.`name`) from `default`.`emp`', the Spark parser fails to 
> split the database name from the UDF name, so the Spark function registry 
> tries to load the UDF 'd100.udf100' from the 'default' database.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25301) When a view uses a UDF from a non-default database, the Spark analyser throws AnalysisException

2018-08-31 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25301:


Assignee: (was: Apache Spark)

> When a view uses a UDF from a non-default database, the Spark analyser throws 
> AnalysisException
> 
>
> Key: SPARK-25301
> URL: https://issues.apache.org/jira/browse/SPARK-25301
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Vinod KC
>Priority: Minor
>
> When a Hive view uses a UDF from a non-default database, the Spark analyser 
> throws AnalysisException.
> Steps to simulate this issue
>  -
>  In Hive
>  
>  1) CREATE DATABASE d100;
>  2) ADD JAR /usr/udf/masking.jar // masking.jar has a custom udf class 
> 'com.uzx.udf.Masking'
>  3) create function d100.udf100 as "com.uzx.udf.Masking"; // Note: udf100 is 
> created in d100
>  4) create view d100.v100 as select *d100.udf100*(name) from default.emp; // 
> Note: table default.emp has two columns 'name', 'address'
>  5) select * from d100.v100; // query on view d100.v100 gives correct result
> In Spark
>  -
>  1) spark.sql("select * from d100.v100").show
>  throws 
>  ```
>  org.apache.spark.sql.AnalysisException: Undefined function: '*d100.udf100*'. 
> This function is neither a registered temporary function nor a permanent 
> function registered in the database '*default*'
>  ```
> This is because, while parsing the SQL statement of the view 'select 
> `d100.udf100`(`emp`.`name`) from `default`.`emp`', the Spark parser fails to 
> split the database name from the UDF name, so the Spark function registry 
> tries to load the UDF 'd100.udf100' from the 'default' database.






[jira] [Commented] (SPARK-25301) When a view uses a UDF from a non-default database, the Spark analyser throws AnalysisException

2018-08-31 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599540#comment-16599540
 ] 

Apache Spark commented on SPARK-25301:
--

User 'vinodkc' has created a pull request for this issue:
https://github.com/apache/spark/pull/22307

> When a view uses a UDF from a non-default database, the Spark analyser throws 
> AnalysisException
> 
>
> Key: SPARK-25301
> URL: https://issues.apache.org/jira/browse/SPARK-25301
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Vinod KC
>Priority: Minor
>
> When a Hive view uses a UDF from a non-default database, the Spark analyser 
> throws AnalysisException.
> Steps to simulate this issue
>  -
>  In Hive
>  
>  1) CREATE DATABASE d100;
>  2) ADD JAR /usr/udf/masking.jar // masking.jar has a custom udf class 
> 'com.uzx.udf.Masking'
>  3) create function d100.udf100 as "com.uzx.udf.Masking"; // Note: udf100 is 
> created in d100
>  4) create view d100.v100 as select *d100.udf100*(name) from default.emp; // 
> Note: table default.emp has two columns 'name', 'address'
>  5) select * from d100.v100; // query on view d100.v100 gives correct result
> In Spark
>  -
>  1) spark.sql("select * from d100.v100").show
>  throws 
>  ```
>  org.apache.spark.sql.AnalysisException: Undefined function: '*d100.udf100*'. 
> This function is neither a registered temporary function nor a permanent 
> function registered in the database '*default*'
>  ```
> This is because, while parsing the SQL statement of the view 'select 
> `d100.udf100`(`emp`.`name`) from `default`.`emp`', the Spark parser fails to 
> split the database name from the UDF name, so the Spark function registry 
> tries to load the UDF 'd100.udf100' from the 'default' database.






[jira] [Created] (SPARK-25301) When a view uses a UDF from a non-default database, the Spark analyser throws AnalysisException

2018-08-31 Thread Vinod KC (JIRA)
Vinod KC created SPARK-25301:


 Summary: When a view uses a UDF from a non-default database, 
the Spark analyser throws AnalysisException
 Key: SPARK-25301
 URL: https://issues.apache.org/jira/browse/SPARK-25301
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
Reporter: Vinod KC


When a Hive view uses a UDF from a non-default database, the Spark analyser 
throws AnalysisException.

Steps to simulate this issue
 -
 In Hive
 
 1) CREATE DATABASE d100;
 2) ADD JAR /usr/udf/masking.jar // masking.jar has a custom udf class 
'com.uzx.udf.Masking'
 3) create function d100.udf100 as "com.uzx.udf.Masking"; // Note: udf100 is 
created in d100
 4) create view d100.v100 as select *d100.udf100*(name) from default.emp; // 
Note: table default.emp has two columns 'name', 'address'
 5) select * from d100.v100; // query on view d100.v100 gives correct result

In Spark
 -
 1) spark.sql("select * from d100.v100").show
 throws 
 ```
 org.apache.spark.sql.AnalysisException: Undefined function: '*d100.udf100*'. 
This function is neither a registered temporary function nor a permanent 
function registered in the database '*default*'
 ```

This is because, while parsing the SQL statement of the view 'select 
`d100.udf100`(`emp`.`name`) from `default`.`emp`', the Spark parser fails to 
split the database name from the UDF name, so the Spark function registry tries 
to load the UDF 'd100.udf100' from the 'default' database.
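The parsing failure described above can be sketched as follows. This is an illustrative Python sketch, not Spark's actual (Scala) resolution code; the function name and signature are made up for the example. It contrasts looking up the quoted identifier `d100.udf100` verbatim in the current database with first splitting it into a (database, function) pair, which is what a fix would have to do.

```python
def resolve_function(raw_name: str, current_db: str = "default"):
    """Return the (database, function) pair a function registry should look up.

    If the raw name contains a dot, treat the prefix as the database;
    otherwise fall back to the current database.
    """
    if "." in raw_name:
        db, func = raw_name.split(".", 1)
        return (db, func)
    return (current_db, raw_name)

# Buggy behaviour: the whole quoted identifier is treated as a bare function
# name, so resolution looks for 'd100.udf100' inside 'default' and fails.
# Fixed behaviour: the database prefix is honoured.
print(resolve_function("d100.udf100"))  # ('d100', 'udf100')
print(resolve_function("udf100"))       # ('default', 'udf100')
```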






[jira] [Resolved] (SPARK-23466) Remove redundant null checks in generated Java code by GenerateUnsafeProjection

2018-08-31 Thread Takuya Ueshin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-23466.
---
   Resolution: Fixed
 Assignee: Kazuaki Ishizaki
Fix Version/s: 2.4.0

Issue resolved by pull request 20637
https://github.com/apache/spark/pull/20637

> Remove redundant null checks in generated Java code by 
> GenerateUnsafeProjection
> ---
>
> Key: SPARK-23466
> URL: https://issues.apache.org/jira/browse/SPARK-23466
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
>Priority: Major
> Fix For: 2.4.0
>
>
> One of the TODOs in {{GenerateUnsafeProjection}} is "if the nullability of field 
> is correct, we can use it to save null check", to simplify the generated code.
> When {{nullable=false}} in the {{DataType}}, {{GenerateUnsafeProjection}} now 
> removes the null-check code from the generated Java code.
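The optimization can be illustrated with a toy code generator. This is a Python sketch under assumed names, not Spark's real `GenerateUnsafeProjection` (which emits Java from Scala): it shows how knowing `nullable=false` lets the generator drop the `isNullAt`/`setNullAt` branch entirely.

```python
def gen_write_field(idx: int, nullable: bool) -> str:
    """Emit (toy) writer code for one field of a projection.

    A nullable field needs a runtime null check; a field declared
    non-nullable can skip the check and the setNullAt branch.
    """
    write = f"writer.write({idx}, row.get({idx}));"
    if nullable:
        return (f"if (row.isNullAt({idx})) {{ writer.setNullAt({idx}); }} "
                f"else {{ {write} }}")
    return write  # no redundant null check in the generated code

print(gen_write_field(0, nullable=True))
print(gen_write_field(1, nullable=False))
```

The generated string for the non-nullable field contains no `isNullAt` call, which is exactly the simplification the TODO asks for.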






[jira] [Assigned] (SPARK-25300) Unify the configuration parameter `spark.shuffle.service.enabled`

2018-08-31 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25300:


Assignee: (was: Apache Spark)

> Unify the configuration parameter `spark.shuffle.service.enabled`
> ---
>
> Key: SPARK-25300
> URL: https://issues.apache.org/jira/browse/SPARK-25300
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: liuxian
>Priority: Minor
>
> The configuration parameter "spark.shuffle.service.enabled" is defined in 
> `package.scala` and is used in many places, so we can replace the raw string 
> with `SHUFFLE_SERVICE_ENABLED`.






[jira] [Commented] (SPARK-25300) Unify the configuration parameter `spark.shuffle.service.enabled`

2018-08-31 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599512#comment-16599512
 ] 

Apache Spark commented on SPARK-25300:
--

User '10110346' has created a pull request for this issue:
https://github.com/apache/spark/pull/22306

> Unify the configuration parameter `spark.shuffle.service.enabled`
> ---
>
> Key: SPARK-25300
> URL: https://issues.apache.org/jira/browse/SPARK-25300
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: liuxian
>Priority: Minor
>
> The configuration parameter "spark.shuffle.service.enabled" is defined in 
> `package.scala` and is used in many places, so we can replace the raw string 
> with `SHUFFLE_SERVICE_ENABLED`.






[jira] [Assigned] (SPARK-25300) Unify the configuration parameter `spark.shuffle.service.enabled`

2018-08-31 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25300:


Assignee: Apache Spark

> Unify the configuration parameter `spark.shuffle.service.enabled`
> ---
>
> Key: SPARK-25300
> URL: https://issues.apache.org/jira/browse/SPARK-25300
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: liuxian
>Assignee: Apache Spark
>Priority: Minor
>
> The configuration parameter "spark.shuffle.service.enabled" is defined in 
> `package.scala` and is used in many places, so we can replace the raw string 
> with `SHUFFLE_SERVICE_ENABLED`.






[jira] [Created] (SPARK-25300) Unify the configuration parameter `spark.shuffle.service.enabled`

2018-08-31 Thread liuxian (JIRA)
liuxian created SPARK-25300:
---

 Summary: Unify the configuration parameter 
`spark.shuffle.service.enabled`
 Key: SPARK-25300
 URL: https://issues.apache.org/jira/browse/SPARK-25300
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: liuxian


The configuration parameter "spark.shuffle.service.enabled" is defined in 
`package.scala` and is used in many places, so we can replace the raw string 
with `SHUFFLE_SERVICE_ENABLED`.
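The refactoring idea can be sketched in a few lines. This is an illustrative Python sketch, not Spark's Scala config code (`SHUFFLE_SERVICE_ENABLED` really lives in Spark's internal config `package.scala`): the key string is defined once as a constant, and every read goes through the constant instead of repeating the raw string.

```python
# Single definition of the config key; everywhere else references the constant.
SHUFFLE_SERVICE_ENABLED = "spark.shuffle.service.enabled"

def shuffle_service_enabled(conf: dict) -> bool:
    # A typo in the constant's name fails at name-resolution time,
    # whereas a typo in a repeated raw string would silently fall
    # back to the default value.
    return conf.get(SHUFFLE_SERVICE_ENABLED, "false") == "true"

conf = {SHUFFLE_SERVICE_ENABLED: "true"}
print(shuffle_service_enabled(conf))  # True
print(shuffle_service_enabled({}))    # False
```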






[jira] [Commented] (SPARK-25299) Use distributed storage for persisting shuffle data

2018-08-31 Thread Matt Cheah (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599428#comment-16599428
 ] 

Matt Cheah commented on SPARK-25299:


 

Note that SPARK-1529 was a much earlier feature request that is more or less 
identical to this one, but the old age of SPARK-1529 led me to open this newer 
issue instead of re-opening the old one. If it is preferable to use the old 
issue we can do that as well.

> Use distributed storage for persisting shuffle data
> ---
>
> Key: SPARK-25299
> URL: https://issues.apache.org/jira/browse/SPARK-25299
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Affects Versions: 2.4.0
>Reporter: Matt Cheah
>Priority: Major
>
> In Spark, the shuffle primitive requires Spark executors to persist data to 
> the local disk of the worker nodes. If executors crash, the external shuffle 
> service can continue to serve the shuffle data that was written beyond the 
> lifetime of the executor itself. In YARN, Mesos, and Standalone mode, the 
> external shuffle service is deployed on every worker node. The shuffle 
> service shares local disk with the executors that run on its node.
> There are some shortcomings with the way shuffle is fundamentally implemented 
> right now. Particularly:
>  * If any external shuffle service process or node becomes unavailable, all 
> applications that had an executor that ran on that node must recompute the 
> shuffle blocks that were lost.
>  * Similarly to the above, the external shuffle service must be kept running 
> at all times, which may waste resources when no applications are using that 
> shuffle service node.
>  * Mounting local storage can prevent users from taking advantage of 
> desirable isolation benefits from using containerized environments, like 
> Kubernetes. We had an external shuffle service implementation in an early 
> prototype of the Kubernetes backend, but it was rejected due to its strict 
> requirement to be able to mount hostPath volumes or other persistent volume 
> setups.
> In the following [architecture discussion 
> document|https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.btqugnmt2h40]
>  (note: _not_ an SPIP), we brainstorm various high level architectures for 
> improving the external shuffle service in a way that addresses the above 
> problems. The purpose of this umbrella JIRA is to promote additional 
> discussion on how we can approach these problems, both at the architecture 
> level and the implementation level. We anticipate filing sub-issues that 
> break down the tasks that must be completed to achieve this goal.






[jira] [Comment Edited] (SPARK-25299) Use distributed storage for persisting shuffle data

2018-08-31 Thread Matt Cheah (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599428#comment-16599428
 ] 

Matt Cheah edited comment on SPARK-25299 at 9/1/18 12:27 AM:
-

Note that SPARK-1529 was a much earlier feature request that is more or less 
identical to this one, but the old age of SPARK-1529 led me to open this newer 
issue instead of re-opening the old one. If it is preferable to use the old 
issue we can do that as well.


was (Author: mcheah):
 

Note that SPARK-1529 was a much earlier feature request that is more or less 
identical to this one, but the old age of SPARK-1529 led me to open this newer 
issue instead of re-opening the old one. If it is preferable to use the old 
issue we can do that as well.

> Use distributed storage for persisting shuffle data
> ---
>
> Key: SPARK-25299
> URL: https://issues.apache.org/jira/browse/SPARK-25299
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Affects Versions: 2.4.0
>Reporter: Matt Cheah
>Priority: Major
>
> In Spark, the shuffle primitive requires Spark executors to persist data to 
> the local disk of the worker nodes. If executors crash, the external shuffle 
> service can continue to serve the shuffle data that was written beyond the 
> lifetime of the executor itself. In YARN, Mesos, and Standalone mode, the 
> external shuffle service is deployed on every worker node. The shuffle 
> service shares local disk with the executors that run on its node.
> There are some shortcomings with the way shuffle is fundamentally implemented 
> right now. Particularly:
>  * If any external shuffle service process or node becomes unavailable, all 
> applications that had an executor that ran on that node must recompute the 
> shuffle blocks that were lost.
>  * Similarly to the above, the external shuffle service must be kept running 
> at all times, which may waste resources when no applications are using that 
> shuffle service node.
>  * Mounting local storage can prevent users from taking advantage of 
> desirable isolation benefits from using containerized environments, like 
> Kubernetes. We had an external shuffle service implementation in an early 
> prototype of the Kubernetes backend, but it was rejected due to its strict 
> requirement to be able to mount hostPath volumes or other persistent volume 
> setups.
> In the following [architecture discussion 
> document|https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.btqugnmt2h40]
>  (note: _not_ an SPIP), we brainstorm various high level architectures for 
> improving the external shuffle service in a way that addresses the above 
> problems. The purpose of this umbrella JIRA is to promote additional 
> discussion on how we can approach these problems, both at the architecture 
> level and the implementation level. We anticipate filing sub-issues that 
> break down the tasks that must be completed to achieve this goal.






[jira] [Created] (SPARK-25299) Use distributed storage for persisting shuffle data

2018-08-31 Thread Matt Cheah (JIRA)
Matt Cheah created SPARK-25299:
--

 Summary: Use distributed storage for persisting shuffle data
 Key: SPARK-25299
 URL: https://issues.apache.org/jira/browse/SPARK-25299
 Project: Spark
  Issue Type: New Feature
  Components: Shuffle
Affects Versions: 2.4.0
Reporter: Matt Cheah


In Spark, the shuffle primitive requires Spark executors to persist data to the 
local disk of the worker nodes. If executors crash, the external shuffle 
service can continue to serve the shuffle data that was written beyond the 
lifetime of the executor itself. In YARN, Mesos, and Standalone mode, the 
external shuffle service is deployed on every worker node. The shuffle service 
shares local disk with the executors that run on its node.

There are some shortcomings with the way shuffle is fundamentally implemented 
right now. Particularly:
 * If any external shuffle service process or node becomes unavailable, all 
applications that had an executor that ran on that node must recompute the 
shuffle blocks that were lost.
 * Similarly to the above, the external shuffle service must be kept running at 
all times, which may waste resources when no applications are using that 
shuffle service node.
 * Mounting local storage can prevent users from taking advantage of desirable 
isolation benefits from using containerized environments, like Kubernetes. We 
had an external shuffle service implementation in an early prototype of the 
Kubernetes backend, but it was rejected due to its strict requirement to be 
able to mount hostPath volumes or other persistent volume setups.

In the following [architecture discussion 
document|https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.btqugnmt2h40]
 (note: _not_ an SPIP), we brainstorm various high level architectures for 
improving the external shuffle service in a way that addresses the above 
problems. The purpose of this umbrella JIRA is to promote additional discussion 
on how we can approach these problems, both at the architecture level and the 
implementation level. We anticipate filing sub-issues that break down the tasks 
that must be completed to achieve this goal.






[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599380#comment-16599380
 ] 

Felix Cheung commented on SPARK-24434:
--

... and let's make sure any discussions are summarized and communicated as per 
ASF policy (e.g. update this JIRA), and be mindful of others' contributions, 
work schedules, lifestyles, etc.

Sounds like in this case we could:
 * record any discussion in the k8s SIG, Slack, or offline, and communicate it 
in this JIRA and/or to [d...@spark.apache.org|mailto:d...@spark.apache.org]
 * give plenty of heads-up time to the originator, e.g. give people 3-4 days to 
respond or react after directly pinging the person on the JIRA or by email
 * make sure due credit is given in the JIRA, GitHub PR description, etc., and 
link to any history or reference design doc
 * if this PR [https://github.com/apache/spark/pull/22146] is intended to be a 
WIP, please mark and describe it as such; as of now I don't see any indication 
of that

I believe this would then follow more closely the conventions we have adopted 
for the Apache Spark project. As stated, for various reasons we do not assign a 
JIRA to a user until the PR is merged and the JIRA is resolved. However, 
typically a contributor expresses their desire to work on something in JIRA or 
on dev@ and waits a bit for feedback or comments.

[~onursatici] could you please update your PR to the effect outlined above?

[~skonto] hopefully this makes sense to you; we (the Spark and k8s communities) 
would still love to work with you, and would like your feedback in guiding the 
PR to completion.

 

 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Resolved] (SPARK-25264) Fix comma-delimited arguments passed into PythonRunner and RRunner

2018-08-31 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah resolved SPARK-25264.

   Resolution: Fixed
Fix Version/s: 2.4.0

> Fix comma-delimited arguments passed into PythonRunner and RRunner
> ---
>
> Key: SPARK-25264
> URL: https://issues.apache.org/jira/browse/SPARK-25264
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, PySpark
>Affects Versions: 2.4.0
>Reporter: Ilan Filonenko
>Priority: Major
> Fix For: 2.4.0
>
>
> The arguments passed into the PythonRunner and RRunner are comma-delimited. 
> Because the Runners do an arg.slice(2, ...), the delimiter in the entrypoint 
> needs to be a space, as expected by the Runner arguments.
> This issue was logged here: 
> [https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/273]
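The delimiter problem can be sketched as follows. This is an illustrative Python sketch; the argument shapes are assumptions for the example, not the exact tokens Spark's entrypoint passes. It shows why slicing off the first two positional arguments only works when user arguments arrive as separate space-delimited tokens rather than one comma-joined string.

```python
def runner_user_args(argv: list) -> list:
    """Mirror the Runners' arg.slice(2, ...): drop the first two
    positional arguments and treat the rest as user arguments."""
    return argv[2:]

# Comma-joined: all user arguments collapse into a single fused token.
comma = ["primary.py", "py-files", "a,b,c"]
# Space-delimited: each user argument is its own token, as the Runner expects.
spaced = ["primary.py", "py-files", "a", "b", "c"]

print(runner_user_args(comma))   # ['a,b,c'] -- wrong: one fused argument
print(runner_user_args(spaced))  # ['a', 'b', 'c']
```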






[jira] [Commented] (SPARK-25282) Fix support for spark-shell with K8s

2018-08-31 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599310#comment-16599310
 ] 

Yinan Li commented on SPARK-25282:
--

I'm not sure this is a bug, nor how this could be enforced systematically. When 
you use client mode and run the driver outside the cluster on a host, you are 
using the Spark distribution on that host, which may or may not be the same 
version as the Spark jars in the image. I suspect this is not even a problem 
unique to Spark on Kubernetes.

> Fix support for spark-shell with K8s
> 
>
> Key: SPARK-25282
> URL: https://issues.apache.org/jira/browse/SPARK-25282
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Prashant Sharma
>Priority: Major
>
> Spark shell, when run with a Kubernetes master, gives the following errors:
> {noformat}
> java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId; local 
> class incompatible: stream classdesc serialVersionUID = -3720498261147521051, 
> local class serialVersionUID = -6655865447853211720
>   at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:616)
>   at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1630)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
> {noformat}
> Special care was taken to ensure the same compiled jar was used both in the 
> images and on the host system (the system running the driver).
> This issue affects the PySpark and R interfaces as well.






[jira] [Commented] (SPARK-25295) Pod name conflicts in client mode, if the previous submission was not a clean shutdown

2018-08-31 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599308#comment-16599308
 ] 

Yinan Li commented on SPARK-25295:
--

We made it clear in the documentation of the Kubernetes mode at 
[https://github.com/apache/spark/blob/master/docs/running-on-kubernetes.md#client-mode-executor-pod-garbage-collection]
 that when running the client mode, executor pods may be left behind. This is 
by design. If you want to have the executor pods deleted automatically, run the 
driver in a pod inside the cluster and set {{spark.driver.pod.name}} to the 
name of the driver pod so an {{OwnerReference}} pointing to the driver pod gets 
added to the executor pods. This way the executor pods get garbage collected 
when the driver pod is gone.
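The ownerReference mechanism described above can be sketched as the metadata it adds to each executor pod. This is an illustrative Python sketch; the field names follow the Kubernetes API convention for owner references, while the pod name and UID are made up for the example.

```python
def executor_owner_reference(driver_pod_name: str, driver_uid: str) -> dict:
    """Build an ownerReference entry pointing at the driver pod.

    With this reference attached, Kubernetes garbage-collects the
    executor pod automatically when the driver pod is deleted.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "name": driver_pod_name,
        "uid": driver_uid,
        "controller": True,
    }

# Metadata an executor pod would carry when spark.driver.pod.name is set
# (names here are hypothetical).
executor_metadata = {
    "name": "spark-exec-1",
    "ownerReferences": [executor_owner_reference("my-driver-pod", "abc-123")],
}
print(executor_metadata["ownerReferences"][0]["name"])  # my-driver-pod
```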

> Pod name conflicts in client mode, if the previous submission was not a clean 
> shutdown
> 
>
> Key: SPARK-25295
> URL: https://issues.apache.org/jira/browse/SPARK-25295
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Prashant Sharma
>Priority: Major
>
> If the previous job was killed somehow, e.g. by disconnecting the client, it 
> leaves behind executor pods named spark-exec-#, which cause naming conflicts 
> and failures for the next job submission.
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: 
> POST at: https://:6443/api/v1/namespaces/default/pods. Message: pods 
> "spark-exec-4" already exists. Received status: Status(apiVersion=v1, 
> code=409, details=StatusDetails(causes=[], group=null, kind=pods, 
> name=spark-exec-4, retryAfterSeconds=null, uid=null, 
> additionalProperties={}), kind=Status, message=pods "spark-exec-4" already 
> exists, metadata=ListMeta(resourceVersion=null, selfLink=null, 
> additionalProperties={}), reason=AlreadyExists, status=Failure, 
> additionalProperties={}).






[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599304#comment-16599304
 ] 

Yinan Li commented on SPARK-24434:
--

[~skonto] we understand your feelings and frustration about this, and we really 
appreciate your work driving the design. AFAIK, the PR created by [~onursatici] 
follows the design (you are helping review it, so you can judge whether this is 
the case). I think the situation was that people wanted to move this forward 
(granted that you were driving it) while you were on vacation, and thought it 
would be good to get the ball rolling with a WIP PR that everyone could comment 
on and give early feedback about. The fact that no one knew how far you had gone 
on the implementation before you started your vacation is probably also a factor 
here. With that said, we really appreciate your work driving the design and 
reviewing the PR! If you want to discuss this further and have ideas on how to 
better coordinate on big features in the future, let us know and we can bring it 
up at the next SIG meeting.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 
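For illustration, a pod template of the kind proposed is an ordinary Kubernetes 
pod spec that the user supplies and Spark overlays its own required settings 
onto. A hypothetical driver template might look like the following (every name, 
label, and mount below is made up; the container name and merge semantics are 
exactly the kind of detail the design discussion has to settle):

```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    team: data-platform          # arbitrary user metadata
spec:
  nodeSelector:
    disktype: ssd                # scheduling constraint with no Spark config key
  tolerations:
    - key: spark-only
      operator: Exists
      effect: NoSchedule
  containers:
    - name: driver               # container Spark would recognize and extend
      volumeMounts:
        - name: app-config
          mountPath: /etc/app
  volumes:
    - name: app-config
      configMap:
        name: my-app-config
```

Anything expressible in a pod spec (node selectors, tolerations, sidecars, 
volumes) then needs no dedicated Spark configuration option, which addresses 
both drawbacks listed above.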






[jira] [Updated] (SPARK-23781) Merge YARN and Mesos token renewal code

2018-08-31 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated SPARK-23781:
---
Description: 
With the fix for SPARK-23361, the code that handles delegation tokens in Mesos 
and YARN ends up being very similar.

We should refactor that code so that both backends are sharing the same code, 
which also would make it easier for other cluster managers to use that code.

  was:
With the fix for SPARK-23361, the code that handles delegation tokens in Mesos 
and YARN ends up being very similar.

We shouyld refactor that code so that both backends are sharing the same code, 
which also would make it easier for other cluster managers to use that code.


> Merge YARN and Mesos token renewal code
> ---
>
> Key: SPARK-23781
> URL: https://issues.apache.org/jira/browse/SPARK-23781
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos, YARN
>Affects Versions: 2.4.0
>Reporter: Marcelo Vanzin
>Priority: Major
>
> With the fix for SPARK-23361, the code that handles delegation tokens in 
> Mesos and YARN ends up being very similar.
> We should refactor that code so that both backends are sharing the same code, 
> which also would make it easier for other cluster managers to use that code.






[jira] [Resolved] (SPARK-25283) A deadlock in UnionRDD

2018-08-31 Thread Maxim Gekk (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk resolved SPARK-25283.

   Resolution: Fixed
Fix Version/s: 2.4.0

It was fixed by the PR https://github.com/apache/spark/pull/22292

> A deadlock in UnionRDD
> --
>
> Key: SPARK-25283
> URL: https://issues.apache.org/jira/browse/SPARK-25283
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Priority: Major
> Fix For: 2.4.0
>
>
> The PR https://github.com/apache/spark/pull/21913 replaced the Scala parallel 
> collections in UnionRDD with the new parmap function. This change causes a 
> deadlock in the partitions method. The following code demonstrates the problem:
> {code:scala}
> val wide = 20
> def unionRDD(num: Int): UnionRDD[Int] = {
>   val rdds = (0 until num).map(_ => sc.parallelize(1 to 10, 1))
>   new UnionRDD(sc, rdds)
> }
> val level0 = (0 until wide).map { _ =>
>   val level1 = (0 until wide).map(_ => unionRDD(wide))
>   new UnionRDD(sc, level1)
> }
> val rdd = new UnionRDD(sc, level0)
> rdd.partitions.length
> {code}
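The mechanism behind this kind of deadlock is independent of Spark: when a task 
running on a bounded thread pool blocks waiting for a nested task submitted to 
the same pool, and no worker is free to run the nested task, neither can make 
progress. A minimal sketch in Python (not Spark code; the pool size and timeout 
are illustrative, and the timeout stands in for what would otherwise hang 
forever):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

pool = ThreadPoolExecutor(max_workers=1)

def outer():
    # The only worker is busy running outer(), so this task just sits in the queue.
    inner = pool.submit(lambda: 42)
    # Waiting here is circular: outer() holds the worker that inner needs.
    # Without the timeout this would block forever.
    return inner.result(timeout=1)

future = pool.submit(outer)
try:
    future.result()
    deadlocked = False
except TimeoutError:
    deadlocked = True

print(deadlocked)  # True: the nested task never got a worker
pool.shutdown(wait=False)
```

Increasing max_workers lets this particular sketch complete, but deeply nested 
fan-out (as in the two-level UnionRDD above) can exhaust any fixed-size pool, 
which is presumably why the nested partitions computation deadlocks.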






[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Erik Erlandson (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599299#comment-16599299
 ] 

Erik Erlandson commented on SPARK-24434:


To amplify a little on my points above: I co-chair a SIG that is attended by 
some Apache Spark contributors, most frequently people involved with the 
Kubernetes back-end. As chair, I do my best to provide input on the discussions 
we have there. However, the various community participants are their own 
independent entities; nobody in this community takes orders from me.

When everything is running smoothly, this kind of duplicated effort should 
never happen. Here things didn't go smoothly, and I hope to work it out as best 
we can.

[~skonto] I encourage you to post your development work on this feature, so 
that everyone can discuss all the available options.







[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599284#comment-16599284
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 9:15 PM:
--

[~eje] If I'm not mistaken, all of us are in the same meeting, including the 
Palantir guys, no? Do you see value in doing double work when we are on the 
same call? If so, fine. Not to mention that this case has nothing to do with 
multiple PRs; we had a design doc we agreed upon. Anyway, we disagree, and 
that's ok. I understand why it's important not to violate the FOSS principles 
and to communicate that everything here was fine, but honestly, that is not the 
point I am trying to make. Anyway, I will refrain from adding more comments; it 
does not make any sense.


was (Author: skonto):
[~eje] if not mistaken all of us are on the same meeting including Palantir 
guys no? Do you see value when we are on the same call doing double work? If so 
fine. Not to mention that this case has nothing to do with multiple PRs, we had 
a design doc we agreed upon it. Anyway we disagree, its ok. I understand why 
its important not to violate the FOSS stuff, but honestly that this is not the 
point I am trying to make. I am disappointed, anyway I will refrain from adding 
more comments it does not make any sense.







[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599284#comment-16599284
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 9:14 PM:
--

[~eje] if not mistaken all of us are on the same meeting including Palantir 
guys no? Do you see value when we are on the same call doing double work? If so 
fine. Not to mention that this case has nothing to do with multiple PRs, we had 
a design doc we agreed upon it. Anyway we disagree, its ok. I understand why 
its important not to violate the FOSS stuff, but honestly that this is not the 
point I am trying to make. I am disappointed, anyway I will refrain from adding 
more comments it does not make any sense.


was (Author: skonto):
[~eje] if not mistaken all of us are on the same meeting including Palantir 
guys no? Do you see value when we are on the same call doing double work? If so 
fine. Not to mention that this case has nothing to do with multiple PRs, we had 
a design doc we agreed upon it. Anyway we disagree, its ok. I understand why 
its important not to violate the FOSS stuff, but honestly that this is not the 
point.







[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599284#comment-16599284
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 9:13 PM:
--

[~eje] if not mistaken all of us are on the same meeting including Palantir 
guys no? Do you see value when we are on the same call doing double work? If so 
fine. Not to mention that this case has nothing to do with multiple PRs, we had 
a design doc we agreed upon it. Anyway we disagree, its ok. I understand why 
its important not to violate the FOSS stuff, but honestly that this is not the 
point.


was (Author: skonto):
[~eje] if not mistaken all of us are on the same meeting including Palantir 
guys no? Do you see value when we are on the same call doing double work? If so 
fine. Not to mention that this case has nothing to do with multiple PRs, we had 
a design doc we agreed upon it. Anyway we disagree.







[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599284#comment-16599284
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 9:12 PM:
--

[~eje] if not mistaken all of us are on the same meeting including Palantir 
guys no? Do you see value when we are on the same call doing double work? If so 
fine. Not to mention that this case has nothing to do with multiple PRs, we had 
a design doc we agreed upon it. Anyway we disagree.


was (Author: skonto):
[~eje] if not mistaken all of us are on the same meeting including Palantir 
guys no? Do you see value when we are on the same call doing double work? If so 
fine.







[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599284#comment-16599284
 ] 

Stavros Kontopoulos commented on SPARK-24434:
-

[~eje] if not mistaken all of us are on the same meeting including Palantir 
guys no? Do you see value when we are on the same call doing double work? If so 
fine.







[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599258#comment-16599258
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 9:09 PM:
--

[~eje] Personally, I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked (in one of the meetings) whether this feature 
should go in 2.4, and you responded that it could not, as it needs testing etc. 
I had no objection. It now seems that Palantir is pushing this for their own 
reasons and that it will be marked as experimental. The PR created, though, is 
not that big (and not complete yet, if you ask me). Given that, I suspect we 
had time back then, even with the old dates of the 2.4 cut, to make a similar 
PR. Next time I will push harder.

2) Before I left on vacation I posted a comment on our Slack channel, not to 
mention the explicit comment in this JIRA above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you can see, you agreed that I was working on it, no?

In every other JIRA I have seen, people just need to state that they are 
working on something; they don't need to create a WIP PR, AFAIK. (Next time I 
will just commit a few lines of code to declare assignment.)

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part, sorry. I was expecting some update on the JIRA ticket, 
because, for good or bad, I only checked email during my vacation. You could 
have pinged me, though, rather than just doing it without me. I created the 
design doc; don't you think I want to finish the work?

4) I almost always join the meetings and I'm active on the project. But nobody 
pinged me, AFAIK. Fine.

5) The Palantir guys didn't update the JIRA so that everyone (outside the 
meetings) would know the status of things; also, in the minutes doc I don't see 
any decision about who was going to do the PR.

I think the reasonable thing to do is to ask what I have done, so people don't 
duplicate effort.

If the whole thing looks ok in terms of collaboration on this project, then 
fine, it was my misunderstanding and I will adapt. It is not awkward at all: 
people on the call decided to assign it to the Palantir guys without me knowing 
anything, that's all. Nobody is obliged to inform me about anything; I'm just a 
contributor here. But I took it for granted that this would be the case when 
collaborating in a healthy community. My mistake.


was (Author: skonto):
[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked if this feature (in one of the  meetings) should 
go in 2.4 and you responded that this cannot be the case, as it needs testing 
etc. I had no objection. It seems now that Palantir is pushing this for their 
own reasons and will be marked as experimental. The PR created though is not 
that big (not complete yet if you ask me). Given that I suspect we had time 
back then even with the old dates of the 2.4 cut to make a similar PR. Next 
time will push harder.

2) Before I leave on vacations I left a comment on our slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you see you agreed that im working on it no?

In any other Jira I have seen before, people just need to state that they are 
working on something. They dont need to create a WIP PR AFAIK (Next time I will 
just commit a few lines of code to declare assignment ). 

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part sorry. I was expecting some update on the Jira ticket, 
because I only checked emails on my vacations for good or bad. You could have 
pinged me though not just do it without me. I created the design doc dont you 
think I want to finish the work?

4) I almost always join the meetings and im active on the project. But nobody 
me pinged AFAIK. Fine.

5) Palantir guys didnt update the Jira so all people (outside meetings) know 
the status of things, also in the minutes doc I dont see any decision about who 
is going to do the PR.

I think the reasonable thing to do is ask what I have done, so people dont do 
double effort. 

If the whole thing looks ok in terms of 

[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Erik Erlandson (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599283#comment-16599283
 ] 

Erik Erlandson commented on SPARK-24434:


Stavros, yes, I knew you were working on it, and also that there were no plans 
for 2.4.

As I said above, it is generally more efficient and respectful to coordinate 
with issue assignees. I did not request this second PR. On the other hand, 
multiple PRs for an issue don't violate any FOSS principles; it just means 
there should be a community discussion about which PR ought to be pursued.

I'm not aware of any renewed push to get this into 2.4. I don't see any 
discussion about it on dev@spark.

 







[jira] [Commented] (SPARK-25257) v2 MicroBatchReaders can't resume from checkpoints

2018-08-31 Thread Shixiong Zhu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599279#comment-16599279
 ] 

Shixiong Zhu commented on SPARK-25257:
--

[~mojodna] This issue has been fixed in SPARK-23092.

> v2 MicroBatchReaders can't resume from checkpoints
> --
>
> Key: SPARK-25257
> URL: https://issues.apache.org/jira/browse/SPARK-25257
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.1
>Reporter: Seth Fitzsimmons
>Priority: Major
> Attachments: deserialize.patch
>
>
> When resuming from a checkpoint:
> {code:java}
> writeStream.option("checkpointLocation", 
> "/tmp/checkpoint").format("console").start
> {code}
> The stream reader fails with:
> {noformat}
> osmesa.common.streaming.AugmentedDiffMicroBatchReader@59e19287
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
> Caused by: java.lang.ClassCastException: 
> org.apache.spark.sql.execution.streaming.SerializedOffset cannot be cast to 
> org.apache.spark.sql.sources.v2.reader.streaming.Offset
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1$$anonfun$apply$9.apply(MicroBatchExecution.scala:405)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1$$anonfun$apply$9.apply(MicroBatchExecution.scala:390)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at 
> org.apache.spark.sql.execution.streaming.StreamProgress.foreach(StreamProgress.scala:25)
>   at 
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
>   at 
> org.apache.spark.sql.execution.streaming.StreamProgress.flatMap(StreamProgress.scala:25)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:390)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:390)
>   at 
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:271)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:389)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:133)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:121)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:121)
>   at 
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:271)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:121)
>   at 
> org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:117)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:279)
>   ... 1 more
> {noformat}
> The root cause appears to be that the {{SerializedOffset}} (JSON, from disk) 
> is never deserialized; I would expect to see something along the lines of 
> {{reader.deserializeOffset(off.json)}} here (unless {{available}} is intended 
> to be deserialized elsewhere):
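The suggested pattern can be sketched in Python with hypothetical class and 
method names (the real code is Scala inside MicroBatchExecution; only 
SerializedOffset and deserializeOffset come from the report above, everything 
else is illustrative):

```python
import json

class SerializedOffset:
    """The raw JSON form of an offset, as read back from the checkpoint log."""
    def __init__(self, json_str):
        self.json = json_str

class ExampleOffset:
    """A source-specific offset object (hypothetical stand-in for a v2 Offset)."""
    def __init__(self, position):
        self.position = position

class ExampleReader:
    """Hypothetical reader exposing the deserializeOffset hook from the report."""
    def deserialize_offset(self, json_str):
        return ExampleOffset(json.loads(json_str)["position"])

def resume_from_checkpoint(reader, available):
    # The suggested fix: if the recovered offset is still in serialized form,
    # let the reader turn it back into its own offset type before use, rather
    # than casting it directly (which is what raised the ClassCastException above).
    if isinstance(available, SerializedOffset):
        available = reader.deserialize_offset(available.json)
    return available

off = resume_from_checkpoint(ExampleReader(), SerializedOffset('{"position": 7}'))
print(off.position)  # 7
```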

[jira] [Resolved] (SPARK-25257) v2 MicroBatchReaders can't resume from checkpoints

2018-08-31 Thread Shixiong Zhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-25257.
--
Resolution: Duplicate

> v2 MicroBatchReaders can't resume from checkpoints
> --
>
> Key: SPARK-25257
> URL: https://issues.apache.org/jira/browse/SPARK-25257
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.1
>Reporter: Seth Fitzsimmons
>Priority: Major
> Attachments: deserialize.patch
>
>
> When resuming from a checkpoint:
> {code:java}
> writeStream.option("checkpointLocation", 
> "/tmp/checkpoint").format("console").start
> {code}
> The stream reader fails with:
> {noformat}
> osmesa.common.streaming.AugmentedDiffMicroBatchReader@59e19287
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
> Caused by: java.lang.ClassCastException: 
> org.apache.spark.sql.execution.streaming.SerializedOffset cannot be cast to 
> org.apache.spark.sql.sources.v2.reader.streaming.Offset
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1$$anonfun$apply$9.apply(MicroBatchExecution.scala:405)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1$$anonfun$apply$9.apply(MicroBatchExecution.scala:390)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at 
> org.apache.spark.sql.execution.streaming.StreamProgress.foreach(StreamProgress.scala:25)
>   at 
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
>   at 
> org.apache.spark.sql.execution.streaming.StreamProgress.flatMap(StreamProgress.scala:25)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:390)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:390)
>   at 
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:271)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:389)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:133)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:121)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:121)
>   at 
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:271)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:121)
>   at 
> org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:117)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:279)
>   ... 1 more
> {noformat}
> The root cause appears to be that the {{SerializedOffset}} (JSON, from disk) 
> is never deserialized; I would expect to see something along the lines of 
> {{reader.deserializeOffset(off.json)}} here (unless {{available}} is intended 
> to be deserialized elsewhere):
> 
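The missing step described above can be modeled with a small, self-contained sketch. Note these are hypothetical stand-in types for illustration only, not the real Spark classes; the actual fix would live in MicroBatchExecution and call the source reader's own deserializeOffset:

```scala
object OffsetDeserializationSketch {
  // Stand-ins for the real Spark offset types (hypothetical, for illustration).
  trait Offset { def json: String }
  final case class SerializedOffset(json: String) extends Offset
  final case class LongOffset(value: Long) extends Offset {
    def json: String = value.toString
  }

  // Models reader.deserializeOffset: turn raw JSON from the offset log
  // back into the source's concrete Offset type.
  def deserializeOffset(json: String): Offset = LongOffset(json.toLong)

  // The step the comment argues is missing: unwrap a SerializedOffset
  // (restored from disk) before handing it to the source.
  def resolve(off: Offset): Offset = off match {
    case SerializedOffset(json) => deserializeOffset(json) // restore concrete offset
    case concrete               => concrete                // already deserialized
  }

  def main(args: Array[String]): Unit = {
    println(resolve(SerializedOffset("42"))) // LongOffset(42)
  }
}
```

The point of the pattern-match is that offsets read back from the checkpoint directory arrive as raw JSON wrappers, while offsets produced in-memory are already concrete, so both cases must be handled before the offset reaches the reader.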

[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599258#comment-16599258
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 8:51 PM:
--

[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked (in one of the meetings) whether this feature should 
go into 2.4, and you responded that it could not, as it needed testing etc. I had 
no objection. It now seems that Palantir is pushing this for their own reasons 
and that it will be marked as experimental. The PR that was created, though, is 
not that big (and not complete yet, if you ask me). Given that, I suspect we had 
time back then, even with the old dates of the 2.4 cut, to make a similar PR. 
Next time I will push harder.

2) Before I left on vacation I left a comment on our Slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you can see, you agreed that I am working on it, no?

In any other Jira I have seen, people just need to state that they are working 
on something; they don't need to create a WIP PR, AFAIK. (Next time I will just 
commit a few lines of code to declare assignment.)

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part, sorry. I was expecting an update on the Jira ticket, 
because, for good or bad, I only checked email during my vacation. You could 
have pinged me, though, instead of just doing it without me. I created the 
design doc; don't you think I want to finish the work?

4) I almost always join the meetings and I am active on the project. But nobody 
pinged me, AFAIK. Fine.

5) The Palantir folks didn't update the Jira so that everyone (outside the 
meetings) would know the status of things, and in the minutes doc I don't see 
any decision about who is going to do the PR.

I think the reasonable thing to do is to ask what I have already done, so people 
don't duplicate effort.

If the whole thing looks OK in terms of collaboration on this project, then 
fine, it was my misunderstanding and I will adapt. It is not awkward at all: 
people on the call decided to assign it to the Palantir folks without me knowing 
anything, that's all. Nobody is obliged to inform me about anything; I am just a 
contributor here, but I took it for granted that this would be the case when 
collaborating in a healthy community.


was (Author: skonto):
[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked if this feature (in one of the  meetings) should 
go in 2.4 and you responded that this cannot be the case, as it needs testing 
etc. I had no objection. It seems now that Palantir is pushing this for their 
own reasons and will be marked as experimental. The PR created though is not 
that big (not complete yet if you ask me). Given that I suspect we had time 
back then even with the old dates of the 2.4 cut to make a similar PR. Next 
time will push harder.

2) Before I leave on vacations I left a comment on our slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you see you agreed that im working on it no?

In any other Jira I have seen before, people just need to state that they are 
working on something. They dont need to create a WIP PR AFAIK (Next time I will 
just commit a few lines of code to declare assignment ). 

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part sorry. I was expecting some update on the Jira ticket, 
because I only checked emails on my vacations for good or bad. You could have 
pinged me though not just do it without me. I created the design doc dont you 
think I want to finish the work?

4) I almost always join the meetings and im active on the project. But nobody 
me pinged AFAIK. Fine.

5) Palantir guys didnt update the Jira so all people (outside meetings) know 
the status of things, also in the minutes doc I dont see any decision about who 
is going to do the PR.

I think the reasonable thing to do is ask what I have done, so people dont do 
double effort. 

If the whole thing looks ok in terms of collaboration, 

[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599258#comment-16599258
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 8:48 PM:
--

[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked if this feature (in one of the  meetings) should 
go in 2.4 and you responded that this cannot be the case, as it needs testing 
etc. I had no objection. It seems now that Palantir is pushing this for their 
own reasons and will be marked as experimental. The PR created though is not 
that big (not complete yet if you ask me). Given that I suspect we had time 
back then even with the old dates of the 2.4 cut to make a similar PR. Next 
time will push harder.

2) Before I leave on vacations I left a comment on our slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you see you agreed that im working on it no?

In any other Jira I have seen before, people just need to state that they are 
working on something. They dont need to create a WIP PR AFAIK (Next time I will 
just commit a few lines of code to declare assignment ). 

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part sorry. I was expecting some update on the Jira ticket, 
because I only checked emails on my vacations for good or bad. You could have 
pinged me though not just do it without me. I created the design doc dont you 
think I want to finish the work?

4) I almost always join the meetings and im active on the project. But nobody 
me pinged AFAIK. Fine.

5) Palantir guys didnt update the Jira so all people (outside meetings) know 
the status of things, also in the minutes doc I dont see any decision about who 
is going to do the PR.

I think the reasonable thing to do is ask what I have done, so people dont do 
double effort. 

If the whole thing looks ok in terms of collaboration, then fine, my 
misunderstanding then, will adapt. It is not awkward at all, people on the call 
decided to assign it to Palantir guys without me knowing anything, that's all. 
Nobody is obliged to inform me about anything, im just a contributor here, but 
I took it for granted that this would be the case when collaborating in a 
healthy community.


was (Author: skonto):
[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked if this feature (in one of the  meetings) should 
go in 2.4 and you responded that this cannot be the case, as it needs testing 
etc. I had no objection. It seems now that Palantir is pushing this for their 
own reasons and will be marked as experimental. The PR created though is not 
that big (not complete yet if you ask me). Given that I suspect we had time 
back then even with the old dates of the 2.4 cut to make a similar PR. Next 
time will push harder.

2) Before I leave on vacations I left a comment on our slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you see you agreed that im working on it no?

In any other Jira I have seen before, people just need to state that they are 
working on something. They dont need to create a WIP PR AFAIK (Next time I will 
just commit a few lines of code to declare assignment ). 

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part sorry. I was expecting some update on the Jira ticket, 
because I only checked emails on my vacations for good or bad. You could have 
pinged me though not just do it without me. I created the design doc dont you 
think I want to finish the work?

4) I almost always join the meetings and im active on the project. But nobody 
me pinged AFAIK. Fine.

5) Palantir guys didnt update the Jira so all people (outside meetings) know 
the status of things, also in the minutes doc I dont see any decision about who 
is going to do the PR.

I think the reasonable thing to do is ask what I have done, so people dont do 
double effort. 

If the whole thing looks ok in terms of collaboration, then fine, my 

[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599258#comment-16599258
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 8:47 PM:
--

[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked if this feature (in one of the  meetings) should 
go in 2.4 and you responded that this cannot be the case, as it needs testing 
etc. I had no objection. It seems now that Palantir is pushing this for their 
own reasons and will be marked as experimental. The PR created though is not 
that big (not complete yet if you ask me). Given that I suspect we had time 
back then even with the old dates of the 2.4 cut to make a similar PR. Next 
time will push harder.

2) Before I leave on vacations I left a comment on our slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you see you agreed that im working on it no?

In any other Jira I have seen before, people just need to state that they are 
working on something. They dont need to create a WIP PR AFAIK (Next time I will 
just commit a few lines of code to declare assignment ). 

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part sorry. I was expecting some update on the Jira ticket, 
because I only checked emails on my vacations for good or bad. You could have 
pinged me though not just do it without me. I created the design doc dont you 
think I want to finish the work?

4) I almost always join the meetings and im active on the project. But nobody 
me pinged AFAIK. Fine.

5) Palantir guys didnt update the Jira so all people (outside meetings) know 
the status of things, also in the minutes doc I dont see any decision about who 
is going to do the PR.

I think the reasonable thing to do is ask what I have done, so people dont do 
double effort. 

If the whole thing looks ok in terms of collaboration, then fine, my 
misunderstanding then, will adapt. It is not awkward at all, people on the call 
decided to assign it to Palantir guys without me knowing anything, that's all. 
Nobody is obliged to inform me about anything, but I took it for granted that 
this is the case when collaborating in a healthy community.


was (Author: skonto):
[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked if this feature (in one of the  meetings) should 
go in 2.4 and you responded that this cannot be the case, as it needs testing 
etc. I had no objection. It seems now that Palantir is pushing this for their 
own reasons and will be marked as experimental. The PR created though is not 
that big (not complete yet if you ask me). Given that I suspect we had time 
back then even with the old dates of the 2.4 cut to make a similar PR. Next 
time will push harder.

2) Before I leave on vacations I left a comment on our slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you see you agreed that im working on it no?

In any other Jira I have seen before, people just need to state that they are 
working on something. They dont need to create a WIP PR AFAIK (Next time I will 
just commit a few lines of code to declare assignment ). 

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part sorry. I was expecting some update on the Jira ticket, 
because I only checked emails on my vacations for good or bad. You could have 
pinged me though not just do it without me. I created the design doc dont you 
think I want to finish the work?

4) I almost always join the meetings and im active on the project. But nobody 
me pinged AFAIK. Fine.

5) Palantir guys didnt update the Jira so all people (outside meetings) know 
the status of things, also in the minutes doc I dont see any decision about who 
is going to do the PR.

I think the reasonable thing to do is ask what I have done, so people dont do 
double effort. 

If the whole thing looks ok in terms of collaboration, then fine, my 
misunderstanding then, will adapt. 

[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599258#comment-16599258
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 8:46 PM:
--

[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked if this feature (in one of the  meetings) should 
go in 2.4 and you responded that this cannot be the case, as it needs testing 
etc. I had no objection. It seems now that Palantir is pushing this for their 
own reasons and will be marked as experimental. The PR created though is not 
that big (not complete yet if you ask me). Given that I suspect we had time 
back then even with the old dates of the 2.4 cut to make a similar PR. Next 
time will push harder.

2) Before I leave on vacations I left a comment on our slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you see you agreed that im working on it no?

In any other Jira I have seen before, people just need to state that they are 
working on something. They dont need to create a WIP PR AFAIK (Next time I will 
just commit a few lines of code to declare assignment ). 

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part sorry. I was expecting some update on the Jira ticket, 
because I only checked emails on my vacations for good or bad. You could have 
pinged me though not just do it without me. I created the design doc dont you 
think I want to finish the work?

4) I almost always join the meetings and im active on the project. But nobody 
me pinged AFAIK. Fine.

5) Palantir guys didnt update the Jira so all people (outside meetings) know 
the status of things, also in the minutes doc I dont see any decision about who 
is going to do the PR.

I think the reasonable thing to do is ask what I have done, so people dont do 
double effort. 

If the whole thing looks ok in terms of collaboration, then fine, my 
misunderstanding then, will adapt. It is not awkward at all, people on the call 
decided to assign it to Palantir guys without me knowing anything, that's all. 
Nobody is obliged to inform me about anything, but I thought that is normal 
when collaborate in a healthy community.


was (Author: skonto):
[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked if this feature (in one of the  meetings) should 
go in 2.4 and you responded that this cannot be the case, as it needs testing 
etc. I had no objection. It seems now that Palantir is pushing this for their 
own reasons and will be marked as experimental. The PR created though is not 
that big (not complete yet if you ask me). Given that I suspect we had time 
back then even with the old dates of the 2.4 cut to make a similar PR. Next 
time will push harder.

2) Before I leave on vacations I left a comment on our slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you see you agreed that im working on it no?

In any other Jira I have seen before, people just need to state that they are 
working on something. They dont need to create a WIP PR AFAIK (Next time I will 
just commit a few lines of code to declare assignment ). 

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part sorry. I was expecting some update on the Jira ticket, 
because I only checked emails on my vacations for good or bad. You could have 
pinged me though not just do it without me. I created the design doc dont you 
think I want to finish the work?

4) I almost always join the meetings and im active on the project. But nobody 
me pinged AFAIK. Fine.

5) Palantir guys didnt update the Jira so all people (outside meetings) know 
the status of things, also in the minutes doc I dont see any decision about who 
is going to do the PR.

I think the reasonable thing to do is ask what I have done, so people dont do 
double effort. 

If the whole thing looks ok in terms of collaboration, then fine, my 
misunderstanding then, will adapt. It is not awkward at 

[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599258#comment-16599258
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 8:45 PM:
--

[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked if this feature (in one of the  meetings) should 
go in 2.4 and you responded that this cannot be the case, as it needs testing 
etc. I had no objection. It seems now that Palantir is pushing this for their 
own reasons and will be marked as experimental. The PR created though is not 
that big (not complete yet if you ask me). Given that I suspect we had time 
back then even with the old dates of the 2.4 cut to make a similar PR. Next 
time will push harder.

2) Before I leave on vacations I left a comment on our slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you see you agreed that im working on it no?

In any other Jira I have seen before, people just need to state that they are 
working on something. They dont need to create a WIP PR AFAIK (Next time I will 
just commit a few lines of code to declare assignment ). 

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part sorry. I was expecting some update on the Jira ticket, 
because I only checked emails on my vacations for good or bad. You could have 
pinged me though not just do it without me. I created the design doc dont you 
think I want to finish the work?

4) I almost always join the meetings and im active on the project. But nobody 
me pinged AFAIK. Fine.

5) Palantir guys didnt update the Jira so all people (outside meetings) know 
the status of things, also in the minutes doc I dont see any decision about who 
is going to do the PR.

I think the reasonable thing to do is ask what I have done, so people dont do 
double effort. 

If the whole thing looks ok in terms of collaboration, then fine, my 
misunderstanding then, will adapt. It is not awkward at all, people on the call 
decided to assign it to Palantir guys without me knowing anything, that's all. 


was (Author: skonto):
[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked if this feature (in one of the  meetings) should 
go in 2.4 and you responded that this cannot be the case, as it needs testing 
etc. I had no objection. It seems now that Palantir is pushing this for their 
own reasons and will be marked as experimental. The PR created though is not 
that big (not complete yet if you ask me). Given that I suspect we had time 
back then even with the old dates of the 2.4 cut to make a similar PR. Next 
time will push harder.

2) Before I leave on vacations I left a comment on our slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you see you agreed that im working on it no?

In any other Jira I have seen before, people just need to state that they are 
working on something. They dont need to create a WIP PR AFAIK (Next time I will 
just commit a few lines of code to declare assignment ). 

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part sorry. I was expecting some update on the Jira ticket, 
because I only checked emails on my vacations for good or bad. You could have 
pinged me though not just do it without me. I created the design doc dont you 
think I want to finish the work?

4) I almost always join the meetings and im active on the project. But nobody 
me pinged AFAIK. Fine.

5) Palantir guys didnt update the Jira so all people (outside meetings) know 
the status of things, also in the minutes doc I dont see any decision about who 
is going to do the PR.

I think the reasonable thing to do is ask what I have done, so people dont do 
double effort. 

If the whole thing looks ok in terms of collaboration, then fine, my 
misunderstanding then, will adapt. It is not awkward people on the call decided 
to assign it to Palantir without knowing anything that's all. 

> Support user-specified 

[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599258#comment-16599258
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 8:44 PM:
--

[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked if this feature (in one of the  meetings) should 
go in 2.4 and you responded that this cannot be the case, as it needs testing 
etc. I had no objection. It seems now that Palantir is pushing this for their 
own reasons and will be marked as experimental. The PR created though is not 
that big (not complete yet if you ask me). Given that I suspect we had time 
back then even with the old dates of the 2.4 cut to make a similar PR. Next 
time will push harder.

2) Before I leave on vacations I left a comment on our slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you see you agreed that im working on it no?

In any other Jira I have seen before, people just need to state that they are 
working on something. They dont need to create a WIP PR AFAIK (Next time I will 
just commit a few lines of code to declare assignment ). 

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part sorry. I was expecting some update on the Jira ticket, 
because I only checked emails on my vacations for good or bad. You could have 
pinged me though not just do it without me. I created the design doc dont you 
think I want to finish the work?

4) I almost always join the meetings and im active on the project. But nobody 
me pinged AFAIK. Fine.

5) Palantir guys didnt update the Jira so all people (outside meetings) know 
the status of things, also in the minutes doc I dont see any decision about who 
is going to do the PR.

I think the reasonable thing to do is ask what I have done, so people dont do 
double effort. 

If the whole thing looks ok in terms of collaboration, then fine, my 
misunderstanding then, will adapt. It is not awkward people on the call decided 
to assign it to Palantir without knowing anything that's all. 


was (Author: skonto):
[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked if this feature (in one of the  meetings) should 
go in 2.4 and you responded that this cannot be the case, as it needs testing 
etc. I had no objection. It seems now that Palantir is pushing this for their 
own reasons and will be marked as experimental. The PR created though is not 
that big (not complete yet if you ask me). Given that I suspect we had time 
back then even with the old dates of the 2.4 cut to make a similar PR. Next 
time will push harder.

2) Before I leave on vacations I left a comment on our slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you see you agreed that im working on it no?

In any other Jira I have seen before, people just need to state that they are 
working on something. They dont need to create a WIP PR AFAIK (Next time I will 
just commit a few lines of code to declare assignment ). 

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part sorry. I was expecting some update on the Jira ticket, 
because I only checked emails on my vacations for good or bad. You could have 
pinged me though not just do it without me. I created the design doc dont you 
think I want to finish the work?

4) I almost always join the meetings and im active on the project. But nobody 
me pinged AFAIK. Fine.

5) Palantir guys didnt update the Jira so all people (outside meetings) know 
the status of things, also in the minutes doc I dont see any decision about who 
is going to do the PR.

I think the reasonable thing to do is ask what I have done, so people dont do 
double effort. 

If the whole thing looks ok in terms of collaboration, then fine, my 
misunderstanding then, will adapt.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> 

[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599258#comment-16599258
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 8:38 PM:
--

[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked if this feature (in one of the  meetings) should 
go in 2.4 and you responded that this cannot be the case, as it needs testing 
etc. I had no objection. It seems now that Palantir is pushing this for their 
own reasons and will be marked as experimental. The PR created though is not 
that big (not complete yet if you ask me). Given that I suspect we had time 
back then even with the old dates of the 2.4 cut to make a similar PR. Next 
time will push harder.

2) Before I leave on vacations I left a comment on our slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you see you agreed that im working on it no?

In any other Jira I have seen, people just need to state that they are working on something; they don't need to create a WIP PR, AFAIK. (Next time I will just commit a few lines of code to declare assignment.)

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part, sorry. I was expecting an update on the Jira ticket, because, for good or bad, I only checked email while on vacation. You could have pinged me, though, instead of just doing it without me. I created the design doc; don't you think I want to finish the work?

4) I almost always join the meetings and I am active on the project, but nobody pinged me, AFAIK. Fine.

5) The Palantir folks didn't update the Jira so that everyone (outside the meetings) would know the status of things; also, in the minutes doc I don't see any decision about who is going to do the PR.

I think the reasonable thing to do is to ask what I have done, so people don't duplicate effort.

If the whole thing looks OK in terms of collaboration, then fine; it was my misunderstanding, and I will adapt.



> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  












[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Erik Erlandson (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599092#comment-16599092
 ] 

Erik Erlandson commented on SPARK-24434:


There are a few related, but separate, issues here.

I agree that it is most efficient, and considerate, to respect issue 
assignments and coordinate our distributed development around absences, etc.

To the best of my knowledge, the work Stavros did on 24434 was not made visible 
as a public WIP apache/spark branch. Making dev visible this way is one 
important way to minimize coordination problems.

Although this confusion is awkward, nothing in regard to 24434 has violated FOSS principles or Spark governance. Onur's PR has been developed and reviewed
on a public apache/spark branch. This Jira was filed, and has hosted discussion 
from all stakeholders.

The Kubernetes Big Data SIG is a separate community that overlaps with the 
Spark community. Our meetings are open to the public, and we publish recordings 
and meeting minutes. Although we discuss topics related to Spark on Kubernetes, 
we do not make Spark development decisions in that community. All of the work 
that members of the K8s Big Data SIG have contributed to Spark respects Apache 
governance and has been done using established Spark processes: SPIP, 
discussion on the dev list, JIRA, and the PR workflow.

 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests coming in for customizing the driver and executor pods, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes-specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as a central 
> place for all customization needs for the driver and executor pods. 






[jira] [Resolved] (SPARK-25286) Remove dangerous parmap

2018-08-31 Thread Xiao Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-25286.
-
   Resolution: Fixed
 Assignee: Maxim Gekk
Fix Version/s: 2.4.0

> Remove dangerous parmap
> ---
>
> Key: SPARK-25286
> URL: https://issues.apache.org/jira/browse/SPARK-25286
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 2.4.0
>
>
> One of the parmap methods accepts an execution context created outside of parmap. 
> If parmap is called recursively on a size-limited thread pool, this can lead to 
> deadlocks; see the JIRA tickets SPARK-25240 and SPARK-25283. To prevent such 
> problems in the future, we need to remove the parmap() overload with the 
> signature:
> {code:scala}
> def parmap[I, O, Col[X] <: TraversableLike[X, Col[X]]]
>   (in: Col[I])
>   (f: I => O)
>   (implicit
> cbf: CanBuildFrom[Col[I], Future[O], Col[Future[O]]], // For in.map
> cbf2: CanBuildFrom[Col[Future[O]], O, Col[O]], // for Future.sequence
> ec: ExecutionContext
>   ): Col[O]
> {code}
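The hazard described above can be sketched outside Spark. The following illustration is an editorial addition, not Spark code: it uses Python's ThreadPoolExecutor in place of a size-limited ExecutionContext. The outer task occupies the pool's only worker while waiting on a nested task submitted to the same pool, so the nested task can never be scheduled; a timeout stands in for the real, indefinite deadlock.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# A pool with a single worker stands in for a size-limited
# ExecutionContext shared across nested parmap-style calls.
pool = ThreadPoolExecutor(max_workers=1)

def inner():
    return 42

def outer():
    # The outer task holds the only worker while it waits on the inner
    # task, which can never be scheduled on the same exhausted pool.
    # Without the timeout, this wait would never return.
    return pool.submit(inner).result(timeout=1)

try:
    outcome = pool.submit(outer).result(timeout=5)
except FutureTimeout:
    outcome = "deadlock"

print(outcome)  # "deadlock": the nested task starved for a worker
pool.shutdown(wait=False)
```

Scala's parmap hit the same pattern whenever recursive calls shared one bounded ExecutionContext, which is why the overload taking an externally supplied execution context is being removed.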






[jira] [Commented] (SPARK-25279) Throw exception: zzcclp java.io.NotSerializableException: org.apache.spark.sql.TypedColumn in Spark-shell when run example of doc

2018-08-31 Thread Dilip Biswal (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599048#comment-16599048
 ] 

Dilip Biswal commented on SPARK-25279:
--

Hello,

Tried against the latest trunk. Seems to work fine.
{code:java}
scala> import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.expressions.Aggregator

scala> import org.apache.spark.sql.Encoder
import org.apache.spark.sql.Encoder

scala> import org.apache.spark.sql.Encoders
import org.apache.spark.sql.Encoders

scala> import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SparkSession

scala> case class Employee(name: String, salary: Long)
defined class Employee

scala> case class Average(var sum: Long, var count: Long)
defined class Average

scala> object MyAverage extends Aggregator[Employee, Average, Double] {
     |   // A zero value for this aggregation. Should satisfy the property that any b + zero = b
     |   def zero: Average = Average(0L, 0L)
     |   // Combine two values to produce a new value. For performance, the function may modify `buffer`
     |   // and return it instead of constructing a new object
     |   def reduce(buffer: Average, employee: Employee): Average = {
     |     buffer.sum += employee.salary
     |     buffer.count += 1
     |     buffer
     |   }
     |   // Merge two intermediate values
     |   def merge(b1: Average, b2: Average): Average = {
     |     b1.sum += b2.sum
     |     b1.count += b2.count
     |     b1
     |   }
     |   // Transform the output of the reduction
     |   def finish(reduction: Average): Double = reduction.sum.toDouble / reduction.count
     |   // Specifies the Encoder for the intermediate value type
     |   def bufferEncoder: Encoder[Average] = Encoders.product
     |   // Specifies the Encoder for the final output value type
     |   def outputEncoder: Encoder[Double] = Encoders.scalaDouble
     | }
defined object MyAverage

scala> val ds = spark.read.json("examples/src/main/resources/employees.json").as[Employee]
ds: org.apache.spark.sql.Dataset[Employee] = [name: string, salary: bigint]

scala> ds.show()
+-------+------+
|   name|salary|
+-------+------+
|Michael|  3000|
|   Andy|  4500|
| Justin|  3500|
|  Berta|  4000|
+-------+------+

scala> // Convert the function to a `TypedColumn` and give it a name

scala> val averageSalary = MyAverage.toColumn.name("average_salary")
averageSalary: org.apache.spark.sql.TypedColumn[Employee,Double] = myaverage() AS `average_salary`

scala> val result = ds.select(averageSalary)
result: org.apache.spark.sql.Dataset[Double] = [average_salary: double]

scala> result.show()
+--------------+
|average_salary|
+--------------+
|        3750.0|
+--------------+
{code}

> Throw exception: zzcclp   java.io.NotSerializableException: 
> org.apache.spark.sql.TypedColumn in Spark-shell when run example of doc
> ---
>
> Key: SPARK-25279
> URL: https://issues.apache.org/jira/browse/SPARK-25279
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, SQL
>Affects Versions: 2.2.1
>Reporter: Zhichao  Zhang
>Priority: Minor
>
> Hi dev: 
>   I am using spark-shell to run the example from 
> [http://spark.apache.org/docs/2.2.2/sql-programming-guide.html#type-safe-user-defined-aggregate-functions],
>  
> and there is an error: 
> {code:java}
> Caused by: java.io.NotSerializableException: 
> org.apache.spark.sql.TypedColumn 
> Serialization stack: 
>         - object not serializable (class: org.apache.spark.sql.TypedColumn, 
> value: 
> myaverage() AS `average_salary`) 
>         - field (class: $iw, name: averageSalary, type: class 
> org.apache.spark.sql.TypedColumn) 
>         - object (class $iw, $iw@4b2f8ae9) 
>         - field (class: MyAverage$, name: $outer, type: class $iw) 
>         - object (class MyAverage$, MyAverage$@2be41d90) 
>         - field (class: 
> org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression, 
> name: aggregator, type: class org.apache.spark.sql.expressions.Aggregator) 
>         - object (class 
> org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression, 
> MyAverage(Employee)) 
>         - field (class: 
> org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression, 
> name: aggregateFunction, type: class 
> org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction) 
>         - object (class 
> org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression, 
> partial_myaverage(MyAverage$@2be41d90, Some(newInstance(class Employee)), 
> Some(class Employee), Some(StructType(StructField(name,StringType,true), 

[jira] [Commented] (SPARK-25294) Add integration test for Kerberos

2018-08-31 Thread Marcelo Vanzin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598975#comment-16598975
 ] 

Marcelo Vanzin commented on SPARK-25294:


It might be hard to cover all cases since Kerberos is, well, complicated. But 
there's a lot of ground we can cover by using Hadoop's {{MiniKdc}} in our 
tests. This has been on my "things to take a look at sometime" list for a few 
years...

> Add integration test for Kerberos 
> --
>
> Key: SPARK-25294
> URL: https://issues.apache.org/jira/browse/SPARK-25294
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> Changes may cause Kerberos issues in components such as {{Yarn}}, {{Hive}}, and 
> {{HDFS}}; we should add tests.
> https://issues.apache.org/jira/browse/SPARK-23789
> https://github.com/apache/spark/pull/21987#issuecomment-417560077






[jira] [Resolved] (SPARK-25296) Create ExplainSuite

2018-08-31 Thread Xiao Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-25296.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

> Create ExplainSuite
> ---
>
> Key: SPARK-25296
> URL: https://issues.apache.org/jira/browse/SPARK-25296
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Major
> Fix For: 2.4.0
>
>
> Move the output verification of Explain to a new suite ExplainSuite. 






[jira] [Commented] (SPARK-24561) User-defined window functions with pandas udf (bounded window)

2018-08-31 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598885#comment-16598885
 ] 

Apache Spark commented on SPARK-24561:
--

User 'icexelloss' has created a pull request for this issue:
https://github.com/apache/spark/pull/22305

> User-defined window functions with pandas udf (bounded window)
> --
>
> Key: SPARK-24561
> URL: https://issues.apache.org/jira/browse/SPARK-24561
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 2.3.1
>Reporter: Li Jin
>Priority: Major
>







[jira] [Assigned] (SPARK-24561) User-defined window functions with pandas udf (bounded window)

2018-08-31 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24561:


Assignee: Apache Spark

> User-defined window functions with pandas udf (bounded window)
> --
>
> Key: SPARK-24561
> URL: https://issues.apache.org/jira/browse/SPARK-24561
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 2.3.1
>Reporter: Li Jin
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-24561) User-defined window functions with pandas udf (bounded window)

2018-08-31 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24561:


Assignee: (was: Apache Spark)

> User-defined window functions with pandas udf (bounded window)
> --
>
> Key: SPARK-24561
> URL: https://issues.apache.org/jira/browse/SPARK-24561
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 2.3.1
>Reporter: Li Jin
>Priority: Major
>







[jira] [Commented] (SPARK-25294) Add integration test for Kerberos

2018-08-31 Thread Sean Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598743#comment-16598743
 ] 

Sean Owen commented on SPARK-25294:
---

Sure, another test is good. What test? I don't have any knowledge of this area.

Regarding the Hadoop 2.7.7 issue, ideally that's worked around, as I don't 
expect Hadoop's behavior to change back.

> Add integration test for Kerberos 
> --
>
> Key: SPARK-25294
> URL: https://issues.apache.org/jira/browse/SPARK-25294
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> Changes may cause Kerberos issues in components such as {{Yarn}}, {{Hive}}, and 
> {{HDFS}}; we should add tests.
> https://issues.apache.org/jira/browse/SPARK-23789
> https://github.com/apache/spark/pull/21987#issuecomment-417560077






[jira] [Created] (SPARK-25298) spark-tools build failure for Scala 2.12

2018-08-31 Thread Darcy Shen (JIRA)
Darcy Shen created SPARK-25298:
--

 Summary: spark-tools build failure for Scala 2.12
 Key: SPARK-25298
 URL: https://issues.apache.org/jira/browse/SPARK-25298
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Affects Versions: 2.4.0
Reporter: Darcy Shen


$ sbt
> ++ 2.12.6
> compile

[error] 
/Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:22:
 object runtime is not a member of package reflect
[error] import scala.reflect.runtime.\{universe => unv}
[error]  ^
[error] 
/Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:23:
 object runtime is not a member of package reflect
[error] import scala.reflect.runtime.universe.runtimeMirror
[error]  ^
[error] 
/Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:41:
 not found: value runtimeMirror
[error]   private val mirror = runtimeMirror(classLoader)
[error]    ^
[error] 
/Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:43:
 not found: value unv
[error]   private def isPackagePrivate(sym: unv.Symbol) =






[jira] [Assigned] (SPARK-25297) Future for Scala 2.12 will block on an already shutdown ExecutionContext

2018-08-31 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25297:


Assignee: (was: Apache Spark)

> Future for Scala 2.12 will block on an already shutdown ExecutionContext
> ---
>
> Key: SPARK-25297
> URL: https://issues.apache.org/jira/browse/SPARK-25297
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Darcy Shen
>Priority: Major
>
> *+see 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12/193/]+*
> *The unit tests block on FileBasedWriteAheadLogWithFileCloseAfterWriteSuite 
> in the console output.*
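As a point of contrast, here is an illustrative, non-Spark sketch: Python's executor rejects work submitted after shutdown instead of blocking, which is the fail-fast behavior one would want from a Future handed an already shut down execution context.

```python
from concurrent.futures import ThreadPoolExecutor

ex = ThreadPoolExecutor(max_workers=2)
assert ex.submit(lambda: 1 + 1).result() == 2  # normal use before shutdown

ex.shutdown(wait=True)
try:
    # Submitting against a shut-down executor fails immediately with
    # RuntimeError rather than producing a Future that never completes.
    ex.submit(lambda: 2 + 2)
    outcome = "accepted"
except RuntimeError:
    outcome = "rejected"

print(outcome)  # "rejected": submission after shutdown fails fast
```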






[jira] [Assigned] (SPARK-25297) Future for Scala 2.12 will block on an already shutdown ExecutionContext

2018-08-31 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25297:


Assignee: Apache Spark

> Future for Scala 2.12 will block on an already shutdown ExecutionContext
> ---
>
> Key: SPARK-25297
> URL: https://issues.apache.org/jira/browse/SPARK-25297
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Darcy Shen
>Assignee: Apache Spark
>Priority: Major
>
> *+see 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12/193/]+*
> *The unit tests block on FileBasedWriteAheadLogWithFileCloseAfterWriteSuite 
> in the console output.*






[jira] [Commented] (SPARK-25297) Future for Scala 2.12 will block on an already shutdown ExecutionContext

2018-08-31 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598668#comment-16598668
 ] 

Apache Spark commented on SPARK-25297:
--

User 'sadhen' has created a pull request for this issue:
https://github.com/apache/spark/pull/22304

> Future for Scala 2.12 will block on an already shutdown ExecutionContext
> ---
>
> Key: SPARK-25297
> URL: https://issues.apache.org/jira/browse/SPARK-25297
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Darcy Shen
>Priority: Major
>
> *+see 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12/193/]+*
> *The unit tests block on FileBasedWriteAheadLogWithFileCloseAfterWriteSuite 
> in the console output.*






[jira] [Updated] (SPARK-25297) Future for Scala 2.12 will block on an already shutdown ExecutionContext

2018-08-31 Thread Darcy Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Darcy Shen updated SPARK-25297:
---
Description: 
*+see 
[https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12/193/]+*

*The unit tests block on FileBasedWriteAheadLogWithFileCloseAfterWriteSuite in 
the console output.*

> Future for Scala 2.12 will block on an already shutdown ExecutionContext
> ---
>
> Key: SPARK-25297
> URL: https://issues.apache.org/jira/browse/SPARK-25297
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Darcy Shen
>Priority: Major
>
> *+see 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12/193/]+*
> *The unit tests block on FileBasedWriteAheadLogWithFileCloseAfterWriteSuite 
> in the console output.*






[jira] [Created] (SPARK-25297) Future for Scala 2.12 will block on an already shutdown ExecutionContext

2018-08-31 Thread Darcy Shen (JIRA)
Darcy Shen created SPARK-25297:
--

 Summary: Future for Scala 2.12 will block on an already shutdown 
ExecutionContext
 Key: SPARK-25297
 URL: https://issues.apache.org/jira/browse/SPARK-25297
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 2.4.0
Reporter: Darcy Shen









[jira] [Resolved] (SPARK-25207) Case-insensitive field resolution for filter pushdown when reading Parquet

2018-08-31 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-25207.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 22197
[https://github.com/apache/spark/pull/22197]

> Case-insensitive field resolution for filter pushdown when reading Parquet
> -
>
> Key: SPARK-25207
> URL: https://issues.apache.org/jira/browse/SPARK-25207
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: yucai
>Assignee: yucai
>Priority: Major
>  Labels: Parquet
> Fix For: 2.4.0
>
> Attachments: image.png
>
>
> Currently, filter pushdown will not work if the Parquet schema and the Hive 
> metastore schema are in different letter cases, even when 
> spark.sql.caseSensitive is false. Consider the case below:
> {code:java}
> spark.range(10).write.parquet("/tmp/data")
> sql("DROP TABLE t")
> sql("CREATE TABLE t (ID LONG) USING parquet LOCATION '/tmp/data'")
> sql("select * from t where id > 0").show{code}
> -No filter will be pushed down.-
> {code}
> scala> sql("select * from t where id > 0").explain   // Filters are pushed 
> with `ID`
> == Physical Plan ==
> *(1) Project [ID#90L]
> +- *(1) Filter (isnotnull(id#90L) && (id#90L > 0))
>+- *(1) FileScan parquet default.t[ID#90L] Batched: true, Format: Parquet, 
> Location: InMemoryFileIndex[file:/tmp/data], PartitionFilters: [], 
> PushedFilters: [IsNotNull(ID), GreaterThan(ID,0)], ReadSchema: 
> struct
> scala> sql("select * from t").show    // Parquet returns NULL for `ID` because it has `id`.
> +----+
> |  ID|
> +----+
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> +----+
> scala> sql("select * from t where id > 0").show   // `NULL > 0` is `false`.
> +---+
> | ID|
> +---+
> +---+
> {code}
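A sketch of the resolution logic implied by the fix. This is an editorial illustration with hypothetical names, not Spark's actual code: match the filter's field name against the Parquet schema case-insensitively, and decline to push the filter down when the match is missing or ambiguous.

```python
# Illustrative only: resolve a filter's field name against a list of
# Parquet field names, honoring a case-sensitivity flag.
def resolve_field(filter_field, parquet_fields, case_sensitive=False):
    if case_sensitive:
        return filter_field if filter_field in parquet_fields else None
    matches = [f for f in parquet_fields if f.lower() == filter_field.lower()]
    # An ambiguous schema (e.g. both `id` and `ID`) makes pushdown unsafe,
    # so no match is returned and the filter is evaluated after the scan.
    return matches[0] if len(matches) == 1 else None

print(resolve_field("ID", ["id", "name"]))               # "id"
print(resolve_field("ID", ["id", "ID"]))                 # None (ambiguous)
print(resolve_field("ID", ["id"], case_sensitive=True))  # None
```

With this kind of resolution, the pushed filter references the Parquet file's own field name (`id`) rather than the metastore's (`ID`), so Parquet no longer returns all-NULL rows.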






[jira] [Assigned] (SPARK-25207) Case-insensitive field resolution for filter pushdown when reading Parquet

2018-08-31 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-25207:
---

Assignee: yucai

> Case-insensitive field resolution for filter pushdown when reading Parquet
> -
>
> Key: SPARK-25207
> URL: https://issues.apache.org/jira/browse/SPARK-25207
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: yucai
>Assignee: yucai
>Priority: Major
>  Labels: Parquet
> Attachments: image.png
>
>
> Currently, filter pushdown will not work if the Parquet schema and the Hive 
> metastore schema are in different letter cases, even when 
> spark.sql.caseSensitive is false. Consider the case below:
> {code:java}
> spark.range(10).write.parquet("/tmp/data")
> sql("DROP TABLE t")
> sql("CREATE TABLE t (ID LONG) USING parquet LOCATION '/tmp/data'")
> sql("select * from t where id > 0").show{code}
> -No filter will be pushed down.-
> {code}
> scala> sql("select * from t where id > 0").explain   // Filters are pushed 
> with `ID`
> == Physical Plan ==
> *(1) Project [ID#90L]
> +- *(1) Filter (isnotnull(id#90L) && (id#90L > 0))
>+- *(1) FileScan parquet default.t[ID#90L] Batched: true, Format: Parquet, 
> Location: InMemoryFileIndex[file:/tmp/data], PartitionFilters: [], 
> PushedFilters: [IsNotNull(ID), GreaterThan(ID,0)], ReadSchema: 
> struct
> scala> sql("select * from t").show    // Parquet returns NULL for `ID` because it has `id`.
> +----+
> |  ID|
> +----+
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> +----+
> scala> sql("select * from t where id > 0").show   // `NULL > 0` is `false`.
> +---+
> | ID|
> +---+
> +---+
> {code}






[jira] [Resolved] (SPARK-25284) Spark UI: make sure skipped stages are updated onJobEnd

2018-08-31 Thread Juliusz Sompolski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski resolved SPARK-25284.
---
Resolution: Duplicate

> Spark UI: make sure skipped stages are updated onJobEnd
> ---
>
> Key: SPARK-25284
> URL: https://issues.apache.org/jira/browse/SPARK-25284
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> Tiny bug: onJobEnd does not force an update of skipped stages in the KVStore.






[jira] [Commented] (SPARK-25284) Spark UI: make sure skipped stages are updated onJobEnd

2018-08-31 Thread Juliusz Sompolski (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598592#comment-16598592
 ] 

Juliusz Sompolski commented on SPARK-25284:
---

Contained by SPARK-24415

> Spark UI: make sure skipped stages are updated onJobEnd
> ---
>
> Key: SPARK-25284
> URL: https://issues.apache.org/jira/browse/SPARK-25284
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> Tiny bug: onJobEnd does not force an update of skipped stages in the KVStore.






[jira] [Assigned] (SPARK-25289) ChiSqSelector max on empty collection

2018-08-31 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25289:


Assignee: (was: Apache Spark)

> ChiSqSelector max on empty collection
> -
>
> Key: SPARK-25289
> URL: https://issues.apache.org/jira/browse/SPARK-25289
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 2.3.1
>Reporter: Marie Beaulieu
>Priority: Major
>
> In org.apache.spark.mllib.feature.ChiSqSelector.fit, there is a max taken on 
> a possibly empty collection.
> I am using Spark 2.3.1.
> Here is an example to reproduce.
> {code:java}
> import org.apache.spark.mllib.feature.ChiSqSelector
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.mllib.regression.LabeledPoint
> import org.apache.spark.sql.SQLContext
> val sqlContext = new SQLContext(sc)
> implicit val spark = sqlContext.sparkSession
> val labeledPoints = (0 to 1).map(n => {
>   val v = Vectors.dense((1 to 3).map(_ => n * 1.0).toArray)
>   LabeledPoint(n.toDouble, v)
> })
> val rdd = sc.parallelize(labeledPoints)
> val selector = new ChiSqSelector().setSelectorType("fdr").setFdr(0.05)
> selector.fit(rdd){code}
> Here is the stack trace:
> {code:java}
> java.lang.UnsupportedOperationException: empty.max
> at scala.collection.TraversableOnce$class.max(TraversableOnce.scala:229)
> at scala.collection.mutable.ArrayOps$ofInt.max(ArrayOps.scala:234)
> at org.apache.spark.mllib.feature.ChiSqSelector.fit(ChiSqSelector.scala:280)
> {code}
> Looking at line 280 in ChiSqSelector, it's pretty obvious how the collection 
> can be empty. A simple non-empty validation should do the trick.
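The failing pattern and the suggested guard can be sketched in a few lines. This is an illustrative editorial addition, not Spark's code; note that Python's max([]) raises ValueError where Scala's empty.max raises UnsupportedOperationException, but the guard is the same.

```python
def max_selected(indices):
    # Unguarded, like the current ChiSqSelector.fit: max over a possibly
    # empty collection of selected feature indices throws.
    return max(indices)

def max_selected_safe(indices):
    # Guarded: represent "no feature passed the FDR check" explicitly
    # instead of letting max blow up on an empty collection.
    return max(indices) if indices else None

print(max_selected_safe([]))         # None instead of an exception
print(max_selected_safe([0, 2, 1]))  # 2
```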






[jira] [Commented] (SPARK-25289) ChiSqSelector max on empty collection

2018-08-31 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598588#comment-16598588
 ] 

Apache Spark commented on SPARK-25289:
--

User 'mgaido91' has created a pull request for this issue:
https://github.com/apache/spark/pull/22303

> ChiSqSelector max on empty collection
> -
>
> Key: SPARK-25289
> URL: https://issues.apache.org/jira/browse/SPARK-25289
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 2.3.1
>Reporter: Marie Beaulieu
>Priority: Major
>
> In org.apache.spark.mllib.feature.ChiSqSelector.fit, there is a max taken on 
> a possibly empty collection.
> I am using Spark 2.3.1.
> Here is an example to reproduce.
> {code:java}
> import org.apache.spark.mllib.feature.ChiSqSelector
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.mllib.regression.LabeledPoint
> import org.apache.spark.sql.SQLContext
> val sqlContext = new SQLContext(sc)
> implicit val spark = sqlContext.sparkSession
> val labeledPoints = (0 to 1).map(n => {
>   val v = Vectors.dense((1 to 3).map(_ => n * 1.0).toArray)
>   LabeledPoint(n.toDouble, v)
> })
> val rdd = sc.parallelize(labeledPoints)
> val selector = new ChiSqSelector().setSelectorType("fdr").setFdr(0.05)
> selector.fit(rdd){code}
> Here is the stack trace:
> {code:java}
> java.lang.UnsupportedOperationException: empty.max
> at scala.collection.TraversableOnce$class.max(TraversableOnce.scala:229)
> at scala.collection.mutable.ArrayOps$ofInt.max(ArrayOps.scala:234)
> at org.apache.spark.mllib.feature.ChiSqSelector.fit(ChiSqSelector.scala:280)
> {code}
> Looking at line 280 in ChiSqSelector, it's pretty obvious how the collection 
> can be empty. A simple non-empty validation should do the trick.






[jira] [Assigned] (SPARK-25289) ChiSqSelector max on empty collection

2018-08-31 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25289:


Assignee: Apache Spark

> ChiSqSelector max on empty collection
> -
>
> Key: SPARK-25289
> URL: https://issues.apache.org/jira/browse/SPARK-25289
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 2.3.1
>Reporter: Marie Beaulieu
>Assignee: Apache Spark
>Priority: Major
>
> In org.apache.spark.mllib.feature.ChiSqSelector.fit, there is a max taken on 
> a possibly empty collection.
> I am using Spark 2.3.1.
> Here is an example to reproduce.
> {code:java}
> import org.apache.spark.mllib.feature.ChiSqSelector
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.mllib.regression.LabeledPoint
> import org.apache.spark.sql.SQLContext
> val sqlContext = new SQLContext(sc)
> implicit val spark = sqlContext.sparkSession
> val labeledPoints = (0 to 1).map(n => {
>   val v = Vectors.dense((1 to 3).map(_ => n * 1.0).toArray)
>   LabeledPoint(n.toDouble, v)
> })
> val rdd = sc.parallelize(labeledPoints)
> val selector = new ChiSqSelector().setSelectorType("fdr").setFdr(0.05)
> selector.fit(rdd){code}
> Here is the stack trace:
> {code:java}
> java.lang.UnsupportedOperationException: empty.max
> at scala.collection.TraversableOnce$class.max(TraversableOnce.scala:229)
> at scala.collection.mutable.ArrayOps$ofInt.max(ArrayOps.scala:234)
> at org.apache.spark.mllib.feature.ChiSqSelector.fit(ChiSqSelector.scala:280)
> {code}
> Looking at line 280 in ChiSqSelector, it's pretty obvious how the collection 
> can be empty. A simple non-empty validation should do the trick.






[jira] [Commented] (SPARK-25293) Dataframe write to csv saves part files in outputDirectory/task-xx/part-xxx instead of directly saving in outputDir

2018-08-31 Thread omkar puttagunta (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598586#comment-16598586
 ] 

omkar puttagunta commented on SPARK-25293:
--

Apologies for setting this to critical! Thanks for the response. I believe this 
is a bug.
Why would the dataframe write output be saved in a _temporary directory? It 
should be saved directly under the specified output directory. I have also 
provided sample code to replicate the issue.



> Dataframe write to csv saves part files in outputDirectory/task-xx/part-xxx 
> instead of directly saving in outputDir
> --
>
> Key: SPARK-25293
> URL: https://issues.apache.org/jira/browse/SPARK-25293
> Project: Spark
>  Issue Type: Bug
>  Components: EC2, Java API, Spark Shell, Spark Submit
>Affects Versions: 2.0.2
>Reporter: omkar puttagunta
>Priority: Major
>
> [https://stackoverflow.com/questions/52108335/why-spark-dataframe-writes-part-files-to-temporary-in-instead-directly-creating]
> {quote}Running Spark 2.0.2 in Standalone Cluster Mode; 2 workers and 1 master 
> node on AWS EC2
> {quote}
> Simple test: reading a pipe-delimited file and writing the data to CSV. The 
> commands below are executed in spark-shell with the master URL set.
> {{val df = 
> spark.sqlContext.read.option("delimiter","|").option("quote","\u").csv("/home/input-files/")
>  val emailDf=df.filter("_c3='EML'") 
> emailDf.repartition(100).write.csv("/opt/outputFile/")}}
> After executing the commands above:
> {quote}In {{worker1}} -> each part file is created in 
> {{/opt/outputFile/_temporary/task-x-xxx/part-xxx-xxx}}
>  In {{worker2}} -> {{/opt/outputFile/part-xxx}} => part files are generated 
> directly under outputDirectory specified during write.
> {quote}
> *The same thing happens with coalesce(100), or without specifying 
> repartition/coalesce at all. Tried with Java as well.*
> *_Question_*
> 1) Why doesn't the {{/opt/outputFile/}} output directory on {{worker1}} have 
> {{part-}} files just like on {{worker2}}? Why is a {{_temporary}} directory 
> created, with {{part-xxx-xx}} files residing in the {{task-xxx}} directories?






[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Onur Satici (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598534#comment-16598534
 ] 

Onur Satici commented on SPARK-24434:
-

[~rvesse] [~skonto] - I agree, we should discuss a more structured way of 
reflecting the decisions of the weekly sync on Apache resources.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Commented] (SPARK-21786) The 'spark.sql.parquet.compression.codec' configuration doesn't take effect on tables with partition field(s)

2018-08-31 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-21786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598514#comment-16598514
 ] 

Apache Spark commented on SPARK-21786:
--

User 'fjh100456' has created a pull request for this issue:
https://github.com/apache/spark/pull/22302

> The 'spark.sql.parquet.compression.codec' configuration doesn't take effect 
> on tables with partition field(s)
> -
>
> Key: SPARK-21786
> URL: https://issues.apache.org/jira/browse/SPARK-21786
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Jinhua Fu
>Assignee: Jinhua Fu
>Priority: Major
> Fix For: 2.3.0
>
>
> Since Hive 1.1, Hive has allowed users to set the Parquet compression codec 
> via the table-level property parquet.compression. See the JIRA: 
> https://issues.apache.org/jira/browse/HIVE-7858 . We already support 
> orc.compression for ORC, so for external users it is more straightforward to 
> support both. See the Stack Overflow question: 
> https://stackoverflow.com/questions/36941122/spark-sql-ignores-parquet-compression-propertie-specified-in-tblproperties
> On the Spark side, our table-level compression conf, compression, was added 
> by #11464 in Spark 2.0.
> We need to support both table-level confs. Users might also use the 
> session-level conf spark.sql.parquet.compression.codec. The priority rule is: 
> if another compression codec configuration is found through Hive or Parquet, 
> the precedence is compression, parquet.compression, 
> spark.sql.parquet.compression.codec. Acceptable values include: none, 
> uncompressed, snappy, gzip, lzo.
> After this change, the rule for Parquet is consistent with that for ORC.
> Changes:
> 1. Also acquire 'compressionCodecClassName' from parquet.compression; the 
> precedence order is compression, parquet.compression, 
> spark.sql.parquet.compression.codec, just like what we do in OrcOptions.
> 2. Change spark.sql.parquet.compression.codec to support "none". In 
> ParquetOptions we already treat "none" as equivalent to "uncompressed", but 
> it could not be configured to "none".
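The precedence rule described above can be sketched as a small resolver. This is a hypothetical helper, not Spark's actual implementation (which lives in ParquetOptions); the function name and the "snappy" fallback default are assumptions for illustration.

```python
def resolve_parquet_codec(table_props, session_conf):
    """Pick the effective Parquet codec per the stated precedence:
    compression, then parquet.compression, then the session-level
    spark.sql.parquet.compression.codec.

    Hypothetical sketch only; the "snappy" fallback is an assumption.
    """
    # Table-level properties win, in the documented order.
    for key in ("compression", "parquet.compression"):
        if key in table_props:
            return table_props[key].lower()
    # Otherwise fall back to the session-level conf.
    return session_conf.get("spark.sql.parquet.compression.codec",
                            "snappy").lower()


print(resolve_parquet_codec({"parquet.compression": "gzip"},
                            {"spark.sql.parquet.compression.codec": "lzo"}))  # gzip
```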






[jira] [Commented] (SPARK-25206) wrong records are returned when Hive metastore schema and parquet schema are in different letter cases

2018-08-31 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598467#comment-16598467
 ] 

Dongjoon Hyun commented on SPARK-25206:
---

Thank you all for the decision.

> wrong records are returned when Hive metastore schema and parquet schema are 
> in different letter cases
> --
>
> Key: SPARK-25206
> URL: https://issues.apache.org/jira/browse/SPARK-25206
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.2, 2.3.1
>Reporter: yucai
>Priority: Blocker
>  Labels: Parquet, correctness
> Attachments: image-2018-08-24-18-05-23-485.png, 
> image-2018-08-24-22-33-03-231.png, image-2018-08-24-22-34-11-539.png, 
> image-2018-08-24-22-46-05-346.png, image-2018-08-25-09-54-53-219.png, 
> image-2018-08-25-10-04-21-901.png, pr22183.png
>
>
> In current Spark 2.3.1, below query returns wrong data silently.
> {code:java}
> spark.range(10).write.parquet("/tmp/data")
> sql("DROP TABLE t")
> sql("CREATE TABLE t (ID LONG) USING parquet LOCATION '/tmp/data'")
> scala> sql("select * from t where id > 0").show
> +---+
> | ID|
> +---+
> +---+
> {code}
>  
> *Root Cause*
> After deep dive, it has two issues, both are related to different letter 
> cases between Hive metastore schema and parquet schema.
> 1. The wrong column is pushed down.
> Spark pushes FilterApi.gt(intColumn("{color:#ff}ID{color}"), 0: 
> Integer) down to Parquet, but {color:#ff}ID{color} does not exist in 
> /tmp/data (Parquet is case sensitive; it actually has 
> {color:#ff}id{color}).
> So no records are returned.
> Since SPARK-24716, Spark uses the Parquet schema instead of the Hive 
> metastore schema to do the pushdown, which resolves this issue.
> 2. Spark SQL returns NULL for a column whose Hive metastore schema and 
> Parquet schema are in different letter cases, even with 
> spark.sql.caseSensitive set to false.
> SPARK-25132 addressed this issue already.
>  
> The biggest difference is that in Spark 2.1 the user gets an exception for 
> the same query:
> {code:java}
> Caused by: java.lang.IllegalArgumentException: Column [ID] was not found in 
> schema!{code}
> So they will know about the issue and fix the query.
> But in Spark 2.3, the user silently gets wrong results.
>  
> To make the above query work, we need both SPARK-25132 and -SPARK-24716.-
>  
> [~yumwang] , [~cloud_fan], [~smilegator], any thoughts? Should we backport it?
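The case-matching problem above can be sketched with a small resolver. This is a hypothetical helper for illustration, not Spark's actual schema-resolution code: with case-insensitive matching, the metastore column "ID" resolves to the physical Parquet column "id" instead of matching nothing.

```python
def resolve_field(metastore_name, parquet_fields, case_sensitive=False):
    """Match a metastore column name against the physical Parquet schema.

    Hypothetical sketch of the case-insensitive resolution discussed in
    the issue. With case_sensitive=True, "ID" finds no match in a schema
    that only has "id"; ambiguous case-insensitive matches yield None.
    """
    if case_sensitive:
        return metastore_name if metastore_name in parquet_fields else None
    matches = [f for f in parquet_fields
               if f.lower() == metastore_name.lower()]
    return matches[0] if len(matches) == 1 else None


print(resolve_field("ID", ["id", "name"]))                       # id
print(resolve_field("ID", ["id", "name"], case_sensitive=True))  # None
```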






[jira] [Commented] (SPARK-19809) NullPointerException on zero-size ORC file

2018-08-31 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-19809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598465#comment-16598465
 ] 

Dongjoon Hyun commented on SPARK-19809:
---

Hi, [~shirisht]. You need to turn on `convertMetastoreOrc`

{code}
scala> sql("set spark.sql.hive.convertMetastoreOrc=true")
res4: org.apache.spark.sql.DataFrame = [key: string, value: string]

scala> sql("select * from empty_orc").show
+---+
|  a|
+---+
+---+
{code}

> NullPointerException on zero-size ORC file
> --
>
> Key: SPARK-19809
> URL: https://issues.apache.org/jira/browse/SPARK-19809
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.3, 2.0.2, 2.1.1, 2.2.1
>Reporter: Michał Dawid
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 2.3.0
>
> Attachments: image-2018-02-26-20-29-49-410.png, 
> spark.sql.hive.convertMetastoreOrc.txt
>
>
> When reading from hive ORC table if there are some 0 byte files we get 
> NullPointerException:
> {code}java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$BISplitStrategy.getSplits(OrcInputFormat.java:560)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1010)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
>   at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
>   at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
>   at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:190)
>   at 
> org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:165)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
>   at 
> org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
>   at 
> org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
>   at 
> org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2086)
>   at 
> org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1498)
>   at 
> org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1505)
>   at 
> org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1375)
>   at ...
> {code}

[jira] [Commented] (SPARK-18818) Window...orderBy() should accept an 'ascending' parameter just like DataFrame.orderBy()

2018-08-31 Thread Anna Molchanova (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598438#comment-16598438
 ] 

Anna Molchanova commented on SPARK-18818:
-

Hello, I'll pick this up.

> Window...orderBy() should accept an 'ascending' parameter just like 
> DataFrame.orderBy()
> ---
>
> Key: SPARK-18818
> URL: https://issues.apache.org/jira/browse/SPARK-18818
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Reporter: Nicholas Chammas
>Priority: Minor
>
> It seems inconsistent that {{Window...orderBy()}} does not accept an 
> {{ascending}} parameter, when {{DataFrame.orderBy()}} does.
> It's also slightly inconvenient since to specify a descending sort order you 
> have to build a column object, whereas with the {{ascending}} parameter you 
> don't.
> For example:
> {code}
> from pyspark.sql.functions import row_number
> df.select(
> row_number()
> .over(
> Window
> .partitionBy(...)
> .orderBy('timestamp', ascending=False)))
> {code}
> vs.
> {code}
> from pyspark.sql.functions import row_number, col
> df.select(
> row_number()
> .over(
> Window
> .partitionBy(...)
> .orderBy(col('timestamp').desc(
> {code}
> It would be better if {{Window...orderBy()}} supported an {{ascending}} 
> parameter just like {{DataFrame.orderBy()}}.
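The proposed behavior can be sketched with a helper that normalizes an ascending argument the way DataFrame.orderBy accepts it. This is a hypothetical illustration of the requested semantics, not PySpark's actual implementation.

```python
def normalize_ordering(cols, ascending=True):
    """Normalize an `ascending` argument: a single bool applies to every
    column, while a list must supply one direction per column.

    Hypothetical helper illustrating how Window...orderBy could accept
    the same `ascending` parameter as DataFrame.orderBy.
    """
    if isinstance(ascending, bool):
        ascending = [ascending] * len(cols)
    if len(ascending) != len(cols):
        raise ValueError("length of ascending must match number of columns")
    return [(c, "asc" if a else "desc") for c, a in zip(cols, ascending)]


print(normalize_ordering(["timestamp"], ascending=False))  # [('timestamp', 'desc')]
```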






[jira] [Commented] (SPARK-21786) The 'spark.sql.parquet.compression.codec' configuration doesn't take effect on tables with partition field(s)

2018-08-31 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-21786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598432#comment-16598432
 ] 

Apache Spark commented on SPARK-21786:
--

User 'fjh100456' has created a pull request for this issue:
https://github.com/apache/spark/pull/22301

> The 'spark.sql.parquet.compression.codec' configuration doesn't take effect 
> on tables with partition field(s)
> -
>
> Key: SPARK-21786
> URL: https://issues.apache.org/jira/browse/SPARK-21786
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Jinhua Fu
>Assignee: Jinhua Fu
>Priority: Major
> Fix For: 2.3.0
>
>
> Since Hive 1.1, Hive has allowed users to set the Parquet compression codec 
> via the table-level property parquet.compression. See the JIRA: 
> https://issues.apache.org/jira/browse/HIVE-7858 . We already support 
> orc.compression for ORC, so for external users it is more straightforward to 
> support both. See the Stack Overflow question: 
> https://stackoverflow.com/questions/36941122/spark-sql-ignores-parquet-compression-propertie-specified-in-tblproperties
> On the Spark side, our table-level compression conf, compression, was added 
> by #11464 in Spark 2.0.
> We need to support both table-level confs. Users might also use the 
> session-level conf spark.sql.parquet.compression.codec. The priority rule is: 
> if another compression codec configuration is found through Hive or Parquet, 
> the precedence is compression, parquet.compression, 
> spark.sql.parquet.compression.codec. Acceptable values include: none, 
> uncompressed, snappy, gzip, lzo.
> After this change, the rule for Parquet is consistent with that for ORC.
> Changes:
> 1. Also acquire 'compressionCodecClassName' from parquet.compression; the 
> precedence order is compression, parquet.compression, 
> spark.sql.parquet.compression.codec, just like what we do in OrcOptions.
> 2. Change spark.sql.parquet.compression.codec to support "none". In 
> ParquetOptions we already treat "none" as equivalent to "uncompressed", but 
> it could not be configured to "none".






[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598406#comment-16598406
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 8:20 AM:
--

[~onursatici] I was never informed about the decision or how urgent it was; 
otherwise I could have responded, but I had no chance.


was (Author: skonto):
[~onursatici] I was never informed about the decision and how urgent was that, 
otherwise I could respond to that, had no chance.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598406#comment-16598406
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 8:19 AM:
--

[~onursatici] I was never informed about the decision and how urgent was that, 
otherwise I could respond to that, had no chance.


was (Author: skonto):
[~onursatici] I was never informed about the decision. Anyway.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598406#comment-16598406
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 8:19 AM:
--

[~onursatici] I was never informed about the decision. Anyway.


was (Author: skonto):
[~onursatici] I was never informed about the decision.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598406#comment-16598406
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 8:18 AM:
--

[~onursatici] I was never informed about the decision.


was (Author: skonto):
[~onursatici] I was never informed about the decision. Also I notified people 
on the slack channel that im working on it (8th of August):

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

I never said im not actively working on it.  I just said that it will be 
delayed since im off. Anyway.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598406#comment-16598406
 ] 

Stavros Kontopoulos commented on SPARK-24434:
-

[~onursatici] I was never informed about the decision. I also notified people 
in the Slack channel that I'm working on it (8th of August):

Stavros [3:27 PM]
@liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

I never said I'm not actively working on it; I just said it would be delayed 
since I'm off. Anyway.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598406#comment-16598406
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 8:17 AM:
--

[~onursatici] I was never informed about the decision. Also I notified people 
on the slack channel that im working on it (8th of August):

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

I never said im not actively working on it.  I just said that it will be 
delayed since im off. Anyway.


was (Author: skonto):
[~onursatici] I was never informed about the decision. Also I notified people 
in the slack channel that im working on it (8th of August):

Stavros [3:27 PM]
@liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

I never said im not actively working on it.  I just said that it will be 
delayed since im off. Anyway.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Rob Vesse (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598402#comment-16598402
 ] 

Rob Vesse commented on SPARK-24434:
---

{quote}
I think the miscommunication here was because of the discrepancy between this 
Jira and k8s-sig-big-data weekly meeting notes
{quote}

As an Apache member this comment raises red flags for me.  All Spark 
development discussions should either be happening on Apache resources (JIRA, 
mailing lists, GitHub repos) or being captured and posted to Apache resources.  
If people are having to follow external resources, particularly live meetings 
which naturally exclude portions of the community due to timezone/availability 
constraints, to participate in an Apache community then that community is not 
operating as a proper Apache community.  

This doesn't mean that such discussions and meetings can't happen but they 
should be summarised back on Apache resources so the wider community has the 
opportunity to participate.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Onur Satici (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598393#comment-16598393
 ] 

Onur Satici edited comment on SPARK-24434 at 8/31/18 8:00 AM:
--

Hello [~felixcheung], sorry, I think the miscommunication here was caused by 
the discrepancy between this Jira and the k8s-sig-big-data weekly meeting 
notes. On [15 
Aug|https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit#heading=h.d1p209nfiamv]
 it was discussed that, as [~skonto] was out and not actively working on this 
PR at that moment, [~yifeih] and I could take over and start working on it. We 
should have reflected that decision in this Jira after the meeting to make our 
intentions clear.

We needed this change urgently, as we had a couple of PRs adding new Spark 
configuration options to customize Spark pods that were all blocked by this.

 


was (Author: onursatici):
Hello [~felixcheung], sorry I think the miscommunication here was because of 
the discrepancy between this Jira and k8s-sig-big-data weekly meeting notes. On 
[link 15 
Aug|https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit#heading=h.d1p209nfiamv]
 it was discussed that as [~skonto] was out, and was not actively working on 
this PR at that moment, [~yifeih] and I can take over and start working on 
this. I think we should have reflected that decision in this Jira after the 
meeting to clear our intentions.

We needed this change urgently as we had a couple of PR's adding new spark 
configuration options to customize Spark pods, and they were all blocked by 
this.

 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 
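
The idea behind a pod template is that Spark applies only the settings it must control on top of a user-supplied template, instead of exposing one configuration option per pod field. A minimal sketch of that overlay using plain Python dicts (the field names and the merge policy here are illustrative; an actual implementation would operate on Kubernetes pod objects):

```python
def merge_pod_spec(template, spark_managed):
    """Overlay Spark-managed fields on a user-supplied pod template.

    The user template supplies arbitrary customizations; Spark-managed
    fields take precedence where the two overlap. Nested dicts are merged
    recursively; any other value is overwritten.
    """
    merged = dict(template)
    for key, value in spark_managed.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_pod_spec(merged[key], value)
        else:
            merged[key] = value
    return merged

# User-provided customization that would otherwise each need its own
# Spark configuration option.
user_template = {
    "metadata": {"labels": {"team": "data-eng"}},
    "spec": {"nodeSelector": {"disktype": "ssd"}, "containers": []},
}
# Fields Spark itself must control (hypothetical values).
spark_managed = {
    "metadata": {"labels": {"spark-role": "driver"}},
    "spec": {"containers": [{"name": "spark-kubernetes-driver"}]},
}
merged = merge_pod_spec(user_template, spark_managed)
```

This keeps the user's declarative customizations (labels, node selectors, ...) while guaranteeing Spark's own fields win on conflict.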






[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Onur Satici (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598393#comment-16598393
 ] 

Onur Satici commented on SPARK-24434:
-

Hello [~felixcheung], sorry, I think the miscommunication here was caused by 
the discrepancy between this Jira and the k8s-sig-big-data weekly meeting notes. 
On [15 
Aug|https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit#heading=h.d1p209nfiamv]
 it was discussed that, as [~skonto] was out and not actively working on 
this PR at that moment, [~yifeih] and I could take over and start working on 
this. We should have reflected that decision in this Jira after the 
meeting to make our intentions clear.

We needed this change urgently, as we had a couple of PRs adding new Spark 
configuration options to customize Spark pods, and they were all blocked by 
this.

 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Commented] (SPARK-25294) Add integration test for Kerberos

2018-08-31 Thread Steven Rand (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598366#comment-16598366
 ] 

Steven Rand commented on SPARK-25294:
-

+1 – another example of how easy it is to break the Kerberos integration 
without noticing: https://issues.apache.org/jira/browse/SPARK-22319

> Add integration test for Kerberos 
> --
>
> Key: SPARK-25294
> URL: https://issues.apache.org/jira/browse/SPARK-25294
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> Changes in areas such as {{Yarn}}, {{Hive}}, and {{HDFS}} may cause Kerberos 
> issues; we should add tests.
> https://issues.apache.org/jira/browse/SPARK-23789
> https://github.com/apache/spark/pull/21987#issuecomment-417560077






[jira] [Commented] (SPARK-25296) Create ExplainSuite

2018-08-31 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598343#comment-16598343
 ] 

Apache Spark commented on SPARK-25296:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/22300

> Create ExplainSuite
> ---
>
> Key: SPARK-25296
> URL: https://issues.apache.org/jira/browse/SPARK-25296
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Major
>
> Move the output verification of Explain to a new suite ExplainSuite. 






[jira] [Assigned] (SPARK-25296) Create ExplainSuite

2018-08-31 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25296:


Assignee: Xiao Li  (was: Apache Spark)

> Create ExplainSuite
> ---
>
> Key: SPARK-25296
> URL: https://issues.apache.org/jira/browse/SPARK-25296
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Major
>
> Move the output verification of Explain to a new suite ExplainSuite. 






[jira] [Assigned] (SPARK-25296) Create ExplainSuite

2018-08-31 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25296:


Assignee: Apache Spark  (was: Xiao Li)

> Create ExplainSuite
> ---
>
> Key: SPARK-25296
> URL: https://issues.apache.org/jira/browse/SPARK-25296
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>Priority: Major
>
> Move the output verification of Explain to a new suite ExplainSuite. 






[jira] [Created] (SPARK-25296) Create ExplainSuite

2018-08-31 Thread Xiao Li (JIRA)
Xiao Li created SPARK-25296:
---

 Summary: Create ExplainSuite
 Key: SPARK-25296
 URL: https://issues.apache.org/jira/browse/SPARK-25296
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 2.4.0
Reporter: Xiao Li
Assignee: Xiao Li


Move the output verification of Explain to a new suite ExplainSuite. 






[jira] [Assigned] (SPARK-25183) Spark HiveServer2 registers shutdown hook with JVM, not ShutdownHookManager; race conditions can arise

2018-08-31 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-25183:
---

Assignee: Steve Loughran

> Spark HiveServer2 registers shutdown hook with JVM, not ShutdownHookManager; 
> race conditions can arise
> --
>
> Key: SPARK-25183
> URL: https://issues.apache.org/jira/browse/SPARK-25183
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 2.4.0
>
>
> Spark's HiveServer2 registers a shutdown hook with the JVM via 
> {{Runtime.addShutdownHook()}}, which can run in parallel with the 
> ShutdownHookManager sequence of Spark & Hadoop, which runs its shutdown 
> hooks in an ordered sequence.
> This has some risks:
> * FS shutdown before the rename of logs completes, SPARK-6933
> * Delays of renames on object stores may block the FS close operation which, 
> on clusters with shutdown-hook timeouts (HADOOP-12950) on 
> FileSystem.closeAll(), can force a kill of that shutdown hook, among other 
> problems.
> General outcome: logs aren't present.
> Proposed fix:
> * register the hook with {{org.apache.spark.util.ShutdownHookManager}}
> * HADOOP-15679 to make the shutdown wait time configurable, so O(data) 
> renames don't trigger timeouts.
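
The core difference between the two registration paths is that a shutdown-hook manager runs its hooks in a defined priority order, while hooks registered directly with the JVM run concurrently with no ordering guarantee. A toy Python model of the ordered variant (class and hook names are illustrative, not Spark's actual values):

```python
class OrderedShutdownHooks:
    """Minimal sketch of priority-ordered shutdown hooks, in the spirit of
    org.apache.spark.util.ShutdownHookManager."""

    def __init__(self):
        self._hooks = []

    def add_shutdown_hook(self, priority, hook):
        self._hooks.append((priority, hook))

    def run_all(self):
        # Higher-priority hooks run first, so e.g. log renames can finish
        # before the filesystem layer is closed underneath them.
        for _, hook in sorted(self._hooks, key=lambda h: -h[0]):
            hook()

events = []
mgr = OrderedShutdownHooks()
mgr.add_shutdown_hook(50, lambda: events.append("close filesystems"))
mgr.add_shutdown_hook(100, lambda: events.append("rename event logs"))
mgr.run_all()
# events == ["rename event logs", "close filesystems"]
```

With a raw {{Runtime.addShutdownHook()}} registration there is no such ordering, which is exactly the race described above.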






[jira] [Resolved] (SPARK-25183) Spark HiveServer2 registers shutdown hook with JVM, not ShutdownHookManager; race conditions can arise

2018-08-31 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-25183.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 22186
[https://github.com/apache/spark/pull/22186]

> Spark HiveServer2 registers shutdown hook with JVM, not ShutdownHookManager; 
> race conditions can arise
> --
>
> Key: SPARK-25183
> URL: https://issues.apache.org/jira/browse/SPARK-25183
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 2.4.0
>
>
> Spark's HiveServer2 registers a shutdown hook with the JVM via 
> {{Runtime.addShutdownHook()}}, which can run in parallel with the 
> ShutdownHookManager sequence of Spark & Hadoop, which runs its shutdown 
> hooks in an ordered sequence.
> This has some risks:
> * FS shutdown before the rename of logs completes, SPARK-6933
> * Delays of renames on object stores may block the FS close operation which, 
> on clusters with shutdown-hook timeouts (HADOOP-12950) on 
> FileSystem.closeAll(), can force a kill of that shutdown hook, among other 
> problems.
> General outcome: logs aren't present.
> Proposed fix:
> * register the hook with {{org.apache.spark.util.ShutdownHookManager}}
> * HADOOP-15679 to make the shutdown wait time configurable, so O(data) 
> renames don't trigger timeouts.






[jira] [Resolved] (SPARK-25288) Kafka transaction tests are flaky

2018-08-31 Thread Shixiong Zhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-25288.
--
   Resolution: Fixed
Fix Version/s: 2.4.0

> Kafka transaction tests are flaky
> -
>
> Key: SPARK-25288
> URL: https://issues.apache.org/jira/browse/SPARK-25288
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.4.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Major
> Fix For: 2.4.0
>
>
> http://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.sql.kafka010.KafkaRelationSuite_name=read+Kafka+transactional+messages%3A+read_committed
> http://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.sql.kafka010.KafkaMicroBatchV1SourceSuite_name=read+Kafka+transactional+messages%3A+read_committed
> http://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.sql.kafka010.KafkaMicroBatchV2SourceSuite_name=read+Kafka+transactional+messages%3A+read_committed






[jira] [Assigned] (SPARK-25275) require membership in wheel to run 'su' (in dockerfiles)

2018-08-31 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung reassigned SPARK-25275:


Assignee: Erik Erlandson

> require membership in wheel to run 'su' (in dockerfiles)
> ---
>
> Key: SPARK-25275
> URL: https://issues.apache.org/jira/browse/SPARK-25275
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Erik Erlandson
>Assignee: Erik Erlandson
>Priority: Major
>  Labels: docker, kubernetes
> Fix For: 2.4.0
>
>
> For improved security, configure the images so that users must be in the 
> wheel group in order to run su.
> See example:
> [https://github.com/openshift-evangelists/terminal-base-image/blob/master/image/Dockerfile#L53]






[jira] [Assigned] (SPARK-24433) Add Spark R support

2018-08-31 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung reassigned SPARK-24433:


Assignee: Ilan Filonenko

> Add Spark R support
> ---
>
> Key: SPARK-24433
> URL: https://issues.apache.org/jira/browse/SPARK-24433
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Assignee: Ilan Filonenko
>Priority: Major
> Fix For: 2.4.0
>
>
> This is the ticket to track work on adding support for the R binding in 
> Kubernetes mode. The feature is available in our fork at 
> github.com/apache-spark-on-k8s/spark and needs to be upstreamed.






[jira] [Updated] (SPARK-25295) Pod name conflicts in client mode, if the previous submission was not a clean shutdown.

2018-08-31 Thread Prashant Sharma (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-25295:

Description: 
If the previous job was killed somehow, e.g. by the client disconnecting, it 
leaves behind the executor pods named spark-exec-#, which cause naming 
conflicts and failures for the next job submission.

io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST 
at: https://:6443/api/v1/namespaces/default/pods. Message: pods 
"spark-exec-4" already exists. Received status: Status(apiVersion=v1, code=409, 
details=StatusDetails(causes=[], group=null, kind=pods, name=spark-exec-4, 
retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, 
message=pods "spark-exec-4" already exists, 
metadata=ListMeta(resourceVersion=null, selfLink=null, 
additionalProperties={}), reason=AlreadyExists, status=Failure, 
additionalProperties={}).


  was:
If the previous job was killed somehow, by disconnecting the client. It leaves 
behind the executor pods named spark-exec-#, which cause naming conflicts and 
failures for the next job submission.

io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST 
at: https://9.30.110.150:6443/api/v1/namespaces/default/pods. Message: pods 
"spark-exec-4" already exists. Received status: Status(apiVersion=v1, code=409, 
details=StatusDetails(causes=[], group=null, kind=pods, name=spark-exec-4, 
retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, 
message=pods "spark-exec-4" already exists, 
metadata=ListMeta(resourceVersion=null, selfLink=null, 
additionalProperties={}), reason=AlreadyExists, status=Failure, 
additionalProperties={}).



> Pod name conflicts in client mode, if the previous submission was not a clean 
> shutdown.
> 
>
> Key: SPARK-25295
> URL: https://issues.apache.org/jira/browse/SPARK-25295
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Prashant Sharma
>Priority: Major
>
> If the previous job was killed somehow, e.g. by the client disconnecting, it 
> leaves behind the executor pods named spark-exec-#, which cause naming 
> conflicts and failures for the next job submission.
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: 
> POST at: https://:6443/api/v1/namespaces/default/pods. Message: pods 
> "spark-exec-4" already exists. Received status: Status(apiVersion=v1, 
> code=409, details=StatusDetails(causes=[], group=null, kind=pods, 
> name=spark-exec-4, retryAfterSeconds=null, uid=null, 
> additionalProperties={}), kind=Status, message=pods "spark-exec-4" already 
> exists, metadata=ListMeta(resourceVersion=null, selfLink=null, 
> additionalProperties={}), reason=AlreadyExists, status=Failure, 
> additionalProperties={}).
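
One common way to avoid this class of collision is to make executor pod names unique per submission instead of reusing a bare counter like spark-exec-4. A hypothetical sketch (this is not Spark's actual naming scheme):

```python
import uuid

def executor_pod_name(app_prefix, executor_id):
    """Build an executor pod name from a per-application prefix.

    Including a random per-submission suffix in the prefix means leftovers
    of a crashed previous submission cannot collide with the next run's
    pods, even when the executor counter restarts from the same value.
    """
    return f"{app_prefix}-exec-{executor_id}"

# A fresh suffix is generated for each submission.
app_a = f"spark-{uuid.uuid4().hex[:8]}"   # e.g. the crashed previous run
app_b = f"spark-{uuid.uuid4().hex[:8]}"   # the next submission
name_a = executor_pod_name(app_a, 4)
name_b = executor_pod_name(app_b, 4)
```

Even with the same executor id (4), the two runs now produce distinct pod names, so the POST of the new pod cannot hit AlreadyExists against a leftover.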






[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598300#comment-16598300
 ] 

Felix Cheung commented on SPARK-24434:
--

So [~onursatici], is there a reason you opened a PR even though it was clearly 
stated in this Jira that Stavros was working on this?

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Created] (SPARK-25295) Pod name conflicts in client mode, if the previous submission was not a clean shutdown.

2018-08-31 Thread Prashant Sharma (JIRA)
Prashant Sharma created SPARK-25295:
---

 Summary: Pod name conflicts in client mode, if the previous 
submission was not a clean shutdown.
 Key: SPARK-25295
 URL: https://issues.apache.org/jira/browse/SPARK-25295
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 2.4.0
Reporter: Prashant Sharma


If the previous job was killed somehow, e.g. by the client disconnecting, it 
leaves behind the executor pods named spark-exec-#, which cause naming 
conflicts and failures for the next job submission.

io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST 
at: https://9.30.110.150:6443/api/v1/namespaces/default/pods. Message: pods 
"spark-exec-4" already exists. Received status: Status(apiVersion=v1, code=409, 
details=StatusDetails(causes=[], group=null, kind=pods, name=spark-exec-4, 
retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, 
message=pods "spark-exec-4" already exists, 
metadata=ListMeta(resourceVersion=null, selfLink=null, 
additionalProperties={}), reason=AlreadyExists, status=Failure, 
additionalProperties={}).







[jira] [Commented] (SPARK-24748) Support for reporting custom metrics via Streaming Query Progress

2018-08-31 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598298#comment-16598298
 ] 

Apache Spark commented on SPARK-24748:
--

User 'arunmahadevan' has created a pull request for this issue:
https://github.com/apache/spark/pull/22299

> Support for reporting custom metrics via Streaming Query Progress
> -
>
> Key: SPARK-24748
> URL: https://issues.apache.org/jira/browse/SPARK-24748
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.3.1
>Reporter: Arun Mahadevan
>Assignee: Arun Mahadevan
>Priority: Major
> Fix For: 2.4.0
>
>
> Currently the Structured Streaming sources and sinks do not have a way to 
> report custom metrics. Providing an option to report custom metrics and 
> making them available via Streaming Query progress can enable sources and 
> sinks to report custom progress information (e.g. the lag metrics for the 
> Kafka source).






[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598297#comment-16598297
 ] 

Felix Cheung commented on SPARK-24434:
--

[~skonto] - hi, what's happening? Henry is right: in Spark we don't generally 
assign an issue/Jira to a particular user until the issue is resolved/closed.

Also, changing the assignee in ASF Jira requires particular permissions.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Comment Edited] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-08-31 Thread Evelyn Bayes (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598275#comment-16598275
 ] 

Evelyn Bayes edited comment on SPARK-25150 at 8/31/18 6:06 AM:
---

I'd love the chance to patch this bug.

I've included a simplified version of the Python script which reproduces it; if 
you switch the second join out for the commented join, it works as it should.

What's happening is that during the creation of the logical plan, Spark 
re-aliases the right side of the join because the left and right refer to the 
same base column. When it does this, it renames all the columns on the right 
side of the join to the new alias, but not the column which is actually part of 
the join.

Then, because the join condition refers to the column which hasn't been 
updated, it now refers to the left side of the join. So Spark does a cartesian 
join on itself and straps the right side of the join on the end.

The part of the code doing the renaming is:
 
[https://github.com/apache/spark/blob/v2.3.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala]
 It's using ResolveReferences.dedupRight, which, as the name says, just 
de-duplicates the right-side references from the left side (this might be a 
naive understanding of it).

If you just alias one of these columns it's fine, but that really shouldn't be 
required for the logical plan to be accurate.
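
A toy model of the bookkeeping may make this clearer: attributes are distinguished by unique ids (loosely like Catalyst's exprId), a dedupRight-style pass gives the right branch fresh ids, and a join condition that still holds the old ids resolves to the left branch on both ends. This is a deliberate simplification of Catalyst, not its actual code:

```python
import itertools

_ids = itertools.count()

def attr(name):
    # An attribute reference as a (name, unique id) pair, loosely like
    # Catalyst's AttributeReference with its exprId.
    return (name, next(_ids))

# Both join branches start from the same base attribute.
state = attr("state")
left_output = {state}

# dedupRight-style fix-up: the right branch gets a fresh id for the
# same-named column.
remapped = {state: attr("state")}
right_output = {remapped[state]}

# A join condition built before the remap references the old id on BOTH
# ends, so both ends resolve against the left branch -- the condition is
# trivially true and the join degenerates toward a cartesian product.
stale_condition = (state, state)
both_ends_left = all(a in left_output for a in stale_condition)

# The fix is to rewrite the condition with the remapped right-side id, so
# it genuinely spans the two branches.
fixed_condition = (state, remapped[state])
spans_both = (fixed_condition[0] in left_output
              and fixed_condition[1] in right_output)
```

Manually aliasing a column has the same effect as the last step, which is why the workaround helps even though it shouldn't be necessary.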

 

 


was (Author: eeveeb):
I'd love the chance to bug patch this.

I've included a simplified version of the python script which produces it, if 
you switch out the second join to the commented join it works as it should. 
!zombie-analysis.py|width=7,height=7,align=absmiddle!

What's happening is it's re-aliasing the right side of the join because the 
left and right refer to the same base column. When it does this it renames all 
the columns in the right side of the join to the new alias but not the column 
which is actually a part of the join.

Then because the join refers to the column which hasn't been updated it now 
refers to the left side of the join. So it does a cartesian join on itself and 
straps on the right side of the join on the end.

The part of the code which is doing the renaming is:
[https://github.com/apache/spark/blob/v2.3.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala]
It's using ResolveReferences.dedupRight which as it says just de duplicates the 
right side references from the left side (this might be a naive understanding 
of it).

Then if you just alias one of these columns it's fine. But that really 
shouldn't be required for the logical plan to be accurate.

 

 

> Joining DataFrames derived from the same source yields confusing/incorrect 
> results
> --
>
> Key: SPARK-25150
> URL: https://issues.apache.org/jira/browse/SPARK-25150
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Nicholas Chammas
>Priority: Major
> Attachments: output-with-implicit-cross-join.txt, 
> output-without-implicit-cross-join.txt, persons.csv, states.csv, 
> zombie-analysis.py
>
>
> I have two DataFrames, A and B. From B, I have derived two additional 
> DataFrames, B1 and B2. When joining A to B1 and B2, I'm getting a very 
> confusing error:
> {code:java}
> Join condition is missing or trivial.
> Either: use the CROSS JOIN syntax to allow cartesian products between these
> relations, or: enable implicit cartesian products by setting the configuration
> variable spark.sql.crossJoin.enabled=true;
> {code}
> Then, when I configure "spark.sql.crossJoin.enabled=true" as instructed, 
> Spark appears to give me incorrect answers.
> I am not sure if I am missing something obvious, or if there is some kind of 
> bug here. The "join condition is missing" error is confusing and doesn't make 
> sense to me, and the seemingly incorrect output is concerning.
> I've attached a reproduction, along with the output I'm seeing with and 
> without the implicit cross join enabled.
> I realize the join I've written is not correct in the sense that it should be 
> a left outer join instead of an inner join (since some of the aggregates are 
> not available for all states), but that doesn't explain Spark's behavior.






[jira] [Commented] (SPARK-25294) Add integration test for Kerberos

2018-08-31 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598290#comment-16598290
 ] 

Yuming Wang commented on SPARK-25294:
-

cc [~srowen] What do you think?

> Add integration test for Kerberos 
> --
>
> Key: SPARK-25294
> URL: https://issues.apache.org/jira/browse/SPARK-25294
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> Changes in areas such as {{Yarn}}, {{Hive}}, and {{HDFS}} may cause Kerberos 
> issues; we should add tests.
> https://issues.apache.org/jira/browse/SPARK-23789
> https://github.com/apache/spark/pull/21987#issuecomment-417560077






[jira] [Created] (SPARK-25294) Add integration test for Kerberos

2018-08-31 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-25294:
---

 Summary: Add integration test for Kerberos 
 Key: SPARK-25294
 URL: https://issues.apache.org/jira/browse/SPARK-25294
 Project: Spark
  Issue Type: Test
  Components: Tests
Affects Versions: 2.4.0
Reporter: Yuming Wang


Changes in areas such as {{Yarn}}, {{Hive}}, and {{HDFS}} may cause Kerberos 
issues; we should add tests.

https://issues.apache.org/jira/browse/SPARK-23789
https://github.com/apache/spark/pull/21987#issuecomment-417560077






[jira] [Comment Edited] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-08-31 Thread Evelyn Bayes (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598284#comment-16598284
 ] 

Evelyn Bayes edited comment on SPARK-25150 at 8/31/18 6:00 AM:
---

Sorry, my attachment doesn't want to stick; feel free to ask me to email it, 
or to explain to me how attaching works. Sorry!

 


was (Author: eeveeb):
Sorry my attachment doesn't want to stick,I'll give it another try.

 

[^zombie-analysis.py]

> Joining DataFrames derived from the same source yields confusing/incorrect 
> results
> --
>
> Key: SPARK-25150
> URL: https://issues.apache.org/jira/browse/SPARK-25150
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Nicholas Chammas
>Priority: Major
> Attachments: output-with-implicit-cross-join.txt, 
> output-without-implicit-cross-join.txt, persons.csv, states.csv, 
> zombie-analysis.py
>
>
> I have two DataFrames, A and B. From B, I have derived two additional 
> DataFrames, B1 and B2. When joining A to B1 and B2, I'm getting a very 
> confusing error:
> {code:java}
> Join condition is missing or trivial.
> Either: use the CROSS JOIN syntax to allow cartesian products between these
> relations, or: enable implicit cartesian products by setting the configuration
> variable spark.sql.crossJoin.enabled=true;
> {code}
> Then, when I configure "spark.sql.crossJoin.enabled=true" as instructed, 
> Spark appears to give me incorrect answers.
> I am not sure if I am missing something obvious, or if there is some kind of 
> bug here. The "join condition is missing" error is confusing and doesn't make 
> sense to me, and the seemingly incorrect output is concerning.
> I've attached a reproduction, along with the output I'm seeing with and 
> without the implicit cross join enabled.
> I realize the join I've written is not correct in the sense that it should be 
> a left outer join instead of an inner join (since some of the aggregates are 
> not available for all states), but that doesn't explain Spark's behavior.


