[jira] [Assigned] (SPARK-25301) When a view uses a UDF from a non-default database, the Spark analyser throws AnalysisException

2018-08-31 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25301:


Assignee: Apache Spark

> When a view uses a UDF from a non-default database, the Spark analyser throws 
> AnalysisException
> 
>
> Key: SPARK-25301
> URL: https://issues.apache.org/jira/browse/SPARK-25301
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Vinod KC
>Assignee: Apache Spark
>Priority: Minor
>
> When a Hive view uses a UDF from a non-default database, the Spark analyser 
> throws AnalysisException.
> Steps to simulate this issue
>  -
>  In Hive
>  
>  1) CREATE DATABASE d100;
>  2) ADD JAR /usr/udf/masking.jar // masking.jar has a custom udf class 
> 'com.uzx.udf.Masking'
>  3) create function d100.udf100 as "com.uzx.udf.Masking"; // Note: udf100 is 
> created in d100
>  4) create view d100.v100 as select *d100.udf100*(name) from default.emp; // 
> Note: table default.emp has two columns 'name', 'address'
>  5) select * from d100.v100; // query on view d100.v100 gives correct result
> In Spark
>  -
>  1) spark.sql("select * from d100.v100").show
>  throws 
>  ```
>  org.apache.spark.sql.AnalysisException: Undefined function: '*d100.udf100*'. 
> This function is neither a registered temporary function nor a permanent 
> function registered in the database '*default*'
>  ```
> This is because, while parsing the SQL statement of the view 'select 
> `d100.udf100`(`emp`.`name`) from `default`.`emp`', the Spark parser fails to 
> split the database name from the UDF name, so the Spark function registry 
> tries to load the UDF 'd100.udf100' from the 'default' database.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25301) When a view uses a UDF from a non-default database, the Spark analyser throws AnalysisException

2018-08-31 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25301:


Assignee: (was: Apache Spark)

> When a view uses a UDF from a non-default database, the Spark analyser throws 
> AnalysisException
> 
>
> Key: SPARK-25301
> URL: https://issues.apache.org/jira/browse/SPARK-25301
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Vinod KC
>Priority: Minor
>
> When a Hive view uses a UDF from a non-default database, the Spark analyser 
> throws AnalysisException.
> Steps to simulate this issue
>  -
>  In Hive
>  
>  1) CREATE DATABASE d100;
>  2) ADD JAR /usr/udf/masking.jar // masking.jar has a custom udf class 
> 'com.uzx.udf.Masking'
>  3) create function d100.udf100 as "com.uzx.udf.Masking"; // Note: udf100 is 
> created in d100
>  4) create view d100.v100 as select *d100.udf100*(name) from default.emp; // 
> Note: table default.emp has two columns 'name', 'address'
>  5) select * from d100.v100; // query on view d100.v100 gives correct result
> In Spark
>  -
>  1) spark.sql("select * from d100.v100").show
>  throws 
>  ```
>  org.apache.spark.sql.AnalysisException: Undefined function: '*d100.udf100*'. 
> This function is neither a registered temporary function nor a permanent 
> function registered in the database '*default*'
>  ```
> This is because, while parsing the SQL statement of the view 'select 
> `d100.udf100`(`emp`.`name`) from `default`.`emp`', the Spark parser fails to 
> split the database name from the UDF name, so the Spark function registry 
> tries to load the UDF 'd100.udf100' from the 'default' database.






[jira] [Commented] (SPARK-25301) When a view uses a UDF from a non-default database, the Spark analyser throws AnalysisException

2018-08-31 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599540#comment-16599540
 ] 

Apache Spark commented on SPARK-25301:
--

User 'vinodkc' has created a pull request for this issue:
https://github.com/apache/spark/pull/22307

> When a view uses a UDF from a non-default database, the Spark analyser throws 
> AnalysisException
> 
>
> Key: SPARK-25301
> URL: https://issues.apache.org/jira/browse/SPARK-25301
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Vinod KC
>Priority: Minor
>
> When a Hive view uses a UDF from a non-default database, the Spark analyser 
> throws AnalysisException.
> Steps to simulate this issue
>  -
>  In Hive
>  
>  1) CREATE DATABASE d100;
>  2) ADD JAR /usr/udf/masking.jar // masking.jar has a custom udf class 
> 'com.uzx.udf.Masking'
>  3) create function d100.udf100 as "com.uzx.udf.Masking"; // Note: udf100 is 
> created in d100
>  4) create view d100.v100 as select *d100.udf100*(name) from default.emp; // 
> Note: table default.emp has two columns 'name', 'address'
>  5) select * from d100.v100; // query on view d100.v100 gives correct result
> In Spark
>  -
>  1) spark.sql("select * from d100.v100").show
>  throws 
>  ```
>  org.apache.spark.sql.AnalysisException: Undefined function: '*d100.udf100*'. 
> This function is neither a registered temporary function nor a permanent 
> function registered in the database '*default*'
>  ```
> This is because, while parsing the SQL statement of the view 'select 
> `d100.udf100`(`emp`.`name`) from `default`.`emp`', the Spark parser fails to 
> split the database name from the UDF name, so the Spark function registry 
> tries to load the UDF 'd100.udf100' from the 'default' database.






[jira] [Created] (SPARK-25301) When a view uses a UDF from a non-default database, the Spark analyser throws AnalysisException

2018-08-31 Thread Vinod KC (JIRA)
Vinod KC created SPARK-25301:


 Summary: When a view uses a UDF from a non-default database, 
the Spark analyser throws AnalysisException
 Key: SPARK-25301
 URL: https://issues.apache.org/jira/browse/SPARK-25301
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
Reporter: Vinod KC


When a Hive view uses a UDF from a non-default database, the Spark analyser 
throws AnalysisException.

Steps to simulate this issue
 -
 In Hive
 
 1) CREATE DATABASE d100;
 2) ADD JAR /usr/udf/masking.jar // masking.jar has a custom udf class 
'com.uzx.udf.Masking'
 3) create function d100.udf100 as "com.uzx.udf.Masking"; // Note: udf100 is 
created in d100
 4) create view d100.v100 as select *d100.udf100*(name) from default.emp; // 
Note: table default.emp has two columns 'name', 'address'
 5) select * from d100.v100; // query on view d100.v100 gives correct result

In Spark
 -
 1) spark.sql("select * from d100.v100").show
 throws 
 ```
 org.apache.spark.sql.AnalysisException: Undefined function: '*d100.udf100*'. 
This function is neither a registered temporary function nor a permanent 
function registered in the database '*default*'
 ```

This is because, while parsing the SQL statement of the view 'select 
`d100.udf100`(`emp`.`name`) from `default`.`emp`', the Spark parser fails to 
split the database name from the UDF name, so the Spark function registry tries 
to load the UDF 'd100.udf100' from the 'default' database.
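The parsing failure described above can be sketched as follows. This is an illustrative Python sketch, not Spark's actual (Scala) resolution code; the function name and signature are made up for the example. It contrasts looking up the quoted identifier `d100.udf100` verbatim in the current database with first splitting it into a (database, function) pair, which is what a fix would have to do.

```python
def resolve_function(raw_name: str, current_db: str = "default"):
    """Return the (database, function) pair a function registry should look up.

    If the raw name contains a dot, treat the prefix as the database;
    otherwise fall back to the current database.
    """
    if "." in raw_name:
        db, func = raw_name.split(".", 1)
        return (db, func)
    return (current_db, raw_name)

# Buggy behaviour: the whole quoted identifier is treated as a bare function
# name, so resolution looks for 'd100.udf100' inside 'default' and fails.
# Fixed behaviour: the database prefix is honoured.
print(resolve_function("d100.udf100"))  # ('d100', 'udf100')
print(resolve_function("udf100"))       # ('default', 'udf100')
```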






[jira] [Resolved] (SPARK-23466) Remove redundant null checks in generated Java code by GenerateUnsafeProjection

2018-08-31 Thread Takuya Ueshin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-23466.
---
   Resolution: Fixed
 Assignee: Kazuaki Ishizaki
Fix Version/s: 2.4.0

Issue resolved by pull request 20637
https://github.com/apache/spark/pull/20637

> Remove redundant null checks in generated Java code by 
> GenerateUnsafeProjection
> ---
>
> Key: SPARK-23466
> URL: https://issues.apache.org/jira/browse/SPARK-23466
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
>Priority: Major
> Fix For: 2.4.0
>
>
> One of the TODOs in {{GenerateUnsafeProjection}} is "if the nullability of field 
> is correct, we can use it to save null check", to simplify the generated code.
> When {{nullable=false}} in the {{DataType}}, {{GenerateUnsafeProjection}} now 
> removes the null-check code from the generated Java code.
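The optimization can be illustrated with a toy code generator. This is a Python sketch under assumed names, not Spark's real `GenerateUnsafeProjection` (which emits Java from Scala): it shows how knowing `nullable=false` lets the generator drop the `isNullAt`/`setNullAt` branch entirely.

```python
def gen_write_field(idx: int, nullable: bool) -> str:
    """Emit (toy) writer code for one field of a projection.

    A nullable field needs a runtime null check; a field declared
    non-nullable can skip the check and the setNullAt branch.
    """
    write = f"writer.write({idx}, row.get({idx}));"
    if nullable:
        return (f"if (row.isNullAt({idx})) {{ writer.setNullAt({idx}); }} "
                f"else {{ {write} }}")
    return write  # no redundant null check in the generated code

print(gen_write_field(0, nullable=True))
print(gen_write_field(1, nullable=False))
```

The generated string for the non-nullable field contains no `isNullAt` call, which is exactly the simplification the TODO asks for.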






[jira] [Assigned] (SPARK-25300) Unify the configuration parameter `spark.shuffle.service.enabled`

2018-08-31 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25300:


Assignee: (was: Apache Spark)

> Unify the configuration parameter `spark.shuffle.service.enabled`
> ---
>
> Key: SPARK-25300
> URL: https://issues.apache.org/jira/browse/SPARK-25300
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: liuxian
>Priority: Minor
>
> The configuration parameter "spark.shuffle.service.enabled" is defined in 
> `package.scala` and is used in many places, so we can replace the raw string 
> with `SHUFFLE_SERVICE_ENABLED`.






[jira] [Commented] (SPARK-25300) Unify the configuration parameter `spark.shuffle.service.enabled`

2018-08-31 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599512#comment-16599512
 ] 

Apache Spark commented on SPARK-25300:
--

User '10110346' has created a pull request for this issue:
https://github.com/apache/spark/pull/22306

> Unify the configuration parameter `spark.shuffle.service.enabled`
> ---
>
> Key: SPARK-25300
> URL: https://issues.apache.org/jira/browse/SPARK-25300
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: liuxian
>Priority: Minor
>
> The configuration parameter "spark.shuffle.service.enabled" is defined in 
> `package.scala` and is used in many places, so we can replace the raw string 
> with `SHUFFLE_SERVICE_ENABLED`.






[jira] [Assigned] (SPARK-25300) Unify the configuration parameter `spark.shuffle.service.enabled`

2018-08-31 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25300:


Assignee: Apache Spark

> Unify the configuration parameter `spark.shuffle.service.enabled`
> ---
>
> Key: SPARK-25300
> URL: https://issues.apache.org/jira/browse/SPARK-25300
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: liuxian
>Assignee: Apache Spark
>Priority: Minor
>
> The configuration parameter "spark.shuffle.service.enabled" is defined in 
> `package.scala` and is used in many places, so we can replace the raw string 
> with `SHUFFLE_SERVICE_ENABLED`.






[jira] [Created] (SPARK-25300) Unify the configuration parameter `spark.shuffle.service.enabled`

2018-08-31 Thread liuxian (JIRA)
liuxian created SPARK-25300:
---

 Summary: Unify the configuration parameter 
`spark.shuffle.service.enabled`
 Key: SPARK-25300
 URL: https://issues.apache.org/jira/browse/SPARK-25300
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: liuxian


The configuration parameter "spark.shuffle.service.enabled" is defined in 
`package.scala` and is used in many places, so we can replace the raw string 
with `SHUFFLE_SERVICE_ENABLED`.
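The refactoring idea can be sketched in a few lines. This is an illustrative Python sketch, not Spark's Scala config code (`SHUFFLE_SERVICE_ENABLED` really lives in Spark's internal config `package.scala`): the key string is defined once as a constant, and every read goes through the constant instead of repeating the raw string.

```python
# Single definition of the config key; everywhere else references the constant.
SHUFFLE_SERVICE_ENABLED = "spark.shuffle.service.enabled"

def shuffle_service_enabled(conf: dict) -> bool:
    # A typo in the constant's name fails at name-resolution time,
    # whereas a typo in a repeated raw string would silently fall
    # back to the default value.
    return conf.get(SHUFFLE_SERVICE_ENABLED, "false") == "true"

conf = {SHUFFLE_SERVICE_ENABLED: "true"}
print(shuffle_service_enabled(conf))  # True
print(shuffle_service_enabled({}))    # False
```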






[jira] [Commented] (SPARK-25299) Use distributed storage for persisting shuffle data

2018-08-31 Thread Matt Cheah (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599428#comment-16599428
 ] 

Matt Cheah commented on SPARK-25299:


 

Note that SPARK-1529 was a much earlier feature request that is more or less 
identical to this one, but the old age of SPARK-1529 led me to open this newer 
issue instead of re-opening the old one. If it is preferable to use the old 
issue we can do that as well.

> Use distributed storage for persisting shuffle data
> ---
>
> Key: SPARK-25299
> URL: https://issues.apache.org/jira/browse/SPARK-25299
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Affects Versions: 2.4.0
>Reporter: Matt Cheah
>Priority: Major
>
> In Spark, the shuffle primitive requires Spark executors to persist data to 
> the local disk of the worker nodes. If executors crash, the external shuffle 
> service can continue to serve the shuffle data that was written beyond the 
> lifetime of the executor itself. In YARN, Mesos, and Standalone mode, the 
> external shuffle service is deployed on every worker node. The shuffle 
> service shares local disk with the executors that run on its node.
> There are some shortcomings with the way shuffle is fundamentally implemented 
> right now. Particularly:
>  * If any external shuffle service process or node becomes unavailable, all 
> applications that had an executor that ran on that node must recompute the 
> shuffle blocks that were lost.
>  * Similarly to the above, the external shuffle service must be kept running 
> at all times, which may waste resources when no applications are using that 
> shuffle service node.
>  * Mounting local storage can prevent users from taking advantage of 
> desirable isolation benefits from using containerized environments, like 
> Kubernetes. We had an external shuffle service implementation in an early 
> prototype of the Kubernetes backend, but it was rejected due to its strict 
> requirement to be able to mount hostPath volumes or other persistent volume 
> setups.
> In the following [architecture discussion 
> document|https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.btqugnmt2h40]
>  (note: _not_ an SPIP), we brainstorm various high level architectures for 
> improving the external shuffle service in a way that addresses the above 
> problems. The purpose of this umbrella JIRA is to promote additional 
> discussion on how we can approach these problems, both at the architecture 
> level and the implementation level. We anticipate filing sub-issues that 
> break down the tasks that must be completed to achieve this goal.






[jira] [Comment Edited] (SPARK-25299) Use distributed storage for persisting shuffle data

2018-08-31 Thread Matt Cheah (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599428#comment-16599428
 ] 

Matt Cheah edited comment on SPARK-25299 at 9/1/18 12:27 AM:
-

Note that SPARK-1529 was a much earlier feature request that is more or less 
identical to this one, but the old age of SPARK-1529 led me to open this newer 
issue instead of re-opening the old one. If it is preferable to use the old 
issue we can do that as well.


was (Author: mcheah):
 

Note that SPARK-1529 was a much earlier feature request that is more or less 
identical to this one, but the old age of SPARK-1529 led me to open this newer 
issue instead of re-opening the old one. If it is preferable to use the old 
issue we can do that as well.

> Use distributed storage for persisting shuffle data
> ---
>
> Key: SPARK-25299
> URL: https://issues.apache.org/jira/browse/SPARK-25299
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Affects Versions: 2.4.0
>Reporter: Matt Cheah
>Priority: Major
>
> In Spark, the shuffle primitive requires Spark executors to persist data to 
> the local disk of the worker nodes. If executors crash, the external shuffle 
> service can continue to serve the shuffle data that was written beyond the 
> lifetime of the executor itself. In YARN, Mesos, and Standalone mode, the 
> external shuffle service is deployed on every worker node. The shuffle 
> service shares local disk with the executors that run on its node.
> There are some shortcomings with the way shuffle is fundamentally implemented 
> right now. Particularly:
>  * If any external shuffle service process or node becomes unavailable, all 
> applications that had an executor that ran on that node must recompute the 
> shuffle blocks that were lost.
>  * Similarly to the above, the external shuffle service must be kept running 
> at all times, which may waste resources when no applications are using that 
> shuffle service node.
>  * Mounting local storage can prevent users from taking advantage of 
> desirable isolation benefits from using containerized environments, like 
> Kubernetes. We had an external shuffle service implementation in an early 
> prototype of the Kubernetes backend, but it was rejected due to its strict 
> requirement to be able to mount hostPath volumes or other persistent volume 
> setups.
> In the following [architecture discussion 
> document|https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.btqugnmt2h40]
>  (note: _not_ an SPIP), we brainstorm various high level architectures for 
> improving the external shuffle service in a way that addresses the above 
> problems. The purpose of this umbrella JIRA is to promote additional 
> discussion on how we can approach these problems, both at the architecture 
> level and the implementation level. We anticipate filing sub-issues that 
> break down the tasks that must be completed to achieve this goal.






[jira] [Created] (SPARK-25299) Use distributed storage for persisting shuffle data

2018-08-31 Thread Matt Cheah (JIRA)
Matt Cheah created SPARK-25299:
--

 Summary: Use distributed storage for persisting shuffle data
 Key: SPARK-25299
 URL: https://issues.apache.org/jira/browse/SPARK-25299
 Project: Spark
  Issue Type: New Feature
  Components: Shuffle
Affects Versions: 2.4.0
Reporter: Matt Cheah


In Spark, the shuffle primitive requires Spark executors to persist data to the 
local disk of the worker nodes. If executors crash, the external shuffle 
service can continue to serve the shuffle data that was written beyond the 
lifetime of the executor itself. In YARN, Mesos, and Standalone mode, the 
external shuffle service is deployed on every worker node. The shuffle service 
shares local disk with the executors that run on its node.

There are some shortcomings with the way shuffle is fundamentally implemented 
right now. Particularly:
 * If any external shuffle service process or node becomes unavailable, all 
applications that had an executor that ran on that node must recompute the 
shuffle blocks that were lost.
 * Similarly to the above, the external shuffle service must be kept running at 
all times, which may waste resources when no applications are using that 
shuffle service node.
 * Mounting local storage can prevent users from taking advantage of desirable 
isolation benefits from using containerized environments, like Kubernetes. We 
had an external shuffle service implementation in an early prototype of the 
Kubernetes backend, but it was rejected due to its strict requirement to be 
able to mount hostPath volumes or other persistent volume setups.

In the following [architecture discussion 
document|https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.btqugnmt2h40]
 (note: _not_ an SPIP), we brainstorm various high level architectures for 
improving the external shuffle service in a way that addresses the above 
problems. The purpose of this umbrella JIRA is to promote additional discussion 
on how we can approach these problems, both at the architecture level and the 
implementation level. We anticipate filing sub-issues that break down the tasks 
that must be completed to achieve this goal.






[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599380#comment-16599380
 ] 

Felix Cheung commented on SPARK-24434:
--

... and let's make sure any discussions are summarized and communicated as per 
ASF policy (e.g. update this JIRA), and be mindful of others' contributions, 
work schedules, lifestyles, etc.

Sounds like in this case we could:
 * record any discussion in the k8s SIG, Slack, or offline, and communicate it 
in this JIRA and/or to [d...@spark.apache.org|mailto:d...@spark.apache.org]
 * give plenty of heads-up time to the originator, e.g. give people 3-4 days to 
respond or react after directly pinging the person on the JIRA or by email
 * make sure due credit is given in the JIRA, GitHub PR description, etc., and 
link to any history or reference design doc
 * if this PR [https://github.com/apache/spark/pull/22146] is intended to be a 
WIP, please mark and describe it as such; as of now I don't see any indication 
of that

I believe this would then follow more closely the conventions we have adopted 
for the Apache Spark project. As stated, for various reasons we do not assign a 
JIRA to a user until the PR is merged and the JIRA is resolved. However, 
typically a contributor expresses their desire to work on something in JIRA or 
on dev@ and waits a bit for feedback or comments.

[~onursatici] could you please update your PR to the effect outlined above?

[~skonto] hopefully this makes sense to you; we (the Spark and k8s communities) 
would still love to work with you, and would like your feedback in guiding the 
PR to completion.

 

 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Resolved] (SPARK-25264) Fix comma-delimited arguments passed into PythonRunner and RRunner

2018-08-31 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah resolved SPARK-25264.

   Resolution: Fixed
Fix Version/s: 2.4.0

> Fix comma-delimited arguments passed into PythonRunner and RRunner
> ---
>
> Key: SPARK-25264
> URL: https://issues.apache.org/jira/browse/SPARK-25264
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, PySpark
>Affects Versions: 2.4.0
>Reporter: Ilan Filonenko
>Priority: Major
> Fix For: 2.4.0
>
>
> The arguments passed into the PythonRunner and RRunner are comma-delimited. 
> Because the Runners do an arg.slice(2, ...), the delimiter in the entrypoint 
> needs to be a space, as expected by the Runner arguments.
> This issue was logged here: 
> [https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/273]
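The delimiter problem can be sketched as follows. This is an illustrative Python sketch; the argument shapes are assumptions for the example, not the exact tokens Spark's entrypoint passes. It shows why slicing off the first two positional arguments only works when user arguments arrive as separate space-delimited tokens rather than one comma-joined string.

```python
def runner_user_args(argv: list) -> list:
    """Mirror the Runners' arg.slice(2, ...): drop the first two
    positional arguments and treat the rest as user arguments."""
    return argv[2:]

# Comma-joined: all user arguments collapse into a single fused token.
comma = ["primary.py", "py-files", "a,b,c"]
# Space-delimited: each user argument is its own token, as the Runner expects.
spaced = ["primary.py", "py-files", "a", "b", "c"]

print(runner_user_args(comma))   # ['a,b,c'] -- wrong: one fused argument
print(runner_user_args(spaced))  # ['a', 'b', 'c']
```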






[jira] [Commented] (SPARK-25282) Fix support for spark-shell with K8s

2018-08-31 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599310#comment-16599310
 ] 

Yinan Li commented on SPARK-25282:
--

I'm not sure this is a bug, nor how this could be enforced systematically. When 
you use client mode and run the driver outside the cluster on a host, you are 
using the Spark distribution on that host, which may or may not be the same 
version as the Spark jars in the image. I suspect this is not even a problem 
unique to Spark on Kubernetes.

> Fix support for spark-shell with K8s
> 
>
> Key: SPARK-25282
> URL: https://issues.apache.org/jira/browse/SPARK-25282
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Prashant Sharma
>Priority: Major
>
> Spark shell, when run with a Kubernetes master, gives the following errors:
> {noformat}
> java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId; local 
> class incompatible: stream classdesc serialVersionUID = -3720498261147521051, 
> local class serialVersionUID = -6655865447853211720
>   at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:616)
>   at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1630)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
> {noformat}
> Special care was taken to ensure the same compiled jar was used both in the 
> images and on the host system (the system running the driver).
> This issue affects the PySpark and R interfaces as well.






[jira] [Commented] (SPARK-25295) Pod name conflicts in client mode, if the previous submission was not a clean shutdown

2018-08-31 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599308#comment-16599308
 ] 

Yinan Li commented on SPARK-25295:
--

We made it clear in the documentation of the Kubernetes mode at 
[https://github.com/apache/spark/blob/master/docs/running-on-kubernetes.md#client-mode-executor-pod-garbage-collection]
 that when running the client mode, executor pods may be left behind. This is 
by design. If you want to have the executor pods deleted automatically, run the 
driver in a pod inside the cluster and set {{spark.driver.pod.name}} to the 
name of the driver pod so an {{OwnerReference}} pointing to the driver pod gets 
added to the executor pods. This way the executor pods get garbage collected 
when the driver pod is gone.
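The ownerReference mechanism described above can be sketched as the metadata it adds to each executor pod. This is an illustrative Python sketch; the field names follow the Kubernetes API convention for owner references, while the pod name and UID are made up for the example.

```python
def executor_owner_reference(driver_pod_name: str, driver_uid: str) -> dict:
    """Build an ownerReference entry pointing at the driver pod.

    With this reference attached, Kubernetes garbage-collects the
    executor pod automatically when the driver pod is deleted.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "name": driver_pod_name,
        "uid": driver_uid,
        "controller": True,
    }

# Metadata an executor pod would carry when spark.driver.pod.name is set
# (names here are hypothetical).
executor_metadata = {
    "name": "spark-exec-1",
    "ownerReferences": [executor_owner_reference("my-driver-pod", "abc-123")],
}
print(executor_metadata["ownerReferences"][0]["name"])  # my-driver-pod
```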

> Pod name conflicts in client mode, if the previous submission was not a clean 
> shutdown
> 
>
> Key: SPARK-25295
> URL: https://issues.apache.org/jira/browse/SPARK-25295
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Prashant Sharma
>Priority: Major
>
> If the previous job was killed somehow, e.g. by disconnecting the client, it 
> leaves behind executor pods named spark-exec-#, which cause naming conflicts 
> and failures for the next job submission.
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: 
> POST at: https://:6443/api/v1/namespaces/default/pods. Message: pods 
> "spark-exec-4" already exists. Received status: Status(apiVersion=v1, 
> code=409, details=StatusDetails(causes=[], group=null, kind=pods, 
> name=spark-exec-4, retryAfterSeconds=null, uid=null, 
> additionalProperties={}), kind=Status, message=pods "spark-exec-4" already 
> exists, metadata=ListMeta(resourceVersion=null, selfLink=null, 
> additionalProperties={}), reason=AlreadyExists, status=Failure, 
> additionalProperties={}).






[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Yinan Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599304#comment-16599304
 ] 

Yinan Li commented on SPARK-24434:
--

[~skonto] we understand your feelings and frustration about this, and we really 
appreciate your work driving the design. AFAIK, the PR created by [~onursatici] 
follows the design (you are helping review it, so you can judge whether this is 
the case). I think the situation was that people wanted to move this forward 
(granted that you were driving it) while you were on vacation, and thought it 
would be good to get the ball rolling with a WIP PR that everyone could comment 
on and give early feedback about. The fact that no one knew how far you had gone 
on the implementation before you started your vacation is probably also a factor 
here. With that said, we really appreciate your work driving the design and 
reviewing the PR! If you want to discuss this further and have ideas on how to 
better coordinate on big features in the future, let us know and we can bring it 
up at the next SIG meeting.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 
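For illustration, a pod template of the kind proposed is an ordinary Kubernetes 
pod spec that the user supplies and Spark overlays its own required settings 
onto. A hypothetical driver template might look like the following (every name, 
label, and mount below is made up; the container name and merge semantics are 
exactly the kind of detail the design discussion has to settle):

```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    team: data-platform          # arbitrary user metadata
spec:
  nodeSelector:
    disktype: ssd                # scheduling constraint with no Spark config key
  tolerations:
    - key: spark-only
      operator: Exists
      effect: NoSchedule
  containers:
    - name: driver               # container Spark would recognize and extend
      volumeMounts:
        - name: app-config
          mountPath: /etc/app
  volumes:
    - name: app-config
      configMap:
        name: my-app-config
```

Anything expressible in a pod spec (node selectors, tolerations, sidecars, 
volumes) then needs no dedicated Spark configuration option, which addresses 
both drawbacks listed above.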






[jira] [Updated] (SPARK-23781) Merge YARN and Mesos token renewal code

2018-08-31 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated SPARK-23781:
---
Description: 
With the fix for SPARK-23361, the code that handles delegation tokens in Mesos 
and YARN ends up being very similar.

We should refactor that code so that both backends are sharing the same code, 
which also would make it easier for other cluster managers to use that code.

  was:
With the fix for SPARK-23361, the code that handles delegation tokens in Mesos 
and YARN ends up being very similar.

We shouyld refactor that code so that both backends are sharing the same code, 
which also would make it easier for other cluster managers to use that code.


> Merge YARN and Mesos token renewal code
> ---
>
> Key: SPARK-23781
> URL: https://issues.apache.org/jira/browse/SPARK-23781
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos, YARN
>Affects Versions: 2.4.0
>Reporter: Marcelo Vanzin
>Priority: Major
>
> With the fix for SPARK-23361, the code that handles delegation tokens in 
> Mesos and YARN ends up being very similar.
> We should refactor that code so that both backends are sharing the same code, 
> which also would make it easier for other cluster managers to use that code.






[jira] [Resolved] (SPARK-25283) A deadlock in UnionRDD

2018-08-31 Thread Maxim Gekk (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk resolved SPARK-25283.

   Resolution: Fixed
Fix Version/s: 2.4.0

It was fixed by the PR https://github.com/apache/spark/pull/22292

> A deadlock in UnionRDD
> --
>
> Key: SPARK-25283
> URL: https://issues.apache.org/jira/browse/SPARK-25283
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Priority: Major
> Fix For: 2.4.0
>
>
> The PR https://github.com/apache/spark/pull/21913 replaced the Scala parallel 
> collections in UnionRDD with the new parmap function. This change causes a 
> deadlock in the partitions method. The following code demonstrates the problem:
> {code:scala}
> val wide = 20
> def unionRDD(num: Int): UnionRDD[Int] = {
>   val rdds = (0 until num).map(_ => sc.parallelize(1 to 10, 1))
>   new UnionRDD(sc, rdds)
> }
> val level0 = (0 until wide).map { _ =>
>   val level1 = (0 until wide).map(_ => unionRDD(wide))
>   new UnionRDD(sc, level1)
> }
> val rdd = new UnionRDD(sc, level0)
> rdd.partitions.length
> {code}
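The mechanism behind this kind of deadlock is independent of Spark: when a task 
running on a bounded thread pool blocks waiting for a nested task submitted to 
the same pool, and no worker is free to run the nested task, neither can make 
progress. A minimal sketch in Python (not Spark code; the pool size and timeout 
are illustrative, and the timeout stands in for what would otherwise hang 
forever):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

pool = ThreadPoolExecutor(max_workers=1)

def outer():
    # The only worker is busy running outer(), so this task just sits in the queue.
    inner = pool.submit(lambda: 42)
    # Waiting here is circular: outer() holds the worker that inner needs.
    # Without the timeout this would block forever.
    return inner.result(timeout=1)

future = pool.submit(outer)
try:
    future.result()
    deadlocked = False
except TimeoutError:
    deadlocked = True

print(deadlocked)  # True: the nested task never got a worker
pool.shutdown(wait=False)
```

Increasing max_workers lets this particular sketch complete, but deeply nested 
fan-out (as in the two-level UnionRDD above) can exhaust any fixed-size pool, 
which is presumably why the nested partitions computation deadlocks.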






[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Erik Erlandson (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599299#comment-16599299
 ] 

Erik Erlandson commented on SPARK-24434:


To amplify a little on my points above: I co-chair a SIG that is attended by 
some Apache Spark contributors, most frequently people involved with the 
Kubernetes back-end. As chair, I do my best to provide input on the discussions 
we have there. However, the various community participants are their own 
independent entities; nobody in this community takes orders from me.

When everything is running smoothly, this kind of duplicated effort should 
never happen. Here things didn't go smoothly, and I hope to work it out as best 
we can.

[~skonto] I encourage you to post your development work on this feature, so 
that everyone can discuss all the available options.







[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599284#comment-16599284
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 9:15 PM:
--

[~eje] If I'm not mistaken, all of us are in the same meeting, including the 
Palantir guys, no? Do you see value in doing double work when we are on the 
same call? If so, fine. Not to mention that this case has nothing to do with 
multiple PRs; we had a design doc we agreed upon. Anyway, we disagree, and 
that's ok. I understand why it's important not to violate the FOSS principles 
and to communicate that everything here was fine, but honestly, that is not the 
point I am trying to make. Anyway, I will refrain from adding more comments; it 
does not make any sense.


was (Author: skonto):
[~eje] if not mistaken all of us are on the same meeting including Palantir 
guys no? Do you see value when we are on the same call doing double work? If so 
fine. Not to mention that this case has nothing to do with multiple PRs, we had 
a design doc we agreed upon it. Anyway we disagree, its ok. I understand why 
its important not to violate the FOSS stuff, but honestly that this is not the 
point I am trying to make. I am disappointed, anyway I will refrain from adding 
more comments it does not make any sense.







[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599284#comment-16599284
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 9:14 PM:
--

[~eje] if not mistaken all of us are on the same meeting including Palantir 
guys no? Do you see value when we are on the same call doing double work? If so 
fine. Not to mention that this case has nothing to do with multiple PRs, we had 
a design doc we agreed upon it. Anyway we disagree, its ok. I understand why 
its important not to violate the FOSS stuff, but honestly that this is not the 
point I am trying to make. I am disappointed, anyway I will refrain from adding 
more comments it does not make any sense.


was (Author: skonto):
[~eje] if not mistaken all of us are on the same meeting including Palantir 
guys no? Do you see value when we are on the same call doing double work? If so 
fine. Not to mention that this case has nothing to do with multiple PRs, we had 
a design doc we agreed upon it. Anyway we disagree, its ok. I understand why 
its important not to violate the FOSS stuff, but honestly that this is not the 
point.







[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599284#comment-16599284
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 9:13 PM:
--

[~eje] if not mistaken all of us are on the same meeting including Palantir 
guys no? Do you see value when we are on the same call doing double work? If so 
fine. Not to mention that this case has nothing to do with multiple PRs, we had 
a design doc we agreed upon it. Anyway we disagree, its ok. I understand why 
its important not to violate the FOSS stuff, but honestly that this is not the 
point.


was (Author: skonto):
[~eje] if not mistaken all of us are on the same meeting including Palantir 
guys no? Do you see value when we are on the same call doing double work? If so 
fine. Not to mention that this case has nothing to do with multiple PRs, we had 
a design doc we agreed upon it. Anyway we disagree.







[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599284#comment-16599284
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 9:12 PM:
--

[~eje] if not mistaken all of us are on the same meeting including Palantir 
guys no? Do you see value when we are on the same call doing double work? If so 
fine. Not to mention that this case has nothing to do with multiple PRs, we had 
a design doc we agreed upon it. Anyway we disagree.


was (Author: skonto):
[~eje] if not mistaken all of us are on the same meeting including Palantir 
guys no? Do you see value when we are on the same call doing double work? If so 
fine.







[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599284#comment-16599284
 ] 

Stavros Kontopoulos commented on SPARK-24434:
-

[~eje] if not mistaken all of us are on the same meeting including Palantir 
guys no? Do you see value when we are on the same call doing double work? If so 
fine.







[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599258#comment-16599258
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 9:09 PM:
--

[~eje] Personally, I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked (in one of the meetings) whether this feature 
should go in 2.4, and you responded that it could not, as it needs testing etc. 
I had no objection. It now seems that Palantir is pushing this for their own 
reasons and that it will be marked as experimental. The PR created, though, is 
not that big (and not complete yet, if you ask me). Given that, I suspect we 
had time back then, even with the old dates of the 2.4 cut, to make a similar 
PR. Next time I will push harder.

2) Before I left on vacation I posted a comment on our Slack channel, not to 
mention the explicit comment in this JIRA above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you can see, you agreed that I was working on it, no?

In every other JIRA I have seen, people just need to state that they are 
working on something; they don't need to create a WIP PR, AFAIK. (Next time I 
will just commit a few lines of code to declare assignment.)

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part, sorry. I was expecting some update on the JIRA ticket, 
because, for good or bad, I only checked email during my vacation. You could 
have pinged me, though, rather than just doing it without me. I created the 
design doc; don't you think I want to finish the work?

4) I almost always join the meetings and I'm active on the project. But nobody 
pinged me, AFAIK. Fine.

5) The Palantir guys didn't update the JIRA so that everyone (outside the 
meetings) would know the status of things; also, in the minutes doc I don't see 
any decision about who was going to do the PR.

I think the reasonable thing to do is to ask what I have done, so people don't 
duplicate effort.

If the whole thing looks ok in terms of collaboration on this project, then 
fine, it was my misunderstanding and I will adapt. It is not awkward at all: 
people on the call decided to assign it to the Palantir guys without me knowing 
anything, that's all. Nobody is obliged to inform me about anything; I'm just a 
contributor here. But I took it for granted that this would be the case when 
collaborating in a healthy community. My mistake.


was (Author: skonto):
[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked if this feature (in one of the  meetings) should 
go in 2.4 and you responded that this cannot be the case, as it needs testing 
etc. I had no objection. It seems now that Palantir is pushing this for their 
own reasons and will be marked as experimental. The PR created though is not 
that big (not complete yet if you ask me). Given that I suspect we had time 
back then even with the old dates of the 2.4 cut to make a similar PR. Next 
time will push harder.

2) Before I leave on vacations I left a comment on our slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you see you agreed that im working on it no?

In any other Jira I have seen before, people just need to state that they are 
working on something. They dont need to create a WIP PR AFAIK (Next time I will 
just commit a few lines of code to declare assignment ). 

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part sorry. I was expecting some update on the Jira ticket, 
because I only checked emails on my vacations for good or bad. You could have 
pinged me though not just do it without me. I created the design doc dont you 
think I want to finish the work?

4) I almost always join the meetings and im active on the project. But nobody 
me pinged AFAIK. Fine.

5) Palantir guys didnt update the Jira so all people (outside meetings) know 
the status of things, also in the minutes doc I dont see any decision about who 
is going to do the PR.

I think the reasonable thing to do is ask what I have done, so people dont do 
double effort. 

If the whole thing looks ok in terms of 

[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Erik Erlandson (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599283#comment-16599283
 ] 

Erik Erlandson commented on SPARK-24434:


Stavros, yes, I knew you were working on it, and also that there were no plans 
for 2.4.

As I said above, it is generally more efficient and respectful to coordinate 
with issue assignees. I did not request this second PR. On the other hand, 
multiple PRs for an issue don't violate any FOSS principles; it just means 
there should be a community discussion about which PR ought to be pursued.

I'm not aware of any renewed push to get this into 2.4. I don't see any 
discussion about it on dev@spark.

 







[jira] [Commented] (SPARK-25257) v2 MicroBatchReaders can't resume from checkpoints

2018-08-31 Thread Shixiong Zhu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599279#comment-16599279
 ] 

Shixiong Zhu commented on SPARK-25257:
--

[~mojodna] This issue has been fixed in SPARK-23092.

> v2 MicroBatchReaders can't resume from checkpoints
> --
>
> Key: SPARK-25257
> URL: https://issues.apache.org/jira/browse/SPARK-25257
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.1
>Reporter: Seth Fitzsimmons
>Priority: Major
> Attachments: deserialize.patch
>
>
> When resuming from a checkpoint:
> {code:java}
> writeStream.option("checkpointLocation", 
> "/tmp/checkpoint").format("console").start
> {code}
> The stream reader fails with:
> {noformat}
> osmesa.common.streaming.AugmentedDiffMicroBatchReader@59e19287
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
> Caused by: java.lang.ClassCastException: 
> org.apache.spark.sql.execution.streaming.SerializedOffset cannot be cast to 
> org.apache.spark.sql.sources.v2.reader.streaming.Offset
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1$$anonfun$apply$9.apply(MicroBatchExecution.scala:405)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1$$anonfun$apply$9.apply(MicroBatchExecution.scala:390)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at 
> org.apache.spark.sql.execution.streaming.StreamProgress.foreach(StreamProgress.scala:25)
>   at 
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
>   at 
> org.apache.spark.sql.execution.streaming.StreamProgress.flatMap(StreamProgress.scala:25)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:390)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:390)
>   at 
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:271)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:389)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:133)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:121)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:121)
>   at 
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:271)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:121)
>   at 
> org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:117)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:279)
>   ... 1 more
> {noformat}
> The root cause appears to be that the {{SerializedOffset}} (JSON, from disk) 
> is never deserialized; I would expect to see something along the lines of 
> {{reader.deserializeOffset(off.json)}} here (unless {{available}} is intended 
> to be deserialized elsewhere):
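The suggested pattern can be sketched in Python with hypothetical class and 
method names (the real code is Scala inside MicroBatchExecution; only 
SerializedOffset and deserializeOffset come from the report above, everything 
else is illustrative):

```python
import json

class SerializedOffset:
    """The raw JSON form of an offset, as read back from the checkpoint log."""
    def __init__(self, json_str):
        self.json = json_str

class ExampleOffset:
    """A source-specific offset object (hypothetical stand-in for a v2 Offset)."""
    def __init__(self, position):
        self.position = position

class ExampleReader:
    """Hypothetical reader exposing the deserializeOffset hook from the report."""
    def deserialize_offset(self, json_str):
        return ExampleOffset(json.loads(json_str)["position"])

def resume_from_checkpoint(reader, available):
    # The suggested fix: if the recovered offset is still in serialized form,
    # let the reader turn it back into its own offset type before use, rather
    # than casting it directly (which is what raised the ClassCastException above).
    if isinstance(available, SerializedOffset):
        available = reader.deserialize_offset(available.json)
    return available

off = resume_from_checkpoint(ExampleReader(), SerializedOffset('{"position": 7}'))
print(off.position)  # 7
```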

[jira] [Resolved] (SPARK-25257) v2 MicroBatchReaders can't resume from checkpoints

2018-08-31 Thread Shixiong Zhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-25257.
--
Resolution: Duplicate

> v2 MicroBatchReaders can't resume from checkpoints
> --
>
> Key: SPARK-25257
> URL: https://issues.apache.org/jira/browse/SPARK-25257
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.1
>Reporter: Seth Fitzsimmons
>Priority: Major
> Attachments: deserialize.patch
>
>
> When resuming from a checkpoint:
> {code:java}
> writeStream.option("checkpointLocation", 
> "/tmp/checkpoint").format("console").start
> {code}
> The stream reader fails with:
> {noformat}
> osmesa.common.streaming.AugmentedDiffMicroBatchReader@59e19287
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
> Caused by: java.lang.ClassCastException: 
> org.apache.spark.sql.execution.streaming.SerializedOffset cannot be cast to 
> org.apache.spark.sql.sources.v2.reader.streaming.Offset
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1$$anonfun$apply$9.apply(MicroBatchExecution.scala:405)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1$$anonfun$apply$9.apply(MicroBatchExecution.scala:390)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at 
> org.apache.spark.sql.execution.streaming.StreamProgress.foreach(StreamProgress.scala:25)
>   at 
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
>   at 
> org.apache.spark.sql.execution.streaming.StreamProgress.flatMap(StreamProgress.scala:25)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:390)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$1.apply(MicroBatchExecution.scala:390)
>   at 
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:271)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:389)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:133)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:121)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:121)
>   at 
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:271)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:121)
>   at 
> org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:117)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:279)
>   ... 1 more
> {noformat}
> The root cause appears to be that the {{SerializedOffset}} (JSON, from disk) 
> is never deserialized; I would expect to see something along the lines of 
> {{reader.deserializeOffset(off.json)}} here (unless {{available}} is intended 
> to be deserialized elsewhere):
> 
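The missing step described above can be modeled with a small, self-contained sketch. Note these are hypothetical stand-in types for illustration only, not the real Spark classes; the actual fix would live in MicroBatchExecution and call the source reader's own deserializeOffset:

```scala
object OffsetDeserializationSketch {
  // Stand-ins for the real Spark offset types (hypothetical, for illustration).
  trait Offset { def json: String }
  final case class SerializedOffset(json: String) extends Offset
  final case class LongOffset(value: Long) extends Offset {
    def json: String = value.toString
  }

  // Models reader.deserializeOffset: turn raw JSON from the offset log
  // back into the source's concrete Offset type.
  def deserializeOffset(json: String): Offset = LongOffset(json.toLong)

  // The step the comment argues is missing: unwrap a SerializedOffset
  // (restored from disk) before handing it to the source.
  def resolve(off: Offset): Offset = off match {
    case SerializedOffset(json) => deserializeOffset(json) // restore concrete offset
    case concrete               => concrete                // already deserialized
  }

  def main(args: Array[String]): Unit = {
    println(resolve(SerializedOffset("42"))) // LongOffset(42)
  }
}
```

The point of the pattern-match is that offsets read back from the checkpoint directory arrive as raw JSON wrappers, while offsets produced in-memory are already concrete, so both cases must be handled before the offset reaches the reader.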

[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599258#comment-16599258
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 8:51 PM:
--

[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked (in one of the meetings) whether this feature should 
go into 2.4, and you responded that it could not, as it needed testing etc. I had 
no objection. It now seems that Palantir is pushing this for their own reasons 
and that it will be marked as experimental. The PR that was created, though, is 
not that big (and not complete yet, if you ask me). Given that, I suspect we had 
time back then, even with the old dates of the 2.4 cut, to make a similar PR. 
Next time I will push harder.

2) Before I left on vacation I left a comment on our Slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you can see, you agreed that I am working on it, no?

In any other Jira I have seen, people just need to state that they are working 
on something; they don't need to create a WIP PR, AFAIK. (Next time I will just 
commit a few lines of code to declare assignment.)

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part, sorry. I was expecting an update on the Jira ticket, 
because, for good or bad, I only checked email during my vacation. You could 
have pinged me, though, instead of just doing it without me. I created the 
design doc; don't you think I want to finish the work?

4) I almost always join the meetings and I am active on the project. But nobody 
pinged me, AFAIK. Fine.

5) The Palantir folks didn't update the Jira so that everyone (outside the 
meetings) would know the status of things, and in the minutes doc I don't see 
any decision about who is going to do the PR.

I think the reasonable thing to do is to ask what I have already done, so people 
don't duplicate effort.

If the whole thing looks OK in terms of collaboration on this project, then 
fine, it was my misunderstanding and I will adapt. It is not awkward at all: 
people on the call decided to assign it to the Palantir folks without me knowing 
anything, that's all. Nobody is obliged to inform me about anything; I am just a 
contributor here, but I took it for granted that this would be the case when 
collaborating in a healthy community.


was (Author: skonto):
[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked if this feature (in one of the  meetings) should 
go in 2.4 and you responded that this cannot be the case, as it needs testing 
etc. I had no objection. It seems now that Palantir is pushing this for their 
own reasons and will be marked as experimental. The PR created though is not 
that big (not complete yet if you ask me). Given that I suspect we had time 
back then even with the old dates of the 2.4 cut to make a similar PR. Next 
time will push harder.

2) Before I leave on vacations I left a comment on our slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you see you agreed that im working on it no?

In any other Jira I have seen before, people just need to state that they are 
working on something. They dont need to create a WIP PR AFAIK (Next time I will 
just commit a few lines of code to declare assignment ). 

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part sorry. I was expecting some update on the Jira ticket, 
because I only checked emails on my vacations for good or bad. You could have 
pinged me though not just do it without me. I created the design doc dont you 
think I want to finish the work?

4) I almost always join the meetings and im active on the project. But nobody 
me pinged AFAIK. Fine.

5) Palantir guys didnt update the Jira so all people (outside meetings) know 
the status of things, also in the minutes doc I dont see any decision about who 
is going to do the PR.

I think the reasonable thing to do is ask what I have done, so people dont do 
double effort. 

If the whole thing looks ok in terms of collaboration, 

[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599258#comment-16599258
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 8:48 PM:
--

[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked if this feature (in one of the  meetings) should 
go in 2.4 and you responded that this cannot be the case, as it needs testing 
etc. I had no objection. It seems now that Palantir is pushing this for their 
own reasons and will be marked as experimental. The PR created though is not 
that big (not complete yet if you ask me). Given that I suspect we had time 
back then even with the old dates of the 2.4 cut to make a similar PR. Next 
time will push harder.

2) Before I leave on vacations I left a comment on our slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you see you agreed that im working on it no?

In any other Jira I have seen before, people just need to state that they are 
working on something. They dont need to create a WIP PR AFAIK (Next time I will 
just commit a few lines of code to declare assignment ). 

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part sorry. I was expecting some update on the Jira ticket, 
because I only checked emails on my vacations for good or bad. You could have 
pinged me though not just do it without me. I created the design doc dont you 
think I want to finish the work?

4) I almost always join the meetings and im active on the project. But nobody 
me pinged AFAIK. Fine.

5) Palantir guys didnt update the Jira so all people (outside meetings) know 
the status of things, also in the minutes doc I dont see any decision about who 
is going to do the PR.

I think the reasonable thing to do is ask what I have done, so people dont do 
double effort. 

If the whole thing looks ok in terms of collaboration, then fine, my 
misunderstanding then, will adapt. It is not awkward at all, people on the call 
decided to assign it to Palantir guys without me knowing anything, that's all. 
Nobody is obliged to inform me about anything, im just a contributor here, but 
I took it for granted that this would be the case when collaborating in a 
healthy community.


was (Author: skonto):
[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked if this feature (in one of the  meetings) should 
go in 2.4 and you responded that this cannot be the case, as it needs testing 
etc. I had no objection. It seems now that Palantir is pushing this for their 
own reasons and will be marked as experimental. The PR created though is not 
that big (not complete yet if you ask me). Given that I suspect we had time 
back then even with the old dates of the 2.4 cut to make a similar PR. Next 
time will push harder.

2) Before I leave on vacations I left a comment on our slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you see you agreed that im working on it no?

In any other Jira I have seen before, people just need to state that they are 
working on something. They dont need to create a WIP PR AFAIK (Next time I will 
just commit a few lines of code to declare assignment ). 

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part sorry. I was expecting some update on the Jira ticket, 
because I only checked emails on my vacations for good or bad. You could have 
pinged me though not just do it without me. I created the design doc dont you 
think I want to finish the work?

4) I almost always join the meetings and im active on the project. But nobody 
me pinged AFAIK. Fine.

5) Palantir guys didnt update the Jira so all people (outside meetings) know 
the status of things, also in the minutes doc I dont see any decision about who 
is going to do the PR.

I think the reasonable thing to do is ask what I have done, so people dont do 
double effort. 

If the whole thing looks ok in terms of collaboration, then fine, my 

[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599258#comment-16599258
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 8:47 PM:
--

[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked if this feature (in one of the  meetings) should 
go in 2.4 and you responded that this cannot be the case, as it needs testing 
etc. I had no objection. It seems now that Palantir is pushing this for their 
own reasons and will be marked as experimental. The PR created though is not 
that big (not complete yet if you ask me). Given that I suspect we had time 
back then even with the old dates of the 2.4 cut to make a similar PR. Next 
time will push harder.

2) Before I leave on vacations I left a comment on our slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you see you agreed that im working on it no?

In any other Jira I have seen before, people just need to state that they are 
working on something. They dont need to create a WIP PR AFAIK (Next time I will 
just commit a few lines of code to declare assignment ). 

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part sorry. I was expecting some update on the Jira ticket, 
because I only checked emails on my vacations for good or bad. You could have 
pinged me though not just do it without me. I created the design doc dont you 
think I want to finish the work?

4) I almost always join the meetings and im active on the project. But nobody 
me pinged AFAIK. Fine.

5) Palantir guys didnt update the Jira so all people (outside meetings) know 
the status of things, also in the minutes doc I dont see any decision about who 
is going to do the PR.

I think the reasonable thing to do is ask what I have done, so people dont do 
double effort. 

If the whole thing looks ok in terms of collaboration, then fine, my 
misunderstanding then, will adapt. It is not awkward at all, people on the call 
decided to assign it to Palantir guys without me knowing anything, that's all. 
Nobody is obliged to inform me about anything, but I took it for granted that 
this is the case when collaborating in a healthy community.


was (Author: skonto):
[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked if this feature (in one of the  meetings) should 
go in 2.4 and you responded that this cannot be the case, as it needs testing 
etc. I had no objection. It seems now that Palantir is pushing this for their 
own reasons and will be marked as experimental. The PR created though is not 
that big (not complete yet if you ask me). Given that I suspect we had time 
back then even with the old dates of the 2.4 cut to make a similar PR. Next 
time will push harder.

2) Before I leave on vacations I left a comment on our slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you see you agreed that im working on it no?

In any other Jira I have seen before, people just need to state that they are 
working on something. They dont need to create a WIP PR AFAIK (Next time I will 
just commit a few lines of code to declare assignment ). 

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part sorry. I was expecting some update on the Jira ticket, 
because I only checked emails on my vacations for good or bad. You could have 
pinged me though not just do it without me. I created the design doc dont you 
think I want to finish the work?

4) I almost always join the meetings and im active on the project. But nobody 
me pinged AFAIK. Fine.

5) Palantir guys didnt update the Jira so all people (outside meetings) know 
the status of things, also in the minutes doc I dont see any decision about who 
is going to do the PR.

I think the reasonable thing to do is ask what I have done, so people dont do 
double effort. 

If the whole thing looks ok in terms of collaboration, then fine, my 
misunderstanding then, will adapt. 

[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599258#comment-16599258
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 8:46 PM:
--

[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked if this feature (in one of the  meetings) should 
go in 2.4 and you responded that this cannot be the case, as it needs testing 
etc. I had no objection. It seems now that Palantir is pushing this for their 
own reasons and will be marked as experimental. The PR created though is not 
that big (not complete yet if you ask me). Given that I suspect we had time 
back then even with the old dates of the 2.4 cut to make a similar PR. Next 
time will push harder.

2) Before I leave on vacations I left a comment on our slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you see you agreed that im working on it no?

In any other Jira I have seen before, people just need to state that they are 
working on something. They dont need to create a WIP PR AFAIK (Next time I will 
just commit a few lines of code to declare assignment ). 

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part sorry. I was expecting some update on the Jira ticket, 
because I only checked emails on my vacations for good or bad. You could have 
pinged me though not just do it without me. I created the design doc dont you 
think I want to finish the work?

4) I almost always join the meetings and im active on the project. But nobody 
me pinged AFAIK. Fine.

5) Palantir guys didnt update the Jira so all people (outside meetings) know 
the status of things, also in the minutes doc I dont see any decision about who 
is going to do the PR.

I think the reasonable thing to do is ask what I have done, so people dont do 
double effort. 

If the whole thing looks ok in terms of collaboration, then fine, my 
misunderstanding then, will adapt. It is not awkward at all, people on the call 
decided to assign it to Palantir guys without me knowing anything, that's all. 
Nobody is obliged to inform me about anything, but I thought that is normal 
when collaborate in a healthy community.


was (Author: skonto):
[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked if this feature (in one of the  meetings) should 
go in 2.4 and you responded that this cannot be the case, as it needs testing 
etc. I had no objection. It seems now that Palantir is pushing this for their 
own reasons and will be marked as experimental. The PR created though is not 
that big (not complete yet if you ask me). Given that I suspect we had time 
back then even with the old dates of the 2.4 cut to make a similar PR. Next 
time will push harder.

2) Before I leave on vacations I left a comment on our slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you see you agreed that im working on it no?

In any other Jira I have seen before, people just need to state that they are 
working on something. They dont need to create a WIP PR AFAIK (Next time I will 
just commit a few lines of code to declare assignment ). 

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part sorry. I was expecting some update on the Jira ticket, 
because I only checked emails on my vacations for good or bad. You could have 
pinged me though not just do it without me. I created the design doc dont you 
think I want to finish the work?

4) I almost always join the meetings and im active on the project. But nobody 
me pinged AFAIK. Fine.

5) Palantir guys didnt update the Jira so all people (outside meetings) know 
the status of things, also in the minutes doc I dont see any decision about who 
is going to do the PR.

I think the reasonable thing to do is ask what I have done, so people dont do 
double effort. 

If the whole thing looks ok in terms of collaboration, then fine, my 
misunderstanding then, will adapt. It is not awkward at 

[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599258#comment-16599258
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 8:45 PM:
--

[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked if this feature (in one of the  meetings) should 
go in 2.4 and you responded that this cannot be the case, as it needs testing 
etc. I had no objection. It seems now that Palantir is pushing this for their 
own reasons and will be marked as experimental. The PR created though is not 
that big (not complete yet if you ask me). Given that I suspect we had time 
back then even with the old dates of the 2.4 cut to make a similar PR. Next 
time will push harder.

2) Before I leave on vacations I left a comment on our slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you see you agreed that im working on it no?

In any other Jira I have seen before, people just need to state that they are 
working on something. They dont need to create a WIP PR AFAIK (Next time I will 
just commit a few lines of code to declare assignment ). 

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part sorry. I was expecting some update on the Jira ticket, 
because I only checked emails on my vacations for good or bad. You could have 
pinged me though not just do it without me. I created the design doc dont you 
think I want to finish the work?

4) I almost always join the meetings and im active on the project. But nobody 
me pinged AFAIK. Fine.

5) Palantir guys didnt update the Jira so all people (outside meetings) know 
the status of things, also in the minutes doc I dont see any decision about who 
is going to do the PR.

I think the reasonable thing to do is ask what I have done, so people dont do 
double effort. 

If the whole thing looks ok in terms of collaboration, then fine, my 
misunderstanding then, will adapt. It is not awkward at all, people on the call 
decided to assign it to Palantir guys without me knowing anything, that's all. 


was (Author: skonto):
[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked if this feature (in one of the  meetings) should 
go in 2.4 and you responded that this cannot be the case, as it needs testing 
etc. I had no objection. It seems now that Palantir is pushing this for their 
own reasons and will be marked as experimental. The PR created though is not 
that big (not complete yet if you ask me). Given that I suspect we had time 
back then even with the old dates of the 2.4 cut to make a similar PR. Next 
time will push harder.

2) Before I leave on vacations I left a comment on our slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you see you agreed that im working on it no?

In any other Jira I have seen before, people just need to state that they are 
working on something. They dont need to create a WIP PR AFAIK (Next time I will 
just commit a few lines of code to declare assignment ). 

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part sorry. I was expecting some update on the Jira ticket, 
because I only checked emails on my vacations for good or bad. You could have 
pinged me though not just do it without me. I created the design doc dont you 
think I want to finish the work?

4) I almost always join the meetings and im active on the project. But nobody 
me pinged AFAIK. Fine.

5) Palantir guys didnt update the Jira so all people (outside meetings) know 
the status of things, also in the minutes doc I dont see any decision about who 
is going to do the PR.

I think the reasonable thing to do is ask what I have done, so people dont do 
double effort. 

If the whole thing looks ok in terms of collaboration, then fine, my 
misunderstanding then, will adapt. It is not awkward people on the call decided 
to assign it to Palantir without knowing anything that's all. 

> Support user-specified 

[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599258#comment-16599258
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 8:44 PM:
--

[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked if this feature (in one of the  meetings) should 
go in 2.4 and you responded that this cannot be the case, as it needs testing 
etc. I had no objection. It seems now that Palantir is pushing this for their 
own reasons and will be marked as experimental. The PR created though is not 
that big (not complete yet if you ask me). Given that I suspect we had time 
back then even with the old dates of the 2.4 cut to make a similar PR. Next 
time will push harder.

2) Before I leave on vacations I left a comment on our slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you see you agreed that im working on it no?

In any other Jira I have seen before, people just need to state that they are 
working on something. They dont need to create a WIP PR AFAIK (Next time I will 
just commit a few lines of code to declare assignment ). 

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part sorry. I was expecting some update on the Jira ticket, 
because I only checked emails on my vacations for good or bad. You could have 
pinged me though not just do it without me. I created the design doc dont you 
think I want to finish the work?

4) I almost always join the meetings and im active on the project. But nobody 
me pinged AFAIK. Fine.

5) Palantir guys didnt update the Jira so all people (outside meetings) know 
the status of things, also in the minutes doc I dont see any decision about who 
is going to do the PR.

I think the reasonable thing to do is ask what I have done, so people dont do 
double effort. 

If the whole thing looks ok in terms of collaboration, then fine, my 
misunderstanding then, will adapt. It is not awkward people on the call decided 
to assign it to Palantir without knowing anything that's all. 


was (Author: skonto):
[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked if this feature (in one of the  meetings) should 
go in 2.4 and you responded that this cannot be the case, as it needs testing 
etc. I had no objection. It seems now that Palantir is pushing this for their 
own reasons and will be marked as experimental. The PR created though is not 
that big (not complete yet if you ask me). Given that I suspect we had time 
back then even with the old dates of the 2.4 cut to make a similar PR. Next 
time will push harder.

2) Before I leave on vacations I left a comment on our slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you see you agreed that im working on it no?

In any other Jira I have seen before, people just need to state that they are 
working on something. They dont need to create a WIP PR AFAIK (Next time I will 
just commit a few lines of code to declare assignment ). 

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part sorry. I was expecting some update on the Jira ticket, 
because I only checked emails on my vacations for good or bad. You could have 
pinged me though not just do it without me. I created the design doc dont you 
think I want to finish the work?

4) I almost always join the meetings and im active on the project. But nobody 
me pinged AFAIK. Fine.

5) Palantir guys didnt update the Jira so all people (outside meetings) know 
the status of things, also in the minutes doc I dont see any decision about who 
is going to do the PR.

I think the reasonable thing to do is ask what I have done, so people dont do 
double effort. 

If the whole thing looks ok in terms of collaboration, then fine, my 
misunderstanding then, will adapt.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> 

[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599258#comment-16599258
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 8:38 PM:
--

[~eje] Personally I will just stick to the facts (the ones I am aware of):

1) Several weeks ago I asked if this feature (in one of the  meetings) should 
go in 2.4 and you responded that this cannot be the case, as it needs testing 
etc. I had no objection. It seems now that Palantir is pushing this for their 
own reasons and will be marked as experimental. The PR created though is not 
that big (not complete yet if you ask me). Given that I suspect we had time 
back then even with the old dates of the 2.4 cut to make a similar PR. Next 
time will push harder.

2) Before I leave on vacations I left a comment on our slack channel, not to 
mention the explicit comment in this Jira above:

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

eje [5:04 PM]
 @Stavros thanks!

liyinan926 [7:29 PM]
 @Stavros Thanks for working on that!

As you see you agreed that im working on it no?

In any other Jira I have seen, people just need to state that they are working on something; they don't need to create a WIP PR, AFAIK. (Next time I will just commit a few lines of code to declare assignment.)

3) Copying again from the meeting notes: 
[https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit]

 

15th of August
 * Custom YAML
 * Stavros on vacation
 * Complete it without him?
 * Palantir will look at completing

I missed that part, sorry. I was expecting an update on the Jira ticket, because, for good or bad, I only checked email while on vacation. You could have pinged me, though, instead of just doing it without me. I created the design doc; don't you think I want to finish the work?

4) I almost always join the meetings and I am active on the project, but nobody pinged me, AFAIK. Fine.

5) The Palantir folks didn't update the Jira so that everyone (outside the meetings) would know the status of things; also, in the minutes doc I don't see any decision about who is going to do the PR.

I think the reasonable thing to do is to ask what I have done, so people don't duplicate effort.

If the whole thing looks OK in terms of collaboration, then fine; it was my misunderstanding, and I will adapt.



> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  












[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Erik Erlandson (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599092#comment-16599092
 ] 

Erik Erlandson commented on SPARK-24434:


There are a few related, but separate, issues here.

I agree that it is most efficient, and considerate, to respect issue 
assignments and coordinate our distributed development around absences, etc.

To the best of my knowledge, the work Stavros did on 24434 was not made visible 
as a public WIP apache/spark branch. Making dev visible this way is one 
important way to minimize coordination problems.

Although this confusion is awkward, nothing in regard to 24434 has violated FOSS principles or Spark governance. Onur's PR has been developed and reviewed
on a public apache/spark branch. This Jira was filed, and has hosted discussion 
from all stakeholders.

The Kubernetes Big Data SIG is a separate community that overlaps with the 
Spark community. Our meetings are open to the public, and we publish recordings 
and meeting minutes. Although we discuss topics related to Spark on Kubernetes, 
we do not make Spark development decisions in that community. All of the work 
that members of the K8s Big Data SIG have contributed to Spark respects Apache 
governance and has been done using established Spark processes: SPIP, 
discussion on the dev list, JIRA, and the PR workflow.

 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests coming in for customizing the driver and executor pods, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes-specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as a central 
> place for all customization needs for the driver and executor pods. 






[jira] [Resolved] (SPARK-25286) Remove dangerous parmap

2018-08-31 Thread Xiao Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-25286.
-
   Resolution: Fixed
 Assignee: Maxim Gekk
Fix Version/s: 2.4.0

> Remove dangerous parmap
> ---
>
> Key: SPARK-25286
> URL: https://issues.apache.org/jira/browse/SPARK-25286
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 2.4.0
>
>
> One of the parmap methods accepts an execution context created outside of parmap. 
> If parmap is called recursively on a size-limited thread pool, this can lead to 
> deadlocks; see the JIRA tickets SPARK-25240 and SPARK-25283. To prevent such 
> problems in the future, we need to remove the parmap() overload with the 
> signature:
> {code:scala}
> def parmap[I, O, Col[X] <: TraversableLike[X, Col[X]]]
>   (in: Col[I])
>   (f: I => O)
>   (implicit
> cbf: CanBuildFrom[Col[I], Future[O], Col[Future[O]]], // For in.map
> cbf2: CanBuildFrom[Col[Future[O]], O, Col[O]], // for Future.sequence
> ec: ExecutionContext
>   ): Col[O]
> {code}
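The hazard described above can be sketched outside Spark. The following illustration is an editorial addition, not Spark code: it uses Python's ThreadPoolExecutor in place of a size-limited ExecutionContext. The outer task occupies the pool's only worker while waiting on a nested task submitted to the same pool, so the nested task can never be scheduled; a timeout stands in for the real, indefinite deadlock.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# A pool with a single worker stands in for a size-limited
# ExecutionContext shared across nested parmap-style calls.
pool = ThreadPoolExecutor(max_workers=1)

def inner():
    return 42

def outer():
    # The outer task holds the only worker while it waits on the inner
    # task, which can never be scheduled on the same exhausted pool.
    # Without the timeout, this wait would never return.
    return pool.submit(inner).result(timeout=1)

try:
    outcome = pool.submit(outer).result(timeout=5)
except FutureTimeout:
    outcome = "deadlock"

print(outcome)  # "deadlock": the nested task starved for a worker
pool.shutdown(wait=False)
```

Scala's parmap hit the same pattern whenever recursive calls shared one bounded ExecutionContext, which is why the overload taking an externally supplied execution context is being removed.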






[jira] [Commented] (SPARK-25279) Throw exception: zzcclp java.io.NotSerializableException: org.apache.spark.sql.TypedColumn in Spark-shell when run example of doc

2018-08-31 Thread Dilip Biswal (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599048#comment-16599048
 ] 

Dilip Biswal commented on SPARK-25279:
--

Hello,

Tried against the latest trunk. Seems to work fine.
{code:java}
scala> import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.expressions.Aggregator

scala> import org.apache.spark.sql.Encoder
import org.apache.spark.sql.Encoder

scala> import org.apache.spark.sql.Encoders
import org.apache.spark.sql.Encoders

scala> import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SparkSession

scala> case class Employee(name: String, salary: Long)
defined class Employee

scala> case class Average(var sum: Long, var count: Long)
defined class Average

scala> object MyAverage extends Aggregator[Employee, Average, Double] {
     |   // A zero value for this aggregation. Should satisfy the property that any b + zero = b
     |   def zero: Average = Average(0L, 0L)
     |   // Combine two values to produce a new value. For performance, the function may modify `buffer`
     |   // and return it instead of constructing a new object
     |   def reduce(buffer: Average, employee: Employee): Average = {
     |     buffer.sum += employee.salary
     |     buffer.count += 1
     |     buffer
     |   }
     |   // Merge two intermediate values
     |   def merge(b1: Average, b2: Average): Average = {
     |     b1.sum += b2.sum
     |     b1.count += b2.count
     |     b1
     |   }
     |   // Transform the output of the reduction
     |   def finish(reduction: Average): Double = reduction.sum.toDouble / reduction.count
     |   // Specifies the Encoder for the intermediate value type
     |   def bufferEncoder: Encoder[Average] = Encoders.product
     |   // Specifies the Encoder for the final output value type
     |   def outputEncoder: Encoder[Double] = Encoders.scalaDouble
     | }
defined object MyAverage

scala> val ds = spark.read.json("examples/src/main/resources/employees.json").as[Employee]
ds: org.apache.spark.sql.Dataset[Employee] = [name: string, salary: bigint]

scala> ds.show()
+-------+------+
|   name|salary|
+-------+------+
|Michael|  3000|
|   Andy|  4500|
| Justin|  3500|
|  Berta|  4000|
+-------+------+

scala> // Convert the function to a `TypedColumn` and give it a name

scala> val averageSalary = MyAverage.toColumn.name("average_salary")
averageSalary: org.apache.spark.sql.TypedColumn[Employee,Double] = myaverage() AS `average_salary`

scala> val result = ds.select(averageSalary)
result: org.apache.spark.sql.Dataset[Double] = [average_salary: double]

scala> result.show()
+--------------+
|average_salary|
+--------------+
|        3750.0|
+--------------+
{code}

> Throw exception: zzcclp   java.io.NotSerializableException: 
> org.apache.spark.sql.TypedColumn in Spark-shell when run example of doc
> ---
>
> Key: SPARK-25279
> URL: https://issues.apache.org/jira/browse/SPARK-25279
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, SQL
>Affects Versions: 2.2.1
>Reporter: Zhichao  Zhang
>Priority: Minor
>
> Hi dev: 
>   I am using spark-shell to run the example from 
> [http://spark.apache.org/docs/2.2.2/sql-programming-guide.html#type-safe-user-defined-aggregate-functions],
>  
> and there is an error: 
> {code:java}
> Caused by: java.io.NotSerializableException: 
> org.apache.spark.sql.TypedColumn 
> Serialization stack: 
>         - object not serializable (class: org.apache.spark.sql.TypedColumn, 
> value: 
> myaverage() AS `average_salary`) 
>         - field (class: $iw, name: averageSalary, type: class 
> org.apache.spark.sql.TypedColumn) 
>         - object (class $iw, $iw@4b2f8ae9) 
>         - field (class: MyAverage$, name: $outer, type: class $iw) 
>         - object (class MyAverage$, MyAverage$@2be41d90) 
>         - field (class: 
> org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression, 
> name: aggregator, type: class org.apache.spark.sql.expressions.Aggregator) 
>         - object (class 
> org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression, 
> MyAverage(Employee)) 
>         - field (class: 
> org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression, 
> name: aggregateFunction, type: class 
> org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction) 
>         - object (class 
> org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression, 
> partial_myaverage(MyAverage$@2be41d90, Some(newInstance(class Employee)), 
> Some(class Employee), Some(StructType(StructField(name,StringType,true), 

[jira] [Commented] (SPARK-25294) Add integration test for Kerberos

2018-08-31 Thread Marcelo Vanzin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598975#comment-16598975
 ] 

Marcelo Vanzin commented on SPARK-25294:


It might be hard to cover all cases since Kerberos is, well, complicated. But 
there's a lot of ground we can cover by using Hadoop's {{MiniKdc}} in our 
tests. This has been on my "things to take a look at sometime" list for a few 
years...

> Add integration test for Kerberos 
> --
>
> Key: SPARK-25294
> URL: https://issues.apache.org/jira/browse/SPARK-25294
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> Changes may cause Kerberos issues in components such as {{Yarn}}, {{Hive}}, and 
> {{HDFS}}; we should add tests.
> https://issues.apache.org/jira/browse/SPARK-23789
> https://github.com/apache/spark/pull/21987#issuecomment-417560077






[jira] [Resolved] (SPARK-25296) Create ExplainSuite

2018-08-31 Thread Xiao Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-25296.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

> Create ExplainSuite
> ---
>
> Key: SPARK-25296
> URL: https://issues.apache.org/jira/browse/SPARK-25296
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Major
> Fix For: 2.4.0
>
>
> Move the output verification of Explain to a new suite ExplainSuite. 






[jira] [Commented] (SPARK-24561) User-defined window functions with pandas udf (bounded window)

2018-08-31 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598885#comment-16598885
 ] 

Apache Spark commented on SPARK-24561:
--

User 'icexelloss' has created a pull request for this issue:
https://github.com/apache/spark/pull/22305

> User-defined window functions with pandas udf (bounded window)
> --
>
> Key: SPARK-24561
> URL: https://issues.apache.org/jira/browse/SPARK-24561
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 2.3.1
>Reporter: Li Jin
>Priority: Major
>







[jira] [Assigned] (SPARK-24561) User-defined window functions with pandas udf (bounded window)

2018-08-31 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24561:


Assignee: Apache Spark

> User-defined window functions with pandas udf (bounded window)
> --
>
> Key: SPARK-24561
> URL: https://issues.apache.org/jira/browse/SPARK-24561
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 2.3.1
>Reporter: Li Jin
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-24561) User-defined window functions with pandas udf (bounded window)

2018-08-31 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24561:


Assignee: (was: Apache Spark)

> User-defined window functions with pandas udf (bounded window)
> --
>
> Key: SPARK-24561
> URL: https://issues.apache.org/jira/browse/SPARK-24561
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 2.3.1
>Reporter: Li Jin
>Priority: Major
>







[jira] [Commented] (SPARK-25294) Add integration test for Kerberos

2018-08-31 Thread Sean Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598743#comment-16598743
 ] 

Sean Owen commented on SPARK-25294:
---

Sure, another test is good. What test? I don't have any knowledge of this area.

Regarding the Hadoop 2.7.7 issue, ideally that's worked around, as I don't 
expect Hadoop's behavior to change back.

> Add integration test for Kerberos 
> --
>
> Key: SPARK-25294
> URL: https://issues.apache.org/jira/browse/SPARK-25294
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> Changes may cause Kerberos issues in components such as {{Yarn}}, {{Hive}}, and 
> {{HDFS}}; we should add tests.
> https://issues.apache.org/jira/browse/SPARK-23789
> https://github.com/apache/spark/pull/21987#issuecomment-417560077






[jira] [Created] (SPARK-25298) spark-tools build failure for Scala 2.12

2018-08-31 Thread Darcy Shen (JIRA)
Darcy Shen created SPARK-25298:
--

 Summary: spark-tools build failure for Scala 2.12
 Key: SPARK-25298
 URL: https://issues.apache.org/jira/browse/SPARK-25298
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Affects Versions: 2.4.0
Reporter: Darcy Shen


$ sbt
> ++ 2.12.6
> compile

[error] 
/Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:22:
 object runtime is not a member of package reflect
[error] import scala.reflect.runtime.\{universe => unv}
[error]  ^
[error] 
/Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:23:
 object runtime is not a member of package reflect
[error] import scala.reflect.runtime.universe.runtimeMirror
[error]  ^
[error] 
/Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:41:
 not found: value runtimeMirror
[error]   private val mirror = runtimeMirror(classLoader)
[error]    ^
[error] 
/Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:43:
 not found: value unv
[error]   private def isPackagePrivate(sym: unv.Symbol) =






[jira] [Assigned] (SPARK-25297) Future for Scala 2.12 will block on an already shutdown ExecutionContext

2018-08-31 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25297:


Assignee: (was: Apache Spark)

> Future for Scala 2.12 will block on an already shutdown ExecutionContext
> ---
>
> Key: SPARK-25297
> URL: https://issues.apache.org/jira/browse/SPARK-25297
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Darcy Shen
>Priority: Major
>
> *+see 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12/193/]+*
> *The unit tests block on FileBasedWriteAheadLogWithFileCloseAfterWriteSuite 
> in the console output.*
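As a point of contrast, here is an illustrative, non-Spark sketch: Python's executor rejects work submitted after shutdown instead of blocking, which is the fail-fast behavior one would want from a Future handed an already shut down execution context.

```python
from concurrent.futures import ThreadPoolExecutor

ex = ThreadPoolExecutor(max_workers=2)
assert ex.submit(lambda: 1 + 1).result() == 2  # normal use before shutdown

ex.shutdown(wait=True)
try:
    # Submitting against a shut-down executor fails immediately with
    # RuntimeError rather than producing a Future that never completes.
    ex.submit(lambda: 2 + 2)
    outcome = "accepted"
except RuntimeError:
    outcome = "rejected"

print(outcome)  # "rejected": submission after shutdown fails fast
```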






[jira] [Assigned] (SPARK-25297) Future for Scala 2.12 will block on an already shutdown ExecutionContext

2018-08-31 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25297:


Assignee: Apache Spark

> Future for Scala 2.12 will block on an already shutdown ExecutionContext
> ---
>
> Key: SPARK-25297
> URL: https://issues.apache.org/jira/browse/SPARK-25297
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Darcy Shen
>Assignee: Apache Spark
>Priority: Major
>
> *+see 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12/193/]+*
> *The unit tests block on FileBasedWriteAheadLogWithFileCloseAfterWriteSuite 
> in the console output.*






[jira] [Commented] (SPARK-25297) Future for Scala 2.12 will block on an already shutdown ExecutionContext

2018-08-31 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598668#comment-16598668
 ] 

Apache Spark commented on SPARK-25297:
--

User 'sadhen' has created a pull request for this issue:
https://github.com/apache/spark/pull/22304

> Future for Scala 2.12 will block on an already shutdown ExecutionContext
> ---
>
> Key: SPARK-25297
> URL: https://issues.apache.org/jira/browse/SPARK-25297
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Darcy Shen
>Priority: Major
>
> *+see 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12/193/]+*
> *The unit tests block on FileBasedWriteAheadLogWithFileCloseAfterWriteSuite 
> in the console output.*






[jira] [Updated] (SPARK-25297) Future for Scala 2.12 will block on an already shutdown ExecutionContext

2018-08-31 Thread Darcy Shen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Darcy Shen updated SPARK-25297:
---
Description: 
*+see 
[https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12/193/]+*

*The unit tests block on FileBasedWriteAheadLogWithFileCloseAfterWriteSuite in 
the console output.*

> Future for Scala 2.12 will block on an already shutdown ExecutionContext
> ---
>
> Key: SPARK-25297
> URL: https://issues.apache.org/jira/browse/SPARK-25297
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Darcy Shen
>Priority: Major
>
> *+see 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12/193/]+*
> *The unit tests block on FileBasedWriteAheadLogWithFileCloseAfterWriteSuite 
> in the console output.*






[jira] [Created] (SPARK-25297) Future for Scala 2.12 will block on an already shutdown ExecutionContext

2018-08-31 Thread Darcy Shen (JIRA)
Darcy Shen created SPARK-25297:
--

 Summary: Future for Scala 2.12 will block on an already shutdown 
ExecutionContext
 Key: SPARK-25297
 URL: https://issues.apache.org/jira/browse/SPARK-25297
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 2.4.0
Reporter: Darcy Shen









[jira] [Resolved] (SPARK-25207) Case-insensitive field resolution for filter pushdown when reading Parquet

2018-08-31 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-25207.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 22197
[https://github.com/apache/spark/pull/22197]

> Case-insensitive field resolution for filter pushdown when reading Parquet
> -
>
> Key: SPARK-25207
> URL: https://issues.apache.org/jira/browse/SPARK-25207
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: yucai
>Assignee: yucai
>Priority: Major
>  Labels: Parquet
> Fix For: 2.4.0
>
> Attachments: image.png
>
>
> Currently, filter pushdown will not work if the Parquet schema and the Hive 
> metastore schema are in different letter cases, even when 
> spark.sql.caseSensitive is false. Consider the case below:
> {code:java}
> spark.range(10).write.parquet("/tmp/data")
> sql("DROP TABLE t")
> sql("CREATE TABLE t (ID LONG) USING parquet LOCATION '/tmp/data'")
> sql("select * from t where id > 0").show{code}
> -No filter will be pushed down.-
> {code}
> scala> sql("select * from t where id > 0").explain   // Filters are pushed 
> with `ID`
> == Physical Plan ==
> *(1) Project [ID#90L]
> +- *(1) Filter (isnotnull(id#90L) && (id#90L > 0))
>+- *(1) FileScan parquet default.t[ID#90L] Batched: true, Format: Parquet, 
> Location: InMemoryFileIndex[file:/tmp/data], PartitionFilters: [], 
> PushedFilters: [IsNotNull(ID), GreaterThan(ID,0)], ReadSchema: 
> struct
> scala> sql("select * from t").show    // Parquet returns NULL for `ID` because it has `id`.
> +----+
> |  ID|
> +----+
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> +----+
> scala> sql("select * from t where id > 0").show   // `NULL > 0` is `false`.
> +---+
> | ID|
> +---+
> +---+
> {code}
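A sketch of the resolution logic implied by the fix. This is an editorial illustration with hypothetical names, not Spark's actual code: match the filter's field name against the Parquet schema case-insensitively, and decline to push the filter down when the match is missing or ambiguous.

```python
# Illustrative only: resolve a filter's field name against a list of
# Parquet field names, honoring a case-sensitivity flag.
def resolve_field(filter_field, parquet_fields, case_sensitive=False):
    if case_sensitive:
        return filter_field if filter_field in parquet_fields else None
    matches = [f for f in parquet_fields if f.lower() == filter_field.lower()]
    # An ambiguous schema (e.g. both `id` and `ID`) makes pushdown unsafe,
    # so no match is returned and the filter is evaluated after the scan.
    return matches[0] if len(matches) == 1 else None

print(resolve_field("ID", ["id", "name"]))               # "id"
print(resolve_field("ID", ["id", "ID"]))                 # None (ambiguous)
print(resolve_field("ID", ["id"], case_sensitive=True))  # None
```

With this kind of resolution, the pushed filter references the Parquet file's own field name (`id`) rather than the metastore's (`ID`), so Parquet no longer returns all-NULL rows.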






[jira] [Assigned] (SPARK-25207) Case-insensitive field resolution for filter pushdown when reading Parquet

2018-08-31 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-25207:
---

Assignee: yucai

> Case-insensitive field resolution for filter pushdown when reading Parquet
> -
>
> Key: SPARK-25207
> URL: https://issues.apache.org/jira/browse/SPARK-25207
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: yucai
>Assignee: yucai
>Priority: Major
>  Labels: Parquet
> Attachments: image.png
>
>
> Currently, filter pushdown will not work if the Parquet schema and the Hive 
> metastore schema are in different letter cases, even when 
> spark.sql.caseSensitive is false. Consider the case below:
> {code:java}
> spark.range(10).write.parquet("/tmp/data")
> sql("DROP TABLE t")
> sql("CREATE TABLE t (ID LONG) USING parquet LOCATION '/tmp/data'")
> sql("select * from t where id > 0").show{code}
> -No filter will be pushed down.-
> {code}
> scala> sql("select * from t where id > 0").explain   // Filters are pushed 
> with `ID`
> == Physical Plan ==
> *(1) Project [ID#90L]
> +- *(1) Filter (isnotnull(id#90L) && (id#90L > 0))
>+- *(1) FileScan parquet default.t[ID#90L] Batched: true, Format: Parquet, 
> Location: InMemoryFileIndex[file:/tmp/data], PartitionFilters: [], 
> PushedFilters: [IsNotNull(ID), GreaterThan(ID,0)], ReadSchema: 
> struct
> scala> sql("select * from t").show    // Parquet returns NULL for `ID` because it has `id`.
> +----+
> |  ID|
> +----+
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> |null|
> +----+
> scala> sql("select * from t where id > 0").show   // `NULL > 0` is `false`.
> +---+
> | ID|
> +---+
> +---+
> {code}






[jira] [Resolved] (SPARK-25284) Spark UI: make sure skipped stages are updated onJobEnd

2018-08-31 Thread Juliusz Sompolski (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Juliusz Sompolski resolved SPARK-25284.
---
Resolution: Duplicate

> Spark UI: make sure skipped stages are updated onJobEnd
> ---
>
> Key: SPARK-25284
> URL: https://issues.apache.org/jira/browse/SPARK-25284
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> Tiny bug: onJobEnd does not force an update of skipped stages in the KVStore.






[jira] [Commented] (SPARK-25284) Spark UI: make sure skipped stages are updated onJobEnd

2018-08-31 Thread Juliusz Sompolski (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598592#comment-16598592
 ] 

Juliusz Sompolski commented on SPARK-25284:
---

Contained by SPARK-24415

> Spark UI: make sure skipped stages are updated onJobEnd
> ---
>
> Key: SPARK-25284
> URL: https://issues.apache.org/jira/browse/SPARK-25284
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> Tiny bug: onJobEnd does not force an update of skipped stages in the KVStore.






[jira] [Assigned] (SPARK-25289) ChiSqSelector max on empty collection

2018-08-31 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25289:


Assignee: (was: Apache Spark)

> ChiSqSelector max on empty collection
> -
>
> Key: SPARK-25289
> URL: https://issues.apache.org/jira/browse/SPARK-25289
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 2.3.1
>Reporter: Marie Beaulieu
>Priority: Major
>
> In org.apache.spark.mllib.feature.ChiSqSelector.fit, there is a max taken on 
> a possibly empty collection.
> I am using Spark 2.3.1.
> Here is an example to reproduce.
> {code:java}
> import org.apache.spark.mllib.feature.ChiSqSelector
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.mllib.regression.LabeledPoint
> import org.apache.spark.sql.SQLContext
> val sqlContext = new SQLContext(sc)
> implicit val spark = sqlContext.sparkSession
> val labeledPoints = (0 to 1).map(n => {
>   val v = Vectors.dense((1 to 3).map(_ => n * 1.0).toArray)
>   LabeledPoint(n.toDouble, v)
> })
> val rdd = sc.parallelize(labeledPoints)
> val selector = new ChiSqSelector().setSelectorType("fdr").setFdr(0.05)
> selector.fit(rdd){code}
> Here is the stack trace:
> {code:java}
> java.lang.UnsupportedOperationException: empty.max
> at scala.collection.TraversableOnce$class.max(TraversableOnce.scala:229)
> at scala.collection.mutable.ArrayOps$ofInt.max(ArrayOps.scala:234)
> at org.apache.spark.mllib.feature.ChiSqSelector.fit(ChiSqSelector.scala:280)
> {code}
> Looking at line 280 in ChiSqSelector, it's pretty obvious how the collection 
> can be empty. A simple non-empty validation should do the trick.
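The failing pattern and the suggested guard can be sketched in a few lines. This is an illustrative editorial addition, not Spark's code; note that Python's max([]) raises ValueError where Scala's empty.max raises UnsupportedOperationException, but the guard is the same.

```python
def max_selected(indices):
    # Unguarded, like the current ChiSqSelector.fit: max over a possibly
    # empty collection of selected feature indices throws.
    return max(indices)

def max_selected_safe(indices):
    # Guarded: represent "no feature passed the FDR check" explicitly
    # instead of letting max blow up on an empty collection.
    return max(indices) if indices else None

print(max_selected_safe([]))         # None instead of an exception
print(max_selected_safe([0, 2, 1]))  # 2
```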






[jira] [Commented] (SPARK-25289) ChiSqSelector max on empty collection

2018-08-31 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598588#comment-16598588
 ] 

Apache Spark commented on SPARK-25289:
--

User 'mgaido91' has created a pull request for this issue:
https://github.com/apache/spark/pull/22303

> ChiSqSelector max on empty collection
> -
>
> Key: SPARK-25289
> URL: https://issues.apache.org/jira/browse/SPARK-25289
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 2.3.1
>Reporter: Marie Beaulieu
>Priority: Major
>
> In org.apache.spark.mllib.feature.ChiSqSelector.fit, there is a max taken on 
> a possibly empty collection.
> I am using Spark 2.3.1.
> Here is an example to reproduce.
> {code:java}
> import org.apache.spark.mllib.feature.ChiSqSelector
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.mllib.regression.LabeledPoint
> import org.apache.spark.sql.SQLContext
> val sqlContext = new SQLContext(sc)
> implicit val spark = sqlContext.sparkSession
> val labeledPoints = (0 to 1).map(n => {
>   val v = Vectors.dense((1 to 3).map(_ => n * 1.0).toArray)
>   LabeledPoint(n.toDouble, v)
> })
> val rdd = sc.parallelize(labeledPoints)
> val selector = new ChiSqSelector().setSelectorType("fdr").setFdr(0.05)
> selector.fit(rdd){code}
> Here is the stack trace:
> {code:java}
> java.lang.UnsupportedOperationException: empty.max
> at scala.collection.TraversableOnce$class.max(TraversableOnce.scala:229)
> at scala.collection.mutable.ArrayOps$ofInt.max(ArrayOps.scala:234)
> at org.apache.spark.mllib.feature.ChiSqSelector.fit(ChiSqSelector.scala:280)
> {code}
> Looking at line 280 in ChiSqSelector, it's pretty obvious how the collection 
> can be empty. A simple non-empty validation should do the trick.






[jira] [Assigned] (SPARK-25289) ChiSqSelector max on empty collection

2018-08-31 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25289:


Assignee: Apache Spark

> ChiSqSelector max on empty collection
> -
>
> Key: SPARK-25289
> URL: https://issues.apache.org/jira/browse/SPARK-25289
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 2.3.1
>Reporter: Marie Beaulieu
>Assignee: Apache Spark
>Priority: Major
>
> In org.apache.spark.mllib.feature.ChiSqSelector.fit, there is a max taken on 
> a possibly empty collection.
> I am using Spark 2.3.1.
> Here is an example to reproduce.
> {code:java}
> import org.apache.spark.mllib.feature.ChiSqSelector
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.mllib.regression.LabeledPoint
> import org.apache.spark.sql.SQLContext
> val sqlContext = new SQLContext(sc)
> implicit val spark = sqlContext.sparkSession
> val labeledPoints = (0 to 1).map(n => {
>   val v = Vectors.dense((1 to 3).map(_ => n * 1.0).toArray)
>   LabeledPoint(n.toDouble, v)
> })
> val rdd = sc.parallelize(labeledPoints)
> val selector = new ChiSqSelector().setSelectorType("fdr").setFdr(0.05)
> selector.fit(rdd){code}
> Here is the stack trace:
> {code:java}
> java.lang.UnsupportedOperationException: empty.max
> at scala.collection.TraversableOnce$class.max(TraversableOnce.scala:229)
> at scala.collection.mutable.ArrayOps$ofInt.max(ArrayOps.scala:234)
> at org.apache.spark.mllib.feature.ChiSqSelector.fit(ChiSqSelector.scala:280)
> {code}
> Looking at line 280 in ChiSqSelector, it's pretty obvious how the collection 
> can be empty. A simple non-empty validation should do the trick.






[jira] [Commented] (SPARK-25293) Dataframe write to csv saves part files in outputDirectory/task-xx/part-xxx instead of directly saving in outputDir

2018-08-31 Thread omkar puttagunta (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598586#comment-16598586
 ] 

omkar puttagunta commented on SPARK-25293:
--

Apologies for setting this to critical! Thanks for the response. I believe this 
is a bug.
Why would the dataframe write output be saved in a _temporary directory? It 
should be saved directly under the specified output directory. I have also 
provided sample code to replicate the issue.



> Dataframe write to csv saves part files in outputDirectory/task-xx/part-xxx 
> instead of directly saving in outputDir
> --
>
> Key: SPARK-25293
> URL: https://issues.apache.org/jira/browse/SPARK-25293
> Project: Spark
>  Issue Type: Bug
>  Components: EC2, Java API, Spark Shell, Spark Submit
>Affects Versions: 2.0.2
>Reporter: omkar puttagunta
>Priority: Major
>
> [https://stackoverflow.com/questions/52108335/why-spark-dataframe-writes-part-files-to-temporary-in-instead-directly-creating]
> {quote}Running Spark 2.0.2 in Standalone Cluster Mode; 2 workers and 1 master 
> node on AWS EC2
> {quote}
> Simple test: reading a pipe-delimited file and writing the data to CSV. The 
> commands below are executed in spark-shell with the master URL set.
> {{val df = 
> spark.sqlContext.read.option("delimiter","|").option("quote","\u").csv("/home/input-files/")
>  val emailDf=df.filter("_c3='EML'") 
> emailDf.repartition(100).write.csv("/opt/outputFile/")}}
> After executing the commands above:
> {quote}In {{worker1}} -> each part file is created in 
> {{/opt/outputFile/_temporary/task-x-xxx/part-xxx-xxx}}
>  In {{worker2}} -> {{/opt/outputFile/part-xxx}} => part files are generated 
> directly under outputDirectory specified during write.
> {quote}
> *The same thing happens with coalesce(100), or without specifying 
> repartition/coalesce at all. Tried with Java as well.*
> *_Question_*
> 1) Why doesn't the {{/opt/outputFile/}} output directory on {{worker1}} have 
> {{part-}} files just like on {{worker2}}? Why is a {{_temporary}} directory 
> created, with {{part-xxx-xx}} files residing in the {{task-xxx}} directories?






[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Onur Satici (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598534#comment-16598534
 ] 

Onur Satici commented on SPARK-24434:
-

[~rvesse] [~skonto] - I agree, we should discuss a more structured way of 
reflecting the decisions of the weekly sync on Apache resources.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Commented] (SPARK-21786) The 'spark.sql.parquet.compression.codec' configuration doesn't take effect on tables with partition field(s)

2018-08-31 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-21786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598514#comment-16598514
 ] 

Apache Spark commented on SPARK-21786:
--

User 'fjh100456' has created a pull request for this issue:
https://github.com/apache/spark/pull/22302

> The 'spark.sql.parquet.compression.codec' configuration doesn't take effect 
> on tables with partition field(s)
> -
>
> Key: SPARK-21786
> URL: https://issues.apache.org/jira/browse/SPARK-21786
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Jinhua Fu
>Assignee: Jinhua Fu
>Priority: Major
> Fix For: 2.3.0
>
>
> Since Hive 1.1, Hive has allowed users to set the Parquet compression codec 
> via the table-level property parquet.compression. See the JIRA: 
> https://issues.apache.org/jira/browse/HIVE-7858 . We already support 
> orc.compression for ORC, so for external users it is more straightforward to 
> support both. See the Stack Overflow question: 
> https://stackoverflow.com/questions/36941122/spark-sql-ignores-parquet-compression-propertie-specified-in-tblproperties
> On the Spark side, our table-level compression conf, compression, was added 
> by #11464 in Spark 2.0.
> We need to support both table-level confs. Users might also use the 
> session-level conf spark.sql.parquet.compression.codec. The priority rule is: 
> if another compression codec configuration is found through Hive or Parquet, 
> the precedence is compression, parquet.compression, 
> spark.sql.parquet.compression.codec. Acceptable values include: none, 
> uncompressed, snappy, gzip, lzo.
> After this change, the rule for Parquet is consistent with that for ORC.
> Changes:
> 1. Also acquire 'compressionCodecClassName' from parquet.compression; the 
> precedence order is compression, parquet.compression, 
> spark.sql.parquet.compression.codec, just like what we do in OrcOptions.
> 2. Change spark.sql.parquet.compression.codec to support "none". In 
> ParquetOptions we already treat "none" as equivalent to "uncompressed", but 
> it could not be configured to "none".
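The precedence rule described above can be sketched as a small resolver. This is a hypothetical helper, not Spark's actual implementation (which lives in ParquetOptions); the function name and the "snappy" fallback default are assumptions for illustration.

```python
def resolve_parquet_codec(table_props, session_conf):
    """Pick the effective Parquet codec per the stated precedence:
    compression, then parquet.compression, then the session-level
    spark.sql.parquet.compression.codec.

    Hypothetical sketch only; the "snappy" fallback is an assumption.
    """
    # Table-level properties win, in the documented order.
    for key in ("compression", "parquet.compression"):
        if key in table_props:
            return table_props[key].lower()
    # Otherwise fall back to the session-level conf.
    return session_conf.get("spark.sql.parquet.compression.codec",
                            "snappy").lower()


print(resolve_parquet_codec({"parquet.compression": "gzip"},
                            {"spark.sql.parquet.compression.codec": "lzo"}))  # gzip
```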






[jira] [Commented] (SPARK-25206) wrong records are returned when Hive metastore schema and parquet schema are in different letter cases

2018-08-31 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598467#comment-16598467
 ] 

Dongjoon Hyun commented on SPARK-25206:
---

Thank you all for the decision.

> wrong records are returned when Hive metastore schema and parquet schema are 
> in different letter cases
> --
>
> Key: SPARK-25206
> URL: https://issues.apache.org/jira/browse/SPARK-25206
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.2, 2.3.1
>Reporter: yucai
>Priority: Blocker
>  Labels: Parquet, correctness
> Attachments: image-2018-08-24-18-05-23-485.png, 
> image-2018-08-24-22-33-03-231.png, image-2018-08-24-22-34-11-539.png, 
> image-2018-08-24-22-46-05-346.png, image-2018-08-25-09-54-53-219.png, 
> image-2018-08-25-10-04-21-901.png, pr22183.png
>
>
> In current Spark 2.3.1, below query returns wrong data silently.
> {code:java}
> spark.range(10).write.parquet("/tmp/data")
> sql("DROP TABLE t")
> sql("CREATE TABLE t (ID LONG) USING parquet LOCATION '/tmp/data'")
> scala> sql("select * from t where id > 0").show
> +---+
> | ID|
> +---+
> +---+
> {code}
>  
> *Root Cause*
> After deep dive, it has two issues, both are related to different letter 
> cases between Hive metastore schema and parquet schema.
> 1. The wrong column is pushed down.
> Spark pushes FilterApi.gt(intColumn("{color:#ff}ID{color}"), 0: 
> Integer) down to Parquet, but {color:#ff}ID{color} does not exist in 
> /tmp/data (Parquet is case sensitive; it actually has 
> {color:#ff}id{color}).
> So no records are returned.
> Since SPARK-24716, Spark uses the Parquet schema instead of the Hive 
> metastore schema to do the pushdown, which resolves this issue.
> 2. Spark SQL returns NULL for a column whose Hive metastore schema and 
> Parquet schema are in different letter cases, even with 
> spark.sql.caseSensitive set to false.
> SPARK-25132 addressed this issue already.
>  
> The biggest difference is that in Spark 2.1 the user gets an exception for 
> the same query:
> {code:java}
> Caused by: java.lang.IllegalArgumentException: Column [ID] was not found in 
> schema!{code}
> So they will know about the issue and fix the query.
> But in Spark 2.3, the user silently gets wrong results.
>  
> To make the above query work, we need both SPARK-25132 and -SPARK-24716.-
>  
> [~yumwang] , [~cloud_fan], [~smilegator], any thoughts? Should we backport it?
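The case-matching problem above can be sketched with a small resolver. This is a hypothetical helper for illustration, not Spark's actual schema-resolution code: with case-insensitive matching, the metastore column "ID" resolves to the physical Parquet column "id" instead of matching nothing.

```python
def resolve_field(metastore_name, parquet_fields, case_sensitive=False):
    """Match a metastore column name against the physical Parquet schema.

    Hypothetical sketch of the case-insensitive resolution discussed in
    the issue. With case_sensitive=True, "ID" finds no match in a schema
    that only has "id"; ambiguous case-insensitive matches yield None.
    """
    if case_sensitive:
        return metastore_name if metastore_name in parquet_fields else None
    matches = [f for f in parquet_fields
               if f.lower() == metastore_name.lower()]
    return matches[0] if len(matches) == 1 else None


print(resolve_field("ID", ["id", "name"]))                       # id
print(resolve_field("ID", ["id", "name"], case_sensitive=True))  # None
```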






[jira] [Commented] (SPARK-19809) NullPointerException on zero-size ORC file

2018-08-31 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-19809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598465#comment-16598465
 ] 

Dongjoon Hyun commented on SPARK-19809:
---

Hi, [~shirisht]. You need to turn on `convertMetastoreOrc`

{code}
scala> sql("set spark.sql.hive.convertMetastoreOrc=true")
res4: org.apache.spark.sql.DataFrame = [key: string, value: string]

scala> sql("select * from empty_orc").show
+---+
|  a|
+---+
+---+
{code}

> NullPointerException on zero-size ORC file
> --
>
> Key: SPARK-19809
> URL: https://issues.apache.org/jira/browse/SPARK-19809
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.3, 2.0.2, 2.1.1, 2.2.1
>Reporter: Michał Dawid
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 2.3.0
>
> Attachments: image-2018-02-26-20-29-49-410.png, 
> spark.sql.hive.convertMetastoreOrc.txt
>
>
> When reading from hive ORC table if there are some 0 byte files we get 
> NullPointerException:
> {code}java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$BISplitStrategy.getSplits(OrcInputFormat.java:560)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1010)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
>   at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
>   at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
>   at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:190)
>   at 
> org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:165)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174)
>   at 
> org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
>   at 
> org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
>   at 
> org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2086)
>   at 
> org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1498)
>   at 
> org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1505)
>   at 
> org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1375)
>   at ...
> {code}

[jira] [Commented] (SPARK-18818) Window...orderBy() should accept an 'ascending' parameter just like DataFrame.orderBy()

2018-08-31 Thread Anna Molchanova (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598438#comment-16598438
 ] 

Anna Molchanova commented on SPARK-18818:
-

Hello, I'll pick this up.

> Window...orderBy() should accept an 'ascending' parameter just like 
> DataFrame.orderBy()
> ---
>
> Key: SPARK-18818
> URL: https://issues.apache.org/jira/browse/SPARK-18818
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Reporter: Nicholas Chammas
>Priority: Minor
>
> It seems inconsistent that {{Window...orderBy()}} does not accept an 
> {{ascending}} parameter, when {{DataFrame.orderBy()}} does.
> It's also slightly inconvenient since to specify a descending sort order you 
> have to build a column object, whereas with the {{ascending}} parameter you 
> don't.
> For example:
> {code}
> from pyspark.sql.functions import row_number
> df.select(
> row_number()
> .over(
> Window
> .partitionBy(...)
> .orderBy('timestamp', ascending=False)))
> {code}
> vs.
> {code}
> from pyspark.sql.functions import row_number, col
> df.select(
> row_number()
> .over(
> Window
> .partitionBy(...)
> .orderBy(col('timestamp').desc(
> {code}
> It would be better if {{Window...orderBy()}} supported an {{ascending}} 
> parameter just like {{DataFrame.orderBy()}}.
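The proposed behavior can be sketched with a helper that normalizes an ascending argument the way DataFrame.orderBy accepts it. This is a hypothetical illustration of the requested semantics, not PySpark's actual implementation.

```python
def normalize_ordering(cols, ascending=True):
    """Normalize an `ascending` argument: a single bool applies to every
    column, while a list must supply one direction per column.

    Hypothetical helper illustrating how Window...orderBy could accept
    the same `ascending` parameter as DataFrame.orderBy.
    """
    if isinstance(ascending, bool):
        ascending = [ascending] * len(cols)
    if len(ascending) != len(cols):
        raise ValueError("length of ascending must match number of columns")
    return [(c, "asc" if a else "desc") for c, a in zip(cols, ascending)]


print(normalize_ordering(["timestamp"], ascending=False))  # [('timestamp', 'desc')]
```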






[jira] [Commented] (SPARK-21786) The 'spark.sql.parquet.compression.codec' configuration doesn't take effect on tables with partition field(s)

2018-08-31 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-21786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598432#comment-16598432
 ] 

Apache Spark commented on SPARK-21786:
--

User 'fjh100456' has created a pull request for this issue:
https://github.com/apache/spark/pull/22301

> The 'spark.sql.parquet.compression.codec' configuration doesn't take effect 
> on tables with partition field(s)
> -
>
> Key: SPARK-21786
> URL: https://issues.apache.org/jira/browse/SPARK-21786
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Jinhua Fu
>Assignee: Jinhua Fu
>Priority: Major
> Fix For: 2.3.0
>
>
> Since Hive 1.1, Hive has allowed users to set the Parquet compression codec 
> via the table-level property parquet.compression. See the JIRA: 
> https://issues.apache.org/jira/browse/HIVE-7858 . We already support 
> orc.compression for ORC, so for external users it is more straightforward to 
> support both. See the Stack Overflow question: 
> https://stackoverflow.com/questions/36941122/spark-sql-ignores-parquet-compression-propertie-specified-in-tblproperties
> On the Spark side, our table-level compression conf, compression, was added 
> by #11464 in Spark 2.0.
> We need to support both table-level confs. Users might also use the 
> session-level conf spark.sql.parquet.compression.codec. The priority rule is: 
> if another compression codec configuration is found through Hive or Parquet, 
> the precedence is compression, parquet.compression, 
> spark.sql.parquet.compression.codec. Acceptable values include: none, 
> uncompressed, snappy, gzip, lzo.
> After this change, the rule for Parquet is consistent with that for ORC.
> Changes:
> 1. Also acquire 'compressionCodecClassName' from parquet.compression; the 
> precedence order is compression, parquet.compression, 
> spark.sql.parquet.compression.codec, just like what we do in OrcOptions.
> 2. Change spark.sql.parquet.compression.codec to support "none". In 
> ParquetOptions we already treat "none" as equivalent to "uncompressed", but 
> it could not be configured to "none".






[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598406#comment-16598406
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 8:20 AM:
--

[~onursatici] I was never informed about the decision or how urgent it was; 
otherwise I could have responded, but I had no chance.


was (Author: skonto):
[~onursatici] I was never informed about the decision and how urgent was that, 
otherwise I could respond to that, had no chance.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598406#comment-16598406
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 8:19 AM:
--

[~onursatici] I was never informed about the decision and how urgent was that, 
otherwise I could respond to that, had no chance.


was (Author: skonto):
[~onursatici] I was never informed about the decision. Anyway.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598406#comment-16598406
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 8:19 AM:
--

[~onursatici] I was never informed about the decision. Anyway.


was (Author: skonto):
[~onursatici] I was never informed about the decision.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598406#comment-16598406
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 8:18 AM:
--

[~onursatici] I was never informed about the decision.


was (Author: skonto):
[~onursatici] I was never informed about the decision. Also I notified people 
on the slack channel that im working on it (8th of August):

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

I never said im not actively working on it.  I just said that it will be 
delayed since im off. Anyway.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598406#comment-16598406
 ] 

Stavros Kontopoulos commented on SPARK-24434:
-

[~onursatici] I was never informed about the decision. I also notified people 
in the Slack channel that I'm working on it (8th of August):

Stavros [3:27 PM]
@liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

I never said I'm not actively working on it; I just said it would be delayed 
since I'm off. Anyway.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598406#comment-16598406
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 8/31/18 8:17 AM:
--

[~onursatici] I was never informed about the decision. Also I notified people 
on the slack channel that im working on it (8th of August):

Stavros [3:27 PM]
 @liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

I never said im not actively working on it.  I just said that it will be 
delayed since im off. Anyway.


was (Author: skonto):
[~onursatici] I was never informed about the decision. Also I notified people 
in the slack channel that im working on it (8th of August):

Stavros [3:27 PM]
@liyinan926 @eje I am working on the pod template PR but i will be off for a 
couple of weeks, work more on that after.

I never said im not actively working on it.  I just said that it will be 
delayed since im off. Anyway.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Rob Vesse (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598402#comment-16598402
 ] 

Rob Vesse commented on SPARK-24434:
---

{quote}
I think the miscommunication here was because of the discrepancy between this 
Jira and k8s-sig-big-data weekly meeting notes
{quote}

As an Apache member this comment raises red flags for me.  All Spark 
development discussions should either be happening on Apache resources (JIRA, 
mailing lists, GitHub repos) or being captured and posted to Apache resources.  
If people are having to follow external resources, particularly live meetings 
which naturally exclude portions of the community due to timezone/availability 
constraints, to participate in an Apache community then that community is not 
operating as a proper Apache community.  

This doesn't mean that such discussions and meetings can't happen but they 
should be summarised back on Apache resources so the wider community has the 
opportunity to participate.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Onur Satici (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598393#comment-16598393
 ] 

Onur Satici edited comment on SPARK-24434 at 8/31/18 8:00 AM:
--

Hello [~felixcheung], sorry, I think the miscommunication here was caused by 
the discrepancy between this Jira and the k8s-sig-big-data weekly meeting 
notes. On [15 
Aug|https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit#heading=h.d1p209nfiamv]
 it was discussed that, as [~skonto] was out and not actively working on this 
PR at that moment, [~yifeih] and I could take over and start working on it. We 
should have reflected that decision in this Jira after the meeting to make our 
intentions clear.

We needed this change urgently, as we had a couple of PRs adding new Spark 
configuration options to customize Spark pods that were all blocked by this.

 


was (Author: onursatici):
Hello [~felixcheung], sorry I think the miscommunication here was because of 
the discrepancy between this Jira and k8s-sig-big-data weekly meeting notes. On 
[link 15 
Aug|https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit#heading=h.d1p209nfiamv]
 it was discussed that as [~skonto] was out, and was not actively working on 
this PR at that moment, [~yifeih] and I can take over and start working on 
this. I think we should have reflected that decision in this Jira after the 
meeting to clear our intentions.

We needed this change urgently as we had a couple of PR's adding new spark 
configuration options to customize Spark pods, and they were all blocked by 
this.

 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 
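
The idea behind a pod template is that Spark applies only the settings it must control on top of a user-supplied template, instead of exposing one configuration option per pod field. A minimal sketch of that overlay using plain Python dicts (the field names and the merge policy here are illustrative; an actual implementation would operate on Kubernetes pod objects):

```python
def merge_pod_spec(template, spark_managed):
    """Overlay Spark-managed fields on a user-supplied pod template.

    The user template supplies arbitrary customizations; Spark-managed
    fields take precedence where the two overlap. Nested dicts are merged
    recursively; any other value is overwritten.
    """
    merged = dict(template)
    for key, value in spark_managed.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_pod_spec(merged[key], value)
        else:
            merged[key] = value
    return merged

# User-provided customization that would otherwise each need its own
# Spark configuration option.
user_template = {
    "metadata": {"labels": {"team": "data-eng"}},
    "spec": {"nodeSelector": {"disktype": "ssd"}, "containers": []},
}
# Fields Spark itself must control (hypothetical values).
spark_managed = {
    "metadata": {"labels": {"spark-role": "driver"}},
    "spec": {"containers": [{"name": "spark-kubernetes-driver"}]},
}
merged = merge_pod_spec(user_template, spark_managed)
```

This keeps the user's declarative customizations (labels, node selectors, ...) while guaranteeing Spark's own fields win on conflict.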






[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Onur Satici (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598393#comment-16598393
 ] 

Onur Satici commented on SPARK-24434:
-

Hello [~felixcheung], sorry, I think the miscommunication here was caused by 
the discrepancy between this Jira and the k8s-sig-big-data weekly meeting notes. 
On [15 
Aug|https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA/edit#heading=h.d1p209nfiamv]
 it was discussed that, as [~skonto] was out and not actively working on 
this PR at that moment, [~yifeih] and I could take over and start working on 
this. We should have reflected that decision in this Jira after the 
meeting to make our intentions clear.

We needed this change urgently, as we had a couple of PRs adding new Spark 
configuration options to customize Spark pods, and they were all blocked by 
this.

 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Commented] (SPARK-25294) Add integration test for Kerberos

2018-08-31 Thread Steven Rand (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598366#comment-16598366
 ] 

Steven Rand commented on SPARK-25294:
-

+1 – another example of how easy it is to break the Kerberos integration 
without noticing: https://issues.apache.org/jira/browse/SPARK-22319

> Add integration test for Kerberos 
> --
>
> Key: SPARK-25294
> URL: https://issues.apache.org/jira/browse/SPARK-25294
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> Changes in areas such as {{Yarn}}, {{Hive}}, and {{HDFS}} may cause Kerberos 
> issues; we should add tests.
> https://issues.apache.org/jira/browse/SPARK-23789
> https://github.com/apache/spark/pull/21987#issuecomment-417560077






[jira] [Commented] (SPARK-25296) Create ExplainSuite

2018-08-31 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598343#comment-16598343
 ] 

Apache Spark commented on SPARK-25296:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/22300

> Create ExplainSuite
> ---
>
> Key: SPARK-25296
> URL: https://issues.apache.org/jira/browse/SPARK-25296
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Major
>
> Move the output verification of Explain to a new suite ExplainSuite. 






[jira] [Assigned] (SPARK-25296) Create ExplainSuite

2018-08-31 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25296:


Assignee: Xiao Li  (was: Apache Spark)

> Create ExplainSuite
> ---
>
> Key: SPARK-25296
> URL: https://issues.apache.org/jira/browse/SPARK-25296
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Major
>
> Move the output verification of Explain to a new suite ExplainSuite. 






[jira] [Assigned] (SPARK-25296) Create ExplainSuite

2018-08-31 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25296:


Assignee: Apache Spark  (was: Xiao Li)

> Create ExplainSuite
> ---
>
> Key: SPARK-25296
> URL: https://issues.apache.org/jira/browse/SPARK-25296
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>Priority: Major
>
> Move the output verification of Explain to a new suite ExplainSuite. 






[jira] [Created] (SPARK-25296) Create ExplainSuite

2018-08-31 Thread Xiao Li (JIRA)
Xiao Li created SPARK-25296:
---

 Summary: Create ExplainSuite
 Key: SPARK-25296
 URL: https://issues.apache.org/jira/browse/SPARK-25296
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 2.4.0
Reporter: Xiao Li
Assignee: Xiao Li


Move the output verification of Explain to a new suite ExplainSuite. 






[jira] [Assigned] (SPARK-25183) Spark HiveServer2 registers shutdown hook with JVM, not ShutdownHookManager; race conditions can arise

2018-08-31 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-25183:
---

Assignee: Steve Loughran

> Spark HiveServer2 registers shutdown hook with JVM, not ShutdownHookManager; 
> race conditions can arise
> --
>
> Key: SPARK-25183
> URL: https://issues.apache.org/jira/browse/SPARK-25183
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 2.4.0
>
>
> Spark's HiveServer2 registers a shutdown hook with the JVM via 
> {{Runtime.addShutdownHook()}}, which can run in parallel with the 
> ShutdownHookManager sequence of Spark & Hadoop, which runs its shutdown 
> hooks in an ordered sequence.
> This has some risks:
> * FS shutdown before the rename of logs completes, SPARK-6933
> * Delays of renames on object stores may block the FS close operation which, 
> on clusters with shutdown-hook timeouts (HADOOP-12950) on 
> FileSystem.closeAll(), can force a kill of that shutdown hook, among other 
> problems.
> General outcome: logs aren't present.
> Proposed fix:
> * register the hook with {{org.apache.spark.util.ShutdownHookManager}}
> * HADOOP-15679 to make the shutdown wait time configurable, so O(data) 
> renames don't trigger timeouts.
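
The core difference between the two registration paths is that a shutdown-hook manager runs its hooks in a defined priority order, while hooks registered directly with the JVM run concurrently with no ordering guarantee. A toy Python model of the ordered variant (class and hook names are illustrative, not Spark's actual values):

```python
class OrderedShutdownHooks:
    """Minimal sketch of priority-ordered shutdown hooks, in the spirit of
    org.apache.spark.util.ShutdownHookManager."""

    def __init__(self):
        self._hooks = []

    def add_shutdown_hook(self, priority, hook):
        self._hooks.append((priority, hook))

    def run_all(self):
        # Higher-priority hooks run first, so e.g. log renames can finish
        # before the filesystem layer is closed underneath them.
        for _, hook in sorted(self._hooks, key=lambda h: -h[0]):
            hook()

events = []
mgr = OrderedShutdownHooks()
mgr.add_shutdown_hook(50, lambda: events.append("close filesystems"))
mgr.add_shutdown_hook(100, lambda: events.append("rename event logs"))
mgr.run_all()
# events == ["rename event logs", "close filesystems"]
```

With a raw {{Runtime.addShutdownHook()}} registration there is no such ordering, which is exactly the race described above.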






[jira] [Resolved] (SPARK-25183) Spark HiveServer2 registers shutdown hook with JVM, not ShutdownHookManager; race conditions can arise

2018-08-31 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-25183.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 22186
[https://github.com/apache/spark/pull/22186]

> Spark HiveServer2 registers shutdown hook with JVM, not ShutdownHookManager; 
> race conditions can arise
> --
>
> Key: SPARK-25183
> URL: https://issues.apache.org/jira/browse/SPARK-25183
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 2.4.0
>
>
> Spark's HiveServer2 registers a shutdown hook with the JVM via 
> {{Runtime.addShutdownHook()}}, which can run in parallel with the 
> ShutdownHookManager sequence of Spark & Hadoop, which runs its shutdown 
> hooks in an ordered sequence.
> This has some risks:
> * FS shutdown before the rename of logs completes, SPARK-6933
> * Delays of renames on object stores may block the FS close operation which, 
> on clusters with shutdown-hook timeouts (HADOOP-12950) on 
> FileSystem.closeAll(), can force a kill of that shutdown hook, among other 
> problems.
> General outcome: logs aren't present.
> Proposed fix:
> * register the hook with {{org.apache.spark.util.ShutdownHookManager}}
> * HADOOP-15679 to make the shutdown wait time configurable, so O(data) 
> renames don't trigger timeouts.






[jira] [Resolved] (SPARK-25288) Kafka transaction tests are flaky

2018-08-31 Thread Shixiong Zhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-25288.
--
   Resolution: Fixed
Fix Version/s: 2.4.0

> Kafka transaction tests are flaky
> -
>
> Key: SPARK-25288
> URL: https://issues.apache.org/jira/browse/SPARK-25288
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.4.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Major
> Fix For: 2.4.0
>
>
> http://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.sql.kafka010.KafkaRelationSuite_name=read+Kafka+transactional+messages%3A+read_committed
> http://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.sql.kafka010.KafkaMicroBatchV1SourceSuite_name=read+Kafka+transactional+messages%3A+read_committed
> http://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.sql.kafka010.KafkaMicroBatchV2SourceSuite_name=read+Kafka+transactional+messages%3A+read_committed






[jira] [Assigned] (SPARK-25275) require membership in wheel to run 'su' (in dockerfiles)

2018-08-31 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung reassigned SPARK-25275:


Assignee: Erik Erlandson

> require membership in wheel to run 'su' (in dockerfiles)
> ---
>
> Key: SPARK-25275
> URL: https://issues.apache.org/jira/browse/SPARK-25275
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Erik Erlandson
>Assignee: Erik Erlandson
>Priority: Major
>  Labels: docker, kubernetes
> Fix For: 2.4.0
>
>
> For improved security, configure the images so that users must be in the 
> wheel group in order to run su.
> See example:
> [https://github.com/openshift-evangelists/terminal-base-image/blob/master/image/Dockerfile#L53]






[jira] [Assigned] (SPARK-24433) Add Spark R support

2018-08-31 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung reassigned SPARK-24433:


Assignee: Ilan Filonenko

> Add Spark R support
> ---
>
> Key: SPARK-24433
> URL: https://issues.apache.org/jira/browse/SPARK-24433
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Assignee: Ilan Filonenko
>Priority: Major
> Fix For: 2.4.0
>
>
> This is the ticket to track work on adding support for the R binding in 
> Kubernetes mode. The feature is available in our fork at 
> github.com/apache-spark-on-k8s/spark and needs to be upstreamed.






[jira] [Updated] (SPARK-25295) Pod name conflicts in client mode, if the previous submission was not a clean shutdown.

2018-08-31 Thread Prashant Sharma (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-25295:

Description: 
If the previous job was killed somehow, e.g. by the client disconnecting, it 
leaves behind the executor pods named spark-exec-#, which cause naming 
conflicts and failures for the next job submission.

io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST 
at: https://:6443/api/v1/namespaces/default/pods. Message: pods 
"spark-exec-4" already exists. Received status: Status(apiVersion=v1, code=409, 
details=StatusDetails(causes=[], group=null, kind=pods, name=spark-exec-4, 
retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, 
message=pods "spark-exec-4" already exists, 
metadata=ListMeta(resourceVersion=null, selfLink=null, 
additionalProperties={}), reason=AlreadyExists, status=Failure, 
additionalProperties={}).


  was:
If the previous job was killed somehow, by disconnecting the client. It leaves 
behind the executor pods named spark-exec-#, which cause naming conflicts and 
failures for the next job submission.

io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST 
at: https://9.30.110.150:6443/api/v1/namespaces/default/pods. Message: pods 
"spark-exec-4" already exists. Received status: Status(apiVersion=v1, code=409, 
details=StatusDetails(causes=[], group=null, kind=pods, name=spark-exec-4, 
retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, 
message=pods "spark-exec-4" already exists, 
metadata=ListMeta(resourceVersion=null, selfLink=null, 
additionalProperties={}), reason=AlreadyExists, status=Failure, 
additionalProperties={}).



> Pod name conflicts in client mode, if the previous submission was not a clean 
> shutdown.
> 
>
> Key: SPARK-25295
> URL: https://issues.apache.org/jira/browse/SPARK-25295
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Prashant Sharma
>Priority: Major
>
> If the previous job was killed somehow, e.g. by the client disconnecting, it 
> leaves behind the executor pods named spark-exec-#, which cause naming 
> conflicts and failures for the next job submission.
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: 
> POST at: https://:6443/api/v1/namespaces/default/pods. Message: pods 
> "spark-exec-4" already exists. Received status: Status(apiVersion=v1, 
> code=409, details=StatusDetails(causes=[], group=null, kind=pods, 
> name=spark-exec-4, retryAfterSeconds=null, uid=null, 
> additionalProperties={}), kind=Status, message=pods "spark-exec-4" already 
> exists, metadata=ListMeta(resourceVersion=null, selfLink=null, 
> additionalProperties={}), reason=AlreadyExists, status=Failure, 
> additionalProperties={}).
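
One common way to avoid this class of collision is to make executor pod names unique per submission instead of reusing a bare counter like spark-exec-4. A hypothetical sketch (this is not Spark's actual naming scheme):

```python
import uuid

def executor_pod_name(app_prefix, executor_id):
    """Build an executor pod name from a per-application prefix.

    Including a random per-submission suffix in the prefix means leftovers
    of a crashed previous submission cannot collide with the next run's
    pods, even when the executor counter restarts from the same value.
    """
    return f"{app_prefix}-exec-{executor_id}"

# A fresh suffix is generated for each submission.
app_a = f"spark-{uuid.uuid4().hex[:8]}"   # e.g. the crashed previous run
app_b = f"spark-{uuid.uuid4().hex[:8]}"   # the next submission
name_a = executor_pod_name(app_a, 4)
name_b = executor_pod_name(app_b, 4)
```

Even with the same executor id (4), the two runs now produce distinct pod names, so the POST of the new pod cannot hit AlreadyExists against a leftover.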






[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598300#comment-16598300
 ] 

Felix Cheung commented on SPARK-24434:
--

So [~onursatici], is there a reason you opened a PR even though it was clearly 
stated in this Jira that Stavros was working on this?

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Created] (SPARK-25295) Pod name conflicts in client mode, if the previous submission was not a clean shutdown.

2018-08-31 Thread Prashant Sharma (JIRA)
Prashant Sharma created SPARK-25295:
---

 Summary: Pod name conflicts in client mode, if the previous 
submission was not a clean shutdown.
 Key: SPARK-25295
 URL: https://issues.apache.org/jira/browse/SPARK-25295
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 2.4.0
Reporter: Prashant Sharma


If the previous job was killed somehow, e.g. by the client disconnecting, it 
leaves behind the executor pods named spark-exec-#, which cause naming 
conflicts and failures for the next job submission.

io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST 
at: https://9.30.110.150:6443/api/v1/namespaces/default/pods. Message: pods 
"spark-exec-4" already exists. Received status: Status(apiVersion=v1, code=409, 
details=StatusDetails(causes=[], group=null, kind=pods, name=spark-exec-4, 
retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, 
message=pods "spark-exec-4" already exists, 
metadata=ListMeta(resourceVersion=null, selfLink=null, 
additionalProperties={}), reason=AlreadyExists, status=Failure, 
additionalProperties={}).







[jira] [Commented] (SPARK-24748) Support for reporting custom metrics via Streaming Query Progress

2018-08-31 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598298#comment-16598298
 ] 

Apache Spark commented on SPARK-24748:
--

User 'arunmahadevan' has created a pull request for this issue:
https://github.com/apache/spark/pull/22299

> Support for reporting custom metrics via Streaming Query Progress
> -
>
> Key: SPARK-24748
> URL: https://issues.apache.org/jira/browse/SPARK-24748
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.3.1
>Reporter: Arun Mahadevan
>Assignee: Arun Mahadevan
>Priority: Major
> Fix For: 2.4.0
>
>
> Currently the Structured Streaming sources and sinks do not have a way to 
> report custom metrics. Providing an option to report custom metrics and 
> making them available via Streaming Query progress can enable sources and 
> sinks to report custom progress information (e.g. the lag metrics for the 
> Kafka source).






[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-08-31 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598297#comment-16598297
 ] 

Felix Cheung commented on SPARK-24434:
--

[~skonto] - hi, what's happening? Henry is right: in Spark we don't generally 
assign an issue/Jira to a particular user until the issue is resolved/closed.

Also, changing the assignee in ASF Jira requires particular permissions.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 






[jira] [Comment Edited] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-08-31 Thread Evelyn Bayes (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598275#comment-16598275
 ] 

Evelyn Bayes edited comment on SPARK-25150 at 8/31/18 6:06 AM:
---

I'd love the chance to patch this bug.

I've included a simplified version of the Python script which reproduces it; if 
you switch the second join out for the commented join, it works as it should.

What's happening is that during the creation of the logical plan, Spark 
re-aliases the right side of the join because the left and right refer to the 
same base column. When it does this, it renames all the columns on the right 
side of the join to the new alias, but not the column which is actually part of 
the join.

Then, because the join condition refers to the column which hasn't been 
updated, it now refers to the left side of the join. So Spark does a cartesian 
join on itself and straps the right side of the join on the end.

The part of the code doing the renaming is:
 
[https://github.com/apache/spark/blob/v2.3.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala]
 It's using ResolveReferences.dedupRight, which, as the name says, just 
de-duplicates the right-side references from the left side (this might be a 
naive understanding of it).

If you just alias one of these columns it's fine, but that really shouldn't be 
required for the logical plan to be accurate.
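
A toy model of the bookkeeping may make this clearer: attributes are distinguished by unique ids (loosely like Catalyst's exprId), a dedupRight-style pass gives the right branch fresh ids, and a join condition that still holds the old ids resolves to the left branch on both ends. This is a deliberate simplification of Catalyst, not its actual code:

```python
import itertools

_ids = itertools.count()

def attr(name):
    # An attribute reference as a (name, unique id) pair, loosely like
    # Catalyst's AttributeReference with its exprId.
    return (name, next(_ids))

# Both join branches start from the same base attribute.
state = attr("state")
left_output = {state}

# dedupRight-style fix-up: the right branch gets a fresh id for the
# same-named column.
remapped = {state: attr("state")}
right_output = {remapped[state]}

# A join condition built before the remap references the old id on BOTH
# ends, so both ends resolve against the left branch -- the condition is
# trivially true and the join degenerates toward a cartesian product.
stale_condition = (state, state)
both_ends_left = all(a in left_output for a in stale_condition)

# The fix is to rewrite the condition with the remapped right-side id, so
# it genuinely spans the two branches.
fixed_condition = (state, remapped[state])
spans_both = (fixed_condition[0] in left_output
              and fixed_condition[1] in right_output)
```

Manually aliasing a column has the same effect as the last step, which is why the workaround helps even though it shouldn't be necessary.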

 

 


was (Author: eeveeb):
I'd love the chance to bug patch this.

I've included a simplified version of the python script which produces it, if 
you switch out the second join to the commented join it works as it should. 
!zombie-analysis.py|width=7,height=7,align=absmiddle!

What's happening is it's re-aliasing the right side of the join because the 
left and right refer to the same base column. When it does this it renames all 
the columns in the right side of the join to the new alias but not the column 
which is actually a part of the join.

Then because the join refers to the column which hasn't been updated it now 
refers to the left side of the join. So it does a cartesian join on itself and 
straps on the right side of the join on the end.

The part of the code which is doing the renaming is:
[https://github.com/apache/spark/blob/v2.3.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala]
It's using ResolveReferences.dedupRight which as it says just de duplicates the 
right side references from the left side (this might be a naive understanding 
of it).

Then if you just alias one of these columns it's fine. But that really 
shouldn't be required for the logical plan to be accurate.

 

 

> Joining DataFrames derived from the same source yields confusing/incorrect 
> results
> --
>
> Key: SPARK-25150
> URL: https://issues.apache.org/jira/browse/SPARK-25150
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Nicholas Chammas
>Priority: Major
> Attachments: output-with-implicit-cross-join.txt, 
> output-without-implicit-cross-join.txt, persons.csv, states.csv, 
> zombie-analysis.py
>
>
> I have two DataFrames, A and B. From B, I have derived two additional 
> DataFrames, B1 and B2. When joining A to B1 and B2, I'm getting a very 
> confusing error:
> {code:java}
> Join condition is missing or trivial.
> Either: use the CROSS JOIN syntax to allow cartesian products between these
> relations, or: enable implicit cartesian products by setting the configuration
> variable spark.sql.crossJoin.enabled=true;
> {code}
> Then, when I configure "spark.sql.crossJoin.enabled=true" as instructed, 
> Spark appears to give me incorrect answers.
> I am not sure if I am missing something obvious, or if there is some kind of 
> bug here. The "join condition is missing" error is confusing and doesn't make 
> sense to me, and the seemingly incorrect output is concerning.
> I've attached a reproduction, along with the output I'm seeing with and 
> without the implicit cross join enabled.
> I realize the join I've written is not correct in the sense that it should be 
> a left outer join instead of an inner join (since some of the aggregates are 
> not available for all states), but that doesn't explain Spark's behavior.






[jira] [Commented] (SPARK-25294) Add integration test for Kerberos

2018-08-31 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598290#comment-16598290
 ] 

Yuming Wang commented on SPARK-25294:
-

cc [~srowen] What do you think?

> Add integration test for Kerberos 
> --
>
> Key: SPARK-25294
> URL: https://issues.apache.org/jira/browse/SPARK-25294
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>
> Changes in areas such as {{Yarn}}, {{Hive}}, and {{HDFS}} may cause Kerberos 
> issues; we should add tests.
> https://issues.apache.org/jira/browse/SPARK-23789
> https://github.com/apache/spark/pull/21987#issuecomment-417560077






[jira] [Created] (SPARK-25294) Add integration test for Kerberos

2018-08-31 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-25294:
---

 Summary: Add integration test for Kerberos 
 Key: SPARK-25294
 URL: https://issues.apache.org/jira/browse/SPARK-25294
 Project: Spark
  Issue Type: Test
  Components: Tests
Affects Versions: 2.4.0
Reporter: Yuming Wang


Changes in areas such as {{Yarn}}, {{Hive}}, and {{HDFS}} may cause Kerberos 
issues; we should add tests.

https://issues.apache.org/jira/browse/SPARK-23789
https://github.com/apache/spark/pull/21987#issuecomment-417560077






[jira] [Comment Edited] (SPARK-25150) Joining DataFrames derived from the same source yields confusing/incorrect results

2018-08-31 Thread Evelyn Bayes (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598284#comment-16598284
 ] 

Evelyn Bayes edited comment on SPARK-25150 at 8/31/18 6:00 AM:
---

Sorry, my attachment doesn't want to stick; feel free to ask me to email it, 
or to explain to me how attaching works. Sorry!

 


was (Author: eeveeb):
Sorry my attachment doesn't want to stick,I'll give it another try.

 

[^zombie-analysis.py]

> Joining DataFrames derived from the same source yields confusing/incorrect 
> results
> --
>
> Key: SPARK-25150
> URL: https://issues.apache.org/jira/browse/SPARK-25150
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Nicholas Chammas
>Priority: Major
> Attachments: output-with-implicit-cross-join.txt, 
> output-without-implicit-cross-join.txt, persons.csv, states.csv, 
> zombie-analysis.py
>
>
> I have two DataFrames, A and B. From B, I have derived two additional 
> DataFrames, B1 and B2. When joining A to B1 and B2, I'm getting a very 
> confusing error:
> {code:java}
> Join condition is missing or trivial.
> Either: use the CROSS JOIN syntax to allow cartesian products between these
> relations, or: enable implicit cartesian products by setting the configuration
> variable spark.sql.crossJoin.enabled=true;
> {code}
> Then, when I configure "spark.sql.crossJoin.enabled=true" as instructed, 
> Spark appears to give me incorrect answers.
> I am not sure if I am missing something obvious, or if there is some kind of 
> bug here. The "join condition is missing" error is confusing and doesn't make 
> sense to me, and the seemingly incorrect output is concerning.
> I've attached a reproduction, along with the output I'm seeing with and 
> without the implicit cross join enabled.
> I realize the join I've written is not correct in the sense that it should be 
> a left outer join instead of an inner join (since some of the aggregates are 
> not available for all states), but that doesn't explain Spark's behavior.


