[jira] [Created] (SPARK-34327) Omit inlining passwords during build process.

2021-02-02 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-34327:
---

 Summary: Omit inlining passwords during build process.
 Key: SPARK-34327
 URL: https://issues.apache.org/jira/browse/SPARK-34327
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.0.1, 2.4.7, 3.2.0, 3.1.1
Reporter: Prashant Sharma


The Spark release process uses GitHub URLs with embedded passwords, and the release 
script would inline those passwords into the build info shipped with the release.

These credentials must be stripped before they are inadvertently exposed.
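
For illustration only (this is not the actual release-script change), a minimal Scala 
sketch of redacting the embedded credentials from such a URL before it is written into 
the build info could look like:

{code:scala}
// Sketch: drop the user-info (user:password@) component of a URL before recording it.
// Uses only java.net.URI; the helper name is illustrative.
import java.net.URI

def stripCredentials(url: String): String = {
  val uri = new URI(url)
  if (uri.getUserInfo == null) {
    url
  } else {
    // Rebuild the URI without the user-info component.
    new URI(uri.getScheme, null, uri.getHost, uri.getPort,
      uri.getPath, uri.getQuery, uri.getFragment).toString
  }
}

// stripCredentials("https://user:secret@github.com/apache/spark.git")
//   == "https://github.com/apache/spark.git"
{code}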



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32221) Avoid possible errors due to incorrect file size or type supplied in spark conf.

2021-01-06 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma resolved SPARK-32221.
-
Resolution: Fixed

> Avoid possible errors due to incorrect file size or type supplied in spark 
> conf.
> 
>
> Key: SPARK-32221
> URL: https://issues.apache.org/jira/browse/SPARK-32221
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Assignee: Prashant Sharma
>Priority: Major
>
> This would avoid failures, in case the files are a bit large or a user places 
> a binary file inside the SPARK_CONF_DIR.
> Both of which are not supported at the moment.
> The reason is, underlying etcd store does limit the size of each entry to 
> only 1.5 MiB.
> [https://etcd.io/docs/v3.4.0/dev-guide/limit/] 
> We can apply a straightforward approach of skipping files that cannot be 
> accommodated within 1.5MiB limit (limit is configurable as per above link) 
> and WARNING the user about the same.
> For most use cases, this limit is more than sufficient, however a user may 
> accidentally place a larger file and observe an unpredictable result or 
> failures at run time.
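
For illustration only (not the actual Spark change), a minimal Scala sketch of the 
skip-and-warn behaviour described above, assuming a 1.5 MiB per-file limit, might look 
like:

{code:scala}
// Sketch: keep conf files that fit within the (assumed) entry-size limit and warn
// about the rest. File handling and the limit value are illustrative, not Spark's code.
import java.io.File

val maxEntrySizeBytes: Long = (1.5 * 1024 * 1024).toLong // assumed configurable limit

def selectConfFiles(confDir: File): Seq[File] = {
  val (kept, skipped) = confDir.listFiles().toSeq
    .filter(_.isFile)
    .partition(_.length() <= maxEntrySizeBytes)
  skipped.foreach { f =>
    println(s"WARNING: skipping ${f.getName}: ${f.length()} bytes exceeds $maxEntrySizeBytes bytes")
  }
  kept
}
{code}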



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32221) Avoid possible errors due to incorrect file size or type supplied in spark conf.

2021-01-06 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-32221:

Fix Version/s: 3.1.0

> Avoid possible errors due to incorrect file size or type supplied in spark 
> conf.
> 
>
> Key: SPARK-32221
> URL: https://issues.apache.org/jira/browse/SPARK-32221
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Assignee: Prashant Sharma
>Priority: Major
> Fix For: 3.1.0
>
>
> This would avoid failures, in case the files are a bit large or a user places 
> a binary file inside the SPARK_CONF_DIR.
> Both of which are not supported at the moment.
> The reason is, underlying etcd store does limit the size of each entry to 
> only 1.5 MiB.
> [https://etcd.io/docs/v3.4.0/dev-guide/limit/] 
> We can apply a straightforward approach of skipping files that cannot be 
> accommodated within 1.5MiB limit (limit is configurable as per above link) 
> and WARNING the user about the same.
> For most use cases, this limit is more than sufficient, however a user may 
> accidentally place a larger file and observe an unpredictable result or 
> failures at run time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32221) Avoid possible errors due to incorrect file size or type supplied in spark conf.

2021-01-06 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma reassigned SPARK-32221:
---

Assignee: Prashant Sharma

> Avoid possible errors due to incorrect file size or type supplied in spark 
> conf.
> 
>
> Key: SPARK-32221
> URL: https://issues.apache.org/jira/browse/SPARK-32221
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Assignee: Prashant Sharma
>Priority: Major
>
> This would avoid failures, in case the files are a bit large or a user places 
> a binary file inside the SPARK_CONF_DIR.
> Both of which are not supported at the moment.
> The reason is, underlying etcd store does limit the size of each entry to 
> only 1.5 MiB.
> [https://etcd.io/docs/v3.4.0/dev-guide/limit/] 
> We can apply a straightforward approach of skipping files that cannot be 
> accommodated within 1.5MiB limit (limit is configurable as per above link) 
> and WARNING the user about the same.
> For most use cases, this limit is more than sufficient, however a user may 
> accidentally place a larger file and observe an unpredictable result or 
> failures at run time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32007) Spark Driver Supervise does not work reliably

2020-12-13 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-32007:

Priority: Major  (was: Critical)

> Spark Driver Supervise does not work reliably
> -
>
> Key: SPARK-32007
> URL: https://issues.apache.org/jira/browse/SPARK-32007
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4
> Environment: |Java Version|1.8.0_121 (Oracle Corporation)|
> |Java Home|/usr/java/jdk1.8.0_121/jre|
> |Scala Version|version 2.11.12|
> |OS|Amazon Linux|
>Reporter: Suraj Sharma
>Priority: Major
>
> I have a standalone cluster setup. I DO NOT have a streaming use case. I use 
> AWS EC2 machines to run the Spark master and worker processes.
> *Problem*: If a Spark worker machine running drivers and executors dies, the 
> drivers are not spawned again on other healthy machines.
> *Below are my findings:*
> ||Action/Behaviour||Executor||Driver||
> |Worker Machine Stop|Relaunches on an active machine|NO Relaunch|
> |kill -9 to process|Relaunches on other machines|Relaunches on other machines|
> |kill to process|Relaunches on other machines|Relaunches on other machines|
> *Cluster Setup:*
>  # I have a spark standalone cluster
>  # {{spark.driver.supervise=true}}
>  # Spark Master HA is enabled and is backed by zookeeper
>  # Spark version = 2.4.4
>  # I am using a systemd script for the spark worker process



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33668) Fix flaky test "Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j.properties."

2020-12-04 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-33668:

Description: 
The test is flaky, with multiple failed instances; the reason for the failure has 
been similar to:
{code:java}
  The code passed to eventually never returned normally. Attempted 109 times 
over 3.007988241397 minutes. Last failure message: Failure executing: GET 
at: 
https://192.168.39.167:8443/api/v1/namespaces/b37fc72a991b49baa68a2eaaa1516463/pods/spark-pi-97a9bc76308e7fe3-exec-1/log?pretty=false.
 Message: pods "spark-pi-97a9bc76308e7fe3-exec-1" not found. Received status: 
Status(apiVersion=v1, code=404, details=StatusDetails(causes=[], group=null, 
kind=pods, name=spark-pi-97a9bc76308e7fe3-exec-1, retryAfterSeconds=null, 
uid=null, additionalProperties={}), kind=Status, message=pods 
"spark-pi-97a9bc76308e7fe3-exec-1" not found, metadata=ListMeta(_continue=null, 
remainingItemCount=null, resourceVersion=null, selfLink=null, 
additionalProperties={}), reason=NotFound, status=Failure, 
additionalProperties={}).. (KubernetesSuite.scala:402)
{code}

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36854/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36852/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36850/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36848/console

From the above failures, it seems that the executor finishes too quickly and is 
removed by Spark before the test can complete.

To mitigate this situation, one way is to turn on the flag

{code}
   "spark.kubernetes.executor.deleteOnTermination"
{code}

  was:
The test is flaking, and at more than one instance and the reason for the 
failure is
{code:java}
  The code passed to eventually never returned normally. Attempted 109 times 
over 3.007988241397 minutes. Last failure message: Failure executing: GET 
at: 
https://192.168.39.167:8443/api/v1/namespaces/b37fc72a991b49baa68a2eaaa1516463/pods/spark-pi-97a9bc76308e7fe3-exec-1/log?pretty=false.
 Message: pods "spark-pi-97a9bc76308e7fe3-exec-1" not found. Received status: 
Status(apiVersion=v1, code=404, details=StatusDetails(causes=[], group=null, 
kind=pods, name=spark-pi-97a9bc76308e7fe3-exec-1, retryAfterSeconds=null, 
uid=null, additionalProperties={}), kind=Status, message=pods 
"spark-pi-97a9bc76308e7fe3-exec-1" not found, metadata=ListMeta(_continue=null, 
remainingItemCount=null, resourceVersion=null, selfLink=null, 
additionalProperties={}), reason=NotFound, status=Failure, 
additionalProperties={}).. (KubernetesSuite.scala:402)
{code}

From the above failure, it seems, that executor finishes too quickly and is 
removed by spark before the test can complete. 

So, in order to mitigate this situation, one way is to turn on the flag

{code}
   "spark.kubernetes.executor.deleteOnTermination"
{code}


> Fix flaky test "Verify logging configuration is picked from the provided 
> SPARK_CONF_DIR/log4j.properties."
> --
>
> Key: SPARK-33668
> URL: https://issues.apache.org/jira/browse/SPARK-33668
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> The test is flaking, with multiple flaked instances - the reason for the 
> failure has been similar to:
> {code:java}
>   The code passed to eventually never returned normally. Attempted 109 times 
> over 3.007988241397 minutes. Last failure message: Failure executing: GET 
> at: 
> https://192.168.39.167:8443/api/v1/namespaces/b37fc72a991b49baa68a2eaaa1516463/pods/spark-pi-97a9bc76308e7fe3-exec-1/log?pretty=false.
>  Message: pods "spark-pi-97a9bc76308e7fe3-exec-1" not found. Received status: 
> Status(apiVersion=v1, code=404, details=StatusDetails(causes=[], group=null, 
> kind=pods, name=spark-pi-97a9bc76308e7fe3-exec-1, retryAfterSeconds=null, 
> uid=null, additionalProperties={}), kind=Status, message=pods 
> "spark-pi-97a9bc76308e7fe3-exec-1" not found, 
> metadata=ListMeta(_continue=null, remainingItemCount=null, 
> resourceVersion=null, selfLink=null, additionalProperties={}), 
> reason=NotFound, status=Failure, additionalProperties={}).. 
> (KubernetesSuite.scala:402)
> {code}
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36854/console
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36852/console
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36850/console
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36848/console
> From the above failures, it seems that the executor finishes too quickly and is 
> removed by Spark before the test can complete.

[jira] [Created] (SPARK-33668) Fix flaky test "Verify logging configuration is picked from the provided SPARK_CONF_DIR/log4j.properties."

2020-12-04 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-33668:
---

 Summary: Fix flaky test "Verify logging configuration is picked 
from the provided SPARK_CONF_DIR/log4j.properties."
 Key: SPARK-33668
 URL: https://issues.apache.org/jira/browse/SPARK-33668
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes, Tests
Affects Versions: 3.1.0
Reporter: Prashant Sharma


The test is flaky, at more than one instance, and the reason for the failure is:
{code:java}
  The code passed to eventually never returned normally. Attempted 109 times 
over 3.007988241397 minutes. Last failure message: Failure executing: GET 
at: 
https://192.168.39.167:8443/api/v1/namespaces/b37fc72a991b49baa68a2eaaa1516463/pods/spark-pi-97a9bc76308e7fe3-exec-1/log?pretty=false.
 Message: pods "spark-pi-97a9bc76308e7fe3-exec-1" not found. Received status: 
Status(apiVersion=v1, code=404, details=StatusDetails(causes=[], group=null, 
kind=pods, name=spark-pi-97a9bc76308e7fe3-exec-1, retryAfterSeconds=null, 
uid=null, additionalProperties={}), kind=Status, message=pods 
"spark-pi-97a9bc76308e7fe3-exec-1" not found, metadata=ListMeta(_continue=null, 
remainingItemCount=null, resourceVersion=null, selfLink=null, 
additionalProperties={}), reason=NotFound, status=Failure, 
additionalProperties={}).. (KubernetesSuite.scala:402)
{code}

From the above failure, it seems that the executor finishes too quickly and is 
removed by Spark before the test can complete.

To mitigate this situation, one way is to turn on the flag

{code}
   "spark.kubernetes.executor.deleteOnTermination"
{code}
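
As an illustration of how such a flag could be applied in the integration test's 
configuration (the issue does not spell out the value; keeping executor pods after 
termination, i.e. setting it to false so their logs stay fetchable, is shown purely 
as an assumption):

{code:scala}
// Sketch only: `sparkAppConf` stands for the integration test's Spark app configuration
// object (name assumed). The value is an assumption, not taken from the issue text.
sparkAppConf.set("spark.kubernetes.executor.deleteOnTermination", "false")
{code}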



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33626) Allow k8s integration tests to assert both driver and executor logs for expected log(s)

2020-12-02 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-33626:

Summary: Allow k8s integration tests to assert both driver and executor 
logs for expected log(s)  (was: Allow k8s integration tests to assert both 
driver and executor logs for expected text(s))

> Allow k8s integration tests to assert both driver and executor logs for 
> expected log(s)
> ---
>
> Key: SPARK-33626
> URL: https://issues.apache.org/jira/browse/SPARK-33626
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> Improve the k8s tests, to be able to assert both driver and executor logs for 
> expected 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33626) Allow k8s integration tests to assert both driver and executor logs for expected log(s)

2020-12-02 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-33626:

Description: Improve the k8s tests, to be able to assert both driver and 
executor logs for expected logs  (was: Improve the k8s tests, to be able to 
assert both driver and executor logs for expected )

> Allow k8s integration tests to assert both driver and executor logs for 
> expected log(s)
> ---
>
> Key: SPARK-33626
> URL: https://issues.apache.org/jira/browse/SPARK-33626
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> Improve the k8s tests, to be able to assert both driver and executor logs for 
> expected logs



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33626) Allow k8s integration tests to assert both driver and executor logs for expected text(s)

2020-12-02 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-33626:

Description: Improve the k8s tests, to be able to assert both driver and 
executor logs for expected   (was: Improve the k8s tests, to be able to assert 
both driver and executor logs for must contain and must not contain.)

> Allow k8s integration tests to assert both driver and executor logs for 
> expected text(s)
> 
>
> Key: SPARK-33626
> URL: https://issues.apache.org/jira/browse/SPARK-33626
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> Improve the k8s tests, to be able to assert both driver and executor logs for 
> expected 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33626) Allow k8s integration tests to assert both driver and executor logs for expected text(s)

2020-12-02 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-33626:

Summary: Allow k8s integration tests to assert both driver and executor 
logs for expected text(s)  (was: k8s integration tests should assert driver and 
executor logs for must & must not contain)

> Allow k8s integration tests to assert both driver and executor logs for 
> expected text(s)
> 
>
> Key: SPARK-33626
> URL: https://issues.apache.org/jira/browse/SPARK-33626
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> Improve the k8s tests, to be able to assert both driver and executor logs for 
> must contain and must not contain.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33626) k8s integration tests should assert driver and executor logs for must & must not contain

2020-12-01 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-33626:
---

 Summary: k8s integration tests should assert driver and executor 
logs for must & must not contain
 Key: SPARK-33626
 URL: https://issues.apache.org/jira/browse/SPARK-33626
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.1.0
Reporter: Prashant Sharma


Improve the k8s tests so that both driver and executor logs can be asserted for 
text they must contain and text they must not contain.
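
A rough sketch of what such an assertion helper could look like (the helper name and 
the log-fetching callback are illustrative assumptions, not the actual test framework 
API):

{code:scala}
// Sketch: check a pod's log for strings it must contain and strings it must not contain.
// `getPodLog` stands in for however the suite fetches the driver or executor log.
def assertLog(
    getPodLog: () => String,
    mustContain: Seq[String],
    mustNotContain: Seq[String]): Unit = {
  val log = getPodLog()
  mustContain.foreach { s =>
    assert(log.contains(s), s"Expected log to contain: $s")
  }
  mustNotContain.foreach { s =>
    assert(!log.contains(s), s"Expected log to NOT contain: $s")
  }
}
{code}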



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32223) Support adding a user provided config map.

2020-11-25 Thread Prashant Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239069#comment-17239069
 ] 

Prashant Sharma commented on SPARK-32223:
-

One problem I can think of with the driver pod template approach is that, even 
though it is possible to mount a config map, it is not as straightforward to mount 
it as SPARK_CONF_DIR. One may have to copy spark.properties into place during 
container init.


> Support adding a user provided config map.
> --
>
> Key: SPARK-32223
> URL: https://issues.apache.org/jira/browse/SPARK-32223
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> One of the challenge with this is, spark.properties is not user provided and 
> is calculated based on certain factors. So a user provided config map, cannot 
> be used as is to mount as SPARK_CONF_DIR, so it will have to be somehow 
> augmented with the correct spark.properties.
> Q, Do we support update to config map properties for an already running job?
> Ans: No, since the spark.properties is calculated at the time of job 
> submission, it cannot be updated on the fly and it is not supported by Spark 
> at the moment for all the configuration values.
> Q. What are the usecases, where supplying SPARK_CONF_DIR via a config map 
> helps?
> One of the use case, I can think of is programmatically submitting a `spark 
> on k8s` job - e.g. spark as a service on a cloud deployment may find this 
> feature useful.
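
To make the augmentation step described above concrete, here is a minimal, 
self-contained Scala sketch (not Spark's implementation; names are illustrative) of 
merging a user-provided config map's files with the submission-time spark.properties:

{code:scala}
// Sketch: the user's files keep their names, but spark.properties is always the one
// computed at submission time, so the merged result can be mounted as SPARK_CONF_DIR.
def mergeConfDir(
    userConfFiles: Map[String, String],        // file name -> contents from the user's config map
    computedSparkProperties: String): Map[String, String] = {
  (userConfFiles - "spark.properties") + ("spark.properties" -> computedSparkProperties)
}
{code}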



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32223) Support adding a user provided config map.

2020-11-25 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-32223:

Description: 
One of the challenges with this is that spark.properties is not user provided; it 
is computed based on certain factors. So a user-provided config map cannot be used 
as is to mount as SPARK_CONF_DIR; it will have to be augmented somehow with the 
correct spark.properties.

Q: Do we support updating config map properties for an already running job?
Ans: No. Since spark.properties is computed at the time of job submission, it cannot 
be updated on the fly, and Spark does not currently support that for all the 
configuration values.

Q: What are the use cases where supplying SPARK_CONF_DIR via a config map helps?
One use case I can think of is programmatically submitting a `spark on k8s` job - 
e.g. Spark as a service on a cloud deployment may find this feature useful.



  was:
One of the challenge with this is, spark.properties is not user provided and is 
calculated based on certain factors. So a user provided config map, cannot be 
used as is to mount as SPARK_CONF_DIR, so it will have to be somehow augmented 
with the correct spark.properties.

Q, Do we support update to config map properties for an already running job?
Ans: No, since the spark.properties is calculated at the time of job 
submission, it cannot be updated on the fly and it is not supported by Spark at 
the moment for all the configuration values.

Q. What are the usecases, where supplying SPARK_CONF_DIR via a config map helps?
One of the use case, I can think of is programmatically submitting a spark on 
k8s job - e.g. spark as a service on cloud deployment may find this feature 
useful.




> Support adding a user provided config map.
> --
>
> Key: SPARK-32223
> URL: https://issues.apache.org/jira/browse/SPARK-32223
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> One of the challenge with this is, spark.properties is not user provided and 
> is calculated based on certain factors. So a user provided config map, cannot 
> be used as is to mount as SPARK_CONF_DIR, so it will have to be somehow 
> augmented with the correct spark.properties.
> Q, Do we support update to config map properties for an already running job?
> Ans: No, since the spark.properties is calculated at the time of job 
> submission, it cannot be updated on the fly and it is not supported by Spark 
> at the moment for all the configuration values.
> Q. What are the usecases, where supplying SPARK_CONF_DIR via a config map 
> helps?
> One of the use case, I can think of is programmatically submitting a `spark 
> on k8s` job - e.g. spark as a service on a cloud deployment may find this 
> feature useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32223) Support adding a user provided config map.

2020-11-25 Thread Prashant Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238617#comment-17238617
 ] 

Prashant Sharma commented on SPARK-32223:
-

Hi [~dongjoon],

Do you think this is useful? Any other thoughts on this?

Thanks!

> Support adding a user provided config map.
> --
>
> Key: SPARK-32223
> URL: https://issues.apache.org/jira/browse/SPARK-32223
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> One of the challenge with this is, spark.properties is not user provided and 
> is calculated based on certain factors. So a user provided config map, cannot 
> be used as is to mount as SPARK_CONF_DIR, so it will have to be somehow 
> augmented with the correct spark.properties.
> Q, Do we support update to config map properties for an already running job?
> Ans: No, since the spark.properties is calculated at the time of job 
> submission, it cannot be updated on the fly and it is not supported by Spark 
> at the moment for all the configuration values.
> Q. What are the usecases, where supplying SPARK_CONF_DIR via a config map 
> helps?
> One of the use case, I can think of is programmatically submitting a spark on 
> k8s job - e.g. spark as a service on cloud deployment may find this feature 
> useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32223) Support adding a user provided config map.

2020-11-25 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-32223:

Description: 
One of the challenge with this is, spark.properties is not user provided and is 
calculated based on certain factors. So a user provided config map, cannot be 
used as is to mount as SPARK_CONF_DIR, so it will have to be somehow augmented 
with the correct spark.properties.

Q, Do we support update to config map properties for an already running job?
Ans: No, since the spark.properties is calculated at the time of job 
submission, it cannot be updated on the fly and it is not supported by Spark at 
the moment for all the configuration values.

Q. What are the usecases, where supplying SPARK_CONF_DIR via a config map helps?
One of the use case, I can think of is programmatically submitting a spark on 
k8s job - e.g. spark as a service on cloud deployment may find this feature 
useful.



  was:The semantics of this will be discussed and added soon.


> Support adding a user provided config map.
> --
>
> Key: SPARK-32223
> URL: https://issues.apache.org/jira/browse/SPARK-32223
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> One of the challenge with this is, spark.properties is not user provided and 
> is calculated based on certain factors. So a user provided config map, cannot 
> be used as is to mount as SPARK_CONF_DIR, so it will have to be somehow 
> augmented with the correct spark.properties.
> Q, Do we support update to config map properties for an already running job?
> Ans: No, since the spark.properties is calculated at the time of job 
> submission, it cannot be updated on the fly and it is not supported by Spark 
> at the moment for all the configuration values.
> Q. What are the usecases, where supplying SPARK_CONF_DIR via a config map 
> helps?
> One of the use case, I can think of is programmatically submitting a spark on 
> k8s job - e.g. spark as a service on cloud deployment may find this feature 
> useful.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32221) Avoid possible errors due to incorrect file size or type supplied in spark conf.

2020-11-23 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-32221:

Description: 
This would avoid failures in case the files are somewhat large, or a user places a 
binary file inside SPARK_CONF_DIR; neither is supported at the moment.

The reason is that the underlying etcd store limits the size of each entry to 
1.5 MiB.

[https://etcd.io/docs/v3.4.0/dev-guide/limit/]

We can take the straightforward approach of skipping files that cannot be 
accommodated within the 1.5 MiB limit (the limit is configurable, as per the link 
above) and WARN the user about them.

For most use cases this limit is more than sufficient; however, a user may 
accidentally place a larger file and observe unpredictable results or failures at 
run time.


  was:
This would avoid failures, in case the files are a bit large or a user places a 
binary file inside the SPARK_CONF_DIR.

Both of which are not supported at the moment.

The reason is, underlying etcd store does limit the size of each entry to only 
1 MiB( Recent versions of K8s have moved to using 3.4.x of etcd which allows 
for 1.5MiB limit). Once etcd is upgraded in all the popular k8s clusters, then 
we can hope to overcome this limitation. e.g. 
[https://etcd.io/docs/v3.4.0/dev-guide/limit/] version of etcd allows for 
higher limit on each entry.

Even if that does not happen, there are other ways to overcome this limitation, 
for example, we can have config files split across multiple configMaps. We need 
to discuss, and prioritise, this issue takes the straightforward approach of 
skipping files that cannot be accommodated within 1.5MiB limit and WARNING the 
user about the same.


> Avoid possible errors due to incorrect file size or type supplied in spark 
> conf.
> 
>
> Key: SPARK-32221
> URL: https://issues.apache.org/jira/browse/SPARK-32221
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> This would avoid failures, in case the files are a bit large or a user places 
> a binary file inside the SPARK_CONF_DIR.
> Both of which are not supported at the moment.
> The reason is, underlying etcd store does limit the size of each entry to 
> only 1.5 MiB.
> [https://etcd.io/docs/v3.4.0/dev-guide/limit/] 
> We can apply a straightforward approach of skipping files that cannot be 
> accommodated within 1.5MiB limit (limit is configurable as per above link) 
> and WARNING the user about the same.
> For most use cases, this limit is more than sufficient, however a user may 
> accidentally place a larger file and observe an unpredictable result or 
> failures at run time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32221) Avoid possible errors due to incorrect file size or type supplied in spark conf.

2020-11-17 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-32221:

Description: 
This would avoid failures, in case the files are a bit large or a user places a 
binary file inside the SPARK_CONF_DIR.

Both of which are not supported at the moment.

The reason is, underlying etcd store does limit the size of each entry to only 
1 MiB( Recent versions of K8s have moved to using 3.4.x of etcd which allows 
for 1.5MiB limit). Once etcd is upgraded in all the popular k8s clusters, then 
we can hope to overcome this limitation. e.g. 
[https://etcd.io/docs/v3.4.0/dev-guide/limit/] version of etcd allows for 
higher limit on each entry.

Even if that does not happen, there are other ways to overcome this limitation, 
for example, we can have config files split across multiple configMaps. We need 
to discuss, and prioritise, this issue takes the straightforward approach of 
skipping files that cannot be accommodated within 1.5MiB limit and WARNING the 
user about the same.

  was:
This would avoid failures, in case the files are a bit large or a user places a 
binary file inside the SPARK_CONF_DIR.

Both of which are not supported at the moment.

The reason is, underlying etcd store does limit the size of each entry to only 
1 MiB. Once etcd is upgraded in all the popular k8s clusters, then we can hope 
to overcome this limitation. e.g. 
[https://etcd.io/docs/v3.4.0/dev-guide/limit/] version of etcd allows for 
higher limit on each entry.

Even if that does not happen, there are other ways to overcome this limitation, 
for example, we can have config files split across multiple configMaps. We need 
to discuss, and prioritise, this issue takes the straightforward approach of 
skipping files that cannot be accommodated within 1MiB limit and WARNING the 
user about the same.


> Avoid possible errors due to incorrect file size or type supplied in spark 
> conf.
> 
>
> Key: SPARK-32221
> URL: https://issues.apache.org/jira/browse/SPARK-32221
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> This would avoid failures, in case the files are a bit large or a user places 
> a binary file inside the SPARK_CONF_DIR.
> Both of which are not supported at the moment.
> The reason is, underlying etcd store does limit the size of each entry to 
> only 1 MiB( Recent versions of K8s have moved to using 3.4.x of etcd which 
> allows for 1.5MiB limit). Once etcd is upgraded in all the popular k8s 
> clusters, then we can hope to overcome this limitation. e.g. 
> [https://etcd.io/docs/v3.4.0/dev-guide/limit/] version of etcd allows for 
> higher limit on each entry.
> Even if that does not happen, there are other ways to overcome this 
> limitation, for example, we can have config files split across multiple 
> configMaps. We need to discuss, and prioritise, this issue takes the 
> straightforward approach of skipping files that cannot be accommodated within 
> 1.5MiB limit and WARNING the user about the same.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30985) Propagate SPARK_CONF_DIR files to driver and exec pods.

2020-11-16 Thread Prashant Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233227#comment-17233227
 ] 

Prashant Sharma commented on SPARK-30985:
-

Thanks [~dongjoon], you have resolved the confusion I had. Indeed, this is what 
was intended.

> Propagate SPARK_CONF_DIR files to driver and exec pods.
> ---
>
> Key: SPARK-30985
> URL: https://issues.apache.org/jira/browse/SPARK-30985
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Assignee: Prashant Sharma
>Priority: Major
> Fix For: 3.1.0
>
>
> SPARK_CONF_DIR hosts configuration files like, 
>  1) spark-defaults.conf - containing all the spark properties.
>  2) log4j.properties - Logger configuration.
>  3) spark-env.sh - Environment variables to be setup at driver and executor.
>  4) core-site.xml - Hadoop related configuration.
>  5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
>  6) metrics.properties - Spark metrics.
>  7) Any user specific - library or framework specific configuration file.
> Traditionally, SPARK_CONF_DIR has been the home to all user specific 
> configuration files.
> So this feature, will let the user specific configuration files be mounted on 
> the driver and executor pods' SPARK_CONF_DIR.
> Please review the attached design doc, for more details.
>  
> [Google docs link|https://bit.ly/spark-30985]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-30985) Propagate SPARK_CONF_DIR files to driver and exec pods.

2020-11-16 Thread Prashant Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232631#comment-17232631
 ] 

Prashant Sharma edited comment on SPARK-30985 at 11/16/20, 9:21 AM:


Reopening this JIRA, as it was closed because I created a PR incorrectly targeting 
the umbrella JIRA instead of the subtask.


was (Author: prashant_):
Reopening this JIRA as, this is Umbrella jira and I created a PR incorrectly 
targeting the umbrella JIRA instead of the subtask : 

> Propagate SPARK_CONF_DIR files to driver and exec pods.
> ---
>
> Key: SPARK-30985
> URL: https://issues.apache.org/jira/browse/SPARK-30985
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
> Fix For: 3.1.0
>
>
> SPARK_CONF_DIR hosts configuration files like, 
>  1) spark-defaults.conf - containing all the spark properties.
>  2) log4j.properties - Logger configuration.
>  3) spark-env.sh - Environment variables to be setup at driver and executor.
>  4) core-site.xml - Hadoop related configuration.
>  5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
>  6) metrics.properties - Spark metrics.
>  7) Any user specific - library or framework specific configuration file.
> Traditionally, SPARK_CONF_DIR has been the home to all user specific 
> configuration files.
> So this feature, will let the user specific configuration files be mounted on 
> the driver and executor pods' SPARK_CONF_DIR.
> Please review the attached design doc, for more details.
>  
> [Google docs link|https://bit.ly/spark-30985]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-30985) Propagate SPARK_CONF_DIR files to driver and exec pods.

2020-11-16 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma reassigned SPARK-30985:
---

Assignee: (was: Prashant Sharma)

> Propagate SPARK_CONF_DIR files to driver and exec pods.
> ---
>
> Key: SPARK-30985
> URL: https://issues.apache.org/jira/browse/SPARK-30985
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
> Fix For: 3.1.0
>
>
> SPARK_CONF_DIR hosts configuration files like, 
>  1) spark-defaults.conf - containing all the spark properties.
>  2) log4j.properties - Logger configuration.
>  3) spark-env.sh - Environment variables to be setup at driver and executor.
>  4) core-site.xml - Hadoop related configuration.
>  5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
>  6) metrics.properties - Spark metrics.
>  7) Any user specific - library or framework specific configuration file.
> Traditionally, SPARK_CONF_DIR has been the home to all user specific 
> configuration files.
> So this feature, will let the user specific configuration files be mounted on 
> the driver and executor pods' SPARK_CONF_DIR.
> Please review the attached design doc, for more details.
>  
> [Google docs link|https://bit.ly/spark-30985]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33461) Foundational work for propagating SPARK_CONF_DIR

2020-11-16 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma reassigned SPARK-33461:
---

Assignee: Prashant Sharma

> Foundational work for propagating SPARK_CONF_DIR
> 
>
> Key: SPARK-33461
> URL: https://issues.apache.org/jira/browse/SPARK-33461
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Assignee: Prashant Sharma
>Priority: Major
>
> Foundational work for propagating SPARK_CONF_DIR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33461) Foundational work for propagating SPARK_CONF_DIR

2020-11-16 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma resolved SPARK-33461.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

> Foundational work for propagating SPARK_CONF_DIR
> 
>
> Key: SPARK-33461
> URL: https://issues.apache.org/jira/browse/SPARK-33461
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Assignee: Prashant Sharma
>Priority: Major
> Fix For: 3.1.0
>
>
> Foundational work for propagating SPARK_CONF_DIR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33461) Foundational work for propagating SPARK_CONF_DIR

2020-11-16 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-33461:
---

 Summary: Foundational work for propagating SPARK_CONF_DIR
 Key: SPARK-33461
 URL: https://issues.apache.org/jira/browse/SPARK-33461
 Project: Spark
  Issue Type: Sub-task
  Components: Kubernetes
Affects Versions: 3.1.0
Reporter: Prashant Sharma


Foundational work for propagating SPARK_CONF_DIR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-30985) Propagate SPARK_CONF_DIR files to driver and exec pods.

2020-11-16 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma reopened SPARK-30985:
-

Reopening this JIRA, as this is an umbrella JIRA and I created a PR incorrectly 
targeting the umbrella JIRA instead of the subtask.

> Propagate SPARK_CONF_DIR files to driver and exec pods.
> ---
>
> Key: SPARK-30985
> URL: https://issues.apache.org/jira/browse/SPARK-30985
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Assignee: Prashant Sharma
>Priority: Major
> Fix For: 3.1.0
>
>
> SPARK_CONF_DIR hosts configuration files like, 
>  1) spark-defaults.conf - containing all the spark properties.
>  2) log4j.properties - Logger configuration.
>  3) spark-env.sh - Environment variables to be setup at driver and executor.
>  4) core-site.xml - Hadoop related configuration.
>  5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
>  6) metrics.properties - Spark metrics.
>  7) Any user specific - library or framework specific configuration file.
> Traditionally, SPARK_CONF_DIR has been the home to all user specific 
> configuration files.
> So this feature, will let the user specific configuration files be mounted on 
> the driver and executor pods' SPARK_CONF_DIR.
> Please review the attached design doc, for more details.
>  
> [Google docs link|https://bit.ly/spark-30985]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30985) Propagate SPARK_CONF_DIR files to driver and exec pods.

2020-11-16 Thread Prashant Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232628#comment-17232628
 ] 

Prashant Sharma commented on SPARK-30985:
-

[~dongjoon] Hm.. my mistake. As you said, I added subtasks after creating the 
PR and this JIRA. I will re-open this JIRA and create a subtask and resolve it.

> Propagate SPARK_CONF_DIR files to driver and exec pods.
> ---
>
> Key: SPARK-30985
> URL: https://issues.apache.org/jira/browse/SPARK-30985
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Assignee: Prashant Sharma
>Priority: Major
> Fix For: 3.1.0
>
>
> SPARK_CONF_DIR hosts configuration files like, 
>  1) spark-defaults.conf - containing all the spark properties.
>  2) log4j.properties - Logger configuration.
>  3) spark-env.sh - Environment variables to be setup at driver and executor.
>  4) core-site.xml - Hadoop related configuration.
>  5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
>  6) metrics.properties - Spark metrics.
>  7) Any user specific - library or framework specific configuration file.
> Traditionally, SPARK_CONF_DIR has been the home to all user specific 
> configuration files.
> So this feature, will let the user specific configuration files be mounted on 
> the driver and executor pods' SPARK_CONF_DIR.
> Please review the attached design doc, for more details.
>  
> [Google docs link|https://bit.ly/spark-30985]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33157) Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of columns (Teradata dialect)

2020-10-15 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-33157:

Description: 
Override the default SQL strings for:
ALTER TABLE ADD COLUMN
ALTER TABLE UPDATE COLUMN TYPE
ALTER TABLE RENAME COLUMN
ALTER TABLE UPDATE COLUMN NULLABILITY
in the Teradata JDBC dialect, according to the official documentation.
Write Teradata integration tests for JDBC.

  was:
Override the default SQL strings for:
ALTER TABLE UPDATE COLUMN TYPE
ALTER TABLE UPDATE COLUMN NULLABILITY
in the following PostgreSQL JDBC dialect according to official documentation.
Write PostgreSQL integration tests for JDBC.


> Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of 
> columns (Teradata dialect)
> ---
>
> Key: SPARK-33157
> URL: https://issues.apache.org/jira/browse/SPARK-33157
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> Override the default SQL strings for:
> ALTER TABLE ADD COLUMN
> ALTER TABLE UPDATE COLUMN TYPE
> ALTER TABLE RENAME COLUMN
> ALTER TABLE UPDATE COLUMN NULLABILITY
> in the Teradata JDBC dialect, according to the official documentation.
> Write Teradata integration tests for JDBC.
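
For illustration of the dialect-override pattern these subtasks describe, a rough 
Scala sketch follows. The JdbcDialect method names and signatures are assumptions 
based on the issue text (not verified against the Spark 3.1 API), and the SQL strings 
are placeholders rather than Teradata-verified syntax:

{code:scala}
// Sketch only: override the default ALTER TABLE SQL strings in a dialect object.
import org.apache.spark.sql.jdbc.JdbcDialect

case object TeradataDialectSketch extends JdbcDialect {

  override def canHandle(url: String): Boolean =
    url.toLowerCase.startsWith("jdbc:teradata")

  // Placeholder for the dialect-specific "update column type" statement.
  override def getUpdateColumnTypeQuery(
      tableName: String, columnName: String, newDataType: String): String =
    s"ALTER TABLE $tableName MODIFY COLUMN $columnName $newDataType"

  // Placeholder for the dialect-specific "update column nullability" statement.
  override def getUpdateColumnNullabilityQuery(
      tableName: String, columnName: String, isNullable: Boolean): String = {
    val nullability = if (isNullable) "NULL" else "NOT NULL"
    s"ALTER TABLE $tableName MODIFY COLUMN $columnName $nullability"
  }
}
{code}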



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33157) Support ALTER TABLE in JDBC v2 Table Catalog: update type and nullability of columns (Teradata dialect)

2020-10-15 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-33157:
---

 Summary: Support ALTER TABLE in JDBC v2 Table Catalog: update type 
and nullability of columns (Teradata dialect)
 Key: SPARK-33157
 URL: https://issues.apache.org/jira/browse/SPARK-33157
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Prashant Sharma


Override the default SQL strings for:
ALTER TABLE UPDATE COLUMN TYPE
ALTER TABLE UPDATE COLUMN NULLABILITY
in the Teradata JDBC dialect, according to the official documentation.
Write Teradata integration tests for JDBC.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33130) Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MsSqlServer dialect)

2020-10-13 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-33130:
---

 Summary: Support ALTER TABLE in JDBC v2 Table Catalog: add, update 
type and nullability of columns (MsSqlServer dialect)
 Key: SPARK-33130
 URL: https://issues.apache.org/jira/browse/SPARK-33130
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Prashant Sharma


Override the default SQL strings for:
ALTER TABLE RENAME COLUMN
ALTER TABLE UPDATE COLUMN NULLABILITY
in the MsSqlServer JDBC dialect, according to the official documentation.
Write MsSqlServer integration tests for JDBC.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33129) Since the sbt version is now upgraded, old `test-only` needs to be replaced with `testOnly`

2020-10-13 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-33129:

Description: Follow-up to SPARK-21708: update the references to `test-only` to 
`testOnly`, as the older syntax no longer works.
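
For example (the suite name below is only an illustration):

{code}
# Old syntax (sbt 0.13.x), which no longer works after the sbt upgrade:
./build/sbt "test-only org.apache.spark.repl.ReplSuite"

# New syntax (sbt 1.x):
./build/sbt "testOnly org.apache.spark.repl.ReplSuite"
{code}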

> Since the sbt version is now upgraded, old `test-only` needs to be replaced 
> with `testOnly`
> ---
>
> Key: SPARK-33129
> URL: https://issues.apache.org/jira/browse/SPARK-33129
> Project: Spark
>  Issue Type: Bug
>  Components: Build, docs
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> Follow up to SPARK-21708, updating the references to test-only with testOnly. 
> As the older syntax no longer works.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33129) Since the sbt version is now upgraded, old `test-only` needs to be replaced with `testOnly`

2020-10-13 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-33129:
---

 Summary: Since the sbt version is now upgraded, old `test-only` 
needs to be replaced with `testOnly`
 Key: SPARK-33129
 URL: https://issues.apache.org/jira/browse/SPARK-33129
 Project: Spark
  Issue Type: Bug
  Components: Build, docs
Affects Versions: 3.1.0
Reporter: Prashant Sharma






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33095) Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MySQL dialect)

2020-10-13 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-33095:

Description: 
Override the default SQL strings for:
ALTER TABLE UPDATE COLUMN TYPE
ALTER TABLE UPDATE COLUMN NULLABILITY
in the MySQL JDBC dialect, according to the official documentation.
Write MySQL integration tests for JDBC.

  was:
Override the default SQL strings for:
ALTER TABLE ADD COLUMN
ALTER TABLE UPDATE COLUMN TYPE
ALTER TABLE UPDATE COLUMN NULLABILITY
in the following MySQL JDBC dialect according to official documentation.
Write MySQL integration tests for JDBC.


> Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and 
> nullability of columns (MySQL dialect)
> -
>
> Key: SPARK-33095
> URL: https://issues.apache.org/jira/browse/SPARK-33095
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> Override the default SQL strings for:
> ALTER TABLE UPDATE COLUMN TYPE
> ALTER TABLE UPDATE COLUMN NULLABILITY
> in the following MySQL JDBC dialect according to official documentation.
> Write MySQL integration tests for JDBC.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33095) Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (MySQL dialect)

2020-10-08 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-33095:
---

 Summary: Support ALTER TABLE in JDBC v2 Table Catalog: add, update 
type and nullability of columns (MySQL dialect)
 Key: SPARK-33095
 URL: https://issues.apache.org/jira/browse/SPARK-33095
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Prashant Sharma


Override the default SQL strings for:
ALTER TABLE ADD COLUMN
ALTER TABLE UPDATE COLUMN TYPE
ALTER TABLE UPDATE COLUMN NULLABILITY
in the MySQL JDBC dialect, according to the official documentation.
Write MySQL integration tests for JDBC.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32937) DecommissionSuite in k8s integration tests is failing.

2020-09-18 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-32937:
---

 Summary: DecommissionSuite in k8s integration tests is failing.
 Key: SPARK-32937
 URL: https://issues.apache.org/jira/browse/SPARK-32937
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 3.1.0
Reporter: Prashant Sharma



Logs from the failing test, copied from Jenkins. As of now, it is always failing.

{code}
- Test basic decommissioning *** FAILED ***
  The code passed to eventually never returned normally. Attempted 182 times 
over 3.00377927275 minutes. Last failure message: "++ id -u
  + myuid=185
  ++ id -g
  + mygid=0
  + set +e
  ++ getent passwd 185
  + uidentry=
  + set -e
  + '[' -z '' ']'
  + '[' -w /etc/passwd ']'
  + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
  + SPARK_CLASSPATH=':/opt/spark/jars/*'
  + env
  + grep SPARK_JAVA_OPT_
  + sort -t_ -k4 -n
  + sed 's/[^=]*=\(.*\)/\1/g'
  + readarray -t SPARK_EXECUTOR_JAVA_OPTS
  + '[' -n '' ']'
  + '[' 3 == 2 ']'
  + '[' 3 == 3 ']'
  ++ python3 -V
  + pyv3='Python 3.7.3'
  + export PYTHON_VERSION=3.7.3
  + PYTHON_VERSION=3.7.3
  + export PYSPARK_PYTHON=python3
  + PYSPARK_PYTHON=python3
  + export PYSPARK_DRIVER_PYTHON=python3
  + PYSPARK_DRIVER_PYTHON=python3
  + '[' -n '' ']'
  + '[' -z ']'
  + '[' -z x ']'
  + SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*'
  + case "$1" in
  + shift 1
  + CMD=("$SPARK_HOME/bin/spark-submit" --conf 
"spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
  + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf 
spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file 
/opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner 
local:///opt/spark/tests/decommissioning.py
  20/09/17 11:06:56 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
  Starting decom test
  Using Spark's default log4j profile: 
org/apache/spark/log4j-defaults.properties
  20/09/17 11:06:56 INFO SparkContext: Running Spark version 3.1.0-SNAPSHOT
  20/09/17 11:06:57 INFO ResourceUtils: 
==
  20/09/17 11:06:57 INFO ResourceUtils: No custom resources configured for 
spark.driver.
  20/09/17 11:06:57 INFO ResourceUtils: 
==
  20/09/17 11:06:57 INFO SparkContext: Submitted application: PyMemoryTest
  20/09/17 11:06:57 INFO ResourceProfile: Default ResourceProfile created, 
executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , 
memory -> name: memory, amount: 1024, script: , vendor: ), task resources: 
Map(cpus -> name: cpus, amount: 1.0)
  20/09/17 11:06:57 INFO ResourceProfile: Limiting resource is cpus at 1 tasks 
per executor
  20/09/17 11:06:57 INFO ResourceProfileManager: Added ResourceProfile id: 0
  20/09/17 11:06:57 INFO SecurityManager: Changing view acls to: 185,jenkins
  20/09/17 11:06:57 INFO SecurityManager: Changing modify acls to: 185,jenkins
  20/09/17 11:06:57 INFO SecurityManager: Changing view acls groups to: 
  20/09/17 11:06:57 INFO SecurityManager: Changing modify acls groups to: 
  20/09/17 11:06:57 INFO SecurityManager: SecurityManager: authentication 
enabled; ui acls disabled; users  with view permissions: Set(185, jenkins); 
groups with view permissions: Set(); users  with modify permissions: Set(185, 
jenkins); groups with modify permissions: Set()
  20/09/17 11:06:57 INFO Utils: Successfully started service 'sparkDriver' on 
port 7078.
  20/09/17 11:06:57 INFO SparkEnv: Registering MapOutputTracker
  20/09/17 11:06:57 INFO SparkEnv: Registering BlockManagerMaster
  20/09/17 11:06:57 INFO BlockManagerMasterEndpoint: Using 
org.apache.spark.storage.DefaultTopologyMapper for getting topology information
  20/09/17 11:06:57 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint 
up
  20/09/17 11:06:57 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
  20/09/17 11:06:57 INFO DiskBlockManager: Created local directory at 
/var/data/spark-7985c075-3b02-42ec-9111-cefba535adf0/blockmgr-3bd403d0-6689-46be-997e-5bc699ecefd3
  20/09/17 11:06:57 INFO MemoryStore: MemoryStore started with capacity 593.9 
MiB
  20/09/17 11:06:57 INFO SparkEnv: Registering OutputCommitCoordinator
  20/09/17 11:06:58 INFO Utils: Successfully started service 'SparkUI' on port 
4040.
  20/09/17 11:06:58 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at 
http://spark-test-app-08853d749bbee080-driver-svc.a0af92633bef4a91b5f7e262e919afd9.svc:4040
  20/09/17 11:06:58 INFO SparkKubernetesClientFactory: Auto-configuring K8S 
client using current context from users K8S config file
  20/09/17 11:06:59 INFO ExecutorPodsAllocator: Going to request 3 executors 
from Kubernetes.
  20/09/17 11:06:59 INFO KubernetesClientUtils: Spark configuration files 

[jira] [Resolved] (SPARK-32495) Update jackson-databind versions to fix various vulnerabilities.

2020-08-31 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma resolved SPARK-32495.
-
Resolution: Won't Fix

Resolving it as Won't Fix for now, as most of us feel the behaviour change that 
this may lead to is not acceptable, and these security vulnerabilities do not 
impact Apache Spark.

For more details on the discussion see the Pull Request. 

https://github.com/apache/spark/pull/29334

> Update jackson-databind versions to fix various vulnerabilities.
> 
>
> Key: SPARK-32495
> URL: https://issues.apache.org/jira/browse/SPARK-32495
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 2.4.6
>Reporter: SHOBHIT SHUKLA
>Priority: Major
>
> As Fasterxml Jackson version 2.6.7.3 is affected by the CVEs CVE-2017-15095 
> and CVE-2018-5968 [https://nvd.nist.gov/vuln/detail/CVE-2018-5968], would it 
> be possible to upgrade the jackson version for spark-2.4.6 and so on (2.4.x)?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32617) Upgrade kubernetes client version to support latest minikube version.

2020-08-14 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-32617:
---

 Summary: Upgrade kubernetes client version to support latest 
minikube version.
 Key: SPARK-32617
 URL: https://issues.apache.org/jira/browse/SPARK-32617
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 3.1.0
Reporter: Prashant Sharma


The following error occurs when the k8s integration tests are run against a 
minikube cluster with version 1.2.1:

{code:java}
Run starting. Expected test count is: 18
KubernetesSuite:
org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite *** ABORTED ***
  io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred.
  at 
io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
  at 
io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:53)
  at 
io.fabric8.kubernetes.client.utils.HttpClientUtils.createHttpClient(HttpClientUtils.java:196)
  at 
io.fabric8.kubernetes.client.utils.HttpClientUtils.createHttpClient(HttpClientUtils.java:62)
  at io.fabric8.kubernetes.client.BaseClient.(BaseClient.java:51)
  at 
io.fabric8.kubernetes.client.DefaultKubernetesClient.(DefaultKubernetesClient.java:105)
  at 
org.apache.spark.deploy.k8s.integrationtest.backend.minikube.Minikube$.getKubernetesClient(Minikube.scala:81)
  at 
org.apache.spark.deploy.k8s.integrationtest.backend.minikube.MinikubeTestBackend$.initialize(MinikubeTestBackend.scala:33)
  at 
org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.beforeAll(KubernetesSuite.scala:131)
  at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
  ...
  Cause: java.nio.file.NoSuchFileException: /root/.minikube/apiserver.crt
  at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
  at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
  at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
  at 
sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
  at java.nio.file.Files.newByteChannel(Files.java:361)
  at java.nio.file.Files.newByteChannel(Files.java:407)
  at java.nio.file.Files.readAllBytes(Files.java:3152)
  at 
io.fabric8.kubernetes.client.internal.CertUtils.getInputStreamFromDataOrFile(CertUtils.java:72)
  at 
io.fabric8.kubernetes.client.internal.CertUtils.createKeyStore(CertUtils.java:242)
  at 
io.fabric8.kubernetes.client.internal.SSLUtils.keyManagers(SSLUtils.java:128)
  ...
Run completed in 1 second, 821 milliseconds.
Total number of tests run: 0
Suites: completed 1, aborted 1
Tests: succeeded 0, failed 0, canceled 0, ignored 0, pending 0
*** 1 SUITE ABORTED ***
[INFO] 
[INFO] Reactor Summary for Spark Project Parent POM 3.1.0-SNAPSHOT:
[INFO] 
[INFO] Spark Project Parent POM ... SUCCESS [  4.454 s]
[INFO] Spark Project Tags . SUCCESS [  4.768 s]
[INFO] Spark Project Local DB . SUCCESS [  2.961 s]
[INFO] Spark Project Networking ... SUCCESS [  4.258 s]
[INFO] Spark Project Shuffle Streaming Service  SUCCESS [  5.703 s]
[INFO] Spark Project Unsafe ... SUCCESS [  3.239 s]
[INFO] Spark Project Launcher . SUCCESS [  3.224 s]
[INFO] Spark Project Core . SUCCESS [02:25 min]
[INFO] Spark Project Kubernetes Integration Tests . FAILURE [ 17.244 s]
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time:  03:12 min
[INFO] Finished at: 2020-08-11T06:26:15-05:00
[INFO] 
[ERROR] Failed to execute goal org.scalatest:scalatest-maven-plugin:2.0.0:test 
(integration-test) on project spark-kubernetes-integration-tests_2.12: There 
are test failures -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]  {code}

Newer minikube has support for profiles, which is enabled simply by upgrading 
the minikube version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32018) Fix UnsafeRow set overflowed decimal

2020-08-13 Thread Prashant Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17176906#comment-17176906
 ] 

Prashant Sharma commented on SPARK-32018:
-

This issue is resolved as fixed in version 2.4.7. However, I am unable to 
find the fix in branch-2.4.

> Fix UnsafeRow set overflowed decimal
> 
>
> Key: SPARK-32018
> URL: https://issues.apache.org/jira/browse/SPARK-32018
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.6, 3.0.0
>Reporter: Allison Wang
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 2.4.7, 3.0.1, 3.1.0
>
>
> There is a bug that writing an overflowed decimal into UnsafeRow is fine but 
> reading it out will throw ArithmeticException. This exception is thrown when 
> calling {{getDecimal}} in UnsafeRow with input decimal's precision greater 
> than the input precision. Setting the value of the overflowed decimal to null 
> when writing into UnsafeRow should fix this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32556) Fix release script to uri encode the user provided passwords.

2020-08-13 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-32556:

Fix Version/s: 2.4.7

> Fix release script to uri encode the user provided passwords.
> -
>
> Key: SPARK-32556
> URL: https://issues.apache.org/jira/browse/SPARK-32556
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Project Infra
>Affects Versions: 2.4.6, 3.0.0, 3.1.0
>Reporter: Prashant Sharma
>Assignee: Prashant Sharma
>Priority: Major
> Fix For: 2.4.7, 3.0.1, 3.1.0
>
>
> As I was trying to do the release using the docker
> {code:java}
>  dev/create-release/do-release-docker.sh{code}
> script, there were some failures.
>  
>  # If the release manager password contains a char, that is not allowed in 
> URL, then it fails the build at the clone spark step.
>  # If the .gitignore file is missing, it fails the build at rm .gitignore 
> step.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32556) Fix release script to uri encode the user provided passwords.

2020-08-07 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma resolved SPARK-32556.
-
Fix Version/s: 3.1.0
   3.0.1
   Resolution: Fixed

Issue resolved by pull request 29373
[https://github.com/apache/spark/pull/29373]

> Fix release script to uri encode the user provided passwords.
> -
>
> Key: SPARK-32556
> URL: https://issues.apache.org/jira/browse/SPARK-32556
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Project Infra
>Affects Versions: 2.4.6, 3.0.0, 3.1.0
>Reporter: Prashant Sharma
>Assignee: Prashant Sharma
>Priority: Major
> Fix For: 3.0.1, 3.1.0
>
>
> As I was trying to do the release using the docker
> {code:java}
>  dev/create-release/do-release-docker.sh{code}
> script, there were some failures.
>  
>  # If the release manager password contains a char, that is not allowed in 
> URL, then it fails the build at the clone spark step.
>  # If the .gitignore file is missing, it fails the build at rm .gitignore 
> step.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32556) Fix release script to uri encode the user provided passwords.

2020-08-07 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma reassigned SPARK-32556:
---

Assignee: Prashant Sharma

> Fix release script to uri encode the user provided passwords.
> -
>
> Key: SPARK-32556
> URL: https://issues.apache.org/jira/browse/SPARK-32556
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Project Infra
>Affects Versions: 2.4.6, 3.0.0, 3.1.0
>Reporter: Prashant Sharma
>Assignee: Prashant Sharma
>Priority: Major
>
> As I was trying to do the release using the docker
> {code:java}
>  dev/create-release/do-release-docker.sh{code}
> script, there were some failures.
>  
>  # If the release manager password contains a char, that is not allowed in 
> URL, then it fails the build at the clone spark step.
>  # If the .gitignore file is missing, it fails the build at rm .gitignore 
> step.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32556) Fix release script to uri encode the user provided passwords.

2020-08-06 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-32556:

Component/s: Project Infra

> Fix release script to uri encode the user provided passwords.
> -
>
> Key: SPARK-32556
> URL: https://issues.apache.org/jira/browse/SPARK-32556
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Project Infra
>Affects Versions: 2.4.6, 3.0.0, 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> As I was trying to do the release using the docker
> {code:java}
>  dev/create-release/do-release-docker.sh{code}
> script, there were some failures.
>  
>  # If the release manager password contains a char, that is not allowed in 
> URL, then it fails the build at the clone spark step.
>  # If the .gitignore file is missing, it fails the build at rm .gitignore 
> step.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32556) Fix release script to uri encode the user provided passwords.

2020-08-05 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-32556:

Description: 
As I was trying to do the release using the docker
{code:java}
 dev/create-release/do-release-docker.sh{code}
script, there were some failures.

 
 # If the release manager password contains a char, that is not allowed in URL, 
then it fails the build at the clone spark step.
 # If the .gitignore file is missing, it fails the build at rm .gitignore step.

  was:
As I was trying to do the release using the docker
{code:java}
 dev/create-release/do-release-docker.sh{code}
script, there were some failures.

 
 # If the release manager password contains a char, that is not allowed in URL, 
then it fails the build at the clone spark step.


 # If the .gitignore file is missing, it fails the build at rm .gitignore step.


> Fix release script to uri encode the user provided passwords.
> -
>
> Key: SPARK-32556
> URL: https://issues.apache.org/jira/browse/SPARK-32556
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.6, 3.0.0, 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> As I was trying to do the release using the docker
> {code:java}
>  dev/create-release/do-release-docker.sh{code}
> script, there were some failures.
>  
>  # If the release manager password contains a char, that is not allowed in 
> URL, then it fails the build at the clone spark step.
>  # If the .gitignore file is missing, it fails the build at rm .gitignore 
> step.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32556) Fix release script to uri encode the user provided passwords.

2020-08-05 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-32556:
---

 Summary: Fix release script to uri encode the user provided 
passwords.
 Key: SPARK-32556
 URL: https://issues.apache.org/jira/browse/SPARK-32556
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.0.0, 2.4.6, 3.1.0
Reporter: Prashant Sharma


As I was trying to do the release using the docker
{code:java}
 dev/create-release/do-release-docker.sh{code}
script, there were some failures.

 
 # If the release manager password contains a character that is not allowed in a 
URL, then it fails the build at the clone spark step (see the sketch below).


 # If the .gitignore file is missing, it fails the build at rm .gitignore step.
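For the first failure, the fix amounts to percent-encoding the password before it is embedded in the clone URL. A minimal sketch of the idea in Scala (illustrative only; the real release scripts are shell, and the user and password values below are made up):

{code:scala}
// Illustrative sketch: percent-encode a password before embedding it in a git URL.
// The user and password values are made up; this is not the actual release script.
import java.net.URLEncoder
import java.nio.charset.StandardCharsets

object EncodePasswordSketch {
  def main(args: Array[String]): Unit = {
    val user = "release-manager"
    val password = "p@ss/word#123"  // contains characters that are not allowed in a URL
    val encoded = URLEncoder.encode(password, StandardCharsets.UTF_8.name())
    // prints https://release-manager:p%40ss%2Fword%23123@gitbox.apache.org/repos/asf/spark.git
    println(s"https://$user:$encoded@gitbox.apache.org/repos/asf/spark.git")
  }
}
{code}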



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32495) Update jackson-databind versions to fix various vulnerabilities.

2020-08-03 Thread Prashant Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17169800#comment-17169800
 ] 

Prashant Sharma commented on SPARK-32495:
-

In general, upgrading the version of a dependency can have a serious impact on 
downstream users. In this case, on both occasions the CVEs you mentioned were 
found to be already fixed in the version that Spark currently depends on. It 
might be that the advisories database has not been updated with this; I have 
tried to ping the relevant issues to get that fixed.

Personally, I feel the 2.6.x line is no longer maintained by the Jackson 
community and might be affected by security vulnerabilities beyond the ones 
mentioned here. As we continue to release the 2.4.x line, in my opinion we 
should move to a maintained version of Jackson. Therefore, I am going to open a 
PR and seek community approval for the same.

> Update jackson-databind versions to fix various vulnerabilities.
> 
>
> Key: SPARK-32495
> URL: https://issues.apache.org/jira/browse/SPARK-32495
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 2.4.6
>Reporter: SHOBHIT SHUKLA
>Priority: Major
>
> As Fasterxml Jackson version 2.6.7.3 is affected by the CVEs CVE-2017-15095 
> and CVE-2018-5968 [https://nvd.nist.gov/vuln/detail/CVE-2018-5968], would it 
> be possible to upgrade the jackson version for spark-2.4.6 and so on (2.4.x)?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-32495) Update jackson-databind versions to fix various vulnerabilities.

2020-08-03 Thread Prashant Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17169756#comment-17169756
 ] 

Prashant Sharma edited comment on SPARK-32495 at 8/3/20, 7:54 AM:
--

As per the issues and the commit 
[https://github.com/FasterXML/jackson-databind/commit/a3939d36edcc755c8af55bdc1969e0fa8438f9db|backport-commit],
it is interesting to note that the fix for both of these CVEs did land in 2.6.7.3:

1. [https://github.com/FasterXML/jackson-databind/pull/1945]
 2. [https://github.com/FasterXML/jackson-databind/issues/1899]

And that happened in 2019; those advisories seem to indicate the opposite.


was (Author: prashant_):
As per the issues and the commit 
[https://github.com/FasterXML/jackson-databind/commit/a3939d36edcc755c8af55bdc1969e0fa8438f9db|backport-commit],
it is interesting to note that the fix for both of these CVEs did land in 2.6.7.3:

1. [https://github.com/FasterXML/jackson-databind/issues/1855]
 2. [https://github.com/FasterXML/jackson-databind/issues/1899]

And that happened in 2019; those advisories seem to indicate the opposite.

> Update jackson-databind versions to fix various vulnerabilities.
> 
>
> Key: SPARK-32495
> URL: https://issues.apache.org/jira/browse/SPARK-32495
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 2.4.6
>Reporter: SHOBHIT SHUKLA
>Priority: Major
>
> As Fasterxml Jackson version 2.6.7.3 is affected by the CVEs CVE-2017-15095 
> and CVE-2018-5968 [https://nvd.nist.gov/vuln/detail/CVE-2018-5968], would it 
> be possible to upgrade the jackson version for spark-2.4.6 and so on (2.4.x)?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32486) Issue with deserialization and persist api in latest spark java versions

2020-08-03 Thread Prashant Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17169781#comment-17169781
 ] 

Prashant Sharma commented on SPARK-32486:
-

This issue needs discussion before we can mark it as a blocker.

> Issue with deserialization and persist api in latest spark java versions
> 
>
> Key: SPARK-32486
> URL: https://issues.apache.org/jira/browse/SPARK-32486
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.4.4, 2.4.5, 2.4.6, 3.0.0
> Environment: It's happening on all the os and java8
>Reporter: Dinesh Kumar
>Priority: Major
>
> Hey Team, We have class level object instantiations in one of our Classes. 
> When we want to persist that data into the Dataset of this class Type it's 
> not persisting the null values instead it's taking class level precedence. 
> i.e. It's showing as new object.
> Eg: 
> _Test.class has below class level attributes:_
> _private Test1 testNumber = new Test1();_
> _private Test2 testNumber2;_
>  
> String inputLocation = "src/test/resources/pipeline/test.parquet";
> Dataset<Row> ds = this.session.read().parquet(inputLocation);
> ds.printSchema();
> ds.foreach(input->{
>  System.out.println(input); // When we verified it's showing testNumber, 
> testNumber2 as null
> });
> Dataset<Test> inputDataSet = ds.as(Encoders.bean(Test.class));
> inputDataSet.foreach(input->{
>  System.out.println(input); // When we verified it's showing testNumber as 
> new Test1(), testNumber2 as null
> });
>  
>  
> This is the same issue with the dataset.persist() call as well. It is happening 
> with all 2.4.4 and higher versions. Can you please fix it?
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32486) Issue with deserialization and persist api in latest spark java versions

2020-08-03 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-32486:

Priority: Major  (was: Blocker)

> Issue with deserialization and persist api in latest spark java versions
> 
>
> Key: SPARK-32486
> URL: https://issues.apache.org/jira/browse/SPARK-32486
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.4.4, 2.4.5, 2.4.6, 3.0.0
> Environment: It's happening on all the os and java8
>Reporter: Dinesh Kumar
>Priority: Major
>
> Hey Team, We have class level object instantiations in one of our Classes. 
> When we want to persist that data into the Dataset of this class Type it's 
> not persisting the null values instead it's taking class level precedence. 
> i.e. It's showing as new object.
> Eg: 
> _Test.class has below class level attributes:_
> _private Test1 testNumber = new Test1();_
> _private Test2 testNumber2;_
>  
> String inputLocation = "src/test/resources/pipeline/test.parquet";
> Dataset<Row> ds = this.session.read().parquet(inputLocation);
> ds.printSchema();
> ds.foreach(input->{
>  System.out.println(input); // When we verified it's showing testNumber, 
> testNumber2 as null
> });
> Dataset<Test> inputDataSet = ds.as(Encoders.bean(Test.class));
> inputDataSet.foreach(input->{
>  System.out.println(input); // When we verified it's showing testNumber as 
> new Test1(), testNumber2 as null
> });
>  
>  
> This is the same issue with the dataset.persist() call as well. It is happening 
> with all 2.4.4 and higher versions. Can you please fix it?
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32495) Update jackson-databind versions to fix various vulnerabilities.

2020-08-03 Thread Prashant Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17169756#comment-17169756
 ] 

Prashant Sharma commented on SPARK-32495:
-

As per the issues and the commit 
[https://github.com/FasterXML/jackson-databind/commit/a3939d36edcc755c8af55bdc1969e0fa8438f9db|backport-commit],
it is interesting to note that the fix for both of these CVEs did land in 2.6.7.3:

1. [https://github.com/FasterXML/jackson-databind/issues/1855]
 2. [https://github.com/FasterXML/jackson-databind/issues/1899]

And that happened in 2019; those advisories seem to indicate the opposite.

> Update jackson-databind versions to fix various vulnerabilities.
> 
>
> Key: SPARK-32495
> URL: https://issues.apache.org/jira/browse/SPARK-32495
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 2.4.6
>Reporter: SHOBHIT SHUKLA
>Priority: Major
>
> As Fasterxml Jackson version 2.6.7.3 is affected by the CVEs CVE-2017-15095 
> and CVE-2018-5968 [https://nvd.nist.gov/vuln/detail/CVE-2018-5968], would it 
> be possible to upgrade the jackson version for spark-2.4.6 and so on (2.4.x)?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32495) Update jackson-databind versions to fix various vulnerabilities.

2020-08-02 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-32495:

Summary: Update jackson-databind versions to fix various vulnerabilities.  
(was: Update jackson versions from 2.4.6 and so on(2.4.x))

> Update jackson-databind versions to fix various vulnerabilities.
> 
>
> Key: SPARK-32495
> URL: https://issues.apache.org/jira/browse/SPARK-32495
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 2.4.6
>Reporter: SHOBHIT SHUKLA
>Priority: Major
>
> As Fasterxml Jackson version 2.6.7.3 is affected by the CVEs CVE-2017-15095 
> and CVE-2018-5968 [https://nvd.nist.gov/vuln/detail/CVE-2018-5968], would it 
> be possible to upgrade the jackson version for spark-2.4.6 and so on (2.4.x)?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32495) Update jackson versions from 2.4.6 and so on(2.4.x)

2020-07-31 Thread Prashant Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168825#comment-17168825
 ] 

Prashant Sharma commented on SPARK-32495:
-

Furthermore, according to 
https://github.com/FasterXML/jackson-databind/commits/2.6, version 2.6.7.3 has 
fixes for all the CVEs addressed up to version 2.9.10, which is >= 2.9.8.
 

> Update jackson versions from 2.4.6 and so on(2.4.x)
> ---
>
> Key: SPARK-32495
> URL: https://issues.apache.org/jira/browse/SPARK-32495
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 2.4.6
>Reporter: SHOBHIT SHUKLA
>Priority: Major
>
> Fasterxml Jackson version before 2.9.8 is affected by multiple CVEs 
> [https://github.com/FasterXML/jackson-databind/issues/2186]; would it be 
> possible to upgrade the jackson version to >= 2.9.8 for spark-2.4.6 and so 
> on (2.4.x)?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32495) Update jackson versions from 2.4.6 and so on(2.4.x)

2020-07-31 Thread Prashant Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168628#comment-17168628
 ] 

Prashant Sharma commented on SPARK-32495:
-

Is this not specific to jackson-databind?

And the link says version 2.6.7.3 also has the fix for these CVEs.

It appears the issue SPARK-30333 already fixed it. Did I miss anything here?

> Update jackson versions from 2.4.6 and so on(2.4.x)
> ---
>
> Key: SPARK-32495
> URL: https://issues.apache.org/jira/browse/SPARK-32495
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 2.4.6
>Reporter: SHOBHIT SHUKLA
>Priority: Major
>
> Fasterxml Jackson version before 2.9.8 is affected by multiple CVEs 
> [https://github.com/FasterXML/jackson-databind/issues/2186]; would it be 
> possible to upgrade the jackson version to >= 2.9.8 for spark-2.4.6 and so 
> on (2.4.x)?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32379) docker based spark release script should use correct CRAN repo.

2020-07-21 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-32379:
---

 Summary: docker based spark release script should use correct CRAN 
repo.
 Key: SPARK-32379
 URL: https://issues.apache.org/jira/browse/SPARK-32379
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 2.4.6
Reporter: Prashant Sharma


While running the dev/create-release/do-release-docker.sh script, it fails 
with the following errors:

{code}
[root@kyok-test-1 ~]# tail docker-build.log 
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 r-base : Depends: r-base-core (>= 4.0.2-1.1804.0) but it is not going to be 
installed
  Depends: r-recommended (= 4.0.2-1.1804.0) but it is not going to be 
installed
 r-base-dev : Depends: r-base-core (>= 4.0.2-1.1804.0) but it is not going to 
be installed
E: Unable to correct problems, you have held broken packages.
The command '/bin/sh -c apt-get clean && apt-get update && $APT_INSTALL gnupg 
ca-certificates apt-transport-https &&   echo 'deb 
https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/' >> 
/etc/apt/sources.list &&   gpg --keyserver keyserver.ubuntu.com --recv-key 
E298A3A825C0D65DFD57CBB651716619E084DAB9 &&   gpg -a --export E084DAB9 | 
apt-key add - &&   apt-get clean &&   rm -rf /var/lib/apt/lists/* &&   apt-get 
clean &&   apt-get update &&   $APT_INSTALL software-properties-common &&   
apt-add-repository -y ppa:brightbox/ruby-ng &&   apt-get update &&   
$APT_INSTALL openjdk-8-jdk &&   update-alternatives --set java 
/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java &&   $APT_INSTALL curl wget git 
maven ivy subversion make gcc lsof libffi-dev pandoc pandoc-citeproc 
libssl-dev libcurl4-openssl-dev libxml2-dev &&   ln -s -T 
/usr/share/java/ivy.jar /usr/share/ant/lib/ivy.jar &&   curl -sL 
https://deb.nodesource.com/setup_4.x | bash &&   $APT_INSTALL nodejs &&   
$APT_INSTALL libpython2.7-dev libpython3-dev python-pip python3-pip &&   pip 
install --upgrade pip && hash -r pip &&   pip install setuptools &&   pip 
install $BASE_PIP_PKGS &&   pip install $PIP_PKGS &&   cd &&   virtualenv -p 
python3 /opt/p35 &&   . /opt/p35/bin/activate &&   pip install setuptools &&   
pip install $BASE_PIP_PKGS &&   pip install $PIP_PKGS &&   $APT_INSTALL r-base 
r-base-dev &&   $APT_INSTALL texlive-latex-base texlive texlive-fonts-extra 
texinfo qpdf &&   Rscript -e "install.packages(c('curl', 'xml2', 'httr', 
'devtools', 'testthat', 'knitr', 'rmarkdown', 'roxygen2', 'e1071', 'survival'), 
repos='https://cloud.r-project.org/')" &&   Rscript -e 
"devtools::install_github('jimhester/lintr')" &&   $APT_INSTALL ruby2.3 
ruby2.3-dev mkdocs &&   gem install jekyll --no-rdoc --no-ri -v 3.8.6 &&   gem 
install jekyll-redirect-from -v 0.15.0 &&   gem install pygments.rb' returned a 
non-zero code: 100

{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32371) Autodetect persistently failing executor pods and fail the application logging the cause.

2020-07-20 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-32371:
---

 Summary: Autodetect persistently failing executor pods and fail 
the application logging the cause.
 Key: SPARK-32371
 URL: https://issues.apache.org/jira/browse/SPARK-32371
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.1.0
Reporter: Prashant Sharma


{code:java}
[root@kyok-test-1 ~]# kubectl get po -w

NAME                                   READY   STATUS    RESTARTS   AGE

spark-shell-a3962a736bf9e775-exec-36   1/1     Running   0          5s

spark-shell-a3962a736bf9e775-exec-37   1/1     Running   0          3s

spark-shell-a3962a736bf9e775-exec-36   0/1     Error     0          5s

spark-shell-a3962a736bf9e775-exec-38   0/1     Pending   0          1s

spark-shell-a3962a736bf9e775-exec-38   0/1     Pending   0          1s

spark-shell-a3962a736bf9e775-exec-38   0/1     ContainerCreating   0          1s

spark-shell-a3962a736bf9e775-exec-36   0/1     Terminating         0          6s

spark-shell-a3962a736bf9e775-exec-36   0/1     Terminating         0          6s

spark-shell-a3962a736bf9e775-exec-37   0/1     Error               0          5s

spark-shell-a3962a736bf9e775-exec-38   1/1     Running             0          2s

spark-shell-a3962a736bf9e775-exec-39   0/1     Pending             0          0s

spark-shell-a3962a736bf9e775-exec-39   0/1     Pending             0          0s

spark-shell-a3962a736bf9e775-exec-39   0/1     ContainerCreating   0          0s

spark-shell-a3962a736bf9e775-exec-37   0/1     Terminating         0          6s

spark-shell-a3962a736bf9e775-exec-37   0/1     Terminating         0          6s

spark-shell-a3962a736bf9e775-exec-38   0/1     Error               0          4s

spark-shell-a3962a736bf9e775-exec-39   1/1     Running             0          1s

spark-shell-a3962a736bf9e775-exec-40   0/1     Pending             0          0s

spark-shell-a3962a736bf9e775-exec-40   0/1     Pending             0          0s

spark-shell-a3962a736bf9e775-exec-40   0/1     ContainerCreating   0          0s

spark-shell-a3962a736bf9e775-exec-38   0/1     Terminating         0          5s

spark-shell-a3962a736bf9e775-exec-38   0/1     Terminating         0          5s

spark-shell-a3962a736bf9e775-exec-39   0/1     Error               0          3s

spark-shell-a3962a736bf9e775-exec-40   1/1     Running             0          1s

spark-shell-a3962a736bf9e775-exec-41   0/1     Pending             0          0s

spark-shell-a3962a736bf9e775-exec-41   0/1     Pending             0          0s

spark-shell-a3962a736bf9e775-exec-41   0/1     ContainerCreating   0          0s

spark-shell-a3962a736bf9e775-exec-39   0/1     Terminating         0          4s

spark-shell-a3962a736bf9e775-exec-39   0/1     Terminating         0          4s

spark-shell-a3962a736bf9e775-exec-41   1/1     Running             0          2s

spark-shell-a3962a736bf9e775-exec-40   0/1     Error               0          4s

spark-shell-a3962a736bf9e775-exec-42   0/1     Pending             0          0s

spark-shell-a3962a736bf9e775-exec-42   0/1     Pending             0          0s

spark-shell-a3962a736bf9e775-exec-42   0/1     ContainerCreating   0          0s

spark-shell-a3962a736bf9e775-exec-40   0/1     Terminating         0          4s

spark-shell-a3962a736bf9e775-exec-40   0/1     Terminating         0          4s

{code}
A cascade of pods being created and terminated within 3-4 seconds is produced. It 
is difficult to see the logs of these constantly created and terminated pods. 
Thankfully, there is an option
{code:java}
spark.kubernetes.executor.deleteOnTermination false  {code}
to turn off the auto deletion of executor pods, which gives us an opportunity to 
diagnose the problem. However, it is not turned on by default, and sometimes one 
may need to guess what caused the problem in the previous run, work out the steps 
to reproduce it, and then re-run the application with exactly the same setup.
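For reference, a minimal sketch of setting this option from application code (the same key can equally be passed to spark-submit via --conf; the app name below is a placeholder):

{code:scala}
// Minimal sketch: keep failed executor pods around for post-mortem debugging.
// Uses the conf key mentioned above; the app name is a placeholder.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("decom-debug-example")
  .config("spark.kubernetes.executor.deleteOnTermination", "false")
  .getOrCreate()
{code}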

So it might be good if we could somehow detect this situation of pods failing 
as soon as they start, or failing on a particular task, capture the error that 
caused the pod to terminate, and relay it back to the driver and log it.

Alternatively, if we could auto-detect this situation, we could also stop 
creating more executor pods and fail with an appropriate error, while retaining 
the last failed pod for the user's further investigation.

It is not yet evaluated how this can be achieved, but this feature might be 
useful as K8s grows into a preferred choice for deploying Spark. Logging this 
issue for further investigation and work.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30985) Propagate SPARK_CONF_DIR files to driver and exec pods.

2020-07-13 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-30985:

Description: 
SPARK_CONF_DIR hosts configuration files like, 
 1) spark-defaults.conf - containing all the spark properties.
 2) log4j.properties - Logger configuration.
 3) spark-env.sh - Environment variables to be setup at driver and executor.
 4) core-site.xml - Hadoop related configuration.
 5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
 6) metrics.properties - Spark metrics.
 7) Any user specific - library or framework specific configuration file.

Traditionally, SPARK_CONF_DIR has been the home to all user specific 
configuration files.

So this feature, will let the user specific configuration files be mounted on 
the driver and executor pods' SPARK_CONF_DIR.

Please review the attached design doc, for more details.

 

[Google docs link|https://bit.ly/spark-30985]

 

  was:
SPARK_CONF_DIR hosts configuration files like, 
 1) spark-defaults.conf - containing all the spark properties.
 2) log4j.properties - Logger configuration.
 3) spark-env.sh - Environment variables to be setup at driver and executor.
 4) core-site.xml - Hadoop related configuration.
 5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
 6) metrics.properties - Spark metrics.
 7) Any user specific - library or framework specific configuration file.

Traditionally, SPARK_CONF_DIR has been the home to all user specific 
configuration files.

So this feature, will let the user specific configuration files be mounted on 
the driver and executor pods' SPARK_CONF_DIR.

Please review the attached design doc, for more details.

 

[https://docs.google.com/document/d/1DUmNqMz5ky55yfegdh4e_CeItM_nqtrglFqFxsTxeeA/edit?usp=sharing]

 


> Propagate SPARK_CONF_DIR files to driver and exec pods.
> ---
>
> Key: SPARK-30985
> URL: https://issues.apache.org/jira/browse/SPARK-30985
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> SPARK_CONF_DIR hosts configuration files like, 
>  1) spark-defaults.conf - containing all the spark properties.
>  2) log4j.properties - Logger configuration.
>  3) spark-env.sh - Environment variables to be setup at driver and executor.
>  4) core-site.xml - Hadoop related configuration.
>  5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
>  6) metrics.properties - Spark metrics.
>  7) Any user specific - library or framework specific configuration file.
> Traditionally, SPARK_CONF_DIR has been the home to all user specific 
> configuration files.
> So this feature, will let the user specific configuration files be mounted on 
> the driver and executor pods' SPARK_CONF_DIR.
> Please review the attached design doc, for more details.
>  
> [Google docs link|https://bit.ly/spark-30985]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30985) Propagate SPARK_CONF_DIR files to driver and exec pods.

2020-07-13 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-30985:

Component/s: (was: Spark Core)

> Propagate SPARK_CONF_DIR files to driver and exec pods.
> ---
>
> Key: SPARK-30985
> URL: https://issues.apache.org/jira/browse/SPARK-30985
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> SPARK_CONF_DIR hosts configuration files like, 
>  1) spark-defaults.conf - containing all the spark properties.
>  2) log4j.properties - Logger configuration.
>  3) spark-env.sh - Environment variables to be setup at driver and executor.
>  4) core-site.xml - Hadoop related configuration.
>  5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
>  6) metrics.properties - Spark metrics.
>  7) Any user specific - library or framework specific configuration file.
> Traditionally, SPARK_CONF_DIR has been the home to all user specific 
> configuration files.
> So this feature, will let the user specific configuration files be mounted on 
> the driver and executor pods' SPARK_CONF_DIR.
> Please review the attached design doc, for more details.
>  
> [https://docs.google.com/document/d/1DUmNqMz5ky55yfegdh4e_CeItM_nqtrglFqFxsTxeeA/edit?usp=sharing]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32223) Support adding a user provided config map.

2020-07-08 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-32223:
---

 Summary: Support adding a user provided config map.
 Key: SPARK-32223
 URL: https://issues.apache.org/jira/browse/SPARK-32223
 Project: Spark
  Issue Type: Sub-task
  Components: Kubernetes
Affects Versions: 3.1.0
Reporter: Prashant Sharma


The semantics of this will be discussed and added soon.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32222) Add integration tests

2020-07-08 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-32222:
---

 Summary: Add integration tests
 Key: SPARK-32222
 URL: https://issues.apache.org/jira/browse/SPARK-32222
 Project: Spark
  Issue Type: Sub-task
  Components: Kubernetes
Affects Versions: 3.1.0
Reporter: Prashant Sharma


Add an integration test that places a configuration file in SPARK_CONF_DIR and 
verifies it is loaded on the executors in both client and cluster deploy modes. 
For this, a log4j.properties file is a good candidate for testing.
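A rough sketch of the kind of check such a test could run (illustrative only; the real integration test would use the existing k8s integration-test harness rather than a bare spark-shell snippet like this):

{code:scala}
// Illustrative sketch: verify that a file shipped via SPARK_CONF_DIR is visible
// from executor tasks. Not the actual integration-test code.
import java.io.File
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("conf-dir-propagation-check").getOrCreate()
val sc = spark.sparkContext

// Run a small job and check, in each task, that the shipped file is present.
val found = sc.parallelize(1 to sc.defaultParallelism)
  .map { _ =>
    val confDir = sys.env.getOrElse("SPARK_CONF_DIR", "/opt/spark/conf")
    new File(confDir, "log4j.properties").isFile
  }
  .collect()

assert(found.forall(identity), "log4j.properties was not found by every task")
{code}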



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32221) Avoid possible errors due to incorrect file size or type supplied in spark conf.

2020-07-08 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-32221:
---

 Summary: Avoid possible errors due to incorrect file size or type 
supplied in spark conf.
 Key: SPARK-32221
 URL: https://issues.apache.org/jira/browse/SPARK-32221
 Project: Spark
  Issue Type: Sub-task
  Components: Kubernetes
Affects Versions: 3.1.0
Reporter: Prashant Sharma


This would avoid failures in case the files are a bit large or a user places a 
binary file inside the SPARK_CONF_DIR.

Neither of these is supported at the moment.

The reason is that the underlying etcd store limits the size of each entry to 
only 1 MiB. Once etcd is upgraded in all the popular k8s clusters, we can hope 
to overcome this limitation; e.g. the version of etcd described at 
[https://etcd.io/docs/v3.4.0/dev-guide/limit/] allows a higher limit on each entry.

Even if that does not happen, there are other ways to overcome this limitation; 
for example, we can have config files split across multiple configMaps. Those 
need to be discussed and prioritised separately; this issue takes the 
straightforward approach of skipping files that cannot be accommodated within 
the 1 MiB limit and WARNING the user about the same (see the sketch below).
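A minimal sketch of the skip-and-warn approach, assuming a 1 MiB limit (illustrative only; the constant and the println are placeholders and not the actual Spark code):

{code:scala}
// Illustrative sketch: keep only conf files that fit within the etcd entry limit
// and warn about the rest. The 1 MiB constant and the println are placeholders.
import java.io.File

val maxEntryBytes: Long = 1024L * 1024L  // ~1 MiB etcd entry limit

def selectConfFiles(confDir: File): Seq[File] = {
  val files = Option(confDir.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
  val (ok, tooLarge) = files.filter(_.isFile).partition(_.length() <= maxEntryBytes)
  tooLarge.foreach { f =>
    println(s"WARNING: skipping ${f.getName} (${f.length()} bytes), " +
      s"exceeds the $maxEntryBytes byte limit")
  }
  ok
}
{code}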



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30985) Propagate SPARK_CONF_DIR files to driver and exec pods.

2020-07-07 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-30985:

Description: 
SPARK_CONF_DIR hosts configuration files like, 
 1) spark-defaults.conf - containing all the spark properties.
 2) log4j.properties - Logger configuration.
 3) spark-env.sh - Environment variables to be setup at driver and executor.
 4) core-site.xml - Hadoop related configuration.
 5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
 6) metrics.properties - Spark metrics.
 7) Any user specific - library or framework specific configuration file.

Traditionally, SPARK_CONF_DIR has been the home to all user specific 
configuration files.

So this feature, will let the user specific configuration files be mounted on 
the driver and executor pods' SPARK_CONF_DIR.

Please review the attached design doc, for more details.

 

[https://docs.google.com/document/d/1DUmNqMz5ky55yfegdh4e_CeItM_nqtrglFqFxsTxeeA/edit?usp=sharing]

 

  was:
SPARK_CONF_DIR hosts configuration files like, 
 1) spark-defaults.conf - containing all the spark properties.
 2) log4j.properties - Logger configuration.
 3) spark-env.sh - Environment variables to be setup at driver and executor.
 4) core-site.xml - Hadoop related configuration.
 5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
 6) metrics.properties - Spark metrics.
 7) Any user specific - library or framework specific configuration file.

Traditionally, SPARK_CONF_DIR has been the home to all user specific 
configuration files.

So this feature, will let the user specific configuration files be mounted on 
the driver and executor pods' SPARK_CONF_DIR.

Please review the attached design doc, for more details.

 

https://drive.google.com/file/d/1p6gaJyOJdlB1rosJDFner3bj5VekTCJ3/view?usp=sharing


> Propagate SPARK_CONF_DIR files to driver and exec pods.
> ---
>
> Key: SPARK-30985
> URL: https://issues.apache.org/jira/browse/SPARK-30985
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> SPARK_CONF_DIR hosts configuration files like, 
>  1) spark-defaults.conf - containing all the spark properties.
>  2) log4j.properties - Logger configuration.
>  3) spark-env.sh - Environment variables to be setup at driver and executor.
>  4) core-site.xml - Hadoop related configuration.
>  5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
>  6) metrics.properties - Spark metrics.
>  7) Any user specific - library or framework specific configuration file.
> Traditionally, SPARK_CONF_DIR has been the home to all user specific 
> configuration files.
> So this feature, will let the user specific configuration files be mounted on 
> the driver and executor pods' SPARK_CONF_DIR.
> Please review the attached design doc, for more details.
>  
> [https://docs.google.com/document/d/1DUmNqMz5ky55yfegdh4e_CeItM_nqtrglFqFxsTxeeA/edit?usp=sharing]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30985) Propagate SPARK_CONF_DIR files to driver and exec pods.

2020-07-01 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-30985:

Description: 
SPARK_CONF_DIR hosts configuration files like, 
 1) spark-defaults.conf - containing all the spark properties.
 2) log4j.properties - Logger configuration.
 3) spark-env.sh - Environment variables to be setup at driver and executor.
 4) core-site.xml - Hadoop related configuration.
 5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
 6) metrics.properties - Spark metrics.
 7) Any user specific - library or framework specific configuration file.

Traditionally, SPARK_CONF_DIR has been the home to all user specific 
configuration files.

So this feature, will let the user specific configuration files be mounted on 
the driver and executor pods' SPARK_CONF_DIR.

Please review the attached design doc, for more details.

 

https://drive.google.com/file/d/1p6gaJyOJdlB1rosJDFner3bj5VekTCJ3/view?usp=sharing

  was:
SPARK_CONF_DIR hosts configuration files like, 
 1) spark-defaults.conf - containing all the spark properties.
 2) log4j.properties - Logger configuration.
 3) spark-env.sh - Environment variables to be setup at driver and executor.
 4) core-site.xml - Hadoop related configuration.
 5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
 6) metrics.properties - Spark metrics.
 7) Any user specific - library or framework specific configuration file.

Traditionally, SPARK_CONF_DIR has been the home to all user specific 
configuration files.

So this feature, will let the user specific configuration files be mounted on 
the driver and executor pods' SPARK_CONF_DIR.



Please review the attached design doc, for more details.


> Propagate SPARK_CONF_DIR files to driver and exec pods.
> ---
>
> Key: SPARK-30985
> URL: https://issues.apache.org/jira/browse/SPARK-30985
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> SPARK_CONF_DIR hosts configuration files like, 
>  1) spark-defaults.conf - containing all the spark properties.
>  2) log4j.properties - Logger configuration.
>  3) spark-env.sh - Environment variables to be setup at driver and executor.
>  4) core-site.xml - Hadoop related configuration.
>  5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
>  6) metrics.properties - Spark metrics.
>  7) Any user specific - library or framework specific configuration file.
> Traditionally, SPARK_CONF_DIR has been the home to all user specific 
> configuration files.
> So this feature, will let the user specific configuration files be mounted on 
> the driver and executor pods' SPARK_CONF_DIR.
> Please review the attached design doc, for more details.
>  
> https://drive.google.com/file/d/1p6gaJyOJdlB1rosJDFner3bj5VekTCJ3/view?usp=sharing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30985) Propagate SPARK_CONF_DIR files to driver and exec pods.

2020-07-01 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-30985:

Description: 
SPARK_CONF_DIR hosts configuration files like, 
 1) spark-defaults.conf - containing all the spark properties.
 2) log4j.properties - Logger configuration.
 3) spark-env.sh - Environment variables to be setup at driver and executor.
 4) core-site.xml - Hadoop related configuration.
 5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
 6) metrics.properties - Spark metrics.
 7) Any user specific - library or framework specific configuration file.

Traditionally, SPARK_CONF_DIR has been the home to all user specific 
configuration files.

So this feature, will let the user specific configuration files be mounted on 
the driver and executor pods' SPARK_CONF_DIR.



Please review the attached design doc, for more details.

  was:
SPARK_CONF_DIR hosts configuration files like, 
1) spark-defaults.conf - containing all the spark properties.
2) log4j.properties - Logger configuration.
3) spark-env.sh - Environment variables to be setup at driver and executor.
4) core-site.xml - Hadoop related configuration.
5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
6) metrics.properties - Spark metrics.
7) Any user specific - library or framework specific configuration file.

Traditionally, SPARK_CONF_DIR has been the home to all user specific 
configuration files and the default behaviour in the Yarn or standalone mode is 
that these configuration files are copied to the worker nodes as required by 
the users themselves. In other words, they are not auto-copied.

But, in the case of spark on kubernetes, we use spark images and generally 
these images are approved or undergo some kind of standardisation. These files 
cannot be simply copied to the SPARK_CONF_DIR of the running executor and 
driver pods by the user. 

So, at the moment we have special casing for providing each configuration and 
for any other user specific configuration files, the process is more complex, 
i.e. - e.g. one can start with their own custom image of spark with 
configuration files pre installed etc..
Examples of special casing are:
1. Hadoop configuration in spark.kubernetes.hadoop.configMapName
2. Spark-env.sh as in spark.kubernetes.driverEnv.[EnvironmentVariableName]
3. Log4j.properties as in https://github.com/apache/spark/pull/26193
... And for those such special casing does not exist, they are simply out of 
luck.

So this feature, will let the user specific configuration files be mounted on 
the driver and executor pods' SPARK_CONF_DIR.
At the moment it is not clear, if there is a need to, let user specify which 
config files to propagate - to driver and or executor. But, if there is a case 
that feature will be helpful, we can increase the scope of this work or create 
another JIRA issue to track that work.


> Propagate SPARK_CONF_DIR files to driver and exec pods.
> ---
>
> Key: SPARK-30985
> URL: https://issues.apache.org/jira/browse/SPARK-30985
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> SPARK_CONF_DIR hosts configuration files like, 
>  1) spark-defaults.conf - containing all the spark properties.
>  2) log4j.properties - Logger configuration.
>  3) spark-env.sh - Environment variables to be setup at driver and executor.
>  4) core-site.xml - Hadoop related configuration.
>  5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
>  6) metrics.properties - Spark metrics.
>  7) Any user specific - library or framework specific configuration file.
> Traditionally, SPARK_CONF_DIR has been the home to all user specific 
> configuration files.
> So this feature, will let the user specific configuration files be mounted on 
> the driver and executor pods' SPARK_CONF_DIR.
> Please review the attached design doc, for more details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30466) remove dependency on jackson-mapper-asl-1.9.13 and jackson-core-asl-1.9.13

2020-06-24 Thread Prashant Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143607#comment-17143607
 ] 

Prashant Sharma commented on SPARK-30466:
-

https://issues.apache.org/jira/browse/HADOOP-15984 should update Hadoop to the 
latest Jersey; this will give us a Hadoop release without 
jackson-mapper-asl-1.9.13 and jackson-core-asl-1.9.13.

> remove dependency on jackson-mapper-asl-1.9.13 and jackson-core-asl-1.9.13
> --
>
> Key: SPARK-30466
> URL: https://issues.apache.org/jira/browse/SPARK-30466
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Michael Burgener
>Priority: Major
>  Labels: security
>
> These 2 libraries are deprecated and replaced by the jackson-databind 
> libraries which are already included.  These two libraries are flagged by our 
> vulnerability scanners as having the following security vulnerabilities.  
> I've set the priority to Major due to the Critical nature and hopefully they 
> can be addressed quickly.  Please note, I'm not a developer but work in 
> InfoSec and this was flagged when we incorporated spark into our product.  If 
> you feel the priority is not set correctly please change accordingly.  I'll 
> watch the issue and flag our dev team to update once resolved.  
> jackson-mapper-asl-1.9.13
> CVE-2018-7489 (CVSS 3.0 Score 9.8 CRITICAL)
> [https://nvd.nist.gov/vuln/detail/CVE-2018-7489] 
>  
> CVE-2017-7525 (CVSS 3.0 Score 9.8 CRITICAL)
> [https://nvd.nist.gov/vuln/detail/CVE-2017-7525]
>  
> CVE-2017-17485 (CVSS 3.0 Score 9.8 CRITICAL)
> [https://nvd.nist.gov/vuln/detail/CVE-2017-17485]
>  
> CVE-2017-15095 (CVSS 3.0 Score 9.8 CRITICAL)
> [https://nvd.nist.gov/vuln/detail/CVE-2017-15095]
>  
> CVE-2018-5968 (CVSS 3.0 Score 8.1 High)
> [https://nvd.nist.gov/vuln/detail/CVE-2018-5968]
>  
> jackson-core-asl-1.9.13
> CVE-2016-7051 (CVSS 3.0 Score 8.6 High)
> https://nvd.nist.gov/vuln/detail/CVE-2016-7051



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30466) remove dependency on jackson-mapper-asl-1.9.13 and jackson-core-asl-1.9.13

2020-06-24 Thread Prashant Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143569#comment-17143569
 ] 

Prashant Sharma commented on SPARK-30466:
-

I just saw that Hadoop 3.2.1 still uses these jars (jackson-mapper-asl-1.9.13 
and jackson-core-asl-1.9.13); they are a transitive dependency via jersey-json. 
See below.
{code:java}
[INFO] org.apache.hadoop:hadoop-common:jar:3.2.1
[INFO] +- org.apache.hadoop:hadoop-annotations:jar:3.2.1:compile
[INFO] |  \- jdk.tools:jdk.tools:jar:1.8:system
[INFO] +- com.google.guava:guava:jar:27.0-jre:compile
[INFO] |  +- com.google.guava:failureaccess:jar:1.0:compile
[INFO] |  +- 
com.google.guava:listenablefuture:jar:.0-empty-to-avoid-conflict-with-guava:compile
[INFO] |  +- org.checkerframework:checker-qual:jar:2.5.2:compile
[INFO] |  +- com.google.errorprone:error_prone_annotations:jar:2.2.0:compile
[INFO] |  +- com.google.j2objc:j2objc-annotations:jar:1.1:compile
[INFO] |  \- org.codehaus.mojo:animal-sniffer-annotations:jar:1.17:compile
[INFO] +- commons-cli:commons-cli:jar:1.2:compile
[INFO] +- org.apache.commons:commons-math3:jar:3.1.1:compile
[INFO] +- org.apache.httpcomponents:httpclient:jar:4.5.6:compile
[INFO] |  \- org.apache.httpcomponents:httpcore:jar:4.4.10:compile
[INFO] +- commons-codec:commons-codec:jar:1.11:compile
[INFO] +- commons-io:commons-io:jar:2.5:compile
[INFO] +- commons-net:commons-net:jar:3.6:compile
[INFO] +- commons-collections:commons-collections:jar:3.2.2:compile
[INFO] +- javax.servlet:javax.servlet-api:jar:3.1.0:compile
[INFO] +- org.eclipse.jetty:jetty-server:jar:9.3.24.v20180605:compile
[INFO] |  +- org.eclipse.jetty:jetty-http:jar:9.3.24.v20180605:compile
[INFO] |  \- org.eclipse.jetty:jetty-io:jar:9.3.24.v20180605:compile
[INFO] +- org.eclipse.jetty:jetty-util:jar:9.3.24.v20180605:compile
[INFO] +- org.eclipse.jetty:jetty-servlet:jar:9.3.24.v20180605:compile
[INFO] |  \- org.eclipse.jetty:jetty-security:jar:9.3.24.v20180605:compile
[INFO] +- org.eclipse.jetty:jetty-webapp:jar:9.3.24.v20180605:compile
[INFO] |  \- org.eclipse.jetty:jetty-xml:jar:9.3.24.v20180605:compile
[INFO] +- org.eclipse.jetty:jetty-util-ajax:jar:9.3.24.v20180605:test
[INFO] +- javax.servlet.jsp:jsp-api:jar:2.1:runtime
[INFO] +- com.sun.jersey:jersey-core:jar:1.19:compile
[INFO] |  \- javax.ws.rs:jsr311-api:jar:1.1.1:compile
[INFO] +- com.sun.jersey:jersey-servlet:jar:1.19:compile
[INFO] +- com.sun.jersey:jersey-json:jar:1.19:compile
[INFO] |  +- org.codehaus.jettison:jettison:jar:1.1:compile
[INFO] |  +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile
[INFO] |  |  \- javax.xml.bind:jaxb-api:jar:2.2.11:compile
[INFO] |  +- org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile
[INFO] |  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile
[INFO] |  +- org.codehaus.jackson:jackson-jaxrs:jar:1.9.13:compile
[INFO] |  \- org.codehaus.jackson:jackson-xc:jar:1.9.13:compile
[INFO] +- com.sun.jersey:jersey-server:jar:1.19:compile

{code}
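
A quick way to confirm which path pulls in the legacy codehaus jars for any 
given module is to filter the dependency tree down to that group id; a small 
sketch, run from the module of interest:
{code}
# Show only the dependency paths that end in the legacy codehaus jackson jars.
mvn dependency:tree -Dincludes=org.codehaus.jackson
{code}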

> remove dependency on jackson-mapper-asl-1.9.13 and jackson-core-asl-1.9.13
> --
>
> Key: SPARK-30466
> URL: https://issues.apache.org/jira/browse/SPARK-30466
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Michael Burgener
>Priority: Major
>  Labels: security
>
> These 2 libraries are deprecated and replaced by the jackson-databind 
> libraries which are already included.  These two libraries are flagged by our 
> vulnerability scanners as having the following security vulnerabilities.  
> I've set the priority to Major due to the Critical nature and hopefully they 
> can be addressed quickly.  Please note, I'm not a developer but work in 
> InfoSec and this was flagged when we incorporated spark into our product.  If 
> you feel the priority is not set correctly please change accordingly.  I'll 
> watch the issue and flag our dev team to update once resolved.  
> jackson-mapper-asl-1.9.13
> CVE-2018-7489 (CVSS 3.0 Score 9.8 CRITICAL)
> [https://nvd.nist.gov/vuln/detail/CVE-2018-7489] 
>  
> CVE-2017-7525 (CVSS 3.0 Score 9.8 CRITICAL)
> [https://nvd.nist.gov/vuln/detail/CVE-2017-7525]
>  
> CVE-2017-17485 (CVSS 3.0 Score 9.8 CRITICAL)
> [https://nvd.nist.gov/vuln/detail/CVE-2017-17485]
>  
> CVE-2017-15095 (CVSS 3.0 Score 9.8 CRITICAL)
> [https://nvd.nist.gov/vuln/detail/CVE-2017-15095]
>  
> CVE-2018-5968 (CVSS 3.0 Score 8.1 High)
> [https://nvd.nist.gov/vuln/detail/CVE-2018-5968]
>  
> jackson-core-asl-1.9.13
> CVE-2016-7051 (CVSS 3.0 Score 8.6 High)
> https://nvd.nist.gov/vuln/detail/CVE-2016-7051



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-31994) Docker image should use `https` urls for only mirrors that support it(SSL)

2020-06-15 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-31994:

Summary: Docker image should use `https` urls for only mirrors that support 
it(SSL)  (was: Docker image should use `https` urls for only deb.debian.org 
mirrors.)

> Docker image should use `https` urls for only mirrors that support it(SSL)
> --
>
> Key: SPARK-31994
> URL: https://issues.apache.org/jira/browse/SPARK-31994
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> It appears that security.debian.org does not support https.
> {code}
> curl https://security.debian.org
> curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 
> security.debian.org:443 
> {code}
> While building the image, it fails in the following way.
> {code}
> MacBook-Pro:spark prashantsharma$ bin/docker-image-tool.sh -r scrapcodes -t 
> v3.1.0-1 build
> Sending build context to Docker daemon  222.1MB
> Step 1/18 : ARG java_image_tag=8-jre-slim
> Step 2/18 : FROM openjdk:${java_image_tag}
>  ---> 381b20190cf7
> Step 3/18 : ARG spark_uid=185
>  ---> Using cache
>  ---> 65c06f86753c
> Step 4/18 : RUN set -ex && sed -i 's/http:/https:/g' 
> /etc/apt/sources.list && apt-get update && ln -s /lib /lib64 && 
> apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps && 
> mkdir -p /opt/spark && mkdir -p /opt/spark/examples && mkdir -p 
> /opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && 
> ln -sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> 
> /etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && 
> rm -rf /var/cache/apt/*
>  ---> Running in a3461dadd6eb
> + sed -i s/http:/https:/g /etc/apt/sources.list
> + apt-get update
> Ign:1 https://security.debian.org/debian-security buster/updates InRelease
> Err:2 https://security.debian.org/debian-security buster/updates Release
>   Could not handshake: The TLS connection was non-properly terminated. [IP: 
> 151.101.0.204 443]
> Get:3 https://deb.debian.org/debian buster InRelease [121 kB]
> Get:4 https://deb.debian.org/debian buster-updates InRelease [51.9 kB]
> Get:5 https://deb.debian.org/debian buster/main amd64 Packages [7905 kB]
> Get:6 https://deb.debian.org/debian buster-updates/main amd64 Packages [7868 
> B]
> Reading package lists...
> E: The repository 'https://security.debian.org/debian-security buster/updates 
> Release' does not have a Release file.
> The command '/bin/sh -c set -ex && sed -i 's/http:/https:/g' 
> /etc/apt/sources.list && apt-get update && ln -s /lib /lib64 && 
> apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps && 
> mkdir -p /opt/spark && mkdir -p /opt/spark/examples && mkdir -p 
> /opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && 
> ln -sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> 
> /etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && 
> rm -rf /var/cache/apt/*' returned a non-zero code: 100
> Failed to build Spark JVM Docker image, please refer to Docker build output 
> for details.
> {code}
> So, if we limit the https support to only deb.debian.org, that does the trick.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31994) Docker image should use `https` urls for only deb.debian.org mirrors.

2020-06-15 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-31994:
---

 Summary: Docker image should use `https` urls for only 
deb.debian.org mirrors.
 Key: SPARK-31994
 URL: https://issues.apache.org/jira/browse/SPARK-31994
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 3.0.0, 3.1.0
Reporter: Prashant Sharma


It appears that security.debian.org does not support https.
{code}
curl https://security.debian.org
curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 
security.debian.org:443 
{code}

While building the image, it fails in the following way.
{code}
MacBook-Pro:spark prashantsharma$ bin/docker-image-tool.sh -r scrapcodes -t 
v3.1.0-1 build
Sending build context to Docker daemon  222.1MB
Step 1/18 : ARG java_image_tag=8-jre-slim
Step 2/18 : FROM openjdk:${java_image_tag}
 ---> 381b20190cf7
Step 3/18 : ARG spark_uid=185
 ---> Using cache
 ---> 65c06f86753c
Step 4/18 : RUN set -ex && sed -i 's/http:/https:/g' /etc/apt/sources.list 
&& apt-get update && ln -s /lib /lib64 && apt install -y bash tini 
libc6 libpam-modules krb5-user libnss3 procps && mkdir -p /opt/spark && 
mkdir -p /opt/spark/examples && mkdir -p /opt/spark/work-dir && touch 
/opt/spark/RELEASE && rm /bin/sh && ln -sv /bin/bash /bin/sh && 
echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && chgrp root 
/etc/passwd && chmod ug+rw /etc/passwd && rm -rf /var/cache/apt/*
 ---> Running in a3461dadd6eb
+ sed -i s/http:/https:/g /etc/apt/sources.list
+ apt-get update
Ign:1 https://security.debian.org/debian-security buster/updates InRelease
Err:2 https://security.debian.org/debian-security buster/updates Release
  Could not handshake: The TLS connection was non-properly terminated. [IP: 
151.101.0.204 443]
Get:3 https://deb.debian.org/debian buster InRelease [121 kB]
Get:4 https://deb.debian.org/debian buster-updates InRelease [51.9 kB]
Get:5 https://deb.debian.org/debian buster/main amd64 Packages [7905 kB]
Get:6 https://deb.debian.org/debian buster-updates/main amd64 Packages [7868 B]
Reading package lists...
E: The repository 'https://security.debian.org/debian-security buster/updates 
Release' does not have a Release file.
The command '/bin/sh -c set -ex && sed -i 's/http:/https:/g' 
/etc/apt/sources.list && apt-get update && ln -s /lib /lib64 && apt 
install -y bash tini libc6 libpam-modules krb5-user libnss3 procps && mkdir 
-p /opt/spark && mkdir -p /opt/spark/examples && mkdir -p 
/opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && ln 
-sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> 
/etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && 
rm -rf /var/cache/apt/*' returned a non-zero code: 100
Failed to build Spark JVM Docker image, please refer to Docker build output for 
details.
{code}

So, if we limit the https support to only deb.debian.org, that does the trick.
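
A minimal sketch of the narrower substitution being proposed here: only 
deb.debian.org is switched to https, while security.debian.org stays on plain 
http (the exact Dockerfile change may differ).
{code}
# Rewrite only the mirror that is known to serve https properly.
sed -i 's|http://deb.debian.org|https://deb.debian.org|g' /etc/apt/sources.list
apt-get update
{code}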



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31371) FileStreamSource: Decide seen files on the checksum, instead of filename.

2020-04-29 Thread Prashant Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095286#comment-17095286
 ] 

Prashant Sharma commented on SPARK-31371:
-

Thank you for responding. As you said, a link to the JIRA or discussion would 
be helpful. My hope is that something could be done - can we hide this behind a 
flag? What happens if a file that was processed earlier is changed? Can we 
process the file as a fresh input each time it is updated (or its checksum 
changes)?

> FileStreamSource: Decide seen files on the checksum, instead of filename.
> -
>
> Key: SPARK-31371
> URL: https://issues.apache.org/jira/browse/SPARK-31371
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.4.5, 3.0.0
>Reporter: Prashant Sharma
>Priority: Major
>
> At the moment, structured streaming's file source ignores updates to a file 
> it has already processed. However, for reasons beyond our control, a piece of 
> software might update the same file with new data. A case in point is rolling 
> logs, where the latest log file is always e.g. log.txt and the rolled logs 
> are log-1.txt etc.
> So supporting this may not actually be special casing, but supporting a 
> genuine use case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-30467) On Federal Information Processing Standard (FIPS) enabled cluster, Spark Workers are not able to connect to Remote Master.

2020-04-22 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma resolved SPARK-30467.
-
Resolution: Cannot Reproduce

> On Federal Information Processing Standard (FIPS) enabled cluster, Spark 
> Workers are not able to connect to Remote Master.
> --
>
> Key: SPARK-30467
> URL: https://issues.apache.org/jira/browse/SPARK-30467
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 2.3.3, 2.3.4, 2.4.4
>Reporter: SHOBHIT SHUKLA
>Priority: Major
>  Labels: security
>
> On _*Federal Information Processing Standard*_ (FIPS) enabled clusters, if we 
> configure *spark.network.crypto.enabled true*, Spark Workers are not able to 
> create a Spark Context, because communication between the Spark Worker and 
> the Spark Master fails.
> The default algorithm ( *_spark.network.crypto.keyFactoryAlgorithm_* ) is set 
> to *_PBKDF2WithHmacSHA1_*, which is a non-approved cryptographic algorithm. 
> We tried many values from the FIPS-approved cryptographic algorithms, but 
> those did not work either.
> *Error logs :*
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> *fips.c(145): OpenSSL internal error, assertion failed: FATAL FIPS SELFTEST 
> FAILURE*
> JVMDUMP039I Processing dump event "abort", detail "" at 2020/01/09 06:41:50 - 
> please wait.
> JVMDUMP032I JVM requested System dump using 
> '/bin/core.20200109.064150.283.0001.dmp' in response to an event
> JVMDUMP030W Cannot write dump to 
> file/bin/core.20200109.064150.283.0001.dmp: Permission denied
> JVMDUMP012E Error in System dump: The core file created by child process with 
> pid = 375 was not found. Expected to find core file with name 
> "/var/cores/core-netty-rpc-conne-sig11-user1000320999-group0-pid375-time*"
> JVMDUMP030W Cannot write dump to file 
> /bin/javacore.20200109.064150.283.0002.txt: Permission denied
> JVMDUMP032I JVM requested Java dump using 
> '/tmp/javacore.20200109.064150.283.0002.txt' in response to an event
> JVMDUMP010I Java dump written to /tmp/javacore.20200109.064150.283.0002.txt
> JVMDUMP032I JVM requested Snap dump using 
> '/bin/Snap.20200109.064150.283.0003.trc' in response to an event
> JVMDUMP030W Cannot write dump to file 
> /bin/Snap.20200109.064150.283.0003.trc: Permission denied
> JVMDUMP010I Snap dump written to /tmp/Snap.20200109.064150.283.0003.trc
> JVMDUMP030W Cannot write dump to file 
> /bin/jitdump.20200109.064150.283.0004.dmp: Permission denied
> JVMDUMP007I JVM Requesting JIT dump using 
> '/tmp/jitdump.20200109.064150.283.0004.dmp'
> JVMDUMP010I JIT dump written to /tmp/jitdump.20200109.064150.283.0004.dmp
> JVMDUMP013I Processed dump event "abort", detail "".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30467) On Federal Information Processing Standard (FIPS) enabled cluster, Spark Workers are not able to connect to Remote Master.

2020-04-22 Thread Prashant Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17089517#comment-17089517
 ] 

Prashant Sharma commented on SPARK-30467:
-

As already mentioned here, this is not a Spark issue; it is an issue related to 
the JVM in use. I was able to run various FIPS-compliant configurations [Blog 
link|https://github.com/ScrapCodes/FIPS-compliance/blob/master/blogs/spark-meets-fips.md].

Based on this, I am closing this issue as cannot reproduce. Feel free to reopen 
if you can give us more detail and a way to reproduce.
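
For anyone experimenting along the same lines, the configuration being 
discussed looks roughly like the sketch below. The key factory algorithm shown 
(PBKDF2WithHmacSHA512) is only an example value and must be one that the 
FIPS-enabled JVM in use actually provides; the SparkPi invocation is likewise 
just a placeholder workload.
{code}
# Sketch only: enable RPC encryption and pick a key factory algorithm other
# than the default PBKDF2WithHmacSHA1. The algorithm name below is an example,
# not a recommendation; it must be supported by your (FIPS-enabled) JVM.
./bin/spark-submit \
  --conf spark.network.crypto.enabled=true \
  --conf spark.network.crypto.keyFactoryAlgorithm=PBKDF2WithHmacSHA512 \
  --class org.apache.spark.examples.SparkPi \
  examples/jars/spark-examples_*.jar 100
{code}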

> On Federal Information Processing Standard (FIPS) enabled cluster, Spark 
> Workers are not able to connect to Remote Master.
> --
>
> Key: SPARK-30467
> URL: https://issues.apache.org/jira/browse/SPARK-30467
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 2.3.3, 2.3.4, 2.4.4
>Reporter: SHOBHIT SHUKLA
>Priority: Major
>  Labels: security
>
> On _*Federal Information Processing Standard*_ (FIPS) enabled clusters, if we 
> configure *spark.network.crypto.enabled true*, Spark Workers are not able to 
> create a Spark Context, because communication between the Spark Worker and 
> the Spark Master fails.
> The default algorithm ( *_spark.network.crypto.keyFactoryAlgorithm_* ) is set 
> to *_PBKDF2WithHmacSHA1_*, which is a non-approved cryptographic algorithm. 
> We tried many values from the FIPS-approved cryptographic algorithms, but 
> those did not work either.
> *Error logs :*
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> *fips.c(145): OpenSSL internal error, assertion failed: FATAL FIPS SELFTEST 
> FAILURE*
> JVMDUMP039I Processing dump event "abort", detail "" at 2020/01/09 06:41:50 - 
> please wait.
> JVMDUMP032I JVM requested System dump using 
> '/bin/core.20200109.064150.283.0001.dmp' in response to an event
> JVMDUMP030W Cannot write dump to 
> file/bin/core.20200109.064150.283.0001.dmp: Permission denied
> JVMDUMP012E Error in System dump: The core file created by child process with 
> pid = 375 was not found. Expected to find core file with name 
> "/var/cores/core-netty-rpc-conne-sig11-user1000320999-group0-pid375-time*"
> JVMDUMP030W Cannot write dump to file 
> /bin/javacore.20200109.064150.283.0002.txt: Permission denied
> JVMDUMP032I JVM requested Java dump using 
> '/tmp/javacore.20200109.064150.283.0002.txt' in response to an event
> JVMDUMP010I Java dump written to /tmp/javacore.20200109.064150.283.0002.txt
> JVMDUMP032I JVM requested Snap dump using 
> '/bin/Snap.20200109.064150.283.0003.trc' in response to an event
> JVMDUMP030W Cannot write dump to file 
> /bin/Snap.20200109.064150.283.0003.trc: Permission denied
> JVMDUMP010I Snap dump written to /tmp/Snap.20200109.064150.283.0003.trc
> JVMDUMP030W Cannot write dump to file 
> /bin/jitdump.20200109.064150.283.0004.dmp: Permission denied
> JVMDUMP007I JVM Requesting JIT dump using 
> '/tmp/jitdump.20200109.064150.283.0004.dmp'
> JVMDUMP010I JIT dump written to /tmp/jitdump.20200109.064150.283.0004.dmp
> JVMDUMP013I Processed dump event "abort", detail "".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31371) FileStreamSource: Decide seen files on the checksum, instead of filename.

2020-04-07 Thread Prashant Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077048#comment-17077048
 ] 

Prashant Sharma commented on SPARK-31371:
-

[~tdas] What do you think?

> FileStreamSource: Decide seen files on the checksum, instead of filename.
> -
>
> Key: SPARK-31371
> URL: https://issues.apache.org/jira/browse/SPARK-31371
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.4.5, 3.0.0
>Reporter: Prashant Sharma
>Priority: Major
>
> At the moment, structured streaming's file source ignores updates to a file 
> it has already processed. However, for reasons beyond our control, a piece of 
> software might update the same file with new data. A case in point is rolling 
> logs, where the latest log file is always e.g. log.txt and the rolled logs 
> are log-1.txt etc.
> So supporting this may not actually be special casing, but supporting a 
> genuine use case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31371) FileStreamSource: Decide seen files on the checksum, instead of filename.

2020-04-07 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-31371:
---

 Summary: FileStreamSource: Decide seen files on the checksum, 
instead of filename.
 Key: SPARK-31371
 URL: https://issues.apache.org/jira/browse/SPARK-31371
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 2.4.5, 3.0.0
Reporter: Prashant Sharma


At the moment, structured streaming's file source ignores updates to a file it 
has already processed. However, for reasons beyond our control, a piece of 
software might update the same file with new data. A case in point is rolling 
logs, where the latest log file is always e.g. log.txt and the rolled logs are 
log-1.txt etc.
So supporting this may not actually be special casing, but supporting a genuine 
use case.
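
To make the scenario concrete, here is a small shell sketch of the rolling-log 
pattern described above (the paths are made up for illustration). The second 
write to log.txt reuses a path that has already been seen, so a file source 
keyed on the filename will not pick up the new content:
{code}
mkdir -p /tmp/stream-input
echo "batch 1 events" > /tmp/stream-input/log.txt         # discovered and processed once
mv /tmp/stream-input/log.txt /tmp/stream-input/log-1.txt  # log rolled to a new path
echo "batch 2 events" > /tmp/stream-input/log.txt         # same path as before: treated as already seen
{code}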




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31200) Docker image build fails with Mirror sync in progress? errors.

2020-03-20 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-31200:

Description: 
The following errors would appear while building the Spark Docker image, in 
spite of trying to switch between various mirrors.

{code}
bash-3.2$ bin/docker-image-tool.sh -r scrapcodes -t v3.1.0-f1cc86 build
Sending build context to Docker daemon  203.4MB
Step 1/18 : ARG java_image_tag=8-jre-slim
Step 2/18 : FROM openjdk:${java_image_tag}
 ---> 381b20190cf7
Step 3/18 : ARG spark_uid=185
 ---> Using cache
 ---> 65c06f86753c
Step 4/18 : RUN set -ex && apt-get update && ln -s /lib /lib64 && 
apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps && 
mkdir -p /opt/spark && mkdir -p /opt/spark/examples && mkdir -p 
/opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && ln 
-sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> 
/etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && 
rm -rf /var/cache/apt/*
 ---> Running in 96bcbe927d35
+ apt-get update
Get:1 http://deb.debian.org/debian buster InRelease [122 kB]
Get:2 http://deb.debian.org/debian buster-updates InRelease [49.3 kB]
Get:3 http://deb.debian.org/debian buster/main amd64 Packages [7907 kB]
Err:3 http://deb.debian.org/debian buster/main amd64 Packages
  File has unexpected size (13217 != 7906744). Mirror sync in progress? [IP: 
151.101.10.133 80]
  Hashes of expected file:
   - Filesize:7906744 [weak]
   - SHA256:80ed5d1cc1f31a568b77e4fadfd9e01fa4d65e951243fd2ce29eee14d4b532cc
   - MD5Sum:80b6d9c1b6630b2234161e42f4040ab3 [weak]
  Release file created at: Sat, 08 Feb 2020 10:57:10 +
Get:5 http://deb.debian.org/debian buster-updates/main amd64 Packages [7380 B]
Err:5 http://deb.debian.org/debian buster-updates/main amd64 Packages
  File has unexpected size (13233 != 7380). Mirror sync in progress? [IP: 
151.101.10.133 80]
  Hashes of expected file:
   - Filesize:7380 [weak]
   - SHA256:6af9ea081b6a3da33cfaf76a81978517f65d38e45230089a5612e56f2b6b789d
  Release file created at: Fri, 20 Mar 2020 02:28:11 +
Get:4 http://security-cdn.debian.org/debian-security buster/updates InRelease 
[65.4 kB]
Get:6 http://security-cdn.debian.org/debian-security buster/updates/main amd64 
Packages [183 kB]
Fetched 419 kB in 1s (327 kB/s)
Reading package lists...
E: Failed to fetch 
http://deb.debian.org/debian/dists/buster/main/binary-amd64/by-hash/SHA256/80ed5d1cc1f31a568b77e4fadfd9e01fa4d65e951243fd2ce29eee14d4b532cc
  File has unexpected size (13217 != 7906744). Mirror sync in progress? [IP: 
151.101.10.133 80]
   Hashes of expected file:
- Filesize:7906744 [weak]
- SHA256:80ed5d1cc1f31a568b77e4fadfd9e01fa4d65e951243fd2ce29eee14d4b532cc
- MD5Sum:80b6d9c1b6630b2234161e42f4040ab3 [weak]
   Release file created at: Sat, 08 Feb 2020 10:57:10 +
E: Failed to fetch 
http://deb.debian.org/debian/dists/buster-updates/main/binary-amd64/by-hash/SHA256/6af9ea081b6a3da33cfaf76a81978517f65d38e45230089a5612e56f2b6b789d
  File has unexpected size (13233 != 7380). Mirror sync in progress? [IP: 
151.101.10.133 80]
   Hashes of expected file:
- Filesize:7380 [weak]
- SHA256:6af9ea081b6a3da33cfaf76a81978517f65d38e45230089a5612e56f2b6b789d
   Release file created at: Fri, 20 Mar 2020 02:28:11 +
E: Some index files failed to download. They have been ignored, or old ones 
used instead.
The command '/bin/sh -c set -ex && apt-get update && ln -s /lib /lib64 
&& apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps 
&& mkdir -p /opt/spark && mkdir -p /opt/spark/examples && mkdir -p 
/opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && ln 
-sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> 
/etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && 
rm -rf /var/cache/apt/*' returned a non-zero code: 100
Failed to build Spark JVM Docker image, please refer to Docker build output for 
details.

{code}

Soon, I found that it was due to my ISP (since we are working from home now!) 
trying to intercept http traffic, so it may not happen to others. But just in 
case someone is hit by such errors: changing the mirrors from http to https 
helped resolve it.

  was:
Somehow I was having trouble with Debian mirrors while building the spark 
image. Following errors would appear, in spite of trying to switch between 
various mirrors.
{code}
bash-3.2$ bin/docker-image-tool.sh -r scrapcodes -t v3.1.0-f1cc86 build
Sending build context to Docker daemon  203.4MB
Step 1/18 : ARG java_image_tag=8-jre-slim
Step 2/18 : FROM openjdk:${java_image_tag}
 ---> 381b20190cf7
Step 3/18 : ARG spark_uid=185
 ---> Using cache
 ---> 65c06f86753c
Step 4/18 : RUN set -ex && apt-get update && ln -s /lib /lib64 && 
apt install -y bash tini libc6 libpam-modules 

[jira] [Created] (SPARK-31200) Docker image build fails with Mirror sync in progress? errors.

2020-03-20 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-31200:
---

 Summary: Docker image build fails with Mirror sync in progress? 
errors.
 Key: SPARK-31200
 URL: https://issues.apache.org/jira/browse/SPARK-31200
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.1.0
Reporter: Prashant Sharma


Somehow I was having trouble with Debian mirrors while building the Spark 
image. The following errors would appear, in spite of trying to switch between 
various mirrors.
{code}
bash-3.2$ bin/docker-image-tool.sh -r scrapcodes -t v3.1.0-f1cc86 build
Sending build context to Docker daemon  203.4MB
Step 1/18 : ARG java_image_tag=8-jre-slim
Step 2/18 : FROM openjdk:${java_image_tag}
 ---> 381b20190cf7
Step 3/18 : ARG spark_uid=185
 ---> Using cache
 ---> 65c06f86753c
Step 4/18 : RUN set -ex && apt-get update && ln -s /lib /lib64 && 
apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps && 
mkdir -p /opt/spark && mkdir -p /opt/spark/examples && mkdir -p 
/opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && ln 
-sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> 
/etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && 
rm -rf /var/cache/apt/*
 ---> Running in 96bcbe927d35
+ apt-get update
Get:1 http://deb.debian.org/debian buster InRelease [122 kB]
Get:2 http://deb.debian.org/debian buster-updates InRelease [49.3 kB]
Get:3 http://deb.debian.org/debian buster/main amd64 Packages [7907 kB]
Err:3 http://deb.debian.org/debian buster/main amd64 Packages
  File has unexpected size (13217 != 7906744). Mirror sync in progress? [IP: 
151.101.10.133 80]
  Hashes of expected file:
   - Filesize:7906744 [weak]
   - SHA256:80ed5d1cc1f31a568b77e4fadfd9e01fa4d65e951243fd2ce29eee14d4b532cc
   - MD5Sum:80b6d9c1b6630b2234161e42f4040ab3 [weak]
  Release file created at: Sat, 08 Feb 2020 10:57:10 +
Get:5 http://deb.debian.org/debian buster-updates/main amd64 Packages [7380 B]
Err:5 http://deb.debian.org/debian buster-updates/main amd64 Packages
  File has unexpected size (13233 != 7380). Mirror sync in progress? [IP: 
151.101.10.133 80]
  Hashes of expected file:
   - Filesize:7380 [weak]
   - SHA256:6af9ea081b6a3da33cfaf76a81978517f65d38e45230089a5612e56f2b6b789d
  Release file created at: Fri, 20 Mar 2020 02:28:11 +
Get:4 http://security-cdn.debian.org/debian-security buster/updates InRelease 
[65.4 kB]
Get:6 http://security-cdn.debian.org/debian-security buster/updates/main amd64 
Packages [183 kB]
Fetched 419 kB in 1s (327 kB/s)
Reading package lists...
E: Failed to fetch 
http://deb.debian.org/debian/dists/buster/main/binary-amd64/by-hash/SHA256/80ed5d1cc1f31a568b77e4fadfd9e01fa4d65e951243fd2ce29eee14d4b532cc
  File has unexpected size (13217 != 7906744). Mirror sync in progress? [IP: 
151.101.10.133 80]
   Hashes of expected file:
- Filesize:7906744 [weak]
- SHA256:80ed5d1cc1f31a568b77e4fadfd9e01fa4d65e951243fd2ce29eee14d4b532cc
- MD5Sum:80b6d9c1b6630b2234161e42f4040ab3 [weak]
   Release file created at: Sat, 08 Feb 2020 10:57:10 +
E: Failed to fetch 
http://deb.debian.org/debian/dists/buster-updates/main/binary-amd64/by-hash/SHA256/6af9ea081b6a3da33cfaf76a81978517f65d38e45230089a5612e56f2b6b789d
  File has unexpected size (13233 != 7380). Mirror sync in progress? [IP: 
151.101.10.133 80]
   Hashes of expected file:
- Filesize:7380 [weak]
- SHA256:6af9ea081b6a3da33cfaf76a81978517f65d38e45230089a5612e56f2b6b789d
   Release file created at: Fri, 20 Mar 2020 02:28:11 +
E: Some index files failed to download. They have been ignored, or old ones 
used instead.
The command '/bin/sh -c set -ex && apt-get update && ln -s /lib /lib64 
&& apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps 
&& mkdir -p /opt/spark && mkdir -p /opt/spark/examples && mkdir -p 
/opt/spark/work-dir && touch /opt/spark/RELEASE && rm /bin/sh && ln 
-sv /bin/bash /bin/sh && echo "auth required pam_wheel.so use_uid" >> 
/etc/pam.d/su && chgrp root /etc/passwd && chmod ug+rw /etc/passwd && 
rm -rf /var/cache/apt/*' returned a non-zero code: 100
Failed to build Spark JVM Docker image, please refer to Docker build output for 
details.

{code}

Soon, I found that it was due to my ISP (since we are working from home now!) 
trying to intercept http traffic, so it may not happen to others. But just in 
case someone is hit by such errors: changing the mirrors from http to https 
helped resolve it.
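
A quick way to sanity-check that a mirror is reachable over https (and serving 
real metadata rather than an intercepted response) before re-running the build; 
the release name below assumes the buster-based base image used above:
{code}
# HEAD request against the https endpoint; -f makes curl fail on HTTP errors.
curl -fsSI https://deb.debian.org/debian/dists/buster/InRelease

# Then rebuild the image as before.
bin/docker-image-tool.sh -r <your-repo> -t <your-tag> build
{code}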



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31120) Support enabling maven profiles for importing via sbt on Intellij IDEA.

2020-03-11 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-31120:

Summary: Support enabling maven profiles for importing via sbt on Intellij 
IDEA.  (was: Support enabling maven profiles while importing via sbt on 
Intellij IDEA.)

> Support enabling maven profiles for importing via sbt on Intellij IDEA.
> ---
>
> Key: SPARK-31120
> URL: https://issues.apache.org/jira/browse/SPARK-31120
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Prashant Sharma
>Priority: Minor
> Attachments: Screenshot 2020-03-11 at 4.09.57 PM.png, Screenshot 
> 2020-03-11 at 4.18.09 PM.png
>
>
> At the moment there is no easy way to enable Maven profiles if the IntelliJ 
> IDEA project is imported via sbt. The only other workaround is to set the 
> OS-level environment variable SBT_MAVEN_PROFILES.
> So, in this patch we add a property, sbt.maven.profiles, which can be 
> configured at the time of importing Spark into IntelliJ IDEA.
> See the attached images for the steps to set it up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31120) Support enabling maven profiles while importing via sbt on Intellij IDEA.

2020-03-11 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-31120:

Attachment: Screenshot 2020-03-11 at 4.18.09 PM.png

> Support enabling maven profiles while importing via sbt on Intellij IDEA.
> -
>
> Key: SPARK-31120
> URL: https://issues.apache.org/jira/browse/SPARK-31120
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Prashant Sharma
>Priority: Minor
> Attachments: Screenshot 2020-03-11 at 4.09.57 PM.png, Screenshot 
> 2020-03-11 at 4.18.09 PM.png
>
>
> At the moment there is no easy way to enable Maven profiles if the IntelliJ 
> IDEA project is imported via sbt. The only other workaround is to set the 
> OS-level environment variable SBT_MAVEN_PROFILES.
> So, in this patch we add a property, sbt.maven.profiles, which can be 
> configured at the time of importing Spark into IntelliJ IDEA.
> See the attached images for the steps to set it up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31120) Support enabling maven profiles while importing via sbt on Intellij IDEA.

2020-03-11 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-31120:

Attachment: Screenshot 2020-03-11 at 4.09.57 PM.png

> Support enabling maven profiles while importing via sbt on Intellij IDEA.
> -
>
> Key: SPARK-31120
> URL: https://issues.apache.org/jira/browse/SPARK-31120
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Prashant Sharma
>Priority: Minor
> Attachments: Screenshot 2020-03-11 at 4.09.57 PM.png
>
>
> At the moment there is no easy way to enable Maven profiles if the IntelliJ 
> IDEA project is imported via sbt. The only other workaround is to set the 
> OS-level environment variable SBT_MAVEN_PROFILES.
> So, in this patch we add a property, sbt.maven.profiles, which can be 
> configured at the time of importing Spark into IntelliJ IDEA.
> See the attached image for the steps to set it up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31120) Support enabling maven profiles while importing via sbt on Intellij IDEA.

2020-03-11 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-31120:
---

 Summary: Support enabling maven profiles while importing via sbt 
on Intellij IDEA.
 Key: SPARK-31120
 URL: https://issues.apache.org/jira/browse/SPARK-31120
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.0.0
Reporter: Prashant Sharma


At the moment there is no easy way to enable Maven profiles if the IntelliJ 
IDEA project is imported via sbt. The only other workaround is to set the 
OS-level environment variable SBT_MAVEN_PROFILES.
So, in this patch we add a property, sbt.maven.profiles, which can be 
configured at the time of importing Spark into IntelliJ IDEA.
See the attached image for the steps to set it up.
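
For reference, the existing workaround and the property added by this patch 
look roughly as follows; the profile names are just examples of Spark's Maven 
profiles, and in IntelliJ the -D flag goes into the sbt import's VM parameters 
box, as shown in the screenshot:
{code}
# Existing workaround: an OS-level environment variable read by the sbt build.
export SBT_MAVEN_PROFILES="kubernetes,hadoop-3.2"
./build/sbt compile

# With this patch: the same selection as a JVM property, which IntelliJ's
# sbt importer can pass along.
./build/sbt -Dsbt.maven.profiles="kubernetes,hadoop-3.2" compile
{code}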



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-30985) Propagate SPARK_CONF_DIR files to driver and exec pods.

2020-03-05 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma reassigned SPARK-30985:
---

Assignee: (was: Prashant Sharma)

> Propagate SPARK_CONF_DIR files to driver and exec pods.
> ---
>
> Key: SPARK-30985
> URL: https://issues.apache.org/jira/browse/SPARK-30985
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Prashant Sharma
>Priority: Major
>
> SPARK_CONF_DIR hosts configuration files like, 
> 1) spark-defaults.conf - containing all the spark properties.
> 2) log4j.properties - Logger configuration.
> 3) spark-env.sh - Environment variables to be setup at driver and executor.
> 4) core-site.xml - Hadoop related configuration.
> 5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
> 6) metrics.properties - Spark metrics.
> 7) Any user specific - library or framework specific configuration file.
> Traditionally, SPARK_CONF_DIR has been the home to all user specific 
> configuration files and the default behaviour in the Yarn or standalone mode 
> is that these configuration files are copied to the worker nodes as required 
> by the users themselves. In other words, they are not auto-copied.
> But, in the case of Spark on Kubernetes, we use Spark images, and generally 
> these images are approved or undergo some kind of standardisation. These 
> files cannot simply be copied into the SPARK_CONF_DIR of the running executor 
> and driver pods by the user.
> So, at the moment we have special casing for providing each individual 
> configuration, and for any other user-specific configuration file the process 
> is more complex - e.g. one has to build a custom Spark image with the 
> configuration files pre-installed.
> Examples of special casing are:
> 1. Hadoop configuration via spark.kubernetes.hadoop.configMapName
> 2. spark-env.sh via spark.kubernetes.driverEnv.[EnvironmentVariableName]
> 3. log4j.properties as in https://github.com/apache/spark/pull/26193
> ... And for configurations where no such special casing exists, users are 
> simply out of luck.
> So this feature will let user-specific configuration files be mounted on the 
> driver and executor pods' SPARK_CONF_DIR.
> At the moment it is not clear whether there is a need to let the user specify 
> which config files to propagate - to the driver and/or executor. But if that 
> turns out to be helpful, we can increase the scope of this work or create 
> another JIRA issue to track it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31006) Mark Spark streaming as deprecated and add warnings.

2020-03-02 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-31006:

Description: 
It has been noticed that some users of Spark Streaming do not immediately 
realise that it is a deprecated component, and it would be scary if they end up 
with it in production. Now that we are about to release Spark 3.0.0, maybe we 
should discuss: should Spark Streaming carry an explicit notice that it is not 
under active development?



  was:
It is noticed that some of the users of Spark streaming do not immediately 
realise that it is a deprecated component and fear for them that they end up 
with it in production. Now that we are in a position to release about Spark 
3.0.0, may be we should discuss - should the spark streaming carry an explicit 
notice? That it is not under active development.




> Mark Spark streaming as deprecated and add warnings.
> 
>
> Key: SPARK-31006
> URL: https://issues.apache.org/jira/browse/SPARK-31006
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Prashant Sharma
>Priority: Major
>
> It has been noticed that some users of Spark Streaming do not immediately 
> realise that it is a deprecated component, and it would be scary if they end 
> up with it in production. Now that we are about to release Spark 3.0.0, maybe 
> we should discuss: should Spark Streaming carry an explicit notice that it is 
> not under active development?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31006) Mark Spark streaming as deprecated and add warnings.

2020-03-02 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-31006:
---

 Summary: Mark Spark streaming as deprecated and add warnings.
 Key: SPARK-31006
 URL: https://issues.apache.org/jira/browse/SPARK-31006
 Project: Spark
  Issue Type: Bug
  Components: Documentation, Structured Streaming
Affects Versions: 3.0.0
Reporter: Prashant Sharma


It has been noticed that some users of Spark Streaming do not immediately 
realise that it is a deprecated component, and the fear is that they end up 
with it in production. Now that we are about to release Spark 3.0.0, maybe we 
should discuss: should Spark Streaming carry an explicit notice that it is not 
under active development?





--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30985) Propagate SPARK_CONF_DIR files to driver and exec pods.

2020-02-28 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-30985:
---

 Summary: Propagate SPARK_CONF_DIR files to driver and exec pods.
 Key: SPARK-30985
 URL: https://issues.apache.org/jira/browse/SPARK-30985
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.0.0
Reporter: Prashant Sharma
Assignee: Prashant Sharma


SPARK_CONF_DIR hosts configuration files like, 
1) spark-defaults.conf - containing all the spark properties.
2) log4j.properties - Logger configuration.
3) spark-env.sh - Environment variables to be setup at driver and executor.
4) core-site.xml - Hadoop related configuration.
5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
6) metrics.properties - Spark metrics.
7) Any user specific - library or framework specific configuration file.

Traditionally, SPARK_CONF_DIR has been the home to all user specific 
configuration files and the default behaviour in the Yarn or standalone mode is 
that these configuration files are copied to the worker nodes as required by 
the users themselves. In other words, they are not auto-copied.

But, in the case of Spark on Kubernetes, we use Spark images, and generally 
these images are approved or undergo some kind of standardisation. These files 
cannot simply be copied into the SPARK_CONF_DIR of the running executor and 
driver pods by the user.

So, at the moment we have special casing for providing each individual 
configuration, and for any other user-specific configuration file the process 
is more complex - e.g. one has to build a custom Spark image with the 
configuration files pre-installed.
Examples of special casing are:
1. Hadoop configuration via spark.kubernetes.hadoop.configMapName
2. spark-env.sh via spark.kubernetes.driverEnv.[EnvironmentVariableName]
3. log4j.properties as in https://github.com/apache/spark/pull/26193
... And for configurations where no such special casing exists, users are 
simply out of luck.

So this feature will let user-specific configuration files be mounted on the 
driver and executor pods' SPARK_CONF_DIR.
At the moment it is not clear whether there is a need to let the user specify 
which config files to propagate - to the driver and/or executor. But if that 
turns out to be helpful, we can increase the scope of this work or create 
another JIRA issue to track it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30771) Failed mount warning from kubernetes and support the "optional" mount.

2020-02-10 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-30771:

Attachment: (was: Screenshot 2020-02-10 at 3.10.01 PM.png)

> Failed mount warning from kubernetes and support the "optional" mount.
> --
>
> Key: SPARK-30771
> URL: https://issues.apache.org/jira/browse/SPARK-30771
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Prashant Sharma
>Priority: Major
> Attachments: Screenshot 2020-02-10 at 3.10.01 PM.png
>
>
> 1) https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.17/#configmapvolumesource-v1-core
> Kubernetes allows an `optional` field indicating that, if the mount for this 
> config map fails, it is neither reattempted nor is the pod declared failed.
> In our current code base, we try to mount the volumes and create them later. 
> It works because Kubernetes reattempts the failed mount, since the `optional` 
> field is `false` by default.
> But if this optional field is set to true, that mount will not take place at 
> all: when the mount is performed the volume has not yet been created, so the 
> mount fails, and this time it is not reattempted because the optional field 
> is set to true.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30771) Failed mount warning from kubernetes and support the "optional" mount.

2020-02-10 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-30771:

Attachment: Screenshot 2020-02-10 at 3.10.01 PM.png

> Failed mount warning from kubernetes and support the "optional" mount.
> --
>
> Key: SPARK-30771
> URL: https://issues.apache.org/jira/browse/SPARK-30771
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Prashant Sharma
>Priority: Major
> Attachments: Screenshot 2020-02-10 at 3.10.01 PM.png
>
>
> 1) https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.17/#configmapvolumesource-v1-core
> Kubernetes allows an `optional` field indicating that, if the mount for this 
> config map fails, it is neither reattempted nor is the pod declared failed.
> In our current code base, we try to mount the volumes and create them later. 
> It works because Kubernetes reattempts the failed mount, since the `optional` 
> field is `false` by default.
> But if this optional field is set to true, that mount will not take place at 
> all: when the mount is performed the volume has not yet been created, so the 
> mount fails, and this time it is not reattempted because the optional field 
> is set to true.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30771) Avoid failed mount warning from kubernetes and support the "optional" mount.

2020-02-10 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-30771:
---

 Summary: Avoid failed mount warning from kubernetes and support 
the "optional" mount.
 Key: SPARK-30771
 URL: https://issues.apache.org/jira/browse/SPARK-30771
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 3.0.0
Reporter: Prashant Sharma
 Attachments: Screenshot 2020-02-10 at 3.10.01 PM.png

1) https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.17/#configmapvolumesource-v1-core

Kubernetes allows an `optional` field indicating that, if the mount for this 
config map fails, it is neither reattempted nor is the pod declared failed.

In our current code base, we try to mount the volumes and create them later. It 
works because Kubernetes reattempts the failed mount, since the `optional` 
field is `false` by default.

But if this optional field is set to true, that mount will not take place at 
all: when the mount is performed the volume has not yet been created, so the 
mount fails, and this time it is not reattempted because the optional field is 
set to true.
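
To make the `optional` semantics concrete, a standalone sketch follows (the 
image, names, and mount path are placeholders chosen for illustration). The 
config map is deliberately referenced before it exists; per the behaviour 
described above, with optional set to true the failed mount is simply skipped 
rather than retried once the config map appears.
{code}
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: optional-mount-demo
spec:
  containers:
    - name: main
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: spark-conf
          mountPath: /opt/spark/conf
  volumes:
    - name: spark-conf
      configMap:
        name: spark-conf-map   # intentionally created only after the pod
        optional: true         # the field discussed in this issue
EOF
{code}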




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30771) Failed mount warning from kubernetes and support the "optional" mount.

2020-02-10 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-30771:

Summary: Failed mount warning from kubernetes and support the "optional" 
mount.  (was: Avoid failed mount warning from kubernetes and support the 
"optional" mount.)

> Failed mount warning from kubernetes and support the "optional" mount.
> --
>
> Key: SPARK-30771
> URL: https://issues.apache.org/jira/browse/SPARK-30771
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Prashant Sharma
>Priority: Major
> Attachments: Screenshot 2020-02-10 at 3.10.01 PM.png
>
>
> 1) https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.17/#configmapvolumesource-v1-core
> Kubernetes allows an `optional` field indicating that, if the mount for this 
> config map fails, it is neither reattempted nor is the pod declared failed.
> In our current code base, we try to mount the volumes and create them later. 
> It works because Kubernetes reattempts the failed mount, since the `optional` 
> field is `false` by default.
> But if this optional field is set to true, that mount will not take place at 
> all: when the mount is performed the volume has not yet been created, so the 
> mount fails, and this time it is not reattempted because the optional field 
> is set to true.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30771) Failed mount warning from kubernetes and support the "optional" mount.

2020-02-10 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-30771:

Attachment: Screenshot 2020-02-10 at 3.10.01 PM.png

> Failed mount warning from kubernetes and support the "optional" mount.
> --
>
> Key: SPARK-30771
> URL: https://issues.apache.org/jira/browse/SPARK-30771
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Prashant Sharma
>Priority: Major
> Attachments: Screenshot 2020-02-10 at 3.10.01 PM.png
>
>
> 1) https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.17/#configmapvolumesource-v1-core
> Kubernetes allows an `optional` field indicating that, if the mount for this 
> config map fails, it is neither reattempted nor is the pod declared failed.
> In our current code base, we try to mount the volumes and create them later. 
> It works because Kubernetes reattempts the failed mount, since the `optional` 
> field is `false` by default.
> But if this optional field is set to true, that mount will not take place at 
> all: when the mount is performed the volume has not yet been created, so the 
> mount fails, and this time it is not reattempted because the optional field 
> is set to true.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30595) Unable to create local temp dir on spark on k8s mode, with defaults.

2020-02-05 Thread Prashant Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17031309#comment-17031309
 ] 

Prashant Sharma commented on SPARK-30595:
-

Resolving this as not a problem, as this was a problem with my own patch.

> Unable to create local temp dir on spark on k8s mode, with defaults.
> 
>
> Key: SPARK-30595
> URL: https://issues.apache.org/jira/browse/SPARK-30595
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Prashant Sharma
>Priority: Major
>
> Unless we configure the property {code} spark.local.dir /tmp {code}, the 
> following error occurs:
> {noformat}
> *20/01/21 08:33:17 INFO SparkEnv: Registering BlockManagerMasterHeartbeat*
> *20/01/21 08:33:17 ERROR DiskBlockManager: Failed to create local dir in 
> /var/data/spark-284c6844-8969-4288-9a6b-b72679c5b8e4. Ignoring this 
> directory.*
> *java.io.IOException: Failed to create a temp directory (under 
> /var/data/spark-284c6844-8969-4288-9a6b-b72679c5b8e4) after 10 attempts!*
> *at org.apache.spark.util.Utils$.createDirectory(Utils.scala:304)*
> *at 
> org.apache.spark.storage.DiskBlockManager.$anonfun$createLocalDirs$1(DiskBlockManager.scala:164)*
> *at 
> scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)*
> *at 
> scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)*
> {noformat}
> I have not yet fully understood the root cause; I will post my findings once 
> it is clear.
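As a side note, a minimal sketch (not from the report; the application name is 
made up) of the spark.local.dir workaround mentioned above:

{code:java}
import org.apache.spark.sql.SparkSession

// Pointing spark.local.dir at a known-writable directory sidesteps the
// DiskBlockManager failure above when the default /var/data/... path is unusable.
val spark = SparkSession.builder()
  .appName("local-dir-workaround")    // illustrative name
  .config("spark.local.dir", "/tmp")  // any writable directory inside the container
  .getOrCreate()
{code}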



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-30595) Unable to create local temp dir on spark on k8s mode, with defaults.

2020-02-05 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma resolved SPARK-30595.
-
Resolution: Not A Problem

> Unable to create local temp dir on spark on k8s mode, with defaults.
> 
>
> Key: SPARK-30595
> URL: https://issues.apache.org/jira/browse/SPARK-30595
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Prashant Sharma
>Priority: Major
>
> Unless we configure the property {code} spark.local.dir /tmp {code}, the 
> following error occurs:
> {noformat}
> *20/01/21 08:33:17 INFO SparkEnv: Registering BlockManagerMasterHeartbeat*
> *20/01/21 08:33:17 ERROR DiskBlockManager: Failed to create local dir in 
> /var/data/spark-284c6844-8969-4288-9a6b-b72679c5b8e4. Ignoring this 
> directory.*
> *java.io.IOException: Failed to create a temp directory (under 
> /var/data/spark-284c6844-8969-4288-9a6b-b72679c5b8e4) after 10 attempts!*
> *at org.apache.spark.util.Utils$.createDirectory(Utils.scala:304)*
> *at 
> org.apache.spark.storage.DiskBlockManager.$anonfun$createLocalDirs$1(DiskBlockManager.scala:164)*
> *at 
> scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)*
> *at 
> scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)*
> {noformat}
> I have not yet fully understood the root cause; I will post my findings once 
> it is clear.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25065) Driver and executors pick the wrong logging configuration file.

2020-02-04 Thread Prashant Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17029640#comment-17029640
 ] 

Prashant Sharma commented on SPARK-25065:
-

I have an updated patch available for this issue; any feedback would be 
appreciated.

> Driver and executors pick the wrong logging configuration file.
> ---
>
> Key: SPARK-25065
> URL: https://issues.apache.org/jira/browse/SPARK-25065
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Prashant Sharma
>Priority: Major
>
> Currently, when running in Kubernetes mode, Spark sets the necessary 
> configuration properties by creating a spark.properties file and mounting a 
> conf dir.
> The shipped Dockerfile does not copy conf to the image; this is on purpose 
> and well understood. However, a user may want a custom logging configuration 
> file in the image's conf directory.
> To achieve this, it is not enough to copy the file into Spark's conf dir of 
> the resulting image, as that directory is reset during the Kubernetes 
> conf-volume mount step.
>  
> To reproduce, add {code}-Dlog4j.debug{code} to 
> {code:java}spark.(executor|driver).extraJavaOptions{code}. With this flag, it 
> can be seen that the provided log4j file is not picked up and the one coming 
> from the kubernetes client jar is used by the driver process.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30595) Unable to create local temp dir on spark on k8s mode, with defaults.

2020-01-21 Thread Prashant Sharma (Jira)
Prashant Sharma created SPARK-30595:
---

 Summary: Unable to create local temp dir on spark on k8s mode, 
with defaults.
 Key: SPARK-30595
 URL: https://issues.apache.org/jira/browse/SPARK-30595
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 3.0.0
Reporter: Prashant Sharma


Unless we configure the property {code} spark.local.dir /tmp {code}, the 
following error occurs:

{noformat}
*20/01/21 08:33:17 INFO SparkEnv: Registering BlockManagerMasterHeartbeat*

*20/01/21 08:33:17 ERROR DiskBlockManager: Failed to create local dir in 
/var/data/spark-284c6844-8969-4288-9a6b-b72679c5b8e4. Ignoring this directory.*

*java.io.IOException: Failed to create a temp directory (under 
/var/data/spark-284c6844-8969-4288-9a6b-b72679c5b8e4) after 10 attempts!*

*at org.apache.spark.util.Utils$.createDirectory(Utils.scala:304)*

*at 
org.apache.spark.storage.DiskBlockManager.$anonfun$createLocalDirs$1(DiskBlockManager.scala:164)*

*at 
scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)*

*at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)*
{noformat}

I have not yet fully understood the root cause; I will post my findings once it 
is clear.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25065) Driver and executors pick the wrong logging configuration file.

2019-10-21 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-25065:

Labels:   (was: bulk-closed)

> Driver and executors pick the wrong logging configuration file.
> ---
>
> Key: SPARK-25065
> URL: https://issues.apache.org/jira/browse/SPARK-25065
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Prashant Sharma
>Priority: Major
>
> Currently, when running in Kubernetes mode, Spark sets the necessary 
> configuration properties by creating a spark.properties file and mounting a 
> conf dir.
> The shipped Dockerfile does not copy conf to the image; this is on purpose 
> and well understood. However, a user may want a custom logging configuration 
> file in the image's conf directory.
> To achieve this, it is not enough to copy the file into Spark's conf dir of 
> the resulting image, as that directory is reset during the Kubernetes 
> conf-volume mount step.
>  
> To reproduce, add {code}-Dlog4j.debug{code} to 
> {code:java}spark.(executor|driver).extraJavaOptions{code}. With this flag, it 
> can be seen that the provided log4j file is not picked up and the one coming 
> from the kubernetes client jar is used by the driver process.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-25065) Driver and executors pick the wrong logging configuration file.

2019-10-13 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma reopened SPARK-25065:
-

> Driver and executors pick the wrong logging configuration file.
> ---
>
> Key: SPARK-25065
> URL: https://issues.apache.org/jira/browse/SPARK-25065
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.1
>Reporter: Prashant Sharma
>Priority: Major
>  Labels: bulk-closed
>
> Currently, when running in Kubernetes mode, Spark sets the necessary 
> configuration properties by creating a spark.properties file and mounting a 
> conf dir.
> The shipped Dockerfile does not copy conf to the image; this is on purpose 
> and well understood. However, a user may want a custom logging configuration 
> file in the image's conf directory.
> To achieve this, it is not enough to copy the file into Spark's conf dir of 
> the resulting image, as that directory is reset during the Kubernetes 
> conf-volume mount step.
>  
> To reproduce, add {code}-Dlog4j.debug{code} to 
> {code:java}spark.(executor|driver).extraJavaOptions{code}. With this flag, it 
> can be seen that the provided log4j file is not picked up and the one coming 
> from the kubernetes client jar is used by the driver process.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25065) Driver and executors pick the wrong logging configuration file.

2019-10-13 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-25065:

Affects Version/s: (was: 2.3.1)
   3.0.0
   2.4.4

> Driver and executors pick the wrong logging configuration file.
> ---
>
> Key: SPARK-25065
> URL: https://issues.apache.org/jira/browse/SPARK-25065
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Prashant Sharma
>Priority: Major
>  Labels: bulk-closed
>
> Currently, when running in Kubernetes mode, Spark sets the necessary 
> configuration properties by creating a spark.properties file and mounting a 
> conf dir.
> The shipped Dockerfile does not copy conf to the image; this is on purpose 
> and well understood. However, a user may want a custom logging configuration 
> file in the image's conf directory.
> To achieve this, it is not enough to copy the file into Spark's conf dir of 
> the resulting image, as that directory is reset during the Kubernetes 
> conf-volume mount step.
>  
> To reproduce, add {code}-Dlog4j.debug{code} to 
> {code:java}spark.(executor|driver).extraJavaOptions{code}. With this flag, it 
> can be seen that the provided log4j file is not picked up and the one coming 
> from the kubernetes client jar is used by the driver process.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22865) Publish Official Apache Spark Docker images

2019-09-17 Thread Prashant Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-22865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931464#comment-16931464
 ] 

Prashant Sharma commented on SPARK-22865:
-

Hi, what is the update on this issue? Do we have consensus on the official 
publishing pipeline for Spark Docker images?

> Publish Official Apache Spark Docker images
> ---
>
> Key: SPARK-22865
> URL: https://issues.apache.org/jira/browse/SPARK-22865
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22842) Support consuming oauth token and certs through environment variables

2019-09-17 Thread Prashant Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-22842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931462#comment-16931462
 ] 

Prashant Sharma commented on SPARK-22842:
-

The fabric8 client supports configuration through environment variables. I have 
been using a KUBECONFIG export, and it works. Is this issue about a specific 
env variable, or can I close it?
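For illustration, a minimal sketch of what I mean (fabric8 client API; it 
assumes KUBECONFIG points at a valid kubeconfig file, or that the code runs 
in-cluster with a service account):

{code:java}
import io.fabric8.kubernetes.client.DefaultKubernetesClient

// The no-arg client auto-configures itself from the environment (for example a
// KUBECONFIG export or the in-cluster service account), so no credentials have
// to appear on the spark-submit command line or in `ps` output.
val client = new DefaultKubernetesClient()
println(client.getConfiguration.getMasterUrl)
client.close()
{code}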

> Support consuming oauth token and certs through environment variables
> -
>
> Key: SPARK-22842
> URL: https://issues.apache.org/jira/browse/SPARK-22842
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Priority: Minor
>
> Based on https://github.com/apache/spark/pull/19946#discussion_r156499655
> We want to support these through env vars to avoid passing them to 
> spark-submit, which makes them visible in ps.
> This is likely already supported by the k8s client and just needs 
> verification and documentation. This is not a blocker for 2.3.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21869) A cached Kafka producer should not be closed if any task is using it.

2019-09-09 Thread Prashant Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-21869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-21869:

Affects Version/s: (was: 2.2.0)
   3.0.0
   2.4.4

> A cached Kafka producer should not be closed if any task is using it.
> -
>
> Key: SPARK-21869
> URL: https://issues.apache.org/jira/browse/SPARK-21869
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Shixiong Zhu
>Priority: Major
>
> Right now a cached Kafka producer may be closed if a large task uses it for 
> more than 10 minutes.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28367) Kafka connector infinite wait because metadata never updated

2019-09-05 Thread Prashant Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923132#comment-16923132
 ] 

Prashant Sharma commented on SPARK-28367:
-

Hi, can you please provide the link to the kafka discussion as well?

> Kafka connector infinite wait because metadata never updated
> 
>
> Key: SPARK-28367
> URL: https://issues.apache.org/jira/browse/SPARK-28367
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.1.3, 2.2.3, 2.3.3, 2.4.3, 3.0.0
>Reporter: Gabor Somogyi
>Priority: Critical
>
> Spark uses an old, deprecated API named poll(long), which never returns and 
> stays in a live lock if metadata is not updated (for instance, when the 
> broker disappears at consumer creation).
> I've created a small standalone application to test it and the alternatives: 
> https://github.com/gaborgsomogyi/kafka-get-assignment
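For context, a minimal sketch (broker address and topic name are made up) 
contrasting the deprecated poll(long) with the Duration-based poll added in 
newer Kafka clients, whose timeout also covers the metadata fetch:

{code:java}
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")  // illustrative broker address
props.put("group.id", "poll-timeout-demo")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(Collections.singletonList("test"))   // illustrative topic

// Deprecated form referenced above: its timeout does not bound the metadata
// fetch, so the call can block indefinitely when the broker is gone.
// val records = consumer.poll(0L)

// Duration-based form: returns once the timeout expires, even if metadata
// never arrives.
val records = consumer.poll(Duration.ofSeconds(10))
consumer.close()
{code}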



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-27664) Performance issue with FileStatusCache, while reading from object stores.

2019-06-12 Thread Prashant Sharma (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma resolved SPARK-27664.
-
Resolution: Won't Fix

I am marking this as Won't Fix because it is now difficult to reproduce. In 
version 2.3.x the problem was more evident, but after the merge of SPARK-23896, 
versions 2.4.x and above do not do a lot of re-listing in the general case.
 However, the problem of re-listing still exists (i.e. in version 2.4.3 and the 
current unreleased 3.0.0), and the following code can be used to reproduce it.
{code:java}
// First create the object store data for testing 
spark.range(0,100, 1, 10).selectExpr("id", "id < 100 as 
p").write.partitionBy("p").save("")

// Then following commands would reproduce it.
// With times.
// 19/05/24 03:07:56
val s = s"""
|CREATE EXTERNAL TABLE test11(id bigint)
|PARTITIONED BY (p boolean)
|STORED AS parquet
|LOCATION ''""".stripMargin
spark.sql(s)

spark.sql("ALTER TABLE test11 add partition (p=true)")
spark.sql("ALTER TABLE test11 add partition (p=false)")
spark.sql("SELECT * FROM test11 where id <10").show()
// 19/05/24 03:50:43
spark.sql("SELECT * FROM test11 where id <100").show()
// 19/05/24 04:28:19


{code}
 As you can see above, the overall time taken is much larger than the time 
taken for an extra re-listing, so the difference in performance is hard to 
notice. However, this issue, along with the fix, can be reconsidered later if 
the problem resurfaces with larger impact.
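For context on the cache behaviour discussed in the quoted description below, 
here is a minimal sketch using plain Guava (not Spark's actual FileStatusCache 
code; the key, payload and weights are illustrative) of how a single oversized 
entry interacts with maximumWeight and concurrencyLevel:

{code:java}
import com.google.common.cache.{Cache, CacheBuilder, Weigher}

// Weigh each entry by an approximate byte size (8 bytes per Long element here).
val weigher = new Weigher[String, Array[Long]] {
  override def weigh(key: String, value: Array[Long]): Int = value.length * 8
}

val cache: Cache[String, Array[Long]] = CacheBuilder.newBuilder()
  .weigher(weigher)
  .maximumWeight(250L * 1024 * 1024) // ~250MB budget, mirroring the default cache size
  .concurrencyLevel(1)               // one segment, so a single entry may use the whole budget
  .build[String, Array[Long]]()

// With the default concurrencyLevel of 4, each segment is capped at roughly
// maximumWeight / 4 (~62MB), so an entry much larger than that can be evicted
// immediately even though the overall budget would fit it.
cache.put("/dir/data/test", new Array[Long](4 * 1024 * 1024)) // ~32MB payload
{code}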

> Performance issue with FileStatusCache, while reading from object stores.
> -
>
> Key: SPARK-27664
> URL: https://issues.apache.org/jira/browse/SPARK-27664
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0, 2.4.3
>Reporter: Prashant Sharma
>Priority: Major
>
> In short,
> This issue (i.e. degraded performance) surfaces when the number of files is 
> large (> 100K) and they are stored on an object store or other remote 
> storage. The actual issue is the following:
> Everything is inserted as a single entry in the FileStatusCache (a guava 
> cache), which does not fit unless the cache is configured to be very, very 
> large, i.e. about 4x. Reason: [https://github.com/google/guava/issues/3462].
>  
> Full story, with possible solutions,
> When we read a directory in spark by,
> {code:java}
> spark.read.parquet("/dir/data/test").limit(1).show()
> {code}
> behind the scenes, it fetches the FileStatus objects and caches them, inside 
> a FileStatusCache, so that it does not need to refetch these objects. 
> Internally, it scans using listLeafFiles function at driver. 
>  Inside the cache, the entire content of the listing as array of FileStatus 
> objects is inserted as a single entry, with key as "/dir/data/test", in the 
> FileStatusCache. The default size of this cache is 250MB and it is 
> configurable. This underlying cache uses guava cache.
> The guava cache has one interesting property, i.e. a single entry can only be 
> as large as
> {code:java}
> maximumSize/concurrencyLevel{code}
> see [https://github.com/google/guava/issues/3462], for details. So for a 
> cache size of 250MB, a single entry can be as large as only 250MB/4, since 
> the default concurrency level is 4 in guava. This size is around 62MB, which 
> is good enough for most datasets, but for directories with larger listing, it 
> does not work that well. And the effect of this is especially evident when 
> such listings are for object stores like Amazon s3 or IBM Cloud object store 
> etc..
> So, currently one can work around this problem by setting the size of the 
> cache (i.e. `spark.sql.hive.filesourcePartitionFileCacheSize`) very high, as 
> it needs to be more than 4x of what is actually required. But this has a 
> drawback: either one has to start the driver with more memory than required, 
> or risk an OOM when the cache does not evict older entries because the size 
> is configured to be 4x.
> In order to fix this issue, we can take three different approaches:
> 1) One stopgap fix can be to reduce the concurrency level of the guava cache 
> to just 1; for a few entries of very large size, we do not lose much by doing 
> this.
> 2) The alternative would be to divide the input array into multiple entries 
> in the cache, instead of inserting everything against a single key. This can 
> be done using directories as keys, if there are multiple nested directories 
> under a directory; but if a user has everything listed under a single dir, 
> this solution does not help either, and we cannot depend on them creating 
> partitions in their hive/sql table.
> 3) One more alternative fix would be, to make concurrency level configurable, 
> for those who want to change it. And while inserting the entry in the cache 
> divide it into 

[jira] [Updated] (SPARK-27664) Performance issue with FileStatusCache, while reading from object stores.

2019-05-10 Thread Prashant Sharma (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-27664:

Description: 
In short,

This issue (i.e. degraded performance) surfaces when the number of files is 
large (> 100K) and they are stored on an object store or other remote storage. 
The actual issue is the following:

Everything is inserted as a single entry in the FileStatusCache (a guava 
cache), which does not fit unless the cache is configured to be very, very 
large, i.e. about 4x. Reason: [https://github.com/google/guava/issues/3462].

 

Full story, with possible solutions,

When we read a directory in spark by,
{code:java}
spark.read.parquet("/dir/data/test").limit(1).show()
{code}
behind the scenes, Spark fetches the FileStatus objects and caches them inside 
a FileStatusCache, so that it does not need to refetch these objects. 
Internally, it scans using the listLeafFiles function at the driver.
 Inside the cache, the entire content of the listing, as an array of FileStatus 
objects, is inserted as a single entry, with "/dir/data/test" as the key, in 
the FileStatusCache. The default size of this cache is 250MB and it is 
configurable. The underlying cache is a guava cache.

The guava cache has one interesting property, i.e. a single entry can only be 
as large as
{code:java}
maximumSize/concurrencyLevel{code}
(see [https://github.com/google/guava/issues/3462] for details). So for a cache 
size of 250MB, a single entry can be at most 250MB/4, since the default 
concurrency level in guava is 4. That is around 62MB, which is good enough for 
most datasets, but it does not work that well for directories with larger 
listings. The effect is especially evident when such listings are for object 
stores like Amazon S3 or IBM Cloud Object Storage.

So, currently one can work around this problem by setting the size of the cache 
(i.e. `spark.sql.hive.filesourcePartitionFileCacheSize`) very high, as it needs 
to be more than 4x of what is actually required. But this has a drawback: 
either one has to start the driver with more memory than required, or risk an 
OOM when the cache does not evict older entries because the size is configured 
to be 4x.

In order to fix this issue, we can take three different approaches:

1) One stopgap fix can be to reduce the concurrency level of the guava cache to 
just 1; for a few entries of very large size, we do not lose much by doing 
this.

2) The alternative would be to divide the input array into multiple entries in 
the cache, instead of inserting everything against a single key. This can be 
done using directories as keys, if there are multiple nested directories under 
a directory; but if a user has everything listed under a single dir, this 
solution does not help either, and we cannot depend on them creating partitions 
in their hive/sql table.

3) One more alternative fix would be to make the concurrency level configurable 
for those who want to change it, and, while inserting the entry in the cache, 
divide it into `concurrencyLevel` (or even 2x or 3x of it) parts before 
inserting. This way the cache will perform optimally, and even if there is an 
eviction, it will evict only a part of the entries, as against all of the 
entries in the current implementation. How many entries are evicted due to size 
depends on the configured concurrencyLevel. This approach can be taken even 
without making `concurrencyLevel` configurable.

The problem with this approach is that the partitions in the cache are of no 
use as such, because if even one partition is evicted, then all the partitions 
of that key must also be evicted, otherwise the results would be wrong.

  was:
In short,

This issue(i.e. degraded performance ) surfaces when the number of files are 
large > 100K, and is stored on an object store, or any remote storage. The 
actual issue is due to,

Everything is inserted as a single entry in the FileStatusCache i.e. guava 
cache, which does not fit unless the cache is configured to be very very large 
or 4X. Reason: [https://github.com/google/guava/issues/3462].

 

Full story, with possible solutions,

When we read a directory in spark by,
{code:java}
spark.read.parquet("/dir/data/test").limit(1).show()
{code}
behind the scenes, it fetches the FileStatus objects and caches them, inside a 
FileStatusCache, so that it does not need to refetch these objects. Internally, 
it scans using listLeafFiles function at driver. 
 Inside the cache, the entire content of the listing as array of FileStatus 
objects is inserted as a single entry, with key as "/dir/data/test", in the 
FileStatusCache. The default size of this cache is 250MB and it is 
configurable. This underlying cache uses guava cache.

The guava cache has one interesting property, i.e. a single entry can only be 
as large as
{code:java}
maximumSize/concurrencyLevel{code}
see 

[jira] [Updated] (SPARK-27664) Performance issue with FileStatusCache, while reading from object stores.

2019-05-10 Thread Prashant Sharma (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-27664:

Description: 
In short,

This issue (i.e. degraded performance) surfaces when the number of files is 
large (> 100K) and they are stored on an object store or other remote storage. 
The actual issue is the following:

Everything is inserted as a single entry in the FileStatusCache (a guava 
cache), which does not fit unless the cache is configured to be very, very 
large, i.e. about 4x. Reason: [https://github.com/google/guava/issues/3462].

 

Full story, with possible solutions,

When we read a directory in spark by,
{code:java}
spark.read.parquet("/dir/data/test").limit(1).show()
{code}
behind the scenes, it fetches the FileStatus objects and caches them, inside a 
FileStatusCache, so that it does not need to refetch these objects. Internally, 
it scans using listLeafFiles function at driver. 
 Inside the cache, the entire content of the listing as array of FileStatus 
objects is inserted as a single entry, with key as "/dir/data/test", in the 
FileStatusCache. The default size of this cache is 250MB and it is 
configurable. This underlying cache uses guava cache.

The guava cache has one interesting property, i.e. a single entry can only be 
as large as
{code:java}
maximumSize/concurrencyLevel{code}
see [https://github.com/google/guava/issues/3462], for details. So for a cache 
size of 250MB, a single entry can be as large as only 250MB/4, since the 
default concurrency level is 4 in guava. This size is around 62MB, which is 
good enough for most datasets, but for directories with larger listing, it does 
not work that well. And the effect of this is especially evident when such 
listings are for object stores like Amazon s3 or IBM Cloud object store etc..

So, currently one can work around this problem by setting the value of size of 
the cache (i.e. `spark.sql.hive.filesourcePartitionFileCacheSize`) as very 
high, as it needs to be much more than 4x of what is required.

In order to fix this issue, we can take 3 different approaches,

1) one stop gap fix can be, reduce the concurrency level of the guava cache to 
be just 1, because if everything has to be just one single entry per job, then 
concurrency is not helpful anyway.

2) The alternative would be, to divide the input array into multiple entries in 
the cache, instead of inserting everything against a single key. This can be 
done using directories as keys, if there are multiple nested directories under 
a directory, but if a user has everything listed under a single dir, then this 
solution does not help either and we cannot depend on them creating partitions 
in their hive/sql table.

3) One more alternative fix would be, to make concurrency level configurable, 
for those who want to change it. And while inserting the entry in the cache 
divide it into the `concurrencyLevel`(or even 2X or 3X of it) number of parts, 
before inserting. This way cache will perform optimally, and even if there is 
an eviction, it will evict only a part of the entries, as against all the 
entries in the current implementation. How many entries are evicted due to 
size, depends on concurrencyLevel configured. This approach can be taken, even 
without making `concurrencyLevel` configurable.

The problem with this approach is, the partitions in cache are of no use as 
such, because even if one partition is evicted, then all the partitions of the 
key should also be evicted, otherwise the results would be wrong. 

  was:
In short,

This issue(i.e. degraded performance ) surfaces when the number of files are 
large > 100K, and is stored on an object store, or any remote storage. The 
actual issue is due to,

Everything is inserted as a single entry in the FileStatusCache i.e. guava 
cache, which does not fit unless the cache is configured to be very very large 
or 4X. Reason: [https://github.com/google/guava/issues/3462].

 

Full story, with possible solutions,

When we read a directory in spark by,
{code:java}
spark.read.parquet("/dir/data/test").limit(1).show()
{code}
behind the scenes, it fetches the FileStatus objects and caches them, inside a 
FileStatusCache, so that it does not need to refetch these objects. Internally, 
it scans using listLeafFiles function at driver. 
 Inside the cache, the entire content of the listing as array of FileStatus 
objects is inserted as a single entry, with key as "/dir/data/test", in the 
FileStatusCache. The default size of this cache is 250MB and it is 
configurable. This underlying cache uses guava cache.

The guava cache has one interesting property, i.e. a single entry can only be 
as large as
{code:java}
maximumSize/concurrencyLevel{code}
see [https://github.com/google/guava/issues/3462], for details. So for a cache 
size of 250MB, a single entry can be as large as only 250MB/4, since the 
default concurrency level is 4 in guava. This size 

[jira] [Updated] (SPARK-27664) Performance issue with FileStatusCache, while reading from object stores.

2019-05-09 Thread Prashant Sharma (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-27664:

Description: 
In short,

This issue (i.e. degraded performance) surfaces when the number of files is 
large (> 100K) and they are stored on an object store or other remote storage. 
The actual issue is the following:

Everything is inserted as a single entry in the FileStatusCache (a guava 
cache), which does not fit unless the cache is configured to be very, very 
large, i.e. about 4x. Reason: [https://github.com/google/guava/issues/3462].

 

Full story, with possible solutions,

When we read a directory in spark by,
{code:java}
spark.read.parquet("/dir/data/test").limit(1).show()
{code}
behind the scenes, it fetches the FileStatus objects and caches them, inside a 
FileStatusCache, so that it does not need to refetch these objects. Internally, 
it scans using listLeafFiles function at driver. 
 Inside the cache, the entire content of the listing as array of FileStatus 
objects is inserted as a single entry, with key as "/dir/data/test", in the 
FileStatusCache. The default size of this cache is 250MB and it is 
configurable. This underlying cache uses guava cache.

The guava cache has one interesting property, i.e. a single entry can only be 
as large as
{code:java}
maximumSize/concurrencyLevel{code}
see [https://github.com/google/guava/issues/3462], for details. So for a cache 
size of 250MB, a single entry can be as large as only 250MB/4, since the 
default concurrency level is 4 in guava. This size is around 62MB, which is 
good enough for most datasets, but for directories with larger listing, it does 
not work that well. And the effect of this is especially evident when such 
listings are for object stores like Amazon s3 or IBM Cloud object store etc..

So, currently one can work around this problem by setting the value of size of 
the cache (i.e. `spark.sql.hive.filesourcePartitionFileCacheSize`) as very 
high, as it needs to be much more than 4x of what is required.

In order to fix this issue, we can take 3 different approaches,

1) one stop gap fix can be, reduce the concurrency level of the guava cache to 
be just 1, because if everything has to be just one single entry per job, then 
concurrency is not helpful anyway.

2) The ideal fix would be, to divide the input array into multiple entries in 
the cache, instead of inserting everything against a single key. This can be 
done using directories as keys, if there are multiple nested directories under 
a directory, but if a user has everything listed under a single dir, then this 
solution does not help either. 

3) Even more ideal fix would be, to make concurrency level configurable, for 
those who want to change it. And while inserting the entry in the cache divide 
it into the `concurrencyLevel`(or even 2X or 3X of it) number of parts, before 
inserting. This way cache will perform optimally, and even if there is an 
eviction, it will evict only a part of the entries, as against all the entries 
in the current implementation. How many entries are evicted due to size, 
depends on concurrencyLevel configured. This approach can be taken, even 
without making `concurrencyLevel` configurable.

  was:
In short,

This issue(i.e. degraded performance ) surfaces when the number of files are 
large > 200K, and is stored on an object store, or any remote storage. The 
actual issue is due to,

Everything is inserted as a single entry in the FileStatusCache i.e. guava 
cache, which does not fit unless the cache is configured to be very very large 
or 4X. Reason: [https://github.com/google/guava/issues/3462].

 

Full story, with possible solutions,

When we read a directory in spark by,
{code:java}
spark.read.parquet("/dir/data/test").limit(1).show()
{code}
behind the scenes, it fetches the FileStatus objects and caches them, inside a 
FileStatusCache, so that it does not need to refetch these objects. Internally, 
it scans using listLeafFiles function at driver. 
 Inside the cache, the entire content of the listing as array of FileStatus 
objects is inserted as a single entry, with key as "/dir/data/test", in the 
FileStatusCache. The default size of this cache is 250MB and it is 
configurable. This underlying cache uses guava cache.

The guava cache has one interesting property, i.e. a single entry can only be 
as large as
{code:java}
maximumSize/concurrencyLevel{code}
see [https://github.com/google/guava/issues/3462], for details. So for a cache 
size of 250MB, a single entry can be as large as only 250MB/4, since the 
default concurrency level is 4 in guava. This size is around 62MB, which is 
good enough for most datasets, but for directories with larger listing, it does 
not work that well. And the effect of this is especially evident when such 
listings are for object stores like Amazon s3 or IBM Cloud object store etc..

So, currently one can work around this 
