[jira] [Resolved] (SPARK-30093) Improve error message for creating views
[ https://issues.apache.org/jira/browse/SPARK-30093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-30093.
---------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 26731
[https://github.com/apache/spark/pull/26731]

> Improve error message for creating views
> ----------------------------------------
>
> Key: SPARK-30093
> URL: https://issues.apache.org/jira/browse/SPARK-30093
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Aman Omer
> Assignee: Aman Omer
> Priority: Major
> Fix For: 3.0.0
>
> Improve error message for creating views.
> https://github.com/apache/spark/pull/26317#discussion_r352377363

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-30093) Improve error message for creating views
[ https://issues.apache.org/jira/browse/SPARK-30093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-30093:
-----------------------------------
    Assignee: Aman Omer

> Improve error message for creating views
> ----------------------------------------
>
> Key: SPARK-30093
> URL: https://issues.apache.org/jira/browse/SPARK-30093
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Aman Omer
> Assignee: Aman Omer
> Priority: Major
>
> Improve error message for creating views.
> https://github.com/apache/spark/pull/26317#discussion_r352377363
[jira] [Updated] (SPARK-30135) Add documentation for DELETE JAR and DELETE File command
[ https://issues.apache.org/jira/browse/SPARK-30135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Katta updated SPARK-30135:
----------------------------------
    Summary: Add documentation for DELETE JAR and DELETE File command  (was: Add documentation for DELETE JAR command)

> Add documentation for DELETE JAR and DELETE File command
> --------------------------------------------------------
>
> Key: SPARK-30135
> URL: https://issues.apache.org/jira/browse/SPARK-30135
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation
> Affects Versions: 3.0.0
> Reporter: Sandeep Katta
> Priority: Major
[jira] [Commented] (SPARK-30137) Support DELETE file
[ https://issues.apache.org/jira/browse/SPARK-30137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988508#comment-16988508 ]

Sandeep Katta commented on SPARK-30137:
---------------------------------------

I have started working on this feature and will raise the PR soon.

> Support DELETE file
> -------------------
>
> Key: SPARK-30137
> URL: https://issues.apache.org/jira/browse/SPARK-30137
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Sandeep Katta
> Priority: Major
[jira] [Created] (SPARK-30137) Support DELETE file
Sandeep Katta created SPARK-30137:
-------------------------------------

Summary: Support DELETE file
Key: SPARK-30137
URL: https://issues.apache.org/jira/browse/SPARK-30137
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.0.0
Reporter: Sandeep Katta
[jira] [Updated] (SPARK-30133) Support DELETE Jar and DELETE File functionality in spark
[ https://issues.apache.org/jira/browse/SPARK-30133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Katta updated SPARK-30133:
----------------------------------
    Summary: Support DELETE Jar and DELETE File functionality in spark  (was: Support DELETE Jar functionality in spark)

> Support DELETE Jar and DELETE File functionality in spark
> ---------------------------------------------------------
>
> Key: SPARK-30133
> URL: https://issues.apache.org/jira/browse/SPARK-30133
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Sandeep Katta
> Priority: Major
> Labels: Umbrella
>
> Spark should support a DELETE JAR feature.
[jira] [Created] (SPARK-30136) DELETE JAR should also remove the jar from executor classPath
Sandeep Katta created SPARK-30136:
-------------------------------------

Summary: DELETE JAR should also remove the jar from executor classPath
Key: SPARK-30136
URL: https://issues.apache.org/jira/browse/SPARK-30136
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.0.0
Reporter: Sandeep Katta
[jira] [Commented] (SPARK-30136) DELETE JAR should also remove the jar from executor classPath
[ https://issues.apache.org/jira/browse/SPARK-30136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988504#comment-16988504 ]

Sandeep Katta commented on SPARK-30136:
---------------------------------------

I have started working on this feature and will raise the PR soon.

> DELETE JAR should also remove the jar from executor classPath
> -------------------------------------------------------------
>
> Key: SPARK-30136
> URL: https://issues.apache.org/jira/browse/SPARK-30136
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Sandeep Katta
> Priority: Major
[jira] [Commented] (SPARK-30134) DELETE JAR should remove from addedJars list and from classpath
[ https://issues.apache.org/jira/browse/SPARK-30134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988502#comment-16988502 ]

Sandeep Katta commented on SPARK-30134:
---------------------------------------

I have started working on this feature and will raise the PR soon.

> DELETE JAR should remove from addedJars list and from classpath
> ---------------------------------------------------------------
>
> Key: SPARK-30134
> URL: https://issues.apache.org/jira/browse/SPARK-30134
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Sandeep Katta
> Priority: Major
[jira] [Commented] (SPARK-30135) Add documentation for DELETE JAR command
[ https://issues.apache.org/jira/browse/SPARK-30135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988503#comment-16988503 ]

Sandeep Katta commented on SPARK-30135:
---------------------------------------

I have started working on this feature and will raise the PR soon.

> Add documentation for DELETE JAR command
> ----------------------------------------
>
> Key: SPARK-30135
> URL: https://issues.apache.org/jira/browse/SPARK-30135
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation
> Affects Versions: 3.0.0
> Reporter: Sandeep Katta
> Priority: Major
[jira] [Created] (SPARK-30134) DELETE JAR should remove from addedJars list and from classpath
Sandeep Katta created SPARK-30134:
-------------------------------------

Summary: DELETE JAR should remove from addedJars list and from classpath
Key: SPARK-30134
URL: https://issues.apache.org/jira/browse/SPARK-30134
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.0.0
Reporter: Sandeep Katta
[jira] [Created] (SPARK-30135) Add documentation for DELETE JAR command
Sandeep Katta created SPARK-30135:
-------------------------------------

Summary: Add documentation for DELETE JAR command
Key: SPARK-30135
URL: https://issues.apache.org/jira/browse/SPARK-30135
Project: Spark
Issue Type: Sub-task
Components: Documentation
Affects Versions: 3.0.0
Reporter: Sandeep Katta
[jira] [Created] (SPARK-30133) Support DELETE Jar functionality in spark
Sandeep Katta created SPARK-30133:
-------------------------------------

Summary: Support DELETE Jar functionality in spark
Key: SPARK-30133
URL: https://issues.apache.org/jira/browse/SPARK-30133
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 3.0.0
Reporter: Sandeep Katta

Spark should support a DELETE JAR feature.
[jira] [Commented] (SPARK-29915) spark-py and spark-r images are not created with docker-image-tool.sh
[ https://issues.apache.org/jira/browse/SPARK-29915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988492#comment-16988492 ]

Veera commented on SPARK-29915:
-------------------------------

Yes, this option is not included, but you can add it manually in the Dockerfiles and build your own custom Docker image.

> spark-py and spark-r images are not created with docker-image-tool.sh
> ---------------------------------------------------------------------
>
> Key: SPARK-29915
> URL: https://issues.apache.org/jira/browse/SPARK-29915
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 3.0.0
> Reporter: Michał Wesołowski
> Priority: Major
>
> Currently, at version 3.0.0-preview, the docker-image-tool.sh script has the
> [following lines|https://github.com/apache/spark/blob/master/bin/docker-image-tool.sh#L173]
> defined:
> {code:java}
> local PYDOCKERFILE=${PYDOCKERFILE:-false}
> local RDOCKERFILE=${RDOCKERFILE:-false}
> {code}
> Because of this change, neither the spark-py nor the spark-r images get created.
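The effect of the shell defaulting quoted above (`${PYDOCKERFILE:-false}`: fall back to the string "false" when the variable is unset) can be sketched in plain Python. This is an illustration of the reported behavior only, not Spark's actual script; the function names and the Dockerfile path are hypothetical.

```python
# Sketch of the defaulting logic in the quoted docker-image-tool.sh lines:
# when no Python/R Dockerfile is supplied, the variable defaults to the
# string "false" and the corresponding image build is skipped.

def resolve_dockerfile(cli_value):
    # Mirrors ${VAR:-false}: use the fallback when the value is unset/empty.
    return cli_value if cli_value else "false"

def images_to_build(pydockerfile=None, rdockerfile=None):
    images = ["spark"]  # the base image is always built
    if resolve_dockerfile(pydockerfile) != "false":
        images.append("spark-py")
    if resolve_dockerfile(rdockerfile) != "false":
        images.append("spark-r")
    return images
```

So unless a Python or R Dockerfile is passed explicitly, `images_to_build()` yields only the base image, which matches the symptom in the report.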
[jira] [Commented] (SPARK-29021) NoSuchElementException: key not found: hostPath.spark-local-dir-5.options.path
[ https://issues.apache.org/jira/browse/SPARK-29021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988487#comment-16988487 ]

Veera commented on SPARK-29021:
-------------------------------

Hi, I am also facing this issue while submitting the Spark job:

/opt/spark/bin/spark-submit --master XXX/ \
--deploy-mode cluster \
--name spark-wordcount \
--conf spark.kubernetes.namespace=spark \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spar-sa \
--conf spark.executor.instances=3 \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--conf spark.kubernetes.container.image=container image latest version \
--conf spark.kubernetes.pyspark.pythonVersion=3 \
--conf spark.kubernetes.authenticate.submission.caCertFile=/etc/kubernetes/pki/ca.crt \
--conf spark.kubernetes.driver.volumes.hostPath.sparkwordcount.mount.path=/root \
--conf spark.kubernetes.driver.volumes.hostPath.sparkwordcount.mount.readOnly=false \
--conf spark.kubernetes.driver.volumes.hostPath.sparkwordcount.options.claimName=spark-wordcount-claim \
/root/pyscripts/wordcount.py

Exception in thread "main" java.util.NoSuchElementException: hostPath.sparkwordcount.options.path
    at org.apache.spark.deploy.k8s.KubernetesVolumeUtils$MapOps$$anonfun$getTry$1.apply(KubernetesVolumeUtils.scala:107)
    at org.apache.spark.deploy.k8s.KubernetesVolumeUtils$MapOps$$anonfun$getTry$1.apply(KubernetesVolumeUtils.scala:107)
    at scala.Option.fold(Option.scala:158)
    at org.apache.spark.deploy.k8s.KubernetesVolumeUtils$MapOps.getTry(KubernetesVolumeUtils.scala:107)

> NoSuchElementException: key not found: hostPath.spark-local-dir-5.options.path
> ------------------------------------------------------------------------------
>
> Key: SPARK-29021
> URL: https://issues.apache.org/jira/browse/SPARK-29021
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 2.4.4
> Reporter: Kent Yao
> Priority: Major
>
> Mounting a hostPath volume has an issue:
> {code:java}
> Exception in thread "main" java.util.NoSuchElementException: key not found: hostPath.spark-local-dir-5.options.path
>     at scala.collection.MapLike.default(MapLike.scala:235)
>     at scala.collection.MapLike.default$(MapLike.scala:234)
>     at scala.collection.AbstractMap.default(Map.scala:63)
>     at scala.collection.MapLike.apply(MapLike.scala:144)
>     at scala.collection.MapLike.apply$(MapLike.scala:143)
>     at scala.collection.AbstractMap.apply(Map.scala:63)
>     at org.apache.spark.deploy.k8s.KubernetesVolumeUtils$.parseVolumeSpecificConf(KubernetesVolumeUtils.scala:70)
>     at org.apache.spark.deploy.k8s.KubernetesVolumeUtils$.$anonfun$parseVolumesWithPrefix$1(KubernetesVolumeUtils.scala:43)
>     at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
>     at scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:321)
>     at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:977)
>     at scala.collection.TraversableLike.map(TraversableLike.scala:237)
>     at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
>     at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:51)
>     at scala.collection.SetLike.map(SetLike.scala:104)
>     at scala.collection.SetLike.map$(SetLike.scala:104)
>     at scala.collection.AbstractSet.map(Set.scala:51)
>     at org.apache.spark.deploy.k8s.KubernetesVolumeUtils$.parseVolumesWithPrefix(KubernetesVolumeUtils.scala:33)
>     at org.apache.spark.deploy.k8s.KubernetesConf$.createDriverConf(KubernetesConf.scala:179)
>     at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:214)
>     at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:198)
>     at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:920)
>     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:179)
>     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:202)
>     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:89)
>     at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:999)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1008)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}
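The shape of the failure in the comment above can be sketched in plain Python: the submit command sets `mount.path`, `mount.readOnly`, and `options.claimName` for a `hostPath` volume, but the parser looks up `options.path` for that volume type, so the key is missing. This is an illustrative sketch only, not Spark's actual `KubernetesVolumeUtils` code; the function and the `/data` path are hypothetical.

```python
# Sketch of the key lookup that fails above: a hostPath volume requires an
# options.path conf key; options.claimName belongs to persistentVolumeClaim
# volumes, so setting it does not help.

PREFIX = "spark.kubernetes.driver.volumes."

def parse_host_path_volume(conf, name):
    key = f"{PREFIX}hostPath.{name}.options.path"
    if key not in conf:
        raise KeyError(f"key not found: hostPath.{name}.options.path")
    return {
        "hostPath": conf[key],
        "mountPath": conf[f"{PREFIX}hostPath.{name}.mount.path"],
    }

# Confs as in the comment above: options.path is absent.
broken = {
    f"{PREFIX}hostPath.sparkwordcount.mount.path": "/root",
    f"{PREFIX}hostPath.sparkwordcount.options.claimName": "spark-wordcount-claim",
}

# Adding the missing key (hypothetical host directory) makes the lookup succeed.
fixed = dict(broken)
fixed[f"{PREFIX}hostPath.sparkwordcount.options.path"] = "/data"
```

With `broken`, `parse_host_path_volume` raises the same "key not found" error; with `fixed`, parsing succeeds.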
[jira] [Updated] (SPARK-29152) Spark Executor Plugin API shutdown is not proper when dynamic allocation enabled
[ https://issues.apache.org/jira/browse/SPARK-29152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh Raushan updated SPARK-29152:
-----------------------------------
    Affects Version/s: 3.0.0

> Spark Executor Plugin API shutdown is not proper when dynamic allocation enabled
> --------------------------------------------------------------------------------
>
> Key: SPARK-29152
> URL: https://issues.apache.org/jira/browse/SPARK-29152
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 3.0.0
> Reporter: jobit mathew
> Priority: Major
>
> *Issue Description*
> The Spark Executor Plugin API's *shutdown handling is not proper* when dynamic allocation is enabled: the plugin's shutdown method is not invoked when *executors become dead* after the inactive time.
> *Test Precondition*
> 1. Create a plugin and package it into a jar named SparkExecutorPlugin.jar:
> {code:java}
> import org.apache.spark.ExecutorPlugin;
>
> public class ExecutoTest1 implements ExecutorPlugin {
>     public void init() {
>         System.out.println("Executor Plugin Initialised.");
>     }
>     public void shutdown() {
>         System.out.println("Executor plugin closed successfully.");
>     }
> }
> {code}
> 2. Put the jar in the folder /spark/examples/jars.
> *Test Steps*
> 1. Launch bin/spark-sql with dynamic allocation enabled:
> {code}
> ./spark-sql --master yarn --conf spark.executor.plugins=ExecutoTest1 --jars /opt/HA/C10/install/spark/spark/examples/jars/SparkExecutorPlugin.jar --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.initialExecutors=2 --conf spark.dynamicAllocation.minExecutors=1
> {code}
> 2. Create a table, insert data, and run select * from tablename.
> 3. Check the Spark UI Jobs tab/SQL tab.
> 4. Check every executor's application log file (the Executors tab lists all executor details) for the plugin initialization and shutdown messages, for example:
> /yarn/logdir/application_1567156749079_0025/container_e02_1567156749079_0025_01_05/stdout
> 5. Wait for the executor to become dead after the inactive time and check the same container log.
> 6. Kill the spark-sql session and check the container log for the executor plugin shutdown message.
> *Expected Output*
> 1. The job should succeed; the create table, insert, and select queries should succeed.
> 2. While the query runs, every executor's log should contain the plugin init message: "Executor Plugin Initialised."
> 3. Once an executor is dead, the shutdown message should appear in its log: "Executor plugin closed successfully."
> 4. Once the SQL application is closed, the shutdown message should appear in the log: "Executor plugin closed successfully."
> *Actual Output*
> The shutdown message is not logged when an executor becomes dead after the inactive time.
> *Observation*
> Without dynamic allocation the executor plugin works fine, but after enabling dynamic allocation the executor shutdown is not processed.
[jira] [Updated] (SPARK-30121) Fix memory usage in sbt build script
[ https://issues.apache.org/jira/browse/SPARK-30121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-30121:
-----------------------------
    Description:
1. The default memory setting is missing from the usage instructions:
{code:java}
build/sbt -h
{code}
{code:java}
-mem    set memory options (default: , which is -Xms2048m -Xmx2048m -XX:ReservedCodeCacheSize=256m)
{code}
2. The Perm space setting is not needed anymore, since Java 7 support was removed.

was:
{code:java}
-mem    set memory options (default: , which is -Xms2048m -Xmx2048m -XX:ReservedCodeCacheSize=256m)
{code}

> Fix memory usage in sbt build script
> ------------------------------------
>
> Key: SPARK-30121
> URL: https://issues.apache.org/jira/browse/SPARK-30121
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 2.4.4, 3.0.0
> Reporter: Kent Yao
> Priority: Minor
>
> 1. The default memory setting is missing from the usage instructions:
> {code:java}
> build/sbt -h
> {code}
> {code:java}
> -mem    set memory options (default: , which is -Xms2048m -Xmx2048m -XX:ReservedCodeCacheSize=256m)
> {code}
> 2. The Perm space setting is not needed anymore, since Java 7 support was removed.
[jira] [Updated] (SPARK-30121) Fix memory usage in sbt build script
[ https://issues.apache.org/jira/browse/SPARK-30121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-30121:
-----------------------------
    Summary: Fix memory usage in sbt build script  (was: Miss default memory setting for sbt usage)

> Fix memory usage in sbt build script
> ------------------------------------
>
> Key: SPARK-30121
> URL: https://issues.apache.org/jira/browse/SPARK-30121
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 2.4.4, 3.0.0
> Reporter: Kent Yao
> Priority: Minor
>
> {code:java}
> -mem    set memory options (default: , which is -Xms2048m -Xmx2048m -XX:ReservedCodeCacheSize=256m)
> {code}
[jira] [Resolved] (SPARK-30129) New auth engine does not keep client ID in TransportClient after auth
[ https://issues.apache.org/jira/browse/SPARK-30129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-30129.
-----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

This is resolved via https://github.com/apache/spark/pull/26760

> New auth engine does not keep client ID in TransportClient after auth
> ---------------------------------------------------------------------
>
> Key: SPARK-30129
> URL: https://issues.apache.org/jira/browse/SPARK-30129
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.4.4, 3.0.0
> Reporter: Marcelo Masiero Vanzin
> Assignee: Marcelo Masiero Vanzin
> Priority: Major
> Fix For: 3.0.0
>
> Found a little bug when working on a feature; when auth is on, it's expected
> that the {{TransportClient}} provides the authenticated ID of the client
> (generally the app ID), but the new auth engine is not setting that
> information.
[jira] [Assigned] (SPARK-30129) New auth engine does not keep client ID in TransportClient after auth
[ https://issues.apache.org/jira/browse/SPARK-30129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-30129:
-------------------------------------
    Assignee: Marcelo Masiero Vanzin

> New auth engine does not keep client ID in TransportClient after auth
> ---------------------------------------------------------------------
>
> Key: SPARK-30129
> URL: https://issues.apache.org/jira/browse/SPARK-30129
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.4.4, 3.0.0
> Reporter: Marcelo Masiero Vanzin
> Assignee: Marcelo Masiero Vanzin
> Priority: Major
>
> Found a little bug when working on a feature; when auth is on, it's expected
> that the {{TransportClient}} provides the authenticated ID of the client
> (generally the app ID), but the new auth engine is not setting that
> information.
[jira] [Created] (SPARK-30132) Scala 2.13 compile errors from Hadoop LocalFileSystem subclasses
Sean R. Owen created SPARK-30132:
------------------------------------

Summary: Scala 2.13 compile errors from Hadoop LocalFileSystem subclasses
Key: SPARK-30132
URL: https://issues.apache.org/jira/browse/SPARK-30132
Project: Spark
Issue Type: Sub-task
Components: Spark Core
Affects Versions: 3.0.0
Reporter: Sean R. Owen

A few classes in our test code extend Hadoop's LocalFileSystem. Scala 2.13 reports a compile error here - not for the Spark code, but because (it says) the Hadoop code illegally overrides appendFile() with slightly different generic types in its return value. This code is valid Java, evidently, and the code actually doesn't define any generic types, so I even wonder if it's a scalac bug. So far I don't see a workaround for this.

This only affects the Hadoop 3.2 build, in that it comes up with respect to a method new in Hadoop 3. (There is actually another instance of a similar problem that affects Hadoop 2, but I can see a tiny hack workaround for it.)
[jira] [Commented] (SPARK-30125) Remove PostgreSQL dialect
[ https://issues.apache.org/jira/browse/SPARK-30125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988301#comment-16988301 ]

Yuanjian Li commented on SPARK-30125:
-------------------------------------

[~aman_omer] Sorry, my PR was a little delayed by fixing related UTs; you can still help by reviewing.

> Remove PostgreSQL dialect
> -------------------------
>
> Key: SPARK-30125
> URL: https://issues.apache.org/jira/browse/SPARK-30125
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Yuanjian Li
> Priority: Major
>
> As discussed in [http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-PostgreSQL-dialect-td28417.html], we need to remove the PostgreSQL dialect from the code base for several reasons:
> 1. The current approach makes the codebase complicated and hard to maintain.
> 2. Fully migrating PostgreSQL workloads to Spark SQL is not our focus for now.
>
> Currently we have 3 features under the PostgreSQL dialect:
> 1. SPARK-27931: when casting string to boolean, `t`, `tr`, `tru`, `yes`, ... are also allowed as true strings.
> 2. SPARK-29364: `date - date` returns an interval in Spark (SQL standard behavior), but returns an int in PostgreSQL.
> 3. SPARK-28395: `int / int` returns a double in Spark, but returns an int in PostgreSQL (there is no standard).
[jira] [Assigned] (SPARK-30084) Add docs showing how to automatically rebuild Python API docs
[ https://issues.apache.org/jira/browse/SPARK-30084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen reassigned SPARK-30084:
------------------------------------
    Assignee: Nicholas Chammas

> Add docs showing how to automatically rebuild Python API docs
> -------------------------------------------------------------
>
> Key: SPARK-30084
> URL: https://issues.apache.org/jira/browse/SPARK-30084
> Project: Spark
> Issue Type: Improvement
> Components: Build, Documentation
> Affects Versions: 3.0.0
> Reporter: Nicholas Chammas
> Assignee: Nicholas Chammas
> Priority: Minor
>
> `jekyll serve --watch` doesn't watch the API docs. That means you have to
> kill and restart jekyll every time you update your API docs, just to see the
> effect.
[jira] [Resolved] (SPARK-30084) Add docs showing how to automatically rebuild Python API docs
[ https://issues.apache.org/jira/browse/SPARK-30084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen resolved SPARK-30084.
----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 26719
[https://github.com/apache/spark/pull/26719]

> Add docs showing how to automatically rebuild Python API docs
> -------------------------------------------------------------
>
> Key: SPARK-30084
> URL: https://issues.apache.org/jira/browse/SPARK-30084
> Project: Spark
> Issue Type: Improvement
> Components: Build, Documentation
> Affects Versions: 3.0.0
> Reporter: Nicholas Chammas
> Assignee: Nicholas Chammas
> Priority: Minor
> Fix For: 3.0.0
>
> `jekyll serve --watch` doesn't watch the API docs. That means you have to
> kill and restart jekyll every time you update your API docs, just to see the
> effect.
[jira] [Commented] (SPARK-30131) Add array_median function
[ https://issues.apache.org/jira/browse/SPARK-30131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988289#comment-16988289 ]

Alexander Hagerf commented on SPARK-30131:
------------------------------------------

Created a PR for this issue: [https://github.com/apache/spark/pull/26762]

> Add array_median function
> -------------------------
>
> Key: SPARK-30131
> URL: https://issues.apache.org/jira/browse/SPARK-30131
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 2.4.4
> Reporter: Alexander Hagerf
> Priority: Minor
> Fix For: 3.0.0
>
> It is known that there isn't any exact median function in Spark SQL, and this might be a difficult problem to solve efficiently. However, finding the median of an array should be a simple task, and something that users can utilize when collecting numeric values into a list or set.
>
> This can already be achieved by sorting the array and choosing the middle element, but that can get cumbersome, and if a fully tested function is provided in the API, I think it can save some headache for many.
[jira] [Updated] (SPARK-30131) Add array_median function
[ https://issues.apache.org/jira/browse/SPARK-30131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Hagerf updated SPARK-30131:
-------------------------------------
    Description:
It is known that there isn't any exact median function in Spark SQL, and this might be a difficult problem to solve efficiently. However, to find the median for an array should be a simple task, and something that users can utilize when collecting numeric values to a list or set.

This can already be achieved by using sorting and choosing element, but can get cumbersome and if a fully tested function is provided in the API, I think it can solve some headache for many.

was:
It is known that there isn't any exact median function in Spark SQL, and this might be a difficult problem to solve efficiently. However, to find the median for an array should be a simple task, and something that users can utilize when collecting numeric values to a list or set.

This can already be achieved by using sorting and choosing element, but can get cumbersome and if a fully tested function is provided in the API, I think it can solve some headache for many.

> Add array_median function
> -------------------------
>
> Key: SPARK-30131
> URL: https://issues.apache.org/jira/browse/SPARK-30131
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 2.4.4
> Reporter: Alexander Hagerf
> Priority: Minor
> Fix For: 3.0.0
>
> It is known that there isn't any exact median function in Spark SQL, and this might be a difficult problem to solve efficiently. However, to find the median for an array should be a simple task, and something that users can utilize when collecting numeric values to a list or set.
> This can already be achieved by using sorting and choosing element, but can get cumbersome and if a fully tested function is provided in the API, I think it can solve some headache for many.
[jira] [Created] (SPARK-30131) Add array_median function
Alexander Hagerf created SPARK-30131:
----------------------------------------

Summary: Add array_median function
Key: SPARK-30131
URL: https://issues.apache.org/jira/browse/SPARK-30131
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 2.4.4
Reporter: Alexander Hagerf
Fix For: 3.0.0

It is known that there isn't any exact median function in Spark SQL, and this might be a difficult problem to solve efficiently. However, to find the median for an array should be a simple task, and something that users can utilize when collecting numeric values to a list or set.

This can already be achieved by using sorting and choosing element, but can get cumbersome and if a fully tested function is provided in the API, I think it can solve some headache for many.
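The workaround the issue describes - sort the collected array, then pick the middle element - can be sketched in plain Python. This is an illustration of the logic only, not the proposed Spark function or its API; the function name `array_median` is borrowed from the issue title.

```python
# Sketch of the sort-and-select workaround for an exact array median.
# Plain Python, not Spark code.

def array_median(values):
    """Exact median of a non-empty list of numbers."""
    if not values:
        raise ValueError("median of an empty array is undefined")
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                    # odd length: the middle element
    return (s[mid - 1] + s[mid]) / 2     # even length: mean of the two middle elements
```

In Spark SQL the same odd-length selection can be expressed with the existing `sort_array` and `element_at` functions over a collected list, though the even-length averaging needs an extra expression - which is the "cumbersome" part the issue wants a built-in to replace.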
[jira] [Reopened] (SPARK-29640) [K8S] Intermittent "java.net.UnknownHostException: kubernetes.default.svc" in Spark driver
[ https://issues.apache.org/jira/browse/SPARK-29640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Grove reopened SPARK-29640:
--------------------------------

Re-opening this as the issue came back and is ongoing. I am exploring other solutions and will post here if/when I find a solution that works.

> [K8S] Intermittent "java.net.UnknownHostException: kubernetes.default.svc" in Spark driver
> ------------------------------------------------------------------------------------------
>
> Key: SPARK-29640
> URL: https://issues.apache.org/jira/browse/SPARK-29640
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 2.4.4
> Reporter: Andy Grove
> Priority: Major
>
> We are running into intermittent DNS issues where the Spark driver fails to resolve "kubernetes.default.svc" when trying to create executors. We are running Spark 2.4.4 (with the patch for SPARK-28921) in cluster mode in EKS. This happens approximately 10% of the time.
> Here is the stack trace:
> {code:java}
> Exception in thread "main" org.apache.spark.SparkException: External scheduler cannot be instantiated
>     at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2794)
>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:493)
>     at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
>     at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
>     at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
>     at scala.Option.getOrElse(Option.scala:121)
>     at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
>     at com.rms.execution.test.SparkPiTask$.main(SparkPiTask.scala:36)
>     at com.rms.execution.test.SparkPiTask.main(SparkPiTask.scala)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>     at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
>     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
>     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
>     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
>     at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [wf-5-69674f15d0fc45-1571354060179-driver] in namespace: [tenant-8-workflows] failed.
>     at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
>     at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
>     at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:229)
>     at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:162)
>     at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:57)
>     at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:55)
>     at scala.Option.map(Option.scala:146)
>     at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:55)
>     at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:89)
>     at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2788)
>     ... 20 more
> Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again
>     at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
>     at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
>     at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
>     at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
>     at java.net.InetAddress.getAllByName(InetAddress.java:1193)
>     at java.net.InetAddress.getAllByName(InetAddress.java:1127)
>     at okhttp3.Dns$1.lookup(Dns.java:39)
>     at
[jira] [Commented] (SPARK-30130) Hardcoded numeric values in common table expressions which utilize GROUP BY are interpreted as ordinal positions
[ https://issues.apache.org/jira/browse/SPARK-30130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988168#comment-16988168 ] Ankit Raj Boudh commented on SPARK-30130: - I will check this issue > Hardcoded numeric values in common table expressions which utilize GROUP BY > are interpreted as ordinal positions > > > Key: SPARK-30130 > URL: https://issues.apache.org/jira/browse/SPARK-30130 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 >Reporter: Matt Boegner >Priority: Minor > > Hardcoded numeric values in common table expressions which utilize GROUP BY > are interpreted as ordinal positions. > {code:java} > val df = spark.sql(""" > with a as (select 0 as test, count(*) group by test) > select * from a > """) > df.show(){code} > This results in an error message like {color:#e01e5a}GROUP BY position 0 is > not in select list (valid range is [1, 2]){color} . > > However, this error does not appear in a traditional subselect format. For > example, this query executes correctly: > {code:java} > val df = spark.sql(""" > select * from (select 0 as test, count(*) group by test) a > """) > df.show(){code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
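The behavior reported above is presumably driven by the `spark.sql.groupByOrdinal` configuration (true by default), under which Spark resolves integer literals in GROUP BY as column positions. A minimal workaround sketch, not an official fix: the helper name and the table name `t` are made up for illustration (the original snippet omits a FROM clause), and it assumes an active SparkSession.

```python
def run_cte_with_constant_group_key(spark):
    """Workaround sketch for SPARK-30130 (hypothetical helper, untested on a
    live cluster). With spark.sql.groupByOrdinal at its default (true), the
    literal 0 in GROUP BY is resolved as an ordinal position inside the CTE.
    Disabling the flag makes Spark treat it as a plain constant instead.
    The table name `t` is a placeholder."""
    spark.conf.set("spark.sql.groupByOrdinal", "false")
    return spark.sql(
        "with a as (select 0 as test, count(*) as cnt from t group by test) "
        "select * from a"
    )
```

This trades away ordinal GROUP BY for the whole session, so it is only appropriate when queries do not rely on positional grouping.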
[jira] [Resolved] (SPARK-29453) Improve tooltip information for SQL tab
[ https://issues.apache.org/jira/browse/SPARK-29453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-29453. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26641 [https://github.com/apache/spark/pull/26641] > Improve tooltip information for SQL tab > --- > > Key: SPARK-29453 > URL: https://issues.apache.org/jira/browse/SPARK-29453 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.0.0 >Reporter: Sandeep Katta >Assignee: Ankit Raj Boudh >Priority: Minor > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29453) Improve tooltip information for SQL tab
[ https://issues.apache.org/jira/browse/SPARK-29453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-29453: Assignee: Ankit Raj Boudh > Improve tooltip information for SQL tab > --- > > Key: SPARK-29453 > URL: https://issues.apache.org/jira/browse/SPARK-29453 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.0.0 >Reporter: Sandeep Katta >Assignee: Ankit Raj Boudh >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30118) ALTER VIEW QUERY does not work
[ https://issues.apache.org/jira/browse/SPARK-30118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988078#comment-16988078 ] John Zhuge commented on SPARK-30118: [~cltlfcjin] Thanks for the comment. Do you know which commit fixed the issue? > ALTER VIEW QUERY does not work > -- > > Key: SPARK-30118 > URL: https://issues.apache.org/jira/browse/SPARK-30118 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: John Zhuge >Priority: Major > > `ALTER VIEW AS` does not change view query. It leaves the view in a corrupted > state. > {code:sql} > spark-sql> CREATE VIEW jzhuge.v1 AS SELECT 'foo' foo1; > spark-sql> SHOW CREATE TABLE jzhuge.v1; > CREATE VIEW `jzhuge`.`v1`(foo1) AS > SELECT 'foo' foo1 > spark-sql> ALTER VIEW jzhuge.v1 AS SELECT 'foo' foo2; > spark-sql> SHOW CREATE TABLE jzhuge.v1; > CREATE VIEW `jzhuge`.`v1`(foo1) AS > SELECT 'foo' foo1 > spark-sql> TABLE jzhuge.v1; > Error in query: Attribute with name 'foo2' is not found in '(foo1)';; > SubqueryAlias `jzhuge`.`v1` > +- View (`jzhuge`.`v1`, [foo1#33]) >+- Project [foo AS foo1#34] > +- OneRowRelation > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
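Until the bug is fixed, one possible client-side workaround is to recreate the view rather than alter it, since `CREATE OR REPLACE VIEW` rewrites both the stored query and the column list in one step. A hedged sketch (the helper is hypothetical, not a Spark API; the names mirror the repro above):

```python
def replace_view(spark, view_name, query):
    """Workaround sketch for SPARK-30118 (an assumption, not the project's fix):
    recreate the view with CREATE OR REPLACE VIEW instead of ALTER VIEW ... AS,
    so the stored column list cannot drift out of sync with the new query."""
    spark.sql("CREATE OR REPLACE VIEW {} AS {}".format(view_name, query))
```

For the repro above, that would be `replace_view(spark, "jzhuge.v1", "SELECT 'foo' foo2")` in place of the failing ALTER VIEW statement.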
[jira] [Updated] (SPARK-30130) Hardcoded numeric values in common table expressions which utilize GROUP BY are interpreted as ordinal positions
[ https://issues.apache.org/jira/browse/SPARK-30130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Boegner updated SPARK-30130: - Description: Hardcoded numeric values in common table expressions which utilize GROUP BY are interpreted as ordinal positions. {code:java} val df = spark.sql(""" with a as (select 0 as test, count(*) group by test) select * from a """) df.show(){code} This results in an error message like {color:#e01e5a}GROUP BY position 0 is not in select list (valid range is [1, 2]){color} . However, this error does not appear in a traditional subselect format. For example, this query executes correctly: {code:java} val df = spark.sql(""" select * from (select 0 as test, count(*) group by test) a """) df.show(){code} was: Hardcoded numeric values in common table expressions which utilize GROUP BY are interpreted as ordinal positions. val df = spark.sql(""" with a as (select 0 as test, count(*) group by test) select * from a """) df.show() This results in an error message like {color:#e01e5a}GROUP BY position 0 is not in select list (valid range is [1, 2]){color} . However, this error does not appear in a traditional subselect format. For example, this query executes correctly: val df = spark.sql(""" select * from (select 0 as test, count(*) group by test) a """) df.show() > Hardcoded numeric values in common table expressions which utilize GROUP BY > are interpreted as ordinal positions > > > Key: SPARK-30130 > URL: https://issues.apache.org/jira/browse/SPARK-30130 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 >Reporter: Matt Boegner >Priority: Minor > > Hardcoded numeric values in common table expressions which utilize GROUP BY > are interpreted as ordinal positions. 
> {code:java} > val df = spark.sql(""" > with a as (select 0 as test, count(*) group by test) > select * from a > """) > df.show(){code} > This results in an error message like {color:#e01e5a}GROUP BY position 0 is > not in select list (valid range is [1, 2]){color} . > > However, this error does not appear in a traditional subselect format. For > example, this query executes correctly: > {code:java} > val df = spark.sql(""" > select * from (select 0 as test, count(*) group by test) a > """) > df.show(){code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30130) Hardcoded numeric values in common table expressions which utilize GROUP BY are interpreted as ordinal positions
Matt Boegner created SPARK-30130: Summary: Hardcoded numeric values in common table expressions which utilize GROUP BY are interpreted as ordinal positions Key: SPARK-30130 URL: https://issues.apache.org/jira/browse/SPARK-30130 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.4 Reporter: Matt Boegner Hardcoded numeric values in common table expressions which utilize GROUP BY are interpreted as ordinal positions. val df = spark.sql(""" with a as (select 0 as test, count(*) group by test) select * from a """) df.show() This results in an error message like {color:#e01e5a}GROUP BY position 0 is not in select list (valid range is [1, 2]){color} . However, this error does not appear in a traditional subselect format. For example, this query executes correctly: val df = spark.sql(""" select * from (select 0 as test, count(*) group by test) a """) df.show() -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30129) New auth engine does not keep client ID in TransportClient after auth
Marcelo Masiero Vanzin created SPARK-30129: -- Summary: New auth engine does not keep client ID in TransportClient after auth Key: SPARK-30129 URL: https://issues.apache.org/jira/browse/SPARK-30129 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.4, 3.0.0 Reporter: Marcelo Masiero Vanzin Found a little bug when working on a feature; when auth is on, it's expected that the {{TransportClient}} provides the authenticated ID of the client (generally the app ID), but the new auth engine is not setting that information. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30125) Remove PostgreSQL dialect
[ https://issues.apache.org/jira/browse/SPARK-30125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988032#comment-16988032 ] Aman Omer commented on SPARK-30125: --- I would like to take this work. > Remove PostgreSQL dialect > - > > Key: SPARK-30125 > URL: https://issues.apache.org/jira/browse/SPARK-30125 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuanjian Li >Priority: Major > > As discussed in > [http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-PostgreSQL-dialect-td28417.html], > we need to remove the PostgreSQL dialect from the code base for several reasons: > 1. The current approach makes the codebase complicated and hard to maintain. > 2. Fully migrating PostgreSQL workloads to Spark SQL is not our focus for now. > > Currently we have 3 features under the PostgreSQL dialect: > 1. SPARK-27931: when casting string to boolean, `t`, `tr`, `tru`, `yes`, .. > are also allowed as true strings. > 2. SPARK-29364: `date - date` returns an interval in Spark (SQL standard > behavior), but returns int in PostgreSQL > 3. SPARK-28395: `int / int` returns double in Spark, but returns int in > PostgreSQL. (there is no standard) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-21488) Make saveAsTable() and createOrReplaceTempView() return dataframe of created table/ created view
[ https://issues.apache.org/jira/browse/SPARK-21488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruslan Dautkhanov reopened SPARK-21488: --- > Make saveAsTable() and createOrReplaceTempView() return dataframe of created > table/ created view > > > Key: SPARK-21488 > URL: https://issues.apache.org/jira/browse/SPARK-21488 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.2.0 >Reporter: Ruslan Dautkhanov >Priority: Minor > Labels: bulk-closed > > It would be great to make saveAsTable() return the dataframe of the created table, > so you could pipe the result further, for example > {code} > mv_table_df = (sqlc.sql(''' > SELECT ... > FROM > ''') > .write.format("parquet").mode("overwrite") > .saveAsTable('test.parquet_table') > .createOrReplaceTempView('mv_table') > ) > {code} > As of now, the above code expectedly fails with: > {noformat} > AttributeError: 'NoneType' object has no attribute 'createOrReplaceTempView' > {noformat} > If this is implemented, we can skip a step like > {code} > sqlc.sql('SELECT * FROM > test.parquet_table').createOrReplaceTempView('mv_table') > {code} > We have this pattern very frequently. > A further improvement can be made if createOrReplaceTempView also returned a > dataframe object, so that in one pipeline of functions > we can > - create an external table > - create a dataframe reference to this newly created table for Spark SQL and as a > Spark variable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21488) Make saveAsTable() and createOrReplaceTempView() return dataframe of created table/ created view
[ https://issues.apache.org/jira/browse/SPARK-21488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988011#comment-16988011 ] Ruslan Dautkhanov commented on SPARK-21488: --- [~zsxwing] Any chance this can be added to Spark 3.0? I can try to create a PR for this; many of our users still report this as relevant, since it would streamline their code in many places. We always have a good mix of Spark SQL and Spark API calls, and this would be a huge win for code readability. > Make saveAsTable() and createOrReplaceTempView() return dataframe of created > table/ created view > > > Key: SPARK-21488 > URL: https://issues.apache.org/jira/browse/SPARK-21488 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.2.0 >Reporter: Ruslan Dautkhanov >Priority: Minor > Labels: bulk-closed > > It would be great to make saveAsTable() return the dataframe of the created table, > so you could pipe the result further, for example > {code} > mv_table_df = (sqlc.sql(''' > SELECT ... > FROM > ''') > .write.format("parquet").mode("overwrite") > .saveAsTable('test.parquet_table') > .createOrReplaceTempView('mv_table') > ) > {code} > As of now, the above code expectedly fails with: > {noformat} > AttributeError: 'NoneType' object has no attribute 'createOrReplaceTempView' > {noformat} > If this is implemented, we can skip a step like > {code} > sqlc.sql('SELECT * FROM > test.parquet_table').createOrReplaceTempView('mv_table') > {code} > We have this pattern very frequently. > A further improvement can be made if createOrReplaceTempView also returned a > dataframe object, so that in one pipeline of functions > we can > - create an external table > - create a dataframe reference to this newly created table for Spark SQL and as a > Spark variable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
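While saveAsTable() returns None today, the chaining the reporter wants can be approximated with a small helper. This is a sketch under assumptions, not a Spark API: the helper name is made up, and it assumes the DataFrame exposes its session as `df.sparkSession` (PySpark 3.x; older versions may need `df.sql_ctx` instead).

```python
def save_as_table_and_view(df, table_name, view_name):
    """Sketch of the requested chaining (hypothetical helper, not a Spark API).

    saveAsTable() returns None, so we re-read the freshly written table
    through the session, register the temp view on that new DataFrame, and
    return it for further piping."""
    df.write.format("parquet").mode("overwrite").saveAsTable(table_name)
    new_df = df.sparkSession.table(table_name)  # re-read the created table
    new_df.createOrReplaceTempView(view_name)
    return new_df
```

This is essentially the two-step workaround the issue already describes, packaged so call sites keep the one-pipeline shape the reporter asks for.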
[jira] [Commented] (SPARK-30126) sparkContext.addFile fails when file path contains spaces
[ https://issues.apache.org/jira/browse/SPARK-30126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988001#comment-16988001 ] Ankit Raj Boudh commented on SPARK-30126: - I will check this issue > sparkContext.addFile fails when file path contains spaces > - > > Key: SPARK-30126 > URL: https://issues.apache.org/jira/browse/SPARK-30126 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.3 >Reporter: Jan >Priority: Minor > > When uploading a file to the spark context via the addFile function, an > exception is thrown when the file path contains a space character. Escaping the > space with %20 or \\ or + doesn't change the result. > > to reproduce: > file_path = "/home/user/test dir/config.conf" > sparkContext.addFile(file_path) > > results in: > py4j.protocol.Py4JJavaError: An error occurred while calling o131.addFile. > : java.io.FileNotFoundException: File file:/home/user/test%20dir/config.conf > does not exist -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19842) Informational Referential Integrity Constraints Support in Spark
[ https://issues.apache.org/jira/browse/SPARK-19842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987999#comment-16987999 ] Ruslan Dautkhanov commented on SPARK-19842: --- From the design document """ This alternative proposes to use the KEY_CONSTRAINTS catalog table when Spark upgrades to Hive 2.1. Therefore, this proposal will introduce a dependency on Hive metastore 2.1. """ It seems Spark 3.0 is moving towards Hive 2.1, which has FK support. Would it be possible to add FKs and related optimizations to Spark 3.0 too? Thanks! > Informational Referential Integrity Constraints Support in Spark > > > Key: SPARK-19842 > URL: https://issues.apache.org/jira/browse/SPARK-19842 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.2.0 >Reporter: Ioana Delaney >Priority: Major > Attachments: InformationalRIConstraints.doc > > > *Informational Referential Integrity Constraints Support in Spark* > This work proposes support for _informational primary key_ and _foreign key > (referential integrity) constraints_ in Spark. The main purpose is to open up > an area of query optimization techniques that rely on referential integrity > constraints semantics. > An _informational_ or _statistical constraint_ is a constraint such as a > _unique_, _primary key_, _foreign key_, or _check constraint_, that can be > used by Spark to improve query performance. Informational constraints are not > enforced by the Spark SQL engine; rather, they are used by Catalyst to > optimize the query processing. They provide semantic information that allows > Catalyst to rewrite queries to eliminate joins, push down aggregates, remove > unnecessary Distinct operations, and perform a number of other optimizations. > Informational constraints are primarily targeted to applications that load > and analyze data that originated from a data warehouse. 
For such > applications, the conditions for a given constraint are known to be true, so > the constraint does not need to be enforced during data load operations. > The attached document covers constraint definition, metastore storage, > constraint validation, and maintenance. The document shows many examples of > query performance improvements that utilize referential integrity constraints > and can be implemented in Spark. > Link to the google doc: > [InformationalRIConstraints|https://docs.google.com/document/d/17r-cOqbKF7Px0xb9L7krKg2-RQB_gD2pxOmklm-ehsw/edit] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30128) Promote remaining "hidden" PySpark DataFrameReader options to load APIs
[ https://issues.apache.org/jira/browse/SPARK-30128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-30128: - Description: Following on to SPARK-29903 and similar issues (linked), there are options available to the DataFrameReader for certain source formats, but which are not exposed properly in the relevant APIs. These options include `timeZone` and `pathGlobFilter`. Instead of being noted under [the option() method|https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameReader.option], they should be implemented directly into load APIs that support them. was: Following on to SPARK-27990 and similar issues (linked), there are options available to the DataFrameReader for certain source formats, but which are not exposed properly in the relevant APIs. These options include `timeZone` and `pathGlobFilter`. > Promote remaining "hidden" PySpark DataFrameReader options to load APIs > --- > > Key: SPARK-30128 > URL: https://issues.apache.org/jira/browse/SPARK-30128 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 2.4.4, 3.0.0 >Reporter: Nicholas Chammas >Priority: Minor > > Following on to SPARK-29903 and similar issues (linked), there are options > available to the DataFrameReader for certain source formats, but which are > not exposed properly in the relevant APIs. > These options include `timeZone` and `pathGlobFilter`. Instead of being noted > under [the option() > method|https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameReader.option], > they should be implemented directly into load APIs that support them. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30128) Promote remaining "hidden" PySpark DataFrameReader options to load APIs
Nicholas Chammas created SPARK-30128: Summary: Promote remaining "hidden" PySpark DataFrameReader options to load APIs Key: SPARK-30128 URL: https://issues.apache.org/jira/browse/SPARK-30128 Project: Spark Issue Type: Improvement Components: PySpark, SQL Affects Versions: 2.4.4, 3.0.0 Reporter: Nicholas Chammas Following on to SPARK-27990 and similar issues (linked), there are options available to the DataFrameReader for certain source formats, but which are not exposed properly in the relevant APIs. These options include `timeZone` and `pathGlobFilter`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30127) UDF should work for case class like Dataset operations
Wenchen Fan created SPARK-30127: --- Summary: UDF should work for case class like Dataset operations Key: SPARK-30127 URL: https://issues.apache.org/jira/browse/SPARK-30127 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.0.0 Reporter: Wenchen Fan Currently, Spark UDF can only work on data types like java.lang.String, o.a.s.sql.Row, Seq[_], etc. This is inconvenient if you want to apply an operation on one column, and the column is struct type. You must access data from a Row object, instead of your domain object like Dataset operations. It will be great if UDF can work on types that are supported by Dataset, e.g. case classes. Note that, there are multiple ways to register a UDF, and it's only possible to support this feature if the UDF is registered using Scala API that provides type tag, e.g. `def udf[RT: TypeTag, A1: TypeTag](f: Function1[A1, RT])` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30126) sparkContext.addFile fails when file path contains spaces
Jan created SPARK-30126: --- Summary: sparkContext.addFile fails when file path contains spaces Key: SPARK-30126 URL: https://issues.apache.org/jira/browse/SPARK-30126 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 2.4.3 Reporter: Jan When uploading a file to the spark context via the addFile function, an exception is thrown when file path contains a space character. Escaping the space with %20 or \\ or + doesn't change the result. to reproduce: file_path = "/home/user/test dir/config.conf" sparkContext.addFile(file_path) results in: py4j.protocol.Py4JJavaError: An error occurred while calling o131.addFile. : java.io.FileNotFoundException: File file:/home/user/test%20dir/config.conf does not exist -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
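Until addFile handles spaces correctly, a possible client-side workaround is to stage such files at a space-free path before handing them to Spark. A hedged sketch: the helper is hypothetical, and it assumes copying is acceptable (reasonable for small config files like the one in the report).

```python
import os
import shutil
import tempfile

def add_file_safe(sc, file_path):
    """Hypothetical workaround sketch for SPARK-30126: copy files whose paths
    contain spaces into a fresh, space-free temp directory before calling
    sc.addFile, sidestepping the %20 mis-encoding described above."""
    if " " in file_path:
        staging_dir = tempfile.mkdtemp(prefix="spark_addfile_")
        safe_name = os.path.basename(file_path).replace(" ", "_")
        safe_path = os.path.join(staging_dir, safe_name)
        shutil.copy(file_path, safe_path)
        file_path = safe_path
    sc.addFile(file_path)
    return file_path
```

For the repro above, `add_file_safe(sparkContext, "/home/user/test dir/config.conf")` would upload a staged copy instead of the original spaced path.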
[jira] [Updated] (SPARK-30125) Remove PostgreSQL dialect
[ https://issues.apache.org/jira/browse/SPARK-30125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuanjian Li updated SPARK-30125: Summary: Remove PostgreSQL dialect (was: Remove PostgreSQL dialect to reduce codebase maintenance cost) > Remove PostgreSQL dialect > - > > Key: SPARK-30125 > URL: https://issues.apache.org/jira/browse/SPARK-30125 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuanjian Li >Priority: Major > > As discussed in > [http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-PostgreSQL-dialect-td28417.html], > we need to remove the PostgreSQL dialect from the code base for several reasons: > 1. The current approach makes the codebase complicated and hard to maintain. > 2. Fully migrating PostgreSQL workloads to Spark SQL is not our focus for now. > > Currently we have 3 features under the PostgreSQL dialect: > 1. SPARK-27931: when casting string to boolean, `t`, `tr`, `tru`, `yes`, .. > are also allowed as true strings. > 2. SPARK-29364: `date - date` returns an interval in Spark (SQL standard > behavior), but returns int in PostgreSQL > 3. SPARK-28395: `int / int` returns double in Spark, but returns int in > PostgreSQL. (there is no standard) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30125) Remove PostgreSQL dialect to reduce codebase maintenance cost
Yuanjian Li created SPARK-30125: --- Summary: Remove PostgreSQL dialect to reduce codebase maintenance cost Key: SPARK-30125 URL: https://issues.apache.org/jira/browse/SPARK-30125 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Yuanjian Li As discussed in [http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-PostgreSQL-dialect-td28417.html], we need to remove the PostgreSQL dialect from the code base for several reasons: 1. The current approach makes the codebase complicated and hard to maintain. 2. Fully migrating PostgreSQL workloads to Spark SQL is not our focus for now. Currently we have 3 features under the PostgreSQL dialect: 1. SPARK-27931: when casting string to boolean, `t`, `tr`, `tru`, `yes`, .. are also allowed as true strings. 2. SPARK-29364: `date - date` returns an interval in Spark (SQL standard behavior), but returns int in PostgreSQL 3. SPARK-28395: `int / int` returns double in Spark, but returns int in PostgreSQL. (there is no standard) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30124) unnecessary persist in PythonMLLibAPI.scala
[ https://issues.apache.org/jira/browse/SPARK-30124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Omer updated SPARK-30124: -- Summary: unnecessary persist in PythonMLLibAPI.scala (was: Improper persist in PythonMLLibAPI.scala) > unnecessary persist in PythonMLLibAPI.scala > --- > > Key: SPARK-30124 > URL: https://issues.apache.org/jira/browse/SPARK-30124 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 3.0.0 >Reporter: Aman Omer >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30124) Improper persist in PythonMLLibAPI.scala
Aman Omer created SPARK-30124: - Summary: Improper persist in PythonMLLibAPI.scala Key: SPARK-30124 URL: https://issues.apache.org/jira/browse/SPARK-30124 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 3.0.0 Reporter: Aman Omer -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30123) PartitionPruning should consider more cases
deshanxiao created SPARK-30123: -- Summary: PartitionPruning should consider more cases Key: SPARK-30123 URL: https://issues.apache.org/jira/browse/SPARK-30123 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: deshanxiao If the left side has a partition table scan and the right side has a partition-pruning filter, but hasBenefit is false, then the right side will never have a subquery inserted. {code:java} var partScan = getPartitionTableScan(l, left) if (partScan.isDefined && canPruneLeft(joinType) && hasPartitionPruningFilter(right)) { val hasBenefit = pruningHasBenefit(l, partScan.get, r, right) newLeft = insertPredicate(l, newLeft, r, right, rightKeys, hasBenefit) } else { partScan = getPartitionTableScan(r, right) if (partScan.isDefined && canPruneRight(joinType) && hasPartitionPruningFilter(left) ) { val hasBenefit = pruningHasBenefit(r, partScan.get, l, left) newRight = insertPredicate(r, newRight, l, left, leftKeys, hasBenefit) } } case _ => } {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30092) Number of active tasks is negative in Live UI Executors page
[ https://issues.apache.org/jira/browse/SPARK-30092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987799#comment-16987799 ] shahid commented on SPARK-30092: [~zhongyu09] Do you have any steps for reproducing the issue? > Number of active tasks is negative in Live UI Executors page > > > Key: SPARK-30092 > URL: https://issues.apache.org/jira/browse/SPARK-30092 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.4.1 > Environment: Hadoop version: 2.7.3 > ResourceManager version: 2.7.3 >Reporter: ZhongYu >Priority: Major > Attachments: wx20191202-102...@2x.png > > > The number of active tasks is negative in the Live UI Executors page when there > are executor losses and task failures. I am using Spark on YARN, running on > AWS spot instances. When a YARN worker is lost, the active task count in the > Spark Live UI frequently goes negative. > The related tickets below were resolved in earlier versions of Spark, but > the same thing happened again in Spark 2.4.1. See attachment. > https://issues.apache.org/jira/browse/SPARK-8560 > https://issues.apache.org/jira/browse/SPARK-10141 > https://issues.apache.org/jira/browse/SPARK-19356 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30041) Add Codegen Stage Id to Stage DAG visualization in Web UI
[ https://issues.apache.org/jira/browse/SPARK-30041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luca Canali updated SPARK-30041: Attachment: (was: Snippet_StagesDags_with_CodegenId.png) > Add Codegen Stage Id to Stage DAG visualization in Web UI > - > > Key: SPARK-30041 > URL: https://issues.apache.org/jira/browse/SPARK-30041 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.0.0 >Reporter: Luca Canali >Priority: Minor > Attachments: Snippet_StagesDags_with_CodegenId _annotated.png > > > SPARK-29894 provides information on the Codegen Stage Id in WEBUI for SQL > Plan graphs. Similarly, this proposes to add Codegen Stage Id in the DAG > visualization for Stage execution. DAGs for Stage execution are available in > the WEBUI under the Jobs and Stages tabs. > This is proposed as an aid for drill-down analysis of complex SQL statement > execution, as it is not always easy to match parts of the SQL Plan graph with > the corresponding Stage DAG execution graph. Adding Codegen Stage Id for > WholeStageCodegen operations makes this task easier. >
[jira] [Updated] (SPARK-30041) Add Codegen Stage Id to Stage DAG visualization in Web UI
[ https://issues.apache.org/jira/browse/SPARK-30041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luca Canali updated SPARK-30041: Attachment: Snippet_StagesDags_with_CodegenId _annotated.png > Add Codegen Stage Id to Stage DAG visualization in Web UI > - > > Key: SPARK-30041 > URL: https://issues.apache.org/jira/browse/SPARK-30041 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.0.0 >Reporter: Luca Canali >Priority: Minor > Attachments: Snippet_StagesDags_with_CodegenId _annotated.png, > Snippet_StagesDags_with_CodegenId.png > > > SPARK-29894 provides information on the Codegen Stage Id in WEBUI for SQL > Plan graphs. Similarly, this proposes to add Codegen Stage Id in the DAG > visualization for Stage execution. DAGs for Stage execution are available in > the WEBUI under the Jobs and Stages tabs. > This is proposed as an aid for drill-down analysis of complex SQL statement > execution, as it is not always easy to match parts of the SQL Plan graph with > the corresponding Stage DAG execution graph. Adding Codegen Stage Id for > WholeStageCodegen operations makes this task easier. >
[jira] [Created] (SPARK-30122) Allow setting serviceAccountName for executor pods
Juho Mäkinen created SPARK-30122: Summary: Allow setting serviceAccountName for executor pods Key: SPARK-30122 URL: https://issues.apache.org/jira/browse/SPARK-30122 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 2.4.4 Reporter: Juho Mäkinen Currently it doesn't seem to be possible to have the Spark driver set the serviceAccountName for the executor pods it launches. There is a "spark.kubernetes.authenticate.driver.serviceAccountName" property, so naturally one would expect a similar "spark.kubernetes.authenticate.executor.serviceAccountName" property, but no such property exists.
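For context, a hedged sketch of what the request would look like on the command line. The driver-side property below is real; the executor-side property is precisely the one this report asks for and does not exist as of 2.4.4. The master URL and jar path are placeholders.
{code}
spark-submit \
  --master k8s://https://kube-apiserver.example.com:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.authenticate.executor.serviceAccountName=spark \
  local:///opt/spark/examples/jars/spark-examples.jar
{code}
Without such a property, executor pods fall back to the namespace's default service account, which may lack the RBAC permissions the workload needs.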
[jira] [Issue Comment Deleted] (SPARK-29591) Support data insertion in a different order if you wish or even omit some columns in spark sql also like postgresql
[ https://issues.apache.org/jira/browse/SPARK-29591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Omer updated SPARK-29591: -- Comment: was deleted (was: Thanks for confirming. I will start working on this.) > Support data insertion in a different order if you wish or even omit some > columns in spark sql also like postgresql > --- > > Key: SPARK-29591 > URL: https://issues.apache.org/jira/browse/SPARK-29591 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.4 >Reporter: jobit mathew >Priority: Major > > Support data insertion in a different order, or even omitting some columns, in Spark SQL as PostgreSQL does.
> *In PostgreSQL:*
> {code:java}
> CREATE TABLE weather (
>   city    varchar(80),
>   temp_lo int,   -- low temperature
>   temp_hi int,   -- high temperature
>   prcp    real,  -- precipitation
>   date    date
> );
> {code}
> *You can list the columns in a different order if you wish, or even omit some columns:*
> {code:java}
> INSERT INTO weather (date, city, temp_hi, temp_lo)
> VALUES ('1994-11-29', 'Hayward', 54, 37);
> {code}
> *Spark SQL:*
> Spark SQL does not allow inserting data in a different order or omitting any column. It would be better to support this, since it saves effort when a column's value cannot be predicted or when some value is always fixed.
> {code:java}
> create table jobit(id int,name string);
> insert into jobit values(1,"Ankit");
> Time taken: 0.548 seconds
> spark-sql> insert into jobit (id) values(1);
> Error in query:
> mismatched input 'id' expecting \{'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 'INSERT', 'MAP', 'REDUCE'}(line 1, pos 19)
>
> == SQL ==
> insert into jobit (id) values(1)
> ---^^^
>
> spark-sql> insert into jobit (name,id) values("Ankit",1);
> Error in query:
> mismatched input 'name' expecting \{'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 'INSERT', 'MAP', 'REDUCE'}(line 1, pos 19)
>
> == SQL ==
> insert into jobit (name,id) values("Ankit",1)
> ---^^^
>
> spark-sql>
> {code}
>
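Until column lists are supported, one hedged workaround in Spark SQL (2.4.x) is to supply every column positionally and fill the omitted ones with NULL, or to route reordered values through a SELECT; the jobit table is the one from the report above.
{code:java}
-- Workaround sketch: list all columns in table order, NULL for omitted ones.
-- Positional equivalent of PostgreSQL's "INSERT INTO jobit (id) VALUES (1)":
INSERT INTO jobit VALUES (1, CAST(NULL AS STRING));

-- Reordered data can go through a SELECT instead of a column list:
INSERT INTO jobit SELECT 1 AS id, 'Ankit' AS name;
{code}
This puts the burden of matching the table's column order on the query author, which is exactly the friction the report asks to remove.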
[jira] [Created] (SPARK-30121) Miss default memory setting for sbt usage
Kent Yao created SPARK-30121: Summary: Miss default memory setting for sbt usage Key: SPARK-30121 URL: https://issues.apache.org/jira/browse/SPARK-30121 Project: Spark Issue Type: Bug Components: Build Affects Versions: 2.4.4, 3.0.0 Reporter: Kent Yao The sbt usage text prints an empty default for the -mem option:
{code:java}
-mem    set memory options (default: , which is -Xms2048m -Xmx2048m -XX:ReservedCodeCacheSize=256m)
{code}
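As a hedged workaround while the printed default is empty, the memory size can be passed explicitly; the sbt launcher's -mem flag takes a value in megabytes, and the target name here is only an example.
{code}
# Pass the JVM heap explicitly instead of relying on the (empty) default.
./build/sbt -mem 4096 package
{code}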
[jira] [Commented] (SPARK-29667) implicitly convert mismatched datatypes on right side of "IN" operator
[ https://issues.apache.org/jira/browse/SPARK-29667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987712#comment-16987712 ] Marco Gaido commented on SPARK-29667: - I can't agree more with you [~hyukjin.kwon]. I think that having different coercion rules for the two types of IN is very confusing. It'd be great for such things to be consistent across the whole framework in order to avoid "surprises" for users, IMHO. > implicitly convert mismatched datatypes on right side of "IN" operator > -- > > Key: SPARK-29667 > URL: https://issues.apache.org/jira/browse/SPARK-29667 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.3 >Reporter: Jessie Lin >Priority: Minor > > Ran into an error on this SQL. Mismatched columns: > {code} > [(a.`id`:decimal(28,0), db1.table1.`id`:decimal(18,0))] > {code} > The AND clause: > {code} > AND a.id in (select id from db1.table1 where col1 = 1 group by id) > {code} > Once I cast {{decimal(18,0)}} to {{decimal(28,0)}} explicitly above, the SQL > ran just fine. Can the SQL engine cast implicitly in this case?
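The explicit-cast workaround described above looks roughly like this; table and column names come from the report, but the exact rewritten query is an assumption.
{code}
-- Widen the narrower side so both sides of the IN are decimal(28,0).
AND a.id IN (SELECT CAST(id AS DECIMAL(28,0)) FROM db1.table1
             WHERE col1 = 1 GROUP BY id)
{code}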
[jira] [Resolved] (SPARK-29914) ML models append metadata in `transform`/`transformSchema`
[ https://issues.apache.org/jira/browse/SPARK-29914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-29914. -- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 26547 [https://github.com/apache/spark/pull/26547] > ML models append metadata in `transform`/`transformSchema` > -- > > Key: SPARK-29914 > URL: https://issues.apache.org/jira/browse/SPARK-29914 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.0.0 >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Minor > Fix For: 3.1.0 > > > There are many impls (like > `Binarizer`/`Bucketizer`/`VectorAssembler`/`OneHotEncoder`/`FeatureHasher`/`HashingTF`/`VectorSlicer`/...) > in `.ml` that append appropriate metadata in the `transform`/`transformSchema` > methods. > However, there are also many impls that return no metadata during transformation, even > though some metadata like `vector.size`/`numAttrs`/`attrs` can be easily inferred.
[jira] [Assigned] (SPARK-29914) ML models append metadata in `transform`/`transformSchema`
[ https://issues.apache.org/jira/browse/SPARK-29914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-29914: Assignee: zhengruifeng > ML models append metadata in `transform`/`transformSchema` > -- > > Key: SPARK-29914 > URL: https://issues.apache.org/jira/browse/SPARK-29914 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.0.0 >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Minor > > There are many impls (like > `Binarizer`/`Bucketizer`/`VectorAssembler`/`OneHotEncoder`/`FeatureHasher`/`HashingTF`/`VectorSlicer`/...) > in `.ml` that append appropriate metadata in the `transform`/`transformSchema` > methods. > However, there are also many impls that return no metadata during transformation, even > though some metadata like `vector.size`/`numAttrs`/`attrs` can be easily inferred.
[jira] [Commented] (SPARK-18886) Delay scheduling should not delay some executors indefinitely if one task is scheduled before delay timeout
[ https://issues.apache.org/jira/browse/SPARK-18886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987633#comment-16987633 ] Nicholas Brett Marcott commented on SPARK-18886: Thanks for mentioning the PRs here. My proposed solution in the second [PR mentioned above|https://github.com/apache/spark/pull/26696] is what I believe Kay said was ideal in the comments of this [PR|https://github.com/apache/spark/pull/9433], but seemed to think was impractical. *The proposed solution:* Currently the time window that locality wait times measure is the time since the last task launched for a TSM. The proposed change is to instead measure the time since this TSM's available slots were fully utilized. The number of available slots for a TSM can be determined by dividing all slots among the TSMs according to the scheduling policy (FIFO vs FAIR). *Other possible solutions and their issues:* # Never reset timer: delay scheduling would likely only work on the first wave* # Per slot timer: delay scheduling should apply per task/taskset; otherwise, timers started by one taskset could cause delay scheduling to be ignored for the next taskset, which might lead you to try approach #3 # Per slot per stage timer: tasks can be starved by being offered unique slots over a period of time. Possibly a taskset or other job that doesn't care about locality would use those resources. Also, too many timers/bookkeeping # Per task timer: you still need a way to distinguish between a task waiting for a slot to become available vs having slots available but not utilizing them (which is what this PR does). Doing this right seems to require this PR plus more timers. *wave = one round of running as many tasks as there are available slots for a taskset. Imagine you have 2 slots and 10 tasks. 
It would take 10 / 2 = 5 waves to complete the taskset > Delay scheduling should not delay some executors indefinitely if one task is > scheduled before delay timeout > --- > > Key: SPARK-18886 > URL: https://issues.apache.org/jira/browse/SPARK-18886 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 2.1.0 >Reporter: Imran Rashid >Priority: Major > > Delay scheduling can introduce an unbounded delay and underutilization of > cluster resources under the following circumstances: > 1. Tasks have locality preferences for a subset of available resources > 2. Tasks finish in less time than the delay scheduling timeout. > Instead of having *one* delay to wait for resources with better locality, > Spark waits indefinitely. > As an example, consider a cluster with 100 executors, and a taskset with 500 > tasks. Say all tasks have a preference for one executor, which is by itself > on one host. Given the default locality wait of 3s per level, we end up with > a 6s delay till we schedule on other hosts (process wait + host wait). > If each task takes 5 seconds (under the 6 second delay), then _all 500_ tasks > get scheduled on _only one_ executor. This means you're only using 1% of > your cluster, and you get a ~100x slowdown. You'd actually be better off if > tasks took 7 seconds. > *WORKAROUNDS*: > (1) You can change the locality wait times so that they are shorter than the > task execution time. You need to take into account the sum of all wait times > to use all the resources on your cluster. For example, if you have resources > on different racks, this will include the sum of > "spark.locality.wait.process" + "spark.locality.wait.node" + > "spark.locality.wait.rack". Those each default to "3s". The simplest way would be > to set "spark.locality.wait.process" to your desired wait interval, and > set both "spark.locality.wait.node" and "spark.locality.wait.rack" to "0". 
> For example, if your tasks take ~3 seconds on average, you might set > "spark.locality.wait.process" to "1s". *NOTE*: due to SPARK-18967, avoid > setting the {{spark.locality.wait=0}} -- instead, use > {{spark.locality.wait=1ms}}. > Note that this workaround isn't perfect -- with less delay scheduling, you may > not get as good resource locality. After this issue is fixed, you'd most > likely want to undo these configuration changes. > (2) The worst case here will only happen if your tasks have extreme skew in > their locality preferences. Users may be able to modify their job to > control the distribution of the original input data. > (2a) A shuffle may end up with very skewed locality preferences, especially > if you do a repartition starting from a small number of partitions. (Shuffle > locality preference is assigned if any node has more than 20% of the shuffle > input data -- by chance, you may have one node just above that
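The ~100x slowdown in the issue description above is plain arithmetic; a quick sketch with the numbers taken from that description (100 executors, 500 tasks preferring one executor, 5 s tasks, 3 s locality wait per level):
```python
# Back-of-envelope model of the SPARK-18886 pathology. All numbers come from
# the issue description; this is an illustration, not the scheduler's code.
executors = 100
tasks = 500
task_time_s = 5.0
delay_s = 3.0 + 3.0  # process wait + node/host wait, 3 s each by default

if task_time_s < delay_s:
    # Each task launch resets the delay timer, so the taskset never falls
    # back to other hosts: all 500 tasks run serially on the one executor.
    actual_s = tasks * task_time_s
else:
    # Otherwise the delay expires once and the whole cluster is used.
    actual_s = (tasks / executors) * task_time_s + delay_s

ideal_s = (tasks / executors) * task_time_s  # 5 waves of 5 s on a full cluster
print(actual_s, ideal_s, actual_s / ideal_s)  # 2500.0 25.0 100.0
```
This is why the description notes you would "be better off if tasks took 7 seconds": at 7 s > 6 s the delay expires once and the else-branch applies.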