[jira] [Commented] (SPARK-33357) Support SparkLauncher in Kubernetes

2020-11-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239550#comment-17239550
 ] 

Apache Spark commented on SPARK-33357:
--

User 'hddong' has created a pull request for this issue:
https://github.com/apache/spark/pull/30520

> Support SparkLauncher in Kubernetes
> ---
>
> Key: SPARK-33357
> URL: https://issues.apache.org/jira/browse/SPARK-33357
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.0.1
>Reporter: hong dongdong
>Priority: Major
>
> Currently, SparkAppHandle cannot get state reports in k8s; we can add support for it.
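
For context, a minimal Scala sketch of the SparkLauncher/SparkAppHandle API whose 
state reporting this issue asks to support on Kubernetes. The master URL, container 
image, jar path and class name below are placeholders, not values from this ticket.

{code:scala}
import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

object K8sLauncherSketch {
  def main(args: Array[String]): Unit = {
    val handle = new SparkLauncher()
      .setMaster("k8s://https://kubernetes.example.com:6443") // placeholder API server
      .setDeployMode("cluster")
      .setMainClass("org.example.MyApp")                      // placeholder main class
      .setAppResource("local:///opt/spark/jars/my-app.jar")   // placeholder jar
      .setConf("spark.kubernetes.container.image", "example/spark:3.0.1")
      .startApplication(new SparkAppHandle.Listener {
        // On Kubernetes, these callbacks are what this ticket wants to make work.
        override def stateChanged(h: SparkAppHandle): Unit = println(s"state: ${h.getState}")
        override def infoChanged(h: SparkAppHandle): Unit = ()
      })

    // Poll until the application reaches a final state.
    while (!handle.getState.isFinal) Thread.sleep(1000)
  }
}
{code}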



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33357) Support SparkLauncher in Kubernetes

2020-11-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239549#comment-17239549
 ] 

Apache Spark commented on SPARK-33357:
--

User 'hddong' has created a pull request for this issue:
https://github.com/apache/spark/pull/30520

> Support SparkLauncher in Kubernetes
> ---
>
> Key: SPARK-33357
> URL: https://issues.apache.org/jira/browse/SPARK-33357
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.0.1
>Reporter: hong dongdong
>Priority: Major
>
> Currently, SparkAppHandle cannot get state reports in k8s; we can add support for it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33357) Support SparkLauncher in Kubernetes

2020-11-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33357:


Assignee: (was: Apache Spark)

> Support SparkLauncher in Kubernetes
> ---
>
> Key: SPARK-33357
> URL: https://issues.apache.org/jira/browse/SPARK-33357
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.0.1
>Reporter: hong dongdong
>Priority: Major
>
> Currently, SparkAppHandle cannot get state reports in k8s; we can add support for it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33357) Support SparkLauncher in Kubernetes

2020-11-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33357:


Assignee: Apache Spark

> Support SparkLauncher in Kubernetes
> ---
>
> Key: SPARK-33357
> URL: https://issues.apache.org/jira/browse/SPARK-33357
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.0.1
>Reporter: hong dongdong
>Assignee: Apache Spark
>Priority: Major
>
> Currently, SparkAppHandle cannot get state reports in k8s; we can add support for it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-32691) Update commons-crypto to v1.1.0

2020-11-26 Thread RuiChen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239536#comment-17239536
 ] 

RuiChen edited comment on SPARK-32691 at 11/27/20, 7:18 AM:


[~huangtianhua] It looks like the Spark ARM CI has been passing these days; the 
issue has been fixed, right?

https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/


was (Author: ruichen):
[~huangtianhua] looks Spark ARM CI passed in these days, 

> Update commons-crypto to v1.1.0
> ---
>
> Key: SPARK-32691
> URL: https://issues.apache.org/jira/browse/SPARK-32691
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 2.4.7, 3.0.0, 3.0.1, 3.1.0
> Environment: ARM64
>Reporter: huangtianhua
>Assignee: huangtianhua
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: Screen Shot 2020-09-28 at 8.49.04 AM.png, failure.log, 
> success.log
>
>
> Tests of org.apache.spark.DistributedSuite are failing on the arm64 Jenkins: 
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/ 
> - caching in memory and disk, replicated (encryption = on) (with 
> replication as stream) *** FAILED ***
>   3 did not equal 2; got 3 replicas instead of 2 (DistributedSuite.scala:191)
> - caching in memory and disk, serialized, replicated (encryption = on) 
> (with replication as stream) *** FAILED ***
>   3 did not equal 2; got 3 replicas instead of 2 (DistributedSuite.scala:191)
> - caching in memory, serialized, replicated (encryption = on) (with 
> replication as stream) *** FAILED ***
>   3 did not equal 2; got 3 replicas instead of 2 (DistributedSuite.scala:191)
> ...
> 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32691) Update commons-crypto to v1.1.0

2020-11-26 Thread RuiChen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239536#comment-17239536
 ] 

RuiChen commented on SPARK-32691:
-

[~huangtianhua] It looks like the Spark ARM CI has been passing these days.

> Update commons-crypto to v1.1.0
> ---
>
> Key: SPARK-32691
> URL: https://issues.apache.org/jira/browse/SPARK-32691
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 2.4.7, 3.0.0, 3.0.1, 3.1.0
> Environment: ARM64
>Reporter: huangtianhua
>Assignee: huangtianhua
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: Screen Shot 2020-09-28 at 8.49.04 AM.png, failure.log, 
> success.log
>
>
> Tests of org.apache.spark.DistributedSuite are failing on the arm64 Jenkins: 
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/ 
> - caching in memory and disk, replicated (encryption = on) (with 
> replication as stream) *** FAILED ***
>   3 did not equal 2; got 3 replicas instead of 2 (DistributedSuite.scala:191)
> - caching in memory and disk, serialized, replicated (encryption = on) 
> (with replication as stream) *** FAILED ***
>   3 did not equal 2; got 3 replicas instead of 2 (DistributedSuite.scala:191)
> - caching in memory, serialized, replicated (encryption = on) (with 
> replication as stream) *** FAILED ***
>   3 did not equal 2; got 3 replicas instead of 2 (DistributedSuite.scala:191)
> ...
> 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33575) Fix incorrect exception message for ANALYZE COLUMN

2020-11-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33575.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30519
[https://github.com/apache/spark/pull/30519]

> Fix incorrect exception message for ANALYZE COLUMN
> --
>
> Key: SPARK-33575
> URL: https://issues.apache.org/jira/browse/SPARK-33575
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Minor
> Fix For: 3.1.0
>
>
> Currently, "ANALYZE TABLE tempView COMPUTE STATISTICS FOR COLUMNS " 
> throws "NoSuchTableException" even if "tempView" exists.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33575) Fix incorrect exception message for ANALYZE COLUMN

2020-11-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33575:
---

Assignee: Terry Kim

> Fix incorrect exception message for ANALYZE COLUMN
> --
>
> Key: SPARK-33575
> URL: https://issues.apache.org/jira/browse/SPARK-33575
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Minor
>
> Currently, "ANALYZE TABLE tempView COMPUTE STATISTICS FOR COLUMNS " 
> throws "NoSuchTableException" even if "tempView" exists.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33566) Incorrectly Parsing CSV file

2020-11-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-33566.
--
Fix Version/s: 3.1.0
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/30518

> Incorrectly Parsing CSV file
> 
>
> Key: SPARK-33566
> URL: https://issues.apache.org/jira/browse/SPARK-33566
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.7
>Reporter: Stephen More
>Priority: Minor
> Fix For: 3.1.0
>
>
> Here is a test case: 
> [https://github.com/mores/maven-examples/blob/master/comma/src/test/java/org/test/CommaTest.java]
> It shows how, I believe, Apache Commons CSV and opencsv correctly parse the 
> sample CSV file.
> Spark is not parsing the sample CSV file correctly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33576) PythonException: An exception was thrown from a UDF: 'OSError: Invalid IPC message: negative bodyLength'.

2020-11-26 Thread Darshat (Jira)
Darshat created SPARK-33576:
---

 Summary: PythonException: An exception was thrown from a UDF: 
'OSError: Invalid IPC message: negative bodyLength'.
 Key: SPARK-33576
 URL: https://issues.apache.org/jira/browse/SPARK-33576
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.0.1
 Environment: Databricks runtime 7.3

Spark 3.0.1

Scala 2.12
Reporter: Darshat


Hello,

We are using Databricks on Azure to process a large amount of ecommerce data. 
The Databricks runtime is 7.3, which includes Apache Spark 3.0.1 and Scala 2.12.

During processing, there is a groupby operation on the DataFrame that 
consistently gets an exception of this type:

 

PythonException: An exception was thrown from a UDF: 'OSError: Invalid IPC 
message: negative bodyLength'. Full traceback below:
Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/worker.py", line 654, in main
    process()
  File "/databricks/spark/python/pyspark/worker.py", line 646, in process
    serializer.dump_stream(out_iter, outfile)
  File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 281, in dump_stream
    timely_flush_timeout_ms=self.timely_flush_timeout_ms)
  File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 97, in dump_stream
    for batch in iterator:
  File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 271, in init_stream_yield_batches
    for series in iterator:
  File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 287, in load_stream
    for batch in batches:
  File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 228, in load_stream
    for batch in batches:
  File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 118, in load_stream
    for batch in reader:
  File "pyarrow/ipc.pxi", line 412, in __iter__
  File "pyarrow/ipc.pxi", line 432, in pyarrow.lib._CRecordBatchReader.read_next_batch
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
OSError: Invalid IPC message: negative bodyLength

 

Code that causes this:

## df has 22 million rows and 3 distinct provider ids. Domain features adds a 
## couple of computed columns to the dataframe
x = df.groupby('providerid').apply(domain_features)

display(x.info())

We've put all possible checks in the code for null values and corrupt data, and 
we are not able to trace this to application-level code. I hope we can get some 
help troubleshooting this, as it is a blocker for rolling out at scale.

DataFrame size: 22 million rows, 31 columns.
One of the columns is a string ('providerid') on which we do a groupby followed 
by an apply operation. There are 3 distinct provider ids in this set. While 
trying to enumerate/count the results, we get this exception.



The cluster has 8 nodes plus the driver, all with 28 GB. I can provide any other 
settings that could be useful.
I hope to get some insights into the problem.

Thanks,

Darshat Shah



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33570) Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite

2020-11-26 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-33570:
---
Description: 
For MariaDBKrbIntegrationSuite, the version of mariadb-plugin-gssapi-server is 
currently set to 10.5.5 in mariadb_docker_entrypoint.sh, but it's no longer 
available in the official apt repository, so MariaDBKrbIntegrationSuite doesn't 
pass for now.
It seems that only the three most recent versions are available, and they are 
10.5.6, 10.5.7 and 10.5.8 for now.
Further, the release cycle of MariaDB seems to be very rapid (1 ~ 2 months), so 
I don't think it's a good idea to pin mariadb-plugin-gssapi-server to a specific 
version.


  was:
For MariaDBKrbIntegrationSuite, the version of mariadb-plugin-gssapi-server is 
currently set to 10.5.5 in mariadb_docker_entrypoint.sh but it's no longer 
available in the official apt repository and MariaDBKrbIntegrationSuite doesn't 
pass for now.
It seems that only the most recent three versions are available and they are 
10.5.6, 10.5.7 and 10.5.8 for now.
Further, the release cycle of MariaDB seems to be too fast (1 ~ 2 months) so I 
don't think it's a good idea to set to an specific version for 
mariadb-plugin-gssapi-server.



> Set the proper version of gssapi plugin automatically for 
> MariaDBKrbIntegrationsuite
> 
>
> Key: SPARK-33570
> URL: https://issues.apache.org/jira/browse/SPARK-33570
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> For MariaDBKrbIntegrationSuite, the version of mariadb-plugin-gssapi-server 
> is currently set to 10.5.5 in mariadb_docker_entrypoint.sh, but it's no longer 
> available in the official apt repository, so MariaDBKrbIntegrationSuite 
> doesn't pass for now.
> It seems that only the three most recent versions are available, and they are 
> 10.5.6, 10.5.7 and 10.5.8 for now.
> Further, the release cycle of MariaDB seems to be very rapid (1 ~ 2 months), 
> so I don't think it's a good idea to pin mariadb-plugin-gssapi-server to a 
> specific version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33560) Add "unused import" check to Maven compilation process

2020-11-26 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239495#comment-17239495
 ] 

Hyukjin Kwon commented on SPARK-33560:
--

It seems like scalac now has built-in silencer support. Once we drop Scala 
2.12, I think we can add this to Maven as well; see also 
[https://github.com/scala/scala/pull/8373].
cc [~maxgekk] FYI
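
For reference, a sketch of the equivalent compiler flags expressed as an sbt-style 
Scala fragment (the Maven scala-maven-plugin would pass the same scalac args). The 
flag names assume Scala 2.13's built-in -Wconf/-Wunused support mentioned above; 
they are not available on 2.12.

{code:scala}
// build.sbt fragment (sketch): rely on scalac's built-in warning configuration
// instead of the silencer compiler plugin. Assumes Scala 2.13.
scalacOptions ++= Seq(
  "-Wunused:imports",         // warn on unused imports
  "-Wconf:cat=deprecation:s", // silence deprecation warnings, as the globalFilters regex quoted below intends
  "-Xfatal-warnings"          // escalate remaining warnings, including unused imports, to errors
)
{code}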

> Add "unused import" check to Maven compilation process
> --
>
> Key: SPARK-33560
> URL: https://issues.apache.org/jira/browse/SPARK-33560
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Minor
>
> Similar to SPARK-33441, we need to add an "unused import" check to the Maven pom.
> The blocker is how to achieve the same effect as the SBT compiler check; it seems 
> that adding the "-P:silencer:globalFilters=.*deprecated.*" configuration to 
> "scala-maven-plugin" is not supported at present.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33575) Fix incorrect exception message for ANALYZE COLUMN

2020-11-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33575:


Assignee: Apache Spark

> Fix incorrect exception message for ANALYZE COLUMN
> --
>
> Key: SPARK-33575
> URL: https://issues.apache.org/jira/browse/SPARK-33575
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Terry Kim
>Assignee: Apache Spark
>Priority: Minor
>
> Currently, "ANALYZE TABLE tempView COMPUTE STATISTICS FOR COLUMNS " 
> throws "NoSuchTableException" even if "tempView" exists.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33575) Fix incorrect exception message for ANALYZE COLUMN

2020-11-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33575:


Assignee: (was: Apache Spark)

> Fix incorrect exception message for ANALYZE COLUMN
> --
>
> Key: SPARK-33575
> URL: https://issues.apache.org/jira/browse/SPARK-33575
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Terry Kim
>Priority: Minor
>
> Currently, "ANALYZE TABLE tempView COMPUTE STATISTICS FOR COLUMNS " 
> throws "NoSuchTableException" even if "tempView" exists.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33575) Fix incorrect exception message for ANALYZE COLUMN

2020-11-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239478#comment-17239478
 ] 

Apache Spark commented on SPARK-33575:
--

User 'imback82' has created a pull request for this issue:
https://github.com/apache/spark/pull/30519

> Fix incorrect exception message for ANALYZE COLUMN
> --
>
> Key: SPARK-33575
> URL: https://issues.apache.org/jira/browse/SPARK-33575
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Terry Kim
>Priority: Minor
>
> Currently, "ANALYZE TABLE tempView COMPUTE STATISTICS FOR COLUMNS " 
> throws "NoSuchTableException" even if "tempView" exists.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33563) Expose inverse hyperbolic trig functions in PySpark and SparkR

2020-11-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-33563:


Assignee: Maciej Szymkiewicz

> Expose inverse hyperbolic trig functions in PySpark and SparkR
> --
>
> Key: SPARK-33563
> URL: https://issues.apache.org/jira/browse/SPARK-33563
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SparkR, SQL
>Affects Versions: 3.1.0
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
>Priority: Minor
>
> {{acosh}}, {{asinh}} and {{atanh}} were exposed in Scala {{sql.functions}} in 
> Spark 3.1 (SPARK-33061).
> For consistency, we should expose these in Python and R as well.
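
For reference, a small Scala sketch of the 3.1 Scala API that this ticket asks to 
mirror in Python and R (local master and sample values are illustrative):

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{acosh, asinh, atanh, col}

// Assumes Spark 3.1+, where SPARK-33061 added these functions to sql.functions.
val spark = SparkSession.builder().master("local[*]").appName("inverse-hyperbolic").getOrCreate()
import spark.implicits._

val df = Seq(0.5, 1.0, 2.0).toDF("x")
df.select(col("x"), asinh(col("x")), acosh(col("x")), atanh(col("x"))).show()
{code}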



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33563) Expose inverse hyperbolic trig functions in PySpark and SparkR

2020-11-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-33563.
--
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30501
[https://github.com/apache/spark/pull/30501]

> Expose inverse hyperbolic trig functions in PySpark and SparkR
> --
>
> Key: SPARK-33563
> URL: https://issues.apache.org/jira/browse/SPARK-33563
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SparkR, SQL
>Affects Versions: 3.1.0
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
>Priority: Minor
> Fix For: 3.1.0
>
>
> {{acosh}}, {{asinh}} and {{atanh}} were exposed in Scala {{sql.functions}} in 
> Spark 3.1 (SPARK-33061).
> For consistency, we should expose these in Python and R as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33575) Fix incorrect exception message for ANALYZE COLUMN

2020-11-26 Thread Terry Kim (Jira)
Terry Kim created SPARK-33575:
-

 Summary: Fix incorrect exception message for ANALYZE COLUMN
 Key: SPARK-33575
 URL: https://issues.apache.org/jira/browse/SPARK-33575
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: Terry Kim


Currently, "ANALYZE TABLE tempView COMPUTE STATISTICS FOR COLUMNS " 
throws "NoSuchTableException" even if "tempView" exists.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33517) Incorrect menu item display and link in PySpark Usage Guide for Pandas with Apache Arrow

2020-11-26 Thread liucht-inspur (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liucht-inspur updated SPARK-33517:
--
Description: 
The menu item and link are set incorrectly; change "Apache Arrow in Spark" to 
"Apache Arrow in PySpark".

  !image-2020-11-23-18-47-01-591.png!

 

after:

!image-2020-11-27-09-43-58-141.png!

 

  was:
Error setting menu item and link, change "Apache Arrow in Spark" to "Apache 
Arrow in PySpark"

  !image-2020-11-23-18-47-01-591.png!


> Incorrect menu item display and link in PySpark Usage Guide for Pandas with 
> Apache Arrow
> 
>
> Key: SPARK-33517
> URL: https://issues.apache.org/jira/browse/SPARK-33517
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.0, 3.0.1
>Reporter: liucht-inspur
>Priority: Minor
> Attachments: image-2020-11-23-18-47-01-591.png, 
> image-2020-11-27-09-43-58-141.png, spark-doc.jpg
>
>
> The menu item and link are set incorrectly; change "Apache Arrow in Spark" to 
> "Apache Arrow in PySpark".
>   !image-2020-11-23-18-47-01-591.png!
>  
> after:
> !image-2020-11-27-09-43-58-141.png!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33517) Incorrect menu item display and link in PySpark Usage Guide for Pandas with Apache Arrow

2020-11-26 Thread liucht-inspur (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liucht-inspur updated SPARK-33517:
--
Attachment: image-2020-11-27-09-43-58-141.png

> Incorrect menu item display and link in PySpark Usage Guide for Pandas with 
> Apache Arrow
> 
>
> Key: SPARK-33517
> URL: https://issues.apache.org/jira/browse/SPARK-33517
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.0, 3.0.1
>Reporter: liucht-inspur
>Priority: Minor
> Attachments: image-2020-11-23-18-47-01-591.png, 
> image-2020-11-27-09-43-58-141.png, spark-doc.jpg
>
>
> The menu item and link are set incorrectly; change "Apache Arrow in Spark" to 
> "Apache Arrow in PySpark".
>   !image-2020-11-23-18-47-01-591.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33489) Support null for conversion from and to Arrow type

2020-11-26 Thread Yuya Kanai (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239461#comment-17239461
 ] 

Yuya Kanai commented on SPARK-33489:


[~bryanc] 
Yes, I'll try working on it. 
Thank you for mentioning it.

> Support null for conversion from and to Arrow type
> --
>
> Key: SPARK-33489
> URL: https://issues.apache.org/jira/browse/SPARK-33489
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.0.1
>Reporter: Yuya Kanai
>Priority: Minor
>
> I got the below error when using from_arrow_type() in pyspark.sql.pandas.types:
> {{Unsupported type in conversion from Arrow: null}}
> I noticed NullType exists under pyspark.sql.types, so it seems possible to 
> convert from the pyarrow null type to the PySpark NullType and vice versa.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16854) mapWithState Support for Python

2020-11-26 Thread Haim Bendanan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-16854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239407#comment-17239407
 ] 

Haim Bendanan commented on SPARK-16854:
---

+1

> mapWithState Support for Python
> ---
>
> Key: SPARK-16854
> URL: https://issues.apache.org/jira/browse/SPARK-16854
> Project: Spark
>  Issue Type: Task
>  Components: PySpark
>Affects Versions: 1.6.2, 2.0.0
>Reporter: Boaz
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33568) install coverage for pypy3

2020-11-26 Thread Shane Knapp (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Knapp resolved SPARK-33568.
-
Resolution: Fixed

> install coverage for pypy3
> --
>
> Key: SPARK-33568
> URL: https://issues.apache.org/jira/browse/SPARK-33568
> Project: Spark
>  Issue Type: Bug
>  Components: Build, PySpark
>Affects Versions: 3.0.0
>Reporter: Shane Knapp
>Assignee: Shane Knapp
>Priority: Major
>
> from:
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-sbt-hadoop-2.7-hive-1.2/1002/console
>  
> Coverage is not installed in Python executable 'pypy3' but 
> 'COVERAGE_PROCESS_START' environment variable is set, exiting.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33568) install coverage for pypy3

2020-11-26 Thread Shane Knapp (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Knapp reassigned SPARK-33568:
---

Assignee: Shane Knapp

> install coverage for pypy3
> --
>
> Key: SPARK-33568
> URL: https://issues.apache.org/jira/browse/SPARK-33568
> Project: Spark
>  Issue Type: Bug
>  Components: Build, PySpark
>Affects Versions: 3.0.0
>Reporter: Shane Knapp
>Assignee: Shane Knapp
>Priority: Major
>
> from:
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-sbt-hadoop-2.7-hive-1.2/1002/console
>  
> Coverage is not installed in Python executable 'pypy3' but 
> 'COVERAGE_PROCESS_START' environment variable is set, exiting.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33568) install coverage for pypy3

2020-11-26 Thread Shane Knapp (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239399#comment-17239399
 ] 

Shane Knapp commented on SPARK-33568:
-

This is now installed on the Ubuntu 16 workers.

> install coverage for pypy3
> --
>
> Key: SPARK-33568
> URL: https://issues.apache.org/jira/browse/SPARK-33568
> Project: Spark
>  Issue Type: Bug
>  Components: Build, PySpark
>Affects Versions: 3.0.0
>Reporter: Shane Knapp
>Assignee: Shane Knapp
>Priority: Major
>
> from:
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-sbt-hadoop-2.7-hive-1.2/1002/console
>  
> Coverage is not installed in Python executable 'pypy3' but 
> 'COVERAGE_PROCESS_START' environment variable is set, exiting.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33574) Improve locality for push-based shuffle especially for join like operations

2020-11-26 Thread Min Shen (Jira)
Min Shen created SPARK-33574:


 Summary: Improve locality for push-based shuffle especially for 
join like operations
 Key: SPARK-33574
 URL: https://issues.apache.org/jira/browse/SPARK-33574
 Project: Spark
  Issue Type: Sub-task
  Components: Shuffle, Spark Core
Affects Versions: 3.1.0
Reporter: Min Shen


Currently, we only set locality for ShuffledRDD and ShuffledRowRDD with 
push-based shuffle.

In simple stage DAGs where a ShuffledRDD or ShuffledRowRDD is the only input 
RDD, Spark can handle locality fine. However, if we have a join operation where 
a stage can consume multiple shuffle inputs or other non-shuffle inputs, the 
locality will take a hit with how DAGScheduler currently determines the 
preferred location.

With push-based shuffle, we could potentially reuse the same set of merger 
locations across sibling ShuffleMapStages. This would enable a much better 
locality on the reducer stage side, where corresponding merged shuffle 
partitions for the multiple shuffle inputs are already colocated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33573) Server and client side metrics related to push-based shuffle

2020-11-26 Thread Min Shen (Jira)
Min Shen created SPARK-33573:


 Summary: Server and client side metrics related to push-based 
shuffle
 Key: SPARK-33573
 URL: https://issues.apache.org/jira/browse/SPARK-33573
 Project: Spark
  Issue Type: Sub-task
  Components: Shuffle, Spark Core
Affects Versions: 3.1.0
Reporter: Min Shen


Need to add metrics on both server and client side related to push-based 
shuffle.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33235) Push-based Shuffle Improvement Tasks

2020-11-26 Thread Min Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Shen updated SPARK-33235:
-
Description: This is the parent jira for follow-up improvement tasks for 
supporting Push-based shuffle. Refer SPARK-30602.  (was: This is the parent 
jira for the phase 2 or follow-up tasks for supporting Push-based shuffle. 
Refer SPARK-30602.
)

> Push-based Shuffle Improvement Tasks
> 
>
> Key: SPARK-33235
> URL: https://issues.apache.org/jira/browse/SPARK-33235
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Chandni Singh
>Priority: Major
>  Labels: release-notes
>
> This is the parent jira for follow-up improvement tasks for supporting 
> Push-based shuffle. Refer SPARK-30602.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33235) Push-based Shuffle Improvement Tasks

2020-11-26 Thread Min Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Shen updated SPARK-33235:
-
Summary: Push-based Shuffle Improvement Tasks  (was: Push-based Shuffle 
Phase 2 Tasks)

> Push-based Shuffle Improvement Tasks
> 
>
> Key: SPARK-33235
> URL: https://issues.apache.org/jira/browse/SPARK-33235
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Chandni Singh
>Priority: Major
>  Labels: release-notes
>
> This is the parent jira for the phase 2 or follow-up tasks for supporting 
> Push-based shuffle. Refer SPARK-30602.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33566) Incorrectly Parsing CSV file

2020-11-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33566:


Assignee: Apache Spark

> Incorrectly Parsing CSV file
> 
>
> Key: SPARK-33566
> URL: https://issues.apache.org/jira/browse/SPARK-33566
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.7
>Reporter: Stephen More
>Assignee: Apache Spark
>Priority: Minor
>
> Here is a test case: 
> [https://github.com/mores/maven-examples/blob/master/comma/src/test/java/org/test/CommaTest.java]
> It shows how, I believe, Apache Commons CSV and opencsv correctly parse the 
> sample CSV file.
> Spark is not parsing the sample CSV file correctly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33566) Incorrectly Parsing CSV file

2020-11-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239304#comment-17239304
 ] 

Apache Spark commented on SPARK-33566:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/30518

> Incorrectly Parsing CSV file
> 
>
> Key: SPARK-33566
> URL: https://issues.apache.org/jira/browse/SPARK-33566
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.7
>Reporter: Stephen More
>Priority: Minor
>
> Here is a test case: 
> [https://github.com/mores/maven-examples/blob/master/comma/src/test/java/org/test/CommaTest.java]
> It shows how, I believe, Apache Commons CSV and opencsv correctly parse the 
> sample CSV file.
> Spark is not parsing the sample CSV file correctly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33566) Incorrectly Parsing CSV file

2020-11-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239303#comment-17239303
 ] 

Apache Spark commented on SPARK-33566:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/30518

> Incorrectly Parsing CSV file
> 
>
> Key: SPARK-33566
> URL: https://issues.apache.org/jira/browse/SPARK-33566
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.7
>Reporter: Stephen More
>Priority: Minor
>
> Here is a test case: 
> [https://github.com/mores/maven-examples/blob/master/comma/src/test/java/org/test/CommaTest.java]
> It shows how, I believe, Apache Commons CSV and opencsv correctly parse the 
> sample CSV file.
> Spark is not parsing the sample CSV file correctly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33566) Incorrectly Parsing CSV file

2020-11-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33566:


Assignee: (was: Apache Spark)

> Incorrectly Parsing CSV file
> 
>
> Key: SPARK-33566
> URL: https://issues.apache.org/jira/browse/SPARK-33566
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.7
>Reporter: Stephen More
>Priority: Minor
>
> Here is a test case: 
> [https://github.com/mores/maven-examples/blob/master/comma/src/test/java/org/test/CommaTest.java]
> It shows how, I believe, Apache Commons CSV and opencsv correctly parse the 
> sample CSV file.
> Spark is not parsing the sample CSV file correctly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33566) Incorrectly Parsing CSV file

2020-11-26 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239256#comment-17239256
 ] 

Yang Jie edited comment on SPARK-33566 at 11/26/20, 1:12 PM:
-

I think the reason for the bad case is that Spark uses "STOP_AT_DELIMITER" as the 
default "UnescapedQuoteHandling" when building the "CsvParser". Configuring 
"UnescapedQuoteHandling" to "STOP_AT_CLOSING_QUOTE" seems to resolve this 
issue, but Spark does not support configuring this option now. [~hyukjin.kwon] 
[~moresmores]


was (Author: luciferyang):
I think the reason for the bad case is Spark use "STOP_AT_DELIMITER" as default 
"UnescapedQuoteHandling" to build "CsvParser".  Configure 
"UnescapedQuoteHandling" to  "STOP_AT_CLOSING_QUOTE" seems can resolve this 
issue. [~hyukjin.kwon] [~moresmores]

> Incorrectly Parsing CSV file
> 
>
> Key: SPARK-33566
> URL: https://issues.apache.org/jira/browse/SPARK-33566
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.7
>Reporter: Stephen More
>Priority: Minor
>
> Here is a test case: 
> [https://github.com/mores/maven-examples/blob/master/comma/src/test/java/org/test/CommaTest.java]
> It shows how, I believe, Apache Commons CSV and opencsv correctly parse the 
> sample CSV file.
> Spark is not parsing the sample CSV file correctly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33566) Incorrectly Parsing CSV file

2020-11-26 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239256#comment-17239256
 ] 

Yang Jie commented on SPARK-33566:
--

I think the reason for the bad case is that Spark uses "STOP_AT_DELIMITER" as the 
default "UnescapedQuoteHandling" when building the "CsvParser". Configuring 
"UnescapedQuoteHandling" to "STOP_AT_CLOSING_QUOTE" seems to resolve this 
issue. [~hyukjin.kwon] [~moresmores]
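
A hedged sketch of that setting applied directly to the univocity CSV parser that 
Spark uses, outside of Spark itself (the file path is a placeholder; at the time 
of this comment the option was not configurable from Spark):

{code:scala}
import com.univocity.parsers.csv.{CsvParser, CsvParserSettings, UnescapedQuoteHandling}

val settings = new CsvParserSettings()
settings.setHeaderExtractionEnabled(true)
// Spark's default corresponds to STOP_AT_DELIMITER; the suggestion here is STOP_AT_CLOSING_QUOTE.
settings.setUnescapedQuoteHandling(UnescapedQuoteHandling.STOP_AT_CLOSING_QUOTE)

val parser = new CsvParser(settings)
val rows = parser.parseAll(new java.io.File("/tmp/sample.csv")) // placeholder path
rows.forEach(row => println(row.mkString("|")))
{code}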

> Incorrectly Parsing CSV file
> 
>
> Key: SPARK-33566
> URL: https://issues.apache.org/jira/browse/SPARK-33566
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.7
>Reporter: Stephen More
>Priority: Minor
>
> Here is a test case: 
> [https://github.com/mores/maven-examples/blob/master/comma/src/test/java/org/test/CommaTest.java]
> It shows how, I believe, Apache Commons CSV and opencsv correctly parse the 
> sample CSV file.
> Spark is not parsing the sample CSV file correctly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33564) Prometheus metrics for Master and Worker isn't working

2020-11-26 Thread Paulo Roberto de Oliveira Castro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paulo Roberto de Oliveira Castro updated SPARK-33564:
-
Description: 
Following the [PR|https://github.com/apache/spark/pull/25769] that introduced 
the Prometheus sink, I downloaded the {{spark-3.0.1-bin-hadoop2.7.tgz}}  (also 
tested with 3.0.0), uncompressed the tgz and created a file called 
{{metrics.properties}} adding this content:
{quote}{{*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet}}
 {{*.sink.prometheusServlet.path=/metrics/prometheus}}
 master.sink.prometheusServlet.path=/metrics/master/prometheus
 applications.sink.prometheusServlet.path=/metrics/applications/prometheus
{quote}
Then I ran: 
{quote}{{$ sbin/start-master.sh}}
 {{$ sbin/start-slave.sh spark://`hostname`:7077}}
 {{$ bin/spark-shell --master spark://`hostname`:7077 
--files=./metrics.properties --conf spark.metrics.conf=./metrics.properties}}
{quote}
{{The Spark shell opens without problems:}}
{quote}{{20/11/25 17:36:07 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable}}

{{Using Spark's default log4j profile: 
org/apache/spark/log4j-defaults.properties}}

{{Setting default log level to "WARN".}}

{{To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).}}

{{Spark context Web UI available at 
[http://192.168.0.6:4040|http://192.168.0.6:4040/]}}

{{Spark context available as 'sc' (master = 
spark://MacBook-Pro-de-Paulo-2.local:7077, app id = app-20201125173618-0002).}}

{{Spark session available as 'spark'.}}

{{Welcome to}}

{{      (Spark ASCII-art banner)   version 3.0.0}}

{{Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_212)}}

{{Type in expressions to have them evaluated.}}

{{Type :help for more information. }}

{{scala>}}
{quote}
{{And when I try to fetch prometheus metrics for driver, everything works 
fine:}}
{quote}$ curl -s [http://localhost:4040/metrics/prometheus/] | head -n 5

metrics_app_20201125173618_0002_driver_BlockManager_disk_diskSpaceUsed_MB_Number\{type="gauges"}
 0

metrics_app_20201125173618_0002_driver_BlockManager_disk_diskSpaceUsed_MB_Value\{type="gauges"}
 0

metrics_app_20201125173618_0002_driver_BlockManager_memory_maxMem_MB_Number\{type="gauges"}
 732

metrics_app_20201125173618_0002_driver_BlockManager_memory_maxMem_MB_Value\{type="gauges"}
 732

metrics_app_20201125173618_0002_driver_BlockManager_memory_maxOffHeapMem_MB_Number\{type="gauges"}
 0
{quote}
*The problem appears when I try accessing master metrics*, and I get the 
following problem:
{quote}{{$ curl -s [http://localhost:8080/metrics/master/prometheus]}}

{{(truncated HTML response: the Spark Master web UI page, not metrics)}}

{{        setUIRoot('')}}

{{        Spark Master at spark://MacBook-Pro-de-Paulo-2.local:7077}}

{{                  3.0.0}}

{{              URL: spark://MacBook-Pro-de-Paulo-2.local:7077}}
 ...
{quote}
Instead of the metrics I'm getting an HTML page.  The same happens for all of 
those here:
{quote}{{$ curl -s [http://localhost:8080/metrics/applications/prometheus/]}}
 {{$ curl -s [http://localhost:8081/metrics/prometheus/]}}
{quote}
Instead, *I expected metrics in Prometheus format*. All related JSON endpoints 
seem to be working fine.

  was:
Following the [PR|https://github.com/apache/spark/pull/25769] that introduced 
the Prometheus sink, I downloaded the {{spark-3.0.1-bin-hadoop2.7.tgz}}  (also 
tested with 3.0.0), uncompressed the tgz and created a file called 
{{metrics.properties}} adding this content:
{quote}{{*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet}}
 {{*.sink.prometheusServlet.path=/metrics/prometheus}}
master.sink.prometheusServlet.path=/metrics/master/prometheus
applications.sink.prometheusServlet.path=/metrics/applications/prometheus
{quote}
Then I ran: 
{quote}{{$ sbin/start-master.sh}}
 {{$ sbin/start-slave.sh spark://`hostname`:7077}}
 {{$ bin/spark-shell --master spark://`hostname`:7077 
--files=./metrics.properties --conf spark.metrics.conf=./metrics.properties}}
{quote}
{{The Spark shell opens without problems:}}
{quote}{{20/11/25 17:36:07 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable}}

{{Using Spark's default log4j profile: 
org/apache/spark/log4j-defaults.properties}}

{{Setting default log 

[jira] [Commented] (SPARK-33498) Datetime parsing should fail if the input string can't be parsed, or the pattern string is invalid

2020-11-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239233#comment-17239233
 ] 

Apache Spark commented on SPARK-33498:
--

User 'waitinfuture' has created a pull request for this issue:
https://github.com/apache/spark/pull/30516

> Datetime parsing should fail if the input string can't be parsed, or the 
> pattern string is invalid
> --
>
> Key: SPARK-33498
> URL: https://issues.apache.org/jira/browse/SPARK-33498
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Leanken.Lin
>Priority: Major
>
> Datetime parsing should fail if the input string can't be parsed, or the 
> pattern string is invalid, when ANSI mode is enabled. This patch should update 
> GetTimeStamp, UnixTimeStamp, ToUnixTimeStamp and Cast.
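
A hedged Scala illustration of the intended ANSI-mode behavior described above 
(local master is illustrative; the exact error types depend on the final patch):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("ansi-datetime-parsing").getOrCreate()
spark.conf.set("spark.sql.ansi.enabled", "true")

// With ANSI mode off these return NULL; with the change described here they
// should fail because the input can't be parsed or the pattern is invalid.
spark.sql("SELECT to_unix_timestamp('abc', 'yyyy-MM-dd')").show()
spark.sql("SELECT unix_timestamp('2020-13-45', 'yyyy-MM-dd')").show()
spark.sql("SELECT CAST('not-a-timestamp' AS TIMESTAMP)").show()
{code}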



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33572) Datetime building should fail if the year, month, ..., second combination is invalid

2020-11-26 Thread zhoukeyong (Jira)
zhoukeyong created SPARK-33572:
--

 Summary: Datetime building should fail if the year, month, ..., 
second combination is invalid
 Key: SPARK-33572
 URL: https://issues.apache.org/jira/browse/SPARK-33572
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: zhoukeyong


Datetime building should fail if the year, month, ..., second combination is 
invalid, when ANSI mode is enabled. This patch should update MakeDate, 
MakeTimestamp and MakeInterval.
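
A hedged Scala illustration of the intended behavior (local master is illustrative; 
the exact error types depend on the final patch):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("ansi-datetime-building").getOrCreate()
spark.conf.set("spark.sql.ansi.enabled", "true")

// With ANSI mode off these return NULL; with the change described here they
// should fail because the field combination is invalid.
spark.sql("SELECT make_date(2020, 13, 1)").show()                // month 13 is invalid
spark.sql("SELECT make_timestamp(2020, 2, 30, 25, 0, 0)").show() // Feb 30 and hour 25 are invalid
{code}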



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly

2020-11-26 Thread Simon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239204#comment-17239204
 ] 

Simon commented on SPARK-33571:
---

Below is the output of the date test script with the noise removed.
Writing without additional config works as expected: Spark 3.0.1 throws a 
`SparkUpgradeException`.
Reading without additional config works as expected: Spark 3.0.1 throws a 
`SparkUpgradeException` when reading parquet files written with Spark 2.4.5 in 
Spark 3.0.1.

Reading using the two different `datetimeRebaseModeInRead` modes doesn't work 
though; it shows no difference.

{code:java}
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), 
('spark.sql.legacy.parquet.datetimeRebaseModeInRead', 'LEGACY'), 
('spark.master', 'local[*]'), ('spark.submit.pyFiles', ''), 
('spark.submit.deployMode', 'client'), ('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/datespark245/*.parquet
root
 |-- row: string (nullable = true)
 |-- date: date (nullable = true)

+---+--+
|row|  date|
+---+--+
|  1|0220-10-01|
|  2|1880-10-01|
|  3|2020-10-01|
+---+--+

done
...
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), 
('spark.sql.legacy.parquet.datetimeRebaseModeInRead', 'CORRECTED'), 
('spark.master', 'local[*]'), ('spark.submit.pyFiles', ''), 
('spark.submit.deployMode', 'client'), ('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/datespark245/*.parquet
root
 |-- row: string (nullable = true)
 |-- date: date (nullable = true)

+---+--+
|row|  date|
+---+--+
|  1|0220-10-01|
|  2|1880-10-01|
|  3|2020-10-01|
+---+--+

done
{code}
Note no difference in the dates shown
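
For reference, a hedged Scala sketch of how the rebase mode is set for the reads 
above (the comment's test script is PySpark; the path and mode value are taken 
from the output shown and are otherwise placeholders):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("read-data").getOrCreate()
// Session-level config used by the test; "CORRECTED" is the other mode being compared.
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "LEGACY")

val df = spark.read.parquet("output/datespark245/*.parquet")
df.printSchema()
df.show()
{code}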

> Handling of hybrid to proleptic calendar when reading and writing Parquet 
> data not working correctly
> 
>
> Key: SPARK-33571
> URL: https://issues.apache.org/jira/browse/SPARK-33571
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Simon
>Priority: Major
>
> The handling of old dates written with older Spark versions (<2.4.6) using 
> the hybrid calendar in Spark 3.0.0 and 3.0.1 seems to be broken/not working 
> correctly.
> From what I understand it should work like this:
>  * Only relevant for `DateType` before 1582-10-15 or `TimestampType` before 
> 1900-01-01T00:00:00Z
>  * Only applies when reading or writing parquet files
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInRead`
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time and 
> `datetimeRebaseModeInRead` is set to `LEGACY` the dates and timestamps should 
> show the same values in Spark 3.0.1. with for example `df.show()` as they did 
> in Spark 2.4.5
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time and 
> `datetimeRebaseModeInRead` is set to `CORRECTED` the dates and timestamps 
> should show different values in Spark 3.0.1. with for example `df.show()` as 
> they did in Spark 2.4.5
>  * When writing parqet files with Spark > 3.0.0 which contain dates or 
> timestamps before the above mentioned moment in time a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInWrite`
> First of all I'm not 100% sure all of this is correct. I've been unable to 
> find any clear documentation on the expected behavior. The understanding I 
> have was pieced together from the mailing list 
> ([http://apache-spark-user-list.1001560.n3.nabble.com/Spark-3-0-1-new-Proleptic-Gregorian-calendar-td38914.html)]
>  the blog post linked there and looking at the Spark code.
> From our testing we're seeing several issues:
>  * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. 
> that contains fields of type `TimestampType` which contain timestamps before 
> the above mentioned moments in time without `datetimeRebaseModeInRead` set 
> doesn't raise the `SparkUpgradeException`, it succeeds without any changes to 
> the resulting dataframe compares to that dataframe in Spark 2.4.5
>  * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. 
> that contains fields of type `TimestampType` or `DateType` which contain 
> dates or timestamps before the above mentioned moments in time with 
> `datetimeRebaseModeInRead` set to `LEGACY` resul

[jira] [Comment Edited] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly

2020-11-26 Thread Simon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239204#comment-17239204
 ] 

Simon edited comment on SPARK-33571 at 11/26/20, 11:16 AM:
---

Below is the output of the date test script with the noise removed.
Writing without additional config works as expected: Spark 3.0.1 throws a 
`SparkUpgradeException` when writing to parquet and the dataframe contains old 
dates.
Reading without additional config works as expected: Spark 3.0.1 throws a 
`SparkUpgradeException` when reading parquet files written with Spark 2.4.5 in 
Spark 3.0.1.

Reading using the two different `datetimeRebaseModeInRead` modes doesn't work 
though; it shows no difference.

{code:java}
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), 
('spark.sql.legacy.parquet.datetimeRebaseModeInRead', 'LEGACY'), 
('spark.master', 'local[*]'), ('spark.submit.pyFiles', ''), 
('spark.submit.deployMode', 'client'), ('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/datespark245/*.parquet
root
 |-- row: string (nullable = true)
 |-- date: date (nullable = true)

+---+--+
|row|  date|
+---+--+
|  1|0220-10-01|
|  2|1880-10-01|
|  3|2020-10-01|
+---+--+

done
...
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), 
('spark.sql.legacy.parquet.datetimeRebaseModeInRead', 'CORRECTED'), 
('spark.master', 'local[*]'), ('spark.submit.pyFiles', ''), 
('spark.submit.deployMode', 'client'), ('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/datespark245/*.parquet
root
 |-- row: string (nullable = true)
 |-- date: date (nullable = true)

+---+--+
|row|  date|
+---+--+
|  1|0220-10-01|
|  2|1880-10-01|
|  3|2020-10-01|
+---+--+

done
{code}
Note no difference in the dates shown


was (Author: simonvanderveldt):
Below the output of the date testscript with the noise removed
Writing without additional config works as expected. Spark 3.0.1. throws a 
`SparkUpgradeException`
Reading without additional config works as expected. Spark 3.0.1. throws a 
`SparkUpgradeException` when reading parquet files written with Spark 2.4.5 in 
Spark 3.0.1.

Reading using the two different `datetimeRebaseModeInRead` modes doesn't work 
though, it shows no difference

{code:java}
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), 
('spark.sql.legacy.parquet.datetimeRebaseModeInRead', 'LEGACY'), 
('spark.master', 'local[*]'), ('spark.submit.pyFiles', ''), 
('spark.submit.deployMode', 'client'), ('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/datespark245/*.parquet
root
 |-- row: string (nullable = true)
 |-- date: date (nullable = true)

+---+--+
|row|  date|
+---+--+
|  1|0220-10-01|
|  2|1880-10-01|
|  3|2020-10-01|
+---+--+

done
...
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), 
('spark.sql.legacy.parquet.datetimeRebaseModeInRead', 'CORRECTED'), 
('spark.master', 'local[*]'), ('spark.submit.pyFiles', ''), 
('spark.submit.deployMode', 'client'), ('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/datespark245/*.parquet
root
 |-- row: string (nullable = true)
 |-- date: date (nullable = true)

+---+--+
|row|  date|
+---+--+
|  1|0220-10-01|
|  2|1880-10-01|
|  3|2020-10-01|
+---+--+

done
{code}
Note no difference in the dates shown

> Handling of hybrid to proleptic calendar when reading and writing Parquet 
> data not working correctly
> 
>
> Key: SPARK-33571
> URL: https://issues.apache.org/jira/browse/SPARK-33571
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Simon
>Priority: Major
>
> The handling of old dates written with older Spark versions (<2.4.6) using 
> the hybrid calendar in Spark 3.0.0 and 3.0.1 seems to be broken/not working 
> correctly.
> From what I understand it should work like this:
>  * Only relevant for `DateType` before 1582-10-15 or `TimestampType` before 
> 1900-01-01T00:00:00Z
>  * Only applies when reading or writing parquet files
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInRead`
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above-mentioned moments in time and 
> `datetimeRebaseModeInRead` is set to `LEGACY`, the dates and timestamps should 
> show the same values in Spark 3.0.1 with, for example, `df.show()` as they did 
> in Spark 2.4.5

[jira] [Comment Edited] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly

2020-11-26 Thread Simon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239193#comment-17239193
 ] 

Simon edited comment on SPARK-33571 at 11/26/20, 11:11 AM:
---

Below is the output of the timestamp test script with the noise removed.
*Writing:*
{code:java}
Spark version: 2.4.5
Spark conf [('spark.master', 'local[*]'), ('spark.submit.deployMode', 
'client'), ('spark.app.name', 'generate-timestamp-data'), 
('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

Writing parquet files to output/timestampspark245/
done
...
Spark version: 2.4.6
Spark conf [('spark.master', 'local[*]'), ('spark.submit.deployMode', 
'client'), ('spark.app.name', 'generate-timestamp-data'), 
('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

Writing parquet files to output/timestampspark246/
done
...
Spark version: 3.0.1
Spark conf [('spark.master', 'local[*]'), ('spark.submit.pyFiles', ''), 
('spark.submit.deployMode', 'client'), ('spark.app.name', 
'generate-timestamp-data'), ('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+   
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:10:10|
|  2|1880-10-01 10:10:10|
|  3|2020-10-01 10:10:10|
+---+---+

Writing parquet files to output/timestampspark301/
done
{code}

Note no exception was raised when writing old timestamps to parquet in Spark 
3.0.1
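
For reference, a minimal, hypothetical sketch of what such a write looks like 
and where the write-side config would come in (this is not the actual generate 
script from the linked repo; the rows mirror the 3.0.1 output above):

{code:python}
import datetime

from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.appName("generate-timestamp-data").getOrCreate()

# Rows 1 and 2 are before 1900-01-01T00:00:00Z, i.e. in the range the rebase
# handling is about.
df = spark.createDataFrame([
    Row(row="1", timestamp=datetime.datetime(220, 10, 1, 10, 10, 10)),
    Row(row="2", timestamp=datetime.datetime(1880, 10, 1, 10, 10, 10)),
    Row(row="3", timestamp=datetime.datetime(2020, 10, 1, 10, 10, 10)),
])
df.printSchema()
df.show()

# Per the issue description this write should raise a SparkUpgradeException in
# Spark 3.x unless spark.sql.legacy.parquet.datetimeRebaseModeInWrite is set to
# LEGACY or CORRECTED; the output above shows it completed without one.
df.write.mode("overwrite").parquet("output/timestampspark301/")

spark.stop()
{code}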

*Reading*
{code:java}
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), ('spark.master', 'local[*]'), 
('spark.submit.pyFiles', ''), ('spark.submit.deployMode', 'client'), 
('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/timestampspark245/*.parquet
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

done
...
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), ('spark.master', 'local[*]'), 
('spark.submit.pyFiles', ''), ('spark.submit.deployMode', 'client'), 
('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/timestampspark246/*.parquet
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

done
...
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), ('spark.master', 'local[*]'), 
('spark.submit.pyFiles', ''), ('spark.submit.deployMode', 'client'), 
('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/timestampspark301/*.parquet
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:10:10|
|  2|1880-10-01 10:10:10|
|  3|2020-10-01 10:10:10|
+---+---+

done
{code}
Note no exception was raised when reading parquet files written with Spark 
2.4.5 containing old timestamps

*Reading using the two different datetimeRebaseModeInRead modes*

{code:java}
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), 
('spark.sql.legacy.parquet.datetimeRebaseModeInRead', 'LEGACY'), 
('spark.master', 'local[*]'), ('spark.submit.pyFiles', ''), 
('spark.submit.deployMode', 'client'), ('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/timestampspark245/*.parquet
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

done
...
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), 
('spark.sql.legacy.parquet.datetimeRebaseModeInRead', 'CORRECTED'), 
('spark.master', 'local[*]'), ('spark.submit.pyFiles', ''), 
('spark.submit.deployMode', 'client'), ('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/timestampspark245/*.parquet
root
 |-- row: string (nullable = true)
 |-- timestamp: timesta

[jira] [Comment Edited] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly

2020-11-26 Thread Simon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239193#comment-17239193
 ] 

Simon edited comment on SPARK-33571 at 11/26/20, 11:07 AM:
---

Below is the output of the timestamp test script with the noise removed.
*Writing:*
{code:java}
Spark version: 2.4.5
Spark conf [('spark.master', 'local[*]'), ('spark.submit.deployMode', 
'client'), ('spark.app.name', 'generate-timestamp-data'), 
('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

Writing parquet files to output/timestampspark245/
done
...
Spark version: 2.4.6
Spark conf [('spark.master', 'local[*]'), ('spark.submit.deployMode', 
'client'), ('spark.app.name', 'generate-timestamp-data'), 
('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

Writing parquet files to output/timestampspark246/
done
...
Spark version: 3.0.1
Spark conf [('spark.master', 'local[*]'), ('spark.submit.pyFiles', ''), 
('spark.submit.deployMode', 'client'), ('spark.app.name', 
'generate-timestamp-data'), ('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+   
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:10:10|
|  2|1880-10-01 10:10:10|
|  3|2020-10-01 10:10:10|
+---+---+

Writing parquet files to output/timestampspark301/
done
{code}

Note no exception was raised when writing old timestamps to parquet in spark 
3.0.1

*Reading*
{code:java}
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), ('spark.master', 'local[*]'), 
('spark.submit.pyFiles', ''), ('spark.submit.deployMode', 'client'), 
('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/timestampspark245/*.parquet
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

done
...
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), ('spark.master', 'local[*]'), 
('spark.submit.pyFiles', ''), ('spark.submit.deployMode', 'client'), 
('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/timestampspark246/*.parquet
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

done
...
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), ('spark.master', 'local[*]'), 
('spark.submit.pyFiles', ''), ('spark.submit.deployMode', 'client'), 
('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/timestampspark301/*.parquet
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:10:10|
|  2|1880-10-01 10:10:10|
|  3|2020-10-01 10:10:10|
+---+---+

done
{code}
Note no exception was raised when reading parquet files written with Spark 
2.4.5 containing old timestamps


was (Author: simonvanderveldt):
Below the output of the timestamp test script with the noise removed
*Writing:*
{code:java}
Spark version: 2.4.5
Spark conf [('spark.master', 'local[*]'), ('spark.submit.deployMode', 
'client'), ('spark.app.name', 'generate-timestamp-data'), 
('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

Writing parquet files to output/timestampspark245/
done
...
Spark version: 2.4.6
Spark conf [('spark.master', 'local[*]'), ('spark.submit.deployMode', 
'client'), ('spark.app.name', 'generate-timestamp-data'), 
('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+--

[jira] [Comment Edited] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly

2020-11-26 Thread Simon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239193#comment-17239193
 ] 

Simon edited comment on SPARK-33571 at 11/26/20, 11:06 AM:
---

Below is the output of the timestamp test script with the noise removed.
*Writing:*
{code:java}
Spark version: 2.4.5
Spark conf [('spark.master', 'local[*]'), ('spark.submit.deployMode', 
'client'), ('spark.app.name', 'generate-timestamp-data'), 
('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

Writing parquet files to output/timestampspark245/
done
...
Spark version: 2.4.6
Spark conf [('spark.master', 'local[*]'), ('spark.submit.deployMode', 
'client'), ('spark.app.name', 'generate-timestamp-data'), 
('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

Writing parquet files to output/timestampspark246/
done
...
Spark version: 3.0.1
Spark conf [('spark.master', 'local[*]'), ('spark.submit.pyFiles', ''), 
('spark.submit.deployMode', 'client'), ('spark.app.name', 
'generate-timestamp-data'), ('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+   
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:10:10|
|  2|1880-10-01 10:10:10|
|  3|2020-10-01 10:10:10|
+---+---+

Writing parquet files to output/timestampspark301/
done
{code}

Note no exception was raised when writing old timestamps to parquet in spark 
3.0.1

*Reading*
{code:java}
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), ('spark.master', 'local[*]'), 
('spark.submit.pyFiles', ''), ('spark.submit.deployMode', 'client'), 
('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/timestampspark245/*.parquet
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

done
...
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), ('spark.master', 'local[*]'), 
('spark.submit.pyFiles', ''), ('spark.submit.deployMode', 'client'), 
('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/timestampspark246/*.parquet
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

done
...
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), ('spark.master', 'local[*]'), 
('spark.submit.pyFiles', ''), ('spark.submit.deployMode', 'client'), 
('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/timestampspark301/*.parquet
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:10:10|
|  2|1880-10-01 10:10:10|
|  3|2020-10-01 10:10:10|
+---+---+

done
{code}
Note no exception was raised when reading the parquet files written with Spark 
2.4.5


was (Author: simonvanderveldt):
Below the output of the timestamp test script with the noise removed
*Writing:*
{code:java}
Spark version: 2.4.5
Spark conf [('spark.master', 'local[*]'), ('spark.submit.deployMode', 
'client'), ('spark.app.name', 'generate-timestamp-data'), 
('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

Writing parquet files to output/timestampspark245/
done
...
Spark version: 2.4.6
Spark conf [('spark.master', 'local[*]'), ('spark.submit.deployMode', 
'client'), ('spark.app.name', 'generate-timestamp-data'), 
('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

Writing pa

[jira] [Comment Edited] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly

2020-11-26 Thread Simon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239193#comment-17239193
 ] 

Simon edited comment on SPARK-33571 at 11/26/20, 11:06 AM:
---

Below is the output of the timestamp test script with the noise removed.
*Writing:*
{code:java}
Spark version: 2.4.5
Spark conf [('spark.master', 'local[*]'), ('spark.submit.deployMode', 
'client'), ('spark.app.name', 'generate-timestamp-data'), 
('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

Writing parquet files to output/timestampspark245/
done
...
Spark version: 2.4.6
Spark conf [('spark.master', 'local[*]'), ('spark.submit.deployMode', 
'client'), ('spark.app.name', 'generate-timestamp-data'), 
('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

Writing parquet files to output/timestampspark246/
done
...
Spark version: 3.0.1
Spark conf [('spark.master', 'local[*]'), ('spark.submit.pyFiles', ''), 
('spark.submit.deployMode', 'client'), ('spark.app.name', 
'generate-timestamp-data'), ('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+   
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:10:10|
|  2|1880-10-01 10:10:10|
|  3|2020-10-01 10:10:10|
+---+---+

Writing parquet files to output/timestampspark301/
done
{code}

Note no exception was raised when writing old timestamps to parquet in spark 
3.0.1

*Reading*
{code:java}
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), ('spark.master', 'local[*]'), 
('spark.submit.pyFiles', ''), ('spark.submit.deployMode', 'client'), 
('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/timestampspark245/*.parquet
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

done
...
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), ('spark.master', 'local[*]'), 
('spark.submit.pyFiles', ''), ('spark.submit.deployMode', 'client'), 
('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/timestampspark246/*.parquet
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

done
...
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), ('spark.master', 'local[*]'), 
('spark.submit.pyFiles', ''), ('spark.submit.deployMode', 'client'), 
('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/timestampspark301/*.parquet
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:10:10|
|  2|1880-10-01 10:10:10|
|  3|2020-10-01 10:10:10|
+---+---+

done
{code}
Note no exception was raised when reading the parquet files written 
with Spark 2.4.5 which contain old timestamps


was (Author: simonvanderveldt):
Below the output of the timestamp test script with the noise removed
#Writing:
{code:java}
Spark version: 2.4.5
Spark conf [('spark.master', 'local[*]'), ('spark.submit.deployMode', 
'client'), ('spark.app.name', 'generate-timestamp-data'), 
('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

Writing parquet files to output/timestampspark245/
done
...
Spark version: 2.4.6
Spark conf [('spark.master', 'local[*]'), ('spark.submit.deployMode', 
'client'), ('spark.app.name', 'generate-timestamp-data'), 
('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|

[jira] [Comment Edited] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly

2020-11-26 Thread Simon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239193#comment-17239193
 ] 

Simon edited comment on SPARK-33571 at 11/26/20, 11:05 AM:
---

Below is the output of the timestamp test script with the noise removed.
#Writing:
{code:java}
Spark version: 2.4.5
Spark conf [('spark.master', 'local[*]'), ('spark.submit.deployMode', 
'client'), ('spark.app.name', 'generate-timestamp-data'), 
('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

Writing parquet files to output/timestampspark245/
done
...
Spark version: 2.4.6
Spark conf [('spark.master', 'local[*]'), ('spark.submit.deployMode', 
'client'), ('spark.app.name', 'generate-timestamp-data'), 
('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

Writing parquet files to output/timestampspark246/
done
...
Spark version: 3.0.1
Spark conf [('spark.master', 'local[*]'), ('spark.submit.pyFiles', ''), 
('spark.submit.deployMode', 'client'), ('spark.app.name', 
'generate-timestamp-data'), ('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+   
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:10:10|
|  2|1880-10-01 10:10:10|
|  3|2020-10-01 10:10:10|
+---+---+

Writing parquet files to output/timestampspark301/
done
{code}

Note no exception was raised when writing old timestamps to parquet in spark 
3.0.1

# Reading

{code:java}
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), ('spark.master', 'local[*]'), 
('spark.submit.pyFiles', ''), ('spark.submit.deployMode', 'client'), 
('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/timestampspark245/*.parquet
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

done
...
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), ('spark.master', 'local[*]'), 
('spark.submit.pyFiles', ''), ('spark.submit.deployMode', 'client'), 
('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/timestampspark246/*.parquet
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

done
...
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), ('spark.master', 'local[*]'), 
('spark.submit.pyFiles', ''), ('spark.submit.deployMode', 'client'), 
('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/timestampspark301/*.parquet
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:10:10|
|  2|1880-10-01 10:10:10|
|  3|2020-10-01 10:10:10|
+---+---+

done
{code}
Note no exception was raised when reading the parquet files written 
with Spark 2.4.5 which contain old timestamps


was (Author: simonvanderveldt):
Below the output of the timestamp test script with the noise removed
#Writing:
{code:java}
Spark version: 2.4.5
Spark conf [('spark.master', 'local[*]'), ('spark.submit.deployMode', 
'client'), ('spark.app.name', 'generate-timestamp-data'), 
('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

Writing parquet files to output/timestampspark245/
done
...
Spark version: 2.4.6
Spark conf [('spark.master', 'local[*]'), ('spark.submit.deployMode', 
'client'), ('spark.app.name', 'generate-timestamp-data'), 
('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|

[jira] [Commented] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly

2020-11-26 Thread Simon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239193#comment-17239193
 ] 

Simon commented on SPARK-33571:
---

Below is the output of the timestamp test script with the noise removed.
#Writing:
```
Spark version: 2.4.5
Spark conf [('spark.master', 'local[*]'), ('spark.submit.deployMode', 
'client'), ('spark.app.name', 'generate-timestamp-data'), 
('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

Writing parquet files to output/timestampspark245/
done
...
Spark version: 2.4.6
Spark conf [('spark.master', 'local[*]'), ('spark.submit.deployMode', 
'client'), ('spark.app.name', 'generate-timestamp-data'), 
('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

Writing parquet files to output/timestampspark246/
done
...
Spark version: 3.0.1
Spark conf [('spark.master', 'local[*]'), ('spark.submit.pyFiles', ''), 
('spark.submit.deployMode', 'client'), ('spark.app.name', 
'generate-timestamp-data'), ('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+   
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:10:10|
|  2|1880-10-01 10:10:10|
|  3|2020-10-01 10:10:10|
+---+---+

Writing parquet files to output/timestampspark301/
done
```
Note no exception was raised when writing old timestamps to parquet in Spark 
3.0.1

# Reading
```
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), ('spark.master', 'local[*]'), 
('spark.submit.pyFiles', ''), ('spark.submit.deployMode', 'client'), 
('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/timestampspark245/*.parquet
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

done
...
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), ('spark.master', 'local[*]'), 
('spark.submit.pyFiles', ''), ('spark.submit.deployMode', 'client'), 
('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/timestampspark246/*.parquet
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

done
...
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), ('spark.master', 'local[*]'), 
('spark.submit.pyFiles', ''), ('spark.submit.deployMode', 'client'), 
('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/timestampspark301/*.parquet
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:10:10|
|  2|1880-10-01 10:10:10|
|  3|2020-10-01 10:10:10|
+---+---+

done
```

> Handling of hybrid to proleptic calendar when reading and writing Parquet 
> data not working correctly
> 
>
> Key: SPARK-33571
> URL: https://issues.apache.org/jira/browse/SPARK-33571
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Simon
>Priority: Major
>
> The handling of old dates written with older Spark versions (<2.4.6) using 
> the hybrid calendar in Spark 3.0.0 and 3.0.1 seems to be broken/not working 
> correctly.
> From what I understand it should work like this:
>  * Only relevant for `DateType` before 1582-10-15 or `TimestampType` before 
> 1900-01-01T00:00:00Z
>  * Only applies when reading or writing parquet files
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInRead`
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above m

[jira] [Comment Edited] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly

2020-11-26 Thread Simon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239193#comment-17239193
 ] 

Simon edited comment on SPARK-33571 at 11/26/20, 11:03 AM:
---

Below is the output of the timestamp test script with the noise removed.
#Writing:
{code:java}
Spark version: 2.4.5
Spark conf [('spark.master', 'local[*]'), ('spark.submit.deployMode', 
'client'), ('spark.app.name', 'generate-timestamp-data'), 
('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

Writing parquet files to output/timestampspark245/
done
...
Spark version: 2.4.6
Spark conf [('spark.master', 'local[*]'), ('spark.submit.deployMode', 
'client'), ('spark.app.name', 'generate-timestamp-data'), 
('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

Writing parquet files to output/timestampspark246/
done
...
Spark version: 3.0.1
Spark conf [('spark.master', 'local[*]'), ('spark.submit.pyFiles', ''), 
('spark.submit.deployMode', 'client'), ('spark.app.name', 
'generate-timestamp-data'), ('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+   
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:10:10|
|  2|1880-10-01 10:10:10|
|  3|2020-10-01 10:10:10|
+---+---+

Writing parquet files to output/timestampspark301/
done
{code}

Note no exception was raised when writing old timestamps to parquet in Spark 
3.0.1

# Reading
```
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), ('spark.master', 'local[*]'), 
('spark.submit.pyFiles', ''), ('spark.submit.deployMode', 'client'), 
('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/timestampspark245/*.parquet
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

done
...
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), ('spark.master', 'local[*]'), 
('spark.submit.pyFiles', ''), ('spark.submit.deployMode', 'client'), 
('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/timestampspark246/*.parquet
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

done
...
Spark version: 3.0.1
Spark conf [('spark.app.name', 'read-data'), ('spark.master', 'local[*]'), 
('spark.submit.pyFiles', ''), ('spark.submit.deployMode', 'client'), 
('spark.ui.showConsoleProgress', 'true')]
Reading parquet files from output/timestampspark301/*.parquet
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:10:10|
|  2|1880-10-01 10:10:10|
|  3|2020-10-01 10:10:10|
+---+---+

done
```


was (Author: simonvanderveldt):
Below the output of the timestamp test script with the noise removed
#Writing:
```
Spark version: 2.4.5
Spark conf [('spark.master', 'local[*]'), ('spark.submit.deployMode', 
'client'), ('spark.app.name', 'generate-timestamp-data'), 
('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

Writing parquet files to output/timestampspark245/
done
...
Spark version: 2.4.6
Spark conf [('spark.master', 'local[*]'), ('spark.submit.deployMode', 
'client'), ('spark.app.name', 'generate-timestamp-data'), 
('spark.ui.showConsoleProgress', 'true')]
root
 |-- row: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)

+---+---+
|row|  timestamp|
+---+---+
|  1|0220-10-01 10:50:38|
|  2|1880-10-01 10:50:38|
|  3|2020-10-01 10:10:10|
+---+---+

Writing parquet files to output/timestampspark246/
done
...
Spark version: 3.0.1
Spark conf [('spark.master', 'local

[jira] [Created] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly

2020-11-26 Thread Simon (Jira)
Simon created SPARK-33571:
-

 Summary: Handling of hybrid to proleptic calendar when reading and 
writing Parquet data not working correctly
 Key: SPARK-33571
 URL: https://issues.apache.org/jira/browse/SPARK-33571
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.1, 3.0.0
Reporter: Simon


The handling of old dates written with older Spark versions (<2.4.6) using the 
hybrid calendar in Spark 3.0.0 and 3.0.1 seems to be broken/not working 
correctly.

From what I understand it should work like this:
 * Only relevant for `DateType` before 1582-10-15 or `TimestampType` before 
1900-01-01T00:00:00Z
 * Only applies when reading or writing parquet files
 * When reading parquet files written with Spark < 2.4.6 which contain dates or 
timestamps before the above-mentioned moments in time, a 
`SparkUpgradeException` should be raised informing the user to choose either 
`LEGACY` or `CORRECTED` for the `datetimeRebaseModeInRead`
 * When reading parquet files written with Spark < 2.4.6 which contain dates or 
timestamps before the above-mentioned moments in time and 
`datetimeRebaseModeInRead` is set to `LEGACY`, the dates and timestamps should 
show the same values in Spark 3.0.1 with, for example, `df.show()` as they did 
in Spark 2.4.5
 * When reading parquet files written with Spark < 2.4.6 which contain dates or 
timestamps before the above-mentioned moments in time and 
`datetimeRebaseModeInRead` is set to `CORRECTED`, the dates and timestamps 
should show different values in Spark 3.0.1 with, for example, `df.show()` than 
they did in Spark 2.4.5
 * When writing parquet files with Spark > 3.0.0 which contain dates or 
timestamps before the above-mentioned moments in time, a `SparkUpgradeException` 
should be raised informing the user to choose either `LEGACY` or `CORRECTED` 
for the `datetimeRebaseModeInWrite`

First of all, I'm not 100% sure all of this is correct. I've been unable to find 
any clear documentation on the expected behavior. The understanding I have was 
pieced together from the mailing list 
([http://apache-spark-user-list.1001560.n3.nabble.com/Spark-3-0-1-new-Proleptic-Gregorian-calendar-td38914.html]),
the blog post linked there, and looking at the Spark code.

From our testing we're seeing several issues:
 * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5 and 
that contains fields of type `TimestampType` which contain timestamps before 
the above-mentioned moments in time, without `datetimeRebaseModeInRead` set, 
doesn't raise the `SparkUpgradeException`; it succeeds without any changes to 
the resulting dataframe compared to the same dataframe in Spark 2.4.5
 * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5 and 
that contains fields of type `TimestampType` or `DateType` which contain dates 
or timestamps before the above-mentioned moments in time, with 
`datetimeRebaseModeInRead` set to `LEGACY`, results in the same values in the 
dataframe as when using `CORRECTED`, so it seems like no rebasing is happening.

I've made some scripts to help with testing and to show the behavior; they use 
pyspark 2.4.5, 2.4.6 and 3.0.1. You can find them here: 
[https://github.com/simonvanderveldt/spark3-rebasemode-issue]. I'll post the 
outputs in a comment below as well.
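
For completeness, a minimal, hypothetical example of explicitly choosing the 
rebase behavior via the two configs the expected `SparkUpgradeException` 
messages refer to (accepted values for both keys are `EXCEPTION`, `LEGACY` and 
`CORRECTED`; the path is one of the outputs of the linked scripts):

{code:python}
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("rebase-mode-example")
    # How pre-1582 dates / pre-1900 timestamps read from parquet are rebased.
    .config("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "LEGACY")
    # How such dates/timestamps are rebased when written to parquet.
    .config("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "LEGACY")
    .getOrCreate()
)

# Inspect how the old dates are displayed under the chosen read mode.
spark.read.parquet("output/datespark245/*.parquet").show()

spark.stop()
{code}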



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33570) Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite

2020-11-26 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-33570:
---
Description: 
For MariaDBKrbIntegrationSuite, the version of mariadb-plugin-gssapi-server is 
currently set to 10.5.5 in mariadb_docker_entrypoint.sh but it's no longer 
available in the official apt repository and MariaDBKrbIntegrationSuite doesn't 
pass for now.
It seems that only the most recent three versions are available and they are 
10.5.6, 10.5.7 and 10.5.8 for now.
Further, the release cycle of MariaDB seems to be too fast (1 ~ 2 months) so I 
don't think it's a good idea to set a specific version for 
mariadb-plugin-gssapi-server.


  was:
For MariaDBKrbIntegrationSuite, the version of mariadb-plugin-gssapi-server is 
currently set to 10.5.5 in mariadb_docker_entrypoint.sh but it's no longer 
available in the official apt repository and MariaDBKrbIntegrationSuite doesn't 
pass for now.
It seems that only the most recent three versions are available and they are 
10.5.6, 10.5.7 and 10.5.8 for now.
Further, the release cycle for MariaDB seems to be too fast (1 ~ 2 months) so I 
don't think it's a good idea to set a specific version for 
mariadb-plugin-gssapi-server.



> Set the proper version of gssapi plugin automatically for 
> MariaDBKrbIntegrationsuite
> 
>
> Key: SPARK-33570
> URL: https://issues.apache.org/jira/browse/SPARK-33570
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> For MariaDBKrbIntegrationSuite, the version of mariadb-plugin-gssapi-server 
> is currently set to 10.5.5 in mariadb_docker_entrypoint.sh but it's no longer 
> available in the official apt repository and MariaDBKrbIntegrationSuite 
> doesn't pass for now.
> It seems that only the most recent three versions are available and they are 
> 10.5.6, 10.5.7 and 10.5.8 for now.
> Further, the release cycle of MariaDB seems to be too fast (1 ~ 2 months) so 
> I don't think it's a good idea to set a specific version for 
> mariadb-plugin-gssapi-server.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33570) Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite

2020-11-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33570:


Assignee: Apache Spark  (was: Kousuke Saruta)

> Set the proper version of gssapi plugin automatically for 
> MariaDBKrbIntegrationsuite
> 
>
> Key: SPARK-33570
> URL: https://issues.apache.org/jira/browse/SPARK-33570
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Apache Spark
>Priority: Major
>
> For MariaDBKrbIntegrationSuite, the version of mariadb-plugin-gssapi-server 
> is currently set to 10.5.5 in mariadb_docker_entrypoint.sh but it's no longer 
> available in the official apt repository and MariaDBKrbIntegrationSuite 
> doesn't pass for now.
> It seems that only the most recent three versions are available and they are 
> 10.5.6, 10.5.7 and 10.5.8 for now.
> Further, the release cycle for MariaDB seems to be too fast (1 ~ 2 months) so 
> I don't think it's a good idea to set a specific version for 
> mariadb-plugin-gssapi-server.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33570) Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite

2020-11-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33570:


Assignee: Kousuke Saruta  (was: Apache Spark)

> Set the proper version of gssapi plugin automatically for 
> MariaDBKrbIntegrationsuite
> 
>
> Key: SPARK-33570
> URL: https://issues.apache.org/jira/browse/SPARK-33570
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> For MariaDBKrbIntegrationSuite, the version of mariadb-plugin-gssapi-server 
> is currently set to 10.5.5 in mariadb_docker_entrypoint.sh but it's no longer 
> available in the official apt repository and MariaDBKrbIntegrationSuite 
> doesn't pass for now.
> It seems that only the most recent three versions are available and they are 
> 10.5.6, 10.5.7 and 10.5.8 for now.
> Further, the release cycle for MariaDB seems to be too fast (1 ~ 2 months) so 
> I don't think it's a good idea to set a specific version for 
> mariadb-plugin-gssapi-server.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33570) Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite

2020-11-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239145#comment-17239145
 ] 

Apache Spark commented on SPARK-33570:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/30515

> Set the proper version of gssapi plugin automatically for 
> MariaDBKrbIntegrationsuite
> 
>
> Key: SPARK-33570
> URL: https://issues.apache.org/jira/browse/SPARK-33570
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> For MariaDBKrbIntegrationSuite, the version of mariadb-plugin-gssapi-server 
> is currently set to 10.5.5 in mariadb_docker_entrypoint.sh but it's no longer 
> available in the official apt repository and MariaDBKrbIntegrationSuite 
> doesn't pass for now.
> It seems that only the most recent three versions are available and they are 
> 10.5.6, 10.5.7 and 10.5.8 for now.
> Further, the release cycle for MariaDB seems to be too fast (1 ~ 2 months) so 
> I don't think it's a good idea to set a specific version for 
> mariadb-plugin-gssapi-server.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33570) Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite

2020-11-26 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-33570:
---
Description: 
For MariaDBKrbIntegrationSuite, the version of mariadb-plugin-gssapi-server is 
currently set to 10.5.5 in mariadb_docker_entrypoint.sh but it's no longer 
available in the official apt repository and MariaDBKrbIntegrationSuite doesn't 
pass for now.
It seems that only the most recent three versions are available and they are 
10.5.6, 10.5.7 and 10.5.8 for now.
Further, the release cycle for MariaDB seems to be too fast (1 ~ 2 months) so I 
don't think it's a good idea to set a specific version for 
mariadb-plugin-gssapi-server.


  was:
For MariaDBKrbIntegrationSuite, the version of mariadb-plugin-gssapi-server is 
currently set to 10.5.5 in mariadb_docker_entrypoint.sh but it's no longer 
available in the official apt repository.
It seems that only the most recent three versions are available and they are 
10.5.6, 10.5.7 and 10.5.8 for now.
Further, the release cycle for MariaDB seems to be too fast (1 ~ 2 months) so I 
don't think it's a good idea to set a specific version for 
mariadb-plugin-gssapi-server.



> Set the proper version of gssapi plugin automatically for 
> MariaDBKrbIntegrationsuite
> 
>
> Key: SPARK-33570
> URL: https://issues.apache.org/jira/browse/SPARK-33570
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> For MariaDBKrbIntegrationSuite, the version of mariadb-plugin-gssapi-server 
> is currently set to 10.5.5 in mariadb_docker_entrypoint.sh but it's no longer 
> available in the official apt repository and MariaDBKrbIntegrationSuite 
> doesn't pass for now.
> It seems that only the most recent three versions are available and they are 
> 10.5.6, 10.5.7 and 10.5.8 for now.
> Further, the release cycle for MariaDB seems to be too fast (1 ~ 2 months) so 
> I don't think it's a good idea to set a specific version for 
> mariadb-plugin-gssapi-server.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33570) Set the proper version of gssapi plugin automatically for MariaDBKrbIntegrationsuite

2020-11-26 Thread Kousuke Saruta (Jira)
Kousuke Saruta created SPARK-33570:
--

 Summary: Set the proper version of gssapi plugin automatically for 
MariaDBKrbIntegrationsuite
 Key: SPARK-33570
 URL: https://issues.apache.org/jira/browse/SPARK-33570
 Project: Spark
  Issue Type: Bug
  Components: SQL, Tests
Affects Versions: 3.1.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta


For MariaDBKrbIntegrationSuite, the version of mariadb-plugin-gssapi-server is 
currently set to 10.5.5 in mariadb_docker_entrypoint.sh but it's no longer 
available in the official apt repository.
It seems that only the most recent three versions are available and they are 
10.5.6, 10.5.7 and 10.5.8 for now.
Further, the release cycle for MariaDB seems to be too fast (1 ~ 2 months) so I 
don't think it's a good idea to set a specific version for 
mariadb-plugin-gssapi-server.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33569) Remove getting partitions by only ident

2020-11-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33569:


Assignee: (was: Apache Spark)

> Remove getting partitions by only ident
> ---
>
> Key: SPARK-33569
> URL: https://issues.apache.org/jira/browse/SPARK-33569
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> This is a follow up of SPARK-33509 which added a function for getting 
> partitions by names and ident. The function which gets partitions by ident is 
> not used anymore, and it can be removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33569) Remove getting partitions by only ident

2020-11-26 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33569:


Assignee: Apache Spark

> Remove getting partitions by only ident
> ---
>
> Key: SPARK-33569
> URL: https://issues.apache.org/jira/browse/SPARK-33569
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> This is a follow up of SPARK-33509 which added a function for getting 
> partitions by names and ident. The function which gets partitions by ident is 
> not used anymore, and it can be removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33569) Remove getting partitions by only ident

2020-11-26 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17239128#comment-17239128
 ] 

Apache Spark commented on SPARK-33569:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30514

> Remove getting partitions by only ident
> ---
>
> Key: SPARK-33569
> URL: https://issues.apache.org/jira/browse/SPARK-33569
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> This is a follow up of SPARK-33509 which added a function for getting 
> partitions by names and ident. The function which gets partitions by ident is 
> not used anymore, and it can be removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33569) Remove getting partitions by only ident

2020-11-26 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33569:
--

 Summary: Remove getting partitions by only ident
 Key: SPARK-33569
 URL: https://issues.apache.org/jira/browse/SPARK-33569
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


This is a follow up of SPARK-33509 which added a function for getting 
partitions by names and ident. The function which gets partitions by ident is 
not used anymore, and it can be removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org