[jira] [Updated] (SPARK-32082) Project Zen: Improving Python usability

2020-06-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-32082:
-
Summary: Project Zen: Improving Python usability  (was: The importance of 
Python and PySpark has grown radically recently. This ticket targets to improve 
the usability in PySpark.)

> Project Zen: Improving Python usability
> ---
>
> Key: SPARK-32082
> URL: https://issues.apache.org/jira/browse/SPARK-32082
> Project: Spark
>  Issue Type: Epic
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> The importance of Python and PySpark has grown radically in the last few 
> years. This epic ticket aims to improve the usability of PySpark and make it 
> more Pythonic.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32068) Spark 3 UI task launch time show in error time zone

2020-06-24 Thread Smith Cruise (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143579#comment-17143579
 ] 

Smith Cruise commented on SPARK-32068:
--

OK, I have uploaded it.

> Spark 3 UI task launch time show in error time zone
> ---
>
> Key: SPARK-32068
> URL: https://issues.apache.org/jira/browse/SPARK-32068
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: Smith Cruise
>Priority: Major
>  Labels: easyfix
> Attachments: correct.png, incorrect.png
>
>
> For example,
> In this link: history/app-20200623133209-0015/stages/ , stage submit time is 
> correct (CST)
>  
> But in this link: 
> history/app-20200623133209-0015/stages/stage/?id=0=0 , task launch 
> time is incorrect (UTC)
>  
> The same problem exists in port 4040 Web UI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32068) Spark 3 UI task launch time show in error time zone

2020-06-24 Thread Smith Cruise (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Smith Cruise updated SPARK-32068:
-
Attachment: incorrect.png
correct.png

> Spark 3 UI task launch time show in error time zone
> ---
>
> Key: SPARK-32068
> URL: https://issues.apache.org/jira/browse/SPARK-32068
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: Smith Cruise
>Priority: Major
>  Labels: easyfix
> Attachments: correct.png, incorrect.png
>
>
> For example,
> In this link: history/app-20200623133209-0015/stages/ , stage submit time is 
> correct (CST)
>  
> But in this link: 
> history/app-20200623133209-0015/stages/stage/?id=0=0 , task launch 
> time is incorrect (UTC)
>  
> The same problem exists in port 4040 Web UI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32068) Spark 3 UI task launch time show in error time zone

2020-06-24 Thread Smith Cruise (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Smith Cruise updated SPARK-32068:
-
Description: 
For example,

In this link: history/app-20200623133209-0015/stages/ , stage submit time is 
correct (CST)

 

But in this link: history/app-20200623133209-0015/stages/stage/?id=0=0 
, task launch time is incorrect (UTC)

 

The same problem exists in port 4040 Web UI.

  was:
For example,

In this link: history/app-20200623133209-0015/stages/ , stage submit time is 
correct (UTS)

 

But in this link: history/app-20200623133209-0015/stages/stage/?id=0=0 
, task launch time is incorrect(UTC)

 

The same problem exists in port 4040 Web UI.


> Spark 3 UI task launch time show in error time zone
> ---
>
> Key: SPARK-32068
> URL: https://issues.apache.org/jira/browse/SPARK-32068
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: Smith Cruise
>Priority: Major
>  Labels: easyfix
>
> For example,
> In this link: history/app-20200623133209-0015/stages/ , stage submit time is 
> correct (CST)
>  
> But in this link: 
> history/app-20200623133209-0015/stages/stage/?id=0=0 , task launch 
> time is incorrect (UTC)
>  
> The same problem exists in port 4040 Web UI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32068) Spark 3 UI task launch time show in error time zone

2020-06-24 Thread Smith Cruise (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Smith Cruise updated SPARK-32068:
-
Attachment: (was: 未命名.png)

> Spark 3 UI task launch time show in error time zone
> ---
>
> Key: SPARK-32068
> URL: https://issues.apache.org/jira/browse/SPARK-32068
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: Smith Cruise
>Priority: Major
>  Labels: easyfix
>
> For example,
> In this link: history/app-20200623133209-0015/stages/ , stage submit time is 
> correct (CST)
>  
> But in this link: 
> history/app-20200623133209-0015/stages/stage/?id=0=0 , task launch 
> time is incorrect (UTC)
>  
> The same problem exists in port 4040 Web UI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32068) Spark 3 UI task launch time show in error time zone

2020-06-24 Thread Smith Cruise (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Smith Cruise updated SPARK-32068:
-
Attachment: 未命名.png

> Spark 3 UI task launch time show in error time zone
> ---
>
> Key: SPARK-32068
> URL: https://issues.apache.org/jira/browse/SPARK-32068
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: Smith Cruise
>Priority: Major
>  Labels: easyfix
>
> For example,
> In this link: history/app-20200623133209-0015/stages/ , stage submit time is 
> correct (CST)
>  
> But in this link: 
> history/app-20200623133209-0015/stages/stage/?id=0=0 , task launch 
> time is incorrect (UTC)
>  
> The same problem exists in port 4040 Web UI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-32063) Spark native temporary table

2020-06-24 Thread Lantao Jin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143575#comment-17143575
 ] 

Lantao Jin edited comment on SPARK-32063 at 6/24/20, 6:53 AM:
--

[~viirya] For 1, although RDD cache or table cache can improve performance, I 
still think they have totally different scopes. Besides, we can also cache a 
temporary table in memory to get a further performance improvement. In 
production usage, I found that our data engineers and data scientists do not 
always remember to uncache cached tables or views. This situation becomes worse 
in the Spark thrift-server (which shares the Spark driver). 

For 2, we found that when Adaptive Query Execution is enabled, complex views 
easily get stuck in the optimization step. Caching such a view doesn't help.

For 3, the scenario comes from our migration case, moving SQL from Teradata to 
Spark. Without temporary tables, TD users have to create permanent tables and 
drop them at the end of a script as an alternative to TD volatile tables; if 
the JDBC session closes or the script fails before cleanup, no mechanism 
guarantees that the intermediate data is dropped. If they use Spark temporary 
views instead, much of the logic doesn't work well. For example, they want to 
execute UPDATE/DELETE operations on intermediate tables, but we cannot convert 
a temporary view to a Delta table or Hudi table ...


was (Author: cltlfcjin):
For 1, even RDD cache or table cache can improve performance, but I still think 
they have totally different scopes. Besides, we also can cache a temporary 
table to memory to get more performance improvement. In production usage, I 
found our data engineers and data scientists do not always remember to uncached 
cached tables or views. This situation became worse in the Spark thrift-server 
(sharing Spark driver). 

For 2, we found when Adaptive Query Execution enabled, complex views are easily 
stuck in the optimization step. Cache this view couldn't help.

For 3, the scenario is in our migration case, move SQL from Teradata to Spark. 
Without the temporary table, TD users have to create permanent tables and drop 
them at the end of a script as an alternate of TD volatile table, if JDBC 
session closed or script failed before cleaning up, no mechanism guarantee to 
drop the intermediate data. If they use Spark temporary view, many logic 
couldn't work well. For example, they want to execute UPDATE/DELETE op on 
intermediate tables but we cannot convert a temporary view to Delta table or 
Hudi table ...

> Spark native temporary table
> 
>
> Key: SPARK-32063
> URL: https://issues.apache.org/jira/browse/SPARK-32063
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Lantao Jin
>Priority: Major
>
> Many databases and data warehouse SQL engines support temporary tables. A 
> temporary table, as its name implies, is a short-lived table whose lifetime 
> is limited to the current session.
> In Spark, there is no temporary table. The DDL “CREATE TEMPORARY TABLE AS 
> SELECT” will create a temporary view. A temporary view is totally different 
> from a temporary table. 
> A temporary view is just a VIEW. It doesn’t materialize data in storage, so 
> it has the following shortcomings:
>  # A view will not give improved performance. Materializing intermediate data 
> in temporary tables for a complex query will accelerate queries, especially 
> in an ETL pipeline.
>  # A view which calls other views can cause severe performance issues; 
> executing a very complex view may even fail in Spark. 
>  # A temporary view has no database namespace. In some complex ETL pipelines 
> or data warehouse applications, working without a database prefix is not 
> convenient; they need tables which are only used in the current session.
>  
> More details are described in [Design 
> Docs|https://docs.google.com/document/d/1RS4Q3VbxlZ_Yy0fdWgTJ-k0QxFd1dToCqpLAYvIJ34U/edit?usp=sharing]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32063) Spark native temporary table

2020-06-24 Thread Lantao Jin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143575#comment-17143575
 ] 

Lantao Jin commented on SPARK-32063:


For 1, although RDD cache or table cache can improve performance, I still think 
they have totally different scopes. Besides, we can also cache a temporary 
table in memory to get a further performance improvement. In production usage, 
I found that our data engineers and data scientists do not always remember to 
uncache cached tables or views. This situation becomes worse in the Spark 
thrift-server (which shares the Spark driver). 

For 2, we found that when Adaptive Query Execution is enabled, complex views 
easily get stuck in the optimization step. Caching such a view doesn't help.

For 3, the scenario comes from our migration case, moving SQL from Teradata to 
Spark. Without temporary tables, TD users have to create permanent tables and 
drop them at the end of a script as an alternative to TD volatile tables; if 
the JDBC session closes or the script fails before cleanup, no mechanism 
guarantees that the intermediate data is dropped. If they use Spark temporary 
views instead, much of the logic doesn't work well. For example, they want to 
execute UPDATE/DELETE operations on intermediate tables, but we cannot convert 
a temporary view to a Delta table or Hudi table ...
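
To make the caching point concrete: the closest thing to a temporary table 
today is a cached temporary view, which has to be uncached by hand. The 
following is a minimal Scala sketch under that assumption (the SparkSession 
setup and the view name "tmp_orders" are illustrative, not part of the 
proposal):

{code:scala}
import org.apache.spark.sql.SparkSession

// Minimal sketch: today "CREATE TEMPORARY TABLE ... AS SELECT" only yields a
// temporary view, so materialization has to be requested explicitly via CACHE.
val spark = SparkSession.builder().master("local[1]").getOrCreate()

spark.range(0, 1000).createOrReplaceTempView("tmp_orders") // a view, no data materialized
spark.sql("CACHE TABLE tmp_orders")                        // materialize it in memory

spark.sql("SELECT count(*) FROM tmp_orders").show()

// The step users forget, especially in a shared Thrift Server driver:
spark.sql("UNCACHE TABLE tmp_orders")
{code}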

> Spark native temporary table
> 
>
> Key: SPARK-32063
> URL: https://issues.apache.org/jira/browse/SPARK-32063
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Lantao Jin
>Priority: Major
>
> Many databases and data warehouse SQL engines support temporary tables. A 
> temporary table, as its name implies, is a short-lived table whose lifetime 
> is limited to the current session.
> In Spark, there is no temporary table. The DDL “CREATE TEMPORARY TABLE AS 
> SELECT” will create a temporary view. A temporary view is totally different 
> from a temporary table. 
> A temporary view is just a VIEW. It doesn’t materialize data in storage, so 
> it has the following shortcomings:
>  # A view will not give improved performance. Materializing intermediate data 
> in temporary tables for a complex query will accelerate queries, especially 
> in an ETL pipeline.
>  # A view which calls other views can cause severe performance issues; 
> executing a very complex view may even fail in Spark. 
>  # A temporary view has no database namespace. In some complex ETL pipelines 
> or data warehouse applications, working without a database prefix is not 
> convenient; they need tables which are only used in the current session.
>  
> More details are described in [Design 
> Docs|https://docs.google.com/document/d/1RS4Q3VbxlZ_Yy0fdWgTJ-k0QxFd1dToCqpLAYvIJ34U/edit?usp=sharing]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30466) remove dependency on jackson-mapper-asl-1.9.13 and jackson-core-asl-1.9.13

2020-06-24 Thread Prashant Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143569#comment-17143569
 ] 

Prashant Sharma commented on SPARK-30466:
-

I just saw that Hadoop 3.2.1 still uses these jars (jackson-mapper-asl-1.9.13 
and jackson-core-asl-1.9.13); they come in as a transitive dependency of 
jersey-json. See below.
{code:java}
[INFO] org.apache.hadoop:hadoop-common:jar:3.2.1
[INFO] +- org.apache.hadoop:hadoop-annotations:jar:3.2.1:compile
[INFO] |  \- jdk.tools:jdk.tools:jar:1.8:system
[INFO] +- com.google.guava:guava:jar:27.0-jre:compile
[INFO] |  +- com.google.guava:failureaccess:jar:1.0:compile
[INFO] |  +- 
com.google.guava:listenablefuture:jar:.0-empty-to-avoid-conflict-with-guava:compile
[INFO] |  +- org.checkerframework:checker-qual:jar:2.5.2:compile
[INFO] |  +- com.google.errorprone:error_prone_annotations:jar:2.2.0:compile
[INFO] |  +- com.google.j2objc:j2objc-annotations:jar:1.1:compile
[INFO] |  \- org.codehaus.mojo:animal-sniffer-annotations:jar:1.17:compile
[INFO] +- commons-cli:commons-cli:jar:1.2:compile
[INFO] +- org.apache.commons:commons-math3:jar:3.1.1:compile
[INFO] +- org.apache.httpcomponents:httpclient:jar:4.5.6:compile
[INFO] |  \- org.apache.httpcomponents:httpcore:jar:4.4.10:compile
[INFO] +- commons-codec:commons-codec:jar:1.11:compile
[INFO] +- commons-io:commons-io:jar:2.5:compile
[INFO] +- commons-net:commons-net:jar:3.6:compile
[INFO] +- commons-collections:commons-collections:jar:3.2.2:compile
[INFO] +- javax.servlet:javax.servlet-api:jar:3.1.0:compile
[INFO] +- org.eclipse.jetty:jetty-server:jar:9.3.24.v20180605:compile
[INFO] |  +- org.eclipse.jetty:jetty-http:jar:9.3.24.v20180605:compile
[INFO] |  \- org.eclipse.jetty:jetty-io:jar:9.3.24.v20180605:compile
[INFO] +- org.eclipse.jetty:jetty-util:jar:9.3.24.v20180605:compile
[INFO] +- org.eclipse.jetty:jetty-servlet:jar:9.3.24.v20180605:compile
[INFO] |  \- org.eclipse.jetty:jetty-security:jar:9.3.24.v20180605:compile
[INFO] +- org.eclipse.jetty:jetty-webapp:jar:9.3.24.v20180605:compile
[INFO] |  \- org.eclipse.jetty:jetty-xml:jar:9.3.24.v20180605:compile
[INFO] +- org.eclipse.jetty:jetty-util-ajax:jar:9.3.24.v20180605:test
[INFO] +- javax.servlet.jsp:jsp-api:jar:2.1:runtime
[INFO] +- com.sun.jersey:jersey-core:jar:1.19:compile
[INFO] |  \- javax.ws.rs:jsr311-api:jar:1.1.1:compile
[INFO] +- com.sun.jersey:jersey-servlet:jar:1.19:compile
[INFO] +- com.sun.jersey:jersey-json:jar:1.19:compile
[INFO] |  +- org.codehaus.jettison:jettison:jar:1.1:compile
[INFO] |  +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile
[INFO] |  |  \- javax.xml.bind:jaxb-api:jar:2.2.11:compile
[INFO] |  +- org.codehaus.jackson:jackson-core-asl:jar:1.9.13:compile
[INFO] |  +- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:compile
[INFO] |  +- org.codehaus.jackson:jackson-jaxrs:jar:1.9.13:compile
[INFO] |  \- org.codehaus.jackson:jackson-xc:jar:1.9.13:compile
[INFO] +- com.sun.jersey:jersey-server:jar:1.19:compile

{code}
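
Until hadoop-common stops pulling them in, a build that consumes it can exclude 
the Jackson 1.x artifacts explicitly. A hedged sketch in sbt syntax (Spark 
itself builds with Maven; this only illustrates the exclusion idea):

{code:scala}
// Hedged sketch: exclude the Jackson 1.x jars that arrive transitively via
// jersey-json when depending on hadoop-common 3.2.1.
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "3.2.1" excludeAll(
  ExclusionRule(organization = "org.codehaus.jackson", name = "jackson-core-asl"),
  ExclusionRule(organization = "org.codehaus.jackson", name = "jackson-mapper-asl")
)
{code}

Note that any code path which still needs Jackson 1.x classes at runtime would 
break, so the exclusion only makes sense once nothing depends on them.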

> remove dependency on jackson-mapper-asl-1.9.13 and jackson-core-asl-1.9.13
> --
>
> Key: SPARK-30466
> URL: https://issues.apache.org/jira/browse/SPARK-30466
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Michael Burgener
>Priority: Major
>  Labels: security
>
> These two libraries are deprecated and have been replaced by the 
> jackson-databind libraries, which are already included.  They are flagged by 
> our vulnerability scanners as having the following security vulnerabilities.  
> I've set the priority to Major due to the Critical nature of the findings, 
> and hopefully they can be addressed quickly.  Please note that I'm not a 
> developer but work in InfoSec, and this was flagged when we incorporated 
> Spark into our product.  If you feel the priority is not set correctly, 
> please change it accordingly.  I'll watch the issue and flag our dev team to 
> update once it is resolved.  
> jackson-mapper-asl-1.9.13
> CVE-2018-7489 (CVSS 3.0 Score 9.8 CRITICAL)
> [https://nvd.nist.gov/vuln/detail/CVE-2018-7489] 
>  
> CVE-2017-7525 (CVSS 3.0 Score 9.8 CRITICAL)
> [https://nvd.nist.gov/vuln/detail/CVE-2017-7525]
>  
> CVE-2017-17485 (CVSS 3.0 Score 9.8 CRITICAL)
> [https://nvd.nist.gov/vuln/detail/CVE-2017-17485]
>  
> CVE-2017-15095 (CVSS 3.0 Score 9.8 CRITICAL)
> [https://nvd.nist.gov/vuln/detail/CVE-2017-15095]
>  
> CVE-2018-5968 (CVSS 3.0 Score 8.1 High)
> [https://nvd.nist.gov/vuln/detail/CVE-2018-5968]
>  
> jackson-core-asl-1.9.13
> CVE-2016-7051 (CVSS 3.0 Score 8.6 High)
> https://nvd.nist.gov/vuln/detail/CVE-2016-7051



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, 

[jira] [Resolved] (SPARK-32074) Update AppVeyor R to 4.0.2

2020-06-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-32074.
--
Fix Version/s: 3.1.0
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/28909

> Update AppVeyor R to 4.0.2
> --
>
> Key: SPARK-32074
> URL: https://issues.apache.org/jira/browse/SPARK-32074
> Project: Spark
>  Issue Type: Improvement
>  Components: R
>Affects Versions: 3.0.1, 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
> Fix For: 3.1.0
>
>
> We should test R 4.0.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25244) [Python] Setting `spark.sql.session.timeZone` only partially respected

2020-06-24 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143566#comment-17143566
 ] 

Hyukjin Kwon commented on SPARK-25244:
--

This issue was closed because it marked the affected version as 2.3, which is 
EOL. Feel free to create a new JIRA with a reproducer and analysis if the issue 
persists.

> [Python] Setting `spark.sql.session.timeZone` only partially respected
> --
>
> Key: SPARK-25244
> URL: https://issues.apache.org/jira/browse/SPARK-25244
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.1
>Reporter: Anton Daitche
>Priority: Major
>  Labels: bulk-closed
>
> The setting `spark.sql.session.timeZone` is respected by PySpark when 
> converting from and to Pandas, as described 
> [here|http://spark.apache.org/docs/latest/sql-programming-guide.html#timestamp-with-time-zone-semantics].
>  However, when timestamps are converted directly to Python's `datetime` 
> objects, it is ignored and the system's timezone is used.
> This can be checked by the following code snippet
> {code:java}
> import pyspark.sql
> spark = (pyspark
>  .sql
>  .SparkSession
>  .builder
>  .master('local[1]')
>  .config("spark.sql.session.timeZone", "UTC")
>  .getOrCreate()
> )
> df = spark.createDataFrame([("2018-06-01 01:00:00",)], ["ts"])
> df = df.withColumn("ts", df["ts"].astype("timestamp"))
> print(df.toPandas().iloc[0,0])
> print(df.collect()[0][0])
> {code}
> Which for me prints (the exact result depends on the timezone of your system, 
> mine is Europe/Berlin)
> {code:java}
> 2018-06-01 01:00:00
> 2018-06-01 03:00:00
> {code}
> Hence, the method `toPandas` respected the timezone setting (UTC), but the 
> method `collect` ignored it and converted the timestamp to my system's 
> timezone.
> The cause of this behaviour is that the methods `toInternal` and 
> `fromInternal` of PySpark's `TimestampType` class don't take the setting 
> `spark.sql.session.timeZone` into account and use the system timezone instead.
> If the maintainers agree that this should be fixed, I would try to come up 
> with a patch. 
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31887) Date casting to string is giving wrong value

2020-06-24 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143564#comment-17143564
 ] 

Hyukjin Kwon commented on SPARK-31887:
--

The change that fixed this issue is likely the calendar switch in SPARK-26651, 
which is a very big and invasive change. It is unlikely to be backported.
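
For anyone stuck on 2.4.x until then, a commonly suggested mitigation is to 
keep the driver, the executors, and the SQL session on the same timezone, 
assuming the UTC-vs-America/New_York mismatch described in the Environment 
field is what shifts the date. A hedged Scala sketch:

{code:scala}
import java.util.TimeZone
import org.apache.spark.sql.SparkSession

// Hedged sketch: align the driver JVM, the executor JVMs, and the SQL session
// timezone so dates are not re-interpreted between components.
TimeZone.setDefault(TimeZone.getTimeZone("UTC"))

val spark = SparkSession.builder()
  .config("spark.sql.session.timeZone", "UTC")
  .config("spark.executor.extraJavaOptions", "-Duser.timezone=UTC")
  .getOrCreate()
{code}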

> Date casting to string is giving wrong value
> 
>
> Key: SPARK-31887
> URL: https://issues.apache.org/jira/browse/SPARK-31887
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.5
> Environment: The spark is running on cluster mode with Mesos.
>  
> Mesos agents are dockerised running on Ubuntu 18.
>  
> Timezone setting of docker instance: UTC
> Timezone of server hosting docker: America/New_York
> Timezone of driver machine: America/New_York
>Reporter: Amit Gupta
>Priority: Major
>
> The code casts the strings to date and timestamp and then writes the result to CSV.
> {code:java}
> val x = Seq(("2020-02-19", "2020-02-19 05:11:00")).toDF("a", 
> "b").select('a.cast("date"), 'b.cast("timestamp"))
> x.show()
> +----------+-------------------+
> |         a|                  b|
> +----------+-------------------+
> |2020-02-19|2020-02-19 05:11:00|
> +----------+-------------------+
> x.write.mode("overwrite").option("header", true).csv("/tmp/test1.csv")
> {code}
>  
> The date written in the CSV file is different:
> {code:java}
> > snakebite cat "/tmp/test1.csv/*.csv"
> a,b
> 2020-02-18,2020-02-19T05:11:00.000Z{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27281) Wrong latest offsets returned by DirectKafkaInputDStream#latestOffsets

2020-06-24 Thread Yuanyuan Xia (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143562#comment-17143562
 ] 

Yuanyuan Xia commented on SPARK-27281:
--

In our environment, we encounter the same issue, and the cause also seems 
related to [KAFKA-7703|https://issues.apache.org/jira/browse/KAFKA-7703].

> Wrong latest offsets returned by DirectKafkaInputDStream#latestOffsets
> --
>
> Key: SPARK-27281
> URL: https://issues.apache.org/jira/browse/SPARK-27281
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.4.0
>Reporter: Viacheslav Krot
>Priority: Major
>
> I have a very strange and hard-to-reproduce issue when using Kafka direct 
> streaming, version 2.4.0.
>  From time to time, maybe once a day to once a week, I get the following 
> error:
> {noformat}
> java.lang.IllegalArgumentException: requirement failed: numRecords must not 
> be negative
> at scala.Predef$.require(Predef.scala:224)
> at 
> org.apache.spark.streaming.scheduler.StreamInputInfo.<init>(InputInfoTracker.scala:38)
> at 
> org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:250)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:342)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:342)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:341)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:341)
> at 
> org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:336)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:334)
> at scala.Option.orElse(Option.scala:289)
> at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:331)
> at 
> org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:48)
> at 
> org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:122)
> at 
> org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:121)
> at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
> at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
> at 
> org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:121)
> at 
> org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:249)
> at 
> org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:247)
> at scala.util.Try$.apply(Try.scala:192)
> at 
> org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:247)
> at 
> org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:183)
> at 
> org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:89)
> at 
> org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:88)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
> 19/01/29 13:10:00 ERROR apps.BusinessRuleEngine: Job failed. Stopping JVM
> java.lang.IllegalArgumentException: requirement failed: numRecords must not 
> be negative
> at scala.Predef$.require(Predef.scala:224)
> at 
> org.apache.spark.streaming.scheduler.StreamInputInfo.<init>(InputInfoTracker.scala:38)
> at 
> org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:250)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:342)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:342)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:341)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:341)
> at 
> org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
> at 
> 

[jira] [Issue Comment Deleted] (SPARK-27281) Wrong latest offsets returned by DirectKafkaInputDStream#latestOffsets

2020-06-24 Thread Yuanyuan Xia (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanyuan Xia updated SPARK-27281:
-
Comment: was deleted

(was: In our environment, we encounter the same issue and the cause seems also 
related to [KAFKA-7703|https://issues.apache.org/jira/browse/KAFKA-7703])

> Wrong latest offsets returned by DirectKafkaInputDStream#latestOffsets
> --
>
> Key: SPARK-27281
> URL: https://issues.apache.org/jira/browse/SPARK-27281
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.4.0
>Reporter: Viacheslav Krot
>Priority: Major
>
> I have a very strange and hard-to-reproduce issue when using Kafka direct 
> streaming, version 2.4.0.
>  From time to time, maybe once a day to once a week, I get the following 
> error:
> {noformat}
> java.lang.IllegalArgumentException: requirement failed: numRecords must not 
> be negative
> at scala.Predef$.require(Predef.scala:224)
> at 
> org.apache.spark.streaming.scheduler.StreamInputInfo.<init>(InputInfoTracker.scala:38)
> at 
> org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:250)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:342)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:342)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:341)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:341)
> at 
> org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:336)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:334)
> at scala.Option.orElse(Option.scala:289)
> at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:331)
> at 
> org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:48)
> at 
> org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:122)
> at 
> org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:121)
> at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
> at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
> at 
> org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:121)
> at 
> org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:249)
> at 
> org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:247)
> at scala.util.Try$.apply(Try.scala:192)
> at 
> org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:247)
> at 
> org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:183)
> at 
> org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:89)
> at 
> org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:88)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
> 19/01/29 13:10:00 ERROR apps.BusinessRuleEngine: Job failed. Stopping JVM
> java.lang.IllegalArgumentException: requirement failed: numRecords must not 
> be negative
> at scala.Predef$.require(Predef.scala:224)
> at 
> org.apache.spark.streaming.scheduler.StreamInputInfo.<init>(InputInfoTracker.scala:38)
> at 
> org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:250)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:342)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:342)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:341)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:341)
> at 
> org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
> at 
> 

[jira] [Commented] (SPARK-27281) Wrong latest offsets returned by DirectKafkaInputDStream#latestOffsets

2020-06-24 Thread Yuanyuan Xia (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143561#comment-17143561
 ] 

Yuanyuan Xia commented on SPARK-27281:
--

In our environment, we encounter the same issue, and the cause also seems 
related to [KAFKA-7703|https://issues.apache.org/jira/browse/KAFKA-7703].

> Wrong latest offsets returned by DirectKafkaInputDStream#latestOffsets
> --
>
> Key: SPARK-27281
> URL: https://issues.apache.org/jira/browse/SPARK-27281
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.4.0
>Reporter: Viacheslav Krot
>Priority: Major
>
> I have a very strange and hard-to-reproduce issue when using Kafka direct 
> streaming, version 2.4.0.
>  From time to time, maybe once a day to once a week, I get the following 
> error:
> {noformat}
> java.lang.IllegalArgumentException: requirement failed: numRecords must not 
> be negative
> at scala.Predef$.require(Predef.scala:224)
> at 
> org.apache.spark.streaming.scheduler.StreamInputInfo.<init>(InputInfoTracker.scala:38)
> at 
> org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:250)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:342)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:342)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:341)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:341)
> at 
> org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:336)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:334)
> at scala.Option.orElse(Option.scala:289)
> at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:331)
> at 
> org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:48)
> at 
> org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:122)
> at 
> org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:121)
> at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
> at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
> at 
> org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:121)
> at 
> org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:249)
> at 
> org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$3.apply(JobGenerator.scala:247)
> at scala.util.Try$.apply(Try.scala:192)
> at 
> org.apache.spark.streaming.scheduler.JobGenerator.generateJobs(JobGenerator.scala:247)
> at 
> org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:183)
> at 
> org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:89)
> at 
> org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:88)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
> 19/01/29 13:10:00 ERROR apps.BusinessRuleEngine: Job failed. Stopping JVM
> java.lang.IllegalArgumentException: requirement failed: numRecords must not 
> be negative
> at scala.Predef$.require(Predef.scala:224)
> at 
> org.apache.spark.streaming.scheduler.StreamInputInfo.<init>(InputInfoTracker.scala:38)
> at 
> org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:250)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:342)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:342)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:341)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:341)
> at 
> org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
> at 
> 

[jira] [Resolved] (SPARK-32050) GBTClassifier not working with OnevsRest

2020-06-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-32050.
--
Resolution: Duplicate

> GBTClassifier not working with OnevsRest
> 
>
> Key: SPARK-32050
> URL: https://issues.apache.org/jira/browse/SPARK-32050
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.4.0
> Environment: spark 2.4.0
>Reporter: Raghuvarran V H
>Priority: Minor
>
> I am trying to use the GBT classifier for multi-class classification using 
> OneVsRest.
>  
> {code:java}
> from pyspark.ml.classification import MultilayerPerceptronClassifier, OneVsRest, GBTClassifier
> from pyspark.ml import Pipeline, PipelineModel
> lr = GBTClassifier(featuresCol='features', labelCol='label', predictionCol='prediction',
>                    maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0,
>                    maxMemoryInMB=256, cacheNodeIds=False, checkpointInterval=10,
>                    lossType='logistic', maxIter=20, stepSize=0.1, seed=None,
>                    subsamplingRate=1.0, featureSubsetStrategy='auto')
> classifier = OneVsRest(featuresCol='features', labelCol='label', predictionCol='prediction',
>                        classifier=lr, weightCol=None, parallelism=1)
> pipeline = Pipeline(stages=[str_indxr, ohe, vecAssembler, normalizer, classifier])
> model = pipeline.fit(train_data)
> {code}
>  
>  
> When I try this I get this error:
> /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/spark/python/pyspark/ml/classification.py
>  in _fit(self, dataset)
>  1800 classifier = self.getClassifier()
>  1801 assert isinstance(classifier, HasRawPredictionCol),\
>  -> 1802 "Classifier %s doesn't extend from HasRawPredictionCol." % 
> type(classifier)
>  1803 
>  1804 numClasses = int(dataset.agg(\{labelCol: 
> "max"}).head()["max("+labelCol+")"]) + 1
> AssertionError: Classifier  
> doesn't extend from HasRawPredictionCol.
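
For context, the assertion above enforces that OneVsRest's base classifier 
exposes a raw prediction column. A minimal Scala sketch with LogisticRegression, 
which does satisfy that requirement (the column names and the training 
DataFrame are hypothetical):

{code:scala}
import org.apache.spark.ml.classification.{LogisticRegression, OneVsRest}

// OneVsRest needs a base classifier that produces a rawPrediction column;
// LogisticRegression does, so this wiring passes the check that fails above.
val base = new LogisticRegression()
  .setFeaturesCol("features")
  .setLabelCol("label")

val ovr = new OneVsRest().setClassifier(base)
// val model = ovr.fit(trainData)  // trainData: a hypothetical training DataFrame
{code}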



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32051) Dataset.foreachPartition returns object

2020-06-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-32051:
-
Priority: Major  (was: Critical)

> Dataset.foreachPartition returns object
> ---
>
> Key: SPARK-32051
> URL: https://issues.apache.org/jira/browse/SPARK-32051
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Frank Oosterhuis
>Priority: Major
>
> I'm trying to map values from a Dataset[Row], but since 3.0.0 this fails.
> In 3.0.0 I get the error: "Error:(28, 38) value map is not a member of 
> Object"
>  
> This is the simplest code that works in 2.4.x, but fails in 3.0.0:
> {code:scala}
> spark.range(100)
>   .repartition(10)
>   .foreachPartition(part => println(part.toList))
> {code}
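
A hedged note on what usually makes this compile again on 3.0.0: annotating the 
lambda parameter so the Scala Iterator overload of foreachPartition is 
selected. A sketch under that assumption, reusing the example above:

{code:scala}
// Hedged sketch: with an untyped lambda the parameter can be inferred as Object;
// spelling out the Iterator type keeps the Scala overload of foreachPartition.
spark.range(100)
  .repartition(10)
  .foreachPartition((part: Iterator[java.lang.Long]) => println(part.toList))
{code}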



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32053) pyspark save of serialized model is failing for windows.

2020-06-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-32053.
--
Resolution: Incomplete

> pyspark save of serialized model is failing for windows.
> 
>
> Key: SPARK-32053
> URL: https://issues.apache.org/jira/browse/SPARK-32053
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.0
>Reporter: Kayal
>Priority: Major
> Attachments: image-2020-06-22-18-19-32-236.png
>
>
> {color:#172b4d}Hi, {color}
> {color:#172b4d}We are using Spark functionality to save the serialized model 
> to disk. On the Windows platform, we are seeing that the save of the 
> serialized model fails with the error: o288.save() failed. {color}
>  
>  
>  
> !image-2020-06-22-18-19-32-236.png!
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32053) pyspark save of serialized model is failing for windows.

2020-06-24 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143557#comment-17143557
 ] 

Hyukjin Kwon commented on SPARK-32053:
--

[~kaganesa] Spark 2.3.0 is EOL, so we won't be able to land any fix. Can you 
check whether this issue still persists in higher versions?
Also, it would be great if you could share the full reproducer and the full 
error messages.

> pyspark save of serialized model is failing for windows.
> 
>
> Key: SPARK-32053
> URL: https://issues.apache.org/jira/browse/SPARK-32053
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.0
>Reporter: Kayal
>Priority: Major
> Attachments: image-2020-06-22-18-19-32-236.png
>
>
> {color:#172b4d}Hi, {color}
> {color:#172b4d}We are using Spark functionality to save the serialized model 
> to disk. On the Windows platform, we are seeing that the save of the 
> serialized model fails with the error: o288.save() failed. {color}
>  
>  
>  
> !image-2020-06-22-18-19-32-236.png!
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32068) Spark 3 UI task launch time show in error time zone

2020-06-24 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143556#comment-17143556
 ] 

Hyukjin Kwon commented on SPARK-32068:
--

[~d87904488] can you attach the snapshots?

> Spark 3 UI task launch time show in error time zone
> ---
>
> Key: SPARK-32068
> URL: https://issues.apache.org/jira/browse/SPARK-32068
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: Smith Cruise
>Priority: Major
>  Labels: easyfix
>
> For example,
> In this link: history/app-20200623133209-0015/stages/ , stage submit time is 
> correct (UTS)
>  
> But in this link: 
> history/app-20200623133209-0015/stages/stage/?id=0=0 , task launch 
> time is incorrect (UTC)
>  
> The same problem exists in port 4040 Web UI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-32068) Spark 3 UI task launch time show in error time zone

2020-06-24 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143556#comment-17143556
 ] 

Hyukjin Kwon edited comment on SPARK-32068 at 6/24/20, 6:24 AM:


[~d87904488] can you attach the screenshots?


was (Author: hyukjin.kwon):
[~d87904488] can you attach the snapshots?

> Spark 3 UI task launch time show in error time zone
> ---
>
> Key: SPARK-32068
> URL: https://issues.apache.org/jira/browse/SPARK-32068
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: Smith Cruise
>Priority: Major
>  Labels: easyfix
>
> For example,
> In this link: history/app-20200623133209-0015/stages/ , stage submit time is 
> correct (UTS)
>  
> But in this link: 
> history/app-20200623133209-0015/stages/stage/?id=0=0 , task launch 
> time is incorrect (UTC)
>  
> The same problem exists in port 4040 Web UI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32081) facing Invalid UTF-32 character v2.4.5 running pyspark

2020-06-24 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143554#comment-17143554
 ] 

Hyukjin Kwon commented on SPARK-32081:
--

Please don't just copy and paste the errors. The error message says the 
encoding of your file is wrong:

{code}
java.io.CharConversionException: Invalid UTF-32 character 0x100(above 
10) at char #206, byte
{code}
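
If the files are really in a known encoding, one hedged workaround is to 
declare it explicitly when reading, so the JSON reader does not try to parse 
them as UTF-32 (the path and the UTF-8 charset below are assumptions):

{code:scala}
// Hedged sketch: tell the JSON reader the file's actual charset instead of
// letting it auto-detect; the path and charset here are only placeholders.
val df = spark.read
  .option("encoding", "UTF-8")
  .json("s3a://my-bucket/events/*.json")
{code}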

> facing Invalid UTF-32 character v2.4.5 running pyspark
> --
>
> Key: SPARK-32081
> URL: https://issues.apache.org/jira/browse/SPARK-32081
> Project: Spark
>  Issue Type: Bug
>  Components: EC2
>Affects Versions: 2.4.5
>Reporter: Yaniv Kempler
>Priority: Major
>
> facing Invalid UTF-32 character while reading json files
>  
> Py4JJavaError Traceback (most recent call last)  in  
> ~/.local/lib/python3.6/site-packages/pyspark/sql/readwriter.py in json(self, 
> path, schema, primitivesAsString, prefersDecimal, allowComments, 
> allowUnquotedFieldNames, allowSingleQuotes, allowNumericLeadingZero, 
> allowBackslashEscapingAnyCharacter, mode, columnNameOfCorruptRecord, 
> dateFormat, timestampFormat, multiLine, allowUnquotedControlChars, lineSep, 
> samplingRatio, dropFieldIfAllNull, encoding)  284 keyed._bypass_serializer = 
> True  285 jrdd = keyed._jrdd.map(self._spark._jvm.BytesToString()) --> 286 
> return self._df(self._jreader.json(jrdd))  287 else:  288 raise 
> TypeError("path can be only string, list or RDD") 
> ~/.local/lib/python3.6/site-packages/py4j/java_gateway.py in __call__(self, 
> *args)  1255 answer = self.gateway_client.send_command(command)  1256 
> return_value = get_return_value( -> 1257 answer, self.gateway_client, 
> self.target_id, self.name)  1258  1259 for temp_arg in temp_args: 
> ~/.local/lib/python3.6/site-packages/pyspark/sql/utils.py in deco(*a, **kw)  
> 61 def deco(*a, **kw):  62 try: ---> 63 return f(*a, **kw)  64 except 
> py4j.protocol.Py4JJavaError as e:  65 s = e.java_exception.toString() 
> ~/.local/lib/python3.6/site-packages/py4j/protocol.py in 
> get_return_value(answer, gateway_client, target_id, name)  326 raise 
> Py4JJavaError(  327 "An error occurred while calling \{0}{1}\{2}.\n". --> 328 
> format(target_id, ".", name), value)  329 else:  330 raise Py4JError( 
> Py4JJavaError: An error occurred while calling o67.json. : 
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 546 
> in stage 0.0 failed 4 times, most recent failure: Lost task 546.3 in stage 
> 0.0 (TID 642, 172.31.30.196, executor 1): java.io.CharConversionException: 
> Invalid UTF-32 character 0x100(above 10) at char #206, byte #827) at 
> com.fasterxml.jackson.core.io.UTF32Reader.reportInvalid(UTF32Reader.java:189) 
> at com.fasterxml.jackson.core.io.UTF32Reader.read(UTF32Reader.java:150) at 
> com.fasterxml.jackson.core.json.ReaderBasedJsonParser.loadMore(ReaderBasedJsonParser.java:153)
>  at 
> com.fasterxml.jackson.core.json.ReaderBasedJsonParser._skipWSOrEnd(ReaderBasedJsonParser.java:2017)
>  at 
> com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:577)
>  at 
> org.apache.spark.sql.catalyst.json.JsonInferSchema$$anonfun$1$$anonfun$apply$1$$anonfun$apply$3.apply(JsonInferSchema.scala:56)
>  at 
> org.apache.spark.sql.catalyst.json.JsonInferSchema$$anonfun$1$$anonfun$apply$1$$anonfun$apply$3.apply(JsonInferSchema.scala:55)
>  at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) at 
> org.apache.spark.sql.catalyst.json.JsonInferSchema$$anonfun$1$$anonfun$apply$1.apply(JsonInferSchema.scala:55)
>  at 
> org.apache.spark.sql.catalyst.json.JsonInferSchema$$anonfun$1$$anonfun$apply$1.apply(JsonInferSchema.scala:53)
>  at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435) at 
> scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441) at 
> scala.collection.Iterator$class.foreach(Iterator.scala:891) at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at 
> scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:185) 
> at scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1334) at 
> scala.collection.TraversableOnce$class.reduceLeftOption(TraversableOnce.scala:203)
>  at scala.collection.AbstractIterator.reduceLeftOption(Iterator.scala:1334) 
> at 
> scala.collection.TraversableOnce$class.reduceOption(TraversableOnce.scala:210)
>  at scala.collection.AbstractIterator.reduceOption(Iterator.scala:1334) at 
> org.apache.spark.sql.catalyst.json.JsonInferSchema$$anonfun$1.apply(JsonInferSchema.scala:70)
>  at 
> org.apache.spark.sql.catalyst.json.JsonInferSchema$$anonfun$1.apply(JsonInferSchema.scala:50)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:823)
>  at 
> 

[jira] [Resolved] (SPARK-32081) facing Invalid UTF-32 character v2.4.5 running pyspark

2020-06-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-32081.
--
Resolution: Cannot Reproduce

> facing Invalid UTF-32 character v2.4.5 running pyspark
> --
>
> Key: SPARK-32081
> URL: https://issues.apache.org/jira/browse/SPARK-32081
> Project: Spark
>  Issue Type: Bug
>  Components: EC2
>Affects Versions: 2.4.5
>Reporter: Yaniv Kempler
>Priority: Major
>
> facing Invalid UTF-32 character while reading json files
>  
> Py4JJavaError Traceback (most recent call last)  in  
> ~/.local/lib/python3.6/site-packages/pyspark/sql/readwriter.py in json(self, 
> path, schema, primitivesAsString, prefersDecimal, allowComments, 
> allowUnquotedFieldNames, allowSingleQuotes, allowNumericLeadingZero, 
> allowBackslashEscapingAnyCharacter, mode, columnNameOfCorruptRecord, 
> dateFormat, timestampFormat, multiLine, allowUnquotedControlChars, lineSep, 
> samplingRatio, dropFieldIfAllNull, encoding)  284 keyed._bypass_serializer = 
> True  285 jrdd = keyed._jrdd.map(self._spark._jvm.BytesToString()) --> 286 
> return self._df(self._jreader.json(jrdd))  287 else:  288 raise 
> TypeError("path can be only string, list or RDD") 
> ~/.local/lib/python3.6/site-packages/py4j/java_gateway.py in __call__(self, 
> *args)  1255 answer = self.gateway_client.send_command(command)  1256 
> return_value = get_return_value( -> 1257 answer, self.gateway_client, 
> self.target_id, self.name)  1258  1259 for temp_arg in temp_args: 
> ~/.local/lib/python3.6/site-packages/pyspark/sql/utils.py in deco(*a, **kw)  
> 61 def deco(*a, **kw):  62 try: ---> 63 return f(*a, **kw)  64 except 
> py4j.protocol.Py4JJavaError as e:  65 s = e.java_exception.toString() 
> ~/.local/lib/python3.6/site-packages/py4j/protocol.py in 
> get_return_value(answer, gateway_client, target_id, name)  326 raise 
> Py4JJavaError(  327 "An error occurred while calling \{0}{1}\{2}.\n". --> 328 
> format(target_id, ".", name), value)  329 else:  330 raise Py4JError( 
> Py4JJavaError: An error occurred while calling o67.json. : 
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 546 
> in stage 0.0 failed 4 times, most recent failure: Lost task 546.3 in stage 
> 0.0 (TID 642, 172.31.30.196, executor 1): java.io.CharConversionException: 
> Invalid UTF-32 character 0x100(above 10) at char #206, byte #827) at 
> com.fasterxml.jackson.core.io.UTF32Reader.reportInvalid(UTF32Reader.java:189) 
> at com.fasterxml.jackson.core.io.UTF32Reader.read(UTF32Reader.java:150) at 
> com.fasterxml.jackson.core.json.ReaderBasedJsonParser.loadMore(ReaderBasedJsonParser.java:153)
>  at 
> com.fasterxml.jackson.core.json.ReaderBasedJsonParser._skipWSOrEnd(ReaderBasedJsonParser.java:2017)
>  at 
> com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:577)
>  at 
> org.apache.spark.sql.catalyst.json.JsonInferSchema$$anonfun$1$$anonfun$apply$1$$anonfun$apply$3.apply(JsonInferSchema.scala:56)
>  at 
> org.apache.spark.sql.catalyst.json.JsonInferSchema$$anonfun$1$$anonfun$apply$1$$anonfun$apply$3.apply(JsonInferSchema.scala:55)
>  at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) at 
> org.apache.spark.sql.catalyst.json.JsonInferSchema$$anonfun$1$$anonfun$apply$1.apply(JsonInferSchema.scala:55)
>  at 
> org.apache.spark.sql.catalyst.json.JsonInferSchema$$anonfun$1$$anonfun$apply$1.apply(JsonInferSchema.scala:53)
>  at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435) at 
> scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441) at 
> scala.collection.Iterator$class.foreach(Iterator.scala:891) at 
> scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at 
> scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:185) 
> at scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1334) at 
> scala.collection.TraversableOnce$class.reduceLeftOption(TraversableOnce.scala:203)
>  at scala.collection.AbstractIterator.reduceLeftOption(Iterator.scala:1334) 
> at 
> scala.collection.TraversableOnce$class.reduceOption(TraversableOnce.scala:210)
>  at scala.collection.AbstractIterator.reduceOption(Iterator.scala:1334) at 
> org.apache.spark.sql.catalyst.json.JsonInferSchema$$anonfun$1.apply(JsonInferSchema.scala:70)
>  at 
> org.apache.spark.sql.catalyst.json.JsonInferSchema$$anonfun$1.apply(JsonInferSchema.scala:50)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:823)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:823)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) 
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346) at 
> org.apache.spark.rdd.RDD.iterator(RDD.scala:310) 

[jira] [Updated] (SPARK-32081) facing Invalid UTF-32 character v2.4.5 running pyspark

2020-06-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-32081:
-
Priority: Major  (was: Blocker)

> facing Invalid UTF-32 character v2.4.5 running pyspark
> --
>
> Key: SPARK-32081
> URL: https://issues.apache.org/jira/browse/SPARK-32081
> Project: Spark
>  Issue Type: Bug
>  Components: EC2
>Affects Versions: 2.4.5
>Reporter: Yaniv Kempler
>Priority: Major
>
> Facing an Invalid UTF-32 character exception while reading JSON files.
>  
> Py4JJavaError Traceback (most recent call last)
> in
>
> ~/.local/lib/python3.6/site-packages/pyspark/sql/readwriter.py in json(self, path, schema, primitivesAsString, prefersDecimal, allowComments, allowUnquotedFieldNames, allowSingleQuotes, allowNumericLeadingZero, allowBackslashEscapingAnyCharacter, mode, columnNameOfCorruptRecord, dateFormat, timestampFormat, multiLine, allowUnquotedControlChars, lineSep, samplingRatio, dropFieldIfAllNull, encoding)
>     284 keyed._bypass_serializer = True
>     285 jrdd = keyed._jrdd.map(self._spark._jvm.BytesToString())
> --> 286 return self._df(self._jreader.json(jrdd))
>     287 else:
>     288 raise TypeError("path can be only string, list or RDD")
>
> ~/.local/lib/python3.6/site-packages/py4j/java_gateway.py in __call__(self, *args)
>    1255 answer = self.gateway_client.send_command(command)
>    1256 return_value = get_return_value(
> -> 1257 answer, self.gateway_client, self.target_id, self.name)
>    1258
>    1259 for temp_arg in temp_args:
>
> ~/.local/lib/python3.6/site-packages/pyspark/sql/utils.py in deco(*a, **kw)
>      61 def deco(*a, **kw):
>      62 try:
> ---> 63 return f(*a, **kw)
>      64 except py4j.protocol.Py4JJavaError as e:
>      65 s = e.java_exception.toString()
>
> ~/.local/lib/python3.6/site-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
>     326 raise Py4JJavaError(
>     327 "An error occurred while calling \{0}{1}\{2}.\n".
> --> 328 format(target_id, ".", name), value)
>     329 else:
>     330 raise Py4JError(
>
> Py4JJavaError: An error occurred while calling o67.json.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 546 in stage 0.0 failed 4 times, most recent failure: Lost task 546.3 in stage 0.0 (TID 642, 172.31.30.196, executor 1): java.io.CharConversionException: Invalid UTF-32 character 0x100(above 10) at char #206, byte #827)
>   at com.fasterxml.jackson.core.io.UTF32Reader.reportInvalid(UTF32Reader.java:189)
>   at com.fasterxml.jackson.core.io.UTF32Reader.read(UTF32Reader.java:150)
>   at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.loadMore(ReaderBasedJsonParser.java:153)
>   at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._skipWSOrEnd(ReaderBasedJsonParser.java:2017)
>   at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:577)
>   at org.apache.spark.sql.catalyst.json.JsonInferSchema$$anonfun$1$$anonfun$apply$1$$anonfun$apply$3.apply(JsonInferSchema.scala:56)
>   at org.apache.spark.sql.catalyst.json.JsonInferSchema$$anonfun$1$$anonfun$apply$1$$anonfun$apply$3.apply(JsonInferSchema.scala:55)
>   at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543)
>   at org.apache.spark.sql.catalyst.json.JsonInferSchema$$anonfun$1$$anonfun$apply$1.apply(JsonInferSchema.scala:55)
>   at org.apache.spark.sql.catalyst.json.JsonInferSchema$$anonfun$1$$anonfun$apply$1.apply(JsonInferSchema.scala:53)
>   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:185)
>   at scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1334)
>   at scala.collection.TraversableOnce$class.reduceLeftOption(TraversableOnce.scala:203)
>   at scala.collection.AbstractIterator.reduceLeftOption(Iterator.scala:1334)
>   at scala.collection.TraversableOnce$class.reduceOption(TraversableOnce.scala:210)
>   at scala.collection.AbstractIterator.reduceOption(Iterator.scala:1334)
>   at org.apache.spark.sql.catalyst.json.JsonInferSchema$$anonfun$1.apply(JsonInferSchema.scala:70)
>   at org.apache.spark.sql.catalyst.json.JsonInferSchema$$anonfun$1.apply(JsonInferSchema.scala:50)
>   at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:823)
>   at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:823)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346) at
> 
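
The CharConversionException above is raised by Jackson, which auto-detects the character
encoding of each JSON input and here mis-identifies it as UTF-32. When the files are known
to be UTF-8, a commonly suggested workaround is to bypass that auto-detection by giving the
JSON reader an explicit encoding. A minimal sketch of that idea (spark-shell style Scala;
the session name and input path are placeholders, and the fix assumes the data really is
UTF-8):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("json-utf8-read").getOrCreate()

    // Tell the JSON reader to decode the input as UTF-8 instead of letting
    // Jackson auto-detect the charset (that auto-detection is what trips the
    // UTF32Reader seen in the stack trace).
    val df = spark.read
      .option("encoding", "UTF-8")            // assumption: the files really are UTF-8
      .json("s3a://my-bucket/events/*.json")  // hypothetical input path

The same option name is accepted from PySpark, e.g.
spark.read.option("encoding", "UTF-8").json(path), so the workaround does not depend on
which API the job uses.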

[jira] [Commented] (SPARK-31998) Change package references for ArrowBuf

2020-06-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143553#comment-17143553
 ] 

Apache Spark commented on SPARK-31998:
--

User 'BryanCutler' has created a pull request for this issue:
https://github.com/apache/spark/pull/28915

> Change package references for ArrowBuf
> --
>
> Key: SPARK-31998
> URL: https://issues.apache.org/jira/browse/SPARK-31998
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Liya Fan
>Priority: Major
>
> Recently, we have moved the class ArrowBuf from the package io.netty.buffer 
> to org.apache.arrow.memory. So after upgrading the Arrow library, we need to 
> update the references to ArrowBuf with the correct package name.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
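
For downstream code that references ArrowBuf directly, the change described in this ticket
boils down to swapping the import once the Arrow dependency is upgraded. A minimal sketch
(Scala; the enclosing object and helper are illustrative, not taken from the Spark code base):

    // Before the Arrow upgrade ArrowBuf lived in Netty's namespace:
    //   import io.netty.buffer.ArrowBuf
    // After the upgrade it comes from Arrow's own memory package:
    import org.apache.arrow.memory.ArrowBuf

    object ArrowBufUsage {
      // Code that only passes buffers around keeps compiling once the import
      // points at the new package, e.g. picking the first buffer of a batch.
      def firstBuffer(buffers: Seq[ArrowBuf]): Option[ArrowBuf] = buffers.headOption
    }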



[jira] [Assigned] (SPARK-31998) Change package references for ArrowBuf

2020-06-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31998:


Assignee: Apache Spark

> Change package references for ArrowBuf
> --
>
> Key: SPARK-31998
> URL: https://issues.apache.org/jira/browse/SPARK-31998
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Liya Fan
>Assignee: Apache Spark
>Priority: Major
>
> Recently, we have moved the class ArrowBuf from the package io.netty.buffer 
> to org.apache.arrow.memory. So after upgrading the Arrow library, we need to 
> update the references to ArrowBuf with the correct package name.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31998) Change package references for ArrowBuf

2020-06-24 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31998:


Assignee: (was: Apache Spark)

> Change package references for ArrowBuf
> --
>
> Key: SPARK-31998
> URL: https://issues.apache.org/jira/browse/SPARK-31998
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Liya Fan
>Priority: Major
>
> Recently, we have moved the class ArrowBuf from the package io.netty.buffer 
> to org.apache.arrow.memory. So after upgrading the Arrow library, we need to 
> update the references to ArrowBuf with the correct package name.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31998) Change package references for ArrowBuf

2020-06-24 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17143551#comment-17143551
 ] 

Apache Spark commented on SPARK-31998:
--

User 'BryanCutler' has created a pull request for this issue:
https://github.com/apache/spark/pull/28915

> Change package references for ArrowBuf
> --
>
> Key: SPARK-31998
> URL: https://issues.apache.org/jira/browse/SPARK-31998
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Liya Fan
>Priority: Major
>
> Recently, we have moved the class ArrowBuf from the package io.netty.buffer 
> to org.apache.arrow.memory. So after upgrading the Arrow library, we need to 
> update the references to ArrowBuf with the correct package name.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


