[jira] [Assigned] (SPARK-28020) Add date.sql
[ https://issues.apache.org/jira/browse/SPARK-28020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-28020: - Assignee: Yuming Wang > Add date.sql > > > Key: SPARK-28020 > URL: https://issues.apache.org/jira/browse/SPARK-28020 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.0.0 > > > In this ticket, we plan to add the regression test cases of > https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/date.sql. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28020) Add date.sql
[ https://issues.apache.org/jira/browse/SPARK-28020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28020. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24850 [https://github.com/apache/spark/pull/24850] > Add date.sql > > > Key: SPARK-28020 > URL: https://issues.apache.org/jira/browse/SPARK-28020 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > Fix For: 3.0.0 > > > In this ticket, we plan to add the regression test cases of > https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/date.sql. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27296) User Defined Aggregating Functions (UDAFs) have a major efficiency problem
[ https://issues.apache.org/jira/browse/SPARK-27296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Erlandson updated SPARK-27296: --- Target Version/s: 3.0.0 > User Defined Aggregating Functions (UDAFs) have a major efficiency problem > -- > > Key: SPARK-27296 > URL: https://issues.apache.org/jira/browse/SPARK-27296 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL, Structured Streaming >Affects Versions: 2.3.3, 2.4.0, 3.0.0 >Reporter: Erik Erlandson >Assignee: Erik Erlandson >Priority: Major > Labels: performance, usability > > Spark's UDAFs appear to be serializing and de-serializing to/from the > MutableAggregationBuffer for each row. This gist shows a small reproducing > UDAF and a spark shell session: > [https://gist.github.com/erikerlandson/3c4d8c6345d1521d89e0d894a423046f] > The UDAF and its companion UDT are designed to count the number of times > that ser/de is invoked for the aggregator. The spark shell session > demonstrates that it is executing ser/de on every row of the data frame. > Note, Spark's pre-defined aggregators do not have this problem, as they are > based on an internal aggregating trait that does the correct thing and only > calls ser/de at points such as partition boundaries, presenting final > results, etc. > This is a major problem for UDAFs, as it means that every UDAF is doing a > massive amount of unnecessary work per row, including but not limited to Row > object allocations. For a more realistic UDAF having its own non-trivial > internal structure it is obviously that much worse. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
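The per-row ser/de overhead described above can be simulated outside Spark. Below is a minimal plain-Python sketch (not Spark code; all names are illustrative): it counts codec invocations for an aggregator that round-trips its buffer on every row, versus one that only serializes at partition boundaries, which is the behavior the report attributes to Spark's built-in aggregators.

```python
class CountingCodec:
    """Tracks how many times an aggregation buffer is (de)serialized."""
    def __init__(self):
        self.serde_calls = 0

    def encode(self, buf):
        self.serde_calls += 1
        return tuple(buf)

    def decode(self, blob):
        self.serde_calls += 1
        return list(blob)

def aggregate_per_row_serde(rows, codec):
    # Mimics a UDAF that round-trips the buffer through ser/de on every row.
    blob = codec.encode([0])
    for r in rows:
        buf = codec.decode(blob)
        buf[0] += r
        blob = codec.encode(buf)
    return codec.decode(blob)[0]

def aggregate_boundary_serde(partitions, codec):
    # Mimics an aggregator that serializes only at partition boundaries.
    partials = []
    for part in partitions:
        buf = [0]
        for r in part:
            buf[0] += r
        partials.append(codec.encode(buf))   # one serialize per partition
    total = 0
    for blob in partials:
        total += codec.decode(blob)[0]       # one deserialize per partition
    return total
```

For 100 rows split into two partitions, the per-row variant performs 202 codec calls while the boundary variant performs 4 — the shape of the inefficiency the gist demonstrates.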
[jira] [Created] (SPARK-28266) data correctness issue: data duplication when `path` serde property is present
Ruslan Dautkhanov created SPARK-28266: - Summary: data correctness issue: data duplication when `path` serde property is present Key: SPARK-28266 URL: https://issues.apache.org/jira/browse/SPARK-28266 Project: Spark Issue Type: Bug Components: Optimizer, Spark Core Affects Versions: 2.4.3, 2.4.2, 2.4.1, 2.4.0, 2.3.3, 2.3.2, 2.3.1, 2.3.0, 2.2.3, 2.2.2, 2.2.1, 2.2.0, 2.3.4, 2.4.4, 3.0.0 Reporter: Ruslan Dautkhanov Spark duplicates returned datasets when the `path` serde property is present in a parquet table. Confirmed versions affected: Spark 2.2, Spark 2.3, Spark 2.4. Confirmed unaffected versions: Spark 2.1 and earlier (tested with Spark 1.6 at least). Reproducer: {code:python} >>> spark.sql("create table ruslan_test.test55 as select 1 as id") DataFrame[] >>> spark.table("ruslan_test.test55").explain() == Physical Plan == HiveTableScan [id#16], HiveTableRelation `ruslan_test`.`test55`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#16] >>> spark.table("ruslan_test.test55").count() 1 {code} (all is good at this point; now exit the session and run the following in Hive, for example:) {code:sql} ALTER TABLE ruslan_test.test55 SET SERDEPROPERTIES ( 'path'='hdfs://epsdatalake/hivewarehouse/ruslan_test.db/test55' ) {code} So LOCATION and the serde `path` property would point to the same location. 
Now the count returns two records instead of one: {code:python} >>> spark.table("ruslan_test.test55").count() 2 >>> spark.table("ruslan_test.test55").explain() == Physical Plan == *(1) FileScan parquet ruslan_test.test55[id#9] Batched: true, Format: Parquet, Location: InMemoryFileIndex[hdfs://epsdatalake/hivewarehouse/ruslan_test.db/test55, hdfs://epsdatalake/hive..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct >>> {code} Also notice that the presence of the `path` serde property makes the table LOCATION show up twice - {quote} InMemoryFileIndex[hdfs://epsdatalake/hivewarehouse/ruslan_test.db/test55, hdfs://epsdatalake/hive..., {quote} We have some applications that create parquet tables in Hive with the `path` serde property, and it duplicates data in query results. Hive, Impala, etc., and Spark 2.1 and earlier read such tables fine, but Spark 2.2 and later releases do not. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
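One plausible fix direction is to de-duplicate scan roots before building the file index, since here the table LOCATION and the `path` serde property point at the same directory. The sketch below is illustrative Python, not Spark's actual InMemoryFileIndex logic; the function name and the naive trailing-slash normalization are assumptions for the sketch.

```python
def dedup_scan_roots(location, serde_props):
    """Collect scan roots from a table's LOCATION and its optional
    `path` serde property, dropping duplicates so the same directory
    is never listed (and its files double-counted) twice."""
    roots = [location]
    serde_path = serde_props.get("path")
    if serde_path is not None:
        roots.append(serde_path)
    seen, unique = set(), []
    for r in roots:
        norm = r.rstrip("/")        # naive normalization, enough for the sketch
        if norm not in seen:
            seen.add(norm)
            unique.append(norm)
    return unique
```

With this kind of check, the reproducer above would yield a single scan root instead of two, so each file would be read once.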
[jira] [Created] (SPARK-28265) Missing TableCatalog API to rename table
Edgar Rodriguez created SPARK-28265: --- Summary: Missing TableCatalog API to rename table Key: SPARK-28265 URL: https://issues.apache.org/jira/browse/SPARK-28265 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.3 Reporter: Edgar Rodriguez In the [Table Metadata API SPIP|https://docs.google.com/document/d/1zLFiA1VuaWeVxeTDXNg8bL6GP3BVoOZBkewFtEnjEoo/edit#] ([SPARK-27658|https://issues.apache.org/jira/browse/SPARK-27067]) the {{renameTable}} operation for the TableCatalog API is defined as: {code:java} renameTable(CatalogIdentifier from, CatalogIdentifier to): Unit{code} However, it was not included in the PR implementing it, [https://github.com/apache/spark/pull/24246]. Is this method missing, or is it intentionally unsupported? Thanks! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28264) Revisiting Python / pandas UDF
Reynold Xin created SPARK-28264: --- Summary: Revisiting Python / pandas UDF Key: SPARK-28264 URL: https://issues.apache.org/jira/browse/SPARK-28264 Project: Spark Issue Type: Improvement Components: PySpark, SQL Affects Versions: 3.0.0 Reporter: Reynold Xin Assignee: Reynold Xin Over the past two years, pandas UDFs have perhaps been the most important change to Spark for Python data science. However, these functionalities have evolved organically, leading to some inconsistencies and confusion among users. This document revisits UDF definition and naming, as a result of discussions among Xiangrui, Li Jin, Hyukjin, and Reynold. See document here: [https://docs.google.com/document/d/10Pkl-rqygGao2xQf6sddt0b-4FYK4g8qr_bXLKTL65A/edit#|https://docs.google.com/document/d/10Pkl-rqygGao2xQf6sddt0b-4FYK4g8qr_bXLKTL65A/edit] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25994) SPIP: Property Graphs, Cypher Queries, and Algorithms
[ https://issues.apache.org/jira/browse/SPARK-25994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879508#comment-16879508 ] Sam hendley commented on SPARK-25994: - Forgive the intrusion, but it is unclear to me whether there will still be a 'low-level' Pregel API as part of this redesign. If so, I have a few modifications I would like to propose to the Pregel API to make it more useful and easier to track/debug, if you could point me to an appropriate ticket. > SPIP: Property Graphs, Cypher Queries, and Algorithms > - > > Key: SPARK-25994 > URL: https://issues.apache.org/jira/browse/SPARK-25994 > Project: Spark > Issue Type: Epic > Components: Graph >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Martin Junghanns >Priority: Major > Labels: SPIP > > Copied from the SPIP doc: > {quote} > GraphX was one of the foundational pillars of the Spark project, and is the > current graph component. This reflects the importance of the graphs data > model, which naturally pairs with an important class of analytic function, > the network or graph algorithm. > However, GraphX is not actively maintained. It is based on RDDs, and cannot > exploit Spark 2’s Catalyst query engine. GraphX is only available to Scala > users. > GraphFrames is a Spark package, which implements DataFrame-based graph > algorithms, and also incorporates simple graph pattern matching with fixed > length patterns (called “motifs”). GraphFrames is based on DataFrames, but > has a semantically weak graph data model (based on untyped edges and > vertices). The motif pattern matching facility is very limited by comparison > with the well-established Cypher language. > The Property Graph data model has become quite widespread in recent years, > and is the primary focus of commercial graph data management and of graph > data research, both for on-premises and cloud data management. Many users of > transactional graph databases also wish to work with immutable graphs in > Spark. 
> The idea is to define a Cypher-compatible Property Graph type based on > DataFrames; to replace GraphFrames querying with Cypher; to reimplement > GraphX/GraphFrames algos on the PropertyGraph type. > To achieve this goal, a core subset of Cypher for Apache Spark (CAPS), > reusing existing proven designs and code, will be employed in Spark 3.0. This > graph query processor, like CAPS, will overlay and drive the SparkSQL > Catalyst query engine, using the CAPS graph query planner. > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28263) Spark-submit can not find class (ClassNotFoundException)
Zhiyuan created SPARK-28263: --- Summary: Spark-submit can not find class (ClassNotFoundException) Key: SPARK-28263 URL: https://issues.apache.org/jira/browse/SPARK-28263 Project: Spark Issue Type: Bug Components: Spark Shell, Spark Submit Affects Versions: 2.4.3 Reporter: Zhiyuan I tried to run the Main class from my jar using the following command in a script: {code:java} spark-shell --class com.navercorp.Main /target/node2vec-0.0.1-SNAPSHOT.jar --cmd node2vec ../graph/karate.edgelist --output ../walk/walk.txt {code} But it raises this error: {code:java} 19/07/05 14:39:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 19/07/05 14:39:20 WARN deploy.SparkSubmit$$anon$2: Failed to load com.navercorp.Main. java.lang.ClassNotFoundException: com.navercorp.Main at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at org.apache.spark.util.Utils$.classForName(Utils.scala:238) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:810) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala){code} I have the jar file in my folder; this is the structure: {code:java} 1node2vec 2node2vec_spark 3main 4resources 4com 5novercorp 6lib 7Main 7Node2vec 7Word2vec 2target 3lib 3classes 3maven-archiver 3node2vec-0.0.1-SNAPSHOT.jar 2graph 3---karate.edgelist 2walk 3walk.txt {code} Also, I attach the 
structure of the jar file: {code:java} META-INF/ META-INF/MANIFEST.MF log4j2.properties com/ com/navercorp/ com/navercorp/Node2vec$.class com/navercorp/Main$Params$$typecreator1$1.class com/navercorp/Main$$anon$1$$anonfun$11.class com/navercorp/Word2vec$.class com/navercorp/Main$$anon$1$$anonfun$8.class com/navercorp/Node2vec$$anonfun$randomWalk$1$$anonfun$8.class com/navercorp/Node2vec$$anonfun$indexingGraph$4.class com/navercorp/Node2vec$$anonfun$initTransitionProb$1.class com/navercorp/Main$.class com/navercorp/Node2vec$$anonfun$loadNode2Id$1.class com/navercorp/Node2vec$$anonfun$14.class com/navercorp/Node2vec$$anonfun$readIndexedGraph$2$$anonfun$1.class {code} Could someone advise me on how to get the Main class loaded? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
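Since a jar is just a zip archive, one quick diagnostic for a ClassNotFoundException like the one above is to confirm that the jar really contains the entry the fully-qualified class name maps to (com.navercorp.Main must appear as com/navercorp/Main.class). A small standard-library Python sketch, using a toy in-memory jar in place of the real one:

```python
import io
import zipfile

def class_entry(fqcn):
    # A JVM class com.navercorp.Main lives at com/navercorp/Main.class.
    return fqcn.replace(".", "/") + ".class"

def jar_contains_class(jar_bytes, fqcn):
    # A jar is a zip archive, so we can inspect its entry listing directly.
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return class_entry(fqcn) in jar.namelist()

# Build a toy "jar" to demonstrate the check (contents are placeholders).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as jar:
    jar.writestr("com/navercorp/Main.class", b"\xca\xfe\xba\xbe")
```

Running the same check against the real node2vec-0.0.1-SNAPSHOT.jar (e.g. via `jar tf` or `unzip -l`) would show whether the class is actually packaged, or whether the path passed to the launcher is wrong.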
[jira] [Assigned] (SPARK-27296) User Defined Aggregating Functions (UDAFs) have a major efficiency problem
[ https://issues.apache.org/jira/browse/SPARK-27296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Erlandson reassigned SPARK-27296: -- Assignee: Erik Erlandson > User Defined Aggregating Functions (UDAFs) have a major efficiency problem > -- > > Key: SPARK-27296 > URL: https://issues.apache.org/jira/browse/SPARK-27296 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL, Structured Streaming >Affects Versions: 2.3.3, 2.4.0, 3.0.0 >Reporter: Erik Erlandson >Assignee: Erik Erlandson >Priority: Major > Labels: performance, usability > > Spark's UDAFs appear to be serializing and de-serializing to/from the > MutableAggregationBuffer for each row. This gist shows a small reproducing > UDAF and a spark shell session: > [https://gist.github.com/erikerlandson/3c4d8c6345d1521d89e0d894a423046f] > The UDAF and its companion UDT are designed to count the number of times > that ser/de is invoked for the aggregator. The spark shell session > demonstrates that it is executing ser/de on every row of the data frame. > Note, Spark's pre-defined aggregators do not have this problem, as they are > based on an internal aggregating trait that does the correct thing and only > calls ser/de at points such as partition boundaries, presenting final > results, etc. > This is a major problem for UDAFs, as it means that every UDAF is doing a > massive amount of unnecessary work per row, including but not limited to Row > object allocations. For a more realistic UDAF having its own non-trivial > internal structure it is obviously that much worse. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28206) "@pandas_udf" in doctest is rendered as ":pandas_udf" in html API doc
[ https://issues.apache.org/jira/browse/SPARK-28206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng reassigned SPARK-28206: - Assignee: Hyukjin Kwon > "@pandas_udf" in doctest is rendered as ":pandas_udf" in html API doc > - > > Key: SPARK-28206 > URL: https://issues.apache.org/jira/browse/SPARK-28206 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 2.4.1 >Reporter: Xiangrui Meng >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.0.0 > > Attachments: Screen Shot 2019-06-28 at 9.55.13 AM.png > > > Just noticed that in the [pandas_udf API doc > |https://spark.apache.org/docs/2.4.1/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf], > "@pandas_udf" is rendered as ":pandas_udf". > cc: [~hyukjin.kwon] [~smilegator] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28206) "@pandas_udf" in doctest is rendered as ":pandas_udf" in html API doc
[ https://issues.apache.org/jira/browse/SPARK-28206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-28206. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25060 [https://github.com/apache/spark/pull/25060] > "@pandas_udf" in doctest is rendered as ":pandas_udf" in html API doc > - > > Key: SPARK-28206 > URL: https://issues.apache.org/jira/browse/SPARK-28206 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 2.4.1 >Reporter: Xiangrui Meng >Priority: Major > Fix For: 3.0.0 > > Attachments: Screen Shot 2019-06-28 at 9.55.13 AM.png > > > Just noticed that in the [pandas_udf API doc > |https://spark.apache.org/docs/2.4.1/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf], > "@pandas_udf" is rendered as ":pandas_udf". > cc: [~hyukjin.kwon] [~smilegator] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27898) Support 4 date operators(date + integer, integer + date, date - integer and date - date)
[ https://issues.apache.org/jira/browse/SPARK-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-27898. --- Resolution: Fixed Assignee: Yuming Wang Fix Version/s: 3.0.0 This is resolved via https://github.com/apache/spark/pull/24755 > Support 4 date operators(date + integer, integer + date, date - integer and > date - date) > > > Key: SPARK-27898 > URL: https://issues.apache.org/jira/browse/SPARK-27898 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.0.0 > > > Support 4 date operators(date + integer, integer + date, date - integer and > date - date): > ||Operator||Example||Result|| > |+|date '2001-09-28' + integer '7'|date '2001-10-05'| > |-|date '2001-10-01' - integer '7'|date '2001-09-24'| > |-|date '2001-10-01' - date '2001-09-28'|integer '3' (days)| > [https://www.postgresql.org/docs/12/functions-datetime.html] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
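For reference, the three operators in the table above follow ordinary calendar arithmetic. The same results expressed with Python's datetime module (an illustration of the semantics, not Spark's implementation):

```python
from datetime import date, timedelta

# date + integer: adds that many days
assert date(2001, 9, 28) + timedelta(days=7) == date(2001, 10, 5)

# date - integer: subtracts that many days
assert date(2001, 10, 1) - timedelta(days=7) == date(2001, 9, 24)

# date - date: yields an integer day count
assert (date(2001, 10, 1) - date(2001, 9, 28)).days == 3
```

Note that `integer + date` is the same operation as `date + integer` with the operands swapped, which is why the PR covers all four forms.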
[jira] [Updated] (SPARK-28262) Support DROP GLOBAL TEMPORARY VIEW
[ https://issues.apache.org/jira/browse/SPARK-28262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-28262: Description: We create a global temporary view by: CREATE GLOBAL TEMPORARY VIEW temp_view AS SELECT 1 AS col1; But we need to specify {{spark.sql.globalTempDatabase}} when dropping it: DROP VIEW global_temp.temp_view; This is not very convenient; we should add support for {{DROP GLOBAL TEMPORARY VIEW}}. > Support DROP GLOBAL TEMPORARY VIEW > -- > > Key: SPARK-28262 > URL: https://issues.apache.org/jira/browse/SPARK-28262 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > We create a global temporary view by: > CREATE GLOBAL TEMPORARY VIEW temp_view AS SELECT 1 AS col1; > But we need to specify {{spark.sql.globalTempDatabase}} when dropping it: > DROP VIEW global_temp.temp_view; > This is not very convenient; we should add support for {{DROP GLOBAL TEMPORARY > VIEW}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28262) Support DROP GLOBAL TEMPORARY VIEW
[ https://issues.apache.org/jira/browse/SPARK-28262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28262: Assignee: Apache Spark > Support DROP GLOBAL TEMPORARY VIEW > -- > > Key: SPARK-28262 > URL: https://issues.apache.org/jira/browse/SPARK-28262 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28262) Support DROP GLOBAL TEMPORARY VIEW
[ https://issues.apache.org/jira/browse/SPARK-28262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28262: Assignee: (was: Apache Spark) > Support DROP GLOBAL TEMPORARY VIEW > -- > > Key: SPARK-28262 > URL: https://issues.apache.org/jira/browse/SPARK-28262 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28262) Support DROP GLOBAL TEMPORARY VIEW
Yuming Wang created SPARK-28262: --- Summary: Support DROP GLOBAL TEMPORARY VIEW Key: SPARK-28262 URL: https://issues.apache.org/jira/browse/SPARK-28262 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Yuming Wang -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28261) Flaky test: org.apache.spark.network.TransportClientFactorySuite.reuseClientsUpToConfigVariable
[ https://issues.apache.org/jira/browse/SPARK-28261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Somogyi updated SPARK-28261: -- Description: Error message: {noformat} java.lang.AssertionError: expected:<3> but was:<4> ...{noformat} > Flaky test: > org.apache.spark.network.TransportClientFactorySuite.reuseClientsUpToConfigVariable > --- > > Key: SPARK-28261 > URL: https://issues.apache.org/jira/browse/SPARK-28261 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 3.0.0 >Reporter: Gabor Somogyi >Priority: Minor > > Error message: > {noformat} > java.lang.AssertionError: expected:<3> but was:<4> > ...{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28261) Flaky test: org.apache.spark.network.TransportClientFactorySuite.reuseClientsUpToConfigVariable
Gabor Somogyi created SPARK-28261: - Summary: Flaky test: org.apache.spark.network.TransportClientFactorySuite.reuseClientsUpToConfigVariable Key: SPARK-28261 URL: https://issues.apache.org/jira/browse/SPARK-28261 Project: Spark Issue Type: Bug Components: Spark Core, Tests Affects Versions: 3.0.0 Reporter: Gabor Somogyi -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28261) Flaky test: org.apache.spark.network.TransportClientFactorySuite.reuseClientsUpToConfigVariable
[ https://issues.apache.org/jira/browse/SPARK-28261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879350#comment-16879350 ] Gabor Somogyi commented on SPARK-28261: --- I'm working on this. > Flaky test: > org.apache.spark.network.TransportClientFactorySuite.reuseClientsUpToConfigVariable > --- > > Key: SPARK-28261 > URL: https://issues.apache.org/jira/browse/SPARK-28261 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 3.0.0 >Reporter: Gabor Somogyi >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28260) Add CLOSED state to ExecutionState
[ https://issues.apache.org/jira/browse/SPARK-28260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879349#comment-16879349 ] Yuming Wang commented on SPARK-28260: - I'm working on this. > Add CLOSED state to ExecutionState > -- > > Key: SPARK-28260 > URL: https://issues.apache.org/jira/browse/SPARK-28260 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > Currently, the ThriftServerTab displays a FINISHED state when the operation > finishes execution, but quite often it still takes a lot of time to fetch the > results. OperationState has a CLOSED state for after the iterator is > closed. Could we add a CLOSED state to ExecutionState, and override the close() > in SparkExecuteStatement / GetSchemas / GetTables / GetColumns to do > HiveThriftServerListener.onOperationClosed? > > https://github.com/apache/spark/pull/25043#issuecomment-508722874 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28260) Add CLOSED state to ExecutionState
Yuming Wang created SPARK-28260: --- Summary: Add CLOSED state to ExecutionState Key: SPARK-28260 URL: https://issues.apache.org/jira/browse/SPARK-28260 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Yuming Wang Currently, the ThriftServerTab displays a FINISHED state when the operation finishes execution, but quite often it still takes a lot of time to fetch the results. OperationState has a CLOSED state for after the iterator is closed. Could we add a CLOSED state to ExecutionState, and override the close() in SparkExecuteStatement / GetSchemas / GetTables / GetColumns to do HiveThriftServerListener.onOperationClosed? https://github.com/apache/spark/pull/25043#issuecomment-508722874 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
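As a sketch of the proposed lifecycle, the toy tracker below adds a CLOSED state that fires from a close() hook after FINISHED. This is illustrative Python only; the real ExecutionState enum and HiveThriftServerListener live in Spark's Thrift server code, and their members and signatures may differ from what is shown here.

```python
from enum import Enum

# Illustrative subset of states; the real enum may have more members.
class ExecutionState(Enum):
    STARTED = "STARTED"
    FINISHED = "FINISHED"
    CLOSED = "CLOSED"      # proposed: result iterator closed, fetch complete

class OperationTracker:
    """Toy stand-in for the listener-driven state tracking described above."""
    def __init__(self):
        self.state = ExecutionState.STARTED
        self.closed_events = []

    def on_finished(self):
        # Execution is done, but results may still be streaming to the client.
        self.state = ExecutionState.FINISHED

    def on_closed(self, op_id):
        # Proposed close() hook: fires once the result iterator is closed,
        # letting the UI distinguish "finished executing" from "fully fetched".
        self.state = ExecutionState.CLOSED
        self.closed_events.append(op_id)
```

With this distinction, a UI like ThriftServerTab could show long fetch phases explicitly instead of leaving operations in FINISHED.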
[jira] [Assigned] (SPARK-28200) Decimal overflow handling in ExpressionEncoder
[ https://issues.apache.org/jira/browse/SPARK-28200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-28200: --- Assignee: Mick Jermsurawong > Decimal overflow handling in ExpressionEncoder > -- > > Key: SPARK-28200 > URL: https://issues.apache.org/jira/browse/SPARK-28200 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Marco Gaido >Assignee: Mick Jermsurawong >Priority: Major > Fix For: 3.0.0 > > > As pointed out in https://github.com/apache/spark/pull/20350, we are > currently not checking the overflow when serializing a java/scala > `BigDecimal` in `ExpressionEncoder` / `ScalaReflection`. > We should add this check there too. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28200) Decimal overflow handling in ExpressionEncoder
[ https://issues.apache.org/jira/browse/SPARK-28200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-28200. - Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25016 [https://github.com/apache/spark/pull/25016] > Decimal overflow handling in ExpressionEncoder > -- > > Key: SPARK-28200 > URL: https://issues.apache.org/jira/browse/SPARK-28200 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Marco Gaido >Priority: Major > Fix For: 3.0.0 > > > As pointed out in https://github.com/apache/spark/pull/20350, we are > currently not checking the overflow when serializing a java/scala > `BigDecimal` in `ExpressionEncoder` / `ScalaReflection`. > We should add this check there too. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
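The kind of check being requested can be sketched with Python's decimal module: round the value to the target scale, then reject it if it needs more total digits than the declared precision. This mirrors DecimalType(precision, scale) semantics only loosely; the function name and its None-on-overflow contract are assumptions for illustration, not Spark's actual ExpressionEncoder behavior.

```python
from decimal import Decimal, ROUND_HALF_UP

def check_fits(value, precision, scale):
    """Return `value` rounded to `scale`, or None if the rounded value
    needs more than `precision` total digits (i.e. it would overflow)."""
    quantum = Decimal(1).scaleb(-scale)          # e.g. scale=2 -> Decimal("0.01")
    rounded = value.quantize(quantum, rounding=ROUND_HALF_UP)
    digits = len(rounded.as_tuple().digits)      # total significant digits
    if digits > precision:
        return None                              # overflow: caller raises or nulls
    return rounded
```

The point of the ticket is that the serialization path should perform a check like this instead of silently producing an out-of-range value.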
[jira] [Updated] (SPARK-28259) Date/Time Output Styles and Date Order Conventions
[ https://issues.apache.org/jira/browse/SPARK-28259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-28259: Description: *Date/Time Output Styles* ||Style Specification||Description||Example|| |{{ISO}}|ISO 8601, SQL standard|{{1997-12-17 07:37:16-08}}| |{{SQL}}|traditional style|{{12/17/1997 07:37:16.00 PST}}| |{{Postgres}}|original style|{{Wed Dec 17 07:37:16 1997 PST}}| |{{German}}|regional style|{{17.12.1997 07:37:16.00 PST}}| [https://www.postgresql.org/docs/11/datatype-datetime.html#DATATYPE-DATETIME-OUTPUT-TABLE] *Date Order Conventions* ||{{datestyle}} Setting||Input Ordering||Example Output|| |{{SQL, DMY}}|_{{day}}_/_{{month}}_/_{{year}}_|{{17/12/1997 15:37:16.00 CET}}| |{{SQL, MDY}}|_{{month}}_/_{{day}}_/_{{year}}_|{{12/17/1997 07:37:16.00 PST}}| |{{Postgres, DMY}}|_{{day}}_/_{{month}}_/_{{year}}_|{{Wed 17 Dec 07:37:16 1997 PST}}| [https://www.postgresql.org/docs/11/datatype-datetime.html#DATATYPE-DATETIME-OUTPUT2-TABLE] was: *Date/Time Output Styles* ||Style Specification||Description||Example|| |{{ISO}}|ISO 8601, SQL standard|{{1997-12-17 07:37:16-08}}| |{{SQL}}|traditional style|{{12/17/1997 07:37:16.00 PST}}| |{{Postgres}}|original style|{{Wed Dec 17 07:37:16 1997 PST}}| |{{German}}|regional style|{{17.12.1997 07:37:16.00 PST}}| [ https://www.postgresql.org/docs/11/datatype-datetime.html#DATATYPE-DATETIME-OUTPUT-TABLE] *Date Order Conventions* ||{{datestyle}} Setting||Input Ordering||Example Output|| |{{SQL, DMY}}|_{{day}}_/_{{month}}_/_{{year}}_|{{17/12/1997 15:37:16.00 CET}}| |{{SQL, MDY}}|_{{month}}_/_{{day}}_/_{{year}}_|{{12/17/1997 07:37:16.00 PST}}| |{{Postgres, DMY}}|_{{day}}_/_{{month}}_/_{{year}}_|{{Wed 17 Dec 07:37:16 1997 PST}}| [https://www.postgresql.org/docs/11/datatype-datetime.html#DATATYPE-DATETIME-OUTPUT2-TABLE] > Date/Time Output Styles and Date Order Conventions > -- > > Key: SPARK-28259 > URL: https://issues.apache.org/jira/browse/SPARK-28259 > Project: Spark > Issue Type: Sub-task > 
Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > *Date/Time Output Styles* > ||Style Specification||Description||Example|| > |{{ISO}}|ISO 8601, SQL standard|{{1997-12-17 07:37:16-08}}| > |{{SQL}}|traditional style|{{12/17/1997 07:37:16.00 PST}}| > |{{Postgres}}|original style|{{Wed Dec 17 07:37:16 1997 PST}}| > |{{German}}|regional style|{{17.12.1997 07:37:16.00 PST}}| > [https://www.postgresql.org/docs/11/datatype-datetime.html#DATATYPE-DATETIME-OUTPUT-TABLE] > > *Date Order Conventions* > ||{{datestyle}} Setting||Input Ordering||Example Output|| > |{{SQL, DMY}}|_{{day}}_/_{{month}}_/_{{year}}_|{{17/12/1997 15:37:16.00 CET}}| > |{{SQL, MDY}}|_{{month}}_/_{{day}}_/_{{year}}_|{{12/17/1997 07:37:16.00 PST}}| > |{{Postgres, DMY}}|_{{day}}_/_{{month}}_/_{{year}}_|{{Wed 17 Dec 07:37:16 > 1997 PST}}| > [https://www.postgresql.org/docs/11/datatype-datetime.html#DATATYPE-DATETIME-OUTPUT2-TABLE] > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28259) Date/Time Output Styles and Date Order Conventions
Yuming Wang created SPARK-28259: --- Summary: Date/Time Output Styles and Date Order Conventions Key: SPARK-28259 URL: https://issues.apache.org/jira/browse/SPARK-28259 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Yuming Wang *Date/Time Output Styles* ||Style Specification||Description||Example|| |{{ISO}}|ISO 8601, SQL standard|{{1997-12-17 07:37:16-08}}| |{{SQL}}|traditional style|{{12/17/1997 07:37:16.00 PST}}| |{{Postgres}}|original style|{{Wed Dec 17 07:37:16 1997 PST}}| |{{German}}|regional style|{{17.12.1997 07:37:16.00 PST}}| [ https://www.postgresql.org/docs/11/datatype-datetime.html#DATATYPE-DATETIME-OUTPUT-TABLE] *Date Order Conventions* ||{{datestyle}} Setting||Input Ordering||Example Output|| |{{SQL, DMY}}|_{{day}}_/_{{month}}_/_{{year}}_|{{17/12/1997 15:37:16.00 CET}}| |{{SQL, MDY}}|_{{month}}_/_{{day}}_/_{{year}}_|{{12/17/1997 07:37:16.00 PST}}| |{{Postgres, DMY}}|_{{day}}_/_{{month}}_/_{{year}}_|{{Wed 17 Dec 07:37:16 1997 PST}}| [https://www.postgresql.org/docs/11/datatype-datetime.html#DATATYPE-DATETIME-OUTPUT2-TABLE] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
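The four output styles in the table above can be approximated in plain Python; this is an illustrative sketch of the formats only (the strftime patterns and the hard-coded zone suffixes are assumptions, not PostgreSQL's or Spark's actual rendering code):

```python
from datetime import datetime

# Approximate PostgreSQL's four DateStyle output formats for the timestamp
# used in the table (1997-12-17 07:37:16, zone PST / offset -08).
# The format strings and literal zone suffixes are illustrative assumptions.
ts = datetime(1997, 12, 17, 7, 37, 16)

styles = {
    "ISO":      ts.strftime("%Y-%m-%d %H:%M:%S") + "-08",      # 1997-12-17 07:37:16-08
    "SQL":      ts.strftime("%m/%d/%Y %H:%M:%S.00") + " PST",  # 12/17/1997 07:37:16.00 PST
    "Postgres": ts.strftime("%a %b %d %H:%M:%S %Y") + " PST",  # Wed Dec 17 07:37:16 1997 PST
    "German":   ts.strftime("%d.%m.%Y %H:%M:%S.00") + " PST",  # 17.12.1997 07:37:16.00 PST
}

for name, rendered in styles.items():
    print(f"{name}: {rendered}")
```

The DMY/MDY datestyle conventions in the second table differ only in the ordering of the %d and %m fields of the SQL and Postgres styles.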
[jira] [Assigned] (SPARK-28206) "@pandas_udf" in doctest is rendered as ":pandas_udf" in html API doc
[ https://issues.apache.org/jira/browse/SPARK-28206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28206: Assignee: (was: Apache Spark) > "@pandas_udf" in doctest is rendered as ":pandas_udf" in html API doc > - > > Key: SPARK-28206 > URL: https://issues.apache.org/jira/browse/SPARK-28206 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 2.4.1 >Reporter: Xiangrui Meng >Priority: Major > Attachments: Screen Shot 2019-06-28 at 9.55.13 AM.png > > > Just noticed that in the [pandas_udf API doc > |https://spark.apache.org/docs/2.4.1/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf], > "@pandas_udf" is rendered as ":pandas_udf". > cc: [~hyukjin.kwon] [~smilegator]
[jira] [Assigned] (SPARK-28206) "@pandas_udf" in doctest is rendered as ":pandas_udf" in html API doc
[ https://issues.apache.org/jira/browse/SPARK-28206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28206: Assignee: Apache Spark > "@pandas_udf" in doctest is rendered as ":pandas_udf" in html API doc > - > > Key: SPARK-28206 > URL: https://issues.apache.org/jira/browse/SPARK-28206 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 2.4.1 >Reporter: Xiangrui Meng >Assignee: Apache Spark >Priority: Major > Attachments: Screen Shot 2019-06-28 at 9.55.13 AM.png > > > Just noticed that in the [pandas_udf API doc > |https://spark.apache.org/docs/2.4.1/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf], > "@pandas_udf" is rendered as ":pandas_udf". > cc: [~hyukjin.kwon] [~smilegator]
[jira] [Updated] (SPARK-28258) Incompatibility between spark docker image and hadoop 3.2 and azure tools
[ https://issues.apache.org/jira/browse/SPARK-28258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Luis Pedrosa updated SPARK-28258: -- Description: Currently the docker images generated by the distro use openjdk8 based on alpine. This means that the shipped version of libssl is 1.1.1b-r1: {noformat} sh-4.4# apk list | grep ssl libssl1.1-1.1.1b-r1 x86_64 {openssl} (OpenSSL) [installed] {noformat} The hadoop distro ships wildfly-openssl-1.0.4.Final.jar, which is affected by [https://issues.jboss.org/browse/JBEAP-16425]. This results in an error on the executor: {noformat} 2019-07-04 22:32:40,339 INFO openssl.SSL: WFOPENSSL0002 OpenSSL Version OpenSSL 1.1.1b 26 Feb 2019 2019-07-04 22:32:40,363 WARN streaming.FileStreamSink: Error while looking for metadata directory. Exception in thread "main" java.lang.NullPointerException at org.wildfly.openssl.CipherSuiteConverter.toJava(CipherSuiteConverter.java:284) {noformat} In my tests, building a Docker image with an updated version of wildfly (1.0.7.Final) solves the issue. Not sure if this is a Spark problem; if so, where would the right place to solve it be? It seems this may be taken care of in Hadoop directly, but those tickets are still open. https://issues.apache.org/jira/browse/HADOOP-16410 https://issues.apache.org/jira/browse/HADOOP-16405 was: Currently the docker images generated by the distro use openjdk8 based on alpine. This means that the shipped version of libssl is 1.1.1b-r1: {noformat} sh-4.4# apk list | grep ssl libssl1.1-1.1.1b-r1 x86_64 {openssl} (OpenSSL) [installed] {noformat} The hadoop distro ships wildfly-openssl-1.0.4.Final.jar, which is affected by https://issues.jboss.org/browse/JBEAP-16425. This results in an error on the executor: {noformat} 2019-07-04 22:32:40,339 INFO openssl.SSL: WFOPENSSL0002 OpenSSL Version OpenSSL 1.1.1b 26 Feb 2019 2019-07-04 22:32:40,363 WARN streaming.FileStreamSink: Error while looking for metadata directory. 
Exception in thread "main" java.lang.NullPointerException at org.wildfly.openssl.CipherSuiteConverter.toJava(CipherSuiteConverter.java:284) {noformat} In my tests, building a Docker image with an updated version of wildfly (1.0.7.Final) solves the issue. Not sure if this is a Spark problem; if so, where would the right place to solve it be? > Incompatibility between spark docker image and hadoop 3.2 and azure tools > > > Key: SPARK-28258 > URL: https://issues.apache.org/jira/browse/SPARK-28258 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.3 >Reporter: Jose Luis Pedrosa >Priority: Minor > > Currently the docker images generated by the distro use openjdk8 based on > alpine. > This means that the shipped version of libssl is 1.1.1b-r1: > > {noformat} > sh-4.4# apk list | grep ssl > libssl1.1-1.1.1b-r1 x86_64 {openssl} (OpenSSL) [installed] > {noformat} > The hadoop distro ships wildfly-openssl-1.0.4.Final.jar, which is affected by > [https://issues.jboss.org/browse/JBEAP-16425]. > This results in an error on the executor: > {noformat} > 2019-07-04 22:32:40,339 INFO openssl.SSL: WFOPENSSL0002 OpenSSL Version > OpenSSL 1.1.1b 26 Feb 2019 > 2019-07-04 22:32:40,363 WARN streaming.FileStreamSink: Error while looking > for metadata directory. > Exception in thread "main" java.lang.NullPointerException > at > org.wildfly.openssl.CipherSuiteConverter.toJava(CipherSuiteConverter.java:284) > {noformat} > In my tests, building a Docker image with an updated version of wildfly > (1.0.7.Final) solves the issue. > Not sure if this is a Spark problem; if so, where would the right place > to solve it be? > It seems this may be taken care of in Hadoop directly, but those tickets are still open. 
> https://issues.apache.org/jira/browse/HADOOP-16410 > https://issues.apache.org/jira/browse/HADOOP-16405
[jira] [Resolved] (SPARK-28241) Show metadata operations on ThriftServerTab
[ https://issues.apache.org/jira/browse/SPARK-28241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell resolved SPARK-28241. --- Resolution: Fixed Assignee: Yuming Wang Fix Version/s: 3.0.0 > Show metadata operations on ThriftServerTab > --- > > Key: SPARK-28241 > URL: https://issues.apache.org/jira/browse/SPARK-28241 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.0.0 > > > !https://user-images.githubusercontent.com/5399861/60579741-4cd2c180-9db6-11e9-822a-0433be509b67.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28258) Incompatibility between spark docker image and hadoop 3.2 and azure tools
Jose Luis Pedrosa created SPARK-28258: - Summary: Incompatibility between spark docker image and hadoop 3.2 and azure tools Key: SPARK-28258 URL: https://issues.apache.org/jira/browse/SPARK-28258 Project: Spark Issue Type: Bug Components: Kubernetes Affects Versions: 2.4.3 Reporter: Jose Luis Pedrosa Currently the docker images generated by the distro use openjdk8 based on alpine. This means that the shipped version of libssl is 1.1.1b-r1: {noformat} sh-4.4# apk list | grep ssl libssl1.1-1.1.1b-r1 x86_64 {openssl} (OpenSSL) [installed] {noformat} The hadoop distro ships wildfly-openssl-1.0.4.Final.jar, which is affected by https://issues.jboss.org/browse/JBEAP-16425. This results in an error on the executor: {noformat} 2019-07-04 22:32:40,339 INFO openssl.SSL: WFOPENSSL0002 OpenSSL Version OpenSSL 1.1.1b 26 Feb 2019 2019-07-04 22:32:40,363 WARN streaming.FileStreamSink: Error while looking for metadata directory. Exception in thread "main" java.lang.NullPointerException at org.wildfly.openssl.CipherSuiteConverter.toJava(CipherSuiteConverter.java:284) {noformat} In my tests, building a Docker image with an updated version of wildfly (1.0.7.Final) solves the issue. Not sure if this is a Spark problem; if so, where would the right place to solve it be?
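The workaround the reporter describes (swapping the bundled wildfly-openssl 1.0.4.Final jar for 1.0.7.Final in the image) could be sketched as a Dockerfile layer; the base image name and jar paths below are assumptions for illustration, not project defaults:

```dockerfile
# Hypothetical sketch of the reporter's workaround: rebuild the Spark image
# with wildfly-openssl 1.0.7.Final in place of the 1.0.4.Final jar that the
# Hadoop distro ships. Base image name and paths are assumptions.
FROM my-registry/spark:2.4.3-hadoop3.2

# Drop the jar affected by JBEAP-16425 ...
RUN rm -f /opt/spark/jars/wildfly-openssl-1.0.4.Final.jar

# ... and add the fixed release from Maven Central.
ADD https://repo1.maven.org/maven2/org/wildfly/openssl/wildfly-openssl/1.0.7.Final/wildfly-openssl-1.0.7.Final.jar \
    /opt/spark/jars/
```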
[jira] [Commented] (SPARK-28206) "@pandas_udf" in doctest is rendered as ":pandas_udf" in html API doc
[ https://issues.apache.org/jira/browse/SPARK-28206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879121#comment-16879121 ] Hyukjin Kwon commented on SPARK-28206: -- This is a side effect of the Epydoc doc plugin, which is legacy in PySpark. I am not sure if I can get rid of it and replace all Epydoc-specific syntax, but let me try. > "@pandas_udf" in doctest is rendered as ":pandas_udf" in html API doc > - > > Key: SPARK-28206 > URL: https://issues.apache.org/jira/browse/SPARK-28206 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 2.4.1 >Reporter: Xiangrui Meng >Priority: Major > Attachments: Screen Shot 2019-06-28 at 9.55.13 AM.png > > > Just noticed that in the [pandas_udf API doc > |https://spark.apache.org/docs/2.4.1/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf], > "@pandas_udf" is rendered as ":pandas_udf". > cc: [~hyukjin.kwon] [~smilegator]
[jira] [Assigned] (SPARK-28257) Use ConfigEntry for hardcoded configs in SQL module
[ https://issues.apache.org/jira/browse/SPARK-28257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28257: Assignee: Apache Spark > Use ConfigEntry for hardcoded configs in SQL module > --- > > Key: SPARK-28257 > URL: https://issues.apache.org/jira/browse/SPARK-28257 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: EdisonWang >Assignee: Apache Spark >Priority: Minor > > Use ConfigEntry for hardcoded configs in SQL module -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28257) Use ConfigEntry for hardcoded configs in SQL module
[ https://issues.apache.org/jira/browse/SPARK-28257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28257: Assignee: (was: Apache Spark) > Use ConfigEntry for hardcoded configs in SQL module > --- > > Key: SPARK-28257 > URL: https://issues.apache.org/jira/browse/SPARK-28257 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: EdisonWang >Priority: Minor > > Use ConfigEntry for hardcoded configs in SQL module -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28257) Use ConfigEntry for hardcoded configs in SQL module
EdisonWang created SPARK-28257: -- Summary: Use ConfigEntry for hardcoded configs in SQL module Key: SPARK-28257 URL: https://issues.apache.org/jira/browse/SPARK-28257 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: EdisonWang Use ConfigEntry for hardcoded configs in SQL module -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28015) Invalid date formats should throw an exception
[ https://issues.apache.org/jira/browse/SPARK-28015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879062#comment-16879062 ] Dongjoon Hyun commented on SPARK-28015: --- Thank you, [~iskenderunlu804]. Please file a PR on Apache Spark repo. > Invalid date formats should throw an exception > -- > > Key: SPARK-28015 > URL: https://issues.apache.org/jira/browse/SPARK-28015 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: Yuming Wang >Priority: Major > > Invalid date formats should throw an exception: > {code:sql} > SELECT date '1999 08 01' > 1999-01-01 > {code} > Supported date formats: > https://github.com/apache/spark/blob/ab8710b57916a129fcb89464209361120d224535/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L365-L374 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
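The strict behavior SPARK-28015 asks for can be illustrated in plain Python: a parser that rejects {{'1999 08 01'}} outright instead of silently yielding {{1999-01-01}}. This is a minimal sketch of the requested semantics, not Spark's DateTimeUtils implementation:

```python
from datetime import date, datetime

def parse_date_strict(text: str) -> date:
    """Accept only yyyy-mm-dd; raise ValueError for anything else.

    Illustrates the behavior the ticket requests (reject '1999 08 01'
    rather than truncating to 1999-01-01); not Spark's actual parser.
    """
    return datetime.strptime(text, "%Y-%m-%d").date()

print(parse_date_strict("1999-08-01"))   # 1999-08-01

try:
    parse_date_strict("1999 08 01")      # wrong separators: rejected
except ValueError as err:
    print("rejected:", err)
```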
[jira] [Resolved] (SPARK-28218) Migrate Avro to File source V2
[ https://issues.apache.org/jira/browse/SPARK-28218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28218. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25017 [https://github.com/apache/spark/pull/25017] > Migrate Avro to File source V2 > -- > > Key: SPARK-28218 > URL: https://issues.apache.org/jira/browse/SPARK-28218 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28218) Migrate Avro to File source V2
[ https://issues.apache.org/jira/browse/SPARK-28218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-28218: - Assignee: Gengliang Wang > Migrate Avro to File source V2 > -- > > Key: SPARK-28218 > URL: https://issues.apache.org/jira/browse/SPARK-28218 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28248) Upgrade docker image and library for PostgreSQL integration test
[ https://issues.apache.org/jira/browse/SPARK-28248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28248: -- Summary: Upgrade docker image and library for PostgreSQL integration test (was: Upgrade Postgres docker image) > Upgrade docker image and library for PostgreSQL integration test > > > Key: SPARK-28248 > URL: https://issues.apache.org/jira/browse/SPARK-28248 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28248) Upgrade docker image and library for PostgreSQL integration test
[ https://issues.apache.org/jira/browse/SPARK-28248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28248. --- Resolution: Fixed Assignee: Yuming Wang Fix Version/s: 3.0.0 This is resolved via https://github.com/apache/spark/pull/25050 > Upgrade docker image and library for PostgreSQL integration test > > > Key: SPARK-28248 > URL: https://issues.apache.org/jira/browse/SPARK-28248 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Minor > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28248) Upgrade Postgres docker image
[ https://issues.apache.org/jira/browse/SPARK-28248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28248: -- Priority: Minor (was: Major) > Upgrade Postgres docker image > - > > Key: SPARK-28248 > URL: https://issues.apache.org/jira/browse/SPARK-28248 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21067: Assignee: Apache Spark > Thrift Server - CTAS fail with Unable to move source > > > Key: SPARK-21067 > URL: https://issues.apache.org/jira/browse/SPARK-21067 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1, 2.2.0, 2.4.0, 2.4.3 > Environment: Yarn > Hive MetaStore > HDFS (HA) >Reporter: Dominic Ricard >Assignee: Apache Spark >Priority: Major > Attachments: SPARK-21067.patch > > > After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS > would fail, sometimes... > Most of the time, the CTAS would work only once, after starting the thrift > server. After that, dropping the table and re-issuing the same CTAS would > fail with the following message (sometimes it fails right away, sometimes it > works for a long period of time): > {noformat} > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > We have already found the following Jira > (https://issues.apache.org/jira/browse/SPARK-11021) which states that > {{hive.exec.stagingdir}} had to be added in order for Spark to be able to > handle CREATE TABLE properly as of 2.0. 
As you can see in the error, we have > ours set to "/tmp/hive-staging/\{user.name\}" > Same issue with INSERT statements: > {noformat} > CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE > dricard.test SELECT 1; > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > This worked fine in 1.6.2, which we currently run in our Production > Environment but since 2.0+, we haven't been able to CREATE TABLE consistently > on the cluster. > SQL to reproduce issue: > {noformat} > DROP SCHEMA IF EXISTS dricard CASCADE; > CREATE SCHEMA dricard; > CREATE TABLE dricard.test (col1 int); > INSERT INTO TABLE dricard.test SELECT 1; > SELECT * from dricard.test; > DROP TABLE dricard.test; > CREATE TABLE dricard.test AS select 1 as `col1`; > SELECT * from dricard.test > {noformat} > Thrift server usually fails at INSERT... > Tried the same procedure in a spark context using spark.sql() and didn't > encounter the same issue. 
> Full stack Trace: > {noformat} > 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error > executing query, currentState RUNNING, > org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0 > to desti > nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106) > at > org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) > at org.apache.spark.sql.Dataset.(Dataset.scala:185) > at
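The {{hive.exec.stagingdir}} setting the reporter references (per SPARK-11021) lives in hive-site.xml; a minimal sketch, assuming the reporter's value and layout. Keeping the staging directory on the same HDFS namespace as the warehouse lets the final "move source ... to destination" step be a rename rather than a cross-filesystem copy:

```xml
<!-- hive-site.xml sketch (assumed layout): the staging dir the reporter
     mentions, placed on the same HDFS namespace as the warehouse so the
     final move can be a cheap rename. -->
<property>
  <name>hive.exec.stagingdir</name>
  <value>/tmp/hive-staging/{user.name}</value>
</property>
```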
[jira] [Assigned] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21067: Assignee: (was: Apache Spark) > Thrift Server - CTAS fail with Unable to move source > > > Key: SPARK-21067 > URL: https://issues.apache.org/jira/browse/SPARK-21067 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1, 2.2.0, 2.4.0, 2.4.3 > Environment: Yarn > Hive MetaStore > HDFS (HA) >Reporter: Dominic Ricard >Priority: Major > Attachments: SPARK-21067.patch > > > After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS > would fail, sometimes... > Most of the time, the CTAS would work only once, after starting the thrift > server. After that, dropping the table and re-issuing the same CTAS would > fail with the following message (sometimes it fails right away, sometimes it > works for a long period of time): > {noformat} > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > We have already found the following Jira > (https://issues.apache.org/jira/browse/SPARK-11021) which states that > {{hive.exec.stagingdir}} had to be added in order for Spark to be able to > handle CREATE TABLE properly as of 2.0. 
As you can see in the error, we have > ours set to "/tmp/hive-staging/\{user.name\}" > Same issue with INSERT statements: > {noformat} > CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE > dricard.test SELECT 1; > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > This worked fine in 1.6.2, which we currently run in our Production > Environment but since 2.0+, we haven't been able to CREATE TABLE consistently > on the cluster. > SQL to reproduce issue: > {noformat} > DROP SCHEMA IF EXISTS dricard CASCADE; > CREATE SCHEMA dricard; > CREATE TABLE dricard.test (col1 int); > INSERT INTO TABLE dricard.test SELECT 1; > SELECT * from dricard.test; > DROP TABLE dricard.test; > CREATE TABLE dricard.test AS select 1 as `col1`; > SELECT * from dricard.test > {noformat} > Thrift server usually fails at INSERT... > Tried the same procedure in a spark context using spark.sql() and didn't > encounter the same issue. 
> Full stack Trace: > {noformat} > 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error > executing query, currentState RUNNING, > org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0 > to desti > nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106) > at > org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) > at org.apache.spark.sql.Dataset.(Dataset.scala:185) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64) > at
[jira] [Assigned] (SPARK-28239) Allow TCP connections created by shuffle service auto close on YARN NodeManagers
[ https://issues.apache.org/jira/browse/SPARK-28239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28239: Assignee: (was: Apache Spark) > Allow TCP connections created by shuffle service auto close on YARN > NodeManagers > > > Key: SPARK-28239 > URL: https://issues.apache.org/jira/browse/SPARK-28239 > Project: Spark > Issue Type: Improvement > Components: Shuffle, YARN >Affects Versions: 2.4.0 > Environment: Hadoop2.6.0-CDH5.8.3(netty3) > Spark2.4.0(netty4) > Configs: > spark.shuffle.service.enabled=true >Reporter: Deegue >Priority: Minor > Attachments: screenshot-1.png, screenshot-2.png > > > When executing shuffle tasks, TCP connections (on port 7337 by default) will > be established by the shuffle service. > It looks like: > !screenshot-1.png! > However, some of the TCP connections are still busy when the task is actually > finished. These connections won't close automatically until we restart the > NodeManager process. > Connections pile up and NodeManagers get slower and slower. > !screenshot-2.png! > These unclosed TCP connections stay busy, and setting ChannelOption.SO_KEEPALIVE to > true according to > [SPARK-23182|https://github.com/apache/spark/pull/20512] does not seem to take effect. > So the solution is setting ChannelOption.AUTO_CLOSE to true, after which > our cluster (running 1+ jobs / day) processes normally.
[jira] [Assigned] (SPARK-28239) Allow TCP connections created by shuffle service auto close on YARN NodeManagers
[ https://issues.apache.org/jira/browse/SPARK-28239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28239: Assignee: Apache Spark > Allow TCP connections created by shuffle service auto close on YARN > NodeManagers > > > Key: SPARK-28239 > URL: https://issues.apache.org/jira/browse/SPARK-28239 > Project: Spark > Issue Type: Improvement > Components: Shuffle, YARN >Affects Versions: 2.4.0 > Environment: Hadoop2.6.0-CDH5.8.3(netty3) > Spark2.4.0(netty4) > Configs: > spark.shuffle.service.enabled=true >Reporter: Deegue >Assignee: Apache Spark >Priority: Minor > Attachments: screenshot-1.png, screenshot-2.png > > > When executing shuffle tasks, TCP connections (on port 7337 by default) will > be established by the shuffle service. > It looks like: > !screenshot-1.png! > However, some of the TCP connections are still busy when the task is actually > finished. These connections won't close automatically until we restart the > NodeManager process. > Connections pile up and NodeManagers get slower and slower. > !screenshot-2.png! > These unclosed TCP connections stay busy, and setting ChannelOption.SO_KEEPALIVE to > true according to > [SPARK-23182|https://github.com/apache/spark/pull/20512] does not seem to take effect. > So the solution is setting ChannelOption.AUTO_CLOSE to true, after which > our cluster (running 1+ jobs / day) processes normally.
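The ticket contrasts two knobs: Netty's ChannelOption.SO_KEEPALIVE, which maps to the kernel-level socket option, and ChannelOption.AUTO_CLOSE, a Netty-level setting (close the channel as soon as a write fails) with no direct socket-option analogue. A minimal Python sketch of the former, just to show what the SPARK-23182 setting toggles at the OS level; it is an illustration, not Spark's shuffle-service code:

```python
import socket

# Set the kernel-level keepalive option that Netty's
# ChannelOption.SO_KEEPALIVE (tried in SPARK-23182) maps to.
# ChannelOption.AUTO_CLOSE, which this ticket enables instead, is a
# Netty-level behavior and has no equivalent setsockopt call.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
print("SO_KEEPALIVE enabled:", bool(enabled))
sock.close()
```

As the reporter observed, keepalive only detects dead peers; it does not close connections that the kernel still considers healthy, which is why the AUTO_CLOSE route was needed.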
[jira] [Updated] (SPARK-28248) Upgrade Postgres docker image
[ https://issues.apache.org/jira/browse/SPARK-28248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-28248: Summary: Upgrade Postgres docker image (was: Upgrade DB2 and Postgres docker image) > Upgrade Postgres docker image > - > > Key: SPARK-28248 > URL: https://issues.apache.org/jira/browse/SPARK-28248 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28256) Failed to initialize FileContextBasedCheckpointFileManager with uri without authority
[ https://issues.apache.org/jira/browse/SPARK-28256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28256:

    Assignee: Apache Spark

> Failed to initialize FileContextBasedCheckpointFileManager with uri without authority
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-28256
>                 URL: https://issues.apache.org/jira/browse/SPARK-28256
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.0.0
>            Reporter: Genmao Yu
>            Assignee: Apache Spark
>            Priority: Minor
>
> Code to reproduce:
> {code:sql}
> CREATE TABLE `user_click_count` (`userId` STRING, `click` BIGINT)
> USING org.apache.spark.sql.json
> OPTIONS (path 'hdfs:///tmp/test');
> {code}
> Error:
> {code:java}
> java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
>     at org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:136)
>     at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:165)
>     at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:250)
>     at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:342)
>     at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:339)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1911)
>     at org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:339)
>     at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:456)
>     at org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.<init>(CheckpointFileManager.scala:297)
>     at org.apache.spark.sql.execution.streaming.CheckpointFileManager$.create(CheckpointFileManager.scala:189)
>     at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.<init>(HDFSMetadataLog.scala:63)
>     at org.apache.spark.sql.execution.streaming.CompactibleFileStreamLog.<init>(CompactibleFileStreamLog.scala:46)
>     at org.apache.spark.sql.execution.streaming.FileStreamSinkLog.<init>(FileStreamSinkLog.scala:85)
>     at org.apache.spark.sql.execution.streaming.FileStreamSink.<init>(FileStreamSink.scala:98)
>     at org.apache.spark.sql.execution.datasources.DataSource.createSink(DataSource.scala:297)
>     at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:379)
>     ...
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:134)
>     ... 67 more
> Caused by: org.apache.hadoop.HadoopIllegalArgumentException: Uri without authority: hdfs:/tmp/test/_spark_metadata
>     at org.apache.hadoop.fs.AbstractFileSystem.getUri(AbstractFileSystem.java:313)
>     at org.apache.hadoop.fs.AbstractFileSystem.<init>(AbstractFileSystem.java:266)
>     at org.apache.hadoop.fs.Hdfs.<init>(Hdfs.java:80)
>     ... 72 more
> {code}
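The root cause is visible in the innermost exception: the checkpoint path Spark derives from `hdfs:///tmp/test` is `hdfs:/tmp/test/_spark_metadata`, a URI with no authority (no `host:port` after the scheme), and `FileContext`'s `AbstractFileSystem.getUri` rejects such URIs outright rather than falling back to the cluster's default filesystem. A loose Python sketch of that check (the function name and use of `urllib.parse` are illustrative assumptions, not Hadoop's actual code):

```python
from urllib.parse import urlparse

def require_authority(uri: str) -> str:
    """Sketch of the authority check that AbstractFileSystem performs:
    reject any URI whose authority (host[:port]) component is absent."""
    parsed = urlparse(uri)
    if not parsed.netloc:
        # Mirrors: HadoopIllegalArgumentException: Uri without authority: ...
        raise ValueError(f"Uri without authority: {uri}")
    return parsed.netloc

# A fully qualified URI carries an authority and passes:
require_authority("hdfs://namenode:8020/tmp/test/_spark_metadata")

# Both the triple-slash form from the CREATE TABLE statement and the
# single-slash form in the stack trace lack an authority and would fail:
# require_authority("hdfs:///tmp/test")                  # raises ValueError
# require_authority("hdfs:/tmp/test/_spark_metadata")    # raises ValueError
```

Note that `hdfs:///tmp/test` (triple slash) also has an empty authority; the older `FileSystem` API tolerates this by resolving against `fs.defaultFS`, which is why the same path works outside the `FileContext`-based checkpoint manager.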
[jira] [Assigned] (SPARK-28256) Failed to initialize FileContextBasedCheckpointFileManager with uri without authority
[ https://issues.apache.org/jira/browse/SPARK-28256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28256:

    Assignee: (was: Apache Spark)
[jira] [Updated] (SPARK-28256) Failed to initialize FileContextBasedCheckpointFileManager with uri without authority
[ https://issues.apache.org/jira/browse/SPARK-28256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated SPARK-28256:

    Description: (added the SQL reproduction steps)
[jira] [Updated] (SPARK-28256) Failed to initialize FileContextBasedCheckpointFileManager with uri without authority
[ https://issues.apache.org/jira/browse/SPARK-28256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated SPARK-28256:

    Description: (marked the stack trace as {code:java})
[jira] [Created] (SPARK-28256) Failed to initialize FileContextBasedCheckpointFileManager with uri without authority
Genmao Yu created SPARK-28256:
---------------------------------

             Summary: Failed to initialize FileContextBasedCheckpointFileManager with uri without authority
                 Key: SPARK-28256
                 URL: https://issues.apache.org/jira/browse/SPARK-28256
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 3.0.0
            Reporter: Genmao Yu

{code}
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:136)
    at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:165)
    at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:250)
    at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:342)
    at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:339)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1911)
    at org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:339)
    at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:456)
    at org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.<init>(CheckpointFileManager.scala:297)
    at org.apache.spark.sql.execution.streaming.CheckpointFileManager$.create(CheckpointFileManager.scala:189)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.<init>(HDFSMetadataLog.scala:63)
    at org.apache.spark.sql.execution.streaming.CompactibleFileStreamLog.<init>(CompactibleFileStreamLog.scala:46)
    at org.apache.spark.sql.execution.streaming.FileStreamSinkLog.<init>(FileStreamSinkLog.scala:85)
    at org.apache.spark.sql.execution.streaming.FileStreamSink.<init>(FileStreamSink.scala:98)
    at org.apache.spark.sql.execution.datasources.DataSource.createSink(DataSource.scala:297)
    at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:379)
    ...
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:134)
    ... 67 more
Caused by: org.apache.hadoop.HadoopIllegalArgumentException: Uri without authority: hdfs:/tmp/test13/_spark_metadata
    at org.apache.hadoop.fs.AbstractFileSystem.getUri(AbstractFileSystem.java:313)
    at org.apache.hadoop.fs.AbstractFileSystem.<init>(AbstractFileSystem.java:266)
    at org.apache.hadoop.fs.Hdfs.<init>(Hdfs.java:80)
    ... 72 more
{code}
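For contrast, the older `FileSystem`-based code path qualifies a bare path against the default filesystem before use, borrowing the missing scheme and authority from `fs.defaultFS`; that is why `hdfs:///tmp/test` works there. A hedged sketch of that qualification step (the function name, the use of `urllib.parse`, and the `namenode:8020` default are illustrative assumptions, not Spark's or Hadoop's actual code):

```python
from urllib.parse import urlparse, urlunparse

def make_qualified(path: str, default_fs: str = "hdfs://namenode:8020") -> str:
    """Sketch of Path.makeQualified-style behavior: fill in a missing
    scheme and/or authority from the default filesystem URI."""
    p = urlparse(path)
    d = urlparse(default_fs)
    scheme = p.scheme or d.scheme
    authority = p.netloc or d.netloc
    return urlunparse((scheme, authority, p.path, "", "", ""))

# The authority-less path from the report becomes fully qualified:
make_qualified("hdfs:///tmp/test/_spark_metadata")
# -> "hdfs://namenode:8020/tmp/test/_spark_metadata"
```

A qualification step like this before constructing `FileContextBasedCheckpointFileManager` would avoid the `Uri without authority` failure, since `FileContext` would then see a URI with an explicit authority.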