[jira] [Assigned] (SPARK-28020) Add date.sql
[ https://issues.apache.org/jira/browse/SPARK-28020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-28020: - Assignee: Yuming Wang > Add date.sql > > > Key: SPARK-28020 > URL: https://issues.apache.org/jira/browse/SPARK-28020 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.0.0 > > > In this ticket, we plan to add the regression test cases of > https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/date.sql. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28020) Add date.sql
[ https://issues.apache.org/jira/browse/SPARK-28020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28020. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 24850 [https://github.com/apache/spark/pull/24850] > Add date.sql > > > Key: SPARK-28020 > URL: https://issues.apache.org/jira/browse/SPARK-28020 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > Fix For: 3.0.0 > > > In this ticket, we plan to add the regression test cases of > https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/date.sql. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27296) User Defined Aggregating Functions (UDAFs) have a major efficiency problem
[ https://issues.apache.org/jira/browse/SPARK-27296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Erlandson updated SPARK-27296: --- Target Version/s: 3.0.0 > User Defined Aggregating Functions (UDAFs) have a major efficiency problem > -- > > Key: SPARK-27296 > URL: https://issues.apache.org/jira/browse/SPARK-27296 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL, Structured Streaming >Affects Versions: 2.3.3, 2.4.0, 3.0.0 >Reporter: Erik Erlandson >Assignee: Erik Erlandson >Priority: Major > Labels: performance, usability > > Spark's UDAFs appear to be serializing and de-serializing to/from the > MutableAggregationBuffer for each row. This gist shows a small reproducing > UDAF and a spark shell session: > [https://gist.github.com/erikerlandson/3c4d8c6345d1521d89e0d894a423046f] > The UDAF and its companion UDT are designed to count the number of times > that ser/de is invoked for the aggregator. The spark shell session > demonstrates that it is executing ser/de on every row of the data frame. > Note, Spark's pre-defined aggregators do not have this problem, as they are > based on an internal aggregating trait that does the correct thing and only > calls ser/de at points such as partition boundaries, presenting final > results, etc. > This is a major problem for UDAFs, as it means that every UDAF is doing a > massive amount of unnecessary work per row, including but not limited to Row > object allocations. For a more realistic UDAF having its own non-trivial > internal structure it is obviously that much worse. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
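The per-row ser/de overhead described above can be simulated outside Spark. Below is a minimal plain-Python sketch (not Spark code; all names are illustrative): it counts codec invocations for an aggregator that round-trips its buffer on every row, versus one that only serializes at partition boundaries, which is the behavior the report attributes to Spark's built-in aggregators.

```python
class CountingCodec:
    """Tracks how many times an aggregation buffer is (de)serialized."""
    def __init__(self):
        self.serde_calls = 0

    def encode(self, buf):
        self.serde_calls += 1
        return tuple(buf)

    def decode(self, blob):
        self.serde_calls += 1
        return list(blob)

def aggregate_per_row_serde(rows, codec):
    # Mimics a UDAF that round-trips the buffer through ser/de on every row.
    blob = codec.encode([0])
    for r in rows:
        buf = codec.decode(blob)
        buf[0] += r
        blob = codec.encode(buf)
    return codec.decode(blob)[0]

def aggregate_boundary_serde(partitions, codec):
    # Mimics an aggregator that serializes only at partition boundaries.
    partials = []
    for part in partitions:
        buf = [0]
        for r in part:
            buf[0] += r
        partials.append(codec.encode(buf))   # one serialize per partition
    total = 0
    for blob in partials:
        total += codec.decode(blob)[0]       # one deserialize per partition
    return total
```

For 100 rows split into two partitions, the per-row variant performs 202 codec calls while the boundary variant performs 4 — the shape of the inefficiency the gist demonstrates.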
[jira] [Created] (SPARK-28266) data correctness issue: data duplication when `path` serde property is present
Ruslan Dautkhanov created SPARK-28266: - Summary: data correctness issue: data duplication when `path` serde property is present Key: SPARK-28266 URL: https://issues.apache.org/jira/browse/SPARK-28266 Project: Spark Issue Type: Bug Components: Optimizer, Spark Core Affects Versions: 2.4.3, 2.4.2, 2.4.1, 2.4.0, 2.3.3, 2.3.2, 2.3.1, 2.3.0, 2.2.3, 2.2.2, 2.2.1, 2.2.0, 2.3.4, 2.4.4, 3.0.0 Reporter: Ruslan Dautkhanov Spark duplicates returned datasets when the `path` serde property is present in a parquet table. Confirmed versions affected: Spark 2.2, Spark 2.3, Spark 2.4. Confirmed unaffected versions: Spark 2.1 and earlier (tested with Spark 1.6 at least). Reproducer: {code:python} >>> spark.sql("create table ruslan_test.test55 as select 1 as id") DataFrame[] >>> spark.table("ruslan_test.test55").explain() == Physical Plan == HiveTableScan [id#16], HiveTableRelation `ruslan_test`.`test55`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#16] >>> spark.table("ruslan_test.test55").count() 1 {code} (all is good at this point; now exit the session and run the following in Hive, for example:) {code:sql} ALTER TABLE ruslan_test.test55 SET SERDEPROPERTIES ( 'path'='hdfs://epsdatalake/hivewarehouse/ruslan_test.db/test55' ) {code} So LOCATION and the serde `path` property would point to the same location. 
Now the count returns two records instead of one: {code:python} >>> spark.table("ruslan_test.test55").count() 2 >>> spark.table("ruslan_test.test55").explain() == Physical Plan == *(1) FileScan parquet ruslan_test.test55[id#9] Batched: true, Format: Parquet, Location: InMemoryFileIndex[hdfs://epsdatalake/hivewarehouse/ruslan_test.db/test55, hdfs://epsdatalake/hive..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct >>> {code} Also notice that the presence of the `path` serde property makes the table LOCATION show up twice - {quote} InMemoryFileIndex[hdfs://epsdatalake/hivewarehouse/ruslan_test.db/test55, hdfs://epsdatalake/hive..., {quote} We have some applications that create parquet tables in Hive with the `path` serde property, and it duplicates data in query results. Hive, Impala, etc., and Spark 2.1 and earlier read such tables fine, but Spark 2.2 and later releases do not. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
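One plausible fix direction is to de-duplicate scan roots before building the file index, since here the table LOCATION and the `path` serde property point at the same directory. The sketch below is illustrative Python, not Spark's actual InMemoryFileIndex logic; the function name and the naive trailing-slash normalization are assumptions for the sketch.

```python
def dedup_scan_roots(location, serde_props):
    """Collect scan roots from a table's LOCATION and its optional
    `path` serde property, dropping duplicates so the same directory
    is never listed (and its files double-counted) twice."""
    roots = [location]
    serde_path = serde_props.get("path")
    if serde_path is not None:
        roots.append(serde_path)
    seen, unique = set(), []
    for r in roots:
        norm = r.rstrip("/")        # naive normalization, enough for the sketch
        if norm not in seen:
            seen.add(norm)
            unique.append(norm)
    return unique
```

With this kind of check, the reproducer above would yield a single scan root instead of two, so each file would be read once.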
[jira] [Created] (SPARK-28265) Missing TableCatalog API to rename table
Edgar Rodriguez created SPARK-28265: --- Summary: Missing TableCatalog API to rename table Key: SPARK-28265 URL: https://issues.apache.org/jira/browse/SPARK-28265 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.3 Reporter: Edgar Rodriguez In the [Table Metadata API SPIP|https://docs.google.com/document/d/1zLFiA1VuaWeVxeTDXNg8bL6GP3BVoOZBkewFtEnjEoo/edit#] ([SPARK-27658|https://issues.apache.org/jira/browse/SPARK-27067]) the {{renameTable}} operation for the TableCatalog API is defined as: {code:java} renameTable(CatalogIdentifier from, CatalogIdentifier to): Unit{code} However, it was not included in the PR implementing it, [https://github.com/apache/spark/pull/24246]. Is this method missing, or is it intentionally unsupported? Thanks! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28264) Revisiting Python / pandas UDF
Reynold Xin created SPARK-28264: --- Summary: Revisiting Python / pandas UDF Key: SPARK-28264 URL: https://issues.apache.org/jira/browse/SPARK-28264 Project: Spark Issue Type: Improvement Components: PySpark, SQL Affects Versions: 3.0.0 Reporter: Reynold Xin Assignee: Reynold Xin Over the past two years, pandas UDFs have perhaps been the most important change to Spark for Python data science. However, these functionalities have evolved organically, leading to some inconsistencies and confusion among users. This document revisits UDF definition and naming, as a result of discussions among Xiangrui, Li Jin, Hyukjin, and Reynold. See document here: [https://docs.google.com/document/d/10Pkl-rqygGao2xQf6sddt0b-4FYK4g8qr_bXLKTL65A/edit#|https://docs.google.com/document/d/10Pkl-rqygGao2xQf6sddt0b-4FYK4g8qr_bXLKTL65A/edit] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25994) SPIP: Property Graphs, Cypher Queries, and Algorithms
[ https://issues.apache.org/jira/browse/SPARK-25994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879508#comment-16879508 ] Sam hendley commented on SPARK-25994: - Forgive the intrusion, but it is unclear to me whether there will still be a 'low-level' Pregel API as part of this redesign. If so, I have a few modifications I would like to propose to the Pregel API to make it more useful and easier to track/debug, if you could point me to an appropriate ticket. > SPIP: Property Graphs, Cypher Queries, and Algorithms > - > > Key: SPARK-25994 > URL: https://issues.apache.org/jira/browse/SPARK-25994 > Project: Spark > Issue Type: Epic > Components: Graph >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Martin Junghanns >Priority: Major > Labels: SPIP > > Copied from the SPIP doc: > {quote} > GraphX was one of the foundational pillars of the Spark project, and is the > current graph component. This reflects the importance of the graphs data > model, which naturally pairs with an important class of analytic function, > the network or graph algorithm. > However, GraphX is not actively maintained. It is based on RDDs, and cannot > exploit Spark 2’s Catalyst query engine. GraphX is only available to Scala > users. > GraphFrames is a Spark package, which implements DataFrame-based graph > algorithms, and also incorporates simple graph pattern matching with fixed > length patterns (called “motifs”). GraphFrames is based on DataFrames, but > has a semantically weak graph data model (based on untyped edges and > vertices). The motif pattern matching facility is very limited by comparison > with the well-established Cypher language. > The Property Graph data model has become quite widespread in recent years, > and is the primary focus of commercial graph data management and of graph > data research, both for on-premises and cloud data management. Many users of > transactional graph databases also wish to work with immutable graphs in > Spark. 
> The idea is to define a Cypher-compatible Property Graph type based on > DataFrames; to replace GraphFrames querying with Cypher; to reimplement > GraphX/GraphFrames algos on the PropertyGraph type. > To achieve this goal, a core subset of Cypher for Apache Spark (CAPS), > reusing existing proven designs and code, will be employed in Spark 3.0. This > graph query processor, like CAPS, will overlay and drive the SparkSQL > Catalyst query engine, using the CAPS graph query planner. > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28263) Spark-submit can not find class (ClassNotFoundException)
Zhiyuan created SPARK-28263: --- Summary: Spark-submit can not find class (ClassNotFoundException) Key: SPARK-28263 URL: https://issues.apache.org/jira/browse/SPARK-28263 Project: Spark Issue Type: Bug Components: Spark Shell, Spark Submit Affects Versions: 2.4.3 Reporter: Zhiyuan I tried to run the Main class from my jar using the following command in a script: {code:java} spark-shell --class com.navercorp.Main /target/node2vec-0.0.1-SNAPSHOT.jar --cmd node2vec ../graph/karate.edgelist --output ../walk/walk.txt {code} But it raises this error: {code:java} 19/07/05 14:39:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 19/07/05 14:39:20 WARN deploy.SparkSubmit$$anon$2: Failed to load com.navercorp.Main. java.lang.ClassNotFoundException: com.navercorp.Main at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at org.apache.spark.util.Utils$.classForName(Utils.scala:238) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:810) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala){code} I have the jar file in my folder; this is the structure: {code:java} 1node2vec 2node2vec_spark 3main 4resources 4com 5novercorp 6lib 7Main 7Node2vec 7Word2vec 2target 3lib 3classes 3maven-archiver 3node2vec-0.0.1-SNAPSHOT.jar 2graph 3---karate.edgelist 2walk 3walk.txt {code} Also, I attach the 
structure of the jar file: {code:java} META-INF/ META-INF/MANIFEST.MF log4j2.properties com/ com/navercorp/ com/navercorp/Node2vec$.class com/navercorp/Main$Params$$typecreator1$1.class com/navercorp/Main$$anon$1$$anonfun$11.class com/navercorp/Word2vec$.class com/navercorp/Main$$anon$1$$anonfun$8.class com/navercorp/Node2vec$$anonfun$randomWalk$1$$anonfun$8.class com/navercorp/Node2vec$$anonfun$indexingGraph$4.class com/navercorp/Node2vec$$anonfun$initTransitionProb$1.class com/navercorp/Main$.class com/navercorp/Node2vec$$anonfun$loadNode2Id$1.class com/navercorp/Node2vec$$anonfun$14.class com/navercorp/Node2vec$$anonfun$readIndexedGraph$2$$anonfun$1.class {code} Could someone advise me on how to get the Main class loaded? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
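Since a jar is just a zip archive, one quick diagnostic for a ClassNotFoundException like the one above is to confirm that the jar really contains the entry the fully-qualified class name maps to (com.navercorp.Main must appear as com/navercorp/Main.class). A small standard-library Python sketch, using a toy in-memory jar in place of the real one:

```python
import io
import zipfile

def class_entry(fqcn):
    # A JVM class com.navercorp.Main lives at com/navercorp/Main.class.
    return fqcn.replace(".", "/") + ".class"

def jar_contains_class(jar_bytes, fqcn):
    # A jar is a zip archive, so we can inspect its entry listing directly.
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return class_entry(fqcn) in jar.namelist()

# Build a toy "jar" to demonstrate the check (contents are placeholders).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as jar:
    jar.writestr("com/navercorp/Main.class", b"\xca\xfe\xba\xbe")
```

Running the same check against the real node2vec-0.0.1-SNAPSHOT.jar (e.g. via `jar tf` or `unzip -l`) would show whether the class is actually packaged, or whether the path passed to the launcher is wrong.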
[jira] [Assigned] (SPARK-27296) User Defined Aggregating Functions (UDAFs) have a major efficiency problem
[ https://issues.apache.org/jira/browse/SPARK-27296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Erlandson reassigned SPARK-27296: -- Assignee: Erik Erlandson > User Defined Aggregating Functions (UDAFs) have a major efficiency problem > -- > > Key: SPARK-27296 > URL: https://issues.apache.org/jira/browse/SPARK-27296 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL, Structured Streaming >Affects Versions: 2.3.3, 2.4.0, 3.0.0 >Reporter: Erik Erlandson >Assignee: Erik Erlandson >Priority: Major > Labels: performance, usability > > Spark's UDAFs appear to be serializing and de-serializing to/from the > MutableAggregationBuffer for each row. This gist shows a small reproducing > UDAF and a spark shell session: > [https://gist.github.com/erikerlandson/3c4d8c6345d1521d89e0d894a423046f] > The UDAF and its companion UDT are designed to count the number of times > that ser/de is invoked for the aggregator. The spark shell session > demonstrates that it is executing ser/de on every row of the data frame. > Note, Spark's pre-defined aggregators do not have this problem, as they are > based on an internal aggregating trait that does the correct thing and only > calls ser/de at points such as partition boundaries, presenting final > results, etc. > This is a major problem for UDAFs, as it means that every UDAF is doing a > massive amount of unnecessary work per row, including but not limited to Row > object allocations. For a more realistic UDAF having its own non-trivial > internal structure it is obviously that much worse. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28206) "@pandas_udf" in doctest is rendered as ":pandas_udf" in html API doc
[ https://issues.apache.org/jira/browse/SPARK-28206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng reassigned SPARK-28206: - Assignee: Hyukjin Kwon > "@pandas_udf" in doctest is rendered as ":pandas_udf" in html API doc > - > > Key: SPARK-28206 > URL: https://issues.apache.org/jira/browse/SPARK-28206 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 2.4.1 >Reporter: Xiangrui Meng >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.0.0 > > Attachments: Screen Shot 2019-06-28 at 9.55.13 AM.png > > > Just noticed that in the [pandas_udf API doc > |https://spark.apache.org/docs/2.4.1/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf], > "@pandas_udf" is rendered as ":pandas_udf". > cc: [~hyukjin.kwon] [~smilegator] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28206) "@pandas_udf" in doctest is rendered as ":pandas_udf" in html API doc
[ https://issues.apache.org/jira/browse/SPARK-28206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-28206. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25060 [https://github.com/apache/spark/pull/25060] > "@pandas_udf" in doctest is rendered as ":pandas_udf" in html API doc > - > > Key: SPARK-28206 > URL: https://issues.apache.org/jira/browse/SPARK-28206 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 2.4.1 >Reporter: Xiangrui Meng >Priority: Major > Fix For: 3.0.0 > > Attachments: Screen Shot 2019-06-28 at 9.55.13 AM.png > > > Just noticed that in the [pandas_udf API doc > |https://spark.apache.org/docs/2.4.1/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf], > "@pandas_udf" is rendered as ":pandas_udf". > cc: [~hyukjin.kwon] [~smilegator] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27898) Support 4 date operators(date + integer, integer + date, date - integer and date - date)
[ https://issues.apache.org/jira/browse/SPARK-27898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-27898. --- Resolution: Fixed Assignee: Yuming Wang Fix Version/s: 3.0.0 This is resolved via https://github.com/apache/spark/pull/24755 > Support 4 date operators(date + integer, integer + date, date - integer and > date - date) > > > Key: SPARK-27898 > URL: https://issues.apache.org/jira/browse/SPARK-27898 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.0.0 > > > Support 4 date operators(date + integer, integer + date, date - integer and > date - date): > ||Operator||Example||Result|| > |+|date '2001-09-28' + integer '7'|date '2001-10-05'| > |-|date '2001-10-01' - integer '7'|date '2001-09-24'| > |-|date '2001-10-01' - date '2001-09-28'|integer '3' (days)| > [https://www.postgresql.org/docs/12/functions-datetime.html] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
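For reference, the three operators in the table above follow ordinary calendar arithmetic. The same results expressed with Python's datetime module (an illustration of the semantics, not Spark's implementation):

```python
from datetime import date, timedelta

# date + integer: adds that many days
assert date(2001, 9, 28) + timedelta(days=7) == date(2001, 10, 5)

# date - integer: subtracts that many days
assert date(2001, 10, 1) - timedelta(days=7) == date(2001, 9, 24)

# date - date: yields an integer day count
assert (date(2001, 10, 1) - date(2001, 9, 28)).days == 3
```

Note that `integer + date` is the same operation as `date + integer` with the operands swapped, which is why the PR covers all four forms.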
[jira] [Updated] (SPARK-28262) Support DROP GLOBAL TEMPORARY VIEW
[ https://issues.apache.org/jira/browse/SPARK-28262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-28262: Description: We create a global temporary view by: CREATE GLOBAL TEMPORARY VIEW temp_view AS SELECT 1 AS col1; But we need to specify {{spark.sql.globalTempDatabase}} when dropping it: DROP VIEW global_temp.temp_view; This is not very convenient; we should add support for {{DROP GLOBAL TEMPORARY VIEW}}. > Support DROP GLOBAL TEMPORARY VIEW > -- > > Key: SPARK-28262 > URL: https://issues.apache.org/jira/browse/SPARK-28262 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > We create a global temporary view by: > CREATE GLOBAL TEMPORARY VIEW temp_view AS SELECT 1 AS col1; > But we need to specify {{spark.sql.globalTempDatabase}} when dropping it: > DROP VIEW global_temp.temp_view; > This is not very convenient; we should add support for {{DROP GLOBAL TEMPORARY > VIEW}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28262) Support DROP GLOBAL TEMPORARY VIEW
[ https://issues.apache.org/jira/browse/SPARK-28262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28262: Assignee: Apache Spark > Support DROP GLOBAL TEMPORARY VIEW > -- > > Key: SPARK-28262 > URL: https://issues.apache.org/jira/browse/SPARK-28262 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28262) Support DROP GLOBAL TEMPORARY VIEW
[ https://issues.apache.org/jira/browse/SPARK-28262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28262: Assignee: (was: Apache Spark) > Support DROP GLOBAL TEMPORARY VIEW > -- > > Key: SPARK-28262 > URL: https://issues.apache.org/jira/browse/SPARK-28262 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28262) Support DROP GLOBAL TEMPORARY VIEW
Yuming Wang created SPARK-28262: --- Summary: Support DROP GLOBAL TEMPORARY VIEW Key: SPARK-28262 URL: https://issues.apache.org/jira/browse/SPARK-28262 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Yuming Wang -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28261) Flaky test: org.apache.spark.network.TransportClientFactorySuite.reuseClientsUpToConfigVariable
[ https://issues.apache.org/jira/browse/SPARK-28261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Somogyi updated SPARK-28261: -- Description: Error message: {noformat} java.lang.AssertionError: expected:<3> but was:<4> ...{noformat} > Flaky test: > org.apache.spark.network.TransportClientFactorySuite.reuseClientsUpToConfigVariable > --- > > Key: SPARK-28261 > URL: https://issues.apache.org/jira/browse/SPARK-28261 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 3.0.0 >Reporter: Gabor Somogyi >Priority: Minor > > Error message: > {noformat} > java.lang.AssertionError: expected:<3> but was:<4> > ...{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28261) Flaky test: org.apache.spark.network.TransportClientFactorySuite.reuseClientsUpToConfigVariable
Gabor Somogyi created SPARK-28261: - Summary: Flaky test: org.apache.spark.network.TransportClientFactorySuite.reuseClientsUpToConfigVariable Key: SPARK-28261 URL: https://issues.apache.org/jira/browse/SPARK-28261 Project: Spark Issue Type: Bug Components: Spark Core, Tests Affects Versions: 3.0.0 Reporter: Gabor Somogyi -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28261) Flaky test: org.apache.spark.network.TransportClientFactorySuite.reuseClientsUpToConfigVariable
[ https://issues.apache.org/jira/browse/SPARK-28261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879350#comment-16879350 ] Gabor Somogyi commented on SPARK-28261: --- I'm working on this. > Flaky test: > org.apache.spark.network.TransportClientFactorySuite.reuseClientsUpToConfigVariable > --- > > Key: SPARK-28261 > URL: https://issues.apache.org/jira/browse/SPARK-28261 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 3.0.0 >Reporter: Gabor Somogyi >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28260) Add CLOSED state to ExecutionState
[ https://issues.apache.org/jira/browse/SPARK-28260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879349#comment-16879349 ] Yuming Wang commented on SPARK-28260: - I'm working on this. > Add CLOSED state to ExecutionState > -- > > Key: SPARK-28260 > URL: https://issues.apache.org/jira/browse/SPARK-28260 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > Currently, the ThriftServerTab displays a FINISHED state when the operation > finishes execution, but quite often it still takes a lot of time to fetch the > results. OperationState has a CLOSED state for after the iterator is > closed. Could we add a CLOSED state to ExecutionState, and override the close() > in SparkExecuteStatement / GetSchemas / GetTables / GetColumns to do > HiveThriftServerListener.onOperationClosed? > > https://github.com/apache/spark/pull/25043#issuecomment-508722874 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28260) Add CLOSED state to ExecutionState
Yuming Wang created SPARK-28260: --- Summary: Add CLOSED state to ExecutionState Key: SPARK-28260 URL: https://issues.apache.org/jira/browse/SPARK-28260 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Yuming Wang Currently, the ThriftServerTab displays a FINISHED state when the operation finishes execution, but quite often it still takes a lot of time to fetch the results. OperationState has a CLOSED state for after the iterator is closed. Could we add a CLOSED state to ExecutionState, and override the close() in SparkExecuteStatement / GetSchemas / GetTables / GetColumns to do HiveThriftServerListener.onOperationClosed? https://github.com/apache/spark/pull/25043#issuecomment-508722874 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
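As a sketch of the proposed lifecycle, the toy tracker below adds a CLOSED state that fires from a close() hook after FINISHED. This is illustrative Python only; the real ExecutionState enum and HiveThriftServerListener live in Spark's Thrift server code, and their members and signatures may differ from what is shown here.

```python
from enum import Enum

# Illustrative subset of states; the real enum may have more members.
class ExecutionState(Enum):
    STARTED = "STARTED"
    FINISHED = "FINISHED"
    CLOSED = "CLOSED"      # proposed: result iterator closed, fetch complete

class OperationTracker:
    """Toy stand-in for the listener-driven state tracking described above."""
    def __init__(self):
        self.state = ExecutionState.STARTED
        self.closed_events = []

    def on_finished(self):
        # Execution is done, but results may still be streaming to the client.
        self.state = ExecutionState.FINISHED

    def on_closed(self, op_id):
        # Proposed close() hook: fires once the result iterator is closed,
        # letting the UI distinguish "finished executing" from "fully fetched".
        self.state = ExecutionState.CLOSED
        self.closed_events.append(op_id)
```

With this distinction, a UI like ThriftServerTab could show long fetch phases explicitly instead of leaving operations in FINISHED.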
[jira] [Assigned] (SPARK-28200) Decimal overflow handling in ExpressionEncoder
[ https://issues.apache.org/jira/browse/SPARK-28200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-28200: --- Assignee: Mick Jermsurawong > Decimal overflow handling in ExpressionEncoder > -- > > Key: SPARK-28200 > URL: https://issues.apache.org/jira/browse/SPARK-28200 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Marco Gaido >Assignee: Mick Jermsurawong >Priority: Major > Fix For: 3.0.0 > > > As pointed out in https://github.com/apache/spark/pull/20350, we are > currently not checking the overflow when serializing a java/scala > `BigDecimal` in `ExpressionEncoder` / `ScalaReflection`. > We should add this check there too. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28200) Decimal overflow handling in ExpressionEncoder
[ https://issues.apache.org/jira/browse/SPARK-28200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-28200. - Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25016 [https://github.com/apache/spark/pull/25016] > Decimal overflow handling in ExpressionEncoder > -- > > Key: SPARK-28200 > URL: https://issues.apache.org/jira/browse/SPARK-28200 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Marco Gaido >Priority: Major > Fix For: 3.0.0 > > > As pointed out in https://github.com/apache/spark/pull/20350, we are > currently not checking the overflow when serializing a java/scala > `BigDecimal` in `ExpressionEncoder` / `ScalaReflection`. > We should add this check there too. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
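The kind of check being requested can be sketched with Python's decimal module: round the value to the target scale, then reject it if it needs more total digits than the declared precision. This mirrors DecimalType(precision, scale) semantics only loosely; the function name and its None-on-overflow contract are assumptions for illustration, not Spark's actual ExpressionEncoder behavior.

```python
from decimal import Decimal, ROUND_HALF_UP

def check_fits(value, precision, scale):
    """Return `value` rounded to `scale`, or None if the rounded value
    needs more than `precision` total digits (i.e. it would overflow)."""
    quantum = Decimal(1).scaleb(-scale)          # e.g. scale=2 -> Decimal("0.01")
    rounded = value.quantize(quantum, rounding=ROUND_HALF_UP)
    digits = len(rounded.as_tuple().digits)      # total significant digits
    if digits > precision:
        return None                              # overflow: caller raises or nulls
    return rounded
```

The point of the ticket is that the serialization path should perform a check like this instead of silently producing an out-of-range value.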
[jira] [Updated] (SPARK-28259) Date/Time Output Styles and Date Order Conventions
[ https://issues.apache.org/jira/browse/SPARK-28259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-28259: Description: *Date/Time Output Styles* ||Style Specification||Description||Example|| |{{ISO}}|ISO 8601, SQL standard|{{1997-12-17 07:37:16-08}}| |{{SQL}}|traditional style|{{12/17/1997 07:37:16.00 PST}}| |{{Postgres}}|original style|{{Wed Dec 17 07:37:16 1997 PST}}| |{{German}}|regional style|{{17.12.1997 07:37:16.00 PST}}| [https://www.postgresql.org/docs/11/datatype-datetime.html#DATATYPE-DATETIME-OUTPUT-TABLE] *Date Order Conventions* ||{{datestyle}} Setting||Input Ordering||Example Output|| |{{SQL, DMY}}|_{{day}}_/_{{month}}_/_{{year}}_|{{17/12/1997 15:37:16.00 CET}}| |{{SQL, MDY}}|_{{month}}_/_{{day}}_/_{{year}}_|{{12/17/1997 07:37:16.00 PST}}| |{{Postgres, DMY}}|_{{day}}_/_{{month}}_/_{{year}}_|{{Wed 17 Dec 07:37:16 1997 PST}}| [https://www.postgresql.org/docs/11/datatype-datetime.html#DATATYPE-DATETIME-OUTPUT2-TABLE] was: *Date/Time Output Styles* ||Style Specification||Description||Example|| |{{ISO}}|ISO 8601, SQL standard|{{1997-12-17 07:37:16-08}}| |{{SQL}}|traditional style|{{12/17/1997 07:37:16.00 PST}}| |{{Postgres}}|original style|{{Wed Dec 17 07:37:16 1997 PST}}| |{{German}}|regional style|{{17.12.1997 07:37:16.00 PST}}| [ https://www.postgresql.org/docs/11/datatype-datetime.html#DATATYPE-DATETIME-OUTPUT-TABLE] *Date Order Conventions* ||{{datestyle}} Setting||Input Ordering||Example Output|| |{{SQL, DMY}}|_{{day}}_/_{{month}}_/_{{year}}_|{{17/12/1997 15:37:16.00 CET}}| |{{SQL, MDY}}|_{{month}}_/_{{day}}_/_{{year}}_|{{12/17/1997 07:37:16.00 PST}}| |{{Postgres, DMY}}|_{{day}}_/_{{month}}_/_{{year}}_|{{Wed 17 Dec 07:37:16 1997 PST}}| [https://www.postgresql.org/docs/11/datatype-datetime.html#DATATYPE-DATETIME-OUTPUT2-TABLE] > Date/Time Output Styles and Date Order Conventions > -- > > Key: SPARK-28259 > URL: https://issues.apache.org/jira/browse/SPARK-28259 > Project: Spark > Issue Type: Sub-task > 
Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > *Date/Time Output Styles* > ||Style Specification||Description||Example|| > |{{ISO}}|ISO 8601, SQL standard|{{1997-12-17 07:37:16-08}}| > |{{SQL}}|traditional style|{{12/17/1997 07:37:16.00 PST}}| > |{{Postgres}}|original style|{{Wed Dec 17 07:37:16 1997 PST}}| > |{{German}}|regional style|{{17.12.1997 07:37:16.00 PST}}| > [https://www.postgresql.org/docs/11/datatype-datetime.html#DATATYPE-DATETIME-OUTPUT-TABLE] > > *Date Order Conventions* > ||{{datestyle}} Setting||Input Ordering||Example Output|| > |{{SQL, DMY}}|_{{day}}_/_{{month}}_/_{{year}}_|{{17/12/1997 15:37:16.00 CET}}| > |{{SQL, MDY}}|_{{month}}_/_{{day}}_/_{{year}}_|{{12/17/1997 07:37:16.00 PST}}| > |{{Postgres, DMY}}|_{{day}}_/_{{month}}_/_{{year}}_|{{Wed 17 Dec 07:37:16 > 1997 PST}}| > [https://www.postgresql.org/docs/11/datatype-datetime.html#DATATYPE-DATETIME-OUTPUT2-TABLE] > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28259) Date/Time Output Styles and Date Order Conventions
Yuming Wang created SPARK-28259: --- Summary: Date/Time Output Styles and Date Order Conventions Key: SPARK-28259 URL: https://issues.apache.org/jira/browse/SPARK-28259 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Yuming Wang *Date/Time Output Styles* ||Style Specification||Description||Example|| |{{ISO}}|ISO 8601, SQL standard|{{1997-12-17 07:37:16-08}}| |{{SQL}}|traditional style|{{12/17/1997 07:37:16.00 PST}}| |{{Postgres}}|original style|{{Wed Dec 17 07:37:16 1997 PST}}| |{{German}}|regional style|{{17.12.1997 07:37:16.00 PST}}| [ https://www.postgresql.org/docs/11/datatype-datetime.html#DATATYPE-DATETIME-OUTPUT-TABLE] *Date Order Conventions* ||{{datestyle}} Setting||Input Ordering||Example Output|| |{{SQL, DMY}}|_{{day}}_/_{{month}}_/_{{year}}_|{{17/12/1997 15:37:16.00 CET}}| |{{SQL, MDY}}|_{{month}}_/_{{day}}_/_{{year}}_|{{12/17/1997 07:37:16.00 PST}}| |{{Postgres, DMY}}|_{{day}}_/_{{month}}_/_{{year}}_|{{Wed 17 Dec 07:37:16 1997 PST}}| [https://www.postgresql.org/docs/11/datatype-datetime.html#DATATYPE-DATETIME-OUTPUT2-TABLE] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
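The four output styles in the table above can be approximated in plain Python; this is an illustrative sketch of the formats only (the strftime patterns and the hard-coded zone suffixes are assumptions, not PostgreSQL's or Spark's actual rendering code):

```python
from datetime import datetime

# Approximate PostgreSQL's four DateStyle output formats for the timestamp
# used in the table (1997-12-17 07:37:16, zone PST / offset -08).
# The format strings and literal zone suffixes are illustrative assumptions.
ts = datetime(1997, 12, 17, 7, 37, 16)

styles = {
    "ISO":      ts.strftime("%Y-%m-%d %H:%M:%S") + "-08",      # 1997-12-17 07:37:16-08
    "SQL":      ts.strftime("%m/%d/%Y %H:%M:%S.00") + " PST",  # 12/17/1997 07:37:16.00 PST
    "Postgres": ts.strftime("%a %b %d %H:%M:%S %Y") + " PST",  # Wed Dec 17 07:37:16 1997 PST
    "German":   ts.strftime("%d.%m.%Y %H:%M:%S.00") + " PST",  # 17.12.1997 07:37:16.00 PST
}

for name, rendered in styles.items():
    print(f"{name}: {rendered}")
```

The DMY/MDY datestyle conventions in the second table differ only in the ordering of the %d and %m fields of the SQL and Postgres styles.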
[jira] [Assigned] (SPARK-28206) "@pandas_udf" in doctest is rendered as ":pandas_udf" in html API doc
[ https://issues.apache.org/jira/browse/SPARK-28206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28206: Assignee: (was: Apache Spark) > "@pandas_udf" in doctest is rendered as ":pandas_udf" in html API doc > - > > Key: SPARK-28206 > URL: https://issues.apache.org/jira/browse/SPARK-28206 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 2.4.1 >Reporter: Xiangrui Meng >Priority: Major > Attachments: Screen Shot 2019-06-28 at 9.55.13 AM.png > > > Just noticed that in the [pandas_udf API doc > |https://spark.apache.org/docs/2.4.1/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf], > "@pandas_udf" is rendered as ":pandas_udf". > cc: [~hyukjin.kwon] [~smilegator]
[jira] [Assigned] (SPARK-28206) "@pandas_udf" in doctest is rendered as ":pandas_udf" in html API doc
[ https://issues.apache.org/jira/browse/SPARK-28206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28206: Assignee: Apache Spark > "@pandas_udf" in doctest is rendered as ":pandas_udf" in html API doc > - > > Key: SPARK-28206 > URL: https://issues.apache.org/jira/browse/SPARK-28206 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 2.4.1 >Reporter: Xiangrui Meng >Assignee: Apache Spark >Priority: Major > Attachments: Screen Shot 2019-06-28 at 9.55.13 AM.png > > > Just noticed that in the [pandas_udf API doc > |https://spark.apache.org/docs/2.4.1/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf], > "@pandas_udf" is rendered as ":pandas_udf". > cc: [~hyukjin.kwon] [~smilegator]
[jira] [Updated] (SPARK-28258) Incompatibility between spark docker image and hadoop 3.2 and azure tools
[ https://issues.apache.org/jira/browse/SPARK-28258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Luis Pedrosa updated SPARK-28258: -- Description: Currently the docker images generated by the distro use openjdk8 based on alpine. This means that the shipped version of libssl is 1.1.1b-r1: {noformat} sh-4.4# apk list | grep ssl libssl1.1-1.1.1b-r1 x86_64 {openssl} (OpenSSL) [installed] {noformat} The hadoop distro ships wildfly-openssl-1.0.4.Final.jar, which is affected by [https://issues.jboss.org/browse/JBEAP-16425]. This results in an error on the executor: {noformat} 2019-07-04 22:32:40,339 INFO openssl.SSL: WFOPENSSL0002 OpenSSL Version OpenSSL 1.1.1b 26 Feb 2019 2019-07-04 22:32:40,363 WARN streaming.FileStreamSink: Error while looking for metadata directory. Exception in thread "main" java.lang.NullPointerException at org.wildfly.openssl.CipherSuiteConverter.toJava(CipherSuiteConverter.java:284) {noformat} In my tests, building a Docker image with an updated version of wildfly (1.0.7.Final) solves the issue. Not sure if this is a Spark problem; if so, where would the right place to solve it be? It seems this may be taken care of in Hadoop directly, but those tickets are still open. https://issues.apache.org/jira/browse/HADOOP-16410 https://issues.apache.org/jira/browse/HADOOP-16405 was: Currently the docker images generated by the distro use openjdk8 based on alpine. This means that the shipped version of libssl is 1.1.1b-r1: {noformat} sh-4.4# apk list | grep ssl libssl1.1-1.1.1b-r1 x86_64 {openssl} (OpenSSL) [installed] {noformat} The hadoop distro ships wildfly-openssl-1.0.4.Final.jar, which is affected by https://issues.jboss.org/browse/JBEAP-16425. This results in an error on the executor: {noformat} 2019-07-04 22:32:40,339 INFO openssl.SSL: WFOPENSSL0002 OpenSSL Version OpenSSL 1.1.1b 26 Feb 2019 2019-07-04 22:32:40,363 WARN streaming.FileStreamSink: Error while looking for metadata directory. 
Exception in thread "main" java.lang.NullPointerException at org.wildfly.openssl.CipherSuiteConverter.toJava(CipherSuiteConverter.java:284) {noformat} In my tests, building a Docker image with an updated version of wildfly (1.0.7.Final) solves the issue. Not sure if this is a Spark problem; if so, where would the right place to solve it be? > Incompatibility between spark docker image and hadoop 3.2 and azure tools > > > Key: SPARK-28258 > URL: https://issues.apache.org/jira/browse/SPARK-28258 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.3 >Reporter: Jose Luis Pedrosa >Priority: Minor > > Currently the docker images generated by the distro use openjdk8 based on > alpine. > This means that the shipped version of libssl is 1.1.1b-r1: > > {noformat} > sh-4.4# apk list | grep ssl > libssl1.1-1.1.1b-r1 x86_64 {openssl} (OpenSSL) [installed] > {noformat} > The hadoop distro ships wildfly-openssl-1.0.4.Final.jar, which is affected by > [https://issues.jboss.org/browse/JBEAP-16425]. > This results in an error on the executor: > {noformat} > 2019-07-04 22:32:40,339 INFO openssl.SSL: WFOPENSSL0002 OpenSSL Version > OpenSSL 1.1.1b 26 Feb 2019 > 2019-07-04 22:32:40,363 WARN streaming.FileStreamSink: Error while looking > for metadata directory. > Exception in thread "main" java.lang.NullPointerException > at > org.wildfly.openssl.CipherSuiteConverter.toJava(CipherSuiteConverter.java:284) > {noformat} > In my tests, building a Docker image with an updated version of wildfly > (1.0.7.Final) solves the issue. > Not sure if this is a Spark problem; if so, where would the right place > to solve it be? > It seems this may be taken care of in Hadoop directly, but those tickets are still open. 
> https://issues.apache.org/jira/browse/HADOOP-16410 > https://issues.apache.org/jira/browse/HADOOP-16405
[jira] [Resolved] (SPARK-28241) Show metadata operations on ThriftServerTab
[ https://issues.apache.org/jira/browse/SPARK-28241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell resolved SPARK-28241. --- Resolution: Fixed Assignee: Yuming Wang Fix Version/s: 3.0.0 > Show metadata operations on ThriftServerTab > --- > > Key: SPARK-28241 > URL: https://issues.apache.org/jira/browse/SPARK-28241 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.0.0 > > > !https://user-images.githubusercontent.com/5399861/60579741-4cd2c180-9db6-11e9-822a-0433be509b67.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28258) Incompatibility between spark docker image and hadoop 3.2 and azure tools
Jose Luis Pedrosa created SPARK-28258: - Summary: Incompatibility between spark docker image and hadoop 3.2 and azure tools Key: SPARK-28258 URL: https://issues.apache.org/jira/browse/SPARK-28258 Project: Spark Issue Type: Bug Components: Kubernetes Affects Versions: 2.4.3 Reporter: Jose Luis Pedrosa Currently the docker images generated by the distro use openjdk8 based on alpine. This means that the shipped version of libssl is 1.1.1b-r1: {noformat} sh-4.4# apk list | grep ssl libssl1.1-1.1.1b-r1 x86_64 {openssl} (OpenSSL) [installed] {noformat} The hadoop distro ships wildfly-openssl-1.0.4.Final.jar, which is affected by https://issues.jboss.org/browse/JBEAP-16425. This results in an error on the executor: {noformat} 2019-07-04 22:32:40,339 INFO openssl.SSL: WFOPENSSL0002 OpenSSL Version OpenSSL 1.1.1b 26 Feb 2019 2019-07-04 22:32:40,363 WARN streaming.FileStreamSink: Error while looking for metadata directory. Exception in thread "main" java.lang.NullPointerException at org.wildfly.openssl.CipherSuiteConverter.toJava(CipherSuiteConverter.java:284) {noformat} In my tests, building a Docker image with an updated version of wildfly (1.0.7.Final) solves the issue. Not sure if this is a Spark problem; if so, where would the right place to solve it be?
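The workaround the reporter describes (swapping the bundled wildfly-openssl 1.0.4.Final jar for 1.0.7.Final in the image) could be sketched as a Dockerfile layer; the base image name and jar paths below are assumptions for illustration, not project defaults:

```dockerfile
# Hypothetical sketch of the reporter's workaround: rebuild the Spark image
# with wildfly-openssl 1.0.7.Final in place of the 1.0.4.Final jar that the
# Hadoop distro ships. Base image name and paths are assumptions.
FROM my-registry/spark:2.4.3-hadoop3.2

# Drop the jar affected by JBEAP-16425 ...
RUN rm -f /opt/spark/jars/wildfly-openssl-1.0.4.Final.jar

# ... and add the fixed release from Maven Central.
ADD https://repo1.maven.org/maven2/org/wildfly/openssl/wildfly-openssl/1.0.7.Final/wildfly-openssl-1.0.7.Final.jar \
    /opt/spark/jars/
```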
[jira] [Commented] (SPARK-28206) "@pandas_udf" in doctest is rendered as ":pandas_udf" in html API doc
[ https://issues.apache.org/jira/browse/SPARK-28206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879121#comment-16879121 ] Hyukjin Kwon commented on SPARK-28206: -- This is a side effect of the Epydoc doc plugin, which is legacy in PySpark. I am not sure if I can get rid of it and replace all Epydoc-specific syntax, but let me try. > "@pandas_udf" in doctest is rendered as ":pandas_udf" in html API doc > - > > Key: SPARK-28206 > URL: https://issues.apache.org/jira/browse/SPARK-28206 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 2.4.1 >Reporter: Xiangrui Meng >Priority: Major > Attachments: Screen Shot 2019-06-28 at 9.55.13 AM.png > > > Just noticed that in the [pandas_udf API doc > |https://spark.apache.org/docs/2.4.1/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf], > "@pandas_udf" is rendered as ":pandas_udf". > cc: [~hyukjin.kwon] [~smilegator]
[jira] [Assigned] (SPARK-28257) Use ConfigEntry for hardcoded configs in SQL module
[ https://issues.apache.org/jira/browse/SPARK-28257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28257: Assignee: Apache Spark > Use ConfigEntry for hardcoded configs in SQL module > --- > > Key: SPARK-28257 > URL: https://issues.apache.org/jira/browse/SPARK-28257 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: EdisonWang >Assignee: Apache Spark >Priority: Minor > > Use ConfigEntry for hardcoded configs in SQL module -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28257) Use ConfigEntry for hardcoded configs in SQL module
[ https://issues.apache.org/jira/browse/SPARK-28257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28257: Assignee: (was: Apache Spark) > Use ConfigEntry for hardcoded configs in SQL module > --- > > Key: SPARK-28257 > URL: https://issues.apache.org/jira/browse/SPARK-28257 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: EdisonWang >Priority: Minor > > Use ConfigEntry for hardcoded configs in SQL module -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28257) Use ConfigEntry for hardcoded configs in SQL module
EdisonWang created SPARK-28257: -- Summary: Use ConfigEntry for hardcoded configs in SQL module Key: SPARK-28257 URL: https://issues.apache.org/jira/browse/SPARK-28257 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: EdisonWang Use ConfigEntry for hardcoded configs in SQL module -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28015) Invalid date formats should throw an exception
[ https://issues.apache.org/jira/browse/SPARK-28015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16879062#comment-16879062 ] Dongjoon Hyun commented on SPARK-28015: --- Thank you, [~iskenderunlu804]. Please file a PR on Apache Spark repo. > Invalid date formats should throw an exception > -- > > Key: SPARK-28015 > URL: https://issues.apache.org/jira/browse/SPARK-28015 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.3 >Reporter: Yuming Wang >Priority: Major > > Invalid date formats should throw an exception: > {code:sql} > SELECT date '1999 08 01' > 1999-01-01 > {code} > Supported date formats: > https://github.com/apache/spark/blob/ab8710b57916a129fcb89464209361120d224535/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L365-L374 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
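The strict behavior SPARK-28015 asks for can be illustrated in plain Python: a parser that rejects {{'1999 08 01'}} outright instead of silently yielding {{1999-01-01}}. This is a minimal sketch of the requested semantics, not Spark's DateTimeUtils implementation:

```python
from datetime import date, datetime

def parse_date_strict(text: str) -> date:
    """Accept only yyyy-mm-dd; raise ValueError for anything else.

    Illustrates the behavior the ticket requests (reject '1999 08 01'
    rather than truncating to 1999-01-01); not Spark's actual parser.
    """
    return datetime.strptime(text, "%Y-%m-%d").date()

print(parse_date_strict("1999-08-01"))   # 1999-08-01

try:
    parse_date_strict("1999 08 01")      # wrong separators: rejected
except ValueError as err:
    print("rejected:", err)
```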
[jira] [Resolved] (SPARK-28218) Migrate Avro to File source V2
[ https://issues.apache.org/jira/browse/SPARK-28218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28218. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25017 [https://github.com/apache/spark/pull/25017] > Migrate Avro to File source V2 > -- > > Key: SPARK-28218 > URL: https://issues.apache.org/jira/browse/SPARK-28218 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28218) Migrate Avro to File source V2
[ https://issues.apache.org/jira/browse/SPARK-28218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-28218: - Assignee: Gengliang Wang > Migrate Avro to File source V2 > -- > > Key: SPARK-28218 > URL: https://issues.apache.org/jira/browse/SPARK-28218 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28248) Upgrade docker image and library for PostgreSQL integration test
[ https://issues.apache.org/jira/browse/SPARK-28248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28248: -- Summary: Upgrade docker image and library for PostgreSQL integration test (was: Upgrade Postgres docker image) > Upgrade docker image and library for PostgreSQL integration test > > > Key: SPARK-28248 > URL: https://issues.apache.org/jira/browse/SPARK-28248 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28248) Upgrade docker image and library for PostgreSQL integration test
[ https://issues.apache.org/jira/browse/SPARK-28248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28248. --- Resolution: Fixed Assignee: Yuming Wang Fix Version/s: 3.0.0 This is resolved via https://github.com/apache/spark/pull/25050 > Upgrade docker image and library for PostgreSQL integration test > > > Key: SPARK-28248 > URL: https://issues.apache.org/jira/browse/SPARK-28248 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Minor > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28248) Upgrade Postgres docker image
[ https://issues.apache.org/jira/browse/SPARK-28248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28248: -- Priority: Minor (was: Major) > Upgrade Postgres docker image > - > > Key: SPARK-28248 > URL: https://issues.apache.org/jira/browse/SPARK-28248 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21067: Assignee: Apache Spark > Thrift Server - CTAS fail with Unable to move source > > > Key: SPARK-21067 > URL: https://issues.apache.org/jira/browse/SPARK-21067 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1, 2.2.0, 2.4.0, 2.4.3 > Environment: Yarn > Hive MetaStore > HDFS (HA) >Reporter: Dominic Ricard >Assignee: Apache Spark >Priority: Major > Attachments: SPARK-21067.patch > > > After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS > would fail, sometimes... > Most of the time, the CTAS would work only once, after starting the thrift > server. After that, dropping the table and re-issuing the same CTAS would > fail with the following message (sometimes it fails right away, sometimes it > works for a long period of time): > {noformat} > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > We have already found the following Jira > (https://issues.apache.org/jira/browse/SPARK-11021) which states that > {{hive.exec.stagingdir}} had to be added in order for Spark to be able to > handle CREATE TABLE properly as of 2.0. 
As you can see in the error, we have > ours set to "/tmp/hive-staging/\{user.name\}" > Same issue with INSERT statements: > {noformat} > CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE > dricard.test SELECT 1; > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > This worked fine in 1.6.2, which we currently run in our Production > Environment but since 2.0+, we haven't been able to CREATE TABLE consistently > on the cluster. > SQL to reproduce issue: > {noformat} > DROP SCHEMA IF EXISTS dricard CASCADE; > CREATE SCHEMA dricard; > CREATE TABLE dricard.test (col1 int); > INSERT INTO TABLE dricard.test SELECT 1; > SELECT * from dricard.test; > DROP TABLE dricard.test; > CREATE TABLE dricard.test AS select 1 as `col1`; > SELECT * from dricard.test > {noformat} > Thrift server usually fails at INSERT... > Tried the same procedure in a spark context using spark.sql() and didn't > encounter the same issue. 
> Full stack Trace: > {noformat} > 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error > executing query, currentState RUNNING, > org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0 > to desti > nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106) > at > org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) > at org.apache.spark.sql.Dataset.(Dataset.scala:185) > at
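The {{hive.exec.stagingdir}} setting the reporter references (per SPARK-11021) lives in hive-site.xml; a minimal sketch, assuming the reporter's value and layout. Keeping the staging directory on the same HDFS namespace as the warehouse lets the final "move source ... to destination" step be a rename rather than a cross-filesystem copy:

```xml
<!-- hive-site.xml sketch (assumed layout): the staging dir the reporter
     mentions, placed on the same HDFS namespace as the warehouse so the
     final move can be a cheap rename. -->
<property>
  <name>hive.exec.stagingdir</name>
  <value>/tmp/hive-staging/{user.name}</value>
</property>
```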
[jira] [Assigned] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21067: Assignee: (was: Apache Spark) > Thrift Server - CTAS fail with Unable to move source > > > Key: SPARK-21067 > URL: https://issues.apache.org/jira/browse/SPARK-21067 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1, 2.2.0, 2.4.0, 2.4.3 > Environment: Yarn > Hive MetaStore > HDFS (HA) >Reporter: Dominic Ricard >Priority: Major > Attachments: SPARK-21067.patch > > > After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS > would fail, sometimes... > Most of the time, the CTAS would work only once, after starting the thrift > server. After that, dropping the table and re-issuing the same CTAS would > fail with the following message (sometimes it fails right away, sometimes it > works for a long period of time): > {noformat} > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > We have already found the following Jira > (https://issues.apache.org/jira/browse/SPARK-11021) which states that > {{hive.exec.stagingdir}} had to be added in order for Spark to be able to > handle CREATE TABLE properly as of 2.0. 
As you can see in the error, we have > ours set to "/tmp/hive-staging/\{user.name\}" > Same issue with INSERT statements: > {noformat} > CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE > dricard.test SELECT 1; > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > This worked fine in 1.6.2, which we currently run in our Production > Environment but since 2.0+, we haven't been able to CREATE TABLE consistently > on the cluster. > SQL to reproduce issue: > {noformat} > DROP SCHEMA IF EXISTS dricard CASCADE; > CREATE SCHEMA dricard; > CREATE TABLE dricard.test (col1 int); > INSERT INTO TABLE dricard.test SELECT 1; > SELECT * from dricard.test; > DROP TABLE dricard.test; > CREATE TABLE dricard.test AS select 1 as `col1`; > SELECT * from dricard.test > {noformat} > Thrift server usually fails at INSERT... > Tried the same procedure in a spark context using spark.sql() and didn't > encounter the same issue. 
> Full stack Trace: > {noformat} > 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error > executing query, currentState RUNNING, > org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0 > to desti > nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106) > at > org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) > at org.apache.spark.sql.Dataset.(Dataset.scala:185) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64) > at
[jira] [Assigned] (SPARK-28239) Allow TCP connections created by shuffle service auto close on YARN NodeManagers
[ https://issues.apache.org/jira/browse/SPARK-28239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28239: Assignee: (was: Apache Spark) > Allow TCP connections created by shuffle service auto close on YARN > NodeManagers > > > Key: SPARK-28239 > URL: https://issues.apache.org/jira/browse/SPARK-28239 > Project: Spark > Issue Type: Improvement > Components: Shuffle, YARN >Affects Versions: 2.4.0 > Environment: Hadoop2.6.0-CDH5.8.3(netty3) > Spark2.4.0(netty4) > Configs: > spark.shuffle.service.enabled=true >Reporter: Deegue >Priority: Minor > Attachments: screenshot-1.png, screenshot-2.png > > > When executing shuffle tasks, TCP connections (on port 7337 by default) will > be established by the shuffle service. > It looks like: > !screenshot-1.png! > However, some of the TCP connections are still busy when the task is actually > finished. These connections won't close automatically until we restart the > NodeManager process. > Connections pile up and NodeManagers get slower and slower. > !screenshot-2.png! > These unclosed TCP connections stay busy, and setting ChannelOption.SO_KEEPALIVE to > true according to > [SPARK-23182|https://github.com/apache/spark/pull/20512] does not seem to take effect. > So the solution is setting ChannelOption.AUTO_CLOSE to true, after which > our cluster (running 1+ jobs / day) processes normally.
[jira] [Assigned] (SPARK-28239) Allow TCP connections created by shuffle service auto close on YARN NodeManagers
[ https://issues.apache.org/jira/browse/SPARK-28239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28239: Assignee: Apache Spark > Allow TCP connections created by shuffle service auto close on YARN > NodeManagers > > > Key: SPARK-28239 > URL: https://issues.apache.org/jira/browse/SPARK-28239 > Project: Spark > Issue Type: Improvement > Components: Shuffle, YARN >Affects Versions: 2.4.0 > Environment: Hadoop2.6.0-CDH5.8.3(netty3) > Spark2.4.0(netty4) > Configs: > spark.shuffle.service.enabled=true >Reporter: Deegue >Assignee: Apache Spark >Priority: Minor > Attachments: screenshot-1.png, screenshot-2.png > > > When executing shuffle tasks, TCP connections (on port 7337 by default) will > be established by the shuffle service. > It looks like: > !screenshot-1.png! > However, some of the TCP connections are still busy when the task is actually > finished. These connections won't close automatically until we restart the > NodeManager process. > Connections pile up and NodeManagers get slower and slower. > !screenshot-2.png! > These unclosed TCP connections stay busy, and setting ChannelOption.SO_KEEPALIVE to > true according to > [SPARK-23182|https://github.com/apache/spark/pull/20512] does not seem to take effect. > So the solution is setting ChannelOption.AUTO_CLOSE to true, after which > our cluster (running 1+ jobs / day) processes normally.
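The ticket contrasts two knobs: Netty's ChannelOption.SO_KEEPALIVE, which maps to the kernel-level socket option, and ChannelOption.AUTO_CLOSE, a Netty-level setting (close the channel as soon as a write fails) with no direct socket-option analogue. A minimal Python sketch of the former, just to show what the SPARK-23182 setting toggles at the OS level; it is an illustration, not Spark's shuffle-service code:

```python
import socket

# Set the kernel-level keepalive option that Netty's
# ChannelOption.SO_KEEPALIVE (tried in SPARK-23182) maps to.
# ChannelOption.AUTO_CLOSE, which this ticket enables instead, is a
# Netty-level behavior and has no equivalent setsockopt call.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
print("SO_KEEPALIVE enabled:", bool(enabled))
sock.close()
```

As the reporter observed, keepalive only detects dead peers; it does not close connections that the kernel still considers healthy, which is why the AUTO_CLOSE route was needed.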
[jira] [Updated] (SPARK-28248) Upgrade Postgres docker image
[ https://issues.apache.org/jira/browse/SPARK-28248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-28248: Summary: Upgrade Postgres docker image (was: Upgrade DB2 and Postgres docker image) > Upgrade Postgres docker image > - > > Key: SPARK-28248 > URL: https://issues.apache.org/jira/browse/SPARK-28248 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28256) Failed to initialize FileContextBasedCheckpointFileManager with uri without authority
[ https://issues.apache.org/jira/browse/SPARK-28256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28256:

    Assignee: Apache Spark

> Failed to initialize FileContextBasedCheckpointFileManager with uri without authority
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-28256
>                 URL: https://issues.apache.org/jira/browse/SPARK-28256
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.0.0
>            Reporter: Genmao Yu
>            Assignee: Apache Spark
>            Priority: Minor
>
> Code to reproduce:
> {code:sql}
> CREATE TABLE `user_click_count` (`userId` STRING, `click` BIGINT)
> USING org.apache.spark.sql.json
> OPTIONS (path 'hdfs:///tmp/test');
> {code}
> Error:
> {code:java}
> java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
>     at org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:136)
>     at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:165)
>     at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:250)
>     at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:342)
>     at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:339)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1911)
>     at org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:339)
>     at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:456)
>     at org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.<init>(CheckpointFileManager.scala:297)
>     at org.apache.spark.sql.execution.streaming.CheckpointFileManager$.create(CheckpointFileManager.scala:189)
>     at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.<init>(HDFSMetadataLog.scala:63)
>     at org.apache.spark.sql.execution.streaming.CompactibleFileStreamLog.<init>(CompactibleFileStreamLog.scala:46)
>     at org.apache.spark.sql.execution.streaming.FileStreamSinkLog.<init>(FileStreamSinkLog.scala:85)
>     at org.apache.spark.sql.execution.streaming.FileStreamSink.<init>(FileStreamSink.scala:98)
>     at org.apache.spark.sql.execution.datasources.DataSource.createSink(DataSource.scala:297)
>     at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:379)
>     ...
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:134)
>     ... 67 more
> Caused by: org.apache.hadoop.HadoopIllegalArgumentException: Uri without authority: hdfs:/tmp/test/_spark_metadata
>     at org.apache.hadoop.fs.AbstractFileSystem.getUri(AbstractFileSystem.java:313)
>     at org.apache.hadoop.fs.AbstractFileSystem.<init>(AbstractFileSystem.java:266)
>     at org.apache.hadoop.fs.Hdfs.<init>(Hdfs.java:80)
>     ... 72 more
> {code}
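The root cause is visible in the innermost exception: the checkpoint path Spark derives from `hdfs:///tmp/test` is `hdfs:/tmp/test/_spark_metadata`, a URI with no authority (no `host:port` after the scheme), and `FileContext`'s `AbstractFileSystem.getUri` rejects such URIs outright rather than falling back to the cluster's default filesystem. A loose Python sketch of that check (the function name and use of `urllib.parse` are illustrative assumptions, not Hadoop's actual code):

```python
from urllib.parse import urlparse

def require_authority(uri: str) -> str:
    """Sketch of the authority check that AbstractFileSystem performs:
    reject any URI whose authority (host[:port]) component is absent."""
    parsed = urlparse(uri)
    if not parsed.netloc:
        # Mirrors: HadoopIllegalArgumentException: Uri without authority: ...
        raise ValueError(f"Uri without authority: {uri}")
    return parsed.netloc

# A fully qualified URI carries an authority and passes:
require_authority("hdfs://namenode:8020/tmp/test/_spark_metadata")

# Both the triple-slash form from the CREATE TABLE statement and the
# single-slash form in the stack trace lack an authority and would fail:
# require_authority("hdfs:///tmp/test")                  # raises ValueError
# require_authority("hdfs:/tmp/test/_spark_metadata")    # raises ValueError
```

Note that `hdfs:///tmp/test` (triple slash) also has an empty authority; the older `FileSystem` API tolerates this by resolving against `fs.defaultFS`, which is why the same path works outside the `FileContext`-based checkpoint manager.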
[jira] [Assigned] (SPARK-28256) Failed to initialize FileContextBasedCheckpointFileManager with uri without authority
[ https://issues.apache.org/jira/browse/SPARK-28256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-28256:

    Assignee: (was: Apache Spark)
[jira] [Updated] (SPARK-28256) Failed to initialize FileContextBasedCheckpointFileManager with uri without authority
[ https://issues.apache.org/jira/browse/SPARK-28256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated SPARK-28256:

    Description: (added the SQL reproduction steps)
[jira] [Updated] (SPARK-28256) Failed to initialize FileContextBasedCheckpointFileManager with uri without authority
[ https://issues.apache.org/jira/browse/SPARK-28256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated SPARK-28256:

    Description: (marked the stack trace as {code:java})
[jira] [Created] (SPARK-28256) Failed to initialize FileContextBasedCheckpointFileManager with uri without authority
Genmao Yu created SPARK-28256:
---------------------------------

             Summary: Failed to initialize FileContextBasedCheckpointFileManager with uri without authority
                 Key: SPARK-28256
                 URL: https://issues.apache.org/jira/browse/SPARK-28256
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 3.0.0
            Reporter: Genmao Yu

{code}
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:136)
    at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:165)
    at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:250)
    at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:342)
    at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:339)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1911)
    at org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:339)
    at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:456)
    at org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.<init>(CheckpointFileManager.scala:297)
    at org.apache.spark.sql.execution.streaming.CheckpointFileManager$.create(CheckpointFileManager.scala:189)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.<init>(HDFSMetadataLog.scala:63)
    at org.apache.spark.sql.execution.streaming.CompactibleFileStreamLog.<init>(CompactibleFileStreamLog.scala:46)
    at org.apache.spark.sql.execution.streaming.FileStreamSinkLog.<init>(FileStreamSinkLog.scala:85)
    at org.apache.spark.sql.execution.streaming.FileStreamSink.<init>(FileStreamSink.scala:98)
    at org.apache.spark.sql.execution.datasources.DataSource.createSink(DataSource.scala:297)
    at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:379)
    ...
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:134)
    ... 67 more
Caused by: org.apache.hadoop.HadoopIllegalArgumentException: Uri without authority: hdfs:/tmp/test13/_spark_metadata
    at org.apache.hadoop.fs.AbstractFileSystem.getUri(AbstractFileSystem.java:313)
    at org.apache.hadoop.fs.AbstractFileSystem.<init>(AbstractFileSystem.java:266)
    at org.apache.hadoop.fs.Hdfs.<init>(Hdfs.java:80)
    ... 72 more
{code}
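For contrast, the older `FileSystem`-based code path qualifies a bare path against the default filesystem before use, borrowing the missing scheme and authority from `fs.defaultFS`; that is why `hdfs:///tmp/test` works there. A hedged sketch of that qualification step (the function name, the use of `urllib.parse`, and the `namenode:8020` default are illustrative assumptions, not Spark's or Hadoop's actual code):

```python
from urllib.parse import urlparse, urlunparse

def make_qualified(path: str, default_fs: str = "hdfs://namenode:8020") -> str:
    """Sketch of Path.makeQualified-style behavior: fill in a missing
    scheme and/or authority from the default filesystem URI."""
    p = urlparse(path)
    d = urlparse(default_fs)
    scheme = p.scheme or d.scheme
    authority = p.netloc or d.netloc
    return urlunparse((scheme, authority, p.path, "", "", ""))

# The authority-less path from the report becomes fully qualified:
make_qualified("hdfs:///tmp/test/_spark_metadata")
# -> "hdfs://namenode:8020/tmp/test/_spark_metadata"
```

A qualification step like this before constructing `FileContextBasedCheckpointFileManager` would avoid the `Uri without authority` failure, since `FileContext` would then see a URI with an explicit authority.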