[jira] [Updated] (SPARK-28930) Spark DESC FORMATTED TABLENAME information display issues
[ https://issues.apache.org/jira/browse/SPARK-28930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jobit mathew updated SPARK-28930: - Description: Spark DESC FORMATTED TABLENAME has information display issues: it shows an incorrect *Last Access* time, and some of the information could be displayed better.

Test steps:
1. Open spark-sql.
2. Create a partitioned table:
CREATE EXTERNAL TABLE IF NOT EXISTS employees_info_extended ( id INT, name STRING, usd_flag STRING, salary DOUBLE, deductions MAP, address STRING ) PARTITIONED BY (entrytime STRING) STORED AS TEXTFILE location 'hdfs://hacluster/user/sparkhive/warehouse';
3. From spark-sql, check the table description: desc formatted tablename;
4. From the Scala shell, check the table description: sql("desc formatted tablename").show()

*Issue 1:* When a column has no comment, the Scala shell shows lowercase *"null"*, while everywhere else (Hive beeline, Spark beeline, Spark SQL) it shows uppercase *"NULL"*. It would be better to show the same value in all places.
{code:java}
scala> sql("desc formatted employees_info_extended").show(false)
+----------------------------+----------------------------+-------+
|col_name                    |data_type                   |comment|
+----------------------------+----------------------------+-------+
|id                          |int                         |null   |
|name                        |string                      |null   |
|usd_flag                    |string                      |null   |
|salary                      |double                      |null   |
|deductions                  |map                         |null   |
|address                     |string                      |null   |
|entrytime                   |string                      |null   |
|# Partition Information     |                            |       |
|# col_name                  |data_type                   |comment|
|entrytime                   |string                      |null   |
|                            |                            |       |
|# Detailed Table Information|                            |       |
|Database                    |sparkdb__                   |       |
|Table                       |employees_info_extended     |       |
|Owner                       |root                        |       |
|Created Time                |Tue Aug 20 13:42:06 CST 2019|       |
|Last Access                 |Thu Jan 01 08:00:00 CST 1970|       |
|Created By                  |Spark 2.4.3                 |       |
|Type                        |EXTERNAL                    |       |
|Provider                    |hive                        |       |
+----------------------------+----------------------------+-------+
only showing top 20 rows

scala>
{code}

*Issue 2:* Spark SQL "desc formatted tablename" does not show the header (# col_name, data_type, comment) at the top of the query result, although the header is shown above the partition description. For better understanding, show the header at the top of the query result as well. Outside of spark-sql, the header is visible in both spark-beeline and hive beeline.
{code:java}
spark-sql> desc formatted employees_info_extended1;
id    int    NULL
name    string    NULL
usd_flag    string    NULL
salary    double    NULL
deductions    map    NULL
address    string    NULL
entrytime    string    NULL
# Partition Information
# col_name    data_type    comment
entrytime    string    NULL

# Detailed Table Information
Database    sparkdb__
Table    employees_info_extended1
Owner    spark
Created Time    Tue Aug 20 14:50:37 CST 2019
Last Access    Thu Jan 01 08:00:00 CST 1970
Created By    Spark 2.3.2.0201
Type    EXTERNAL
Provider    hive
Table Properties    [transient_lastDdlTime=1566286655]
Location    hdfs://hacluster/user/sparkhive/warehouse
Serde Library    org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat    org.apache.hadoop.mapred.TextInputFormat
OutputFormat    org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Storage Properties    [serialization.format=1]
Partition Provider    Catalog
Time taken: 0.477 seconds, Fetched 27 row(s)
spark-sql>

This is spark-beeline, which does show the headers:
0: jdbc:hive2://10.186.60.158:23040/default> desc formatted employees;
+-----------+------------+------------------+
| col_name  | data_type  | comment          |
+-----------+------------+------------------+
| name      | string     | Employee name    |
| salary    | float      | Employee salary  |
|           |            |                  |
{code}

*Issue 3:* I created the table on Aug 20, so the Created Time is correct, *but the Last Access time shows Thu Jan 01 1970*. A Last Access time earlier than the Created Time makes no sense; show the correct date and time, or else show UNKNOWN.
*[Created Time,Tue Aug 20 13:42:06 CST 2019,]*
*[Last Access,Thu Jan 01 08:00:00 CST 1970,]*

was: Spark DESC FORMATTED TABLENAME information display issues.Showing incorrect *Last Access time and* feeling some information displays can make it better. Test steps: 1. Open spark sql 2. Create table with partition CREATE EXTERNAL TABLE IF NOT EXISTS employees_info_extended ( id INT, name
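The null-vs-NULL mismatch in Issue 1 comes down to how the comment cell is rendered by each client. A minimal sketch of the idea (the `CommentCell` class and `render` helper are hypothetical illustrations, not Spark's actual code): routing the display value through one shared helper would make the Scala shell, beeline, and spark-sql agree on *NULL*.

```java
public class CommentCell {
    // Hypothetical helper: normalize a missing column comment so that
    // every client displays the same token instead of lowercase "null"
    // in one place and uppercase "NULL" in another.
    static String render(String comment) {
        return (comment == null || comment.isEmpty()) ? "NULL" : comment;
    }

    public static void main(String[] args) {
        System.out.println(render(null));            // prints NULL, not null
        System.out.println(render("Employee name")); // real comments pass through
    }
}
```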
[jira] [Assigned] (SPARK-28573) Convert InsertIntoTable(HiveTableRelation) to Datasource inserting for partitioned table
[ https://issues.apache.org/jira/browse/SPARK-28573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-28573: --- Assignee: Xianjin YE
> Convert InsertIntoTable(HiveTableRelation) to Datasource inserting for partitioned table
>
> Key: SPARK-28573
> URL: https://issues.apache.org/jira/browse/SPARK-28573
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Xianjin YE
> Assignee: Xianjin YE
> Priority: Major
>
> Currently we don't translate InsertInto(HiveTableRelation) to DataSource insertion when a partitioned table is involved. The reason, quoting from the comments:
> {quote}// Inserting into partitioned table is not supported in Parquet/Orc data source (yet).
> {quote}
> This no longer holds, since datasource table dynamic partition insert now supports dynamic mode (SPARK-20236). I think it is worthwhile to translate InsertIntoTable(HiveTableRelation) to a datasource table insert.
-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28573) Convert InsertIntoTable(HiveTableRelation) to Datasource inserting for partitioned table
[ https://issues.apache.org/jira/browse/SPARK-28573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-28573. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25306 [https://github.com/apache/spark/pull/25306]
> Convert InsertIntoTable(HiveTableRelation) to Datasource inserting for partitioned table
>
> Key: SPARK-28573
> URL: https://issues.apache.org/jira/browse/SPARK-28573
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Xianjin YE
> Assignee: Xianjin YE
> Priority: Major
> Fix For: 3.0.0
>
> Currently we don't translate InsertInto(HiveTableRelation) to DataSource insertion when a partitioned table is involved. The reason, quoting from the comments:
> {quote}// Inserting into partitioned table is not supported in Parquet/Orc data source (yet).
> {quote}
> This no longer holds, since datasource table dynamic partition insert now supports dynamic mode (SPARK-20236). I think it is worthwhile to translate InsertIntoTable(HiveTableRelation) to a datasource table insert.
-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28942) [Spark][WEB UI]Spark in local mode hostname display localhost in the Host Column of Task Summary Page
[ https://issues.apache.org/jira/browse/SPARK-28942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ABHISHEK KUMAR GUPTA updated SPARK-28942: - Summary: [Spark][WEB UI]Spark in local mode hostname display localhost in the Host Column of Task Summary Page (was: Spark in local mode hostname display localhost in the Host Column of Task Summary Page)
> [Spark][WEB UI]Spark in local mode hostname display localhost in the Host Column of Task Summary Page
> -
>
> Key: SPARK-28942
> URL: https://issues.apache.org/jira/browse/SPARK-28942
> Project: Spark
> Issue Type: Bug
> Components: Web UI
> Affects Versions: 3.0.0
> Reporter: ABHISHEK KUMAR GUPTA
> Priority: Minor
>
> In the Stage page, under the Task Summary, the Host column shows 'localhost' instead of the host IP or host name listed as the Driver Host Name.
> Steps:
> spark-shell --master local
> create table emp(id int);
> insert into emp values(100);
> select * from emp;
> Go to the Stage UI page and check the Task Summary: the Host column displays 'localhost' instead of the driver host.
> Note: with spark-shell --master yarn, the UI displays the correct host name in this column.
-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28372) Document Spark WEB UI
[ https://issues.apache.org/jira/browse/SPARK-28372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921151#comment-16921151 ] zhengruifeng commented on SPARK-28372: -- [~smilegator] I think we may need to add a subtask for streaming, as [~planga82] suggested.
> Document Spark WEB UI
> -
>
> Key: SPARK-28372
> URL: https://issues.apache.org/jira/browse/SPARK-28372
> Project: Spark
> Issue Type: Umbrella
> Components: Documentation, Web UI
> Affects Versions: 3.0.0
> Reporter: Xiao Li
> Priority: Major
>
> Spark web UIs are used to monitor the status and resource consumption of your Spark applications and clusters. However, we do not have the corresponding documentation, which makes the UIs hard for end users to use and understand.
-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28373) Document JDBC/ODBC Server page
[ https://issues.apache.org/jira/browse/SPARK-28373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921138#comment-16921138 ] zhengruifeng commented on SPARK-28373: -- [~planga82] Thanks!:D
> Document JDBC/ODBC Server page
> --
>
> Key: SPARK-28373
> URL: https://issues.apache.org/jira/browse/SPARK-28373
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation, Web UI
> Affects Versions: 3.0.0
> Reporter: Xiao Li
> Priority: Major
>
> !https://user-images.githubusercontent.com/5399861/60809590-9dcf2500-a1bd-11e9-826e-33729bb97daf.png|width=1720,height=503!
>
> [https://github.com/apache/spark/pull/25062] added two new columns, CLOSE TIME and EXECUTION TIME. It is hard to understand the difference between them. We need to document them; otherwise it is hard for end users to understand them.
-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-28953) Integration tests fail due to malformed URL
[ https://issues.apache.org/jira/browse/SPARK-28953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-28953: Comment: was deleted (was: What is your Hadoop version and JDK version? Please see SPARK-27177 and SPARK-28693 for more details.)
> Integration tests fail due to malformed URL
> ---
>
> Key: SPARK-28953
> URL: https://issues.apache.org/jira/browse/SPARK-28953
> Project: Spark
> Issue Type: Bug
> Components: jenkins, Kubernetes
> Affects Versions: 3.0.0
> Reporter: Stavros Kontopoulos
> Priority: Major
>
> Tests failed on Ubuntu, verified on two different machines:
> KubernetesSuite:
> - Launcher client dependencies *** FAILED ***
> java.net.MalformedURLException: no protocol: * http://172.31.46.91:30706
> at java.net.URL.<init>(URL.java:600)
> at java.net.URL.<init>(URL.java:497)
> at java.net.URL.<init>(URL.java:446)
> at org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.$anonfun$$init$$1(DepsTestsSuite.scala:160)
> at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> at org.scalatest.Transformer.apply(Transformer.scala:22)
> at org.scalatest.Transformer.apply(Transformer.scala:20)
> at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
>
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/ '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.0.0-SNAPSHOT
>       /_/
>
> Using Scala version 2.12.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_222)
> Type in expressions to have them evaluated.
> Type :help for more information.
>
> scala> val pb = new ProcessBuilder().command("bash", "-c", "minikube service ceph-nano-s3 -n spark --url")
> pb: ProcessBuilder = java.lang.ProcessBuilder@46092840
> scala> pb.redirectErrorStream(true)
> res0: ProcessBuilder = java.lang.ProcessBuilder@46092840
> scala> val proc = pb.start()
> proc: Process = java.lang.UNIXProcess@5e9650d3
> scala> val r = org.apache.commons.io.IOUtils.toString(proc.getInputStream())
> r: String =
> "* http://172.31.46.91:30706
> "
> Running the same command directly shows no asterisk:
> $ minikube service ceph-nano-s3 -n spark --url
> [http://172.31.46.91:30706|http://172.31.46.91:30706/]
>
> This is weird because it fails at the Java level; where does the asterisk come from?
> $ minikube version
> minikube version: v1.3.1
> commit: ca60a424ce69a4d79f502650199ca2b52f29e631
>
-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
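The failure above is reproducible with plain java.net.URL: the asterisk-prefixed string that minikube's output produced has no protocol at position zero, so the URL constructor throws. A small sketch (the `clean` helper is a hypothetical workaround, not the DepsTestsSuite fix) showing that trimming the leading asterisk makes the string parse fine:

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlAsterisk {
    // Hypothetical workaround: strip any leading '*' and whitespace
    // before parsing the minikube output seen in the report.
    static String clean(String raw) {
        return raw.replaceFirst("^[*\\s]+", "");
    }

    public static void main(String[] args) throws Exception {
        String raw = "* http://172.31.46.91:30706";
        try {
            new URL(raw); // throws: no protocol: * http://172.31.46.91:30706
        } catch (MalformedURLException e) {
            System.out.println("MalformedURLException: " + e.getMessage());
        }
        URL ok = new URL(clean(raw)); // parses once the asterisk is gone
        System.out.println(ok.getHost() + ":" + ok.getPort());
    }
}
```

This only masks the symptom; the open question in the report, where the asterisk comes from, is unaffected.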
[jira] [Comment Edited] (SPARK-28953) Integration tests fail due to malformed URL
[ https://issues.apache.org/jira/browse/SPARK-28953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921115#comment-16921115 ] Yuming Wang edited comment on SPARK-28953 at 9/3/19 2:31 AM: - What is your Hadoop version and JDK version? Please see SPARK-27177 and SPARK-28693 for more details. was (Author: q79969786): What is you Hadoop version and JDK version? Please see SPARK-27177 and SPARK-28693 for more details.
> Integration tests fail due to malformed URL
> ---
>
> Key: SPARK-28953
> URL: https://issues.apache.org/jira/browse/SPARK-28953
> Project: Spark
> Issue Type: Bug
> Components: jenkins, Kubernetes
> Affects Versions: 3.0.0
> Reporter: Stavros Kontopoulos
> Priority: Major
>
> Tests failed on Ubuntu, verified on two different machines:
> KubernetesSuite:
> - Launcher client dependencies *** FAILED ***
> java.net.MalformedURLException: no protocol: * http://172.31.46.91:30706
> at java.net.URL.<init>(URL.java:600)
> at java.net.URL.<init>(URL.java:497)
> at java.net.URL.<init>(URL.java:446)
> at org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.$anonfun$$init$$1(DepsTestsSuite.scala:160)
> at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> at org.scalatest.Transformer.apply(Transformer.scala:22)
> at org.scalatest.Transformer.apply(Transformer.scala:20)
> at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
>
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/ '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.0.0-SNAPSHOT
>       /_/
>
> Using Scala version 2.12.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_222)
> Type in expressions to have them evaluated.
> Type :help for more information.
>
> scala> val pb = new ProcessBuilder().command("bash", "-c", "minikube service ceph-nano-s3 -n spark --url")
> pb: ProcessBuilder = java.lang.ProcessBuilder@46092840
> scala> pb.redirectErrorStream(true)
> res0: ProcessBuilder = java.lang.ProcessBuilder@46092840
> scala> val proc = pb.start()
> proc: Process = java.lang.UNIXProcess@5e9650d3
> scala> val r = org.apache.commons.io.IOUtils.toString(proc.getInputStream())
> r: String =
> "* http://172.31.46.91:30706
> "
> Running the same command directly shows no asterisk:
> $ minikube service ceph-nano-s3 -n spark --url
> [http://172.31.46.91:30706|http://172.31.46.91:30706/]
>
> This is weird because it fails at the Java level; where does the asterisk come from?
> $ minikube version
> minikube version: v1.3.1
> commit: ca60a424ce69a4d79f502650199ca2b52f29e631
>
-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28953) Integration tests fail due to malformed URL
[ https://issues.apache.org/jira/browse/SPARK-28953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921115#comment-16921115 ] Yuming Wang commented on SPARK-28953: - What is you Hadoop version and JDK version? Please see SPARK-27177 and SPARK-28693 for more details.
> Integration tests fail due to malformed URL
> ---
>
> Key: SPARK-28953
> URL: https://issues.apache.org/jira/browse/SPARK-28953
> Project: Spark
> Issue Type: Bug
> Components: jenkins, Kubernetes
> Affects Versions: 3.0.0
> Reporter: Stavros Kontopoulos
> Priority: Major
>
> Tests failed on Ubuntu, verified on two different machines:
> KubernetesSuite:
> - Launcher client dependencies *** FAILED ***
> java.net.MalformedURLException: no protocol: * http://172.31.46.91:30706
> at java.net.URL.<init>(URL.java:600)
> at java.net.URL.<init>(URL.java:497)
> at java.net.URL.<init>(URL.java:446)
> at org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.$anonfun$$init$$1(DepsTestsSuite.scala:160)
> at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> at org.scalatest.Transformer.apply(Transformer.scala:22)
> at org.scalatest.Transformer.apply(Transformer.scala:20)
> at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
>
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/ '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.0.0-SNAPSHOT
>       /_/
>
> Using Scala version 2.12.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_222)
> Type in expressions to have them evaluated.
> Type :help for more information.
>
> scala> val pb = new ProcessBuilder().command("bash", "-c", "minikube service ceph-nano-s3 -n spark --url")
> pb: ProcessBuilder = java.lang.ProcessBuilder@46092840
> scala> pb.redirectErrorStream(true)
> res0: ProcessBuilder = java.lang.ProcessBuilder@46092840
> scala> val proc = pb.start()
> proc: Process = java.lang.UNIXProcess@5e9650d3
> scala> val r = org.apache.commons.io.IOUtils.toString(proc.getInputStream())
> r: String =
> "* http://172.31.46.91:30706
> "
> Running the same command directly shows no asterisk:
> $ minikube service ceph-nano-s3 -n spark --url
> [http://172.31.46.91:30706|http://172.31.46.91:30706/]
>
> This is weird because it fails at the Java level; where does the asterisk come from?
> $ minikube version
> minikube version: v1.3.1
> commit: ca60a424ce69a4d79f502650199ca2b52f29e631
>
-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28955) Support for LocalDateTime semantics
Bill Schneider created SPARK-28955: -- Summary: Support for LocalDateTime semantics Key: SPARK-28955 URL: https://issues.apache.org/jira/browse/SPARK-28955 Project: Spark Issue Type: Wish Components: SQL Affects Versions: 2.3.0 Reporter: Bill Schneider
It would be great if Spark supported local times in DataFrames, rather than only instants. The specific use case I have in mind is something like:
* parse "2019-01-01 17:00" (no timezone) from CSV -> a LocalDateTime in the dataframe
* save to Parquet: the LocalDateTime is stored with the same integer value as 2019-01-01 17:00 UTC, but with isAdjustedToUTC=false. (Currently Spark saves either INT96 or TIMESTAMP_MILLIS/TIMESTAMP_MICROS, which have isAdjustedToUTC=true.)
-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
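The requested semantics can be illustrated with plain java.time (a sketch of the wish, not an existing Spark API; the `LocalTimestamp` class and `toMicrosAsUtc` helper are hypothetical): a LocalDateTime carries no zone, so storing it means writing the integer you would get by interpreting the wall-clock value as if it were UTC, tagged isAdjustedToUTC=false.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class LocalTimestamp {
    // Micros-since-epoch value a Parquet writer would store for a
    // timezone-less timestamp: interpret the wall clock as if it were UTC.
    static long toMicrosAsUtc(LocalDateTime ldt) {
        Instant asUtc = ldt.toInstant(ZoneOffset.UTC);
        return asUtc.getEpochSecond() * 1_000_000L + asUtc.getNano() / 1_000L;
    }

    public static void main(String[] args) {
        LocalDateTime parsed = LocalDateTime.parse("2019-01-01T17:00");
        // Same integer as the instant 2019-01-01 17:00:00 UTC,
        // regardless of the JVM's default time zone.
        System.out.println(toMicrosAsUtc(parsed));
    }
}
```

The key property is that the stored integer is independent of the session time zone, which is exactly what distinguishes a local timestamp from an instant.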
[jira] [Updated] (SPARK-28954) For SparkCLI, start up with conf of HIVEAUXJARS, we add jar with SessionStateResourceLoader's addJar() API
[ https://issues.apache.org/jira/browse/SPARK-28954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-28954: -- Description: When starting up the SparkSQL CLI, for extra jars passed through the hive conf {{HiveConf.ConfVars.HIVEAUXJARS}}, we don't need complex APIs to work around differences between Hive versions; we can simply handle it through Spark's SessionResourceLoader API, adding the jars to Spark's and the SparkSession's running environment.

*SessionResourceLoader API*:
{code:java}
val resourceLoader = SparkSQLEnv.sqlContext.sessionState.resourceLoader
StringUtils.split(auxJars, ",").foreach(resourceLoader.addJar(_))
{code}

*v1.2.1 ThriftServerShimUtils*:
{code:java}
private[thriftserver] def addToClassPath(
    loader: ClassLoader,
    auxJars: Array[String]): ClassLoader = {
  Utilities.addToClassPath(loader, auxJars)
}
{code}

*v2.3.5 ThriftServerShimUtils*:
{code:java}
private[thriftserver] def addToClassPath(
    loader: ClassLoader,
    auxJars: Array[String]): ClassLoader = {
  val addAction = new AddToClassPathAction(loader, auxJars.toList.asJava)
  AccessController.doPrivileged(addAction)
}
{code}

> For SparkCLI, start up with conf of HIVEAUXJARS, we add jar with SessionStateResourceLoader's addJar() API
> --
>
> Key: SPARK-28954
> URL: https://issues.apache.org/jira/browse/SPARK-28954
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0, 3.0.0
> Reporter: angerszhu
> Priority: Major
>
> When starting up the SparkSQL CLI, for extra jars passed through the hive conf {{HiveConf.ConfVars.HIVEAUXJARS}}, we don't need complex APIs to work around differences between Hive versions; we can simply handle it through Spark's SessionResourceLoader API, adding the jars to Spark's and the SparkSession's running environment.
> *SessionResourceLoader api* : > {code:java} > val resourceLoader = SparkSQLEnv.sqlContext.sessionState.resourceLoader > StringUtils.split(auxJars, ",").foreach(resourceLoader.addJar(_)) > {code} > *v1.2.1ThriftServerShimUtils*: > {code:java} > private[thriftserver] def addToClassPath( > loader: ClassLoader, > auxJars: Array[String]): ClassLoader = { > Utilities.addToClassPath(loader, auxJars) > } > {code} > *v2.3.5ThriftServerShimUtils*: > {code:java} > private[thriftserver] def addToClassPath( > loader: ClassLoader, > auxJars: Array[String]): ClassLoader = { > val addAction = new AddToClassPathAction(loader, auxJars.toList.asJava) > AccessController.doPrivileged(addAction) > } > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
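The approach above boils down to splitting the HIVEAUXJARS conf value on commas and handing each path to the session's resource loader. A standalone sketch of just the splitting step (plain Java with a hypothetical `AuxJars` class and example paths; Spark's actual code uses Hadoop's StringUtils.split as shown above):

```java
import java.util.ArrayList;
import java.util.List;

public class AuxJars {
    // Null-safe comma split, mirroring how an aux-jars conf value such as
    // "/a.jar,/b.jar" is broken into individual paths before each one is
    // handed to resourceLoader.addJar(...).
    static List<String> splitJars(String auxJars) {
        List<String> jars = new ArrayList<>();
        if (auxJars == null) return jars; // conf not set: nothing to add
        for (String part : auxJars.split(",")) {
            String jar = part.trim();
            if (!jar.isEmpty()) jars.add(jar);
        }
        return jars;
    }

    public static void main(String[] args) {
        for (String jar : splitJars("/tmp/a.jar, /tmp/b.jar")) {
            System.out.println(jar); // each path would go to addJar(...)
        }
    }
}
```

The null guard matters because the approach's appeal is precisely that it needs no per-Hive-version classpath shims: when the conf is absent, nothing is loaded.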
[jira] [Created] (SPARK-28954) For SparkCLI, start up with conf of HIVEAUXJARS, we add jar with SessionStateResourceLoader's addJar() API
angerszhu created SPARK-28954: - Summary: For SparkCLI, start up with conf of HIVEAUXJARS, we add jar with SessionStateResourceLoader's addJar() API Key: SPARK-28954 URL: https://issues.apache.org/jira/browse/SPARK-28954 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0, 3.0.0 Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28864) Add spark connector for Alibaba Log Service
[ https://issues.apache.org/jira/browse/SPARK-28864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921096#comment-16921096 ] Jungtaek Lim commented on SPARK-28864: -- I guess the general recommendation for new connectors has been that "Apache Bahir serves this purpose", as Spark moved existing connectors out of the Spark codebase to the Bahir project and tries to keep only the first-class connectors in the Spark codebase. The Bahir project is here: [https://bahir.apache.org/]
> Add spark connector for Alibaba Log Service
> ---
>
> Key: SPARK-28864
> URL: https://issues.apache.org/jira/browse/SPARK-28864
> Project: Spark
> Issue Type: New Feature
> Components: Input/Output
> Affects Versions: 3.0.0
> Reporter: Ke Li
> Priority: Major
>
> Alibaba Log Service is a big data service that is widely used within Alibaba Group and by thousands of Alibaba Cloud customers. The core storage engine of Log Service, named Loghub, is a large-scale distributed storage system that provides producer and consumer APIs to push and pull data, much like Kafka, AWS Kinesis and Azure Eventhub do.
> Many Log Service users already use Spark Streaming, Spark SQL and Spark Structured Streaming to analyze data collected from both on-premise and cloud data sources.
> Happy to hear any comments.
-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28864) Add spark source connector for Alibaba Log Service
[ https://issues.apache.org/jira/browse/SPARK-28864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Li updated SPARK-28864: -- Summary: Add spark source connector for Alibaba Log Service (was: Add spark source connector for Aliyun Log Service)
> Add spark source connector for Alibaba Log Service
> --
>
> Key: SPARK-28864
> URL: https://issues.apache.org/jira/browse/SPARK-28864
> Project: Spark
> Issue Type: New Feature
> Components: Input/Output
> Affects Versions: 3.0.0
> Reporter: Ke Li
> Priority: Major
>
> Alibaba Log Service is a big data service that is widely used within Alibaba Group and by thousands of Alibaba Cloud customers. The core storage engine of Log Service, named Loghub, is a large-scale distributed storage system that provides producer and consumer APIs to push and pull data, much like Kafka, AWS Kinesis and Azure Eventhub do.
> Many Log Service users already use Spark Streaming, Spark SQL and Spark Structured Streaming to analyze data collected from both on-premise and cloud data sources.
> Happy to hear any comments.
-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28864) Add spark connector for Alibaba Log Service
[ https://issues.apache.org/jira/browse/SPARK-28864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Li updated SPARK-28864: -- Summary: Add spark connector for Alibaba Log Service (was: Add spark source connector for Alibaba Log Service)
> Add spark connector for Alibaba Log Service
> ---
>
> Key: SPARK-28864
> URL: https://issues.apache.org/jira/browse/SPARK-28864
> Project: Spark
> Issue Type: New Feature
> Components: Input/Output
> Affects Versions: 3.0.0
> Reporter: Ke Li
> Priority: Major
>
> Alibaba Log Service is a big data service that is widely used within Alibaba Group and by thousands of Alibaba Cloud customers. The core storage engine of Log Service, named Loghub, is a large-scale distributed storage system that provides producer and consumer APIs to push and pull data, much like Kafka, AWS Kinesis and Azure Eventhub do.
> Many Log Service users already use Spark Streaming, Spark SQL and Spark Structured Streaming to analyze data collected from both on-premise and cloud data sources.
> Happy to hear any comments.
-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28921) Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1.13.10, 1.12.10, 1.11.10)
[ https://issues.apache.org/jira/browse/SPARK-28921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-28921: -- Fix Version/s: 2.4.5
> Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1.13.10, 1.12.10, 1.11.10)
> ---
>
> Key: SPARK-28921
> URL: https://issues.apache.org/jira/browse/SPARK-28921
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 2.3.3, 2.4.3
> Reporter: Paul Schweigert
> Assignee: Andy Grove
> Priority: Major
> Fix For: 2.4.5, 3.0.0
>
> Spark jobs are failing on the latest versions of Kubernetes when jobs attempt to provision executor pods (jobs like Spark-Pi that do not launch executors run without a problem).
> Here's an example error message:
> {code:java}
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors from Kubernetes.
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors from Kubernetes.
> 19/08/30 01:29:09 WARN WatchConnectionManager: Exec Failure: HTTP 403, Status: 403 -
> java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
> at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216)
> at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183)
> at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
> at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> Looks like the issue is caused by fixes for a recent CVE:
> CVE: [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14809]
> Fix: [https://github.com/fabric8io/kubernetes-client/pull/1669]
> Looks like upgrading kubernetes-client to 4.4.2 would solve this issue.
-- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-28953) Integration tests fail due to malformed URL
[ https://issues.apache.org/jira/browse/SPARK-28953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921087#comment-16921087 ] Stavros Kontopoulos edited comment on SPARK-28953 at 9/3/19 12:04 AM: -- [~shaneknapp] [~eje] I can fix this since I am working on: SPARK-27936 but im wondering about the root cause. was (Author: skonto): [~shaneknapp] [~eje] I can fix this since I am working on: SPARK-27936 but im wondering of the root cause. > Integration tests fail due to malformed URL > --- > > Key: SPARK-28953 > URL: https://issues.apache.org/jira/browse/SPARK-28953 > Project: Spark > Issue Type: Bug > Components: jenkins, Kubernetes >Affects Versions: 3.0.0 >Reporter: Stavros Kontopoulos >Priority: Major > > Tests failed on Ubuntu, verified on two different machines: > KubernetesSuite: > - Launcher client dependencies *** FAILED *** > java.net.MalformedURLException: no protocol: * http://172.31.46.91:30706 > at java.net.URL.(URL.java:600) > at java.net.URL.(URL.java:497) > at java.net.URL.(URL.java:446) > at > org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.$anonfun$$init$$1(DepsTestsSuite.scala:160) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 3.0.0-SNAPSHOT > /_/ > > Using Scala version 2.12.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_222) > Type in expressions to have them evaluated. > Type :help for more information. 
> > scala> val pb = new ProcessBuilder().command("bash", "-c", "minikube service > ceph-nano-s3 -n spark --url") > pb: ProcessBuilder = java.lang.ProcessBuilder@46092840 > scala> pb.redirectErrorStream(true) > res0: ProcessBuilder = java.lang.ProcessBuilder@46092840 > scala> val proc = pb.start() > proc: Process = java.lang.UNIXProcess@5e9650d3 > scala> val r = org.apache.commons.io.IOUtils.toString(proc.getInputStream()) > r: String = > "* http://172.31.46.91:30706 > " > Although (no asterisk): > $ minikube service ceph-nano-s3 -n spark --url > [http://172.31.46.91:30706|http://172.31.46.91:30706/] > > This is weird because it fails at the java level, where does the asterisk > come from? > $ minikube version > minikube version: v1.3.1 > commit: ca60a424ce69a4d79f502650199ca2b52f29e631 > >
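The session above shows that the stream read back via IOUtils carries a stray `* ` prefix that `java.net.URL` rejects with `MalformedURLException: no protocol`. As a workaround sketch (the prefix pattern is an assumption based only on the output shown), the captured line can be sanitized before parsing:

```shell
# Sketch: strip leading asterisks and spaces from the captured minikube
# output, mirroring the "* http://..." string seen in the scala session.
raw='* http://172.31.46.91:30706'
url=$(printf '%s' "$raw" | sed 's/^[* ]*//')
echo "$url"   # prints http://172.31.46.91:30706
```

This only masks the symptom; it does not answer where the asterisk comes from.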
[jira] [Comment Edited] (SPARK-28953) Integration tests fail due to malformed URL
[ https://issues.apache.org/jira/browse/SPARK-28953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921087#comment-16921087 ] Stavros Kontopoulos edited comment on SPARK-28953 at 9/3/19 12:03 AM: -- [~shaneknapp] [~eje] I can fix this since I am working on: SPARK-27936 but im wondering of the root cause. was (Author: skonto): [~shaneknapp] I can fix this since I am working on: SPARK-27936 but im wondering of the root cause. > Integration tests fail due to malformed URL > --- > > Key: SPARK-28953 > URL: https://issues.apache.org/jira/browse/SPARK-28953 > Project: Spark > Issue Type: Bug > Components: jenkins, Kubernetes >Affects Versions: 3.0.0 >Reporter: Stavros Kontopoulos >Priority: Major > > Tests failed on Ubuntu, verified on two different machines: > KubernetesSuite: > - Launcher client dependencies *** FAILED *** > java.net.MalformedURLException: no protocol: * http://172.31.46.91:30706 > at java.net.URL.(URL.java:600) > at java.net.URL.(URL.java:497) > at java.net.URL.(URL.java:446) > at > org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.$anonfun$$init$$1(DepsTestsSuite.scala:160) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 3.0.0-SNAPSHOT > /_/ > > Using Scala version 2.12.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_222) > Type in expressions to have them evaluated. > Type :help for more information. 
> > scala> val pb = new ProcessBuilder().command("bash", "-c", "minikube service > ceph-nano-s3 -n spark --url") > pb: ProcessBuilder = java.lang.ProcessBuilder@46092840 > scala> pb.redirectErrorStream(true) > res0: ProcessBuilder = java.lang.ProcessBuilder@46092840 > scala> val proc = pb.start() > proc: Process = java.lang.UNIXProcess@5e9650d3 > scala> val r = org.apache.commons.io.IOUtils.toString(proc.getInputStream()) > r: String = > "* http://172.31.46.91:30706 > " > Although (no asterisk): > $ minikube service ceph-nano-s3 -n spark --url > [http://172.31.46.91:30706|http://172.31.46.91:30706/] > > This is weird because it fails at the java level, where does the asterisk > come from? > $ minikube version > minikube version: v1.3.1 > commit: ca60a424ce69a4d79f502650199ca2b52f29e631 > >
[jira] [Commented] (SPARK-28953) Integration tests fail due to malformed URL
[ https://issues.apache.org/jira/browse/SPARK-28953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921087#comment-16921087 ] Stavros Kontopoulos commented on SPARK-28953: - [~shaneknapp] I can fix this since I am working on: SPARK-27936 but im wondering of the root cause. > Integration tests fail due to malformed URL > --- > > Key: SPARK-28953 > URL: https://issues.apache.org/jira/browse/SPARK-28953 > Project: Spark > Issue Type: Bug > Components: jenkins, Kubernetes >Affects Versions: 3.0.0 >Reporter: Stavros Kontopoulos >Priority: Major > > Tests failed on Ubuntu, verified on two different machines: > KubernetesSuite: > - Launcher client dependencies *** FAILED *** > java.net.MalformedURLException: no protocol: * http://172.31.46.91:30706 > at java.net.URL.(URL.java:600) > at java.net.URL.(URL.java:497) > at java.net.URL.(URL.java:446) > at > org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.$anonfun$$init$$1(DepsTestsSuite.scala:160) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) > > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 3.0.0-SNAPSHOT > /_/ > > Using Scala version 2.12.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_222) > Type in expressions to have them evaluated. > Type :help for more information. 
> > scala> val pb = new ProcessBuilder().command("bash", "-c", "minikube service > ceph-nano-s3 -n spark --url") > pb: ProcessBuilder = java.lang.ProcessBuilder@46092840 > scala> pb.redirectErrorStream(true) > res0: ProcessBuilder = java.lang.ProcessBuilder@46092840 > scala> val proc = pb.start() > proc: Process = java.lang.UNIXProcess@5e9650d3 > scala> val r = org.apache.commons.io.IOUtils.toString(proc.getInputStream()) > r: String = > "* http://172.31.46.91:30706 > " > Although (no asterisk): > $ minikube service ceph-nano-s3 -n spark --url > [http://172.31.46.91:30706|http://172.31.46.91:30706/] > > This is weird because it fails at the java level, where does the asterisk > come from? > $ minikube version > minikube version: v1.3.1 > commit: ca60a424ce69a4d79f502650199ca2b52f29e631 > >
[jira] [Created] (SPARK-28953) Integration tests fail due to malformed URL
Stavros Kontopoulos created SPARK-28953: --- Summary: Integration tests fail due to malformed URL Key: SPARK-28953 URL: https://issues.apache.org/jira/browse/SPARK-28953 Project: Spark Issue Type: Bug Components: jenkins, Kubernetes Affects Versions: 3.0.0 Reporter: Stavros Kontopoulos Tests failed on Ubuntu, verified on two different machines: KubernetesSuite: - Launcher client dependencies *** FAILED *** java.net.MalformedURLException: no protocol: * http://172.31.46.91:30706 at java.net.URL.(URL.java:600) at java.net.URL.(URL.java:497) at java.net.URL.(URL.java:446) at org.apache.spark.deploy.k8s.integrationtest.DepsTestsSuite.$anonfun$$init$$1(DepsTestsSuite.scala:160) at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 3.0.0-SNAPSHOT /_/ Using Scala version 2.12.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_222) Type in expressions to have them evaluated. Type :help for more information. scala> val pb = new ProcessBuilder().command("bash", "-c", "minikube service ceph-nano-s3 -n spark --url") pb: ProcessBuilder = java.lang.ProcessBuilder@46092840 scala> pb.redirectErrorStream(true) res0: ProcessBuilder = java.lang.ProcessBuilder@46092840 scala> val proc = pb.start() proc: Process = java.lang.UNIXProcess@5e9650d3 scala> val r = org.apache.commons.io.IOUtils.toString(proc.getInputStream()) r: String = "* http://172.31.46.91:30706 " Although (no asterisk): $ minikube service ceph-nano-s3 -n spark --url [http://172.31.46.91:30706|http://172.31.46.91:30706/] This is weird because it fails at the java level, where does the asterisk come from? 
$ minikube version minikube version: v1.3.1 commit: ca60a424ce69a4d79f502650199ca2b52f29e631
[jira] [Comment Edited] (SPARK-24227) Not able to submit spark job to kubernetes on 2.3
[ https://issues.apache.org/jira/browse/SPARK-24227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921082#comment-16921082 ] Dongjoon Hyun edited comment on SPARK-24227 at 9/2/19 11:58 PM: Apache Spark 2.3.4 was the last release and `branch-2.3` becomes EOL. I'll resolve this issue as `Not A Problem`. Let's use the latest version with the proper K8s configuration as [~felipejfc] described in the above. was (Author: dongjoon): Apache Spark 2.3.4 was the last release and `branch-2.3` becomes EOL. I'll resolve this issue as `Not A Problem`. Please use the latest version with the proper K8s configuration. > Not able to submit spark job to kubernetes on 2.3 > - > > Key: SPARK-24227 > URL: https://issues.apache.org/jira/browse/SPARK-24227 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.0 >Reporter: Felipe Cavalcanti >Priority: Major > > Hi, I'm trying to submit a spark job to kubernetes with no success, I > followed the steps @ > [https://spark.apache.org/docs/latest/running-on-kubernetes.html] with no > success, when I run: > > {code:java} > bin/spark-submit \ > --master k8s://https://${host}:${port} \ > --deploy-mode cluster \ > --name jaeger-spark \ > --class io.jaegertracing.spark.dependencies.DependenciesSparkJob \ > --conf spark.executor.instances=5 \ > --conf spark.kubernetes.container.image=bla/jaeger-deps-spark:latest\ > --conf spark.kubernetes.namespace=spark \ > local:///opt/spark/jars/jaeger-spark-dependencies-0.0.1-SNAPSHOT.jar > {code} > > Im getting the following stack trace: > {code:java} > 2018-05-09 17:06:02 WARN WatchConnectionManager:192 - Exec Failure > javax.net.ssl.SSLHandshakeException: > sun.security.validator.ValidatorException: PKIX path building failed: > sun.security.provider.certpath.SunCertPathBuilderException: unable to find > valid certification path to requested target at > sun.security.ssl.Alerts.getSSLException(Alerts.java:192) at > 
sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949) at > sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302) at > sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296) at > sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1514) > at > sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216) > at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1026) at > sun.security.ssl.Handshaker.process_record(Handshaker.java:961) at > sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1062) at > sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375) > at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403) at > sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387) at > okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:281) > at > okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:251) > at > okhttp3.internal.connection.RealConnection.connect(RealConnection.java:151) > at > okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:195) > at > okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121) > at > okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100) > at > okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) > at > okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) > at > okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) > at > 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) > at > okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) > at > io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:90) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) > at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185) at > okhttp3.RealCall$AsyncCall.execute(RealCall.java:135) at >
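The `PKIX path building failed` error above typically means the submission client does not trust the cluster's CA certificate. Spark's Kubernetes mode exposes `spark.kubernetes.authenticate.submission.caCertFile` for supplying it explicitly; the sketch below reuses the reporter's command with that setting added (the certificate path is an assumption, not taken from the report):

```shell
# Sketch: same submission as above, with the cluster CA supplied explicitly
# so the TLS handshake can validate the API server certificate.
# /etc/kubernetes/ca.crt is an assumed path; substitute your cluster's CA.
bin/spark-submit \
  --master "k8s://https://${host}:${port}" \
  --deploy-mode cluster \
  --name jaeger-spark \
  --class io.jaegertracing.spark.dependencies.DependenciesSparkJob \
  --conf spark.kubernetes.authenticate.submission.caCertFile=/etc/kubernetes/ca.crt \
  --conf spark.kubernetes.container.image=bla/jaeger-deps-spark:latest \
  --conf spark.kubernetes.namespace=spark \
  local:///opt/spark/jars/jaeger-spark-dependencies-0.0.1-SNAPSHOT.jar
```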
[jira] [Resolved] (SPARK-24227) Not able to submit spark job to kubernetes on 2.3
[ https://issues.apache.org/jira/browse/SPARK-24227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-24227. --- Resolution: Not A Problem Apache Spark 2.3.4 was the last release and `branch-2.3` becomes EOL. I'll resolve this issue as `Not A Problem`. Please use the latest version with the proper K8s configuration. > Not able to submit spark job to kubernetes on 2.3 > - > > Key: SPARK-24227 > URL: https://issues.apache.org/jira/browse/SPARK-24227 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.0 >Reporter: Felipe Cavalcanti >Priority: Major > > Hi, I'm trying to submit a spark job to kubernetes with no success, I > followed the steps @ > [https://spark.apache.org/docs/latest/running-on-kubernetes.html] with no > success, when I run: > > {code:java} > bin/spark-submit \ > --master k8s://https://${host}:${port} \ > --deploy-mode cluster \ > --name jaeger-spark \ > --class io.jaegertracing.spark.dependencies.DependenciesSparkJob \ > --conf spark.executor.instances=5 \ > --conf spark.kubernetes.container.image=bla/jaeger-deps-spark:latest\ > --conf spark.kubernetes.namespace=spark \ > local:///opt/spark/jars/jaeger-spark-dependencies-0.0.1-SNAPSHOT.jar > {code} > > Im getting the following stack trace: > {code:java} > 2018-05-09 17:06:02 WARN WatchConnectionManager:192 - Exec Failure > javax.net.ssl.SSLHandshakeException: > sun.security.validator.ValidatorException: PKIX path building failed: > sun.security.provider.certpath.SunCertPathBuilderException: unable to find > valid certification path to requested target at > sun.security.ssl.Alerts.getSSLException(Alerts.java:192) at > sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949) at > sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302) at > sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296) at > sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1514) > at > 
sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216) > at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1026) at > sun.security.ssl.Handshaker.process_record(Handshaker.java:961) at > sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1062) at > sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375) > at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403) at > sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387) at > okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:281) > at > okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:251) > at > okhttp3.internal.connection.RealConnection.connect(RealConnection.java:151) > at > okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:195) > at > okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121) > at > okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100) > at > okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) > at > okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) > at > okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) > at > okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) > at > 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) > at > io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:90) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) > at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185) at > okhttp3.RealCall$AsyncCall.execute(RealCall.java:135) at > okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:748) Caused by: >
[jira] [Resolved] (SPARK-28921) Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1.13.10, 1.12.10, 1.11.10)
[ https://issues.apache.org/jira/browse/SPARK-28921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28921. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25640 [https://github.com/apache/spark/pull/25640] > Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10, > 1.12.10, 1.11.10) > --- > > Key: SPARK-28921 > URL: https://issues.apache.org/jira/browse/SPARK-28921 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.3, 2.4.3 >Reporter: Paul Schweigert >Assignee: Andy Grove >Priority: Major > Fix For: 3.0.0 > > > Spark jobs are failing on latest versions of Kubernetes when jobs attempt to > provision executor pods (jobs like Spark-Pi that do not launch executors run > without a problem): > > Here's an example error message: > > {code:java} > 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors > from Kubernetes. > 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors > from Kubernetes.19/08/30 01:29:09 WARN WatchConnectionManager: Exec Failure: > HTTP 403, Status: 403 - > java.net.ProtocolException: Expected HTTP 101 response but was '403 > Forbidden' > at > okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216) > at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183) > at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141) > at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > > at java.lang.Thread.run(Thread.java:748) > {code} > > Looks like the issue is caused by fixes for a recent CVE : > CVE: [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14809] > Fix: [https://github.com/fabric8io/kubernetes-client/pull/1669] > > Looks like upgrading kubernetes-client to 4.4.2 
would solve this issue.
[jira] [Assigned] (SPARK-28921) Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1.13.10, 1.12.10, 1.11.10)
[ https://issues.apache.org/jira/browse/SPARK-28921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-28921: - Assignee: Andy Grove > Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10, > 1.12.10, 1.11.10) > --- > > Key: SPARK-28921 > URL: https://issues.apache.org/jira/browse/SPARK-28921 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.3, 2.4.3 >Reporter: Paul Schweigert >Assignee: Andy Grove >Priority: Major > > Spark jobs are failing on latest versions of Kubernetes when jobs attempt to > provision executor pods (jobs like Spark-Pi that do not launch executors run > without a problem): > > Here's an example error message: > > {code:java} > 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors > from Kubernetes. > 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors > from Kubernetes.19/08/30 01:29:09 WARN WatchConnectionManager: Exec Failure: > HTTP 403, Status: 403 - > java.net.ProtocolException: Expected HTTP 101 response but was '403 > Forbidden' > at > okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216) > at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183) > at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141) > at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > > at java.lang.Thread.run(Thread.java:748) > {code} > > Looks like the issue is caused by fixes for a recent CVE : > CVE: [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14809] > Fix: [https://github.com/fabric8io/kubernetes-client/pull/1669] > > Looks like upgrading kubernetes-client to 4.4.2 would solve this issue. 
[jira] [Commented] (SPARK-24227) Not able to submit spark job to kubernetes on 2.3
[ https://issues.apache.org/jira/browse/SPARK-24227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921076#comment-16921076 ] Dongjoon Hyun commented on SPARK-24227: --- Thank you for reporting and analysis, [~felipejfc]! > Not able to submit spark job to kubernetes on 2.3 > - > > Key: SPARK-24227 > URL: https://issues.apache.org/jira/browse/SPARK-24227 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.0 >Reporter: Felipe Cavalcanti >Priority: Major > > Hi, I'm trying to submit a spark job to kubernetes with no success, I > followed the steps @ > [https://spark.apache.org/docs/latest/running-on-kubernetes.html] with no > success, when I run: > > {code:java} > bin/spark-submit \ > --master k8s://https://${host}:${port} \ > --deploy-mode cluster \ > --name jaeger-spark \ > --class io.jaegertracing.spark.dependencies.DependenciesSparkJob \ > --conf spark.executor.instances=5 \ > --conf spark.kubernetes.container.image=bla/jaeger-deps-spark:latest\ > --conf spark.kubernetes.namespace=spark \ > local:///opt/spark/jars/jaeger-spark-dependencies-0.0.1-SNAPSHOT.jar > {code} > > Im getting the following stack trace: > {code:java} > 2018-05-09 17:06:02 WARN WatchConnectionManager:192 - Exec Failure > javax.net.ssl.SSLHandshakeException: > sun.security.validator.ValidatorException: PKIX path building failed: > sun.security.provider.certpath.SunCertPathBuilderException: unable to find > valid certification path to requested target at > sun.security.ssl.Alerts.getSSLException(Alerts.java:192) at > sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949) at > sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302) at > sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296) at > sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1514) > at > sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216) > at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1026) at > 
sun.security.ssl.Handshaker.process_record(Handshaker.java:961) at > sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1062) at > sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375) > at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403) at > sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387) at > okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:281) > at > okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:251) > at > okhttp3.internal.connection.RealConnection.connect(RealConnection.java:151) > at > okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:195) > at > okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121) > at > okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100) > at > okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) > at > okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) > at > okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) > at > okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) > at > io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:90) > at > 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) > at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185) at > okhttp3.RealCall$AsyncCall.execute(RealCall.java:135) at > okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:748) Caused by: > sun.security.validator.ValidatorException: PKIX path building failed: > sun.security.provider.certpath.SunCertPathBuilderException: unable
[jira] [Updated] (SPARK-24227) Not able to submit spark job to kubernetes on 2.3
[ https://issues.apache.org/jira/browse/SPARK-24227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-24227: -- Labels: kubernetes (was: kubernetes spark) > Not able to submit spark job to kubernetes on 2.3 > - > > Key: SPARK-24227 > URL: https://issues.apache.org/jira/browse/SPARK-24227 > Project: Spark > Issue Type: Bug > Components: Spark Core, Spark Submit >Affects Versions: 2.3.0 >Reporter: Felipe Cavalcanti >Priority: Major > Labels: kubernetes > > Hi, I'm trying to submit a spark job to kubernetes with no success, I > followed the steps @ > [https://spark.apache.org/docs/latest/running-on-kubernetes.html] with no > success, when I run: > > {code:java} > bin/spark-submit \ > --master k8s://https://${host}:${port} \ > --deploy-mode cluster \ > --name jaeger-spark \ > --class io.jaegertracing.spark.dependencies.DependenciesSparkJob \ > --conf spark.executor.instances=5 \ > --conf spark.kubernetes.container.image=bla/jaeger-deps-spark:latest\ > --conf spark.kubernetes.namespace=spark \ > local:///opt/spark/jars/jaeger-spark-dependencies-0.0.1-SNAPSHOT.jar > {code} > > Im getting the following stack trace: > {code:java} > 2018-05-09 17:06:02 WARN WatchConnectionManager:192 - Exec Failure > javax.net.ssl.SSLHandshakeException: > sun.security.validator.ValidatorException: PKIX path building failed: > sun.security.provider.certpath.SunCertPathBuilderException: unable to find > valid certification path to requested target at > sun.security.ssl.Alerts.getSSLException(Alerts.java:192) at > sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949) at > sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302) at > sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296) at > sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1514) > at > sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216) > at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1026) at > 
[jira] [Updated] (SPARK-24227) Not able to submit spark job to kubernetes on 2.3
[ https://issues.apache.org/jira/browse/SPARK-24227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-24227: -- Labels: (was: kubernetes) > Not able to submit spark job to kubernetes on 2.3 > - > > Key: SPARK-24227 > URL: https://issues.apache.org/jira/browse/SPARK-24227 > Project: Spark > Issue Type: Bug > Components: Spark Core, Spark Submit >Affects Versions: 2.3.0 >Reporter: Felipe Cavalcanti >Priority: Major > > Hi, I'm trying to submit a spark job to kubernetes with no success, I > followed the steps @ > [https://spark.apache.org/docs/latest/running-on-kubernetes.html] with no > success, when I run: > > {code:java} > bin/spark-submit \ > --master k8s://https://${host}:${port} \ > --deploy-mode cluster \ > --name jaeger-spark \ > --class io.jaegertracing.spark.dependencies.DependenciesSparkJob \ > --conf spark.executor.instances=5 \ > --conf spark.kubernetes.container.image=bla/jaeger-deps-spark:latest\ > --conf spark.kubernetes.namespace=spark \ > local:///opt/spark/jars/jaeger-spark-dependencies-0.0.1-SNAPSHOT.jar > {code} > > Im getting the following stack trace: > {code:java} > 2018-05-09 17:06:02 WARN WatchConnectionManager:192 - Exec Failure > javax.net.ssl.SSLHandshakeException: > sun.security.validator.ValidatorException: PKIX path building failed: > sun.security.provider.certpath.SunCertPathBuilderException: unable to find > valid certification path to requested target at > sun.security.ssl.Alerts.getSSLException(Alerts.java:192) at > sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949) at > sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302) at > sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296) at > sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1514) > at > sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216) > at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1026) at > 
sun.security.ssl.Handshaker.process_record(Handshaker.java:961) at > sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1062) at > sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375) > at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403) at > sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387) at > okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:281) > at > okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:251) > at > okhttp3.internal.connection.RealConnection.connect(RealConnection.java:151) > at > okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:195) > at > okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121) > at > okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100) > at > okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) > at > okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) > at > okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) > at > okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) > at > io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:90) > at > 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) > at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185) at > okhttp3.RealCall$AsyncCall.execute(RealCall.java:135) at > okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:748) Caused by: > sun.security.validator.ValidatorException: PKIX path building failed: > sun.security.provider.certpath.SunCertPathBuilderException: unable to find > valid certification path to
[jira] [Updated] (SPARK-24227) Not able to submit spark job to kubernetes on 2.3
[ https://issues.apache.org/jira/browse/SPARK-24227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-24227: -- Component/s: (was: Spark Submit) (was: Spark Core) Kubernetes > Not able to submit spark job to kubernetes on 2.3 > - > > Key: SPARK-24227 > URL: https://issues.apache.org/jira/browse/SPARK-24227 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.0 >Reporter: Felipe Cavalcanti >Priority: Major > > Hi, I'm trying to submit a spark job to kubernetes with no success, I > followed the steps @ > [https://spark.apache.org/docs/latest/running-on-kubernetes.html] with no > success, when I run: > > {code:java} > bin/spark-submit \ > --master k8s://https://${host}:${port} \ > --deploy-mode cluster \ > --name jaeger-spark \ > --class io.jaegertracing.spark.dependencies.DependenciesSparkJob \ > --conf spark.executor.instances=5 \ > --conf spark.kubernetes.container.image=bla/jaeger-deps-spark:latest\ > --conf spark.kubernetes.namespace=spark \ > local:///opt/spark/jars/jaeger-spark-dependencies-0.0.1-SNAPSHOT.jar > {code} > > Im getting the following stack trace: > {code:java} > 2018-05-09 17:06:02 WARN WatchConnectionManager:192 - Exec Failure > javax.net.ssl.SSLHandshakeException: > sun.security.validator.ValidatorException: PKIX path building failed: > sun.security.provider.certpath.SunCertPathBuilderException: unable to find > valid certification path to requested target at > sun.security.ssl.Alerts.getSSLException(Alerts.java:192) at > sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1949) at > sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302) at > sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296) at > sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1514) > at > sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:216) > at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1026) at > 
sun.security.ssl.Handshaker.process_record(Handshaker.java:961) at > sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1062) at > sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375) > at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403) at > sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387) at > okhttp3.internal.connection.RealConnection.connectTls(RealConnection.java:281) > at > okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:251) > at > okhttp3.internal.connection.RealConnection.connect(RealConnection.java:151) > at > okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:195) > at > okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121) > at > okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100) > at > okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) > at > okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) > at > okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) > at > okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) > at > io.fabric8.kubernetes.client.utils.HttpClientUtils$2.intercept(HttpClientUtils.java:90) > at > 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) > at > okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) > at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185) at > okhttp3.RealCall$AsyncCall.execute(RealCall.java:135) at > okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:748) Caused by: > sun.security.validator.ValidatorException: PKIX path building failed: >
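The PKIX error in the two updates above means the JVM running spark-submit does not trust the CA that signed the Kubernetes API server's certificate. The issue itself does not state a fix, but the usual remedies are importing the cluster CA into the JVM truststore with `keytool -importcert`, or pointing spark-submit at it via `spark.kubernetes.authenticate.submission.caCertFile`. The stdlib sketch below only illustrates the two verification modes involved, using Python's `ssl` module as a stand-in for the JVM's TLS client:

```python
import ssl

# Default client context: verifies the server's certificate chain against the
# trusted CA store -- the same check the JVM performs when it raises the
# "PKIX path building failed" error above.
strict = ssl.create_default_context()
print(strict.verify_mode == ssl.CERT_REQUIRED)   # expect: True

# Unverified context: skips chain validation entirely (debugging only; the
# real fix is to trust the cluster CA, not to disable verification).
lenient = ssl._create_unverified_context()
print(lenient.verify_mode == ssl.CERT_NONE)      # expect: True
```

The point of the contrast: the error is not a Spark bug per se but a trust-store configuration problem on the submitting machine.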
[jira] [Assigned] (SPARK-28951) Add release announce template
[ https://issues.apache.org/jira/browse/SPARK-28951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-28951: - Assignee: Dongjoon Hyun > Add release announce template > - > > Key: SPARK-28951 > URL: https://issues.apache.org/jira/browse/SPARK-28951 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 2.4.5, 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Trivial > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28951) Add release announce template
[ https://issues.apache.org/jira/browse/SPARK-28951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-28951. --- Fix Version/s: 3.0.0 2.4.5 Resolution: Fixed Issue resolved by pull request 25656 [https://github.com/apache/spark/pull/25656] > Add release announce template > - > > Key: SPARK-28951 > URL: https://issues.apache.org/jira/browse/SPARK-28951 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 2.4.5, 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Trivial > Fix For: 2.4.5, 3.0.0 >
[jira] [Commented] (SPARK-28372) Document Spark WEB UI
[ https://issues.apache.org/jira/browse/SPARK-28372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921026#comment-16921026 ] Pablo Langa Blanco commented on SPARK-28372: [~smilegator] We have a streaming tab section in the documentation with very little content. Do you think we should open a new issue to complete this? > Document Spark WEB UI > - > > Key: SPARK-28372 > URL: https://issues.apache.org/jira/browse/SPARK-28372 > Project: Spark > Issue Type: Umbrella > Components: Documentation, Web UI >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Major > > Spark web UIs are used to monitor the status and resource consumption > of your Spark applications and clusters. However, we do not have corresponding > documentation, so it is hard for end users to use and understand them.
[jira] [Commented] (SPARK-28373) Document JDBC/ODBC Server page
[ https://issues.apache.org/jira/browse/SPARK-28373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921019#comment-16921019 ] Pablo Langa Blanco commented on SPARK-28373: [~podongfeng] Ok, I can take care of it > Document JDBC/ODBC Server page > -- > > Key: SPARK-28373 > URL: https://issues.apache.org/jira/browse/SPARK-28373 > Project: Spark > Issue Type: Sub-task > Components: Documentation, Web UI >Affects Versions: 3.0.0 >Reporter: Xiao Li >Priority: Major > > !https://user-images.githubusercontent.com/5399861/60809590-9dcf2500-a1bd-11e9-826e-33729bb97daf.png|width=1720,height=503! > > [https://github.com/apache/spark/pull/25062] added new columns CLOSE TIME > and EXECUTION TIME. It is hard to understand the difference. We need to > document them; otherwise, it is hard for end users to understand them.
[jira] [Commented] (SPARK-27733) Upgrade to Avro 1.9.x
[ https://issues.apache.org/jira/browse/SPARK-27733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921009#comment-16921009 ] Dongjoon Hyun commented on SPARK-27733: --- Great! Thank you for the follow-ups, [~Fokko]! > Upgrade to Avro 1.9.x > - > > Key: SPARK-27733 > URL: https://issues.apache.org/jira/browse/SPARK-27733 > Project: Spark > Issue Type: Improvement > Components: Build, SQL >Affects Versions: 3.0.0 >Reporter: Ismaël Mejía >Priority: Minor > > Avro 1.9.0 was released with many nice features including reduced size (1MB > less), removed dependencies, no Paranamer, no shaded Guava, and security > updates, so it is probably a worthwhile upgrade.
[jira] [Created] (SPARK-28952) Getting Error using 'Linearregression' in spark 2.3.4
Sandeep Singh created SPARK-28952: - Summary: Getting Error using 'Linearregression' in spark 2.3.4 Key: SPARK-28952 URL: https://issues.apache.org/jira/browse/SPARK-28952 Project: Spark Issue Type: Bug Components: ML Affects Versions: 2.4.3 Reporter: Sandeep Singh Getting the following error while fitting the 'LinearRegression': File "C:\Spark\spark-2.4.3-bin-hadoop2.7\python\pyspark\sql\utils.py", line 79, in deco raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace) IllegalArgumentException: 'requirement failed: Column features must be of type struct<type:tinyint,size:int,indices:array<int>,values:array<double>> but was actually struct<type:tinyint,size:int,indices:array<int>,values:array<double>>.'
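This error usually comes from mixing PySpark's two Vector families: both `pyspark.mllib.linalg.Vector` and `pyspark.ml.linalg.Vector` serialize to the same-looking struct, so the message appears to compare two identical schemas, while the estimator actually checks the UDT class. The common fix is to build the `features` column with `pyspark.ml.linalg.Vectors` (or convert with `MLUtils.convertVectorColumnsToML`). The stand-in classes below are hypothetical, not the real PySpark API; they only illustrate why identical schema strings can still fail a type check:

```python
# Hypothetical stand-ins for the two VectorUDT classes; not the real PySpark API.
class MlVectorUDT:
    def simpleString(self):
        return "struct<type:tinyint,size:int,indices:array<int>,values:array<double>>"

class MllibVectorUDT:
    def simpleString(self):
        return "struct<type:tinyint,size:int,indices:array<int>,values:array<double>>"

expected, actual = MlVectorUDT(), MllibVectorUDT()
print(expected.simpleString() == actual.simpleString())  # True: the strings match
print(type(expected) is type(actual))                    # False: the classes differ
```

So a confusing "X must be of type T but was actually T" message is a strong hint that two distinct types share one printed schema.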
[jira] [Created] (SPARK-28951) Add release announce template
Dongjoon Hyun created SPARK-28951: - Summary: Add release announce template Key: SPARK-28951 URL: https://issues.apache.org/jira/browse/SPARK-28951 Project: Spark Issue Type: Task Components: Project Infra Affects Versions: 2.4.5, 3.0.0 Reporter: Dongjoon Hyun
[jira] [Commented] (SPARK-28912) MatchError exception in CheckpointWriteHandler
[ https://issues.apache.org/jira/browse/SPARK-28912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920979#comment-16920979 ] Aleksandr Kashkirov commented on SPARK-28912: - Steps to reproduce the error: # Start Hadoop in a pseudo-distributed mode. # In another terminal run command {{nc -lk }} # In the Spark shell execute the following statements: {code:java} scala> val ssc = new StreamingContext(sc, Seconds(30)) ssc: org.apache.spark.streaming.StreamingContext = org.apache.spark.streaming.StreamingContext@376fd14f scala> ssc.checkpoint("hdfs://localhost:9000/checkpoint-01") scala> val lines = ssc.socketTextStream("localhost", ) lines: org.apache.spark.streaming.dstream.ReceiverInputDStream[String] = org.apache.spark.streaming.dstream.SocketInputDStream@39b7d031 scala> val words = lines.flatMap(_.split(" ")) words: org.apache.spark.streaming.dstream.DStream[String] = org.apache.spark.streaming.dstream.FlatMappedDStream@637ae337 scala> val pairs = words.map(word => (word, 1)) pairs: org.apache.spark.streaming.dstream.DStream[(String, Int)] = org.apache.spark.streaming.dstream.MappedDStream@523d07cc scala> val wordCounts = pairs.reduceByKey(_ + _) wordCounts: org.apache.spark.streaming.dstream.DStream[(String, Int)] = org.apache.spark.streaming.dstream.ShuffledDStream@3c62183b scala> wordCounts.print() scala> ssc.start() scala> ssc.awaitTermination() {code} > MatchError exception in CheckpointWriteHandler > -- > > Key: SPARK-28912 > URL: https://issues.apache.org/jira/browse/SPARK-28912 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.0, 2.3.2 >Reporter: Aleksandr Kashkirov >Priority: Minor > > Setting checkpoint directory name to "checkpoint-" plus some digits (e.g. 
> "checkpoint-01") results in the following error: > {code:java} > Exception in thread "pool-32-thread-1" scala.MatchError: > 0523a434-0daa-4ea6-a050-c4eb3c557d8c (of class java.lang.String) > at > org.apache.spark.streaming.Checkpoint$.org$apache$spark$streaming$Checkpoint$$sortFunc$1(Checkpoint.scala:121) > > at > org.apache.spark.streaming.Checkpoint$$anonfun$getCheckpointFiles$1.apply(Checkpoint.scala:132) > > at > org.apache.spark.streaming.Checkpoint$$anonfun$getCheckpointFiles$1.apply(Checkpoint.scala:132) > > at scala.math.Ordering$$anon$9.compare(Ordering.scala:200) > at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355) > at java.util.TimSort.sort(TimSort.java:234) > at java.util.Arrays.sort(Arrays.java:1438) > at scala.collection.SeqLike$class.sorted(SeqLike.scala:648) > at scala.collection.mutable.ArrayOps$ofRef.sorted(ArrayOps.scala:186) > at scala.collection.SeqLike$class.sortWith(SeqLike.scala:601) > at scala.collection.mutable.ArrayOps$ofRef.sortWith(ArrayOps.scala:186) > at > org.apache.spark.streaming.Checkpoint$.getCheckpointFiles(Checkpoint.scala:132) > > at > org.apache.spark.streaming.CheckpointWriter$CheckpointWriteHandler.run(Checkpoint.scala:262) > > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > > at java.lang.Thread.run(Thread.java:748){code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
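The trace points at `Checkpoint.sortFunc`, which pattern-matches checkpoint file names and appears to have no fallback case, so a name the pattern cannot parse (here a UUID-style temporary file, confused by the directory itself being named `checkpoint-` plus digits) escalates to `scala.MatchError`. The Python analogue below (the regex is illustrative, not Spark's actual pattern) contrasts the unguarded lookup with a guarded sort key that tolerates stray names:

```python
import re

# Illustrative pattern, not Spark's actual one: checkpoint files look like
# "checkpoint-<time>" with an optional ".bk" backup suffix.
PATTERN = re.compile(r"checkpoint-(\d+)(\.bk)?$")

def unguarded_key(name):
    # Mirrors sortFunc's assumption that the pattern always matches;
    # a stray file name raises here (Python's analogue of MatchError).
    m = PATTERN.search(name)
    return int(m.group(1))

def guarded_key(name):
    # Defensive variant: sort unparseable names first instead of failing.
    m = PATTERN.search(name)
    return int(m.group(1)) if m else -1

files = ["checkpoint-1000", "checkpoint-500.bk",
         "0523a434-0daa-4ea6-a050-c4eb3c557d8c"]  # the stray temp-file name
print(sorted(files, key=guarded_key))
```

The guarded version sorts the UUID name first and the timestamped checkpoints in ascending order, where the unguarded one dies on the first non-matching name, just as the CheckpointWriteHandler thread does above.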
[jira] [Assigned] (SPARK-28463) Thriftserver throws java.math.BigDecimal incompatible with org.apache.hadoop.hive.common.type.HiveDecimal
[ https://issues.apache.org/jira/browse/SPARK-28463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang reassigned SPARK-28463: --- Assignee: Yuming Wang > Thriftserver throws java.math.BigDecimal incompatible with > org.apache.hadoop.hive.common.type.HiveDecimal > - > > Key: SPARK-28463 > URL: https://issues.apache.org/jira/browse/SPARK-28463 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > > How to reproduce this issue: > {code:sh} > build/sbt clean package -Phive -Phive-thriftserver -Phadoop-3.2 > export SPARK_PREPEND_CLASSES=true > sbin/start-thriftserver.sh > [root@spark-3267648 spark]# bin/beeline -u > jdbc:hive2://localhost:1/default -e "select cast(1 as decimal(38, 18));" > Connecting to jdbc:hive2://localhost:1/default > Connected to: Spark SQL (version 3.0.0-SNAPSHOT) > Driver: Hive JDBC (version 2.3.5) > Transaction isolation: TRANSACTION_REPEATABLE_READ > Error: java.lang.ClassCastException: java.math.BigDecimal incompatible with > org.apache.hadoop.hive.common.type.HiveDecimal (state=,code=0) > Closing: 0: jdbc:hive2://localhost:1/default > {code} > Logs: > {noformat} > java.lang.RuntimeException: java.lang.ClassCastException: > java.math.BigDecimal incompatible with > org.apache.hadoop.hive.common.type.HiveDecimal > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:83) > at > org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) > at > org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) > at > java.security.AccessController.doPrivileged(AccessController.java:770) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) > at 
com.sun.proxy.$Proxy31.fetchResults(Unknown Source) > at > org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:521) > at > org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:623) > at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1717) > at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1702) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:819) > Caused by: java.lang.ClassCastException: java.math.BigDecimal incompatible > with org.apache.hadoop.hive.common.type.HiveDecimal > at > org.apache.hive.service.cli.ColumnBasedSet.addRow(ColumnBasedSet.java:111) > at > org.apache.hive.service.cli.ColumnBasedSet.addRow(ColumnBasedSet.java:42) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$getNextRowSet$1(SparkExecuteStatementOperation.scala:150) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$Lambda$1921.9054D6E0.apply(Unknown > Source) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withSchedulerPool(SparkExecuteStatementOperation.scala:298) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getNextRowSet(SparkExecuteStatementOperation.scala:112) > at > org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:244) > at > 
org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:799) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) > ... 18 more > {noformat} -- This message was sent by Atlassian Jira
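The cast fails inside Hive's `ColumnBasedSet.addRow`, which expects decimal cells to already be `HiveDecimal` instances, while the Thriftserver hands it plain `java.math.BigDecimal` values; the natural fix is to convert each value to the expected wrapper before the row is built. The Python sketch below mimics that pattern with hypothetical stand-in classes (they are not the real Hive or Spark types):

```python
from decimal import Decimal

class HiveDecimalLike:
    """Hypothetical stand-in for org.apache.hadoop.hive.common.type.HiveDecimal."""
    def __init__(self, value):
        self.value = Decimal(value)

def add_row(cell):
    # Like ColumnBasedSet.addRow: a blind cast to the Hive type fails when the
    # caller passes a raw decimal, mirroring the ClassCastException above.
    if not isinstance(cell, HiveDecimalLike):
        raise TypeError("BigDecimal incompatible with HiveDecimal")
    return cell.value

# The fix pattern: wrap the value in the expected type before handing it over.
print(add_row(HiveDecimalLike("1.000000000000000000")))
```

The general lesson is that type conversion belongs on the producer side of the Thrift row-set boundary, since the consumer (Hive's row-set code) casts unconditionally.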
[jira] [Resolved] (SPARK-28463) Thriftserver throws java.math.BigDecimal incompatible with org.apache.hadoop.hive.common.type.HiveDecimal
[ https://issues.apache.org/jira/browse/SPARK-28463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-28463. - Resolution: Fixed > Thriftserver throws java.math.BigDecimal incompatible with > org.apache.hadoop.hive.common.type.HiveDecimal > - > > Key: SPARK-28463 > URL: https://issues.apache.org/jira/browse/SPARK-28463 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > > How to reproduce this issue: > {code:sh} > build/sbt clean package -Phive -Phive-thriftserver -Phadoop-3.2 > export SPARK_PREPEND_CLASSES=true > sbin/start-thriftserver.sh > [root@spark-3267648 spark]# bin/beeline -u > jdbc:hive2://localhost:1/default -e "select cast(1 as decimal(38, 18));" > Connecting to jdbc:hive2://localhost:1/default > Connected to: Spark SQL (version 3.0.0-SNAPSHOT) > Driver: Hive JDBC (version 2.3.5) > Transaction isolation: TRANSACTION_REPEATABLE_READ > Error: java.lang.ClassCastException: java.math.BigDecimal incompatible with > org.apache.hadoop.hive.common.type.HiveDecimal (state=,code=0) > Closing: 0: jdbc:hive2://localhost:1/default > {code} > Logs: > {noformat} > java.lang.RuntimeException: java.lang.ClassCastException: > java.math.BigDecimal incompatible with > org.apache.hadoop.hive.common.type.HiveDecimal > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:83) > at > org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) > at > org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) > at > java.security.AccessController.doPrivileged(AccessController.java:770) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) > at 
com.sun.proxy.$Proxy31.fetchResults(Unknown Source) > at > org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:521) > at > org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:623) > at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1717) > at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1702) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:819) > Caused by: java.lang.ClassCastException: java.math.BigDecimal incompatible > with org.apache.hadoop.hive.common.type.HiveDecimal > at > org.apache.hive.service.cli.ColumnBasedSet.addRow(ColumnBasedSet.java:111) > at > org.apache.hive.service.cli.ColumnBasedSet.addRow(ColumnBasedSet.java:42) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$getNextRowSet$1(SparkExecuteStatementOperation.scala:150) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$Lambda$1921.9054D6E0.apply(Unknown > Source) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withSchedulerPool(SparkExecuteStatementOperation.scala:298) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getNextRowSet(SparkExecuteStatementOperation.scala:112) > at > org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:244) > at > 
org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:799) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) > ... 18 more > {noformat} -- This message was sent by Atlassian Jira
[jira] [Updated] (SPARK-28463) Thriftserver throws java.math.BigDecimal incompatible with org.apache.hadoop.hive.common.type.HiveDecimal
[ https://issues.apache.org/jira/browse/SPARK-28463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-28463: Fix Version/s: 3.0.0 > Thriftserver throws java.math.BigDecimal incompatible with > org.apache.hadoop.hive.common.type.HiveDecimal > - > > Key: SPARK-28463 > URL: https://issues.apache.org/jira/browse/SPARK-28463 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.0.0 > > > How to reproduce this issue: > {code:sh} > build/sbt clean package -Phive -Phive-thriftserver -Phadoop-3.2 > export SPARK_PREPEND_CLASSES=true > sbin/start-thriftserver.sh > [root@spark-3267648 spark]# bin/beeline -u > jdbc:hive2://localhost:1/default -e "select cast(1 as decimal(38, 18));" > Connecting to jdbc:hive2://localhost:1/default > Connected to: Spark SQL (version 3.0.0-SNAPSHOT) > Driver: Hive JDBC (version 2.3.5) > Transaction isolation: TRANSACTION_REPEATABLE_READ > Error: java.lang.ClassCastException: java.math.BigDecimal incompatible with > org.apache.hadoop.hive.common.type.HiveDecimal (state=,code=0) > Closing: 0: jdbc:hive2://localhost:1/default > {code} > Logs: > {noformat} > java.lang.RuntimeException: java.lang.ClassCastException: > java.math.BigDecimal incompatible with > org.apache.hadoop.hive.common.type.HiveDecimal > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:83) > at > org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36) > at > org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63) > at > java.security.AccessController.doPrivileged(AccessController.java:770) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59) > at 
com.sun.proxy.$Proxy31.fetchResults(Unknown Source) > at > org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:521) > at > org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:623) > at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1717) > at > org.apache.hive.service.rpc.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1702) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:819) > Caused by: java.lang.ClassCastException: java.math.BigDecimal incompatible > with org.apache.hadoop.hive.common.type.HiveDecimal > at > org.apache.hive.service.cli.ColumnBasedSet.addRow(ColumnBasedSet.java:111) > at > org.apache.hive.service.cli.ColumnBasedSet.addRow(ColumnBasedSet.java:42) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.$anonfun$getNextRowSet$1(SparkExecuteStatementOperation.scala:150) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$Lambda$1921.9054D6E0.apply(Unknown > Source) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withSchedulerPool(SparkExecuteStatementOperation.scala:298) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getNextRowSet(SparkExecuteStatementOperation.scala:112) > at > org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:244) > at > 
org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:799) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78) > ... 18 more > {noformat} -- This message was sent
[jira] [Created] (SPARK-28950) FollowingUp: Change whereClause to be optional in DELETE
Xianyin Xin created SPARK-28950: --- Summary: FollowingUp: Change whereClause to be optional in DELETE Key: SPARK-28950 URL: https://issues.apache.org/jira/browse/SPARK-28950 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Xianyin Xin -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28950) [SPARK-28351] FollowingUp: Change whereClause to be optional in DELETE
[ https://issues.apache.org/jira/browse/SPARK-28950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin updated SPARK-28950: Summary: [SPARK-28351] FollowingUp: Change whereClause to be optional in DELETE (was: FollowingUp: Change whereClause to be optional in DELETE) > [SPARK-28351] FollowingUp: Change whereClause to be optional in DELETE > -- > > Key: SPARK-28950 > URL: https://issues.apache.org/jira/browse/SPARK-28950 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xianyin Xin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28705) drop the tables in AnalysisExternalCatalogSuite after the testcase execution
[ https://issues.apache.org/jira/browse/SPARK-28705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-28705. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25427 [https://github.com/apache/spark/pull/25427] > drop the tables in AnalysisExternalCatalogSuite after the testcase execution > > > Key: SPARK-28705 > URL: https://issues.apache.org/jira/browse/SPARK-28705 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 2.4.3 >Reporter: Sandeep Katta >Assignee: Sandeep Katta >Priority: Trivial > Fix For: 3.0.0 > > > drop the tables after each testcase executed -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28705) drop the tables in AnalysisExternalCatalogSuite after the testcase execution
[ https://issues.apache.org/jira/browse/SPARK-28705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-28705: Assignee: Sandeep Katta > drop the tables in AnalysisExternalCatalogSuite after the testcase execution > > > Key: SPARK-28705 > URL: https://issues.apache.org/jira/browse/SPARK-28705 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 2.4.3 >Reporter: Sandeep Katta >Assignee: Sandeep Katta >Priority: Trivial > > drop the tables after each testcase executed -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28949) Kubernetes CGroup leaking leads to Spark Pods hang in Pending status
[ https://issues.apache.org/jira/browse/SPARK-28949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-28949: - Description: After running Spark on k8s for a few days, some kubelets fail to create pods, with warning messages like {code:java} \"mkdir /sys/fs/cgroup/memory/kubepods/burstable/podb4a04361-ca89-11e9-a224-6c92bf35392e/1d5aed3ea20b246ec4f121f778f48c493e3e8678f2afe58a96c15180176e: no space left on device\" {code} The k8s cluster and the kubelet node are otherwise free. These pods remain as zombies for days before we manually identify and terminate them. It may be relatively easy to identify zombied driver pods, but it is quite inconvenient to identify executor pods when Spark applications scale out. This is probably related to [https://github.com/kubernetes/kubernetes/issues/70324] Do we need a timeout, retry or failover mechanism for Spark to handle these kinds of k8s kernel issues? was: After running Spark on k8s for a few days, some kubelet fails to create pod caused by warning message like {code:java} \"mkdir /sys/fs/cgroup/memory/kubepods/burstable/podb4a04361-ca89-11e9-a224-6c92bf35392e/1d5aed3ea20b246ec4f121f778f48c493e3e8678f2afe58a96c15180176e: no space left on device\" {code} The k8s cluster and the kubelet node are free. These pods zombie over days before we manually notify and terminate them. Maybe it is a little bit This probably related to [https://github.com/kubernetes/kubernetes/issues/70324] Do we need a timeout, retry or failover mechanism for Spark to handle these kinds of k8s kernel issues? 
> Kubernetes CGroup leaking leads to Spark Pods hang in Pending status > > > Key: SPARK-28949 > URL: https://issues.apache.org/jira/browse/SPARK-28949 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.3, 2.4.4 >Reporter: Kent Yao >Priority: Major > Attachments: describe-driver-pod.txt, describe-executor-pod.txt > > > After running Spark on k8s for a few days, some kubelet fails to create pod > caused by warning message like > {code:java} > \"mkdir > /sys/fs/cgroup/memory/kubepods/burstable/podb4a04361-ca89-11e9-a224-6c92bf35392e/1d5aed3ea20b246ec4f121f778f48c493e3e8678f2afe58a96c15180176e: > no space left on device\" > {code} > The k8s cluster and the kubelet node are free. > These pods zombie over days before we manually notify and terminate them. > Maybe it > is a little bit easy to identify zombied driver pods, but it is quite > inconvenient to identify executor pods when spark applications scale-out. > This probably related to > [https://github.com/kubernetes/kubernetes/issues/70324] > Do we need a timeout, retry or failover mechanism for Spark to handle these > kinds of k8s kernel issues? > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28949) Kubernetes CGroup leaking leads to Spark Pods hang in Pending status
[ https://issues.apache.org/jira/browse/SPARK-28949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-28949: - Attachment: describe-executor-pod.txt > Kubernetes CGroup leaking leads to Spark Pods hang in Pending status > > > Key: SPARK-28949 > URL: https://issues.apache.org/jira/browse/SPARK-28949 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.3, 2.4.4 >Reporter: Kent Yao >Priority: Major > Attachments: describe-driver-pod.txt, describe-executor-pod.txt > > > After running Spark on k8s for a few days, some kubelet fails to create pod > caused by warning message like > {code:java} > \"mkdir > /sys/fs/cgroup/memory/kubepods/burstable/podb4a04361-ca89-11e9-a224-6c92bf35392e/1d5aed3ea20b246ec4f121f778f48c493e3e8678f2afe58a96c15180176e: > no space left on device\" > {code} > The k8s cluster and the kubelet node are free. > These pods zombie over days before we manually notify and terminate them. > Maybe it > is a little bit > This probably related to > [https://github.com/kubernetes/kubernetes/issues/70324] > Do we need a timeout, retry or failover mechanism for Spark to handle these > kinds of k8s kernel issues? > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28949) Kubernetes CGroup leaking leads to Spark Pods hang in Pending status
[ https://issues.apache.org/jira/browse/SPARK-28949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-28949: - Attachment: describe-driver-pod.txt > Kubernetes CGroup leaking leads to Spark Pods hang in Pending status > > > Key: SPARK-28949 > URL: https://issues.apache.org/jira/browse/SPARK-28949 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.3.3, 2.4.4 >Reporter: Kent Yao >Priority: Major > Attachments: describe-driver-pod.txt, describe-executor-pod.txt > > > After running Spark on k8s for a few days, some kubelet fails to create pod > caused by warning message like > {code:java} > \"mkdir > /sys/fs/cgroup/memory/kubepods/burstable/podb4a04361-ca89-11e9-a224-6c92bf35392e/1d5aed3ea20b246ec4f121f778f48c493e3e8678f2afe58a96c15180176e: > no space left on device\" > {code} > The k8s cluster and the kubelet node are free. > These pods zombie over days before we manually notify and terminate them. > Maybe it > is a little bit > This probably related to > [https://github.com/kubernetes/kubernetes/issues/70324] > Do we need a timeout, retry or failover mechanism for Spark to handle these > kinds of k8s kernel issues? > > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28949) Kubernetes CGroup leaking leads to Spark Pods hang in Pending status
Kent Yao created SPARK-28949: Summary: Kubernetes CGroup leaking leads to Spark Pods hang in Pending status Key: SPARK-28949 URL: https://issues.apache.org/jira/browse/SPARK-28949 Project: Spark Issue Type: Bug Components: Kubernetes Affects Versions: 2.4.4, 2.3.3 Reporter: Kent Yao After running Spark on k8s for a few days, some kubelet fails to create pod caused by warning message like {code:java} \"mkdir /sys/fs/cgroup/memory/kubepods/burstable/podb4a04361-ca89-11e9-a224-6c92bf35392e/1d5aed3ea20b246ec4f121f778f48c493e3e8678f2afe58a96c15180176e: no space left on device\" {code} The k8s cluster and the kubelet node are free. These pods zombie over days before we manually notify and terminate them. Maybe it is a little bit This probably related to [https://github.com/kubernetes/kubernetes/issues/70324] Do we need a timeout, retry or failover mechanism for Spark to handle these kinds of k8s kernel issues? -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
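The SPARK-28949 report ends by asking whether Spark needs a timeout, retry, or failover mechanism for this class of kubelet failure. A minimal sketch of such a retry policy, in plain Python with hypothetical names (`create_pod` stands in for the Kubernetes API call; this is not Spark's actual submission code):

```python
import time

def submit_with_retry(create_pod, max_attempts=3, timeout_s=60.0, backoff_s=1.0):
    """Retry a pod-creation call, giving up after max_attempts or timeout_s.

    `create_pod` is a hypothetical callable standing in for the k8s API call;
    it is expected to raise OSError (e.g. "no space left on device") when the
    kubelet fails, as in the cgroup-leak scenario described above.
    """
    deadline = time.monotonic() + timeout_s
    attempt = 0
    while True:
        attempt += 1
        try:
            return create_pod()
        except OSError as err:
            if attempt >= max_attempts or time.monotonic() >= deadline:
                # Fail over instead of leaving a zombie pod pending forever.
                raise RuntimeError(f"gave up after {attempt} attempts") from err
            time.sleep(backoff_s * attempt)  # linear backoff between retries
```

A bounded policy like this would at least surface the failure to the driver instead of leaving executor pods pending indefinitely.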
[jira] [Commented] (SPARK-28025) HDFSBackedStateStoreProvider should not leak .crc files
[ https://issues.apache.org/jira/browse/SPARK-28025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920706#comment-16920706 ] Gabor Somogyi commented on SPARK-28025: --- [~ste...@apache.org] I think adding a "file.bytes-per-checksum = 0" option would be a workaround (at least here). The whole point of choosing a ChecksumFileSystem is to have checksums. > HDFSBackedStateStoreProvider should not leak .crc files > > > Key: SPARK-28025 > URL: https://issues.apache.org/jira/browse/SPARK-28025 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.3 > Environment: Spark 2.4.3 > Kubernetes 1.11(?) (OpenShift) > StateStore storage on a mounted PVC. Viewed as a local filesystem by the > `FileContextBasedCheckpointFileManager` : > {noformat} > scala> glusterfm.isLocal > res17: Boolean = true{noformat} >Reporter: Gerard Maas >Assignee: Jungtaek Lim >Priority: Major > Fix For: 2.4.4, 3.0.0 > > > The HDFSBackedStateStoreProvider, when using the default CheckpointFileManager, > is leaving '.crc' files behind. There's a .crc file created for each > `atomicFile` operation of the CheckpointFileManager. > Over time, the number of files becomes very large. It makes the state store > file system constantly increase in size and, in our case, deteriorates the > file system performance. > Here's a sample of one of our spark storage volumes after 2 days of execution > (4 stateful streaming jobs, each on a different sub-dir): > # > {noformat} > Total files in PVC (used for checkpoints and state store) > $find . | wc -l > 431796 > # .crc files > $find . -name "*.crc" | wc -l > 418053{noformat} > With each .crc file taking one storage block, the used storage runs into the > GBs of data. > These jobs are running on Kubernetes. 
Our shared storage provider, GlusterFS, > shows serious performance deterioration with this large number of files: > {noformat} > DEBUG HDFSBackedStateStoreProvider: fetchFiles() took 29164ms{noformat} >
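The leak pattern in this ticket is that ChecksumFileSystem writes a hidden `.<name>.crc` sidecar next to each data file, and deleting the data file through another path leaves the `.crc` behind. A small stdlib-only Python sketch (hypothetical helper, not Spark's actual cleanup code) that finds such orphaned checksum files in a checkpoint directory:

```python
import os

def orphan_crc_files(root):
    """Find Hadoop-style checksum files (.name.crc) whose data file is gone.

    A checksum file is considered orphaned when the data file it shadows
    no longer exists in the same directory.
    """
    orphans = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.startswith(".") and name.endswith(".crc"):
                # ".state.delta.crc" -> "state.delta"
                data_name = name[1 : -len(".crc")]
                if data_name not in files:
                    orphans.append(os.path.join(dirpath, name))
    return orphans
```

Running a scan like this against the PVC above would quantify how many of the 418053 `.crc` files are actually orphans versus live sidecars.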
[jira] [Comment Edited] (SPARK-28906) `bin/spark-submit --version` shows incorrect info
[ https://issues.apache.org/jira/browse/SPARK-28906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920594#comment-16920594 ] Kazuaki Ishizaki edited comment on SPARK-28906 at 9/2/19 8:48 AM: -- Regarding the information from the {{git}} command: the {{.git}} directory is deleted after {{git clone}} is executed. As a result, we cannot get information from the {{git}} command. When I tentatively stop deleting the {{.git}} directory, {{spark-version-info.properties}} can include the correct information, like: {code} version=2.3.4 user=ishizaki revision=8c6f8150f3c6298ff4e1c7e06028f12d7eaf0210 branch=HEAD date=2019-09-02T02:31:25Z url=https://gitbox.apache.org/repos/asf/spark.git {code} was (Author: kiszk): For the information on {{git}} comand, {{.git}} directory is deleted after {{git clone}} is executed. When I tentatively stop deleting {{.git}} directory, {{spark-version-info.properties}} can include the correct information like: {code} version=2.3.4 user=ishizaki revision=8c6f8150f3c6298ff4e1c7e06028f12d7eaf0210 branch=HEAD date=2019-09-02T02:31:25Z url=https://gitbox.apache.org/repos/asf/spark.git {code} > `bin/spark-submit --version` shows incorrect info > - > > Key: SPARK-28906 > URL: https://issues.apache.org/jira/browse/SPARK-28906 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.4, 2.4.0, 2.4.1, 2.4.2, > 3.0.0, 2.4.3 >Reporter: Marcelo Vanzin >Priority: Minor > Attachments: image-2019-08-29-05-50-13-526.png > > > Since Spark 2.3.1, `spark-submit` shows wrong information. > {code} > $ bin/spark-submit --version > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 2.3.3 > /_/ > Using Scala version 2.11.8, OpenJDK 64-Bit Server VM, 1.8.0_222 > Branch > Compiled by user on 2019-02-04T13:00:46Z > Revision > Url > Type --help for more information. 
> {code}
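The properties payload quoted in the comment can be checked mechanically. A small sketch (plain Python with hypothetical helper names; Spark itself presumably loads this file through `java.util.Properties`) that parses the key=value lines and reports which git-derived fields are blank — the empty `Branch`/`Revision`/`Url` lines in the `--version` banner correspond to exactly these missing keys:

```python
def parse_version_info(text):
    """Parse a spark-version-info.properties payload into a dict.

    Handles only the key=value subset the build script emits; comments
    and blank lines are skipped.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def missing_git_fields(props):
    """Fields that can only be filled in when the .git directory is present."""
    return [k for k in ("revision", "branch", "url") if not props.get(k)]
```

With the `.git` directory kept around, `missing_git_fields` returns an empty list; with it deleted before the build, all three fields come back blank, matching the broken banner above.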
[jira] [Commented] (SPARK-28916) Generated SpecificSafeProjection.apply method grows beyond 64 KB when use SparkSQL
[ https://issues.apache.org/jira/browse/SPARK-28916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920692#comment-16920692 ] Wenchen Fan commented on SPARK-28916: - to double-check, this is just error message not actual exception, right? > Generated SpecificSafeProjection.apply method grows beyond 64 KB when use > SparkSQL > --- > > Key: SPARK-28916 > URL: https://issues.apache.org/jira/browse/SPARK-28916 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1, 2.4.3 >Reporter: MOBIN >Priority: Major > > Can be reproduced by the following steps: > 1. Create a table with 5000 fields > 2. val data=spark.sql("select * from spark64kb limit 10"); > 3. data.describe() > Then,The following error occurred > {code:java} > WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 0, localhost, > executor 1): org.codehaus.janino.InternalCompilerException: failed to > compile: org.codehaus.janino.InternalCompilerException: Compiling > "GeneratedClass": Code of method > "apply(Ljava/lang/Object;)Ljava/lang/Object;" of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection" > grows beyond 64 KB > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1298) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1376) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1373) > at > org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) > at > org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) > at > org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) > at > org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) > at 
org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000) > at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) > at > org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1238) > at > org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.create(GenerateMutableProjection.scala:143) > at > org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.generate(GenerateMutableProjection.scala:44) > at > org.apache.spark.sql.execution.SparkPlan.newMutableProjection(SparkPlan.scala:385) > at > org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3$$anonfun$4.apply(SortAggregateExec.scala:96) > at > org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3$$anonfun$4.apply(SortAggregateExec.scala:95) > at > org.apache.spark.sql.execution.aggregate.AggregationIterator.generateProcessRow(AggregationIterator.scala:180) > at > org.apache.spark.sql.execution.aggregate.AggregationIterator.(AggregationIterator.scala:199) > at > org.apache.spark.sql.execution.aggregate.SortBasedAggregationIterator.(SortBasedAggregationIterator.scala:40) > at > org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:86) > at > org.apache.spark.sql.execution.aggregate.SortAggregateExec$$anonfun$doExecute$1$$anonfun$3.apply(SortAggregateExec.scala:77) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:823) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:823) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at 
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) > at org.apache.spark.scheduler.Task.run(Task.scala:121) > at > org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: org.codehaus.janino.InternalCompilerException: Compiling > "GeneratedClass": Code of method > "apply(Ljava/lang/Object;)Ljava/lang/Object;"
[jira] [Commented] (SPARK-27733) Upgrade to Avro 1.9.x
[ https://issues.apache.org/jira/browse/SPARK-27733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920685#comment-16920685 ] Fokko Driesprong commented on SPARK-27733: -- The regression issue has been resolved with the freshly released Avro 1.9.1. I'll look into the issues with the Hive dependency. > Upgrade to Avro 1.9.x > - > > Key: SPARK-27733 > URL: https://issues.apache.org/jira/browse/SPARK-27733 > Project: Spark > Issue Type: Improvement > Components: Build, SQL >Affects Versions: 3.0.0 >Reporter: Ismaël Mejía >Priority: Minor > > Avro 1.9.0 was released with many nice features including reduced size (1MB > less), removed dependencies (no paranamer, no shaded guava), and security > updates, so it is probably a worthwhile upgrade.
[jira] [Commented] (SPARK-28694) Add Java/Scala StructuredKerberizedKafkaWordCount examples
[ https://issues.apache.org/jira/browse/SPARK-28694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920677#comment-16920677 ] daile commented on SPARK-28694: --- ok > Add Java/Scala StructuredKerberizedKafkaWordCount examples > -- > > Key: SPARK-28694 > URL: https://issues.apache.org/jira/browse/SPARK-28694 > Project: Spark > Issue Type: Improvement > Components: Examples, Structured Streaming >Affects Versions: 3.0.0 >Reporter: hong dongdong >Priority: Minor > > Currently, the `StructuredKafkaWordCount` example does not support accessing Kafka using > Kerberos authentication. Add a parameter that indicates whether Kerberos is used.
[jira] [Commented] (SPARK-18084) write.partitionBy() does not recognize nested columns that select() can access
[ https://issues.apache.org/jira/browse/SPARK-18084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920676#comment-16920676 ] Wenchen Fan commented on SPARK-18084: - It looks to me that we should just add more doc in `partitionBy`. The string there is the column name, while the string in `select` is a general expression. > write.partitionBy() does not recognize nested columns that select() can access > -- > > Key: SPARK-18084 > URL: https://issues.apache.org/jira/browse/SPARK-18084 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.4.3 >Reporter: Nicholas Chammas >Priority: Minor > > Here's a simple repro in the PySpark shell: > {code} > from pyspark.sql import Row > rdd = spark.sparkContext.parallelize([Row(a=Row(b=5))]) > df = spark.createDataFrame(rdd) > df.printSchema() > df.select('a.b').show() # works > df.write.partitionBy('a.b').text('/tmp/test') # doesn't work > {code} > Here's what I see when I run this: > {code} > >>> from pyspark.sql import Row > >>> rdd = spark.sparkContext.parallelize([Row(a=Row(b=5))]) > >>> df = spark.createDataFrame(rdd) > >>> df.printSchema() > root > |-- a: struct (nullable = true) > ||-- b: long (nullable = true) > >>> df.show() > +---+ > | a| > +---+ > |[5]| > +---+ > >>> df.select('a.b').show() > +---+ > | b| > +---+ > | 5| > +---+ > >>> df.write.partitionBy('a.b').text('/tmp/test') > Traceback (most recent call last): > File > "/usr/local/Cellar/apache-spark/2.0.1/libexec/python/pyspark/sql/utils.py", > line 63, in deco > return f(*a, **kw) > File > "/usr/local/Cellar/apache-spark/2.0.1/libexec/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py", > line 319, in get_return_value > py4j.protocol.Py4JJavaError: An error occurred while calling o233.text. 
> : org.apache.spark.sql.AnalysisException: Partition column a.b not found in > schema > StructType(StructField(a,StructType(StructField(b,LongType,true)),true)); > at > org.apache.spark.sql.execution.datasources.PartitioningUtils$$anonfun$partitionColumnsSchema$1$$anonfun$apply$10.apply(PartitioningUtils.scala:368) > at > org.apache.spark.sql.execution.datasources.PartitioningUtils$$anonfun$partitionColumnsSchema$1$$anonfun$apply$10.apply(PartitioningUtils.scala:368) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.execution.datasources.PartitioningUtils$$anonfun$partitionColumnsSchema$1.apply(PartitioningUtils.scala:367) > at > org.apache.spark.sql.execution.datasources.PartitioningUtils$$anonfun$partitionColumnsSchema$1.apply(PartitioningUtils.scala:366) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.partitionColumnsSchema(PartitioningUtils.scala:366) > at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.validatePartitionColumn(PartitioningUtils.scala:349) > at > org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:458) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194) > at org.apache.spark.sql.DataFrameWriter.text(DataFrameWriter.scala:534) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237) > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) > at py4j.Gateway.invoke(Gateway.java:280) > at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) > at py4j.commands.CallCommand.execute(CallCommand.java:79) > at py4j.GatewayConnection.run(GatewayConnection.java:214) > at java.lang.Thread.run(Thread.java:745) > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "", line 1, in >
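Wenchen's point on SPARK-18084 — `partitionBy` takes the string as one literal column name, while `select` parses it as an expression where `a.b` means nested-field access — can be illustrated with a toy resolver (pure-Python analogy with rows modeled as nested dicts; not Spark's actual analyzer):

```python
def resolve_expression(row, name):
    """select()-style lookup: 'a.b' is parsed as a path into nested fields."""
    value = row
    for part in name.split("."):
        value = value[part]
    return value

def resolve_column(row, name):
    """partitionBy()-style lookup: the string is one literal top-level column name."""
    if name not in row:
        # Mirrors the "Partition column a.b not found in schema" error above.
        raise KeyError(f"Partition column {name} not found in schema {sorted(row)}")
    return row[name]
```

Under this model, `resolve_expression({"a": {"b": 5}}, "a.b")` succeeds while `resolve_column` raises, which is exactly the asymmetry the bug report demonstrates — hence the suggestion that the `partitionBy` docs spell out this difference.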
[jira] [Commented] (SPARK-10892) Join with Data Frame returns wrong results
[ https://issues.apache.org/jira/browse/SPARK-10892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920675#comment-16920675 ] Wenchen Fan commented on SPARK-10892: - This is kind of fixed in 3.0 by https://github.com/apache/spark/pull/25107 . Now Spark can detect ambiguous join condition and throw exception. > Join with Data Frame returns wrong results > -- > > Key: SPARK-10892 > URL: https://issues.apache.org/jira/browse/SPARK-10892 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.1, 1.5.0, 2.4.0 >Reporter: Ofer Mendelevitch >Priority: Critical > Labels: correctness > Attachments: data.json > > > I'm attaching a simplified reproducible example of the problem: > 1. Loading a JSON file from HDFS as a Data Frame > 2. Creating 3 data frames: PRCP, TMIN, TMAX > 3. Joining the data frames together. Each of those has a column "value" with > the same name, so renaming them after the join. > 4. The output seems incorrect; the first column has the correct values, but > the two other columns seem to have a copy of the values from the first column. 
> Here's the sample code: > {code} > import org.apache.spark.sql._ > val sqlc = new SQLContext(sc) > val weather = sqlc.read.format("json").load("data.json") > val prcp = weather.filter("metric = 'PRCP'").as("prcp").cache() > val tmin = weather.filter("metric = 'TMIN'").as("tmin").cache() > val tmax = weather.filter("metric = 'TMAX'").as("tmax").cache() > prcp.filter("year=2012 and month=10").show() > tmin.filter("year=2012 and month=10").show() > tmax.filter("year=2012 and month=10").show() > val out = (prcp.join(tmin, "date_str").join(tmax, "date_str") > .select(prcp("year"), prcp("month"), prcp("day"), prcp("date_str"), > prcp("value").alias("PRCP"), tmin("value").alias("TMIN"), > tmax("value").alias("TMAX")) ) > out.filter("year=2012 and month=10").show() > {code} > The output is: > {code} > ++---+--+-+---+-++ > |date_str|day|metric|month|station|value|year| > ++---+--+-+---+-++ > |20121001| 1| PRCP| 10|USW00023272|0|2012| > |20121002| 2| PRCP| 10|USW00023272|0|2012| > |20121003| 3| PRCP| 10|USW00023272|0|2012| > |20121004| 4| PRCP| 10|USW00023272|0|2012| > |20121005| 5| PRCP| 10|USW00023272|0|2012| > |20121006| 6| PRCP| 10|USW00023272|0|2012| > |20121007| 7| PRCP| 10|USW00023272|0|2012| > |20121008| 8| PRCP| 10|USW00023272|0|2012| > |20121009| 9| PRCP| 10|USW00023272|0|2012| > |20121010| 10| PRCP| 10|USW00023272|0|2012| > |20121011| 11| PRCP| 10|USW00023272|3|2012| > |20121012| 12| PRCP| 10|USW00023272|0|2012| > |20121013| 13| PRCP| 10|USW00023272|0|2012| > |20121014| 14| PRCP| 10|USW00023272|0|2012| > |20121015| 15| PRCP| 10|USW00023272|0|2012| > |20121016| 16| PRCP| 10|USW00023272|0|2012| > |20121017| 17| PRCP| 10|USW00023272|0|2012| > |20121018| 18| PRCP| 10|USW00023272|0|2012| > |20121019| 19| PRCP| 10|USW00023272|0|2012| > |20121020| 20| PRCP| 10|USW00023272|0|2012| > ++---+--+-+---+-+——+ > ++---+--+-+---+-++ > |date_str|day|metric|month|station|value|year| > ++---+--+-+---+-++ > |20121001| 1| TMIN| 10|USW00023272| 139|2012| > |20121002| 2| TMIN| 
10|USW00023272| 178|2012| > |20121003| 3| TMIN| 10|USW00023272| 144|2012| > |20121004| 4| TMIN| 10|USW00023272| 144|2012| > |20121005| 5| TMIN| 10|USW00023272| 139|2012| > |20121006| 6| TMIN| 10|USW00023272| 128|2012| > |20121007| 7| TMIN| 10|USW00023272| 122|2012| > |20121008| 8| TMIN| 10|USW00023272| 122|2012| > |20121009| 9| TMIN| 10|USW00023272| 139|2012| > |20121010| 10| TMIN| 10|USW00023272| 128|2012| > |20121011| 11| TMIN| 10|USW00023272| 122|2012| > |20121012| 12| TMIN| 10|USW00023272| 117|2012| > |20121013| 13| TMIN| 10|USW00023272| 122|2012| > |20121014| 14| TMIN| 10|USW00023272| 128|2012| > |20121015| 15| TMIN| 10|USW00023272| 128|2012| > |20121016| 16| TMIN| 10|USW00023272| 156|2012| > |20121017| 17| TMIN| 10|USW00023272| 139|2012| > |20121018| 18| TMIN| 10|USW00023272| 161|2012| > |20121019| 19| TMIN| 10|USW00023272| 133|2012| > |20121020| 20| TMIN| 10|USW00023272| 122|2012| > ++---+--+-+---+-+——+ > ++---+--+-+---+-++ > |date_str|day|metric|month|station|value|year| > ++---+--+-+---+-++ > |20121001| 1| TMAX| 10|USW00023272| 322|2012| > |20121002| 2| TMAX| 10|USW00023272| 344|2012| > |20121003| 3| TMAX| 10|USW00023272|
[jira] [Commented] (SPARK-28694) Add Java/Scala StructuredKerberizedKafkaWordCount examples
[ https://issues.apache.org/jira/browse/SPARK-28694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920674#comment-16920674 ] hong dongdong commented on SPARK-28694: --- [~726575...@qq.com] thanks, but I am working on this now and will push later today. It is related to SPARK-28691. > Add Java/Scala StructuredKerberizedKafkaWordCount examples > -- > > Key: SPARK-28694 > URL: https://issues.apache.org/jira/browse/SPARK-28694 > Project: Spark > Issue Type: Improvement > Components: Examples, Structured Streaming >Affects Versions: 3.0.0 >Reporter: hong dongdong >Priority: Minor > > Currently, the `StructuredKafkaWordCount` example does not support accessing Kafka using > Kerberos authentication. Add a parameter that indicates whether Kerberos is used.
[jira] [Created] (SPARK-28948) support data source v2 in CREATE TABLE USING
Wenchen Fan created SPARK-28948: --- Summary: support data source v2 in CREATE TABLE USING Key: SPARK-28948 URL: https://issues.apache.org/jira/browse/SPARK-28948 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.0.0 Reporter: Wenchen Fan Assignee: Wenchen Fan
[jira] [Commented] (SPARK-28694) Add Java/Scala StructuredKerberizedKafkaWordCount examples
[ https://issues.apache.org/jira/browse/SPARK-28694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920665#comment-16920665 ] daile commented on SPARK-28694: --- I will work on this > Add Java/Scala StructuredKerberizedKafkaWordCount examples > -- > > Key: SPARK-28694 > URL: https://issues.apache.org/jira/browse/SPARK-28694 > Project: Spark > Issue Type: Improvement > Components: Examples, Structured Streaming >Affects Versions: 3.0.0 >Reporter: hong dongdong >Priority: Minor > > Currently, the `StructuredKafkaWordCount` example does not support accessing Kafka with Kerberos authentication. Add a parameter that indicates whether Kerberos is used.
[jira] [Created] (SPARK-28947) Status logging occurs on every state change but not at an interval for liveness.
Kent Yao created SPARK-28947: Summary: Status logging occurs on every state change but not at an interval for liveness. Key: SPARK-28947 URL: https://issues.apache.org/jira/browse/SPARK-28947 Project: Spark Issue Type: Bug Components: Kubernetes Affects Versions: 2.4.4, 2.3.3 Reporter: Kent Yao The start method of `LoggingPodStatusWatcherImpl` should be invoked
[jira] [Updated] (SPARK-28940) Subquery reuse across all subquery levels
[ https://issues.apache.org/jira/browse/SPARK-28940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Toth updated SPARK-28940: --- Summary: Subquery reuse across all subquery levels (was: Subquery reuse accross all subquery levels) > Subquery reuse across all subquery levels > - > > Key: SPARK-28940 > URL: https://issues.apache.org/jira/browse/SPARK-28940 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Peter Toth >Priority: Major > > Currently subquery reuse doesn't work across all subquery levels. > Here is an example query: > {noformat} > SELECT (SELECT avg(key) FROM testData), (SELECT (SELECT avg(key) FROM > testData)) > FROM testData > LIMIT 1 > {noformat} > where the plan now is: > {noformat} > CollectLimit 1 > +- *(1) Project [Subquery scalar-subquery#268, [id=#231] AS > scalarsubquery()#276, Subquery scalar-subquery#270, [id=#266] AS > scalarsubquery()#277] >: :- Subquery scalar-subquery#268, [id=#231] >: : +- *(2) HashAggregate(keys=[], functions=[avg(cast(key#13 as > bigint))], output=[avg(key)#272]) >: : +- Exchange SinglePartition, true, [id=#227] >: :+- *(1) HashAggregate(keys=[], > functions=[partial_avg(cast(key#13 as bigint))], output=[sum#282, count#283L]) >: : +- *(1) SerializeFromObject > [knownnotnull(assertnotnull(input[0, > org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13] >: : +- Scan[obj#12] >: +- Subquery scalar-subquery#270, [id=#266] >: +- *(1) Project [Subquery scalar-subquery#269, [id=#263] AS > scalarsubquery()#275] >:: +- Subquery scalar-subquery#269, [id=#263] >:: +- *(2) HashAggregate(keys=[], functions=[avg(cast(key#13 > as bigint))], output=[avg(key)#274]) >::+- Exchange SinglePartition, true, [id=#259] >:: +- *(1) HashAggregate(keys=[], > functions=[partial_avg(cast(key#13 as bigint))], output=[sum#286, count#287L]) >:: +- *(1) SerializeFromObject > [knownnotnull(assertnotnull(input[0, > org.apache.spark.sql.test.SQLTestData$TestData, 
true])).key AS key#13] >:: +- Scan[obj#12] >:+- *(1) Scan OneRowRelation[] >+- *(1) SerializeFromObject > +- Scan[obj#12] > {noformat} > but it could be: > {noformat} > CollectLimit 1 > +- *(1) Project [ReusedSubquery Subquery scalar-subquery#241, [id=#148] AS > scalarsubquery()#248, Subquery scalar-subquery#242, [id=#164] AS > scalarsubquery()#249] >: :- ReusedSubquery Subquery scalar-subquery#241, [id=#148] >: +- Subquery scalar-subquery#242, [id=#164] >: +- *(1) Project [Subquery scalar-subquery#241, [id=#148] AS > scalarsubquery()#247] >:: +- Subquery scalar-subquery#241, [id=#148] >:: +- *(2) HashAggregate(keys=[], functions=[avg(cast(key#13 > as bigint))], output=[avg(key)#246]) >::+- Exchange SinglePartition, true, [id=#144] >:: +- *(1) HashAggregate(keys=[], > functions=[partial_avg(cast(key#13 as bigint))], output=[sum#258, count#259L]) >:: +- *(1) SerializeFromObject > [knownnotnull(assertnotnull(input[0, > org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#13] >:: +- Scan[obj#12] >:+- *(1) Scan OneRowRelation[] >+- *(1) SerializeFromObject > +- Scan[obj#12] > {noformat}
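The optimization proposed above can be illustrated with a deliberately simplified, hypothetical model — the `Plan` ADT, `reuseSubqueries`, and `Reused` below are illustrative stand-ins for Spark's plan nodes, not its API. The idea is one cache of subqueries shared across all nesting levels, so any repeated subquery becomes a reference to the first occurrence:

```scala
// Toy plan nodes standing in for Spark's physical plan (illustrative only).
sealed trait Plan
case class Scan(table: String) extends Plan
case class Agg(fn: String, child: Plan) extends Plan
case class Subquery(child: Plan) extends Plan
case class Reused(original: Subquery) extends Plan

// One cache shared by all subqueries, regardless of nesting level:
// the second identical subquery becomes a Reused reference.
def reuseSubqueries(subqueries: Seq[Subquery]): Seq[Plan] = {
  val cache = scala.collection.mutable.Map.empty[Plan, Subquery]
  subqueries.map { sq =>
    cache.get(sq.child) match {
      case Some(first) => Reused(first)
      case None =>
        cache(sq.child) = sq
        sq
    }
  }
}
```

In the example query, both occurrences of `(SELECT avg(key) FROM testData)` would hit the same cache entry, producing the `ReusedSubquery` node shown in the desired plan.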
[jira] [Commented] (SPARK-28912) MatchError exception in CheckpointWriteHandler
[ https://issues.apache.org/jira/browse/SPARK-28912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920647#comment-16920647 ] Hyukjin Kwon commented on SPARK-28912: -- ping [~avk1] > MatchError exception in CheckpointWriteHandler > -- > > Key: SPARK-28912 > URL: https://issues.apache.org/jira/browse/SPARK-28912 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.0, 2.3.2 >Reporter: Aleksandr Kashkirov >Priority: Minor > > Setting checkpoint directory name to "checkpoint-" plus some digits (e.g. > "checkpoint-01") results in the following error: > {code:java} > Exception in thread "pool-32-thread-1" scala.MatchError: > 0523a434-0daa-4ea6-a050-c4eb3c557d8c (of class java.lang.String) > at > org.apache.spark.streaming.Checkpoint$.org$apache$spark$streaming$Checkpoint$$sortFunc$1(Checkpoint.scala:121) > > at > org.apache.spark.streaming.Checkpoint$$anonfun$getCheckpointFiles$1.apply(Checkpoint.scala:132) > > at > org.apache.spark.streaming.Checkpoint$$anonfun$getCheckpointFiles$1.apply(Checkpoint.scala:132) > > at scala.math.Ordering$$anon$9.compare(Ordering.scala:200) > at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355) > at java.util.TimSort.sort(TimSort.java:234) > at java.util.Arrays.sort(Arrays.java:1438) > at scala.collection.SeqLike$class.sorted(SeqLike.scala:648) > at scala.collection.mutable.ArrayOps$ofRef.sorted(ArrayOps.scala:186) > at scala.collection.SeqLike$class.sortWith(SeqLike.scala:601) > at scala.collection.mutable.ArrayOps$ofRef.sortWith(ArrayOps.scala:186) > at > org.apache.spark.streaming.Checkpoint$.getCheckpointFiles(Checkpoint.scala:132) > > at > org.apache.spark.streaming.CheckpointWriter$CheckpointWriteHandler.run(Checkpoint.scala:262) > > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > > at java.lang.Thread.run(Thread.java:748){code}
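The failure mode above is easy to reconstruct in miniature. The following is a hypothetical simplification of the sort-key extraction in `Checkpoint.sortFunc`, not the actual Spark source: file names are pattern-matched against a regex, and a name the regex does not match — such as the UUID-looking string in the stack trace — escapes the non-exhaustive match as a `scala.MatchError`:

```scala
// Hypothetical simplification of the checkpoint-file sort-key extraction.
val CheckpointName = """checkpoint-(\d+)""".r

def sortKey(fileName: String): Long = fileName match {
  case CheckpointName(time) => time.toLong
  // No catch-all case: any name that does not match the regex throws
  // scala.MatchError, as the reporter sees for "0523a434-0daa-...".
}
```

A defensive fix would add a catch-all case (or match with `Regex.findFirstMatchIn`) so unexpected names are skipped rather than killing the writer thread.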
[jira] [Resolved] (SPARK-28941) Spark Sql Jobs
[ https://issues.apache.org/jira/browse/SPARK-28941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-28941. -- Resolution: Invalid Please ask questions on the mailing lists. > Spark Sql Jobs > -- > > Key: SPARK-28941 > URL: https://issues.apache.org/jira/browse/SPARK-28941 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.3 >Reporter: Brahmendra >Priority: Major > Labels: github-import, pull-request-available > Fix For: 2.4.3 > > > Hi team, > I need one favor on Spark SQL jobs. > We have 200+ Spark SQL queries running on 7 different Hive tables. > How can we execute all 200+ Spark SQL jobs from a single jar file? > Currently we are managing 7 jar files, one per table. >
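Although the ticket was closed as invalid, the pattern being asked about is straightforward. Here is a hypothetical sketch — the `;`-separated query-file format and the `runAll` driver are assumptions, and the `run` callback would be `spark.sql(_)` in a real job:

```scala
// Hypothetical driver: load many SQL statements from one text blob
// (';'-separated) and execute each through a single callback.
def loadQueries(sqlText: String): Seq[String] =
  sqlText.split(";").map(_.trim).filter(_.nonEmpty).toSeq

def runAll(sqlText: String, run: String => Unit): Int = {
  val queries = loadQueries(sqlText)
  queries.foreach(run) // in a Spark job: queries.foreach(spark.sql(_))
  queries.size
}
```

One jar could then take the query file (or a directory of files, one per table) as a program argument, instead of baking the queries into seven separate jars.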
[jira] [Commented] (SPARK-28943) NoSuchMethodError: shaded.parquet.org.apache.thrift.EncodingUtils.setBit(BIZ)B
[ https://issues.apache.org/jira/browse/SPARK-28943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920644#comment-16920644 ] Hyukjin Kwon commented on SPARK-28943: -- Does this happen in regular Apache Spark too, not CDH? Also, please provide steps to reproduce. > NoSuchMethodError: > shaded.parquet.org.apache.thrift.EncodingUtils.setBit(BIZ)B > --- > > Key: SPARK-28943 > URL: https://issues.apache.org/jira/browse/SPARK-28943 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Michael Heuer >Priority: Major > > Since adapting our build for Spark 2.4.x, we are unable to run on Spark 2.2.0 > provided by CDH. For more details, please see linked issue > https://github.com/bigdatagenomics/adam/issues/2157
[jira] [Updated] (SPARK-28941) Spark Sql Jobs
[ https://issues.apache.org/jira/browse/SPARK-28941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-28941: - Target Version/s: (was: 2.4.3) > Spark Sql Jobs > -- > > Key: SPARK-28941 > URL: https://issues.apache.org/jira/browse/SPARK-28941 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.3 >Reporter: Brahmendra >Priority: Major > Labels: github-import, pull-request-available > Fix For: 2.4.3 > > > Hi team, > I need one favor on Spark SQL jobs. > We have 200+ Spark SQL queries running on 7 different Hive tables. > How can we execute all 200+ Spark SQL jobs from a single jar file? > Currently we are managing 7 jar files, one per table. >
[jira] [Comment Edited] (SPARK-28943) NoSuchMethodError: shaded.parquet.org.apache.thrift.EncodingUtils.setBit(BIZ)B
[ https://issues.apache.org/jira/browse/SPARK-28943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920644#comment-16920644 ] Hyukjin Kwon edited comment on SPARK-28943 at 9/2/19 6:27 AM: -- Does this happen in regular Apache Spark too, not CDH? Also, please provide steps to reproduce. Also, does that happen in Apache Spark 2.4.x? or 2.2.0? was (Author: hyukjin.kwon): Does this happen in regular Apache Spark too, not CDH? Also, please provide steps to reproduce. > NoSuchMethodError: > shaded.parquet.org.apache.thrift.EncodingUtils.setBit(BIZ)B > --- > > Key: SPARK-28943 > URL: https://issues.apache.org/jira/browse/SPARK-28943 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Michael Heuer >Priority: Major > > Since adapting our build for Spark 2.4.x, we are unable to run on Spark 2.2.0 > provided by CDH. For more details, please see linked issue > https://github.com/bigdatagenomics/adam/issues/2157
[jira] [Resolved] (SPARK-27336) Incorrect DataSet.summary() result
[ https://issues.apache.org/jira/browse/SPARK-27336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-27336. -- Resolution: Won't Fix > Incorrect DataSet.summary() result > -- > > Key: SPARK-27336 > URL: https://issues.apache.org/jira/browse/SPARK-27336 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Priority: Major > Attachments: test.csv > > > There is a single data point in the minimum_nights column that is 1.0E8 out > of 8k records, but .summary() says it is the 75% and the max. > I compared this with approxQuantile, and approxQuantile for 75% gave the > correct value of 30.0. > To reproduce: > {code:java} > scala> val df = > spark.read.format("csv").load("test.csv").withColumn("minimum_nights", > '_c0.cast("Int")) > df: org.apache.spark.sql.DataFrame = [_c0: string, minimum_nights: int] > scala> df.select("minimum_nights").summary().show() > +---+--+ > |summary|minimum_nights| > +---+--+ > | count| 7072| > | mean| 14156.35407239819| > | stddev|1189128.5444975856| > |min| 1| > |25%| 2| > |50%| 4| > |75%| 1| > |max| 1| > +---+--+ > scala> df.stat.approxQuantile("minimum_nights", Array(0.75), 0.1) > res1: Array[Double] = Array(30.0) > scala> df.stat.approxQuantile("minimum_nights", Array(0.75), 0.001) > res2: Array[Double] = Array(30.0) > scala> df.stat.approxQuantile("minimum_nights", Array(0.75), 0.0001) > res3: Array[Double] = Array(1.0E8) > {code}
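Some background on the discrepancy above: `summary()` computes percentiles with an approximate quantile sketch, and as the reporter's `approxQuantile` calls show, the reported value depends on the relative-error setting. For reference, here is a hypothetical exact nearest-rank percentile helper (not a Spark API) that shows what an exact computation does with one extreme outlier:

```scala
// Exact (nearest-rank) percentile of a small in-memory sample -- illustrative
// only; Spark's summary()/approxQuantile use a sketch with bounded error.
def percentile(xs: Seq[Double], p: Double): Double = {
  require(xs.nonEmpty && p > 0.0 && p <= 1.0)
  val sorted = xs.sorted
  val idx = math.ceil(p * sorted.size).toInt - 1
  sorted(math.min(sorted.size - 1, math.max(0, idx)))
}
```

With roughly 7,000 ordinary values and a single 1.0E8, the exact 75th percentile sits in the bulk of the distribution; the outlier only affects quantiles above about (n-1)/n of the data.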