[jira] [Updated] (SPARK-26522) Auth secret error in RBackendAuthHandler
[ https://issues.apache.org/jira/browse/SPARK-26522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Euijun updated SPARK-26522:
---------------------------
    Description:

Hi experts,

I am trying to use Livy to connect to the SparkR backend. This is related to
[https://stackoverflow.com/questions/53900995/livy-spark-r-issue]

The error message is:
{code:java}
Error in sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap, :
  Auth secret not provided in environment.{code}

caused by spark-2.3.1/R/pkg/R/sparkR.R:
{code:java}
sparkR.sparkContext <- function(
  ...
  authSecret <- Sys.getenv("SPARKR_BACKEND_AUTH_SECRET")
  if (nchar(authSecret) == 0) {
    stop("Auth secret not provided in environment.")
  }
  ...
)
{code}

Best regards.

> Auth secret error in RBackendAuthHandler
> ----------------------------------------
>
>                 Key: SPARK-26522
>                 URL: https://issues.apache.org/jira/browse/SPARK-26522
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.3.1
>            Reporter: Euijun
>            Assignee: Matt Cheah
>            Priority: Minor
>              Labels: newbie

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
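The failing check in sparkR.R only requires that the SPARKR_BACKEND_AUTH_SECRET environment variable be present and non-empty in the R process before sparkR.sparkContext runs; when Livy launches the backend, the variable evidently is not propagated. A minimal sketch of the same guard in Python (the environment variable name is the real one from sparkR.R; the function itself is hypothetical, for illustration only):

```python
import os


def require_auth_secret(env=os.environ):
    """Mimic the guard in sparkR.sparkContext: fail fast unless the
    backend auth secret is present and non-empty in the environment."""
    secret = env.get("SPARKR_BACKEND_AUTH_SECRET", "")
    if len(secret) == 0:
        raise RuntimeError("Auth secret not provided in environment.")
    return secret
```

So whatever launches the backend process must arrange for that variable to be set; an empty string fails the check just like an unset variable.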
[jira] [Created] (SPARK-26523) Getting this error while reading from kinesis :- Could not read until the end sequence number of the range: SequenceNumberRange
CHIRAG YADAV created SPARK-26523:
------------------------------------

             Summary: Getting this error while reading from kinesis :- Could not read until the end sequence number of the range: SequenceNumberRange
                 Key: SPARK-26523
                 URL: https://issues.apache.org/jira/browse/SPARK-26523
             Project: Spark
          Issue Type: Brainstorming
          Components: DStreams, Spark Submit, Structured Streaming
    Affects Versions: 2.4.0
            Reporter: CHIRAG YADAV

I am using Spark to read data from a Kinesis stream, and after reading data for some time I get this error:

{noformat}
ERROR Executor: Exception in task 74.0 in stage 52.0 (TID 339)
org.apache.spark.SparkException: Could not read until the end sequence number of the range: SequenceNumberRange(godel-logs,shardId-0007,49591040259365283625183097566179815847537156031957172338,49591040259365283625183097600068424422974441881954418802,4517)
{noformat}

Can someone please tell me why I am getting this error and how to resolve it?
[jira] [Updated] (SPARK-26522) Auth secret error in RBackendAuthHandler
[ https://issues.apache.org/jira/browse/SPARK-26522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Euijun updated SPARK-26522:
---------------------------
    Shepherd: Apache Spark
      Labels: newbie  (was: )
[jira] [Updated] (SPARK-26522) Auth secret error in RBackendAuthHandler
[ https://issues.apache.org/jira/browse/SPARK-26522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Euijun updated SPARK-26522:
---------------------------
    Affects Version/s:     (was: 3.0.0)
                       2.3.1
             Priority: Minor  (was: Major)
        Fix Version/s:     (was: 3.0.0)
          Description:

Hi experts,

I am trying to use Livy to connect to the SparkR backend. This is related to
https://stackoverflow.com/questions/53900995/livy-spark-r-issue

The error message is:
{code:java}
Error in sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap, :
  Auth secret not provided in environment.{code}

caused by spark-2.3.1/R/pkg/R/sparkR.R:
{code:java}
sparkR.sparkContext <- function(
  ...
  authSecret <- Sys.getenv("SPARKR_BACKEND_AUTH_SECRET")
  if (nchar(authSecret) == 0) {
    stop("Auth secret not provided in environment.")
  }
  ...
)
{code}

Best regards.

  was:
This is a follow-up to SPARK-26194, which aims to add auto-generated secrets similar to the YARN backend. There's a desire to support different ways to generate and propagate these auth secrets (e.g. using things like Vault). Need to investigate:
- exposing configuration to support that
- changing SecurityManager so that it can delegate some of the secret-handling logic to custom implementations
- figuring out whether this can also be used in client mode, where the driver is not created by the k8s backend in Spark.

          Component/s:     (was: Kubernetes)
                       SparkR
           Issue Type: Bug  (was: New Feature)
              Summary: Auth secret error in RBackendAuthHandler  (was: CLONE - Add configurable auth secret source in k8s backend)
[jira] [Created] (SPARK-26522) CLONE - Add configurable auth secret source in k8s backend
Euijun created SPARK-26522:
------------------------------

             Summary: CLONE - Add configurable auth secret source in k8s backend
                 Key: SPARK-26522
                 URL: https://issues.apache.org/jira/browse/SPARK-26522
             Project: Spark
          Issue Type: New Feature
          Components: Kubernetes
    Affects Versions: 3.0.0
            Reporter: Euijun
            Assignee: Matt Cheah
             Fix For: 3.0.0

This is a follow-up to SPARK-26194, which aims to add auto-generated secrets similar to the YARN backend. There's a desire to support different ways to generate and propagate these auth secrets (e.g. using things like Vault). Need to investigate:
- exposing configuration to support that
- changing SecurityManager so that it can delegate some of the secret-handling logic to custom implementations
- figuring out whether this can also be used in client mode, where the driver is not created by the k8s backend in Spark.
[jira] [Commented] (SPARK-26519) spark sql CHANGE COLUMN not working
[ https://issues.apache.org/jira/browse/SPARK-26519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732714#comment-16732714 ]

suman gorantla commented on SPARK-26519:
----------------------------------------

Dear Hyun,

The command below was working successfully in Hive/Impala but fails in Spark SQL and spark-submit. I found the same error message in the logs as well. This seems to be incorrect behavior. Please let me know the reason.

{code:sql}
ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col new_col STRING AFTER old_col
{code}

-- సుమన్ గోరంట్ల

> spark sql CHANGE COLUMN not working
> -----------------------------------
>
>                 Key: SPARK-26519
>                 URL: https://issues.apache.org/jira/browse/SPARK-26519
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>         Environment: !image-2019-01-02-14-25-34-594.png!
>            Reporter: suman gorantla
>            Priority: Major
>         Attachments: sparksql error.PNG
>
> Dear Team,
> With Spark SQL I am unable to change the position of a newly added column (new_col) to be after an existing column (old_col) in a Hive external table; please see the screenshot below.
> scala> sql("ALTER TABLE enterprisedatalakedev.tmptst ADD COLUMNs (new_col STRING)")
> res14: org.apache.spark.sql.DataFrame = []
> sql("ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col new_col STRING AFTER old_col ")
> org.apache.spark.sql.catalyst.parser.ParseException:
> Operation not allowed: ALTER TABLE table [PARTITION partition_spec] CHANGE COLUMN ... FIRST | AFTER otherCol(line 1, pos 0)
> == SQL ==
> ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col new_col STRING AFTER old_col
> ^^^
> at org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39)
> at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:934)
> at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:928)
> at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
> at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:928)
> at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:55)
> at org.apache.spark.sql.catalyst.parser.SqlBaseParser$ChangeColumnContext.accept(SqlBaseParser.java:1485)
> at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42)
> at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71)
> at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71)
> at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
> at org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:70)
> at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:69)
> at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:68)
> at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:97)
> at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
> at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68)
> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:623)
> ... 48 elided
> !image-2019-01-02-14-25-40-980.png!
[jira] [Commented] (SPARK-26437) Decimal data becomes bigint to query, unable to query
[ https://issues.apache.org/jira/browse/SPARK-26437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732701#comment-16732701 ]

zengxl commented on SPARK-26437:
--------------------------------

Thanks [~dongjoon]

> Decimal data becomes bigint to query, unable to query
> -----------------------------------------------------
>
>                 Key: SPARK-26437
>                 URL: https://issues.apache.org/jira/browse/SPARK-26437
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.2, 2.3.1
>            Reporter: zengxl
>            Priority: Major
>             Fix For: 3.0.0
>
> This is my SQL:
> create table tmp.tmp_test_6387_1224_spark stored as ORCFile as select 0.00 as a
> select a from tmp.tmp_test_6387_1224_spark
> The resulting table definition is:
> CREATE TABLE `tmp.tmp_test_6387_1224_spark`(
>   `a` decimal(2,2))
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> When I query this table (using Hive or Spark SQL, the exception is the same), the following exception is thrown:
> *Caused by: java.io.EOFException: Reading BigInteger past EOF from compressed stream Stream for column 1 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0*
> *at org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readBigInteger(SerializationUtils.java:176)*
> *at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$DecimalTreeReader.next(TreeReaderFactory.java:1264)*
> *at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.next(TreeReaderFactory.java:2004)*
> *at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1039)*
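For context on the schema above: a Hive-style decimal(p,s) has at most p total digits with s of them after the decimal point, so decimal(2,2) can only represent values strictly between -1 and 1; the literal 0.00 fits, yet the ORC reader still fails to decode it. One plausible way to compute the inferred (precision, scale) of a decimal literal can be illustrated with Python's decimal module (an analogy for the bookkeeping, not Spark's actual type-inference code; the helper name is made up):

```python
from decimal import Decimal


def precision_and_scale(text):
    """Return (precision, scale) of a decimal literal, Hive-style:
    scale = digits after the point; precision = max(total coefficient
    digits, scale), since leading zeros carry no precision."""
    t = Decimal(text).as_tuple()
    scale = -t.exponent if t.exponent < 0 else 0
    precision = max(len(t.digits), scale)
    return precision, scale
```

Under this rule the literal "0.00" comes out as (2, 2), which matches the decimal(2,2) column in the reported DDL.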
[jira] [Commented] (SPARK-26521) Sparksql cannot modify the field name of a table
[ https://issues.apache.org/jira/browse/SPARK-26521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732698#comment-16732698 ]

zengxl commented on SPARK-26521:
--------------------------------

OK, thank you [~dongjoon]

> Sparksql cannot modify the field name of a table
> ------------------------------------------------
>
>                 Key: SPARK-26521
>                 URL: https://issues.apache.org/jira/browse/SPARK-26521
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.1
>            Reporter: zengxl
>            Priority: Major
>
> When I alter table info using Spark SQL, it throws an exception:
>
> alter table tmp.testchange change column i m string;
> *Error in query: ALTER TABLE CHANGE COLUMN is not supported for changing column 'i' with type 'StringType' to 'm' with type 'StringType';*
[jira] [Updated] (SPARK-26512) Spark 2.4.0 is not working with Hadoop 2.8.3 in windows 10?
[ https://issues.apache.org/jira/browse/SPARK-26512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-26512:
----------------------------------
    Flags:   (was: Important)

> Spark 2.4.0 is not working with Hadoop 2.8.3 in windows 10?
> -----------------------------------------------------------
>
>                 Key: SPARK-26512
>                 URL: https://issues.apache.org/jira/browse/SPARK-26512
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Spark Shell, YARN
>    Affects Versions: 2.4.0
>         Environment: Operating system: Windows 10
> Spark version: 2.4.0
> Hadoop version: 2.8.3
>            Reporter: Anubhav Jain
>            Priority: Minor
>              Labels: windows
>         Attachments: log.png
>
> I have installed Hadoop version 2.8.3 in my Windows 10 environment and it works fine. Now when I try to install Apache Spark (version 2.4.0) with YARN as the cluster manager, it does not work. When I submit a Spark job using spark-submit for testing, it appears under the ACCEPTED tab in the YARN UI and after that it fails.
[jira] [Resolved] (SPARK-26521) Sparksql cannot modify the field name of a table
[ https://issues.apache.org/jira/browse/SPARK-26521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-26521.
-----------------------------------
    Resolution: Duplicate

Hi, [~zengxl]. Thank you for reporting, but this duplicates SPARK-24602. Please search JIRA issues before creating one next time.
[jira] [Closed] (SPARK-26521) Sparksql cannot modify the field name of a table
[ https://issues.apache.org/jira/browse/SPARK-26521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun closed SPARK-26521.
---------------------------------
[jira] [Commented] (SPARK-22951) count() after dropDuplicates() on emptyDataFrame returns incorrect value
[ https://issues.apache.org/jira/browse/SPARK-22951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732659#comment-16732659 ]

Dongjoon Hyun commented on SPARK-22951:
---------------------------------------

Hi, [~feng...@databricks.com] and [~lian cheng]. Since this is a correctness issue reported on branch-2.2, I'll backport this for Spark 2.2.3.

> count() after dropDuplicates() on emptyDataFrame returns incorrect value
> ------------------------------------------------------------------------
>
>                 Key: SPARK-22951
>                 URL: https://issues.apache.org/jira/browse/SPARK-22951
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.2, 2.2.0, 2.3.0
>            Reporter: Michael Dreibelbis
>            Assignee: Feng Liu
>            Priority: Major
>              Labels: correctness
>             Fix For: 2.3.0
>
> Here is a minimal Spark application to reproduce:
> {code}
> import org.apache.spark.sql.SQLContext
> import org.apache.spark.{SparkConf, SparkContext}
>
> object DropDupesApp extends App {
>
>   override def main(args: Array[String]): Unit = {
>     val conf = new SparkConf()
>       .setAppName("test")
>       .setMaster("local")
>     val sc = new SparkContext(conf)
>     val sql = SQLContext.getOrCreate(sc)
>     assert(sql.emptyDataFrame.count == 0) // expected
>     assert(sql.emptyDataFrame.dropDuplicates.count == 1) // unexpected
>   }
>
> }
> {code}
[jira] [Updated] (SPARK-22951) count() after dropDuplicates() on emptyDataFrame returns incorrect value
[ https://issues.apache.org/jira/browse/SPARK-22951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-22951:
----------------------------------
    Target Version/s: 2.3.0, 2.2.3  (was: 2.3.0)
[jira] [Resolved] (SPARK-26019) pyspark/accumulators.py: "TypeError: object of type 'NoneType' has no len()" in authenticate_and_accum_updates()
[ https://issues.apache.org/jira/browse/SPARK-26019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-26019.
----------------------------------
    Resolution: Fixed

Issue resolved by pull request 23337
[https://github.com/apache/spark/pull/23337]

> pyspark/accumulators.py: "TypeError: object of type 'NoneType' has no len()" in authenticate_and_accum_updates()
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-26019
>                 URL: https://issues.apache.org/jira/browse/SPARK-26019
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.2, 2.4.0
>            Reporter: Ruslan Dautkhanov
>            Assignee: Imran Rashid
>            Priority: Major
>             Fix For: 2.3.3, 2.4.1
>
> PySpark's accumulator server expects a secure py4j connection between Python and the JVM. Spark will normally create a secure connection, but there is a public API which allows you to pass in your own py4j connection. (This is used by Zeppelin, at least.) When this happens, you get an error like:
> {noformat}
> pyspark/accumulators.py: "TypeError: object of type 'NoneType' has no len()" in authenticate_and_accum_updates()
> {noformat}
> We should change PySpark to:
> 1) warn loudly if a user passes in an insecure connection
> 1a) I'd like to suggest that we even error out, unless the user actively opts in with a config like "spark.python.allowInsecurePy4j=true"
> 2) The accumulator server should be changed to allow insecure connections.
> Note that SPARK-26349 will disallow insecure connections completely in 3.0.
>
> More info on how this occurs:
> {code:python}
> Exception happened during processing of request from ('127.0.0.1', 43418)
> Traceback (most recent call last):
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 290, in _handle_request_noblock
>     self.process_request(request, client_address)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 318, in process_request
>     self.finish_request(request, client_address)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 331, in finish_request
>     self.RequestHandlerClass(request, client_address, self)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 652, in __init__
>     self.handle()
>   File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py", line 263, in handle
>     poll(authenticate_and_accum_updates)
>   File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py", line 238, in poll
>     if func():
>   File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py", line 251, in authenticate_and_accum_updates
>     received_token = self.rfile.read(len(auth_token))
> TypeError: object of type 'NoneType' has no len()
> {code}
>
> The error happens here:
> https://github.com/apache/spark/blob/cb90617f894fd51a092710271823ec7d1cd3a668/python/pyspark/accumulators.py#L254
>
> The PySpark code was just running a simple pipeline of
> binary_rdd = sc.binaryRecords(full_file_path, record_length).map(lambda .. )
> and then converting it to a dataframe and running a count on it.
> It seems the error is flaky -- on the next rerun it didn't happen. (But accumulators don't actually work.)
[jira] [Assigned] (SPARK-26019) pyspark/accumulators.py: "TypeError: object of type 'NoneType' has no len()" in authenticate_and_accum_updates()
[ https://issues.apache.org/jira/browse/SPARK-26019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-26019:
------------------------------------
    Assignee: Imran Rashid
[jira] [Updated] (SPARK-26019) pyspark/accumulators.py: "TypeError: object of type 'NoneType' has no len()" in authenticate_and_accum_updates()
[ https://issues.apache.org/jira/browse/SPARK-26019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-26019:
---------------------------------
    Fix Version/s: 2.4.1
                   2.3.3
[jira] [Resolved] (SPARK-26403) DataFrame pivot using array column fails with "Unsupported literal type class"
[ https://issues.apache.org/jira/browse/SPARK-26403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-26403. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23349 [https://github.com/apache/spark/pull/23349] > DataFrame pivot using array column fails with "Unsupported literal type class" > -- > > Key: SPARK-26403 > URL: https://issues.apache.org/jira/browse/SPARK-26403 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Huon Wilson >Assignee: Hyukjin Kwon >Priority: Minor > Fix For: 3.0.0 > > > Doing a pivot (using the {{pivot(pivotColumn: Column)}} overload) on a column > containing arrays results in a runtime error: > {code:none} > scala> val df = Seq((1, Seq("a", "x"), 2), (1, Seq("b"), 3), (2, Seq("a", > "x"), 10), (3, Seq(), 100)).toDF("x", "s", "y") > df: org.apache.spark.sql.DataFrame = [x: int, s: array ... 1 more > field] > scala> df.show > +---+--+---+ > | x| s| y| > +---+--+---+ > | 1|[a, x]| 2| > | 1| [b]| 3| > | 2|[a, x]| 10| > | 3|[]|100| > +---+--+---+ > scala> df.groupBy("x").pivot("s").agg(collect_list($"y")).show > java.lang.RuntimeException: Unsupported literal type class > scala.collection.mutable.WrappedArray$ofRef WrappedArray() > at > org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:78) > at > org.apache.spark.sql.RelationalGroupedDataset$$anonfun$pivot$1.apply(RelationalGroupedDataset.scala:419) > at > org.apache.spark.sql.RelationalGroupedDataset$$anonfun$pivot$1.apply(RelationalGroupedDataset.scala:419) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > 
at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > org.apache.spark.sql.RelationalGroupedDataset.pivot(RelationalGroupedDataset.scala:419) > at > org.apache.spark.sql.RelationalGroupedDataset.pivot(RelationalGroupedDataset.scala:397) > at > org.apache.spark.sql.RelationalGroupedDataset.pivot(RelationalGroupedDataset.scala:317) > ... 49 elided > {code} > However, this doesn't seem to be a fundamental limitation with {{pivot}}, as > it works fine using the {{pivot(pivotColumn: Column, values: Seq[Any])}} > overload, as long as the arrays are mapped to the {{Array}} type: > {code:none} > scala> val rawValues = df.select("s").distinct.sort("s").collect > rawValues: Array[org.apache.spark.sql.Row] = Array([WrappedArray()], > [WrappedArray(a, x)], [WrappedArray(b)]) > scala> val values = rawValues.map(_.getSeq[String](0).to[Array]) > values: Array[Array[String]] = Array(Array(), Array(a, x), Array(b)) > scala> df.groupBy("x").pivot("s", values).agg(collect_list($"y")).show > +---+-+--+---+ > | x| []|[a, x]|[b]| > +---+-+--+---+ > | 1| []| [2]|[3]| > | 3|[100]|[]| []| > | 2| []| [10]| []| > +---+-+--+---+ > {code} > It would be nice if {{pivot}} was more resilient to Spark's own > representation of array columns, and so the first version worked. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
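The workaround succeeds because plain `Array` values can be turned into literals, while `WrappedArray` cannot. As a plain-Python analogy (no Spark involved; the function and data here are illustrative), the working overload essentially collects the distinct array values first and uses them as fixed pivot labels, which requires the values to be hashable, just as the Scala workaround requires them to be literal-convertible:

```python
def pivot_collect_list(rows):
    # rows are (group_key, array_value, payload) triples; tuples stand in
    # for Spark's array column because they are hashable, like the Array
    # values produced by the workaround.
    labels = sorted({s for _, s, _ in rows})
    table = {}
    for x, s, y in rows:
        row = table.setdefault(x, {label: [] for label in labels})
        row[s].append(y)
    return table

# Mirrors the example DataFrame from the report:
rows = [(1, ("a", "x"), 2), (1, ("b",), 3), (2, ("a", "x"), 10), (3, (), 100)]
table = pivot_collect_list(rows)
```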
[jira] [Assigned] (SPARK-26403) DataFrame pivot using array column fails with "Unsupported literal type class"
[ https://issues.apache.org/jira/browse/SPARK-26403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-26403: Assignee: Hyukjin Kwon > DataFrame pivot using array column fails with "Unsupported literal type class" > -- > > Key: SPARK-26403 > URL: https://issues.apache.org/jira/browse/SPARK-26403 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Huon Wilson >Assignee: Hyukjin Kwon >Priority: Minor > > Doing a pivot (using the {{pivot(pivotColumn: Column)}} overload) on a column > containing arrays results in a runtime error: > {code:none} > scala> val df = Seq((1, Seq("a", "x"), 2), (1, Seq("b"), 3), (2, Seq("a", > "x"), 10), (3, Seq(), 100)).toDF("x", "s", "y") > df: org.apache.spark.sql.DataFrame = [x: int, s: array ... 1 more > field] > scala> df.show > +---+--+---+ > | x| s| y| > +---+--+---+ > | 1|[a, x]| 2| > | 1| [b]| 3| > | 2|[a, x]| 10| > | 3|[]|100| > +---+--+---+ > scala> df.groupBy("x").pivot("s").agg(collect_list($"y")).show > java.lang.RuntimeException: Unsupported literal type class > scala.collection.mutable.WrappedArray$ofRef WrappedArray() > at > org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:78) > at > org.apache.spark.sql.RelationalGroupedDataset$$anonfun$pivot$1.apply(RelationalGroupedDataset.scala:419) > at > org.apache.spark.sql.RelationalGroupedDataset$$anonfun$pivot$1.apply(RelationalGroupedDataset.scala:419) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > 
org.apache.spark.sql.RelationalGroupedDataset.pivot(RelationalGroupedDataset.scala:419) > at > org.apache.spark.sql.RelationalGroupedDataset.pivot(RelationalGroupedDataset.scala:397) > at > org.apache.spark.sql.RelationalGroupedDataset.pivot(RelationalGroupedDataset.scala:317) > ... 49 elided > {code} > However, this doesn't seem to be a fundamental limitation with {{pivot}}, as > it works fine using the {{pivot(pivotColumn: Column, values: Seq[Any])}} > overload, as long as the arrays are mapped to the {{Array}} type: > {code:none} > scala> val rawValues = df.select("s").distinct.sort("s").collect > rawValues: Array[org.apache.spark.sql.Row] = Array([WrappedArray()], > [WrappedArray(a, x)], [WrappedArray(b)]) > scala> val values = rawValues.map(_.getSeq[String](0).to[Array]) > values: Array[Array[String]] = Array(Array(), Array(a, x), Array(b)) > scala> df.groupBy("x").pivot("s", values).agg(collect_list($"y")).show > +---+-+--+---+ > | x| []|[a, x]|[b]| > +---+-+--+---+ > | 1| []| [2]|[3]| > | 3|[100]|[]| []| > | 2| []| [10]| []| > +---+-+--+---+ > {code} > It would be nice if {{pivot}} was more resilient to Spark's own > representation of array columns, and so the first version worked. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25591) PySpark Accumulators with multiple PythonUDFs
[ https://issues.apache.org/jira/browse/SPARK-25591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-25591: - Fix Version/s: 2.3.3 2.2.3 > PySpark Accumulators with multiple PythonUDFs > - > > Key: SPARK-25591 > URL: https://issues.apache.org/jira/browse/SPARK-25591 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.3.2 >Reporter: Abdeali Kothari >Assignee: Liang-Chi Hsieh >Priority: Blocker > Labels: correctness > Fix For: 2.2.3, 2.3.3, 2.4.0 > > > When having multiple Python UDFs - the last Python UDF's accumulator is the > only accumulator that gets updated. > {code:python} > import pyspark > from pyspark.sql import SparkSession, Row > from pyspark.sql import functions as F > from pyspark.sql import types as T > from pyspark import AccumulatorParam > spark = SparkSession.builder.getOrCreate() > spark.sparkContext.setLogLevel("ERROR") > test_accum = spark.sparkContext.accumulator(0.0) > SHUFFLE = False > def main(data): > print(">>> Check0", test_accum.value) > def test(x): > global test_accum > test_accum += 1.0 > return x > print(">>> Check1", test_accum.value) > def test2(x): > global test_accum > test_accum += 100.0 > return x > print(">>> Check2", test_accum.value) > func_udf = F.udf(test, T.DoubleType()) > print(">>> Check3", test_accum.value) > func_udf2 = F.udf(test2, T.DoubleType()) > print(">>> Check4", test_accum.value) > data = data.withColumn("out1", func_udf(data["a"])) > if SHUFFLE: > data = data.repartition(2) > print(">>> Check5", test_accum.value) > data = data.withColumn("out2", func_udf2(data["b"])) > if SHUFFLE: > data = data.repartition(2) > print(">>> Check6", test_accum.value) > data.show() # ACTION > print(">>> Check7", test_accum.value) > return data > df = spark.createDataFrame([ > [1.0, 2.0] > ], schema=T.StructType([T.StructField(field_name, T.DoubleType(), True) for > field_name in ["a", "b"]])) > df2 = main(df) > {code} > {code:python} > Output 1 - with 
SHUFFLE=False > ... > # >>> Check7 100.0 > Output 2 - with SHUFFLE=True > ... > # >>> Check7 101.0 > {code} > Basically looks like: > - Accumulator works only for last UDF before a shuffle-like operation -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
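A plain-Python sketch of the reported symptom (an illustration only, not Spark's actual internals): if each UDF evaluation replaces the per-task accumulator payload instead of merging into it, only the last UDF's updates reach the driver, matching the `Check7 100.0` vs `101.0` outputs above:

```python
class TaskPayload:
    """Stand-in for the per-task accumulator updates sent to the driver."""

    def __init__(self):
        self.value = 0.0

    def overwrite(self, delta):
        # Buggy behaviour: a later UDF's payload discards updates from
        # earlier UDFs in the same task.
        self.value = delta

    def merge(self, delta):
        # Correct behaviour: updates accumulate across all UDFs.
        self.value += delta

buggy, fixed = TaskPayload(), TaskPayload()
for delta in (1.0, 100.0):  # first UDF adds 1.0, second adds 100.0
    buggy.overwrite(delta)
    fixed.merge(delta)
# buggy.value == 100.0, fixed.value == 101.0
```

The repartition between the two `withColumn` calls forces each UDF into its own task, which is why `SHUFFLE=True` yields the correct 101.0.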
[jira] [Updated] (SPARK-25591) PySpark Accumulators with multiple PythonUDFs
[ https://issues.apache.org/jira/browse/SPARK-25591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-25591: -- Target Version/s: 2.4.0, 2.2.3, 2.3.3 (was: 2.4.0) > PySpark Accumulators with multiple PythonUDFs > - > > Key: SPARK-25591 > URL: https://issues.apache.org/jira/browse/SPARK-25591 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.3.2 >Reporter: Abdeali Kothari >Assignee: Liang-Chi Hsieh >Priority: Blocker > Labels: correctness > Fix For: 2.4.0 > > > When having multiple Python UDFs - the last Python UDF's accumulator is the > only accumulator that gets updated. > {code:python} > import pyspark > from pyspark.sql import SparkSession, Row > from pyspark.sql import functions as F > from pyspark.sql import types as T > from pyspark import AccumulatorParam > spark = SparkSession.builder.getOrCreate() > spark.sparkContext.setLogLevel("ERROR") > test_accum = spark.sparkContext.accumulator(0.0) > SHUFFLE = False > def main(data): > print(">>> Check0", test_accum.value) > def test(x): > global test_accum > test_accum += 1.0 > return x > print(">>> Check1", test_accum.value) > def test2(x): > global test_accum > test_accum += 100.0 > return x > print(">>> Check2", test_accum.value) > func_udf = F.udf(test, T.DoubleType()) > print(">>> Check3", test_accum.value) > func_udf2 = F.udf(test2, T.DoubleType()) > print(">>> Check4", test_accum.value) > data = data.withColumn("out1", func_udf(data["a"])) > if SHUFFLE: > data = data.repartition(2) > print(">>> Check5", test_accum.value) > data = data.withColumn("out2", func_udf2(data["b"])) > if SHUFFLE: > data = data.repartition(2) > print(">>> Check6", test_accum.value) > data.show() # ACTION > print(">>> Check7", test_accum.value) > return data > df = spark.createDataFrame([ > [1.0, 2.0] > ], schema=T.StructType([T.StructField(field_name, T.DoubleType(), True) for > field_name in ["a", "b"]])) > df2 = main(df) > {code} > {code:python} > Output 1 - 
with SHUFFLE=False > ... > # >>> Check7 100.0 > Output 2 - with SHUFFLE=True > ... > # >>> Check7 101.0 > {code} > Basically looks like: > - Accumulator works only for last UDF before a shuffle-like operation -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26457) Show hadoop configurations in HistoryServer environment tab
[ https://issues.apache.org/jira/browse/SPARK-26457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated SPARK-26457: Priority: Minor (was: Major) > Show hadoop configurations in HistoryServer environment tab > --- > > Key: SPARK-26457 > URL: https://issues.apache.org/jira/browse/SPARK-26457 > Project: Spark > Issue Type: New Feature > Components: Spark Core, Web UI >Affects Versions: 2.3.2, 2.4.0 > Environment: It may be useful to show some Hadoop configurations in the > HistoryServer environment tab when debugging Hadoop-related issues >Reporter: deshanxiao >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
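If such a tab were added, one design question is what to render; a minimal sketch (hypothetical helper, not part of Spark) that sorts the Hadoop configuration for a stable listing and masks secret-looking keys, in the spirit of the redaction Spark already applies to its own properties:

```python
def display_hadoop_conf(conf, secret_markers=("password", "secret", "token")):
    # Sort entries for a stable UI and mask anything that looks sensitive
    # before it reaches the environment tab.
    rendered = []
    for key in sorted(conf):
        value = conf[key]
        if any(marker in key.lower() for marker in secret_markers):
            value = "*********(redacted)"
        rendered.append((key, value))
    return rendered
```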
[jira] [Resolved] (SPARK-26516) zeppelin with spark on mesos: environment variable setting
[ https://issues.apache.org/jira/browse/SPARK-26516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved SPARK-26516. - Resolution: Invalid > zeppelin with spark on mesos: environment variable setting > -- > > Key: SPARK-26516 > URL: https://issues.apache.org/jira/browse/SPARK-26516 > Project: Spark > Issue Type: Question > Components: Mesos, Spark Core >Affects Versions: 2.4.0 >Reporter: Yui Hirasawa >Priority: Major > > I am trying to use Zeppelin with Spark on Mesos mode, following [Apache > Zeppelin on Spark Cluster > Mode|https://zeppelin.apache.org/docs/0.8.0/setup/deployment/spark_cluster_mode.html#4-configure-spark-interpreter-in-zeppelin-1]. > According to the instructions, we should set these environment variables: > {code:java} > export MASTER=mesos://127.0.1.1:5050 > export MESOS_NATIVE_JAVA_LIBRARY=[PATH OF libmesos.so] > export SPARK_HOME=[PATH OF SPARK HOME] > {code} > As far as I know, these environment variables are used by Zeppelin, so they > should be set on the local host rather than in the Docker container (if I am wrong, > please correct me). > But Mesos and Spark are running inside a Docker container, so do we need to set > these environment variables so that they point to the paths inside the > Docker container? If so, how should one achieve that? > Thanks in advance. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26513) Trigger GC on executor node idle
[ https://issues.apache.org/jira/browse/SPARK-26513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated SPARK-26513: Fix Version/s: (was: 3.0.0) > Trigger GC on executor node idle > > > Key: SPARK-26513 > URL: https://issues.apache.org/jira/browse/SPARK-26513 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: Sandish Kumar HN >Priority: Major > > > Correct me if I'm wrong. > *Stage:* > On a large cluster, each stage runs on several executors, where a few > executors may finish their tasks first and then wait for the remaining tasks of the > stage, which are executed by other executor nodes in the cluster, to finish. > A stage completes only when all of its tasks finish execution, and the > next stage cannot start until the current stage is complete. > > Why not trigger GC while an executor node is idle, waiting for the remaining > tasks to finish? The executor has to wait anyway, which can take at least a couple of > seconds, while a triggered GC would take at most ~300ms. > > I have proposed a small code snippet that triggers GC when the set of running tasks is > empty and heap usage on the current executor node exceeds a given > threshold. > This could improve performance for long-running Spark jobs. > We referred to this paper > [https://www.computer.org/csdl/proceedings/hipc/2016/5411/00/07839705.pdf] > and found performance improvements in our long-running Spark batch jobs. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
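The proposed check can be sketched as follows (plain Python for illustration; the actual proposal targets the JVM executor, where the trigger would be `System.gc()` and the heap figures would come from the runtime):

```python
import gc

def maybe_collect(running_tasks, heap_used, heap_max, threshold=0.45):
    # Hypothetical idle-time trigger: collect only when no tasks are
    # running and heap usage exceeds the threshold, so the GC pause never
    # competes with task execution.
    if not running_tasks and heap_used / heap_max > threshold:
        gc.collect()
        return True
    return False
```

The threshold guards against pointless collections on a mostly-empty heap; the idle check guards against pausing an executor that still has work.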
[jira] [Commented] (SPARK-26516) zeppelin with spark on mesos: environment variable setting
[ https://issues.apache.org/jira/browse/SPARK-26516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732587#comment-16732587 ] Saisai Shao commented on SPARK-26516: - Questions should go to user@spark mail list. Also if this is a problem of Zeppelin, it would be better to ask in the Zeppelin mail list. > zeppelin with spark on mesos: environment variable setting > -- > > Key: SPARK-26516 > URL: https://issues.apache.org/jira/browse/SPARK-26516 > Project: Spark > Issue Type: Question > Components: Mesos, Spark Core >Affects Versions: 2.4.0 >Reporter: Yui Hirasawa >Priority: Major > > I am trying to use zeppelin with spark on mesos mode following [Apache > Zeppelin on Spark Cluster > Mode|https://zeppelin.apache.org/docs/0.8.0/setup/deployment/spark_cluster_mode.html#4-configure-spark-interpreter-in-zeppelin-1]. > In the instruction, we should set these environment variables: > {code:java} > export MASTER=mesos://127.0.1.1:5050 > export MESOS_NATIVE_JAVA_LIBRARY=[PATH OF libmesos.so] > export SPARK_HOME=[PATH OF SPARK HOME] > {code} > As far as I know, these environment variables are used by zeppelin, so it > should be set in localhost rather than in docker container(if i am wrong > please correct me). > But mesos and spark is running inside docker container, so do we need to set > these environment variables so that they are pointing to the path inside the > docker container? If so, how should one achieve that? > Thanks in advance. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25591) PySpark Accumulators with multiple PythonUDFs
[ https://issues.apache.org/jira/browse/SPARK-25591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732574#comment-16732574 ] Hyukjin Kwon commented on SPARK-25591: -- +1 > PySpark Accumulators with multiple PythonUDFs > - > > Key: SPARK-25591 > URL: https://issues.apache.org/jira/browse/SPARK-25591 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.3.2 >Reporter: Abdeali Kothari >Assignee: Liang-Chi Hsieh >Priority: Blocker > Labels: correctness > Fix For: 2.4.0 > > > When having multiple Python UDFs - the last Python UDF's accumulator is the > only accumulator that gets updated. > {code:python} > import pyspark > from pyspark.sql import SparkSession, Row > from pyspark.sql import functions as F > from pyspark.sql import types as T > from pyspark import AccumulatorParam > spark = SparkSession.builder.getOrCreate() > spark.sparkContext.setLogLevel("ERROR") > test_accum = spark.sparkContext.accumulator(0.0) > SHUFFLE = False > def main(data): > print(">>> Check0", test_accum.value) > def test(x): > global test_accum > test_accum += 1.0 > return x > print(">>> Check1", test_accum.value) > def test2(x): > global test_accum > test_accum += 100.0 > return x > print(">>> Check2", test_accum.value) > func_udf = F.udf(test, T.DoubleType()) > print(">>> Check3", test_accum.value) > func_udf2 = F.udf(test2, T.DoubleType()) > print(">>> Check4", test_accum.value) > data = data.withColumn("out1", func_udf(data["a"])) > if SHUFFLE: > data = data.repartition(2) > print(">>> Check5", test_accum.value) > data = data.withColumn("out2", func_udf2(data["b"])) > if SHUFFLE: > data = data.repartition(2) > print(">>> Check6", test_accum.value) > data.show() # ACTION > print(">>> Check7", test_accum.value) > return data > df = spark.createDataFrame([ > [1.0, 2.0] > ], schema=T.StructType([T.StructField(field_name, T.DoubleType(), True) for > field_name in ["a", "b"]])) > df2 = main(df) > {code} > {code:python} > Output 1 - with 
SHUFFLE=False > ... > # >>> Check7 100.0 > Output 2 - with SHUFFLE=True > ... > # >>> Check7 101.0 > {code} > Basically looks like: > - Accumulator works only for last UDF before a shuffle-like operation -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25591) PySpark Accumulators with multiple PythonUDFs
[ https://issues.apache.org/jira/browse/SPARK-25591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732573#comment-16732573 ] Dongjoon Hyun commented on SPARK-25591: --- Thank you for confirming, [~viirya]. Yes. Please make two PRs for them. > PySpark Accumulators with multiple PythonUDFs > - > > Key: SPARK-25591 > URL: https://issues.apache.org/jira/browse/SPARK-25591 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.3.2 >Reporter: Abdeali Kothari >Assignee: Liang-Chi Hsieh >Priority: Blocker > Labels: correctness > Fix For: 2.4.0 > > > When having multiple Python UDFs - the last Python UDF's accumulator is the > only accumulator that gets updated. > {code:python} > import pyspark > from pyspark.sql import SparkSession, Row > from pyspark.sql import functions as F > from pyspark.sql import types as T > from pyspark import AccumulatorParam > spark = SparkSession.builder.getOrCreate() > spark.sparkContext.setLogLevel("ERROR") > test_accum = spark.sparkContext.accumulator(0.0) > SHUFFLE = False > def main(data): > print(">>> Check0", test_accum.value) > def test(x): > global test_accum > test_accum += 1.0 > return x > print(">>> Check1", test_accum.value) > def test2(x): > global test_accum > test_accum += 100.0 > return x > print(">>> Check2", test_accum.value) > func_udf = F.udf(test, T.DoubleType()) > print(">>> Check3", test_accum.value) > func_udf2 = F.udf(test2, T.DoubleType()) > print(">>> Check4", test_accum.value) > data = data.withColumn("out1", func_udf(data["a"])) > if SHUFFLE: > data = data.repartition(2) > print(">>> Check5", test_accum.value) > data = data.withColumn("out2", func_udf2(data["b"])) > if SHUFFLE: > data = data.repartition(2) > print(">>> Check6", test_accum.value) > data.show() # ACTION > print(">>> Check7", test_accum.value) > return data > df = spark.createDataFrame([ > [1.0, 2.0] > ], schema=T.StructType([T.StructField(field_name, T.DoubleType(), True) for > field_name in ["a", 
"b"]])) > df2 = main(df) > {code} > {code:python} > Output 1 - with SHUFFLE=False > ... > # >>> Check7 100.0 > Output 2 - with SHUFFLE=True > ... > # >>> Check7 101.0 > {code} > Basically looks like: > - Accumulator works only for last UDF before a shuffle-like operation -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25591) PySpark Accumulators with multiple PythonUDFs
[ https://issues.apache.org/jira/browse/SPARK-25591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732569#comment-16732569 ] Liang-Chi Hsieh commented on SPARK-25591: - I can make backport PRs if you need. [~dongjoon] > PySpark Accumulators with multiple PythonUDFs > - > > Key: SPARK-25591 > URL: https://issues.apache.org/jira/browse/SPARK-25591 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.3.2 >Reporter: Abdeali Kothari >Assignee: Liang-Chi Hsieh >Priority: Blocker > Labels: correctness > Fix For: 2.4.0 > > > When having multiple Python UDFs - the last Python UDF's accumulator is the > only accumulator that gets updated. > {code:python} > import pyspark > from pyspark.sql import SparkSession, Row > from pyspark.sql import functions as F > from pyspark.sql import types as T > from pyspark import AccumulatorParam > spark = SparkSession.builder.getOrCreate() > spark.sparkContext.setLogLevel("ERROR") > test_accum = spark.sparkContext.accumulator(0.0) > SHUFFLE = False > def main(data): > print(">>> Check0", test_accum.value) > def test(x): > global test_accum > test_accum += 1.0 > return x > print(">>> Check1", test_accum.value) > def test2(x): > global test_accum > test_accum += 100.0 > return x > print(">>> Check2", test_accum.value) > func_udf = F.udf(test, T.DoubleType()) > print(">>> Check3", test_accum.value) > func_udf2 = F.udf(test2, T.DoubleType()) > print(">>> Check4", test_accum.value) > data = data.withColumn("out1", func_udf(data["a"])) > if SHUFFLE: > data = data.repartition(2) > print(">>> Check5", test_accum.value) > data = data.withColumn("out2", func_udf2(data["b"])) > if SHUFFLE: > data = data.repartition(2) > print(">>> Check6", test_accum.value) > data.show() # ACTION > print(">>> Check7", test_accum.value) > return data > df = spark.createDataFrame([ > [1.0, 2.0] > ], schema=T.StructType([T.StructField(field_name, T.DoubleType(), True) for > field_name in ["a", "b"]])) > df2 = main(df) > 
{code} > {code:python} > Output 1 - with SHUFFLE=False > ... > # >>> Check7 100.0 > Output 2 - with SHUFFLE=True > ... > # >>> Check7 101.0 > {code} > Basically looks like: > - Accumulator works only for last UDF before a shuffle-like operation -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25591) PySpark Accumulators with multiple PythonUDFs
[ https://issues.apache.org/jira/browse/SPARK-25591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732567#comment-16732567 ] Liang-Chi Hsieh commented on SPARK-25591: - This is bug fixing, so I think it makes sense to backport this to branch-2.3 and 2.2 if needed. > PySpark Accumulators with multiple PythonUDFs > - > > Key: SPARK-25591 > URL: https://issues.apache.org/jira/browse/SPARK-25591 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.3.2 >Reporter: Abdeali Kothari >Assignee: Liang-Chi Hsieh >Priority: Blocker > Labels: correctness > Fix For: 2.4.0 > > > When having multiple Python UDFs - the last Python UDF's accumulator is the > only accumulator that gets updated. > {code:python} > import pyspark > from pyspark.sql import SparkSession, Row > from pyspark.sql import functions as F > from pyspark.sql import types as T > from pyspark import AccumulatorParam > spark = SparkSession.builder.getOrCreate() > spark.sparkContext.setLogLevel("ERROR") > test_accum = spark.sparkContext.accumulator(0.0) > SHUFFLE = False > def main(data): > print(">>> Check0", test_accum.value) > def test(x): > global test_accum > test_accum += 1.0 > return x > print(">>> Check1", test_accum.value) > def test2(x): > global test_accum > test_accum += 100.0 > return x > print(">>> Check2", test_accum.value) > func_udf = F.udf(test, T.DoubleType()) > print(">>> Check3", test_accum.value) > func_udf2 = F.udf(test2, T.DoubleType()) > print(">>> Check4", test_accum.value) > data = data.withColumn("out1", func_udf(data["a"])) > if SHUFFLE: > data = data.repartition(2) > print(">>> Check5", test_accum.value) > data = data.withColumn("out2", func_udf2(data["b"])) > if SHUFFLE: > data = data.repartition(2) > print(">>> Check6", test_accum.value) > data.show() # ACTION > print(">>> Check7", test_accum.value) > return data > df = spark.createDataFrame([ > [1.0, 2.0] > ], schema=T.StructType([T.StructField(field_name, T.DoubleType(), True) for > 
field_name in ["a", "b"]])) > df2 = main(df) > {code} > {code:python} > Output 1 - with SHUFFLE=False > ... > # >>> Check7 100.0 > Output 2 - with SHUFFLE=True > ... > # >>> Check7 101.0 > {code} > Basically looks like: > - Accumulator works only for last UDF before a shuffle-like operation -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26521) Sparksql cannot modify the field name of a table
[ https://issues.apache.org/jira/browse/SPARK-26521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zengxl updated SPARK-26521: --- Environment: (was: alter table tmp.testchange change column i m string; *Error in query: ALTER TABLE CHANGE COLUMN is not supported for changing column 'i' with type 'StringType' to 'm' with type 'StringType';*) > Sparksql cannot modify the field name of a table > > > Key: SPARK-26521 > URL: https://issues.apache.org/jira/browse/SPARK-26521 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: zengxl >Priority: Major > > When I alter table info using SparkSQL, an exception is thrown. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26521) Sparksql cannot modify the field name of a table
[ https://issues.apache.org/jira/browse/SPARK-26521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zengxl updated SPARK-26521: --- Description: When I alter table info using SparkSQL, an exception is thrown: alter table tmp.testchange change column i m string; *Error in query: ALTER TABLE CHANGE COLUMN is not supported for changing column 'i' with type 'StringType' to 'm' with type 'StringType';* was:When i alter table info use sparksql,throw excepiton > Sparksql cannot modify the field name of a table > > > Key: SPARK-26521 > URL: https://issues.apache.org/jira/browse/SPARK-26521 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: zengxl >Priority: Major > > When I alter table info using SparkSQL, an exception is thrown: > > alter table tmp.testchange change column i m string; > *Error in query: ALTER TABLE CHANGE COLUMN is not supported for changing > column 'i' with type 'StringType' to 'm' with type 'StringType';* -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26521) Sparksql cannot modify the field name of a table
zengxl created SPARK-26521: -- Summary: Sparksql cannot modify the field name of a table Key: SPARK-26521 URL: https://issues.apache.org/jira/browse/SPARK-26521 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.1 Environment: alter table tmp.testchange change column i m string; *Error in query: ALTER TABLE CHANGE COLUMN is not supported for changing column 'i' with type 'StringType' to 'm' with type 'StringType';* Reporter: zengxl When I alter table info using SparkSQL, an exception is thrown. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25591) PySpark Accumulators with multiple PythonUDFs
[ https://issues.apache.org/jira/browse/SPARK-25591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732556#comment-16732556 ] Dongjoon Hyun commented on SPARK-25591: --- Hi, [~viirya], [~hyukjin.kwon]. This is only in branch-2.4. Can we backport this to older branches like branch-2.3 and branch-2.2? cc [~AbdealiJK] > PySpark Accumulators with multiple PythonUDFs > - > > Key: SPARK-25591 > URL: https://issues.apache.org/jira/browse/SPARK-25591 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.3.2 >Reporter: Abdeali Kothari >Assignee: Liang-Chi Hsieh >Priority: Blocker > Labels: correctness > Fix For: 2.4.0 > > > When having multiple Python UDFs - the last Python UDF's accumulator is the > only accumulator that gets updated. > {code:python} > import pyspark > from pyspark.sql import SparkSession, Row > from pyspark.sql import functions as F > from pyspark.sql import types as T > from pyspark import AccumulatorParam > spark = SparkSession.builder.getOrCreate() > spark.sparkContext.setLogLevel("ERROR") > test_accum = spark.sparkContext.accumulator(0.0) > SHUFFLE = False > def main(data): > print(">>> Check0", test_accum.value) > def test(x): > global test_accum > test_accum += 1.0 > return x > print(">>> Check1", test_accum.value) > def test2(x): > global test_accum > test_accum += 100.0 > return x > print(">>> Check2", test_accum.value) > func_udf = F.udf(test, T.DoubleType()) > print(">>> Check3", test_accum.value) > func_udf2 = F.udf(test2, T.DoubleType()) > print(">>> Check4", test_accum.value) > data = data.withColumn("out1", func_udf(data["a"])) > if SHUFFLE: > data = data.repartition(2) > print(">>> Check5", test_accum.value) > data = data.withColumn("out2", func_udf2(data["b"])) > if SHUFFLE: > data = data.repartition(2) > print(">>> Check6", test_accum.value) > data.show() # ACTION > print(">>> Check7", test_accum.value) > return data > df = spark.createDataFrame([ > [1.0, 2.0] > ], 
schema=T.StructType([T.StructField(field_name, T.DoubleType(), True) for > field_name in ["a", "b"]])) > df2 = main(df) > {code} > {code:python} > Output 1 - with SHUFFLE=False > ... > # >>> Check7 100.0 > Output 2 - with SHUFFLE=True > ... > # >>> Check7 101.0 > {code} > Basically looks like: > - Accumulator works only for last UDF before a shuffle-like operation -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
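The failure mode reported above can be modeled without a cluster. In this hypothetical sketch (plain Python, no Spark), each Python UDF stage produces an accumulator delta for a row; the expected behaviour merges every stage's delta, while the observed bug behaves as if only the last stage's delta reaches the driver. The function name and structure are illustrative only, not Spark's actual accumulator protocol.

```python
# Plain-Python sketch (no Spark required) of the accumulator bug described
# above: two UDF stages each produce an accumulator delta for a row, but
# the buggy path keeps only the last stage's delta instead of merging all.

def run_stages(deltas, merge=True):
    """Simulate task-side accumulator updates reaching the driver.

    deltas: per-UDF accumulator increments produced while evaluating a row.
    merge=True  -> driver merges every stage's delta (expected behaviour).
    merge=False -> only the last stage's delta survives (observed bug).
    """
    if merge:
        return sum(deltas)
    return deltas[-1] if deltas else 0.0

# One row flows through test() (+1.0) and test2() (+100.0):
deltas = [1.0, 100.0]
print(run_stages(deltas, merge=True))   # expected Check7: 101.0
print(run_stages(deltas, merge=False))  # buggy Check7: 100.0
```

This also matches the SHUFFLE=True observation: forcing a shuffle between the two UDFs separates them into distinct stages, so both deltas get reported and the total is 101.0.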
[jira] [Assigned] (SPARK-23980) Resilient Spark driver on Kubernetes
[ https://issues.apache.org/jira/browse/SPARK-23980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin reassigned SPARK-23980: -- Assignee: (was: Marcelo Vanzin) > Resilient Spark driver on Kubernetes > > > Key: SPARK-23980 > URL: https://issues.apache.org/jira/browse/SPARK-23980 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: Sebastian Toader >Priority: Major > > The current implementation of the `Spark driver` on Kubernetes is not resilient > to node failures because it is implemented as a `Pod`. In case of a node failure > Kubernetes terminates the pods that were running on that node and does not > reschedule them to any of the other nodes of the cluster. > If the `driver` is implemented as a Kubernetes > [Job|https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/] > then it will be rescheduled to another node. > When the driver is terminated, its executors (which may run on other nodes) are > terminated by Kubernetes with some delay by [Kubernetes Garbage > collection|https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/]. > This can lead to concurrency issues where the re-spawned `driver` tries > to create new executors with the same names as executors that are still being > cleaned up by Kubernetes garbage collection. > To solve this issue the executor name must be made unique for each `driver` > *instance*. > The PR linked to this Jira implements the above: it creates the Spark driver as > a Job and ensures that executor pod names are unique per driver instance.
[jira] [Assigned] (SPARK-23980) Resilient Spark driver on Kubernetes
[ https://issues.apache.org/jira/browse/SPARK-23980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin reassigned SPARK-23980: -- Assignee: Marcelo Vanzin > Resilient Spark driver on Kubernetes > > > Key: SPARK-23980 > URL: https://issues.apache.org/jira/browse/SPARK-23980 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: Sebastian Toader >Assignee: Marcelo Vanzin >Priority: Major > > The current implementation of the `Spark driver` on Kubernetes is not resilient > to node failures because it is implemented as a `Pod`. In case of a node failure > Kubernetes terminates the pods that were running on that node and does not > reschedule them to any of the other nodes of the cluster. > If the `driver` is implemented as a Kubernetes > [Job|https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/] > then it will be rescheduled to another node. > When the driver is terminated, its executors (which may run on other nodes) are > terminated by Kubernetes with some delay by [Kubernetes Garbage > collection|https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/]. > This can lead to concurrency issues where the re-spawned `driver` tries > to create new executors with the same names as executors that are still being > cleaned up by Kubernetes garbage collection. > To solve this issue the executor name must be made unique for each `driver` > *instance*. > The PR linked to this Jira implements the above: it creates the Spark driver as > a Job and ensures that executor pod names are unique per driver instance.
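The fix the description calls for, executor names unique per driver *instance*, can be sketched as below. This is a hypothetical helper, not Spark's actual naming code: the function name, suffix length, and name layout are all assumptions for illustration.

```python
import uuid

def executor_pod_name(app_name, executor_id, instance_suffix=None):
    """Build an executor pod name that is unique per driver instance.

    A fresh random suffix generated once per driver launch ensures a
    re-spawned driver never collides with executor pods of the previous
    instance that Kubernetes garbage collection has not yet removed.
    """
    suffix = instance_suffix or uuid.uuid4().hex[:8]
    return f"{app_name}-{suffix}-exec-{executor_id}"

# Two launches of the same app get disjoint executor pod names:
first_launch = executor_pod_name("myapp", 1)
second_launch = executor_pod_name("myapp", 1)
```

In practice the suffix would be generated once at driver startup and reused for all of that driver's executors, so names stay stable within one instance but differ across instances.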
[jira] [Resolved] (SPARK-26441) Add kind configuration of driver pod
[ https://issues.apache.org/jira/browse/SPARK-26441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-26441. Resolution: Duplicate > Add kind configuration of driver pod > - > > Key: SPARK-26441 > URL: https://issues.apache.org/jira/browse/SPARK-26441 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.3.1, 2.3.2, 2.4.0 >Reporter: Fei Han >Priority: Critical >
[jira] [Updated] (SPARK-26519) spark sql CHANGE COLUMN not working
[ https://issues.apache.org/jira/browse/SPARK-26519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-26519: -- Flags: (was: Important) > spark sql CHANGE COLUMN not working > -- > > Key: SPARK-26519 > URL: https://issues.apache.org/jira/browse/SPARK-26519 > Project: Spark > Issue Type: Bug > Components: Spark Shell, Spark Submit >Affects Versions: 2.1.0 > Environment: !image-2019-01-02-14-25-34-594.png! >Reporter: suman gorantla >Priority: Major > Attachments: sparksql error.PNG > > > Dear Team, > with spark sql I am unable to change the newly added column() position after > an existing column in the table (old_column) of a hive external table please > see the screenshot as in below > scala> sql("ALTER TABLE enterprisedatalakedev.tmptst ADD COLUMNs (new_col > STRING)") > res14: org.apache.spark.sql.DataFrame = [] > sql("ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col new_col > STRING AFTER old_col ") > org.apache.spark.sql.catalyst.parser.ParseException: > Operation not allowed: ALTER TABLE table [PARTITION partition_spec] CHANGE > COLUMN ... 
FIRST | AFTER otherCol(line 1, pos 0) > == SQL == > ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col new_col > STRING AFTER old_col > ^^^ > at > org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39) > at > org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:934) > at > org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:928) > at > org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99) > at > org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:928) > at > org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:55) > at > org.apache.spark.sql.catalyst.parser.SqlBaseParser$ChangeColumnContext.accept(SqlBaseParser.java:1485) > at > org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42) > at > org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71) > at > org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71) > at > org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:70) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:69) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:68) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:97) > at > org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:623) > ... 
48 elided > !image-2019-01-02-14-25-40-980.png! >
[jira] [Updated] (SPARK-26519) spark sql CHANGE COLUMN not working
[ https://issues.apache.org/jira/browse/SPARK-26519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-26519: -- Component/s: (was: Spark Submit) (was: Spark Shell) SQL > spark sql CHANGE COLUMN not working > -- > > Key: SPARK-26519 > URL: https://issues.apache.org/jira/browse/SPARK-26519 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 > Environment: !image-2019-01-02-14-25-34-594.png! >Reporter: suman gorantla >Priority: Major > Attachments: sparksql error.PNG > > > Dear Team, > with spark sql I am unable to change the newly added column() position after > an existing column in the table (old_column) of a hive external table please > see the screenshot as in below > scala> sql("ALTER TABLE enterprisedatalakedev.tmptst ADD COLUMNs (new_col > STRING)") > res14: org.apache.spark.sql.DataFrame = [] > sql("ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col new_col > STRING AFTER old_col ") > org.apache.spark.sql.catalyst.parser.ParseException: > Operation not allowed: ALTER TABLE table [PARTITION partition_spec] CHANGE > COLUMN ... 
FIRST | AFTER otherCol(line 1, pos 0) > == SQL == > ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col new_col > STRING AFTER old_col > ^^^ > at > org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39) > at > org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:934) > at > org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:928) > at > org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99) > at > org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:928) > at > org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:55) > at > org.apache.spark.sql.catalyst.parser.SqlBaseParser$ChangeColumnContext.accept(SqlBaseParser.java:1485) > at > org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42) > at > org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71) > at > org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71) > at > org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:70) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:69) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:68) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:97) > at > org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:623) > ... 
48 elided > !image-2019-01-02-14-25-40-980.png! >
[jira] [Commented] (SPARK-26519) spark sql CHANGE COLUMN not working
[ https://issues.apache.org/jira/browse/SPARK-26519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732528#comment-16732528 ] Dongjoon Hyun commented on SPARK-26519: --- Hi, [~sumanGorantla]. It's not a bug, isn't it? Please see the log. {code} Operation not allowed: ALTER TABLE table [PARTITION partition_spec] CHANGE COLUMN ... FIRST | AFTER otherCol(line 1, pos 0) {code} > spark sql CHANGE COLUMN not working > -- > > Key: SPARK-26519 > URL: https://issues.apache.org/jira/browse/SPARK-26519 > Project: Spark > Issue Type: Bug > Components: Spark Shell, Spark Submit >Affects Versions: 2.1.0 > Environment: !image-2019-01-02-14-25-34-594.png! >Reporter: suman gorantla >Priority: Major > Attachments: sparksql error.PNG > > > Dear Team, > with spark sql I am unable to change the newly added column() position after > an existing column in the table (old_column) of a hive external table please > see the screenshot as in below > scala> sql("ALTER TABLE enterprisedatalakedev.tmptst ADD COLUMNs (new_col > STRING)") > res14: org.apache.spark.sql.DataFrame = [] > sql("ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col new_col > STRING AFTER old_col ") > org.apache.spark.sql.catalyst.parser.ParseException: > Operation not allowed: ALTER TABLE table [PARTITION partition_spec] CHANGE > COLUMN ... 
FIRST | AFTER otherCol(line 1, pos 0) > == SQL == > ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col new_col > STRING AFTER old_col > ^^^ > at > org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39) > at > org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:934) > at > org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:928) > at > org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99) > at > org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:928) > at > org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:55) > at > org.apache.spark.sql.catalyst.parser.SqlBaseParser$ChangeColumnContext.accept(SqlBaseParser.java:1485) > at > org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42) > at > org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71) > at > org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71) > at > org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:70) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:69) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:68) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:97) > at > org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:623) > ... 
48 elided > !image-2019-01-02-14-25-40-980.png! >
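As the parser error and the comment above indicate, Spark SQL's parser in this version rejects the `FIRST | AFTER otherCol` clause outright, while Hive itself accepts it. One hedged workaround (assuming the table lives in a Hive metastore shared with a working Hive installation) is to build the DDL and run it directly in Hive, e.g. via beeline, instead of through `spark.sql`. The helper below is illustrative only:

```python
def hive_reorder_ddl(table, col, col_type, after_col):
    """Build the Hive CHANGE COLUMN ... AFTER statement that Spark SQL's
    parser rejects, so it can be run directly in Hive (e.g. via beeline)
    against the shared metastore."""
    return (f"ALTER TABLE {table} CHANGE COLUMN {col} {col} "
            f"{col_type} AFTER {after_col}")

ddl = hive_reorder_ddl("enterprisedatalakedev.tmptst",
                       "new_col", "STRING", "old_col")
print(ddl)
```

Note that for name-based formats such as Parquet, reordering columns is a metadata-level change; verify the result with DESCRIBE from both Hive and Spark afterwards.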
[jira] [Updated] (SPARK-23525) ALTER TABLE CHANGE COLUMN COMMENT doesn't work for external hive table
[ https://issues.apache.org/jira/browse/SPARK-23525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-23525: -- Summary: ALTER TABLE CHANGE COLUMN COMMENT doesn't work for external hive table (was: ALTER TABLE CHANGE COLUMN doesn't work for external hive table) > ALTER TABLE CHANGE COLUMN COMMENT doesn't work for external hive table > -- > > Key: SPARK-23525 > URL: https://issues.apache.org/jira/browse/SPARK-23525 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0, 2.3.0 >Reporter: Pavlo Skliar >Assignee: Xingbo Jiang >Priority: Major > Fix For: 2.2.2, 2.3.1, 2.4.0 > > > {code:java} > print(spark.sql(""" > SHOW CREATE TABLE test.trends > """).collect()[0].createtab_stmt) > /// OUTPUT > CREATE EXTERNAL TABLE `test`.`trends`(`id` string COMMENT '', `metric` string > COMMENT '', `amount` bigint COMMENT '') > COMMENT '' > PARTITIONED BY (`date` string COMMENT '') > ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' > WITH SERDEPROPERTIES ( > 'serialization.format' = '1' > ) > STORED AS > INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' > LOCATION 's3://x/x/' > TBLPROPERTIES ( > 'transient_lastDdlTime' = '1519729384', > 'last_modified_time' = '1519645652', > 'last_modified_by' = 'pavlo', > 'last_castor_run_ts' = '1513561658.0' > ) > spark.sql(""" > DESCRIBE test.trends > """).collect() > // OUTPUT > [Row(col_name='id', data_type='string', comment=''), > Row(col_name='metric', data_type='string', comment=''), > Row(col_name='amount', data_type='bigint', comment=''), > Row(col_name='date', data_type='string', comment=''), > Row(col_name='# Partition Information', data_type='', comment=''), > Row(col_name='# col_name', data_type='data_type', comment='comment'), > Row(col_name='date', data_type='string', comment='')] > spark.sql("""alter table test.trends change 
column id id string comment > 'unique identifier'""") > spark.sql(""" > DESCRIBE test.trends > """).collect() > // OUTPUT > [Row(col_name='id', data_type='string', comment=''), Row(col_name='metric', > data_type='string', comment=''), Row(col_name='amount', data_type='bigint', > comment=''), Row(col_name='date', data_type='string', comment=''), > Row(col_name='# Partition Information', data_type='', comment=''), > Row(col_name='# col_name', data_type='data_type', comment='comment'), > Row(col_name='date', data_type='string', comment='')] > {code} > The strange thing is that I've assigned a comment to the id field from Hive > successfully, and it's visible in the Hue UI, but it's still not visible from > Spark, and Spark requests have no effect on the comments. >
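The symptom above can be checked mechanically by diffing the DESCRIBE output before and after the ALTER. The sketch below models DESCRIBE rows as plain `(col_name, data_type, comment)` tuples; the function name and input shape are assumptions for illustration, not a Spark API.

```python
def comment_changed(before_rows, after_rows, col):
    """Given DESCRIBE output as (col_name, data_type, comment) tuples,
    report whether `col`'s comment actually changed after the ALTER."""
    def comment_of(rows):
        return next(c for (name, _type, c) in rows if name == col)
    return comment_of(before_rows) != comment_of(after_rows)

before = [("id", "string", ""), ("metric", "string", "")]
after = [("id", "string", ""), ("metric", "string", "")]  # bug: unchanged
print(comment_changed(before, after, "id"))  # False -> ALTER had no effect
```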
[jira] [Assigned] (SPARK-26277) WholeStageCodegen metrics should be tested with whole-stage codegen enabled
[ https://issues.apache.org/jira/browse/SPARK-26277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-26277: - Assignee: Chenxiao Mao > WholeStageCodegen metrics should be tested with whole-stage codegen enabled > --- > > Key: SPARK-26277 > URL: https://issues.apache.org/jira/browse/SPARK-26277 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 2.4.0 >Reporter: Chenxiao Mao >Assignee: Chenxiao Mao >Priority: Major > > In {{org.apache.spark.sql.execution.metric.SQLMetricsSuite}}, there's a test > case named "WholeStageCodegen metrics". However, it is executed with > whole-stage codegen disabled.
[jira] [Resolved] (SPARK-26277) WholeStageCodegen metrics should be tested with whole-stage codegen enabled
[ https://issues.apache.org/jira/browse/SPARK-26277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-26277. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23224 [https://github.com/apache/spark/pull/23224] > WholeStageCodegen metrics should be tested with whole-stage codegen enabled > --- > > Key: SPARK-26277 > URL: https://issues.apache.org/jira/browse/SPARK-26277 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 2.4.0 >Reporter: Chenxiao Mao >Assignee: Chenxiao Mao >Priority: Major > Fix For: 3.0.0 > > > In {{org.apache.spark.sql.execution.metric.SQLMetricsSuite}}, there's a test > case named "WholeStageCodegen metrics". However, it is executed with > whole-stage codegen disabled.
[jira] [Commented] (SPARK-26502) Get rid of hiveResultString() in QueryExecution
[ https://issues.apache.org/jira/browse/SPARK-26502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732435#comment-16732435 ] Mark Hamstra commented on SPARK-26502: -- Don't lose track of this comment: [https://github.com/apache/spark/blob/948414afe706e0b526d7f83f598cbd204d2fc687/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala#L41] Any significant change to QueryExecution needs to be documented carefully and included in the release notes, since you will be forcing 3rd-party changes. > Get rid of hiveResultString() in QueryExecution > --- > > Key: SPARK-26502 > URL: https://issues.apache.org/jira/browse/SPARK-26502 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maxim Gekk >Priority: Minor > > The method hiveResultString() of QueryExecution is used in tests and > SparkSQLDriver. It should be moved from QueryExecution to a more specific class.
[jira] [Assigned] (SPARK-26520) data source V2 API refactoring (micro-batch read)
[ https://issues.apache.org/jira/browse/SPARK-26520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26520: Assignee: Apache Spark (was: Wenchen Fan) > data source V2 API refactoring (micro-batch read) > - > > Key: SPARK-26520 > URL: https://issues.apache.org/jira/browse/SPARK-26520 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major >
[jira] [Assigned] (SPARK-26520) data source V2 API refactoring (micro-batch read)
[ https://issues.apache.org/jira/browse/SPARK-26520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26520: Assignee: Wenchen Fan (was: Apache Spark) > data source V2 API refactoring (micro-batch read) > - > > Key: SPARK-26520 > URL: https://issues.apache.org/jira/browse/SPARK-26520 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major >
[jira] [Created] (SPARK-26520) data source V2 API refactoring (micro-batch read)
Wenchen Fan created SPARK-26520: --- Summary: data source V2 API refactoring (micro-batch read) Key: SPARK-26520 URL: https://issues.apache.org/jira/browse/SPARK-26520 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Wenchen Fan Assignee: Wenchen Fan
[jira] [Updated] (SPARK-26519) spark sql CHANGE COLUMN not working
[ https://issues.apache.org/jira/browse/SPARK-26519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] suman gorantla updated SPARK-26519: --- Description: Dear Team, with spark sql I am unable to change the newly added column() position after an existing column in the table (old_column) of a hive external table please see the screenshot as in below scala> sql("ALTER TABLE enterprisedatalakedev.tmptst ADD COLUMNs (new_col STRING)") res14: org.apache.spark.sql.DataFrame = [] sql("ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col new_col STRING AFTER old_col ") org.apache.spark.sql.catalyst.parser.ParseException: Operation not allowed: ALTER TABLE table [PARTITION partition_spec] CHANGE COLUMN ... FIRST | AFTER otherCol(line 1, pos 0) == SQL == ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col new_col STRING AFTER old_col ^^^ at org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39) at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:934) at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:928) at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99) at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:928) at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:55) at org.apache.spark.sql.catalyst.parser.SqlBaseParser$ChangeColumnContext.accept(SqlBaseParser.java:1485) at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42) at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71) at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71) at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99) at 
org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:70) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:69) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:68) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:97) at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68) at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:623) ... 48 elided !image-2019-01-02-14-25-40-980.png! was: Dear Team, with spark sql I am unable to change the newly added column() position after an existing column in the table (old_column) of a hive external table please see the screenshot as in below scala> sql("ALTER TABLE enterprisedatalakedev.tmptst ADD COLUMNs (new_col STRING)") res14: org.apache.spark.sql.DataFrame = [] sql("ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col new_col STRING AFTER old_col ") org.apache.spark.sql.catalyst.parser.ParseException: Operation not allowed: ALTER TABLE table [PARTITION partition_spec] CHANGE COLUMN ... 
FIRST | AFTER otherCol(line 1, pos 0) == SQL == ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN column_ui column_ui STRING AFTER col1 ^^^ at org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39) at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:934) at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:928) at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99) at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:928) at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:55) at org.apache.spark.sql.catalyst.parser.SqlBaseParser$ChangeColumnContext.accept(SqlBaseParser.java:1485) at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42) at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71) at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71) at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99) at org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:70) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:69) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:68) at
[jira] [Updated] (SPARK-26519) spark sql CHANGE COLUMN not working
[ https://issues.apache.org/jira/browse/SPARK-26519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] suman gorantla updated SPARK-26519: --- Attachment: sparksql error.PNG > spark sql CHANGE COLUMN not working > -- > > Key: SPARK-26519 > URL: https://issues.apache.org/jira/browse/SPARK-26519 > Project: Spark > Issue Type: Bug > Components: Spark Shell, Spark Submit >Affects Versions: 2.1.0 > Environment: !image-2019-01-02-14-25-34-594.png! >Reporter: suman gorantla >Priority: Major > Attachments: sparksql error.PNG > > > Dear Team, > with spark sql I am unable to change the newly added column() position after > an existing column in the table (old_column) of a hive external table please > see the screenshot as in below > scala> sql("ALTER TABLE enterprisedatalakedev.tmptst ADD COLUMNs (new_col > STRING)") > res14: org.apache.spark.sql.DataFrame = [] > sql("ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col new_col > STRING AFTER old_col ") > org.apache.spark.sql.catalyst.parser.ParseException: > Operation not allowed: ALTER TABLE table [PARTITION partition_spec] CHANGE > COLUMN ... 
FIRST | AFTER otherCol(line 1, pos 0) > == SQL == > ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN column_ui column_ui > STRING AFTER col1 > ^^^ > at > org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39) > at > org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:934) > at > org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:928) > at > org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99) > at > org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:928) > at > org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:55) > at > org.apache.spark.sql.catalyst.parser.SqlBaseParser$ChangeColumnContext.accept(SqlBaseParser.java:1485) > at > org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42) > at > org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71) > at > org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71) > at > org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99) > at > org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:70) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:69) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:68) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:97) > at > org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:623) > ... 
48 elided > !image-2019-01-02-14-25-40-980.png! >
[jira] [Created] (SPARK-26519) spark sql CHANGE COLUMN not working
suman gorantla created SPARK-26519: -- Summary: spark sql CHANGE COLUMN not working Key: SPARK-26519 URL: https://issues.apache.org/jira/browse/SPARK-26519 Project: Spark Issue Type: Bug Components: Spark Shell, Spark Submit Affects Versions: 2.1.0 Environment: !image-2019-01-02-14-25-34-594.png! Reporter: suman gorantla Attachments: sparksql error.PNG Dear Team, with spark sql I am unable to change the newly added column() position after an existing column in the table (old_column) of a hive external table please see the screenshot as in below scala> sql("ALTER TABLE enterprisedatalakedev.tmptst ADD COLUMNs (new_col STRING)") res14: org.apache.spark.sql.DataFrame = [] sql("ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col new_col STRING AFTER old_col ") org.apache.spark.sql.catalyst.parser.ParseException: Operation not allowed: ALTER TABLE table [PARTITION partition_spec] CHANGE COLUMN ... FIRST | AFTER otherCol(line 1, pos 0) == SQL == ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN column_ui column_ui STRING AFTER col1 ^^^ at org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39) at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:934) at org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:928) at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99) at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:928) at org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:55) at org.apache.spark.sql.catalyst.parser.SqlBaseParser$ChangeColumnContext.accept(SqlBaseParser.java:1485) at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42) at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71) at 
org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71) at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99) at org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:70) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:69) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:68) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:97) at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48) at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68) at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:623) ... 48 elided !image-2019-01-02-14-25-40-980.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2984) FileNotFoundException on _temporary directory
[ https://issues.apache.org/jira/browse/SPARK-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732388#comment-16732388 ] Steve Loughran commented on SPARK-2984: --- Gaurav, if this has returned in a 2.x version against HDFS, best to open a new JIRA and mark as related to this one. > FileNotFoundException on _temporary directory > - > > Key: SPARK-2984 > URL: https://issues.apache.org/jira/browse/SPARK-2984 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.1.0 >Reporter: Andrew Ash >Assignee: Josh Rosen >Priority: Critical > Fix For: 1.3.0 > > > We've seen several stacktraces and threads on the user mailing list where > people are having issues with a {{FileNotFoundException}} stemming from an > HDFS path containing {{_temporary}}. > I ([~aash]) think this may be related to {{spark.speculation}}. I think the > error condition might manifest in this circumstance: > 1) task T starts on an executor E1 > 2) it takes a long time, so task T' is started on another executor E2 > 3) T finishes in E1, so it moves its data from {{_temporary}} to the final > destination and deletes the {{_temporary}} directory during cleanup > 4) T' finishes in E2 and attempts to move its data from {{_temporary}}, but > those files no longer exist, so it throws an exception > Some samples: > {noformat} > 14/08/11 08:05:08 ERROR JobScheduler: Error running job streaming job > 140774430 ms.0 > java.io.FileNotFoundException: File > hdfs://hadoopc/user/csong/output/human_bot/-140774430.out/_temporary/0/task_201408110805__m_07 > does not exist. 
> at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:654) > at > org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102) > at > org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712) > at > org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:708) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:360) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310) > at > org.apache.hadoop.mapred.FileOutputCommitter.commitJob(FileOutputCommitter.java:136) > at > org.apache.spark.SparkHadoopWriter.commitJob(SparkHadoopWriter.scala:126) > at > org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:841) > at > org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:724) > at > org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:643) > at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1068) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$8.apply(DStream.scala:773) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$8.apply(DStream.scala:771) > at > org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:41) > at > org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40) > at > org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40) > at scala.util.Try$.apply(Try.scala:161) > at org.apache.spark.streaming.scheduler.Job.run(Job.scala:32) > at > org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:172) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {noformat} > -- Chen Song at > http://apache-spark-user-list.1001560.n3.nabble.com/saveAsTextFiles-file-not-found-exception-td10686.html > {noformat} > I am running a Spark Streaming job that uses saveAsTextFiles to save results > into hdfs files. However, it has an exception after 20 batches > result-140631234/_temporary/0/task_201407251119__m_03 does not > exist. > {noformat} > and > {noformat} > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): > No lease on /apps/data/vddil/real-time/checkpoint/temp: File does not exist. > Holder DFSClient_NONMAPREDUCE_327993456_13 does not have any open files. > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2946) >
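The four-step race [~aash] describes above can be sketched in plain Python with a toy filesystem model (a deliberately simplified stand-in for HDFS and the output committer, not Spark's actual code):

```python
class ToyFS:
    """Toy stand-in for HDFS: just a set of paths."""
    def __init__(self):
        self.paths = set()

    def write(self, task_id):
        self.paths.add(f"_temporary/{task_id}")

    def commit(self, task_id):
        # Move the task's temporary output to the final destination, then
        # delete the whole _temporary directory during cleanup.
        tmp = f"_temporary/{task_id}"
        if tmp not in self.paths:
            # The failure mode from the report: a speculative duplicate
            # finds its files gone after the first copy cleaned up.
            raise FileNotFoundError(tmp)
        self.paths.discard(tmp)
        self.paths.add(f"final/{task_id}")
        # Cleanup: drop every remaining _temporary entry.
        self.paths = {p for p in self.paths if not p.startswith("_temporary/")}

fs = ToyFS()
fs.write("T")       # 1) task T starts on executor E1
fs.write("T")       # 2) speculative copy T' writes the same output on E2
fs.commit("T")      # 3) T commits and deletes _temporary
try:
    fs.commit("T")  # 4) T' commits: its files no longer exist
    raced = False
except FileNotFoundError:
    raced = True
```

The sketch collapses both attempts onto one path for brevity; the point is only that whichever attempt commits second finds `_temporary` already gone.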
[jira] [Commented] (SPARK-24603) Typo in comments
[ https://issues.apache.org/jira/browse/SPARK-24603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732340#comment-16732340 ] Dongjoon Hyun commented on SPARK-24603: --- I removed 2.2.2 and added 2.2.3 since this wasn't released as a part of Apache Spark 2.2.2. - https://dist.apache.org/repos/dist/release/spark/spark-2.2.2/spark-2.2.2.tgz > Typo in comments > > > Key: SPARK-24603 > URL: https://issues.apache.org/jira/browse/SPARK-24603 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Trivial > Fix For: 2.2.3, 2.3.2, 2.4.0 > > > The findTightestCommonTypeOfTwo has been renamed to findTightestCommonType -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24603) Typo in comments
[ https://issues.apache.org/jira/browse/SPARK-24603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-24603: -- Fix Version/s: 2.2.3 > Typo in comments > > > Key: SPARK-24603 > URL: https://issues.apache.org/jira/browse/SPARK-24603 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Trivial > Fix For: 2.2.3, 2.3.2, 2.4.0 > > > The findTightestCommonTypeOfTwo has been renamed to findTightestCommonType -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24603) Typo in comments
[ https://issues.apache.org/jira/browse/SPARK-24603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-24603: -- Fix Version/s: (was: 2.2.2) > Typo in comments > > > Key: SPARK-24603 > URL: https://issues.apache.org/jira/browse/SPARK-24603 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.1 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Trivial > Fix For: 2.3.2, 2.4.0 > > > The findTightestCommonTypeOfTwo has been renamed to findTightestCommonType -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25253) Refactor pyspark connection & authentication
[ https://issues.apache.org/jira/browse/SPARK-25253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732329#comment-16732329 ] Dongjoon Hyun commented on SPARK-25253: --- I added `2.2.3` at the fix versions because `branch-2.2` has this. It seems that we need to add `2.3.x`, too. > Refactor pyspark connection & authentication > > > Key: SPARK-25253 > URL: https://issues.apache.org/jira/browse/SPARK-25253 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 2.4.0 >Reporter: Imran Rashid >Assignee: Imran Rashid >Priority: Minor > Fix For: 2.2.3, 2.4.0 > > > We've got a few places in pyspark that connect to local sockets, with varying > levels of ipv6 handling, graceful error handling, and lots of copy-and-paste. > should be pretty easy to clean this up. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25253) Refactor pyspark connection & authentication
[ https://issues.apache.org/jira/browse/SPARK-25253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-25253: -- Fix Version/s: 2.2.3 > Refactor pyspark connection & authentication > > > Key: SPARK-25253 > URL: https://issues.apache.org/jira/browse/SPARK-25253 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 2.4.0 >Reporter: Imran Rashid >Assignee: Imran Rashid >Priority: Minor > Fix For: 2.2.3, 2.4.0 > > > We've got a few places in pyspark that connect to local sockets, with varying > levels of ipv6 handling, graceful error handling, and lots of copy-and-paste. > should be pretty easy to clean this up. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
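A consolidated local-connect helper of the kind this ticket asks for might look like the following. This is a sketch, not pyspark's actual implementation; the function name and defaults are invented for illustration. The key point is handling every address family `getaddrinfo` returns (IPv4 and IPv6) in one place instead of copy-pasting connection code:

```python
import socket

def connect_local(port, timeout=15.0):
    """Connect to a local daemon port, trying IPv4 and IPv6 (sketch)."""
    errors = []
    for family, socktype, proto, _, addr in socket.getaddrinfo(
            "127.0.0.1", port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        try:
            sock.settimeout(timeout)
            sock.connect(addr)
            return sock  # first address that answers wins
        except OSError as exc:
            sock.close()
            errors.append(exc)
    # Graceful error handling: report every address we tried.
    raise OSError(f"could not connect to local port {port}: {errors}")
```

Callers then share one error-handling path rather than each reimplementing it.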
[jira] [Commented] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732299#comment-16732299 ] Sujith commented on SPARK-26432: The test description has been updated. Let me know of any suggestions or input. Thanks, all. > Not able to connect Hbase 2.1 service Getting NoSuchMethodException while > trying to obtain token from Hbase 2.1 service. > > > Key: SPARK-26432 > URL: https://issues.apache.org/jira/browse/SPARK-26432 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > Attachments: hbase-dep-obtaintok.png > > > Getting NoSuchMethodException : > org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) > while trying to connect to the hbase 2.1 service from spark. > This is mainly happening because spark uses a deprecated hbase api > public static Token obtainToken(Configuration > conf) > for obtaining the token, and the same has been removed in hbase 2.1. > Test steps: > Steps to test the Spark-Hbase connection > 1. Create 2 tables in the hbase shell > > Launch the hbase shell > > Enter commands to create the tables and load data > create 'table1','cf' > put 'table1','row1','cf:cid','20' > create 'table2','cf' > put 'table2','row1','cf:cid','30' > > > Show values commands > get 'table1','row1','cf:cid' will display the value 20 > get 'table2','row1','cf:cid' will display the value 30 > > > 2. Run the SparkHbasetoHbase class in testSpark.jar using spark-submit > spark-submit --master yarn-cluster --class > com.mrs.example.spark.SparkHbasetoHbase --conf > "spark.yarn.security.credentials.hbase.enabled"="true" --conf > "spark.security.credentials.hbase.enabled"="true" --keytab > /opt/client/user.keytab --principal sen testSpark.jar > The SparkHbasetoHbase class will update the value of table2 with the sum of > the values of table1 & table2. 
> table2 = table1+table2 > > 3. Verify the result in the hbase shell > Expected Result: The value of table2 should be 50. > get 'table2','row1','cf:cid' will display the value 50 > Actual Result: The value is not updated, as an error is thrown when spark > tries to connect to the hbase service. > Attached is a snapshot of the error logs below for more details -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26432: --- Description: Getting NoSuchMethodException : org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) while trying to connect to the hbase 2.1 service from spark. This is mainly happening because spark uses a deprecated hbase api public static Token obtainToken(Configuration conf) for obtaining the token, and the same has been removed in hbase 2.1. Test steps: Steps to test the Spark-Hbase connection 1. Create 2 tables in the hbase shell >Launch the hbase shell >Enter commands to create the tables and load data create 'table1','cf' put 'table1','row1','cf:cid','20' create 'table2','cf' put 'table2','row1','cf:cid','30' >Show values commands get 'table1','row1','cf:cid' will display the value 20 get 'table2','row1','cf:cid' will display the value 30 2. Run the SparkHbasetoHbase class in testSpark.jar using spark-submit spark-submit --master yarn-cluster --class com.mrs.example.spark.SparkHbasetoHbase --conf "spark.yarn.security.credentials.hbase.enabled"="true" --conf "spark.security.credentials.hbase.enabled"="true" --keytab /opt/client/user.keytab --principal sen testSpark.jar The SparkHbasetoHbase class will update the value of table2 with the sum of the values of table1 & table2. table2 = table1+table2 3. Verify the result in the hbase shell Expected Result: The value of table2 should be 50. get 'table2','row1','cf:cid' will display the value 50 Actual Result: The value is not updated, as an error is thrown when spark tries to connect to the hbase service. Attached is a snapshot of the error logs below for more details was: Getting NoSuchMethodException : org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) while trying connect hbase 2.1 service from spark. 
This is mainly happening because in spark uses a deprecated hbase api public static Token obtainToken(Configuration conf) for obtaining the token and the same has been removed from hbase 2.1 version. Attached the snapshot of error logs > Not able to connect Hbase 2.1 service Getting NoSuchMethodException while > trying to obtain token from Hbase 2.1 service. > > > Key: SPARK-26432 > URL: https://issues.apache.org/jira/browse/SPARK-26432 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > Attachments: hbase-dep-obtaintok.png > > > Getting NoSuchMethodException : > org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) > while trying to connect to the hbase 2.1 service from spark. > This is mainly happening because spark uses a deprecated hbase api > public static Token obtainToken(Configuration > conf) > for obtaining the token, and the same has been removed in hbase 2.1. > Test steps: > Steps to test the Spark-Hbase connection > 1. Create 2 tables in the hbase shell > > Launch the hbase shell > > Enter commands to create the tables and load data > create 'table1','cf' > put 'table1','row1','cf:cid','20' > create 'table2','cf' > put 'table2','row1','cf:cid','30' > > > Show values commands > get 'table1','row1','cf:cid' will display the value 20 > get 'table2','row1','cf:cid' will display the value 30 > > > 2. Run the SparkHbasetoHbase class in testSpark.jar using spark-submit > spark-submit --master yarn-cluster --class > com.mrs.example.spark.SparkHbasetoHbase --conf > "spark.yarn.security.credentials.hbase.enabled"="true" --conf > "spark.security.credentials.hbase.enabled"="true" --keytab > /opt/client/user.keytab --principal sen testSpark.jar > The SparkHbasetoHbase class will update the value of table2 with the sum of > the values of table1 & table2. > table2 = table1+table2 > > 3. Verify the result in the hbase shell > Expected Result: The value of table2 should be 50. 
> get 'table2','row1','cf:cid' will display the value 50 > Actual Result: The value is not updated, as an error is thrown when spark > tries to connect to the hbase service. > Attached is a snapshot of the error logs below for more details -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
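The fix direction the ticket implies — the pre-2.1 `TokenUtil.obtainToken(Configuration)` overload no longer exists in HBase 2.1 — suggests selecting the token API at runtime rather than hard-coding one signature. A plain-Python analogue of that "probe for whichever entry point exists" pattern is sketched below; `new_api` and `old_api` are hypothetical names standing in for the HBase 2.x and pre-2.1 `TokenUtil` entry points, and real Spark code would do the same dance with Java reflection:

```python
from types import SimpleNamespace

def obtain_token(token_util):
    """Call whichever token-obtaining entry point this TokenUtil has (sketch)."""
    # Prefer the newer entry point, fall back to the removed-in-2.1 one.
    for name in ("new_api", "old_api"):
        fn = getattr(token_util, name, None)
        if fn is not None:
            return fn()
    raise AttributeError("no token-obtaining API found on TokenUtil")

# An HBase-2.1-like TokenUtil that only has the new entry point:
hbase21 = SimpleNamespace(new_api=lambda: "token-v2")
# A pre-2.1 TokenUtil that only has the old overload:
hbase13 = SimpleNamespace(old_api=lambda: "token-v1")
```

With this shape, linking against either HBase generation succeeds and only a build with neither API fails loudly.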
[jira] [Assigned] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26432: Assignee: Apache Spark > Not able to connect Hbase 2.1 service Getting NoSuchMethodException while > trying to obtain token from Hbase 2.1 service. > > > Key: SPARK-26432 > URL: https://issues.apache.org/jira/browse/SPARK-26432 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Assignee: Apache Spark >Priority: Major > Attachments: hbase-dep-obtaintok.png > > > Getting NoSuchMethodException : > org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) > while trying connect hbase 2.1 service from spark. > This is mainly happening because in spark uses a deprecated hbase api > public static Token obtainToken(Configuration > conf) > for obtaining the token and the same has been removed from hbase 2.1 version. > > Attached the snapshot of error logs -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732281#comment-16732281 ] Sujith edited comment on SPARK-26432 at 1/2/19 6:27 PM: sorry for the late response due to holidays :), raised a PR please let me know for any suggestions. thanks. PR is in WIP as i need to attach test report which i will attach tomorrow was (Author: s71955): sorry for the late response due to holidays :), raised a PR please let me know for any suggestions. thanks > Not able to connect Hbase 2.1 service Getting NoSuchMethodException while > trying to obtain token from Hbase 2.1 service. > > > Key: SPARK-26432 > URL: https://issues.apache.org/jira/browse/SPARK-26432 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > Attachments: hbase-dep-obtaintok.png > > > Getting NoSuchMethodException : > org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) > while trying connect hbase 2.1 service from spark. > This is mainly happening because in spark uses a deprecated hbase api > public static Token obtainToken(Configuration > conf) > for obtaining the token and the same has been removed from hbase 2.1 version. > > Attached the snapshot of error logs -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732281#comment-16732281 ] Sujith commented on SPARK-26432: sorry for the late response due to holidays :), raised a PR please let me know for any suggestions. thanks > Not able to connect Hbase 2.1 service Getting NoSuchMethodException while > trying to obtain token from Hbase 2.1 service. > > > Key: SPARK-26432 > URL: https://issues.apache.org/jira/browse/SPARK-26432 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > Attachments: hbase-dep-obtaintok.png > > > Getting NoSuchMethodException : > org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) > while trying connect hbase 2.1 service from spark. > This is mainly happening because in spark uses a deprecated hbase api > public static Token obtainToken(Configuration > conf) > for obtaining the token and the same has been removed from hbase 2.1 version. > > Attached the snapshot of error logs -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26432: Assignee: (was: Apache Spark) > Not able to connect Hbase 2.1 service Getting NoSuchMethodException while > trying to obtain token from Hbase 2.1 service. > > > Key: SPARK-26432 > URL: https://issues.apache.org/jira/browse/SPARK-26432 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > Attachments: hbase-dep-obtaintok.png > > > Getting NoSuchMethodException : > org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) > while trying connect hbase 2.1 service from spark. > This is mainly happening because in spark uses a deprecated hbase api > public static Token obtainToken(Configuration > conf) > for obtaining the token and the same has been removed from hbase 2.1 version. > > Attached the snapshot of error logs -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26507) Fix core tests for Java 11
[ https://issues.apache.org/jira/browse/SPARK-26507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-26507. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23419 [https://github.com/apache/spark/pull/23419] > Fix core tests for Java 11 > -- > > Key: SPARK-26507 > URL: https://issues.apache.org/jira/browse/SPARK-26507 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Minor > Fix For: 3.0.0 > > > Several core tests still don't pass in Java 11. Some simple fixes will make > them pass. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26518) UI Application Info Race Condition Can Throw NoSuchElement
Russell Spitzer created SPARK-26518: --- Summary: UI Application Info Race Condition Can Throw NoSuchElement Key: SPARK-26518 URL: https://issues.apache.org/jira/browse/SPARK-26518 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 2.4.0, 2.3.0 Reporter: Russell Spitzer There is a slight race condition in the [AppStatusStore|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/AppStatusStore.scala#L39], which calls `next` on the returned iterator even if it is empty, which it can be for a short period of time after the UI is up but before the store is populated. {code} Error 500 Server Error HTTP ERROR 500 Problem accessing /jobs/. Reason: Server ErrorCaused by:java.util.NoSuchElementException at java.util.Collections$EmptyIterator.next(Collections.java:4189) at org.apache.spark.util.kvstore.InMemoryStore$InMemoryIterator.next(InMemoryStore.java:281) at org.apache.spark.status.AppStatusStore.applicationInfo(AppStatusStore.scala:38) at org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:275) at org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:86) at org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:86) at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90) at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:865) at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:535) at org.spark_project.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317) at org.spark_project.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203) at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473) at org.spark_project.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201) at 
org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219) at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144) at org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:724) at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219) at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) at org.spark_project.jetty.server.Server.handle(Server.java:531) at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:352) at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:260) at org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281) at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:102) at org.spark_project.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118) at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762) at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:680) at java.lang.Thread.run(Thread.java:748) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
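The defensive shape of a fix — probe the iterator instead of calling `next` unconditionally — can be sketched in a few lines of Python (the real `AppStatusStore.applicationInfo` is Scala; this is only an illustration of the pattern, not the actual patch):

```python
def application_info(store):
    """Return the first stored ApplicationInfo, or None if the store is
    still empty (it can be, briefly, right after the UI comes up)."""
    it = iter(store)
    # next(it, default) never raises NoSuchElement/StopIteration.
    return next(it, None)
```

The caller can then render a "still starting" page instead of a 500 when `None` comes back.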
[jira] [Assigned] (SPARK-26489) Use ConfigEntry for hardcoded configs for python/r categories.
[ https://issues.apache.org/jira/browse/SPARK-26489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26489: Assignee: (was: Apache Spark) > Use ConfigEntry for hardcoded configs for python/r categories. > -- > > Key: SPARK-26489 > URL: https://issues.apache.org/jira/browse/SPARK-26489 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Takuya Ueshin >Priority: Major > > Make the following hardcoded configs to use ConfigEntry. > {code} > spark.python > spark.r > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26489) Use ConfigEntry for hardcoded configs for python/r categories.
[ https://issues.apache.org/jira/browse/SPARK-26489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26489: Assignee: Apache Spark > Use ConfigEntry for hardcoded configs for python/r categories. > -- > > Key: SPARK-26489 > URL: https://issues.apache.org/jira/browse/SPARK-26489 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > > Make the following hardcoded configs to use ConfigEntry. > {code} > spark.python > spark.r > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
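The cleanup this ticket describes replaces raw hardcoded config strings with declared, typed entries. In plain Python the pattern looks roughly like the following; `ConfigEntry` here is a toy model of the idea, not Spark's Scala class, and the key shown is only for illustration:

```python
class ConfigEntry:
    """Toy declared-config entry: one place for key, default, and type."""
    def __init__(self, key, default, value_type=str):
        self.key = key
        self.default = default
        self.value_type = value_type

    def read(self, conf):
        raw = conf.get(self.key)
        return self.default if raw is None else self.value_type(raw)

# Instead of sprinkling conf.get("spark.python.worker.reuse") through the
# codebase, declare the entry once and read it everywhere:
PYTHON_WORKER_REUSE = ConfigEntry(
    "spark.python.worker.reuse",
    default=True,
    value_type=lambda s: s.lower() == "true",
)
```

The payoff is that key, default, and parsing live in one declaration, so typos in the key and inconsistent defaults can't drift between call sites.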
[jira] [Commented] (SPARK-26454) While creating new UDF with JAR though UDF is created successfully, it throws IllegalArgument Exception
[ https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732005#comment-16732005 ] Udbhav Agrawal commented on SPARK-26454: Okay, I will work on that and raise the PR. > While creating new UDF with JAR though UDF is created successfully, it throws > IllegalArgument Exception > - > > Key: SPARK-26454 > URL: https://issues.apache.org/jira/browse/SPARK-26454 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.3.2 >Reporter: Udbhav Agrawal >Priority: Trivial > Attachments: create_exception.txt > > > 【Test step】: > 1. Launch spark-shell > 2. set role admin; > 3. Create a new function > CREATE FUNCTION Func AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar' > 4. Do a select on the function > sql("select Func('2018-03-09')").show() > 5. Create a new UDF with the same JAR > sql("CREATE FUNCTION newFunc AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar'") > 6. Do a select on the new function created. > sql("select newFunc ('2018-03-09')").show() > 【Output】: > The function is created, but an illegal argument exception is thrown; the select > returns a result, but with an illegal argument exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26511) java.lang.ClassCastException error when loading Spark MLlib model from parquet file
[ https://issues.apache.org/jira/browse/SPARK-26511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731996#comment-16731996 ] Amy Koh commented on SPARK-26511: - Thanks [~viirya]. I do indeed have a slightly modified format of the saved model. I reordered the columns in the schema and it's now working OK. Would be nice it could provide a more meaningful error message in the schema validation step! > java.lang.ClassCastException error when loading Spark MLlib model from > parquet file > --- > > Key: SPARK-26511 > URL: https://issues.apache.org/jira/browse/SPARK-26511 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 2.4.0 >Reporter: Amy Koh >Priority: Major > Attachments: repro.zip > > > When I tried to load a decision tree model from a parquet file, the following > error is thrown. > {code:bash} > Py4JJavaError: An error occurred while calling > z:org.apache.spark.mllib.tree.model.DecisionTreeModel.load. : > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 > (TID 2, localhost, executor driver): java.lang.ClassCastException: class > java.lang.Double cannot be cast to class java.lang.Integer (java.lang.Double > and java.lang.Integer are in module java.base of loader 'bootstrap') at > scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:101) at > org.apache.spark.sql.Row$class.getInt(Row.scala:223) at > org.apache.spark.sql.catalyst.expressions.GenericRow.getInt(rows.scala:165) > at > org.apache.spark.mllib.tree.model.DecisionTreeModel$SaveLoadV1_0$SplitData$.apply(DecisionTreeModel.scala:171) > at > org.apache.spark.mllib.tree.model.DecisionTreeModel$SaveLoadV1_0$NodeData$.apply(DecisionTreeModel.scala:195) > at > org.apache.spark.mllib.tree.model.DecisionTreeModel$SaveLoadV1_0$$anonfun$9.apply(DecisionTreeModel.scala:247) > at > 
org.apache.spark.mllib.tree.model.DecisionTreeModel$SaveLoadV1_0$$anonfun$9.apply(DecisionTreeModel.scala:247) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at > scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:149) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) at > org.apache.spark.scheduler.Task.run(Task.scala:108) at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335) at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:834) Driver stacktrace: at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814) > at scala.Option.foreach(Option.scala:257) at > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714) > at > 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) at > org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630) at > org.apache.spark.SparkContext.runJob(SparkContext.scala:2022) at > org.apache.spark.SparkContext.runJob(SparkContext.scala:2043) at > org.apache.spark.SparkContext.runJob(SparkContext.scala:2062) at > org.apache.spark.SparkContext.runJob(SparkContext.scala:2087) at > org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936) at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) > at
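The root cause identified in the comment above — a loader that reads row fields by position while the saved parquet file had its columns reordered — can be sketched outside Spark with a small Python example. The names are illustrative; this is not Spark's actual DecisionTreeModel loader.

```python
# Sketch: a loader that reads row values by *position* and enforces a
# type, like Row.getInt in the stack trace above. If the saved columns
# were reordered, the positional read hits a float where an int is
# expected -- the Python analogue of the reported ClassCastException.

def load_split(row):
    feature = row[0]          # assumes column 0 is the int 'feature'
    if not isinstance(feature, int):
        raise TypeError(f"expected int feature, got {type(feature).__name__}")
    threshold = row[1]        # assumes column 1 is the float 'threshold'
    return feature, threshold

# Saved with the expected column order: loads fine.
assert load_split((3, 0.5)) == (3, 0.5)

# Same data saved with the columns swapped: the type check fires.
try:
    load_split((0.5, 3))
    raise AssertionError("should have failed")
except TypeError as e:
    print("schema mismatch:", e)
```

This is also why reordering the columns back to the expected order, as the reporter did, makes loading work again.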
[jira] [Commented] (SPARK-26454) While creating new UDF with JAR though UDF is created successfully, it throws IllegalArgumentException
[ https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732002#comment-16732002 ] Sujith commented on SPARK-26454: I think [~hyukjin.kwon] idea is better and simple, we can reduce level to warn, because when you say error which means user wont expect the particular operation to be successful sometimes . so to avoid confusions better to lower the error level. > While creating new UDF with JAR though UDF is created successfully, it throws > IllegegalArgument Exception > - > > Key: SPARK-26454 > URL: https://issues.apache.org/jira/browse/SPARK-26454 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.3.2 >Reporter: Udbhav Agrawal >Priority: Trivial > Attachments: create_exception.txt > > > 【Test step】: > 1.launch spark-shell > 2. set role admin; > 3. create new function > CREATE FUNCTION Func AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar' > 4. Do select on the function > sql("select Func('2018-03-09')").show() > 5.Create new UDF with same JAR > sql("CREATE FUNCTION newFunc AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar'") > 6. Do select on the new function created. > sql("select newFunc ('2018-03-09')").show() > 【Output】: > Function is getting created but illegal argument exception is thrown , select > provides result but with illegal argument exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-26454) While creating new UDF with JAR though UDF is created successfully, it throws IllegalArgumentException
[ https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731769#comment-16731769 ] Udbhav Agrawal edited comment on SPARK-26454 at 1/2/19 12:15 PM: - CC [~sandeep-katta] [~sujith] [~ajithshetty28] [~S71955] was (Author: udbhav agrawal): CC [~sandeep-katta] [~sujith] [~ajithshetty28] > While creating new UDF with JAR though UDF is created successfully, it throws > IllegegalArgument Exception > - > > Key: SPARK-26454 > URL: https://issues.apache.org/jira/browse/SPARK-26454 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.3.2 >Reporter: Udbhav Agrawal >Priority: Trivial > Attachments: create_exception.txt > > > 【Test step】: > 1.launch spark-shell > 2. set role admin; > 3. create new function > CREATE FUNCTION Func AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar' > 4. Do select on the function > sql("select Func('2018-03-09')").show() > 5.Create new UDF with same JAR > sql("CREATE FUNCTION newFunc AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar'") > 6. Do select on the new function created. > sql("select newFunc ('2018-03-09')").show() > 【Output】: > Function is getting created but illegal argument exception is thrown , select > provides result but with illegal argument exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26454) While creating new UDF with JAR though UDF is created successfully, it throws IllegalArgumentException
[ https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731983#comment-16731983 ] Hyukjin Kwon commented on SPARK-26454: -- Thing is, we shouldn't introduce behaviour change when we fix. Maybe we could consider lowering log level from error to warning. > While creating new UDF with JAR though UDF is created successfully, it throws > IllegegalArgument Exception > - > > Key: SPARK-26454 > URL: https://issues.apache.org/jira/browse/SPARK-26454 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.3.2 >Reporter: Udbhav Agrawal >Priority: Trivial > Attachments: create_exception.txt > > > 【Test step】: > 1.launch spark-shell > 2. set role admin; > 3. create new function > CREATE FUNCTION Func AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar' > 4. Do select on the function > sql("select Func('2018-03-09')").show() > 5.Create new UDF with same JAR > sql("CREATE FUNCTION newFunc AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar'") > 6. Do select on the new function created. > sql("select newFunc ('2018-03-09')").show() > 【Output】: > Function is getting created but illegal argument exception is thrown , select > provides result but with illegal argument exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-26454) While creating new UDF with JAR though UDF is created successfully, it throws IllegalArgumentException
[ https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731983#comment-16731983 ] Hyukjin Kwon edited comment on SPARK-26454 at 1/2/19 11:54 AM: --- Thing is, we shouldn't introduce behaviour change when we fix. This code path is shared by multiple APIs. Maybe we could consider lowering log level from error to warning. was (Author: hyukjin.kwon): Thing is, we shouldn't introduce behaviour change when we fix. Maybe we could consider lowering log level from error to warning. > While creating new UDF with JAR though UDF is created successfully, it throws > IllegegalArgument Exception > - > > Key: SPARK-26454 > URL: https://issues.apache.org/jira/browse/SPARK-26454 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.3.2 >Reporter: Udbhav Agrawal >Priority: Trivial > Attachments: create_exception.txt > > > 【Test step】: > 1.launch spark-shell > 2. set role admin; > 3. create new function > CREATE FUNCTION Func AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar' > 4. Do select on the function > sql("select Func('2018-03-09')").show() > 5.Create new UDF with same JAR > sql("CREATE FUNCTION newFunc AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar'") > 6. Do select on the new function created. > sql("select newFunc ('2018-03-09')").show() > 【Output】: > Function is getting created but illegal argument exception is thrown , select > provides result but with illegal argument exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-26454) While creating new UDF with JAR though UDF is created successfully, it throws IllegalArgumentException
[ https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731974#comment-16731974 ] Udbhav Agrawal edited comment on SPARK-26454 at 1/2/19 11:26 AM: - [~hyukjin.kwon] [https://github.com/apache/spark/blob/02b510728c31b70e6035ad541bfcdc2b59dcd79a/core/src/main/scala/org/apache/spark/rpc/netty/NettyStreamManager.scala#L78] i thought of changing the require function to conditional statement and instead of throwing an exception provide a warning message instead, but didn't feel any major use. can you suggest to go ahead with the same or else i will close the issue. was (Author: udbhav agrawal): [https://github.com/apache/spark/blob/02b510728c31b70e6035ad541bfcdc2b59dcd79a/core/src/main/scala/org/apache/spark/rpc/netty/NettyStreamManager.scala#L78] i thought of changing the require function to conditional statement and instead of throwing an exception provide a warning message instead, but didn't feel any major use. can you suggest to go ahead with the same or else i will close the issue. > While creating new UDF with JAR though UDF is created successfully, it throws > IllegegalArgument Exception > - > > Key: SPARK-26454 > URL: https://issues.apache.org/jira/browse/SPARK-26454 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.3.2 >Reporter: Udbhav Agrawal >Priority: Trivial > Attachments: create_exception.txt > > > 【Test step】: > 1.launch spark-shell > 2. set role admin; > 3. create new function > CREATE FUNCTION Func AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar' > 4. Do select on the function > sql("select Func('2018-03-09')").show() > 5.Create new UDF with same JAR > sql("CREATE FUNCTION newFunc AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar'") > 6. Do select on the new function created. 
> sql("select newFunc ('2018-03-09')").show() > 【Output】: > Function is getting created but illegal argument exception is thrown , select > provides result but with illegal argument exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26454) While creating new UDF with JAR though UDF is created successfully, it throws IllegalArgumentException
[ https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731974#comment-16731974 ] Udbhav Agrawal commented on SPARK-26454: [https://github.com/apache/spark/blob/02b510728c31b70e6035ad541bfcdc2b59dcd79a/core/src/main/scala/org/apache/spark/rpc/netty/NettyStreamManager.scala#L78] i thought of changing the require function to conditional statement and instead of throwing an exception provide a warning message instead, but didn't feel any major use. can you suggest to go ahead with the same or else i will close the issue. > While creating new UDF with JAR though UDF is created successfully, it throws > IllegegalArgument Exception > - > > Key: SPARK-26454 > URL: https://issues.apache.org/jira/browse/SPARK-26454 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.3.2 >Reporter: Udbhav Agrawal >Priority: Trivial > Attachments: create_exception.txt > > > 【Test step】: > 1.launch spark-shell > 2. set role admin; > 3. create new function > CREATE FUNCTION Func AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar' > 4. Do select on the function > sql("select Func('2018-03-09')").show() > 5.Create new UDF with same JAR > sql("CREATE FUNCTION newFunc AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar'") > 6. Do select on the new function created. > sql("select newFunc ('2018-03-09')").show() > 【Output】: > Function is getting created but illegal argument exception is thrown , select > provides result but with illegal argument exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
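The change being weighed in this thread — demoting the hard `require` in NettyStreamManager, which raises IllegalArgumentException when a file is registered twice, to a warning — can be sketched in Python. The function and registry names here are illustrative stand-ins, not Spark's actual API.

```python
import logging

logger = logging.getLogger("stream_manager")

def add_file_strict(registry, name, path):
    # Current behaviour, sketched: a require()-style check that raises
    # even though the UDF/JAR registration itself already succeeded.
    if name in registry and registry[name] != path:
        raise ValueError(f"File {name} was already registered.")
    registry[name] = path

def add_file_lenient(registry, name, path):
    # Alternative discussed in the thread, sketched: keep the first
    # registration and log a warning instead of raising.
    if name in registry:
        logger.warning("File %s already registered; keeping existing entry.", name)
        return
    registry[name] = path

reg = {}
add_file_strict(reg, "two_udfs.jar", "hdfs:///tmp/a.jar")
try:
    add_file_strict(reg, "two_udfs.jar", "hdfs:///tmp/b.jar")
except ValueError as e:
    print("strict:", e)

reg2 = {"two_udfs.jar": "hdfs:///tmp/a.jar"}
add_file_lenient(reg2, "two_udfs.jar", "hdfs:///tmp/b.jar")
assert reg2["two_udfs.jar"] == "hdfs:///tmp/a.jar"  # first entry kept
```

The lenient variant matches the later suggestion in the thread to lower the severity rather than change behaviour: re-registering the same JAR becomes non-fatal but still visible in the logs.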
[jira] [Commented] (SPARK-26454) While creating new UDF with JAR though UDF is created successfully, it throws IllegalArgumentException
[ https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731967#comment-16731967 ] Udbhav Agrawal commented on SPARK-26454: [https://github.com/apache/spark/blob/02b510728c31b70e6035ad541bfcdc2b59dcd79a/core/src/main/scala/org/apache/spark/rpc/netty/NettyStreamManager.scala#L78] > While creating new UDF with JAR though UDF is created successfully, it throws > IllegegalArgument Exception > - > > Key: SPARK-26454 > URL: https://issues.apache.org/jira/browse/SPARK-26454 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.3.2 >Reporter: Udbhav Agrawal >Priority: Trivial > Attachments: create_exception.txt > > > 【Test step】: > 1.launch spark-shell > 2. set role admin; > 3. create new function > CREATE FUNCTION Func AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar' > 4. Do select on the function > sql("select Func('2018-03-09')").show() > 5.Create new UDF with same JAR > sql("CREATE FUNCTION newFunc AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar'") > 6. Do select on the new function created. > sql("select newFunc ('2018-03-09')").show() > 【Output】: > Function is getting created but illegal argument exception is thrown , select > provides result but with illegal argument exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-26454) While creating new UDF with JAR though UDF is created successfully, it throws IllegalArgumentException
[ https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udbhav Agrawal updated SPARK-26454: --- Comment: was deleted (was: [https://github.com/apache/spark/blob/02b510728c31b70e6035ad541bfcdc2b59dcd79a/core/src/main/scala/org/apache/spark/rpc/netty/NettyStreamManager.scala#L78] ) > While creating new UDF with JAR though UDF is created successfully, it throws > IllegegalArgument Exception > - > > Key: SPARK-26454 > URL: https://issues.apache.org/jira/browse/SPARK-26454 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.3.2 >Reporter: Udbhav Agrawal >Priority: Trivial > Attachments: create_exception.txt > > > 【Test step】: > 1.launch spark-shell > 2. set role admin; > 3. create new function > CREATE FUNCTION Func AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar' > 4. Do select on the function > sql("select Func('2018-03-09')").show() > 5.Create new UDF with same JAR > sql("CREATE FUNCTION newFunc AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar'") > 6. Do select on the new function created. > sql("select newFunc ('2018-03-09')").show() > 【Output】: > Function is getting created but illegal argument exception is thrown , select > provides result but with illegal argument exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26462) Use ConfigEntry for hardcoded configs for execution categories.
[ https://issues.apache.org/jira/browse/SPARK-26462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731953#comment-16731953 ] Takuya Ueshin commented on SPARK-26462: --- [~pralabhkumar] Sure. Please feel free to create the pull request. Thanks! > Use ConfigEntry for hardcoded configs for execution categories. > --- > > Key: SPARK-26462 > URL: https://issues.apache.org/jira/browse/SPARK-26462 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Takuya Ueshin >Priority: Major > > Make the following hardcoded configs to use ConfigEntry. > {code} > spark.memory > spark.storage > spark.io > spark.buffer > spark.rdd > spark.locality > spark.broadcast > spark.reducer > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
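The ConfigEntry pattern this ticket asks for — declaring each config key once, with a default and a type, instead of scattering hardcoded strings through the codebase — can be sketched in Python. This is a simplified stand-in, not Spark's actual `ConfigBuilder`/`ConfigEntry` API.

```python
# Sketch of the ConfigEntry idea: a config key is declared once with a
# default and a conversion, so call sites can no longer disagree about
# the key string, the default value, or the type.

class ConfigEntry:
    def __init__(self, key, default, value_type=str):
        self.key = key
        self.default = default
        self.value_type = value_type

    def read(self, conf):
        raw = conf.get(self.key)
        return self.default if raw is None else self.value_type(raw)

# Declared once, reused everywhere (key and default are illustrative).
BUFFER_SIZE = ConfigEntry("spark.buffer.size", 65536, int)

assert BUFFER_SIZE.read({"spark.buffer.size": "131072"}) == 131072
assert BUFFER_SIZE.read({}) == 65536  # default applies when unset
```

Compared with `conf.get("spark.buffer.size", "65536")` repeated at every call site, the declared entry centralizes the key, the default, and the parsing in one place.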
[jira] [Updated] (SPARK-26454) While creating new UDF with JAR though UDF is created successfully, it throws IllegalArgumentException
[ https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udbhav Agrawal updated SPARK-26454: --- Summary: While creating new UDF with JAR though UDF is created successfully, it throws IllegegalArgument Exception (was: while creating new UDF with JAR though UDF is created successfully) > While creating new UDF with JAR though UDF is created successfully, it throws > IllegegalArgument Exception > - > > Key: SPARK-26454 > URL: https://issues.apache.org/jira/browse/SPARK-26454 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.3.2 >Reporter: Udbhav Agrawal >Priority: Trivial > Attachments: create_exception.txt > > > 【Test step】: > 1.launch spark-shell > 2. set role admin; > 3. create new function > CREATE FUNCTION Func AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar' > 4. Do select on the function > sql("select Func('2018-03-09')").show() > 5.Create new UDF with same JAR > sql("CREATE FUNCTION newFunc AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar'") > 6. Do select on the new function created. > sql("select newFunc ('2018-03-09')").show() > 【Output】: > Function is getting created but illegal argument exception is thrown , select > provides result but with illegal argument exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26454) while creating new UDF with JAR though UDF is created successfully
[ https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udbhav Agrawal updated SPARK-26454: --- Description: 【Test step】: 1.launch spark-shell 2. set role admin; 3. create new function CREATE FUNCTION Func AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 'hdfs:///tmp/super_udf/two_udfs.jar' 4. Do select on the function sql("select Func('2018-03-09')").show() 5.Create new UDF with same JAR sql("CREATE FUNCTION newFunc AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 'hdfs:///tmp/super_udf/two_udfs.jar'") 6. Do select on the new function created. sql("select newFunc ('2018-03-09')").show() 【Output】: Function is getting created but illegal argument exception is thrown , select provides result but with illegal argument exception. was: 【Test step】: 1.launch spark-shell 2. set role admin; 3. create new function CREATE FUNCTION Func AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 'hdfs:///tmp/super_udf/two_udfs.jar' 4. Do select on the function sql("select Func('2018-03-09')").show() 5.Create new UDF with same JAR sql("CREATE FUNCTION newFunc AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 'hdfs:///tmp/super_udf/two_udfs.jar'") 6. Do select on the new function created. sql("select newFunc ('2018-03-09')").show() 【Output】: Function is getting created but illegal argument exception is thrown , select provides result but with illegal argument exception. 
Summary: while creating new UDF with JAR though UDF is created successfully (was: IllegegalArgument Exception is Thrown while creating new UDF with JAR) > while creating new UDF with JAR though UDF is created successfully > -- > > Key: SPARK-26454 > URL: https://issues.apache.org/jira/browse/SPARK-26454 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.3.2 >Reporter: Udbhav Agrawal >Priority: Trivial > Attachments: create_exception.txt > > > 【Test step】: > 1.launch spark-shell > 2. set role admin; > 3. create new function > CREATE FUNCTION Func AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar' > 4. Do select on the function > sql("select Func('2018-03-09')").show() > 5.Create new UDF with same JAR > sql("CREATE FUNCTION newFunc AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar'") > 6. Do select on the new function created. > sql("select newFunc ('2018-03-09')").show() > 【Output】: > Function is getting created but illegal argument exception is thrown , select > provides result but with illegal argument exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26517) Avoid duplicate test in ParquetSchemaPruningSuite
[ https://issues.apache.org/jira/browse/SPARK-26517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26517: Assignee: (was: Apache Spark) > Avoid duplicate test in ParquetSchemaPruningSuite > - > > Key: SPARK-26517 > URL: https://issues.apache.org/jira/browse/SPARK-26517 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.0.0 >Reporter: Liang-Chi Hsieh >Priority: Minor > > `testExactCaseQueryPruning` and `testMixedCaseQueryPruning` don't need to set > up `PARQUET_VECTORIZED_READER_ENABLED` config. Because `withMixedCaseData` > will run against both Spark vectorized reader and Parquet-mr reader. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26517) Avoid duplicate test in ParquetSchemaPruningSuite
[ https://issues.apache.org/jira/browse/SPARK-26517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26517: Assignee: Apache Spark > Avoid duplicate test in ParquetSchemaPruningSuite > - > > Key: SPARK-26517 > URL: https://issues.apache.org/jira/browse/SPARK-26517 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.0.0 >Reporter: Liang-Chi Hsieh >Assignee: Apache Spark >Priority: Minor > > `testExactCaseQueryPruning` and `testMixedCaseQueryPruning` don't need to set > up `PARQUET_VECTORIZED_READER_ENABLED` config. Because `withMixedCaseData` > will run against both Spark vectorized reader and Parquet-mr reader. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26517) Avoid duplicate test in ParquetSchemaPruningSuite
Liang-Chi Hsieh created SPARK-26517: --- Summary: Avoid duplicate test in ParquetSchemaPruningSuite Key: SPARK-26517 URL: https://issues.apache.org/jira/browse/SPARK-26517 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.0.0 Reporter: Liang-Chi Hsieh `testExactCaseQueryPruning` and `testMixedCaseQueryPruning` don't need to set up `PARQUET_VECTORIZED_READER_ENABLED` config. Because `withMixedCaseData` will run against both Spark vectorized reader and Parquet-mr reader. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
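The duplication the ticket describes — wrapping a helper that already parameterizes over both reader modes in another per-mode config loop — can be sketched in Python. The function names are illustrative, not Spark's test code.

```python
# Sketch: if the data helper already runs the test under both reader
# modes, adding an outer loop over the same config runs every test
# twice per mode -- pure wasted work.

runs = []

def with_both_readers(test):
    # Stand-in for withMixedCaseData: exercises both reader modes.
    for vectorized in (True, False):
        test(vectorized)

def duplicated(test):
    # Stand-in for the redundant PARQUET_VECTORIZED_READER_ENABLED
    # wrapper; the outer config value never changes the inner runs.
    for _ in (True, False):
        with_both_readers(test)

def record(vectorized):
    runs.append(vectorized)

with_both_readers(record)
assert len(runs) == 2       # each mode exercised exactly once

runs.clear()
duplicated(record)
assert len(runs) == 4       # every mode run twice -- the duplication
```

Dropping the outer config setup halves the number of runs without losing any coverage, which is exactly the cleanup proposed here.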
[jira] [Commented] (SPARK-18805) InternalMapWithStateDStream makes java.lang.StackOverflowError
[ https://issues.apache.org/jira/browse/SPARK-18805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731924#comment-16731924 ] Joost Verdoorn commented on SPARK-18805: This issue occurs relatively often within our application when resuming from a checkpoint. Is there any progress on this? > InternalMapWithStateDStream makes java.lang.StackOverflowError > -- > > Key: SPARK-18805 > URL: https://issues.apache.org/jira/browse/SPARK-18805 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 1.6.3, 2.0.2 > Environment: mesos >Reporter: etienne >Priority: Major > > When loading InternalMapWithStateDStream from a checkpoint, > if isValidTime is true and there is no generatedRDD at the given time, > there is an infinite loop. > 1) compute is called on InternalMapWithStateDStream > 2) InternalMapWithStateDStream tries to generate the previousRDD > 3) the stream looks in generatedRDD to check whether the RDD is already generated for the given > time > 4) it does not find the RDD, so it checks whether the time is valid.
> 5) if the time is valid call compute on InternalMapWithStateDStream > 6) restart from 1) > Here the exception that illustrate this error > {code} > Exception in thread "streaming-start" java.lang.StackOverflowError > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340) > at > org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:335) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:333) > at scala.Option.orElse(Option.scala:289) > at > org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:330) > at > org.apache.spark.streaming.dstream.InternalMapWithStateDStream.compute(MapWithStateDStream.scala:134) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340) > at > org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415) > at > 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:335) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:333) > at scala.Option.orElse(Option.scala:289) > at > org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:330) > at > org.apache.spark.streaming.dstream.InternalMapWithStateDStream.compute(MapWithStateDStream.scala:134) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26454) IllegegalArgument Exception is Thrown while creating new UDF with JAR
[ https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731902#comment-16731902 ] Hyukjin Kwon commented on SPARK-26454: -- Can you fix the Jira title and description? It sounds like it doesn't work at all. If it's easy to fix, go ahead. Otherwise, I won't fix. > IllegegalArgument Exception is Thrown while creating new UDF with JAR > - > > Key: SPARK-26454 > URL: https://issues.apache.org/jira/browse/SPARK-26454 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.3.2 >Reporter: Udbhav Agrawal >Priority: Trivial > Attachments: create_exception.txt > > > 【Test step】: > 1.launch spark-shell > 2. set role admin; > 3. create new function > CREATE FUNCTION Func AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar' > 4. Do select on the function > sql("select Func('2018-03-09')").show() > 5.Create new UDF with same JAR > sql("CREATE FUNCTION newFunc AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar'") > 6. Do select on the new function created. > sql("select newFunc ('2018-03-09')").show() > 【Output】: > Function is getting created but illegal argument exception is thrown , select > provides result but with illegal argument exception.
[jira] [Updated] (SPARK-26454) IllegegalArgument Exception is Thrown while creating new UDF with JAR
[ https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-26454: - Priority: Trivial (was: Major) > IllegegalArgument Exception is Thrown while creating new UDF with JAR > - > > Key: SPARK-26454 > URL: https://issues.apache.org/jira/browse/SPARK-26454 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.3.2 >Reporter: Udbhav Agrawal >Priority: Trivial > Attachments: create_exception.txt > > > 【Test step】: > 1.launch spark-shell > 2. set role admin; > 3. create new function > CREATE FUNCTION Func AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar' > 4. Do select on the function > sql("select Func('2018-03-09')").show() > 5.Create new UDF with same JAR > sql("CREATE FUNCTION newFunc AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar'") > 6. Do select on the new function created. > sql("select newFunc ('2018-03-09')").show() > 【Output】: > Function is getting created but illegal argument exception is thrown , select > provides result but with illegal argument exception.
[jira] [Commented] (SPARK-26454) IllegegalArgument Exception is Thrown while creating new UDF with JAR
[ https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731856#comment-16731856 ] Udbhav Agrawal commented on SPARK-26454: [~hyukjin.kwon] Yes, the code is working fine but it shows the error log as well; I checked other clients and the error log appears nowhere other than in spark-shell. So do we need to handle this case, or is it not required since the code is working fine? Please give your suggestions. > IllegegalArgument Exception is Thrown while creating new UDF with JAR > - > > Key: SPARK-26454 > URL: https://issues.apache.org/jira/browse/SPARK-26454 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.3.2 >Reporter: Udbhav Agrawal >Priority: Major > Attachments: create_exception.txt > > > 【Test step】: > 1.launch spark-shell > 2. set role admin; > 3. create new function > CREATE FUNCTION Func AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar' > 4. Do select on the function > sql("select Func('2018-03-09')").show() > 5.Create new UDF with same JAR > sql("CREATE FUNCTION newFunc AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar'") > 6. Do select on the new function created. > sql("select newFunc ('2018-03-09')").show() > 【Output】: > Function is getting created but illegal argument exception is thrown , select > provides result but with illegal argument exception.
[jira] [Commented] (SPARK-26462) Use ConfigEntry for hardcoded configs for execution categories.
[ https://issues.apache.org/jira/browse/SPARK-26462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731862#comment-16731862 ] pralabhkumar commented on SPARK-26462: -- [~ueshin] I can work on this. Please let me know if it's OK. I'll create the pull request. > Use ConfigEntry for hardcoded configs for execution categories. > --- > > Key: SPARK-26462 > URL: https://issues.apache.org/jira/browse/SPARK-26462 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Takuya Ueshin >Priority: Major > > Make the following hardcoded configs use ConfigEntry.
> {code}
> spark.memory
> spark.storage
> spark.io
> spark.buffer
> spark.rdd
> spark.locality
> spark.broadcast
> spark.reducer
> {code}
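The refactoring requested in SPARK-26462 replaces raw string keys scattered through the code with typed, centrally defined config entries. A rough Python sketch of the pattern (an illustration of the idea only, not Spark's actual ConfigEntry API; the class and names here are hypothetical):

```python
class ConfigEntry:
    """A typed config key with a single documented default, loosely
    modeled on Spark's internal ConfigEntry builder (hypothetical sketch)."""

    def __init__(self, key, default, value_type=str):
        self.key = key
        self.default = default
        self.value_type = value_type

    def read_from(self, conf):
        # Look up the raw string value; fall back to the default,
        # otherwise coerce to the declared type.
        raw = conf.get(self.key)
        return self.default if raw is None else self.value_type(raw)


# One definition replaces every hardcoded conf.get("spark.reducer...") call,
# so the default lives in exactly one place.
REDUCER_MAX_REQS = ConfigEntry("spark.reducer.maxReqsInFlight", 2147483647, int)
```

The benefit over hardcoded strings is that the key, its default, and its type are declared once, so typos in keys and inconsistent defaults across call sites become impossible.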
[jira] [Created] (SPARK-26516) zeppelin with spark on mesos: environment variable setting
Yui Hirasawa created SPARK-26516: Summary: zeppelin with spark on mesos: environment variable setting Key: SPARK-26516 URL: https://issues.apache.org/jira/browse/SPARK-26516 Project: Spark Issue Type: IT Help Components: Mesos, Spark Core Affects Versions: 2.4.0 Reporter: Yui Hirasawa I am trying to use zeppelin with spark on mesos mode following [Apache Zeppelin on Spark Cluster Mode|https://zeppelin.apache.org/docs/0.8.0/setup/deployment/spark_cluster_mode.html#4-configure-spark-interpreter-in-zeppelin-1]. In the instructions, we should set these environment variables:
{code:java}
export MASTER=mesos://127.0.1.1:5050
export MESOS_NATIVE_JAVA_LIBRARY=[PATH OF libmesos.so]
export SPARK_HOME=[PATH OF SPARK HOME]
{code}
As far as I know, these environment variables are used by zeppelin, so they should be set on localhost rather than in the docker container (if I am wrong, please correct me). But mesos and spark are running inside a docker container, so do we need to set these environment variables so that they point to the paths inside the docker container? If so, how should one achieve that? Thanks in advance.
[jira] [Commented] (SPARK-26454) IllegegalArgument Exception is Thrown while creating new UDF with JAR
[ https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731818#comment-16731818 ] Hyukjin Kwon commented on SPARK-26454: -- Which Spark version do you use? Looks like the line number is different from Spark 2.3.2: https://github.com/apache/spark/blob/v2.3.2/core/src/main/scala/org/apache/spark/SparkContext.scala#L1810-L1858
{code}
java.lang.IllegalArgumentException: requirement failed: File custom.jar was already registered with a different path (old path = /opt/sparkclient/Spark2x/tmp/spark-10ea8f59-fa23-46c5-af12-aa029bf2f5cb/custom.jar, new path = /opt/sparkclient/Spark2x/tmp/spark-ed12eb5e-b7b9-49d0-a7a4-a0dba9141ac9/custom.jar)
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.rpc.netty.NettyStreamManager.addJar(NettyStreamManager.scala:78)
at org.apache.spark.SparkContext.addJarFile$1(SparkContext.scala:1829)
at org.apache.spark.SparkContext.addJar(SparkContext.scala:1851)
at org.apache.spark.sql.internal.SessionResourceLoader.addJar(SessionState.scala:189)
at org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:119)
at org.apache.spark.sql.hive.HiveACLSessionResourceLoader.addJar(HiveACLSessionStateBuilder.scala:110)
at org.apache.spark.sql.internal.SessionResourceLoader.loadResource(SessionState.scala:157)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog$$anonfun$loadFunctionResources$1.apply(SessionCatalo
{code}
Judging from the code, it should just show the error log and the code should work. > IllegegalArgument Exception is Thrown while creating new UDF with JAR > - > > Key: SPARK-26454 > URL: https://issues.apache.org/jira/browse/SPARK-26454 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.3.2 >Reporter: Udbhav Agrawal >Priority: Major > Attachments: create_exception.txt > > > 【Test step】: > 1.launch spark-shell > 2. set role admin; > 3. create new function > CREATE FUNCTION Func AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar' > 4. Do select on the function > sql("select Func('2018-03-09')").show() > 5.Create new UDF with same JAR > sql("CREATE FUNCTION newFunc AS > 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR > 'hdfs:///tmp/super_udf/two_udfs.jar'") > 6. Do select on the new function created. > sql("select newFunc ('2018-03-09')").show() > 【Output】: > Function is getting created but illegal argument exception is thrown , select > provides result but with illegal argument exception.
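The "already registered with a different path" failure quoted in the comment above comes from a file-name-keyed check in NettyStreamManager.addJar: re-adding a jar under the same file name succeeds only if the path also matches. A small Python sketch of that check (illustrative only, not Spark's implementation; the class and method names are hypothetical):

```python
class JarRegistry:
    """Toy model of the addJar name check: one registered path per jar
    file name; a second registration under a different path is rejected."""

    def __init__(self):
        self._jars = {}

    def add_jar(self, name, path):
        old = self._jars.get(name)
        if old is not None and old != path:
            # Mirrors the "requirement failed: File ... was already
            # registered with a different path" error in the stack trace.
            raise ValueError(
                f"File {name} was already registered with a different path "
                f"(old path = {old}, new path = {path})")
        self._jars[name] = path
```

The old and new paths in the reported trace differ only in their spark-<uuid> temp directories, which suggests each CREATE FUNCTION ... USING JAR call stages the jar under a fresh temp directory, so the second registration of the same jar name arrives with a new path and trips this check even though the jar content is identical.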
[jira] [Updated] (SPARK-26516) zeppelin with spark on mesos: environment variable setting
[ https://issues.apache.org/jira/browse/SPARK-26516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yui Hirasawa updated SPARK-26516: - Issue Type: Question (was: IT Help) > zeppelin with spark on mesos: environment variable setting > -- > > Key: SPARK-26516 > URL: https://issues.apache.org/jira/browse/SPARK-26516 > Project: Spark > Issue Type: Question > Components: Mesos, Spark Core >Affects Versions: 2.4.0 >Reporter: Yui Hirasawa >Priority: Major > > I am trying to use zeppelin with spark on mesos mode following [Apache > Zeppelin on Spark Cluster > Mode|https://zeppelin.apache.org/docs/0.8.0/setup/deployment/spark_cluster_mode.html#4-configure-spark-interpreter-in-zeppelin-1]. > In the instruction, we should set these environment variables: > {code:java} > export MASTER=mesos://127.0.1.1:5050 > export MESOS_NATIVE_JAVA_LIBRARY=[PATH OF libmesos.so] > export SPARK_HOME=[PATH OF SPARK HOME] > {code} > As far as I know, these environment variables are used by zeppelin, so it > should be set in localhost rather than in docker container(if i am wrong > please correct me). > But mesos and spark is running inside docker container, so do we need to set > these environment variables so that they are pointing to the path inside the > docker container? If so, how should one achieve that? > Thanks in advance.