[jira] [Updated] (SPARK-26522) Auth secret error in RBackendAuthHandler

2019-01-02 Thread Euijun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Euijun updated SPARK-26522:
---
Description: 
Hi expert,

I am trying to use Livy to connect to the SparkR backend.

This is related to 
[https://stackoverflow.com/questions/53900995/livy-spark-r-issue]

 

The error message is:
{code:java}
Error in sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap, : Auth 
secret not provided in environment.{code}
 

This is caused by the following check in spark-2.3.1/R/pkg/R/sparkR.R:
{code:java}
sparkR.sparkContext <- function(

...

    authSecret <- Sys.getenv("SPARKR_BACKEND_AUTH_SECRET")
    if (nchar(authSecret) == 0) {
  stop("Auth secret not provided in environment.")
    }

...

)
{code}
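The check only passes when the process that launches the R side exports SPARKR_BACKEND_AUTH_SECRET, which Spark's own launcher normally does for the R process it forks, but an external launcher such as Livy may not. A minimal, hypothetical illustration of that hand-off (an editor's sketch, not Spark or Livy code; the script name is a placeholder):
{code:python}
import os
import secrets
import subprocess

# Whatever spawns the R process must generate (or receive) the shared secret and
# export it; the JVM-side RBackendAuthHandler must be given the same value.
auth_secret = secrets.token_hex(32)
env = dict(os.environ, SPARKR_BACKEND_AUTH_SECRET=auth_secret)

# "sparkr_job.R" is a placeholder for the R script that calls sparkR.sparkContext().
subprocess.Popen(["Rscript", "sparkr_job.R"], env=env)
{code}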
 

Best regards.

  was:
Hi expert,

I am trying to use Livy to connect to the SparkR backend.

This is related to 
https://stackoverflow.com/questions/53900995/livy-spark-r-issue

 

The error message is:
{code:java}
Error in sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap, : Auth 
secret not provided in environment.{code}
This is caused by the following check in spark-2.3.1/R/pkg/R/sparkR.R:

 
{code:java}
sparkR.sparkContext <- function(

...

    authSecret <- Sys.getenv("SPARKR_BACKEND_AUTH_SECRET")
    if (nchar(authSecret) == 0) {
  stop("Auth secret not provided in environment.")
    }

...

)
{code}
 

Best regards.


> Auth secret error in RBackendAuthHandler
> 
>
> Key: SPARK-26522
> URL: https://issues.apache.org/jira/browse/SPARK-26522
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.1
>Reporter: Euijun
>Assignee: Matt Cheah
>Priority: Minor
>  Labels: newbie
>
> Hi expert,
> I am trying to use Livy to connect to the SparkR backend.
> This is related to 
> [https://stackoverflow.com/questions/53900995/livy-spark-r-issue]
>  
> The error message is:
> {code:java}
> Error in sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap, : 
> Auth secret not provided in environment.{code}
>  
> This is caused by the following check in spark-2.3.1/R/pkg/R/sparkR.R:
> {code:java}
> sparkR.sparkContext <- function(
> ...
>     authSecret <- Sys.getenv("SPARKR_BACKEND_AUTH_SECRET")
>     if (nchar(authSecret) == 0) {
>   stop("Auth secret not provided in environment.")
>     }
> ...
> )
> {code}
>  
> Best regards.






[jira] [Created] (SPARK-26523) Getting this error while reading from kinesis :- Could not read until the end sequence number of the range: SequenceNumberRange

2019-01-02 Thread CHIRAG YADAV (JIRA)
CHIRAG YADAV created SPARK-26523:


 Summary: Getting this error while reading from kinesis :- Could 
not read until the end sequence number of the range: SequenceNumberRange
 Key: SPARK-26523
 URL: https://issues.apache.org/jira/browse/SPARK-26523
 Project: Spark
  Issue Type: Brainstorming
  Components: DStreams, Spark Submit, Structured Streaming
Affects Versions: 2.4.0
Reporter: CHIRAG YADAV


I am using Spark to read data from a Kinesis stream, and after reading data for
some time I get this error: ERROR Executor: Exception in task 74.0 in stage 52.0
(TID 339) org.apache.spark.SparkException: Could not read until the end
sequence number of the range:
SequenceNumberRange(godel-logs,shardId-0007,49591040259365283625183097566179815847537156031957172338,49591040259365283625183097600068424422974441881954418802,4517)

Can someone please explain why I am getting this error and how to resolve it?
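For context, a minimal sketch of the kind of DStream consumer that can hit this path (an assumption about the setup, not taken from the report: it uses the spark-streaming-kinesis-asl receiver, the stream name comes from the error message, and the app name, region, endpoint and intervals are placeholders):
{code:python}
from pyspark import SparkContext, StorageLevel
from pyspark.streaming import StreamingContext
from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream

sc = SparkContext(appName="kinesis-reader")
ssc = StreamingContext(sc, 10)  # 10-second batches

stream = KinesisUtils.createStream(
    ssc, "kinesis-reader", "godel-logs",
    "https://kinesis.us-east-1.amazonaws.com", "us-east-1",
    InitialPositionInStream.TRIM_HORIZON, 10,
    StorageLevel.MEMORY_AND_DISK_2)

# The SparkException above is raised when a task has to re-read a shard's
# SequenceNumberRange directly from Kinesis (for example when a stored block is
# recomputed) and cannot read up to the recorded end sequence number, e.g. because
# the records have already aged out of the stream's retention period.
stream.count().pprint()

ssc.start()
ssc.awaitTermination()
{code}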
 






[jira] [Updated] (SPARK-26522) Auth secret error in RBackendAuthHandler

2019-01-02 Thread Euijun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Euijun updated SPARK-26522:
---
Shepherd: Apache Spark
  Labels: newbie  (was: )

> Auth secret error in RBackendAuthHandler
> 
>
> Key: SPARK-26522
> URL: https://issues.apache.org/jira/browse/SPARK-26522
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.1
>Reporter: Euijun
>Assignee: Matt Cheah
>Priority: Minor
>  Labels: newbie
>
> Hi expert,
> I am trying to use Livy to connect to the SparkR backend.
> This is related to 
> https://stackoverflow.com/questions/53900995/livy-spark-r-issue
>  
> The error message is:
> {code:java}
> Error in sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap, : 
> Auth secret not provided in environment.{code}
> This is caused by the following check in spark-2.3.1/R/pkg/R/sparkR.R:
>  
> {code:java}
> sparkR.sparkContext <- function(
> ...
>     authSecret <- Sys.getenv("SPARKR_BACKEND_AUTH_SECRET")
>     if (nchar(authSecret) == 0) {
>   stop("Auth secret not provided in environment.")
>     }
> ...
> )
> {code}
>  
> Best regards.






[jira] [Updated] (SPARK-26522) Auth secret error in RBackendAuthHandler

2019-01-02 Thread Euijun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Euijun updated SPARK-26522:
---
Affects Version/s: (was: 3.0.0)
   2.3.1
 Priority: Minor  (was: Major)
Fix Version/s: (was: 3.0.0)
  Description: 
Hi expert,

I am trying to use Livy to connect to the SparkR backend.

This is related to 
https://stackoverflow.com/questions/53900995/livy-spark-r-issue

 

The error message is:
{code:java}
Error in sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap, : Auth 
secret not provided in environment.{code}
This is caused by the following check in spark-2.3.1/R/pkg/R/sparkR.R:

 
{code:java}
sparkR.sparkContext <- function(

...

    authSecret <- Sys.getenv("SPARKR_BACKEND_AUTH_SECRET")
    if (nchar(authSecret) == 0) {
  stop("Auth secret not provided in environment.")
    }

...

)
{code}
 

Best regards.

  was:
This is a follow up to SPARK-26194, which aims to add auto-generated secrets 
similar to the YARN backend.

There's a desire to support different ways to generate and propagate these auth 
secrets (e.g. using things like Vault). Need to investigate:

- exposing configuration to support that
- changing SecurityManager so that it can delegate some of the secret-handling 
logic to custom implementations
- figuring out whether this can also be used in client-mode, where the driver 
is not created by the k8s backend in Spark.


  Component/s: (was: Kubernetes)
   SparkR
   Issue Type: Bug  (was: New Feature)
  Summary: Auth secret error in RBackendAuthHandler  (was: CLONE - 
Add configurable auth secret source in k8s backend)

> Auth secret error in RBackendAuthHandler
> 
>
> Key: SPARK-26522
> URL: https://issues.apache.org/jira/browse/SPARK-26522
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.1
>Reporter: Euijun
>Assignee: Matt Cheah
>Priority: Minor
>
> Hi expert,
> I am trying to use Livy to connect to the SparkR backend.
> This is related to 
> https://stackoverflow.com/questions/53900995/livy-spark-r-issue
>  
> The error message is:
> {code:java}
> Error in sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap, : 
> Auth secret not provided in environment.{code}
> This is caused by the following check in spark-2.3.1/R/pkg/R/sparkR.R:
>  
> {code:java}
> sparkR.sparkContext <- function(
> ...
>     authSecret <- Sys.getenv("SPARKR_BACKEND_AUTH_SECRET")
>     if (nchar(authSecret) == 0) {
>   stop("Auth secret not provided in environment.")
>     }
> ...
> )
> {code}
>  
> Best regards.






[jira] [Created] (SPARK-26522) CLONE - Add configurable auth secret source in k8s backend

2019-01-02 Thread Euijun (JIRA)
Euijun created SPARK-26522:
--

 Summary: CLONE - Add configurable auth secret source in k8s backend
 Key: SPARK-26522
 URL: https://issues.apache.org/jira/browse/SPARK-26522
 Project: Spark
  Issue Type: New Feature
  Components: Kubernetes
Affects Versions: 3.0.0
Reporter: Euijun
Assignee: Matt Cheah
 Fix For: 3.0.0


This is a follow up to SPARK-26194, which aims to add auto-generated secrets 
similar to the YARN backend.

There's a desire to support different ways to generate and propagate these auth 
secrets (e.g. using things like Vault). Need to investigate:

- exposing configuration to support that
- changing SecurityManager so that it can delegate some of the secret-handling 
logic to custom implementations
- figuring out whether this can also be used in client-mode, where the driver 
is not created by the k8s backend in Spark.







[jira] [Commented] (SPARK-26519) spark sql CHANGE COLUMN not working

2019-01-02 Thread suman gorantla (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732714#comment-16732714
 ] 

suman gorantla commented on SPARK-26519:


Dear Hyun,
The command below works in Hive/Impala but fails in Spark SQL and spark-submit.
I found the same error message in the logs as well. This seems to be incorrect
behavior; please let me know the reason.
ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col new_col
STRING AFTER old_col
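A possible workaround sketch (an editor's assumption, not something suggested in this thread): Spark SQL's parser rejects the FIRST/AFTER clause of CHANGE COLUMN, so if the goal is only to present the columns in a different order, that order can be exposed through a view. The view name is hypothetical; the table and column names come from the command above, and any remaining columns would need to be listed as well.
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Readers query the view, which fixes the column order without ALTER TABLE ... AFTER.
spark.sql("""
    CREATE OR REPLACE VIEW enterprisedatalakedev.tmptst_ordered AS
    SELECT old_col, new_col FROM enterprisedatalakedev.tmptst
""")
{code}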


-- 
సుమన్ గోరంట్ల


> spark sql   CHANGE COLUMN not working 
> --
>
> Key: SPARK-26519
> URL: https://issues.apache.org/jira/browse/SPARK-26519
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: !image-2019-01-02-14-25-34-594.png!
>Reporter: suman gorantla
>Priority: Major
> Attachments: sparksql error.PNG
>
>
> Dear Team,
> with Spark SQL I am unable to change the position of a newly added column to
> be after an existing column (old_column) in a Hive external table; please
> see the screenshot below.
> scala> sql("ALTER TABLE enterprisedatalakedev.tmptst ADD COLUMNs (new_col  
> STRING)")
>  res14: org.apache.spark.sql.DataFrame = []
>  sql("ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col new_col  
> STRING AFTER old_col ")
>  org.apache.spark.sql.catalyst.parser.ParseException:
>  Operation not allowed: ALTER TABLE table [PARTITION partition_spec] CHANGE 
> COLUMN ... FIRST | AFTER otherCol(line 1, pos 0)
> == SQL ==
>  ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col  new_col  
> STRING AFTER  old_col 
>  ^^^
> at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39)
>  at 
> org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:934)
>  at 
> org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:928)
>  at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
>  at 
> org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:928)
>  at 
> org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:55)
>  at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser$ChangeColumnContext.accept(SqlBaseParser.java:1485)
>  at 
> org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42)
>  at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71)
>  at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71)
>  at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
>  at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:70)
>  at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:69)
>  at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:68)
>  at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:97)
>  at 
> org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
>  at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68)
>  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:623)
>  ... 48 elided
> !image-2019-01-02-14-25-40-980.png!
>  






[jira] [Commented] (SPARK-26437) Decimal data becomes bigint to query, unable to query

2019-01-02 Thread zengxl (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732701#comment-16732701
 ] 

zengxl commented on SPARK-26437:


Thanks [~dongjoon]

> Decimal data becomes bigint to query, unable to query
> -
>
> Key: SPARK-26437
> URL: https://issues.apache.org/jira/browse/SPARK-26437
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.2, 2.3.1
>Reporter: zengxl
>Priority: Major
> Fix For: 3.0.0
>
>
> This is my SQL:
> create table tmp.tmp_test_6387_1224_spark  stored  as ORCFile  as select 0.00 
> as a
> select a from tmp.tmp_test_6387_1224_spark
> CREATE TABLE `tmp.tmp_test_6387_1224_spark`(
>  {color:#f79232} `a` decimal(2,2)){color}
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> When I query this table (using Hive or Spark SQL, the exception is the same),
> the following exception is thrown:
> *Caused by: java.io.EOFException: Reading BigInteger past EOF from compressed 
> stream Stream for column 1 kind DATA position: 0 length: 0 range: 0 offset: 0 
> limit: 0*
>     *at 
> org.apache.hadoop.hive.ql.io.orc.SerializationUtils.readBigInteger(SerializationUtils.java:176)*
>     *at 
> org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$DecimalTreeReader.next(TreeReaderFactory.java:1264)*
>     *at 
> org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.next(TreeReaderFactory.java:2004)*
>     *at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:1039)*
>  
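For what it's worth, the decimal(2,2) column comes from the inferred type of the bare literal 0.00, as the DDL above shows. A hedged sketch of writing the literal with an explicit, wider decimal type instead (an editor's assumption, untested against this report; whether it sidesteps the ORC reader EOF is also an assumption; the table name reuses the reporter's with a _cast suffix):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# The bare literal is inferred as a narrow decimal; on the reporter's build the
# resulting CTAS column is decimal(2,2), matching the DDL above.
spark.sql("SELECT 0.00 AS a").printSchema()

# Giving the literal an explicit precision/scale avoids producing a decimal(2,2)
# column in the first place.
spark.sql("""
    CREATE TABLE tmp.tmp_test_6387_1224_spark_cast STORED AS ORC AS
    SELECT CAST(0.00 AS DECIMAL(10,2)) AS a
""")
spark.sql("SELECT a FROM tmp.tmp_test_6387_1224_spark_cast").show()
{code}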






[jira] [Commented] (SPARK-26521) Sparksql cannot modify the field name of a table

2019-01-02 Thread zengxl (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732698#comment-16732698
 ] 

zengxl commented on SPARK-26521:


OK, thank you [~dongjoon]

> Sparksql cannot modify the field name of a table
> 
>
> Key: SPARK-26521
> URL: https://issues.apache.org/jira/browse/SPARK-26521
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: zengxl
>Priority: Major
>
> When I alter the table using Spark SQL, it throws an exception:
>  
>  alter table tmp.testchange change column i m string;
> *Error in query: ALTER TABLE CHANGE COLUMN is not supported for changing 
> column 'i' with type 'StringType' to 'm' with type 'StringType';*
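A hedged workaround sketch (an editor's assumption, not confirmed in this thread): since Spark SQL rejects ALTER TABLE ... CHANGE COLUMN for renames, the rename can be done through the DataFrame API and written to a new table. The target table name is hypothetical; the source table and column names come from the report.
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Rename column i to m and materialize the result under a new table name.
renamed = spark.table("tmp.testchange").withColumnRenamed("i", "m")
renamed.write.mode("overwrite").saveAsTable("tmp.testchange_renamed")
{code}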






[jira] [Updated] (SPARK-26512) Spark 2.4.0 is not working with Hadoop 2.8.3 in windows 10?

2019-01-02 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-26512:
--
Flags:   (was: Important)

> Spark 2.4.0 is not working with Hadoop 2.8.3 in windows 10?
> ---
>
> Key: SPARK-26512
> URL: https://issues.apache.org/jira/browse/SPARK-26512
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell, YARN
>Affects Versions: 2.4.0
> Environment: operating system : Windows 10
> Spark Version : 2.4.0
> Hadoop Version : 2.8.3
>Reporter: Anubhav Jain
>Priority: Minor
>  Labels: windows
> Attachments: log.png
>
>
> I have installed Hadoop version 2.8.3 in my Windows 10 environment and it is
> working fine. Now when I try to install Apache Spark (version 2.4.0) with YARN
> as the cluster manager, it does not work. When I submit a Spark job using
> spark-submit for testing, it shows up under the ACCEPTED tab in the YARN UI
> and after that it fails.






[jira] [Resolved] (SPARK-26521) Sparksql cannot modify the field name of a table

2019-01-02 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-26521.
---
Resolution: Duplicate

Hi, [~zengxl]. Thank you for reporting, but this duplicates SPARK-24602.
Please search existing JIRA issues before filing a new one next time.

> Sparksql cannot modify the field name of a table
> 
>
> Key: SPARK-26521
> URL: https://issues.apache.org/jira/browse/SPARK-26521
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: zengxl
>Priority: Major
>
> When I alter the table using Spark SQL, it throws an exception:
>  
>  alter table tmp.testchange change column i m string;
> *Error in query: ALTER TABLE CHANGE COLUMN is not supported for changing 
> column 'i' with type 'StringType' to 'm' with type 'StringType';*






[jira] [Closed] (SPARK-26521) Sparksql cannot modify the field name of a table

2019-01-02 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun closed SPARK-26521.
-

> Sparksql cannot modify the field name of a table
> 
>
> Key: SPARK-26521
> URL: https://issues.apache.org/jira/browse/SPARK-26521
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: zengxl
>Priority: Major
>
> When I alter the table using Spark SQL, it throws an exception:
>  
>  alter table tmp.testchange change column i m string;
> *Error in query: ALTER TABLE CHANGE COLUMN is not supported for changing 
> column 'i' with type 'StringType' to 'm' with type 'StringType';*






[jira] [Commented] (SPARK-22951) count() after dropDuplicates() on emptyDataFrame returns incorrect value

2019-01-02 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-22951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732659#comment-16732659
 ] 

Dongjoon Hyun commented on SPARK-22951:
---

Hi, [~feng...@databricks.com] and [~lian cheng].
Since this is a correctness issue reported on branch-2.2, I'll backport this 
for Spark 2.2.3.

> count() after dropDuplicates() on emptyDataFrame returns incorrect value
> 
>
> Key: SPARK-22951
> URL: https://issues.apache.org/jira/browse/SPARK-22951
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.2, 2.2.0, 2.3.0
>Reporter: Michael Dreibelbis
>Assignee: Feng Liu
>Priority: Major
>  Labels: correctness
> Fix For: 2.3.0
>
>
> Here is a minimal Spark application to reproduce the issue:
> {code}
> import org.apache.spark.sql.SQLContext
> import org.apache.spark.{SparkConf, SparkContext}
> object DropDupesApp extends App {
>   
>   override def main(args: Array[String]): Unit = {
> val conf = new SparkConf()
>   .setAppName("test")
>   .setMaster("local")
> val sc = new SparkContext(conf)
> val sql = SQLContext.getOrCreate(sc)
> assert(sql.emptyDataFrame.count == 0) // expected
> assert(sql.emptyDataFrame.dropDuplicates.count == 1) // unexpected
>   }
>   
> }
> {code}






[jira] [Updated] (SPARK-22951) count() after dropDuplicates() on emptyDataFrame returns incorrect value

2019-01-02 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-22951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-22951:
--
Target Version/s: 2.3.0, 2.2.3  (was: 2.3.0)

> count() after dropDuplicates() on emptyDataFrame returns incorrect value
> 
>
> Key: SPARK-22951
> URL: https://issues.apache.org/jira/browse/SPARK-22951
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.2, 2.2.0, 2.3.0
>Reporter: Michael Dreibelbis
>Assignee: Feng Liu
>Priority: Major
>  Labels: correctness
> Fix For: 2.3.0
>
>
> Here is a minimal Spark application to reproduce the issue:
> {code}
> import org.apache.spark.sql.SQLContext
> import org.apache.spark.{SparkConf, SparkContext}
> object DropDupesApp extends App {
>   
>   override def main(args: Array[String]): Unit = {
> val conf = new SparkConf()
>   .setAppName("test")
>   .setMaster("local")
> val sc = new SparkContext(conf)
> val sql = SQLContext.getOrCreate(sc)
> assert(sql.emptyDataFrame.count == 0) // expected
> assert(sql.emptyDataFrame.dropDuplicates.count == 1) // unexpected
>   }
>   
> }
> {code}






[jira] [Resolved] (SPARK-26019) pyspark/accumulators.py: "TypeError: object of type 'NoneType' has no len()" in authenticate_and_accum_updates()

2019-01-02 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-26019.
--
Resolution: Fixed

Issue resolved by pull request 23337
[https://github.com/apache/spark/pull/23337]

> pyspark/accumulators.py: "TypeError: object of type 'NoneType' has no len()" 
> in authenticate_and_accum_updates()
> 
>
> Key: SPARK-26019
> URL: https://issues.apache.org/jira/browse/SPARK-26019
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Ruslan Dautkhanov
>Assignee: Imran Rashid
>Priority: Major
> Fix For: 2.3.3, 2.4.1
>
>
> pyspark's accumulator server expects a secure py4j connection between python 
> and the jvm.  Spark will normally create a secure connection, but there is a 
> public api which allows you to pass in your own py4j connection.  (this is 
> used by zeppelin, at least.)  When this happens, you get an error like:
> {noformat}
> pyspark/accumulators.py: "TypeError: object of type 'NoneType' has no len()" 
> in authenticate_and_accum_updates()
> {noformat}
> We should change pyspark to
> 1) warn loudly if a user passes in an insecure connection
> 1a) I'd like to suggest that we even error out, unless the user actively 
> opts-in with a config like "spark.python.allowInsecurePy4j=true"
> 2) The accumulator server should be changed to allow insecure connections.
> note that SPARK-26349 will disallow insecure connections completely in 3.0.
>  
> More info on how this occurs:
> {code:python}
> Exception happened during processing of request from ('127.0.0.1', 43418)
> 
> Traceback (most recent call last):
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 
> 290, in _handle_request_noblock
>     self.process_request(request, client_address)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 
> 318, in process_request
>     self.finish_request(request, client_address)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 
> 331, in finish_request
>     self.RequestHandlerClass(request, client_address, self)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 
> 652, in __init__
>     self.handle()
>   File 
> "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py",
>  line 263, in handle
>     poll(authenticate_and_accum_updates)
>   File 
> "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py",
>  line 238, in poll
>     if func():
>   File 
> "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py",
>  line 251, in authenticate_and_accum_updates
>     received_token = self.rfile.read(len(auth_token))
> TypeError: object of type 'NoneType' has no len()
>  
> {code}
>  
> Error happens here:
> https://github.com/apache/spark/blob/cb90617f894fd51a092710271823ec7d1cd3a668/python/pyspark/accumulators.py#L254
> The PySpark code was just running a simple pipeline of 
> binary_rdd = sc.binaryRecords(full_file_path, record_length).map(lambda .. )
> and then converting it to a dataframe and running a count on it.
> It seems the error is flaky: on the next rerun it didn't happen. (But
> accumulators don't actually work.)
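A hypothetical sketch of the guard described in points 1) and 1a) above, not the actual pyspark change; the config name spark.python.allowInsecurePy4j is taken from the description:
{code:python}
import warnings


def check_gateway_security(auth_token, conf):
    """Reject or warn about a user-supplied, unauthenticated py4j gateway."""
    if auth_token is None:  # no auth token means an insecure gateway was passed in
        if conf.get("spark.python.allowInsecurePy4j", "false").lower() != "true":
            raise RuntimeError(
                "An insecure py4j gateway was provided; set "
                "spark.python.allowInsecurePy4j=true to opt in.")
        warnings.warn("Using an insecure py4j gateway; accumulator updates "
                      "cannot be authenticated.")
{code}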






[jira] [Assigned] (SPARK-26019) pyspark/accumulators.py: "TypeError: object of type 'NoneType' has no len()" in authenticate_and_accum_updates()

2019-01-02 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-26019:


Assignee: Imran Rashid

> pyspark/accumulators.py: "TypeError: object of type 'NoneType' has no len()" 
> in authenticate_and_accum_updates()
> 
>
> Key: SPARK-26019
> URL: https://issues.apache.org/jira/browse/SPARK-26019
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Ruslan Dautkhanov
>Assignee: Imran Rashid
>Priority: Major
> Fix For: 2.3.3, 2.4.1
>
>
> pyspark's accumulator server expects a secure py4j connection between python 
> and the jvm.  Spark will normally create a secure connection, but there is a 
> public api which allows you to pass in your own py4j connection.  (this is 
> used by zeppelin, at least.)  When this happens, you get an error like:
> {noformat}
> pyspark/accumulators.py: "TypeError: object of type 'NoneType' has no len()" 
> in authenticate_and_accum_updates()
> {noformat}
> We should change pyspark to
> 1) warn loudly if a user passes in an insecure connection
> 1a) I'd like to suggest that we even error out, unless the user actively 
> opts-in with a config like "spark.python.allowInsecurePy4j=true"
> 2) The accumulator server should be changed to allow insecure connections.
> note that SPARK-26349 will disallow insecure connections completely in 3.0.
>  
> More info on how this occurs:
> {code:python}
> Exception happened during processing of request from ('127.0.0.1', 43418)
> 
> Traceback (most recent call last):
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 
> 290, in _handle_request_noblock
>     self.process_request(request, client_address)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 
> 318, in process_request
>     self.finish_request(request, client_address)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 
> 331, in finish_request
>     self.RequestHandlerClass(request, client_address, self)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 
> 652, in __init__
>     self.handle()
>   File 
> "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py",
>  line 263, in handle
>     poll(authenticate_and_accum_updates)
>   File 
> "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py",
>  line 238, in poll
>     if func():
>   File 
> "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py",
>  line 251, in authenticate_and_accum_updates
>     received_token = self.rfile.read(len(auth_token))
> TypeError: object of type 'NoneType' has no len()
>  
> {code}
>  
> Error happens here:
> https://github.com/apache/spark/blob/cb90617f894fd51a092710271823ec7d1cd3a668/python/pyspark/accumulators.py#L254
> The PySpark code was just running a simple pipeline of 
> binary_rdd = sc.binaryRecords(full_file_path, record_length).map(lambda .. )
> and then converting it to a dataframe and running a count on it.
> It seems the error is flaky: on the next rerun it didn't happen. (But
> accumulators don't actually work.)






[jira] [Updated] (SPARK-26019) pyspark/accumulators.py: "TypeError: object of type 'NoneType' has no len()" in authenticate_and_accum_updates()

2019-01-02 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-26019:
-
Fix Version/s: 2.4.1
   2.3.3

> pyspark/accumulators.py: "TypeError: object of type 'NoneType' has no len()" 
> in authenticate_and_accum_updates()
> 
>
> Key: SPARK-26019
> URL: https://issues.apache.org/jira/browse/SPARK-26019
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Ruslan Dautkhanov
>Priority: Major
> Fix For: 2.3.3, 2.4.1
>
>
> pyspark's accumulator server expects a secure py4j connection between python 
> and the jvm.  Spark will normally create a secure connection, but there is a 
> public api which allows you to pass in your own py4j connection.  (this is 
> used by zeppelin, at least.)  When this happens, you get an error like:
> {noformat}
> pyspark/accumulators.py: "TypeError: object of type 'NoneType' has no len()" 
> in authenticate_and_accum_updates()
> {noformat}
> We should change pyspark to
> 1) warn loudly if a user passes in an insecure connection
> 1a) I'd like to suggest that we even error out, unless the user actively 
> opts-in with a config like "spark.python.allowInsecurePy4j=true"
> 2) The accumulator server should be changed to allow insecure connections.
> note that SPARK-26349 will disallow insecure connections completely in 3.0.
>  
> More info on how this occurs:
> {code:python}
> Exception happened during processing of request from ('127.0.0.1', 43418)
> 
> Traceback (most recent call last):
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 
> 290, in _handle_request_noblock
>     self.process_request(request, client_address)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 
> 318, in process_request
>     self.finish_request(request, client_address)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 
> 331, in finish_request
>     self.RequestHandlerClass(request, client_address, self)
>   File "/opt/cloudera/parcels/Anaconda/lib/python2.7/SocketServer.py", line 
> 652, in __init__
>     self.handle()
>   File 
> "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py",
>  line 263, in handle
>     poll(authenticate_and_accum_updates)
>   File 
> "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py",
>  line 238, in poll
>     if func():
>   File 
> "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/lib/spark2/python/lib/pyspark.zip/pyspark/accumulators.py",
>  line 251, in authenticate_and_accum_updates
>     received_token = self.rfile.read(len(auth_token))
> TypeError: object of type 'NoneType' has no len()
>  
> {code}
>  
> Error happens here:
> https://github.com/apache/spark/blob/cb90617f894fd51a092710271823ec7d1cd3a668/python/pyspark/accumulators.py#L254
> The PySpark code was just running a simple pipeline of 
> binary_rdd = sc.binaryRecords(full_file_path, record_length).map(lambda .. )
> and then converting it to a dataframe and running a count on it.
> It seems the error is flaky: on the next rerun it didn't happen. (But
> accumulators don't actually work.)






[jira] [Resolved] (SPARK-26403) DataFrame pivot using array column fails with "Unsupported literal type class"

2019-01-02 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-26403.
--
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 23349
[https://github.com/apache/spark/pull/23349]

> DataFrame pivot using array column fails with "Unsupported literal type class"
> --
>
> Key: SPARK-26403
> URL: https://issues.apache.org/jira/browse/SPARK-26403
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Huon Wilson
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 3.0.0
>
>
> Doing a pivot (using the {{pivot(pivotColumn: Column)}} overload) on a column 
> containing arrays results in a runtime error:
> {code:none}
> scala> val df = Seq((1, Seq("a", "x"), 2), (1, Seq("b"), 3), (2, Seq("a", 
> "x"), 10), (3, Seq(), 100)).toDF("x", "s", "y")
> df: org.apache.spark.sql.DataFrame = [x: int, s: array<string> ... 1 more
> field]
> scala> df.show
> +---+--+---+
> |  x| s|  y|
> +---+--+---+
> |  1|[a, x]|  2|
> |  1|   [b]|  3|
> |  2|[a, x]| 10|
> |  3|[]|100|
> +---+--+---+
> scala> df.groupBy("x").pivot("s").agg(collect_list($"y")).show
> java.lang.RuntimeException: Unsupported literal type class 
> scala.collection.mutable.WrappedArray$ofRef WrappedArray()
>   at 
> org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:78)
>   at 
> org.apache.spark.sql.RelationalGroupedDataset$$anonfun$pivot$1.apply(RelationalGroupedDataset.scala:419)
>   at 
> org.apache.spark.sql.RelationalGroupedDataset$$anonfun$pivot$1.apply(RelationalGroupedDataset.scala:419)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.spark.sql.RelationalGroupedDataset.pivot(RelationalGroupedDataset.scala:419)
>   at 
> org.apache.spark.sql.RelationalGroupedDataset.pivot(RelationalGroupedDataset.scala:397)
>   at 
> org.apache.spark.sql.RelationalGroupedDataset.pivot(RelationalGroupedDataset.scala:317)
>   ... 49 elided
> {code}
> However, this doesn't seem to be a fundamental limitation with {{pivot}}, as 
> it works fine using the {{pivot(pivotColumn: Column, values: Seq[Any])}} 
> overload, as long as the arrays are mapped to the {{Array}} type:
> {code:none}
> scala> val rawValues = df.select("s").distinct.sort("s").collect
> rawValues: Array[org.apache.spark.sql.Row] = Array([WrappedArray()], 
> [WrappedArray(a, x)], [WrappedArray(b)])
> scala> val values = rawValues.map(_.getSeq[String](0).to[Array])
> values: Array[Array[String]] = Array(Array(), Array(a, x), Array(b))
> scala> df.groupBy("x").pivot("s", values).agg(collect_list($"y")).show
> +---+-+--+---+
> |  x|   []|[a, x]|[b]|
> +---+-+--+---+
> |  1|   []|   [2]|[3]|
> |  3|[100]|[]| []|
> |  2|   []|  [10]| []|
> +---+-+--+---+
> {code}
> It would be nice if {{pivot}} was more resilient to Spark's own 
> representation of array columns, and so the first version worked.






[jira] [Assigned] (SPARK-26403) DataFrame pivot using array column fails with "Unsupported literal type class"

2019-01-02 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-26403:


Assignee: Hyukjin Kwon

> DataFrame pivot using array column fails with "Unsupported literal type class"
> --
>
> Key: SPARK-26403
> URL: https://issues.apache.org/jira/browse/SPARK-26403
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Huon Wilson
>Assignee: Hyukjin Kwon
>Priority: Minor
>
> Doing a pivot (using the {{pivot(pivotColumn: Column)}} overload) on a column 
> containing arrays results in a runtime error:
> {code:none}
> scala> val df = Seq((1, Seq("a", "x"), 2), (1, Seq("b"), 3), (2, Seq("a", 
> "x"), 10), (3, Seq(), 100)).toDF("x", "s", "y")
> df: org.apache.spark.sql.DataFrame = [x: int, s: array<string> ... 1 more
> field]
> scala> df.show
> +---+--+---+
> |  x| s|  y|
> +---+--+---+
> |  1|[a, x]|  2|
> |  1|   [b]|  3|
> |  2|[a, x]| 10|
> |  3|[]|100|
> +---+--+---+
> scala> df.groupBy("x").pivot("s").agg(collect_list($"y")).show
> java.lang.RuntimeException: Unsupported literal type class 
> scala.collection.mutable.WrappedArray$ofRef WrappedArray()
>   at 
> org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:78)
>   at 
> org.apache.spark.sql.RelationalGroupedDataset$$anonfun$pivot$1.apply(RelationalGroupedDataset.scala:419)
>   at 
> org.apache.spark.sql.RelationalGroupedDataset$$anonfun$pivot$1.apply(RelationalGroupedDataset.scala:419)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.spark.sql.RelationalGroupedDataset.pivot(RelationalGroupedDataset.scala:419)
>   at 
> org.apache.spark.sql.RelationalGroupedDataset.pivot(RelationalGroupedDataset.scala:397)
>   at 
> org.apache.spark.sql.RelationalGroupedDataset.pivot(RelationalGroupedDataset.scala:317)
>   ... 49 elided
> {code}
> However, this doesn't seem to be a fundamental limitation with {{pivot}}, as 
> it works fine using the {{pivot(pivotColumn: Column, values: Seq[Any])}} 
> overload, as long as the arrays are mapped to the {{Array}} type:
> {code:none}
> scala> val rawValues = df.select("s").distinct.sort("s").collect
> rawValues: Array[org.apache.spark.sql.Row] = Array([WrappedArray()], 
> [WrappedArray(a, x)], [WrappedArray(b)])
> scala> val values = rawValues.map(_.getSeq[String](0).to[Array])
> values: Array[Array[String]] = Array(Array(), Array(a, x), Array(b))
> scala> df.groupBy("x").pivot("s", values).agg(collect_list($"y")).show
> +---+-+--+---+
> |  x|   []|[a, x]|[b]|
> +---+-+--+---+
> |  1|   []|   [2]|[3]|
> |  3|[100]|[]| []|
> |  2|   []|  [10]| []|
> +---+-+--+---+
> {code}
> It would be nice if {{pivot}} was more resilient to Spark's own 
> representation of array columns, and so the first version worked.






[jira] [Updated] (SPARK-25591) PySpark Accumulators with multiple PythonUDFs

2019-01-02 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-25591:
-
Fix Version/s: 2.3.3
   2.2.3

> PySpark Accumulators with multiple PythonUDFs
> -
>
> Key: SPARK-25591
> URL: https://issues.apache.org/jira/browse/SPARK-25591
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.2
>Reporter: Abdeali Kothari
>Assignee: Liang-Chi Hsieh
>Priority: Blocker
>  Labels: correctness
> Fix For: 2.2.3, 2.3.3, 2.4.0
>
>
> When there are multiple Python UDFs, only the last Python UDF's accumulator
> gets updated.
> {code:python}
> import pyspark
> from pyspark.sql import SparkSession, Row
> from pyspark.sql import functions as F
> from pyspark.sql import types as T
> from pyspark import AccumulatorParam
>
> spark = SparkSession.builder.getOrCreate()
> spark.sparkContext.setLogLevel("ERROR")
> test_accum = spark.sparkContext.accumulator(0.0)
> SHUFFLE = False
>
> def main(data):
>     print(">>> Check0", test_accum.value)
>
>     def test(x):
>         global test_accum
>         test_accum += 1.0
>         return x
>
>     print(">>> Check1", test_accum.value)
>
>     def test2(x):
>         global test_accum
>         test_accum += 100.0
>         return x
>
>     print(">>> Check2", test_accum.value)
>     func_udf = F.udf(test, T.DoubleType())
>     print(">>> Check3", test_accum.value)
>     func_udf2 = F.udf(test2, T.DoubleType())
>     print(">>> Check4", test_accum.value)
>     data = data.withColumn("out1", func_udf(data["a"]))
>     if SHUFFLE:
>         data = data.repartition(2)
>     print(">>> Check5", test_accum.value)
>     data = data.withColumn("out2", func_udf2(data["b"]))
>     if SHUFFLE:
>         data = data.repartition(2)
>     print(">>> Check6", test_accum.value)
>     data.show()  # ACTION
>     print(">>> Check7", test_accum.value)
>     return data
>
> df = spark.createDataFrame([
>     [1.0, 2.0]
> ], schema=T.StructType([T.StructField(field_name, T.DoubleType(), True)
>                         for field_name in ["a", "b"]]))
> df2 = main(df)
> {code}
> {code:python}
> # Output 1 - with SHUFFLE=False
> ...
> # >>> Check7 100.0
> # Output 2 - with SHUFFLE=True
> ...
> # >>> Check7 101.0
> {code}
> Basically it looks like:
>  - the accumulator only works for the last UDF before a shuffle-like operation






[jira] [Updated] (SPARK-25591) PySpark Accumulators with multiple PythonUDFs

2019-01-02 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25591:
--
Target Version/s: 2.4.0, 2.2.3, 2.3.3  (was: 2.4.0)

> PySpark Accumulators with multiple PythonUDFs
> -
>
> Key: SPARK-25591
> URL: https://issues.apache.org/jira/browse/SPARK-25591
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.2
>Reporter: Abdeali Kothari
>Assignee: Liang-Chi Hsieh
>Priority: Blocker
>  Labels: correctness
> Fix For: 2.4.0
>
>
> When there are multiple Python UDFs, only the last Python UDF's accumulator
> gets updated.
> {code:python}
> import pyspark
> from pyspark.sql import SparkSession, Row
> from pyspark.sql import functions as F
> from pyspark.sql import types as T
> from pyspark import AccumulatorParam
>
> spark = SparkSession.builder.getOrCreate()
> spark.sparkContext.setLogLevel("ERROR")
> test_accum = spark.sparkContext.accumulator(0.0)
> SHUFFLE = False
>
> def main(data):
>     print(">>> Check0", test_accum.value)
>
>     def test(x):
>         global test_accum
>         test_accum += 1.0
>         return x
>
>     print(">>> Check1", test_accum.value)
>
>     def test2(x):
>         global test_accum
>         test_accum += 100.0
>         return x
>
>     print(">>> Check2", test_accum.value)
>     func_udf = F.udf(test, T.DoubleType())
>     print(">>> Check3", test_accum.value)
>     func_udf2 = F.udf(test2, T.DoubleType())
>     print(">>> Check4", test_accum.value)
>     data = data.withColumn("out1", func_udf(data["a"]))
>     if SHUFFLE:
>         data = data.repartition(2)
>     print(">>> Check5", test_accum.value)
>     data = data.withColumn("out2", func_udf2(data["b"]))
>     if SHUFFLE:
>         data = data.repartition(2)
>     print(">>> Check6", test_accum.value)
>     data.show()  # ACTION
>     print(">>> Check7", test_accum.value)
>     return data
>
> df = spark.createDataFrame([
>     [1.0, 2.0]
> ], schema=T.StructType([T.StructField(field_name, T.DoubleType(), True)
>                         for field_name in ["a", "b"]]))
> df2 = main(df)
> {code}
> {code:python}
> # Output 1 - with SHUFFLE=False
> ...
> # >>> Check7 100.0
> # Output 2 - with SHUFFLE=True
> ...
> # >>> Check7 101.0
> {code}
> Basically it looks like:
>  - the accumulator only works for the last UDF before a shuffle-like operation






[jira] [Updated] (SPARK-26457) Show hadoop configurations in HistoryServer environment tab

2019-01-02 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-26457:

Priority: Minor  (was: Major)

> Show hadoop configurations in HistoryServer environment tab
> ---
>
> Key: SPARK-26457
> URL: https://issues.apache.org/jira/browse/SPARK-26457
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core, Web UI
>Affects Versions: 2.3.2, 2.4.0
> Environment: Maybe it is good to show some Hadoop configurations in the
> HistoryServer environment tab for debugging Hadoop-related bugs
>Reporter: deshanxiao
>Priority: Minor
>







[jira] [Resolved] (SPARK-26516) zeppelin with spark on mesos: environment variable setting

2019-01-02 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-26516.
-
Resolution: Invalid

> zeppelin with spark on mesos: environment variable setting
> --
>
> Key: SPARK-26516
> URL: https://issues.apache.org/jira/browse/SPARK-26516
> Project: Spark
>  Issue Type: Question
>  Components: Mesos, Spark Core
>Affects Versions: 2.4.0
>Reporter: Yui Hirasawa
>Priority: Major
>
> I am trying to use zeppelin with spark on mesos mode following [Apache 
> Zeppelin on Spark Cluster 
> Mode|https://zeppelin.apache.org/docs/0.8.0/setup/deployment/spark_cluster_mode.html#4-configure-spark-interpreter-in-zeppelin-1].
> In the instruction, we should set these environment variables:
> {code:java}
> export MASTER=mesos://127.0.1.1:5050
> export MESOS_NATIVE_JAVA_LIBRARY=[PATH OF libmesos.so]
> export SPARK_HOME=[PATH OF SPARK HOME]
> {code}
> As far as I know, these environment variables are used by Zeppelin, so they
> should be set on localhost rather than in the Docker container (if I am wrong,
> please correct me).
> But Mesos and Spark are running inside a Docker container, so do we need to
> set these environment variables so that they point to the paths inside the
> Docker container? If so, how should one achieve that?
> Thanks in advance.






[jira] [Updated] (SPARK-26513) Trigger GC on executor node idle

2019-01-02 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-26513:

Fix Version/s: (was: 3.0.0)

> Trigger GC on executor node idle
> 
>
> Key: SPARK-26513
> URL: https://issues.apache.org/jira/browse/SPARK-26513
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Sandish Kumar HN
>Priority: Major
>
>  
> Correct me if I'm wrong.
>  *Stage:*
>       On a large cluster, each stage runs on a number of executors, where a few
> executors finish their tasks first and then wait for the rest of the stage,
> whose remaining tasks are executed by other executor nodes in the cluster. A
> stage is only completed when all of its tasks finish execution, and the next
> stage cannot start until all tasks of the current stage are completed.
>  
> Why don't we trigger GC when an executor node is idle, waiting for the
> remaining tasks to finish? The executor has to wait anyway, which can take at
> least a couple of seconds, while a GC run takes at most around 300 ms.
>  
> I have proposed a small code snippet which triggers GC when no tasks are
> running and heap usage on the current executor node is above a given
> threshold.
> This could improve performance for long-running Spark jobs.
> We referred to this paper
> [https://www.computer.org/csdl/proceedings/hipc/2016/5411/00/07839705.pdf]
> and found performance improvements in our long-running Spark batch jobs.






[jira] [Commented] (SPARK-26516) zeppelin with spark on mesos: environment variable setting

2019-01-02 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732587#comment-16732587
 ] 

Saisai Shao commented on SPARK-26516:
-

Questions should go to the user@spark mailing list. Also, if this is a Zeppelin
problem, it would be better to ask on the Zeppelin mailing list.

> zeppelin with spark on mesos: environment variable setting
> --
>
> Key: SPARK-26516
> URL: https://issues.apache.org/jira/browse/SPARK-26516
> Project: Spark
>  Issue Type: Question
>  Components: Mesos, Spark Core
>Affects Versions: 2.4.0
>Reporter: Yui Hirasawa
>Priority: Major
>
> I am trying to use zeppelin with spark on mesos mode following [Apache 
> Zeppelin on Spark Cluster 
> Mode|https://zeppelin.apache.org/docs/0.8.0/setup/deployment/spark_cluster_mode.html#4-configure-spark-interpreter-in-zeppelin-1].
> In the instruction, we should set these environment variables:
> {code:java}
> export MASTER=mesos://127.0.1.1:5050
> export MESOS_NATIVE_JAVA_LIBRARY=[PATH OF libmesos.so]
> export SPARK_HOME=[PATH OF SPARK HOME]
> {code}
> As far as I know, these environment variables are used by Zeppelin, so they
> should be set on localhost rather than in the Docker container (if I am wrong,
> please correct me).
> But Mesos and Spark are running inside a Docker container, so do we need to
> set these environment variables so that they point to the paths inside the
> Docker container? If so, how should one achieve that?
> Thanks in advance.






[jira] [Commented] (SPARK-25591) PySpark Accumulators with multiple PythonUDFs

2019-01-02 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732574#comment-16732574
 ] 

Hyukjin Kwon commented on SPARK-25591:
--

+1

> PySpark Accumulators with multiple PythonUDFs
> -
>
> Key: SPARK-25591
> URL: https://issues.apache.org/jira/browse/SPARK-25591
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.2
>Reporter: Abdeali Kothari
>Assignee: Liang-Chi Hsieh
>Priority: Blocker
>  Labels: correctness
> Fix For: 2.4.0
>
>
> When there are multiple Python UDFs, only the last Python UDF's accumulator
> gets updated.
> {code:python}
> import pyspark
> from pyspark.sql import SparkSession, Row
> from pyspark.sql import functions as F
> from pyspark.sql import types as T
> from pyspark import AccumulatorParam
>
> spark = SparkSession.builder.getOrCreate()
> spark.sparkContext.setLogLevel("ERROR")
> test_accum = spark.sparkContext.accumulator(0.0)
> SHUFFLE = False
>
> def main(data):
>     print(">>> Check0", test_accum.value)
>
>     def test(x):
>         global test_accum
>         test_accum += 1.0
>         return x
>
>     print(">>> Check1", test_accum.value)
>
>     def test2(x):
>         global test_accum
>         test_accum += 100.0
>         return x
>
>     print(">>> Check2", test_accum.value)
>     func_udf = F.udf(test, T.DoubleType())
>     print(">>> Check3", test_accum.value)
>     func_udf2 = F.udf(test2, T.DoubleType())
>     print(">>> Check4", test_accum.value)
>     data = data.withColumn("out1", func_udf(data["a"]))
>     if SHUFFLE:
>         data = data.repartition(2)
>     print(">>> Check5", test_accum.value)
>     data = data.withColumn("out2", func_udf2(data["b"]))
>     if SHUFFLE:
>         data = data.repartition(2)
>     print(">>> Check6", test_accum.value)
>     data.show()  # ACTION
>     print(">>> Check7", test_accum.value)
>     return data
>
> df = spark.createDataFrame([
>     [1.0, 2.0]
> ], schema=T.StructType([T.StructField(field_name, T.DoubleType(), True)
>                         for field_name in ["a", "b"]]))
> df2 = main(df)
> {code}
> {code:python}
> # Output 1 - with SHUFFLE=False
> ...
> # >>> Check7 100.0
> # Output 2 - with SHUFFLE=True
> ...
> # >>> Check7 101.0
> {code}
> Basically it looks like:
>  - the accumulator only works for the last UDF before a shuffle-like operation






[jira] [Commented] (SPARK-25591) PySpark Accumulators with multiple PythonUDFs

2019-01-02 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732573#comment-16732573
 ] 

Dongjoon Hyun commented on SPARK-25591:
---

Thank you for confirming, [~viirya]. Yes. Please make two PRs for them.

> PySpark Accumulators with multiple PythonUDFs
> -
>
> Key: SPARK-25591
> URL: https://issues.apache.org/jira/browse/SPARK-25591
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.2
>Reporter: Abdeali Kothari
>Assignee: Liang-Chi Hsieh
>Priority: Blocker
>  Labels: correctness
> Fix For: 2.4.0
>
>
> When there are multiple Python UDFs, only the last Python UDF's accumulator
> gets updated.
> {code:python}
> import pyspark
> from pyspark.sql import SparkSession, Row
> from pyspark.sql import functions as F
> from pyspark.sql import types as T
> from pyspark import AccumulatorParam
>
> spark = SparkSession.builder.getOrCreate()
> spark.sparkContext.setLogLevel("ERROR")
> test_accum = spark.sparkContext.accumulator(0.0)
> SHUFFLE = False
>
> def main(data):
>     print(">>> Check0", test_accum.value)
>
>     def test(x):
>         global test_accum
>         test_accum += 1.0
>         return x
>
>     print(">>> Check1", test_accum.value)
>
>     def test2(x):
>         global test_accum
>         test_accum += 100.0
>         return x
>
>     print(">>> Check2", test_accum.value)
>     func_udf = F.udf(test, T.DoubleType())
>     print(">>> Check3", test_accum.value)
>     func_udf2 = F.udf(test2, T.DoubleType())
>     print(">>> Check4", test_accum.value)
>     data = data.withColumn("out1", func_udf(data["a"]))
>     if SHUFFLE:
>         data = data.repartition(2)
>     print(">>> Check5", test_accum.value)
>     data = data.withColumn("out2", func_udf2(data["b"]))
>     if SHUFFLE:
>         data = data.repartition(2)
>     print(">>> Check6", test_accum.value)
>     data.show()  # ACTION
>     print(">>> Check7", test_accum.value)
>     return data
>
> df = spark.createDataFrame([
>     [1.0, 2.0]
> ], schema=T.StructType([T.StructField(field_name, T.DoubleType(), True)
>                         for field_name in ["a", "b"]]))
> df2 = main(df)
> {code}
> {code:python}
> # Output 1 - with SHUFFLE=False
> ...
> # >>> Check7 100.0
> # Output 2 - with SHUFFLE=True
> ...
> # >>> Check7 101.0
> {code}
> Basically it looks like:
>  - the accumulator only works for the last UDF before a shuffle-like operation



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25591) PySpark Accumulators with multiple PythonUDFs

2019-01-02 Thread Liang-Chi Hsieh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732569#comment-16732569
 ] 

Liang-Chi Hsieh commented on SPARK-25591:
-

I can make backport PRs if you need. [~dongjoon]

> PySpark Accumulators with multiple PythonUDFs
> -
>
> Key: SPARK-25591
> URL: https://issues.apache.org/jira/browse/SPARK-25591
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.2
>Reporter: Abdeali Kothari
>Assignee: Liang-Chi Hsieh
>Priority: Blocker
>  Labels: correctness
> Fix For: 2.4.0
>
>
> When having multiple Python UDFs - the last Python UDF's accumulator is the 
> only accumulator that gets updated.
> {code:python}
> import pyspark
> from pyspark.sql import SparkSession, Row
> from pyspark.sql import functions as F
> from pyspark.sql import types as T
> from pyspark import AccumulatorParam
> spark = SparkSession.builder.getOrCreate()
> spark.sparkContext.setLogLevel("ERROR")
> test_accum = spark.sparkContext.accumulator(0.0)
> SHUFFLE = False
> def main(data):
>     print(">>> Check0", test_accum.value)
>     def test(x):
>         global test_accum
>         test_accum += 1.0
>         return x
>     print(">>> Check1", test_accum.value)
>     def test2(x):
>         global test_accum
>         test_accum += 100.0
>         return x
>     print(">>> Check2", test_accum.value)
>     func_udf = F.udf(test, T.DoubleType())
>     print(">>> Check3", test_accum.value)
>     func_udf2 = F.udf(test2, T.DoubleType())
>     print(">>> Check4", test_accum.value)
>     data = data.withColumn("out1", func_udf(data["a"]))
>     if SHUFFLE:
>         data = data.repartition(2)
>     print(">>> Check5", test_accum.value)
>     data = data.withColumn("out2", func_udf2(data["b"]))
>     if SHUFFLE:
>         data = data.repartition(2)
>     print(">>> Check6", test_accum.value)
>     data.show()  # ACTION
>     print(">>> Check7", test_accum.value)
>     return data
> df = spark.createDataFrame([
>     [1.0, 2.0]
> ], schema=T.StructType([T.StructField(field_name, T.DoubleType(), True) for field_name in ["a", "b"]]))
> df2 = main(df)
> {code}
> {code:python}
> # Output 1 - with SHUFFLE=False
> ...
> # >>> Check7 100.0
> # Output 2 - with SHUFFLE=True
> ...
> # >>> Check7 101.0
> {code}
> Basically looks like:
>  - Accumulator works only for last UDF before a shuffle-like operation



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25591) PySpark Accumulators with multiple PythonUDFs

2019-01-02 Thread Liang-Chi Hsieh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732567#comment-16732567
 ] 

Liang-Chi Hsieh commented on SPARK-25591:
-

This is a bug fix, so I think it makes sense to backport this to branch-2.3 
and 2.2 if needed.

> PySpark Accumulators with multiple PythonUDFs
> -
>
> Key: SPARK-25591
> URL: https://issues.apache.org/jira/browse/SPARK-25591
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.2
>Reporter: Abdeali Kothari
>Assignee: Liang-Chi Hsieh
>Priority: Blocker
>  Labels: correctness
> Fix For: 2.4.0
>
>
> When having multiple Python UDFs - the last Python UDF's accumulator is the 
> only accumulator that gets updated.
> {code:python}
> import pyspark
> from pyspark.sql import SparkSession, Row
> from pyspark.sql import functions as F
> from pyspark.sql import types as T
> from pyspark import AccumulatorParam
> spark = SparkSession.builder.getOrCreate()
> spark.sparkContext.setLogLevel("ERROR")
> test_accum = spark.sparkContext.accumulator(0.0)
> SHUFFLE = False
> def main(data):
>     print(">>> Check0", test_accum.value)
>     def test(x):
>         global test_accum
>         test_accum += 1.0
>         return x
>     print(">>> Check1", test_accum.value)
>     def test2(x):
>         global test_accum
>         test_accum += 100.0
>         return x
>     print(">>> Check2", test_accum.value)
>     func_udf = F.udf(test, T.DoubleType())
>     print(">>> Check3", test_accum.value)
>     func_udf2 = F.udf(test2, T.DoubleType())
>     print(">>> Check4", test_accum.value)
>     data = data.withColumn("out1", func_udf(data["a"]))
>     if SHUFFLE:
>         data = data.repartition(2)
>     print(">>> Check5", test_accum.value)
>     data = data.withColumn("out2", func_udf2(data["b"]))
>     if SHUFFLE:
>         data = data.repartition(2)
>     print(">>> Check6", test_accum.value)
>     data.show()  # ACTION
>     print(">>> Check7", test_accum.value)
>     return data
> df = spark.createDataFrame([
>     [1.0, 2.0]
> ], schema=T.StructType([T.StructField(field_name, T.DoubleType(), True) for field_name in ["a", "b"]]))
> df2 = main(df)
> {code}
> {code:python}
> # Output 1 - with SHUFFLE=False
> ...
> # >>> Check7 100.0
> # Output 2 - with SHUFFLE=True
> ...
> # >>> Check7 101.0
> {code}
> Basically looks like:
>  - Accumulator works only for last UDF before a shuffle-like operation



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26521) Sparksql cannot modify the field name of a table

2019-01-02 Thread zengxl (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zengxl updated SPARK-26521:
---
Environment: (was:  alter table tmp.testchange change column i m string;

*Error in query: ALTER TABLE CHANGE COLUMN is not supported for changing column 
'i' with type 'StringType' to 'm' with type 'StringType';*)

> Sparksql cannot modify the field name of a table
> 
>
> Key: SPARK-26521
> URL: https://issues.apache.org/jira/browse/SPARK-26521
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: zengxl
>Priority: Major
>
> When I alter the table info using Spark SQL, an exception is thrown



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26521) Sparksql cannot modify the field name of a table

2019-01-02 Thread zengxl (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zengxl updated SPARK-26521:
---
Description: 
When I alter the table info using Spark SQL, an exception is thrown

 

 alter table tmp.testchange change column i m string;

*Error in query: ALTER TABLE CHANGE COLUMN is not supported for changing column 
'i' with type 'StringType' to 'm' with type 'StringType';*

  was:When I alter the table info using Spark SQL, an exception is thrown


> Sparksql cannot modify the field name of a table
> 
>
> Key: SPARK-26521
> URL: https://issues.apache.org/jira/browse/SPARK-26521
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: zengxl
>Priority: Major
>
> When I alter the table info using Spark SQL, an exception is thrown
>  
>  alter table tmp.testchange change column i m string;
> *Error in query: ALTER TABLE CHANGE COLUMN is not supported for changing 
> column 'i' with type 'StringType' to 'm' with type 'StringType';*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26521) Sparksql cannot modify the field name of a table

2019-01-02 Thread zengxl (JIRA)
zengxl created SPARK-26521:
--

 Summary: Sparksql cannot modify the field name of a table
 Key: SPARK-26521
 URL: https://issues.apache.org/jira/browse/SPARK-26521
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.1
 Environment:  alter table tmp.testchange change column i m string;

*Error in query: ALTER TABLE CHANGE COLUMN is not supported for changing column 
'i' with type 'StringType' to 'm' with type 'StringType';*
Reporter: zengxl


When I alter the table info using Spark SQL, an exception is thrown



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25591) PySpark Accumulators with multiple PythonUDFs

2019-01-02 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732556#comment-16732556
 ] 

Dongjoon Hyun commented on SPARK-25591:
---

Hi, [~viirya], [~hyukjin.kwon].

This is only in branch-2.4. Can we backport this to older branches like 
branch-2.3 and branch-2.2?

cc [~AbdealiJK]

> PySpark Accumulators with multiple PythonUDFs
> -
>
> Key: SPARK-25591
> URL: https://issues.apache.org/jira/browse/SPARK-25591
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.2
>Reporter: Abdeali Kothari
>Assignee: Liang-Chi Hsieh
>Priority: Blocker
>  Labels: correctness
> Fix For: 2.4.0
>
>
> When having multiple Python UDFs - the last Python UDF's accumulator is the 
> only accumulator that gets updated.
> {code:python}
> import pyspark
> from pyspark.sql import SparkSession, Row
> from pyspark.sql import functions as F
> from pyspark.sql import types as T
> from pyspark import AccumulatorParam
> spark = SparkSession.builder.getOrCreate()
> spark.sparkContext.setLogLevel("ERROR")
> test_accum = spark.sparkContext.accumulator(0.0)
> SHUFFLE = False
> def main(data):
>     print(">>> Check0", test_accum.value)
>     def test(x):
>         global test_accum
>         test_accum += 1.0
>         return x
>     print(">>> Check1", test_accum.value)
>     def test2(x):
>         global test_accum
>         test_accum += 100.0
>         return x
>     print(">>> Check2", test_accum.value)
>     func_udf = F.udf(test, T.DoubleType())
>     print(">>> Check3", test_accum.value)
>     func_udf2 = F.udf(test2, T.DoubleType())
>     print(">>> Check4", test_accum.value)
>     data = data.withColumn("out1", func_udf(data["a"]))
>     if SHUFFLE:
>         data = data.repartition(2)
>     print(">>> Check5", test_accum.value)
>     data = data.withColumn("out2", func_udf2(data["b"]))
>     if SHUFFLE:
>         data = data.repartition(2)
>     print(">>> Check6", test_accum.value)
>     data.show()  # ACTION
>     print(">>> Check7", test_accum.value)
>     return data
> df = spark.createDataFrame([
>     [1.0, 2.0]
> ], schema=T.StructType([T.StructField(field_name, T.DoubleType(), True) for field_name in ["a", "b"]]))
> df2 = main(df)
> {code}
> {code:python}
> # Output 1 - with SHUFFLE=False
> ...
> # >>> Check7 100.0
> # Output 2 - with SHUFFLE=True
> ...
> # >>> Check7 101.0
> {code}
> Basically looks like:
>  - Accumulator works only for last UDF before a shuffle-like operation



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23980) Resilient Spark driver on Kubernetes

2019-01-02 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-23980:
--

Assignee: (was: Marcelo Vanzin)

> Resilient Spark driver on Kubernetes
> 
>
> Key: SPARK-23980
> URL: https://issues.apache.org/jira/browse/SPARK-23980
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Sebastian Toader
>Priority: Major
>
> The current implementation of `Spark driver` on Kubernetes is not resilient 
> to node failures as it’s implemented as a `Pod`. In case of a node failure 
> Kubernetes terminates the pods that were running on that node. Kubernetes 
> doesn't reschedule these pods to any of the other nodes of the cluster.
> If the `driver` is implemented as a Kubernetes 
> [Job|https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/]
>  then it will be rescheduled to another node.
> When the driver is terminated, its executors (which may run on other nodes) are 
> terminated by Kubernetes with some delay by [Kubernetes Garbage 
> collection|https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/].
> This can lead to concurrency issues where the re-spawned `driver` tries to 
> create new executors with the same names as executors that are still in the 
> middle of being cleaned up by Kubernetes garbage collection.
> To solve this issue, the executor name must be made unique for each `driver` 
> *instance*.
> The PR linked to this JIRA implements the above: it creates the Spark driver as 
> a Job and ensures that executor pod names are unique per driver instance (see 
> the sketch below).
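A purely illustrative sketch of the uniqueness idea; this is an assumption for illustration, not the implementation in the linked PR. The point is to derive a fresh suffix for each driver (re)start and fold it into the submission's names, so a re-spawned driver cannot collide with pods that are still being garbage-collected. In practice these values would be passed as --conf flags to spark-submit; a SparkConf is used here only to keep the sketch self-contained.
{code:python}
import uuid

from pyspark import SparkConf

# Fresh per-instance suffix; the naming scheme itself is hypothetical.
instance_id = uuid.uuid4().hex[:8]

conf = (SparkConf()
        .setAppName("streaming-job-" + instance_id)
        # Documented Spark-on-Kubernetes setting for the driver pod name; giving it a
        # per-instance value keeps a re-spawned driver distinct from a pod that is
        # still being cleaned up after a node failure.
        .set("spark.kubernetes.driver.pod.name", "streaming-job-driver-" + instance_id))
{code}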



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23980) Resilient Spark driver on Kubernetes

2019-01-02 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-23980:
--

Assignee: Marcelo Vanzin

> Resilient Spark driver on Kubernetes
> 
>
> Key: SPARK-23980
> URL: https://issues.apache.org/jira/browse/SPARK-23980
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Sebastian Toader
>Assignee: Marcelo Vanzin
>Priority: Major
>
> The current implementation of `Spark driver` on Kubernetes is not resilient 
> to node failures as it’s implemented as a `Pod`. In case of a node failure 
> Kubernetes terminates the pods that were running on that node. Kubernetes 
> doesn't reschedule these pods to any of the other nodes of the cluster.
> If the `driver` is implemented as a Kubernetes 
> [Job|https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/]
>  then it will be rescheduled to another node.
> When the driver is terminated, its executors (which may run on other nodes) are 
> terminated by Kubernetes with some delay by [Kubernetes Garbage 
> collection|https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/].
> This can lead to concurrency issues where the re-spawned `driver` tries to 
> create new executors with the same names as executors that are still in the 
> middle of being cleaned up by Kubernetes garbage collection.
> To solve this issue, the executor name must be made unique for each `driver` 
> *instance*.
> The PR linked to this JIRA implements the above: it creates the Spark driver as 
> a Job and ensures that executor pod names are unique per driver instance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26441) Add kind configuration of driver pod

2019-01-02 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-26441.

Resolution: Duplicate

> Add kind configuration of driver pod 
> -
>
> Key: SPARK-26441
> URL: https://issues.apache.org/jira/browse/SPARK-26441
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.1, 2.3.2, 2.4.0
>Reporter: Fei Han
>Priority: Critical
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26519) spark sql CHANGE COLUMN not working

2019-01-02 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-26519:
--
Flags:   (was: Important)

> spark sql   CHANGE COLUMN not working 
> --
>
> Key: SPARK-26519
> URL: https://issues.apache.org/jira/browse/SPARK-26519
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, Spark Submit
>Affects Versions: 2.1.0
> Environment: !image-2019-01-02-14-25-34-594.png!
>Reporter: suman gorantla
>Priority: Major
> Attachments: sparksql error.PNG
>
>
> Dear Team,
> With Spark SQL I am unable to change a newly added column's position to be 
> after an existing column (old_col) in a Hive external table. Please see the 
> screenshot below.
> scala> sql("ALTER TABLE enterprisedatalakedev.tmptst ADD COLUMNs (new_col  
> STRING)")
>  res14: org.apache.spark.sql.DataFrame = []
>  sql("ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col new_col  
> STRING AFTER old_col ")
>  org.apache.spark.sql.catalyst.parser.ParseException:
>  Operation not allowed: ALTER TABLE table [PARTITION partition_spec] CHANGE 
> COLUMN ... FIRST | AFTER otherCol(line 1, pos 0)
> == SQL ==
>  ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col  new_col  
> STRING AFTER  old_col 
>  ^^^
> at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39)
>  at 
> org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:934)
>  at 
> org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:928)
>  at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
>  at 
> org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:928)
>  at 
> org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:55)
>  at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser$ChangeColumnContext.accept(SqlBaseParser.java:1485)
>  at 
> org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42)
>  at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71)
>  at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71)
>  at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
>  at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:70)
>  at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:69)
>  at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:68)
>  at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:97)
>  at 
> org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
>  at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68)
>  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:623)
>  ... 48 elided
> !image-2019-01-02-14-25-40-980.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26519) spark sql CHANGE COLUMN not working

2019-01-02 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-26519:
--
Component/s: (was: Spark Submit)
 (was: Spark Shell)
 SQL

> spark sql   CHANGE COLUMN not working 
> --
>
> Key: SPARK-26519
> URL: https://issues.apache.org/jira/browse/SPARK-26519
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: !image-2019-01-02-14-25-34-594.png!
>Reporter: suman gorantla
>Priority: Major
> Attachments: sparksql error.PNG
>
>
> Dear Team,
> With Spark SQL I am unable to change a newly added column's position to be 
> after an existing column (old_col) in a Hive external table. Please see the 
> screenshot below.
> scala> sql("ALTER TABLE enterprisedatalakedev.tmptst ADD COLUMNs (new_col  
> STRING)")
>  res14: org.apache.spark.sql.DataFrame = []
>  sql("ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col new_col  
> STRING AFTER old_col ")
>  org.apache.spark.sql.catalyst.parser.ParseException:
>  Operation not allowed: ALTER TABLE table [PARTITION partition_spec] CHANGE 
> COLUMN ... FIRST | AFTER otherCol(line 1, pos 0)
> == SQL ==
>  ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col  new_col  
> STRING AFTER  old_col 
>  ^^^
> at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39)
>  at 
> org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:934)
>  at 
> org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:928)
>  at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
>  at 
> org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:928)
>  at 
> org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:55)
>  at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser$ChangeColumnContext.accept(SqlBaseParser.java:1485)
>  at 
> org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42)
>  at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71)
>  at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71)
>  at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
>  at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:70)
>  at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:69)
>  at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:68)
>  at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:97)
>  at 
> org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
>  at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68)
>  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:623)
>  ... 48 elided
> !image-2019-01-02-14-25-40-980.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26519) spark sql CHANGE COLUMN not working

2019-01-02 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732528#comment-16732528
 ] 

Dongjoon Hyun commented on SPARK-26519:
---

Hi, [~sumanGorantla]. This is not a bug, is it? Please see the log.

{code}
Operation not allowed: ALTER TABLE table [PARTITION partition_spec] CHANGE 
COLUMN ... FIRST | AFTER otherCol(line 1, pos 0)
{code}
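For reference, a hedged sketch of what this parser restriction means in practice; the table and column names are taken from the reports above and are only illustrative:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Accepted in Spark 2.x: same name, same type, only the comment changes.
spark.sql("ALTER TABLE tmp.testchange CHANGE COLUMN i i STRING COMMENT 'row id'")

# Not supported in Spark 2.x: renaming a column ...
# spark.sql("ALTER TABLE tmp.testchange CHANGE COLUMN i m STRING")
# ... or repositioning it with FIRST/AFTER.
# spark.sql("ALTER TABLE enterprisedatalakedev.tmptst "
#           "CHANGE COLUMN new_col new_col STRING AFTER old_col")
{code}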

> spark sql   CHANGE COLUMN not working 
> --
>
> Key: SPARK-26519
> URL: https://issues.apache.org/jira/browse/SPARK-26519
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, Spark Submit
>Affects Versions: 2.1.0
> Environment: !image-2019-01-02-14-25-34-594.png!
>Reporter: suman gorantla
>Priority: Major
> Attachments: sparksql error.PNG
>
>
> Dear Team,
> With Spark SQL I am unable to change a newly added column's position to be 
> after an existing column (old_col) in a Hive external table. Please see the 
> screenshot below.
> scala> sql("ALTER TABLE enterprisedatalakedev.tmptst ADD COLUMNs (new_col  
> STRING)")
>  res14: org.apache.spark.sql.DataFrame = []
>  sql("ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col new_col  
> STRING AFTER old_col ")
>  org.apache.spark.sql.catalyst.parser.ParseException:
>  Operation not allowed: ALTER TABLE table [PARTITION partition_spec] CHANGE 
> COLUMN ... FIRST | AFTER otherCol(line 1, pos 0)
> == SQL ==
>  ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col  new_col  
> STRING AFTER  old_col 
>  ^^^
> at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39)
>  at 
> org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:934)
>  at 
> org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:928)
>  at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
>  at 
> org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:928)
>  at 
> org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:55)
>  at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser$ChangeColumnContext.accept(SqlBaseParser.java:1485)
>  at 
> org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42)
>  at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71)
>  at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71)
>  at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
>  at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:70)
>  at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:69)
>  at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:68)
>  at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:97)
>  at 
> org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
>  at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68)
>  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:623)
>  ... 48 elided
> !image-2019-01-02-14-25-40-980.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23525) ALTER TABLE CHANGE COLUMN COMMENT doesn't work for external hive table

2019-01-02 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-23525:
--
Summary: ALTER TABLE CHANGE COLUMN COMMENT doesn't work for external hive 
table  (was: ALTER TABLE CHANGE COLUMN doesn't work for external hive table)

> ALTER TABLE CHANGE COLUMN COMMENT doesn't work for external hive table
> --
>
> Key: SPARK-23525
> URL: https://issues.apache.org/jira/browse/SPARK-23525
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Pavlo Skliar
>Assignee: Xingbo Jiang
>Priority: Major
> Fix For: 2.2.2, 2.3.1, 2.4.0
>
>
> {code:java}
> print(spark.sql("""
> SHOW CREATE TABLE test.trends
> """).collect()[0].createtab_stmt)
> /// OUTPUT
> CREATE EXTERNAL TABLE `test`.`trends`(`id` string COMMENT '', `metric` string 
> COMMENT '', `amount` bigint COMMENT '')
> COMMENT ''
> PARTITIONED BY (`date` string COMMENT '')
> ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> WITH SERDEPROPERTIES (
>   'serialization.format' = '1'
> )
> STORED AS
>   INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
>   OUTPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION 's3://x/x/'
> TBLPROPERTIES (
>   'transient_lastDdlTime' = '1519729384',
>   'last_modified_time' = '1519645652',
>   'last_modified_by' = 'pavlo',
>   'last_castor_run_ts' = '1513561658.0'
> )
> spark.sql("""
> DESCRIBE test.trends
> """).collect()
> // OUTPUT
> [Row(col_name='id', data_type='string', comment=''),
>  Row(col_name='metric', data_type='string', comment=''),
>  Row(col_name='amount', data_type='bigint', comment=''),
>  Row(col_name='date', data_type='string', comment=''),
>  Row(col_name='# Partition Information', data_type='', comment=''),
>  Row(col_name='# col_name', data_type='data_type', comment='comment'),
>  Row(col_name='date', data_type='string', comment='')]
> spark.sql("""alter table test.trends change column id id string comment 
> 'unique identifier'""")
> spark.sql("""
> DESCRIBE test.trends
> """).collect()
> // OUTPUT
> [Row(col_name='id', data_type='string', comment=''), Row(col_name='metric', 
> data_type='string', comment=''), Row(col_name='amount', data_type='bigint', 
> comment=''), Row(col_name='date', data_type='string', comment=''), 
> Row(col_name='# Partition Information', data_type='', comment=''), 
> Row(col_name='# col_name', data_type='data_type', comment='comment'), 
> Row(col_name='date', data_type='string', comment='')]
> {code}
> The strange thing is that I assigned the comment to the id field from Hive 
> successfully, and it's visible in the Hue UI, but it's still not visible from 
> Spark, and Spark requests have no effect on the comments.
>  
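One hedged thing to rule out (not a confirmed root cause for this ticket) is stale table metadata cached in the Spark session: if the comment was set from Hive after Spark first resolved the table, refreshing the table forces Spark to re-read the metastore state.
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Invalidate any cached metadata for the table and re-read it from the metastore.
spark.catalog.refreshTable("test.trends")
# Equivalent SQL form:
spark.sql("REFRESH TABLE test.trends")
spark.sql("DESCRIBE test.trends").show(truncate=False)
{code}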



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26277) WholeStageCodegen metrics should be tested with whole-stage codegen enabled

2019-01-02 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-26277:
-

Assignee: Chenxiao Mao

> WholeStageCodegen metrics should be tested with whole-stage codegen enabled
> ---
>
> Key: SPARK-26277
> URL: https://issues.apache.org/jira/browse/SPARK-26277
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Chenxiao Mao
>Assignee: Chenxiao Mao
>Priority: Major
>
> In {{org.apache.spark.sql.execution.metric.SQLMetricsSuite}}, there's a test 
> case named "WholeStageCodegen metrics". However, it is executed with 
> whole-stage codegen disabled.
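For context, whole-stage codegen is toggled by a session conf; a minimal PySpark sketch (not the Scala test itself) that runs an aggregate with it explicitly enabled:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.codegen.wholeStage", "true")   # enable whole-stage codegen

df = spark.range(100).groupBy().sum("id")
df.explain()   # the physical plan should contain WholeStageCodegen nodes
df.show()
{code}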



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26277) WholeStageCodegen metrics should be tested with whole-stage codegen enabled

2019-01-02 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-26277.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 23224
[https://github.com/apache/spark/pull/23224]

> WholeStageCodegen metrics should be tested with whole-stage codegen enabled
> ---
>
> Key: SPARK-26277
> URL: https://issues.apache.org/jira/browse/SPARK-26277
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Chenxiao Mao
>Assignee: Chenxiao Mao
>Priority: Major
> Fix For: 3.0.0
>
>
> In {{org.apache.spark.sql.execution.metric.SQLMetricsSuite}}, there's a test 
> case named "WholeStageCodegen metrics". However, it is executed with 
> whole-stage codegen disabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26502) Get rid of hiveResultString() in QueryExecution

2019-01-02 Thread Mark Hamstra (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732435#comment-16732435
 ] 

Mark Hamstra commented on SPARK-26502:
--

Don't lose track of this comment: 
[https://github.com/apache/spark/blob/948414afe706e0b526d7f83f598cbd204d2fc687/sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala#L41]

 

Any significant change to QueryExecution needs to be documented carefully and 
included in the release notes, since you will be forcing 3rd-party changes.

> Get rid of hiveResultString() in QueryExecution
> ---
>
> Key: SPARK-26502
> URL: https://issues.apache.org/jira/browse/SPARK-26502
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> The method hiveResultString() of QueryExecution is used in test and 
> SparkSQLDriver. It should be moved from QueryExecution to more specific class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26520) data source V2 API refactoring (micro-batch read)

2019-01-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26520:


Assignee: Apache Spark  (was: Wenchen Fan)

> data source V2 API refactoring (micro-batch read)
> -
>
> Key: SPARK-26520
> URL: https://issues.apache.org/jira/browse/SPARK-26520
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26520) data source V2 API refactoring (micro-batch read)

2019-01-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26520:


Assignee: Wenchen Fan  (was: Apache Spark)

> data source V2 API refactoring (micro-batch read)
> -
>
> Key: SPARK-26520
> URL: https://issues.apache.org/jira/browse/SPARK-26520
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26520) data source V2 API refactoring (micro-batch read)

2019-01-02 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-26520:
---

 Summary: data source V2 API refactoring (micro-batch read)
 Key: SPARK-26520
 URL: https://issues.apache.org/jira/browse/SPARK-26520
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Wenchen Fan
Assignee: Wenchen Fan






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26519) spark sql CHANGE COLUMN not working

2019-01-02 Thread suman gorantla (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

suman gorantla updated SPARK-26519:
---
Description: 
Dear Team,

With Spark SQL I am unable to change a newly added column's position to be after 
an existing column (old_col) in a Hive external table. Please see the screenshot 
below.

scala> sql("ALTER TABLE enterprisedatalakedev.tmptst ADD COLUMNs (new_col  
STRING)")
 res14: org.apache.spark.sql.DataFrame = []

 sql("ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col new_col  
STRING AFTER old_col ")
 org.apache.spark.sql.catalyst.parser.ParseException:
 Operation not allowed: ALTER TABLE table [PARTITION partition_spec] CHANGE 
COLUMN ... FIRST | AFTER otherCol(line 1, pos 0)

== SQL ==
 ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col  new_col  
STRING AFTER  old_col 
 ^^^

at 
org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39)
 at 
org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:934)
 at 
org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:928)
 at 
org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
 at 
org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:928)
 at 
org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:55)
 at 
org.apache.spark.sql.catalyst.parser.SqlBaseParser$ChangeColumnContext.accept(SqlBaseParser.java:1485)
 at 
org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42)
 at 
org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71)
 at 
org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71)
 at 
org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
 at 
org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:70)
 at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:69)
 at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:68)
 at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:97)
 at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
 at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68)
 at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:623)
 ... 48 elided

!image-2019-01-02-14-25-40-980.png!

 

  was:
Dear Team,

With Spark SQL I am unable to change a newly added column's position to be after 
an existing column (old_col) in a Hive external table. Please see the screenshot 
below.

scala> sql("ALTER TABLE enterprisedatalakedev.tmptst ADD COLUMNs (new_col  
STRING)")
res14: org.apache.spark.sql.DataFrame = []

 sql("ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col new_col  
STRING AFTER old_col ")
org.apache.spark.sql.catalyst.parser.ParseException:
Operation not allowed: ALTER TABLE table [PARTITION partition_spec] CHANGE 
COLUMN ... FIRST | AFTER otherCol(line 1, pos 0)

== SQL ==
ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN column_ui column_ui 
STRING AFTER col1
^^^

at 
org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39)
 at 
org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:934)
 at 
org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:928)
 at 
org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
 at 
org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:928)
 at 
org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:55)
 at 
org.apache.spark.sql.catalyst.parser.SqlBaseParser$ChangeColumnContext.accept(SqlBaseParser.java:1485)
 at 
org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42)
 at 
org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71)
 at 
org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71)
 at 
org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
 at 
org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:70)
 at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:69)
 at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:68)
 at 

[jira] [Updated] (SPARK-26519) spark sql CHANGE COLUMN not working

2019-01-02 Thread suman gorantla (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

suman gorantla updated SPARK-26519:
---
Attachment: sparksql error.PNG

> spark sql   CHANGE COLUMN not working 
> --
>
> Key: SPARK-26519
> URL: https://issues.apache.org/jira/browse/SPARK-26519
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, Spark Submit
>Affects Versions: 2.1.0
> Environment: !image-2019-01-02-14-25-34-594.png!
>Reporter: suman gorantla
>Priority: Major
> Attachments: sparksql error.PNG
>
>
> Dear Team,
> With Spark SQL I am unable to change a newly added column's position to be 
> after an existing column (old_col) in a Hive external table. Please see the 
> screenshot below.
> scala> sql("ALTER TABLE enterprisedatalakedev.tmptst ADD COLUMNs (new_col  
> STRING)")
> res14: org.apache.spark.sql.DataFrame = []
>  sql("ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col new_col  
> STRING AFTER old_col ")
> org.apache.spark.sql.catalyst.parser.ParseException:
> Operation not allowed: ALTER TABLE table [PARTITION partition_spec] CHANGE 
> COLUMN ... FIRST | AFTER otherCol(line 1, pos 0)
> == SQL ==
> ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN column_ui column_ui 
> STRING AFTER col1
> ^^^
> at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39)
>  at 
> org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:934)
>  at 
> org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:928)
>  at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
>  at 
> org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:928)
>  at 
> org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:55)
>  at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser$ChangeColumnContext.accept(SqlBaseParser.java:1485)
>  at 
> org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42)
>  at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71)
>  at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71)
>  at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
>  at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:70)
>  at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:69)
>  at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:68)
>  at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:97)
>  at 
> org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
>  at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68)
>  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:623)
>  ... 48 elided
> !image-2019-01-02-14-25-40-980.png!
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26519) spark sql CHANGE COLUMN not working

2019-01-02 Thread suman gorantla (JIRA)
suman gorantla created SPARK-26519:
--

 Summary: spark sql   CHANGE COLUMN not working 
 Key: SPARK-26519
 URL: https://issues.apache.org/jira/browse/SPARK-26519
 Project: Spark
  Issue Type: Bug
  Components: Spark Shell, Spark Submit
Affects Versions: 2.1.0
 Environment: !image-2019-01-02-14-25-34-594.png!
Reporter: suman gorantla
 Attachments: sparksql error.PNG

Dear Team,

With Spark SQL I am unable to change a newly added column's position to be after 
an existing column (old_col) in a Hive external table. Please see the screenshot 
below.

scala> sql("ALTER TABLE enterprisedatalakedev.tmptst ADD COLUMNs (new_col  
STRING)")
res14: org.apache.spark.sql.DataFrame = []

 sql("ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN new_col new_col  
STRING AFTER old_col ")
org.apache.spark.sql.catalyst.parser.ParseException:
Operation not allowed: ALTER TABLE table [PARTITION partition_spec] CHANGE 
COLUMN ... FIRST | AFTER otherCol(line 1, pos 0)

== SQL ==
ALTER TABLE enterprisedatalakedev.tmptst CHANGE COLUMN column_ui column_ui 
STRING AFTER col1
^^^

at 
org.apache.spark.sql.catalyst.parser.ParserUtils$.operationNotAllowed(ParserUtils.scala:39)
 at 
org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:934)
 at 
org.apache.spark.sql.execution.SparkSqlAstBuilder$$anonfun$visitChangeColumn$1.apply(SparkSqlParser.scala:928)
 at 
org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
 at 
org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:928)
 at 
org.apache.spark.sql.execution.SparkSqlAstBuilder.visitChangeColumn(SparkSqlParser.scala:55)
 at 
org.apache.spark.sql.catalyst.parser.SqlBaseParser$ChangeColumnContext.accept(SqlBaseParser.java:1485)
 at 
org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42)
 at 
org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71)
 at 
org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:71)
 at 
org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:99)
 at 
org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:70)
 at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:69)
 at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:68)
 at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:97)
 at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
 at 
org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68)
 at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:623)
 ... 48 elided

!image-2019-01-02-14-25-40-980.png!

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2984) FileNotFoundException on _temporary directory

2019-01-02 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732388#comment-16732388
 ] 

Steve Loughran commented on SPARK-2984:
---

Gaurav, if this has returned in a 2.x version against HDFS, best to open a new 
JIRA and mark as related to this one.

> FileNotFoundException on _temporary directory
> -
>
> Key: SPARK-2984
> URL: https://issues.apache.org/jira/browse/SPARK-2984
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Andrew Ash
>Assignee: Josh Rosen
>Priority: Critical
> Fix For: 1.3.0
>
>
> We've seen several stacktraces and threads on the user mailing list where 
> people are having issues with a {{FileNotFoundException}} stemming from an 
> HDFS path containing {{_temporary}}.
> I ([~aash]) think this may be related to {{spark.speculation}}.  I think the 
> error condition might manifest in this circumstance:
> 1) task T starts on a executor E1
> 2) it takes a long time, so task T' is started on another executor E2
> 3) T finishes in E1 so moves its data from {{_temporary}} to the final 
> destination and deletes the {{_temporary}} directory during cleanup
> 4) T' finishes in E2 and attempts to move its data from {{_temporary}}, but 
> those files no longer exist!  exception
> Some samples:
> {noformat}
> 14/08/11 08:05:08 ERROR JobScheduler: Error running job streaming job 
> 140774430 ms.0
> java.io.FileNotFoundException: File 
> hdfs://hadoopc/user/csong/output/human_bot/-140774430.out/_temporary/0/task_201408110805__m_07
>  does not exist.
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:654)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:708)
> at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:360)
> at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
> at 
> org.apache.hadoop.mapred.FileOutputCommitter.commitJob(FileOutputCommitter.java:136)
> at 
> org.apache.spark.SparkHadoopWriter.commitJob(SparkHadoopWriter.scala:126)
> at 
> org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:841)
> at 
> org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:724)
> at 
> org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:643)
> at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1068)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$8.apply(DStream.scala:773)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$8.apply(DStream.scala:771)
> at 
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:41)
> at 
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
> at 
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
> at scala.util.Try$.apply(Try.scala:161)
> at org.apache.spark.streaming.scheduler.Job.run(Job.scala:32)
> at 
> org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:172)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> -- Chen Song at 
> http://apache-spark-user-list.1001560.n3.nabble.com/saveAsTextFiles-file-not-found-exception-td10686.html
> {noformat}
> I am running a Spark Streaming job that uses saveAsTextFiles to save results 
> into hdfs files. However, it has an exception after 20 batches
> result-140631234/_temporary/0/task_201407251119__m_03 does not 
> exist.
> {noformat}
> and
> {noformat}
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>  No lease on /apps/data/vddil/real-time/checkpoint/temp: File does not exist. 
> Holder DFSClient_NONMAPREDUCE_327993456_13 does not have any open files.
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2946)
>  
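A hedged mitigation sketch, not a fix from this ticket: if the duplicated cleanup comes from speculative copies of output tasks as theorized above, keeping speculation off for jobs that write output avoids the competing _temporary cleanup, and the standard Hadoop v2 commit algorithm shortens the window in which _temporary contents are moved at job commit.
{code:python}
from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = (SparkConf()
        # Default is already false; shown explicitly so output tasks are never duplicated.
        .set("spark.speculation", "false")
        # Standard Hadoop setting (not introduced by this ticket): commit task output at
        # task-commit time instead of job-commit time.
        .set("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2"))

spark = SparkSession.builder.config(conf=conf).getOrCreate()
{code}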

[jira] [Commented] (SPARK-24603) Typo in comments

2019-01-02 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732340#comment-16732340
 ] 

Dongjoon Hyun commented on SPARK-24603:
---

I removed 2.2.2 and added 2.2.3 since this wasn't released as a part of Apache 
Spark 2.2.2.
- https://dist.apache.org/repos/dist/release/spark/spark-2.2.2/spark-2.2.2.tgz

> Typo in comments
> 
>
> Key: SPARK-24603
> URL: https://issues.apache.org/jira/browse/SPARK-24603
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.1
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Trivial
> Fix For: 2.2.3, 2.3.2, 2.4.0
>
>
> The findTightestCommonTypeOfTwo has been renamed to findTightestCommonType



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24603) Typo in comments

2019-01-02 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-24603:
--
Fix Version/s: 2.2.3

> Typo in comments
> 
>
> Key: SPARK-24603
> URL: https://issues.apache.org/jira/browse/SPARK-24603
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.1
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Trivial
> Fix For: 2.2.3, 2.3.2, 2.4.0
>
>
> The findTightestCommonTypeOfTwo has been renamed to findTightestCommonType



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24603) Typo in comments

2019-01-02 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-24603:
--
Fix Version/s: (was: 2.2.2)

> Typo in comments
> 
>
> Key: SPARK-24603
> URL: https://issues.apache.org/jira/browse/SPARK-24603
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.1
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Trivial
> Fix For: 2.3.2, 2.4.0
>
>
> The findTightestCommonTypeOfTwo has been renamed to findTightestCommonType



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25253) Refactor pyspark connection & authentication

2019-01-02 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732329#comment-16732329
 ] 

Dongjoon Hyun commented on SPARK-25253:
---

I added `2.2.3` to the fix versions because `branch-2.2` has this. It seems 
that we need to add `2.3.x`, too.

> Refactor pyspark connection & authentication
> 
>
> Key: SPARK-25253
> URL: https://issues.apache.org/jira/browse/SPARK-25253
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.4.0
>Reporter: Imran Rashid
>Assignee: Imran Rashid
>Priority: Minor
> Fix For: 2.2.3, 2.4.0
>
>
> We've got a few places in pyspark that connect to local sockets, with varying 
> levels of ipv6 handling, graceful error handling, and lots of copy-and-paste. 
> It should be pretty easy to clean this up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25253) Refactor pyspark connection & authentication

2019-01-02 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25253:
--
Fix Version/s: 2.2.3

> Refactor pyspark connection & authentication
> 
>
> Key: SPARK-25253
> URL: https://issues.apache.org/jira/browse/SPARK-25253
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.4.0
>Reporter: Imran Rashid
>Assignee: Imran Rashid
>Priority: Minor
> Fix For: 2.2.3, 2.4.0
>
>
> We've got a few places in pyspark that connect to local sockets, with varying 
> levels of ipv6 handling, graceful error handling, and lots of copy-and-paste. 
> It should be pretty easy to clean this up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.

2019-01-02 Thread Sujith (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732299#comment-16732299
 ] 

Sujith commented on SPARK-26432:


The test description has been updated. Let me know if you have any suggestions or input. 
Thanks all.

> Not able to connect Hbase 2.1 service Getting NoSuchMethodException while 
> trying to obtain token from Hbase 2.1 service.
> 
>
> Key: SPARK-26432
> URL: https://issues.apache.org/jira/browse/SPARK-26432
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Sujith
>Priority: Major
> Attachments: hbase-dep-obtaintok.png
>
>
> Getting NoSuchMethodException :
> org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration)
> while trying  connect hbase 2.1 service from spark.
> This is mainly happening because in spark uses  a deprecated hbase api 
> public static Token obtainToken(Configuration 
> conf)  
> for obtaining the token and the same has been removed from hbase 2.1 version.
>  Test steps:
> Steps to test Spark-Hbase connection
> 1. Create 2 tables in hbase shell
>  >Launch hbase shell
>  >Enter commands to create tables and load data
>  create 'table1','cf'
>  put 'table1','row1','cf:cid','20'
> create 'table2','cf'
>  put 'table2','row1','cf:cid','30'
>  
>  >Show values command
>  get 'table1','row1','cf:cid' will diplay value as 20
>  get 'table2','row1','cf:cid' will diplay value as 30
>  
>  
> 2.Run SparkHbasetoHbase class in testSpark.jar using spark-submit
> spark-submit --master yarn-cluster --class 
> com.mrs.example.spark.SparkHbasetoHbase --conf 
> "spark.yarn.security.credentials.hbase.enabled"="true" --conf 
> "spark.security.credentials.hbase.enabled"="true" --keytab 
> /opt/client/user.keytab --principal sen testSpark.jar
> The SparkHbasetoHbase class will update the value of table2 with sum of 
> values of table1 & table2.
> table2 = table1+table2
>  
> 3.Verify the result in hbase shell
> Expected Result: The value of table2 should be 50.
> get 'table1','row1','cf:cid'  will diplay value as 50
> Actual Result : Not updating the value as an error will be thrown when spark 
> tries to connect with hbase service.
> Attached the snapshot of error logs below for more details



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.

2019-01-02 Thread Sujith (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sujith updated SPARK-26432:
---
Description: 
Getting NoSuchMethodException:

org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration)

while trying to connect to the HBase 2.1 service from Spark.

This happens mainly because Spark uses a deprecated HBase API,

public static Token obtainToken(Configuration conf)

to obtain the token, and that method has been removed in HBase 2.1.
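
One way to make the token lookup tolerate both client lines is to resolve the overload 
reflectively. A minimal Scala sketch of that approach (an illustration only, not the change 
proposed in the PR; the TokenUtil/ConnectionFactory names come from the HBase client API, 
the surrounding wiring is assumed):
{code}
// Sketch: use whichever TokenUtil.obtainToken overload the HBase client on the
// classpath provides, instead of hard-coding the removed Configuration-based one.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.token.Token

def obtainHBaseToken(conf: Configuration): Token[_] = {
  val tokenUtil = Class.forName("org.apache.hadoop.hbase.security.token.TokenUtil")
  try {
    // HBase 2.x path: obtainToken(Connection); the Configuration overload is gone.
    val connFactory = Class.forName("org.apache.hadoop.hbase.client.ConnectionFactory")
    val connClass = Class.forName("org.apache.hadoop.hbase.client.Connection")
    val conn = connFactory.getMethod("createConnection", classOf[Configuration])
      .invoke(null, conf)
    try {
      tokenUtil.getMethod("obtainToken", connClass)
        .invoke(null, conn).asInstanceOf[Token[_]]
    } finally {
      connClass.getMethod("close").invoke(conn)
    }
  } catch {
    case _: ReflectiveOperationException =>
      // HBase 1.x fallback: the deprecated obtainToken(Configuration) overload.
      tokenUtil.getMethod("obtainToken", classOf[Configuration])
        .invoke(null, conf).asInstanceOf[Token[_]]
  }
}
{code}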

 Test steps:

Steps to test Spark-Hbase connection

1. Create 2 tables in hbase shell
 >Launch hbase shell
 >Enter commands to create tables and load data
 create 'table1','cf'
 put 'table1','row1','cf:cid','20'

create 'table2','cf'
 put 'table2','row1','cf:cid','30'
 
 >Show values command
 get 'table1','row1','cf:cid' will display the value as 20
 get 'table2','row1','cf:cid' will display the value as 30
 
 
2. Run the SparkHbasetoHbase class in testSpark.jar using spark-submit:

spark-submit --master yarn-cluster --class 
com.mrs.example.spark.SparkHbasetoHbase --conf 
"spark.yarn.security.credentials.hbase.enabled"="true" --conf 
"spark.security.credentials.hbase.enabled"="true" --keytab 
/opt/client/user.keytab --principal sen testSpark.jar

The SparkHbasetoHbase class will update the value of table2 with the sum of the values 
of table1 & table2.

table2 = table1+table2

 

3. Verify the result in hbase shell

Expected Result: The value of table2 should be 50.

get 'table2','row1','cf:cid' will display the value as 50

Actual Result: The value is not updated, because an error is thrown when Spark 
tries to connect to the HBase service.

A snapshot of the error logs is attached below for more details.

  was:
Getting NoSuchMethodException :

org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration)

while trying  connect hbase 2.1 service from spark.

This is mainly happening because in spark uses  a deprecated hbase api 

public static Token obtainToken(Configuration 
conf)  

for obtaining the token and the same has been removed from hbase 2.1 version.

 

Attached the snapshot of error logs


> Not able to connect Hbase 2.1 service Getting NoSuchMethodException while 
> trying to obtain token from Hbase 2.1 service.
> 
>
> Key: SPARK-26432
> URL: https://issues.apache.org/jira/browse/SPARK-26432
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Sujith
>Priority: Major
> Attachments: hbase-dep-obtaintok.png
>
>
> Getting NoSuchMethodException :
> org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration)
> while trying  connect hbase 2.1 service from spark.
> This is mainly happening because in spark uses  a deprecated hbase api 
> public static Token obtainToken(Configuration 
> conf)  
> for obtaining the token and the same has been removed from hbase 2.1 version.
>  Test steps:
> Steps to test Spark-Hbase connection
> 1. Create 2 tables in hbase shell
>  >Launch hbase shell
>  >Enter commands to create tables and load data
>  create 'table1','cf'
>  put 'table1','row1','cf:cid','20'
> create 'table2','cf'
>  put 'table2','row1','cf:cid','30'
>  
>  >Show values command
>  get 'table1','row1','cf:cid' will diplay value as 20
>  get 'table2','row1','cf:cid' will diplay value as 30
>  
>  
> 2.Run SparkHbasetoHbase class in testSpark.jar using spark-submit
> spark-submit --master yarn-cluster --class 
> com.mrs.example.spark.SparkHbasetoHbase --conf 
> "spark.yarn.security.credentials.hbase.enabled"="true" --conf 
> "spark.security.credentials.hbase.enabled"="true" --keytab 
> /opt/client/user.keytab --principal sen testSpark.jar
> The SparkHbasetoHbase class will update the value of table2 with sum of 
> values of table1 & table2.
> table2 = table1+table2
>  
> 3.Verify the result in hbase shell
> Expected Result: The value of table2 should be 50.
> get 'table1','row1','cf:cid'  will diplay value as 50
> Actual Result : Not updating the value as an error will be thrown when spark 
> tries to connect with hbase service.
> Attached the snapshot of error logs below for more details



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.

2019-01-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26432:


Assignee: Apache Spark

> Not able to connect Hbase 2.1 service Getting NoSuchMethodException while 
> trying to obtain token from Hbase 2.1 service.
> 
>
> Key: SPARK-26432
> URL: https://issues.apache.org/jira/browse/SPARK-26432
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Sujith
>Assignee: Apache Spark
>Priority: Major
> Attachments: hbase-dep-obtaintok.png
>
>
> Getting NoSuchMethodException :
> org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration)
> while trying  connect hbase 2.1 service from spark.
> This is mainly happening because in spark uses  a deprecated hbase api 
> public static Token obtainToken(Configuration 
> conf)  
> for obtaining the token and the same has been removed from hbase 2.1 version.
>  
> Attached the snapshot of error logs



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.

2019-01-02 Thread Sujith (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732281#comment-16732281
 ] 

Sujith edited comment on SPARK-26432 at 1/2/19 6:27 PM:


Sorry for the late response due to the holidays :). I raised a PR; please let me know 
if you have any suggestions. Thanks. The PR is WIP, as I still need to attach the test 
report, which I will add tomorrow.


was (Author: s71955):
sorry for the late response due to holidays :), raised a PR please let me know 
for any suggestions. thanks

> Not able to connect Hbase 2.1 service Getting NoSuchMethodException while 
> trying to obtain token from Hbase 2.1 service.
> 
>
> Key: SPARK-26432
> URL: https://issues.apache.org/jira/browse/SPARK-26432
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Sujith
>Priority: Major
> Attachments: hbase-dep-obtaintok.png
>
>
> Getting NoSuchMethodException :
> org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration)
> while trying  connect hbase 2.1 service from spark.
> This is mainly happening because in spark uses  a deprecated hbase api 
> public static Token obtainToken(Configuration 
> conf)  
> for obtaining the token and the same has been removed from hbase 2.1 version.
>  
> Attached the snapshot of error logs



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.

2019-01-02 Thread Sujith (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732281#comment-16732281
 ] 

Sujith commented on SPARK-26432:


Sorry for the late response due to the holidays :). I raised a PR; please let me know 
if you have any suggestions. Thanks.

> Not able to connect Hbase 2.1 service Getting NoSuchMethodException while 
> trying to obtain token from Hbase 2.1 service.
> 
>
> Key: SPARK-26432
> URL: https://issues.apache.org/jira/browse/SPARK-26432
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Sujith
>Priority: Major
> Attachments: hbase-dep-obtaintok.png
>
>
> Getting NoSuchMethodException :
> org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration)
> while trying  connect hbase 2.1 service from spark.
> This is mainly happening because in spark uses  a deprecated hbase api 
> public static Token obtainToken(Configuration 
> conf)  
> for obtaining the token and the same has been removed from hbase 2.1 version.
>  
> Attached the snapshot of error logs



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.

2019-01-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26432:


Assignee: (was: Apache Spark)

> Not able to connect Hbase 2.1 service Getting NoSuchMethodException while 
> trying to obtain token from Hbase 2.1 service.
> 
>
> Key: SPARK-26432
> URL: https://issues.apache.org/jira/browse/SPARK-26432
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Sujith
>Priority: Major
> Attachments: hbase-dep-obtaintok.png
>
>
> Getting NoSuchMethodException :
> org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration)
> while trying  connect hbase 2.1 service from spark.
> This is mainly happening because in spark uses  a deprecated hbase api 
> public static Token obtainToken(Configuration 
> conf)  
> for obtaining the token and the same has been removed from hbase 2.1 version.
>  
> Attached the snapshot of error logs



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26507) Fix core tests for Java 11

2019-01-02 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-26507.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 23419
[https://github.com/apache/spark/pull/23419]

> Fix core tests for Java 11
> --
>
> Key: SPARK-26507
> URL: https://issues.apache.org/jira/browse/SPARK-26507
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Minor
> Fix For: 3.0.0
>
>
> Several core tests still don't pass in Java 11. Some simple fixes will make 
> them pass. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26518) UI Application Info Race Condition Can Throw NoSuchElement

2019-01-02 Thread Russell Spitzer (JIRA)
Russell Spitzer created SPARK-26518:
---

 Summary: UI Application Info Race Condition Can Throw NoSuchElement
 Key: SPARK-26518
 URL: https://issues.apache.org/jira/browse/SPARK-26518
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.4.0, 2.3.0
Reporter: Russell Spitzer


There is a slight race condition in 
[AppStatusStore|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/AppStatusStore.scala#L39],
which calls `next` on the returned iterator even if it is empty, which it can be for 
a short period of time after the UI is up but before the store is populated.

{code}


Error 500 Server Error
HTTP ERROR 500
Problem accessing /jobs/. Reason:
    Server Error
Caused by: java.util.NoSuchElementException
at java.util.Collections$EmptyIterator.next(Collections.java:4189)
at 
org.apache.spark.util.kvstore.InMemoryStore$InMemoryIterator.next(InMemoryStore.java:281)
at 
org.apache.spark.status.AppStatusStore.applicationInfo(AppStatusStore.scala:38)
at org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:275)
at org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:86)
at org.apache.spark.ui.WebUI$$anonfun$3.apply(WebUI.scala:86)
at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at 
org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:865)
at 
org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:535)
at 
org.spark_project.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
at 
org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
at 
org.spark_project.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
at 
org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
at 
org.spark_project.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
at 
org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)
at 
org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at 
org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:724)
at 
org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
at 
org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.spark_project.jetty.server.Server.handle(Server.java:531)
at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:352)
at 
org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
at 
org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:102)
at 
org.spark_project.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
at 
org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762)
at 
org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:680)
at java.lang.Thread.run(Thread.java:748)
{code}
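
A minimal defensive sketch of the idea (not the committed fix): check `hasNext` before 
calling `next`, so the page can report that the application info has not been written yet 
instead of surfacing a raw NoSuchElementException. The helper compiles on its own; the 
commented call site assumes the store/view names from the AppStatusStore line linked above.
{code}
// Hypothetical helper, illustration only: "first element or nothing" as an Option
// instead of calling next() on a possibly-empty iterator.
def firstOrNone[T](it: java.util.Iterator[T]): Option[T] =
  if (it.hasNext) Some(it.next()) else None

// Assumed call site inside AppStatusStore.applicationInfo():
//   firstOrNone(store.view(classOf[ApplicationInfoWrapper]).max(1).iterator())
//     .map(_.info)
//     .getOrElse(throw new NoSuchElementException("application info not yet written"))
{code}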



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26489) Use ConfigEntry for hardcoded configs for python/r categories.

2019-01-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26489:


Assignee: (was: Apache Spark)

> Use ConfigEntry for hardcoded configs for python/r categories.
> --
>
> Key: SPARK-26489
> URL: https://issues.apache.org/jira/browse/SPARK-26489
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Make the following hardcoded configs to use ConfigEntry.
> {code}
> spark.python
> spark.r
> {code}
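
For reference, a rough sketch of what such a migration typically looks like, assuming 
Spark's internal ConfigBuilder DSL (org.apache.spark.internal.config, which is 
private[spark], so real definitions live inside the org.apache.spark package); the key 
and default shown here are illustrative only.
{code}
import org.apache.spark.internal.config.ConfigBuilder

// Illustrative ConfigEntry for one key under the spark.python prefix.
val PYTHON_WORKER_REUSE = ConfigBuilder("spark.python.worker.reuse")
  .doc("Whether to reuse Python worker processes between tasks.")
  .booleanConf
  .createWithDefault(true)

// Call sites then read the typed entry instead of the hardcoded string:
//   conf.get(PYTHON_WORKER_REUSE)   // rather than conf.getBoolean("spark.python.worker.reuse", true)
{code}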



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26489) Use ConfigEntry for hardcoded configs for python/r categories.

2019-01-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26489:


Assignee: Apache Spark

> Use ConfigEntry for hardcoded configs for python/r categories.
> --
>
> Key: SPARK-26489
> URL: https://issues.apache.org/jira/browse/SPARK-26489
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>
> Make the following hardcoded configs to use ConfigEntry.
> {code}
> spark.python
> spark.r
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26454) While creating new UDF with JAR though UDF is created successfully, it throws IllegegalArgument Exception

2019-01-02 Thread Udbhav Agrawal (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732005#comment-16732005
 ] 

Udbhav Agrawal commented on SPARK-26454:


Okay, I will work on that and raise the PR.

> While creating new UDF with JAR though UDF is created successfully, it throws 
> IllegegalArgument Exception
> -
>
> Key: SPARK-26454
> URL: https://issues.apache.org/jira/browse/SPARK-26454
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.3.2
>Reporter: Udbhav Agrawal
>Priority: Trivial
> Attachments: create_exception.txt
>
>
> 【Test step】:
>  1.launch spark-shell
>  2. set role admin;
>  3. create new function
>    CREATE FUNCTION Func AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'
>  4. Do select on the function
>  sql("select Func('2018-03-09')").show()
>  5.Create new UDF with same JAR
>     sql("CREATE FUNCTION newFunc AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'")
> 6. Do select on the new function created.
> sql("select newFunc ('2018-03-09')").show()
> 【Output】:
> Function is getting created but illegal argument exception is thrown , select 
> provides result but with illegal argument exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26511) java.lang.ClassCastException error when loading Spark MLlib model from parquet file

2019-01-02 Thread Amy Koh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731996#comment-16731996
 ] 

Amy Koh commented on SPARK-26511:
-

Thanks [~viirya]. I do indeed have a slightly modified format of the saved 
model. I reordered the columns in the schema and it's now working OK. It would be 
nice if it could provide a more meaningful error message in the schema validation 
step!
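
For context, Row accessors are positional, so a saved model whose columns were written 
in a different order makes `getInt` land on a Double, exactly as in the stack trace 
quoted below. A tiny Scala illustration (the field layout is assumed, not the real 
DecisionTreeModel schema):
{code}
import org.apache.spark.sql.Row

val expected = Row(1, 0.5)    // assumed layout: (feature: Int, threshold: Double)
val reordered = Row(0.5, 1)   // same fields saved in a different column order

expected.getInt(0)            // 1
// reordered.getInt(0)        // java.lang.ClassCastException: Double cannot be cast to Integer
{code}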

> java.lang.ClassCastException error when loading Spark MLlib model from 
> parquet file
> ---
>
> Key: SPARK-26511
> URL: https://issues.apache.org/jira/browse/SPARK-26511
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 2.4.0
>Reporter: Amy Koh
>Priority: Major
> Attachments: repro.zip
>
>
> When I tried to load a decision tree model from a parquet file, the following 
> error is thrown. 
> {code:bash}
> Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.mllib.tree.model.DecisionTreeModel.load. : 
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 
> (TID 2, localhost, executor driver): java.lang.ClassCastException: class 
> java.lang.Double cannot be cast to class java.lang.Integer (java.lang.Double 
> and java.lang.Integer are in module java.base of loader 'bootstrap') at 
> scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:101) at 
> org.apache.spark.sql.Row$class.getInt(Row.scala:223) at 
> org.apache.spark.sql.catalyst.expressions.GenericRow.getInt(rows.scala:165) 
> at 
> org.apache.spark.mllib.tree.model.DecisionTreeModel$SaveLoadV1_0$SplitData$.apply(DecisionTreeModel.scala:171)
>  at 
> org.apache.spark.mllib.tree.model.DecisionTreeModel$SaveLoadV1_0$NodeData$.apply(DecisionTreeModel.scala:195)
>  at 
> org.apache.spark.mllib.tree.model.DecisionTreeModel$SaveLoadV1_0$$anonfun$9.apply(DecisionTreeModel.scala:247)
>  at 
> org.apache.spark.mllib.tree.model.DecisionTreeModel$SaveLoadV1_0$$anonfun$9.apply(DecisionTreeModel.scala:247)
>  at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at 
> scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:149)
>  at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) at 
> org.apache.spark.scheduler.Task.run(Task.scala:108) at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  at java.base/java.lang.Thread.run(Thread.java:834) Driver stacktrace: at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
>  at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486) 
> at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
>  at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
>  at scala.Option.foreach(Option.scala:257) at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
>  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) at 
> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630) at 
> org.apache.spark.SparkContext.runJob(SparkContext.scala:2022) at 
> org.apache.spark.SparkContext.runJob(SparkContext.scala:2043) at 
> org.apache.spark.SparkContext.runJob(SparkContext.scala:2062) at 
> org.apache.spark.SparkContext.runJob(SparkContext.scala:2087) at 
> org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936) at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>  at 

[jira] [Commented] (SPARK-26454) While creating new UDF with JAR though UDF is created successfully, it throws IllegegalArgument Exception

2019-01-02 Thread Sujith (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732002#comment-16732002
 ] 

Sujith commented on SPARK-26454:


I think [~hyukjin.kwon]'s idea is better and simpler; we can reduce the level to 
warn. When we log an error, the user won't expect the particular operation to have 
been successful, so to avoid confusion it is better to lower the log level.

> While creating new UDF with JAR though UDF is created successfully, it throws 
> IllegegalArgument Exception
> -
>
> Key: SPARK-26454
> URL: https://issues.apache.org/jira/browse/SPARK-26454
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.3.2
>Reporter: Udbhav Agrawal
>Priority: Trivial
> Attachments: create_exception.txt
>
>
> 【Test step】:
>  1.launch spark-shell
>  2. set role admin;
>  3. create new function
>    CREATE FUNCTION Func AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'
>  4. Do select on the function
>  sql("select Func('2018-03-09')").show()
>  5.Create new UDF with same JAR
>     sql("CREATE FUNCTION newFunc AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'")
> 6. Do select on the new function created.
> sql("select newFunc ('2018-03-09')").show()
> 【Output】:
> Function is getting created but illegal argument exception is thrown , select 
> provides result but with illegal argument exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26454) While creating new UDF with JAR though UDF is created successfully, it throws IllegegalArgument Exception

2019-01-02 Thread Udbhav Agrawal (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731769#comment-16731769
 ] 

Udbhav Agrawal edited comment on SPARK-26454 at 1/2/19 12:15 PM:
-

CC [~sandeep-katta] [~sujith] [~ajithshetty28] [~S71955]


was (Author: udbhav agrawal):
CC [~sandeep-katta] [~sujith] [~ajithshetty28]

> While creating new UDF with JAR though UDF is created successfully, it throws 
> IllegegalArgument Exception
> -
>
> Key: SPARK-26454
> URL: https://issues.apache.org/jira/browse/SPARK-26454
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.3.2
>Reporter: Udbhav Agrawal
>Priority: Trivial
> Attachments: create_exception.txt
>
>
> 【Test step】:
>  1.launch spark-shell
>  2. set role admin;
>  3. create new function
>    CREATE FUNCTION Func AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'
>  4. Do select on the function
>  sql("select Func('2018-03-09')").show()
>  5.Create new UDF with same JAR
>     sql("CREATE FUNCTION newFunc AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'")
> 6. Do select on the new function created.
> sql("select newFunc ('2018-03-09')").show()
> 【Output】:
> Function is getting created but illegal argument exception is thrown , select 
> provides result but with illegal argument exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26454) While creating new UDF with JAR though UDF is created successfully, it throws IllegegalArgument Exception

2019-01-02 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731983#comment-16731983
 ] 

Hyukjin Kwon commented on SPARK-26454:
--

The thing is, we shouldn't introduce a behaviour change with the fix. Maybe we could 
consider lowering the log level from error to warning.

> While creating new UDF with JAR though UDF is created successfully, it throws 
> IllegegalArgument Exception
> -
>
> Key: SPARK-26454
> URL: https://issues.apache.org/jira/browse/SPARK-26454
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.3.2
>Reporter: Udbhav Agrawal
>Priority: Trivial
> Attachments: create_exception.txt
>
>
> 【Test step】:
>  1.launch spark-shell
>  2. set role admin;
>  3. create new function
>    CREATE FUNCTION Func AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'
>  4. Do select on the function
>  sql("select Func('2018-03-09')").show()
>  5.Create new UDF with same JAR
>     sql("CREATE FUNCTION newFunc AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'")
> 6. Do select on the new function created.
> sql("select newFunc ('2018-03-09')").show()
> 【Output】:
> Function is getting created but illegal argument exception is thrown , select 
> provides result but with illegal argument exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26454) While creating new UDF with JAR though UDF is created successfully, it throws IllegegalArgument Exception

2019-01-02 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731983#comment-16731983
 ] 

Hyukjin Kwon edited comment on SPARK-26454 at 1/2/19 11:54 AM:
---

The thing is, we shouldn't introduce a behaviour change with the fix. This code path 
is shared by multiple APIs. Maybe we could consider lowering the log level from 
error to warning.


was (Author: hyukjin.kwon):
Thing is, we shouldn't introduce behaviour change when we fix. Maybe we could 
consider lowering log level from error to warning.

> While creating new UDF with JAR though UDF is created successfully, it throws 
> IllegegalArgument Exception
> -
>
> Key: SPARK-26454
> URL: https://issues.apache.org/jira/browse/SPARK-26454
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.3.2
>Reporter: Udbhav Agrawal
>Priority: Trivial
> Attachments: create_exception.txt
>
>
> 【Test step】:
>  1.launch spark-shell
>  2. set role admin;
>  3. create new function
>    CREATE FUNCTION Func AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'
>  4. Do select on the function
>  sql("select Func('2018-03-09')").show()
>  5.Create new UDF with same JAR
>     sql("CREATE FUNCTION newFunc AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'")
> 6. Do select on the new function created.
> sql("select newFunc ('2018-03-09')").show()
> 【Output】:
> Function is getting created but illegal argument exception is thrown , select 
> provides result but with illegal argument exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26454) While creating new UDF with JAR though UDF is created successfully, it throws IllegegalArgument Exception

2019-01-02 Thread Udbhav Agrawal (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731974#comment-16731974
 ] 

Udbhav Agrawal edited comment on SPARK-26454 at 1/2/19 11:26 AM:
-

[~hyukjin.kwon]

[https://github.com/apache/spark/blob/02b510728c31b70e6035ad541bfcdc2b59dcd79a/core/src/main/scala/org/apache/spark/rpc/netty/NettyStreamManager.scala#L78]

I thought of changing the require call to a conditional check that logs a warning 
message instead of throwing an exception, but I didn't see any major benefit. Can you 
suggest whether to go ahead with that; otherwise I will close the issue.


was (Author: udbhav agrawal):
[https://github.com/apache/spark/blob/02b510728c31b70e6035ad541bfcdc2b59dcd79a/core/src/main/scala/org/apache/spark/rpc/netty/NettyStreamManager.scala#L78]

i thought of changing the require function to conditional statement and instead 
of throwing an exception provide a warning message instead, but didn't feel any 
major use. can you suggest to go ahead with the same or else i will close the 
issue.

> While creating new UDF with JAR though UDF is created successfully, it throws 
> IllegegalArgument Exception
> -
>
> Key: SPARK-26454
> URL: https://issues.apache.org/jira/browse/SPARK-26454
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.3.2
>Reporter: Udbhav Agrawal
>Priority: Trivial
> Attachments: create_exception.txt
>
>
> 【Test step】:
>  1.launch spark-shell
>  2. set role admin;
>  3. create new function
>    CREATE FUNCTION Func AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'
>  4. Do select on the function
>  sql("select Func('2018-03-09')").show()
>  5.Create new UDF with same JAR
>     sql("CREATE FUNCTION newFunc AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'")
> 6. Do select on the new function created.
> sql("select newFunc ('2018-03-09')").show()
> 【Output】:
> Function is getting created but illegal argument exception is thrown , select 
> provides result but with illegal argument exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26454) While creating new UDF with JAR though UDF is created successfully, it throws IllegegalArgument Exception

2019-01-02 Thread Udbhav Agrawal (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731974#comment-16731974
 ] 

Udbhav Agrawal commented on SPARK-26454:


[https://github.com/apache/spark/blob/02b510728c31b70e6035ad541bfcdc2b59dcd79a/core/src/main/scala/org/apache/spark/rpc/netty/NettyStreamManager.scala#L78]

I thought of changing the require call to a conditional check that logs a warning 
message instead of throwing an exception, but I didn't see any major benefit. Can you 
suggest whether to go ahead with that; otherwise I will close the issue.
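
For discussion, a hedged sketch of the two behaviours being compared (illustrative only, 
not the actual NettyStreamManager code; the registry class, names and message are assumed, 
and slf4j is used for logging):
{code}
import java.io.File
import java.util.concurrent.ConcurrentHashMap
import org.slf4j.LoggerFactory

class ResourceRegistry {
  private val log = LoggerFactory.getLogger(classOf[ResourceRegistry])
  private val files = new ConcurrentHashMap[String, File]()

  // Current behaviour (simplified): a second registration under the same name
  // fails the call with an IllegalArgumentException from require().
  def addStrict(name: String, file: File): Unit =
    require(files.putIfAbsent(name, file) == null, s"File $name was already registered.")

  // Alternative under discussion: keep the first entry and only log a warning.
  def addLenient(name: String, file: File): Unit =
    if (files.putIfAbsent(name, file) != null) {
      log.warn(s"File $name was already registered; keeping the existing entry.")
    }
}
{code}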

> While creating new UDF with JAR though UDF is created successfully, it throws 
> IllegegalArgument Exception
> -
>
> Key: SPARK-26454
> URL: https://issues.apache.org/jira/browse/SPARK-26454
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.3.2
>Reporter: Udbhav Agrawal
>Priority: Trivial
> Attachments: create_exception.txt
>
>
> 【Test step】:
>  1.launch spark-shell
>  2. set role admin;
>  3. create new function
>    CREATE FUNCTION Func AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'
>  4. Do select on the function
>  sql("select Func('2018-03-09')").show()
>  5.Create new UDF with same JAR
>     sql("CREATE FUNCTION newFunc AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'")
> 6. Do select on the new function created.
> sql("select newFunc ('2018-03-09')").show()
> 【Output】:
> Function is getting created but illegal argument exception is thrown , select 
> provides result but with illegal argument exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26454) While creating new UDF with JAR though UDF is created successfully, it throws IllegegalArgument Exception

2019-01-02 Thread Udbhav Agrawal (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731967#comment-16731967
 ] 

Udbhav Agrawal commented on SPARK-26454:


[https://github.com/apache/spark/blob/02b510728c31b70e6035ad541bfcdc2b59dcd79a/core/src/main/scala/org/apache/spark/rpc/netty/NettyStreamManager.scala#L78]
 

> While creating new UDF with JAR though UDF is created successfully, it throws 
> IllegegalArgument Exception
> -
>
> Key: SPARK-26454
> URL: https://issues.apache.org/jira/browse/SPARK-26454
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.3.2
>Reporter: Udbhav Agrawal
>Priority: Trivial
> Attachments: create_exception.txt
>
>
> 【Test step】:
>  1.launch spark-shell
>  2. set role admin;
>  3. create new function
>    CREATE FUNCTION Func AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'
>  4. Do select on the function
>  sql("select Func('2018-03-09')").show()
>  5.Create new UDF with same JAR
>     sql("CREATE FUNCTION newFunc AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'")
> 6. Do select on the new function created.
> sql("select newFunc ('2018-03-09')").show()
> 【Output】:
> Function is getting created but illegal argument exception is thrown , select 
> provides result but with illegal argument exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-26454) While creating new UDF with JAR though UDF is created successfully, it throws IllegegalArgument Exception

2019-01-02 Thread Udbhav Agrawal (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udbhav Agrawal updated SPARK-26454:
---
Comment: was deleted

(was: 
[https://github.com/apache/spark/blob/02b510728c31b70e6035ad541bfcdc2b59dcd79a/core/src/main/scala/org/apache/spark/rpc/netty/NettyStreamManager.scala#L78]
 )

> While creating new UDF with JAR though UDF is created successfully, it throws 
> IllegegalArgument Exception
> -
>
> Key: SPARK-26454
> URL: https://issues.apache.org/jira/browse/SPARK-26454
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.3.2
>Reporter: Udbhav Agrawal
>Priority: Trivial
> Attachments: create_exception.txt
>
>
> 【Test step】:
>  1.launch spark-shell
>  2. set role admin;
>  3. create new function
>    CREATE FUNCTION Func AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'
>  4. Do select on the function
>  sql("select Func('2018-03-09')").show()
>  5.Create new UDF with same JAR
>     sql("CREATE FUNCTION newFunc AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'")
> 6. Do select on the new function created.
> sql("select newFunc ('2018-03-09')").show()
> 【Output】:
> Function is getting created but illegal argument exception is thrown , select 
> provides result but with illegal argument exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26462) Use ConfigEntry for hardcoded configs for execution categories.

2019-01-02 Thread Takuya Ueshin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731953#comment-16731953
 ] 

Takuya Ueshin commented on SPARK-26462:
---

[~pralabhkumar] Sure. Please feel free to create the pull request. Thanks!

> Use ConfigEntry for hardcoded configs for execution categories.
> ---
>
> Key: SPARK-26462
> URL: https://issues.apache.org/jira/browse/SPARK-26462
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Make the following hardcoded configs to use ConfigEntry.
> {code}
> spark.memory
> spark.storage
> spark.io
> spark.buffer
> spark.rdd
> spark.locality
> spark.broadcast
> spark.reducer
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26454) While creating new UDF with JAR though UDF is created successfully, it throws IllegegalArgument Exception

2019-01-02 Thread Udbhav Agrawal (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udbhav Agrawal updated SPARK-26454:
---
Summary: While creating new UDF with JAR though UDF is created 
successfully, it throws IllegegalArgument Exception  (was: while creating new 
UDF with JAR though UDF is created successfully)

> While creating new UDF with JAR though UDF is created successfully, it throws 
> IllegegalArgument Exception
> -
>
> Key: SPARK-26454
> URL: https://issues.apache.org/jira/browse/SPARK-26454
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.3.2
>Reporter: Udbhav Agrawal
>Priority: Trivial
> Attachments: create_exception.txt
>
>
> 【Test step】:
>  1.launch spark-shell
>  2. set role admin;
>  3. create new function
>    CREATE FUNCTION Func AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'
>  4. Do select on the function
>  sql("select Func('2018-03-09')").show()
>  5.Create new UDF with same JAR
>     sql("CREATE FUNCTION newFunc AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'")
> 6. Do select on the new function created.
> sql("select newFunc ('2018-03-09')").show()
> 【Output】:
> Function is getting created but illegal argument exception is thrown , select 
> provides result but with illegal argument exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26454) while creating new UDF with JAR though UDF is created successfully

2019-01-02 Thread Udbhav Agrawal (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udbhav Agrawal updated SPARK-26454:
---
Description: 
【Test step】:
 1.launch spark-shell
 2. set role admin;
 3. create new function
   CREATE FUNCTION Func AS 
'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
'hdfs:///tmp/super_udf/two_udfs.jar'
 4. Do select on the function
 sql("select Func('2018-03-09')").show()
 5.Create new UDF with same JAR
    sql("CREATE FUNCTION newFunc AS 
'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
'hdfs:///tmp/super_udf/two_udfs.jar'")

6. Do select on the new function created.

sql("select newFunc ('2018-03-09')").show()

【Output】:

The function gets created, but an IllegalArgumentException is thrown; the select 
provides the result, but the IllegalArgumentException is reported as well.

  was:
【Test step】:
1.launch spark-shell
2. set role admin;
3. create new function
  CREATE FUNCTION Func AS 
'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
'hdfs:///tmp/super_udf/two_udfs.jar'
4. Do select on the function
sql("select Func('2018-03-09')").show()
5.Create new UDF with same JAR
   sql("CREATE FUNCTION newFunc AS 
'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
'hdfs:///tmp/super_udf/two_udfs.jar'")

6. Do select on the new function created.

sql("select newFunc ('2018-03-09')").show()

【Output】:

Function is getting created but illegal argument exception is thrown , select 
provides result but with illegal argument exception.

Summary: while creating new UDF with JAR though UDF is created 
successfully  (was: IllegegalArgument Exception is Thrown while creating new 
UDF with JAR)

> while creating new UDF with JAR though UDF is created successfully
> --
>
> Key: SPARK-26454
> URL: https://issues.apache.org/jira/browse/SPARK-26454
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.3.2
>Reporter: Udbhav Agrawal
>Priority: Trivial
> Attachments: create_exception.txt
>
>
> 【Test step】:
>  1.launch spark-shell
>  2. set role admin;
>  3. create new function
>    CREATE FUNCTION Func AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'
>  4. Do select on the function
>  sql("select Func('2018-03-09')").show()
>  5.Create new UDF with same JAR
>     sql("CREATE FUNCTION newFunc AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'")
> 6. Do select on the new function created.
> sql("select newFunc ('2018-03-09')").show()
> 【Output】:
> Function is getting created but illegal argument exception is thrown , select 
> provides result but with illegal argument exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26517) Avoid duplicate test in ParquetSchemaPruningSuite

2019-01-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26517:


Assignee: (was: Apache Spark)

> Avoid duplicate test in ParquetSchemaPruningSuite
> -
>
> Key: SPARK-26517
> URL: https://issues.apache.org/jira/browse/SPARK-26517
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Liang-Chi Hsieh
>Priority: Minor
>
> `testExactCaseQueryPruning` and `testMixedCaseQueryPruning` don't need to set 
> up `PARQUET_VECTORIZED_READER_ENABLED` config. Because `withMixedCaseData` 
> will run against both Spark vectorized reader and Parquet-mr reader.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26517) Avoid duplicate test in ParquetSchemaPruningSuite

2019-01-02 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26517:


Assignee: Apache Spark

> Avoid duplicate test in ParquetSchemaPruningSuite
> -
>
> Key: SPARK-26517
> URL: https://issues.apache.org/jira/browse/SPARK-26517
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Liang-Chi Hsieh
>Assignee: Apache Spark
>Priority: Minor
>
> `testExactCaseQueryPruning` and `testMixedCaseQueryPruning` don't need to set 
> up `PARQUET_VECTORIZED_READER_ENABLED` config. Because `withMixedCaseData` 
> will run against both Spark vectorized reader and Parquet-mr reader.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26517) Avoid duplicate test in ParquetSchemaPruningSuite

2019-01-02 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-26517:
---

 Summary: Avoid duplicate test in ParquetSchemaPruningSuite
 Key: SPARK-26517
 URL: https://issues.apache.org/jira/browse/SPARK-26517
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 3.0.0
Reporter: Liang-Chi Hsieh


`testExactCaseQueryPruning` and `testMixedCaseQueryPruning` don't need to set 
up `PARQUET_VECTORIZED_READER_ENABLED` config. Because `withMixedCaseData` will 
run against both Spark vectorized reader and Parquet-mr reader.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18805) InternalMapWithStateDStream make java.lang.StackOverflowError

2019-01-02 Thread Joost Verdoorn (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731924#comment-16731924
 ] 

Joost Verdoorn commented on SPARK-18805:


This issue occurs relatively often within our application, when resuming from 
checkpoint. Is there any progress on this?

> InternalMapWithStateDStream make java.lang.StackOverflowError 
> --
>
> Key: SPARK-18805
> URL: https://issues.apache.org/jira/browse/SPARK-18805
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 1.6.3, 2.0.2
> Environment: mesos
>Reporter: etienne
>Priority: Major
>
> When loading InternalMapWithStateDStream from a checkpoint, if isValidTime is 
> true and there is no generatedRDD at the given time, there is an infinite loop:
> 1) compute is called on InternalMapWithStateDStream
> 2) InternalMapWithStateDStream tries to generate the previousRDD
> 3) the stream looks in generatedRDD to see if the RDD is already generated for 
> the given time
> 4) it does not find the RDD, so it checks if the time is valid
> 5) if the time is valid, compute is called on InternalMapWithStateDStream
> 6) restart from 1)
> Here the exception that illustrate this error
> {code}
> Exception in thread "streaming-start" java.lang.StackOverflowError
>   at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
>   at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
>   at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
>   at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
>   at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
>   at 
> org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
>   at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:335)
>   at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:333)
>   at scala.Option.orElse(Option.scala:289)
>   at 
> org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:330)
>   at 
> org.apache.spark.streaming.dstream.InternalMapWithStateDStream.compute(MapWithStateDStream.scala:134)
>   at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
>   at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
>   at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
>   at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
>   at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
>   at 
> org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
>   at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:335)
>   at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:333)
>   at scala.Option.orElse(Option.scala:289)
>   at 
> org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:330)
>   at 
> org.apache.spark.streaming.dstream.InternalMapWithStateDStream.compute(MapWithStateDStream.scala:134)
>   at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
> {code}
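
A simplified, self-contained illustration of why this pattern overflows (not Spark's code; 
the string cache below stands in for generatedRDDs and the depth is arbitrary): after a 
checkpoint restore the cache is empty, so each batch recursively asks for its predecessor 
and nothing is memoised until the whole chain unwinds.
{code}
object CheckpointRecursionDemo {
  import scala.collection.mutable

  // Stand-in for generatedRDDs; empty right after a checkpoint restore.
  val generated = mutable.Map.empty[Long, String]

  def getOrCompute(time: Long): String =
    generated.getOrElseUpdate(time, compute(time))

  def compute(time: Long): String =
    if (time <= 0L) "initial state"
    else s"state@$time <- ${getOrCompute(time - 1)}"  // each batch needs the previous one
}

// CheckpointRecursionDemo.getOrCompute(3L)       // fine: shallow recursion
// CheckpointRecursionDemo.getOrCompute(100000L)  // StackOverflowError with an empty cache
{code}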



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26454) IllegegalArgument Exception is Thrown while creating new UDF with JAR

2019-01-02 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731902#comment-16731902
 ] 

Hyukjin Kwon commented on SPARK-26454:
--

Can you fix the JIRA title and description? It sounds like it doesn't work at 
all. If it's easy to fix, go ahead; otherwise, I won't fix it.

> IllegegalArgument Exception is Thrown while creating new UDF with JAR
> -
>
> Key: SPARK-26454
> URL: https://issues.apache.org/jira/browse/SPARK-26454
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.3.2
>Reporter: Udbhav Agrawal
>Priority: Trivial
> Attachments: create_exception.txt
>
>
> 【Test step】:
> 1.launch spark-shell
> 2. set role admin;
> 3. create new function
>   CREATE FUNCTION Func AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'
> 4. Do select on the function
> sql("select Func('2018-03-09')").show()
> 5.Create new UDF with same JAR
>    sql("CREATE FUNCTION newFunc AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'")
> 6. Do select on the new function created.
> sql("select newFunc ('2018-03-09')").show()
> 【Output】:
> Function is getting created but illegal argument exception is thrown , select 
> provides result but with illegal argument exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26454) IllegegalArgument Exception is Thrown while creating new UDF with JAR

2019-01-02 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-26454:
-
Priority: Trivial  (was: Major)

> IllegalArgumentException is thrown while creating new UDF with JAR
> -
>
> Key: SPARK-26454
> URL: https://issues.apache.org/jira/browse/SPARK-26454
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.3.2
>Reporter: Udbhav Agrawal
>Priority: Trivial
> Attachments: create_exception.txt
>
>
> 【Test steps】:
> 1. Launch spark-shell.
> 2. set role admin;
> 3. Create a new function:
>   CREATE FUNCTION Func AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'
> 4. Run a select with the function:
> sql("select Func('2018-03-09')").show()
> 5. Create a new UDF with the same JAR:
>    sql("CREATE FUNCTION newFunc AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'")
> 6. Run a select with the newly created function:
> sql("select newFunc('2018-03-09')").show()
> 【Output】:
> The function is created and the select returns a result, but an 
> IllegalArgumentException is logged.






[jira] [Commented] (SPARK-26454) IllegalArgumentException is thrown while creating new UDF with JAR

2019-01-02 Thread Udbhav Agrawal (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731856#comment-16731856
 ] 

Udbhav Agrawal commented on SPARK-26454:


[~hyukjin.kwon] Yes, the code works fine but the error log is shown as well. I 
checked other clients, and the error log appears only in spark-shell. Do we 
need to handle this case, or is it not required since the code works fine? 
Please give your suggestions.

 

> IllegalArgumentException is thrown while creating new UDF with JAR
> -
>
> Key: SPARK-26454
> URL: https://issues.apache.org/jira/browse/SPARK-26454
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.3.2
>Reporter: Udbhav Agrawal
>Priority: Major
> Attachments: create_exception.txt
>
>
> 【Test steps】:
> 1. Launch spark-shell.
> 2. set role admin;
> 3. Create a new function:
>   CREATE FUNCTION Func AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'
> 4. Run a select with the function:
> sql("select Func('2018-03-09')").show()
> 5. Create a new UDF with the same JAR:
>    sql("CREATE FUNCTION newFunc AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'")
> 6. Run a select with the newly created function:
> sql("select newFunc('2018-03-09')").show()
> 【Output】:
> The function is created and the select returns a result, but an 
> IllegalArgumentException is logged.






[jira] [Commented] (SPARK-26462) Use ConfigEntry for hardcoded configs for execution categories.

2019-01-02 Thread pralabhkumar (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731862#comment-16731862
 ] 

pralabhkumar commented on SPARK-26462:
--

[~ueshin] I can work on this. Please let me know if it's OK, and I'll create the 
pull request.

> Use ConfigEntry for hardcoded configs for execution categories.
> ---
>
> Key: SPARK-26462
> URL: https://issues.apache.org/jira/browse/SPARK-26462
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Make the following hardcoded configs use ConfigEntry (see the migration sketch after this list).
> {code}
> spark.memory
> spark.storage
> spark.io
> spark.buffer
> spark.rdd
> spark.locality
> spark.broadcast
> spark.reducer
> {code}
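
For reference, a minimal sketch of the migration pattern, assuming the ConfigBuilder DSL 
in org.apache.spark.internal.config. The object name, doc strings, and defaults below are 
illustrative placeholders, not the values an eventual PR would have to use:
{code}
package org.apache.spark.internal.config

import org.apache.spark.network.util.ByteUnit

// Illustrative sketch only: the object name, docs, and defaults are placeholders.
private[spark] object ExecutionConfigSketch {

  // Before: callers read conf.getBoolean("spark.memory.offHeap.enabled", false).
  // After: a typed ConfigEntry centralizes the key, type, and default in one place.
  val MEMORY_OFFHEAP_ENABLED = ConfigBuilder("spark.memory.offHeap.enabled")
    .doc("If true, Spark attempts to use off-heap memory for certain operations.")
    .booleanConf
    .createWithDefault(false)

  // Size-typed entries can carry their unit, e.g. spark.reducer.maxSizeInFlight.
  val REDUCER_MAX_SIZE_IN_FLIGHT = ConfigBuilder("spark.reducer.maxSizeInFlight")
    .doc("Maximum size of map outputs to fetch simultaneously from each reduce task.")
    .bytesConf(ByteUnit.MiB)
    .createWithDefaultString("48m")
}

// Call sites then read the typed entry instead of a hardcoded string, e.g.:
//   val offHeapEnabled = sparkConf.get(ExecutionConfigSketch.MEMORY_OFFHEAP_ENABLED)
{code}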






[jira] [Created] (SPARK-26516) zeppelin with spark on mesos: environment variable setting

2019-01-02 Thread Yui Hirasawa (JIRA)
Yui Hirasawa created SPARK-26516:


 Summary: zeppelin with spark on mesos: environment variable setting
 Key: SPARK-26516
 URL: https://issues.apache.org/jira/browse/SPARK-26516
 Project: Spark
  Issue Type: IT Help
  Components: Mesos, Spark Core
Affects Versions: 2.4.0
Reporter: Yui Hirasawa


I am trying to use Zeppelin with Spark on Mesos, following [Apache Zeppelin 
on Spark Cluster 
Mode|https://zeppelin.apache.org/docs/0.8.0/setup/deployment/spark_cluster_mode.html#4-configure-spark-interpreter-in-zeppelin-1].

According to the instructions, we should set these environment variables:
{code:java}
export MASTER=mesos://127.0.1.1:5050
export MESOS_NATIVE_JAVA_LIBRARY=[PATH OF libmesos.so]
export SPARK_HOME=[PATH OF SPARK HOME]
{code}
As far as I know, these environment variables are used by Zeppelin, so they 
should be set on the local host rather than in the Docker container (if I am 
wrong, please correct me).

But Mesos and Spark are running inside a Docker container, so do we need to set 
these environment variables so that they point to the paths inside the Docker 
container? If so, how should one achieve that?

Thanks in advance.






[jira] [Commented] (SPARK-26454) IllegalArgumentException is thrown while creating new UDF with JAR

2019-01-02 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731818#comment-16731818
 ] 

Hyukjin Kwon commented on SPARK-26454:
--

Which Spark version do you use? It looks like the line numbers are different from Spark 2.3.2:

https://github.com/apache/spark/blob/v2.3.2/core/src/main/scala/org/apache/spark/SparkContext.scala#L1810-L1858

{code}
java.lang.IllegalArgumentException: requirement failed: File custom.jar was 
already registered with a different path (old path = 
/opt/sparkclient/Spark2x/tmp/spark-10ea8f59-fa23-46c5-af12-aa029bf2f5cb/custom.jar,
 new path = 
/opt/sparkclient/Spark2x/tmp/spark-ed12eb5e-b7b9-49d0-a7a4-a0dba9141ac9/custom.jar
at scala.Predef$.require(Predef.scala:224)
at 
org.apache.spark.rpc.netty.NettyStreamManager.addJar(NettyStreamManager.scala:78)
at org.apache.spark.SparkContext.addJarFile$1(SparkContext.scala:1829)
at org.apache.spark.SparkContext.addJar(SparkContext.scala:1851)
at 
org.apache.spark.sql.internal.SessionResourceLoader.addJar(SessionState.scala:189)
at 
org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:119)
at 
org.apache.spark.sql.hive.HiveACLSessionResourceLoader.addJar(HiveACLSessionStateBuilder.scala:110)
at 
org.apache.spark.sql.internal.SessionResourceLoader.loadResource(SessionState.scala:157)
at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog$$anonfun$loadFunctionResources$1.apply(SessionCatalo
{code}

Judging from the code, it should just show the error log and the code should 
still work.
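
To illustrate why the message appears even though the UDF keeps working, here is a 
simplified, self-contained sketch of the duplicate-registration check (not the actual 
Spark source; the real check lives in NettyStreamManager.addJar, as the stack trace 
shows). The file server keys jars by file name, so re-adding the same file name from a 
different temporary directory fails the requirement even when the jar contents are 
identical:
{code}
import java.io.File
import java.util.concurrent.ConcurrentHashMap

// Simplified stand-in for a file-name-keyed jar registry. Registering the same
// file name again from a different temp directory trips the require(), which
// throws IllegalArgumentException; the first registration stays valid.
class JarRegistrySketch {
  private val jars = new ConcurrentHashMap[String, File]()

  def addJar(file: File): String = {
    val existing = jars.putIfAbsent(file.getName, file)
    require(existing == null || existing == file,
      s"File ${file.getName} was already registered with a different path " +
        s"(old path = $existing, new path = $file)")
    s"spark://driver/jars/${file.getName}"  // placeholder URI, not Spark's real one
  }
}

object JarRegistrySketchDemo extends App {
  val registry = new JarRegistrySketch
  registry.addJar(new File("/tmp/spark-aaa/custom.jar")) // first CREATE FUNCTION: succeeds
  registry.addJar(new File("/tmp/spark-bbb/custom.jar")) // same jar name, new temp dir: throws
}
{code}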

> IllegalArgumentException is thrown while creating new UDF with JAR
> -
>
> Key: SPARK-26454
> URL: https://issues.apache.org/jira/browse/SPARK-26454
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.3.2
>Reporter: Udbhav Agrawal
>Priority: Major
> Attachments: create_exception.txt
>
>
> 【Test steps】:
> 1. Launch spark-shell.
> 2. set role admin;
> 3. Create a new function:
>   CREATE FUNCTION Func AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'
> 4. Run a select with the function:
> sql("select Func('2018-03-09')").show()
> 5. Create a new UDF with the same JAR:
>    sql("CREATE FUNCTION newFunc AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'")
> 6. Run a select with the newly created function:
> sql("select newFunc('2018-03-09')").show()
> 【Output】:
> The function is created and the select returns a result, but an 
> IllegalArgumentException is logged.






[jira] [Updated] (SPARK-26516) zeppelin with spark on mesos: environment variable setting

2019-01-02 Thread Yui Hirasawa (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yui Hirasawa updated SPARK-26516:
-
Issue Type: Question  (was: IT Help)

> zeppelin with spark on mesos: environment variable setting
> --
>
> Key: SPARK-26516
> URL: https://issues.apache.org/jira/browse/SPARK-26516
> Project: Spark
>  Issue Type: Question
>  Components: Mesos, Spark Core
>Affects Versions: 2.4.0
>Reporter: Yui Hirasawa
>Priority: Major
>
> I am trying to use Zeppelin with Spark on Mesos, following [Apache 
> Zeppelin on Spark Cluster 
> Mode|https://zeppelin.apache.org/docs/0.8.0/setup/deployment/spark_cluster_mode.html#4-configure-spark-interpreter-in-zeppelin-1].
> According to the instructions, we should set these environment variables:
> {code:java}
> export MASTER=mesos://127.0.1.1:5050
> export MESOS_NATIVE_JAVA_LIBRARY=[PATH OF libmesos.so]
> export SPARK_HOME=[PATH OF SPARK HOME]
> {code}
> As far as I know, these environment variables are used by Zeppelin, so they 
> should be set on the local host rather than in the Docker container (if I am 
> wrong, please correct me).
> But Mesos and Spark are running inside a Docker container, so do we need to 
> set these environment variables so that they point to the paths inside the 
> Docker container? If so, how should one achieve that?
> Thanks in advance.


