[jira] [Resolved] (SPARK-29870) Unify the logic of multi-units interval string to CalendarInterval
[ https://issues.apache.org/jira/browse/SPARK-29870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-29870. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26491 [https://github.com/apache/spark/pull/26491] > Unify the logic of multi-units interval string to CalendarInterval > -- > > Key: SPARK-29870 > URL: https://issues.apache.org/jira/browse/SPARK-29870 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.0.0 > > > We now have two different implementations for multi-units interval strings to > CalendarInterval type values. > One is used to convert interval string literals to CalendarInterval. This > approach re-delegates the interval string to the Spark parser, which handles > the string as a `singleInterval` -> `multiUnitsInterval` and eventually calls > `IntervalUtils.fromUnitStrings`. > The other is used in `Cast`, which eventually calls > `IntervalUtils.stringToInterval`. This approach is ~10 times faster than the > other. > We should unify these two for better performance and simpler logic. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
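For readers following along, here is a minimal spark-shell sketch of how each of the two paths above is reached (illustrative only; the ~10x figure is the ticket's claim, not re-measured here):

{code}
// Path 1: an interval literal is handled by the SQL parser
// (singleInterval -> multiUnitsInterval -> IntervalUtils.fromUnitStrings).
spark.sql("SELECT INTERVAL '1 year 2 days 3 hours'").show()

// Path 2: casting a string to interval goes through Cast and ends up in
// IntervalUtils.stringToInterval, the faster of the two implementations.
spark.sql("SELECT CAST('1 year 2 days 3 hours' AS interval)").show()
{code}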
[jira] [Assigned] (SPARK-29870) Unify the logic of multi-units interval string to CalendarInterval
[ https://issues.apache.org/jira/browse/SPARK-29870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-29870: --- Assignee: Kent Yao > Unify the logic of multi-units interval string to CalendarInterval > -- > > Key: SPARK-29870 > URL: https://issues.apache.org/jira/browse/SPARK-29870 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > > We now have two different implementations for multi-units interval strings to > CalendarInterval type values. > One is used to convert interval string literals to CalendarInterval. This > approach re-delegates the interval string to the Spark parser, which handles > the string as a `singleInterval` -> `multiUnitsInterval` and eventually calls > `IntervalUtils.fromUnitStrings`. > The other is used in `Cast`, which eventually calls > `IntervalUtils.stringToInterval`. This approach is ~10 times faster than the > other. > We should unify these two for better performance and simpler logic. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29940) Whether the parameter "spark.yarn.historyServer.address" contains a schema
[ https://issues.apache.org/jira/browse/SPARK-29940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hehuiyuan updated SPARK-29940: -- Description: !image-2019-11-18-15-44-10-358.png|width=815,height=156! !image-2019-11-18-15-45-33-295.png|width=673,height=273! was: !image-2019-11-18-15-44-10-358.png|width=815,height=156! > Whether the parameter "spark.yarn.historyServer.address" contains a schema > -- > > Key: SPARK-29940 > URL: https://issues.apache.org/jira/browse/SPARK-29940 > Project: Spark > Issue Type: Wish > Components: Documentation >Affects Versions: 3.0.0 >Reporter: hehuiyuan >Priority: Minor > Attachments: image-2019-11-18-15-44-10-358.png, > image-2019-11-18-15-45-33-295.png > > > > !image-2019-11-18-15-44-10-358.png|width=815,height=156! > > !image-2019-11-18-15-45-33-295.png|width=673,height=273! > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29940) Whether the parameter "spark.yarn.historyServer.address" contains a schema
[ https://issues.apache.org/jira/browse/SPARK-29940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hehuiyuan updated SPARK-29940: -- Attachment: image-2019-11-18-15-45-33-295.png > Whether the parameter "spark.yarn.historyServer.address" contains a schema > -- > > Key: SPARK-29940 > URL: https://issues.apache.org/jira/browse/SPARK-29940 > Project: Spark > Issue Type: Wish > Components: Documentation >Affects Versions: 3.0.0 >Reporter: hehuiyuan >Priority: Minor > Attachments: image-2019-11-18-15-44-10-358.png, > image-2019-11-18-15-45-33-295.png > > > > !image-2019-11-18-15-44-10-358.png|width=815,height=156! > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29940) Whether the parameter "spark.yarn.historyServer.address" contains a schema
[ https://issues.apache.org/jira/browse/SPARK-29940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hehuiyuan updated SPARK-29940: -- Description: !image-2019-11-18-15-44-10-358.png|width=815,height=156! was: !image-2019-11-18-15-37-20-628.png! !image-2019-11-18-15-38-21-515.png! > Whether the parameter "spark.yarn.historyServer.address" contains a schema > -- > > Key: SPARK-29940 > URL: https://issues.apache.org/jira/browse/SPARK-29940 > Project: Spark > Issue Type: Wish > Components: Documentation >Affects Versions: 3.0.0 >Reporter: hehuiyuan >Priority: Minor > Attachments: image-2019-11-18-15-44-10-358.png > > > > !image-2019-11-18-15-44-10-358.png|width=815,height=156! > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29940) Whether the parameter "spark.yarn.historyServer.address" contains a schema
[ https://issues.apache.org/jira/browse/SPARK-29940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hehuiyuan updated SPARK-29940: -- Attachment: image-2019-11-18-15-44-10-358.png > Whether the parameter "spark.yarn.historyServer.address" contains a schema > -- > > Key: SPARK-29940 > URL: https://issues.apache.org/jira/browse/SPARK-29940 > Project: Spark > Issue Type: Wish > Components: Documentation >Affects Versions: 3.0.0 >Reporter: hehuiyuan >Priority: Minor > Attachments: image-2019-11-18-15-44-10-358.png > > > !image-2019-11-18-15-37-20-628.png! > > !image-2019-11-18-15-38-21-515.png! > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29783) Support SQL Standard output style for interval type
[ https://issues.apache.org/jira/browse/SPARK-29783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-29783. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26418 [https://github.com/apache/spark/pull/26418] > Support SQL Standard output style for interval type > --- > > Key: SPARK-29783 > URL: https://issues.apache.org/jira/browse/SPARK-29783 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.0.0 > > > Support sql standard interval-style for output. > > ||Style ||conf||Year-Month Interval||Day-Time Interval||Mixed Interval|| > |{{sql_standard}}|ANSI enabled|1-2|3 4:05:06|-1-2 3 -4:05:06| > |{{spark's current}}|ansi disabled|1 year 2 mons|1 days 2 hours 3 minutes > 4.123456 seconds|interval 1 days 2 hours 3 minutes 4.123456 seconds| -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
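To make the table above concrete, a hedged spark-shell sketch of the two styles (the exact conf wiring is per the PR; the outputs in comments just echo the table):

{code}
// A year-month interval, as in the table's first column:
spark.sql("SELECT INTERVAL '1-2' YEAR TO MONTH").show(truncate = false)
// sql_standard style (ANSI enabled):     1-2
// Spark's current style (ANSI disabled): 1 year 2 mons
{code}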
[jira] [Assigned] (SPARK-29783) Support SQL Standard output style for interval type
[ https://issues.apache.org/jira/browse/SPARK-29783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-29783: --- Assignee: Kent Yao > Support SQL Standard output style for interval type > --- > > Key: SPARK-29783 > URL: https://issues.apache.org/jira/browse/SPARK-29783 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > > Support sql standard interval-style for output. > > ||Style ||conf||Year-Month Interval||Day-Time Interval||Mixed Interval|| > |{{sql_standard}}|ANSI enabled|1-2|3 4:05:06|-1-2 3 -4:05:06| > |{{spark's current}}|ansi disabled|1 year 2 mons|1 days 2 hours 3 minutes > 4.123456 seconds|interval 1 days 2 hours 3 minutes 4.123456 seconds| -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29940) Whether the parameter "spark.yarn.historyServer.address" contains a schema
hehuiyuan created SPARK-29940: - Summary: Whether the parameter "spark.yarn.historyServer.address" contains a schema Key: SPARK-29940 URL: https://issues.apache.org/jira/browse/SPARK-29940 Project: Spark Issue Type: Wish Components: Documentation Affects Versions: 3.0.0 Reporter: hehuiyuan !image-2019-11-18-15-37-20-628.png! !image-2019-11-18-15-38-21-515.png! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25694) URL.setURLStreamHandlerFactory causing incompatible HttpURLConnection issue
[ https://issues.apache.org/jira/browse/SPARK-25694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-25694: - Assignee: Zhou Jiang (was: Dongjoon Hyun) > URL.setURLStreamHandlerFactory causing incompatible HttpURLConnection issue > --- > > Key: SPARK-25694 > URL: https://issues.apache.org/jira/browse/SPARK-25694 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.4, 3.0.0 >Reporter: Bo Yang >Assignee: Zhou Jiang >Priority: Minor > Fix For: 3.0.0 > > > URL.setURLStreamHandlerFactory() in SharedState causes URL.openConnection() > to return an FsUrlConnection object, which is not compatible with > HttpURLConnection. This will cause an exception when using some third-party http > libraries (e.g. scalaj.http). > The following code in Spark 2.3.0 introduced the issue: > sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala: > {code} > object SharedState extends Logging { ... > URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory()) ... > } > {code} > Here is an example exception when using scalaj.http in Spark: > {code} > StackTrace: scala.MatchError: > org.apache.hadoop.fs.FsUrlConnection:[http://.example.com|http://.example.com/] > (of class org.apache.hadoop.fs.FsUrlConnection) > at > scalaj.http.HttpRequest.scalaj$http$HttpRequest$$doConnection(Http.scala:343) > at scalaj.http.HttpRequest.exec(Http.scala:335) > at scalaj.http.HttpRequest.asString(Http.scala:455) > {code} > > One option to fix the issue is to return null in > URLStreamHandlerFactory.createURLStreamHandler when the protocol is > http/https, so it will use the default behavior and be compatible with > scalaj.http. Following is the code example: > {code} > class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory with > Logging { > private val fsUrlStreamHandlerFactory = new FsUrlStreamHandlerFactory() > override def createURLStreamHandler(protocol: String): URLStreamHandler = { > val handler = fsUrlStreamHandlerFactory.createURLStreamHandler(protocol) > if (handler == null) { > return null > } > if (protocol != null && > (protocol.equalsIgnoreCase("http") > || protocol.equalsIgnoreCase("https"))) { > // return null to use system default URLStreamHandler > null > } else { > handler > } > } > } > {code} > I would like to get some discussion here before submitting a pull request. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29939) Add a conf for CompressionCodec for Ser/Deser of MapOutputStatus
Xiao Li created SPARK-29939: --- Summary: Add a conf for CompressionCodec for Ser/Deser of MapOutputStatus Key: SPARK-29939 URL: https://issues.apache.org/jira/browse/SPARK-29939 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.0.0 Reporter: Xiao Li Assignee: wuyi All the other compression use cases have a conf. Could we do it for this one too? See the examples: https://github.com/apache/spark/blob/1b575ef5d1b8e3e672b2fca5c354d6678bd78bd1/core/src/main/scala/org/apache/spark/serializer/SerializerManager.scala#L67-L73 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
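For illustration, a sketch of what such an entry could look like, following the ConfigBuilder pattern in the linked SerializerManager code — the key name and default below are assumptions, not a settled choice:

{code}
// Hypothetical config entry, modeled on the existing compression confs:
private[spark] val MAP_STATUS_COMPRESSION_CODEC =
  ConfigBuilder("spark.shuffle.mapStatus.compression.codec")
    .doc("The codec used to compress MapOutputStatus. By default, Spark " +
      "provides four codecs: lz4, lzf, snappy, and zstd.")
    .stringConf
    .createWithDefault("zstd")
{code}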
[jira] [Resolved] (SPARK-29020) Unifying behaviour between array_sort and sort_array
[ https://issues.apache.org/jira/browse/SPARK-29020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-29020. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 25728 [https://github.com/apache/spark/pull/25728] > Unifying behaviour between array_sort and sort_array > > > Key: SPARK-29020 > URL: https://issues.apache.org/jira/browse/SPARK-29020 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: German Schiavon Matteo >Assignee: German Schiavon Matteo >Priority: Major > Fix For: 3.0.0 > > > I've noticed that there are two functions to sort arrays, *sort_array* and > *array_sort*. > *sort_array* is from 1.5.0 and it has the possibility of ordering both > ascending and descending. > *array_sort* is from 2.4.0 and it only has the possibility of ordering in > ascending order. > Basically I just added the possibility of ordering either ascending or > descending using *array_sort*. > I think it would be good to have unified behaviours. > > This is the link to the [PR|https://github.com/apache/spark/pull/25728] > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
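As a quick illustration of the asymmetry being unified (spark-shell sketch; results as documented for these functions):

{code}
// sort_array (since 1.5.0) accepts an ascending flag:
spark.sql("SELECT sort_array(array(3, 1, 2), false)").show()  // [3, 2, 1]

// array_sort (since 2.4.0) could only sort ascending before this change:
spark.sql("SELECT array_sort(array(3, 1, 2))").show()         // [1, 2, 3]
{code}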
[jira] [Assigned] (SPARK-29020) Unifying behaviour between array_sort and sort_array
[ https://issues.apache.org/jira/browse/SPARK-29020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-29020: Assignee: German Schiavon Matteo > Unifying behaviour between array_sort and sort_array > > > Key: SPARK-29020 > URL: https://issues.apache.org/jira/browse/SPARK-29020 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: German Schiavon Matteo >Assignee: German Schiavon Matteo >Priority: Major > > I've noticed that there are two functions to sort arrays, *sort_array* and > *array_sort*. > *sort_array* is from 1.5.0 and it has the possibility of ordering both > ascending and descending. > *array_sort* is from 2.4.0 and it only has the possibility of ordering in > ascending order. > Basically I just added the possibility of ordering either ascending or > descending using *array_sort*. > I think it would be good to have unified behaviours. > > This is the link to the [PR|https://github.com/apache/spark/pull/25728] > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-29600) array_contains built in function is not backward compatible in 3.0
[ https://issues.apache.org/jira/browse/SPARK-29600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976319#comment-16976319 ] Udbhav Agrawal edited comment on SPARK-29600 at 11/18/19 6:56 AM: -- Hi [~hyukjin.kwon], this failure is because we no longer cast the literal to the array's element type after the above behavior change. For example: array(0.1,0.2,0.33) has element type decimal(2,2), and the literals 0.1 and 0.2 inside it are also changed to decimal(2,2); but if we check for 0.2, which is actually of type decimal(1,1), the query fails because its data type doesn't match the array's element type. was (Author: udbhav agrawal): Hi [~hyukjin.kwon], this failure is because, after the above behavior change, Spark doesn't cast the literal to the array's element type. For example: > array_contains built in function is not backward compatible in 3.0 > -- > > Key: SPARK-29600 > URL: https://issues.apache.org/jira/browse/SPARK-29600 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > > SELECT array_contains(array(0,0.1,0.2,0.3,0.5,0.02,0.033), .2); throws an > exception in 3.0 whereas in 2.3.2 it works fine. > Spark 3.0 output: > 0: jdbc:hive2://10.18.19.208:23040/default> SELECT > array_contains(array(0,0.1,0.2,0.3,0.5,0.02,0.033), .2); > Error: org.apache.spark.sql.AnalysisException: cannot resolve > 'array_contains(array(CAST(0 AS DECIMAL(13,3)), CAST(0.1BD AS DECIMAL(13,3)), > CAST(0.2BD AS DECIMAL(13,3)), CAST(0.3BD AS DECIMAL(13,3)), CAST(0.5BD AS > DECIMAL(13,3)), CAST(0.02BD AS DECIMAL(13,3)), CAST(0.033BD AS > DECIMAL(13,3))), 0.2BD)' due to data type mismatch: Input to function > array_contains should have been array followed by a value with same element > type, but it's [array, decimal(1,1)].; line 1 pos 7; > 'Project [unresolvedalias(array_contains(array(cast(0 as decimal(13,3)), > cast(0.1 as decimal(13,3)), cast(0.2 as decimal(13,3)), cast(0.3 as > decimal(13,3)), cast(0.5 as decimal(13,3)), cast(0.02 as decimal(13,3)), > cast(0.033 as decimal(13,3))), 0.2), None)] > Spark 2.3.2 output > 0: jdbc:hive2://10.18.18.214:23040/default> SELECT > array_contains(array(0,0.1,0.2,0.3,0.5,0.02,0.033), .2); > |array_contains(array(CAST(0 AS DECIMAL(13,3)), CAST(0.1 AS DECIMAL(13,3)), > CAST(0.2 AS DECIMAL(13,3)), CAST(0.3 AS DECIMAL(13,3)), CAST(0.5 AS > DECIMAL(13,3)), CAST(0.02 AS DECIMAL(13,3)), CAST(0.033 AS DECIMAL(13,3))), > CAST(0.2 AS DECIMAL(13,3)))| > |true| > 1 row selected (0.18 seconds) > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
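To make the failure and a possible workaround concrete, a hedged spark-shell sketch (the explicit cast mirrors what 2.3.2 did implicitly per the output above; not independently verified here):

{code}
// Fails in 3.0: the literal .2 stays decimal(1,1) and is no longer cast
// to the array's element type decimal(13,3):
spark.sql("SELECT array_contains(array(0,0.1,0.2,0.3,0.5,0.02,0.033), .2)")

// Workaround sketch: cast the literal explicitly so both sides match:
spark.sql(
  "SELECT array_contains(array(0,0.1,0.2,0.3,0.5,0.02,0.033), " +
  "CAST(.2 AS DECIMAL(13,3)))")
{code}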
[jira] [Commented] (SPARK-29600) array_contains built in function is not backward compatible in 3.0
[ https://issues.apache.org/jira/browse/SPARK-29600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976319#comment-16976319 ] Udbhav Agrawal commented on SPARK-29600: Hi [~hyukjin.kwon], this failure is because, after the above behavior change, Spark doesn't cast the literal to the array's element type. For example: > array_contains built in function is not backward compatible in 3.0 > -- > > Key: SPARK-29600 > URL: https://issues.apache.org/jira/browse/SPARK-29600 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > > SELECT array_contains(array(0,0.1,0.2,0.3,0.5,0.02,0.033), .2); throws an > exception in 3.0 whereas in 2.3.2 it works fine. > Spark 3.0 output: > 0: jdbc:hive2://10.18.19.208:23040/default> SELECT > array_contains(array(0,0.1,0.2,0.3,0.5,0.02,0.033), .2); > Error: org.apache.spark.sql.AnalysisException: cannot resolve > 'array_contains(array(CAST(0 AS DECIMAL(13,3)), CAST(0.1BD AS DECIMAL(13,3)), > CAST(0.2BD AS DECIMAL(13,3)), CAST(0.3BD AS DECIMAL(13,3)), CAST(0.5BD AS > DECIMAL(13,3)), CAST(0.02BD AS DECIMAL(13,3)), CAST(0.033BD AS > DECIMAL(13,3))), 0.2BD)' due to data type mismatch: Input to function > array_contains should have been array followed by a value with same element > type, but it's [array, decimal(1,1)].; line 1 pos 7; > 'Project [unresolvedalias(array_contains(array(cast(0 as decimal(13,3)), > cast(0.1 as decimal(13,3)), cast(0.2 as decimal(13,3)), cast(0.3 as > decimal(13,3)), cast(0.5 as decimal(13,3)), cast(0.02 as decimal(13,3)), > cast(0.033 as decimal(13,3))), 0.2), None)] > Spark 2.3.2 output > 0: jdbc:hive2://10.18.18.214:23040/default> SELECT > array_contains(array(0,0.1,0.2,0.3,0.5,0.02,0.033), .2); > |array_contains(array(CAST(0 AS DECIMAL(13,3)), CAST(0.1 AS DECIMAL(13,3)), > CAST(0.2 AS DECIMAL(13,3)), CAST(0.3 AS DECIMAL(13,3)), CAST(0.5 AS > DECIMAL(13,3)), CAST(0.02 AS DECIMAL(13,3)), CAST(0.033 AS DECIMAL(13,3))), > CAST(0.2 AS DECIMAL(13,3)))| > |true| > 1 row selected (0.18 seconds) > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29938) Add batching in alter table add partition flow
[ https://issues.apache.org/jira/browse/SPARK-29938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakhar Jain updated SPARK-29938: - Description: When a lot of new partitions are added by an Insert query on a partitioned datasource table, sometimes the query fails with - {noformat} An error was encountered: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out; at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106) at org.apache.spark.sql.hive.HiveExternalCatalog.createPartitions(HiveExternalCatalog.scala:928) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createPartitions(SessionCatalog.scala:798) at org.apache.spark.sql.execution.command.AlterTableAddPartitionCommand.run(ddl.scala:448) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.refreshUpdatedPartitions$1(InsertIntoHadoopFsRelationCommand.scala:137) {noformat} This happens because adding thousands of partitions in a single call takes a lot of time and the client eventually times out. Also, adding a lot of partitions can lead to OOM in Hive Metastore (a similar issue in the [recover partition flow|https://github.com/apache/spark/pull/14607] was fixed). Steps to reproduce - {noformat} case class Partition(data: Int, partition_key: Int) val df = sc.parallelize(1 to 15000, 15000).map(x => Partition(x,x)).toDF df.registerTempTable("temp_table") spark.sql("""CREATE TABLE `test_table` (`data` INT, `partition_key` INT) USING parquet PARTITIONED BY (partition_key) """) spark.sql("INSERT OVERWRITE TABLE test_table select * from temp_table").collect() {noformat} was: When a lot of new partitions are added by an Insert query on a partitioned datasource table, sometimes the query fails with - {noformat} An error was encountered: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out; at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106) at org.apache.spark.sql.hive.HiveExternalCatalog.createPartitions(HiveExternalCatalog.scala:928) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createPartitions(SessionCatalog.scala:798) at org.apache.spark.sql.execution.command.AlterTableAddPartitionCommand.run(ddl.scala:448) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.refreshUpdatedPartitions$1(InsertIntoHadoopFsRelationCommand.scala:137) {noformat} This happens because adding thousands of partitions in a single call takes a lot of time and the client eventually times out. Also, adding a lot of partitions can lead to OOM in Hive Metastore (a similar issue in the [recover partition flow|https://github.com/apache/spark/pull/14607] was fixed). Steps to reproduce - {noformat} case class Partition(data: Int, partition_key: Int) val df = sc.parallelize(1 to 15000, 15000).map(x => Partition(x,x)).toDF df.registerTempTable("temp_table") spark.sql("""CREATE TABLE `test_table` (`data` INT, `partition_key` INT) USING parquet PARTITIONED BY (partition_key) """) spark.sql("INSERT OVERWRITE TABLE test_table select * from temp_table").collect() {noformat} > Add batching in alter table add partition flow > -- > > Key: SPARK-29938 > URL: https://issues.apache.org/jira/browse/SPARK-29938 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.4, 2.4.4 >Reporter: Prakhar Jain >Priority: Major > > When a lot of new partitions are added by an Insert query on a partitioned > datasource table, sometimes the query fails with - > {noformat} > An error was encountered: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: > org.apache.thrift.transport.TTransportException: > java.net.SocketTimeoutException: Read timed out; at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106) > at > org.apache.spark.sql.hive.HiveExternalCatalog.createPartitions(HiveExternalCatalog.scala:928) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.createPartitions(SessionCatalog.scala:798) > at > org.apache.spark.sql.execution.command.AlterTableAddPartitionCommand.run(ddl.scala:448) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.refreshUpdatedPartitions$1(InsertIntoHadoopFsRelationCommand.scala:137) > {noformat} > This happens because adding thousands of partitions in a single call takes a lot > of time and the client eventually times out. > Also, adding a lot of partitions can lead to OOM in Hive Metastore (a similar > issue in the [recover partition flow|https://github.com/apache/spark/pull/14607] was fixed).
[jira] [Updated] (SPARK-29938) Add batching in alter table add partition flow
[ https://issues.apache.org/jira/browse/SPARK-29938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakhar Jain updated SPARK-29938: - Description: When a lot of new partitions are added by an Insert query on a partitioned datasource table, sometimes the query fails with - {noformat} An error was encountered: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out; at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106) at org.apache.spark.sql.hive.HiveExternalCatalog.createPartitions(HiveExternalCatalog.scala:928) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createPartitions(SessionCatalog.scala:798) at org.apache.spark.sql.execution.command.AlterTableAddPartitionCommand.run(ddl.scala:448) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.refreshUpdatedPartitions$1(InsertIntoHadoopFsRelationCommand.scala:137) {noformat} This happens because adding thousands of partitions in a single call takes a lot of time and the client eventually times out. Also, adding a lot of partitions can lead to OOM in Hive Metastore (a similar issue in the [recover partition flow|https://github.com/apache/spark/pull/14607] was fixed). Steps to reproduce - {noformat} case class Partition(data: Int, partition_key: Int) val df = sc.parallelize(1 to 15000, 15000).map(x => Partition(x,x)).toDF df.registerTempTable("temp_table") spark.sql("""CREATE TABLE `test_table` (`data` INT, `partition_key` INT) USING parquet PARTITIONED BY (partition_key) """) spark.sql("INSERT OVERWRITE TABLE test_table select * from temp_table").collect() {noformat} was: When a lot of new partitions are added by an Insert query on a partitioned datasource table, sometimes the query fails with - {noformat} An error was encountered: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out; at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106) at org.apache.spark.sql.hive.HiveExternalCatalog.createPartitions(HiveExternalCatalog.scala:928) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createPartitions(SessionCatalog.scala:798) at org.apache.spark.sql.execution.command.AlterTableAddPartitionCommand.run(ddl.scala:448) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.refreshUpdatedPartitions$1(InsertIntoHadoopFsRelationCommand.scala:137) {noformat} This happens because adding thousands of partitions in a single call takes a lot of time and the client eventually times out. Also, adding a lot of partitions can lead to OOM in Hive Metastore (a similar issue in the [recover partition flow|https://github.com/apache/spark/pull/14607] was fixed). Steps to reproduce - {noformat} case class Partition(data: Int, partition_key: Int) val df = sc.parallelize(1 to 15000, 15000).map(x => Partition(x,x)).toDF df.registerTempTable("temp_table") spark.sql("""CREATE TABLE `test_table` (`data` INT, `partition_key` INT) USING parquet PARTITIONED BY (partition_key) """) spark.sql("INSERT OVERWRITE TABLE test_table select * from temp_table").collect() {noformat} > Add batching in alter table add partition flow > -- > > Key: SPARK-29938 > URL: https://issues.apache.org/jira/browse/SPARK-29938 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.4, 2.4.4 >Reporter: Prakhar Jain >Priority: Major > > When a lot of new partitions are added by an Insert query on a partitioned > datasource table, sometimes the query fails with - > {noformat} > An error was encountered: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: > org.apache.thrift.transport.TTransportException: > java.net.SocketTimeoutException: Read timed out; at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106) > at > org.apache.spark.sql.hive.HiveExternalCatalog.createPartitions(HiveExternalCatalog.scala:928) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.createPartitions(SessionCatalog.scala:798) > at > org.apache.spark.sql.execution.command.AlterTableAddPartitionCommand.run(ddl.scala:448) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.refreshUpdatedPartitions$1(InsertIntoHadoopFsRelationCommand.scala:137) > {noformat} > This happens because adding thousands of partitions in a single call takes a lot > of time and the client eventually times out. > Also, adding a lot of partitions can lead to OOM in Hive Metastore (a similar > issue in the [recover partition flow|https://github.com/apache/spark/pull/14607] was fixed).
[jira] [Created] (SPARK-29938) Add batching in alter table add partition flow
Prakhar Jain created SPARK-29938: Summary: Add batching in alter table add partition flow Key: SPARK-29938 URL: https://issues.apache.org/jira/browse/SPARK-29938 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.4, 2.3.4 Reporter: Prakhar Jain When a lot of new partitions are added by an Insert query on a partitioned datasource table, sometimes the query fails with - {noformat} An error was encountered: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out; at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106) at org.apache.spark.sql.hive.HiveExternalCatalog.createPartitions(HiveExternalCatalog.scala:928) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createPartitions(SessionCatalog.scala:798) at org.apache.spark.sql.execution.command.AlterTableAddPartitionCommand.run(ddl.scala:448) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.refreshUpdatedPartitions$1(InsertIntoHadoopFsRelationCommand.scala:137) {noformat} This happens because adding thousands of partitions in a single call takes a lot of time and the client eventually times out. Also, adding a lot of partitions can lead to OOM in Hive Metastore (a similar issue in the [recover partition flow|https://github.com/apache/spark/pull/14607] was fixed). Steps to reproduce - {noformat} case class Partition(data: Int, partition_key: Int) val df = sc.parallelize(1 to 15000, 15000).map(x => Partition(x,x)).toDF df.registerTempTable("temp_table") spark.sql("""CREATE TABLE `test_table` (`data` INT, `partition_key` INT) USING parquet PARTITIONED BY (partition_key) """) spark.sql("INSERT OVERWRITE TABLE test_table select * from temp_table").collect() {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
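As a reader's aid, a minimal sketch of the batching idea (the batch size and variable names are assumptions; `parts` stands for the Seq[CatalogTablePartition] the command builds, and the real change would live in AlterTableAddPartitionCommand):

{code}
// Hypothetical batching: add partitions to the metastore in fixed-size
// groups instead of one huge createPartitions call, to avoid client
// timeouts and metastore OOM. batchSize is an assumed value.
val batchSize = 100
parts.grouped(batchSize).foreach { batch =>
  sparkSession.sessionState.catalog.createPartitions(
    table.identifier, batch, ignoreIfExists = true)
}
{code}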
[jira] [Issue Comment Deleted] (SPARK-29587) Real data type is not supported in Spark SQL which is supported in PostgreSQL
[ https://issues.apache.org/jira/browse/SPARK-29587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Raj Boudh updated SPARK-29587: Comment: was deleted (was: I will analyse this issue) > Real data type is not supported in Spark SQL which is supported in PostgreSQL > -- > > Key: SPARK-29587 > URL: https://issues.apache.org/jira/browse/SPARK-29587 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.4 >Reporter: jobit mathew >Priority: Minor > > Real data type is not supported in Spark SQL which is supported in > PostgreSQL. > +*In PostgreSQL the query succeeds*+ > CREATE TABLE weather2(prcp real); > insert into weather2 values(2.5); > select * from weather2; > > || ||prcp|| > |1|2,5| > +*In Spark SQL we get an error*+ > spark-sql> CREATE TABLE weather2(prcp real); > Error in query: > DataType real is not supported.(line 1, pos 27) > == SQL == > CREATE TABLE weather2(prcp real) > --- > It would be better to add the "real" data type support in Spark SQL as well > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-25694) URL.setURLStreamHandlerFactory causing incompatible HttpURLConnection issue
[ https://issues.apache.org/jira/browse/SPARK-25694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai resolved SPARK-25694. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26530 [https://github.com/apache/spark/pull/26530] > URL.setURLStreamHandlerFactory causing incompatible HttpURLConnection issue > --- > > Key: SPARK-25694 > URL: https://issues.apache.org/jira/browse/SPARK-25694 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.4, 3.0.0 >Reporter: Bo Yang >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 3.0.0 > > > URL.setURLStreamHandlerFactory() in SharedState causes URL.openConnection() > to return an FsUrlConnection object, which is not compatible with > HttpURLConnection. This will cause an exception when using some third-party http > libraries (e.g. scalaj.http). > The following code in Spark 2.3.0 introduced the issue: > sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala: > {code} > object SharedState extends Logging { ... > URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory()) ... > } > {code} > Here is an example exception when using scalaj.http in Spark: > {code} > StackTrace: scala.MatchError: > org.apache.hadoop.fs.FsUrlConnection:[http://.example.com|http://.example.com/] > (of class org.apache.hadoop.fs.FsUrlConnection) > at > scalaj.http.HttpRequest.scalaj$http$HttpRequest$$doConnection(Http.scala:343) > at scalaj.http.HttpRequest.exec(Http.scala:335) > at scalaj.http.HttpRequest.asString(Http.scala:455) > {code} > > One option to fix the issue is to return null in > URLStreamHandlerFactory.createURLStreamHandler when the protocol is > http/https, so it will use the default behavior and be compatible with > scalaj.http. Following is the code example: > {code} > class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory with > Logging { > private val fsUrlStreamHandlerFactory = new FsUrlStreamHandlerFactory() > override def createURLStreamHandler(protocol: String): URLStreamHandler = { > val handler = fsUrlStreamHandlerFactory.createURLStreamHandler(protocol) > if (handler == null) { > return null > } > if (protocol != null && > (protocol.equalsIgnoreCase("http") > || protocol.equalsIgnoreCase("https"))) { > // return null to use system default URLStreamHandler > null > } else { > handler > } > } > } > {code} > I would like to get some discussion here before submitting a pull request. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25694) URL.setURLStreamHandlerFactory causing incompatible HttpURLConnection issue
[ https://issues.apache.org/jira/browse/SPARK-25694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai reassigned SPARK-25694: --- Assignee: Dongjoon Hyun > URL.setURLStreamHandlerFactory causing incompatible HttpURLConnection issue > --- > > Key: SPARK-25694 > URL: https://issues.apache.org/jira/browse/SPARK-25694 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.4, 3.0.0 >Reporter: Bo Yang >Assignee: Dongjoon Hyun >Priority: Minor > > URL.setURLStreamHandlerFactory() in SharedState causes URL.openConnection() > to return an FsUrlConnection object, which is not compatible with > HttpURLConnection. This will cause an exception when using some third-party http > libraries (e.g. scalaj.http). > The following code in Spark 2.3.0 introduced the issue: > sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala: > {code} > object SharedState extends Logging { ... > URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory()) ... > } > {code} > Here is an example exception when using scalaj.http in Spark: > {code} > StackTrace: scala.MatchError: > org.apache.hadoop.fs.FsUrlConnection:[http://.example.com|http://.example.com/] > (of class org.apache.hadoop.fs.FsUrlConnection) > at > scalaj.http.HttpRequest.scalaj$http$HttpRequest$$doConnection(Http.scala:343) > at scalaj.http.HttpRequest.exec(Http.scala:335) > at scalaj.http.HttpRequest.asString(Http.scala:455) > {code} > > One option to fix the issue is to return null in > URLStreamHandlerFactory.createURLStreamHandler when the protocol is > http/https, so it will use the default behavior and be compatible with > scalaj.http. Following is the code example: > {code} > class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory with > Logging { > private val fsUrlStreamHandlerFactory = new FsUrlStreamHandlerFactory() > override def createURLStreamHandler(protocol: String): URLStreamHandler = { > val handler = fsUrlStreamHandlerFactory.createURLStreamHandler(protocol) > if (handler == null) { > return null > } > if (protocol != null && > (protocol.equalsIgnoreCase("http") > || protocol.equalsIgnoreCase("https"))) { > // return null to use system default URLStreamHandler > null > } else { > handler > } > } > } > {code} > I would like to get some discussion here before submitting a pull request. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29929) Allow V2 Datasources to require a data distribution
[ https://issues.apache.org/jira/browse/SPARK-29929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976294#comment-16976294 ] Jungtaek Lim commented on SPARK-29929: -- Possibly a duplicate of SPARK-23889, though no one is working on SPARK-23889 as of now. SPARK-23889 has broader requirements. > Allow V2 Datasources to require a data distribution > --- > > Key: SPARK-29929 > URL: https://issues.apache.org/jira/browse/SPARK-29929 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Andrew K Long >Priority: Major > > Currently users are unable to specify that their v2 Datasource requires a > particular Distribution before inserting data. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
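For concreteness, a purely hypothetical sketch of what such a hook could look like on the write path — none of these names exist in Spark; they only illustrate the request:

{code}
// Hypothetical trait a v2 source could mix into its write builder so
// Spark repartitions the input before invoking the writer.
trait SupportsRequiredDistribution {
  // Columns Spark would be asked to cluster/partition the input rows by.
  def requiredClustering(): Array[String]
}
{code}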
[jira] [Resolved] (SPARK-29936) Fix SparkR lint errors and add lint-r GitHub Action
[ https://issues.apache.org/jira/browse/SPARK-29936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-29936. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26564 [https://github.com/apache/spark/pull/26564] > Fix SparkR lint errors and add lint-r GitHub Action > --- > > Key: SPARK-29936 > URL: https://issues.apache.org/jira/browse/SPARK-29936 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.5, 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29936) Fix SparkR lint errors and add lint-r GitHub Action
[ https://issues.apache.org/jira/browse/SPARK-29936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-29936: - Assignee: Dongjoon Hyun > Fix SparkR lint errors and add lint-r GitHub Action > --- > > Key: SPARK-29936 > URL: https://issues.apache.org/jira/browse/SPARK-29936 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.5, 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29907) Move DELETE/UPDATE/MERGE relative rules to dmlStatementNoWith to support cte.
[ https://issues.apache.org/jira/browse/SPARK-29907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-29907: --- Assignee: Xianyin Xin > Move DELETE/UPDATE/MERGE relative rules to dmlStatementNoWith to support cte. > - > > Key: SPARK-29907 > URL: https://issues.apache.org/jira/browse/SPARK-29907 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xianyin Xin >Assignee: Xianyin Xin >Priority: Major > > SPARK-27444 introduced `dmlStatementNoWith` so that any dml that needs cte > support can leverage it. It would be better if we move DELETE/UPDATE/MERGE rules to > `dmlStatementNoWith`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29907) Move DELETE/UPDATE/MERGE relative rules to dmlStatementNoWith to support cte.
[ https://issues.apache.org/jira/browse/SPARK-29907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-29907. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26536 [https://github.com/apache/spark/pull/26536] > Move DELETE/UPDATE/MERGE relative rules to dmlStatementNoWith to support cte. > - > > Key: SPARK-29907 > URL: https://issues.apache.org/jira/browse/SPARK-29907 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xianyin Xin >Assignee: Xianyin Xin >Priority: Major > Fix For: 3.0.0 > > > SPARK-27444 introduced `dmlStatementNoWith` so that any dml that needs cte > support can leverage it. It would be better if we move DELETE/UPDATE/MERGE rules to > `dmlStatementNoWith`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29903) Add documentation for recursiveFileLookup
[ https://issues.apache.org/jira/browse/SPARK-29903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976252#comment-16976252 ] Nicholas Chammas commented on SPARK-29903: -- Happy to do that. Going to wait for [this PR|https://github.com/apache/spark/pull/26525] to be completed before writing any docs though, so I can address both the DataFrame and SQL APIs in one go. > Add documentation for recursiveFileLookup > - > > Key: SPARK-29903 > URL: https://issues.apache.org/jira/browse/SPARK-29903 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.0.0 >Reporter: Nicholas Chammas >Priority: Minor > > SPARK-27990 added a new option, {{recursiveFileLookup}}, for recursively > loading data from a source directory. There is currently no documentation for > this option. > We should document this both for the DataFrame API as well as for SQL. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
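Until those docs land, a short usage sketch for the DataFrame side (the option name comes from SPARK-27990; the format and path here are just examples):

{code}
// Recursively load all files under the base directory, ignoring
// partition discovery.
val df = spark.read
  .format("json")
  .option("recursiveFileLookup", "true")
  .load("/path/to/base/dir")  // illustrative path
{code}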
[jira] [Resolved] (SPARK-29807) Rename "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled"
[ https://issues.apache.org/jira/browse/SPARK-29807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-29807. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26444 [https://github.com/apache/spark/pull/26444] > Rename "spark.sql.ansi.enabled" to "spark.sql.dialect.spark.ansi.enabled" > - > > Key: SPARK-29807 > URL: https://issues.apache.org/jira/browse/SPARK-29807 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuanjian Li >Assignee: Yuanjian Li >Priority: Major > Fix For: 3.0.0 > > > The relation between "spark.sql.ansi.enabled" and "spark.sql.dialect" is > confusing, since the "PostgreSQL" dialect should contain the features of > "spark.sql.ansi.enabled". > To make things clearer, we can rename the "spark.sql.ansi.enabled" to > "spark.sql.dialect.spark.ansi.enabled", thus the option > "spark.sql.dialect.spark.ansi.enabled" is only for the Spark dialect. > For the casting and arithmetic operations, runtime exceptions should be > thrown if "spark.sql.dialect" is "spark" and > "spark.sql.dialect.spark.ansi.enabled" is true or "spark.sql.dialect" is > PostgreSQL. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
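A brief sketch of the renamed conf in use, as described above (the runtime-exception behavior is the description's claim, shown here only for illustration):

{code}
// Spark dialect with ANSI behavior opted in via the renamed conf:
spark.conf.set("spark.sql.dialect", "spark")
spark.conf.set("spark.sql.dialect.spark.ansi.enabled", "true")

// Per the description, an invalid cast should now throw at runtime
// instead of returning null:
spark.sql("SELECT CAST('abc' AS int)").show()
{code}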
[jira] [Resolved] (SPARK-16872) Impl Gaussian Naive Bayes Classifier
[ https://issues.apache.org/jira/browse/SPARK-16872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-16872. -- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 26413 [https://github.com/apache/spark/pull/26413] > Impl Gaussian Naive Bayes Classifier > > > Key: SPARK-16872 > URL: https://issues.apache.org/jira/browse/SPARK-16872 > Project: Spark > Issue Type: New Feature > Components: ML, PySpark >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Major > Fix For: 3.1.0 > > > I implemented Gaussian NB according to scikit-learn's {{GaussianNB}}. > In GaussianNB model, the {{theta}} matrix is used to store means and there is > an extra {{sigma}} matrix storing the variance of each feature. > GaussianNB in spark > {code} > scala> import org.apache.spark.ml.classification.GaussianNaiveBayes > import org.apache.spark.ml.classification.GaussianNaiveBayes > scala> val path = > "/Users/zrf/.dev/spark-2.1.0-bin-hadoop2.7/data/mllib/sample_multiclass_classification_data.txt" > path: String = > /Users/zrf/.dev/spark-2.1.0-bin-hadoop2.7/data/mllib/sample_multiclass_classification_data.txt > scala> val data = spark.read.format("libsvm").load(path).persist() > data: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [label: > double, features: vector] > scala> val gnb = new GaussianNaiveBayes() > gnb: org.apache.spark.ml.classification.GaussianNaiveBayes = gnb_54c50467306c > scala> val model = gnb.fit(data) > 17/01/03 14:25:48 INFO Instrumentation: > GaussianNaiveBayes-gnb_54c50467306c-720112035-1: training: numPartitions=1 > storageLevel=StorageLevel(1 replicas) > 17/01/03 14:25:48 INFO Instrumentation: > GaussianNaiveBayes-gnb_54c50467306c-720112035-1: {} > 17/01/03 14:25:49 INFO Instrumentation: > GaussianNaiveBayes-gnb_54c50467306c-720112035-1: {"numFeatures":4} > 17/01/03 14:25:49 INFO Instrumentation: > GaussianNaiveBayes-gnb_54c50467306c-720112035-1: {"numClasses":3} > 17/01/03 14:25:49 INFO Instrumentation: > GaussianNaiveBayes-gnb_54c50467306c-720112035-1: training finished > model: org.apache.spark.ml.classification.GaussianNaiveBayesModel = > GaussianNaiveBayesModel (uid=gnb_54c50467306c) with 3 classes > scala> model.pi > res0: org.apache.spark.ml.linalg.Vector = > [-1.0986122886681098,-1.0986122886681098,-1.0986122886681098] > scala> model.pi.toArray.map(math.exp) > res1: Array[Double] = Array(0., 0., > 0.)
> scala> model.theta > res2: org.apache.spark.ml.linalg.Matrix = > 0.270067018001 -0.188540006 0.543050720001 0.60546 > -0.60779998 0.18172 -0.842711740006 > -0.88139998 > -0.091425964 -0.35858001 0.105084738 > 0.021666701507102017 > scala> model.sigma > res3: org.apache.spark.ml.linalg.Matrix = > 0.1223012510889361 0.07078051983960698 0.0343595243976 > 0.051336071297393815 > 0.03758145300924998 0.09880280046403413 0.003390296940069426 > 0.007822241779598893 > 0.08058763609659315 0.06701386661293329 0.024866409227781675 > 0.02661391644759426 > scala> model.transform(data).select("probability").take(10) > [rdd_68_0] > res4: Array[org.apache.spark.sql.Row] = > Array([[1.0627410543476422E-21,0.9938,6.2765233965353945E-15]], > [[7.254521422345374E-26,1.0,1.3849442153180895E-18]], > [[1.9629244119173135E-24,0.9998,1.9424765181237926E-16]], > [[6.061218297948492E-22,0.9902,9.853216073401884E-15]], > [[0.9972225671942837,8.844241161578932E-165,0.002777432805716399]], > [[5.361683970373604E-26,1.0,2.3004604508982183E-18]], > [[0.01062850630038623,3.3102617689978775E-100,0.9893714936996136]], > [[1.9297314618271785E-4,2.124922209137708E-71,0.9998070268538172]], > [[3.118816393732361E-27,1.0,6.5310299615983584E-21]], > [[0.926009854522,8.734773657627494E-206,7.399014547943611E-6]]) > scala> model.transform(data).select("prediction").take(10) > [rdd_68_0] > res5: Array[org.apache.spark.sql.Row] = Array([1.0], [1.0], [1.0], [1.0], > [0.0], [1.0], [2.0], [2.0], [1.0], [0.0]) > {code} > GaussianNB in scikit-learn > {code} > import numpy as np > from sklearn.naive_bayes import GaussianNB > from sklearn.datasets import load_svmlight_file > path = > '/Users/zrf/.dev/spark-2.1.0-bin-hadoop2.7/data/mllib/sample_multiclass_classification_data.txt' > X, y = load_svmlight_file(path) > X = X.toarray() > clf = GaussianNB() > clf.fit(X, y) > >>> clf.class_prior_ > array([ 0., 0., 0.]) > >>> clf.theta_ > array([[ 0.2701, -0.1885, 0.54305072, 0.6055], >[-0.6078, 0.1817, -0.84271174, -0.8814], >[-0.0914, -0.3586, 0.10508474, 0.0216667 ]]) > > >>>
[jira] [Updated] (SPARK-29936) Fix SparkR lint errors and add lint-r GitHub Action
[ https://issues.apache.org/jira/browse/SPARK-29936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29936: -- Issue Type: Bug (was: Task) > Fix SparkR lint errors and add lint-r GitHub Action > --- > > Key: SPARK-29936 > URL: https://issues.apache.org/jira/browse/SPARK-29936 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.5, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29936) Fix SparkR lint errors and add lint-r GitHub Action
[ https://issues.apache.org/jira/browse/SPARK-29936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29936: -- Summary: Fix SparkR lint errors and add lint-r GitHub Action (was: Add `lint-r` GitHub Action) > Fix SparkR lint errors and add lint-r GitHub Action > --- > > Key: SPARK-29936 > URL: https://issues.apache.org/jira/browse/SPARK-29936 > Project: Spark > Issue Type: Task > Components: SparkR >Affects Versions: 2.4.5, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29936) Add `lint-r` GitHub Action
[ https://issues.apache.org/jira/browse/SPARK-29936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29936: -- Component/s: (was: Tests) SparkR > Add `lint-r` GitHub Action > -- > > Key: SPARK-29936 > URL: https://issues.apache.org/jira/browse/SPARK-29936 > Project: Spark > Issue Type: Task > Components: SparkR >Affects Versions: 2.4.5, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29937) Make FileSourceScanExec class fields lazy
ulysses you created SPARK-29937: --- Summary: Make FileSourceScanExec class fields lazy Key: SPARK-29937 URL: https://issues.apache.org/jira/browse/SPARK-29937 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: ulysses you Since SPARK-28346 (PR [25111|https://github.com/apache/spark/pull/25111]), QueryExecution will copy all nodes stage-by-stage. This makes almost every node be instantiated twice. So we should make all class fields lazy to avoid creating more unexpected objects. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
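A tiny stand-in illustration of the proposed shape (the class and field here are made up; FileSourceScanExec's real fields are the actual targets):

{code}
// An eager val pays its construction cost on every copy of the node;
// a lazy val defers the cost until (and unless) the field is read.
class NodeLike {
  // before: val metadata: Map[String, String] = expensiveBuild()
  lazy val metadata: Map[String, String] = expensiveBuild()

  private def expensiveBuild(): Map[String, String] = Map("k" -> "v")
}
{code}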
[jira] [Resolved] (SPARK-29581) Enable cleanup old event log files
[ https://issues.apache.org/jira/browse/SPARK-29581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-29581. -- Resolution: Invalid We took a different approach: see SPARK-29779 > Enable cleanup old event log files > --- > > Key: SPARK-29581 > URL: https://issues.apache.org/jira/browse/SPARK-29581 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Jungtaek Lim >Priority: Major > > This issue can be started only once SPARK-29579 is addressed properly. > After SPARK-29579, Spark would guarantee strong compatibility on both live > entities and snapshots, which means a snapshot file could replace older origin > event log files. This issue tracks the efforts on automatically cleaning up > old event logs if a snapshot file can replace them, which keeps the overall size of > the event log for a streaming query manageable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29935) Remove `Spark QA Compile` Jenkins Dashboard (and jobs)
[ https://issues.apache.org/jira/browse/SPARK-29935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976209#comment-16976209 ] Dongjoon Hyun commented on SPARK-29935: --- SPARK-29936 will recover Lint-R in GitHub Action. > Remove `Spark QA Compile` Jenkins Dashboard (and jobs) > -- > > Key: SPARK-29935 > URL: https://issues.apache.org/jira/browse/SPARK-29935 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 2.4.5, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > > The following dashboard has 6 jobs. > - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/ > Those 6 jobs are a subset of GitHub Action now. So, we can save our Jenkins > computing resources and reduce our maintenance efforts. > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.4-compile-maven-hadoop-2.6/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.4-compile-maven-hadoop-2.7/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.4-lint/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-29935) Remove `Spark QA Compile` Jenkins Dashboard (and jobs)
[ https://issues.apache.org/jira/browse/SPARK-29935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976208#comment-16976208 ] Dongjoon Hyun edited comment on SPARK-29935 at 11/18/19 12:19 AM: -- Yes, it's much better now. Also, we can re-trigger the failed task indefinitely. I've been monitoring it, and it still fails sometimes due to Maven downloads (we didn't cache everything). In addition to that, for now, 2 of the above jobs are broken in Jenkins. - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.4-lint/ - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/ was (Author: dongjoon): Yes, it's much better now. Also, we can re-trigger the failed task indefinitely. I've been monitoring it, and it still fails sometimes due to Maven downloads (we didn't cache everything). For now, 2 of the above jobs are broken in Jenkins. - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.4-lint/ - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/ > Remove `Spark QA Compile` Jenkins Dashboard (and jobs) > -- > > Key: SPARK-29935 > URL: https://issues.apache.org/jira/browse/SPARK-29935 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 2.4.5, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > > The following dashboard has 6 jobs. > - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/ > Those 6 jobs are a subset of GitHub Action now. So, we can save our Jenkins > computing resources and reduce our maintenance efforts. > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.4-compile-maven-hadoop-2.6/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.4-compile-maven-hadoop-2.7/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.4-lint/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29935) Remove `Spark QA Compile` Jenkins Dashboard (and jobs)
[ https://issues.apache.org/jira/browse/SPARK-29935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976208#comment-16976208 ] Dongjoon Hyun commented on SPARK-29935: --- Yes, it's much better now. Also, we can re-trigger the failed task indefinitely. I've been monitoring it, and it still fails sometimes due to Maven downloads (we didn't cache everything). For now, 2 of the above jobs are broken in Jenkins. - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.4-lint/ - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/ > Remove `Spark QA Compile` Jenkins Dashboard (and jobs) > -- > > Key: SPARK-29935 > URL: https://issues.apache.org/jira/browse/SPARK-29935 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 2.4.5, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > > The following dashboard has 6 jobs. > - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/ > Those 6 jobs are a subset of GitHub Action now. So, we can save our Jenkins > computing resources and reduce our maintenance efforts. > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.4-compile-maven-hadoop-2.6/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.4-compile-maven-hadoop-2.7/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.4-lint/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29935) Remove `Spark QA Compile` Jenkins Dashboard (and jobs)
[ https://issues.apache.org/jira/browse/SPARK-29935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976197#comment-16976197 ] Sean R. Owen commented on SPARK-29935: -- Do the GitHub Actions work reliably now? I haven't watched them in a while. > Remove `Spark QA Compile` Jenkins Dashboard (and jobs) > -- > > Key: SPARK-29935 > URL: https://issues.apache.org/jira/browse/SPARK-29935 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 2.4.5, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > > The following dashboard has 6 jobs. > - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/ > Those 6 jobs are a subset of GitHub Action now. So, we can save our Jenkins > computing resources and reduce our maintenance efforts. > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.4-compile-maven-hadoop-2.6/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.4-compile-maven-hadoop-2.7/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.4-lint/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29936) Add `lint-r` GitHub Action
Dongjoon Hyun created SPARK-29936: - Summary: Add `lint-r` GitHub Action Key: SPARK-29936 URL: https://issues.apache.org/jira/browse/SPARK-29936 Project: Spark Issue Type: Task Components: Tests Affects Versions: 2.4.5, 3.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29935) Remove `Spark QA Compile` Jenkins Dashboard (and jobs)
[ https://issues.apache.org/jira/browse/SPARK-29935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-29935: -- Priority: Minor (was: Major) > Remove `Spark QA Compile` Jenkins Dashboard (and jobs) > -- > > Key: SPARK-29935 > URL: https://issues.apache.org/jira/browse/SPARK-29935 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 2.4.5, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > > The following dashboard has 6 jobs. > - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/ > Those 6 jobs are a subset of GitHub Action now. So, we can save our Jenkins > computing resources and reduce our maintenance efforts. > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.4-compile-maven-hadoop-2.6/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.4-compile-maven-hadoop-2.7/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.4-lint/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29935) Remove `Spark QA Compile` Jenkins Dashboard (and jobs)
[ https://issues.apache.org/jira/browse/SPARK-29935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976153#comment-16976153 ] Dongjoon Hyun commented on SPARK-29935: --- cc [~shaneknapp], [~srowen] > Remove `Spark QA Compile` Jenkins Dashboard (and jobs) > -- > > Key: SPARK-29935 > URL: https://issues.apache.org/jira/browse/SPARK-29935 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 2.4.5, 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > The following dashboard has 6 jobs. > - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/ > Those 6 jobs are a subset of GitHub Action now. So, we can save our Jenkins > computing resources and reduce our maintenance efforts. > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.4-compile-maven-hadoop-2.6/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.4-compile-maven-hadoop-2.7/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.4-lint/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/ > - > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29935) Remove `Spark QA Compile` Jenkins Dashboard (and jobs)
Dongjoon Hyun created SPARK-29935: - Summary: Remove `Spark QA Compile` Jenkins Dashboard (and jobs) Key: SPARK-29935 URL: https://issues.apache.org/jira/browse/SPARK-29935 Project: Spark Issue Type: Task Components: Project Infra Affects Versions: 2.4.5, 3.0.0 Reporter: Dongjoon Hyun The following dashboard has 6 jobs. - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/ Those 6 jobs are a subset of GitHub Action now. So, we can save our Jenkins computing resources and reduce our maintenance efforts. - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.4-compile-maven-hadoop-2.6/ - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.4-compile-maven-hadoop-2.7/ - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-branch-2.4-lint/ - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/ - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/ - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29934) Dataset support GraphX
[ https://issues.apache.org/jira/browse/SPARK-29934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976151#comment-16976151 ] Dongjoon Hyun commented on SPARK-29934: --- Hi, [~darion]. Given your context, the issue type should be `Improvement` or `New Feature` instead of `Bug`. In addition, in that case, `Affects Version/s` should be `3.0.0`. BTW, JIRA is not for Q&A. For questions, please ask on the dev mailing list first. > Dataset support GraphX > -- > > Key: SPARK-29934 > URL: https://issues.apache.org/jira/browse/SPARK-29934 > Project: Spark > Issue Type: Bug > Components: Graph, GraphX, Spark Core >Affects Versions: 2.4.4 >Reporter: darion yaphet >Priority: Minor > > Do we have any plan to support GraphX with Dataset? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29931) Declare all SQL legacy configs as will be removed in Spark 4.0
[ https://issues.apache.org/jira/browse/SPARK-29931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976134#comment-16976134 ] Reynold Xin commented on SPARK-29931: - You can say "This config will be removed in Spark 4.0 or a later release." > Declare all SQL legacy configs as will be removed in Spark 4.0 > -- > > Key: SPARK-29931 > URL: https://issues.apache.org/jira/browse/SPARK-29931 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Minor > > Add the sentence to the descriptions of all legacy SQL configs that existed before > Spark 3.0: "This config will be removed in Spark 4.0." Here is the list of > such configs: > * spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName > * spark.sql.legacy.literal.pickMinimumPrecision > * spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation > * spark.sql.legacy.sizeOfNull > * spark.sql.legacy.replaceDatabricksSparkAvro.enabled > * spark.sql.legacy.setopsPrecedence.enabled > * spark.sql.legacy.integralDivide.returnBigint > * spark.sql.legacy.bucketedTableScan.outputOrdering > * spark.sql.legacy.parser.havingWithoutGroupByAsWhere > * spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue > * spark.sql.legacy.setCommandRejectsSparkCoreConfs > * spark.sql.legacy.utcTimestampFunc.enabled > * spark.sql.legacy.typeCoercion.datetimeToString > * spark.sql.legacy.looseUpcast > * spark.sql.legacy.ctePrecedence.enabled > * spark.sql.legacy.arrayExistsFollowsThreeValuedLogic -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
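To make the suggested wording concrete, here is a self-contained Scala sketch; the ConfigEntry/legacyConf names are illustrative stand-ins, not Spark's real internal SQLConf builder API.

{code}
// Illustrative only: a tiny stand-in for Spark's internal config builder.
final case class ConfigEntry(key: String, default: Boolean, doc: String)

// Appends the agreed deprecation sentence to every legacy config's doc text.
def legacyConf(key: String, default: Boolean, doc: String): ConfigEntry =
  ConfigEntry(key, default,
    doc + " This config will be removed in Spark 4.0 or a later release.")

val sizeOfNull = legacyConf(
  "spark.sql.legacy.sizeOfNull",
  default = true,
  doc = "If true, size(null) returns -1 instead of null.")
{code}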
[jira] [Comment Edited] (SPARK-29758) json_tuple truncates fields
[ https://issues.apache.org/jira/browse/SPARK-29758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976106#comment-16976106 ] Maxim Gekk edited comment on SPARK-29758 at 11/17/19 6:17 PM: -- Another solution is to disable this optimization: [https://github.com/apache/spark/blob/v2.4.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L475-L478] was (Author: maxgekk): Another solution is to remove this optimization: https://github.com/apache/spark/blob/v2.4.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L475-L478 > json_tuple truncates fields > --- > > Key: SPARK-29758 > URL: https://issues.apache.org/jira/browse/SPARK-29758 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.4.4 > Environment: EMR 5.15.0 (Spark 2.3.0) And MacBook Pro (Mojave > 10.14.3, Spark 2.4.4) > Jdk 8, Scala 2.11.12 >Reporter: Stanislav >Priority: Major > > `json_tuple` has inconsistent behaviour with `from_json` - but only if json > string is longer than 2700 characters or so. > This can be reproduced in spark-shell and on cluster, but not in scalatest, > for some reason. > {code} > import org.apache.spark.sql.functions.{from_json, json_tuple} > import org.apache.spark.sql.types._ > val counterstring = > "*3*5*7*9*12*15*18*21*24*27*30*33*36*39*42*45*48*51*54*57*60*63*66*69*72*75*78*81*84*87*90*93*96*99*103*107*111*115*119*123*127*131*135*139*143*147*151*155*159*163*167*171*175*179*183*187*191*195*199*203*207*211*215*219*223*227*231*235*239*243*247*251*255*259*263*267*271*275*279*283*287*291*295*299*303*307*311*315*319*323*327*331*335*339*343*347*351*355*359*363*367*371*375*379*383*387*391*395*399*403*407*411*415*419*423*427*431*435*439*443*447*451*455*459*463*467*471*475*479*483*487*491*495*499*503*507*511*515*519*523*527*531*535*539*543*547*551*555*559*563*567*571*575*579*583*587*591*595*599*603*607*611*615*619*623*627*631*635*639*643*647*651*655*659*663*667*671*675*679*683*687*691*695*699*703*707*711*715*719*723*727*731*735*739*743*747*751*755*759*763*767*771*775*779*783*787*791*795*799*803*807*811*815*819*823*827*831*835*839*843*847*851*855*859*863*867*871*875*879*883*887*891*895*899*903*907*911*915*919*923*927*931*935*939*943*947*951*955*959*963*967*971*975*979*983*987*991*995*1000*1005*1010*1015*1020*1025*1030*1035*1040*1045*1050*1055*1060*1065*1070*1075*1080*1085*1090*1095*1100*1105*1110*1115*1120*1125*1130*1135*1140*1145*1150*1155*1160*1165*1170*1175*1180*1185*1190*1195*1200*1205*1210*1215*1220*1225*1230*1235*1240*1245*1250*1255*1260*1265*1270*1275*1280*1285*1290*1295*1300*1305*1310*1315*1320*1325*1330*1335*1340*1345*1350*1355*1360*1365*1370*1375*1380*1385*1390*1395*1400*1405*1410*1415*1420*1425*1430*1435*1440*1445*1450*1455*1460*1465*1470*1475*1480*1485*1490*1495*1500*1505*1510*1515*1520*1525*1530*1535*1540*1545*1550*1555*1560*1565*1570*1575*1580*1585*1590*1595*1600*1605*1610*1615*1620*1625*1630*1635*1640*1645*1650*1655*1660*1665*1670*1675*1680*1685*1690*1695*1700*1705*1710*1715*1720*1725*1730*1735*1740*1745*1750*1755*1760*1765*1770*1775*1780*1785*1790*1795*1800*1805*1810*1815*1820*1825*1830*1835*1840*1845*1850*1855*1860*1865*1870*1875*1880*1885*1890*1895*1900*1905*1910*1915*1920*1925*1930*1935*1940*1945*1950*1955*1960*1965*1970*1975*1980*1985*1990*1995*2000*2005*2010*2015*2020*2025*2030*2035*2040*2045*2050*2055*2060*2065*2070*2075*2080*2085*2090*2095*2100*2105*2110*2115*2120*2125*2130*2135*2140*2145*2150*2155*2160*2165*2170*2175*2180*2185*2190*2195*2200*2205*2210*2215*2
220*2225*2230*2235*2240*2245*2250*2255*2260*2265*2270*2275*2280*2285*2290*2295*2300*2305*2310*2315*2320*2325*2330*2335*2340*2345*2350*2355*2360*2365*2370*2375*2380*2385*2390*2395*2400*2405*2410*2415*2420*2425*2430*2435*2440*2445*2450*2455*2460*2465*2470*2475*2480*2485*2490*2495*2500*2505*2510*2515*2520*2525*2530*2535*2540*2545*2550*2555*2560*2565*2570*2575*2580*2585*2590*2595*2600*2605*2610*2615*2620*2625*2630*2635*2640*2645*2650*2655*2660*2665*2670*2675*2680*2685*2690*2695*2700*2705*2710*2715*2720*2725*2730*2735*2740*2745*2750*2755*2760*2765*2770*2775*2780*2785*2790*2795*2800*" > val json_tuple_result = Seq(s"""{"test":"$counterstring"}""").toDF("json") > .withColumn("result", json_tuple('json, "test")) > .select('result) > .as[String].head.length > val from_json_result = Seq(s"""{"test":"$counterstring"}""").toDF("json") > .withColumn("parsed", from_json('json, StructType(Seq(StructField("test", > StringType) > .withColumn("result", $"parsed.test") > .select('result) > .as[String].head.length > scala> json_tuple_result > res62: Int = 2791 > scala> from_json_result > res63: Int = 2800 > {code} > Result is influenced by the total length of the json string at the moment of > parsing: > {code} > val
[jira] [Assigned] (SPARK-29930) Remove SQL configs declared to be removed in Spark 3.0
[ https://issues.apache.org/jira/browse/SPARK-29930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-29930: - Assignee: Maxim Gekk > Remove SQL configs declared to be removed in Spark 3.0 > -- > > Key: SPARK-29930 > URL: https://issues.apache.org/jira/browse/SPARK-29930 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Minor > > Need to remove the following SQL configs: > * spark.sql.fromJsonForceNullableSchema > * spark.sql.legacy.compareDateTimestampInTimestamp -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29930) Remove SQL configs declared to be removed in Spark 3.0
[ https://issues.apache.org/jira/browse/SPARK-29930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-29930. --- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26559 [https://github.com/apache/spark/pull/26559] > Remove SQL configs declared to be removed in Spark 3.0 > -- > > Key: SPARK-29930 > URL: https://issues.apache.org/jira/browse/SPARK-29930 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Minor > Fix For: 3.0.0 > > > Need to remove the following SQL configs: > * spark.sql.fromJsonForceNullableSchema > * spark.sql.legacy.compareDateTimestampInTimestamp -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29932) lint-r should do non-zero exit in case of errors
[ https://issues.apache.org/jira/browse/SPARK-29932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-29932. --- Fix Version/s: 3.0.0 2.4.5 Resolution: Fixed Issue resolved by pull request 26561 [https://github.com/apache/spark/pull/26561] > lint-r should do non-zero exit in case of errors > > > Key: SPARK-29932 > URL: https://issues.apache.org/jira/browse/SPARK-29932 > Project: Spark > Issue Type: Bug > Components: SparkR, Tests >Affects Versions: 2.3.4, 2.4.4, 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 2.4.5, 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29932) lint-r should do non-zero exit in case of errors
[ https://issues.apache.org/jira/browse/SPARK-29932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-29932: - Assignee: Dongjoon Hyun > lint-r should do non-zero exit in case of errors > > > Key: SPARK-29932 > URL: https://issues.apache.org/jira/browse/SPARK-29932 > Project: Spark > Issue Type: Bug > Components: SparkR, Tests >Affects Versions: 2.3.4, 2.4.4, 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29758) json_tuple truncates fields
[ https://issues.apache.org/jira/browse/SPARK-29758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976106#comment-16976106 ] Maxim Gekk commented on SPARK-29758: Another solution is to remove this optimization: https://github.com/apache/spark/blob/v2.4.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L475-L478 > json_tuple truncates fields > --- > > Key: SPARK-29758 > URL: https://issues.apache.org/jira/browse/SPARK-29758 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.4.4 > Environment: EMR 5.15.0 (Spark 2.3.0) And MacBook Pro (Mojave > 10.14.3, Spark 2.4.4) > Jdk 8, Scala 2.11.12 >Reporter: Stanislav >Priority: Major > > `json_tuple` has inconsistent behaviour with `from_json` - but only if json > string is longer than 2700 characters or so. > This can be reproduced in spark-shell and on cluster, but not in scalatest, > for some reason. > {code} > import org.apache.spark.sql.functions.{from_json, json_tuple} > import org.apache.spark.sql.types._ > val counterstring = > "*3*5*7*9*12*15*18*21*24*27*30*33*36*39*42*45*48*51*54*57*60*63*66*69*72*75*78*81*84*87*90*93*96*99*103*107*111*115*119*123*127*131*135*139*143*147*151*155*159*163*167*171*175*179*183*187*191*195*199*203*207*211*215*219*223*227*231*235*239*243*247*251*255*259*263*267*271*275*279*283*287*291*295*299*303*307*311*315*319*323*327*331*335*339*343*347*351*355*359*363*367*371*375*379*383*387*391*395*399*403*407*411*415*419*423*427*431*435*439*443*447*451*455*459*463*467*471*475*479*483*487*491*495*499*503*507*511*515*519*523*527*531*535*539*543*547*551*555*559*563*567*571*575*579*583*587*591*595*599*603*607*611*615*619*623*627*631*635*639*643*647*651*655*659*663*667*671*675*679*683*687*691*695*699*703*707*711*715*719*723*727*731*735*739*743*747*751*755*759*763*767*771*775*779*783*787*791*795*799*803*807*811*815*819*823*827*831*835*839*843*847*851*855*859*863*867*871*875*879*883*887*891*895*899*903*907*911*915*919*923*927*931*935*939*943*947*951*955*959*963*967*971*975*979*983*987*991*995*1000*1005*1010*1015*1020*1025*1030*1035*1040*1045*1050*1055*1060*1065*1070*1075*1080*1085*1090*1095*1100*1105*1110*1115*1120*1125*1130*1135*1140*1145*1150*1155*1160*1165*1170*1175*1180*1185*1190*1195*1200*1205*1210*1215*1220*1225*1230*1235*1240*1245*1250*1255*1260*1265*1270*1275*1280*1285*1290*1295*1300*1305*1310*1315*1320*1325*1330*1335*1340*1345*1350*1355*1360*1365*1370*1375*1380*1385*1390*1395*1400*1405*1410*1415*1420*1425*1430*1435*1440*1445*1450*1455*1460*1465*1470*1475*1480*1485*1490*1495*1500*1505*1510*1515*1520*1525*1530*1535*1540*1545*1550*1555*1560*1565*1570*1575*1580*1585*1590*1595*1600*1605*1610*1615*1620*1625*1630*1635*1640*1645*1650*1655*1660*1665*1670*1675*1680*1685*1690*1695*1700*1705*1710*1715*1720*1725*1730*1735*1740*1745*1750*1755*1760*1765*1770*1775*1780*1785*1790*1795*1800*1805*1810*1815*1820*1825*1830*1835*1840*1845*1850*1855*1860*1865*1870*1875*1880*1885*1890*1895*1900*1905*1910*1915*1920*1925*1930*1935*1940*1945*1950*1955*1960*1965*1970*1975*1980*1985*1990*1995*2000*2005*2010*2015*2020*2025*2030*2035*2040*2045*2050*2055*2060*2065*2070*2075*2080*2085*2090*2095*2100*2105*2110*2115*2120*2125*2130*2135*2140*2145*2150*2155*2160*2165*2170*2175*2180*2185*2190*2195*2200*2205*2210*2215*2220*2225*2230*2235*2240*2245*2250*2255*2260*2265*2270*2275*2280*2285*2290*2295*2300*2305*2310*2315*2320*2325*2330*2335*2340*2345*2350*2355*2360*2365*2370*2375*2380*2385*2390*2395*2400*2405*2410*2415*2420*2425*2430*2435*2440*2445*2450*2455*2460*2465*
2470*2475*2480*2485*2490*2495*2500*2505*2510*2515*2520*2525*2530*2535*2540*2545*2550*2555*2560*2565*2570*2575*2580*2585*2590*2595*2600*2605*2610*2615*2620*2625*2630*2635*2640*2645*2650*2655*2660*2665*2670*2675*2680*2685*2690*2695*2700*2705*2710*2715*2720*2725*2730*2735*2740*2745*2750*2755*2760*2765*2770*2775*2780*2785*2790*2795*2800*" > val json_tuple_result = Seq(s"""{"test":"$counterstring"}""").toDF("json") > .withColumn("result", json_tuple('json, "test")) > .select('result) > .as[String].head.length > val from_json_result = Seq(s"""{"test":"$counterstring"}""").toDF("json") > .withColumn("parsed", from_json('json, StructType(Seq(StructField("test", > StringType) > .withColumn("result", $"parsed.test") > .select('result) > .as[String].head.length > scala> json_tuple_result > res62: Int = 2791 > scala> from_json_result > res63: Int = 2800 > {code} > Result is influenced by the total length of the json string at the moment of > parsing: > {code} > val json_tuple_result_with_prefix = Seq(s"""{"prefix": "dummy", > "test":"$counterstring"}""").toDF("json") > .withColumn("result", json_tuple('json, "test")) > .select('result) > .as[String].head.length > scala> json_tuple_result_with_prefix > res64: Int = 2772 > {code}
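Until a fix lands, a possible user-side workaround (a sketch, not an official recommendation) is to parse with {{from_json}}, which is unaffected by the truncation, instead of {{json_tuple}}:

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val spark = SparkSession.builder().master("local[*]").appName("workaround").getOrCreate()
import spark.implicits._

// Extract the field through from_json rather than json_tuple; the parsed
// value keeps its full length regardless of the JSON string's size.
val df = Seq("""{"test":"some very long value"}""").toDF("json")
val schema = StructType(Seq(StructField("test", StringType)))
val result = df
  .withColumn("parsed", from_json($"json", schema))
  .select($"parsed.test".as("result"))
result.show(truncate = false)
{code}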
[jira] [Commented] (SPARK-29575) from_json can produce nulls for fields which are marked as non-nullable
[ https://issues.apache.org/jira/browse/SPARK-29575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976102#comment-16976102 ] Maxim Gekk commented on SPARK-29575: This is intentional behavior. The user's schema is forcibly set as nullable; see SPARK-23173 > from_json can produce nulls for fields which are marked as non-nullable > --- > > Key: SPARK-29575 > URL: https://issues.apache.org/jira/browse/SPARK-29575 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.4.4 >Reporter: Victor Lopez >Priority: Major > > I believe this issue was resolved elsewhere > (https://issues.apache.org/jira/browse/SPARK-23173), though for PySpark this > bug seems to still be there. > The issue appears when using {{from_json}} to parse a column in a Spark > dataframe. It seems like {{from_json}} ignores whether the schema provided > has any {{nullable:False}} property. > {code:java} > schema = T.StructType().add(T.StructField('id', T.LongType(), > nullable=False)).add(T.StructField('name', T.StringType(), nullable=False)) > data = [{'user': str({'name': 'joe', 'id':1})}, {'user': str({'name': > 'jane'})}] > df = spark.read.json(sc.parallelize(data)) > df.withColumn("details", F.from_json("user", > schema)).select("details.*").show() > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
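A small Scala demonstration of that behavior (the PySpark result is the same); this is a sketch of the observed semantics, not a new API:

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local[*]").appName("nullability").getOrCreate()
import spark.implicits._

// The user schema marks both fields as non-nullable...
val schema = new StructType()
  .add(StructField("id", LongType, nullable = false))
  .add(StructField("name", StringType, nullable = false))

val df = Seq("""{"name":"jane"}""").toDF("user")
  .withColumn("details", from_json($"user", schema))

// ...but from_json forces the schema to nullable (SPARK-23173), so the
// missing id comes back as null instead of failing the query.
df.select($"details.id").printSchema() // reports nullable = true
df.select($"details.id").show()        // id is null for the record missing it
{code}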
[jira] [Commented] (SPARK-29758) json_tuple truncates fields
[ https://issues.apache.org/jira/browse/SPARK-29758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976099#comment-16976099 ] Maxim Gekk commented on SPARK-29758: I have reproduced the issue on 2.4. The problem is in Jackson core 2.6.7. It was fixed by https://github.com/FasterXML/jackson-core/commit/554f8db0f940b2a53f974852a2af194739d65200#diff-7990edc67621822770cdc62e12d933d4R647-R650 in the version 2.7.7. We could try to back port this https://github.com/apache/spark/pull/21596 on 2.4. [~hyukjin.kwon] WDYT? > json_tuple truncates fields > --- > > Key: SPARK-29758 > URL: https://issues.apache.org/jira/browse/SPARK-29758 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.4.4 > Environment: EMR 5.15.0 (Spark 2.3.0) And MacBook Pro (Mojave > 10.14.3, Spark 2.4.4) > Jdk 8, Scala 2.11.12 >Reporter: Stanislav >Priority: Major > > `json_tuple` has inconsistent behaviour with `from_json` - but only if json > string is longer than 2700 characters or so. > This can be reproduced in spark-shell and on cluster, but not in scalatest, > for some reason. > {code} > import org.apache.spark.sql.functions.{from_json, json_tuple} > import org.apache.spark.sql.types._ > val counterstring = > "*3*5*7*9*12*15*18*21*24*27*30*33*36*39*42*45*48*51*54*57*60*63*66*69*72*75*78*81*84*87*90*93*96*99*103*107*111*115*119*123*127*131*135*139*143*147*151*155*159*163*167*171*175*179*183*187*191*195*199*203*207*211*215*219*223*227*231*235*239*243*247*251*255*259*263*267*271*275*279*283*287*291*295*299*303*307*311*315*319*323*327*331*335*339*343*347*351*355*359*363*367*371*375*379*383*387*391*395*399*403*407*411*415*419*423*427*431*435*439*443*447*451*455*459*463*467*471*475*479*483*487*491*495*499*503*507*511*515*519*523*527*531*535*539*543*547*551*555*559*563*567*571*575*579*583*587*591*595*599*603*607*611*615*619*623*627*631*635*639*643*647*651*655*659*663*667*671*675*679*683*687*691*695*699*703*707*711*715*719*723*727*731*735*739*743*747*751*755*759*763*767*771*775*779*783*787*791*795*799*803*807*811*815*819*823*827*831*835*839*843*847*851*855*859*863*867*871*875*879*883*887*891*895*899*903*907*911*915*919*923*927*931*935*939*943*947*951*955*959*963*967*971*975*979*983*987*991*995*1000*1005*1010*1015*1020*1025*1030*1035*1040*1045*1050*1055*1060*1065*1070*1075*1080*1085*1090*1095*1100*1105*1110*1115*1120*1125*1130*1135*1140*1145*1150*1155*1160*1165*1170*1175*1180*1185*1190*1195*1200*1205*1210*1215*1220*1225*1230*1235*1240*1245*1250*1255*1260*1265*1270*1275*1280*1285*1290*1295*1300*1305*1310*1315*1320*1325*1330*1335*1340*1345*1350*1355*1360*1365*1370*1375*1380*1385*1390*1395*1400*1405*1410*1415*1420*1425*1430*1435*1440*1445*1450*1455*1460*1465*1470*1475*1480*1485*1490*1495*1500*1505*1510*1515*1520*1525*1530*1535*1540*1545*1550*1555*1560*1565*1570*1575*1580*1585*1590*1595*1600*1605*1610*1615*1620*1625*1630*1635*1640*1645*1650*1655*1660*1665*1670*1675*1680*1685*1690*1695*1700*1705*1710*1715*1720*1725*1730*1735*1740*1745*1750*1755*1760*1765*1770*1775*1780*1785*1790*1795*1800*1805*1810*1815*1820*1825*1830*1835*1840*1845*1850*1855*1860*1865*1870*1875*1880*1885*1890*1895*1900*1905*1910*1915*1920*1925*1930*1935*1940*1945*1950*1955*1960*1965*1970*1975*1980*1985*1990*1995*2000*2005*2010*2015*2020*2025*2030*2035*2040*2045*2050*2055*2060*2065*2070*2075*2080*2085*2090*2095*2100*2105*2110*2115*2120*2125*2130*2135*2140*2145*2150*2155*2160*2165*2170*2175*2180*2185*2190*2195*2200*2205*2210*2215*2220*2225*2230*2235*2240*2245*2250*2255*2260*2265*2270*2275*2280*2285*2290*2295*2300*2305*23
10*2315*2320*2325*2330*2335*2340*2345*2350*2355*2360*2365*2370*2375*2380*2385*2390*2395*2400*2405*2410*2415*2420*2425*2430*2435*2440*2445*2450*2455*2460*2465*2470*2475*2480*2485*2490*2495*2500*2505*2510*2515*2520*2525*2530*2535*2540*2545*2550*2555*2560*2565*2570*2575*2580*2585*2590*2595*2600*2605*2610*2615*2620*2625*2630*2635*2640*2645*2650*2655*2660*2665*2670*2675*2680*2685*2690*2695*2700*2705*2710*2715*2720*2725*2730*2735*2740*2745*2750*2755*2760*2765*2770*2775*2780*2785*2790*2795*2800*" > val json_tuple_result = Seq(s"""{"test":"$counterstring"}""").toDF("json") > .withColumn("result", json_tuple('json, "test")) > .select('result) > .as[String].head.length > val from_json_result = Seq(s"""{"test":"$counterstring"}""").toDF("json") > .withColumn("parsed", from_json('json, StructType(Seq(StructField("test", > StringType) > .withColumn("result", $"parsed.test") > .select('result) > .as[String].head.length > scala> json_tuple_result > res62: Int = 2791 > scala> from_json_result > res63: Int = 2800 > {code} > Result is influenced by the total length of the json string at the moment of > parsing: > {code} > val json_tuple_result_with_prefix = Seq(s"""{"prefix": "dummy", > "test":"$counterstring"}""").toDF("json") >
[jira] [Created] (SPARK-29934) Dataset support GraphX
darion yaphet created SPARK-29934: - Summary: Dataset support GraphX Key: SPARK-29934 URL: https://issues.apache.org/jira/browse/SPARK-29934 Project: Spark Issue Type: Bug Components: Graph, GraphX, Spark Core Affects Versions: 2.4.4 Reporter: darion yaphet Do we have any plan to support GraphX with Dataset? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29644) Corrected ShortType and ByteType mapping to SmallInt and TinyInt in JDBCUtils
[ https://issues.apache.org/jira/browse/SPARK-29644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-29644: - Fix Version/s: 2.4.5 > Corrected ShortType and ByteType mapping to SmallInt and TinyInt in JDBCUtils > - > > Key: SPARK-29644 > URL: https://issues.apache.org/jira/browse/SPARK-29644 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4, 3.0.0 >Reporter: Shiv Prashant Sood >Assignee: Shiv Prashant Sood >Priority: Minor > Fix For: 2.4.5, 3.0.0 > > > @maropu pointed out this issue during the [PR > 25344|https://github.com/apache/spark/pull/25344] review discussion. > In > [JDBCUtils.scala|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala] > at line 547: > case ShortType => > (stmt: PreparedStatement, row: Row, pos: Int) => > stmt.setInt(pos + 1, row.getShort(pos)) > I don't see any reproducible issue, but this is clearly a problem that must be > fixed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
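For context, a self-contained sketch (simplified from, and not identical to, the real JdbcUtils code) of a type-faithful setter mapping; the fix is to bind through {{setShort}}/{{setByte}} rather than widening to {{setInt}}:

{code}
import java.sql.PreparedStatement
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Simplified sketch of a JDBC setter factory: each Catalyst type binds
// through the matching JDBC setter so the driver sees SMALLINT/TINYINT values.
def makeSetter(dt: DataType): (PreparedStatement, Row, Int) => Unit = dt match {
  case ShortType   => (stmt, row, pos) => stmt.setShort(pos + 1, row.getShort(pos)) // was setInt
  case ByteType    => (stmt, row, pos) => stmt.setByte(pos + 1, row.getByte(pos))   // was setInt
  case IntegerType => (stmt, row, pos) => stmt.setInt(pos + 1, row.getInt(pos))
  case other       => throw new IllegalArgumentException(s"No setter sketched for $other")
}
{code}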
[jira] [Resolved] (SPARK-29456) Add tooltip information for Session Statistics Table column in JDBC/ODBC Server Tab
[ https://issues.apache.org/jira/browse/SPARK-29456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-29456. -- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 26138 [https://github.com/apache/spark/pull/26138] > Add tooltip information for Session Statistics Table column in JDBC/ODBC > Server Tab > > > Key: SPARK-29456 > URL: https://issues.apache.org/jira/browse/SPARK-29456 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Assignee: pavithra ramachandran >Priority: Minor > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29456) Add tooltip information for Session Statistics Table column in JDBC/ODBC Server Tab
[ https://issues.apache.org/jira/browse/SPARK-29456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-29456: - Fix Version/s: (was: 3.1.0) 3.0.0 > Add tooltip information for Session Statistics Table column in JDBC/ODBC > Server Tab > > > Key: SPARK-29456 > URL: https://issues.apache.org/jira/browse/SPARK-29456 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Assignee: pavithra ramachandran >Priority: Minor > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29456) Add tooltip information for Session Statistics Table column in JDBC/ODBC Server Tab
[ https://issues.apache.org/jira/browse/SPARK-29456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-29456: Assignee: pavithra ramachandran > Add tooltip information for Session Statistics Table column in JDBC/ODBC > Server Tab > > > Key: SPARK-29456 > URL: https://issues.apache.org/jira/browse/SPARK-29456 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Assignee: pavithra ramachandran >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29456) Add tooltip information for Session Statistics Table column in JDBC/ODBC Server Tab
[ https://issues.apache.org/jira/browse/SPARK-29456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-29456: - Priority: Minor (was: Major) > Add tooltip information for Session Statistics Table column in JDBC/ODBC > Server Tab > > > Key: SPARK-29456 > URL: https://issues.apache.org/jira/browse/SPARK-29456 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29933) ThriftServerQueryTestSuite runs tests with wrong settings
[ https://issues.apache.org/jira/browse/SPARK-29933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-29933: --- Attachment: filter_tests.patch > ThriftServerQueryTestSuite runs tests with wrong settings > - > > Key: SPARK-29933 > URL: https://issues.apache.org/jira/browse/SPARK-29933 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Minor > Attachments: filter_tests.patch > > > ThriftServerQueryTestSuite must run ANSI tests in the Spark dialect, but it > keeps settings from previous runs; in fact, it runs `ansi/interval.sql` in > the PostgreSQL dialect. See > https://github.com/apache/spark/pull/26473#issuecomment-554510643 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29933) ThriftServerQueryTestSuite runs tests with wrong settings
Maxim Gekk created SPARK-29933: -- Summary: ThriftServerQueryTestSuite runs tests with wrong settings Key: SPARK-29933 URL: https://issues.apache.org/jira/browse/SPARK-29933 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk ThriftServerQueryTestSuite must run ANSI tests in the Spark dialect, but it keeps settings from previous runs; in fact, it runs `ansi/interval.sql` in the PostgreSQL dialect. See https://github.com/apache/spark/pull/26473#issuecomment-554510643 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
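A minimal sketch of the kind of per-test isolation the suite needs (illustrative; this withSQLConf is written from scratch, not the suite's actual helper): save a conf before overriding it and restore it afterwards, so one test's dialect cannot leak into the next.

{code}
import org.apache.spark.sql.SparkSession

// Runs `body` with `key` set to `value`, then restores the previous setting
// (or unsets the key if it had none), even if the body throws.
def withSQLConf[T](spark: SparkSession)(key: String, value: String)(body: => T): T = {
  val previous = spark.conf.getOption(key)
  spark.conf.set(key, value)
  try body
  finally previous match {
    case Some(v) => spark.conf.set(key, v)
    case None    => spark.conf.unset(key)
  }
}
{code}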
[jira] [Commented] (SPARK-29931) Declare all SQL legacy configs as will be removed in Spark 4.0
[ https://issues.apache.org/jira/browse/SPARK-29931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975944#comment-16975944 ] Maxim Gekk commented on SPARK-29931: > It's conceivable there could be a reason to do it later, or sooner. Later is not a problem, but what about sooner? Most of the configs were added for Spark 3.0. If you decide to remove one of them in a minor release between 3.0 and 4.0, you can break user apps, which I believe is unacceptable for minor releases. > Declare all SQL legacy configs as will be removed in Spark 4.0 > -- > > Key: SPARK-29931 > URL: https://issues.apache.org/jira/browse/SPARK-29931 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Minor > > Add the sentence to the descriptions of all legacy SQL configs that existed before > Spark 3.0: "This config will be removed in Spark 4.0." Here is the list of > such configs: > * spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName > * spark.sql.legacy.literal.pickMinimumPrecision > * spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation > * spark.sql.legacy.sizeOfNull > * spark.sql.legacy.replaceDatabricksSparkAvro.enabled > * spark.sql.legacy.setopsPrecedence.enabled > * spark.sql.legacy.integralDivide.returnBigint > * spark.sql.legacy.bucketedTableScan.outputOrdering > * spark.sql.legacy.parser.havingWithoutGroupByAsWhere > * spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue > * spark.sql.legacy.setCommandRejectsSparkCoreConfs > * spark.sql.legacy.utcTimestampFunc.enabled > * spark.sql.legacy.typeCoercion.datetimeToString > * spark.sql.legacy.looseUpcast > * spark.sql.legacy.ctePrecedence.enabled > * spark.sql.legacy.arrayExistsFollowsThreeValuedLogic -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org