[jira] [Assigned] (SPARK-30532) DataFrameStatFunctions.approxQuantile doesn't work with TABLE.COLUMN syntax
[ https://issues.apache.org/jira/browse/SPARK-30532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-30532:
-----------------------------------
    Assignee: Oleksii Kachaiev

> DataFrameStatFunctions.approxQuantile doesn't work with TABLE.COLUMN syntax
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-30532
>                 URL: https://issues.apache.org/jira/browse/SPARK-30532
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.4
>            Reporter: Chris Suchanek
>            Assignee: Oleksii Kachaiev
>            Priority: Minor
>             Fix For: 3.0.0
>
> DataFrameStatFunctions.approxQuantile doesn't work with a fully qualified
> column name (i.e. TABLE_NAME.COLUMN_NAME), which is often the way you refer
> to a column when working with joined dataframes that have ambiguous column
> names. See the code below for an example.
> {code:java}
> import scala.util.Random
> val l = (0 to 1000).map(_ => Random.nextGaussian() * 1000)
> val df1 = sc.parallelize(l).toDF("num").as("tt1")
> val df2 = sc.parallelize(l).toDF("num").as("tt2")
> val dfx = df2.crossJoin(df1)
> dfx.stat.approxQuantile("tt1.num", Array(0.1), 0.0)
> // throws: java.lang.IllegalArgumentException: Field "tt1.num" does not exist.
> // Available fields: num
> dfx.stat.approxQuantile("num", Array(0.1), 0.0)
> // throws: org.apache.spark.sql.AnalysisException: Reference 'num' is
> // ambiguous, could be: tt2.num, tt1.num.;
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
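The report above pins relativeError to 0.0, which requests exact quantiles. As a hedged, Spark-free illustration of the semantics approxQuantile targets in that case (plain Python; the `exact_quantile` helper is hypothetical, and Spark's actual implementation uses the approximate Greenwald-Khanna algorithm rather than a full sort), the lookup amounts to:

```python
# Exact quantile lookup, illustrating what approxQuantile computes when
# relativeError = 0.0. Spark only approximates this for nonzero error
# bounds; this sketch is plain Python, no Spark needed.

def exact_quantile(values, prob):
    """Return the smallest element v such that at least prob * n
    elements are <= v (the definition approxQuantile targets)."""
    assert 0.0 <= prob <= 1.0
    ordered = sorted(values)
    n = len(ordered)
    # rank of the target element, clamped to a valid index
    rank = min(max(int(prob * n), 1), n) - 1
    return ordered[rank]

data = list(range(1, 1001))          # 1..1000
print(exact_quantile(data, 0.1))     # 100
print(exact_quantile(data, 0.5))     # 500
```

As for the bug itself, a common workaround until the fix is to select the qualified column into a fresh, unambiguous name first, e.g. `dfx.select($"tt1.num".as("num1"))`, and then call `approxQuantile("num1", ...)` on the result.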
[jira] [Resolved] (SPARK-30532) DataFrameStatFunctions.approxQuantile doesn't work with TABLE.COLUMN syntax
[ https://issues.apache.org/jira/browse/SPARK-30532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-30532.
---------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 27916
[https://github.com/apache/spark/pull/27916]
[jira] [Assigned] (SPARK-31283) Simplify ChiSq by adding a common method
[ https://issues.apache.org/jira/browse/SPARK-31283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengruifeng reassigned SPARK-31283:
------------------------------------
    Assignee: zhengruifeng

> Simplify ChiSq by adding a common method
> ----------------------------------------
>
>                 Key: SPARK-31283
>                 URL: https://issues.apache.org/jira/browse/SPARK-31283
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 3.1.0
>            Reporter: zhengruifeng
>            Assignee: zhengruifeng
>            Priority: Major
>
> The logic in '{color:#c7a65d}chiSquaredDenseFeatures{color}' and
> '{color:#c7a65d}chiSquaredSparseFeatures{color}' can be unified.
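The refactor is internal to Spark ML, but its shape can be sketched outside Spark: both the dense and the sparse entry points normalize their input and delegate to one shared routine. The sketch below is purely illustrative (the names `chi2_stat`, `chi2_dense`, and `chi2_sparse` are hypothetical, not Spark's), computing a Pearson chi-squared statistic:

```python
# Hypothetical sketch of the "common method" refactor: both the dense and
# the sparse entry points reduce their input to (observed, expected) pairs
# and delegate to one shared chi-squared routine.

def chi2_stat(observed, expected):
    """Shared core: Pearson chi-squared statistic sum((O - E)^2 / E)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_dense(counts, expected):
    # dense path: counts is a plain list aligned with expected
    return chi2_stat(counts, expected)

def chi2_sparse(counts, size, expected):
    # sparse path: counts is {index: count}; missing indices are zero
    dense = [counts.get(i, 0) for i in range(size)]
    return chi2_stat(dense, expected)

obs = [16, 18, 16, 14, 12, 12]
exp = [16, 16, 16, 16, 16, 8]
print(chi2_dense(obs, exp))                                             # 3.5
print(chi2_sparse({0: 16, 1: 18, 2: 16, 3: 14, 4: 12, 5: 12}, 6, exp))  # 3.5
```

Both paths produce the same statistic, which is the point of the ticket: the duplicated logic collapses into one method once the feature representation is normalized.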
[jira] [Resolved] (SPARK-31283) Simplify ChiSq by adding a common method
[ https://issues.apache.org/jira/browse/SPARK-31283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengruifeng resolved SPARK-31283.
----------------------------------
    Fix Version/s: 3.1.0
       Resolution: Fixed

Issue resolved by pull request 28045
[https://github.com/apache/spark/pull/28045]
[jira] [Resolved] (SPARK-31286) Specify formats of time zone ID for JSON/CSV option and from/to_utc_timestamp
[ https://issues.apache.org/jira/browse/SPARK-31286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-31286.
---------------------------------
    Resolution: Fixed

Issue resolved by pull request 28051
[https://github.com/apache/spark/pull/28051]

> Specify formats of time zone ID for JSON/CSV option and from/to_utc_timestamp
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-31286
>                 URL: https://issues.apache.org/jira/browse/SPARK-31286
>             Project: Spark
>          Issue Type: Documentation
>          Components: SQL
>    Affects Versions: 2.4.5, 3.0.0
>            Reporter: Maxim Gekk
>            Assignee: Maxim Gekk
>            Priority: Major
>             Fix For: 3.0.0
>
> There are two distinct types of ID (see
> https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html):
> # Fixed offsets - a fully resolved offset from UTC/Greenwich, that uses the
>   same offset for all local date-times
> # Geographical regions - an area where a specific set of rules for finding
>   the offset from UTC/Greenwich apply
> For example, three-letter time zone IDs are ambiguous and depend on the
> locale. They have already been deprecated in the JDK, see
> https://docs.oracle.com/javase/8/docs/api/java/util/TimeZone.html :
> {code}
> For compatibility with JDK 1.1.x, some other three-letter time zone IDs (such
> as "PST", "CTT", "AST") are also supported. However, their use is deprecated
> because the same abbreviation is often used for multiple time zones (for
> example, "CST" could be U.S. "Central Standard Time" and "China Standard
> Time"), and the Java platform can then only recognize one of them.
> {code}
> The ticket aims to specify formats of the `timeZone` option in the JSON/CSV
> datasource, and the `tz` parameter of the from_utc_timestamp() and
> to_utc_timestamp() functions.
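Spark resolves these IDs through java.time.ZoneId; the same two-way distinction can be illustrated with Python's standard-library zoneinfo (an analogous illustration of the concept, not Spark's code path):

```python
# The two kinds of zone ID, illustrated with Python's stdlib (Spark itself
# resolves IDs via java.time.ZoneId, but the distinction is the same):
#   1. fixed offsets: one offset for every local date-time
#   2. geographical regions: offset rules that vary over the year (DST)
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

fixed = timezone(timedelta(hours=-8))      # fixed offset, "-08:00"
region = ZoneInfo("America/Los_Angeles")   # geographical region

winter = datetime(2020, 1, 15, 12, 0)
summer = datetime(2020, 7, 15, 12, 0)

# The fixed offset never changes...
print(winter.replace(tzinfo=fixed).utcoffset())   # always -08:00
print(summer.replace(tzinfo=fixed).utcoffset())   # same

# ...while the region follows its DST rules.
print(winter.replace(tzinfo=region).utcoffset())  # -08:00 (PST)
print(summer.replace(tzinfo=region).utcoffset())  # -07:00 (PDT)
```

This is also why a region ID is the safer choice for user data: a fixed offset silently misplaces timestamps that fall in the other half of the DST year.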
[jira] [Commented] (SPARK-22231) Support of map, filter, withColumn, dropColumn in nested list of structures
[ https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070688#comment-17070688 ]

Reynold Xin commented on SPARK-22231:
-------------------------------------

[~fqaiser94] thanks for your persistence and my apologies for the delay. You have my buy-in. This is a great idea.

> Support of map, filter, withColumn, dropColumn in nested list of structures
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-22231
>                 URL: https://issues.apache.org/jira/browse/SPARK-22231
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: DB Tsai
>            Assignee: Jeremy Smith
>            Priority: Major
>
> At Netflix's algorithm team, we work on ranking problems to find the great
> content to fulfill the unique tastes of our members. Before building
> recommendation algorithms, we need to prepare the training, testing, and
> validation datasets in Apache Spark. Due to the nature of ranking problems,
> we have a nested list of items to be ranked in one column, and the top level
> is the contexts describing the setting for where a model is to be used (e.g.
> profiles, country, time, device, etc.). Here is a blog post describing the
> details: [Distributed Time Travel for Feature
> Generation|https://medium.com/netflix-techblog/distributed-time-travel-for-feature-generation-389cccdd3907].
> To be more concrete, for the ranks of videos for a given profile_id at a
> given country, our data schema can look like this:
> {code:java}
> root
>  |-- profile_id: long (nullable = true)
>  |-- country_iso_code: string (nullable = true)
>  |-- items: array (nullable = false)
>  |    |-- element: struct (containsNull = false)
>  |    |    |-- title_id: integer (nullable = true)
>  |    |    |-- scores: double (nullable = true)
> ...
> {code}
> We oftentimes need to work on the nested list of structs by applying some
> functions on them. Sometimes, we're dropping or adding new columns in the
> nested list of structs. Currently, there is no easy solution in open source
> Apache Spark to perform those operations using SQL primitives; many people
> just convert the data into an RDD to work on the nested level of data, and
> then reconstruct the new dataframe as a workaround. This is extremely
> inefficient because all the optimizations like predicate pushdown in SQL
> cannot be performed, we cannot leverage the columnar format, and the
> serialization and deserialization cost becomes really huge even if we just
> want to add a new column in the nested level.
> We built a solution internally at Netflix which we're very happy with. We
> plan to make it open source in Spark upstream. We would like to socialize the
> API design to see if we missed any use-case.
> The first API we added is *mapItems* on dataframe, which takes a function
> from *Column* to *Column* and applies it on the nested dataframe. Here is an
> example:
> {code:java}
> case class Data(foo: Int, bar: Double, items: Seq[Double])
> val df: Dataset[Data] = spark.createDataset(Seq(
>   Data(10, 10.0, Seq(10.1, 10.2, 10.3, 10.4)),
>   Data(20, 20.0, Seq(20.1, 20.2, 20.3, 20.4))
> ))
> val result = df.mapItems("items") {
>   item => item * 2.0
> }
> result.printSchema()
> // root
> // |-- foo: integer (nullable = false)
> // |-- bar: double (nullable = false)
> // |-- items: array (nullable = true)
> // |    |-- element: double (containsNull = true)
> result.show()
> // +---+----+--------------------+
> // |foo| bar|               items|
> // +---+----+--------------------+
> // | 10|10.0|[20.2, 20.4, 20.6...|
> // | 20|20.0|[40.2, 40.4, 40.6...|
> // +---+----+--------------------+
> {code}
> Now, with the ability of applying a function in the nested dataframe, we can
> add a new function, *withColumn* in *Column*, to add or replace the existing
> column that has the same name in the nested list of struct. Here are two
> examples demonstrating the API together with *mapItems*; the first one
> replaces the existing column:
> {code:java}
> case class Item(a: Int, b: Double)
> case class Data(foo: Int, bar: Double, items: Seq[Item])
> val df: Dataset[Data] = spark.createDataset(Seq(
>   Data(10, 10.0, Seq(Item(10, 10.0), Item(11, 11.0))),
>   Data(20, 20.0, Seq(Item(20, 20.0), Item(21, 21.0)))
> ))
> val result = df.mapItems("items") {
>   item => item.withColumn(item("b") + 1 as "b")
> }
> result.printSchema
> // root
> // |-- foo: integer (nullable = false)
> // |-- bar: double (nullable = false)
> // |-- items: array (nullable = true)
> // |    |-- element: struct (containsNull = true)
> // |    |    |-- a: integer (nullable = true)
> // |    |    |-- b: double (nullable = true)
> result.show(false)
> // +---+----+
> // |foo|bar |items |
> //
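The proposed mapItems cannot be demonstrated outside Spark, but its intended semantics, applying a function to each element of a nested array column while leaving sibling columns untouched, can be sketched on plain Python rows (`map_items` below is a hypothetical stand-in, not the proposed Scala API):

```python
# Spark-free sketch of the proposed mapItems semantics: apply a function to
# every element of a nested "array column", copying the other columns as-is.

def map_items(rows, column, fn):
    """Return new rows with fn applied to every element of rows[column]."""
    return [{**row, column: [fn(item) for item in row[column]]} for row in rows]

rows = [
    {"foo": 10, "bar": 10.0, "items": [10.1, 10.2, 10.3, 10.4]},
    {"foo": 20, "bar": 20.0, "items": [20.1, 20.2, 20.3, 20.4]},
]

# First example from the ticket: double every element of "items"
doubled = map_items(rows, "items", lambda x: x * 2.0)
print(doubled[0]["items"])   # [20.2, 20.4, 20.6, 20.8]

# Second example: the nested withColumn, i.e. replace field "b" in each struct
structs = [{"foo": 10, "items": [{"a": 10, "b": 10.0}, {"a": 11, "b": 11.0}]}]
bumped = map_items(structs, "items", lambda s: {**s, "b": s["b"] + 1})
print(bumped[0]["items"])    # [{'a': 10, 'b': 11.0}, {'a': 11, 'b': 12.0}]
```

The point of the proposal is to get exactly this per-element rewrite without leaving the SQL/Column world, so Catalyst optimizations and the columnar format still apply.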
[jira] [Updated] (SPARK-31291) SQLQueryTestSuite: Avoid load test data if test case not uses them.
[ https://issues.apache.org/jira/browse/SPARK-31291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jiaan.geng updated SPARK-31291:
-------------------------------
    Summary: SQLQueryTestSuite: Avoid load test data if test case not uses them.  (was: SQLQueryTestSuiteAvoid load test data if test case not uses them)

> SQLQueryTestSuite: Avoid load test data if test case not uses them.
> -------------------------------------------------------------------
>
>                 Key: SPARK-31291
>                 URL: https://issues.apache.org/jira/browse/SPARK-31291
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL, Tests
>    Affects Versions: 3.1.0
>            Reporter: jiaan.geng
>            Priority: Minor
>
> SQLQueryTestSuite takes about 35 minutes to run. I checked the code and found
> that SQLQueryTestSuite loads test data repeatedly.
[jira] [Updated] (SPARK-31291) SQLQueryTestSuiteAvoid load test data if test case not uses them
[ https://issues.apache.org/jira/browse/SPARK-31291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jiaan.geng updated SPARK-31291:
-------------------------------
    Summary: SQLQueryTestSuiteAvoid load test data if test case not uses them  (was: Avoid load test data if test case not uses them)
[jira] [Updated] (SPARK-31291) Avoid load test data if test case not uses them
[ https://issues.apache.org/jira/browse/SPARK-31291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-31291:
----------------------------------
        Parent: SPARK-25604
    Issue Type: Sub-task  (was: Improvement)
[jira] [Updated] (SPARK-31291) Avoid load test data if test case not uses them
[ https://issues.apache.org/jira/browse/SPARK-31291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-31291:
----------------------------------
    Priority: Minor  (was: Major)
[jira] [Updated] (SPARK-31291) Avoid load test data if test case not uses them
[ https://issues.apache.org/jira/browse/SPARK-31291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-31291:
----------------------------------
    Component/s: Tests
[jira] [Updated] (SPARK-31101) Upgrade Janino to 3.0.16
[ https://issues.apache.org/jira/browse/SPARK-31101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-31101:
----------------------------------
    Fix Version/s: (was: 3.1.0)

> Upgrade Janino to 3.0.16
> -------------------------
>
>                 Key: SPARK-31101
>                 URL: https://issues.apache.org/jira/browse/SPARK-31101
>             Project: Spark
>          Issue Type: Dependency upgrade
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Major
>             Fix For: 3.0.0, 2.4.6
>
> We got reports of failures on user queries where Janino throws an error while
> compiling generated code. The issue is here: janino-compiler/janino#113. It
> contains the generated code, the symptom (error), and an analysis of the bug,
> so please refer to the link for more details.
> Janino 3.0.16 contains the PR janino-compiler/janino#114, which enables
> Janino to compile such queries properly.
[jira] [Updated] (SPARK-31101) Upgrade Janino to 3.0.16
[ https://issues.apache.org/jira/browse/SPARK-31101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-31101:
----------------------------------
    Fix Version/s: 2.4.6
                   3.0.0
[jira] [Updated] (SPARK-31101) Upgrade Janino to 3.0.16
[ https://issues.apache.org/jira/browse/SPARK-31101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-31101:
----------------------------------
    Issue Type: Bug  (was: Dependency upgrade)
[jira] [Updated] (SPARK-31101) Upgrade Janino to 3.0.16
[ https://issues.apache.org/jira/browse/SPARK-31101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-31101:
----------------------------------
    Component/s:     (was: SQL)
                 Build
[jira] [Updated] (SPARK-31101) Upgrade Janino to 3.0.16
[ https://issues.apache.org/jira/browse/SPARK-31101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-31101:
----------------------------------
    Affects Version/s:     (was: 3.1.0)
                       3.0.0
[jira] [Updated] (SPARK-31293) Fix wrong examples and help messages for Kinesis integration
[ https://issues.apache.org/jira/browse/SPARK-31293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-31293:
----------------------------------
    Affects Version/s: 2.3.4

> Fix wrong examples and help messages for Kinesis integration
> -------------------------------------------------------------
>
>                 Key: SPARK-31293
>                 URL: https://issues.apache.org/jira/browse/SPARK-31293
>             Project: Spark
>          Issue Type: Bug
>          Components: Documentation, DStreams
>    Affects Versions: 2.3.4, 2.4.5, 3.0.0
>            Reporter: Kengo Seki
>            Assignee: Kengo Seki
>            Priority: Minor
>             Fix For: 3.0.0, 2.4.6
>
> There are some minor mistakes in the examples and the help messages for
> Kinesis integration. For example, {{KinesisWordCountASL.scala}} takes three
> arguments but its example passes four, while {{kinesis_wordcount_asl.py}}
> takes four but its example passes three.
[jira] [Assigned] (SPARK-31293) Fix wrong examples and help messages for Kinesis integration
[ https://issues.apache.org/jira/browse/SPARK-31293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-31293:
-------------------------------------
    Assignee: Kengo Seki
[jira] [Resolved] (SPARK-31293) Fix wrong examples and help messages for Kinesis integration
[ https://issues.apache.org/jira/browse/SPARK-31293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-31293.
-----------------------------------
    Fix Version/s: 3.0.0
                   2.4.6
       Resolution: Fixed

Issue resolved by pull request 28063
[https://github.com/apache/spark/pull/28063]
[jira] [Updated] (SPARK-31293) Fix wrong examples and help messages for Kinesis integration
[ https://issues.apache.org/jira/browse/SPARK-31293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-31293:
----------------------------------
    Affects Version/s: 2.4.5
[jira] [Updated] (SPARK-31293) Fix wrong examples and help messages for Kinesis integration
[ https://issues.apache.org/jira/browse/SPARK-31293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-31293:
----------------------------------
    Issue Type: Bug  (was: Improvement)
[jira] [Comment Edited] (SPARK-31281) Hit OOM Error - GC Limit
[ https://issues.apache.org/jira/browse/SPARK-31281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070478#comment-17070478 ]

Alfred Davidson edited comment on SPARK-31281 at 3/29/20, 6:54 PM:
-------------------------------------------------------------------

The allocated driver memory will be split for storage, memoryOverhead, etc. Your transformation is doing a join (which is likely to be a broadcast join) and you have an action that is bringing the data to the driver - the driver doesn't have enough memory (and is initially trying to GC to free up space). You can either allocate more driver memory or change the fraction that it allocates for storage. I believe the default value is 0.6, i.e. it reserves 60% of driver memory for storage.

was (Author: alfiewdavidson):
The allocated driver memory will be split for storage, memoryOverhead, etc. As your action is bringing the data to the driver - the driver doesn't have enough memory (and is initially trying to GC to free up space). You can either allocate more driver memory or change the fraction that it allocates for storage.
I believe the default value is 0.6, i.e. it reserves 60% of driver memory for storage.

> Hit OOM Error - GC Limit
> ------------------------
>
>                 Key: SPARK-31281
>                 URL: https://issues.apache.org/jira/browse/SPARK-31281
>             Project: Spark
>          Issue Type: Question
>          Components: Java API
>    Affects Versions: 2.4.4
>            Reporter: HongJin
>            Priority: Critical
>
> MemoryStore is 2.6GB
> conf = new SparkConf().setAppName("test")
>   //.set("spark.sql.codegen.wholeStage", "false")
>   .set("spark.driver.host", "localhost")
>   .set("spark.driver.memory", "4g")
>   .set("spark.executor.cores","1")
>   .set("spark.num.executors","1")
>   .set("spark.executor.memory", "4g")
>   .set("spark.executor.memoryOverhead", "400m")
>   .set("spark.dynamicAllocation.enabled", "true")
>   .set("spark.dynamicAllocation.minExecutors","1")
>   .set("spark.dynamicAllocation.maxExecutors","2")
>   .set("spark.ui.enabled","true") //enable spark UI
>   .set("spark.sql.shuffle.partitions",defaultPartitions)
>   .setMaster("local[2]")
> sparkSession = SparkSession.builder.config(conf).getOrCreate()
>
> val df = SparkFactory.sparkSession.sqlContext
>   .read
>   .option("header", "true")
>   .option("delimiter", delimiter)
>   .csv(textFileLocation)
>
> joinedDf = upperCaseLeft.as("l")
>   .join(upperCaseRight.as("r"), caseTransformedKeys, "full_outer")
>   .select(compositeKeysCol ::: nonKeyCols.map(col =>
>     mapHelper(col,toleranceValue,caseSensitive)): _*)
>
> data = joinedDf.take(maxRecords)
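The 0.6 figure the comment mentions corresponds to spark.memory.fraction in Spark's unified memory model: it is applied after a fixed reservation of roughly 300 MB of heap, and spark.memory.storageFraction (default 0.5) then marks the eviction-protected storage share within it. A back-of-the-envelope sketch under those assumed defaults (illustrative arithmetic, not a statement about this exact job):

```python
# Back-of-the-envelope for Spark's unified memory model, assuming defaults:
# ~300 MB reserved heap, spark.memory.fraction = 0.6,
# spark.memory.storageFraction = 0.5.

RESERVED_MB = 300          # fixed reservation inside the JVM heap
MEMORY_FRACTION = 0.6      # spark.memory.fraction (default)
STORAGE_FRACTION = 0.5     # spark.memory.storageFraction (default)

def unified_memory_mb(heap_mb):
    """Memory available to execution and storage combined."""
    return (heap_mb - RESERVED_MB) * MEMORY_FRACTION

def storage_region_mb(heap_mb):
    """Portion of unified memory that storage will not be evicted below."""
    return unified_memory_mb(heap_mb) * STORAGE_FRACTION

heap = 4 * 1024  # spark.driver.memory = 4g
print(round(unified_memory_mb(heap), 1))   # 2277.6 MB
print(round(storage_region_mb(heap), 1))   # 1138.8 MB
```

So of a 4 GB driver heap, only about 2.2 GB is available to execution plus storage, which is roughly the MemoryStore figure the reporter quotes; a `take(maxRecords)` over a full outer join has to fit its result inside that budget.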
[jira] [Comment Edited] (SPARK-31281) Hit OOM Error - GC Limit
[ https://issues.apache.org/jira/browse/SPARK-31281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070478#comment-17070478 ] Alfred Davidson edited comment on SPARK-31281 at 3/29/20, 6:54 PM: --- The allocated driver memory will be split for storage, memoryOverhead etc. You are executing a join (which is likely to be a broadcast join) and you have an action that is bringing the data to the driver - the driver doesn’t have enough memory (and initially trying to GC to free up space). You can either allocate more driver memory or change the fraction that it allocations for storage. I believe default value is 0.6 e.g reserves 60% of driver memory for storage was (Author: alfiewdavidson): The allocated driver memory will be split for storage, memoryOverhead etc. Your transformation is doing a join (which is likely to be a broadcast join) and you have an action that is bringing the data to the driver - the driver doesn’t have enough memory (and initially trying to GC to free up space). You can either allocate more driver memory or change the fraction that it allocations for storage. 
I believe default value is 0.6 e.g reserves 60% of driver memory for storage > Hit OOM Error - GC Limit > > > Key: SPARK-31281 > URL: https://issues.apache.org/jira/browse/SPARK-31281 > Project: Spark > Issue Type: Question > Components: Java API >Affects Versions: 2.4.4 >Reporter: HongJin >Priority: Critical > > MemoryStore is 2.6GB > conf = new SparkConf().setAppName("test") > //.set("spark.sql.codegen.wholeStage", "false") > .set("spark.driver.host", "localhost") > .set("spark.driver.memory", "4g") > .set("spark.executor.cores","1") > .set("spark.num.executors","1") > .set("spark.executor.memory", "4g") > .set("spark.executor.memoryOverhead", "400m") > .set("spark.dynamicAllocation.enabled", "true") > .set("spark.dynamicAllocation.minExecutors","1") > .set("spark.dynamicAllocation.maxExecutors","2") > .set("spark.ui.enabled","true") //enable spark UI > .set("spark.sql.shuffle.partitions",defaultPartitions) > .setMaster("local[2]") > sparkSession = SparkSession.builder.config(conf).getOrCreate() > > val df = SparkFactory.sparkSession.sqlContext > .read > .option("header", "true") > .option("delimiter", delimiter) > .csv(textFileLocation) > > joinedDf = upperCaseLeft.as("l") > .join(upperCaseRight.as("r"), caseTransformedKeys, "full_outer") > .select(compositeKeysCol ::: nonKeyCols.map(col => > mapHelper(col,toleranceValue,caseSensitive)): _*) > > data = joinedDf.take(maxRecords) > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31281) Hit OOM Error - GC Limit
[ https://issues.apache.org/jira/browse/SPARK-31281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070478#comment-17070478 ] Alfred Davidson commented on SPARK-31281: - The allocated driver memory will be split for storage, memoryOverhead etc. As your action is bringing the data to the driver, the driver doesn’t have enough memory (and is initially trying to GC to free up space). You can either allocate more driver memory or change the fraction that it allocates for storage. I believe the default value is 0.6, i.e. it reserves 60% of driver memory for storage > Hit OOM Error - GC Limit > > > Key: SPARK-31281 > URL: https://issues.apache.org/jira/browse/SPARK-31281 > Project: Spark > Issue Type: Question > Components: Java API >Affects Versions: 2.4.4 >Reporter: HongJin >Priority: Critical > > MemoryStore is 2.6GB > conf = new SparkConf().setAppName("test") > //.set("spark.sql.codegen.wholeStage", "false") > .set("spark.driver.host", "localhost") > .set("spark.driver.memory", "4g") > .set("spark.executor.cores","1") > .set("spark.num.executors","1") > .set("spark.executor.memory", "4g") > .set("spark.executor.memoryOverhead", "400m") > .set("spark.dynamicAllocation.enabled", "true") > .set("spark.dynamicAllocation.minExecutors","1") > .set("spark.dynamicAllocation.maxExecutors","2") > .set("spark.ui.enabled","true") //enable spark UI > .set("spark.sql.shuffle.partitions",defaultPartitions) > .setMaster("local[2]") > sparkSession = SparkSession.builder.config(conf).getOrCreate() > > val df = SparkFactory.sparkSession.sqlContext > .read > .option("header", "true") > .option("delimiter", delimiter) > .csv(textFileLocation) > > joinedDf = upperCaseLeft.as("l") > .join(upperCaseRight.as("r"), caseTransformedKeys, "full_outer") > .select(compositeKeysCol ::: nonKeyCols.map(col => > mapHelper(col,toleranceValue,caseSensitive)): _*) > > data = joinedDf.take(maxRecords) > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, 
e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
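To make the split concrete: Spark's unified memory pool is roughly `(heap - 300 MB reserved) * spark.memory.fraction`, with the fraction defaulting to 0.6, per Spark's UnifiedMemoryManager. A small sketch of that arithmetic (the formula and the 300 MB constant are recalled from Spark's source, not quoted from this thread):

```java
final class UnifiedMemorySketch {
    // Spark reserves a fixed chunk of the heap before applying the fraction
    // (300 MB in UnifiedMemoryManager; treat the exact constant as an assumption).
    static final long RESERVED_BYTES = 300L * 1024 * 1024;

    // Approximate size of the combined storage+execution pool:
    //   (heap - reserved) * spark.memory.fraction   (fraction defaults to 0.6)
    static long unifiedMemoryBytes(long heapBytes, double memoryFraction) {
        return (long) ((heapBytes - RESERVED_BYTES) * memoryFraction);
    }
}
```

With the thread's `spark.driver.memory=4g` and the default fraction this comes out around 2.2 GiB, the same ballpark as the reported 2.6 GB MemoryStore: raising `spark.driver.memory` grows `heapBytes`, while lowering `spark.memory.fraction` shrinks the pool's share, and reducing `maxRecords` in the `take` shrinks what the driver must hold.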
[jira] [Resolved] (SPARK-31280) Perform propagating empty relation after RewritePredicateSubquery
[ https://issues.apache.org/jira/browse/SPARK-31280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-31280. --- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 28043 [https://github.com/apache/spark/pull/28043] > Perform propagating empty relation after RewritePredicateSubquery > - > > Key: SPARK-31280 > URL: https://issues.apache.org/jira/browse/SPARK-31280 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.1.0 > > > {code:java} > scala> spark.sql(" select * from values(1), (2) t(key) where key in (select 1 > as key where 1=0)").queryExecution > res15: org.apache.spark.sql.execution.QueryExecution = > == Parsed Logical Plan == > 'Project [*] > +- 'Filter 'key IN (list#39 []) >: +- Project [1 AS key#38] >: +- Filter (1 = 0) >:+- OneRowRelation >+- 'SubqueryAlias t > +- 'UnresolvedInlineTable [key], [List(1), List(2)] > == Analyzed Logical Plan == > key: int > Project [key#40] > +- Filter key#40 IN (list#39 []) >: +- Project [1 AS key#38] >: +- Filter (1 = 0) >:+- OneRowRelation >+- SubqueryAlias t > +- LocalRelation [key#40] > == Optimized Logical Plan == > Join LeftSemi, (key#40 = key#38) > :- LocalRelation [key#40] > +- LocalRelation , [key#38] > == Physical Plan == > *(1) BroadcastHashJoin [key#40], [key#38], LeftSemi, BuildRight > :- *(1) LocalTableScan [key#40] > +- Br... > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
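The optimization being requested can be illustrated outside of Catalyst: once RewritePredicateSubquery turns the IN-subquery into a left-semi join, an empty build side means the whole result is empty, so the plan can collapse to an empty relation instead of running the BroadcastHashJoin shown above. A toy sketch, with plain collections standing in for relations (not Catalyst code):

```java
final class EmptyRelationPropagation {
    // Left-semi join: keep left rows whose key appears on the right.
    static List<Integer> leftSemi(List<Integer> left, Set<Integer> rightKeys) {
        if (rightKeys.isEmpty()) {
            return List.of(); // "propagate empty relation": no join needs to run
        }
        return left.stream().filter(rightKeys::contains).collect(Collectors.toList());
    }
}
```

The `isEmpty()` shortcut is the whole rule; the ticket's point is that this check only becomes applicable after the subquery rewrite has produced the join, hence the ordering change.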
[jira] [Assigned] (SPARK-31280) Perform propagating empty relation after RewritePredicateSubquery
[ https://issues.apache.org/jira/browse/SPARK-31280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-31280: - Assignee: Kent Yao > Perform propagating empty relation after RewritePredicateSubquery > - > > Key: SPARK-31280 > URL: https://issues.apache.org/jira/browse/SPARK-31280 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > > {code:java} > scala> spark.sql(" select * from values(1), (2) t(key) where key in (select 1 > as key where 1=0)").queryExecution > res15: org.apache.spark.sql.execution.QueryExecution = > == Parsed Logical Plan == > 'Project [*] > +- 'Filter 'key IN (list#39 []) >: +- Project [1 AS key#38] >: +- Filter (1 = 0) >:+- OneRowRelation >+- 'SubqueryAlias t > +- 'UnresolvedInlineTable [key], [List(1), List(2)] > == Analyzed Logical Plan == > key: int > Project [key#40] > +- Filter key#40 IN (list#39 []) >: +- Project [1 AS key#38] >: +- Filter (1 = 0) >:+- OneRowRelation >+- SubqueryAlias t > +- LocalRelation [key#40] > == Optimized Logical Plan == > Join LeftSemi, (key#40 = key#38) > :- LocalRelation [key#40] > +- LocalRelation , [key#38] > == Physical Plan == > *(1) BroadcastHashJoin [key#40], [key#38], LeftSemi, BuildRight > :- *(1) LocalTableScan [key#40] > +- Br... > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31297) Speed-up date-time rebasing
[ https://issues.apache.org/jira/browse/SPARK-31297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070457#comment-17070457 ] Maxim Gekk commented on SPARK-31297: The rebasing of days doesn't depend on time zone, and has just 14 special dates: {code:scala} test("optimize rebasing") { val start = localDateToDays(LocalDate.of(1, 1, 1)) val end = localDateToDays(LocalDate.of(2030, 1, 1)) var days = start var diff = Long.MaxValue var counter = 0 while (days < end) { val rebased = rebaseGregorianToJulianDays(days) val curDiff = rebased - days if (curDiff != diff) { counter += 1 diff = curDiff val ld = daysToLocalDate(days) println(s"local date = $ld days = $days diff = ${diff} days") } days += 1 } println(s"counter = $counter") } {code} {code} local date = 0001-01-01 days = -719162 diff = -2 days local date = 0100-03-01 days = -682944 diff = -1 days local date = 0200-03-01 days = -646420 diff = 0 days local date = 0300-03-01 days = -609896 diff = 1 days local date = 0500-03-01 days = -536847 diff = 2 days local date = 0600-03-01 days = -500323 diff = 3 days local date = 0700-03-01 days = -463799 diff = 4 days local date = 0900-03-01 days = -390750 diff = 5 days local date = 1000-03-01 days = -354226 diff = 6 days local date = 1100-03-01 days = -317702 diff = 7 days local date = 1300-03-01 days = -244653 diff = 8 days local date = 1400-03-01 days = -208129 diff = 9 days local date = 1500-03-01 days = -171605 diff = 10 days local date = 1582-10-15 days = -141427 diff = 0 days counter = 14 {code} > Speed-up date-time rebasing > --- > > Key: SPARK-31297 > URL: https://issues.apache.org/jira/browse/SPARK-31297 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Major > > I do believe it is possible to speed up date-time rebasing by building a map > of micros to diffs between original and rebased micros. And look up at the > map via binary search. 
> For example, the *America/Los_Angeles* time zone has less than 100 points > when diff changes: > {code:scala} > test("optimize rebasing") { > val start = instantToMicros(LocalDateTime.of(1, 1, 1, 0, 0, 0) > .atZone(getZoneId("America/Los_Angeles")) > .toInstant) > val end = instantToMicros(LocalDateTime.of(2030, 1, 1, 0, 0, 0) > .atZone(getZoneId("America/Los_Angeles")) > .toInstant) > var micros = start > var diff = Long.MaxValue > var counter = 0 > while (micros < end) { > val rebased = rebaseGregorianToJulianMicros(micros) > val curDiff = rebased - micros > if (curDiff != diff) { > counter += 1 > diff = curDiff > val ldt = > microsToInstant(micros).atZone(getZoneId("America/Los_Angeles")).toLocalDateTime > println(s"local date-time = $ldt diff = ${diff / MICROS_PER_MINUTE} > minutes") > } > micros += MICROS_PER_HOUR > } > println(s"counter = $counter") > } > {code} > {code:java} > local date-time = 0001-01-01T00:00 diff = -2909 minutes > local date-time = 0100-02-28T14:00 diff = -1469 minutes > local date-time = 0200-02-28T14:00 diff = -29 minutes > local date-time = 0300-02-28T14:00 diff = 1410 minutes > local date-time = 0500-02-28T14:00 diff = 2850 minutes > local date-time = 0600-02-28T14:00 diff = 4290 minutes > local date-time = 0700-02-28T14:00 diff = 5730 minutes > local date-time = 0900-02-28T14:00 diff = 7170 minutes > local date-time = 1000-02-28T14:00 diff = 8610 minutes > local date-time = 1100-02-28T14:00 diff = 10050 minutes > local date-time = 1300-02-28T14:00 diff = 11490 minutes > local date-time = 1400-02-28T14:00 diff = 12930 minutes > local date-time = 1500-02-28T14:00 diff = 14370 minutes > local date-time = 1582-10-14T14:00 diff = -29 minutes > local date-time = 1899-12-31T16:52:58 diff = 0 minutes > local date-time = 1917-12-27T11:52:58 diff = 60 minutes > local date-time = 1917-12-27T12:52:58 diff = 0 minutes > local date-time = 1918-09-15T12:52:58 diff = 60 minutes > local date-time = 1918-09-15T13:52:58 diff = 0 minutes > local 
date-time = 1919-06-30T16:52:58 diff = 31 minutes > local date-time = 1919-06-30T17:52:58 diff = 0 minutes > local date-time = 1919-08-15T12:52:58 diff = 60 minutes > local date-time = 1919-08-15T13:52:58 diff = 0 minutes > local date-time = 1921-08-31T10:52:58 diff = 60 minutes > local date-time = 1921-08-31T11:52:58 diff = 0 minutes > local date-time = 1921-09-30T11:52:58 diff = 60 minutes > local date-time = 1921-09-30T12:52:58 diff = 0 minutes > local date-time = 1922-09-30T12:52:58 diff = 60 minutes > local date-time = 1922-09-30T13:52:58 diff = 0 minutes > local date-time = 1981-09-30T12:52:58 diff = 60 minutes > local date-time =
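Since day rebasing is time-zone independent with only those 14 switch points, the lookup can be a tiny hard-coded table plus a binary search. A sketch using the days/diffs printed in the comment above (values copied from that output; days before 0001-01-01 are out of scope here):

```java
final class RebaseDaysSketch {
    // Switch days and diffs taken from the printed test output above.
    private static final int[] SWITCH_DAYS = {
        -719162, -682944, -646420, -609896, -536847, -500323, -463799,
        -390750, -354226, -317702, -244653, -208129, -171605, -141427};
    private static final int[] DIFFS = {-2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0};

    // Gregorian -> Julian: add the diff of the greatest switch day <= days.
    static int rebaseGregorianToJulianDays(int days) {
        int i = Arrays.binarySearch(SWITCH_DAYS, days);
        if (i < 0) i = -i - 2; // not found: insertion point minus one
        return days + DIFFS[Math.max(i, 0)];
    }
}
```

For example, day 0 (1970-01-01) is past the last switch point, whose diff is 0, so it rebases to itself; day -719162 (0001-01-01) picks up the -2 diff.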
[jira] [Commented] (SPARK-30272) Remove usage of Guava that breaks in Guava 27
[ https://issues.apache.org/jira/browse/SPARK-30272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070391#comment-17070391 ] Sean R. Owen commented on SPARK-30272: -- The reason is simply that this is the version Hadoop uses through 3.2.0 and the binding to Hadoop is still relatively tight. We do shade Guava, and honestly, I have lost track myself of whether this means we could vary the Guava version in Spark or whether it causes problems in some build configurations. I think it would only work in the "Hadoop provided" build? But anyway, I've tried for the simple solution of letting Spark work with Guava 14 through 27 equally. It does limit what Spark can use Guava for a bit, but just a little. I am still kind of puzzled by the fix you found. Did you manually update the Hadoop version to 3.2.1? I'd be surprised if hadoop-azure mattered as it isn't the thing that pulls in the class in question, Guava is, and in any event, it would not refer to or include a Spark shaded version. I can't explain that one. > Remove usage of Guava that breaks in Guava 27 > - > > Key: SPARK-30272 > URL: https://issues.apache.org/jira/browse/SPARK-30272 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.0.0 >Reporter: Sean R. Owen >Assignee: Sean R. Owen >Priority: Major > Fix For: 3.0.0 > > > Background: > https://issues.apache.org/jira/browse/SPARK-29250 > https://github.com/apache/spark/pull/25932 > Hadoop 3.2.1 will update Guava from 11 to 27. There are a number of methods > that changed between those releases, typically just a rename, but it means one > set of code can't work with both, while we want to work with Hadoop 2.x and > 3.x. 
Among them: > - Objects.toStringHelper was moved to MoreObjects; we can just use the > Commons Lang3 equivalent > - Objects.hashCode etc were renamed; use java.util.Objects equivalents > - MoreExecutors.sameThreadExecutor() became directExecutor(); for same-thread > execution we can use a dummy implementation of ExecutorService / Executor > - TypeToken.isAssignableFrom became isSupertypeOf; work around with reflection > There is probably more to the Guava issue than just this change, but it will > make Spark itself work with more versions and reduce our exposure to Guava > along the way anyway. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
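For the sameThreadExecutor()/directExecutor() point, the Guava-free replacement can be as small as an ExecutorService that runs tasks on the calling thread. A JDK-only sketch (not necessarily the exact class Spark added):

```java
// Runs every task synchronously on the submitting thread, so it behaves the
// same against Guava 14 or 27 because it does not touch Guava at all.
final class SameThreadExecutorService extends AbstractExecutorService {
    private volatile boolean shutdown = false;

    @Override public void execute(Runnable command) { command.run(); }
    @Override public void shutdown() { shutdown = true; }
    @Override public List<Runnable> shutdownNow() { shutdown = true; return Collections.emptyList(); }
    @Override public boolean isShutdown() { return shutdown; }
    @Override public boolean isTerminated() { return shutdown; }
    @Override public boolean awaitTermination(long timeout, TimeUnit unit) { return shutdown; }
}
```

`AbstractExecutorService` supplies `submit`/`invokeAll` on top of `execute`, so callers that previously took Guava's same-thread executor can take this instead.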
[jira] [Updated] (SPARK-31298) validate CTAS table path in SPARK-19724 seems conflict and External table also need to check non-empty
[ https://issues.apache.org/jira/browse/SPARK-31298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-31298: -- Summary: validate CTAS table path in SPARK-19724 seems conflict and External table also need to check non-empty (was: validate CTAS table path in SPARK-19724 seems conflict) > validate CTAS table path in SPARK-19724 seems conflict and External table > also need to check non-empty > -- > > Key: SPARK-31298 > URL: https://issues.apache.org/jira/browse/SPARK-31298 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: angerszhu >Priority: Major > > In SessionCatalog.validateTableLocation() > {code:java} > val tableLocation = > new > Path(table.storage.locationUri.getOrElse(defaultTablePath(table.identifier))) > {code} > But in CreateDataSourceTableAsSelect , table location use defaultTablePath > {code:java} > assert(table.schema.isEmpty) > sparkSession.sessionState.catalog.validateTableLocation(table) > val tableLocation = if (table.tableType == CatalogTableType.MANAGED) { > Some(sessionState.catalog.defaultTablePath(table.identifier)) > } else { > table.storage.locationUri > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
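The mismatch between the two quoted snippets can be reduced to a toy model: validation honors an explicitly-set location, while the CTAS execution path ignores it for MANAGED tables, so the two can resolve different paths for the same table. A sketch of just that divergence (hypothetical helper names, not Spark's API):

```java
final class CtasLocationModel {
    // Models SessionCatalog.validateTableLocation: an explicit location wins.
    static String validatedPath(Optional<String> locationUri, String defaultPath) {
        return locationUri.orElse(defaultPath);
    }

    // Models CreateDataSourceTableAsSelect: MANAGED tables always use the default path.
    static String ctasPath(boolean managed, Optional<String> locationUri, String defaultPath) {
        return managed ? defaultPath : locationUri.orElse(defaultPath);
    }
}
```

For a managed table created with a `locationUri` set, `validatedPath` checks one directory while `ctasPath` writes to another, which is the conflict the ticket describes.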
[jira] [Updated] (SPARK-31298) validate CTAS table path in SPARK-19724 seems conflict
[ https://issues.apache.org/jira/browse/SPARK-31298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-31298: -- Summary: validate CTAS table path in SPARK-19724 seems conflict (was: validate External path in SPARK-19724 seems conflict) > validate CTAS table path in SPARK-19724 seems conflict > -- > > Key: SPARK-31298 > URL: https://issues.apache.org/jira/browse/SPARK-31298 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: angerszhu >Priority: Major > > In SessionCatalog.validateTableLocation() > {code:java} > val tableLocation = > new > Path(table.storage.locationUri.getOrElse(defaultTablePath(table.identifier))) > {code} > But in CreateDataSourceTableAsSelect , table location use defaultTablePath > {code:java} > assert(table.schema.isEmpty) > sparkSession.sessionState.catalog.validateTableLocation(table) > val tableLocation = if (table.tableType == CatalogTableType.MANAGED) { > Some(sessionState.catalog.defaultTablePath(table.identifier)) > } else { > table.storage.locationUri > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31298) validate External path in SPARK-19724 seems conflict
[ https://issues.apache.org/jira/browse/SPARK-31298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-31298: -- Description: In SessionCatalog.validateTableLocation() {code:java} val tableLocation = new Path(table.storage.locationUri.getOrElse(defaultTablePath(table.identifier))) {code} But in CreateDataSourceTableAsSelect , table location use defaultTablePath {code:java} assert(table.schema.isEmpty) sparkSession.sessionState.catalog.validateTableLocation(table) val tableLocation = if (table.tableType == CatalogTableType.MANAGED) { Some(sessionState.catalog.defaultTablePath(table.identifier)) } else { table.storage.locationUri } {code} > validate External path in SPARK-19724 seems conflict > > > Key: SPARK-31298 > URL: https://issues.apache.org/jira/browse/SPARK-31298 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: angerszhu >Priority: Major > > In SessionCatalog.validateTableLocation() > {code:java} > val tableLocation = > new > Path(table.storage.locationUri.getOrElse(defaultTablePath(table.identifier))) > {code} > But in CreateDataSourceTableAsSelect , table location use defaultTablePath > {code:java} > assert(table.schema.isEmpty) > sparkSession.sessionState.catalog.validateTableLocation(table) > val tableLocation = if (table.tableType == CatalogTableType.MANAGED) { > Some(sessionState.catalog.defaultTablePath(table.identifier)) > } else { > table.storage.locationUri > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31298) validate External path in SPARK-19724 seems conflict
angerszhu created SPARK-31298: - Summary: validate External path in SPARK-19724 seems conflict Key: SPARK-31298 URL: https://issues.apache.org/jira/browse/SPARK-31298 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31297) Speed-up date-time rebasing
[ https://issues.apache.org/jira/browse/SPARK-31297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070286#comment-17070286 ] Maxim Gekk commented on SPARK-31297: [~cloud_fan] [~hyukjin.kwon] [~dongjoon] WDYT? > Speed-up date-time rebasing > --- > > Key: SPARK-31297 > URL: https://issues.apache.org/jira/browse/SPARK-31297 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Major > > I do believe it is possible to speed up date-time rebasing by building a map > of micros to diffs between original and rebased micros. And look up at the > map via binary search. > For example, the *America/Los_Angeles* time zone has less than 100 points > when diff changes: > {code:scala} > test("optimize rebasing") { > val start = instantToMicros(LocalDateTime.of(1, 1, 1, 0, 0, 0) > .atZone(getZoneId("America/Los_Angeles")) > .toInstant) > val end = instantToMicros(LocalDateTime.of(2030, 1, 1, 0, 0, 0) > .atZone(getZoneId("America/Los_Angeles")) > .toInstant) > var micros = start > var diff = Long.MaxValue > var counter = 0 > while (micros < end) { > val rebased = rebaseGregorianToJulianMicros(micros) > val curDiff = rebased - micros > if (curDiff != diff) { > counter += 1 > diff = curDiff > val ldt = > microsToInstant(micros).atZone(getZoneId("America/Los_Angeles")).toLocalDateTime > println(s"local date-time = $ldt diff = ${diff / MICROS_PER_MINUTE} > minutes") > } > micros += MICROS_PER_HOUR > } > println(s"counter = $counter") > } > {code} > {code:java} > local date-time = 0001-01-01T00:00 diff = -2909 minutes > local date-time = 0100-02-28T14:00 diff = -1469 minutes > local date-time = 0200-02-28T14:00 diff = -29 minutes > local date-time = 0300-02-28T14:00 diff = 1410 minutes > local date-time = 0500-02-28T14:00 diff = 2850 minutes > local date-time = 0600-02-28T14:00 diff = 4290 minutes > local date-time = 0700-02-28T14:00 diff = 5730 minutes > local date-time = 0900-02-28T14:00 diff = 
7170 minutes > local date-time = 1000-02-28T14:00 diff = 8610 minutes > local date-time = 1100-02-28T14:00 diff = 10050 minutes > local date-time = 1300-02-28T14:00 diff = 11490 minutes > local date-time = 1400-02-28T14:00 diff = 12930 minutes > local date-time = 1500-02-28T14:00 diff = 14370 minutes > local date-time = 1582-10-14T14:00 diff = -29 minutes > local date-time = 1899-12-31T16:52:58 diff = 0 minutes > local date-time = 1917-12-27T11:52:58 diff = 60 minutes > local date-time = 1917-12-27T12:52:58 diff = 0 minutes > local date-time = 1918-09-15T12:52:58 diff = 60 minutes > local date-time = 1918-09-15T13:52:58 diff = 0 minutes > local date-time = 1919-06-30T16:52:58 diff = 31 minutes > local date-time = 1919-06-30T17:52:58 diff = 0 minutes > local date-time = 1919-08-15T12:52:58 diff = 60 minutes > local date-time = 1919-08-15T13:52:58 diff = 0 minutes > local date-time = 1921-08-31T10:52:58 diff = 60 minutes > local date-time = 1921-08-31T11:52:58 diff = 0 minutes > local date-time = 1921-09-30T11:52:58 diff = 60 minutes > local date-time = 1921-09-30T12:52:58 diff = 0 minutes > local date-time = 1922-09-30T12:52:58 diff = 60 minutes > local date-time = 1922-09-30T13:52:58 diff = 0 minutes > local date-time = 1981-09-30T12:52:58 diff = 60 minutes > local date-time = 1981-09-30T13:52:58 diff = 0 minutes > local date-time = 1982-09-30T12:52:58 diff = 60 minutes > local date-time = 1982-09-30T13:52:58 diff = 0 minutes > local date-time = 1983-09-30T12:52:58 diff = 60 minutes > local date-time = 1983-09-30T13:52:58 diff = 0 minutes > local date-time = 1984-09-29T15:52:58 diff = 60 minutes > local date-time = 1984-09-29T16:52:58 diff = 0 minutes > local date-time = 1985-09-28T15:52:58 diff = 60 minutes > local date-time = 1985-09-28T16:52:58 diff = 0 minutes > local date-time = 1986-09-27T15:52:58 diff = 60 minutes > local date-time = 1986-09-27T16:52:58 diff = 0 minutes > local date-time = 1987-09-26T15:52:58 diff = 60 minutes > local date-time = 
1987-09-26T16:52:58 diff = 0 minutes > local date-time = 1988-09-24T15:52:58 diff = 60 minutes > local date-time = 1988-09-24T16:52:58 diff = 0 minutes > local date-time = 1989-09-23T15:52:58 diff = 60 minutes > local date-time = 1989-09-23T16:52:58 diff = 0 minutes > local date-time = 1990-09-29T15:52:58 diff = 60 minutes > local date-time = 1990-09-29T16:52:58 diff = 0 minutes > local date-time = 1991-09-28T16:52:58 diff = 60 minutes > local date-time = 1991-09-28T17:52:58 diff = 0 minutes > local date-time = 1992-09-26T15:52:58 diff = 60 minutes > local date-time = 1992-09-26T16:52:58 diff = 0 minutes > local date-time = 1993-09-25T15:52:58 diff = 60 minutes > local date-time = 1993-09-25T16:52:58 diff = 0
[jira] [Created] (SPARK-31297) Speed-up date-time rebasing
Maxim Gekk created SPARK-31297: -- Summary: Speed-up date-time rebasing Key: SPARK-31297 URL: https://issues.apache.org/jira/browse/SPARK-31297 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk I do believe it is possible to speed up date-time rebasing by building a map of micros to diffs between original and rebased micros. And look up at the map via binary search. For example, the *America/Los_Angeles* time zone has less than 100 points when diff changes: {code:scala} test("optimize rebasing") { val start = instantToMicros(LocalDateTime.of(1, 1, 1, 0, 0, 0) .atZone(getZoneId("America/Los_Angeles")) .toInstant) val end = instantToMicros(LocalDateTime.of(2030, 1, 1, 0, 0, 0) .atZone(getZoneId("America/Los_Angeles")) .toInstant) var micros = start var diff = Long.MaxValue var counter = 0 while (micros < end) { val rebased = rebaseGregorianToJulianMicros(micros) val curDiff = rebased - micros if (curDiff != diff) { counter += 1 diff = curDiff val ldt = microsToInstant(micros).atZone(getZoneId("America/Los_Angeles")).toLocalDateTime println(s"local date-time = $ldt diff = ${diff / MICROS_PER_MINUTE} minutes") } micros += MICROS_PER_HOUR } println(s"counter = $counter") } {code} {code:java} local date-time = 0001-01-01T00:00 diff = -2909 minutes local date-time = 0100-02-28T14:00 diff = -1469 minutes local date-time = 0200-02-28T14:00 diff = -29 minutes local date-time = 0300-02-28T14:00 diff = 1410 minutes local date-time = 0500-02-28T14:00 diff = 2850 minutes local date-time = 0600-02-28T14:00 diff = 4290 minutes local date-time = 0700-02-28T14:00 diff = 5730 minutes local date-time = 0900-02-28T14:00 diff = 7170 minutes local date-time = 1000-02-28T14:00 diff = 8610 minutes local date-time = 1100-02-28T14:00 diff = 10050 minutes local date-time = 1300-02-28T14:00 diff = 11490 minutes local date-time = 1400-02-28T14:00 diff = 12930 minutes local date-time = 1500-02-28T14:00 diff = 14370 minutes local date-time = 
1582-10-14T14:00 diff = -29 minutes local date-time = 1899-12-31T16:52:58 diff = 0 minutes local date-time = 1917-12-27T11:52:58 diff = 60 minutes local date-time = 1917-12-27T12:52:58 diff = 0 minutes local date-time = 1918-09-15T12:52:58 diff = 60 minutes local date-time = 1918-09-15T13:52:58 diff = 0 minutes local date-time = 1919-06-30T16:52:58 diff = 31 minutes local date-time = 1919-06-30T17:52:58 diff = 0 minutes local date-time = 1919-08-15T12:52:58 diff = 60 minutes local date-time = 1919-08-15T13:52:58 diff = 0 minutes local date-time = 1921-08-31T10:52:58 diff = 60 minutes local date-time = 1921-08-31T11:52:58 diff = 0 minutes local date-time = 1921-09-30T11:52:58 diff = 60 minutes local date-time = 1921-09-30T12:52:58 diff = 0 minutes local date-time = 1922-09-30T12:52:58 diff = 60 minutes local date-time = 1922-09-30T13:52:58 diff = 0 minutes local date-time = 1981-09-30T12:52:58 diff = 60 minutes local date-time = 1981-09-30T13:52:58 diff = 0 minutes local date-time = 1982-09-30T12:52:58 diff = 60 minutes local date-time = 1982-09-30T13:52:58 diff = 0 minutes local date-time = 1983-09-30T12:52:58 diff = 60 minutes local date-time = 1983-09-30T13:52:58 diff = 0 minutes local date-time = 1984-09-29T15:52:58 diff = 60 minutes local date-time = 1984-09-29T16:52:58 diff = 0 minutes local date-time = 1985-09-28T15:52:58 diff = 60 minutes local date-time = 1985-09-28T16:52:58 diff = 0 minutes local date-time = 1986-09-27T15:52:58 diff = 60 minutes local date-time = 1986-09-27T16:52:58 diff = 0 minutes local date-time = 1987-09-26T15:52:58 diff = 60 minutes local date-time = 1987-09-26T16:52:58 diff = 0 minutes local date-time = 1988-09-24T15:52:58 diff = 60 minutes local date-time = 1988-09-24T16:52:58 diff = 0 minutes local date-time = 1989-09-23T15:52:58 diff = 60 minutes local date-time = 1989-09-23T16:52:58 diff = 0 minutes local date-time = 1990-09-29T15:52:58 diff = 60 minutes local date-time = 1990-09-29T16:52:58 diff = 0 minutes local date-time = 
1991-09-28T16:52:58 diff = 60 minutes local date-time = 1991-09-28T17:52:58 diff = 0 minutes local date-time = 1992-09-26T15:52:58 diff = 60 minutes local date-time = 1992-09-26T16:52:58 diff = 0 minutes local date-time = 1993-09-25T15:52:58 diff = 60 minutes local date-time = 1993-09-25T16:52:58 diff = 0 minutes local date-time = 1994-09-24T15:52:58 diff = 60 minutes local date-time = 1994-09-24T16:52:58 diff = 0 minutes local date-time = 1995-09-23T15:52:58 diff = 60 minutes local date-time = 1995-09-23T16:52:58 diff = 0 minutes local date-time = 1996-10-26T15:52:58 diff = 60 minutes local date-time = 1996-10-26T16:52:58 diff = 0 minutes local date-time = 1997-10-25T15:52:58 diff = 60 minutes local date-time = 1997-10-25T16:52:58 diff =
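The proposal itself — precompute the points where the diff changes, then answer each rebase with a binary search over them — can be sketched generically. Here the "slow" rebase function is a parameter, and the toy one in the usage note is made up; the real one would be rebaseGregorianToJulianMicros, scanned at a step fine enough to catch every transition for the target time zone:

```java
final class RebaseMapSketch {
    private final long[] switches;
    private final long[] diffs;

    // Scan [start, end) with the given step and record every point where
    // slowRebase(t) - t changes value.
    RebaseMapSketch(long start, long end, long step, LongUnaryOperator slowRebase) {
        ArrayList<long[]> points = new ArrayList<>();
        long prevDiff = Long.MIN_VALUE;
        for (long t = start; t < end; t += step) {
            long diff = slowRebase.applyAsLong(t) - t;
            if (diff != prevDiff) {
                points.add(new long[] {t, diff});
                prevDiff = diff;
            }
        }
        switches = new long[points.size()];
        diffs = new long[points.size()];
        for (int i = 0; i < points.size(); i++) {
            switches[i] = points.get(i)[0];
            diffs[i] = points.get(i)[1];
        }
    }

    // Fast path: binary-search the greatest switch point <= t and add its diff.
    long rebase(long t) {
        int i = Arrays.binarySearch(switches, t);
        if (i < 0) i = -i - 2; // not found: insertion point minus one
        return t + diffs[Math.max(i, 0)];
    }
}
```

For example, scanning the step function `t -> t + (t < 100 ? -2 : 5)` over [0, 200) at step 1 yields just two switch points, after which every `rebase` call is a single binary search instead of a calendar computation.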
[jira] [Commented] (SPARK-30272) Remove usage of Guava that breaks in Guava 27
[ https://issues.apache.org/jira/browse/SPARK-30272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070255#comment-17070255 ] Jorge Machado commented on SPARK-30272: --- So I was able to fix it. I built it with the hadoop-3.2 profile, but after the build the hadoop-azure.jar was missing, so I added it manually into my container and now it seems to load. I was trying to put Guava 28 in and remove 14, but this is a lot of work... why do we use an old Guava version? > Remove usage of Guava that breaks in Guava 27 > - > > Key: SPARK-30272 > URL: https://issues.apache.org/jira/browse/SPARK-30272 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.0.0 >Reporter: Sean R. Owen >Assignee: Sean R. Owen >Priority: Major > Fix For: 3.0.0 > > > Background: > https://issues.apache.org/jira/browse/SPARK-29250 > https://github.com/apache/spark/pull/25932 > Hadoop 3.2.1 will update Guava from 11 to 27. There are a number of methods > that changed between those releases, typically just a rename, but it means one > set of code can't work with both, while we want to work with Hadoop 2.x and > 3.x. Among them: > - Objects.toStringHelper was moved to MoreObjects; we can just use the > Commons Lang3 equivalent > - Objects.hashCode etc were renamed; use java.util.Objects equivalents > - MoreExecutors.sameThreadExecutor() became directExecutor(); for same-thread > execution we can use a dummy implementation of ExecutorService / Executor > - TypeToken.isAssignableFrom became isSupertypeOf; work around with reflection > There is probably more to the Guava issue than just this change, but it will > make Spark itself work with more versions and reduce our exposure to Guava > along the way anyway. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31296) Benchmark date-time rebasing in Parquet datasource
[ https://issues.apache.org/jira/browse/SPARK-31296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-31296: --- Summary: Benchmark date-time rebasing in Parquet datasource (was: Benchmark date-time rebasing to/from Julian calendar) > Benchmark date-time rebasing in Parquet datasource > -- > > Key: SPARK-31296 > URL: https://issues.apache.org/jira/browse/SPARK-31296 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Major > > * Add benchmarks for saving dates/timestamps to parquet when > spark.sql.legacy.parquet.rebaseDateTime.enabled is set to true > * Add benchmark for loading dates/timestamps from parquet when rebasing is on -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31296) Benchmark date-time rebasing to/from Julian calendar
Maxim Gekk created SPARK-31296: -- Summary: Benchmark date-time rebasing to/from Julian calendar Key: SPARK-31296 URL: https://issues.apache.org/jira/browse/SPARK-31296 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk * Add benchmarks for saving dates/timestamps to parquet when spark.sql.legacy.parquet.rebaseDateTime.enabled is set to true * Add benchmark for loading dates/timestamps from parquet when rebasing is on -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30363) Add Documentation for Refresh Resources
[ https://issues.apache.org/jira/browse/SPARK-30363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-30363: --- Parent: SPARK-28588 Issue Type: Sub-task (was: Improvement) > Add Documentation for Refresh Resources > --- > > Key: SPARK-30363 > URL: https://issues.apache.org/jira/browse/SPARK-30363 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Rakesh Raushan >Assignee: Rakesh Raushan >Priority: Minor > Fix For: 3.0.0 > > > Refresh Resources is not documented in the docs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org