[jira] [Created] (SPARK-33599) Group exception messages in catalyst/analysis

2020-11-29 Thread Allison Wang (Jira)
Allison Wang created SPARK-33599:


 Summary: Group exception messages in catalyst/analysis
 Key: SPARK-33599
 URL: https://issues.apache.org/jira/browse/SPARK-33599
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Allison Wang


Group all exception messages in `catalyst/analysis`.
|| Filename||   Count ||
| Analyzer.scala  |   1 |
| CheckAnalysis.scala |   1 |
| FunctionRegistry.scala  |   5 |
| ResolveCatalogs.scala   |   1 |
| ResolveHints.scala  |   1 |
| ResolveSessionCatalog.scala |  12 |
| package.scala   |   2 |
| unresolved.scala|  43 |
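
A minimal sketch of what the grouping could look like (hedged: the object and 
method names below are illustrative, not necessarily the ones the eventual PR 
will use):

{code:java}
import org.apache.spark.sql.AnalysisException

// Instead of constructing AnalysisExceptions inline throughout catalyst/analysis,
// each call site delegates to one shared object so that messages (and later,
// error classes) live in a single place.
object AnalysisErrors {
  def undefinedFunctionError(name: String): Throwable =
    new AnalysisException(s"Undefined function: '$name'")

  def unresolvedAttributeError(attr: String, candidates: Seq[String]): Throwable =
    new AnalysisException(
      s"cannot resolve '$attr'; candidates are: ${candidates.mkString(", ")}")
}

// A call site in Analyzer.scala / unresolved.scala then becomes:
//   throw AnalysisErrors.undefinedFunctionError(name)
{code}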



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33542) Group exception messages in catalyst/catalog

2020-11-29 Thread Allison Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated SPARK-33542:
-
Description: 
Group all exception messages in `catalyst/catalog`.
||Filename||Count||
|ExternalCatalog.scala|4|
|GlobalTempViewManager.scala|1|
|InMemoryCatalog.scala|18|
|SessionCatalog.scala|17|
|functionResources.scala|1|
|interface.scala|4|

> Group exception messages in catalyst/catalog
> 
>
> Key: SPARK-33542
> URL: https://issues.apache.org/jira/browse/SPARK-33542
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Allison Wang
>Priority: Major
>
> Group all exception messages in `catalyst/catalog`.
> ||Filename||Count||
> |ExternalCatalog.scala|4|
> |GlobalTempViewManager.scala|1|
> |InMemoryCatalog.scala|18|
> |SessionCatalog.scala|17|
> |functionResources.scala|1|
> |interface.scala|4|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33597) Support REGEXP_LIKE for consistency with mainstream databases

2020-11-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33597:


Assignee: Apache Spark

> Support REGEXP_LIKE for consistency with mainstream databases
> 
>
> Key: SPARK-33597
> URL: https://issues.apache.org/jira/browse/SPARK-33597
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>
> Many mainstream databases support the regex function REGEXP_LIKE.
> Currently, Spark supports RLike, so we just need to add a new alias REGEXP_LIKE 
> for it.
> *Oracle*:https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Pattern-matching-Conditions.html#GUID-D2124F3A-C6E4-4CCA-A40E-2FFCABFD8E19
> *Presto*:https://prestodb.io/docs/current/functions/regexp.html
> *Vertica*:https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/RegularExpressions/REGEXP_LIKE.htm?tocpath=SQL%20Reference%20Manual%7CSQL%20Functions%7CRegular%20Expression%20Functions%7C_5
> *Snowflake*:https://docs.snowflake.com/en/sql-reference/functions/regexp_like.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33597) Support REGEXP_LIKE for consistency with mainstream databases

2020-11-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33597:


Assignee: (was: Apache Spark)

> Support REGEXP_LIKE for consistency with mainstream databases
> 
>
> Key: SPARK-33597
> URL: https://issues.apache.org/jira/browse/SPARK-33597
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: jiaan.geng
>Priority: Major
>
> Many mainstream databases support the regex function REGEXP_LIKE.
> Currently, Spark supports RLike, so we just need to add a new alias REGEXP_LIKE 
> for it.
> *Oracle*:https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Pattern-matching-Conditions.html#GUID-D2124F3A-C6E4-4CCA-A40E-2FFCABFD8E19
> *Presto*:https://prestodb.io/docs/current/functions/regexp.html
> *Vertica*:https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/RegularExpressions/REGEXP_LIKE.htm?tocpath=SQL%20Reference%20Manual%7CSQL%20Functions%7CRegular%20Expression%20Functions%7C_5
> *Snowflake*:https://docs.snowflake.com/en/sql-reference/functions/regexp_like.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33597) Support REGEXP_LIKE for consistency with mainstream databases

2020-11-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17240530#comment-17240530
 ] 

Apache Spark commented on SPARK-33597:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/30543

> Support REGEXP_LIKE for consistency with mainstream databases
> 
>
> Key: SPARK-33597
> URL: https://issues.apache.org/jira/browse/SPARK-33597
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: jiaan.geng
>Priority: Major
>
> Many mainstream databases support the regex function REGEXP_LIKE.
> Currently, Spark supports RLike, so we just need to add a new alias REGEXP_LIKE 
> for it.
> *Oracle*:https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Pattern-matching-Conditions.html#GUID-D2124F3A-C6E4-4CCA-A40E-2FFCABFD8E19
> *Presto*:https://prestodb.io/docs/current/functions/regexp.html
> *Vertica*:https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/RegularExpressions/REGEXP_LIKE.htm?tocpath=SQL%20Reference%20Manual%7CSQL%20Functions%7CRegular%20Expression%20Functions%7C_5
> *Snowflake*:https://docs.snowflake.com/en/sql-reference/functions/regexp_like.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33597) Support REGEXP_LIKE for consistency with mainstream databases

2020-11-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17240531#comment-17240531
 ] 

Apache Spark commented on SPARK-33597:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/30543

> Support REGEXP_LIKE for consistency with mainstream databases
> 
>
> Key: SPARK-33597
> URL: https://issues.apache.org/jira/browse/SPARK-33597
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: jiaan.geng
>Priority: Major
>
> Many mainstream databases support the regex function REGEXP_LIKE.
> Currently, Spark supports RLike, so we just need to add a new alias REGEXP_LIKE 
> for it.
> *Oracle*:https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Pattern-matching-Conditions.html#GUID-D2124F3A-C6E4-4CCA-A40E-2FFCABFD8E19
> *Presto*:https://prestodb.io/docs/current/functions/regexp.html
> *Vertica*:https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/RegularExpressions/REGEXP_LIKE.htm?tocpath=SQL%20Reference%20Manual%7CSQL%20Functions%7CRegular%20Expression%20Functions%7C_5
> *Snowflake*:https://docs.snowflake.com/en/sql-reference/functions/regexp_like.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33598) Support Java Class with circular references

2020-11-29 Thread jacklzg (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jacklzg updated SPARK-33598:

Description: 
If the target Java data class has a circular reference, Spark fails fast when 
creating the Dataset or building the bean Encoder.

For example, a protobuf class carries a reference to its Descriptor, so there 
is no way to build a Dataset from the protobuf class.

This line

{code:java}
Encoders.bean(ProtoBuffOuterClass.ProtoBuff.class);{code}

throws immediately:

{quote}Exception in thread "main" java.lang.UnsupportedOperationException: 
Cannot have circular references in bean class, but got the circular reference 
of class class com.google.protobuf.Descriptors$Descriptor
{quote}

Can we add a parameter, for example,

{code:java}
Encoders.bean(Class clas, List fieldsToIgnore);{code}

or

{code:java}
Encoders.bean(Class clas, boolean skipCircularRefField);{code}

  was:
If the target Java data class has a circular reference, Spark will fail fast 
from creating the Dataset or running Encoders.

 

For example, with protobuf class, there is a reference with Descriptor, there 
is no way to build a dataset from the protobuf class.

>From this line

```

Encoders.bean(ProtoBuffOuterClass.ProtoBuff.class);

```

It will throw out immediately

```

Exception in thread "main" java.lang.UnsupportedOperationException: Cannot have 
circular references in bean class, but got the circular reference of class 
class com.google.protobuf.Descriptors$Descriptor

```

Can we add  a parameter, for example, 

```

Encoders.bean(Class clas, List fieldsToIgnore);



or

```

Encoders.bean(Class clas, boolean skipCircularRefField);



 


> Support Java Class with circular references
> ---
>
> Key: SPARK-33598
> URL: https://issues.apache.org/jira/browse/SPARK-33598
> Project: Spark
>  Issue Type: Improvement
>  Components: Java API
>Affects Versions: 2.4.7
>Reporter: jacklzg
>Priority: Minor
>
> If the target Java data class has a circular reference, Spark will fail fast 
> from creating the Dataset or running Encoders.
>  
> For example, with protobuf class, there is a reference with Descriptor, there 
> is no way to build a dataset from the protobuf class.
> From this line
> ```
> {quote}
> {code:java}
> Encoders.bean(ProtoBuffOuterClass.ProtoBuff.class);{code}
> {quote}
> ```
> It will throw out immediately
> ```
> {quote}Exception in thread "main" java.lang.UnsupportedOperationException: 
> Cannot have circular references in bean class, but got the circular reference 
> of class class com.google.protobuf.Descriptors$Descriptor
> {quote}
> ```
> Can we add  a parameter, for example, 
> ```
> {code:java}
> Encoders.bean(Class clas, List fieldsToIgnore);{code}
> 
> or
> ```
> {code:java}
> Encoders.bean(Class clas, boolean skipCircularRefField);{code}
> 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33598) Support Java Class with circular references

2020-11-29 Thread jacklzg (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jacklzg updated SPARK-33598:

Description: 
If the target Java data class has a circular reference, Spark fails fast when 
creating the Dataset or building the bean Encoder.

For example, a protobuf class carries a reference to its Descriptor, so there 
is no way to build a Dataset from the protobuf class.

This line

{code:java}
Encoders.bean(ProtoBuffOuterClass.ProtoBuff.class);{code}

throws immediately:

{quote}Exception in thread "main" java.lang.UnsupportedOperationException: 
Cannot have circular references in bean class, but got the circular reference 
of class class com.google.protobuf.Descriptors$Descriptor
{quote}

Can we add a parameter, for example,

{code:java}
Encoders.bean(Class clas, List fieldsToIgnore);{code}

or

{code:java}
Encoders.bean(Class clas, boolean skipCircularRefField);{code}

  was:
If the target Java data class has a circular reference, Spark will fail fast 
from creating the Dataset or running Encoders.

 

For example, with protobuf class, there is a reference with Descriptor, there 
is no way to build a dataset from the protobuf class.

>From this line

```
{quote}
{code:java}
Encoders.bean(ProtoBuffOuterClass.ProtoBuff.class);{code}
{quote}
```

It will throw out immediately

```
{quote}Exception in thread "main" java.lang.UnsupportedOperationException: 
Cannot have circular references in bean class, but got the circular reference 
of class class com.google.protobuf.Descriptors$Descriptor
{quote}
```

Can we add  a parameter, for example, 

```
{code:java}
Encoders.bean(Class clas, List fieldsToIgnore);{code}


or

```
{code:java}
Encoders.bean(Class clas, boolean skipCircularRefField);{code}


 


> Support Java Class with circular references
> ---
>
> Key: SPARK-33598
> URL: https://issues.apache.org/jira/browse/SPARK-33598
> Project: Spark
>  Issue Type: Improvement
>  Components: Java API
>Affects Versions: 2.4.7
>Reporter: jacklzg
>Priority: Minor
>
> If the target Java data class has a circular reference, Spark will fail fast 
> from creating the Dataset or running Encoders.
>  
> For example, with protobuf class, there is a reference with Descriptor, there 
> is no way to build a dataset from the protobuf class.
> From this line
> {color:#7a869a}Encoders.bean(ProtoBuffOuterClass.ProtoBuff.class);{color}
>  
> It will throw out immediately
>  
> {quote}Exception in thread "main" java.lang.UnsupportedOperationException: 
> Cannot have circular references in bean class, but got the circular reference 
> of class class com.google.protobuf.Descriptors$Descriptor
> {quote}
>  
> Can we add  a parameter, for example, 
>  
> {code:java}
> Encoders.bean(Class clas, List fieldsToIgnore);{code}
> 
> or
>  
> {code:java}
> Encoders.bean(Class clas, boolean skipCircularRefField);{code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33598) Support Java Class with circular references

2020-11-29 Thread jacklzg (Jira)
jacklzg created SPARK-33598:
---

 Summary: Support Java Class with circular references
 Key: SPARK-33598
 URL: https://issues.apache.org/jira/browse/SPARK-33598
 Project: Spark
  Issue Type: Improvement
  Components: Java API
Affects Versions: 2.4.7
Reporter: jacklzg


If the target Java data class has a circular reference, Spark fails fast when 
creating the Dataset or building the bean Encoder.

For example, a protobuf class carries a reference to its Descriptor, so there 
is no way to build a Dataset from the protobuf class.

This line

{code:java}
Encoders.bean(ProtoBuffOuterClass.ProtoBuff.class);{code}

throws immediately:

{quote}Exception in thread "main" java.lang.UnsupportedOperationException: 
Cannot have circular references in bean class, but got the circular reference 
of class class com.google.protobuf.Descriptors$Descriptor
{quote}

Can we add a parameter, for example,

{code:java}
Encoders.bean(Class clas, List fieldsToIgnore);{code}

or

{code:java}
Encoders.bean(Class clas, boolean skipCircularRefField);{code}
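
To make the request concrete, a hedged sketch: a tiny self-referential bean that 
should trigger the same check, plus the rough shape the proposed overloads might 
take (both overloads are hypothetical and do not exist in the current API):

{code:java}
import scala.beans.BeanProperty
import org.apache.spark.sql.Encoders

// A minimal self-referential bean, standing in for a generated protobuf class
// whose Descriptor field points back into the class graph.
class Node {
  @BeanProperty var name: String = _
  @BeanProperty var parent: Node = _   // circular reference
}

// Today this should fail the circular-reference check, just like the protobuf case:
//   Encoders.bean(classOf[Node])
//   => UnsupportedOperationException: Cannot have circular references in bean class ...

// Hypothetical shape of the proposed overloads:
//   Encoders.bean(classOf[Node], java.util.Arrays.asList("parent"))
//   Encoders.bean(classOf[Node], /* skipCircularRefField = */ true)
{code}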



 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33597) Support REGEXP_LIKE for consistency with mainstream databases

2020-11-29 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-33597:
--

 Summary: Support REGEXP_LIKE for consistency with mainstream 
databases
 Key: SPARK-33597
 URL: https://issues.apache.org/jira/browse/SPARK-33597
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.1.0
Reporter: jiaan.geng


Many mainstream databases support the regex function REGEXP_LIKE.
Currently, Spark supports RLike, so we just need to add a new alias REGEXP_LIKE 
for it.
*Oracle*:https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Pattern-matching-Conditions.html#GUID-D2124F3A-C6E4-4CCA-A40E-2FFCABFD8E19
*Presto*:https://prestodb.io/docs/current/functions/regexp.html
*Vertica*:https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/RegularExpressions/REGEXP_LIKE.htm?tocpath=SQL%20Reference%20Manual%7CSQL%20Functions%7CRegular%20Expression%20Functions%7C_5
*Snowflake*:https://docs.snowflake.com/en/sql-reference/functions/regexp_like.html
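
A small sketch of the intended usage. REGEXP_LIKE does not exist in Spark yet, 
so the second statement below only illustrates the proposed behavior:

{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Works today via the existing RLike expression:
spark.sql("SELECT 'Spark SQL' RLIKE 'S.*SQL' AS matched").show()   // true

// Proposed alias with the same semantics, once REGEXP_LIKE is registered:
// spark.sql("SELECT REGEXP_LIKE('Spark SQL', 'S.*SQL') AS matched").show()
{code}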



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33596) NPE when there is no EventTime

2020-11-29 Thread Genmao Yu (Jira)
Genmao Yu created SPARK-33596:
-

 Summary: NPE when there is no EventTime
 Key: SPARK-33596
 URL: https://issues.apache.org/jira/browse/SPARK-33596
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 3.1.0
Reporter: Genmao Yu


We parse the process timestamp at 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala#L153,
 but this will throw an NPE when there are no event-time metrics.
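
A minimal sketch of the kind of null-safe guard that would avoid the NPE (the 
method name and metric key below are hypothetical, not the actual code at the 
linked line):

{code:java}
// Option(...) turns both a missing metrics map and a missing key into None
// instead of a NullPointerException.
def watermarkValue(eventTime: java.util.Map[String, String]): Option[String] =
  Option(eventTime).flatMap(m => Option(m.get("watermark")))

// Callers can then render a placeholder instead of dereferencing null:
//   watermarkValue(progress.eventTime).getOrElse("-")
{code}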



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28325) Support ANSI SQL:SIMILAR TO ... ESCAPE syntax

2020-11-29 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-28325:
---
Description: 
{code:java}
<similar predicate> ::=
  <row value predicand> <similar predicate part 2>
<similar predicate part 2> ::=
  [ NOT ] SIMILAR TO <similar pattern> [ ESCAPE <escape character> ]
<similar pattern> ::=
  <character value expression>
<regular expression> ::=
    <regular term>
  | <regular expression> <vertical bar> <regular term>
<regular term> ::=
    <regular factor>
  | <regular term> <regular factor>
<regular factor> ::=
    <regular primary>
  | <regular primary> <asterisk>
  | <regular primary> <plus sign>
  | <regular primary> <question mark>
  | <regular primary> <repeat factor>
<repeat factor> ::=
  <left brace> <low value> [ <upper limit> ] <right brace>
<upper limit> ::=
  <comma> [ <high value> ]
<low value> ::=
  <unsigned integer>
<high value> ::=
  <unsigned integer>
<regular primary> ::=
    <character specifier>
  | <percent>
  | <regular character set>
  | <left paren> <regular expression> <right paren>
<character specifier> ::=
    <non-escaped character>
  | <escaped character>
<non-escaped character> ::=
  !! See the Syntax Rules.
<escaped character> ::=
  !! See the Syntax Rules.
<regular character set> ::=
    <underscore>
  | <left bracket> <character enumeration>... <right bracket>
  | <left bracket> <circumflex> <character enumeration>... <right bracket>
  | <left bracket> <character enumeration include>... <circumflex>
    <character enumeration exclude>... <right bracket>
<character enumeration include> ::=
  <character enumeration>
<character enumeration exclude> ::=
  <character enumeration>
<character enumeration> ::=
    <character specifier>
  | <character specifier> <minus sign> <character specifier>
  | <left bracket> <colon> <regular character set identifier> <colon> <right bracket>
<regular character set identifier> ::=
  <identifier>
{code}
 

 Examples:
{code}
SELECT 'abc' RLIKE '%(b|d)%';      // false
SELECT 'abc' SIMILAR TO '%(b|d)%'   // true
SELECT 'abc' RLIKE '(b|c)%';  // false
SELECT 'abc' SIMILAR TO '(b|c)%'; // false{code}
 

Currently, the following DBMSs support the syntax:
 * 
PostgreSQL:[https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-SIMILARTO-REGEXP]
 * Redshift: 
[https://docs.aws.amazon.com/redshift/latest/dg/pattern-matching-conditions-similar-to.html]
 * 
teradata:[https://docs.teradata.com/reader/756LNiPSFdY~4JcCCcR5Cw/fwqgzZuhAvOXLKUu0kUfJQ]

  was:
{code:java}
 ::=
 
 ::=
[ NOT ] SIMILAR TO  [ ESCAPE  ]
 ::=

 ::=

|   
 ::=

|  
 ::=

|  
|  
|  
|  
 ::=
  [  ] 
 ::=
 [  ]
 ::=

 ::=

 ::=

| 
| 
|   
 ::=

| 
 ::=
!! See the Syntax Rules.
494 Foundation (SQL/Foundation)
CD 9075-2:201?(E)
8.6 
 ::=
!! See the Syntax Rules.
 ::=

|  ... 
|   ... 
|  ...
 ... 
 ::=

 ::=

 ::=

|   
| 
 ::=
{code}
 

 Examples:
{code}
SELECT 'abc' RLIKE '%(b|d)%';      // false
SELECT 'abc' SIMILAR TO '%(b|d)%'   // true
SELECT 'abc' RLIKE '(b|c)%';  // false
SELECT 'abc' SIMILAR TO '(b|c)%'; // false{code}
 

Currently, the following DBMSs support the syntax:
 * 
PostgreSQL:[https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-SIMILARTO-REGEXP]
 * Redshift: 
[https://docs.aws.amazon.com/redshift/latest/dg/pattern-matching-conditions-similar-to.html]


> Support ANSI SQL:SIMILAR TO ... ESCAPE syntax
> -
>
> Key: SPARK-28325
> URL: https://issues.apache.org/jira/browse/SPARK-28325
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: jiaan.geng
>Priority: Major
>
> {code:java}
> <similar predicate> ::=
>   <row value predicand> <similar predicate part 2>
> <similar predicate part 2> ::=
>   [ NOT ] SIMILAR TO <similar pattern> [ ESCAPE <escape character> ]
> <similar pattern> ::=
>   <character value expression>
> <regular expression> ::=
>     <regular term>
>   | <regular expression> <vertical bar> <regular term>
> <regular term> ::=
>     <regular factor>
>   | <regular term> <regular factor>
> <regular factor> ::=
>     <regular primary>
>   | <regular primary> <asterisk>
>   | <regular primary> <plus sign>
>   | <regular primary> <question mark>
>   | <regular primary> <repeat factor>
> <repeat factor> ::=
>   <left brace> <low value> [ <upper limit> ] <right brace>
> <upper limit> ::=
>   <comma> [ <high value> ]
> <low value> ::=
>   <unsigned integer>
> <high value> ::=
>   <unsigned integer>
> <regular primary> ::=
>     <character specifier>
>   | <percent>
>   | <regular character set>
>   | <left paren> <regular expression> <right paren>
> <character specifier> ::=
>     <non-escaped character>
>   | <escaped character>
> <non-escaped character> ::=
>   !! See the Syntax Rules.
> <escaped character> ::=
>   !! See the Syntax Rules.
> <regular character set> ::=
>     <underscore>
>   | <left bracket> <character enumeration>... <right bracket>
>   | <left bracket> <circumflex> <character enumeration>... <right bracket>
>   | <left bracket> <character enumeration include>... <circumflex>
>     <character enumeration exclude>... <right bracket>
> <character enumeration include> ::=
>   <character enumeration>
> <character enumeration exclude> ::=
>   <character enumeration>
> <character enumeration> ::=
>     <character specifier>
>   | <character specifier> <minus sign> <character specifier>
>   | <left bracket> <colon> <regular character set identifier> <colon> <right bracket>
> <regular character set identifier> ::=
>   <identifier>
> {code}
>  
>  Examples:
> {code}
> SELECT 'abc' RLIKE '%(b|d)%';      // false
> SELECT 'abc' SIMILAR TO '%(b|d)%'   // true
> SELECT 'abc' RLIKE '(b|c)%';  // false
> SELECT 'abc' SIMILAR TO '(b|c)%'; // false{code}
>  
> Currently, the following DBMSs support the syntax:
>  * 
> PostgreSQL:[https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-SIMILARTO-REGEXP]
>  * Redshift: 
> [https://docs.aws.amazon.com/redshift/latest/dg/pattern-matching-conditions-similar-to.html]
>  * 
> teradata:[https://docs.teradata.com/reader/756LNiPSFdY~4JcCCcR5Cw/fwqgzZuhAvOXLKUu0kUfJQ]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28506) not handling usage of group function and window function at some conditions

2020-11-29 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17240505#comment-17240505
 ] 

jiaan.geng commented on SPARK-28506:


I ran a similar SQL query, shown below:

{code:java}
SELECT
rank() OVER (ORDER BY salary),
count(*)
FROM
basic_pays
GROUP BY 1
> ERROR:  window functions are not allowed in GROUP BY
  LINE 2: rank() OVER (ORDER BY salary),
  ^
  
> Time: 0.011s
{code}
It seems this isn't consistent with your description.


> not handling usage of group function and window function at some conditions
> ---
>
> Key: SPARK-28506
> URL: https://issues.apache.org/jira/browse/SPARK-28506
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Dylan Guedes
>Priority: Major
>
> Hi,
> looks like SparkSQL is not able to handle this query:
> {code:sql}SELECT rank() OVER (ORDER BY 1), count(*) FROM empsalary GROUP BY 
> 1;{code}
> PgSQL, on the other hand, does.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33595) Run PySpark coverage only in the master branch

2020-11-29 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-33595.
--
Resolution: Invalid

Resolved by setting Jenkins environment variables.

> Run PySpark coverage only in the master branch
> --
>
> Key: SPARK-33595
> URL: https://issues.apache.org/jira/browse/SPARK-33595
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, PySpark
>Affects Versions: 3.0.1, 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Currently PySpark test coverage runs in branch-3.0 
> (https://github.com/apache/spark/pull/23117#issuecomment-735557536). We 
> should only run this in the master branch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-29638) Spark handles 'NaN' as 0 in sums

2020-11-29 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17240498#comment-17240498
 ] 

jiaan.geng edited comment on SPARK-29638 at 11/30/20, 6:18 AM:
---

I ran the SQL below in PgSQL:
{code:java}
SELECT a, b,
   SUM(b) OVER(ORDER BY A ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
FROM (VALUES(1,1),(2,2),(3,(cast('NaN' as int))),(4,3),(5,4)) t(a,b)
> ERROR:  invalid input syntax for type integer: "NaN"
  LINE 3: FROM (VALUES(1,1),(2,2),(3,(cast('NaN' as int))),(4,3),(5,4)...
   ^
  
> Time: 0.011s
{code}
[~DylanGuedes] Could you tell me more?


was (Author: beliefer):

{code:java}
SELECT a, b,
   SUM(b) OVER(ORDER BY A ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
FROM (VALUES(1,1),(2,2),(3,(cast('NaN' as int))),(4,3),(5,4)) t(a,b)
> ERROR:  invalid input syntax for type integer: "NaN"
  LINE 3: FROM (VALUES(1,1),(2,2),(3,(cast('NaN' as int))),(4,3),(5,4)...
   ^
  
> Time: 0.011s
{code}
[~DylanGuedes] Could you tell me more ?

> Spark handles 'NaN' as 0 in sums
> 
>
> Key: SPARK-29638
> URL: https://issues.apache.org/jira/browse/SPARK-29638
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dylan Guedes
>Priority: Major
>
> Currently, Spark handles 'NaN' as 0 in window functions, such that 3+'NaN'=3. 
> PgSQL, on the other hand, handles the entire result as 'NaN', as in 3+'NaN' = 
> 'NaN'
> I experienced this with the query below:
> {code:sql}
> SELECT a, b,
>SUM(b) OVER(ORDER BY A ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
> FROM (VALUES(1,1),(2,2),(3,(cast('nan' as int))),(4,3),(5,4)) t(a,b);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29638) Spark handles 'NaN' as 0 in sums

2020-11-29 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17240498#comment-17240498
 ] 

jiaan.geng commented on SPARK-29638:



{code:java}
SELECT a, b,
   SUM(b) OVER(ORDER BY A ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
FROM (VALUES(1,1),(2,2),(3,(cast('NaN' as int))),(4,3),(5,4)) t(a,b)
> ERROR:  invalid input syntax for type integer: "NaN"
  LINE 3: FROM (VALUES(1,1),(2,2),(3,(cast('NaN' as int))),(4,3),(5,4)...
   ^
  
> Time: 0.011s
{code}
[~DylanGuedes] Could you tell me more?

> Spark handles 'NaN' as 0 in sums
> 
>
> Key: SPARK-29638
> URL: https://issues.apache.org/jira/browse/SPARK-29638
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dylan Guedes
>Priority: Major
>
> Currently, Spark handles 'NaN' as 0 in window functions, such that 3+'NaN'=3. 
> PgSQL, on the other hand, handles the entire result as 'NaN', as in 3+'NaN' = 
> 'NaN'
> I experienced this with the query below:
> {code:sql}
> SELECT a, b,
>SUM(b) OVER(ORDER BY A ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
> FROM (VALUES(1,1),(2,2),(3,(cast('nan' as int))),(4,3),(5,4)) t(a,b);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33595) Run PySpark coverage only in the master branch

2020-11-29 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33595:
-
Issue Type: Test  (was: Improvement)

> Run PySpark coverage only in the master branch
> --
>
> Key: SPARK-33595
> URL: https://issues.apache.org/jira/browse/SPARK-33595
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, PySpark
>Affects Versions: 3.0.1, 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Currently PySpark test coverage runs in branch-3.0 
> (https://github.com/apache/spark/pull/23117#issuecomment-735557536). We 
> should only run this in the master branch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33595) Run PySpark coverage only in the master branch

2020-11-29 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-33595:


 Summary: Run PySpark coverage only in the master branch
 Key: SPARK-33595
 URL: https://issues.apache.org/jira/browse/SPARK-33595
 Project: Spark
  Issue Type: Improvement
  Components: Project Infra, PySpark
Affects Versions: 3.0.1, 3.1.0
Reporter: Hyukjin Kwon


Currently PySpark test coverage runs in branch-3.0 
(https://github.com/apache/spark/pull/23117#issuecomment-735557536). We should 
only run this in the master branch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33448) Support CACHE/UNCACHE TABLE for v2 tables

2020-11-29 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33448.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30403
[https://github.com/apache/spark/pull/30403]

> Support CACHE/UNCACHE TABLE for v2 tables
> -
>
> Key: SPARK-33448
> URL: https://issues.apache.org/jira/browse/SPARK-33448
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Minor
> Fix For: 3.1.0
>
>
> Migrate CACHE/UNCACHE TABLE to new resolution framework.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33448) Support CACHE/UNCACHE TABLE for v2 tables

2020-11-29 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33448:
---

Assignee: Terry Kim

> Support CACHE/UNCACHE TABLE for v2 tables
> -
>
> Key: SPARK-33448
> URL: https://issues.apache.org/jira/browse/SPARK-33448
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Minor
>
> Migrate CACHE/UNCACHE TABLE to new resolution framework.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32976) Support column list in INSERT statement

2020-11-29 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-32976.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 29893
[https://github.com/apache/spark/pull/29893]

> Support column list in INSERT statement
> ---
>
> Key: SPARK-32976
> URL: https://issues.apache.org/jira/browse/SPARK-32976
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Xiao Li
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.1.0
>
>
> INSERT currently does not support named column lists.  
> {{INSERT INTO <table> (col1, col2,…) VALUES( 'val1', 'val2', … )}}
> Note, we assume the column list contains all the column names. Issue an 
> exception if the list is not complete. The column order could be different 
> from the column order defined in the table definition.
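
A small worked example of the feature (a sketch; the table and values are made 
up), showing a column list whose order differs from the table definition:

{code:java}
spark.sql("CREATE TABLE t(c1 INT, c2 STRING) USING parquet")

// Named column list in a different order than the table definition;
// values are matched to the listed columns by position in the list:
spark.sql("INSERT INTO t (c2, c1) VALUES ('val2', 1)")

spark.sql("SELECT * FROM t").show()   // c1 = 1, c2 = 'val2'
{code}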



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32976) Support column list in INSERT statement

2020-11-29 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-32976:
---

Assignee: Kent Yao

> Support column list in INSERT statement
> ---
>
> Key: SPARK-32976
> URL: https://issues.apache.org/jira/browse/SPARK-32976
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Xiao Li
>Assignee: Kent Yao
>Priority: Major
>
> INSERT currently does not support named column lists.  
> {{INSERT INTO <table> (col1, col2,…) VALUES( 'val1', 'val2', … )}}
> Note, we assume the column list contains all the column names. Issue an 
> exception if the list is not complete. The column order could be different 
> from the column order defined in the table definition.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33567) DSv2: Use callback instead of passing Spark session and v2 relation for refreshing cache

2020-11-29 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33567:
---

Assignee: Chao Sun

> DSv2: Use callback instead of passing Spark session and v2 relation for 
> refreshing cache
> 
>
> Key: SPARK-33567
> URL: https://issues.apache.org/jira/browse/SPARK-33567
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>
> As discussed in [https://github.com/apache/spark/pull/30429], it's better not to 
> pass the Spark session and DataSourceV2Relation through Spark plans. Instead, we 
> can use a callback, which makes the interface cleaner.
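
A minimal sketch of the design direction (all names here are hypothetical): the 
physical plan node receives a ready-made callback instead of a SparkSession plus 
DataSourceV2Relation:

{code:java}
// Before (sketch): the exec node carries the session and relation only so that it
// can refresh the cache after it runs.
//   case class DropTableExec(session: SparkSession, relation: DataSourceV2Relation, ...)

// After (sketch): the resolution layer closes over whatever it needs and hands the
// plan a plain callback, keeping the exec node ignorant of sessions and relations.
case class DropTableExecSketch(ident: String, invalidateCache: () => Unit) {
  def run(): Unit = {
    // ... drop the table in the catalog ...
    invalidateCache()
  }
}
{code}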



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33567) DSv2: Use callback instead of passing Spark session and v2 relation for refreshing cache

2020-11-29 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33567.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30491
[https://github.com/apache/spark/pull/30491]

> DSv2: Use callback instead of passing Spark session and v2 relation for 
> refreshing cache
> 
>
> Key: SPARK-33567
> URL: https://issues.apache.org/jira/browse/SPARK-33567
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.1.0
>
>
> As discussed in [https://github.com/apache/spark/pull/30429], it's better not to 
> pass the Spark session and DataSourceV2Relation through Spark plans. Instead, we 
> can use a callback, which makes the interface cleaner.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33592) Pyspark ML Validator params in estimatorParamMaps may be lost after saving and reloading

2020-11-29 Thread Weichen Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weichen Xu updated SPARK-33592:
---
Summary: Pyspark ML Validator params in estimatorParamMaps may be lost 
after saving and reloading  (was: Pyspark ML Validator writer may lost params 
in estimatorParamMaps)

> Pyspark ML Validator params in estimatorParamMaps may be lost after saving 
> and reloading
> 
>
> Key: SPARK-33592
> URL: https://issues.apache.org/jira/browse/SPARK-33592
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Major
>
> Two typical cases to reproduce it:
> (1)
> {code:python}
> tokenizer = Tokenizer(inputCol="text", outputCol="words")
> hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
> lr = LogisticRegression()
> pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])
> paramGrid = ParamGridBuilder() \
> .addGrid(hashingTF.numFeatures, [10, 100]) \
> .addGrid(lr.maxIter, [100, 200]) \
> .build()
> tvs = TrainValidationSplit(estimator=pipeline,
>estimatorParamMaps=paramGrid,
>evaluator=MulticlassClassificationEvaluator())
> tvs.save(tvsPath)
> loadedTvs = TrainValidationSplit.load(tvsPath)
> {code}
> Then, when we check `loadedTvs.getEstimatorParamMaps()`, the tuning params 
> `hashingTF.numFeatures` and `lr.maxIter` are lost.
> (2)
> {code:python}
> lr = LogisticRegression()
> ova = OneVsRest(classifier=lr)
> grid = ParamGridBuilder().addGrid(lr.maxIter, [100, 200]).build()
> evaluator = MulticlassClassificationEvaluator()
> tvs = TrainValidationSplit(estimator=ova, estimatorParamMaps=grid, 
> evaluator=evaluator)
> tvs.save(tvsPath)
> loadedTvs = TrainValidationSplit.load(tvsPath)
> {code}
> Then, when we check `loadedTvs.getEstimatorParamMaps()`, the tuning 
> param `lr.maxIter` is lost.
> Both CrossValidator and TrainValidationSplit in PySpark have this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33594) Forbid binary type as partition column

2020-11-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33594:


Assignee: (was: Apache Spark)

> Forbid binary type as partition column
> --
>
> Key: SPARK-33594
> URL: https://issues.apache.org/jira/browse/SPARK-33594
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: angerszhu
>Priority: Major
>
> Forbid binary type as partition column



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33594) Forbid binary type as partition column

2020-11-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33594:


Assignee: Apache Spark

> Forbid binary type as partition column
> --
>
> Key: SPARK-33594
> URL: https://issues.apache.org/jira/browse/SPARK-33594
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Major
>
> Forbid binary type as partition column



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33594) Forbid binary type as partition column

2020-11-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17240467#comment-17240467
 ] 

Apache Spark commented on SPARK-33594:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/30542

> Forbid binary type as partition column
> --
>
> Key: SPARK-33594
> URL: https://issues.apache.org/jira/browse/SPARK-33594
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: angerszhu
>Priority: Major
>
> Forbid binary type as partition column



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33594) Forbid binary type as partition column

2020-11-29 Thread angerszhu (Jira)
angerszhu created SPARK-33594:
-

 Summary: Forbid binary type as partition column
 Key: SPARK-33594
 URL: https://issues.apache.org/jira/browse/SPARK-33594
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.0
Reporter: angerszhu


Forbid binary type as partition column
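
For context, a sketch of the kind of DDL this change would reject (a BINARY 
partition column, which currently leads to incorrect reads, see SPARK-33593):

{code:java}
// After this change, declaring a BINARY partition column should fail at analysis time:
// spark.sql(
//   "CREATE TABLE t1(name STRING, id BINARY, part BINARY) USING parquet PARTITIONED BY (part)")
{code}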



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28646) Allow usage of `count` only for parameterless aggregate function

2020-11-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28646:


Assignee: Apache Spark

> Allow usage of `count` only for parameterless aggregate function
> 
>
> Key: SPARK-28646
> URL: https://issues.apache.org/jira/browse/SPARK-28646
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Dylan Guedes
>Assignee: Apache Spark
>Priority: Major
>
> Currently, Spark allows `count` to be called with no arguments, even though 
> `count` is not a parameterless aggregate function. For example, the following 
> query actually works:
> {code:sql}SELECT count() OVER () FROM tenk1;{code}
> In PgSQL, on the other hand, the following error is thrown:
> {code:sql}ERROR:  count(*) must be used to call a parameterless aggregate 
> function{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28646) Allow usage of `count` only for parameterless aggregate function

2020-11-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17240455#comment-17240455
 ] 

Apache Spark commented on SPARK-28646:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/30541

> Allow usage of `count` only for parameterless aggregate function
> 
>
> Key: SPARK-28646
> URL: https://issues.apache.org/jira/browse/SPARK-28646
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Dylan Guedes
>Priority: Major
>
> Currently, Spark allows `count` to be called with no arguments, even though 
> `count` is not a parameterless aggregate function. For example, the following 
> query actually works:
> {code:sql}SELECT count() OVER () FROM tenk1;{code}
> In PgSQL, on the other hand, the following error is thrown:
> {code:sql}ERROR:  count(*) must be used to call a parameterless aggregate 
> function{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly

2020-11-29 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17240456#comment-17240456
 ] 

Hyukjin Kwon commented on SPARK-33571:
--

cc [~maxgekk] FYI

> Handling of hybrid to proleptic calendar when reading and writing Parquet 
> data not working correctly
> 
>
> Key: SPARK-33571
> URL: https://issues.apache.org/jira/browse/SPARK-33571
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Core
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Simon
>Priority: Major
>
> The handling of old dates written with older Spark versions (<2.4.6) using 
> the hybrid calendar in Spark 3.0.0 and 3.0.1 seems to be broken/not working 
> correctly.
> From what I understand it should work like this:
>  * Only relevant for `DateType` before 1582-10-15 or `TimestampType` before 
> 1900-01-01T00:00:00Z
>  * Only applies when reading or writing parquet files
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInRead`
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time and 
> `datetimeRebaseModeInRead` is set to `LEGACY`, the dates and timestamps should 
> show the same values in Spark 3.0.1 (for example with `df.show()`) as they did 
> in Spark 2.4.5
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time and 
> `datetimeRebaseModeInRead` is set to `CORRECTED`, the dates and timestamps 
> should show different values in Spark 3.0.1 (for example with `df.show()`) than 
> they did in Spark 2.4.5
> When writing parquet files with Spark > 3.0.0 which contain dates or 
> timestamps before the above mentioned moment in time a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInWrite`
> First of all I'm not 100% sure all of this is correct. I've been unable to 
> find any clear documentation on the expected behavior. The understanding I 
> have was pieced together from the mailing list 
> ([http://apache-spark-user-list.1001560.n3.nabble.com/Spark-3-0-1-new-Proleptic-Gregorian-calendar-td38914.html)]
>  the blog post linked there and looking at the Spark code.
> From our testing we're seeing several issues:
>  * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. 
> that contains fields of type `TimestampType` which contain timestamps before 
> the above mentioned moments in time without `datetimeRebaseModeInRead` set 
> doesn't raise the `SparkUpgradeException`, it succeeds without any changes to 
> the resulting dataframe compared to that dataframe in Spark 2.4.5
>  * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. 
> that contains fields of type `TimestampType` or `DateType` which contain 
> dates or timestamps before the above mentioned moments in time with 
> `datetimeRebaseModeInRead` set to `LEGACY` results in the same values in the 
> dataframe as when using `CORRECTED`, so it seems like no rebasing is 
> happening.
> I've made some scripts to help with testing/show the behavior, it uses 
> pyspark 2.4.5, 2.4.6 and 3.0.1. You can find them here 
> [https://github.com/simonvanderveldt/spark3-rebasemode-issue]. I'll post the 
> outputs in a comment below as well.
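
For readers following along, the rebase modes are set via configuration; a 
sketch (assuming the Spark 3.0.x key names under the `legacy` namespace, and a 
made-up input path):

{code:java}
// Double-check the exact key names against your Spark version.
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "LEGACY")   // or "CORRECTED"
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "CORRECTED")

val df = spark.read.parquet("/data/written-by-spark-2.4.5")
df.show()   // old dates/timestamps rebased (LEGACY) or taken as-is (CORRECTED)
{code}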



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28646) Allow usage of `count` only for parameterless aggregate function

2020-11-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28646:


Assignee: Apache Spark

> Allow usage of `count` only for parameterless aggregate function
> 
>
> Key: SPARK-28646
> URL: https://issues.apache.org/jira/browse/SPARK-28646
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Dylan Guedes
>Assignee: Apache Spark
>Priority: Major
>
> Currently, Spark allows `count` to be called with no arguments, even though 
> `count` is not a parameterless aggregate function. For example, the following 
> query actually works:
> {code:sql}SELECT count() OVER () FROM tenk1;{code}
> In PgSQL, on the other hand, the following error is thrown:
> {code:sql}ERROR:  count(*) must be used to call a parameterless aggregate 
> function{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28646) Allow usage of `count` only for parameterless aggregate function

2020-11-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28646:


Assignee: (was: Apache Spark)

> Allow usage of `count` only for parameterless aggregate function
> 
>
> Key: SPARK-28646
> URL: https://issues.apache.org/jira/browse/SPARK-28646
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Dylan Guedes
>Priority: Major
>
> Currently, Spark allows `count` to be called with no arguments, even though 
> `count` is not a parameterless aggregate function. For example, the following 
> query actually works:
> {code:sql}SELECT count() OVER () FROM tenk1;{code}
> In PgSQL, on the other hand, the following error is thrown:
> {code:sql}ERROR:  count(*) must be used to call a parameterless aggregate 
> function{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33593) Parquet vector reader incorrect with binary partition value

2020-11-29 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17240454#comment-17240454
 ] 

angerszhu commented on SPARK-33593:
---

Will raise a PR soon.

> Parquet vector reader incorrect with binary partition value
> ---
>
> Key: SPARK-33593
> URL: https://issues.apache.org/jira/browse/SPARK-33593
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: angerszhu
>Priority: Major
>
> {code:java}
> test("Parquet vector reader incorrect with binary partition value") {
>   Seq(false, true).foreach(tag => {
> withSQLConf("spark.sql.parquet.enableVectorizedReader" -> tag.toString) {
>   withTable("t1") {
> sql(
>   """CREATE TABLE t1(name STRING, id BINARY, part BINARY)
> | USING PARQUET PARTITIONED BY (part)""".stripMargin)
> sql(s"INSERT INTO t1 PARTITION(part = 'Spark SQL') VALUES('a', 
> X'537061726B2053514C')")
> if (tag) {
>   checkAnswer(sql("SELECT name, cast(id as string), cast(part as 
> string) FROM t1"),
> Row("a", "Spark SQL", ""))
> } else {
>   checkAnswer(sql("SELECT name, cast(id as string), cast(part as 
> string) FROM t1"),
> Row("a", "Spark SQL", "Spark SQL"))
> }
>   }
> }
>   })
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33593) Parquet vector reader incorrect with binary partition value

2020-11-29 Thread angerszhu (Jira)
angerszhu created SPARK-33593:
-

 Summary: Parquet vector reader incorrect with binary partition 
value
 Key: SPARK-33593
 URL: https://issues.apache.org/jira/browse/SPARK-33593
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.0
Reporter: angerszhu


{code:java}
test("Parquet vector reader incorrect with binary partition value") {
  Seq(false, true).foreach { vectorized =>
    withSQLConf("spark.sql.parquet.enableVectorizedReader" -> vectorized.toString) {
      withTable("t1") {
        sql(
          """CREATE TABLE t1(name STRING, id BINARY, part BINARY)
            | USING PARQUET PARTITIONED BY (part)""".stripMargin)
        // X'537061726B2053514C' is the binary encoding of the string "Spark SQL".
        sql("INSERT INTO t1 PARTITION(part = 'Spark SQL') VALUES('a', X'537061726B2053514C')")
        if (vectorized) {
          // Bug being reported: with the vectorized reader enabled, the binary
          // partition value comes back empty instead of 'Spark SQL'.
          checkAnswer(
            sql("SELECT name, cast(id as string), cast(part as string) FROM t1"),
            Row("a", "Spark SQL", ""))
        } else {
          // The non-vectorized reader returns the partition value correctly.
          checkAnswer(
            sql("SELECT name, cast(id as string), cast(part as string) FROM t1"),
            Row("a", "Spark SQL", "Spark SQL"))
        }
      }
    }
  }
}
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33572) Datetime building should fail if the year, month, ..., second combination is invalid

2020-11-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17240453#comment-17240453
 ] 

Apache Spark commented on SPARK-33572:
--

User 'waitinfuture' has created a pull request for this issue:
https://github.com/apache/spark/pull/30516

> Datetime building should fail if the year, month, ..., second combination is 
> invalid
> 
>
> Key: SPARK-33572
> URL: https://issues.apache.org/jira/browse/SPARK-33572
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: zhoukeyong
>Priority: Major
>
> Datetime building should fail if the year, month, ..., second combination is 
> invalid, when ANSI mode is enabled. This patch should update MakeDate, 
> MakeTimestamp and MakeInterval.
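
A short illustration of the intended ANSI behavior (a sketch; today the 
expression quietly produces NULL for an invalid field combination instead of 
failing):

{code:java}
spark.conf.set("spark.sql.ansi.enabled", "true")

// month = 13 is not a valid combination; with ANSI mode on this should raise an
// error rather than returning NULL:
// spark.sql("SELECT make_date(2020, 13, 1)").show()
{code}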



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33572) Datetime building should fail if the year, month, ..., second combination is invalid

2020-11-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33572:


Assignee: (was: Apache Spark)

> Datetime building should fail if the year, month, ..., second combination is 
> invalid
> 
>
> Key: SPARK-33572
> URL: https://issues.apache.org/jira/browse/SPARK-33572
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: zhoukeyong
>Priority: Major
>
> Datetime building should fail if the year, month, ..., second combination is 
> invalid, when ANSI mode is enabled. This patch should update MakeDate, 
> MakeTimestamp and MakeInterval.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33572) Datetime building should fail if the year, month, ..., second combination is invalid

2020-11-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17240452#comment-17240452
 ] 

Apache Spark commented on SPARK-33572:
--

User 'waitinfuture' has created a pull request for this issue:
https://github.com/apache/spark/pull/30516

> Datetime building should fail if the year, month, ..., second combination is 
> invalid
> 
>
> Key: SPARK-33572
> URL: https://issues.apache.org/jira/browse/SPARK-33572
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: zhoukeyong
>Priority: Major
>
> Datetime building should fail if the year, month, ..., second combination is 
> invalid, when ANSI mode is enabled. This patch should update MakeDate, 
> MakeTimestamp and MakeInterval.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33572) Datetime building should fail if the year, month, ..., second combination is invalid

2020-11-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33572:


Assignee: Apache Spark

> Datetime building should fail if the year, month, ..., second combination is 
> invalid
> 
>
> Key: SPARK-33572
> URL: https://issues.apache.org/jira/browse/SPARK-33572
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: zhoukeyong
>Assignee: Apache Spark
>Priority: Major
>
> Datetime building should fail if the year, month, ..., second combination is 
> invalid, when ANSI mode is enabled. This patch should update MakeDate, 
> MakeTimestamp and MakeInterval.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-33498) Datetime parsing should fail if the input string can't be parsed, or the pattern string is invalid

2020-11-29 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33498:
-
Comment: was deleted

(was: User 'leanken' has created a pull request for this issue:
https://github.com/apache/spark/pull/30540)

> Datetime parsing should fail if the input string can't be parsed, or the 
> pattern string is invalid
> --
>
> Key: SPARK-33498
> URL: https://issues.apache.org/jira/browse/SPARK-33498
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Leanken.Lin
>Assignee: Leanken.Lin
>Priority: Major
> Fix For: 3.1.0
>
>
> Datetime parsing should fail if the input string can't be parsed, or the 
> pattern string is invalid, when ANSI mode is enabled. This patch should update 
> GetTimeStamp, UnixTimeStamp, ToUnixTimeStamp and Cast.
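
For illustration, a rough sketch of the intended behaviour (assumptions: a 
local SparkSession named `spark`, and `spark.sql.ansi.enabled` as the 
controlling flag; this is not code from the patch itself):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()
spark.conf.set("spark.sql.ansi.enabled", "true")

# The input does not match the pattern (extra 'T' and millisecond part), so
# with ANSI mode on this is expected to raise instead of returning NULL.
spark.sql(
    "SELECT to_timestamp('2020-01-27T20:06:11.847', 'yyyy-MM-dd HH:mm:ss') AS ts"
).show()
{code}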



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33576) PythonException: An exception was thrown from a UDF: 'OSError: Invalid IPC message: negative bodyLength'.

2020-11-29 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17240451#comment-17240451
 ] 

Hyukjin Kwon commented on SPARK-33576:
--

- Are you able to provide the full reproducer with smaller data?
- Does this happen consistently with whichever code you run that uses pandas / 
Arrow?
- If it is reproduced non-deterministically, does it depend on the code or the 
data? (A sketch of such a reproducer skeleton follows below.)
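
A rough, self-contained reproducer skeleton (illustration only: the column 
names follow the report, the data is synthetic, and applyInPandas is used in 
place of the older grouped-map apply; the real failure was seen on ~22M rows x 
31 columns, so a small dataset may well not trigger it):

{code:python}
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").getOrCreate()

# Synthetic stand-in for the real data: 3 distinct provider ids, as described.
df = spark.createDataFrame(
    [(i % 3, float(i)) for i in range(1000)],
    ["providerid", "value"],
)

def domain_features(pdf: pd.DataFrame) -> pd.DataFrame:
    # Placeholder for the real per-provider feature computation.
    return pdf.assign(value_sum=pdf["value"].sum())

x = df.groupby("providerid").applyInPandas(
    domain_features, schema="providerid long, value double, value_sum double"
)
x.count()   # forces evaluation, which is where the IPC error was reported
{code}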

> PythonException: An exception was thrown from a UDF: 'OSError: Invalid IPC 
> message: negative bodyLength'.
> -
>
> Key: SPARK-33576
> URL: https://issues.apache.org/jira/browse/SPARK-33576
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.0.1
> Environment: Databricks runtime 7.3
> Spakr 3.0.1
> Scala 2.12
>Reporter: Darshat
>Priority: Major
>
> Hello,
> We are using Databricks on Azure to process a large amount of e-commerce 
> data. The Databricks runtime is 7.3, which includes Apache Spark 3.0.1 and 
> Scala 2.12.
> During processing, there is a groupby operation on the DataFrame that 
> consistently fails with an exception of this type:
>  
> {color:#ff}PythonException: An exception was thrown from a UDF: 'OSError: 
> Invalid IPC message: negative bodyLength'. Full traceback below: Traceback 
> (most recent call last): File "/databricks/spark/python/pyspark/worker.py", 
> line 654, in main process() File 
> "/databricks/spark/python/pyspark/worker.py", line 646, in process 
> serializer.dump_stream(out_iter, outfile) File 
> "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 281, in 
> dump_stream timely_flush_timeout_ms=self.timely_flush_timeout_ms) File 
> "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 97, in 
> dump_stream for batch in iterator: File 
> "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 271, in 
> init_stream_yield_batches for series in iterator: File 
> "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 287, in 
> load_stream for batch in batches: File 
> "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 228, in 
> load_stream for batch in batches: File 
> "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 118, in 
> load_stream for batch in reader: File "pyarrow/ipc.pxi", line 412, in 
> __iter__ File "pyarrow/ipc.pxi", line 432, in 
> pyarrow.lib._CRecordBatchReader.read_next_batch File "pyarrow/error.pxi", 
> line 99, in pyarrow.lib.check_status OSError: Invalid IPC message: negative 
> bodyLength{color}
>  
> Code that causes this:
> {color:#ff}x = df.groupby('providerid').apply(domain_features){color}
> {color:#ff}display(x.info()){color}
> DataFrame size: 22 million rows, 31 columns.
>  One of the columns is a string ('providerid') on which we do a groupby 
> followed by an apply operation. There are 3 distinct provider ids in this 
> set. While trying to enumerate/count the results, we get this exception.
> We've put all possible checks in the code for null values or corrupt data, 
> and we are not able to trace this to application-level code. I hope we can 
> get some help troubleshooting this, as it is a blocker for rolling out at 
> scale.
> The cluster has 8 nodes + driver, all 28GB RAM. I can provide any other 
> settings that could be useful. 
>  Hope to get some insights into the problem. 
> Thanks,
> Darshat Shah



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33498) Datetime parsing should fail if the input string can't be parsed, or the pattern string is invalid

2020-11-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17240449#comment-17240449
 ] 

Apache Spark commented on SPARK-33498:
--

User 'leanken' has created a pull request for this issue:
https://github.com/apache/spark/pull/30540

> Datetime parsing should fail if the input string can't be parsed, or the 
> pattern string is invalid
> --
>
> Key: SPARK-33498
> URL: https://issues.apache.org/jira/browse/SPARK-33498
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Leanken.Lin
>Assignee: Leanken.Lin
>Priority: Major
> Fix For: 3.1.0
>
>
> Datetime parsing should fail if the input string can't be parsed, or the 
> pattern string is invalid, when ANSI mode is enabled. This patch should update 
> GetTimeStamp, UnixTimeStamp, ToUnixTimeStamp and Cast.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33592) Pyspark ML Validator writer may lost params in estimatorParamMaps

2020-11-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17240435#comment-17240435
 ] 

Apache Spark commented on SPARK-33592:
--

User 'WeichenXu123' has created a pull request for this issue:
https://github.com/apache/spark/pull/30539

> Pyspark ML Validator writer may lost params in estimatorParamMaps
> -
>
> Key: SPARK-33592
> URL: https://issues.apache.org/jira/browse/SPARK-33592
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Major
>
> Two typical cases to reproduce it:
> (1)
> {code:python}
> tokenizer = Tokenizer(inputCol="text", outputCol="words")
> hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
> lr = LogisticRegression()
> pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])
> paramGrid = ParamGridBuilder() \
> .addGrid(hashingTF.numFeatures, [10, 100]) \
> .addGrid(lr.maxIter, [100, 200]) \
> .build()
> tvs = TrainValidationSplit(estimator=pipeline,
>estimatorParamMaps=paramGrid,
>evaluator=MulticlassClassificationEvaluator())
> tvs.save(tvsPath)
> loadedTvs = TrainValidationSplit.load(tvsPath)
> {code}
> Then we can check `loadedTvs.getEstimatorParamMaps()`, the tuning params 
> `hashingTF.numFeatures` and `lr.maxIter` are lost.
> (2)
> {code:python}
> lr = LogisticRegression()
> ova = OneVsRest(classifier=lr)
> grid = ParamGridBuilder().addGrid(lr.maxIter, [100, 200]).build()
> evaluator = MulticlassClassificationEvaluator()
> tvs = TrainValidationSplit(estimator=ova, estimatorParamMaps=grid, 
> evaluator=evaluator)
> tvs.save(tvsPath)
> loadedTvs = TrainValidationSplit.load(tvsPath)
> {code}
> Then we can check `loadedTvs.getEstimatorParamMaps()`: the tuning param 
> `lr.maxIter` is lost.
> Both CrossValidator and TrainValidationSplit in PySpark have this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33592) Pyspark ML Validator writer may lost params in estimatorParamMaps

2020-11-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17240436#comment-17240436
 ] 

Apache Spark commented on SPARK-33592:
--

User 'WeichenXu123' has created a pull request for this issue:
https://github.com/apache/spark/pull/30539

> Pyspark ML Validator writer may lost params in estimatorParamMaps
> -
>
> Key: SPARK-33592
> URL: https://issues.apache.org/jira/browse/SPARK-33592
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Major
>
> Two typical cases to reproduce it:
> (1)
> {code:python}
> tokenizer = Tokenizer(inputCol="text", outputCol="words")
> hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
> lr = LogisticRegression()
> pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])
> paramGrid = ParamGridBuilder() \
> .addGrid(hashingTF.numFeatures, [10, 100]) \
> .addGrid(lr.maxIter, [100, 200]) \
> .build()
> tvs = TrainValidationSplit(estimator=pipeline,
>estimatorParamMaps=paramGrid,
>evaluator=MulticlassClassificationEvaluator())
> tvs.save(tvsPath)
> loadedTvs = TrainValidationSplit.load(tvsPath)
> {code}
> Then we can check `loadedTvs.getEstimatorParamMaps()`, the tuning params 
> `hashingTF.numFeatures` and `lr.maxIter` are lost.
> (2)
> {code:python}
> lr = LogisticRegression()
> ova = OneVsRest(classifier=lr)
> grid = ParamGridBuilder().addGrid(lr.maxIter, [100, 200]).build()
> evaluator = MulticlassClassificationEvaluator()
> tvs = TrainValidationSplit(estimator=ova, estimatorParamMaps=grid, 
> evaluator=evaluator)
> tvs.save(tvsPath)
> loadedTvs = TrainValidationSplit.load(tvsPath)
> {code}
> Then we can check `loadedTvs.getEstimatorParamMaps()`: the tuning param 
> `lr.maxIter` is lost.
> Both CrossValidator and TrainValidationSplit in PySpark have this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33592) Pyspark ML Validator writer may lost params in estimatorParamMaps

2020-11-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33592:


Assignee: Weichen Xu  (was: Apache Spark)

> Pyspark ML Validator writer may lost params in estimatorParamMaps
> -
>
> Key: SPARK-33592
> URL: https://issues.apache.org/jira/browse/SPARK-33592
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Major
>
> Two typical cases to reproduce it:
> (1)
> {code:python}
> tokenizer = Tokenizer(inputCol="text", outputCol="words")
> hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
> lr = LogisticRegression()
> pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])
> paramGrid = ParamGridBuilder() \
> .addGrid(hashingTF.numFeatures, [10, 100]) \
> .addGrid(lr.maxIter, [100, 200]) \
> .build()
> tvs = TrainValidationSplit(estimator=pipeline,
>estimatorParamMaps=paramGrid,
>evaluator=MulticlassClassificationEvaluator())
> tvs.save(tvsPath)
> loadedTvs = TrainValidationSplit.load(tvsPath)
> {code}
> Then we can check `loadedTvs.getEstimatorParamMaps()`, the tuning params 
> `hashingTF.numFeatures` and `lr.maxIter` are lost.
> (2)
> {code:python}
> lr = LogisticRegression()
> ova = OneVsRest(classifier=lr)
> grid = ParamGridBuilder().addGrid(lr.maxIter, [100, 200]).build()
> evaluator = MulticlassClassificationEvaluator()
> tvs = TrainValidationSplit(estimator=ova, estimatorParamMaps=grid, 
> evaluator=evaluator)
> tvs.save(tvsPath)
> loadedTvs = TrainValidationSplit.load(tvsPath)
> {code}
> Then we can check `loadedTvs.getEstimatorParamMaps()`: the tuning param 
> `lr.maxIter` is lost.
> Both CrossValidator and TrainValidationSplit in PySpark have this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33592) Pyspark ML Validator writer may lost params in estimatorParamMaps

2020-11-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33592:


Assignee: Apache Spark  (was: Weichen Xu)

> Pyspark ML Validator writer may lost params in estimatorParamMaps
> -
>
> Key: SPARK-33592
> URL: https://issues.apache.org/jira/browse/SPARK-33592
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Weichen Xu
>Assignee: Apache Spark
>Priority: Major
>
> Two typical cases to reproduce it:
> (1)
> {code:python}
> tokenizer = Tokenizer(inputCol="text", outputCol="words")
> hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
> lr = LogisticRegression()
> pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])
> paramGrid = ParamGridBuilder() \
> .addGrid(hashingTF.numFeatures, [10, 100]) \
> .addGrid(lr.maxIter, [100, 200]) \
> .build()
> tvs = TrainValidationSplit(estimator=pipeline,
>estimatorParamMaps=paramGrid,
>evaluator=MulticlassClassificationEvaluator())
> tvs.save(tvsPath)
> loadedTvs = TrainValidationSplit.load(tvsPath)
> {code}
> Then we can check `loadedTvs.getEstimatorParamMaps()`, the tuning params 
> `hashingTF.numFeatures` and `lr.maxIter` are lost.
> (2)
> {code:python}
> lr = LogisticRegression()
> ova = OneVsRest(classifier=lr)
> grid = ParamGridBuilder().addGrid(lr.maxIter, [100, 200]).build()
> evaluator = MulticlassClassificationEvaluator()
> tvs = TrainValidationSplit(estimator=ova, estimatorParamMaps=grid, 
> evaluator=evaluator)
> tvs.save(tvsPath)
> loadedTvs = TrainValidationSplit.load(tvsPath)
> {code}
> Then we can check `loadedTvs.getEstimatorParamMaps()`: the tuning param 
> `lr.maxIter` is lost.
> Both CrossValidator and TrainValidationSplit in PySpark have this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33582) Partition predicate pushdown into Hive metastore support not-equals

2020-11-29 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-33582.
--
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30534
[https://github.com/apache/spark/pull/30534]

> Partition predicate pushdown into Hive metastore support not-equals
> ---
>
> Key: SPARK-33582
> URL: https://issues.apache.org/jira/browse/SPARK-33582
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.1.0
>
>
> https://github.com/apache/hive/blob/b8bd4594bef718b1eeac9fceb437d7df7b480ed1/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java#L2194-L2207
> https://issues.apache.org/jira/browse/HIVE-2702



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33589) Close opened session if the initialization fails

2020-11-29 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-33589:


Assignee: Yuming Wang

> Close opened session if the initialization fails
> 
>
> Key: SPARK-33589
> URL: https://issues.apache.org/jira/browse/SPARK-33589
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33589) Close opened session if the initialization fails

2020-11-29 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-33589.
--
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30536
[https://github.com/apache/spark/pull/30536]

> Close opened session if the initialization fails
> 
>
> Key: SPARK-33589
> URL: https://issues.apache.org/jira/browse/SPARK-33589
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33578) enableHiveSupport is invalid after sparkContext that without hive support created

2020-11-29 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-33578.
--
Resolution: Won't Fix

> enableHiveSupport is invalid after sparkContext that without hive support 
> created 
> --
>
> Key: SPARK-33578
> URL: https://issues.apache.org/jira/browse/SPARK-33578
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: steven zhang
>Priority: Minor
> Fix For: 3.1.0
>
>
> Reproduce with the following code:
>         SparkConf sparkConf = new SparkConf().setAppName("hello");
>         sparkConf.set("spark.master", "local");
>         JavaSparkContext jssc = new JavaSparkContext(sparkConf);
>         spark = SparkSession.builder()
>                 .config("spark.serializer", 
> "org.apache.spark.serializer.KryoSerializer")
>                 .config("hive.exec.dynamici.partition", 
> true).config("hive.exec.dynamic.partition.mode", "nonstrict")
>                 .config("hive.metastore.uris", "thrift://hivemetastore:9083")
>                 .enableHiveSupport()
>                 .master("local")
>                 .getOrCreate();
>        spark.sql("select * from hudi_db.hudi_test_order").show();
>  
>  It will produce the following exception:
> AssertionError: assertion failed: No plan for HiveTableRelation 
> [`hudi_db`.`hudi_test_order` … (at current master branch)  
> org.apache.spark.sql.AnalysisException: Table or view not found: 
> `hudi_db`.`hudi_test_order`;  (at spark v2.4.4)
>   
>  The reason is that SparkContext#getOrCreate(SparkConf) returns the active 
> context, which still carries the previously supplied Spark config,
> whereas the input SparkConf is the newest one, containing both the previous 
> Spark config and the new options.
>   enableHiveSupport sets the option ("spark.sql.catalogImplementation", 
> "hive"), but when the SparkSession is created on the existing context this 
> conf is missed:
> SharedState loads its conf from the SparkContext and therefore misses the 
> Hive catalog.
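
(For reference, a PySpark sketch of the usual workaround, not a fix in Spark 
itself: build the Hive-enabled SparkSession before any SparkContext exists, so 
that spark.sql.catalogImplementation=hive is picked up by SharedState. The 
metastore URI below is copied from the report above.)

{code:python}
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hello")
    .master("local")
    .config("hive.metastore.uris", "thrift://hivemetastore:9083")
    .enableHiveSupport()
    .getOrCreate()
)
sc = spark.sparkContext   # reuse the context owned by the Hive-enabled session

spark.sql("select * from hudi_db.hudi_test_order").show()
{code}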



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33589) Close opened session if the initialization fails

2020-11-29 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-33589:

Summary: Close opened session if the initialization fails  (was: Add try 
catch when opening session)

> Close opened session if the initialization fails
> 
>
> Key: SPARK-33589
> URL: https://issues.apache.org/jira/browse/SPARK-33589
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33592) Pyspark ML Validator writer may lost params in estimatorParamMaps

2020-11-29 Thread Weichen Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weichen Xu updated SPARK-33592:
---
Description: 
Two typical cases to reproduce it:
(1)
{code:python}
tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
lr = LogisticRegression()
pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])

paramGrid = ParamGridBuilder() \
.addGrid(hashingTF.numFeatures, [10, 100]) \
.addGrid(lr.maxIter, [100, 200]) \
.build()
tvs = TrainValidationSplit(estimator=pipeline,
   estimatorParamMaps=paramGrid,
   evaluator=MulticlassClassificationEvaluator())

tvs.save(tvsPath)
loadedTvs = TrainValidationSplit.load(tvsPath)

{code}

Then we can check `loadedTvs.getEstimatorParamMaps()`, the tuning params 
`hashingTF.numFeatures` and `lr.maxIter` are lost.


(2)
{code:python}
lr = LogisticRegression()
ova = OneVsRest(classifier=lr)
grid = ParamGridBuilder().addGrid(lr.maxIter, [100, 200]).build()
evaluator = MulticlassClassificationEvaluator()
tvs = TrainValidationSplit(estimator=ova, estimatorParamMaps=grid, 
evaluator=evaluator)

tvs.save(tvsPath)
loadedTvs = TrainValidationSplit.load(tvsPath)

{code}
Then we can check `loadedTvs.getEstimatorParamMaps()`: the tuning param 
`lr.maxIter` is lost.


Both CrossValidator and TrainValidationSplit in PySpark have this issue.

  was:
Two typical cases to reproduce it:
(1)
{code:python}
tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
lr = LogisticRegression()
pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])

paramGrid = ParamGridBuilder() \
.addGrid(hashingTF.numFeatures, [10, 100]) \
.addGrid(lr.maxIter, [100, 200]) \
.build()
tvs = TrainValidationSplit(estimator=pipeline,
   estimatorParamMaps=paramGrid,
   evaluator=MulticlassClassificationEvaluator())

tvs.save(tvsPath)
loadedTvs = TrainValidationSplit.load(tvsPath)

{code}

Then we can check `loadedTvs.getEstimatorParamMaps()`, the tuning params 
`hashingTF.numFeatures` and `lr.maxIter` are lost.


(2)
{code:python}
lr = LogisticRegression()
ova = OneVsRest(classifier=lr)
grid = ParamGridBuilder().addGrid(lr.maxIter, [100, 200]).build()
evaluator = MulticlassClassificationEvaluator()
tvs = TrainValidationSplit(estimator=ova, estimatorParamMaps=grid, 
evaluator=evaluator)

tvs.save(tvsPath)
loadedTvs = TrainValidationSplit.load(tvsPath)

{code}
Then we can check `loadedTvs.getEstimatorParamMaps()`, the tuning 
params`lr.maxIter` are lost.


Both CrossValidator and TrainValidationSplit has this issue.


> Pyspark ML Validator writer may lost params in estimatorParamMaps
> -
>
> Key: SPARK-33592
> URL: https://issues.apache.org/jira/browse/SPARK-33592
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Major
>
> Two typical cases to reproduce it:
> (1)
> {code:python}
> tokenizer = Tokenizer(inputCol="text", outputCol="words")
> hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
> lr = LogisticRegression()
> pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])
> paramGrid = ParamGridBuilder() \
> .addGrid(hashingTF.numFeatures, [10, 100]) \
> .addGrid(lr.maxIter, [100, 200]) \
> .build()
> tvs = TrainValidationSplit(estimator=pipeline,
>estimatorParamMaps=paramGrid,
>evaluator=MulticlassClassificationEvaluator())
> tvs.save(tvsPath)
> loadedTvs = TrainValidationSplit.load(tvsPath)
> {code}
> Then we can check `loadedTvs.getEstimatorParamMaps()`, the tuning params 
> `hashingTF.numFeatures` and `lr.maxIter` are lost.
> (2)
> {code:python}
> lr = LogisticRegression()
> ova = OneVsRest(classifier=lr)
> grid = ParamGridBuilder().addGrid(lr.maxIter, [100, 200]).build()
> evaluator = MulticlassClassificationEvaluator()
> tvs = TrainValidationSplit(estimator=ova, estimatorParamMaps=grid, 
> evaluator=evaluator)
> tvs.save(tvsPath)
> loadedTvs = TrainValidationSplit.load(tvsPath)
> {code}
> Then we can check `loadedTvs.getEstimatorParamMaps()`: the tuning param 
> `lr.maxIter` is lost.
> Both CrossValidator and TrainValidationSplit in PySpark have this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33592) Pyspark ML Validator writer may lost params in estimatorParamMaps

2020-11-29 Thread Weichen Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weichen Xu reassigned SPARK-33592:
--

Assignee: Weichen Xu

> Pyspark ML Validator writer may lost params in estimatorParamMaps
> -
>
> Key: SPARK-33592
> URL: https://issues.apache.org/jira/browse/SPARK-33592
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Major
>
> Two typical cases to reproduce it:
> (1)
> {code:python}
> tokenizer = Tokenizer(inputCol="text", outputCol="words")
> hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
> lr = LogisticRegression()
> pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])
> paramGrid = ParamGridBuilder() \
> .addGrid(hashingTF.numFeatures, [10, 100]) \
> .addGrid(lr.maxIter, [100, 200]) \
> .build()
> tvs = TrainValidationSplit(estimator=pipeline,
>estimatorParamMaps=paramGrid,
>evaluator=MulticlassClassificationEvaluator())
> tvs.save(tvsPath)
> loadedTvs = TrainValidationSplit.load(tvsPath)
> {code}
> Then we can check `loadedTvs.getEstimatorParamMaps()`, the tuning params 
> `hashingTF.numFeatures` and `lr.maxIter` are lost.
> (2)
> {code:python}
> lr = LogisticRegression()
> ova = OneVsRest(classifier=lr)
> grid = ParamGridBuilder().addGrid(lr.maxIter, [100, 200]).build()
> evaluator = MulticlassClassificationEvaluator()
> tvs = TrainValidationSplit(estimator=ova, estimatorParamMaps=grid, 
> evaluator=evaluator)
> tvs.save(tvsPath)
> loadedTvs = TrainValidationSplit.load(tvsPath)
> {code}
> Then we can check `loadedTvs.getEstimatorParamMaps()`: the tuning param 
> `lr.maxIter` is lost.
> Both CrossValidator and TrainValidationSplit have this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33592) Pyspark ML Validator writer may lost params in estimatorParamMaps

2020-11-29 Thread Weichen Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weichen Xu updated SPARK-33592:
---
Description: 
Two typical cases to reproduce it:
(1)
{code:python}
tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
lr = LogisticRegression()
pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])

paramGrid = ParamGridBuilder() \
.addGrid(hashingTF.numFeatures, [10, 100]) \
.addGrid(lr.maxIter, [100, 200]) \
.build()
tvs = TrainValidationSplit(estimator=pipeline,
   estimatorParamMaps=paramGrid,
   evaluator=MulticlassClassificationEvaluator())

tvs.save(tvsPath)
loadedTvs = TrainValidationSplit.load(tvsPath)

{code}

Then we can check `loadedTvs.getEstimatorParamMaps()`, the tuning params 
`hashingTF.numFeatures` and `lr.maxIter` are lost.


(2)
{code:python}
lr = LogisticRegression()
ova = OneVsRest(classifier=lr)
grid = ParamGridBuilder().addGrid(lr.maxIter, [100, 200]).build()
evaluator = MulticlassClassificationEvaluator()
tvs = TrainValidationSplit(estimator=ova, estimatorParamMaps=grid, 
evaluator=evaluator)

tvs.save(tvsPath)
loadedTvs = TrainValidationSplit.load(tvsPath)

{code}
Then we can check `loadedTvs.getEstimatorParamMaps()`: the tuning param 
`lr.maxIter` is lost.


Both CrossValidator and TrainValidationSplit have this issue.

  was:
Two typical cases to reproduce it:
(1)
{code: python}
tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
lr = LogisticRegression()
pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])

paramGrid = ParamGridBuilder() \
.addGrid(hashingTF.numFeatures, [10, 100]) \
.addGrid(lr.maxIter, [100, 200]) \
.build()
tvs = TrainValidationSplit(estimator=pipeline,
   estimatorParamMaps=paramGrid,
   evaluator=MulticlassClassificationEvaluator())

tvs.save(tvsPath)
loadedTvs = TrainValidationSplit.load(tvsPath)

{code}

Then we can check `loadedTvs.getEstimatorParamMaps()`, the tuning params 
`hashingTF.numFeatures` and `lr.maxIter` are lost.


(2)
{code: python}
lr = LogisticRegression()
ova = OneVsRest(classifier=lr)
grid = ParamGridBuilder().addGrid(lr.maxIter, [100, 200]).build()
evaluator = MulticlassClassificationEvaluator()
tvs = TrainValidationSplit(estimator=ova, estimatorParamMaps=grid, 
evaluator=evaluator)

tvs.save(tvsPath)
loadedTvs = TrainValidationSplit.load(tvsPath)

{code}
Then we can check `loadedTvs.getEstimatorParamMaps()`, the tuning 
params`lr.maxIter` are lost.


Both CrossValidator and TrainValidationSplit has this issue.


> Pyspark ML Validator writer may lost params in estimatorParamMaps
> -
>
> Key: SPARK-33592
> URL: https://issues.apache.org/jira/browse/SPARK-33592
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Weichen Xu
>Priority: Major
>
> Two typical cases to reproduce it:
> (1)
> {code:python}
> tokenizer = Tokenizer(inputCol="text", outputCol="words")
> hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
> lr = LogisticRegression()
> pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])
> paramGrid = ParamGridBuilder() \
> .addGrid(hashingTF.numFeatures, [10, 100]) \
> .addGrid(lr.maxIter, [100, 200]) \
> .build()
> tvs = TrainValidationSplit(estimator=pipeline,
>estimatorParamMaps=paramGrid,
>evaluator=MulticlassClassificationEvaluator())
> tvs.save(tvsPath)
> loadedTvs = TrainValidationSplit.load(tvsPath)
> {code}
> Then we can check `loadedTvs.getEstimatorParamMaps()`, the tuning params 
> `hashingTF.numFeatures` and `lr.maxIter` are lost.
> (2)
> {code:python}
> lr = LogisticRegression()
> ova = OneVsRest(classifier=lr)
> grid = ParamGridBuilder().addGrid(lr.maxIter, [100, 200]).build()
> evaluator = MulticlassClassificationEvaluator()
> tvs = TrainValidationSplit(estimator=ova, estimatorParamMaps=grid, 
> evaluator=evaluator)
> tvs.save(tvsPath)
> loadedTvs = TrainValidationSplit.load(tvsPath)
> {code}
> Then we can check `loadedTvs.getEstimatorParamMaps()`: the tuning param 
> `lr.maxIter` is lost.
> Both CrossValidator and TrainValidationSplit have this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33592) Pyspark ML Validator writer may lost params in estimatorParamMaps

2020-11-29 Thread Weichen Xu (Jira)
Weichen Xu created SPARK-33592:
--

 Summary: Pyspark ML Validator writer may lost params in 
estimatorParamMaps
 Key: SPARK-33592
 URL: https://issues.apache.org/jira/browse/SPARK-33592
 Project: Spark
  Issue Type: Bug
  Components: ML, PySpark
Affects Versions: 3.0.0, 3.1.0
Reporter: Weichen Xu


Two typical cases to reproduce it:
(1)
{code: python}
tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
lr = LogisticRegression()
pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])

paramGrid = ParamGridBuilder() \
.addGrid(hashingTF.numFeatures, [10, 100]) \
.addGrid(lr.maxIter, [100, 200]) \
.build()
tvs = TrainValidationSplit(estimator=pipeline,
   estimatorParamMaps=paramGrid,
   evaluator=MulticlassClassificationEvaluator())

tvs.save(tvsPath)
loadedTvs = TrainValidationSplit.load(tvsPath)

{code}

Then we can check `loadedTvs.getEstimatorParamMaps()`, the tuning params 
`hashingTF.numFeatures` and `lr.maxIter` are lost.


(2)
{code: python}
lr = LogisticRegression()
ova = OneVsRest(classifier=lr)
grid = ParamGridBuilder().addGrid(lr.maxIter, [100, 200]).build()
evaluator = MulticlassClassificationEvaluator()
tvs = TrainValidationSplit(estimator=ova, estimatorParamMaps=grid, 
evaluator=evaluator)

tvs.save(tvsPath)
loadedTvs = TrainValidationSplit.load(tvsPath)

{code}
Then we can check `loadedTvs.getEstimatorParamMaps()`: the tuning param 
`lr.maxIter` is lost.


Both CrossValidator and TrainValidationSplit have this issue.
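
One way to see the loss concretely (an illustrative sketch only, reusing the 
`tvs` and `loadedTvs` objects built in the snippets above; `dump_param_maps` is 
a hypothetical helper, not part of the PySpark API):

{code:python}
def dump_param_maps(validator):
    # Flatten each ParamMap into a plain {param name: value} dict so the grids
    # can be compared before and after the save/load round trip.
    return [
        {param.name: value for param, value in param_map.items()}
        for param_map in validator.getEstimatorParamMaps()
    ]

print(dump_param_maps(tvs))        # e.g. [{'maxIter': 100}, {'maxIter': 200}]
print(dump_param_maps(loadedTvs))  # on affected versions the tuned params are missing here
{code}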



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33517) Incorrect menu item display and link in PySpark Usage Guide for Pandas with Apache Arrow

2020-11-29 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-33517:


Assignee: liucht-inspur  (was: Apache Spark)

> Incorrect menu item display and link in PySpark Usage Guide for Pandas with 
> Apache Arrow
> 
>
> Key: SPARK-33517
> URL: https://issues.apache.org/jira/browse/SPARK-33517
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.0, 3.0.1
>Reporter: liucht-inspur
>Assignee: liucht-inspur
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: image-2020-11-23-18-47-01-591.png, 
> image-2020-11-27-09-43-58-141.png, spark-doc.jpg
>
>
> Error setting menu item and link, change "Apache Arrow in Spark" to "Apache 
> Arrow in PySpark"
>   !image-2020-11-23-18-47-01-591.png!
>  
> after:
> !image-2020-11-27-09-43-58-141.png!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33517) Incorrect menu item display and link in PySpark Usage Guide for Pandas with Apache Arrow

2020-11-29 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-33517:


Assignee: Apache Spark

> Incorrect menu item display and link in PySpark Usage Guide for Pandas with 
> Apache Arrow
> 
>
> Key: SPARK-33517
> URL: https://issues.apache.org/jira/browse/SPARK-33517
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.0, 3.0.1
>Reporter: liucht-inspur
>Assignee: Apache Spark
>Priority: Minor
> Attachments: image-2020-11-23-18-47-01-591.png, 
> image-2020-11-27-09-43-58-141.png, spark-doc.jpg
>
>
> Error setting menu item and link, change "Apache Arrow in Spark" to "Apache 
> Arrow in PySpark"
>   !image-2020-11-23-18-47-01-591.png!
>  
> after:
> !image-2020-11-27-09-43-58-141.png!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33517) Incorrect menu item display and link in PySpark Usage Guide for Pandas with Apache Arrow

2020-11-29 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-33517.
--
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30466
[https://github.com/apache/spark/pull/30466]

> Incorrect menu item display and link in PySpark Usage Guide for Pandas with 
> Apache Arrow
> 
>
> Key: SPARK-33517
> URL: https://issues.apache.org/jira/browse/SPARK-33517
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.0, 3.0.1
>Reporter: liucht-inspur
>Assignee: Apache Spark
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: image-2020-11-23-18-47-01-591.png, 
> image-2020-11-27-09-43-58-141.png, spark-doc.jpg
>
>
> Error setting menu item and link, change "Apache Arrow in Spark" to "Apache 
> Arrow in PySpark"
>   !image-2020-11-23-18-47-01-591.png!
>  
> after:
> !image-2020-11-27-09-43-58-141.png!
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33585) The comment for SQLContext.tables() doesn't mention the `database` column

2020-11-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33585.
---
Fix Version/s: 2.4.8
   3.0.2
   3.1.0
   Resolution: Fixed

Issue resolved by pull request 30526
[https://github.com/apache/spark/pull/30526]

> The comment for SQLContext.tables() doesn't mention the `database` column
> -
>
> Key: SPARK-33585
> URL: https://issues.apache.org/jira/browse/SPARK-33585
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 2.4.8, 3.0.2, 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Minor
> Fix For: 3.1.0, 3.0.2, 2.4.8
>
>
> The comment says: "The returned DataFrame has two columns, tableName and 
> isTemporary":
> https://github.com/apache/spark/blob/b26ae98407c6c017a4061c0c420f48685ddd6163/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L664
> but actually the dataframe has 3 columns:
> {code:scala}
> scala> spark.range(10).createOrReplaceTempView("view1")
> scala> val tables = spark.sqlContext.tables()
> tables: org.apache.spark.sql.DataFrame = [database: string, tableName: string 
> ... 1 more field]
> scala> tables.printSchema
> root
>  |-- database: string (nullable = false)
>  |-- tableName: string (nullable = false)
>  |-- isTemporary: boolean (nullable = false)
> scala> tables.show
> ++-+---+
> |database|tableName|isTemporary|
> ++-+---+
> | default|   t1|  false|
> | default|   t2|  false|
> | default|  ymd|  false|
> ||view1|   true|
> ++-+---+
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33585) The comment for SQLContext.tables() doesn't mention the `database` column

2020-11-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33585:
-

Assignee: Maxim Gekk

> The comment for SQLContext.tables() doesn't mention the `database` column
> -
>
> Key: SPARK-33585
> URL: https://issues.apache.org/jira/browse/SPARK-33585
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 2.4.8, 3.0.2, 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Minor
>
> The comment says: "The returned DataFrame has two columns, tableName and 
> isTemporary":
> https://github.com/apache/spark/blob/b26ae98407c6c017a4061c0c420f48685ddd6163/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L664
> but actually the dataframe has 3 columns:
> {code:scala}
> scala> spark.range(10).createOrReplaceTempView("view1")
> scala> val tables = spark.sqlContext.tables()
> tables: org.apache.spark.sql.DataFrame = [database: string, tableName: string 
> ... 1 more field]
> scala> tables.printSchema
> root
>  |-- database: string (nullable = false)
>  |-- tableName: string (nullable = false)
>  |-- isTemporary: boolean (nullable = false)
> scala> tables.show
> ++-+---+
> |database|tableName|isTemporary|
> ++-+---+
> | default|   t1|  false|
> | default|   t2|  false|
> | default|  ymd|  false|
> ||view1|   true|
> ++-+---+
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33588) Partition spec in SHOW TABLE EXTENDED doesn't respect `spark.sql.caseSensitive`

2020-11-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33588.
---
Fix Version/s: 3.1.0
 Assignee: Maxim Gekk
   Resolution: Fixed

This is resolved via https://github.com/apache/spark/pull/30529

> Partition spec in SHOW TABLE EXTENDED doesn't respect 
> `spark.sql.caseSensitive`
> ---
>
> Key: SPARK-33588
> URL: https://issues.apache.org/jira/browse/SPARK-33588
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.8, 3.0.2, 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.1.0
>
>
> For example:
> {code:sql}
> spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int)
>  > USING parquet
>  > partitioned by (year, month);
> spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1;
> spark-sql> SHOW TABLE EXTENDED LIKE 'tbl1' PARTITION(YEAR = 2015, Month = 1);
> Error in query: Partition spec is invalid. The spec (YEAR, Month) must match 
> the partition spec (year, month) defined in table '`default`.`tbl1`';
> {code}
> The spark.sql.caseSensitive flag is false by default, so, the partition spec 
> is valid.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33588) Partition spec in SHOW TABLE EXTENDED doesn't respect `spark.sql.caseSensitive`

2020-11-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33588:
--
Affects Version/s: (was: 3.0.2)
   (was: 2.4.8)
   2.4.7
   3.0.1

> Partition spec in SHOW TABLE EXTENDED doesn't respect 
> `spark.sql.caseSensitive`
> ---
>
> Key: SPARK-33588
> URL: https://issues.apache.org/jira/browse/SPARK-33588
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.7, 3.0.1, 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.1.0
>
>
> For example:
> {code:sql}
> spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int)
>  > USING parquet
>  > partitioned by (year, month);
> spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1;
> spark-sql> SHOW TABLE EXTENDED LIKE 'tbl1' PARTITION(YEAR = 2015, Month = 1);
> Error in query: Partition spec is invalid. The spec (YEAR, Month) must match 
> the partition spec (year, month) defined in table '`default`.`tbl1`';
> {code}
> The spark.sql.caseSensitive flag is false by default, so, the partition spec 
> is valid.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33591) NULL is recognized as the "null" string in partition specs

2020-11-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33591:


Assignee: Apache Spark

> NULL is recognized as the "null" string in partition specs
> --
>
> Key: SPARK-33591
> URL: https://issues.apache.org/jira/browse/SPARK-33591
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> For example:
> {code:sql}
> spark-sql> CREATE TABLE tbl5 (col1 INT, p1 STRING) USING PARQUET PARTITIONED 
> BY (p1);
> spark-sql> INSERT INTO TABLE tbl5 PARTITION (p1 = null) SELECT 0;
> spark-sql> SELECT isnull(p1) FROM tbl5;
> false
> {code}
> The *p1 = null* is not recognized as a partition with NULL value.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33591) NULL is recognized as the "null" string in partition specs

2020-11-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17240356#comment-17240356
 ] 

Apache Spark commented on SPARK-33591:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30538

> NULL is recognized as the "null" string in partition specs
> --
>
> Key: SPARK-33591
> URL: https://issues.apache.org/jira/browse/SPARK-33591
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> For example:
> {code:sql}
> spark-sql> CREATE TABLE tbl5 (col1 INT, p1 STRING) USING PARQUET PARTITIONED 
> BY (p1);
> spark-sql> INSERT INTO TABLE tbl5 PARTITION (p1 = null) SELECT 0;
> spark-sql> SELECT isnull(p1) FROM tbl5;
> false
> {code}
> The *p1 = null* is not recognized as a partition with NULL value.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33591) NULL is recognized as the "null" string in partition specs

2020-11-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33591:


Assignee: (was: Apache Spark)

> NULL is recognized as the "null" string in partition specs
> --
>
> Key: SPARK-33591
> URL: https://issues.apache.org/jira/browse/SPARK-33591
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> For example:
> {code:sql}
> spark-sql> CREATE TABLE tbl5 (col1 INT, p1 STRING) USING PARQUET PARTITIONED 
> BY (p1);
> spark-sql> INSERT INTO TABLE tbl5 PARTITION (p1 = null) SELECT 0;
> spark-sql> SELECT isnull(p1) FROM tbl5;
> false
> {code}
> The *p1 = null* is not recognized as a partition with NULL value.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33587) Kill the executor on nested fatal errors

2020-11-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33587:
-

Assignee: Shixiong Zhu

> Kill the executor on nested fatal errors
> 
>
> Key: SPARK-33587
> URL: https://issues.apache.org/jira/browse/SPARK-33587
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Major
>
> Currently we kill the executor when hitting a fatal error. However, if the 
> fatal error is wrapped by another exception, such as
> - java.util.concurrent.ExecutionException, 
> com.google.common.util.concurrent.UncheckedExecutionException, 
> com.google.common.util.concurrent.ExecutionError when using Guava cache and 
> java thread pool.
> - SparkException thrown from this line: 
> https://github.com/apache/spark/blob/cf98a761de677c733f3c33230e1c63ddb785d5c5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L231
> We will still keep the executor running. Fatal errors are usually 
> unrecoverable (such as OutOfMemoryError), some components may be in a broken 
> state when hitting a fatal error. Hence, it's better to detect the nested 
> fatal error as well and kill the executor. Then we can rely on Spark's fault 
> tolerance to recover.
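
(The detection idea can be sketched language-agnostically; the snippet below is 
an illustration in Python, not the actual Scala change, and the listed types 
are stand-ins for JVM fatal Errors such as OutOfMemoryError.)

{code:python}
FATAL_TYPES = (MemoryError, SystemError)   # stand-ins for JVM fatal Errors

def has_nested_fatal(exc, max_depth=5):
    # Walk the "caused by" chain with a depth limit and treat an exception
    # that wraps a fatal error anywhere in the chain as fatal itself.
    depth = 0
    while exc is not None and depth < max_depth:
        if isinstance(exc, FATAL_TYPES):
            return True
        exc = exc.__cause__ or exc.__context__
        depth += 1
    return False
{code}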



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33587) Kill the executor on nested fatal errors

2020-11-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33587.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30528
[https://github.com/apache/spark/pull/30528]

> Kill the executor on nested fatal errors
> 
>
> Key: SPARK-33587
> URL: https://issues.apache.org/jira/browse/SPARK-33587
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Major
> Fix For: 3.1.0
>
>
> Currently we kill the executor when hitting a fatal error. However, if the 
> fatal error is wrapped by another exception, such as
> - java.util.concurrent.ExecutionException, 
> com.google.common.util.concurrent.UncheckedExecutionException, 
> com.google.common.util.concurrent.ExecutionError when using Guava cache and 
> java thread pool.
> - SparkException thrown from this line: 
> https://github.com/apache/spark/blob/cf98a761de677c733f3c33230e1c63ddb785d5c5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L231
> We will still keep the executor running. Fatal errors are usually 
> unrecoverable (such as OutOfMemoryError), some components may be in a broken 
> state when hitting a fatal error. Hence, it's better to detect the nested 
> fatal error as well and kill the executor. Then we can rely on Spark's fault 
> tolerance to recover.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33591) NULL is recognized as the "null" string in partition specs

2020-11-29 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-33591:
---
Issue Type: Bug  (was: Improvement)

> NULL is recognized as the "null" string in partition specs
> --
>
> Key: SPARK-33591
> URL: https://issues.apache.org/jira/browse/SPARK-33591
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> For example:
> {code:sql}
> spark-sql> CREATE TABLE tbl5 (col1 INT, p1 STRING) USING PARQUET PARTITIONED 
> BY (p1);
> spark-sql> INSERT INTO TABLE tbl5 PARTITION (p1 = null) SELECT 0;
> spark-sql> SELECT isnull(p1) FROM tbl5;
> false
> {code}
> The *p1 = null* is not recognized as a partition with NULL value.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33591) NULL is recognized as the "null" string in partition specs

2020-11-29 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-33591:
---
Issue Type: Improvement  (was: Bug)

> NULL is recognized as the "null" string in partition specs
> --
>
> Key: SPARK-33591
> URL: https://issues.apache.org/jira/browse/SPARK-33591
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> For example:
> {code:sql}
> spark-sql> CREATE TABLE tbl5 (col1 INT, p1 STRING) USING PARQUET PARTITIONED 
> BY (p1);
> spark-sql> INSERT INTO TABLE tbl5 PARTITION (p1 = null) SELECT 0;
> spark-sql> SELECT isnull(p1) FROM tbl5;
> false
> {code}
> The *p1 = null* is not recognized as a partition with NULL value.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33591) NULL is recognized as the "null" string in partition specs

2020-11-29 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33591:
--

 Summary: NULL is recognized as the "null" string in partition specs
 Key: SPARK-33591
 URL: https://issues.apache.org/jira/browse/SPARK-33591
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


For example:
{code:sql}
spark-sql> CREATE TABLE tbl5 (col1 INT, p1 STRING) USING PARQUET PARTITIONED BY 
(p1);
spark-sql> INSERT INTO TABLE tbl5 PARTITION (p1 = null) SELECT 0;
spark-sql> SELECT isnull(p1) FROM tbl5;
false
{code}

The *p1 = null* is not recognized as a partition with NULL value.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33590) Missing submenus for Performance Tuning in Spark SQL Guide

2020-11-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33590:
-

Assignee: Kazuaki Ishizaki

> Missing submenus for Performance Tuning in Spark SQL Guide 
> ---
>
> Key: SPARK-33590
> URL: https://issues.apache.org/jira/browse/SPARK-33590
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
>Priority: Minor
> Attachments: image-2020-11-30-00-04-07-969.png
>
>
> Sub-menus for \{Coalesce Hints for SQL Queries} and \{Adaptive Query 
> Execution} are missing.
> !image-2020-11-30-00-04-07-969.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33590) Missing submenus for Performance Tuning in Spark SQL Guide

2020-11-29 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33590.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30537
[https://github.com/apache/spark/pull/30537]

> Missing submenus for Performance Tuning in Spark SQL Guide 
> ---
>
> Key: SPARK-33590
> URL: https://issues.apache.org/jira/browse/SPARK-33590
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: image-2020-11-30-00-04-07-969.png
>
>
> Sub-menus for \{Coalesce Hints for SQL Queries} and \{Adaptive Query 
> Execution} are missing.
> !image-2020-11-30-00-04-07-969.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33531) [SQL] Reduce shuffle task number when calling CollectLimitExec#executeToIterator

2020-11-29 Thread Mori[A]rty (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mori[A]rty updated SPARK-33531:
---
Description: 
Currently, when invoking CollectLimitExec#executeToIterator, a single-partition 
ShuffledRowRDD containg all parent partitions is created. Spark will compute 
all these partitions to get the result.
But in most cases, computing the first few partitions is enought to get the 
result, which takes much less time.

When running a SparkThriftServer with spark.sql.thriftServer.incrementalCollect 
enabled, the large number of shuffle tasks leads to a significant performance 
issue for SQL queries that end with LIMIT.

A possible improvement, sketched in the code below this list, may be as follows:
 # Create a ShuffledRowRDD containing only the first parent partition.
 # Collect the rows of this ShuffledRowRDD to the driver.
 # If the number of collected rows is less than the limit, create the next 
ShuffledRowRDD from the next several parent partitions; the number of parent 
partitions is calculated the same way as in SparkPlan#executeTake.
 # Repeat steps 2-3 until the total number of collected rows reaches the limit 
or all parent partitions have been computed.
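
A minimal, self-contained sketch of that loop (plain Scala; the function name, the 
collectPartitions callback, and the scale-up factor are illustrative stand-ins, not 
the actual CollectLimitExec internals):
{code:scala}
// Collect rows partition range by partition range, stopping as soon as
// `limit` rows have been gathered.
def incrementalLimit[T](
    numPartitions: Int,
    limit: Int,
    collectPartitions: Range => Seq[T]): Seq[T] = {
  val collected = scala.collection.mutable.ArrayBuffer.empty[T]
  var nextPartition = 0
  var batchSize = 1 // step 1: start with the first parent partition
  while (collected.size < limit && nextPartition < numPartitions) {
    val range = nextPartition until math.min(nextPartition + batchSize, numPartitions)
    collected ++= collectPartitions(range) // steps 2-3: collect this slice to the driver
    nextPartition = range.end
    batchSize *= 4 // grow the next slice, in the spirit of SparkPlan#executeTake
  }
  collected.take(limit) // step 4: stop at the limit or when partitions run out
}

// Toy usage: 10 parent partitions with 3 rows each; LIMIT 5 only touches
// the first few partitions instead of all ten.
val rows: Int => Seq[Int] = p => Seq.fill(3)(p)
val result = incrementalLimit(numPartitions = 10, limit = 5,
  collectPartitions = range => range.flatMap(rows))
// result == Seq(0, 0, 0, 1, 1)
{code}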

  was:
Using a new method SparkPlan#executeTakeToIterator to implement 
CollectLimitExec#executeToIterator to avoid shuffle caused by invoking parent 
method SparkPlan#executeToIterator.

When running a SparkThriftServer and spark.sql.thriftServer.incrementalCollect 
is enabled, extra shuffle will lead to a significant performance issue for SQLs 
terminated with LIMIT.


> [SQL] Reduce shuffle task number when calling 
> CollectLimitExec#executeToIterator
> 
>
> Key: SPARK-33531
> URL: https://issues.apache.org/jira/browse/SPARK-33531
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0, 3.0.1
>Reporter: Mori[A]rty
>Priority: Major
>
> Currently, when invoking CollectLimitExec#executeToIterator, a 
> single-partition ShuffledRowRDD containing all parent partitions is created, 
> and Spark computes all of these partitions to get the result.
> In most cases, however, computing only the first few partitions is enough to 
> produce the result and takes much less time.
> When running a SparkThriftServer with 
> spark.sql.thriftServer.incrementalCollect enabled, the large number of 
> shuffle tasks leads to a significant performance issue for SQL queries that 
> end with LIMIT.
> A possible improvement may be as follows:
>  # Create a ShuffledRowRDD containing only the first parent partition.
>  # Collect the rows of this ShuffledRowRDD to the driver.
>  # If the number of collected rows is less than the limit, create the next 
> ShuffledRowRDD from the next several parent partitions; the number of parent 
> partitions is calculated the same way as in SparkPlan#executeTake.
>  # Repeat steps 2-3 until the total number of collected rows reaches the 
> limit or all parent partitions have been computed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33531) [SQL] Reduce shuffle task number when calling CollectLimitExec#executeToIterator

2020-11-29 Thread Mori[A]rty (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mori[A]rty updated SPARK-33531:
---
Summary: [SQL] Reduce shuffle task number when calling 
CollectLimitExec#executeToIterator  (was: [SQL] Avoid shuffle when calling 
CollectLimitExec#executeToIterator)

> [SQL] Reduce shuffle task number when calling 
> CollectLimitExec#executeToIterator
> 
>
> Key: SPARK-33531
> URL: https://issues.apache.org/jira/browse/SPARK-33531
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0, 3.0.1
>Reporter: Mori[A]rty
>Priority: Major
>
> Using a new method SparkPlan#executeTakeToIterator to implement 
> CollectLimitExec#executeToIterator to avoid shuffle caused by invoking parent 
> method SparkPlan#executeToIterator.
> When running a SparkThriftServer and 
> spark.sql.thriftServer.incrementalCollect is enabled, extra shuffle will lead 
> to a significant performance issue for SQLs terminated with LIMIT.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33590) Missing submenus for Performance Tuning in Spark SQL Guide

2020-11-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17240284#comment-17240284
 ] 

Apache Spark commented on SPARK-33590:
--

User 'kiszk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30537

> Missing submenus for Performance Tuning in Spark SQL Guide 
> ---
>
> Key: SPARK-33590
> URL: https://issues.apache.org/jira/browse/SPARK-33590
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Kazuaki Ishizaki
>Priority: Minor
> Attachments: image-2020-11-30-00-04-07-969.png
>
>
> Sub-menus for \{Coalesce Hints for SQL Queries} and \{Adaptive Query 
> Execution} are missing.
> !image-2020-11-30-00-04-07-969.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33590) Missing submenus for Performance Tuning in Spark SQL Guide

2020-11-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33590:


Assignee: (was: Apache Spark)

> Missing submenus for Performance Tuning in Spark SQL Guide 
> ---
>
> Key: SPARK-33590
> URL: https://issues.apache.org/jira/browse/SPARK-33590
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Kazuaki Ishizaki
>Priority: Minor
> Attachments: image-2020-11-30-00-04-07-969.png
>
>
> Sub-menus for \{Coalesce Hints for SQL Queries} and \{Adaptive Query 
> Execution} are missing.
> !image-2020-11-30-00-04-07-969.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33590) Missing submenus for Performance Tuning in Spark SQL Guide

2020-11-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17240282#comment-17240282
 ] 

Apache Spark commented on SPARK-33590:
--

User 'kiszk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30537

> Missing submenus for Performance Tuning in Spark SQL Guide 
> ---
>
> Key: SPARK-33590
> URL: https://issues.apache.org/jira/browse/SPARK-33590
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Kazuaki Ishizaki
>Priority: Minor
> Attachments: image-2020-11-30-00-04-07-969.png
>
>
> Sub-menus for \{Coalesce Hints for SQL Queries} and \{Adaptive Query 
> Execution} are missing.
> !image-2020-11-30-00-04-07-969.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33590) Missing submenus for Performance Tuning in Spark SQL Guide

2020-11-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33590:


Assignee: Apache Spark

> Missing submenus for Performance Tuning in Spark SQL Guide 
> ---
>
> Key: SPARK-33590
> URL: https://issues.apache.org/jira/browse/SPARK-33590
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Kazuaki Ishizaki
>Assignee: Apache Spark
>Priority: Minor
> Attachments: image-2020-11-30-00-04-07-969.png
>
>
> Sub-menus for \{Coalesce Hints for SQL Queries} and \{Adaptive Query 
> Execution} are missing.
> !image-2020-11-30-00-04-07-969.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33590) Missing submenus for Performance Tuning in Spark SQL Guide

2020-11-29 Thread Kazuaki Ishizaki (Jira)
Kazuaki Ishizaki created SPARK-33590:


 Summary: Missing submenus for Performance Tuning in Spark SQL 
Guide 
 Key: SPARK-33590
 URL: https://issues.apache.org/jira/browse/SPARK-33590
 Project: Spark
  Issue Type: Bug
  Components: docs
Affects Versions: 3.0.1, 3.0.0
Reporter: Kazuaki Ishizaki
 Attachments: image-2020-11-30-00-04-07-969.png

Sub-menus for \{Coalesce Hints for SQL Queries} and \{Adaptive Query Execution} 
are missing.

!image-2020-11-30-00-03-04-814.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33590) Missing submenus for Performance Tuning in Spark SQL Guide

2020-11-29 Thread Kazuaki Ishizaki (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-33590:
-
Attachment: image-2020-11-30-00-04-07-969.png

> Missing submenus for Performance Tuning in Spark SQL Guide 
> ---
>
> Key: SPARK-33590
> URL: https://issues.apache.org/jira/browse/SPARK-33590
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Kazuaki Ishizaki
>Priority: Minor
> Attachments: image-2020-11-30-00-04-07-969.png
>
>
> Sub-menus for \{Coalesce Hints for SQL Queries} and \{Adaptive Query 
> Execution} are missing.
> !image-2020-11-30-00-03-04-814.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33590) Missing submenus for Performance Tuning in Spark SQL Guide

2020-11-29 Thread Kazuaki Ishizaki (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-33590:
-
Description: 
Sub-menus for \{Coalesce Hints for SQL Queries} and \{Adaptive Query Execution} 
are missing.

!image-2020-11-30-00-04-07-969.png!

  was:
Sub-menus for \{Coalesce Hints for SQL Queries} and \{Adaptive Query Execution} 
are missing.

!image-2020-11-30-00-03-04-814.png!


> Missing submenus for Performance Tuning in Spark SQL Guide 
> ---
>
> Key: SPARK-33590
> URL: https://issues.apache.org/jira/browse/SPARK-33590
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Kazuaki Ishizaki
>Priority: Minor
> Attachments: image-2020-11-30-00-04-07-969.png
>
>
> Sub-menus for \{Coalesce Hints for SQL Queries} and \{Adaptive Query 
> Execution} are missing.
> !image-2020-11-30-00-04-07-969.png!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33589) Add try catch when opening session

2020-11-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17240260#comment-17240260
 ] 

Apache Spark commented on SPARK-33589:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/30536

> Add try catch when opening session
> --
>
> Key: SPARK-33589
> URL: https://issues.apache.org/jira/browse/SPARK-33589
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33589) Add try catch when opening session

2020-11-29 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17240259#comment-17240259
 ] 

Apache Spark commented on SPARK-33589:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/30536

> Add try catch when opening session
> --
>
> Key: SPARK-33589
> URL: https://issues.apache.org/jira/browse/SPARK-33589
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33589) Add try catch when opening session

2020-11-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33589:


Assignee: (was: Apache Spark)

> Add try catch when opening session
> --
>
> Key: SPARK-33589
> URL: https://issues.apache.org/jira/browse/SPARK-33589
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33589) Add try catch when opening session

2020-11-29 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33589:


Assignee: Apache Spark

> Add try catch when opening session
> --
>
> Key: SPARK-33589
> URL: https://issues.apache.org/jira/browse/SPARK-33589
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33589) Add try catch when opening session

2020-11-29 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-33589:
---

 Summary: Add try catch when opening session
 Key: SPARK-33589
 URL: https://issues.apache.org/jira/browse/SPARK-33589
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.0
Reporter: Yuming Wang
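
No description is attached yet; a hedged sketch of the general pattern the title 
suggests (hypothetical helper names, not the actual Spark thrift-server classes) 
could be:
{code:scala}
// If initializing a freshly opened session throws, close it and rethrow
// instead of leaking a half-opened session.
def openSessionSafely[S](open: () => S, init: S => Unit, close: S => Unit): S = {
  val session = open()
  try {
    init(session)
    session
  } catch {
    case e: Exception =>
      close(session) // release the half-opened session before failing
      throw new RuntimeException("Failed to open session", e)
  }
}
{code}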






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org