[jira] [Created] (SPARK-48954) Rename unreleased try_remainder() function to try_mod()

2024-07-19 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-48954:


 Summary: Rename unreleased try_remainder() function to try_mod()
 Key: SPARK-48954
 URL: https://issues.apache.org/jira/browse/SPARK-48954
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau
 Fix For: 4.0.0


The try_remainder() function is the try_* version of `%` and `mod`.
Given that there is no `remainder()` function and no other product appears to offer
try_remainder(), we want to rename try_remainder() to try_mod() while it is still unreleased.
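
For illustration, a sketch of the intended behavior after the rename, assuming try_mod() keeps the semantics of the current try_remainder() (same result as mod()/`%`, but NULL instead of an error):
{code:sql}
-- assumed semantics: identical to mod(), but NULL rather than an error on failure
SELECT try_mod(10, 3);   -- 1
SELECT try_mod(10, 0);   -- NULL (mod(10, 0) raises DIVIDE_BY_ZERO under ANSI mode)
{code}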






[jira] [Created] (SPARK-48929) View fails with internal error after upgrade causes expected syntax error.

2024-07-17 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-48929:


 Summary: View fails with internal error after upgrade causes 
expected syntax error.
 Key: SPARK-48929
 URL: https://issues.apache.org/jira/browse/SPARK-48929
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau
 Fix For: 4.0.0


On older Spark:
CREATE VIEW v AS SELECT 1 ! IN (2);
SELECT * FROM v;
=> true

After upgrading to Spark 4, where '!' is no longer accepted in this position:
SELECT * FROM v;
=> Internal error

This makes the problem hard to debug.
Rather than assuming that a failure to parse a view's stored text is an internal error, we
should assume that something like an upgrade broke the view and expose the actual parse error.






[jira] [Created] (SPARK-48031) Add schema evolution options to views

2024-04-28 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-48031:


 Summary: Add schema evolution options to views 
 Key: SPARK-48031
 URL: https://issues.apache.org/jira/browse/SPARK-48031
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau


We want to give views the ability to react to changes in the resolution of their underlying
query in ways other than simply failing.
For example, a view should be able to compensate for type changes by casting the query result
to the view's column types, or to adopt changes in column arity into the view's own schema.
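
A sketch of what such options could look like on the DDL surface; the clause names below are illustrative assumptions, not the final design:
{code:sql}
-- hypothetical: let the view adopt type and arity changes of the underlying query
CREATE OR REPLACE VIEW v WITH SCHEMA EVOLUTION AS SELECT * FROM t;

-- hypothetical: keep the view's declared column types and compensate with casts
CREATE OR REPLACE VIEW w WITH SCHEMA COMPENSATION AS SELECT c1, c2 FROM t;
{code}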

 






[jira] [Created] (SPARK-47907) Put removal of '!' as a synonym for 'NOT' on a keyword level under a config

2024-04-18 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-47907:


 Summary: Put removal of '!' as a synonym for 'NOT' on a keyword 
level under a config
 Key: SPARK-47907
 URL: https://issues.apache.org/jira/browse/SPARK-47907
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau


Recently we dissolved the lexer-level equivalence between '!' and 'NOT'.
'!' is a prefix operator and a synonym for NOT only in that position, while NOT is used in many
more places in the grammar.
Given that there are a handful of known scenarios where users have exploited this undocumented
loophole, it is best to put the removal under a config (see the sketch after the examples).
Usage found so far:

`c1 ! IN (1, 2)`
`c1 ! BETWEEN 1 AND 2`
`c1 ! LIKE 'a%'`

But there are worse cases:
`c1 IS ! NULL`
`CREATE TABLE T(c1 INT ! NULL)`
or even
`CREATE TABLE IF ! EXISTS T(c1 INT)`
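
A minimal sketch of the escape hatch; the config key below is an assumption for illustration, the actual name is up to the fix:
{code:sql}
-- hypothetical legacy flag restoring '!' as a keyword-level synonym for NOT
SET spark.sql.legacy.bangEqualsNot = true;
SELECT * FROM T WHERE c1 ! IN (1, 2);   -- parses again under the legacy setting
{code}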






[jira] [Created] (SPARK-47802) Revert mapping ( star ) to named_struct ( star )

2024-04-10 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-47802:


 Summary: Revert mapping ( star ) to named_struct ( star )
 Key: SPARK-47802
 URL: https://issues.apache.org/jira/browse/SPARK-47802
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau


Turning a star within parentheses into named_struct(star), as opposed to ignoring the
parentheses, turns out to be riskier than anticipated. Given that this was done solely for
consistency with (c1, c2, ...), it is best not to go there at all.
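
For illustration, the two readings of a parenthesized star that are at stake (sketch; the exact expansion is what this revert changes back):
{code:sql}
-- with the mapping being reverted, (*) was rewritten to named_struct(*), i.e. one STRUCT column;
-- after the revert the parentheses are ignored again and (*) expands to the plain column list
SELECT (*) FROM VALUES (1, 2) AS T(c1, c2);
{code}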






[jira] [Created] (SPARK-47789) Review and improve error message texts

2024-04-09 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-47789:


 Summary: Review and improve error message texts
 Key: SPARK-47789
 URL: https://issues.apache.org/jira/browse/SPARK-47789
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau


The content of error-classes.json could use some TLC: fixing formatting, improving grammar, and
general editing.






[jira] [Updated] (SPARK-47783) Refresh error-states.json

2024-04-09 Thread Serge Rielau (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Rielau updated SPARK-47783:
-
Summary: Refresh error-states.json  (was: Refresh error-state.sql)

> Refresh error-states.json
> -
>
> Key: SPARK-47783
> URL: https://issues.apache.org/jira/browse/SPARK-47783
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Serge Rielau
>Priority: Major
>
> We want to add more SQLSTATEs to the menu to prevent collisions and do some 
> general cleanup






[jira] [Created] (SPARK-47783) Refresh error-state.sql

2024-04-09 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-47783:


 Summary: Refresh error-state.sql
 Key: SPARK-47783
 URL: https://issues.apache.org/jira/browse/SPARK-47783
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau


We want to add more SQLSTATEs to the menu to prevent collisions and do some 
general cleanup






[jira] [Created] (SPARK-47719) Change default of spark.sql.legacy.timeParserPolicy from EXCEPTION to CORRECTED

2024-04-03 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-47719:


 Summary: Change default of spark.sql.legacy.timeParserPolicy from 
EXCEPTION to CORRECTED
 Key: SPARK-47719
 URL: https://issues.apache.org/jira/browse/SPARK-47719
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau


spark.sql.legacy.timeParserPolicy was introduced in Spark 3.0 with a default of EXCEPTION.
Changing the default from EXCEPTION to CORRECTED for Spark 4.0 will reduce errors, and enough
time has passed for this to be a prudent change.
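
Users can already opt in explicitly today; the proposal only flips the default:
{code:sql}
-- what the proposed default corresponds to; currently this must be set per session or cluster
SET spark.sql.legacy.timeParserPolicy = CORRECTED;
{code}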






[jira] [Created] (SPARK-47637) Use errorCapturingIdentifier rule in more places to improve error messages

2024-03-28 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-47637:


 Summary: Use errorCapturingIdentifier rule in more places to 
improve error messages
 Key: SPARK-47637
 URL: https://issues.apache.org/jira/browse/SPARK-47637
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau


The errorCapturingIdentifier rule parses identifiers that contain '-' so that we can raise
INVALID_IDENTIFIER instead of a generic syntax error for non-delimited identifiers containing a
hyphen.
It is meant to be used wherever the context is not that of an expression (where '-' must remain
the minus operator).
This Jira replaces a few missed uses of the plain identifier rule with errorCapturingIdentifier.
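
For illustration, the kind of error the rule enables (message text paraphrased):
{code:sql}
-- non-delimited identifier containing a hyphen, in a non-expression context
CREATE TABLE my-table (c1 INT);
-- desired: [INVALID_IDENTIFIER] suggesting the backticked form `my-table`
-- instead of a generic [PARSE_SYNTAX_ERROR]
{code}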






[jira] [Created] (SPARK-47571) date_format() java.lang.ArithmeticException: long overflow for large dates

2024-03-26 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-47571:


 Summary: date_format() java.lang.ArithmeticException: long 
overflow for large dates
 Key: SPARK-47571
 URL: https://issues.apache.org/jira/browse/SPARK-47571
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau


The following works for CAST, but not for date_format():

select cast(cast('5881580' AS DATE) AS STRING);
+5881580-01-01

spark-sql (default)> select date_format(cast('5881580' AS DATE), 'yyy-mm-dd');
24/03/26 11:08:23 ERROR SparkSQLDriver: Failed in [select date_format(cast('5881580' AS DATE), 'yyy-mm-dd')]
java.lang.ArithmeticException: long overflow
 at java.base/java.lang.Math.multiplyExact(Math.java:1004)
 at org.apache.spark.sql.catalyst.util.SparkDateTimeUtils.instantToMicros(SparkDateTimeUtils.scala:122)
 at org.apache.spark.sql.catalyst.util.SparkDateTimeUtils.instantToMicros$(SparkDateTimeUtils.scala:116)
 at org.apache.spark.sql.catalyst.util.DateTimeUtils$.instantToMicros(DateTimeUtils.scala:41)
 at org.apache.spark.sql.catalyst.util.SparkDateTimeUtils.daysToMicros(SparkDateTimeUtils.scala:174)
 at org.apache.spark.sql.catalyst.util.SparkDateTimeUtils.daysToMicros$(SparkDateTimeUtils.scala:172)
 at org.apache.spark.sql.catalyst.util.DateTimeUtils$.daysToMicros(DateTimeUtils.scala:41)
 at org.apache.spark.sql.catalyst.expressions.Cast.$anonfun$castToTimestamp$14(Cast.scala:642)
 at scala.runtime.java8.JFunction1$mcJI$sp.apply(JFunction1$mcJI$sp.scala:17)
 at org.apache.spark.sql.catalyst.expressions.Cast.buildCast(Cast.scala:557)
 at org.apache.spark.sql.catalyst.expressions.Cast.$anonfun$castToTimestamp$13(Cast.scala:642)
 at org.apache.spark.sql.catalyst.expressions.Cast.nullSafeEval(Cast.scala:1170)
 at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:558)





[jira] [Updated] (SPARK-47492) Relax definition of whitespace in lexer

2024-03-20 Thread Serge Rielau (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Rielau updated SPARK-47492:
-
Summary: Relax definition of whitespace in lexer  (was: Wide definition of 
whitespace in lexer)

> Relax definition of whitespace in lexer
> ---
>
> Key: SPARK-47492
> URL: https://issues.apache.org/jira/browse/SPARK-47492
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Serge Rielau
>Priority: Major
>
> There have been multiple incidents where queries "copied" in from other
> sources resulted in "weird" syntax errors which ultimately boiled down to
> whitespace characters which the lexer does not recognize as such.






[jira] [Created] (SPARK-47492) Wide definition of whitespace in lexer

2024-03-20 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-47492:


 Summary: Wide definition of whitespace in lexer
 Key: SPARK-47492
 URL: https://issues.apache.org/jira/browse/SPARK-47492
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau


There have been multiple incidents where queries "copied" in from other sources resulted in
"weird" syntax errors which ultimately boiled down to whitespace characters which the lexer does
not recognize as such.
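
For illustration, a sketch of the failure mode; a typical culprit is the non-breaking space U+00A0 that rich-text editors insert:
{code:sql}
-- assume the gap between SELECT and 1 is U+00A0 rather than a regular space:
-- today the lexer rejects it with a syntax error; with a relaxed whitespace
-- definition the query would parse like any other
SELECT 1;
{code}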






[jira] [Created] (SPARK-47467) Error message regressed when creating hive table with illegal column name

2024-03-19 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-47467:


 Summary: Error message regressed when creating hive table with 
illegal column name
 Key: SPARK-47467
 URL: https://issues.apache.org/jira/browse/SPARK-47467
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau


The following statement:
CREATE TABLE test5(`北京, ` INT) USING HIVE;

used to result in:
[INVALID_HIVE_COLUMN_NAME|https://docs.databricks.com/error-messages/error-classes.html#invalid_hive_column_name] Cannot create the table `hive_metastore`.`srielau`.`test5` having the column `北京, ` whose name contains invalid characters ',' in Hive metastore. SQLSTATE: 42K05

Now it results in:
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe: columns has 2 elements while columns.types has 1 elements!)
 at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$withClient$2(HiveExternalCatalog.scala:168)
 at org.apache.spark.sql.hive.HiveExternalCatalog.maybeSynchronized(HiveExternalCatalog.scala:115)
 at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$withClient$1(HiveExternalCatalog.scala:153)
 at com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:405)
 at com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:391)
 at com.databricks.spark.util.SparkDatabricksProgressReporter$.withStatusCode(ProgressReporter.scala:34)
 at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:152)
 at org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:312)
 at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.$anonfun$createTable$1(ExternalCatalogWithListener.scala:122)
 at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
 at org.apache.spark.sql.catalyst.MetricKeyUtils$.measure(MetricKey.scala:661)
 at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.$anonfun$profile$1(ExternalCatalogWithListener.scala:54)
 at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
 at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.profile(ExternalCatalogWithListener.scala:53)

This may be related to:
https://github.com/apache/spark/pull/45180






[jira] [Created] (SPARK-47427) Support trailing commas in select list

2024-03-15 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-47427:


 Summary: Support trailing commas in select list
 Key: SPARK-47427
 URL: https://issues.apache.org/jira/browse/SPARK-47427
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau


DuckDB has popularized allowing trailing commas in the SELECT list.
The benefit of this ability is that it becomes easy to add, remove, or comment out expressions
at the end of the select list:
{noformat}
SELECT c1,
       /* c2 */
  FROM T;

vs.

SELECT c1
       /* , c2 */
  FROM T;
{noformat}
Recently Snowflake adopted this usability feature as well.






[jira] [Created] (SPARK-47382) SPARK_JOB_CANCELLED is mislabeled as a system error

2024-03-13 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-47382:


 Summary: SPARK_JOB_CANCELLED is mislabeled as a system error
 Key: SPARK-47382
 URL: https://issues.apache.org/jira/browse/SPARK-47382
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau


This relates to:
[https://github.com/apache/spark/pull/43926]

The proper SQLSTATE should be 57014, "processing was canceled as requested".






[jira] [Created] (SPARK-47344) Enhance error message for invalid identifiers that need backticks

2024-03-11 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-47344:


 Summary: Enhance error message for invalid identifiers that need 
backticks
 Key: SPARK-47344
 URL: https://issues.apache.org/jira/browse/SPARK-47344
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau


We detect patterns like "my-tab" and raise a meaningful INVALID_IDENTIFIER error when the
identifier is not surrounded by backticks.
In this ticket we want to extend this effort beyond dashes to other characters that require
quoting.
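
For illustration, the current state and the kind of coverage this ticket asks for (messages paraphrased, examples are sketches):
{code:sql}
-- already handled today: a hyphen inside an unquoted identifier
SELECT * FROM my-tab;   -- [INVALID_IDENTIFIER] ... quote it as `my-tab`

-- desired: other characters that require backticks should get the same treatment
SELECT * FROM my@tab;   -- today this surfaces as a generic [PARSE_SYNTAX_ERROR]
{code}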






[jira] [Created] (SPARK-47308) LATERAL regresses correlation name resolution

2024-03-06 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-47308:


 Summary: LATERAL regresses correlation name resolution
 Key: SPARK-47308
 URL: https://issues.apache.org/jira/browse/SPARK-47308
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.5.0, 3.3.0
Reporter: Serge Rielau


{code:java}
CREATE TABLE persons(name STRING);
INSERT INTO persons VALUES('Table: persons');
CREATE OR REPLACE TABLE women(name STRING);
INSERT INTO women VALUES('Table: women');

-- This works:
SELECT (SELECT max(folk.id) 
  FROM persons AS men(id),
   (SELECT name) AS folk(id))
  FROM women;
Table: women

-- This does not:
SELECT (SELECT max(folk.id) 
  FROM persons AS men(id), 
LATERAL (SELECT name) AS folk(id))
  FROM women;
[UNRESOLVED_COLUMN.WITHOUT_SUGGESTION] A column, variable, or function 
parameter with name `name` cannot be resolved.  SQLSTATE: 42703;{code}

This is surprising: LATERAL should be strictly additive to the name resolution rules.






[jira] [Created] (SPARK-47192) Convert _LEGACY_ERROR_TEMP_0035 (unsupported hive feature)

2024-02-27 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-47192:


 Summary: Convert _LEGACY_ERROR_TEMP_0035 (unsupported hive feature)
 Key: SPARK-47192
 URL: https://issues.apache.org/jira/browse/SPARK-47192
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau


Old:
> GRANT ROLE;
_LEGACY_ERROR_TEMP_0035
Operation not allowed: grant role. (line 1, pos 0)

New:
error class: HIVE_OPERATION_NOT_SUPPORTED
The Hive operation <operation> is not supported. (line 1, pos 0)
sqlstate: 0A000






[jira] [Created] (SPARK-47033) EXECUTE IMMEDIATE USING does not recognize session variable names

2024-02-13 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-47033:


 Summary: EXECUTE IMMEDIATE USING does not recognize session 
variable names
 Key: SPARK-47033
 URL: https://issues.apache.org/jira/browse/SPARK-47033
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau


{noformat}
DECLARE parm = 'Hello';

EXECUTE IMMEDIATE 'SELECT :parm' USING parm;
[ALL_PARAMETERS_MUST_BE_NAMED] Using name parameterized queries requires all 
parameters to be named. Parameters missing names: "parm". SQLSTATE: 07001

EXECUTE IMMEDIATE 'SELECT :parm' USING parm AS parm;
Hello

{noformat}
Variables are like column references: they act as their own aliases and thus should not have to
be explicitly named to associate with a named parameter of the same name.

Note that, unlike for PySpark, this should be case-insensitive (not yet verified).






[jira] [Created] (SPARK-46993) Allow session variables in more places such as from_json for schema

2024-02-06 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-46993:


 Summary: Allow session variables in more places such as from_json 
for schema
 Key: SPARK-46993
 URL: https://issues.apache.org/jira/browse/SPARK-46993
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.2
Reporter: Serge Rielau


It appears we do not allow a session variable to provide the schema for from_json().
This is likely a generic restriction related to constant folding of the schema argument.
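
A repro sketch of the kind of statement we would like to work (variable name and data are illustrative):
{code:sql}
DECLARE schema_str = 'a INT, b STRING';
SELECT from_json('{"a": 1, "b": "x"}', schema_str);
-- currently rejected because the schema argument is not recognized as a constant
{code}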






[jira] [Created] (SPARK-46908) Extend SELECT * support outside of select list

2024-01-29 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-46908:


 Summary: Extend SELECT * support outside of select list
 Key: SPARK-46908
 URL: https://issues.apache.org/jira/browse/SPARK-46908
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau


Traditionally, * is confined to the select list, and there to the top level of expressions.
Spark does, in an undocumented fashion, support * as a function argument list within the SELECT
list.
Here we want to expand upon this capability by adding the WHERE clause (Filter) as well as a
couple more scenarios such as row value constructors and the IN operator.
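
Illustrative sketches of the kind of queries this would allow; the exact semantics of how * expands in each position are to be defined by the feature:
{code:sql}
-- existing, undocumented: star as a function argument list
SELECT concat_ws(',', *) FROM VALUES (1, 'a') AS T(c1, c2);

-- proposed (illustrative): star in the WHERE clause and in the IN operator
SELECT * FROM T WHERE (*) IN (SELECT * FROM S);
{code}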






[jira] [Commented] (SPARK-46810) Clarify error class terminology

2024-01-27 Thread Serge Rielau (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811631#comment-17811631
 ] 

Serge Rielau commented on SPARK-46810:
--

Yes I prefer option 1. 
Agreement from [~maxgekk] can't hurt.

> Clarify error class terminology
> ---
>
> Key: SPARK-46810
> URL: https://issues.apache.org/jira/browse/SPARK-46810
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Priority: Minor
>  Labels: pull-request-available
>
> We use inconsistent terminology when talking about error classes. I'd like to 
> get some clarity on that before contributing any potential improvements to 
> this part of the documentation.
> Consider 
> [INCOMPLETE_TYPE_DEFINITION|https://spark.apache.org/docs/3.5.0/sql-error-conditions-incomplete-type-definition-error-class.html].
>  It has several key pieces of hierarchical information that have inconsistent 
> names throughout our documentation and codebase:
>  * 42
>  ** K01
>  *** INCOMPLETE_TYPE_DEFINITION
>  **** ARRAY
>  **** MAP
>  **** STRUCT
> What are the names of these different levels of information?
> Some examples of inconsistent terminology:
>  * [Over 
> here|https://spark.apache.org/docs/latest/sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation]
>  we call 42 the "class". Yet on the main page for INCOMPLETE_TYPE_DEFINITION 
> we call that an "error class". So what exactly is a class, the 42 or the 
> INCOMPLETE_TYPE_DEFINITION?
>  * [Over 
> here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/README.md#L122]
>  we call K01 the "subclass". But [over 
> here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/error-classes.json#L1452-L1467]
>  we call the ARRAY, MAP, and STRUCT the subclasses. And on the main page for 
> INCOMPLETE_TYPE_DEFINITION we call those same things "derived error classes". 
> So what exactly is a subclass?
>  * [On this 
> page|https://spark.apache.org/docs/3.5.0/sql-error-conditions.html#incomplete_type_definition]
>  we call INCOMPLETE_TYPE_DEFINITION an "error condition", though in other 
> places we refer to it as an "error class".
> I don't think we should leave this status quo as-is. I see a couple of ways 
> to fix this.
> h1. Option 1: INCOMPLETE_TYPE_DEFINITION becomes an "Error Condition"
> One solution is to use the following terms:
>  * Error class: 42
>  * Error sub-class: K01
>  * Error state: 42K01
>  * Error condition: INCOMPLETE_TYPE_DEFINITION
>  * Error sub-condition: ARRAY, MAP, STRUCT
> Pros: 
>  * This terminology seems (to me at least) the most natural and intuitive.
>  * It may also match the SQL standard.
> Cons:
>  * We use {{errorClass}} [all over our 
> codebase|https://github.com/apache/spark/blob/15c9ec7ca3b66ec413b7964a374cb9508a80/common/utils/src/main/scala/org/apache/spark/SparkException.scala#L30]
>  – literally in thousands of places – to refer to strings like 
> INCOMPLETE_TYPE_DEFINITION.
>  ** It's probably not practical to update all these usages to say 
> {{errorCondition}} instead, so if we go with this approach there will be a 
> divide between the terminology we use in user-facing documentation vs. what 
> the code base uses.
>  ** We can perhaps rename the existing {{error-classes.json}} to 
> {{error-conditions.json}} but clarify the reason for this divide between code 
> and user docs in the documentation for {{ErrorClassesJsonReader}} .
> h1. Option 2: 42 becomes an "Error Category"
> Another approach is to use the following terminology:
>  * Error category: 42
>  * Error sub-category: K01
>  * Error state: 42K01
>  * Error class: INCOMPLETE_TYPE_DEFINITION
>  * Error sub-classes: ARRAY, MAP, STRUCT
> Pros:
>  * We continue to use "error class" as we do today in our code base.
>  * The change from calling "42" a class to a category is low impact and may 
> not show up in user-facing documentation at all. (See my side note below.)
> Cons:
>  * These terms may not align with the SQL standard.
>  * We will have to retire the term "error condition", which we have [already 
> used|https://github.com/apache/spark/blob/e7fb0ad68f73d0c1996b19c9e139d70dcc97a8c4/docs/sql-error-conditions.md]
>  in user-facing documentation.
> —
> Side note: In either case, I believe talking about "42" and "K01" – 
> regardless of what we end up calling them – in front of users is not helpful. 
> I don't think anybody cares what "42" by itself means, or what "K01" by 
> itself means. Accordingly, we should limit how much we talk about these 
> concepts in the user-facing documentation.




[jira] [Commented] (SPARK-46810) Clarify error class terminology

2024-01-27 Thread Serge Rielau (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811606#comment-17811606
 ] 

Serge Rielau commented on SPARK-46810:
--

[~nchammas] 
ISO/IEC 9075-2:2016(E) 24.1 SQLSTATE
The character string value returned in an SQLSTATE parameter comprises a 2-character class code
followed by a 3-character subclass code, each with an implementation-defined character set that
has a one-octet character encoding form and is restricted to <digit>s and
<simple Latin upper case letter>s. Table 38, “SQLSTATE class and subclass codes”, specifies the
class code for each condition and the subclass code or codes for each class code.

Class codes that begin with one of the <digit>s '0', '1', '2', '3', or '4' or one of the
<simple Latin upper case letter>s 'A', 'B', 'C', 'D', 'E', 'F', 'G', or 'H' are returned only
for conditions defined in ISO/IEC 9075 or in any other International Standard. The range of such
class codes is called standard-defined classes. Some such class codes are reserved for use by
specific International Standards, as specified elsewhere in this Clause. Subclass codes
associated with such classes that also begin with one of those 13 characters are returned only
for conditions defined in ISO/IEC 9075 or some other International Standard. The range of such
subclass codes is called standard-defined subclasses. Subclass codes associated with such
classes that begin with one of the <digit>s '5', '6', '7', '8', or '9' or one of the
<simple Latin upper case letter>s 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
'U', 'V', 'W', 'X', 'Y', or 'Z' are reserved for implementation-defined conditions and are
called implementation-defined subclasses.

Class codes that begin with one of the <digit>s '5', '6', '7', '8', or '9' or one of the
<simple Latin upper case letter>s 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
'U', 'V', 'W', 'X', 'Y', or 'Z' are reserved for implementation-defined exception conditions and
are called implementation-defined classes. All subclass codes except '000', which means no
subclass, associated with such classes are reserved for implementation-defined conditions and
are called implementation-defined subclasses. An implementation-defined completion condition
shall be indicated by returning an implementation-defined subclass in conjunction with one of
the classes successful completion, warning, or no data.

I'm fine with the renaming of error class to error condition and subcondition.

> Clarify error class terminology
> ---
>
> Key: SPARK-46810
> URL: https://issues.apache.org/jira/browse/SPARK-46810
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Priority: Minor
>  Labels: pull-request-available
>
> We use inconsistent terminology when talking about error classes. I'd like to 
> get some clarity on that before contributing any potential improvements to 
> this part of the documentation.
> Consider 
> [INCOMPLETE_TYPE_DEFINITION|https://spark.apache.org/docs/3.5.0/sql-error-conditions-incomplete-type-definition-error-class.html].
>  It has several key pieces of hierarchical information that have inconsistent 
> names throughout our documentation and codebase:
>  * 42
>  ** K01
>  *** INCOMPLETE_TYPE_DEFINITION
>  **** ARRAY
>  **** MAP
>  **** STRUCT
> What are the names of these different levels of information?
> Some examples of inconsistent terminology:
>  * [Over 
> here|https://spark.apache.org/docs/latest/sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation]
>  we call 42 the "class". Yet on the main page for INCOMPLETE_TYPE_DEFINITION 
> we call that an "error class". So what exactly is a class, the 42 or the 
> INCOMPLETE_TYPE_DEFINITION?
>  * [Over 
> here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/README.md#L122]
>  we call K01 the "subclass". But [over 
> here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/error-classes.json#L1452-L1467]
>  we call the ARRAY, MAP, and STRUCT the subclasses. And on the main page for 
> INCOMPLETE_TYPE_DEFINITION we call those same things "derived error classes". 
> So what exactly is a subclass?
>  * [On this 
> page|https://spark.apache.org/docs/3.5.0/sql-error-conditions.html#incomplete_type_definition]
>  we call INCOMPLETE_TYPE_DEFINITION an "error condition", though in other 
> places we refer to it as an "error class".
> I don't think we should leave this status quo as-is. I see a couple of ways 
> to fix this.
> h1. Option 1: INCOMPLETE_TYPE_DEFINITION becomes an "Error Condition"
> One solution is to use the following terms:
>  * Error class: 42
>  * Error sub-class: K01
>  * Error state: 42K01
>  * Error condition: INCOMPLETE_TYPE_DEFINITI

[jira] [Created] (SPARK-46782) Bad SQLSTATE "ID001" for INVALID_INVERSE_DISTRIBUTION_FUNCTION

2024-01-19 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-46782:


 Summary: Bad SQLSTATE "ID001" for 
INVALID_INVERSE_DISTRIBUTION_FUNCTION 
 Key: SPARK-46782
 URL: https://issues.apache.org/jira/browse/SPARK-46782
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau


INVALID_INVERSE_DISTRIBUTION_FUNCTION
The "ID" SQLSTATE class is undefined, and it is heavy-handed to consume an entire class for a
single function.
In Spark we use the K** subclass range to expand the standard classes with Spark-private states.
Since this error appears to be a compile-time error, I propose "42" as the class.
The next free slot in 42K** is 42K0K.






[jira] [Created] (SPARK-46625) IDENTIFIER clause does not work with CTE reference

2024-01-08 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-46625:


 Summary: IDENTIFIER clause does not work with CTE reference
 Key: SPARK-46625
 URL: https://issues.apache.org/jira/browse/SPARK-46625
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.3.4
Reporter: Serge Rielau


The IDENTIFIER clause does not pick up a CTE definition:

DECLARE agg = 'max';
DECLARE col = 'c1';
DECLARE tab = 'T';

WITH S(c1, c2) AS (VALUES(1, 2), (2, 3)),
      T(c1, c2) AS (VALUES ('a', 'b'), ('c', 'd'))
SELECT IDENTIFIER(agg)(IDENTIFIER(col)) FROM IDENTIFIER(tab);

[TABLE_OR_VIEW_NOT_FOUND] The table or view `T` cannot be found. Verify the 
spelling and correctness of the schema and catalog.

If you did not qualify the name with a schema, verify the current_schema() 
output, or qualify the name with the correct schema and catalog.

To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS. 
SQLSTATE: 42P01; line 3 pos 45;

'Project [unresolvedalias(expressionwithunresolvedidentifier('agg, 
org.apache.spark.sql.catalyst.parser.AstBuilder$$Lambda$2785/0x009002071490@126688a7))]

+- 'UnresolvedRelation [T], [], false






[jira] [Created] (SPARK-46410) Assign error classes/subclasses to JdbcUtils.classifyException

2023-12-14 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-46410:


 Summary: Assign error classes/subclasses to 
JdbcUtils.classifyException
 Key: SPARK-46410
 URL: https://issues.apache.org/jira/browse/SPARK-46410
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.5.0
Reporter: Serge Rielau


This is a follow-up to SPARK-46393.
We should raise distinct error classes for the different kinds of invokers of
JdbcUtils.classifyException.






[jira] [Created] (SPARK-46372) "Invalid call to toAttribute on unresolved object" instead UNRESOLVED_COLUMN.WITH_SUGGESTION on INSERT statement

2023-12-11 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-46372:


 Summary: "Invalid call to toAttribute on unresolved object" 
instead UNRESOLVED_COLUMN.WITH_SUGGESTION on INSERT statement
 Key: SPARK-46372
 URL: https://issues.apache.org/jira/browse/SPARK-46372
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.5.0
Reporter: Serge Rielau


{{CREATE TABLE rec(n INT, sm INT);}}

SELECT n + 1, n + 1 + sm FROM rec WHERE rec = 8;

[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `rec`
cannot be resolved. Did you mean one of the following? [`n`, `sm`]. SQLSTATE: 42703;
line 1 pos 40;

But when the same query is placed in an INSERT:

INSERT INTO rec SELECT n + 1, n + 1 + sm FROM rec WHERE rec = 8;

Invalid call to toAttribute on unresolved object

1. This appears to be a system (internal) error, and we should raise it as such.
2. Clearly we missed, or did not get to, the point where the proper error would be raised.

Stacktrace:
{quote}scala> spark.sql("INSERT INTO rec SELECT n + 1, n + 1 + sm FROM rec 
WHERE rec = 8").show();

23/12/11 18:12:25 WARN ObjectStore: Version information not found in metastore. 
hive.metastore.schema.verification is not enabled so recording the schema 
version 2.3.0

23/12/11 18:12:25 WARN ObjectStore: setMetaStoreSchemaVersion called but 
recording version is disabled: version = 2.3.0, comment = Set by MetaStore 
serge.rielau@10.240.1.53

org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to 
toAttribute on unresolved object

  at 
org.apache.spark.sql.catalyst.analysis.UnresolvedAlias.toAttribute(unresolved.scala:707)

  at 
org.apache.spark.sql.catalyst.plans.logical.Project.$anonfun$output$1(basicLogicalOperators.scala:74)

  at scala.collection.immutable.List.map(List.scala:246)

  at scala.collection.immutable.List.map(List.scala:79)

  at 
org.apache.spark.sql.catalyst.plans.logical.Project.output(basicLogicalOperators.scala:74)

  at 
org.apache.spark.sql.hive.HiveAnalysis$$anonfun$apply$3.applyOrElse(HiveStrategies.scala:166)

  at 
org.apache.spark.sql.hive.HiveAnalysis$$anonfun$apply$3.applyOrElse(HiveStrategies.scala:161)

  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$2(AnalysisHelper.scala:170)

  at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:83)

  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:170)

  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)

  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:168)

  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:164)

  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:33)

  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning(AnalysisHelper.scala:99)

  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning$(AnalysisHelper.scala:96)

  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsWithPruning(LogicalPlan.scala:33)

  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:76)

  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:75)

  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:33)

  at org.apache.spark.sql.hive.HiveAnalysis$.apply(HiveStrategies.scala:161)

  at org.apache.spark.sql.hive.HiveAnalysis$.apply(HiveStrategies.scala:160)

  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:222)

  at scala.collection.LinearSeqOps.foldLeft(LinearSeq.scala:183)

  at scala.collection.LinearSeqOps.foldLeft$(LinearSeq.scala:179)

  at scala.collection.immutable.List.foldLeft(List.scala:79)

  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:219)

  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:211)

  at scala.collection.immutable.List.foreach(List.scala:333)

  at 
org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:211)

  at 
org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:224)

  at 
org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$execute$1(Analyzer.scala:220)

  at 
org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withNewAnalysisContext(Analyzer.scala:176)

  at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:220)

  at 

[jira] [Created] (SPARK-46141) Change default of spark.sql.legacy.ctePrecedencePolicy from EXCEPTION to CORRECTED

2023-11-28 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-46141:


 Summary: Change default of spark.sql.legacy.ctePrecedencePolicy 
from EXCEPTION to CORRECTED
 Key: SPARK-46141
 URL: https://issues.apache.org/jira/browse/SPARK-46141
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau


spark.sql.legacy.ctePrecedencePolicy has been around for years and defaults to EXCEPTION.
It is high time that we change the default to CORRECTED.
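
The config governs how nested CTEs with conflicting names are resolved; a sketch of the case it affects, with the per-setting outcomes as I read the policy names:
{code:sql}
WITH t AS (SELECT 1 AS c)
SELECT * FROM (
  WITH t AS (SELECT 2 AS c)
  SELECT * FROM t
) AS sub;
-- EXCEPTION (current default): fails because the name `t` is defined twice
-- CORRECTED (proposed default): the inner definition takes precedence and the query returns 2
{code}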






[jira] [Created] (SPARK-46068) Improve error message when using a string literal where only an identifier can go

2023-11-22 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-46068:


 Summary: Improve error message when using a string literal where 
only an identifier can go
 Key: SPARK-46068
 URL: https://issues.apache.org/jira/browse/SPARK-46068
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.2
Reporter: Serge Rielau


The following:
{noformat}
spark-sql (default)> select * from "t";

[PARSE_SYNTAX_ERROR] Syntax error at or near '"t"'. SQLSTATE: 42601 (line 1, pos 14)

== SQL ==
select * from "t"
--^^^
{noformat}
... is confusing if one is used to double quotes for identifiers.
Similarly, it is easy to mix up ' and `.
So we would like to return an error that clearly states that a string literal was given where an
identifier was expected. We can also propose enabling spark.sql.ansi.double_quoted_identifiers
in that case.






[jira] [Updated] (SPARK-45595) Expose SQLSTATE in error message

2023-10-18 Thread Serge Rielau (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Rielau updated SPARK-45595:
-
Summary: Expose SQLSTATE in error message  (was: Expose SQLSTATRE in 
errormessage)

> Expose SQLSTATE in error message
> 
>
> Key: SPARK-45595
> URL: https://issues.apache.org/jira/browse/SPARK-45595
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Serge Rielau
>Priority: Major
>
> When using spark.sql.error.messageFormat in MINIMAL or STANDARD mode the
> SQLSTATE is exposed.
> We want to extend this to PRETTY mode, now that all errors have SQLSTATEs.
> We propose to trail the SQLSTATE after the text message, so it does not take
> away from the reading experience of the message, while still being easily
> found by tooling or humans:
> [<error class>] <message text> SQLSTATE: <SQLSTATE>
> Example:
> [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return
> NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
> SQLSTATE: 22013
> == SQL (line 1, position 8) ==
> SELECT 1/0
>    ^^^
> Other options considered have been:
> [DIVIDE_BY_ZERO](22013) Division by zero. Use `try_divide` to tolerate divisor being 0 and
> return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
> == SQL (line 1, position 8) ==
> SELECT 1/0
>    ^^^
> and
> [DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return
> NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
> == SQL (line 1, position 8) ==
> SELECT 1/0
>    ^^^
> SQLSTATE: 22013






[jira] [Created] (SPARK-45595) Expose SQLSTATRE in errormessage

2023-10-18 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-45595:


 Summary: Expose SQLSTATRE in errormessage
 Key: SPARK-45595
 URL: https://issues.apache.org/jira/browse/SPARK-45595
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau


When using spark.sql.error.messageFormat in MINIMAL or STANDARD mode the SQLSTATE is exposed.
We want to extend this to PRETTY mode, now that all errors have SQLSTATEs.

We propose to trail the SQLSTATE after the text message, so it does not take away from the
reading experience of the message, while still being easily found by tooling or humans:

[<error class>] <message text> SQLSTATE: <SQLSTATE>

Example:

[DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22013
== SQL (line 1, position 8) ==
SELECT 1/0
   ^^^

Other options considered have been:

[DIVIDE_BY_ZERO](22013) Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
== SQL (line 1, position 8) ==
SELECT 1/0
   ^^^

and

[DIVIDE_BY_ZERO] Division by zero. Use `try_divide` to tolerate divisor being 0 and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
== SQL (line 1, position 8) ==
SELECT 1/0
   ^^^
SQLSTATE: 22013






[jira] [Created] (SPARK-45581) Make SQLSTATEs mandatory

2023-10-17 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-45581:


 Summary: Make SQLSTATEs mandatory
 Key: SPARK-45581
 URL: https://issues.apache.org/jira/browse/SPARK-45581
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau


All well-defined (non-_LEGACY) error classes have been issued SQLSTATEs.
To keep this clean, it is time to enforce that any new error class must come with a SQLSTATE
going forward.






[jira] [Resolved] (SPARK-45490) Replace: _LEGACY_ERROR_TEMP_2151 with a proper error class

2023-10-11 Thread Serge Rielau (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Rielau resolved SPARK-45490.
--
Resolution: Cannot Reproduce

Seems to have been implemented as: EXPRESSION_DECODING_FAILED

> Replace: _LEGACY_ERROR_TEMP_2151 with a proper error class
> --
>
> Key: SPARK-45490
> URL: https://issues.apache.org/jira/browse/SPARK-45490
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Serge Rielau
>Priority: Major
>
> {code:java}
> def expressionDecodingError(e: Exception, expressions: Seq[Expression]): 
> SparkRuntimeException = {
>   new SparkRuntimeException(
> errorClass = "_LEGACY_ERROR_TEMP_2151",
> messageParameters = Map(
>   "e" -> e.toString(),
>   "expressions" -> expressions.map(
> _.simpleString(SQLConf.get.maxToStringFields)).mkString("\n")),
> cause = e)
> } {code}






[jira] [Created] (SPARK-45493) Replace: _LEGACY_ERROR_TEMP_2187 with a better error message

2023-10-10 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-45493:


 Summary: Replace: _LEGACY_ERROR_TEMP_2187 with a better error 
message
 Key: SPARK-45493
 URL: https://issues.apache.org/jira/browse/SPARK-45493
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau


{code:java}
def convertHiveTableToCatalogTableError(
e: SparkException, dbName: String, tableName: String): Throwable = {
  new SparkException(
errorClass = "_LEGACY_ERROR_TEMP_2187",
messageParameters = Map(
  "message" -> e.getMessage,
  "dbName" -> dbName,
  "tableName" -> tableName),
cause = e)
} {code}






[jira] [Created] (SPARK-45492) Replace: _LEGACY_ERROR_TEMP_2152 with a better error class

2023-10-10 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-45492:


 Summary: Replace: _LEGACY_ERROR_TEMP_2152 with a better error class
 Key: SPARK-45492
 URL: https://issues.apache.org/jira/browse/SPARK-45492
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau


{code:java}
def expressionEncodingError(e: Exception, expressions: Seq[Expression]): 
SparkRuntimeException = {
  new SparkRuntimeException(
errorClass = "_LEGACY_ERROR_TEMP_2152",
messageParameters = Map(
  "e" -> e.toString(),
  "expressions" -> expressions.map(
_.simpleString(SQLConf.get.maxToStringFields)).mkString("\n")),
cause = e)
} {code}






[jira] [Created] (SPARK-45491) Replace: _LEGACY_ERROR_TEMP_2196 with a better error class

2023-10-10 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-45491:


 Summary: Replace: _LEGACY_ERROR_TEMP_2196 with a better error class
 Key: SPARK-45491
 URL: https://issues.apache.org/jira/browse/SPARK-45491
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau


{code:java}
def cannotFetchTablesOfDatabaseError(dbName: String, e: Exception): Throwable = 
{
  new SparkException(
errorClass = "_LEGACY_ERROR_TEMP_2196",
messageParameters = Map(
  "dbName" -> dbName),
cause = e)
} {code}






[jira] [Created] (SPARK-45490) Replace: _LEGACY_ERROR_TEMP_2151 with a proper error class

2023-10-10 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-45490:


 Summary: Replace: _LEGACY_ERROR_TEMP_2151 with a proper error class
 Key: SPARK-45490
 URL: https://issues.apache.org/jira/browse/SPARK-45490
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau


{code:java}
def expressionDecodingError(e: Exception, expressions: Seq[Expression]): 
SparkRuntimeException = {
  new SparkRuntimeException(
errorClass = "_LEGACY_ERROR_TEMP_2151",
messageParameters = Map(
  "e" -> e.toString(),
  "expressions" -> expressions.map(
_.simpleString(SQLConf.get.maxToStringFields)).mkString("\n")),
cause = e)
} {code}






[jira] [Created] (SPARK-45489) Replace: _LEGACY_ERROR_TEMP_2134 with a regular error class

2023-10-10 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-45489:


 Summary: Replace:   _LEGACY_ERROR_TEMP_2134 with a regular error 
class
 Key: SPARK-45489
 URL: https://issues.apache.org/jira/browse/SPARK-45489
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau


This is a frequently seen error we should convert:

def cannotParseStringAsDataTypeError(pattern: String, value: String, dataType: DataType)
    : SparkRuntimeException = {
  new SparkRuntimeException(
    errorClass = "_LEGACY_ERROR_TEMP_2134",
    messageParameters = Map(
      "value" -> toSQLValue(value),
      "pattern" -> toSQLValue(pattern),
      "dataType" -> dataType.toString))
}






[jira] [Created] (SPARK-45487) Replace: _LEGACY_ERROR_TEMP_3007

2023-10-10 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-45487:


 Summary: Replace: _LEGACY_ERROR_TEMP_3007
 Key: SPARK-45487
 URL: https://issues.apache.org/jira/browse/SPARK-45487
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau


def checkpointRDDBlockIdNotFoundError(rddBlockId: RDDBlockId): Throwable = {
  new SparkException(
    errorClass = "_LEGACY_ERROR_TEMP_3007",
    messageParameters = Map("rddBlockId" -> s"$rddBlockId"),
    cause = null
  )
}

This error condition appears to be quite common, so we should convert it to a proper error class.






[jira] [Created] (SPARK-45367) Add errorclass and sqlstate for: _LEGACY_ERROR_TEMP_1273

2023-09-27 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-45367:


 Summary: Add errorclass and sqlstate for: _LEGACY_ERROR_TEMP_1273
 Key: SPARK-45367
 URL: https://issues.apache.org/jira/browse/SPARK-45367
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.5.0
Reporter: Serge Rielau


This seems to be a very common error.






[jira] [Created] (SPARK-45132) Fix IDENTIFIER clause for functions

2023-09-12 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-45132:


 Summary: Fix IDENTIFIER clause for functions
 Key: SPARK-45132
 URL: https://issues.apache.org/jira/browse/SPARK-45132
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.5.0
Reporter: Serge Rielau


Due to a quirk in the grammar, IDENTIFIER('foo')(<arguments>) does or does not resolve depending
on the arguments passed to it.

Example:
SELECT IDENTIFIER('abs')(-1) works, but
SELECT IDENTIFIER('abs')(c1) FROM VALUES(-1) AS T(c1) does not.






[jira] [Comment Edited] (SPARK-44840) array_insert() gives wrong results for negative index

2023-08-20 Thread Serge Rielau (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756574#comment-17756574
 ] 

Serge Rielau edited comment on SPARK-44840 at 8/20/23 7:41 PM:
---

[~srowen] There is no standard as such.
However, there are multiple reasons not to be compatible with Snowflake:
1. Precedent: SUBSTR('Hello', 1, 1) => 'H', SUBSTR('Hello', -1, 1) => 'o' (not 'l').
2. Array access has been a mixed bag for us (some 0-based, some 1-based), but we have tried to
move towards 1-based as well; e.g., element_at() is 1-based, and we use -1 (!) to get the last
element.
3. Snowflake had no choice but to use -1 for the second-to-last element because 1 is their
second element. Because they are 0-based, they are unable to use array_insert() to append an
element (short of passing (length - 1) as a parameter). So the proposal is objectively more
powerful.


was (Author: JIRAUSER288374):
[~srowen] There is no standard as such.
However, there are multiple reasons not to be compatible with Snowflake:
1. Precedence: SUBSTR('Hello', 1, 1) => 'H', SUBSTR('Hello', -1, 1) => 'o' (not 
'l').
2. array access has been a mixed bag for us (some 0, some 1-based), but we have 
tried to move towards 1-based as well. e.g., element_at() is 1-based, and we 
use -1 (!) to get the last element.

3. Snowflake had no choice but to use 1 for the second last element because 1 
is their second element. Because they are 0-based they are unable to use 
array_insert() to append an element (short of passing the (length - 1) as 
parameter. So the proposal is  objectively more powerful.

> array_insert() gives wrong results for negative index
> ---
>
> Key: SPARK-44840
> URL: https://issues.apache.org/jira/browse/SPARK-44840
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Assignee: Max Gekk
>Priority: Major
>
> Unlike in Snowflake, we decided that array_insert() is 1-based.
> This means 1 is the first element in an array and -1 is the last. 
> This matches the behavior of functions such as substr() and element_at().
>  
> {code:java}
> > SELECT array_insert(array('a', 'b', 'c'), 1, 'z');
> ["z","a","b","c"]
> > SELECT array_insert(array('a', 'b', 'c'), 0, 'z');
> Error
> > SELECT array_insert(array('a', 'b', 'c'), -1, 'z');
> ["a","b","c","z"]
> > SELECT array_insert(array('a', 'b', 'c'), 5, 'z');
> ["a","b","c",NULL,"z"]
> > SELECT array_insert(array('a', 'b', 'c'), -5, 'z');
> ["z",NULL,"a","b","c"]
> > SELECT array_insert(array('a', 'b', 'c'), 2, cast(NULL AS STRING));
> ["a",NULL,"b","c"]
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44840) array_insert() gives wrong results for negative index

2023-08-20 Thread Serge Rielau (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756574#comment-17756574
 ] 

Serge Rielau commented on SPARK-44840:
--

[~srowen] There is no standard as such.
However, there are multiple reasons not to be compatible with Snowflake:
1. Precedence: SUBSTR('Hello', 1, 1) => 'H', SUBSTR('Hello', -1, 1) => 'o' (not 
'l').
2. array access has been a mixed bag for us (some 0, some 1-based), but we have 
tried to move towards 1-based as well. e.g., element_at() is 1-based, and we 
use -1 (!) to get the last element.

3. Snowflake had no choice but to use 1 for the second last element because 1 
is their second element. Because they are 0-based they are unable to use 
array_insert() to append an element (short of passing the (length - 1) as 
parameter. So the proposal is  objectively more powerful.

> array_insert() gives wrong results for negative index
> ---
>
> Key: SPARK-44840
> URL: https://issues.apache.org/jira/browse/SPARK-44840
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Assignee: Max Gekk
>Priority: Major
>
> Unlike in Snowflake, we decided that array_insert() is 1-based.
> This means 1 is the first element in an array and -1 is the last. 
> This matches the behavior of functions such as substr() and element_at().
>  
> {code:java}
> > SELECT array_insert(array('a', 'b', 'c'), 1, 'z');
> ["z","a","b","c"]
> > SELECT array_insert(array('a', 'b', 'c'), 0, 'z');
> Error
> > SELECT array_insert(array('a', 'b', 'c'), -1, 'z');
> ["a","b","c","z"]
> > SELECT array_insert(array('a', 'b', 'c'), 5, 'z');
> ["a","b","c",NULL,"z"]
> > SELECT array_insert(array('a', 'b', 'c'), -5, 'z');
> ["z",NULL,"a","b","c"]
> > SELECT array_insert(array('a', 'b', 'c'), 2, cast(NULL AS STRING));
> ["a",NULL,"b","c"]
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44840) array_insert() gives wrong results for negative index

2023-08-16 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-44840:


 Summary: array_insert() gives wrong results for negative index
 Key: SPARK-44840
 URL: https://issues.apache.org/jira/browse/SPARK-44840
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau


Unlike in Snowflake, we decided that array_insert() is 1-based.
This means 1 is the first element in an array and -1 is the last. 
This matches the behavior of functions such as substr() and element_at().

 
{code:java}
> SELECT array_insert(array('a', 'b', 'c'), 1, 'z');
["z","a","b","c"]
> SELECT array_insert(array('a', 'b', 'c'), 0, 'z');
Error
> SELECT array_insert(array('a', 'b', 'c'), -1, 'z');
["a","b","c","z"]
> SELECT array_insert(array('a', 'b', 'c'), 5, 'z');
["a","b","c",NULL,"z"]
> SELECT array_insert(array('a', 'b', 'c'), -5, 'z');
["z",NULL,"a","b","c"]
> SELECT array_insert(array('a', 'b', 'c'), 2, cast(NULL AS STRING));
["a",NULL,"b","c"]
{code}
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44838) Enhance raise_error() to exploit the new error framework

2023-08-16 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-44838:


 Summary: Enhance raise_error() to exploit the new error framework
 Key: SPARK-44838
 URL: https://issues.apache.org/jira/browse/SPARK-44838
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.5.0
Reporter: Serge Rielau


raise_error() and assert_true() do not presently utilize the new error 
framework.
We want to generalize raise_error() to take an error class, sqlstate and 
message parameters as arguments to compose a well-formed error condition.
The existing assert_true() and raise_error() versions should return an error 
class as well.
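
A rough sketch of a generalized invocation (purely a hypothetical shape; the 
final signature, error condition name, and parameter map are to be designed):

{code:java}
-- Hypothetical only: pass an error condition plus its message parameters.
SELECT raise_error('VIEW_NOT_FOUND', map('relationName', '`v1`'));
{code}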



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44780) Document SQL Session variables

2023-08-11 Thread Serge Rielau (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Rielau updated SPARK-44780:
-
Attachment: Screenshot 2023-08-11 at 10.22.55 PM.png
Screenshot 2023-08-11 at 10.24.33 PM.png
Screenshot 2023-08-11 at 10.26.54 PM.png

> Document SQL Session variables
> --
>
> Key: SPARK-44780
> URL: https://issues.apache.org/jira/browse/SPARK-44780
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 3.4.2
>Reporter: Serge Rielau
>Priority: Major
> Attachments: Screenshot 2023-08-11 at 10.22.55 PM.png, Screenshot 
> 2023-08-11 at 10.24.33 PM.png, Screenshot 2023-08-11 at 10.26.54 PM.png
>
>
> SQL Session variables have been added with: SPARK-42849.
> Here we add the docs for it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44780) Document SQL Session variables

2023-08-11 Thread Serge Rielau (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Rielau updated SPARK-44780:
-
Summary: Document SQL Session variables  (was: Docuement SQL Session 
variables)

> Document SQL Session variables
> --
>
> Key: SPARK-44780
> URL: https://issues.apache.org/jira/browse/SPARK-44780
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 3.4.2
>Reporter: Serge Rielau
>Priority: Major
>
> SQL Session variables have been added with: SPARK-42849.
> Here we add the docs for it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44780) Docuement SQL Session variables

2023-08-11 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-44780:


 Summary: Docuement SQL Session variables
 Key: SPARK-44780
 URL: https://issues.apache.org/jira/browse/SPARK-44780
 Project: Spark
  Issue Type: Task
  Components: Spark Core
Affects Versions: 3.4.2
Reporter: Serge Rielau


SQL Session variables have been added with: SPARK-42849.
Here we add the docs for it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44680) parameter markers are not blocked from DEFAULT (and other places)

2023-08-04 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-44680:


 Summary: parameter markers are not blocked from DEFAULT (and other 
places)
 Key: SPARK-44680
 URL: https://issues.apache.org/jira/browse/SPARK-44680
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau


scala> spark.sql("CREATE TABLE t11(c1 int default :parm)", args = Map("parm" -> 
5)).show()

-> success

scala> spark.sql("describe t11");

[INVALID_DEFAULT_VALUE.UNRESOLVED_EXPRESSION] Failed to execute EXISTS_DEFAULT 
command because the destination table column `c1` has a DEFAULT value :parm, 
which fails to resolve as a valid expression.

This likely extends to other DDL-y places.
I can only find protection against placement in the body of a CREATE VIEW.

I see two ways out of this:
* Raise an error (as we do for CREATE VIEW v1(c1) AS SELECT ? )
 * Improve the way we persist queries/expressions to substitute the at-DDL-time 
bound parameter value (it's not a bug, it's a feature)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44577) INSERT BY NAME returns non-sensical error message

2023-07-27 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-44577:


 Summary: INSERT BY NAME returns non-sensical error message
 Key: SPARK-44577
 URL: https://issues.apache.org/jira/browse/SPARK-44577
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau


CREATE TABLE bug(c1 INT);

INSERT INTO bug BY NAME SELECT 1 AS c2;

==> Multi-part identifier cannot be empty.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-43438) Fix mismatched column list error on INSERT

2023-07-06 Thread Serge Rielau (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740789#comment-17740789
 ] 

Serge Rielau edited comment on SPARK-43438 at 7/6/23 8:17 PM:
--

spark-sql (default)> INSERT INTO tabtest SELECT 1;
This should NOT succeed.


was (Author: JIRAUSER288374):
spark-sql (default)> INSERT INTO tabtest SELECT 1;
This should NOT succeed.

> Fix mismatched column list error on INSERT
> --
>
> Key: SPARK-43438
> URL: https://issues.apache.org/jira/browse/SPARK-43438
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Major
>
> This error message is pretty bad, and common
> "_LEGACY_ERROR_TEMP_1038" : {
> "message" : [
> "Cannot write to table due to mismatched user specified column 
> size() and data column size()."
> ]
> },
> It can perhaps be merged with this one - after giving it an ERROR_CLASS
> "_LEGACY_ERROR_TEMP_1168" : {
> "message" : [
> " requires that the data to be inserted have the same number of 
> columns as the target table: target table has  column(s) but 
> the inserted data has  column(s), including  
> partition column(s) having constant value(s)."
> ]
> },
> Repro:
> CREATE TABLE tabtest(c1 INT, c2 INT);
> INSERT INTO tabtest SELECT 1;
> `spark_catalog`.`default`.`tabtest` requires that the data to be inserted 
> have the same number of columns as the target table: target table has 2 
> column(s) but the inserted data has 1 column(s), including 0 partition 
> column(s) having constant value(s).
> INSERT INTO tabtest(c1) SELECT 1, 2, 3;
> Cannot write to table due to mismatched user specified column size(1) and 
> data column size(3).; line 1 pos 24
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43438) Fix mismatched column list error on INSERT

2023-07-06 Thread Serge Rielau (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740789#comment-17740789
 ] 

Serge Rielau commented on SPARK-43438:
--

spark-sql (default)> INSERT INTO tabtest SELECT 1;
This should NOT succeed.

> Fix mismatched column list error on INSERT
> --
>
> Key: SPARK-43438
> URL: https://issues.apache.org/jira/browse/SPARK-43438
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Major
>
> This error message is pretty bad, and common
> "_LEGACY_ERROR_TEMP_1038" : {
> "message" : [
> "Cannot write to table due to mismatched user specified column 
> size() and data column size()."
> ]
> },
> It can perhaps be merged with this one - after giving it an ERROR_CLASS
> "_LEGACY_ERROR_TEMP_1168" : {
> "message" : [
> " requires that the data to be inserted have the same number of 
> columns as the target table: target table has  column(s) but 
> the inserted data has  column(s), including  
> partition column(s) having constant value(s)."
> ]
> },
> Repro:
> CREATE TABLE tabtest(c1 INT, c2 INT);
> INSERT INTO tabtest SELECT 1;
> `spark_catalog`.`default`.`tabtest` requires that the data to be inserted 
> have the same number of columns as the target table: target table has 2 
> column(s) but the inserted data has 1 column(s), including 0 partition 
> column(s) having constant value(s).
> INSERT INTO tabtest(c1) SELECT 1, 2, 3;
> Cannot write to table due to mismatched user specified column size(1) and 
> data column size(3).; line 1 pos 24
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43918) Cannot CREATE VIEW despite columns explicitly aliased

2023-06-01 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-43918:


 Summary: Cannot CREATE VIEW despite columns explicitly aliased 
 Key: SPARK-43918
 URL: https://issues.apache.org/jira/browse/SPARK-43918
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau


spark.sql("CREATE VIEW v(c) AS SELECT b AS c FROM (SELECT (SELECT 1)) AS 
T(b)").show()

org.apache.spark.sql.AnalysisException: Not allowed to create a permanent view 
`spark_catalog`.`default`.`v` without explicitly assigning an alias for 
expression c.

The problem seems to be the scalar subquery (SELECT 1) not being aliased.
But that shouldn't matter. "AS X" should be the backstop. In fact, AS T(b) 
should have been sufficient.
Not to mention that the column is named c in the view header.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43884) Allow parameter markers in DDL (again)

2023-05-30 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-43884:


 Summary: Allow parameter markers in DDL (again)
 Key: SPARK-43884
 URL: https://issues.apache.org/jira/browse/SPARK-43884
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 3.5.0
Reporter: Serge Rielau


When we introduced parameter markers initially, we allowed them in any SQL 
statement.
Subsequently, we have limited their use to DML and queries because that aligns 
better with the industry and we saw no immediate use for broader support.

However, we have introduced the IDENTIFIER() clause, which allows templating 
table-, and function-identifiers in DDL statements. 
To exploit this, we need parameter markers as argument:

spark.sql("CREATE TABLE IDENTIFIER(:tableName) (c1 INT)", args = 
Map("tableName" -> "mytable") 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43472) Mexico has changed observation of DST, this breaks timestamp_utc()

2023-05-11 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-43472:


 Summary: Mexico has changed observation of DST, this breaks 
timestamp_utc()
 Key: SPARK-43472
 URL: https://issues.apache.org/jira/browse/SPARK-43472
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau


[https://www.timeanddate.com/time/change/mexico/mexico-city?year=2023#:~:text=Daylight%20Saving%20Time%20(DST)%20Not,was%20on%20October%2030%2C%202022.]

Mexico has stopped observing DST. This causes wrong results from:
from_utc_timestamp([timestamp], 'America/Mexico_City')



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43438) Fix mismatched column list error on INSERT

2023-05-10 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-43438:


 Summary: Fix mismatched column list error on INSERT
 Key: SPARK-43438
 URL: https://issues.apache.org/jira/browse/SPARK-43438
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau


This error message is pretty bad, and common
"_LEGACY_ERROR_TEMP_1038" : {
"message" : [
"Cannot write to table due to mismatched user specified column 
size() and data column size()."
]
},

It can perhaps be merged with this one - after giving it an ERROR_CLASS

"_LEGACY_ERROR_TEMP_1168" : {
"message" : [
" requires that the data to be inserted have the same number of 
columns as the target table: target table has  column(s) but the 
inserted data has  column(s), including  
partition column(s) having constant value(s)."
]
},



Repro:

CREATE TABLE tabtest(c1 INT, c2 INT);


INSERT INTO tabtest SELECT 1;

`spark_catalog`.`default`.`tabtest` requires that the data to be inserted have 
the same number of columns as the target table: target table has 2 column(s) 
but the inserted data has 1 column(s), including 0 partition column(s) having 
constant value(s).

INSERT INTO tabtest(c1) SELECT 1, 2, 3;
Cannot write to table due to mismatched user specified column size(1) and data 
column size(3).; line 1 pos 24


 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43359) DELETE from Hive table result in INTERNAL error

2023-05-03 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-43359:


 Summary: DELETE from Hive table result in INTERNAL error
 Key: SPARK-43359
 URL: https://issues.apache.org/jira/browse/SPARK-43359
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau


spark-sql (default)> CREATE TABLE T1(c1 INT);
spark-sql (default)> DELETE FROM T1 WHERE c1 = 1;
[INTERNAL_ERROR] Unexpected table relation: HiveTableRelation 
[`spark_catalog`.`default`.`t1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [c1#3], 
Partition Cols: []]

org.apache.spark.SparkException: [INTERNAL_ERROR] Unexpected table relation: 
HiveTableRelation [`spark_catalog`.`default`.`t1`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [c1#3], 
Partition Cols: []]
at 
org.apache.spark.SparkException$.internalError(SparkException.scala:77)
at 
org.apache.spark.SparkException$.internalError(SparkException.scala:81)
at 
org.apache.spark.sql.execution.datasources.v2.DataSourceV2Strategy.apply(DataSourceV2Strategy.scala:310)
at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:63)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:491)
at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
at 
org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:70)
at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:78)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-43205) Add an IDENTIFIER(stringLiteral) clause that maps a string to an identifier

2023-04-19 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-43205:


 Summary: Add an IDENTIFIER(stringLiteral) clause that maps a 
string to an identifier
 Key: SPARK-43205
 URL: https://issues.apache.org/jira/browse/SPARK-43205
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 3.5.0
Reporter: Serge Rielau


There is a requirement for SQL templates, where the table and or column names 
are provided through substitution. This can be done today using variable 
substitution:
SET hivevar:tabname = mytab;
SELECT * FROM ${ hivevar:tabname };

A straight variable substitution is dangerous since it does allow for SQL 
injection:
SET hivevar:tabname = mytab, someothertab;
SELECT * FROM ${ hivevar:tabname };

A way to get around this problem is to wrap the variable substitution with a 
clause that limits the scope to produce an identifier.
This approach is taken by Snowflake:
 
[https://docs.snowflake.com/en/sql-reference/session-variables#using-variables-in-sql]

SET hivevar:tabname = 'tabname';
SELECT * FROM IDENTIFIER(${ hivevar:tabname })
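
A sketch of how the proposed clause could combine with Spark's named parameter 
markers instead of Hive variable substitution (the binding below is 
illustrative):

{code:java}
// Sketch: the argument is constrained to yield exactly one identifier, so a
// value like "mytab, someothertab" would be rejected rather than injected.
spark.sql("SELECT * FROM IDENTIFIER(:tabname)", args = Map("tabname" -> "mytab")).show()
{code}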



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42919) SELECT * LIKE 'pattern' FROM ....

2023-03-24 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-42919:


 Summary: SELECT * LIKE 'pattern' FROM 
 Key: SPARK-42919
 URL: https://issues.apache.org/jira/browse/SPARK-42919
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 3.5.0
Reporter: Serge Rielau


SparkSQL supports *regex_column_names.*
[https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select.html]
However, support depends on a config: spark.sql.parser.quotedRegexColumnNames
The reason is that it overloads proper identifier names.

Here we propose a cleaner, compatible API:

SELECT * LIKE 'pattern';

The semantics should follow common regular expression patterns used for the LIKE 
operator, with the caveat that it should obey the identifier case-insensitivity 
setting.
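
A small sketch of the proposed syntax (table and column names are made up for 
illustration; the projection shown is the intended behavior, not current 
output):

{code:java}
CREATE TABLE emp(emp_id INT, emp_name STRING, dept_id INT);
-- Proposed: project only the columns whose names match the pattern.
SELECT * LIKE 'emp%' FROM emp;   -- would return emp_id, emp_name
{code}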



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42919) SELECT * LIKE 'pattern' FROM ....

2023-03-24 Thread Serge Rielau (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Rielau updated SPARK-42919:
-
Description: 
SparkSQL supports *regex_column_names.*
[https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select.html]
However, support depends on a config: spark.sql.parser.quotedRegexColumnNames
The reason is that it overloads proper identifier names.

Here we propose a cleaner, compatible API:

SELECT * LIKE 'pattern' ...

The semantic should follow common regular expression patterns used for the LIKE 
operator with the caveat that it should obey identifier case insensitivity 
setting.

  was:
SparkSQL supports *regex_column_names.*
[https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select.html]
However, support depends on a config: spark.sql.parser.quotedRegexColumnNames
The reason is that it overloads proper identifier names.

Here we propose a cleaner, compatible API:

SELECT * LIKE 'pattern';

The semantic should follow common regular expression patterns used for the LIKE 
operator with the caveat that it should obey identifier case insensitivity 
setting.


> SELECT * LIKE 'pattern' FROM 
> -
>
> Key: SPARK-42919
> URL: https://issues.apache.org/jira/browse/SPARK-42919
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Serge Rielau
>Priority: Minor
>
> SparkSQL supports *regex_column_names.*
> [https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select.html]
> However, support depends on a config: spark.sql.parser.quotedRegexColumnNames
> The reason is that it overloads proper identifier names.
> Here we propose a cleaner, compatible API:
> SELECT * LIKE 'pattern' ...
> The semantic should follow common regular expression patterns used for the 
> LIKE operator with the caveat that it should obey identifier case 
> insensitivity setting.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42546) SPARK-42045 is incomplete in supporting ANSI_MODE for round() and bround()

2023-03-19 Thread Serge Rielau (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702337#comment-17702337
 ] 

Serge Rielau commented on SPARK-42546:
--

Can't speak for Wenchen, but +1 [~ddavies1] 

> SPARK-42045 is incomplete in supporting ANSI_MODE for round() and bround()
> --
>
> Key: SPARK-42546
> URL: https://issues.apache.org/jira/browse/SPARK-42546
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Serge Rielau
>Priority: Major
>
> Under ANSI mode SPARK-42045 added error conditions instead of silent 
> overflows for edge cases in round() and bround().
> However, it appears this fix works only for the INT data type. Trying it on, 
> e.g., a SMALLINT, the function still returns wrong results:
> {code:java}
> spark-sql> select round(2147483647, -1);
> [ARITHMETIC_OVERFLOW] Overflow. If necessary set "spark.sql.ansi.enabled" to 
> "false" to bypass this error.{code}
> {code:java}
> spark-sql> select round(127y, -1);
> -126 {code}
>    



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-42546) SPARK-42045 is incomplete in supporting ANSI_MODE for round() and bround()

2023-03-19 Thread Serge Rielau (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702337#comment-17702337
 ] 

Serge Rielau edited comment on SPARK-42546 at 3/19/23 6:30 PM:
---

Can't speak for [~cloud_fan] , but +1 [~ddavies1] 


was (Author: JIRAUSER288374):
Can't speak for Wenchen, but +1 [~ddavies1] 

> SPARK-42045 is incomplete in supporting ANSI_MODE for round() and bround()
> --
>
> Key: SPARK-42546
> URL: https://issues.apache.org/jira/browse/SPARK-42546
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Serge Rielau
>Priority: Major
>
> Under ANSI mode SPARK-42045 added error conditions instead of silent 
> overflows for edge cases in round() and bround().
> However, it appears this fix works only for the INT data type. Trying it on, 
> e.g., a SMALLINT, the function still returns wrong results:
> {code:java}
> spark-sql> select round(2147483647, -1);
> [ARITHMETIC_OVERFLOW] Overflow. If necessary set "spark.sql.ansi.enabled" to 
> "false" to bypass this error.{code}
> {code:java}
> spark-sql> select round(127y, -1);
> -126 {code}
>    



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42849) Session variables

2023-03-17 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-42849:


 Summary: Session variables
 Key: SPARK-42849
 URL: https://issues.apache.org/jira/browse/SPARK-42849
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 3.5.0
Reporter: Serge Rielau


Provide a type-safe, engine-controlled session variable:

CREATE [ OR REPLACE ] TEMPORARY VARIABLE [ IF NOT EXISTS ] var_name [ type ] [ DEFAULT expression ]

SET { variable = expression | ( variable [, ...] ) = ( subquery | expression [, ...] ) }

DROP VARIABLE [ IF EXISTS ] variable_name
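
A usage sketch following the proposed statements above (names, types, and the 
exact SET form are illustrative, not a committed design):

{code:java}
CREATE TEMPORARY VARIABLE var1 INT DEFAULT 5;
SET var1 = var1 + 10;
SELECT var1;          -- 15
DROP VARIABLE var1;
{code}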



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42638) current_user() is blocked from VALUES, but current_timestamp() is not

2023-03-01 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-42638:


 Summary: current_user() is blocked from VALUES, but 
current_timestamp() is not
 Key: SPARK-42638
 URL: https://issues.apache.org/jira/browse/SPARK-42638
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.3.0
Reporter: Serge Rielau


VALUES(current_user());
returns:

cannot evaluate expression current_user() in inline table definition.; line 1 
pos 8

 

The same with current_timestamp() works.

It appears current_user() is recognized as non-deterministic. But it is 
constant within the statement, just like current_timestamp().

PS: It's not clear why we block non-deterministic functions to begin with.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42623) parameter markers not blocked in DDL

2023-02-28 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-42623:


 Summary: parameter markers not blocked in DDL
 Key: SPARK-42623
 URL: https://issues.apache.org/jira/browse/SPARK-42623
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau
 Fix For: 3.4.0


The parameterized query code does not block DDL statements from referencing 
parameter markers.
For example:

 
{code:java}
scala> spark.sql(sqlText = "CREATE VIEW v1 AS SELECT current_timestamp() + 
:later as stamp, :x * :x AS square", args = Map("later" -> "INTERVAL'3' HOUR", 
"x" -> "15.0")).show()
++
||
++
++
{code}
It appears we have some protection that fails us when the view is invoked:

 
{code:java}
scala> spark.sql(sqlText = "SELECT * FROM v1", args = Map("later" -> 
"INTERVAL'3' HOUR", "x" -> "15.0")).show()
org.apache.spark.sql.AnalysisException: [UNBOUND_SQL_PARAMETER] Found the 
unbound parameter: `later`. Please, fix `args` and provide a mapping of the 
parameter to a SQL literal.; line 1 pos 29
{code}

Right now I think the affected places are:
* DEFAULT definition
* VIEW definition

but any other future standard expression popping up is at risk, such as SQL 
Functions, or GENERATED COLUMN.

CREATE TABLE AS is debatable, since it executes the query at definition only.
For simplicity I propose to block the feature from ANY DDL statement (CREATE, 
ALTER).

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42546) SPARK-42045 is incomplete in supporting ANSI_MODE for round() and bround()

2023-02-23 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-42546:


 Summary: SPARK-42045 is incomplete in supporting ANSI_MODE for 
round() and bround()
 Key: SPARK-42546
 URL: https://issues.apache.org/jira/browse/SPARK-42546
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.3.2, 3.4.0
Reporter: Serge Rielau


Under ANSI mode SPARK-42045 added error conditions instead of silent overflows 
for edge cases in round() and bround().
However, it appears this fix works only for the INT data type. Trying it on, 
e.g., a SMALLINT, the function still returns wrong results:
{code:java}
spark-sql> select round(2147483647, -1);
[ARITHMETIC_OVERFLOW] Overflow. If necessary set "spark.sql.ansi.enabled" to 
"false" to bypass this error.{code}
{code:java}
spark-sql> select round(127y, -1);
-126 {code}
   



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42399) CONV() silently overflows returning wrong results

2023-02-14 Thread Serge Rielau (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688755#comment-17688755
 ] 

Serge Rielau commented on SPARK-42399:
--

Adding support is of course best, if it can be done quickly; if not, we should 
stop the wrong results first.

> CONV() silently overflows returning wrong results
> -
>
> Key: SPARK-42399
> URL: https://issues.apache.org/jira/browse/SPARK-42399
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Critical
>
> spark-sql> SELECT 
> CONV(SUBSTRING('0x',
>  3), 16, 10);
> 18446744073709551615
> Time taken: 2.114 seconds, Fetched 1 row(s)
> spark-sql> set spark.sql.ansi.enabled = true;
> spark.sql.ansi.enabled true
> Time taken: 0.068 seconds, Fetched 1 row(s)
> spark-sql> SELECT 
> CONV(SUBSTRING('0x',
>  3), 16, 10);
> 18446744073709551615
> Time taken: 0.05 seconds, Fetched 1 row(s)
> In ANSI mode we should raise an error for sure.
> In non-ANSI mode either an error or a NULL may be acceptable.
> Alternatively, of course, we could consider if we can support arbitrary 
> domains since the result is a STRING again. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42399) CONV() silently overflows returning wrong results

2023-02-10 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-42399:


 Summary: CONV() silently overflows returning wrong results
 Key: SPARK-42399
 URL: https://issues.apache.org/jira/browse/SPARK-42399
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau


spark-sql> SELECT 
CONV(SUBSTRING('0x',
 3), 16, 10);

18446744073709551615

Time taken: 2.114 seconds, Fetched 1 row(s)

spark-sql> set spark.sql.ansi.enabled = true;

spark.sql.ansi.enabled true

Time taken: 0.068 seconds, Fetched 1 row(s)

spark-sql> SELECT 
CONV(SUBSTRING('0x',
 3), 16, 10);

18446744073709551615

Time taken: 0.05 seconds, Fetched 1 row(s)


In ANSI mode we should raise an error for sure.
In non-ANSI mode either an error or a NULL may be acceptable.

Alternatively, of course, we could consider if we can support arbitrary domains 
since the result is a STRING again. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42066) The DATATYPE_MISMATCH error class contains inappropriate and duplicating subclasses

2023-01-14 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-42066:


 Summary: The DATATYPE_MISMATCH error class contains inappropriate 
and duplicating subclasses
 Key: SPARK-42066
 URL: https://issues.apache.org/jira/browse/SPARK-42066
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau


The subclass WRONG_NUM_ARGS (with suggestions) semantically does not belong in 
DATATYPE_MISMATCH, and there is already a top-level error class with that same name.
We should review the subclasses of this error class, which seems to have become 
a bit of a dumping ground...



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42058) Harden SQLSTATE usage for error classes (2)

2023-01-13 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-42058:


 Summary: Harden SQLSTATE usage for error classes (2)
 Key: SPARK-42058
 URL: https://issues.apache.org/jira/browse/SPARK-42058
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau


Error classes are great, but for JDBC, ODBC etc the SQLSTATEs of the standard 
reign.
We have started adding SQLSTATEs but have not really paid attention to their 
correctness.
Follow up to: https://issues.apache.org/jira/browse/SPARK-41994



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41994) Harden SQLSTATE usage for error classes

2023-01-11 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-41994:


 Summary: Harden SQLSTATE usage for error classes
 Key: SPARK-41994
 URL: https://issues.apache.org/jira/browse/SPARK-41994
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau


Error classes are great, but for JDBC, ODBC etc the SQLSTATEs of the standard 
reign.
We have started adding SQLSTATEs but have not really paid attention to their 
correctness.
Here is a unified view of SQLSTATE's used in the  
[Industry.|https://docs.google.com/spreadsheets/d/1hrQBSuHooiozUNAQTHiYq3WidS1uliHpl9cYfWpig1c/edit?usp=sharing]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41931) Improve UNSUPPORTED_DATA_TYPE message for complex types

2023-01-06 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-41931:


 Summary: Improve UNSUPPORTED_DATA_TYPE message for complex types
 Key: SPARK-41931
 URL: https://issues.apache.org/jira/browse/SPARK-41931
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau


spark-sql> SELECT CAST(array(1, 2, 3) AS ARRAY);

[UNSUPPORTED_DATATYPE] Unsupported data type "ARRAY"(line 1, pos 30)

== SQL ==

SELECT CAST(array(1, 2, 3) AS ARRAY)

--^^^

This error message is confusing. We support ARRAY. We just require it to be 
typed.
We should have an error like:
[INCOMPLETE_TYPE_DEFINITION.ARRAY] The definition of type `ARRAY` is 
incomplete. You must provide an element type. For example: `ARRAY<elementType>`.
Similarly for STRUCT and MAP.
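
For reference, the typed form that the suggested message would point users to:

{code:java}
SELECT CAST(array(1, 2, 3) AS ARRAY<INT>);
{code}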



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41670) Introduce builtin and session namespaces for builtin functions and temp views/functions

2022-12-21 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-41670:


 Summary: Introduce builtin and session namespaces for builtin 
functions and temp views/functions 
 Key: SPARK-41670
 URL: https://issues.apache.org/jira/browse/SPARK-41670
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau


Spark today allows overloading between persisted relations and functions and 
temporary relations and functions.
It also allows overloading between persisted functions and builtin functions.

While Spark allows us to disambiguate persisted objects by qualifying them, 
there is no qualifier for temp or builtin objects.

Here we propose to use `builtin` for builtin objects and `session` for session 
temporary objects.
If there is a conflict with persisted schemas of these names, we can further 
declare that the catalog for both is `system`.
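
A sketch of what the proposed qualifiers would allow (function and view names 
are illustrative):

{code:java}
SELECT builtin.abs(-1);          -- always the built-in abs, even if a persisted abs exists
SELECT session.my_temp_func(1);  -- always the temporary function
SELECT * FROM session.my_view;   -- always the temporary view
{code}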



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41668) DECODE function returns wrong results when passed NULL

2022-12-21 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-41668:


 Summary: DECODE function returns wrong results when passed NULL
 Key: SPARK-41668
 URL: https://issues.apache.org/jira/browse/SPARK-41668
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.1.2
Reporter: Serge Rielau


The DECODE function was implemented for Oracle compatibility. It works similarly 
to a CASE expression, but it is supposed to have one major difference: NULL == 
NULL
[https://docs.oracle.com/database/121/SQLRF/functions057.htm#SQLRF00631]

The Spark implementation does not observe this however:

select decode(null, 6, 'Spark', NULL, 'SQL', 4, 'rocks');

NULL

The result is supposed to be 'SQL'.
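
The expected behavior corresponds to a CASE expression using null-safe equality 
(a comparison sketch with Spark's <=> operator):

{code:java}
SELECT CASE
         WHEN NULL <=> 6    THEN 'Spark'
         WHEN NULL <=> NULL THEN 'SQL'
         WHEN NULL <=> 4    THEN 'rocks'
       END;
-- 'SQL'
{code}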



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41353) UNRESOLVED_ROUTINE error class

2022-12-01 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-41353:


 Summary: UNRESOLVED_ROUTINE error class
 Key: SPARK-41353
 URL: https://issues.apache.org/jira/browse/SPARK-41353
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau


We want to unify and name:
"_LEGACY_ERROR_TEMP_1041" : {
  "message" : [
    "Undefined function ." ]
},
"_LEGACY_ERROR_TEMP_1242" : {
  "message" : [
    "Undefined function: . This function is neither a built-in/temporary 
function, nor a persistent function that is qualified as ." ]
},
"_LEGACY_ERROR_TEMP_1243" : {
  "message" : [
    "Undefined function: " ]
}
The proposal is:
UNRESOLVED_ROUTINE. routineName => `a`.`b`.`func`, routineSignature => [INT, 
STRING], searchPath => [`builtin`, `session`, `hiveMetaStore`.`default`]
This assumes agreement to introduce `builtin` as an optional qualifier for builtin 
functions, and `session` as an optional qualifier for temporary functions (separate PR).

Q: Why ROUTINE?
A: Some day we may want to support PROCEDURES, and they will follow the same name 
rules and share the same namespace.

Q: Why a PATH?
A: We do follow a hard-coded path today with a fixed precedence rule.

Q: Why provide the signature?
A: Long-term we may support overloading of functions by arity, type, or even 
parameter name.
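
A sketch of what a unified entry could look like in error-classes.json (message 
text, parameter names, and SQLSTATE below are illustrative, not the final wording):

{code:java}
"UNRESOLVED_ROUTINE" : {
  "message" : [
    "Cannot resolve function <routineName> on search path <searchPath>."
  ],
  "sqlState" : "42883"
}
{code}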



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41323) Support CURRENT_SCHEMA() as alias for CURRENT_DATABASE()

2022-11-29 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-41323:


 Summary: Support CURRENT_SCHEMA() as alias for CURRENT_DATABASE()
 Key: SPARK-41323
 URL: https://issues.apache.org/jira/browse/SPARK-41323
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau


CURRENT_SCHEMA is the keyword used in the SQL Standard to refer to the current 
namespace. It is also supported by multiple other vendors:
PostgreSQL
Redshift
Snowflake
Db2
and others
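
The requested synonym in a nutshell (a sketch; both calls would return the same 
result):

{code:java}
SELECT current_database();   -- existing
SELECT current_schema();     -- proposed synonym
{code}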



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41104) Can insert NULL into hive table table with NOT NULL column

2022-11-10 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-41104:


 Summary: Can insert NULL into hive table table with NOT NULL column
 Key: SPARK-41104
 URL: https://issues.apache.org/jira/browse/SPARK-41104
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau


spark-sql> CREATE TABLE tttd(c1 int not null);
22/11/10 14:04:28 WARN ResolveSessionCatalog: A Hive serde table will be 
created as there is no table provider specified. You can set 
spark.sql.legacy.createHiveTableByDefault to false so that native data source 
table will be created instead.
22/11/10 14:04:28 WARN HiveMetaStore: Location: 
file:/Users/serge.rielau/spark/spark-warehouse/tttd specified for non-external 
table:tttd
Time taken: 0.078 seconds
spark-sql> INSERT INTO tttd VALUES(null);
Time taken: 0.36 seconds
spark-sql> SELECT * FROM tttd;
NULL
Time taken: 0.074 seconds, Fetched 1 row(s)
spark-sql> 
Does Hive not support NOT NULL? That's fine, but then we should fail on CREATE 
TABLE.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40822) Use stable derived-column-alias algorithm, suitable for CREATE VIEW

2022-10-17 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-40822:


 Summary: Use stable derived-column-alias algorithm, suitable for 
CREATE VIEW 
 Key: SPARK-40822
 URL: https://issues.apache.org/jira/browse/SPARK-40822
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau


Spark has the ability to derive column aliases for expressions if no alias was 
provided by the user.
E.g.
CREATE TABLE T(c1 INT, c2 INT);
SELECT c1, `(c1 + 1)`, c3 FROM (SELECT c1, c1 + 1, c1 * c2 AS c3 FROM T);

This is a valuable feature. However, the current implementation works by pretty 
printing the expression from the logical plan. This has multiple downsides:
 * The derived names can be unintuitive, for example the brackets in `(c1 + 1)`, 
or outright ugly, such as:
SELECT `substr(hello, 1, 2147483647)` FROM (SELECT substr('hello', 1)) AS T;
 * We cannot guarantee stability across versions since the logical plan of an 
expression may change.

The latter is a major reason why we cannot allow CREATE VIEW without a column 
list except in "trivial" cases.

CREATE VIEW v AS SELECT c1, c1 + 1, c1 * c2 AS c3 FROM T;
Not allowed to create a permanent view `spark_catalog`.`default`.`v` without 
explicitly assigning an alias for expression (c1 + 1).

There are two ways we can go about fixing this:
 # Stop deriving column aliases from the expression. Instead generate unique 
names such as `_col_1` based on their position in the select list. This is ugly 
and takes away the "nice" headers on result sets.
 # Move the derivation of the name upstream. That is, instead of pretty printing 
the logical plan we pretty print the lexer output, or a sanitized version of 
the expression as typed.
The statement as typed is stable by definition. The lexer is stable because it 
has no reason to change. And if it ever did, we have a better chance to manage 
the change.

In this feature we propose the following semantics:
 # If the column alias can be trivially derived (some of these can stack), do 
so:
 ** a (qualified) column reference => the unqualified column identifier
cat.sch.tab.col => col
 ** A field reference => the fieldname
struct.field1.field2 => field2
 ** A cast(column AS type) => column
cast(col1 AS INT) => col1
 ** A map lookup with literal key => keyname
map.key => key
map['key'] => key
 ** A parameterless function => the unqualified function name
current_schema() => current_schema
 # Take the lexer tokens of the expression, eliminate comments, and append them.
foo(tab1.c1 + /* this is a plus*/
1) => `foo(tab1.c1+1)`

 

Of course we want this change under a config.
If the config is set we can allow CREATE VIEW to exploit this and use the 
derived expressions.

PS: The exact mechanics of formatting the name are very much debatable (e.g. 
spaces between tokens, squeezing out comments, upper casing, preserving quotes 
or double quotes...).
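
An illustrative application of the trivial-derivation rules above (the aliases 
shown in comments are the intended outcome of this proposal, not current Spark 
output):

{code:java}
SELECT T.c1,                     -- alias: c1   (column reference)
       CAST(T.c1 AS STRING),     -- alias: c1   (cast of a column)
       map('key', 1)['key'],     -- alias: key  (map lookup with literal key)
       current_schema()          -- alias: current_schema (parameterless function)
FROM VALUES (1) AS T(c1);
{code}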



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40585) Support double-quoted identifiers

2022-09-27 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-40585:


 Summary: Support double-quoted identifiers
 Key: SPARK-40585
 URL: https://issues.apache.org/jira/browse/SPARK-40585
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau


In many SQL dialects identifiers can be unquoted or quoted with double quotes. 
In Spark, double-quoted literals imply strings.
In this proposal we allow for a config:

double_quoted_identifiers
which, when set, switches the interpretation from string to identifier.

Note that back ticks are still allowed.
Also the treatment of escapes is not changed as part of this work.
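
A behavior sketch (the second interpretation assumes the proposed config is 
enabled; it is not current Spark behavior):

{code:java}
SELECT "c1" FROM VALUES (1) AS T(c1);
-- default            : "c1" is a string literal, result: 'c1'
-- with the config set: "c1" is the column identifier, result: 1
{code}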



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40521) PartitionsAlreadyExistException in Hive V1 Command V1 reports all partitions instead of the conflicting partition

2022-09-21 Thread Serge Rielau (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17607897#comment-17607897
 ] 

Serge Rielau commented on SPARK-40521:
--

Hive does return the offending partition. We just need to dig it out: !Screen 
Shot 2022-09-21 at 10.08.44 AM.png! !Screen Shot 2022-09-21 at 10.08.52 AM.png!

> PartitionsAlreadyExistException in Hive V1 Command V1 reports all partitions 
> instead of the conflicting partition
> -
>
> Key: SPARK-40521
> URL: https://issues.apache.org/jira/browse/SPARK-40521
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Minor
> Attachments: Screen Shot 2022-09-21 at 10.08.44 AM.png, Screen Shot 
> 2022-09-21 at 10.08.52 AM.png
>
>
> PartitionsAlreadyExistException in Hive V1 Command V1 reports all partitions 
> instead of the conflicting partition
> When I run:
> AlterTableAddPartitionSuiteBase for Hive
> The test: partition already exists
> Fails in my local build ONLY in that mode because it reports two 
> partitions as conflicting where there should be only one. In all other modes 
> the test succeeds.
> The test is passing on master because the test does not check the partitions 
> themselves.
> Repro on master: Note that c1 = 1 does not already exist. It should NOT be 
> listed 
> create table t(c1 int, c2 int) partitioned by (c1);
> alter table t add partition (c1 = 2);
> alter table t add partition (c1 = 1) partition (c1 = 2);
> 22/09/21 09:30:09 ERROR Hive: AlreadyExistsException(message:Partition 
> already exists: Partition(values:[2], dbName:default, tableName:t, 
> createTime:0, lastAccessTime:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:c2, type:int, comment:null)], 
> location:file:/Users/serge.rielau/spark/spark-warehouse/t/c1=2, 
> inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:\{serialization.format=1}), bucketCols:[], sortCols:[], 
> parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], 
> skewedColValueLocationMaps:{}), storedAsSubDirectories:false), 
> parameters:null))
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.startAddPartition(HiveMetaStore.java:2744)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_partitions_core(HiveMetaStore.java:2442)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_partitions_req(HiveMetaStore.java:2560)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>  at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>  at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
>  at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>  at com.sun.proxy.$Proxy31.add_partitions_req(Unknown Source)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.add_partitions(HiveMetaStoreClient.java:625)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>  at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>  at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
>  at com.sun.proxy.$Proxy32.add_partitions(Unknown Source)
>  at org.apache.hadoop.hive.ql.metadata.Hive.createPartitions(Hive.java:2103)
>  at 
> org.apache.spark.sql.hive.client.Shim_v0_13.createPartitions(HiveShim.scala:763)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$createPartitions$1(HiveClientImpl.scala:631)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:296)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:227)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:226)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:276)
>  at 
> org.apac

[jira] [Updated] (SPARK-40521) PartitionsAlreadyExistException in Hive V1 Command V1 reports all partitions instead of the conflicting partition

2022-09-21 Thread Serge Rielau (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Rielau updated SPARK-40521:
-
Attachment: Screen Shot 2022-09-21 at 10.08.52 AM.png
Screen Shot 2022-09-21 at 10.08.44 AM.png

> PartitionsAlreadyExistException in Hive V1 Command V1 reports all partitions 
> instead of the conflicting partition
> -
>
> Key: SPARK-40521
> URL: https://issues.apache.org/jira/browse/SPARK-40521
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Minor
> Attachments: Screen Shot 2022-09-21 at 10.08.44 AM.png, Screen Shot 
> 2022-09-21 at 10.08.52 AM.png
>
>
> PartitionsAlreadyExistException in Hive V1 Command V1 reports all partitions 
> instead of the conflicting partition
> When I run:
> AlterTableAddPartitionSuiteBase for Hive
> The test: partition already exists
> Fails in my local build ONLY in that mode because it reports two 
> partitions as conflicting where there should be only one. In all other modes 
> the test succeeds.
> The test is passing on master because the test does not check the partitions 
> themselves.
> Repro on master: Note that c1 = 1 does not already exist. It should NOT be 
> listed 
> create table t(c1 int, c2 int) partitioned by (c1);
> alter table t add partition (c1 = 2);
> alter table t add partition (c1 = 1) partition (c1 = 2);
> 22/09/21 09:30:09 ERROR Hive: AlreadyExistsException(message:Partition 
> already exists: Partition(values:[2], dbName:default, tableName:t, 
> createTime:0, lastAccessTime:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:c2, type:int, comment:null)], 
> location:file:/Users/serge.rielau/spark/spark-warehouse/t/c1=2, 
> inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
> compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:\{serialization.format=1}), bucketCols:[], sortCols:[], 
> parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], 
> skewedColValueLocationMaps:{}), storedAsSubDirectories:false), 
> parameters:null))
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.startAddPartition(HiveMetaStore.java:2744)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_partitions_core(HiveMetaStore.java:2442)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_partitions_req(HiveMetaStore.java:2560)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>  at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>  at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
>  at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>  at com.sun.proxy.$Proxy31.add_partitions_req(Unknown Source)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.add_partitions(HiveMetaStoreClient.java:625)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>  at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>  at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
>  at com.sun.proxy.$Proxy32.add_partitions(Unknown Source)
>  at org.apache.hadoop.hive.ql.metadata.Hive.createPartitions(Hive.java:2103)
>  at 
> org.apache.spark.sql.hive.client.Shim_v0_13.createPartitions(HiveShim.scala:763)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$createPartitions$1(HiveClientImpl.scala:631)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:296)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:227)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:226)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:276)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.createPartitions(HiveClientImpl.scala:624)
>  at 
> org.apac

[jira] [Updated] (SPARK-40521) PartitionsAlreadyExistException in Hive V1 Command V1 reports all partitions instead of the conflicting partition

2022-09-21 Thread Serge Rielau (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Rielau updated SPARK-40521:
-
Description: 
PartitionsAlreadyExistException in Hive V1 Command V1 reports all partitions 
instead of the conflicting partition

When I run AlterTableAddPartitionSuiteBase for Hive, the test "partition already exists" 
fails in my local build ONLY in that mode because it reports two partitions 
as conflicting where there should be only one. In all other modes the test 
succeeds.
The test is passing on master because the test does not check the partitions 
themselves.

Repro on master: Note that c1 = 1 does not already exist. It should NOT be 
listed 

create table t(c1 int, c2 int) partitioned by (c1);

alter table t add partition (c1 = 2);

alter table t add partition (c1 = 1) partition (c1 = 2);

22/09/21 09:30:09 ERROR Hive: AlreadyExistsException(message:Partition already 
exists: Partition(values:[2], dbName:default, tableName:t, createTime:0, 
lastAccessTime:0, sd:StorageDescriptor(cols:[FieldSchema(name:c2, type:int, 
comment:null)], location:file:/Users/serge.rielau/spark/spark-warehouse/t/c1=2, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
parameters:\{serialization.format=1}), bucketCols:[], sortCols:[], 
parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], 
skewedColValueLocationMaps:{}), storedAsSubDirectories:false), parameters:null))

 at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.startAddPartition(HiveMetaStore.java:2744)

 at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_partitions_core(HiveMetaStore.java:2442)

 at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_partitions_req(HiveMetaStore.java:2560)

 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)

 at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

 at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

 at java.base/java.lang.reflect.Method.invoke(Method.java:566)

 at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)

 at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)

 at com.sun.proxy.$Proxy31.add_partitions_req(Unknown Source)

 at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.add_partitions(HiveMetaStoreClient.java:625)

 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)

 at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

 at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

 at java.base/java.lang.reflect.Method.invoke(Method.java:566)

 at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)

 at com.sun.proxy.$Proxy32.add_partitions(Unknown Source)

 at org.apache.hadoop.hive.ql.metadata.Hive.createPartitions(Hive.java:2103)

 at 
org.apache.spark.sql.hive.client.Shim_v0_13.createPartitions(HiveShim.scala:763)

 at 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$createPartitions$1(HiveClientImpl.scala:631)

 at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)

 at 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:296)

 at 
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:227)

 at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:226)

 at 
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:276)

 at 
org.apache.spark.sql.hive.client.HiveClientImpl.createPartitions(HiveClientImpl.scala:624)

 at 
org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$createPartitions$1(HiveExternalCatalog.scala:1039)

 at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)

 at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102)

 at 
org.apache.spark.sql.hive.HiveExternalCatalog.createPartitions(HiveExternalCatalog.scala:1021)

 at 
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createPartitions(ExternalCatalogWithListener.scala:201)

 at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.createPartitions(SessionCatalog.scala:1169)

 at 
org.apache.spark.sql.execution.command.AlterTableAddPartitionCommand.$anonfun$run$17(ddl.scala:514)

 at 
org.apache.spark.sql.execution.command.AlterTableAddPartitionCommand.$anonfun$run$17$adapted(ddl.scala:513)

 at scala.collection.Iterator.foreach(Itera

[jira] [Created] (SPARK-40521) PartitionsAlreadyExistException in Hive V1 Command V1 reports all partitions instead of the conflicting partition

2022-09-21 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-40521:


 Summary: PartitionsAlreadyExistException in Hive V1 Command V1 
reports all partitions instead of the conflicting partition
 Key: SPARK-40521
 URL: https://issues.apache.org/jira/browse/SPARK-40521
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau


PartitionsAlreadyExistException in Hive V1 Command V1 reports all partitions 
instead of the conflicting partition

When I run AlterTableAddPartitionSuiteBase for Hive, the test "partition already exists" 
fails in my local build ONLY in that mode because it reports two partitions 
as conflicting where there should be only one. In all other modes the test 
succeeds.
The test is passing on master because the test does not check the partitions 
themselves.
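
As a hedged sketch of the intended behavior (plain Scala, not the actual Hive client code path; the helper and its inputs are hypothetical), the exception should be built only from the partition specs that actually collide with existing partitions:

{code:scala}
// Sketch only: filter the requested partition specs down to the ones that
// already exist before reporting them. Names here are hypothetical.
object ConflictingPartitionsSketch {
  type PartitionSpec = Map[String, String]

  def conflictingSpecs(
      requested: Seq[PartitionSpec],
      existing: Set[PartitionSpec]): Seq[PartitionSpec] =
    requested.filter(existing.contains)

  def main(args: Array[String]): Unit = {
    val requested = Seq(Map("c1" -> "1"), Map("c1" -> "2"))
    val existing  = Set(Map("c1" -> "2"))
    // Only c1=2 should appear in the error; c1=1 does not yet exist.
    println(conflictingSpecs(requested, existing)) // List(Map(c1 -> 2))
  }
}
{code}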



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40360) Convert some DDL exception to new error framework

2022-09-06 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-40360:


 Summary: Convert some DDL exception to new error framework
 Key: SPARK-40360
 URL: https://issues.apache.org/jira/browse/SPARK-40360
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau


Tackling the following files:

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlreadyExistException.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/NoSuchItemException.scala

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CannotReplaceMissingTableException.scala

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/NonEmptyException.scala
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala

Here is the doc with proposed text:
https://docs.google.com/document/d/1TpFx3AwcJZd3l7zB1ZDchvZ8j2dY6_uf5LHfW2gjE4A/edit?usp=sharing
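
For orientation, a minimal sketch of the direction (not the actual Spark exception classes; the class and parameter names below are illustrative): the existing exceptions would carry an error class plus named message parameters rather than a pre-rendered string.

{code:scala}
// Illustrative only: an *AlreadyExistsException carrying an error class and
// named parameters instead of a hand-built message string.
class TableAlreadyExistsExceptionSketch(
    val errorClass: String,
    val messageParameters: Map[String, String])
  extends Exception(
    s"[$errorClass] Table ${messageParameters.getOrElse("relationName", "?")} already exists.")

object TableAlreadyExistsExceptionSketch {
  def main(args: Array[String]): Unit = {
    val e = new TableAlreadyExistsExceptionSketch(
      errorClass = "TABLE_OR_VIEW_ALREADY_EXISTS",
      messageParameters = Map("relationName" -> "`default`.`t`"))
    println(e.getMessage)
  }
}
{code}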



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40208) New OFFSET clause does not use new error framework

2022-08-24 Thread Serge Rielau (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17584351#comment-17584351
 ] 

Serge Rielau commented on SPARK-40208:
--

[~maxgekk] (FYI)
Also (I'm sure LIMIT is the same, maybe fix in one fell swoop?)

spark-sql> SELECT name, age FROM person ORDER BY name OFFSET -1;

Error in query: The offset expression must be equal to or greater than 0, but 
got -1;

Offset -1

+- Sort [name#185 ASC NULLS FIRST], true

   +- Project [name#185, age#186]

      +- SubqueryAlias person

         +- View (`person`, [name#185,age#186])

            +- Project [cast(col1#187 as string) AS name#185, cast(col2#188 as 
int) AS age#186]

               +- LocalRelation [col1#187, col2#188]

> New OFFSET clause does not use new error framework 
> ---
>
> Key: SPARK-40208
> URL: https://issues.apache.org/jira/browse/SPARK-40208
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Minor
>
> CREATE TEMP VIEW person (name, age)
> AS VALUES ('Zen Hui', 25),
> ('Anil B' , 18),
> ('Shone S', 16),
> ('Mike A' , 25),
> ('John A' , 18),
> ('Jack N' , 16);
> SELECT name, age FROM person ORDER BY name OFFSET length(name);
> Error in query: The offset expression must evaluate to a constant value, but 
> got length(person.name);
> Offset length(name#181)
> +- Sort [name#181 ASC NULLS FIRST], true
>    +- Project [name#181, age#182]
>       +- SubqueryAlias person
>          +- View (`person`, [name#181,age#182])
>             +- Project [cast(col1#183 as string) AS name#181, cast(col2#184 
> as int) AS age#182]
>                +- LocalRelation [col1#183, col2#184|#183, col2#184]
>  
> Returning the plan here is quite pointless as well. The context would be more 
> interesting.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40208) New OFFSET clause does not use new error framework

2022-08-24 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-40208:


 Summary: New OFFSET clause does not use new error framework 
 Key: SPARK-40208
 URL: https://issues.apache.org/jira/browse/SPARK-40208
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau


CREATE TEMP VIEW person (name, age)
AS VALUES ('Zen Hui', 25),
('Anil B' , 18),
('Shone S', 16),
('Mike A' , 25),
('John A' , 18),
('Jack N' , 16);

SELECT name, age FROM person ORDER BY name OFFSET length(name);

Error in query: The offset expression must evaluate to a constant value, but 
got length(person.name);

Offset length(name#181)

+- Sort [name#181 ASC NULLS FIRST], true

   +- Project [name#181, age#182]

      +- SubqueryAlias person

         +- View (`person`, [name#181,age#182])

            +- Project [cast(col1#183 as string) AS name#181, cast(col2#184 as 
int) AS age#182]

               +- LocalRelation [col1#183, col2#184|#183, col2#184]


 

Returning the plan here is quite pointless as well. The context would be more 
interesting.
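
A rough sketch of the intended shape (standalone Scala, not the actual analyzer code; the error class and parameter names are assumptions): validate the expression and raise a structured error with named parameters and the query context, instead of dumping the plan.

{code:scala}
// Sketch only: raise a structured error for a negative OFFSET/LIMIT value;
// error class and parameter names are illustrative.
object OffsetCheckSketch {
  final case class ErrorClassException(
      errorClass: String,
      messageParameters: Map[String, String])
    extends Exception(s"[$errorClass] ${messageParameters.mkString(", ")}")

  def checkOffsetValue(name: String, value: Int): Unit =
    if (value < 0) {
      throw ErrorClassException(
        errorClass = "INVALID_LIMIT_LIKE_EXPRESSION.IS_NEGATIVE",
        messageParameters = Map("name" -> name, "v" -> value.toString))
    }

  def main(args: Array[String]): Unit = {
    try checkOffsetValue("offset", -1)
    catch { case e: ErrorClassException => println(e.getMessage) }
  }
}
{code}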
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40156) url_decode() exposes a Java error

2022-08-21 Thread Serge Rielau (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17582643#comment-17582643
 ] 

Serge Rielau commented on SPARK-40156:
--

+ [~maxgekk] 

For new functions, we should be using the new error framework:
[https://github.com/apache/spark/blob/master/core/src/main/resources/error/error-classes.json]

> url_decode() exposes a Java error
> -
>
> Key: SPARK-40156
> URL: https://issues.apache.org/jira/browse/SPARK-40156
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Major
>
> Given a badly encoded string, Spark returns a Java error.
> It should instead return an ERROR_CLASS.
> spark-sql> SELECT url_decode('http%3A%2F%2spark.apache.org');
> 22/08/20 17:17:20 ERROR SparkSQLDriver: Failed in [SELECT 
> url_decode('http%3A%2F%2spark.apache.org')]
> java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in 
> escape (%) pattern - Error at index 1 in: "2s"
>  at java.base/java.net.URLDecoder.decode(URLDecoder.java:232)
>  at java.base/java.net.URLDecoder.decode(URLDecoder.java:142)
>  at 
> org.apache.spark.sql.catalyst.expressions.UrlCodec$.decode(urlExpressions.scala:113)
>  at 
> org.apache.spark.sql.catalyst.expressions.UrlCodec.decode(urlExpressions.scala)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40156) url_decode() exposes a Java error

2022-08-20 Thread Serge Rielau (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17582452#comment-17582452
 ] 

Serge Rielau commented on SPARK-40156:
--

It's new [~Zing] 
https://github.com/apache/spark/commit/e5c1b822016600e77fabcdf145ecb3ba93c692b3

> url_decode() exposes a Java error
> -
>
> Key: SPARK-40156
> URL: https://issues.apache.org/jira/browse/SPARK-40156
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Major
>
> Given a badly encoded string, Spark returns a Java error.
> It should instead return an ERROR_CLASS.
> spark-sql> SELECT url_decode('http%3A%2F%2spark.apache.org');
> 22/08/20 17:17:20 ERROR SparkSQLDriver: Failed in [SELECT 
> url_decode('http%3A%2F%2spark.apache.org')]
> java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in 
> escape (%) pattern - Error at index 1 in: "2s"
>  at java.base/java.net.URLDecoder.decode(URLDecoder.java:232)
>  at java.base/java.net.URLDecoder.decode(URLDecoder.java:142)
>  at 
> org.apache.spark.sql.catalyst.expressions.UrlCodec$.decode(urlExpressions.scala:113)
>  at 
> org.apache.spark.sql.catalyst.expressions.UrlCodec.decode(urlExpressions.scala)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40156) url_decode() exposes a Java error

2022-08-20 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-40156:


 Summary: url_decode() exposes a Java error
 Key: SPARK-40156
 URL: https://issues.apache.org/jira/browse/SPARK-40156
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau


Given a badly encoded string, Spark returns a Java error.
It should instead return an ERROR_CLASS.

spark-sql> SELECT url_decode('http%3A%2F%2spark.apache.org');

22/08/20 17:17:20 ERROR SparkSQLDriver: Failed in [SELECT 
url_decode('http%3A%2F%2spark.apache.org')]

java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in 
escape (%) pattern - Error at index 1 in: "2s"

 at java.base/java.net.URLDecoder.decode(URLDecoder.java:232)

 at java.base/java.net.URLDecoder.decode(URLDecoder.java:142)

 at 
org.apache.spark.sql.catalyst.expressions.UrlCodec$.decode(urlExpressions.scala:113)

 at 
org.apache.spark.sql.catalyst.expressions.UrlCodec.decode(urlExpressions.scala)
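
A minimal sketch of the expected behavior (plain Scala over the JDK decoder, not the actual UrlCodec code; the error class name is an assumption): catch the IllegalArgumentException and re-throw a structured error that carries the offending input as a parameter.

{code:scala}
import java.net.URLDecoder
import java.nio.charset.StandardCharsets

// Sketch only: wrap the JDK decoder and surface a structured error instead of
// the raw java.lang.IllegalArgumentException.
object UrlDecodeSketch {
  final case class UrlDecodeError(errorClass: String, params: Map[String, String])
    extends Exception(s"[$errorClass] Cannot decode url: ${params.getOrElse("url", "?")}")

  def urlDecode(url: String): String =
    try {
      URLDecoder.decode(url, StandardCharsets.UTF_8.name())
    } catch {
      case _: IllegalArgumentException =>
        throw UrlDecodeError("CANNOT_DECODE_URL", Map("url" -> url))
    }

  def main(args: Array[String]): Unit = {
    try urlDecode("http%3A%2F%2spark.apache.org")
    catch { case e: UrlDecodeError => println(e.getMessage) }
  }
}
{code}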



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40151) Fix return type for new median(interval) function

2022-08-19 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-40151:


 Summary: Fix return type for new median(interval) function 
 Key: SPARK-40151
 URL: https://issues.apache.org/jira/browse/SPARK-40151
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Serge Rielau


median() right now returns an interval of the same type as the input.
We should instead match mean() and avg():

The result type is computed from the argument type (see the sketch below):

- year-month interval: The result is an `INTERVAL YEAR TO MONTH`.
- day-time interval: The result is an `INTERVAL DAY TO SECOND`.
- In all other cases the result is a DOUBLE.
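
A hedged sketch of that typing rule using Spark's public type API (this mirrors mean()/avg(); it is not the actual median() implementation):

{code:scala}
import org.apache.spark.sql.types._

// Sketch of the proposed result-type rule for median().
object MedianResultTypeSketch {
  def medianResultType(input: DataType): DataType = input match {
    case _: YearMonthIntervalType => YearMonthIntervalType() // INTERVAL YEAR TO MONTH
    case _: DayTimeIntervalType   => DayTimeIntervalType()   // INTERVAL DAY TO SECOND
    case _                        => DoubleType              // all other inputs
  }

  def main(args: Array[String]): Unit = {
    println(medianResultType(IntegerType))            // DoubleType
    println(medianResultType(DayTimeIntervalType()))  // interval day to second
  }
}
{code}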



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39558) Store error message parameters as Map instead of Array

2022-06-22 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-39558:


 Summary: Store error message  parameters as Map instead of Array
 Key: SPARK-39558
 URL: https://issues.apache.org/jira/browse/SPARK-39558
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.3.1
Reporter: Serge Rielau


Right now when we raise a SparkException we pass an array of arguments which are 
assigned to the message parameters by position. This has several downsides:
1. It makes it hard to later localize (or rework) the messages since that may 
shuffle positions.
2. There could be an accidental mismatch when writing code which is not 
detected in QA.

3. Sometimes we want to use the same parameter multiple times in a message. 
Repeating it as an argument seems silly. 

All of these problems go away when we use a map aligning parameters and 
arguments. We already do this for CheckError.
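
To illustrate, a sketch under the assumption that message templates use <param> placeholders, as in error-classes.json (this is not the actual SparkThrowable code):

{code:scala}
// Sketch only: substitute named parameters from a Map into a message template.
object MessageParamsSketch {
  def format(template: String, parameters: Map[String, String]): String =
    parameters.foldLeft(template) { case (msg, (name, value)) =>
      msg.replace(s"<$name>", value)
    }

  def main(args: Array[String]): Unit = {
    val template = "Cannot cast <sourceType> to <targetType>."
    println(format(template, Map("sourceType" -> "VOID", "targetType" -> "INT")))
    // Cannot cast VOID to INT.
  }
}
{code}

Because substitution is by name, the same parameter can appear several times in the template, and the wording can be reordered without touching the raising code.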



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39492) Rework MISSING_COLUMN error class

2022-06-16 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-39492:


 Summary: Rework MISSING_COLUMN error class
 Key: SPARK-39492
 URL: https://issues.apache.org/jira/browse/SPARK-39492
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.3.1
Reporter: Serge Rielau


"MISSING_COLUMN" : {
"message" : [
"Column '' does not exist. Did you mean one of the following? 
[]"
],
"sqlState" : "42000"

Is unfortunately named. It is more accurate to talk about an UNRESOLVED_COLUMN 
or an UNRESOLVED_COLUMN_IDENTIFIER since we could refer to an alias, a SQL UDF 
parameter, a field, or, in the future, a variable. 
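
A small hedged sketch of what the renamed condition could look like on the raising side (illustrative names only, not the final error class definition):

{code:scala}
// Sketch only: an unresolved-identifier error carrying the name and candidates.
object UnresolvedColumnSketch {
  final case class UnresolvedColumnError(objectName: String, proposal: Seq[String])
    extends Exception(
      s"[UNRESOLVED_COLUMN] A column, variable, or function parameter with name " +
      s"$objectName cannot be resolved. Did you mean one of the following? " +
      s"[${proposal.mkString(", ")}]")

  def main(args: Array[String]): Unit = {
    println(UnresolvedColumnError("`c3`", Seq("`c1`", "`c2`")).getMessage)
  }
}
{code}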



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39432) element_at(*, 0) does not return INVALID_ARRAY_INDEX_IN_ELEMENT_AT

2022-06-09 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-39432:


 Summary: element_at(*, 0) does not return 
INVALID_ARRAY_INDEX_IN_ELEMENT_AT
 Key: SPARK-39432
 URL: https://issues.apache.org/jira/browse/SPARK-39432
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.3.0
Reporter: Serge Rielau


spark-sql> SELECT element_at(array('a', 'b', 'c'), index) FROM VALUES(0), (2) 
AS T(index);

22/06/09 16:23:07 ERROR SparkSQLDriver: Failed in [SELECT element_at(array('a', 
'b', 'c'), index) FROM VALUES(0), (2) AS T(index)]

java.lang.ArrayIndexOutOfBoundsException: SQL array indices start at 1

 at 
org.apache.spark.sql.errors.QueryExecutionErrors$.sqlArrayIndexNotStartAtOneError(QueryExecutionErrors.scala:1206)

 

This should roll into INVALID_ARRAY_INDEX_IN_ELEMENT_AT. It makes no sense to create a new 
error class.
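
For illustration, a sketch of element_at() semantics with the unified error class (plain Scala, not the actual expression code; the parameter names are assumptions):

{code:scala}
// Sketch only: index 0 and out-of-range indexes raise the same error class.
object ElementAtSketch {
  final case class ArrayIndexError(errorClass: String, params: Map[String, String])
    extends Exception(s"[$errorClass] ${params.mkString(", ")}")

  def elementAt[T](arr: Seq[T], index: Int): T = {
    if (index == 0 || math.abs(index) > arr.length) {
      throw ArrayIndexError(
        "INVALID_ARRAY_INDEX_IN_ELEMENT_AT",
        Map("indexValue" -> index.toString, "arraySize" -> arr.length.toString))
    }
    if (index > 0) arr(index - 1) else arr(arr.length + index)
  }

  def main(args: Array[String]): Unit = {
    println(elementAt(Seq("a", "b", "c"), 2)) // b
    try elementAt(Seq("a", "b", "c"), 0)
    catch { case e: ArrayIndexError => println(e.getMessage) }
  }
}
{code}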



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39418) DECODE docs refer to Oracle instead of Spark

2022-06-08 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-39418:


 Summary: DECODE docs refer to Oracle instead of Spark
 Key: SPARK-39418
 URL: https://issues.apache.org/jira/browse/SPARK-39418
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 3.2.0
Reporter: Serge Rielau


[https://spark.apache.org/docs/latest/api/sql/index.html#decode]

If no match is found, then {color:#de350b}Oracle{color} returns default. If 
default is omitted, returns null.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39416) When raising an exception, pass parameters as a map instead of an array

2022-06-08 Thread Serge Rielau (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Rielau updated SPARK-39416:
-
Description: 
We have moved away from c-style parameters in error message texts towards 
symbolic parameters. E.g.

 
{code:java}
"CANNOT_CAST_DATATYPE" : {
  "message" : [
"Cannot cast  to ."
  ],
  "sqlState" : "22005"
},{code}
{{However when we raise an exception we merely pass a simple array and assume 
positional assignment. }}
{code:java}
def cannotCastFromNullTypeError(to: DataType): Throwable = {
  new SparkException(errorClass = "CANNOT_CAST_DATATYPE",
 messageParameters = Array(NullType.typeName, to.typeName), null)
}{code}
 

This has multiple downsides:
 # It's not possible to mention the same parameter twice in an error message.
 # When reworking an error message we cannot shuffle parameters without 
changing the code.
 # There is a risk that the error message and the exception go out of sync 
unnoticed, given we do not want to check for the message text in the code.

So in this PR we propose the following new usage:
{code:java}
def cannotCastFromNullTypeError(to: DataType): Throwable = {
  new SparkException(errorClass = "CANNOT_CAST_DATATYPE",
messageParameters = Map("sourceType" -> NullType.typeName, "targetType" 
->to.typeName),
context = null)
}{code}
getMessage will then substitute the parameters in the message appropriately.

Moving forward this should be the preferred way to raise exceptions.

  was:
We have moved away from c-style parameters in error message texts towards 
symbolic parameters. E.g.

 
{code:java}
"CANNOT_CAST_DATATYPE" : {
  "message" : [
"Cannot cast  to ."
  ],
  "sqlState" : "22005"
},{code}

{{However when we raise an exception we merely pass a simple array and assume 
positional assignment. }}
{code:java}
def cannotCastFromNullTypeError(to: DataType): Throwable = {
  new SparkException(errorClass = "CANNOT_CAST_DATATYPE",
 messageParameters = Array(NullType.typeName, to.typeName), null)
}{code}
 

This has multiple downsides:
 # It's not possible to mention the same parameter twice in an error message.
 # When reworking an error message we cannot shuffle parameters without 
changing the code.
 # There is a risk that the error message and the exception go out of sync 
unnoticed, given we do not want to check for the message text in the code.


So in this PR we propose the following new usage:
{code:java}
def cannotCastFromNullTypeError(to: DataType): Throwable = {
  new SparkException(errorClass = "CANNOT_CAST_DATATYPE",
messageParameters = Map("sourceType" -> NullType.typeName, "targetType" 
->to.typeName),
context = null)
}{code}
getMessage will then substitute the parameters in the message appropriately.

Moving forward this should be the preferred way to raise exceptions.


> When raising an exception, pass parameters as a map instead of an array
> ---
>
> Key: SPARK-39416
> URL: https://issues.apache.org/jira/browse/SPARK-39416
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.1
>Reporter: Serge Rielau
>Priority: Major
>
> We have moved away from c-style parameters in error message texts towards 
> symbolic parameters. E.g.
>  
> {code:java}
> "CANNOT_CAST_DATATYPE" : {
>   "message" : [
> "Cannot cast  to ."
>   ],
>   "sqlState" : "22005"
> },{code}
> {{However when we raise an exception we merely pass a simple array and assume 
> positional assignment. }}
> {code:java}
> def cannotCastFromNullTypeError(to: DataType): Throwable = {
>   new SparkException(errorClass = "CANNOT_CAST_DATATYPE",
>  messageParameters = Array(NullType.typeName, to.typeName), null)
> }{code}
>  
> This has multiple downsides:
>  # It's not possible to mention the same parameter twice in an error message.
>  # When reworking an error message we cannot shuffle parameters without 
> changing the code.
>  # There is a risk that the error message and the exception go out of sync 
> unnoticed, given we do not want to check for the message text in the code.
> So in this PR we propose the following new usage:
> {code:java}
> def cannotCastFromNullTypeError(to: DataType): Throwable = {
>   new SparkException(errorClass = "CANNOT_CAST_DATATYPE",
> messageParameters = Map("sourceType" -> NullType.typeName, "targetType" 
> ->to.typeName),
> context = null)
> }{code}
> getMessage will then substitute the parameters in the message appropriately.
> Moving forward this should be the preferred way to raise exceptions.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h.

[jira] [Created] (SPARK-39416) When raising an exception, pass parameters as a map instead of an array

2022-06-08 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-39416:


 Summary: When raising an exception, pass parameters as a map 
instead of an array
 Key: SPARK-39416
 URL: https://issues.apache.org/jira/browse/SPARK-39416
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.3.1
Reporter: Serge Rielau


We have moved away from c-style parameters in error message texts towards 
symbolic parameters. E.g.

 
{code:java}
"CANNOT_CAST_DATATYPE" : {
  "message" : [
"Cannot cast  to ."
  ],
  "sqlState" : "22005"
},{code}

{{However when we raise an exception we merely pass a simple array and assume 
positional assignment. }}
{code:java}
def cannotCastFromNullTypeError(to: DataType): Throwable = {
  new SparkException(errorClass = "CANNOT_CAST_DATATYPE",
 messageParameters = Array(NullType.typeName, to.typeName), null)
}{code}
 

This has multiple downsides:
 # It's not possible to mention the same parameter twice in an error message.
 # When reworking an error message we cannot shuffle parameters without 
changing the code.
 # There is a risk that the error message and the exception go out of sync 
unnoticed, given we do not want to check for the message text in the code.


So in this PR we propose the following new usage:
{code:java}
def cannotCastFromNullTypeError(to: DataType): Throwable = {
  new SparkException(errorClass = "CANNOT_CAST_DATATYPE",
messageParameters = Map("sourceType" -> NullType.typeName, "targetType" 
->to.typeName),
context = null)
}{code}
getMessage will then substitute the parameters in the message appropriately.

Moving forward this should be the preferred way to raise exceptions.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39349) Add a CheckError() method to SparkFunSuite

2022-05-31 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-39349:


 Summary: Add a CheckError() method to SparkFunSuite
 Key: SPARK-39349
 URL: https://issues.apache.org/jira/browse/SPARK-39349
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.3.1
Reporter: Serge Rielau


We want to standardize on a generic way to QA error messages without impeding 
the ability to enhance/rework error messages.
CheckError() allows for efficient asserting on the "payload":
 * Errorclass, subclass
 * SQLState
 * Parameters (both names and values)

 

It does not test the actual English text, which is the feature.
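
A hedged sketch of the intended test-side usage (the exact checkError() signature in SparkFunSuite may differ; the exception type here is a stand-in):

{code:scala}
import org.scalatest.funsuite.AnyFunSuite

// Sketch only: assert on the error payload (class, SQLSTATE, parameters)
// rather than on the rendered English message.
class CheckErrorSketch extends AnyFunSuite {
  final case class SparkErrorStub(
      errorClass: String,
      sqlState: String,
      messageParameters: Map[String, String]) extends Exception(errorClass)

  def checkError(
      exception: SparkErrorStub,
      errorClass: String,
      sqlState: String,
      parameters: Map[String, String]): Unit = {
    assert(exception.errorClass == errorClass)
    assert(exception.sqlState == sqlState)
    assert(exception.messageParameters == parameters)
  }

  test("asserts on the payload, not the English text") {
    val e = SparkErrorStub("CANNOT_CAST_DATATYPE", "22005",
      Map("sourceType" -> "VOID", "targetType" -> "INT"))
    checkError(e, "CANNOT_CAST_DATATYPE", "22005",
      Map("sourceType" -> "VOID", "targetType" -> "INT"))
  }
}
{code}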



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39185) Convert *AlreadyExistsException to use error classes

2022-05-13 Thread Serge Rielau (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Rielau updated SPARK-39185:
-
Summary: Convert *AlreadyExistsException to use error classes  (was: t 
*AlreadyExistsException to use error classes)

> Convert *AlreadyExistsException to use error classes
> 
>
> Key: SPARK-39185
> URL: https://issues.apache.org/jira/browse/SPARK-39185
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Serge Rielau
>Priority: Major
>
> XXX already exists is a pretty common error condition.
> We want to handle it as an error class



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39185) t *AlreadyExistsException to use error classes

2022-05-13 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-39185:


 Summary: t *AlreadyExistsException to use error classes
 Key: SPARK-39185
 URL: https://issues.apache.org/jira/browse/SPARK-39185
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.3.0
Reporter: Serge Rielau


XXX already exists is a pretty common error condition.
We want to handle it as an error class



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


