[jira] [Updated] (SPARK-44233) Support an outer outer context in subquery resolution

2023-06-28 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-44233:
--
Description: 
{code:java}
>>> sql("select * from range(8) t, lateral (select * from t) s")
Traceback (most recent call last):
...
pyspark.errors.exceptions.captured.AnalysisException: [TABLE_OR_VIEW_NOT_FOUND] 
The table or view `t` cannot be found. Verify the spelling and correctness of 
the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() 
output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.; 
line 1 pos 49;
'Project [*]
+- 'LateralJoin lateral-subquery#0 [], Inner
   :  +- 'SubqueryAlias s
   :     +- 'Project [*]
   :        +- 'UnresolvedRelation [t], [], false
   +- SubqueryAlias t
      +- Range (0, 8, step=1, splits=None){code}
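
A lateral subquery can already reference columns of the preceding relation through the outer resolution context; it is the table-name reference {{from t}} above that fails. A minimal sketch of the contrast (assuming a default {{spark}} session):

{code:python}
# column references resolve through the outer context today
spark.sql("select * from range(8) t, lateral (select t.id as id2) s").show()

# table references inside the lateral subquery do not resolve yet,
# which is what this ticket proposes to support:
# spark.sql("select * from range(8) t, lateral (select * from t) s")
{code}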

> Support an outer outer context in subquery resolution
> -
>
> Key: SPARK-44233
> URL: https://issues.apache.org/jira/browse/SPARK-44233
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> {code:java}
> >>> sql("select * from range(8) t, lateral (select * from t) s")
> Traceback (most recent call last):
> ...
> pyspark.errors.exceptions.captured.AnalysisException: 
> [TABLE_OR_VIEW_NOT_FOUND] The table or view `t` cannot be found. Verify the 
> spelling and correctness of the schema and catalog.
> If you did not qualify the name with a schema, verify the current_schema() 
> output, or qualify the name with the correct schema and catalog.
> To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF 
> EXISTS.; line 1 pos 49;
> 'Project [*]
> +- 'LateralJoin lateral-subquery#0 [], Inner
>    :  +- 'SubqueryAlias s
>    :     +- 'Project [*]
>    :        +- 'UnresolvedRelation [t], [], false
>    +- SubqueryAlias t
>       +- Range (0, 8, step=1, splits=None){code}






[jira] [Created] (SPARK-44233) Support an outer outer context in subquery resolution

2023-06-28 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-44233:
-

 Summary: Support an outer outer context in subquery resolution
 Key: SPARK-44233
 URL: https://issues.apache.org/jira/browse/SPARK-44233
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-44200) Support TABLE argument parser rule for TableValuedFunction

2023-06-26 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-44200:
-

 Summary: Support TABLE argument parser rule for TableValuedFunction
 Key: SPARK-44200
 URL: https://issues.apache.org/jira/browse/SPARK-44200
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.0
Reporter: Takuya Ueshin
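
The description is empty; as an illustration, the TABLE argument syntax this parser rule targets looks roughly like the following ({{my_tvf}} and {{t}} are hypothetical names):

{code:python}
# sketch: pass a relation to a table-valued function via TABLE(...)
spark.sql("SELECT * FROM my_tvf(TABLE(t))").show()
{code}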









[jira] [Resolved] (SPARK-43804) Test on nested structs support in Pandas UDF

2023-05-31 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-43804.
---
  Assignee: Xinrong Meng
Resolution: Fixed

Issue resolved by pull request 41320
https://github.com/apache/spark/pull/41320

> Test on nested structs support in Pandas UDF
> 
>
> Key: SPARK-43804
> URL: https://issues.apache.org/jira/browse/SPARK-43804
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> Test on nested structs support in Pandas UDF. That support is newly enabled 
> (compared to Spark 3.4).






[jira] [Created] (SPARK-43817) Support UserDefinedType in createDataFrame from pandas DataFrame and toPandas

2023-05-26 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-43817:
-

 Summary: Support UserDefinedType in createDataFrame from pandas 
DataFrame and toPandas
 Key: SPARK-43817
 URL: https://issues.apache.org/jira/browse/SPARK-43817
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-43759) Expose TimestampNTZType in pyspark.sql.types

2023-05-23 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-43759:
-

 Summary: Expose TimestampNTZType in pyspark.sql.types
 Key: SPARK-43759
 URL: https://issues.apache.org/jira/browse/SPARK-43759
 Project: Spark
  Issue Type: Improvement
  Components: python
Affects Versions: 3.5.0
Reporter: Takuya Ueshin


{{TimestampNTZType}} is missing in {{__all__}} list in {{pyspark.sql.types}}.
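
A minimal sketch of the fix, assuming it only needs the name appended to the export list in python/pyspark/sql/types.py:

{code:python}
# python/pyspark/sql/types.py (sketch; the real list is much longer)
__all__ = [
    "DataType",
    "TimestampType",
    "TimestampNTZType",  # previously missing
]
{code}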






[jira] [Created] (SPARK-43531) Enable more parity tests for Pandas UDFs.

2023-05-16 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-43531:
-

 Summary: Enable more parity tests for Pandas UDFs.
 Key: SPARK-43531
 URL: https://issues.apache.org/jira/browse/SPARK-43531
 Project: Spark
  Issue Type: Test
  Components: Connect, PySpark
Affects Versions: 3.5.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-43528) Support duplicated field names in createDataFrame with pandas DataFrame.

2023-05-16 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-43528:
-

 Summary: Support duplicated field names in createDataFrame with 
pandas DataFrame.
 Key: SPARK-43528
 URL: https://issues.apache.org/jira/browse/SPARK-43528
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Takuya Ueshin
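
The description is empty; a minimal repro consistent with the summary (a sketch, assuming a default {{spark}} session):

{code:python}
import pandas as pd

# a pandas DataFrame whose field names are duplicated
pdf = pd.DataFrame([[1, 2]], columns=["a", "a"])
spark.createDataFrame(pdf).show()
{code}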









[jira] [Created] (SPARK-43473) Support struct type in createDataFrame from pandas DataFrame

2023-05-11 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-43473:
-

 Summary: Support struct type in createDataFrame from pandas 
DataFrame
 Key: SPARK-43473
 URL: https://issues.apache.org/jira/browse/SPARK-43473
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Takuya Ueshin


Support struct type in createDataFrame from pandas DataFrame with {{Row}} 
object or {{dict}}.
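
A sketch of the intended usage (the schema string and data are illustrative):

{code:python}
import pandas as pd
from pyspark.sql import Row

# struct values supplied either as Row objects or as plain dicts
pdf = pd.DataFrame({"s": [Row(x=1), {"x": 2}]})
spark.createDataFrame(pdf, schema="s struct<x: bigint>").show()
{code}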






[jira] [Created] (SPARK-43363) Remove a workaround for pandas categorical type for pyarrow

2023-05-03 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-43363:
-

 Summary: Remove a workaround for pandas categorical type for 
pyarrow
 Key: SPARK-43363
 URL: https://issues.apache.org/jira/browse/SPARK-43363
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.4.0
Reporter: Takuya Ueshin


Now that the minimum version of pyarrow is {{1.0.0}}, a workaround for pandas' 
categorical type for pyarrow can be removed.
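
For reference, the conversion path in question (a sketch; the removed workaround itself is not shown in this ticket):

{code:python}
import pandas as pd

# pandas categorical columns are converted when building a Spark DataFrame;
# recent pyarrow handles the cast natively, so the manual workaround can go
pdf = pd.DataFrame({"c": pd.Categorical(["a", "b", "a"])})
spark.createDataFrame(pdf).show()
{code}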






[jira] [Created] (SPARK-43323) DataFrame.toPandas with Arrow enabled should handle exceptions properly

2023-04-28 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-43323:
-

 Summary: DataFrame.toPandas with Arrow enabled should handle 
exceptions properly
 Key: SPARK-43323
 URL: https://issues.apache.org/jira/browse/SPARK-43323
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.4.0
Reporter: Takuya Ueshin


Currently {{DataFrame.toPandas}} doesn't properly capture exceptions that happen 
in Spark.

{code:python}
>>> spark.conf.set("spark.sql.ansi.enabled", True)
>>> spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', True)
>>> spark.sql("select 1/0").toPandas()
...
  An error occurred while calling o53.getResult.
: org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:322)
...
{code}






[jira] [Created] (SPARK-43153) Skip Spark execution when the dataframe is local.

2023-04-15 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-43153:
-

 Summary: Skip Spark execution when the dataframe is local.
 Key: SPARK-43153
 URL: https://issues.apache.org/jira/browse/SPARK-43153
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-43146) Implement eager evaluation.

2023-04-14 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-43146:
-

 Summary: Implement eager evaluation.
 Key: SPARK-43146
 URL: https://issues.apache.org/jira/browse/SPARK-43146
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-43115) Split pyspark-pandas-connect from pyspark-connect module.

2023-04-12 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-43115:
-

 Summary: Split pyspark-pandas-connect from pyspark-connect module.
 Key: SPARK-43115
 URL: https://issues.apache.org/jira/browse/SPARK-43115
 Project: Spark
  Issue Type: Test
  Components: PySpark, Tests
Affects Versions: 3.5.0
Reporter: Takuya Ueshin









[jira] [Resolved] (SPARK-42437) Pyspark catalog.cacheTable allow to specify storage level Connect add support Storagelevel

2023-04-12 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-42437.
---
Fix Version/s: 3.5.0
 Assignee: Khalid Mammadov
   Resolution: Fixed

Issue resolved by pull request 40015
https://github.com/apache/spark/pull/40015

> Pyspark catalog.cacheTable allow to specify storage level Connect add support 
> Storagelevel
> --
>
> Key: SPARK-42437
> URL: https://issues.apache.org/jira/browse/SPARK-42437
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Khalid Mammadov
>Assignee: Khalid Mammadov
>Priority: Major
> Fix For: 3.5.0
>
>
> Currently the PySpark version of the catalog.cacheTable function does not 
> support specifying a storage level. This is to add that.
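
A sketch of the resulting API (the parameter name is assumed from the summary):

{code:python}
from pyspark import StorageLevel

# cache a table with an explicit storage level
spark.catalog.cacheTable("my_table", storageLevel=StorageLevel.DISK_ONLY)
{code}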






[jira] [Updated] (SPARK-43062) Add options to lint-python to run each test separately

2023-04-07 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-43062:
--
Priority: Minor  (was: Major)

> Add options to lint-python to run each test separately
> --
>
> Key: SPARK-43062
> URL: https://issues.apache.org/jira/browse/SPARK-43062
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Minor
>







[jira] [Created] (SPARK-43062) Add options to lint-python to run each test separately

2023-04-07 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-43062:
-

 Summary: Add options to lint-python to run each test separately
 Key: SPARK-43062
 URL: https://issues.apache.org/jira/browse/SPARK-43062
 Project: Spark
  Issue Type: Test
  Components: Project Infra
Affects Versions: 3.4.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-43055) createDataFrame should support duplicated nested field names

2023-04-06 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-43055:
-

 Summary: createDataFrame should support duplicated nested field 
names
 Key: SPARK-43055
 URL: https://issues.apache.org/jira/browse/SPARK-43055
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-42998) Fix DataFrame.collect with null struct.

2023-03-31 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42998:
-

 Summary: Fix DataFrame.collect with null struct.
 Key: SPARK-42998
 URL: https://issues.apache.org/jira/browse/SPARK-42998
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin


In Spark Connect:

{code:python}
>>> df = spark.sql("values (1, struct('a' as x)), (null, null) as t(a, b)")
>>> df.show()
+----+----+
|   a|   b|
+----+----+
|   1| {a}|
|null|null|
+----+----+

>>> df.collect()
[Row(a=1, b=Row(x='a')), Row(a=None, b=)]
{code}

whereas PySpark:

{code:python}
>>> df.collect()
[Row(a=1, b=Row(x='a')), Row(a=None, b=None)]
{code}






[jira] [Created] (SPARK-42985) Fix createDataFrame from pandas to respect session timezone.

2023-03-30 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42985:
-

 Summary: Fix createDataFrame from pandas to respect session 
timezone.
 Key: SPARK-42985
 URL: https://issues.apache.org/jira/browse/SPARK-42985
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-42984) Fix test_createDataFrame_with_single_data_type.

2023-03-30 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42984:
-

 Summary: Fix test_createDataFrame_with_single_data_type.
 Key: SPARK-42984
 URL: https://issues.apache.org/jira/browse/SPARK-42984
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin


PySpark raises an exception when:

{code:python}
>>> spark.createDataFrame(pd.DataFrame({"a": [1]}), schema="int").collect()
Traceback (most recent call last):
...
TypeError: field value: IntegerType() can not accept object (1,) in type <class 'tuple'>
{code}

whereas Spark Connect doesn't.






[jira] [Created] (SPARK-42983) Fix the error message of createDataFrame from np.array(0)

2023-03-30 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42983:
-

 Summary: Fix the error message of createDataFrame from np.array(0)
 Key: SPARK-42983
 URL: https://issues.apache.org/jira/browse/SPARK-42983
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin


{code:python}
>>> import numpy as np
>>> spark.createDataFrame(np.array(0))
Traceback (most recent call last):
...
TypeError: len() of unsized object
{code}






[jira] [Created] (SPARK-42982) Fix createDataFrame from pandas with map type

2023-03-30 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42982:
-

 Summary: Fix createDataFrame from pandas with map type
 Key: SPARK-42982
 URL: https://issues.apache.org/jira/browse/SPARK-42982
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin


{code:python}
>>> import pandas as pd
>>>
>>> map_data = [{"a": 1}, {"b": 2, "c": 3}, {}, None, {"d": None}]
>>> pdf = pd.DataFrame({"id": [0, 1, 2, 3, 4], "m": map_data})
>>> schema = "id long, m map<string, long>"
>>> spark.createDataFrame(pdf, schema=schema)
Traceback (most recent call last):
...
pyspark.errors.exceptions.connect.AnalysisException: 
[INVALID_COLUMN_OR_FIELD_DATA_TYPE] Column or field `col_1` is of type 
"STRUCT" while it's 
required to be "MAP".
{code}






[jira] [Created] (SPARK-42970) Reuse pyspark.sql.tests.test_arrow test cases

2023-03-29 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42970:
-

 Summary: Reuse pyspark.sql.tests.test_arrow test cases
 Key: SPARK-42970
 URL: https://issues.apache.org/jira/browse/SPARK-42970
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-42969) Fix the comparison of the results with Arrow optimization enabled/disabled.

2023-03-29 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42969:
-

 Summary: Fix the comparison of the results with Arrow optimization 
enabled/disabled.
 Key: SPARK-42969
 URL: https://issues.apache.org/jira/browse/SPARK-42969
 Project: Spark
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 3.4.0
Reporter: Takuya Ueshin


In {{test_arrow}}, there are a bunch of comparisons between DataFrames with 
Arrow optimization enabled and disabled.

These should be fixed to compare against the expected values so that they can be 
reused for Spark Connect parity tests.
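
A sketch of the intended refactor (names and values are illustrative):

{code:python}
import pandas as pd
from pandas.testing import assert_frame_equal

# before: two live Spark results compared against each other
#   assert_frame_equal(pdf_arrow_enabled, pdf_arrow_disabled)

# after: each result is compared against a literal expected value,
# so the same assertion also works in Spark Connect parity tests
pdf_arrow_enabled = spark.range(2).toPandas()
expected = pd.DataFrame({"id": [0, 1]})
assert_frame_equal(pdf_arrow_enabled, expected)
{code}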






[jira] [Created] (SPARK-42920) Python UDF with UDT

2023-03-24 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42920:
-

 Summary: Python UDF with UDT
 Key: SPARK-42920
 URL: https://issues.apache.org/jira/browse/SPARK-42920
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-42911) Introduce more basic exceptions.

2023-03-23 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42911:
-

 Summary: Introduce more basic exceptions.
 Key: SPARK-42911
 URL: https://issues.apache.org/jira/browse/SPARK-42911
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.4.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-42900) Fix createDataFrame to respect both type inference and column names.

2023-03-22 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42900:
-

 Summary: Fix createDataFrame to respect both type inference and 
column names.
 Key: SPARK-42900
 URL: https://issues.apache.org/jira/browse/SPARK-42900
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin









[jira] [Updated] (SPARK-42899) DataFrame.to(schema) fails when it contains a non-nullable nested field in a nullable field

2023-03-22 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-42899:
--
Summary: DataFrame.to(schema) fails when it contains a non-nullable nested 
field in a nullable field  (was: DataFrame.to(schema) fails with the schema of 
itself.)

> DataFrame.to(schema) fails when it contains a non-nullable nested field in 
> a nullable field
> ---
>
> Key: SPARK-42899
> URL: https://issues.apache.org/jira/browse/SPARK-42899
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> {{DataFrame.to(schema)}} fails when it contains a non-nullable nested field in 
> a nullable field:
> {code:scala}
> scala> val df = spark.sql("VALUES (1, STRUCT(1 as i)), (NULL, NULL) as t(a, 
> b)")
> df: org.apache.spark.sql.DataFrame = [a: int, b: struct<i: int>]
> scala> df.printSchema()
> root
>  |-- a: integer (nullable = true)
>  |-- b: struct (nullable = true)
>  |    |-- i: integer (nullable = false)
> scala> df.to(df.schema)
> org.apache.spark.sql.AnalysisException: [NULLABLE_COLUMN_OR_FIELD] Column or 
> field `b`.`i` is nullable while it's required to be non-nullable.
> {code}






[jira] [Updated] (SPARK-42899) DataFrame.to(schema) fails with the schema of itself.

2023-03-22 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-42899:
--
Description: 
{{DataFrame.to(schema)}} fails when it contains a non-nullable nested field in 
a nullable field:
{code:scala}
scala> val df = spark.sql("VALUES (1, STRUCT(1 as i)), (NULL, NULL) as t(a, b)")
df: org.apache.spark.sql.DataFrame = [a: int, b: struct<i: int>]
scala> df.printSchema()
root
 |-- a: integer (nullable = true)
 |-- b: struct (nullable = true)
 |    |-- i: integer (nullable = false)

scala> df.to(df.schema)
org.apache.spark.sql.AnalysisException: [NULLABLE_COLUMN_OR_FIELD] Column or 
field `b`.`i` is nullable while it's required to be non-nullable.
{code}

  was:
{{DataFrame.to(schema)}} fails with the schema of itself, when it contains a 
non-nullable nested field in a nullable field:

{code:scala}
scala> val df = spark.sql("VALUES (1, STRUCT(1 as i)), (NULL, NULL) as t(a, b)")
df: org.apache.spark.sql.DataFrame = [a: int, b: struct<i: int>]
scala> df.printSchema()
root
 |-- a: integer (nullable = true)
 |-- b: struct (nullable = true)
 |    |-- i: integer (nullable = false)

scala> df.to(df.schema)
org.apache.spark.sql.AnalysisException: [NULLABLE_COLUMN_OR_FIELD] Column or 
field `b`.`i` is nullable while it's required to be non-nullable.
{code}



> DataFrame.to(schema) fails with the schema of itself.
> -
>
> Key: SPARK-42899
> URL: https://issues.apache.org/jira/browse/SPARK-42899
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> {{DataFrame.to(schema)}} fails when it contains a non-nullable nested field in 
> a nullable field:
> {code:scala}
> scala> val df = spark.sql("VALUES (1, STRUCT(1 as i)), (NULL, NULL) as t(a, 
> b)")
> df: org.apache.spark.sql.DataFrame = [a: int, b: struct<i: int>]
> scala> df.printSchema()
> root
>  |-- a: integer (nullable = true)
>  |-- b: struct (nullable = true)
>  |    |-- i: integer (nullable = false)
> scala> df.to(df.schema)
> org.apache.spark.sql.AnalysisException: [NULLABLE_COLUMN_OR_FIELD] Column or 
> field `b`.`i` is nullable while it's required to be non-nullable.
> {code}






[jira] [Created] (SPARK-42899) DataFrame.to(schema) fails with the schema of itself.

2023-03-22 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42899:
-

 Summary: DataFrame.to(schema) fails with the schema of itself.
 Key: SPARK-42899
 URL: https://issues.apache.org/jira/browse/SPARK-42899
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.0
Reporter: Takuya Ueshin


{{DataFrame.to(schema)}} fails with the schema of itself, when it contains a 
non-nullable nested field in a nullable field:

{code:scala}
scala> val df = spark.sql("VALUES (1, STRUCT(1 as i)), (NULL, NULL) as t(a, b)")
df: org.apache.spark.sql.DataFrame = [a: int, b: struct<i: int>]
scala> df.printSchema()
root
 |-- a: integer (nullable = true)
 |-- b: struct (nullable = true)
 |    |-- i: integer (nullable = false)

scala> df.to(df.schema)
org.apache.spark.sql.AnalysisException: [NULLABLE_COLUMN_OR_FIELD] Column or 
field `b`.`i` is nullable while it's required to be non-nullable.
{code}







[jira] [Created] (SPARK-42889) Implement cache, persist, unpersist, and storageLevel

2023-03-21 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42889:
-

 Summary: Implement cache, persist, unpersist, and storageLevel
 Key: SPARK-42889
 URL: https://issues.apache.org/jira/browse/SPARK-42889
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-42875) Fix toPandas to handle timezone and map types properly.

2023-03-20 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42875:
-

 Summary: Fix toPandas to handle timezone and map types properly.
 Key: SPARK-42875
 URL: https://issues.apache.org/jira/browse/SPARK-42875
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-42848) Implement DataFrame.registerTempTable

2023-03-17 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42848:
-

 Summary: Implement DataFrame.registerTempTable
 Key: SPARK-42848
 URL: https://issues.apache.org/jira/browse/SPARK-42848
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin









[jira] [Resolved] (SPARK-41922) Implement DataFrame `semanticHash`

2023-03-17 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-41922.
---
Resolution: Duplicate

> Implement DataFrame `semanticHash`
> --
>
> Key: SPARK-41922
> URL: https://issues.apache.org/jira/browse/SPARK-41922
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>







[jira] [Created] (SPARK-42818) Implement DataFrameReader/Writer.jdbc

2023-03-15 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42818:
-

 Summary: Implement DataFrameReader/Writer.jdbc
 Key: SPARK-42818
 URL: https://issues.apache.org/jira/browse/SPARK-42818
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin









[jira] [Updated] (SPARK-42733) df.write.format().save() should support calling with no path or table name

2023-03-09 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-42733:
--
Parent: SPARK-41284
Issue Type: Sub-task  (was: Bug)

> df.write.format().save() should support calling with no path or table name
> --
>
> Key: SPARK-42733
> URL: https://issues.apache.org/jira/browse/SPARK-42733
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>
> When calling `session.range(5).write.format("xxx").options().save()`, Spark 
> Connect currently throws an assertion error because it expects that either 
> path or tableName is present. According to our current PySpark 
> implementation, that is not necessary though.
>
> {code:python}
> if format is not None:
> self.format(format)
> if path is None:
> self._jwrite.save()
> else:
> self._jwrite.save(path)
> {code}






[jira] [Created] (SPARK-42705) SparkSession.sql doesn't return values from commands.

2023-03-07 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42705:
-

 Summary: SparkSession.sql doesn't return values from commands.
 Key: SPARK-42705
 URL: https://issues.apache.org/jira/browse/SPARK-42705
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin


{code:python}
>>> spark.sql("show functions").show()
+--------+
|function|
+--------+
+--------+
{code}
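
In regular PySpark the same call returns the command's rows; a sketch of the expected behavior:

{code:python}
# expected (sketch): sql() should surface the rows produced by the command
assert spark.sql("show functions").count() > 0
{code}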






[jira] [Resolved] (SPARK-41843) Implement SparkSession.udf

2023-03-01 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-41843.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

> Implement SparkSession.udf
> --
>
> Key: SPARK-41843
> URL: https://issues.apache.org/jira/browse/SPARK-41843
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 2331, in pyspark.sql.connect.functions.call_udf
> Failed example:
>     _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType())
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 1, in 
> 
>         _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType())
>     AttributeError: 'SparkSession' object has no attribute 'udf'{code}






[jira] [Updated] (SPARK-42624) Reorganize imports in test_functions

2023-02-28 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-42624:
--
Component/s: PySpark
 (was: SQL)

> Reorganize imports in test_functions
> 
>
> Key: SPARK-42624
> URL: https://issues.apache.org/jira/browse/SPARK-42624
> Project: Spark
>  Issue Type: Task
>  Components: PySpark, Tests
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>







[jira] [Created] (SPARK-42624) Reorganize imports in test_functions

2023-02-28 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42624:
-

 Summary: Reorganize imports in test_functions
 Key: SPARK-42624
 URL: https://issues.apache.org/jira/browse/SPARK-42624
 Project: Spark
  Issue Type: Task
  Components: SQL, Tests
Affects Versions: 3.4.0
Reporter: Takuya Ueshin









[jira] [Created] (SPARK-42612) Enable more parity tests related to functions

2023-02-27 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42612:
-

 Summary: Enable more parity tests related to functions
 Key: SPARK-42612
 URL: https://issues.apache.org/jira/browse/SPARK-42612
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin









[jira] [Resolved] (SPARK-42510) Implement `DataFrame.mapInPandas`

2023-02-27 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-42510.
---
Fix Version/s: 3.4.0
 Assignee: Xinrong Meng
   Resolution: Fixed

Issue resolved by pull request 40104
https://github.com/apache/spark/pull/40104

> Implement `DataFrame.mapInPandas`
> -
>
> Key: SPARK-42510
> URL: https://issues.apache.org/jira/browse/SPARK-42510
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.4.0
>
>
> Implement `DataFrame.mapInPandas`






[jira] [Created] (SPARK-42574) DataFrame.toPandas should handle duplicated column names

2023-02-24 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42574:
-

 Summary: DataFrame.toPandas should handle duplicated column names
 Key: SPARK-42574
 URL: https://issues.apache.org/jira/browse/SPARK-42574
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin


{code:python}
spark.sql("select 1 v, 1 v").toPandas()
{code}

should work.
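
A sketch of the expected result (assuming a default {{spark}} session):

{code:python}
# expected: a pandas DataFrame keeping both duplicated columns
pdf = spark.sql("select 1 v, 1 v").toPandas()
assert list(pdf.columns) == ["v", "v"]
{code}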






[jira] [Created] (SPARK-42570) Fix DataFrameReader to use the default source

2023-02-24 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42570:
-

 Summary: Fix DataFrameReader to use the default source
 Key: SPARK-42570
 URL: https://issues.apache.org/jira/browse/SPARK-42570
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin


{code:python}
spark.read.load(path)
{code}

should work without specifying the format.
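
The fallback comes from the data source conf; a sketch (the path is illustrative):

{code:python}
# the default source, usually 'parquet'
spark.conf.get("spark.sql.sources.default")

# writing and reading back without naming a format should both work
spark.range(3).write.mode("overwrite").save("/tmp/spark42570")
spark.read.load("/tmp/spark42570").show()
{code}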






[jira] [Created] (SPARK-42568) SparkConnectStreamHandler should manage configs properly while creating plans.

2023-02-24 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42568:
-

 Summary: SparkConnectStreamHandler should manage configs properly 
while creating plans.
 Key: SPARK-42568
 URL: https://issues.apache.org/jira/browse/SPARK-42568
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin


Some components for planning need to check configs in {{SQLConf.get}} while 
building the plan, but currently it is not available there.






[jira] [Created] (SPARK-42522) Fix DataFrameWriterV2 to find the default source

2023-02-21 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42522:
-

 Summary: Fix DataFrameWriterV2 to find the default source
 Key: SPARK-42522
 URL: https://issues.apache.org/jira/browse/SPARK-42522
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin


{code:python}
df.writeTo("test_table").create()
{code}

throws:

{noformat}
pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
(org.apache.spark.SparkClassNotFoundException) [DATA_SOURCE_NOT_FOUND] Failed 
to find the data source: . Please find packages at 
`https://spark.apache.org/third-party-projects.html`.
{noformat}






[jira] [Commented] (SPARK-41901) Parity in String representation of Column

2023-02-17 Thread Takuya Ueshin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17690652#comment-17690652
 ] 

Takuya Ueshin commented on SPARK-41901:
---

For the first case, {{ACOSH}}, {{ASINH}}, and {{ATANH}} return upper-case 
names in PySpark because their {{prettyName}}s use upper-case names, whereas 
Spark Connect uses lower case.
[~gurwls223] [~podongfeng] Is it ok to compare them in a case-insensitive way to 
enable the still-skipped test 
{{FunctionsParityTests.test_inverse_trig_functions}}?
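
A sketch of the case-insensitive comparison in question (within the quoted test below):

{code:python}
# compare the alias case-insensitively so both 'ACOSH(a)' (PySpark)
# and 'acosh(a)' (Spark Connect) would pass
self.assertIn(f"{alias}(a)".lower(), repr(f(c)).lower())
{code}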

> Parity in String representation of Column
> -
>
> Key: SPARK-41901
> URL: https://issues.apache.org/jira/browse/SPARK-41901
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> from pyspark.sql import functions
> funs = [
> (functions.acosh, "ACOSH"),
> (functions.asinh, "ASINH"),
> (functions.atanh, "ATANH"),
> ]
> cols = ["a", functions.col("a")]
> for f, alias in funs:
> for c in cols:
> self.assertIn(f"{alias}(a)", repr(f(c))){code}
> {code:java}
>  Traceback (most recent call last):
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py",
>  line 271, in test_inverse_trig_functions
> self.assertIn(f"{alias}(a)", repr(f(c)))
> AssertionError: 'ACOSH(a)' not found in 
> "Column<'acosh(ColumnReference(a))'>"{code}
>  
>  
> {code:java}
> from pyspark.sql.functions import col, lit, overlay
> from itertools import chain
> import re
> actual = list(
> chain.from_iterable(
> [
> re.findall("(overlay\\(.*\\))", str(x))
> for x in [
> overlay(col("foo"), col("bar"), 1),
> overlay("x", "y", 3),
> overlay(col("x"), col("y"), 1, 3),
> overlay("x", "y", 2, 5),
> overlay("x", "y", lit(11)),
> overlay("x", "y", lit(2), lit(5)),
> ]
> ]
> )
> )
> expected = [
> "overlay(foo, bar, 1, -1)",
> "overlay(x, y, 3, -1)",
> "overlay(x, y, 1, 3)",
> "overlay(x, y, 2, 5)",
> "overlay(x, y, 11, -1)",
> "overlay(x, y, 2, 5)",
> ]
> self.assertListEqual(actual, expected)
> df = self.spark.createDataFrame([("SPARK_SQL", "CORE", 7, 0)], ("x", "y", 
> "pos", "len"))
> exp = [Row(ol="SPARK_CORESQL")]
> self.assertTrue(
> all(
> [
> df.select(overlay(df.x, df.y, 7, 0).alias("ol")).collect() == exp,
> df.select(overlay(df.x, df.y, lit(7), 
> lit(0)).alias("ol")).collect() == exp,
> df.select(overlay("x", "y", "pos", "len").alias("ol")).collect() 
> == exp,
> ]
> )
> ) {code}
> {code:java}
> Traceback (most recent call last):
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_functions.py",
>  line 675, in test_overlay
> self.assertListEqual(actual, expected)
> AssertionError: Lists differ: ['overlay(ColumnReference(foo), 
> ColumnReference(bar[402 chars]5))'] != ['overlay(foo, bar, 1, -1)', 
> 'overlay(x, y, 3, -1)'[90 chars] 5)']
> First differing element 0:
> 'overlay(ColumnReference(foo), ColumnReference(bar), Literal(1), Literal(-1))'
> 'overlay(foo, bar, 1, -1)'
> - ['overlay(ColumnReference(foo), ColumnReference(bar), Literal(1), 
> Literal(-1))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(3), Literal(-1))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(1), Literal(3))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(2), Literal(5))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(11), 
> Literal(-1))',
> -  'overlay(ColumnReference(x), ColumnReference(y), Literal(2), Literal(5))']
> + ['overlay(foo, bar, 1, -1)',
> +  'overlay(x, y, 3, -1)',
> +  'overlay(x, y, 1, 3)',
> +  'overlay(x, y, 2, 5)',
> +  'overlay(x, y, 11, -1)',
> +  'overlay(x, y, 2, 5)']
>  {code}






[jira] [Created] (SPARK-42458) createDataFrame should support DDL string as schema

2023-02-15 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42458:
-

 Summary: createDataFrame should support DDL string as schema
 Key: SPARK-42458
 URL: https://issues.apache.org/jira/browse/SPARK-42458
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin


{code:python}
File "/.../python/pyspark/sql/connect/readwriter.py", line 393, in 
pyspark.sql.connect.readwriter.DataFrameWriter.option
Failed example:
with tempfile.TemporaryDirectory() as d:
# Write a DataFrame into a CSV file with 'nullValue' option set to 
'Hyukjin Kwon'.
df = spark.createDataFrame([(100, None)], "age INT, name STRING")
df.write.option("nullValue", "Hyukjin 
Kwon").mode("overwrite").format("csv").save(d)

# Read the CSV file as a DataFrame.
spark.read.schema(df.schema).format('csv').load(d).show()
Exception raised:
Traceback (most recent call last):
  File "/.../lib/python3.9/doctest.py", line 1334, in __run
exec(compile(example.source, filename, "single",
  File "", line 3, in 
df = spark.createDataFrame([(100, None)], "age INT, name STRING")
  File "/.../python/pyspark/sql/connect/session.py", line 312, in 
createDataFrame
raise ValueError(
ValueError: Some of types cannot be determined after inferring, a 
StructType Schema is required in this case
{code}






[jira] [Updated] (SPARK-42426) insertInto fails when the column names are different from the table columns

2023-02-13 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-42426:
--
Summary: insertInto fails when the column names are different from the 
table columns  (was: insertInto doesn't insert when the column names are 
different from the table columns)

> insertInto fails when the column names are different from the table columns
> ---
>
> Key: SPARK-42426
> URL: https://issues.apache.org/jira/browse/SPARK-42426
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> {noformat}
> File "/.../python/pyspark/sql/connect/readwriter.py", line 518, in 
> pyspark.sql.connect.readwriter.DataFrameWriter.insertInto
> Failed example:
> df.selectExpr("age AS col1", "name AS col2").write.insertInto("tblA")
> Exception raised:
> Traceback (most recent call last):
>   File "/.../lib/python3.9/doctest.py", line 1334, in __run
> exec(compile(example.source, filename, "single",
>   File " pyspark.sql.connect.readwriter.DataFrameWriter.insertInto[3]>", line 1, in 
> 
> df.selectExpr("age AS col1", "name AS col2").write.insertInto("tblA")
>   File "/.../python/pyspark/sql/connect/readwriter.py", line 477, in 
> insertInto
> self.saveAsTable(tableName)
>   File "/.../python/pyspark/sql/connect/readwriter.py", line 495, in 
> saveAsTable
> 
> self._spark.client.execute_command(self._write.command(self._spark.client))
>   File "/.../python/pyspark/sql/connect/client.py", line 553, in 
> execute_command
> self._execute(req)
>   File "/.../python/pyspark/sql/connect/client.py", line 648, in _execute
> self._handle_error(rpc_error)
>   File "/.../python/pyspark/sql/connect/client.py", line 718, in 
> _handle_error
> raise convert_exception(info, status.message) from None
> pyspark.errors.exceptions.connect.AnalysisException: Cannot resolve 'age' 
> given input columns: [col1, col2].
> {noformat}






[jira] [Created] (SPARK-42426) insertInto doesn't insert when the column names are different from the table columns

2023-02-13 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42426:
-

 Summary: insertInto doesn't insert when the column names are 
different from the table columns
 Key: SPARK-42426
 URL: https://issues.apache.org/jira/browse/SPARK-42426
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin


File "/.../python/pyspark/sql/connect/readwriter.py", line 518, in 
pyspark.sql.connect.readwriter.DataFrameWriter.insertInto
Failed example:
df.selectExpr("age AS col1", "name AS col2").write.insertInto("tblA")
Exception raised:
Traceback (most recent call last):
  File "/.../lib/python3.9/doctest.py", line 1334, in __run
exec(compile(example.source, filename, "single",
  File "", line 1, in 

df.selectExpr("age AS col1", "name AS col2").write.insertInto("tblA")
  File "/.../python/pyspark/sql/connect/readwriter.py", line 477, in 
insertInto
self.saveAsTable(tableName)
  File "/.../python/pyspark/sql/connect/readwriter.py", line 495, in 
saveAsTable

self._spark.client.execute_command(self._write.command(self._spark.client))
  File "/.../python/pyspark/sql/connect/client.py", line 553, in 
execute_command
self._execute(req)
  File "/.../python/pyspark/sql/connect/client.py", line 648, in _execute
self._handle_error(rpc_error)
  File "/.../python/pyspark/sql/connect/client.py", line 718, in 
_handle_error
raise convert_exception(info, status.message) from None
pyspark.errors.exceptions.connect.AnalysisException: Cannot resolve 'age' 
given input columns: [col1, col2].






[jira] [Updated] (SPARK-42426) insertInto doesn't insert when the column names are different from the table columns

2023-02-13 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-42426:
--
Description: 
{noformat}
File "/.../python/pyspark/sql/connect/readwriter.py", line 518, in 
pyspark.sql.connect.readwriter.DataFrameWriter.insertInto
Failed example:
df.selectExpr("age AS col1", "name AS col2").write.insertInto("tblA")
Exception raised:
Traceback (most recent call last):
  File "/.../lib/python3.9/doctest.py", line 1334, in __run
exec(compile(example.source, filename, "single",
  File "", line 1, in 

df.selectExpr("age AS col1", "name AS col2").write.insertInto("tblA")
  File "/.../python/pyspark/sql/connect/readwriter.py", line 477, in 
insertInto
self.saveAsTable(tableName)
  File "/.../python/pyspark/sql/connect/readwriter.py", line 495, in 
saveAsTable

self._spark.client.execute_command(self._write.command(self._spark.client))
  File "/.../python/pyspark/sql/connect/client.py", line 553, in 
execute_command
self._execute(req)
  File "/.../python/pyspark/sql/connect/client.py", line 648, in _execute
self._handle_error(rpc_error)
  File "/.../python/pyspark/sql/connect/client.py", line 718, in 
_handle_error
raise convert_exception(info, status.message) from None
pyspark.errors.exceptions.connect.AnalysisException: Cannot resolve 'age' 
given input columns: [col1, col2].
{noformat}


  was:
File "/.../python/pyspark/sql/connect/readwriter.py", line 518, in 
pyspark.sql.connect.readwriter.DataFrameWriter.insertInto
Failed example:
df.selectExpr("age AS col1", "name AS col2").write.insertInto("tblA")
Exception raised:
Traceback (most recent call last):
  File "/.../lib/python3.9/doctest.py", line 1334, in __run
exec(compile(example.source, filename, "single",
  File "", line 1, in 

df.selectExpr("age AS col1", "name AS col2").write.insertInto("tblA")
  File "/.../python/pyspark/sql/connect/readwriter.py", line 477, in 
insertInto
self.saveAsTable(tableName)
  File "/.../python/pyspark/sql/connect/readwriter.py", line 495, in 
saveAsTable

self._spark.client.execute_command(self._write.command(self._spark.client))
  File "/.../python/pyspark/sql/connect/client.py", line 553, in 
execute_command
self._execute(req)
  File "/.../python/pyspark/sql/connect/client.py", line 648, in _execute
self._handle_error(rpc_error)
  File "/.../python/pyspark/sql/connect/client.py", line 718, in 
_handle_error
raise convert_exception(info, status.message) from None
pyspark.errors.exceptions.connect.AnalysisException: Cannot resolve 'age' 
given input columns: [col1, col2].


> insertInto doesn't insert when the column names are different from the table 
> columns
> 
>
> Key: SPARK-42426
> URL: https://issues.apache.org/jira/browse/SPARK-42426
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> {noformat}
> File "/.../python/pyspark/sql/connect/readwriter.py", line 518, in 
> pyspark.sql.connect.readwriter.DataFrameWriter.insertInto
> Failed example:
> df.selectExpr("age AS col1", "name AS col2").write.insertInto("tblA")
> Exception raised:
> Traceback (most recent call last):
>   File "/.../lib/python3.9/doctest.py", line 1334, in __run
> exec(compile(example.source, filename, "single",
>   File " pyspark.sql.connect.readwriter.DataFrameWriter.insertInto[3]>", line 1, in 
> 
> df.selectExpr("age AS col1", "name AS col2").write.insertInto("tblA")
>   File "/.../python/pyspark/sql/connect/readwriter.py", line 477, in 
> insertInto
> self.saveAsTable(tableName)
>   File "/.../python/pyspark/sql/connect/readwriter.py", line 495, in 
> saveAsTable
> 
> self._spark.client.execute_command(self._write.command(self._spark.client))
>   File "/.../python/pyspark/sql/connect/client.py", line 553, in 
> execute_command
> self._execute(req)
>   File "/.../python/pyspark/sql/connect/client.py", line 648, in _execute
> self._handle_error(rpc_error)
>   File "/.../python/pyspark/sql/connect/client.py", line 718, in 
> _handle_error
> raise convert_exception(info, status.message) from None
> pyspark.errors.exceptions.connect.AnalysisException: Cannot resolve 'age' 
> given input columns: [col1, col2].
> {noformat}






[jira] [Updated] (SPARK-41870) Handle duplicate columns in `createDataFrame`

2023-02-13 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-41870:
--
Attachment: (was: session.py)

> Handle duplicate columns in `createDataFrame`
> -
>
> Key: SPARK-41870
> URL: https://issues.apache.org/jira/browse/SPARK-41870
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> df = self.spark.createDataFrame([(1, 2)], ["c", "c"]){code}
> Error:
> {code:java}
> Traceback (most recent call last):
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py",
>  line 65, in test_duplicated_column_names
>     df = self.spark.createDataFrame([(1, 2)], ["c", "c"])
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", 
> line 277, in createDataFrame
>     raise ValueError(
> ValueError: Length mismatch: Expected axis has 1 elements, new values have 2 
> elements{code}






[jira] [Updated] (SPARK-41870) Handle duplicate columns in `createDataFrame`

2023-02-13 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-41870:
--
Attachment: session.py

> Handle duplicate columns in `createDataFrame`
> -
>
> Key: SPARK-41870
> URL: https://issues.apache.org/jira/browse/SPARK-41870
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> df = self.spark.createDataFrame([(1, 2)], ["c", "c"]){code}
> Error:
> {code:java}
> Traceback (most recent call last):
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/tests/test_dataframe.py",
>  line 65, in test_duplicated_column_names
>     df = self.spark.createDataFrame([(1, 2)], ["c", "c"])
>   File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/session.py", 
> line 277, in createDataFrame
>     raise ValueError(
> ValueError: Length mismatch: Expected axis has 1 elements, new values have 2 
> elements{code}






[jira] [Resolved] (SPARK-42265) DataFrame.createTempView - SparkConnectGrpcException: requirement failed

2023-02-13 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-42265.
---
  Assignee: Takuya Ueshin
Resolution: Fixed

Issue resolved by pull request 39968
https://github.com/apache/spark/pull/39968

> DataFrame.createTempView - SparkConnectGrpcException: requirement failed
> 
>
> Key: SPARK-42265
> URL: https://issues.apache.org/jira/browse/SPARK-42265
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Takuya Ueshin
>Priority: Major
>
> To reproduce,
> {code:python}
> spark.range(1).filter(udf(lambda x: x)("id") >= 0).createTempView("v")
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41820) DataFrame.createOrReplaceGlobalTempView - SparkConnectException: requirement failed

2023-02-13 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-41820.
---
  Assignee: Takuya Ueshin
Resolution: Fixed

Issue resolved by pull request 39968
https://github.com/apache/spark/pull/39968

> DataFrame.createOrReplaceGlobalTempView - SparkConnectException: requirement 
> failed
> ---
>
> Key: SPARK-41820
> URL: https://issues.apache.org/jira/browse/SPARK-41820
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Takuya Ueshin
>Priority: Major
>
> {code:java}
> >>> df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], schema=["age", 
> >>> "name"])
> >>> df2 = df.filter(df.age > 3)
> >>> df2.createOrReplaceGlobalTempView("people") {code}
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1292, in 
> pyspark.sql.connect.dataframe.DataFrame.createOrReplaceGlobalTempView
> Failed example:
>     df2.createOrReplaceGlobalTempView("people")
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest 
> pyspark.sql.connect.dataframe.DataFrame.createOrReplaceGlobalTempView[3]>", 
> line 1, in <module>
>         df2.createOrReplaceGlobalTempView("people")
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1192, in createOrReplaceGlobalTempView
>         self._session.client.execute_command(command)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 459, in execute_command
>         self._execute(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 547, in _execute
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 625, in _handle_error
>         raise SparkConnectException(status.message) from None
>     pyspark.sql.connect.client.SparkConnectException: requirement failed 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42402) Support parameterized SQL by sql()

2023-02-10 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42402:
-

 Summary: Support parameterized SQL by sql()
 Key: SPARK-42402
 URL: https://issues.apache.org/jira/browse/SPARK-42402
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41820) DataFrame.createOrReplaceGlobalTempView - SparkConnectException: requirement failed

2023-02-10 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-41820:
--
Description: 
{code:java}
>>> df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], schema=["age", 
>>> "name"])
>>> df2 = df.filter(df.age > 3)
>>> df2.createOrReplaceGlobalTempView("people") {code}
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1292, in 
pyspark.sql.connect.dataframe.DataFrame.createOrReplaceGlobalTempView
Failed example:
    df2.createOrReplaceGlobalTempView("people")
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", 
line 1, in 
        df2.createOrReplaceGlobalTempView("people")
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1192, in createOrReplaceGlobalTempView
        self._session.client.execute_command(command)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
459, in execute_command
        self._execute(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
547, in _execute
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
625, in _handle_error
        raise SparkConnectException(status.message) from None
    pyspark.sql.connect.client.SparkConnectException: requirement failed 

{code}

  was:
{code:java}
>>> df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], schema=["age", 
>>> "name"])
>>> df.createOrReplaceGlobalTempView("people") {code}
{code:java}
File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1292, in 
pyspark.sql.connect.dataframe.DataFrame.createOrReplaceGlobalTempView
Failed example:
    df2.createOrReplaceGlobalTempView("people")
Exception raised:
    Traceback (most recent call last):
      File 
"/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
 line 1350, in __run
        exec(compile(example.source, filename, "single",
      File "", 
line 1, in 
        df2.createOrReplaceGlobalTempView("people")
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
line 1192, in createOrReplaceGlobalTempView
        self._session.client.execute_command(command)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
459, in execute_command
        self._execute(req)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
547, in _execute
        self._handle_error(rpc_error)
      File 
"/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", line 
625, in _handle_error
        raise SparkConnectException(status.message) from None
    pyspark.sql.connect.client.SparkConnectException: requirement failed 

{code}


> DataFrame.createOrReplaceGlobalTempView - SparkConnectException: requirement 
> failed
> ---
>
> Key: SPARK-41820
> URL: https://issues.apache.org/jira/browse/SPARK-41820
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>
> {code:java}
> >>> df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], schema=["age", 
> >>> "name"])
> >>> df2 = df.filter(df.age > 3)
> >>> df2.createOrReplaceGlobalTempView("people") {code}
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1292, in 
> pyspark.sql.connect.dataframe.DataFrame.createOrReplaceGlobalTempView
> Failed example:
>     df2.createOrReplaceGlobalTempView("people")
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest 
> pyspark.sql.connect.dataframe.DataFrame.createOrReplaceGlobalTempView[3]>", 
> line 1, in <module>
>         df2.createOrReplaceGlobalTempView("people")
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1192, in createOrReplaceGlobalTempView
>         self._session.client.execute_command(command)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 459, in execute_command
>         self._execute(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> 

[jira] [Updated] (SPARK-42017) df["bad_key"] does not raise AnalysisException

2023-02-09 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-42017:
--
Parent Issue: SPARK-41282  (was: SPARK-42006)

> df["bad_key"] does not raise AnalysisException
> --
>
> Key: SPARK-42017
> URL: https://issues.apache.org/jira/browse/SPARK-42017
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> e.g.)
> {code}
> 23/01/12 14:33:43 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> FAILED [  8%]
> pyspark/sql/tests/test_column.py:105 (ColumnParityTests.test_access_column)
> self = <ColumnParityTests testMethod=test_access_column>
> def test_access_column(self):
> df = self.df
> self.assertTrue(isinstance(df.key, Column))
> self.assertTrue(isinstance(df["key"], Column))
> self.assertTrue(isinstance(df[0], Column))
> self.assertRaises(IndexError, lambda: df[2])
> >   self.assertRaises(AnalysisException, lambda: df["bad_key"])
> E   AssertionError: AnalysisException not raised by <lambda>
> ../test_column.py:112: AssertionError
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42017) df["bad_key"] does not raise AnalysisException

2023-02-06 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-42017:
--
Summary: df["bad_key"] does not raise AnalysisException  (was: Different 
error type AnalysisException vs SparkConnectAnalysisException)

> df["bad_key"] does not raise AnalysisException
> --
>
> Key: SPARK-42017
> URL: https://issues.apache.org/jira/browse/SPARK-42017
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> e.g.)
> {code}
> 23/01/12 14:33:43 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> FAILED [  8%]
> pyspark/sql/tests/test_column.py:105 (ColumnParityTests.test_access_column)
> self = <ColumnParityTests testMethod=test_access_column>
> def test_access_column(self):
> df = self.df
> self.assertTrue(isinstance(df.key, Column))
> self.assertTrue(isinstance(df["key"], Column))
> self.assertTrue(isinstance(df[0], Column))
> self.assertRaises(IndexError, lambda: df[2])
> >   self.assertRaises(AnalysisException, lambda: df["bad_key"])
> E   AssertionError: AnalysisException not raised by <lambda>
> ../test_column.py:112: AssertionError
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42338) Different exception in DataFrame.sample

2023-02-03 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-42338:
--
Environment: (was: It raises {{SparkConnectGrpcException}} instead of 
{{IllegalArgumentException}}.)

> Different exception in DataFrame.sample
> ---
>
> Key: SPARK-42338
> URL: https://issues.apache.org/jira/browse/SPARK-42338
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42338) Different exception in DataFrame.sample

2023-02-03 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-42338:
--
Description: It raises {{SparkConnectGrpcException}} instead of 
{{IllegalArgumentException}}.

> Different exception in DataFrame.sample
> ---
>
> Key: SPARK-42338
> URL: https://issues.apache.org/jira/browse/SPARK-42338
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> It raises {{SparkConnectGrpcException}} instead of 
> {{IllegalArgumentException}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42342) Introduce base hierarchy to exceptions.

2023-02-03 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42342:
-

 Summary: Introduce base hierarchy to exceptions.
 Key: SPARK-42342
 URL: https://issues.apache.org/jira/browse/SPARK-42342
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.4.0
Reporter: Takuya Ueshin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42340) Implement GroupedData.applyInPandas

2023-02-03 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42340:
-

 Summary: Implement GroupedData.applyInPandas
 Key: SPARK-42340
 URL: https://issues.apache.org/jira/browse/SPARK-42340
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42338) Different exception in DataFrame.sample

2023-02-03 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42338:
-

 Summary: Different exception in DataFrame.sample
 Key: SPARK-42338
 URL: https://issues.apache.org/jira/browse/SPARK-42338
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
 Environment: It raises {{SparkConnectGrpcException}} instead of 
{{IllegalArgumentException}}.
Reporter: Takuya Ueshin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42017) Different error type AnalysisException vs SparkConnectAnalysisException

2023-02-02 Thread Takuya Ueshin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17683594#comment-17683594
 ] 

Takuya Ueshin commented on SPARK-42017:
---

The error class hierarchy is one of the issues, but the test in the description 
has a different issue, 
{code:python}
df["bad_key"]
{code}
does not raise any error at that point because Spark Connect doesn't yet 
analyze whether the column name is valid.
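
For illustration, a minimal sketch of that behavior (assuming a Spark Connect 
session named {{spark}}; the data and column names are hypothetical):
{code:python}
df = spark.createDataFrame([(1, "a")], ["id", "name"])

col = df["bad_key"]    # no error here: Spark Connect only builds the plan
df.select(col).show()  # AnalysisException surfaces once the plan is analyzed
{code}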

> Different error type AnalysisException vs SparkConnectAnalysisException
> ---
>
> Key: SPARK-42017
> URL: https://issues.apache.org/jira/browse/SPARK-42017
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> e.g.)
> {code}
> 23/01/12 14:33:43 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> FAILED [  8%]
> pyspark/sql/tests/test_column.py:105 (ColumnParityTests.test_access_column)
> self = <ColumnParityTests testMethod=test_access_column>
> def test_access_column(self):
> df = self.df
> self.assertTrue(isinstance(df.key, Column))
> self.assertTrue(isinstance(df["key"], Column))
> self.assertTrue(isinstance(df[0], Column))
> self.assertRaises(IndexError, lambda: df[2])
> >   self.assertRaises(AnalysisException, lambda: df["bad_key"])
> E   AssertionError: AnalysisException not raised by <lambda>
> ../test_column.py:112: AssertionError
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42295) Tear down the test cleanly

2023-02-02 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42295:
-

 Summary: Tear down the test cleanly
 Key: SPARK-42295
 URL: https://issues.apache.org/jira/browse/SPARK-42295
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41778) Add an alias "reduce" to ArrayAggregate

2022-12-29 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-41778:
-

 Summary: Add an alias "reduce" to ArrayAggregate
 Key: SPARK-41778
 URL: https://issues.apache.org/jira/browse/SPARK-41778
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Takuya Ueshin


Adds an alias "reduce" to {{ArrayAggregate}}.
Presto provides the function: 
https://prestodb.io/docs/current/functions/array.html#reduce.
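
A sketch of the intended usage (assuming a session named {{spark}}): the alias 
should behave exactly like the existing {{aggregate}} higher-order function:
{code:python}
spark.sql("SELECT aggregate(array(1, 2, 3), 0, (acc, x) -> acc + x)").show()  # 6
# With this change, the same call should also work spelled as `reduce`:
spark.sql("SELECT reduce(array(1, 2, 3), 0, (acc, x) -> acc + x)").show()
{code}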



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-41753) Add tests for ArrayZip to check the result size and nullability.

2022-12-28 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-41753:
-

 Summary: Add tests for ArrayZip to check the result size and 
nullability.
 Key: SPARK-41753
 URL: https://issues.apache.org/jira/browse/SPARK-41753
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 3.4.0
Reporter: Takuya Ueshin


Add tests for {{ArrayZip}} to check the result size and nullability.
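
For reference, a small sketch (assuming a session named {{spark}}) of the 
behavior the tests should pin down: {{arrays_zip}} pads to the longest input 
with nulls, which affects both the result size and the fields' nullability:
{code:python}
spark.sql("SELECT arrays_zip(array(1, 2), array('a', 'b', 'c'))").show(truncate=False)
# [{1, a}, {2, b}, {NULL, c}] -- 3 elements; the shorter side becomes nullable
{code}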



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39419) When the comparator of ArraySort returns null, it should fail.

2022-06-08 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-39419:
-

 Summary: When the comparator of ArraySort returns null, it should 
fail.
 Key: SPARK-39419
 URL: https://issues.apache.org/jira/browse/SPARK-39419
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.0
Reporter: Takuya Ueshin


When the comparator of {{ArraySort}} returns {{null}}, currently it handles it 
as {{0}} (equal).

According to the doc, 

{quote}
It returns -1, 0, or 1 as the first element is less than, equal to, or greater 
than the second element. If the comparator function returns other values 
(including null), the function will fail and raise an error.
{quote}

It's fine to return integers other than -1, 0, or 1 to follow the Java 
convention (the doc still needs to be updated, though), but it should throw an 
exception for a {{null}} result.
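
A minimal sketch of a comparator that can return {{null}} on ties (assuming a 
session named {{spark}}); today the {{null}} is silently treated as {{0}}:
{code:python}
spark.sql("""
    SELECT array_sort(
        array(3, 1, 2, 1),
        (l, r) -> CASE WHEN l < r THEN -1 WHEN l > r THEN 1 END)
""").show(truncate=False)
# The CASE expression yields NULL when l == r; per the doc this should raise
# an error instead of being handled as "equal".
{code}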



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39293) The accumulator of ArrayAggregate should copy the intermediate result if string, struct, array, or map

2022-05-25 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-39293:
-

 Summary: The accumulator of ArrayAggregate should copy the 
intermediate result if string, struct, array, or map
 Key: SPARK-39293
 URL: https://issues.apache.org/jira/browse/SPARK-39293
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.1, 3.1.2, 3.0.3, 3.3.0
Reporter: Takuya Ueshin


The accumulator of {{ArrayAggregate}} should copy the intermediate result if it 
is a string, struct, array, or map.

{code:scala}
import org.apache.spark.sql.functions._

val reverse = udf((s: String) => s.reverse)

val df = Seq(Array("abc", "def")).toDF("array")
val aggArray = df.withColumn(
  "agg",
  aggregate(
    col("array"),
    array().cast("array<string>"),
    (acc, s) => concat(acc, array(reverse(s)))))

aggArray.show(truncate=false)
{code}

should be:

{code}
+----------+----------+
|array     |agg       |
+----------+----------+
|[abc, def]|[cba, fed]|
+----------+----------+

but:

{code}
+----------+----------+
|array     |agg       |
+----------+----------+
|[abc, def]|[fed, fed]|
+----------+----------+





--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39048) Refactor `GroupBy._reduce_for_stat_function` on accepted data types

2022-04-29 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-39048.
---
  Assignee: Xinrong Meng
Resolution: Fixed

Issue resolved by pull request 36382
https://github.com/apache/spark/pull/36382

> Refactor `GroupBy._reduce_for_stat_function` on accepted data types 
> 
>
> Key: SPARK-39048
> URL: https://issues.apache.org/jira/browse/SPARK-39048
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>
> `Groupby._reduce_for_stat_function` is a common helper function leveraged by 
> multiple statistical functions of GroupBy objects.
> It defines parameters `only_numeric` and `bool_as_numeric` to control 
> accepted Spark types.
> To be consistent with pandas API, we may also have to introduce 
> `str_as_numeric` for `sum` for example.
> Instead of introducing a parameter designated for each Spark type, the PR 
> proposes to introduce a parameter `accepted_spark_types` to specify the 
> accepted types of Spark columns to be aggregated.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38882) The usage logger attachment logic should handle static methods properly.

2022-04-12 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-38882:
-

 Summary: The usage logger attachment logic should handle static 
methods properly.
 Key: SPARK-38882
 URL: https://issues.apache.org/jira/browse/SPARK-38882
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.2.1, 3.3.0
Reporter: Takuya Ueshin


The usage logger attachment logic has an issue when handling static methods.

For example,

{code}
$ PYSPARK_PANDAS_USAGE_LOGGER=pyspark.pandas.usage_logging.usage_logger 
./bin/pyspark
{code}

{code:python}
>>> import pyspark.pandas as ps
>>> psdf = ps.DataFrame({"a": [1,2,3], "b": [4,5,6]})
>>> psdf.from_records([(1, 2), (3, 4)])
A function `DataFrame.from_records(data, index, exclude, columns, coerce_float, 
nrows)` was failed after 2007.430 ms: 0
Traceback (most recent call last):
...
{code}

without usage logger:

{code:python}
>>> import pyspark.pandas as ps
>>> psdf = ps.DataFrame({"a": [1,2,3], "b": [4,5,6]})
>>> psdf.from_records([(1, 2), (3, 4)])
   0  1
0  1  2
1  3  4
{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38628) Complete the copy method in subclasses of InternalRow, ArrayData, and MapData to safely copy their instances.

2022-03-22 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-38628:
-

 Summary: Complete the copy method in subclasses of InternalRow, 
ArrayData, and MapData to safely copy their instances.
 Key: SPARK-38628
 URL: https://issues.apache.org/jira/browse/SPARK-38628
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: Takuya Ueshin


Some subclasses of {{InternalRow}}, {{ArrayData}}, and {{MapData}} are missing 
support for {{StructType}}, {{ArrayType}}, and {{MapType}} in their {{copy}} 
method.
We should complete them to safely copy their instances.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38484) Move usage logging instrumentation util functions from pandas module to pyspark.util module

2022-03-15 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-38484.
---
Fix Version/s: 3.3.0
 Assignee: Yihong He
   Resolution: Fixed

Issue resolved by pull request 35790
https://github.com/apache/spark/pull/35790

> Move usage logging instrumentation util functions from pandas module to 
> pyspark.util module
> ---
>
> Key: SPARK-38484
> URL: https://issues.apache.org/jira/browse/SPARK-38484
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.1
>Reporter: Yihong He
>Assignee: Yihong He
>Priority: Minor
> Fix For: 3.3.0
>
>
> It will be helpful to attach the usage logger to other modules (e.g. sql) 
> besides Pandas, but other modules should not depend on the Pandas module to 
> use the instrumentation utils (e.g. _wrap_function, _wrap_property, ...). So 
> we need to move the usage logging instrumentation util functions from the 
> Pandas module to the pyspark.util module.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37491) Fix Series.asof when values of the series is not sorted

2022-03-14 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-37491.
---
  Assignee: pralabhkumar
Resolution: Fixed

Issue resolved by pull request 35191
https://github.com/apache/spark/pull/35191

> Fix Series.asof when values of the series is not sorted
> ---
>
> Key: SPARK-37491
> URL: https://issues.apache.org/jira/browse/SPARK-37491
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Assignee: pralabhkumar
>Priority: Major
>
> https://github.com/apache/spark/pull/34737#discussion_r758223279



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38387) Support `na_action` and Series input correspondence in `Series.map`

2022-03-09 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-38387.
---
Fix Version/s: 3.3.0
 Assignee: Xinrong Meng
   Resolution: Fixed

Issue resolved by pull request 35706
https://github.com/apache/spark/pull/35706

> Support `na_action` and Series input correspondence in `Series.map`
> ---
>
> Key: SPARK-38387
> URL: https://issues.apache.org/jira/browse/SPARK-38387
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.3.0
>
>
> Support `na_action` and Series input correspondence in `Series.map`, in order 
> to reach parity with the pandas API.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37903) Replace string_typehints with get_type_hints.

2022-01-13 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-37903:
-

 Summary: Replace string_typehints with get_type_hints.
 Key: SPARK-37903
 URL: https://issues.apache.org/jira/browse/SPARK-37903
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.3.0
Reporter: Takuya Ueshin


Currently we have a hacky way to resolve type hints written as strings, but we 
can use {{get_type_hints}} instead.
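
A minimal sketch of the replacement:
{code:python}
from typing import get_type_hints

def plus_one(v: "int") -> "int":  # type hints written as strings
    return v + 1

# get_type_hints resolves the string annotations into real types:
print(get_type_hints(plus_one))  # {'v': <class 'int'>, 'return': <class 'int'>}
{code}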



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37885) Allow pandas_udf to take type annotations with future annotations enabled

2022-01-12 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-37885:
-

 Summary: Allow pandas_udf to take type annotations with future 
annotations enabled
 Key: SPARK-37885
 URL: https://issues.apache.org/jira/browse/SPARK-37885
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.3.0
Reporter: Takuya Ueshin


When using {{{}from __future__ import annotations{}}}, the type hints are all 
strings, so pandas UDF type inference won't work, as shown below:
{code:python}
>>> from __future__ import annotations
>>> from typing import Union
>>> import pandas as pd
>>> from pyspark.sql.functions import pandas_udf
>>> @pandas_udf("long")
... def plus_one(v: Union[pd.Series, pd.DataFrame]) -> pd.Series:
... return v + 1

Traceback (most recent call last):
...
NotImplementedError: Unsupported signature: (v: 'Union[pd.Series, 
pd.DataFrame]') -> 'pd.Series'.
{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37782) Make DataFrame.transform take the parameters for the function.

2021-12-29 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-37782:
-

 Summary: Make DataFrame.transform take the parameters for the 
function.
 Key: SPARK-37782
 URL: https://issues.apache.org/jira/browse/SPARK-37782
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.3.0
Reporter: Takuya Ueshin


Currently, when a function that takes parameters besides the DataFrame is 
passed to {{DataFrame.transform}}, a {{lambda}} needs to be used.
Making {{DataFrame.transform}} take the parameters directly would be more 
convenient, as sketched below.
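
A sketch of the current workaround versus the proposed usage (assuming a 
session named {{spark}}; {{add_n}} is a hypothetical helper):
{code:python}
def add_n(df, n):
    return df.withColumn("id_plus_n", df.id + n)

df = spark.range(3)
df.transform(lambda d: add_n(d, 10))  # today: a lambda is required
df.transform(add_n, 10)               # proposed: pass the arguments directly
{code}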



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37678) Incorrect annotations in SeriesGroupBy._cleanup_and_return

2021-12-20 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-37678.
---
Fix Version/s: 3.2.1
   3.3.0
 Assignee: Maciej Szymkiewicz
   Resolution: Fixed

Issue resolved by pull request 34950
https://github.com/apache/spark/pull/34950

> Incorrect annotations in SeriesGroupBy._cleanup_and_return 
> ---
>
> Key: SPARK-37678
> URL: https://issues.apache.org/jira/browse/SPARK-37678
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
>Priority: Major
> Fix For: 3.2.1, 3.3.0
>
>
> [{{SeriesGroupBy._cleanup_and_return}}|https://github.com/apache/spark/blob/02ee1ae10b938eaa1621c3e878d07c39b9887c2e/python/pyspark/pandas/groupby.py#L2997-L2998]
>  annotations
> {code:python}
> def _cleanup_and_return(self, pdf: pd.DataFrame) -> Series:
> return first_series(pdf).rename().rename(self._psser.name)
> {code}
> are inconsistent:
> - If {{pdf}} is {{pd.DataFrame}} then output should be {{pd.Series}}.
> - If output is {{ps.Series}} then {{pdf}} should be {{ps.DataFrame}}.
> Doesn't seem like the method is used (it is possible that my search skills 
> and PyCharm inspection failed), so I am not sure which of these options was 
> intended.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37678) Incorrect annotations in SeriesGroupBy._cleanup_and_return

2021-12-17 Thread Takuya Ueshin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461681#comment-17461681
 ] 

Takuya Ueshin commented on SPARK-37678:
---

Yes!

> Incorrect annotations in SeriesGroupBy._cleanup_and_return 
> ---
>
> Key: SPARK-37678
> URL: https://issues.apache.org/jira/browse/SPARK-37678
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Priority: Major
>
> [{{SeriesGroupBy._cleanup_and_return}}|https://github.com/apache/spark/blob/02ee1ae10b938eaa1621c3e878d07c39b9887c2e/python/pyspark/pandas/groupby.py#L2997-L2998]
>  annotations
> {code:python}
> def _cleanup_and_return(self, pdf: pd.DataFrame) -> Series:
> return first_series(pdf).rename().rename(self._psser.name)
> {code}
> are inconsistent:
> - If {{pdf}} is {{pd.DataFrame}} then output should be {{pd.Series}}.
> - If output is {{ps.Series}} then {{pdf}} should be {{ps.DataFrame}}.
> Doesn't seem like the method is used (it is possible that my search skills 
> and PyCharm inspection failed), so I am not sure which of these options was 
> intended.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-37678) Incorrect annotations in SeriesGroupBy._cleanup_and_return

2021-12-17 Thread Takuya Ueshin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461677#comment-17461677
 ] 

Takuya Ueshin edited comment on SPARK-37678 at 12/17/21, 9:30 PM:
--

Good catch!
It must be {{{}_cleanup_and_return(self, psdf: DataFrame) -> Series{}}}. (not 
{{pd.}})

??Doesn't seem like the method is used??

It's an actual implementation of an abstract method 
{{GroupBy._cleanup_and_return}} for {{{}SeriesGroupBy{}}}.
{{GroupBy._cleanup_and_return}} is called at many places in {{{}GroupBy{}}}.


was (Author: ueshin):
Good catch!
It must be {{{}_cleanup_and_return(self, psdf: DataFrame) -> Series{}}}.

??Doesn't seem like the method is used??

It's an actual implementation of an abstract method 
{{GroupBy._cleanup_and_return}} for {{{}SeriesGroupBy{}}}.
{{GroupBy._cleanup_and_return}} is called at many places in {{{}GroupBy{}}}.

> Incorrect annotations in SeriesGroupBy._cleanup_and_return 
> ---
>
> Key: SPARK-37678
> URL: https://issues.apache.org/jira/browse/SPARK-37678
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Priority: Major
>
> [{{SeriesGroupBy._cleanup_and_return}}|https://github.com/apache/spark/blob/02ee1ae10b938eaa1621c3e878d07c39b9887c2e/python/pyspark/pandas/groupby.py#L2997-L2998]
>  annotations
> {code:python}
> def _cleanup_and_return(self, pdf: pd.DataFrame) -> Series:
> return first_series(pdf).rename().rename(self._psser.name)
> {code}
> are inconsistent:
> - If {{pdf}} is {{pd.DataFrame}} then output should be {{pd.Series}}.
> - If output is {{ps.Series}} then {{pdf}} should be {{ps.DataFrame}}.
> Doesn't seem like the method is used (it is possible that my search skills 
> and PyCharm inspection failed), so I am not sure which of these options was 
> intended.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37678) Incorrect annotations in SeriesGroupBy._cleanup_and_return

2021-12-17 Thread Takuya Ueshin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461677#comment-17461677
 ] 

Takuya Ueshin commented on SPARK-37678:
---

Good catch!
It must be {{{}_cleanup_and_return(self, psdf: DataFrame) -> Series{}}}.

??Doesn't seem like the method is used??

It's an actual implementation of an abstract method 
{{GroupBy._cleanup_and_return}} for {{{}SeriesGroupBy{}}}.
{{GroupBy._cleanup_and_return}} is called at many places in {{{}GroupBy{}}}.

> Incorrect annotations in SeriesGroupBy._cleanup_and_return 
> ---
>
> Key: SPARK-37678
> URL: https://issues.apache.org/jira/browse/SPARK-37678
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Priority: Major
>
> [{{SeriesGroupBy._cleanup_and_return}}|https://github.com/apache/spark/blob/02ee1ae10b938eaa1621c3e878d07c39b9887c2e/python/pyspark/pandas/groupby.py#L2997-L2998]
>  annotations
> {code:python}
> def _cleanup_and_return(self, pdf: pd.DataFrame) -> Series:
> return first_series(pdf).rename().rename(self._psser.name)
> {code}
> are inconsistent:
> - If {{pdf}} is {{pd.DataFrame}} then output should be {{pd.Series}}.
> - If output is {{ps.Series}} then {{pdf}} should be {{ps.DataFrame}}.
> Doesn't seem like the method is used (it is possible that my search skills 
> and PyCharm inspection failed), so I am not sure which of these options was 
> intended.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37669) Remove unnecessary usages of OrderedDict

2021-12-16 Thread Takuya Ueshin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461057#comment-17461057
 ] 

Takuya Ueshin commented on SPARK-37669:
---

I'm working on this.

> Remove unnecessary usages of OrderedDict
> 
>
> Key: SPARK-37669
> URL: https://issues.apache.org/jira/browse/SPARK-37669
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Now that the minimum supported Python is 3.7, we can remove unnecessary 
> usages of {{OrderedDict}} because the built-in dict guarantees insertion 
> order.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37669) Remove unnecessary usages of OrderedDict

2021-12-16 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-37669:
-

 Summary: Remove unnecessary usages of OrderedDict
 Key: SPARK-37669
 URL: https://issues.apache.org/jira/browse/SPARK-37669
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.3.0
Reporter: Takuya Ueshin


Now that the minimum supported Python is 3.7, we can remove unnecessary usages 
of {{OrderedDict}} because the built-in dict guarantees insertion order.
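
A minimal sketch of the simplification:
{code:python}
from collections import OrderedDict

# Before: an explicit OrderedDict to preserve insertion order.
fields = OrderedDict([("a", 1), ("b", 2)])

# Since Python 3.7, the built-in dict preserves insertion order,
# so this is equivalent:
fields = {"a": 1, "b": 2}
print(list(fields))  # ['a', 'b']
{code}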



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37514) Remove workarounds due to older pandas

2021-12-01 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-37514:
-

 Summary: Remove workarounds due to older pandas
 Key: SPARK-37514
 URL: https://issues.apache.org/jira/browse/SPARK-37514
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.3.0
Reporter: Takuya Ueshin


Now that we have upgraded the minimum version of pandas to {{1.0.5}}, we can 
remove the workarounds that let pandas API on Spark run with older pandas.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37443) Provide a profiler for Python/Pandas UDFs

2021-11-22 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-37443:
-

 Summary: Provide a profiler for Python/Pandas UDFs
 Key: SPARK-37443
 URL: https://issues.apache.org/jira/browse/SPARK-37443
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.3.0
Reporter: Takuya Ueshin


Currently a profiler is provided only for {{RDD}} operations, but providing a 
profiler for Python/Pandas UDFs would be great.
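
For context, a sketch of the existing RDD-only profiler that this ticket would 
extend to UDFs:
{code:python}
from pyspark import SparkConf, SparkContext

conf = SparkConf().set("spark.python.profile", "true")
sc = SparkContext(conf=conf)

sc.parallelize(range(1000)).map(lambda x: x * 2).count()
sc.show_profiles()  # dumps cProfile stats collected in the Python workers
{code}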



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37374) StatCounter should use mergeStats when merging with self.

2021-11-18 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-37374:
-

 Summary: StatCounter should use mergeStats when merging with self.
 Key: SPARK-37374
 URL: https://issues.apache.org/jira/browse/SPARK-37374
 Project: Spark
  Issue Type: Bug
  Components: PySpark, Spark Core
Affects Versions: 3.3.0
Reporter: Takuya Ueshin


{{StatCounter}} should use {{mergeStats}} instead of {{merge}} when merging 
with {{self}}.

This is a long-standing bug, but it usually won't be hit unless users 
explicitly call {{mergeStats}} with {{self}}.
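
A minimal sketch of the self-merge path in question (the expected count is an 
assumption about the intended semantics):
{code:python}
from pyspark.statcounter import StatCounter

s = StatCounter([1.0, 2.0, 3.0])
s.mergeStats(s)   # merging with self should behave like merging a copy,
                  # i.e. the count doubles to 6 and the stats stay consistent
print(s.count())
{code}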



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37298) Use unique exprId in RewriteAsOfJoin

2021-11-12 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-37298.
---
Fix Version/s: 3.3.0
 Assignee: Allison Wang
   Resolution: Fixed

Issue resolved by pull request 34567
https://github.com/apache/spark/pull/34567

> Use unique exprId in RewriteAsOfJoin
> 
>
> Key: SPARK-37298
> URL: https://issues.apache.org/jira/browse/SPARK-37298
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
> Fix For: 3.3.0
>
>
> Use a new exprId instead of reusing an old exprId in RewriteAsOfJoin to help 
> guarantee plan integrity and eliminate potential issues with exprId reuse.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37296) Add missing type hints in python/pyspark/util.py

2021-11-11 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-37296:
-

 Summary: Add missing type hints in python/pyspark/util.py
 Key: SPARK-37296
 URL: https://issues.apache.org/jira/browse/SPARK-37296
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.3.0
Reporter: Takuya Ueshin






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36845) Inline type hint files for files in python/pyspark/sql

2021-11-11 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-36845:
--
Summary: Inline type hint files for files in python/pyspark/sql  (was: 
Inline type hint files)

> Inline type hint files for files in python/pyspark/sql
> --
>
> Key: SPARK-36845
> URL: https://issues.apache.org/jira/browse/SPARK-36845
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark, SQL
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Assignee: Xinrong Meng
>Priority: Major
>
> Currently there are type hint stub files ({{*.pyi}}) to show the expected 
> types for functions, but we can also take advantage of static type checking 
> within the functions by inlining the type hints.
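
A generic sketch of the difference (not actual PySpark signatures):
{code:python}
# Stub-file style (module.pyi): only the signature is visible to the checker.
#     def add(x: int, y: int) -> int: ...

# Inlined style (module.py): the function body is type-checked as well.
def add(x: int, y: int) -> int:
    return x + y
{code}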



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36845) Inline type hint files

2021-10-21 Thread Takuya Ueshin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17432684#comment-17432684
 ] 

Takuya Ueshin commented on SPARK-36845:
---

Hi [~dchvn], shall we file separate umbrella tickets for each module and 
resolve this one?
The number of sub-tasks is already growing too large for a single umbrella 
ticket, and managing the tasks per module should be clearer.

Thanks!

> Inline type hint files
> --
>
> Key: SPARK-36845
> URL: https://issues.apache.org/jira/browse/SPARK-36845
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark, SQL
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Assignee: Xinrong Meng
>Priority: Major
>
> Currently there are type hint stub files ({{*.pyi}}) to show the expected 
> types for functions, but we can also take advantage of static type checking 
> within the functions by inlining the type hints.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37079) Fix DataFrameWriterV2.partitionedBy to send the arguments to JVM properly

2021-10-20 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-37079:
-

 Summary: Fix DataFrameWriterV2.partitionedBy to send the arguments 
to JVM properly
 Key: SPARK-37079
 URL: https://issues.apache.org/jira/browse/SPARK-37079
 Project: Spark
  Issue Type: Bug
  Components: PySpark, SQL
Affects Versions: 3.2.0, 3.1.2, 3.3.0
Reporter: Takuya Ueshin


In PySpark, {{DataFrameWriterV2.partitionedBy}} doesn't send the arguments to 
JVM properly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37048) Clean up inlining type hints under SQL module

2021-10-20 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-37048.
---
Fix Version/s: 3.3.0
 Assignee: Takuya Ueshin
   Resolution: Fixed

Issue resolved by pull request 34318
https://github.com/apache/spark/pull/34318

> Clean up inlining type hints under SQL module
> -
>
> Key: SPARK-37048
> URL: https://issues.apache.org/jira/browse/SPARK-37048
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 3.3.0
>
>
> Now that most of the type hints under the SQL module are inlined, we should 
> clean up the module now.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36945) Inline type hints for python/pyspark/sql/udf.py

2021-10-18 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-36945.
---
Fix Version/s: 3.3.0
 Assignee: dch nguyen
   Resolution: Fixed

Issue resolved by pull request 34289
https://github.com/apache/spark/pull/34289

> Inline type hints for python/pyspark/sql/udf.py
> ---
>
> Key: SPARK-36945
> URL: https://issues.apache.org/jira/browse/SPARK-36945
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Assignee: dch nguyen
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37048) Clean up inlining type hints under SQL module

2021-10-18 Thread Takuya Ueshin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17430245#comment-17430245
 ] 

Takuya Ueshin commented on SPARK-37048:
---

I'm working on this.

> Clean up inlining type hints under SQL module
> -
>
> Key: SPARK-37048
> URL: https://issues.apache.org/jira/browse/SPARK-37048
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Now that most of the type hints under the SQL module are inlined, we should 
> clean up the module now.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37048) Clean up inlining type hints under SQL module

2021-10-18 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-37048:
-

 Summary: Clean up inlining type hints under SQL module
 Key: SPARK-37048
 URL: https://issues.apache.org/jira/browse/SPARK-37048
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.3.0
Reporter: Takuya Ueshin


Now that most of the type hints under the SQL module are inlined, we should 
clean up the module now.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36886) Inline type hints for python/pyspark/sql/context.py

2021-10-18 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-36886.
---
Fix Version/s: 3.3.0
 Assignee: dch nguyen
   Resolution: Fixed

Issue resolved by pull request 34185
https://github.com/apache/spark/pull/34185

> Inline type hints for python/pyspark/sql/context.py
> ---
>
> Key: SPARK-36886
> URL: https://issues.apache.org/jira/browse/SPARK-36886
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dgd_contributor
>Assignee: dch nguyen
>Priority: Major
> Fix For: 3.3.0
>
>
> Inline type hints for python/pyspark/sql/context.py from Inline type hints 
> for python/pyspark/sql/context.pyi.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36910) Inline type hints for python/pyspark/sql/types.py

2021-10-15 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-36910.
---
Fix Version/s: 3.3.0
 Assignee: Xinrong Meng
   Resolution: Fixed

Issue resolved by pull request 34174
https://github.com/apache/spark/pull/34174

> Inline type hints for python/pyspark/sql/types.py
> -
>
> Key: SPARK-36910
> URL: https://issues.apache.org/jira/browse/SPARK-36910
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
> Fix For: 3.3.0
>
>
> Inline type hints for python/pyspark/sql/types.py



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


