[jira] [Resolved] (SPARK-33573) Server side metrics related to push-based shuffle
[ https://issues.apache.org/jira/browse/SPARK-33573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mridul Muralidharan resolved SPARK-33573.
-----------------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 37638
[https://github.com/apache/spark/pull/37638]

> Server side metrics related to push-based shuffle
> -------------------------------------------------
>
> Key: SPARK-33573
> URL: https://issues.apache.org/jira/browse/SPARK-33573
> Project: Spark
> Issue Type: Sub-task
> Components: Shuffle, Spark Core
> Affects Versions: 3.1.0
> Reporter: Min Shen
> Assignee: Minchu Yang
> Priority: Major
> Fix For: 3.4.0
>
> Shuffle server-side metrics for push-based shuffle.
[jira] [Assigned] (SPARK-33573) Server side metrics related to push-based shuffle
[ https://issues.apache.org/jira/browse/SPARK-33573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mridul Muralidharan reassigned SPARK-33573:
-------------------------------------------
    Assignee: Minchu Yang

> Server side metrics related to push-based shuffle
> -------------------------------------------------
>
> Key: SPARK-33573
> URL: https://issues.apache.org/jira/browse/SPARK-33573
> Project: Spark
> Issue Type: Sub-task
> Components: Shuffle, Spark Core
> Affects Versions: 3.1.0
> Reporter: Min Shen
> Assignee: Minchu Yang
> Priority: Major
>
> Shuffle server-side metrics for push-based shuffle.
[jira] [Resolved] (SPARK-41998) Reuse test_readwriter test cases
[ https://issues.apache.org/jira/browse/SPARK-41998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-41998.
----------------------------------
    Assignee: Hyukjin Kwon
    Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/39522

> Reuse test_readwriter test cases
> --------------------------------
>
> Key: SPARK-41998
> URL: https://issues.apache.org/jira/browse/SPARK-41998
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
>
[jira] [Assigned] (SPARK-41968) Refactor ProtobufSerDe to ProtobufSerDe[T]
[ https://issues.apache.org/jira/browse/SPARK-41968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gengliang Wang reassigned SPARK-41968:
--------------------------------------
    Assignee: Yang Jie

> Refactor ProtobufSerDe to ProtobufSerDe[T]
> ------------------------------------------
>
> Key: SPARK-41968
> URL: https://issues.apache.org/jira/browse/SPARK-41968
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core, SQL
> Affects Versions: 3.4.0
> Reporter: Yang Jie
> Assignee: Yang Jie
> Priority: Major
>
[jira] [Resolved] (SPARK-41968) Refactor ProtobufSerDe to ProtobufSerDe[T]
[ https://issues.apache.org/jira/browse/SPARK-41968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gengliang Wang resolved SPARK-41968.
------------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 39487
[https://github.com/apache/spark/pull/39487]

> Refactor ProtobufSerDe to ProtobufSerDe[T]
> ------------------------------------------
>
> Key: SPARK-41968
> URL: https://issues.apache.org/jira/browse/SPARK-41968
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core, SQL
> Affects Versions: 3.4.0
> Reporter: Yang Jie
> Assignee: Yang Jie
> Priority: Major
> Fix For: 3.4.0
>
[jira] [Commented] (SPARK-42019) Reuse pyspark.sql.tests.test_types test cases
[ https://issues.apache.org/jira/browse/SPARK-42019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17675823#comment-17675823 ]

Apache Spark commented on SPARK-42019:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39529

> Reuse pyspark.sql.tests.test_types test cases
> ---------------------------------------------
>
> Key: SPARK-42019
> URL: https://issues.apache.org/jira/browse/SPARK-42019
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Tests
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
[jira] [Assigned] (SPARK-42019) Reuse pyspark.sql.tests.test_types test cases
[ https://issues.apache.org/jira/browse/SPARK-42019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42019:
------------------------------------
    Assignee: Apache Spark

> Reuse pyspark.sql.tests.test_types test cases
> ---------------------------------------------
>
> Key: SPARK-42019
> URL: https://issues.apache.org/jira/browse/SPARK-42019
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Tests
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Assignee: Apache Spark
> Priority: Major
>
[jira] [Assigned] (SPARK-42019) Reuse pyspark.sql.tests.test_types test cases
[ https://issues.apache.org/jira/browse/SPARK-42019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42019:
------------------------------------
    Assignee: (was: Apache Spark)

> Reuse pyspark.sql.tests.test_types test cases
> ---------------------------------------------
>
> Key: SPARK-42019
> URL: https://issues.apache.org/jira/browse/SPARK-42019
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Tests
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
[jira] [Commented] (SPARK-42019) Reuse pyspark.sql.tests.test_types test cases
[ https://issues.apache.org/jira/browse/SPARK-42019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17675822#comment-17675822 ]

Apache Spark commented on SPARK-42019:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39529

> Reuse pyspark.sql.tests.test_types test cases
> ---------------------------------------------
>
> Key: SPARK-42019
> URL: https://issues.apache.org/jira/browse/SPARK-42019
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Tests
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
[jira] [Created] (SPARK-42024) createDataFrame should coerce types of string float to float
Hyukjin Kwon created SPARK-42024:
------------------------------------

             Summary: createDataFrame should coerce types of string float to float
                 Key: SPARK-42024
                 URL: https://issues.apache.org/jira/browse/SPARK-42024
             Project: Spark
          Issue Type: Sub-task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Hyukjin Kwon


{code}
pyspark/sql/tests/test_types.py:245 (TypesParityTests.test_infer_schema_upcast_float_to_string)
self = <TypesParityTests testMethod=test_infer_schema_upcast_float_to_string>

    def test_infer_schema_upcast_float_to_string(self):
>       df = self.spark.createDataFrame([[1.33, 1], ["2.1", 1]], schema=["a", "b"])

../test_types.py:247:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../connect/session.py:282: in createDataFrame
    _table = pa.Table.from_pylist([dict(zip(_cols, list(item))) for item in _data])
pyarrow/table.pxi:3700: in pyarrow.lib.Table.from_pylist
    ???
pyarrow/table.pxi:5221: in pyarrow.lib._from_pylist
    ???
pyarrow/table.pxi:3575: in pyarrow.lib.Table.from_arrays
    ???
pyarrow/table.pxi:1383: in pyarrow.lib._sanitize_arrays
    ???
pyarrow/table.pxi:1364: in pyarrow.lib._schema_from_arrays
    ???
pyarrow/array.pxi:320: in pyarrow.lib.array
    ???
pyarrow/array.pxi:39: in pyarrow.lib._sequence_to_array
    ???
pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   pyarrow.lib.ArrowInvalid: Could not convert '2.1' with type str: tried to convert to double

pyarrow/error.pxi:100: ArrowInvalid
{code}
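For reference, the failure reproduces with PyArrow alone; a minimal sketch, not from the ticket:

{code}
import pyarrow as pa

# Arrow infers double from the first row's 1.33, then refuses the string "2.1";
# this is the inference behavior the ticket asks createDataFrame to coerce over.
rows = [[1.33, 1], ["2.1", 1]]
cols = ["a", "b"]
try:
    pa.Table.from_pylist([dict(zip(cols, row)) for row in rows])
except pa.ArrowInvalid as e:
    print(e)  # Could not convert '2.1' with type str: tried to convert to double
{code}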
[jira] [Created] (SPARK-42023) createDataFrame should coerce types of string false to bool false
Hyukjin Kwon created SPARK-42023:
------------------------------------

             Summary: createDataFrame should coerce types of string false to bool false
                 Key: SPARK-42023
                 URL: https://issues.apache.org/jira/browse/SPARK-42023
             Project: Spark
          Issue Type: Sub-task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Hyukjin Kwon


{code}
pyspark/sql/tests/test_types.py:249 (TypesParityTests.test_infer_schema_upcast_boolean_to_string)
self = <TypesParityTests testMethod=test_infer_schema_upcast_boolean_to_string>

    def test_infer_schema_upcast_boolean_to_string(self):
>       df = self.spark.createDataFrame([[True, 1], ["false", 1]], schema=["a", "b"])

../test_types.py:251:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../connect/session.py:282: in createDataFrame
    _table = pa.Table.from_pylist([dict(zip(_cols, list(item))) for item in _data])
pyarrow/table.pxi:3700: in pyarrow.lib.Table.from_pylist
    ???
pyarrow/table.pxi:5221: in pyarrow.lib._from_pylist
    ???
pyarrow/table.pxi:3575: in pyarrow.lib.Table.from_arrays
    ???
pyarrow/table.pxi:1383: in pyarrow.lib._sanitize_arrays
    ???
pyarrow/table.pxi:1364: in pyarrow.lib._schema_from_arrays
    ???
pyarrow/array.pxi:320: in pyarrow.lib.array
    ???
pyarrow/array.pxi:39: in pyarrow.lib._sequence_to_array
    ???
pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   pyarrow.lib.ArrowInvalid: Could not convert 'false' with type str: tried to convert to boolean

pyarrow/error.pxi:100: ArrowInvalid
{code}
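Same root cause as SPARK-42024, visible with pa.array alone; an illustrative sketch, not from the ticket:

{code}
import pyarrow as pa

# Arrow infers boolean from the first value, then cannot convert the string "false".
try:
    pa.array([True, "false"])
except pa.ArrowInvalid as e:
    print(e)  # Could not convert 'false' with type str: tried to convert to boolean
{code}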
[jira] [Created] (SPARK-42022) createDataFrame should autogenerate missing column names
Hyukjin Kwon created SPARK-42022:
------------------------------------

             Summary: createDataFrame should autogenerate missing column names
                 Key: SPARK-42022
                 URL: https://issues.apache.org/jira/browse/SPARK-42022
             Project: Spark
          Issue Type: Sub-task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Hyukjin Kwon


{code}
pyspark/sql/tests/test_types.py:233 (TypesParityTests.test_infer_schema_not_enough_names)
['col1', '_2'] != ['col1']

Expected :['col1']
Actual   :['col1', '_2']

self = <TypesParityTests testMethod=test_infer_schema_not_enough_names>

    def test_infer_schema_not_enough_names(self):
        df = self.spark.createDataFrame([["a", "b"]], ["col1"])
>       self.assertEqual(df.columns, ["col1", "_2"])

../test_types.py:236: AssertionError
{code}
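The expected padding is mechanical; a hedged sketch of the name generation the test implies (the helper name is hypothetical):

{code}
from typing import List

def pad_column_names(names: List[str], width: int) -> List[str]:
    # Columns beyond the supplied names fall back to _<1-based position>.
    return names + [f"_{i + 1}" for i in range(len(names), width)]

print(pad_column_names(["col1"], 2))  # ['col1', '_2']
{code}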
[jira] [Created] (SPARK-42021) createDataFrame with array.array
Hyukjin Kwon created SPARK-42021:
------------------------------------

             Summary: createDataFrame with array.array
                 Key: SPARK-42021
                 URL: https://issues.apache.org/jira/browse/SPARK-42021
             Project: Spark
          Issue Type: Sub-task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Hyukjin Kwon


{code}
pyspark/sql/tests/test_types.py:964 (TypesParityTests.test_array_types)
self = <TypesParityTests testMethod=test_array_types>

    def test_array_types(self):
        # This test need to make sure that the Scala type selected is at least
        # as large as the python's types. This is necessary because python's
        # array types depend on C implementation on the machine. Therefore there
        # is no machine independent correspondence between python's array types
        # and Scala types.
        # See: https://docs.python.org/2/library/array.html

        def assertCollectSuccess(typecode, value):
            row = Row(myarray=array.array(typecode, [value]))
            df = self.spark.createDataFrame([row])
            self.assertEqual(df.first()["myarray"][0], value)

        # supported string types
        #
        # String types in python's array are "u" for Py_UNICODE and "c" for char.
        # "u" will be removed in python 4, and "c" is not supported in python 3.
        supported_string_types = []
        if sys.version_info[0] < 4:
            supported_string_types += ["u"]
            # test unicode
>           assertCollectSuccess("u", "a")

../test_types.py:986:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../test_types.py:975: in assertCollectSuccess
    df = self.spark.createDataFrame([row])
../../connect/session.py:278: in createDataFrame
    _table = pa.Table.from_pylist([row.asDict(recursive=True) for row in _data])
pyarrow/table.pxi:3700: in pyarrow.lib.Table.from_pylist
    ???
pyarrow/table.pxi:5221: in pyarrow.lib._from_pylist
    ???
pyarrow/table.pxi:3575: in pyarrow.lib.Table.from_arrays
    ???
pyarrow/table.pxi:1383: in pyarrow.lib._sanitize_arrays
    ???
pyarrow/table.pxi:1364: in pyarrow.lib._schema_from_arrays
    ???
pyarrow/array.pxi:320: in pyarrow.lib.array
    ???
pyarrow/array.pxi:39: in pyarrow.lib._sequence_to_array
    ???
pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   pyarrow.lib.ArrowInvalid: Could not convert array('u', 'a') with type array.array: did not recognize Python value type when inferring an Arrow data type

pyarrow/error.pxi:100: ArrowInvalid
{code}
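PyArrow does not recognize array.array values nested inside rows; a sketch of the gap and one possible coercion, assuming a plain-list conversion is acceptable (illustrative, not the ticket's fix):

{code}
import array
import pyarrow as pa

value = array.array("u", "a")
try:
    pa.Table.from_pylist([{"myarray": value}])
except pa.ArrowInvalid:
    # Converting to a plain Python list sidesteps the inference failure.
    table = pa.Table.from_pylist([{"myarray": list(value)}])
    print(table.schema)  # myarray: list<item: string>
{code}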
[jira] [Created] (SPARK-42020) createDataFrame with UDT
Hyukjin Kwon created SPARK-42020:
------------------------------------

             Summary: createDataFrame with UDT
                 Key: SPARK-42020
                 URL: https://issues.apache.org/jira/browse/SPARK-42020
             Project: Spark
          Issue Type: Sub-task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Hyukjin Kwon


{code}
pyspark/sql/tests/test_types.py:596 (TypesParityTests.test_apply_schema_with_udt)
self = <TypesParityTests testMethod=test_apply_schema_with_udt>

    def test_apply_schema_with_udt(self):
        row = (1.0, ExamplePoint(1.0, 2.0))
        schema = StructType(
            [
                StructField("label", DoubleType(), False),
                StructField("point", ExamplePointUDT(), False),
            ]
        )
>       df = self.spark.createDataFrame([row], schema)

../test_types.py:605:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../connect/session.py:282: in createDataFrame
    _table = pa.Table.from_pylist([dict(zip(_cols, list(item))) for item in _data])
pyarrow/table.pxi:3700: in pyarrow.lib.Table.from_pylist
    ???
pyarrow/table.pxi:5221: in pyarrow.lib._from_pylist
    ???
pyarrow/table.pxi:3575: in pyarrow.lib.Table.from_arrays
    ???
pyarrow/table.pxi:1383: in pyarrow.lib._sanitize_arrays
    ???
pyarrow/table.pxi:1364: in pyarrow.lib._schema_from_arrays
    ???
pyarrow/array.pxi:320: in pyarrow.lib.array
    ???
pyarrow/array.pxi:39: in pyarrow.lib._sequence_to_array
    ???
pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   pyarrow.lib.ArrowInvalid: Could not convert ExamplePoint(1.0,2.0) with type ExamplePoint: did not recognize Python value type when inferring an Arrow data type

pyarrow/error.pxi:100: ArrowInvalid
{code}
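Arrow has no notion of Spark UDTs; in the classic path a UDT value is first serialized to its sqlType representation. A minimal sketch of that step, assuming the test helpers live in pyspark.testing.sqlutils:

{code}
# A minimal sketch, assuming the test helpers in pyspark.testing.sqlutils:
from pyspark.testing.sqlutils import ExamplePoint, ExamplePointUDT

udt = ExamplePointUDT()
point = ExamplePoint(1.0, 2.0)
serialized = udt.serialize(point)        # plain values matching udt.sqlType()
restored = udt.deserialize(serialized)   # back to ExamplePoint(1.0, 2.0)
{code}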
[jira] [Created] (SPARK-42019) Reuse pyspark.sql.tests.test_group test cases
Hyukjin Kwon created SPARK-42019:
------------------------------------

             Summary: Reuse pyspark.sql.tests.test_group test cases
                 Key: SPARK-42019
                 URL: https://issues.apache.org/jira/browse/SPARK-42019
             Project: Spark
          Issue Type: Sub-task
          Components: Connect, Tests
    Affects Versions: 3.4.0
            Reporter: Hyukjin Kwon
[jira] [Commented] (SPARK-42010) Reuse pyspark.sql.tests.test_column test cases
[ https://issues.apache.org/jira/browse/SPARK-42010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17675818#comment-17675818 ]

Apache Spark commented on SPARK-42010:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39528

> Reuse pyspark.sql.tests.test_column test cases
> ----------------------------------------------
>
> Key: SPARK-42010
> URL: https://issues.apache.org/jira/browse/SPARK-42010
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Tests
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
[jira] [Updated] (SPARK-42019) Reuse pyspark.sql.tests.test_types test cases
[ https://issues.apache.org/jira/browse/SPARK-42019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-42019:
---------------------------------
    Summary: Reuse pyspark.sql.tests.test_types test cases  (was: Reuse pyspark.sql.tests.test_group test cases)

> Reuse pyspark.sql.tests.test_types test cases
> ---------------------------------------------
>
> Key: SPARK-42019
> URL: https://issues.apache.org/jira/browse/SPARK-42019
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Tests
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
[jira] [Updated] (SPARK-42006) Test parity: pyspark.sql.tests.test_group, test_serde, test_datasources and test_column
[ https://issues.apache.org/jira/browse/SPARK-42006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-42006:
---------------------------------
    Description: See https://issues.apache.org/jira/browse/SPARK-41652 and https://issues.apache.org/jira/browse/SPARK-41651

> Test parity: pyspark.sql.tests.test_group, test_serde, test_datasources and test_column
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-42006
> URL: https://issues.apache.org/jira/browse/SPARK-42006
> Project: Spark
> Issue Type: Umbrella
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> See https://issues.apache.org/jira/browse/SPARK-41652 and
> https://issues.apache.org/jira/browse/SPARK-41651
[jira] [Assigned] (SPARK-42010) Reuse pyspark.sql.tests.test_column test cases
[ https://issues.apache.org/jira/browse/SPARK-42010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42010:
------------------------------------
    Assignee: (was: Apache Spark)

> Reuse pyspark.sql.tests.test_column test cases
> ----------------------------------------------
>
> Key: SPARK-42010
> URL: https://issues.apache.org/jira/browse/SPARK-42010
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Tests
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
[jira] [Commented] (SPARK-42010) Reuse pyspark.sql.tests.test_column test cases
[ https://issues.apache.org/jira/browse/SPARK-42010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17675817#comment-17675817 ]

Apache Spark commented on SPARK-42010:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39528

> Reuse pyspark.sql.tests.test_column test cases
> ----------------------------------------------
>
> Key: SPARK-42010
> URL: https://issues.apache.org/jira/browse/SPARK-42010
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Tests
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
[jira] [Updated] (SPARK-42018) Test parity: pyspark.sql.tests.test_types
[ https://issues.apache.org/jira/browse/SPARK-42018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-42018:
---------------------------------
    Description: See https://issues.apache.org/jira/browse/SPARK-41652 and https://issues.apache.org/jira/browse/SPARK-41651

> Test parity: pyspark.sql.tests.test_types
> ------------------------------------------
>
> Key: SPARK-42018
> URL: https://issues.apache.org/jira/browse/SPARK-42018
> Project: Spark
> Issue Type: Umbrella
> Components: Connect, Tests
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> See https://issues.apache.org/jira/browse/SPARK-41652 and
> https://issues.apache.org/jira/browse/SPARK-41651
[jira] [Assigned] (SPARK-42010) Reuse pyspark.sql.tests.test_column test cases
[ https://issues.apache.org/jira/browse/SPARK-42010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42010:
------------------------------------
    Assignee: Apache Spark

> Reuse pyspark.sql.tests.test_column test cases
> ----------------------------------------------
>
> Key: SPARK-42010
> URL: https://issues.apache.org/jira/browse/SPARK-42010
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Tests
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Assignee: Apache Spark
> Priority: Major
>
[jira] [Updated] (SPARK-42018) Test parity: pyspark.sql.tests.test_types
[ https://issues.apache.org/jira/browse/SPARK-42018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-42018:
---------------------------------
    Epic Link: SPARK-39375

> Test parity: pyspark.sql.tests.test_types
> ------------------------------------------
>
> Key: SPARK-42018
> URL: https://issues.apache.org/jira/browse/SPARK-42018
> Project: Spark
> Issue Type: Umbrella
> Components: Connect, Tests
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
[jira] [Created] (SPARK-42018) Test parity: pyspark.sql.tests.test_types
Hyukjin Kwon created SPARK-42018:
------------------------------------

             Summary: Test parity: pyspark.sql.tests.test_types
                 Key: SPARK-42018
                 URL: https://issues.apache.org/jira/browse/SPARK-42018
             Project: Spark
          Issue Type: Umbrella
          Components: Connect, Tests
    Affects Versions: 3.4.0
            Reporter: Hyukjin Kwon
[jira] [Created] (SPARK-42017) Different error type AnalysisException vs SparkConnectAnalysisException
Hyukjin Kwon created SPARK-42017:
------------------------------------

             Summary: Different error type AnalysisException vs SparkConnectAnalysisException
                 Key: SPARK-42017
                 URL: https://issues.apache.org/jira/browse/SPARK-42017
             Project: Spark
          Issue Type: Sub-task
          Components: Connect, Tests
    Affects Versions: 3.4.0
            Reporter: Hyukjin Kwon

e.g.)

{code}
23/01/12 14:33:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
FAILED [ 8%]
pyspark/sql/tests/test_column.py:105 (ColumnParityTests.test_access_column)
self = <ColumnParityTests testMethod=test_access_column>

    def test_access_column(self):
        df = self.df
        self.assertTrue(isinstance(df.key, Column))
        self.assertTrue(isinstance(df["key"], Column))
        self.assertTrue(isinstance(df[0], Column))
        self.assertRaises(IndexError, lambda: df[2])
>       self.assertRaises(AnalysisException, lambda: df["bad_key"])
E       AssertionError: AnalysisException not raised by <lambda>

../test_column.py:112: AssertionError
{code}
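Until the two error types are unified, a parity test could accept either; a hedged sketch (the Connect exception's module path is an assumption):

{code}
from pyspark.sql.utils import AnalysisException

try:
    # Module path is an assumption; adjust to wherever the Connect client defines it.
    from pyspark.sql.connect.client import SparkConnectAnalysisException
except ImportError:
    SparkConnectAnalysisException = AnalysisException

def assert_analysis_error(fn):
    # Passes if either the classic or the Connect analysis error is raised.
    try:
        fn()
    except (AnalysisException, SparkConnectAnalysisException):
        return
    raise AssertionError("no analysis error raised")
{code}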
[jira] [Updated] (SPARK-42016) Type inconsistency of struct and map when accessing the nested column
[ https://issues.apache.org/jira/browse/SPARK-42016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-42016:
---------------------------------
    Parent: (was: SPARK-41281)
    Issue Type: Bug  (was: Sub-task)

> Type inconsistency of struct and map when accessing the nested column
> ----------------------------------------------------------------------
>
> Key: SPARK-42016
> URL: https://issues.apache.org/jira/browse/SPARK-42016
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
[jira] [Updated] (SPARK-42016) Type inconsistency of struct and map when accessing the nested column
[ https://issues.apache.org/jira/browse/SPARK-42016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-42016:
---------------------------------
    Epic Link: (was: SPARK-39375)

> Type inconsistency of struct and map when accessing the nested column
> ----------------------------------------------------------------------
>
> Key: SPARK-42016
> URL: https://issues.apache.org/jira/browse/SPARK-42016
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
[jira] [Updated] (SPARK-42016) Type inconsistency of struct and map when accessing the nested column
[ https://issues.apache.org/jira/browse/SPARK-42016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-42016:
---------------------------------
    Epic Link: SPARK-39375

> Type inconsistency of struct and map when accessing the nested column
> ----------------------------------------------------------------------
>
> Key: SPARK-42016
> URL: https://issues.apache.org/jira/browse/SPARK-42016
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
[jira] [Updated] (SPARK-42016) Type inconsistency of struct and map when accessing the nested column
[ https://issues.apache.org/jira/browse/SPARK-42016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-42016:
---------------------------------
    Parent: SPARK-41282
    Issue Type: Sub-task  (was: Bug)

> Type inconsistency of struct and map when accessing the nested column
> ----------------------------------------------------------------------
>
> Key: SPARK-42016
> URL: https://issues.apache.org/jira/browse/SPARK-42016
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
[jira] [Created] (SPARK-42016) Type inconsistency of struct and map when accessing the nested column
Hyukjin Kwon created SPARK-42016:
------------------------------------

             Summary: Type inconsistency of struct and map when accessing the nested column
                 Key: SPARK-42016
                 URL: https://issues.apache.org/jira/browse/SPARK-42016
             Project: Spark
          Issue Type: Sub-task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Hyukjin Kwon


{code}
org.apache.spark.sql.AnalysisException: [INVALID_COLUMN_OR_FIELD_DATA_TYPE] Column or field `d` is of type "STRUCT" while it's required to be "MAP".
	at org.apache.spark.sql.errors.QueryCompilationErrors$.invalidColumnOrFieldDataTypeError(QueryCompilationErrors.scala:3179)
	at org.apache.spark.sql.catalyst.plans.logical.Project$.reconcileColumnType(basicLogicalOperators.scala:163)
	at org.apache.spark.sql.catalyst.plans.logical.Project$.$anonfun$reorderFields$1(basicLogicalOperators.scala:203)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
	at scala.collection.TraversableLike.map(TraversableLike.scala:286)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
	at scala.collection.AbstractTraversable.map(Traversable.scala:108)
	at org.apache.spark.sql.catalyst.plans.logical.Project$.reorderFields(basicLogicalOperators.scala:173)
	at org.apache.spark.sql.catalyst.plans.logical.Project$.matchSchema(basicLogicalOperators.scala:103)
	at org.apache.spark.sql.Dataset.to(Dataset.scala:485)
	at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformLocalRelation(SparkConnectPlanner.scala:635)
	at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:83)
	at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformProject(SparkConnectPlanner.scala:678)
	at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:70)
	at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformLimit(SparkConnectPlanner.scala:758)
	at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:72)
	at org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handlePlan(SparkConnectStreamHandler.scala:58)
	at org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handle(SparkConnectStreamHandler.scala:49)
	at org.apache.spark.sql.connect.service.SparkConnectService.executePlan(SparkConnectService.scala:135)
	at org.apache.spark.connect.proto.SparkConnectServiceGrpc$MethodHandlers.invoke(SparkConnectServiceGrpc.java:306)
	at org.sparkproject.connect.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182)
	at org.sparkproject.connect.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:352)
	at org.sparkproject.connect.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:866)
	at org.sparkproject.connect.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at org.sparkproject.connect.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

pyspark/sql/tests/test_column.py:126 (ColumnParityTests.test_field_accessor)
self = <ColumnParityTests testMethod=test_field_accessor>

    def test_field_accessor(self):
        df = self.spark.createDataFrame([Row(l=[1], r=Row(a=1, b="b"), d={"k": "v"})])
>       self.assertEqual(1, df.select(df.l[0]).first()[0])

../test_column.py:129:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../connect/dataframe.py:340: in first
    return self.head()
../../connect/dataframe.py:407: in head
    rs = self.head(1)
../../connect/dataframe.py:409: in head
    return self.take(n)
../../connect/dataframe.py:414: in take
    return self.limit(num).collect()
../../connect/dataframe.py:1247: in collect
    table = self._session.client.to_table(query)
../../connect/client.py:415: in to_table
    table, _ = self._execute_and_fetch(req)
../../connect/client.py:593: in _execute_and_fetch
    self._handle_error(rpc_error)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <...>
rpc_error = <_MultiThreadedRendezvous of RPC that terminated with:
	status = StatusCode.INTERNAL
	details = "[INVALID_COLUMN_OR_FI...ATA_TYPE]
{code}
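The mismatch is visible at the Arrow layer: from_pylist infers a Python dict as a struct, while the Spark-side schema expects a map. An illustrative sketch, not from the ticket:

{code}
import pyarrow as pa

rows = [{"d": {"k": "v"}}]
print(pa.Table.from_pylist(rows).schema)  # d: struct<k: string>

# An explicit schema keeps the column a map, matching what Spark expects.
schema = pa.schema([pa.field("d", pa.map_(pa.string(), pa.string()))])
print(pa.Table.from_pylist(rows, schema=schema).schema)  # d: map<string, string>
{code}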
[jira] [Assigned] (SPARK-42009) Reuse pyspark.sql.tests.test_serde test cases
[ https://issues.apache.org/jira/browse/SPARK-42009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42009:
------------------------------------
    Assignee: Apache Spark

> Reuse pyspark.sql.tests.test_serde test cases
> ---------------------------------------------
>
> Key: SPARK-42009
> URL: https://issues.apache.org/jira/browse/SPARK-42009
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Tests
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Assignee: Apache Spark
> Priority: Major
>
[jira] [Assigned] (SPARK-42009) Reuse pyspark.sql.tests.test_serde test cases
[ https://issues.apache.org/jira/browse/SPARK-42009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-42009:
------------------------------------
    Assignee: (was: Apache Spark)

> Reuse pyspark.sql.tests.test_serde test cases
> ---------------------------------------------
>
> Key: SPARK-42009
> URL: https://issues.apache.org/jira/browse/SPARK-42009
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Tests
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
[jira] [Commented] (SPARK-42009) Reuse pyspark.sql.tests.test_serde test cases
[ https://issues.apache.org/jira/browse/SPARK-42009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17675814#comment-17675814 ]

Apache Spark commented on SPARK-42009:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/39527

> Reuse pyspark.sql.tests.test_serde test cases
> ---------------------------------------------
>
> Key: SPARK-42009
> URL: https://issues.apache.org/jira/browse/SPARK-42009
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Tests
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
[jira] [Updated] (SPARK-42006) Test parity: pyspark.sql.tests.test_group, test_serde, test_datasources and test_column
[ https://issues.apache.org/jira/browse/SPARK-42006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-42006:
---------------------------------
    Summary: Test parity: pyspark.sql.tests.test_group, test_serde, test_datasources and test_column  (was: Test parity: pyspark.sql.tests.test_group, test_serde, test_datasources and test_types)

> Test parity: pyspark.sql.tests.test_group, test_serde, test_datasources and test_column
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-42006
> URL: https://issues.apache.org/jira/browse/SPARK-42006
> Project: Spark
> Issue Type: Umbrella
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
[jira] [Updated] (SPARK-42010) Reuse pyspark.sql.tests.test_column test cases
[ https://issues.apache.org/jira/browse/SPARK-42010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-42010:
---------------------------------
    Summary: Reuse pyspark.sql.tests.test_column test cases  (was: Reuse pyspark.sql.tests.test_types test cases)

> Reuse pyspark.sql.tests.test_column test cases
> ----------------------------------------------
>
> Key: SPARK-42010
> URL: https://issues.apache.org/jira/browse/SPARK-42010
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Tests
> Affects Versions: 3.4.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
[jira] [Deleted] (SPARK-42015) Support struct as a key in map
[ https://issues.apache.org/jira/browse/SPARK-42015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon deleted SPARK-42015:
---------------------------------

> Support struct as a key in map
> ------------------------------
>
> Key: SPARK-42015
> Project: Spark
> Issue Type: Sub-task
> Reporter: Hyukjin Kwon
> Priority: Major
>
> {code}
> pyspark/sql/tests/test_serde.py:54 (SerdeParityTests.test_struct_in_map)
> self = <SerdeParityTests testMethod=test_struct_in_map>
>
>     def test_struct_in_map(self):
>         d = [Row(m={Row(i=1): Row(s="")})]
> >       df = self.spark.createDataFrame(d).toDF()
>
> ../test_serde.py:57:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ../../connect/session.py:278: in createDataFrame
>     _table = pa.Table.from_pylist([row.asDict(recursive=True) for row in _data])
> pyarrow/table.pxi:3700: in pyarrow.lib.Table.from_pylist
>     ???
> pyarrow/table.pxi:5221: in pyarrow.lib._from_pylist
>     ???
> pyarrow/table.pxi:3575: in pyarrow.lib.Table.from_arrays
>     ???
> pyarrow/table.pxi:1383: in pyarrow.lib._sanitize_arrays
>     ???
> pyarrow/table.pxi:1364: in pyarrow.lib._schema_from_arrays
>     ???
> pyarrow/array.pxi:320: in pyarrow.lib.array
>     ???
> pyarrow/array.pxi:39: in pyarrow.lib._sequence_to_array
>     ???
> pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
>     ???
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
> >   ???
> E   pyarrow.lib.ArrowTypeError: Expected dict key of type str or bytes, got 'Row'
>
> pyarrow/error.pxi:123: ArrowTypeError
> {code}
[jira] [Created] (SPARK-42015) Support struct as a key in map
Hyukjin Kwon created SPARK-42015:
------------------------------------

             Summary: Support struct as a key in map
                 Key: SPARK-42015
                 URL: https://issues.apache.org/jira/browse/SPARK-42015
             Project: Spark
          Issue Type: Sub-task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Hyukjin Kwon


{code}
pyspark/sql/tests/test_serde.py:54 (SerdeParityTests.test_struct_in_map)
self = <SerdeParityTests testMethod=test_struct_in_map>

    def test_struct_in_map(self):
        d = [Row(m={Row(i=1): Row(s="")})]
>       df = self.spark.createDataFrame(d).toDF()

../test_serde.py:57:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../connect/session.py:278: in createDataFrame
    _table = pa.Table.from_pylist([row.asDict(recursive=True) for row in _data])
pyarrow/table.pxi:3700: in pyarrow.lib.Table.from_pylist
    ???
pyarrow/table.pxi:5221: in pyarrow.lib._from_pylist
    ???
pyarrow/table.pxi:3575: in pyarrow.lib.Table.from_arrays
    ???
pyarrow/table.pxi:1383: in pyarrow.lib._sanitize_arrays
    ???
pyarrow/table.pxi:1364: in pyarrow.lib._schema_from_arrays
    ???
pyarrow/array.pxi:320: in pyarrow.lib.array
    ???
pyarrow/array.pxi:39: in pyarrow.lib._sequence_to_array
    ???
pyarrow/error.pxi:144: in pyarrow.lib.pyarrow_internal_check_status
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   pyarrow.lib.ArrowTypeError: Expected dict key of type str or bytes, got 'Row'

pyarrow/error.pxi:123: ArrowTypeError
{code}
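Python dicts cannot key on a Row, but an Arrow map column can be built from (key, value) pairs with an explicit type; a hedged sketch of that workaround, illustrative only:

{code}
import pyarrow as pa

# Map type with a struct key; each cell is a list of (key, value) tuples,
# which sidesteps Python's dict-key restriction during conversion.
ty = pa.map_(pa.struct([("i", pa.int64())]), pa.struct([("s", pa.string())]))
arr = pa.array([[({"i": 1}, {"s": ""})]], type=ty)
print(arr.type)  # map<struct<i: int64>, struct<s: string>>
{code}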
[jira] [Created] (SPARK-42014) Support aware datetimes
Hyukjin Kwon created SPARK-42014:
------------------------------------

             Summary: Support aware datetimes
                 Key: SPARK-42014
                 URL: https://issues.apache.org/jira/browse/SPARK-42014
             Project: Spark
          Issue Type: Sub-task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Hyukjin Kwon


{code}
pyspark/sql/tests/test_serde.py:71 (SerdeParityTests.test_filter_with_datetime_timezone)
self = <SerdeParityTests testMethod=test_filter_with_datetime_timezone>

    def test_filter_with_datetime_timezone(self):
        dt1 = datetime.datetime(2015, 4, 17, 23, 1, 2, 3000, tzinfo=UTCOffsetTimezone(0))
        dt2 = datetime.datetime(2015, 4, 17, 23, 1, 2, 3000, tzinfo=UTCOffsetTimezone(1))
        row = Row(date=dt1)
>       df = self.spark.createDataFrame([row])

../test_serde.py:76:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../connect/session.py:278: in createDataFrame
    _table = pa.Table.from_pylist([row.asDict(recursive=True) for row in _data])
pyarrow/table.pxi:3700: in pyarrow.lib.Table.from_pylist
    ???
pyarrow/table.pxi:5221: in pyarrow.lib._from_pylist
    ???
pyarrow/table.pxi:3575: in pyarrow.lib.Table.from_arrays
    ???
pyarrow/table.pxi:1383: in pyarrow.lib._sanitize_arrays
    ???
pyarrow/table.pxi:1364: in pyarrow.lib._schema_from_arrays
    ???
pyarrow/array.pxi:320: in pyarrow.lib.array
    ???
pyarrow/array.pxi:39: in pyarrow.lib._sequence_to_array
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   NotImplementedError: a tzinfo subclass must implement tzname()

pyarrow/error.pxi:144: NotImplementedError
{code}
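The stdlib's datetime.timezone implements tzname(), so an equivalent aware datetime converts cleanly where the test's UTCOffsetTimezone helper does not; an illustrative sketch:

{code}
import datetime
import pyarrow as pa

# An aware datetime built on datetime.timezone satisfies Arrow's tzname() requirement.
tz = datetime.timezone(datetime.timedelta(hours=1))
dt = datetime.datetime(2015, 4, 17, 23, 1, 2, 3000, tzinfo=tz)
print(pa.array([dt]).type)  # timestamp with a +01:00 timezone
{code}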
[jira] [Commented] (SPARK-42008) Reuse pyspark.sql.tests.test_datasources test cases
[ https://issues.apache.org/jira/browse/SPARK-42008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17675812#comment-17675812 ] Apache Spark commented on SPARK-42008: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/39526 > Reuse pyspark.sql.tests.test_datasources test cases > > > Key: SPARK-42008 > URL: https://issues.apache.org/jira/browse/SPARK-42008 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42008) Reuse pyspark.sql.tests.test_datasources test cases
[ https://issues.apache.org/jira/browse/SPARK-42008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42008: Assignee: (was: Apache Spark) > Reuse pyspark.sql.tests.test_datasources test cases > > > Key: SPARK-42008 > URL: https://issues.apache.org/jira/browse/SPARK-42008 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42008) Reuse pyspark.sql.tests.test_datasources test cases
[ https://issues.apache.org/jira/browse/SPARK-42008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42008: Assignee: Apache Spark > Reuse pyspark.sql.tests.test_datasources test cases > > > Key: SPARK-42008 > URL: https://issues.apache.org/jira/browse/SPARK-42008 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42012) Implement DataFrameReader.orc
Hyukjin Kwon created SPARK-42012: Summary: Implement DataFrameReader.orc Key: SPARK-42012 URL: https://issues.apache.org/jira/browse/SPARK-42012 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Hyukjin Kwon {code} pyspark/sql/tests/test_datasources.py:114 (DataSourcesParityTests.test_read_multiple_orc_file) self = def test_read_multiple_orc_file(self): > df = self.spark.read.orc( [ "python/test_support/sql/orc_partitioned/b=0/c=0", "python/test_support/sql/orc_partitioned/b=1/c=1", ] ) ../test_datasources.py:116: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = args = (['python/test_support/sql/orc_partitioned/b=0/c=0', 'python/test_support/sql/orc_partitioned/b=1/c=1'],) kwargs = {} def orc(self, *args: Any, **kwargs: Any) -> None: > raise NotImplementedError("orc() is not implemented.") E NotImplementedError: orc() is not implemented. ../../connect/readwriter.py:228: NotImplementedError {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
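For reference, the classic PySpark call the parity test makes and Connect needs to support; this assumes an active SparkSession bound to spark, and the paths are test fixtures from the Spark repo:
{code:python}
# Classic PySpark multi-path reader API; the Connect client raised
# NotImplementedError for this call.
df = spark.read.orc(
    [
        "python/test_support/sql/orc_partitioned/b=0/c=0",
        "python/test_support/sql/orc_partitioned/b=1/c=1",
    ]
)
{code}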
[jira] [Created] (SPARK-42013) Implement DataFrameReader.text to take multiple paths
Hyukjin Kwon created SPARK-42013: Summary: Implement DataFrameReader.text to take multiple paths Key: SPARK-42013 URL: https://issues.apache.org/jira/browse/SPARK-42013 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Hyukjin Kwon {code} java.io.IOException: Illegal file pattern: error parsing regexp: Unclosed character class at pos 8: `['python` at org.apache.hadoop.fs.GlobFilter.init(GlobFilter.java:71) at org.apache.hadoop.fs.GlobFilter.(GlobFilter.java:50) at org.apache.hadoop.fs.Globber.doGlob(Globber.java:265) at org.apache.hadoop.fs.Globber.glob(Globber.java:202) at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:2124) at org.apache.spark.deploy.SparkHadoopUtil.globPath(SparkHadoopUtil.scala:254) at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$3(DataSource.scala:736) at org.apache.spark.util.ThreadUtils$.$anonfun$parmap$2(ThreadUtils.scala:393) at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659) at scala.util.Success.$anonfun$map$1(Try.scala:255) at scala.util.Success.map(Try.scala:213) at scala.concurrent.Future.$anonfun$map$1(Future.scala:292) at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33) at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33) at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64) at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402) at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1067) at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1703) at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:172) Caused by: org.apache.hadoop.shaded.com.google.re2j.PatternSyntaxException: error parsing regexp: Unclosed character class at pos 8: `['python` at org.apache.hadoop.fs.GlobPattern.error(GlobPattern.java:168) at org.apache.hadoop.fs.GlobPattern.set(GlobPattern.java:151) at org.apache.hadoop.fs.GlobPattern.(GlobPattern.java:42) at org.apache.hadoop.fs.GlobFilter.init(GlobFilter.java:67) ... 19 more pyspark/sql/tests/test_datasources.py:123 (DataSourcesParityTests.test_read_text_file_list) self = def test_read_text_file_list(self): df = self.spark.read.text( ["python/test_support/sql/text-test.txt", "python/test_support/sql/text-test.txt"] ) > count = df.count() ../test_datasources.py:128: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ ../../connect/dataframe.py:177: in count pdd = self.agg(_invoke_function("count", lit(1))).toPandas() ../../connect/dataframe.py:1297: in toPandas return self._session.client.to_pandas(query) ../../connect/client.py:422: in to_pandas table, metrics = self._execute_and_fetch(req) ../../connect/client.py:593: in _execute_and_fetch self._handle_error(rpc_error) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = rpc_error = <_MultiThreadedRendezvous of RPC that terminated with: status = StatusCode.INTERNAL details = "Illegal file pattern:...tatus:13, grpc_message:"Illegal file pattern: error parsing regexp: Unclosed character class at pos 8: `[\'python`"}" > def _handle_error(self, rpc_error: grpc.RpcError) -> NoReturn: """ Error handling helper for dealing with GRPC Errors. On the server side, certain exceptions are enriched with additional RPC Status information. These are unpacked in this function and put into the exception. 
To avoid overloading the user with GRPC errors, this message explicitly swallows the error context from the call. This GRPC Error is logged however, and can be enabled. Parameters -- rpc_error : grpc.RpcError RPC Error containing the details of the exception. Returns --- Throws the appropriate internal Python exception. """ logger.exception("GRPC Error received") # We have to cast the value here because, a RpcError is a Call as well. # https://grpc.github.io/grpc/python/grpc.html#grpc.UnaryUnaryMultiCallable.__call__ status = rpc_status.from_call(cast(grpc.Call, rpc_error)) if status: for d in status.details: if d.Is(error_details_pb2.ErrorInfo.DESCRIPTOR): info = error_details_pb2.ErrorInfo() d.Unpack(info) if info.reason ==
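The Hadoop glob error (Unclosed character class at pos 8: `['python`) suggests the Python list of paths reached the server stringified as a single path. A sketch of the mismatch, assuming an active session bound to spark:
{code:python}
paths = [
    "python/test_support/sql/text-test.txt",
    "python/test_support/sql/text-test.txt",
]
# str(paths) begins with "['python/..." -- exactly the text the Hadoop glob
# parser choked on, so the list was apparently serialized wholesale.
print(str(paths)[:20])

# Classic PySpark accepts the list as separate paths:
df = spark.read.text(paths)
{code}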
[jira] [Created] (SPARK-42011) Implement DataFrameReader.csv
Hyukjin Kwon created SPARK-42011: Summary: Implement DataFrameReader.csv Key: SPARK-42011 URL: https://issues.apache.org/jira/browse/SPARK-42011 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Hyukjin Kwon {code} pyspark/sql/tests/test_datasources.py:147 (DataSourcesParityTests.test_checking_csv_header) self = def test_checking_csv_header(self): path = tempfile.mkdtemp() shutil.rmtree(path) try: self.spark.createDataFrame([[1, 1000], [2000, 2]]).toDF("f1", "f2").write.option( "header", "true" ).csv(path) schema = StructType( [ StructField("f2", IntegerType(), nullable=True), StructField("f1", IntegerType(), nullable=True), ] ) df = ( > self.spark.read.option("header", "true") .schema(schema) .csv(path, enforceSchema=False) ) ../test_datasources.py:162: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = args = ('/var/folders/0c/q8y15ybd3tn7sr2_jmbmftr8gp/T/tmp4kdxohcw',) kwargs = {'enforceSchema': False} def csv(self, *args: Any, **kwargs: Any) -> None: > raise NotImplementedError("csv() is not implemented.") E NotImplementedError: csv() is not implemented. ../../connect/readwriter.py:225: NotImplementedError {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
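The failing test in classic PySpark form: it writes a headered CSV, then reads it back with an explicit schema and enforceSchema=False so the header names are checked against the schema (assumes an active session bound to spark):
{code:python}
import shutil
import tempfile
from pyspark.sql.types import IntegerType, StructField, StructType

path = tempfile.mkdtemp()
shutil.rmtree(path)  # csv() wants to create the output directory itself
spark.createDataFrame([[1, 1000], [2000, 2]]).toDF("f1", "f2") \
    .write.option("header", "true").csv(path)
schema = StructType(
    [
        StructField("f2", IntegerType(), nullable=True),
        StructField("f1", IntegerType(), nullable=True),
    ]
)
# The call Connect had not implemented yet:
df = spark.read.option("header", "true").schema(schema).csv(path, enforceSchema=False)
{code}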
[jira] [Assigned] (SPARK-42007) Reuse pyspark.sql.tests.test_group test cases
[ https://issues.apache.org/jira/browse/SPARK-42007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42007: Assignee: (was: Apache Spark) > Reuse pyspark.sql.tests.test_group test cases > - > > Key: SPARK-42007 > URL: https://issues.apache.org/jira/browse/SPARK-42007 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42007) Reuse pyspark.sql.tests.test_group test cases
[ https://issues.apache.org/jira/browse/SPARK-42007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42007: Assignee: Apache Spark > Reuse pyspark.sql.tests.test_group test cases > - > > Key: SPARK-42007 > URL: https://issues.apache.org/jira/browse/SPARK-42007 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42007) Reuse pyspark.sql.tests.test_group test cases
[ https://issues.apache.org/jira/browse/SPARK-42007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17675809#comment-17675809 ] Apache Spark commented on SPARK-42007: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/39525 > Reuse pyspark.sql.tests.test_group test cases > - > > Key: SPARK-42007 > URL: https://issues.apache.org/jira/browse/SPARK-42007 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42008) Reuse pyspark.sql.tests.test_datasources test cases
[ https://issues.apache.org/jira/browse/SPARK-42008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-42008: - Summary: Reuse pyspark.sql.tests.test_datasources test cases (was: Reuse pyspark.sql.tests.test_types test cases ) > Reuse pyspark.sql.tests.test_datasources test cases > > > Key: SPARK-42008 > URL: https://issues.apache.org/jira/browse/SPARK-42008 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42008) Reuse pyspark.sql.tests.test_types test cases
[ https://issues.apache.org/jira/browse/SPARK-42008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-42008: - Summary: Reuse pyspark.sql.tests.test_types test cases (was: Reeuse pyspark.sql.tests.test_types test cases ) > Reuse pyspark.sql.tests.test_types test cases > -- > > Key: SPARK-42008 > URL: https://issues.apache.org/jira/browse/SPARK-42008 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42009) Reuse pyspark.sql.tests.test_serde test cases
[ https://issues.apache.org/jira/browse/SPARK-42009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-42009: - Summary: Reuse pyspark.sql.tests.test_serde test cases (was: Reeuse pyspark.sql.tests.test_serde test cases ) > Reuse pyspark.sql.tests.test_serde test cases > -- > > Key: SPARK-42009 > URL: https://issues.apache.org/jira/browse/SPARK-42009 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42007) Reeuse pyspark.sql.tests.test_group test cases
[ https://issues.apache.org/jira/browse/SPARK-42007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-42007: - Summary: Reeuse pyspark.sql.tests.test_group test cases (was: Reeuse pyspark.sql.tests.test_readwriter test cases) > Reeuse pyspark.sql.tests.test_group test cases > -- > > Key: SPARK-42007 > URL: https://issues.apache.org/jira/browse/SPARK-42007 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42007) Reuse pyspark.sql.tests.test_group test cases
[ https://issues.apache.org/jira/browse/SPARK-42007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-42007: - Summary: Reuse pyspark.sql.tests.test_group test cases (was: Reeuse pyspark.sql.tests.test_group test cases) > Reuse pyspark.sql.tests.test_group test cases > - > > Key: SPARK-42007 > URL: https://issues.apache.org/jira/browse/SPARK-42007 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42010) Reuse pyspark.sql.tests.test_types test cases
Hyukjin Kwon created SPARK-42010: Summary: Reuse pyspark.sql.tests.test_types test cases Key: SPARK-42010 URL: https://issues.apache.org/jira/browse/SPARK-42010 Project: Spark Issue Type: Sub-task Components: Connect, Tests Affects Versions: 3.4.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42008) Reeuse pyspark.sql.tests.test_types test cases
Hyukjin Kwon created SPARK-42008: Summary: Reeuse pyspark.sql.tests.test_types test cases Key: SPARK-42008 URL: https://issues.apache.org/jira/browse/SPARK-42008 Project: Spark Issue Type: Sub-task Components: Connect, Tests Affects Versions: 3.4.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42009) Reeuse pyspark.sql.tests.test_serde test cases
Hyukjin Kwon created SPARK-42009: Summary: Reeuse pyspark.sql.tests.test_serde test cases Key: SPARK-42009 URL: https://issues.apache.org/jira/browse/SPARK-42009 Project: Spark Issue Type: Sub-task Components: Connect, Tests Affects Versions: 3.4.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42007) Reeuse pyspark.sql.tests.test_readwriter test cases
Hyukjin Kwon created SPARK-42007: Summary: Reeuse pyspark.sql.tests.test_readwriter test cases Key: SPARK-42007 URL: https://issues.apache.org/jira/browse/SPARK-42007 Project: Spark Issue Type: Sub-task Components: Connect, Tests Affects Versions: 3.4.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42006) Test parity: pyspark.sql.tests.test_group, test_serde, test_datasources and test_types
Hyukjin Kwon created SPARK-42006: Summary: Test parity: pyspark.sql.tests.test_group, test_serde, test_datasources and test_types Key: SPARK-42006 URL: https://issues.apache.org/jira/browse/SPARK-42006 Project: Spark Issue Type: Umbrella Components: Connect Affects Versions: 3.4.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42006) Test parity: pyspark.sql.tests.test_group, test_serde, test_datasources and test_types
[ https://issues.apache.org/jira/browse/SPARK-42006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-42006: - Epic Link: SPARK-39375 > Test parity: pyspark.sql.tests.test_group, test_serde, test_datasources and > test_types > -- > > Key: SPARK-42006 > URL: https://issues.apache.org/jira/browse/SPARK-42006 > Project: Spark > Issue Type: Umbrella > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42005) SparkR cannot collect dataframe with NA in a date column along with another timestamp column
[ https://issues.apache.org/jira/browse/SPARK-42005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vivek Atal updated SPARK-42005: --- Description: This issue seems to be related with https://issues.apache.org/jira/browse/SPARK-17811, which was resolved by [https://github.com/apache/spark/pull/15421] . If there exists a column of data type `date` which is completely NA, and another column of data type `timestamp`, then SparkR cannot collect that Spark dataframe into R dataframe. The reproducible code snippet is below. {code:java} df <- data.frame(x = as.Date(NA), y = as.POSIXct("2022-01-01")) SparkR::collect(SparkR::createDataFrame(df)) #> Error in handleErrors(returnStatus, conn): org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 25.0 failed 1 times, most recent failure: Lost task 0.0 in stage 25.0 (TID 25) (ip-10-172-210-194.us-west-2.compute.internal executor driver): java.lang.IllegalArgumentException: Invalid type N #> at org.apache.spark.api.r.SerDe$.readTypedObject(SerDe.scala:94) #> at org.apache.spark.api.r.SerDe$.readObject(SerDe.scala:68) #> at #> org.apache.spark.sql.api.r.SQLUtils$.$anonfun$bytesToRow$1(SQLUtils.scala:129) #> at org.apache.spark.sql.api.r.SQLUtils$.$anonfun$bytesToRow$1$adapted(SQLUtils.scala:128) #> at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) #> at scala.collection.immutable.Range.foreach(Range.scala:158) #> ...{code} This issue does not appear If the column of `date` data type is {_}not missing{_}. Or if there _does not exist_ any other column with data type as `timestamp`. {code:java} df <- data.frame(x = as.Date("2022-01-01"), y = as.POSIXct("2022-01-01")) SparkR::collect(SparkR::createDataFrame(df)) #>x y #> 1 2022-01-012022-01-01{code} or {code:java} df <- data.frame(x = as.Date(NA), y = as.character("2022-01-01")) SparkR::collect(SparkR::createDataFrame(df)) #>x y #> 1 2022-01-01{code} was: This issue seems to be related with https://issues.apache.org/jira/browse/SPARK-17811, which was resolved by [https://github.com/apache/spark/pull/15421] . If there exists a column of data type `date` which is completely NA, and another column of data type `timestamp`, the SparkR cannot collect that Spark dataframe into R dataframe. The reproducible code snippet is below. {code:java} df <- data.frame(x = as.Date(NA), y = as.POSIXct("2022-01-01")) SparkR::collect(SparkR::createDataFrame(df)) #> Error in handleErrors(returnStatus, conn): org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 25.0 failed 1 times, most recent failure: Lost task 0.0 in stage 25.0 (TID 25) (ip-10-172-210-194.us-west-2.compute.internal executor driver): java.lang.IllegalArgumentException: Invalid type N #> at org.apache.spark.api.r.SerDe$.readTypedObject(SerDe.scala:94) #> at org.apache.spark.api.r.SerDe$.readObject(SerDe.scala:68) #> at #> org.apache.spark.sql.api.r.SQLUtils$.$anonfun$bytesToRow$1(SQLUtils.scala:129) #> at org.apache.spark.sql.api.r.SQLUtils$.$anonfun$bytesToRow$1$adapted(SQLUtils.scala:128) #> at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) #> at scala.collection.immutable.Range.foreach(Range.scala:158) #> ...{code} This issue does not appear If the column of `date` data type is {_}not missing{_}. Or if there _does not exist_ any other column with data type as `timestamp`. 
{code:java} df <- data.frame(x = as.Date("2022-01-01"), y = as.POSIXct("2022-01-01")) SparkR::collect(SparkR::createDataFrame(df)) #>x y #> 1 2022-01-012022-01-01{code} or {code:java} df <- data.frame(x = as.Date(NA), y = as.character("2022-01-01")) SparkR::collect(SparkR::createDataFrame(df)) #>x y #> 1 2022-01-01{code} > SparkR cannot collect dataframe with NA in a date column along with another > timestamp column > > > Key: SPARK-42005 > URL: https://issues.apache.org/jira/browse/SPARK-42005 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 3.3.0 >Reporter: Vivek Atal >Priority: Major > > This issue seems to be related with > https://issues.apache.org/jira/browse/SPARK-17811, which was resolved by > [https://github.com/apache/spark/pull/15421] . > If there exists a column of data type `date` which is completely NA, and > another column of data type `timestamp`, then SparkR cannot collect that > Spark dataframe into R dataframe. > The reproducible code snippet is below. > {code:java} > df <- data.frame(x = as.Date(NA), y
[jira] [Updated] (SPARK-42005) SparkR cannot collect dataframe with NA in a date column along with another timestamp column
[ https://issues.apache.org/jira/browse/SPARK-42005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vivek Atal updated SPARK-42005: --- Description: This issue seems to be related with https://issues.apache.org/jira/browse/SPARK-17811, which was resolved by [https://github.com/apache/spark/pull/15421] . If there exists a column of data type `date` which is completely NA, and another column of data type `timestamp`, the SparkR cannot collect that Spark dataframe into R dataframe. The reproducible code snippet is below. {code:java} df <- data.frame(x = as.Date(NA), y = as.POSIXct("2022-01-01")) SparkR::collect(SparkR::createDataFrame(df)) #> Error in handleErrors(returnStatus, conn): org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 25.0 failed 1 times, most recent failure: Lost task 0.0 in stage 25.0 (TID 25) (ip-10-172-210-194.us-west-2.compute.internal executor driver): java.lang.IllegalArgumentException: Invalid type N #> at org.apache.spark.api.r.SerDe$.readTypedObject(SerDe.scala:94) #> at org.apache.spark.api.r.SerDe$.readObject(SerDe.scala:68) #> at #> org.apache.spark.sql.api.r.SQLUtils$.$anonfun$bytesToRow$1(SQLUtils.scala:129) #> at org.apache.spark.sql.api.r.SQLUtils$.$anonfun$bytesToRow$1$adapted(SQLUtils.scala:128) #> at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) #> at scala.collection.immutable.Range.foreach(Range.scala:158) #> ...{code} This issue does not appear If the column of `date` data type is {_}not missing{_}. Or if there _does not exist_ any other column with data type as `timestamp`. {code:java} df <- data.frame(x = as.Date("2022-01-01"), y = as.POSIXct("2022-01-01")) SparkR::collect(SparkR::createDataFrame(df)) #>x y #> 1 2022-01-012022-01-01{code} or {code:java} df <- data.frame(x = as.Date(NA), y = as.character("2022-01-01")) SparkR::collect(SparkR::createDataFrame(df)) #>x y #> 1 2022-01-01{code} was: This issue seems to be related with https://issues.apache.org/jira/browse/SPARK-17811, which was resolved by [https://github.com/apache/spark/pull/15421] . If there exists a column of data type `date` which is completely NA, and another column of data type `timestamp`, the SparkR cannot collect that Spark dataframe into R dataframe. The reproducible code snippet is below. {code:java} df <- data.frame(x = as.Date(NA), y = as.POSIXct("2022-01-01")) SparkR::collect(SparkR::createDataFrame(df)) #> Error in handleErrors(returnStatus, conn): org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 25.0 failed 1 times, most recent failure: Lost task 0.0 in stage 25.0 (TID 25) (ip-10-172-210-194.us-west-2.compute.internal executor driver): java.lang.IllegalArgumentException: Invalid type N #> at org.apache.spark.api.r.SerDe$.readTypedObject(SerDe.scala:94) #> at org.apache.spark.api.r.SerDe$.readObject(SerDe.scala:68) #> at #> org.apache.spark.sql.api.r.SQLUtils$.$anonfun$bytesToRow$1(SQLUtils.scala:129) #> at org.apache.spark.sql.api.r.SQLUtils$.$anonfun$bytesToRow$1$adapted(SQLUtils.scala:128) #> at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) #> at scala.collection.immutable.Range.foreach(Range.scala:158) #> ...{code} This issue does not appear If the column of `date` data type is {_}not missing{_}. Or if there _does not exist_ any other column with data type as `timestamp`. 
{code:java} df <- data.frame(x = as.Date("2022-01-01"), y = as.POSIXct("2022-01-01")) SparkR::collect(SparkR::createDataFrame(df)) #>x y #> 1 2022-01-012022-01-01{code} or {code:java} df <- data.frame(x = as.Date(NA), y = as.character("2022-01-01")) SparkR::collect(SparkR::createDataFrame(df)) #>x y #> 1 2022-01-01{code} > SparkR cannot collect dataframe with NA in a date column along with another > timestamp column > > > Key: SPARK-42005 > URL: https://issues.apache.org/jira/browse/SPARK-42005 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 3.3.0 >Reporter: Vivek Atal >Priority: Major > > This issue seems to be related with > https://issues.apache.org/jira/browse/SPARK-17811, which was resolved by > [https://github.com/apache/spark/pull/15421] . > If there exists a column of data type `date` which is completely NA, and > another column of data type `timestamp`, the SparkR cannot collect that Spark > dataframe into R dataframe. > The reproducible code snippet is below. > {code:java} > df <- data.frame(x = as.Date(NA),
[jira] [Created] (SPARK-42005) SparkR cannot collect dataframe with NA in a date column along with another timestamp column
Vivek Atal created SPARK-42005: -- Summary: SparkR cannot collect dataframe with NA in a date column along with another timestamp column Key: SPARK-42005 URL: https://issues.apache.org/jira/browse/SPARK-42005 Project: Spark Issue Type: Bug Components: SparkR Affects Versions: 3.3.0 Reporter: Vivek Atal This issue seems to be related with https://issues.apache.org/jira/browse/SPARK-17811, which was resolved by [https://github.com/apache/spark/pull/15421] . If there exists a column of data type `date` which is completely NA, and another column of data type `timestamp`, the SparkR cannot collect that Spark dataframe into R dataframe. The reproducible code snippet is below. {code:java} df <- data.frame(x = as.Date(NA), y = as.POSIXct("2022-01-01")) SparkR::collect(SparkR::createDataFrame(df)) #> Error in handleErrors(returnStatus, conn): org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 25.0 failed 1 times, most recent failure: Lost task 0.0 in stage 25.0 (TID 25) (ip-10-172-210-194.us-west-2.compute.internal executor driver): java.lang.IllegalArgumentException: Invalid type N #> at org.apache.spark.api.r.SerDe$.readTypedObject(SerDe.scala:94) #> at org.apache.spark.api.r.SerDe$.readObject(SerDe.scala:68) #> at #> org.apache.spark.sql.api.r.SQLUtils$.$anonfun$bytesToRow$1(SQLUtils.scala:129) #> at org.apache.spark.sql.api.r.SQLUtils$.$anonfun$bytesToRow$1$adapted(SQLUtils.scala:128) #> at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) #> at scala.collection.immutable.Range.foreach(Range.scala:158) #> ...{code} This issue does not appear If the column of `date` data type is {_}not missing{_}. Or if there _does not exist_ any other column with data type as `timestamp`. {code:java} df <- data.frame(x = as.Date("2022-01-01"), y = as.POSIXct("2022-01-01")) SparkR::collect(SparkR::createDataFrame(df)) #>x y #> 1 2022-01-012022-01-01{code} or {code:java} df <- data.frame(x = as.Date(NA), y = as.character("2022-01-01")) SparkR::collect(SparkR::createDataFrame(df)) #>x y #> 1 2022-01-01{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41996) KafkaMicroBatchV2SourceSuite failed for topic partitions unavailable test due to kafka operations taking longer
[ https://issues.apache.org/jira/browse/SPARK-41996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-41996. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39520 [https://github.com/apache/spark/pull/39520] > KafkaMicroBatchV2SourceSuite failed for topic partitions unavailable test due > to kafka operations taking longer > --- > > Key: SPARK-41996 > URL: https://issues.apache.org/jira/browse/SPARK-41996 > Project: Spark > Issue Type: Test > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Anish Shrigondekar >Assignee: Anish Shrigondekar >Priority: Major > Fix For: 3.4.0 > > > KafkaMicroBatchV2SourceSuite failed for topic partitions unavailable test due > to kafka operations taking longer -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41996) KafkaMicroBatchV2SourceSuite failed for topic partitions unavailable test due to kafka operations taking longer
[ https://issues.apache.org/jira/browse/SPARK-41996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-41996: Assignee: Anish Shrigondekar > KafkaMicroBatchV2SourceSuite failed for topic partitions unavailable test due > to kafka operations taking longer > --- > > Key: SPARK-41996 > URL: https://issues.apache.org/jira/browse/SPARK-41996 > Project: Spark > Issue Type: Test > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Anish Shrigondekar >Assignee: Anish Shrigondekar >Priority: Major > > KafkaMicroBatchV2SourceSuite failed for topic partitions unavailable test due > to kafka operations taking longer -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42004) Migrate "XX000" sqlState onto `INTERNAL_ERROR`
Haejoon Lee created SPARK-42004: --- Summary: Migrate "XX000" sqlState onto `INTERNAL_ERROR` Key: SPARK-42004 URL: https://issues.apache.org/jira/browse/SPARK-42004 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Haejoon Lee We should migrate "sqlState" : "XX000" onto INTERNAL_ERROR to follow the standard (this is what PostgreSQL does). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41990) Filtering by composite field name like `field name` doesn't work with pushDownPredicate = true
[ https://issues.apache.org/jira/browse/SPARK-41990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17675795#comment-17675795 ] Apache Spark commented on SPARK-41990: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/39524 > Filtering by composite field name like `field name` doesn't work with > pushDownPredicate = true > -- > > Key: SPARK-41990 > URL: https://issues.apache.org/jira/browse/SPARK-41990 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.3.1 >Reporter: Marina Krasilnikova >Priority: Major > > Suppose we have some table in postgresql with field `Last Name` The following > code results in error > Dataset dataset = sparkSession.read() > .format("jdbc") > .option("url", myUrl) > .option("dbtable", "myTable") > .option("user", "myUser") > .option("password", "muPassword") > .load(); > dataset.where("`Last Name`='Tessel'").show(); //error > > > Exception in thread "main" > org.apache.spark.sql.catalyst.parser.ParseException: > Syntax error at or near 'Name': extra input 'Name'(line 1, pos 5) > == SQL == > Last Name > -^^^ > at > org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:306) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:143) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseMultipartIdentifier(ParseDriver.scala:67) > at > org.apache.spark.sql.connector.expressions.LogicalExpressions$.parseReference(expressions.scala:40) > at > org.apache.spark.sql.connector.expressions.FieldReference$.apply(expressions.scala:368) > at org.apache.spark.sql.sources.IsNotNull.toV2(filters.scala:262) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.$anonfun$unhandledFilters$1(JDBCRelation.scala:278) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.$anonfun$unhandledFilters$1$adapted(JDBCRelation.scala:278) > > But if we set pushDownPredicate to false everything works fine. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41990) Filtering by composite field name like `field name` doesn't work with pushDownPredicate = true
[ https://issues.apache.org/jira/browse/SPARK-41990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41990: Assignee: Apache Spark > Filtering by composite field name like `field name` doesn't work with > pushDownPredicate = true > -- > > Key: SPARK-41990 > URL: https://issues.apache.org/jira/browse/SPARK-41990 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.3.1 >Reporter: Marina Krasilnikova >Assignee: Apache Spark >Priority: Major > > Suppose we have some table in postgresql with field `Last Name` The following > code results in error > Dataset dataset = sparkSession.read() > .format("jdbc") > .option("url", myUrl) > .option("dbtable", "myTable") > .option("user", "myUser") > .option("password", "muPassword") > .load(); > dataset.where("`Last Name`='Tessel'").show(); //error > > > Exception in thread "main" > org.apache.spark.sql.catalyst.parser.ParseException: > Syntax error at or near 'Name': extra input 'Name'(line 1, pos 5) > == SQL == > Last Name > -^^^ > at > org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:306) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:143) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseMultipartIdentifier(ParseDriver.scala:67) > at > org.apache.spark.sql.connector.expressions.LogicalExpressions$.parseReference(expressions.scala:40) > at > org.apache.spark.sql.connector.expressions.FieldReference$.apply(expressions.scala:368) > at org.apache.spark.sql.sources.IsNotNull.toV2(filters.scala:262) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.$anonfun$unhandledFilters$1(JDBCRelation.scala:278) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.$anonfun$unhandledFilters$1$adapted(JDBCRelation.scala:278) > > But if we set pushDownPredicate to false everything works fine. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41990) Filtering by composite field name like `field name` doesn't work with pushDownPredicate = true
[ https://issues.apache.org/jira/browse/SPARK-41990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41990: Assignee: (was: Apache Spark) > Filtering by composite field name like `field name` doesn't work with > pushDownPredicate = true > -- > > Key: SPARK-41990 > URL: https://issues.apache.org/jira/browse/SPARK-41990 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.3.1 >Reporter: Marina Krasilnikova >Priority: Major > > Suppose we have some table in postgresql with field `Last Name` The following > code results in error > Dataset dataset = sparkSession.read() > .format("jdbc") > .option("url", myUrl) > .option("dbtable", "myTable") > .option("user", "myUser") > .option("password", "muPassword") > .load(); > dataset.where("`Last Name`='Tessel'").show(); //error > > > Exception in thread "main" > org.apache.spark.sql.catalyst.parser.ParseException: > Syntax error at or near 'Name': extra input 'Name'(line 1, pos 5) > == SQL == > Last Name > -^^^ > at > org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:306) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:143) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parseMultipartIdentifier(ParseDriver.scala:67) > at > org.apache.spark.sql.connector.expressions.LogicalExpressions$.parseReference(expressions.scala:40) > at > org.apache.spark.sql.connector.expressions.FieldReference$.apply(expressions.scala:368) > at org.apache.spark.sql.sources.IsNotNull.toV2(filters.scala:262) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.$anonfun$unhandledFilters$1(JDBCRelation.scala:278) > at > org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.$anonfun$unhandledFilters$1$adapted(JDBCRelation.scala:278) > > But if we set pushDownPredicate to false everything works fine. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
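The workaround from the report, translated to PySpark: pushDownPredicate is a documented JDBC source option, and disabling it keeps the backtick-quoted filter on the Spark side instead of routing it through the V2 expression parser. The connection details below are placeholders:
{code:python}
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://myHost:5432/myDb")  # placeholder URL
    .option("dbtable", "myTable")
    .option("user", "myUser")
    .option("password", "myPassword")
    .option("pushDownPredicate", "false")  # the workaround from the report
    .load()
)
df.where("`Last Name` = 'Tessel'").show()  # no longer hits the ParseException
{code}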
[jira] [Assigned] (SPARK-42003) Reduce duplicate code in ResolveGroupByAll
[ https://issues.apache.org/jira/browse/SPARK-42003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42003: Assignee: Gengliang Wang (was: Apache Spark) > Reduce duplicate code in ResolveGroupByAll > -- > > Key: SPARK-42003 > URL: https://issues.apache.org/jira/browse/SPARK-42003 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42003) Reduce duplicate code in ResolveGroupByAll
[ https://issues.apache.org/jira/browse/SPARK-42003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42003: Assignee: Apache Spark (was: Gengliang Wang) > Reduce duplicate code in ResolveGroupByAll > -- > > Key: SPARK-42003 > URL: https://issues.apache.org/jira/browse/SPARK-42003 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42003) Reduce duplicate code in ResolveGroupByAll
[ https://issues.apache.org/jira/browse/SPARK-42003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17675792#comment-17675792 ] Apache Spark commented on SPARK-42003: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/39523 > Reduce duplicate code in ResolveGroupByAll > -- > > Key: SPARK-42003 > URL: https://issues.apache.org/jira/browse/SPARK-42003 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42003) Reduce duplicate code in ResolveGroupByAll
Gengliang Wang created SPARK-42003: -- Summary: Reduce duplicate code in ResolveGroupByAll Key: SPARK-42003 URL: https://issues.apache.org/jira/browse/SPARK-42003 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.4.0 Reporter: Gengliang Wang Assignee: Gengliang Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41998) Reeuse test_readwriter test cases
[ https://issues.apache.org/jira/browse/SPARK-41998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17675788#comment-17675788 ] Apache Spark commented on SPARK-41998: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/39522 > Reeuse test_readwriter test cases > - > > Key: SPARK-41998 > URL: https://issues.apache.org/jira/browse/SPARK-41998 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41998) Reeuse test_readwriter test cases
[ https://issues.apache.org/jira/browse/SPARK-41998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41998: Assignee: Apache Spark > Reeuse test_readwriter test cases > - > > Key: SPARK-41998 > URL: https://issues.apache.org/jira/browse/SPARK-41998 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41998) Reeuse test_readwriter test cases
[ https://issues.apache.org/jira/browse/SPARK-41998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41998: Assignee: (was: Apache Spark) > Reeuse test_readwriter test cases > - > > Key: SPARK-41998 > URL: https://issues.apache.org/jira/browse/SPARK-41998 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42002) Implement DataFrameWriterV2 (ReadwriterV2Tests)
Hyukjin Kwon created SPARK-42002: Summary: Implement DataFrameWriterV2 (ReadwriterV2Tests) Key: SPARK-42002 URL: https://issues.apache.org/jira/browse/SPARK-42002 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Hyukjin Kwon {code} pyspark/sql/tests/test_readwriter.py:182 (ReadwriterV2ParityTests.test_api) self = def test_api(self): df = self.df > writer = df.writeTo("testcat.t") ../test_readwriter.py:185: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = DataFrame[key: bigint, value: string], args = ('testcat.t',), kwargs = {} def writeTo(self, *args: Any, **kwargs: Any) -> None: > raise NotImplementedError("writeTo() is not implemented.") E NotImplementedError: writeTo() is not implemented. ../../connect/dataframe.py:1529: NotImplementedError {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
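For reference, the classic DataFrameWriterV2 surface that Connect's writeTo() needs to cover; this assumes a DataFrame bound to df and a session with a catalog named testcat configured:
{code:python}
# Classic DataFrameWriterV2 calls (real PySpark APIs) behind writeTo().
writer = df.writeTo("testcat.t")
writer.create()                                # CREATE TABLE AS SELECT
df.writeTo("testcat.t").append()               # INSERT INTO
df.writeTo("testcat.t").overwritePartitions()  # dynamic partition overwrite
{code}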
[jira] [Created] (SPARK-42001) Unexpected schema set to DefaultSource plan (ReadwriterTests.test_save_and_load)
Hyukjin Kwon created SPARK-42001: Summary: Unexpected schema set to DefaultSource plan (ReadwriterTests.test_save_and_load) Key: SPARK-42001 URL: https://issues.apache.org/jira/browse/SPARK-42001 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Hyukjin Kwon {code} pyspark/sql/tests/test_readwriter.py:28 (ReadwriterParityTests.test_save_and_load) self = def test_save_and_load(self): df = self.df tmpPath = tempfile.mkdtemp() shutil.rmtree(tmpPath) df.write.json(tmpPath) actual = self.spark.read.json(tmpPath) self.assertEqual(sorted(df.collect()), sorted(actual.collect())) schema = StructType([StructField("value", StringType(), True)]) actual = self.spark.read.json(tmpPath, schema) > self.assertEqual(sorted(df.select("value").collect()), > sorted(actual.collect())) ../test_readwriter.py:39: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ ../../connect/dataframe.py:1246: in collect query = self._plan.to_proto(self._session.client) ../../connect/plan.py:93: in to_proto plan.root.CopyFrom(self.plan(session)) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = session = def plan(self, session: "SparkConnectClient") -> proto.Relation: plan = proto.Relation() if self.format is not None: plan.read.data_source.format = self.format if self.schema is not None: > plan.read.data_source.schema = self.schema E TypeError: StructType([StructField('value', StringType(), True)]) has type StructType, but expected one of: bytes, unicode ../../connect/plan.py:246: TypeError {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
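The TypeError shows the proto field data_source.schema is string-typed while the client assigned a StructType directly. One plausible fix shape, as an assumption rather than the merged patch, is to serialize the schema before assignment; schema_to_proto_string below is a hypothetical helper:
{code:python}
from typing import Union
from pyspark.sql.types import StructType

def schema_to_proto_string(schema: Union[str, StructType]) -> str:
    # StructType.json() is a real PySpark API producing a string the server
    # can re-parse; plain strings (DDL) pass through unchanged.
    return schema.json() if isinstance(schema, StructType) else schema
{code}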
[jira] [Updated] (SPARK-42000) saveAsTable fail to find the default source (ReadwriterTests.test_insert_into)
[ https://issues.apache.org/jira/browse/SPARK-42000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-42000: - Summary: saveAsTable fail to find the default source (ReadwriterTests.test_insert_into) (was: saveAsTable fail to find the default source) > saveAsTable fail to find the default source (ReadwriterTests.test_insert_into) > -- > > Key: SPARK-42000 > URL: https://issues.apache.org/jira/browse/SPARK-42000 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed > to find the data source: . Please find packages at > `https://spark.apache.org/third-party-projects.html`. > at > org.apache.spark.sql.errors.QueryExecutionErrors$.dataSourceNotFoundError(QueryExecutionErrors.scala:739) > at > org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:646) > at > org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:696) > at > org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:860) > at > org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:559) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.handleWriteOperation(SparkConnectPlanner.scala:1426) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.process(SparkConnectPlanner.scala:1297) > at > org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handleCommand(SparkConnectStreamHandler.scala:182) > at > org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handle(SparkConnectStreamHandler.scala:48) > at > org.apache.spark.sql.connect.service.SparkConnectService.executePlan(SparkConnectService.scala:135) > at > org.apache.spark.connect.proto.SparkConnectServiceGrpc$MethodHandlers.invoke(SparkConnectServiceGrpc.java:306) > at > org.sparkproject.connect.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182) > at > org.sparkproject.connect.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:352) > at > org.sparkproject.connect.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:866) > at > org.sparkproject.connect.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) > at > org.sparkproject.connect.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.ClassNotFoundException: .DefaultSource > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:418) > at java.lang.ClassLoader.loadClass(ClassLoader.java:351) > at > org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:632) > at scala.util.Try$.apply(Try.scala:213) > at > org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:632) > at scala.util.Failure.orElse(Try.scala:224) > at > org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:632) > ... 
17 more > pyspark/sql/tests/test_readwriter.py:159 > (ReadwriterParityTests.test_insert_into) > self = > testMethod=test_insert_into> > def test_insert_into(self): > df = self.spark.createDataFrame([("a", 1), ("b", 2)], ["C1", "C2"]) > with self.table("test_table"): > > df.write.saveAsTable("test_table") > ../test_readwriter.py:163: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > ../../connect/readwriter.py:381: in saveAsTable > > self._spark.client.execute_command(self._write.command(self._spark.client)) > ../../connect/client.py:478: in execute_command > self._execute(req) > ../../connect/client.py:562: in _execute > self._handle_error(rpc_error) > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > self = 0x7fe0d069b5b0> > rpc_error = <_MultiThreadedRendezvous of RPC that terminated with: > status = StatusCode.INTERNAL > details = ".DefaultSource" > debu...pv6:%5B::1%5D:15002 > {created_time:"2023-01-12T11:27:46.698322+09:00", grpc_status:13, > grpc_message:".DefaultSource"}" > > > def
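The .DefaultSource in the ClassNotFoundException means the write command arrived with an empty source name, which lookupDataSource suffixed with ".DefaultSource". As a hedged sketch, not the merged fix, naming the source explicitly on the client side avoids resolving the empty default:
{code:python}
# Hypothetical workaround sketch: pin the format so the server never
# resolves an empty default source name.
df = spark.createDataFrame([("a", 1), ("b", 2)], ["C1", "C2"])
df.write.format("parquet").saveAsTable("test_table")
{code}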
[jira] [Updated] (SPARK-41999) NPE for bucketed write (ReadwriterTests.test_bucketed_write)
[ https://issues.apache.org/jira/browse/SPARK-41999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-41999: - Summary: NPE for bucketed write (ReadwriterTests.test_bucketed_write) (was: NPE for bucketed write) > NPE for bucketed write (ReadwriterTests.test_bucketed_write) > > > Key: SPARK-41999 > URL: https://issues.apache.org/jira/browse/SPARK-41999 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > java.util.NoSuchElementException > at java.util.AbstractList$Itr.next(AbstractList.java:364) > at > scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:46) > at scala.collection.IterableLike.head(IterableLike.scala:109) > at scala.collection.IterableLike.head$(IterableLike.scala:108) > at scala.collection.AbstractIterable.head(Iterable.scala:56) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.handleWriteOperation(SparkConnectPlanner.scala:1411) > at > org.apache.spark.sql.connect.planner.SparkConnectPlanner.process(SparkConnectPlanner.scala:1297) > at > org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handleCommand(SparkConnectStreamHandler.scala:182) > at > org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handle(SparkConnectStreamHandler.scala:48) > at > org.apache.spark.sql.connect.service.SparkConnectService.executePlan(SparkConnectService.scala:135) > at > org.apache.spark.connect.proto.SparkConnectServiceGrpc$MethodHandlers.invoke(SparkConnectServiceGrpc.java:306) > at > org.sparkproject.connect.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182) > at > org.sparkproject.connect.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:352) > at > org.sparkproject.connect.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:866) > at > org.sparkproject.connect.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) > at > org.sparkproject.connect.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 23/01/12 11:27:45 ERROR SerializingExecutor: Exception while executing > runnable > org.sparkproject.connect.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed@6c9d5784 > java.lang.NullPointerException > at > org.sparkproject.connect.google_protos.rpc.Status$Builder.setMessage(Status.java:783) > at > org.apache.spark.sql.connect.service.SparkConnectService$$anonfun$handleError$1.applyOrElse(SparkConnectService.scala:112) > at > org.apache.spark.sql.connect.service.SparkConnectService$$anonfun$handleError$1.applyOrElse(SparkConnectService.scala:85) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) > at > org.apache.spark.sql.connect.service.SparkConnectService.executePlan(SparkConnectService.scala:136) > at > org.apache.spark.connect.proto.SparkConnectServiceGrpc$MethodHandlers.invoke(SparkConnectServiceGrpc.java:306) > at > org.sparkproject.connect.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182) > at > org.sparkproject.connect.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:352) > 
at > org.sparkproject.connect.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:866) > at > org.sparkproject.connect.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) > at > org.sparkproject.connect.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > pyspark/sql/tests/test_readwriter.py:102 > (ReadwriterParityTests.test_bucketed_write) > self = > testMethod=test_bucketed_write> > def test_bucketed_write(self): > data = [ > (1, "foo", 3.0), > (2, "foo", 5.0), > (3, "bar", -1.0), > (4, "bar", 6.0), > ] > df = self.spark.createDataFrame(data, ["x", "y", "z"]) > > def count_bucketed_cols(names,
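For reference, the failing client call from test_bucketed_write reduces to the sketch below (data and table name are copied from the test; the comment on the server-side cause is an inference from the stack trace, not a confirmed diagnosis):

{code:python}
df = spark.createDataFrame(
    [(1, "foo", 3.0), (2, "foo", 5.0), (3, "bar", -1.0), (4, "bar", 6.0)],
    ["x", "y", "z"],
)

# Server side, SparkConnectPlanner.handleWriteOperation calls .head on the
# bucketing columns; an empty or missing bucket spec in the proto raises
# java.util.NoSuchElementException, which the error handler then turns into
# the NullPointerException logged above.
df.write.bucketBy(3, "x").mode("overwrite").saveAsTable("pyspark_bucket")
{code}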
[jira] [Created] (SPARK-42000) saveAsTable fail to find the default source
Hyukjin Kwon created SPARK-42000: Summary: saveAsTable fail to find the default source Key: SPARK-42000 URL: https://issues.apache.org/jira/browse/SPARK-42000 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Hyukjin Kwon {code} org.apache.spark.SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: . Please find packages at `https://spark.apache.org/third-party-projects.html`. at org.apache.spark.sql.errors.QueryExecutionErrors$.dataSourceNotFoundError(QueryExecutionErrors.scala:739) at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:646) at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:696) at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:860) at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:559) at org.apache.spark.sql.connect.planner.SparkConnectPlanner.handleWriteOperation(SparkConnectPlanner.scala:1426) at org.apache.spark.sql.connect.planner.SparkConnectPlanner.process(SparkConnectPlanner.scala:1297) at org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handleCommand(SparkConnectStreamHandler.scala:182) at org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handle(SparkConnectStreamHandler.scala:48) at org.apache.spark.sql.connect.service.SparkConnectService.executePlan(SparkConnectService.scala:135) at org.apache.spark.connect.proto.SparkConnectServiceGrpc$MethodHandlers.invoke(SparkConnectServiceGrpc.java:306) at org.sparkproject.connect.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182) at org.sparkproject.connect.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:352) at org.sparkproject.connect.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:866) at org.sparkproject.connect.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) at org.sparkproject.connect.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.ClassNotFoundException: .DefaultSource at java.net.URLClassLoader.findClass(URLClassLoader.java:382) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at java.lang.ClassLoader.loadClass(ClassLoader.java:351) at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:632) at scala.util.Try$.apply(Try.scala:213) at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:632) at scala.util.Failure.orElse(Try.scala:224) at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:632) ... 
17 more pyspark/sql/tests/test_readwriter.py:159 (ReadwriterParityTests.test_insert_into) self = def test_insert_into(self): df = self.spark.createDataFrame([("a", 1), ("b", 2)], ["C1", "C2"]) with self.table("test_table"): > df.write.saveAsTable("test_table") ../test_readwriter.py:163: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ ../../connect/readwriter.py:381: in saveAsTable self._spark.client.execute_command(self._write.command(self._spark.client)) ../../connect/client.py:478: in execute_command self._execute(req) ../../connect/client.py:562: in _execute self._handle_error(rpc_error) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = rpc_error = <_MultiThreadedRendezvous of RPC that terminated with: status = StatusCode.INTERNAL details = ".DefaultSource" debu...pv6:%5B::1%5D:15002 {created_time:"2023-01-12T11:27:46.698322+09:00", grpc_status:13, grpc_message:".DefaultSource"}" > def _handle_error(self, rpc_error: grpc.RpcError) -> NoReturn: """ Error handling helper for dealing with GRPC Errors. On the server side, certain exceptions are enriched with additional RPC Status information. These are unpacked in this function and put into the exception. To avoid overloading the user with GRPC errors, this message explicitly swallows the error context from the call. This GRPC Error is logged however, and can be enabled. Parameters -- rpc_error :
[jira] [Created] (SPARK-41999) NPE for bucketed write
Hyukjin Kwon created SPARK-41999: Summary: NPE for bucketed write Key: SPARK-41999 URL: https://issues.apache.org/jira/browse/SPARK-41999 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Hyukjin Kwon {code} java.util.NoSuchElementException at java.util.AbstractList$Itr.next(AbstractList.java:364) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:46) at scala.collection.IterableLike.head(IterableLike.scala:109) at scala.collection.IterableLike.head$(IterableLike.scala:108) at scala.collection.AbstractIterable.head(Iterable.scala:56) at org.apache.spark.sql.connect.planner.SparkConnectPlanner.handleWriteOperation(SparkConnectPlanner.scala:1411) at org.apache.spark.sql.connect.planner.SparkConnectPlanner.process(SparkConnectPlanner.scala:1297) at org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handleCommand(SparkConnectStreamHandler.scala:182) at org.apache.spark.sql.connect.service.SparkConnectStreamHandler.handle(SparkConnectStreamHandler.scala:48) at org.apache.spark.sql.connect.service.SparkConnectService.executePlan(SparkConnectService.scala:135) at org.apache.spark.connect.proto.SparkConnectServiceGrpc$MethodHandlers.invoke(SparkConnectServiceGrpc.java:306) at org.sparkproject.connect.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182) at org.sparkproject.connect.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:352) at org.sparkproject.connect.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:866) at org.sparkproject.connect.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) at org.sparkproject.connect.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 23/01/12 11:27:45 ERROR SerializingExecutor: Exception while executing runnable org.sparkproject.connect.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed@6c9d5784 java.lang.NullPointerException at org.sparkproject.connect.google_protos.rpc.Status$Builder.setMessage(Status.java:783) at org.apache.spark.sql.connect.service.SparkConnectService$$anonfun$handleError$1.applyOrElse(SparkConnectService.scala:112) at org.apache.spark.sql.connect.service.SparkConnectService$$anonfun$handleError$1.applyOrElse(SparkConnectService.scala:85) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38) at org.apache.spark.sql.connect.service.SparkConnectService.executePlan(SparkConnectService.scala:136) at org.apache.spark.connect.proto.SparkConnectServiceGrpc$MethodHandlers.invoke(SparkConnectServiceGrpc.java:306) at org.sparkproject.connect.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182) at org.sparkproject.connect.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:352) at org.sparkproject.connect.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:866) at org.sparkproject.connect.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) at org.sparkproject.connect.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) pyspark/sql/tests/test_readwriter.py:102 (ReadwriterParityTests.test_bucketed_write) self = def test_bucketed_write(self): data = [ (1, "foo", 3.0), (2, "foo", 5.0), (3, "bar", -1.0), (4, "bar", 6.0), ] df = self.spark.createDataFrame(data, ["x", "y", "z"]) def count_bucketed_cols(names, table="pyspark_bucket"): """Given a sequence of column names and a table name query the catalog and return number of columns which are used for bucketing """ cols = self.spark.catalog.listColumns(table) num = len([c for c in cols if c.name in names and c.isBucket]) return num with self.table("pyspark_bucket"): # Test write with one bucketing column > df.write.bucketBy(3, >
[jira] [Updated] (SPARK-41997) Test parity: pyspark.sql.tests.test_readwriter
[ https://issues.apache.org/jira/browse/SPARK-41997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-41997: - Description: See https://issues.apache.org/jira/browse/SPARK-41652 and https://issues.apache.org/jira/browse/SPARK-41651 > Test parity: pyspark.sql.tests.test_readwriter > -- > > Key: SPARK-41997 > URL: https://issues.apache.org/jira/browse/SPARK-41997 > Project: Spark > Issue Type: Umbrella > Components: Connect >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > > See https://issues.apache.org/jira/browse/SPARK-41652 and > https://issues.apache.org/jira/browse/SPARK-41651 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41998) Reuse test_readwriter test cases
Hyukjin Kwon created SPARK-41998: Summary: Reuse test_readwriter test cases Key: SPARK-41998 URL: https://issues.apache.org/jira/browse/SPARK-41998 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41997) Test parity: pyspark.sql.tests.test_readwriter
Hyukjin Kwon created SPARK-41997: Summary: Test parity: pyspark.sql.tests.test_readwriter Key: SPARK-41997 URL: https://issues.apache.org/jira/browse/SPARK-41997 Project: Spark Issue Type: Umbrella Components: Connect Affects Versions: 3.4.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41887) Support DataFrame hint parameter to be list
[ https://issues.apache.org/jira/browse/SPARK-41887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17675783#comment-17675783 ] Apache Spark commented on SPARK-41887: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/39521 > Support DataFrame hint parameter to be list > --- > > Key: SPARK-41887 > URL: https://issues.apache.org/jira/browse/SPARK-41887 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > > {code:java} > df = self.spark.range(10e10).toDF("id") > such_a_nice_list = ["itworks1", "itworks2", "itworks3"] > hinted_df = df.hint("my awesome hint", 1.2345, "what", such_a_nice_list){code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
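The snippet in the description is the whole behavior change: list arguments were previously rejected as hint parameters. A minimal sketch of the accepted form after the fix (the hint name is arbitrary; unresolved hints are simply dropped with a warning at analysis time):

{code:python}
df = spark.range(10).toDF("id")
such_a_nice_list = ["itworks1", "itworks2", "itworks3"]

# Scalars and a list can now be mixed freely as hint parameters.
hinted_df = df.hint("my awesome hint", 1.2345, "what", such_a_nice_list)
hinted_df.explain(True)  # the UnresolvedHint node carries the list parameter
{code}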
[jira] [Commented] (SPARK-41996) KafkaMicroBatchV2SourceSuite failed for topic partitions unavailable test due to kafka operations taking longer
[ https://issues.apache.org/jira/browse/SPARK-41996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17675778#comment-17675778 ] Apache Spark commented on SPARK-41996: -- User 'anishshri-db' has created a pull request for this issue: https://github.com/apache/spark/pull/39520 > KafkaMicroBatchV2SourceSuite failed for topic partitions unavailable test due > to kafka operations taking longer > --- > > Key: SPARK-41996 > URL: https://issues.apache.org/jira/browse/SPARK-41996 > Project: Spark > Issue Type: Test > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Anish Shrigondekar >Priority: Major > > KafkaMicroBatchV2SourceSuite failed for topic partitions unavailable test due > to kafka operations taking longer -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41996) KafkaMicroBatchV2SourceSuite failed for topic partitions unavailable test due to kafka operations taking longer
[ https://issues.apache.org/jira/browse/SPARK-41996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41996: Assignee: (was: Apache Spark) > KafkaMicroBatchV2SourceSuite failed for topic partitions unavailable test due > to kafka operations taking longer > --- > > Key: SPARK-41996 > URL: https://issues.apache.org/jira/browse/SPARK-41996 > Project: Spark > Issue Type: Test > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Anish Shrigondekar >Priority: Major > > KafkaMicroBatchV2SourceSuite failed for topic partitions unavailable test due > to kafka operations taking longer -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41996) KafkaMicroBatchV2SourceSuite failed for topic partitions unavailable test due to kafka operations taking longer
[ https://issues.apache.org/jira/browse/SPARK-41996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41996: Assignee: Apache Spark > KafkaMicroBatchV2SourceSuite failed for topic partitions unavailable test due > to kafka operations taking longer > --- > > Key: SPARK-41996 > URL: https://issues.apache.org/jira/browse/SPARK-41996 > Project: Spark > Issue Type: Test > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Anish Shrigondekar >Assignee: Apache Spark >Priority: Major > > KafkaMicroBatchV2SourceSuite failed for topic partitions unavailable test due > to kafka operations taking longer -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41996) KafkaMicroBatchV2SourceSuite failed for topic partitions unavailable test due to kafka operations taking longer
[ https://issues.apache.org/jira/browse/SPARK-41996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17675777#comment-17675777 ] Apache Spark commented on SPARK-41996: -- User 'anishshri-db' has created a pull request for this issue: https://github.com/apache/spark/pull/39520 > KafkaMicroBatchV2SourceSuite failed for topic partitions unavailable test due > to kafka operations taking longer > --- > > Key: SPARK-41996 > URL: https://issues.apache.org/jira/browse/SPARK-41996 > Project: Spark > Issue Type: Test > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Anish Shrigondekar >Priority: Major > > KafkaMicroBatchV2SourceSuite failed for topic partitions unavailable test due > to kafka operations taking longer -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41980) Enable test_functions_broadcast
[ https://issues.apache.org/jira/browse/SPARK-41980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41980. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39500 [https://github.com/apache/spark/pull/39500] > Enable test_functions_broadcast > --- > > Key: SPARK-41980 > URL: https://issues.apache.org/jira/browse/SPARK-41980 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41980) Enable test_functions_broadcast
[ https://issues.apache.org/jira/browse/SPARK-41980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41980: Assignee: Hyukjin Kwon > Enable test_functions_broadcast > --- > > Key: SPARK-41980 > URL: https://issues.apache.org/jira/browse/SPARK-41980 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
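For context, test_functions_broadcast covers pyspark.sql.functions.broadcast, roughly the shape sketched below (the DataFrames are illustrative, not the test's exact fixtures):

{code:python}
from pyspark.sql.functions import broadcast

small = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "v"])
large = spark.range(1000).toDF("id")

# broadcast() marks the small side so the optimizer plans a broadcast join.
joined = large.join(broadcast(small), "id")
joined.explain()  # expect a broadcast join in the physical plan
{code}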
[jira] [Commented] (SPARK-41996) KafkaMicroBatchV2SourceSuite failed for topic partitions unavailable test due to kafka operations taking longer
[ https://issues.apache.org/jira/browse/SPARK-41996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17675764#comment-17675764 ] Anish Shrigondekar commented on SPARK-41996: Have the fix and will send the PR out soon cc - [~kabhwan] > KafkaMicroBatchV2SourceSuite failed for topic partitions unavailable test due > to kafka operations taking longer > --- > > Key: SPARK-41996 > URL: https://issues.apache.org/jira/browse/SPARK-41996 > Project: Spark > Issue Type: Test > Components: Structured Streaming >Affects Versions: 3.4.0 >Reporter: Anish Shrigondekar >Priority: Major > > KafkaMicroBatchV2SourceSuite failed for topic partitions unavailable test due > to kafka operations taking longer -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41996) KafkaMicroBatchV2SourceSuite failed for topic partitions unavailable test due to kafka operations taking longer
Anish Shrigondekar created SPARK-41996: -- Summary: KafkaMicroBatchV2SourceSuite failed for topic partitions unavailable test due to kafka operations taking longer Key: SPARK-41996 URL: https://issues.apache.org/jira/browse/SPARK-41996 Project: Spark Issue Type: Test Components: Structured Streaming Affects Versions: 3.4.0 Reporter: Anish Shrigondekar KafkaMicroBatchV2SourceSuite failed for topic partitions unavailable test due to kafka operations taking longer -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41591) Implement functionality for training a PyTorch file locally
[ https://issues.apache.org/jira/browse/SPARK-41591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41591. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39188 [https://github.com/apache/spark/pull/39188] > Implement functionality for training a PyTorch file locally > --- > > Key: SPARK-41591 > URL: https://issues.apache.org/jira/browse/SPARK-41591 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Assignee: Rithwik Ediga Lakhamsani >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41591) Implement functionality for training a PyTorch file locally
[ https://issues.apache.org/jira/browse/SPARK-41591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41591: Assignee: Rithwik Ediga Lakhamsani > Implement functionality for training a PyTorch file locally > --- > > Key: SPARK-41591 > URL: https://issues.apache.org/jira/browse/SPARK-41591 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Assignee: Rithwik Ediga Lakhamsani >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
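The feature lands as a local mode on the TorchDistributor; the sketch below shows the expected call shape (the training-file path and its arguments are placeholders, and the constructor defaults should be checked against the merged PR):

{code:python}
from pyspark.ml.torch.distributor import TorchDistributor

# local_mode=True runs the workers on the driver machine rather than
# scheduling them onto executors; useful for debugging a training script.
distributor = TorchDistributor(num_processes=2, local_mode=True, use_gpu=False)
distributor.run("/path/to/train.py", "--epochs", "1")
{code}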
[jira] [Commented] (SPARK-41995) schema_of_json only accepts foldable expressions
[ https://issues.apache.org/jira/browse/SPARK-41995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17675745#comment-17675745 ] Apache Spark commented on SPARK-41995: -- User 'eric-maynard' has created a pull request for this issue: https://github.com/apache/spark/pull/39519 > schema_of_json only accepts foldable expressions > > > Key: SPARK-41995 > URL: https://issues.apache.org/jira/browse/SPARK-41995 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.1 >Reporter: Eric Maynard >Priority: Major > > Right now schema_of_json only accepts foldable expressions, or literals. But > it could be extended to accept any arbitrary expression. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41995) schema_of_json only accepts foldable expressions
[ https://issues.apache.org/jira/browse/SPARK-41995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17675744#comment-17675744 ] Apache Spark commented on SPARK-41995: -- User 'eric-maynard' has created a pull request for this issue: https://github.com/apache/spark/pull/39519 > schema_of_json only accepts foldable expressions > > > Key: SPARK-41995 > URL: https://issues.apache.org/jira/browse/SPARK-41995 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.1 >Reporter: Eric Maynard >Priority: Major > > Right now schema_of_json only accepts foldable expressions, or literals. But > it could be extended to accept any arbitrary expression. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41995) schema_of_json only accepts foldable expressions
[ https://issues.apache.org/jira/browse/SPARK-41995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41995: Assignee: (was: Apache Spark) > schema_of_json only accepts foldable expressions > > > Key: SPARK-41995 > URL: https://issues.apache.org/jira/browse/SPARK-41995 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.1 >Reporter: Eric Maynard >Priority: Major > > Right now schema_of_json only accepts foldable expressions, or literals. But > it could be extended to accept any arbitrary expression. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
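The restriction is easy to see from Python; a minimal sketch (column name and data are illustrative) of what works today versus what the linked PR would allow:

{code:python}
from pyspark.sql.functions import schema_of_json, lit, col

df = spark.createDataFrame([('{"a": 1}',)], ["js"])

# Foldable argument: accepted today.
df.select(schema_of_json(lit('{"a": 1}'))).show()

# Non-foldable column argument: raises AnalysisException on 3.3;
# the proposal is to accept arbitrary expressions like this one.
# df.select(schema_of_json(col("js"))).show()
{code}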