[jira] [Created] (SPARK-48253) Support default mode for Pandas API on Spark
Haejoon Lee created SPARK-48253: --- Summary: Support default mode for Pandas API on Spark Key: SPARK-48253 URL: https://issues.apache.org/jira/browse/SPARK-48253 Project: Spark Issue Type: Bug Components: Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee To reduce the communication cost between the Python process and the JVM, we suggest supporting a default mode. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48183) Update error contribution guide to respect new error class file
Haejoon Lee created SPARK-48183: --- Summary: Update error contribution guide to respect new error class file Key: SPARK-48183 URL: https://issues.apache.org/jira/browse/SPARK-48183 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee We moved the error class definitions from .py to .json, but the documentation still describes the old behavior. We should update it.
[jira] [Created] (SPARK-47948) Bump Pandas to 2.0.0
Haejoon Lee created SPARK-47948: --- Summary: Bump Pandas to 2.0.0 Key: SPARK-47948 URL: https://issues.apache.org/jira/browse/SPARK-47948 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee Bump the minimum version of Pandas from 1.4.4 to 2.0.0 to support the Pandas API on Spark in Apache Spark 4.0.0.
[jira] [Created] (SPARK-47864) Enhance "Installation" page to cover all installable options
Haejoon Lee created SPARK-47864: --- Summary: Enhance "Installation" page to cover all installable options Key: SPARK-47864 URL: https://issues.apache.org/jira/browse/SPARK-47864 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee Like the Installation page for Pandas, we should cover all installable options and their related dependencies in our Installation documentation.
[jira] [Created] (SPARK-47858) Refactoring the structure for DataFrame error context
Haejoon Lee created SPARK-47858: --- Summary: Refactoring the structure for DataFrame error context Key: SPARK-47858 URL: https://issues.apache.org/jira/browse/SPARK-47858 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee The current implementation of the PySpark DataFrame error context could be made more flexible by cleaning up some hacky spots.
[jira] [Created] (SPARK-47852) Support DataFrameQueryContext for reverse operations
Haejoon Lee created SPARK-47852: --- Summary: Support DataFrameQueryContext for reverse operations Key: SPARK-47852 URL: https://issues.apache.org/jira/browse/SPARK-47852 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee To improve the error messages for reverse operations.
[jira] [Created] (SPARK-47827) Missing warnings for deprecated features
Haejoon Lee created SPARK-47827: --- Summary: Missing warnings for deprecated features Key: SPARK-47827 URL: https://issues.apache.org/jira/browse/SPARK-47827 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee Some APIs that will be removed are missing deprecation warnings.
[jira] [Commented] (SPARK-47591) Hive-thriftserver: Migrate logInfo with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834828#comment-17834828 ] Haejoon Lee commented on SPARK-47591: - I'm working on this :) > Hive-thriftserver: Migrate logInfo with variables to structured logging > framework > - > > Key: SPARK-47591 > URL: https://issues.apache.org/jira/browse/SPARK-47591 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major >
[jira] [Updated] (SPARK-47737) Bump PyArrow to 10.0.0
[ https://issues.apache.org/jira/browse/SPARK-47737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-47737: Description: For richer API support (was: Use the latest version for stability.) > Bump PyArrow to 10.0.0 > -- > > Key: SPARK-47737 > URL: https://issues.apache.org/jira/browse/SPARK-47737 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > For richer API support
[jira] [Updated] (SPARK-47737) Bump PyArrow to 10.0.0
[ https://issues.apache.org/jira/browse/SPARK-47737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-47737: Summary: Bump PyArrow to 10.0.0 (was: Bump PyArrow to 15.0.2) > Bump PyArrow to 10.0.0 > -- > > Key: SPARK-47737 > URL: https://issues.apache.org/jira/browse/SPARK-47737 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > Use the latest version for stability.
[jira] (SPARK-47580) SQL catalyst: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47580 ] Haejoon Lee deleted comment on SPARK-47580: - was (Author: itholic): I'm working on this :) > SQL catalyst: Migrate logError with variables to structured logging framework > - > > Key: SPARK-47580 > URL: https://issues.apache.org/jira/browse/SPARK-47580 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major >
[jira] [Commented] (SPARK-47580) SQL catalyst: Migrate logError with variables to structured logging framework
[ https://issues.apache.org/jira/browse/SPARK-47580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833368#comment-17833368 ] Haejoon Lee commented on SPARK-47580: - I'm working on this :) > SQL catalyst: Migrate logError with variables to structured logging framework > - > > Key: SPARK-47580 > URL: https://issues.apache.org/jira/browse/SPARK-47580 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major >
[jira] [Commented] (SPARK-47240) SPIP: Structured Logging Framework for Apache Spark
[ https://issues.apache.org/jira/browse/SPARK-47240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833367#comment-17833367 ] Haejoon Lee commented on SPARK-47240: - Thanks [~Gengliang.Wang] and [~panbingkun] for working on this! As I'm currently investigating on PySpark logging, let me also participate and pick some items here to get more context about structured logging. > SPIP: Structured Logging Framework for Apache Spark > --- > > Key: SPARK-47240 > URL: https://issues.apache.org/jira/browse/SPARK-47240 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Priority: Major > > This proposal aims to enhance Apache Spark's logging system by implementing > structured logging. This transition will change the format of the default log > files from plain text to JSON, making them more accessible and analyzable. > The new logs will include crucial identifiers such as worker, executor, > query, job, stage, and task IDs, thereby making the logs more informative and > facilitating easier search and analysis. > h2. Current Logging Format > The current format of Spark logs is plain text, which can be challenging to > parse and analyze efficiently. An example of the current log format is as > follows: > {code:java} > 23/11/29 17:53:44 ERROR BlockManagerMasterEndpoint: Fail to know the executor > 289 is alive or not. > org.apache.spark.SparkException: Exception thrown in awaitResult: > > Caused by: org.apache.spark.rpc.RpcEndpointNotFoundException: .. > {code} > h2. Proposed Structured Logging Format > The proposed change involves structuring the logs in JSON format, which > organizes the log information into easily identifiable fields. 
Here is how > the new structured log format would look: > {code:java} > { > "ts":"23/11/29 17:53:44", > "level":"ERROR", > "msg":"Fail to know the executor 289 is alive or not", > "context":{ > "executor_id":"289" > }, > "exception":{ > "class":"org.apache.spark.SparkException", > "msg":"Exception thrown in awaitResult", > "stackTrace":"..." > }, > "source":"BlockManagerMasterEndpoint" > } {code} > This format will enable users to upload and directly query > driver/executor/master/worker log files using Spark SQL for more effective > problem-solving and analysis, such as tracking executor losses or identifying > faulty tasks: > {code:java} > spark.read.json("hdfs://hdfs_host/logs").createOrReplaceTempView("logs") > /* To get all the executor lost logs */ > SELECT * FROM logs WHERE contains(message, 'Lost executor'); > /* To get all the distributed logs about executor 289 */ > SELECT * FROM logs WHERE executor_id = 289; > /* To get all the errors on host 100.116.29.4 */ > SELECT * FROM logs WHERE host = "100.116.29.4" and log_level="ERROR"; > {code} > > SPIP doc: > [https://docs.google.com/document/d/1rATVGmFLNVLmtxSpWrEceYm7d-ocgu8ofhryVs4g3XU/edit?usp=sharing]
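As a minimal sketch of how such structured JSON records could be emitted (here using Python's stdlib logging purely for illustration; the field names follow the example above, and the `JsonFormatter` class name is my own, not part of the SPIP):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object, similar to the proposed format."""
    def format(self, record):
        entry = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "msg": record.getMessage(),
            # Extra context (e.g. executor_id) passed via `extra=` lands on the record.
            "context": getattr(record, "context", {}),
            "source": record.name,
        }
        if record.exc_info:
            entry["exception"] = {
                "class": record.exc_info[0].__name__,
                "msg": str(record.exc_info[1]),
            }
        return json.dumps(entry)

logger = logging.getLogger("BlockManagerMasterEndpoint")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.error("Fail to know the executor 289 is alive or not",
             extra={"context": {"executor_id": "289"}})
```

Because every line is a self-describing JSON object, files produced this way could be loaded with `spark.read.json(...)` and queried as shown in the SPIP excerpt above.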
[jira] [Created] (SPARK-47543) Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation.
Haejoon Lee created SPARK-47543: --- Summary: Inferring `dict` as `MapType` from Pandas DataFrame to allow DataFrame creation. Key: SPARK-47543 URL: https://issues.apache.org/jira/browse/SPARK-47543 Project: Spark Issue Type: Bug Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee Currently PyArrow infers a Pandas dictionary field as StructType instead of MapType, so Spark can't handle the schema properly: {code:java} >>> pdf = pd.DataFrame({"str_col": ['second'], "dict_col": [{'first': 0.7, 'second': 0.3}]}) >>> pa.Schema.from_pandas(pdf) str_col: string dict_col: struct child 0, first: double child 1, second: double {code} We cannot handle this case since we use PyArrow for schema creation.
[jira] [Commented] (SPARK-44101) Support pandas 2
[ https://issues.apache.org/jira/browse/SPARK-44101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828607#comment-17828607 ] Haejoon Lee commented on SPARK-44101: - Got it. Thanks for the notice, [~dongjoon] ! > Support pandas 2 > > > Key: SPARK-44101 > URL: https://issues.apache.org/jira/browse/SPARK-44101 > Project: Spark > Issue Type: Umbrella > Components: Pandas API on Spark, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: releasenotes >
[jira] [Created] (SPARK-47338) Introduce `_LEGACY_ERROR_UNKNOWN` for default error class
Haejoon Lee created SPARK-47338: --- Summary: Introduce `_LEGACY_ERROR_UNKNOWN` for default error class Key: SPARK-47338 URL: https://issues.apache.org/jira/browse/SPARK-47338 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Haejoon Lee Currently, in Spark, when an {{ErrorClass}} is not explicitly defined for an exception, the method {{getErrorClass}} returns {{null}}. This behavior can lead to ambiguity and makes debugging more challenging by not providing a clear indication that the error class was not set.
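The proposed fallback can be sketched like this (a toy Python illustration of the JVM-side behavior; class and method names here are illustrative, not Spark's actual API):

```python
_LEGACY_ERROR_UNKNOWN = "_LEGACY_ERROR_UNKNOWN"

class SparkLikeException(Exception):
    """Toy exception showing the fallback: never return None for the error class."""
    def __init__(self, message, error_class=None):
        super().__init__(message)
        self._error_class = error_class

    def get_error_class(self):
        # Instead of returning None when no class was set, fall back to a
        # well-known sentinel so logs and tooling always see a real name.
        return self._error_class or _LEGACY_ERROR_UNKNOWN
```

With the sentinel, consumers of the error class no longer need a null check, and an unset class is immediately visible in logs.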
[jira] [Created] (SPARK-47274) Provide more useful context for PySpark DataFrame API errors
Haejoon Lee created SPARK-47274: --- Summary: Provide more useful context for PySpark DataFrame API errors Key: SPARK-47274 URL: https://issues.apache.org/jira/browse/SPARK-47274 Project: Spark Issue Type: Bug Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee Errors originating from PySpark operations can be difficult to debug given the limited context in the error messages. While the JVM side has been improved to offer detailed error contexts, PySpark errors often lack this level of detail. Adding detailed context about the location in the user's PySpark code where the error occurred will improve debuggability for PySpark users.
[jira] [Created] (SPARK-47179) Improve error message from SparkThrowableSuite for better debuggability
Haejoon Lee created SPARK-47179: --- Summary: Improve error message from SparkThrowableSuite for better debuggability Key: SPARK-47179 URL: https://issues.apache.org/jira/browse/SPARK-47179 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Haejoon Lee The current error message is not very helpful when the error class documentation is out of date, so we should improve it.
[jira] [Updated] (SPARK-47158) Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231
[ https://issues.apache.org/jira/browse/SPARK-47158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-47158: Summary: Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231 (was: Assign proper name to top LEGACY errors) > Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231 > - > > Key: SPARK-47158 > URL: https://issues.apache.org/jira/browse/SPARK-47158 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > > Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231
[jira] [Created] (SPARK-47158) Assign proper name to top LEGACY errors
Haejoon Lee created SPARK-47158: --- Summary: Assign proper name to top LEGACY errors Key: SPARK-47158 URL: https://issues.apache.org/jira/browse/SPARK-47158 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Haejoon Lee Assign proper name and sqlState to _LEGACY_ERROR_TEMP_2134 & 2231
[jira] [Resolved] (SPARK-46809) Check error message parameter properly
[ https://issues.apache.org/jira/browse/SPARK-46809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee resolved SPARK-46809. - Resolution: Not A Bug Seems to be working fine > Check error message parameter properly > -- > > Key: SPARK-46809 > URL: https://issues.apache.org/jira/browse/SPARK-46809 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > > If error message parameter from template is missing in actual usage or the > name is different, it should raise exception but currently it's not. We > should handle this to work properly.
[jira] [Created] (SPARK-46927) Make `assertDataFrameEqual` work properly without PyArrow
Haejoon Lee created SPARK-46927: --- Summary: Make `assertDataFrameEqual` work properly without PyArrow Key: SPARK-46927 URL: https://issues.apache.org/jira/browse/SPARK-46927 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee
[jira] [Created] (SPARK-46874) Remove pyspark.pandas dependency from assertDataFrameEqual
Haejoon Lee created SPARK-46874: --- Summary: Remove pyspark.pandas dependency from assertDataFrameEqual Key: SPARK-46874 URL: https://issues.apache.org/jira/browse/SPARK-46874 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee
[jira] [Created] (SPARK-46824) Enable test without optional dependency on PyPy
Haejoon Lee created SPARK-46824: --- Summary: Enable test without optional dependency on PyPy Key: SPARK-46824 URL: https://issues.apache.org/jira/browse/SPARK-46824 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee PyPy doesn't have pandas/pyarrow, so we should enable tests without them on PyPy.
[jira] [Updated] (SPARK-46820) Fix error message regression by restoring new_msg
[ https://issues.apache.org/jira/browse/SPARK-46820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46820: Description: >>> from pyspark.sql.types import StructType, StructField, StringType, IntegerType >>> schema = StructType([ ... StructField("name", StringType(), nullable=True), ... StructField("age", IntegerType(), nullable=False) ... ]) >>> df = spark.createDataFrame([("asd", None)], schema) pyspark.errors.exceptions.base.PySparkValueError: [CANNOT_BE_NONE] Argument `obj` cannot be None. was: >>> from pyspark.sql.types import StructType, StructField, StringType, IntegerType >>> schema = StructType([ ... StructField("name", StringType(), nullable=True), ... StructField("age", IntegerType(), nullable=False) ... ]) >>> df = spark.createDataFrame([("asd", None)], schema) pyspark.errors.exceptions.base.PySparkValueError: [CANNOT_BE_NONE] Argument `obj` cannot be None. The error message in the example above says "obj", but the createDataFrame function has no "obj" parameter. We should fix this error message properly. > Fix error message regression by restoring new_msg > - > > Key: SPARK-46820 > URL: https://issues.apache.org/jira/browse/SPARK-46820 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > >>> from pyspark.sql.types import StructType, StructField, StringType, > >>> IntegerType > >>> schema = StructType([ > ... StructField("name", StringType(), nullable=True), > ... StructField("age", IntegerType(), nullable=False) > ... ]) > >>> df = spark.createDataFrame([("asd", None)], schema) > pyspark.errors.exceptions.base.PySparkValueError: [CANNOT_BE_NONE] Argument > `obj` cannot be None. >
[jira] [Updated] (SPARK-46820) Fix error message regression by restoring new_msg
[ https://issues.apache.org/jira/browse/SPARK-46820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46820: Summary: Fix error message regression by restoring new_msg (was: Improve error message when createDataFrame have illegal nullable) > Fix error message regression by restoring new_msg > - > > Key: SPARK-46820 > URL: https://issues.apache.org/jira/browse/SPARK-46820 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > >>> from pyspark.sql.types import StructType, StructField, StringType, > >>> IntegerType > >>> schema = StructType([ > ... StructField("name", StringType(), nullable=True), > ... StructField("age", IntegerType(), nullable=False) > ... ]) > >>> df = spark.createDataFrame([("asd", None)], schema) > pyspark.errors.exceptions.base.PySparkValueError: [CANNOT_BE_NONE] Argument > `obj` cannot be None. > > The error message in the example above says "obj", but the createDataFrame > function has no "obj" parameter. We should fix this error message properly.
[jira] [Commented] (SPARK-46810) Clarify error class terminology
[ https://issues.apache.org/jira/browse/SPARK-46810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17810156#comment-17810156 ] Haejoon Lee commented on SPARK-46810: - cc [~maxgekk] who drives SQL side error message improvement > Clarify error class terminology > --- > > Key: SPARK-46810 > URL: https://issues.apache.org/jira/browse/SPARK-46810 > Project: Spark > Issue Type: Improvement > Components: Documentation, SQL >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Priority: Minor > > We use inconsistent terminology when talking about error classes. I'd like to > get some clarity on that before contributing any potential improvements to > this part of the documentation. > Consider > [INCOMPLETE_TYPE_DEFINITION|https://spark.apache.org/docs/3.5.0/sql-error-conditions-incomplete-type-definition-error-class.html]. > It has several key pieces of hierarchical information that have inconsistent > names throughout our documentation and codebase: > * 42 > ** K01 > *** INCOMPLETE_TYPE_DEFINITION > ARRAY > MAP > STRUCT > What are the names of these different levels of information? > Some examples of inconsistent terminology: > * [Over > here|https://spark.apache.org/docs/latest/sql-error-conditions-sqlstates.html#class-42-syntax-error-or-access-rule-violation] > we call 42 the "class". Yet on the main page for INCOMPLETE_TYPE_DEFINITION > we call that an "error class". So what exactly is a class, the 42 or the > INCOMPLETE_TYPE_DEFINITION? > * [Over > here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/README.md#L122] > we call K01 the "subclass". But [over > here|https://github.com/apache/spark/blob/26d3eca0a8d3303d0bb9450feb6575ed145bbd7e/common/utils/src/main/resources/error/error-classes.json#L1452-L1467] > we call the ARRAY, MAP, and STRUCT the subclasses. And on the main page for > INCOMPLETE_TYPE_DEFINITION we call those same things "derived error classes". 
> So what exactly is a subclass? > * [On this > page|https://spark.apache.org/docs/3.5.0/sql-error-conditions.html#incomplete_type_definition] > we call INCOMPLETE_TYPE_DEFINITION an "error condition", though in other > places we refer to it as an "error class". > I personally like the terminology "error condition", but as we are already > using "error class" very heavily throughout the codebase to refer to > something like INCOMPLETE_TYPE_DEFINITION, I don't think it's practical to > change at this point. > To rationalize the different terms we are using, I propose the following > terminology, which we should use consistently throughout our code and > documentation: > * Error category: 42 > * Error sub-category: K01 > * Error state: 42K01 > * Error class: INCOMPLETE_TYPE_DEFINITION > * Error sub-classes: ARRAY, MAP, STRUCT > We should not use "error condition" if one of the above terms more accurately > describes what we are talking about. > Side note: With this terminology, I believe talking about error categories > and sub-categories in front of users is not helpful. I don't think anybody > cares what "42" by itself means, or what "K01" by itself means. Accordingly, > we should limit how much we talk about these concepts in the user-facing > documentation.
[jira] [Resolved] (SPARK-46821) Remove pandas dependency from assertDataFrameEqual properly
[ https://issues.apache.org/jira/browse/SPARK-46821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee resolved SPARK-46821. - Resolution: Fixed > Remove pandas dependency from assertDataFrameEqual properly > --- > > Key: SPARK-46821 > URL: https://issues.apache.org/jira/browse/SPARK-46821 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > > assertDataFrameEqual should not depend on pandas
[jira] [Created] (SPARK-46821) Remove pandas dependency from assertDataFrameEqual properly
Haejoon Lee created SPARK-46821: --- Summary: Remove pandas dependency from assertDataFrameEqual properly Key: SPARK-46821 URL: https://issues.apache.org/jira/browse/SPARK-46821 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee assertDataFrameEqual should not depend on pandas
[jira] [Created] (SPARK-46820) Improve error message when createDataFrame have illegal nullable
Haejoon Lee created SPARK-46820: --- Summary: Improve error message when createDataFrame have illegal nullable Key: SPARK-46820 URL: https://issues.apache.org/jira/browse/SPARK-46820 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee >>> from pyspark.sql.types import StructType, StructField, StringType, IntegerType >>> schema = StructType([ ... StructField("name", StringType(), nullable=True), ... StructField("age", IntegerType(), nullable=False) ... ]) >>> df = spark.createDataFrame([("asd", None)], schema) pyspark.errors.exceptions.base.PySparkValueError: [CANNOT_BE_NONE] Argument `obj` cannot be None. The error message in the example above says "obj", but the createDataFrame function has no "obj" parameter. We should fix this error message properly.
[jira] [Updated] (SPARK-46809) Check error message parameter properly
[ https://issues.apache.org/jira/browse/SPARK-46809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46809: Description: If an error message parameter from the template is missing in actual usage, or its name differs, an exception should be raised, but currently it is not. We should handle this properly. (was: If error message parameter from template is missing in actual usage, it should raise exception but currently it's not. We should handle this to work properly.) > Check error message parameter properly > -- > > Key: SPARK-46809 > URL: https://issues.apache.org/jira/browse/SPARK-46809 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > > If an error message parameter from the template is missing in actual usage, or its > name differs, an exception should be raised, but currently it is not. We > should handle this properly.
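The check described above could look like this sketch, which uses `string.Formatter` to compare a template's placeholders against the supplied parameters (the helper name `check_template_params` is hypothetical, not PySpark's actual API):

```python
from string import Formatter

def check_template_params(template, params):
    """Raise ValueError if the supplied params don't match the template's placeholders."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # collect the named placeholders that appear in the template.
    expected = {name for _, name, _, _ in Formatter().parse(template) if name}
    provided = set(params)
    if expected != provided:
        missing, extra = expected - provided, provided - expected
        raise ValueError(
            f"parameter mismatch: missing={sorted(missing)}, extra={sorted(extra)}")
    return template.format(**params)

# Matching parameters format cleanly; a misnamed or missing one raises.
print(check_template_params("Argument `{arg_name}` cannot be None.",
                            {"arg_name": "obj"}))  # Argument `obj` cannot be None.
```

Raising on both missing and extra parameters catches the renamed-parameter case this ticket mentions, which plain `str.format` would only partially detect.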
[jira] [Updated] (SPARK-46809) Check error message parameter properly
[ https://issues.apache.org/jira/browse/SPARK-46809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46809: Summary: Check error message parameter properly (was: Check missing error message parameter properly) > Check error message parameter properly > -- > > Key: SPARK-46809 > URL: https://issues.apache.org/jira/browse/SPARK-46809 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > > If error message parameter from template is missing in actual usage, it > should raise exception but currently it's not. We should handle this to work > properly.
[jira] [Created] (SPARK-46809) Check missing error message parameter properly
Haejoon Lee created SPARK-46809: --- Summary: Check missing error message parameter properly Key: SPARK-46809 URL: https://issues.apache.org/jira/browse/SPARK-46809 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee If an error message parameter from the template is missing in actual usage, an exception should be raised, but currently it is not. We should handle this properly.
[jira] [Created] (SPARK-46728) Check pandas installation properly
Haejoon Lee created SPARK-46728: --- Summary: Check pandas installation properly Key: SPARK-46728 URL: https://issues.apache.org/jira/browse/SPARK-46728 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee Checking minimum_pandas_version is currently not working properly:
>>> import pyspark.pandas
AttributeError: module 'pandas' has no attribute '__version__'
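A defensive version of such a check verifies that the module actually exposes `__version__` before comparing, instead of assuming it. This is a hypothetical sketch (not PySpark's actual `require_minimum_pandas_version` helper):

```python
import importlib

def require_minimum_version(module_name: str, minimum: tuple) -> None:
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(f"{module_name} >= {minimum} is required") from exc
    version = getattr(module, "__version__", None)
    if version is None:
        # A shadowing local file (e.g. a stray pandas.py) can cause exactly
        # the AttributeError reported above; surface a clearer ImportError.
        raise ImportError(
            f"cannot determine {module_name} version; the installation "
            f"may be broken or shadowed by a local module"
        )
    parts = tuple(int(p) for p in version.split(".")[:2])
    if parts < minimum:
        raise ImportError(
            f"{module_name} >= {minimum} is required, found {version}"
        )
```

With this shape, a broken pandas installation produces an actionable ImportError rather than a raw AttributeError at import time.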
[jira] [Updated] (SPARK-46665) Remove assertPandasOnSparkEqual
[ https://issues.apache.org/jira/browse/SPARK-46665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46665: Summary: Remove assertPandasOnSparkEqual (was: Remove Pandas dependency for pyspark.testing) > Remove assertPandasOnSparkEqual > --- > > Key: SPARK-46665 > URL: https://issues.apache.org/jira/browse/SPARK-46665 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > We should not make pyspark.testing depend on Pandas.
[jira] [Updated] (SPARK-46665) Remove assertPandasOnSparkEqual
[ https://issues.apache.org/jira/browse/SPARK-46665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46665: Description: Remove deprecated API (was: We should not make pyspark.testing depending on Pandas.) > Remove assertPandasOnSparkEqual > --- > > Key: SPARK-46665 > URL: https://issues.apache.org/jira/browse/SPARK-46665 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > Remove deprecated API
[jira] [Resolved] (SPARK-46642) Add `getMessageTemplate` to PySpark error framework
[ https://issues.apache.org/jira/browse/SPARK-46642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee resolved SPARK-46642. - Resolution: Won't Fix > Add `getMessageTemplate` to PySpark error framework > --- > > Key: SPARK-46642 > URL: https://issues.apache.org/jira/browse/SPARK-46642 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > We should add `getMessageTemplate` to the PySpark error framework to reach > feature parity with the JVM side.
[jira] [Created] (SPARK-46665) Remove Pandas dependency for pyspark.testing
Haejoon Lee created SPARK-46665: --- Summary: Remove Pandas dependency for pyspark.testing Key: SPARK-46665 URL: https://issues.apache.org/jira/browse/SPARK-46665 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee We should not make pyspark.testing depend on Pandas.
[jira] [Created] (SPARK-46642) Add `getMessageTemplate` to PySpark error framework
Haejoon Lee created SPARK-46642: --- Summary: Add `getMessageTemplate` to PySpark error framework Key: SPARK-46642 URL: https://issues.apache.org/jira/browse/SPARK-46642 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee We should add `getMessageTemplate` to the PySpark error framework to reach feature parity with the JVM side.
[jira] [Created] (SPARK-46583) Refine getMessage for Spark Connect
Haejoon Lee created SPARK-46583: --- Summary: Refine getMessage for Spark Connect Key: SPARK-46583 URL: https://issues.apache.org/jira/browse/SPARK-46583 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 4.0.0 Reporter: Haejoon Lee We added the getMessage API in [https://github.com/apache/spark/pull/44292] to provide a simplified error message for users, but it is not supported in the same way on the Spark Connect server side. We should match the behavior to fill the gap between the two.
[jira] [Created] (SPARK-46571) Re-enable TODOs that are resolved from recent Pandas
Haejoon Lee created SPARK-46571: --- Summary: Re-enable TODOs that are resolved from recent Pandas Key: SPARK-46571 URL: https://issues.apache.org/jira/browse/SPARK-46571 Project: Spark Issue Type: Bug Components: Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee We can uncomment some TODOs in the tests that are already resolved.
[jira] [Created] (SPARK-46553) FutureWarning for interpolate with object dtype
Haejoon Lee created SPARK-46553: --- Summary: FutureWarning for interpolate with object dtype Key: SPARK-46553 URL: https://issues.apache.org/jira/browse/SPARK-46553 Project: Spark Issue Type: Bug Components: Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee
>>> pdf.interpolate()
:1: FutureWarning: DataFrame.interpolate with object dtype is deprecated and will raise in a future version. Call obj.infer_objects(copy=False) before interpolating instead.
   A  B
0  a  1
1  b  2
2  c  3
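One way test suites catch regressions like this is to escalate `FutureWarning` to an error. The sketch below uses a hypothetical stand-in function rather than a real DataFrame:

```python
import warnings

def calls_deprecated_api() -> str:
    # Stand-in for DataFrame.interpolate on object dtype, which
    # emits a FutureWarning in recent pandas versions.
    warnings.warn(
        "interpolate with object dtype is deprecated", FutureWarning
    )
    return "interpolated"

with warnings.catch_warnings():
    # Turn FutureWarning into an exception so deprecated calls fail loudly.
    warnings.simplefilter("error", FutureWarning)
    try:
        calls_deprecated_api()
        deprecated_path_hit = False
    except FutureWarning:
        deprecated_path_hit = True

print(deprecated_path_hit)  # True
```

The recommended migration in the warning text itself is to call `infer_objects(copy=False)` on the frame before interpolating.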
[jira] [Created] (SPARK-46360) Enhance error message debugging with new `getMessage` API
Haejoon Lee created SPARK-46360: --- Summary: Enhance error message debugging with new `getMessage` API Key: SPARK-46360 URL: https://issues.apache.org/jira/browse/SPARK-46360 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee This introduces a new API, `getMessage`, which provides a standardized way for users to obtain a concise and clear error message, streamlining error handling and debugging.
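The idea can be sketched with a tiny error-class base that renders a template into a concise message. Names and template style here are illustrative, not the actual PySpark API:

```python
class ExampleSparkError(Exception):
    """Illustrative error base: an error class plus a message template."""

    def __init__(self, error_class: str, template: str, params: dict):
        self.error_class = error_class
        self._template = template
        self._params = params
        super().__init__(self.getMessage())

    def getMessage(self) -> str:
        # Concise, standardized form: [ERROR_CLASS] rendered message.
        return f"[{self.error_class}] {self._template.format(**self._params)}"

err = ExampleSparkError(
    "COLUMN_NOT_FOUND", "Column `{name}` does not exist.", {"name": "age"}
)
print(err.getMessage())  # [COLUMN_NOT_FOUND] Column `age` does not exist.
```

The point of a single accessor like this is that callers get one predictable string to log or match on, instead of parsing a full traceback.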
[jira] [Created] (SPARK-46338) Re-enable the get_item test for BasicIndexingTests
Haejoon Lee created SPARK-46338: --- Summary: Re-enable the get_item test for BasicIndexingTests Key: SPARK-46338 URL: https://issues.apache.org/jira/browse/SPARK-46338 Project: Spark Issue Type: Bug Components: Pandas API on Spark, Tests Affects Versions: 4.0.0 Reporter: Haejoon Lee
[jira] [Created] (SPARK-46322) Replace external link with internal link for error documentation
Haejoon Lee created SPARK-46322: --- Summary: Replace external link with internal link for error documentation Key: SPARK-46322 URL: https://issues.apache.org/jira/browse/SPARK-46322 Project: Spark Issue Type: Bug Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee
[jira] [Created] (SPARK-46321) Re-enable IndexesTests.test_asof that was skipped due to a Pandas bug
Haejoon Lee created SPARK-46321: --- Summary: Re-enable IndexesTests.test_asof that was skipped due to a Pandas bug Key: SPARK-46321 URL: https://issues.apache.org/jira/browse/SPARK-46321 Project: Spark Issue Type: Bug Components: Pandas API on Spark, Tests Affects Versions: 4.0.0 Reporter: Haejoon Lee Re-enable IndexesTests.test_asof that was skipped due to a Pandas bug
[jira] [Created] (SPARK-46307) Enable `fill_value` tests for `GroupByTests.test_shift`
Haejoon Lee created SPARK-46307: --- Summary: Enable `fill_value` tests for `GroupByTests.test_shift` Key: SPARK-46307 URL: https://issues.apache.org/jira/browse/SPARK-46307 Project: Spark Issue Type: Bug Components: PS, Tests Affects Versions: 4.0.0 Reporter: Haejoon Lee Enable `fill_value` tests for `GroupByTests.test_shift` since the bug from Pandas is fixed.
[jira] [Updated] (SPARK-46306) Fix `LocIndexer` to work properly when the key is missing
[ https://issues.apache.org/jira/browse/SPARK-46306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46306: Summary: Fix `LocIndexer` to work properly when the key is missing (was: Fix `LocIndexer` work properly when the key is missing) > Fix `LocIndexer` to work properly when the key is missing > - > > Key: SPARK-46306 > URL: https://issues.apache.org/jira/browse/SPARK-46306 > Project: Spark > Issue Type: Bug > Components: PS >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > > Pandas raises an exception when the key is missing, but Pandas API on Spark just > excludes the missing key from the result. We should fix this behavior.
[jira] [Created] (SPARK-46306) Fix `LocIndexer` work properly when the key is missing
Haejoon Lee created SPARK-46306: --- Summary: Fix `LocIndexer` work properly when the key is missing Key: SPARK-46306 URL: https://issues.apache.org/jira/browse/SPARK-46306 Project: Spark Issue Type: Bug Components: PS Affects Versions: 4.0.0 Reporter: Haejoon Lee Pandas raises an exception when the key is missing, but Pandas API on Spark just excludes the missing key from the result. We should fix this behavior.
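The pandas-matching behavior the ticket describes amounts to failing fast on any missing label. A plain-Python sketch of the contract (not the real `LocIndexer` implementation):

```python
def loc_select(index: list, keys: list) -> list:
    # pandas .loc raises KeyError when any requested label is absent;
    # silently dropping missing keys (the reported bug) hides errors.
    missing = [k for k in keys if k not in index]
    if missing:
        raise KeyError(f"{missing} not in index")
    return list(keys)

print(loc_select(["a", "b", "c"], ["a", "c"]))  # ['a', 'c']
```

Raising on the first missing label keeps pandas-on-Spark results consistent with what the same `.loc` call would do on a local pandas object.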
[jira] [Updated] (SPARK-46293) Remove `protobuf` from required package.
[ https://issues.apache.org/jira/browse/SPARK-46293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46293: Description: (was: Add missing required package for docs.) > Remove `protobuf` from required package. > > > Key: SPARK-46293 > URL: https://issues.apache.org/jira/browse/SPARK-46293 > Project: Spark > Issue Type: Bug > Components: Connect, Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (SPARK-46293) Remove `protobuf` from required package.
[ https://issues.apache.org/jira/browse/SPARK-46293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46293: Description: Remove `protobuf` from requirements.txt > Remove `protobuf` from required package. > > > Key: SPARK-46293 > URL: https://issues.apache.org/jira/browse/SPARK-46293 > Project: Spark > Issue Type: Bug > Components: Connect, Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > Remove `protobuf` from requirements.txt
[jira] [Updated] (SPARK-46293) Remove `protobuf` from required package.
[ https://issues.apache.org/jira/browse/SPARK-46293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46293: Summary: Remove `protobuf` from required package. (was: Add protobuf to required dependency for Spark Connect) > Remove `protobuf` from required package. > > > Key: SPARK-46293 > URL: https://issues.apache.org/jira/browse/SPARK-46293 > Project: Spark > Issue Type: Bug > Components: Connect, Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > Add missing required package for docs.
[jira] [Created] (SPARK-46293) Add protobuf to required dependency for Spark Connect
Haejoon Lee created SPARK-46293: --- Summary: Add protobuf to required dependency for Spark Connect Key: SPARK-46293 URL: https://issues.apache.org/jira/browse/SPARK-46293 Project: Spark Issue Type: Bug Components: Connect, Documentation, PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee Add missing required package for docs.
[jira] [Updated] (SPARK-46112) Implement lint check for PySpark custom errors
[ https://issues.apache.org/jira/browse/SPARK-46112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46112: Summary: Implement lint check for PySpark custom errors (was: Enforce usage of PySpark-specific Exceptions over built-in Python Exceptions) > Implement lint check for PySpark custom errors > -- > > Key: SPARK-46112 > URL: https://issues.apache.org/jira/browse/SPARK-46112 > Project: Spark > Issue Type: Sub-task > Components: Build, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > Currently, in the PySpark codebase, there is an inconsistency in the usage of > exceptions. In some instances, PySpark-specific exceptions are utilized, > while in others, generic Python built-in exceptions are used. This > inconsistency can lead to confusion and difficulty in maintaining and > debugging the code. See [https://github.com/apache/spark/pull/44024] for related > work fixing such a case. > The goal of this ticket is to establish a standardized practice for error > handling in PySpark by mandating the use of PySpark-specific exceptions where > applicable. This will ensure that all exceptions thrown within PySpark adhere > to a consistent format and standard, making them more informative and easier > to handle. >
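A lint rule like this can be prototyped with the `ast` module by scanning for `raise <BuiltinError>(...)` call sites. This is a rough sketch of the idea; the actual lint integration would be more thorough:

```python
import ast

def find_builtin_raises(source: str) -> list:
    """Report line numbers where a built-in exception is raised directly
    instead of a project-specific error class (illustrative rule set)."""
    builtin = {
        "ValueError", "TypeError", "RuntimeError", "AttributeError",
        "NotImplementedError", "KeyError", "ImportError",
    }
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Raise) and isinstance(node.exc, ast.Call):
            func = node.exc.func
            if isinstance(func, ast.Name) and func.id in builtin:
                hits.append(node.lineno)
    return hits

sample = "def f(x):\n    if x < 0:\n        raise ValueError('neg')\n"
print(find_builtin_raises(sample))  # [3]
```

A real check would also need an allowlist (e.g. for test helpers) and reporting hooks, but the AST walk above is the core of it.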
[jira] [Updated] (SPARK-46282) Create a Standalone Page for DataFrame API in PySpark Documentation
[ https://issues.apache.org/jira/browse/SPARK-46282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46282: Summary: Create a Standalone Page for DataFrame API in PySpark Documentation (was: Create dedicated page for DataFrame API) > Create a Standalone Page for DataFrame API in PySpark Documentation > --- > > Key: SPARK-46282 > URL: https://issues.apache.org/jira/browse/SPARK-46282 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major >
[jira] [Updated] (SPARK-46282) Create a Standalone Page for DataFrame API in PySpark Documentation
[ https://issues.apache.org/jira/browse/SPARK-46282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46282: Description: Previously, the DataFrame API content was nested under the Spark SQL section. This change will involve relocating and structuring the DataFrame API documentation as a distinct, top-level category in the PySpark API reference. > Create a Standalone Page for DataFrame API in PySpark Documentation > --- > > Key: SPARK-46282 > URL: https://issues.apache.org/jira/browse/SPARK-46282 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > > Previously, the DataFrame API content was nested under the Spark SQL section. > This change will involve relocating and structuring the DataFrame API > documentation as a distinct, top-level category in the PySpark API reference.
[jira] [Created] (SPARK-46282) Create dedicated page for DataFrame API
Haejoon Lee created SPARK-46282: --- Summary: Create dedicated page for DataFrame API Key: SPARK-46282 URL: https://issues.apache.org/jira/browse/SPARK-46282 Project: Spark Issue Type: Bug Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee
[jira] [Created] (SPARK-46278) Re-organize `GroupByTests`
Haejoon Lee created SPARK-46278: --- Summary: Re-organize `GroupByTests` Key: SPARK-46278 URL: https://issues.apache.org/jira/browse/SPARK-46278 Project: Spark Issue Type: Bug Components: Pandas API on Spark, Tests Affects Versions: 4.0.0 Reporter: Haejoon Lee
[jira] [Updated] (SPARK-46269) Enable more NumPy compatibility function tests
[ https://issues.apache.org/jira/browse/SPARK-46269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46269: Summary: Enable more NumPy compatibility function tests (was: Enable more NumPy function tests) > Enable more NumPy compatibility function tests > -- > > Key: SPARK-46269 > URL: https://issues.apache.org/jira/browse/SPARK-46269 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > We should enable more NumPy tests for better coverage.
[jira] [Updated] (SPARK-46269) Enable more NumPy function tests
[ https://issues.apache.org/jira/browse/SPARK-46269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46269: Summary: Enable more NumPy function tests (was: Enable NumPy function tests that marked as flaky before.) > Enable more NumPy function tests > > > Key: SPARK-46269 > URL: https://issues.apache.org/jira/browse/SPARK-46269 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > > We should enable more NumPy tests for better coverage.
[jira] [Updated] (SPARK-46269) Enable NumPy function tests that were marked as flaky before.
[ https://issues.apache.org/jira/browse/SPARK-46269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46269: Summary: Enable NumPy function tests that were marked as flaky before. (was: Enable more NumPy function tests) > Enable NumPy function tests that were marked as flaky before. > > > Key: SPARK-46269 > URL: https://issues.apache.org/jira/browse/SPARK-46269 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > > We should enable more NumPy tests for better coverage.
[jira] [Created] (SPARK-46269) Enable more NumPy function tests
Haejoon Lee created SPARK-46269: --- Summary: Enable more NumPy function tests Key: SPARK-46269 URL: https://issues.apache.org/jira/browse/SPARK-46269 Project: Spark Issue Type: Bug Components: Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee We should enable more NumPy tests for better coverage.
[jira] [Updated] (SPARK-46262) Enable test for np.left_shift for Pandas-on-Spark object.
[ https://issues.apache.org/jira/browse/SPARK-46262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46262: Summary: Enable test for np.left_shift for Pandas-on-Spark object. (was: Support `np.left_shift` for Pandas-on-Spark object.) > Enable test for np.left_shift for Pandas-on-Spark object. > - > > Key: SPARK-46262 > URL: https://issues.apache.org/jira/browse/SPARK-46262 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > Now that we support PyArrow>=4.0.0, we can enable the test for `np.left_shift`.
[jira] [Created] (SPARK-46262) Support `np.left_shift` for Pandas-on-Spark object.
Haejoon Lee created SPARK-46262: --- Summary: Support `np.left_shift` for Pandas-on-Spark object. Key: SPARK-46262 URL: https://issues.apache.org/jira/browse/SPARK-46262 Project: Spark Issue Type: Bug Components: Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee Now that we support PyArrow>=4.0.0, we can enable the test for `np.left_shift`.
[jira] [Created] (SPARK-46259) Add appropriate link for error class usage documentation.
Haejoon Lee created SPARK-46259: --- Summary: Add appropriate link for error class usage documentation. Key: SPARK-46259 URL: https://issues.apache.org/jira/browse/SPARK-46259 Project: Spark Issue Type: Bug Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee We don't have an appropriate link for the error class usage documentation.
[jira] [Updated] (SPARK-46213) Add PySparkImportError for error framework
[ https://issues.apache.org/jira/browse/SPARK-46213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46213: Parent: SPARK-45673 Issue Type: Sub-task (was: Bug) > Add PySparkImportError for error framework > -- > > Key: SPARK-46213 > URL: https://issues.apache.org/jira/browse/SPARK-46213 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Add PySparkImportError to the error framework for wrapping ImportError
[jira] [Updated] (SPARK-46226) Migrate all remaining RuntimeError into PySpark error framework
[ https://issues.apache.org/jira/browse/SPARK-46226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46226: Summary: Migrate all remaining RuntimeError into PySpark error framework (was: Migrate all remaining RuntimeErrors into PySpark error framework) > Migrate all remaining RuntimeError into PySpark error framework > --- > > Key: SPARK-46226 > URL: https://issues.apache.org/jira/browse/SPARK-46226 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > We should migrate all errors into the PySpark error framework.
[jira] [Updated] (SPARK-46230) Migrate RetriesExceeded into PySpark error.
[ https://issues.apache.org/jira/browse/SPARK-46230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46230: Summary: Migrate RetriesExceeded into PySpark error. (was: Migrate RetriesExceeded and RetryException into PySpark error.) > Migrate RetriesExceeded into PySpark error. > --- > > Key: SPARK-46230 > URL: https://issues.apache.org/jira/browse/SPARK-46230 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-46234) Introduce PySparkKeyError for PySpark error framework
Haejoon Lee created SPARK-46234: --- Summary: Introduce PySparkKeyError for PySpark error framework Key: SPARK-46234 URL: https://issues.apache.org/jira/browse/SPARK-46234 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee
[jira] [Created] (SPARK-46233) Migrate all remaining AttributeError into PySpark error framework
Haejoon Lee created SPARK-46233: --- Summary: Migrate all remaining AttributeError into PySpark error framework Key: SPARK-46233 URL: https://issues.apache.org/jira/browse/SPARK-46233 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee
[jira] [Created] (SPARK-46232) Migrate all remaining ValueError into PySpark error framework
Haejoon Lee created SPARK-46232: --- Summary: Migrate all remaining ValueError into PySpark error framework Key: SPARK-46232 URL: https://issues.apache.org/jira/browse/SPARK-46232 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee
[jira] [Created] (SPARK-46231) Migrate all remaining NotImplementedError & TypeError into PySpark error framework
Haejoon Lee created SPARK-46231: --- Summary: Migrate all remaining NotImplementedError & TypeError into PySpark error framework Key: SPARK-46231 URL: https://issues.apache.org/jira/browse/SPARK-46231 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee
[jira] [Created] (SPARK-46230) Migrate RetriesExceeded and RetryException into PySpark error.
Haejoon Lee created SPARK-46230: --- Summary: Migrate RetriesExceeded and RetryException into PySpark error. Key: SPARK-46230 URL: https://issues.apache.org/jira/browse/SPARK-46230 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee
[jira] [Created] (SPARK-46226) Migrate all remaining RuntimeErrors into PySpark error framework
Haejoon Lee created SPARK-46226: --- Summary: Migrate all remaining RuntimeErrors into PySpark error framework Key: SPARK-46226 URL: https://issues.apache.org/jira/browse/SPARK-46226 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee We should migrate all errors into the PySpark error framework.
[jira] [Updated] (SPARK-46208) Adding a link for latest Pandas API specifications.
[ https://issues.apache.org/jira/browse/SPARK-46208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46208: Summary: Adding a link for latest Pandas API specifications. (was: Use specific Pandas version for API specifications) > Adding a link for latest Pandas API specifications. > --- > > Key: SPARK-46208 > URL: https://issues.apache.org/jira/browse/SPARK-46208 > Project: Spark > Issue Type: Bug > Components: Documentation, Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > Use a specific supported Pandas version to be clearer.
[jira] [Created] (SPARK-46213) Add PySparkImportError for error framework
Haejoon Lee created SPARK-46213: --- Summary: Add PySparkImportError for error framework Key: SPARK-46213 URL: https://issues.apache.org/jira/browse/SPARK-46213 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee Add PySparkImportError to the error framework for wrapping ImportError
[jira] [Updated] (SPARK-46208) Use specific Pandas version for API specifications
[ https://issues.apache.org/jira/browse/SPARK-46208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46208: Summary: Use specific Pandas version for API specifications (was: Use specific Pandas version for API specifictations) > Use specific Pandas version for API specifications > -- > > Key: SPARK-46208 > URL: https://issues.apache.org/jira/browse/SPARK-46208 > Project: Spark > Issue Type: Bug > Components: Documentation, Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > > Use a specific supported Pandas version to be clearer.
[jira] [Created] (SPARK-46208) Use specific Pandas version for API specifictations
Haejoon Lee created SPARK-46208: --- Summary: Use specific Pandas version for API specifictations Key: SPARK-46208 URL: https://issues.apache.org/jira/browse/SPARK-46208 Project: Spark Issue Type: Bug Components: Documentation, Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee Use a specific supported Pandas version to be clearer.
[jira] [Created] (SPARK-46206) Use a narrower scope exception for SQL processor
Haejoon Lee created SPARK-46206: --- Summary: Use a narrower scope exception for SQL processor Key: SPARK-46206 URL: https://issues.apache.org/jira/browse/SPARK-46206 Project: Spark Issue Type: Bug Components: PS Affects Versions: 4.0.0 Reporter: Haejoon Lee Current exception handling in these functions uses the general {{Exception}} type, which can obscure the root cause of issues and make the code harder to maintain and debug.
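A minimal sketch of the problem the ticket describes, contrasting a broad `except Exception` with a narrowly scoped handler. The function names and data are illustrative, not taken from the Spark code in question:

```python
def lookup_column_broad(row, name):
    """Broad handler: any failure looks like a missing column."""
    try:
        return row[name]
    except Exception:        # too broad: also swallows TypeError, etc.
        return None


def lookup_column_narrow(row, name):
    """Narrow handler: only a genuinely missing key is tolerated."""
    try:
        return row[name]
    except KeyError:         # narrow: unrelated bugs still surface
        return None
```

With the broad version, passing `None` instead of a dict silently returns `None` and hides the real bug; the narrow version lets the underlying `TypeError` propagate, which is the maintainability gain the ticket is after.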
[jira] [Created] (SPARK-46169) Assign appropriate JIRA numbers to unlabeled TODO items for DataFrame API.
Haejoon Lee created SPARK-46169: --- Summary: Assign appropriate JIRA numbers to unlabeled TODO items for DataFrame API. Key: SPARK-46169 URL: https://issues.apache.org/jira/browse/SPARK-46169 Project: Spark Issue Type: Bug Components: Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee Many TODO items have no actual JIRA number. We should assign proper numbers for better tracking.
[jira] [Created] (SPARK-46168) Add axis parameter to DataFrame.idxmin & idxmax
Haejoon Lee created SPARK-46168: --- Summary: Add axis parameter to DataFrame.idxmin & idxmax Key: SPARK-46168 URL: https://issues.apache.org/jira/browse/SPARK-46168 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.idxmax.html
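For reference, this is the pandas behavior the sub-task targets: with `axis=1`, `idxmax` returns the column label of each row's maximum instead of the index label of each column's maximum (illustrative data):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 9], "b": [5, 2]})

# Default axis=0: index label of each column's maximum.
col_max = df.idxmax()        # a -> 1, b -> 0

# axis=1: column label of each row's maximum -- the parameter this
# ticket proposes to support in Pandas API on Spark.
row_max = df.idxmax(axis=1)  # row 0 -> 'b', row 1 -> 'a'
```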
[jira] [Created] (SPARK-46167) Add axis, pct and na_option parameter to DataFrame.rank
Haejoon Lee created SPARK-46167: --- Summary: Add axis, pct and na_option parameter to DataFrame.rank Key: SPARK-46167 URL: https://issues.apache.org/jira/browse/SPARK-46167 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rank.html
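The pandas semantics being matched, on illustrative data: `axis=1` ranks within each row across columns, `pct=True` rescales ranks into (0, 1], and `na_option` (`'keep'`/`'top'`/`'bottom'`) controls where missing values are placed:

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 4.0], "b": [3.0, 2.0]})

# Row 0: a=1 < b=3 -> ranks 1, 2 -> pct 0.5, 1.0
# Row 1: a=4 > b=2 -> ranks 2, 1 -> pct 1.0, 0.5
ranks = df.rank(axis=1, pct=True)
```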
[jira] [Created] (SPARK-46165) Improve axis parameter for DataFrame.all to support columns.
Haejoon Lee created SPARK-46165: --- Summary: Improve axis parameter for DataFrame.all to support columns. Key: SPARK-46165 URL: https://issues.apache.org/jira/browse/SPARK-46165 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.all.html
[jira] [Created] (SPARK-46166) Add axis and skipna parameters to DataFrame.any
Haejoon Lee created SPARK-46166: --- Summary: Add axis and skipna parameters to DataFrame.any Key: SPARK-46166 URL: https://issues.apache.org/jira/browse/SPARK-46166 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.any.html
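The two pandas parameters in question, on illustrative data: `axis=1` evaluates truthiness per row rather than per column, and `skipna=False` treats NaN as True (since NaN is not equal to zero) instead of ignoring it:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [0, 0], "b": [0, 1]})
row_any = df.any(axis=1)   # row 0: all zeros -> False; row 1 -> True

# With skipna=True (default) the NaN is ignored and only 0.0 remains,
# so the result is False; skipna=False counts the NaN as truthy.
s = pd.Series([np.nan, 0.0])
```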
[jira] [Created] (SPARK-46164) Add include and exclude parameters for DataFrame.describe
Haejoon Lee created SPARK-46164: --- Summary: Add include and exclude parameters for DataFrame.describe Key: SPARK-46164 URL: https://issues.apache.org/jira/browse/SPARK-46164 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html
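In pandas, `describe()` summarizes only numeric columns by default; `include`/`exclude` select which dtypes participate. A small illustration:

```python
import pandas as pd

df = pd.DataFrame({"n": [1, 2, 3], "s": ["x", "x", "y"]})

# Default: numeric columns only (count/mean/std/min/quartiles/max).
num_only = df.describe()

# include='object': string column summarized as count/unique/top/freq.
obj_only = df.describe(include="object")
```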
[jira] [Created] (SPARK-46163) Add filter_func and errors parameter for DataFrame.update
Haejoon Lee created SPARK-46163: --- Summary: Add filter_func and errors parameter for DataFrame.update Key: SPARK-46163 URL: https://issues.apache.org/jira/browse/SPARK-46163 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.update.html
[jira] [Created] (SPARK-46160) Add freq and axis parameters to DataFrame.shift
Haejoon Lee created SPARK-46160: --- Summary: Add freq and axis parameters to DataFrame.shift Key: SPARK-46160 URL: https://issues.apache.org/jira/browse/SPARK-46160 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.shift.html
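The two pandas parameters this sub-task covers, shown on illustrative data: `axis=1` shifts values across columns instead of down rows, and `freq` shifts a datetime index itself rather than the data:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
# axis=1: column 'a' becomes NaN, column 'b' receives old 'a' values.
shifted = df.shift(1, axis=1)

# freq: values stay in place, the index moves one day forward.
ts = pd.Series([10, 20], index=pd.date_range("2024-01-01", periods=2))
moved = ts.shift(1, freq="D")
```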
[jira] [Created] (SPARK-46162) Improve axis parameter for DataFrame.nunique to support columns.
Haejoon Lee created SPARK-46162: --- Summary: Improve axis parameter for DataFrame.nunique to support columns. Key: SPARK-46162 URL: https://issues.apache.org/jira/browse/SPARK-46162 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.nunique.html
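The pandas behavior being targeted, on illustrative data: the default `axis=0` counts distinct values per column, while `axis=1` counts per row:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1], "b": [1, 2]})

per_col = df.nunique()        # a -> 1, b -> 2
per_row = df.nunique(axis=1)  # row 0: {1} -> 1; row 1: {1, 2} -> 2
```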
[jira] [Created] (SPARK-46161) Improve axis parameter for DataFrame.diff to support columns.
Haejoon Lee created SPARK-46161: --- Summary: Improve axis parameter for DataFrame.diff to support columns. Key: SPARK-46161 URL: https://issues.apache.org/jira/browse/SPARK-46161 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.diff.html
[jira] [Created] (SPARK-46159) Improve axis parameter for DataFrame.at_time to support columns.
Haejoon Lee created SPARK-46159: --- Summary: Improve axis parameter for DataFrame.at_time to support columns. Key: SPARK-46159 URL: https://issues.apache.org/jira/browse/SPARK-46159 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.at_time.html
[jira] [Created] (SPARK-46158) Improve axis parameter for DataFrame.xs to support columns.
Haejoon Lee created SPARK-46158: --- Summary: Improve axis parameter for DataFrame.xs to support columns. Key: SPARK-46158 URL: https://issues.apache.org/jira/browse/SPARK-46158 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.xs.html
[jira] [Created] (SPARK-46157) Add `axis` parameter for DataFrame.aggregate.
Haejoon Lee created SPARK-46157: --- Summary: Add `axis` parameter for DataFrame.aggregate. Key: SPARK-46157 URL: https://issues.apache.org/jira/browse/SPARK-46157 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.aggregate.html
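The pandas semantics being matched, on illustrative data: `axis=0` (the default) aggregates each column, while `axis=1` aggregates each row:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

col_sums = df.agg("sum")          # a -> 3, b -> 7
row_sums = df.agg("sum", axis=1)  # [4, 6]
```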
[jira] [Created] (SPARK-46129) Add GitHub link icon to PySpark documentation header
Haejoon Lee created SPARK-46129: --- Summary: Add GitHub link icon to PySpark documentation header Key: SPARK-46129 URL: https://issues.apache.org/jira/browse/SPARK-46129 Project: Spark Issue Type: Bug Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee Add a GitHub link icon to the PySpark documentation header for better accessibility, as the Pandas documentation does.
[jira] [Created] (SPARK-46123) Using brighter color for document title for better visibility
Haejoon Lee created SPARK-46123: --- Summary: Using brighter color for document title for better visibility Key: SPARK-46123 URL: https://issues.apache.org/jira/browse/SPARK-46123 Project: Spark Issue Type: Bug Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee With the increasing popularity of dark mode for its eye comfort and energy-saving benefits, it's important to ensure that our documentation is easily readable in both light and dark settings. The current title font color in dark mode is not optimal for readability, which can hinder user experience. By adjusting the color, we aim to enhance the overall accessibility and readability of the PySpark documentation in dark mode.
[jira] [Created] (SPARK-46117) Enhancing readability of PySpark API reference by hiding verbose typehints.
Haejoon Lee created SPARK-46117: --- Summary: Enhancing readability of PySpark API reference by hiding verbose typehints. Key: SPARK-46117 URL: https://issues.apache.org/jira/browse/SPARK-46117 Project: Spark Issue Type: Bug Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Haejoon Lee Currently, the PySpark API documentation displays all type hints in the signatures, which can make the documentation appear cluttered and less readable. By setting `autodoc_typehints` to 'none', we can achieve a cleaner and more concise presentation of our API, similar to how the Pandas documentation handles type hints. This approach has been effective in Pandas, making the documentation more approachable and easier to understand, especially for newcomers.
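The change described is a one-line Sphinx configuration: `autodoc_typehints` is a documented option of `sphinx.ext.autodoc` that controls where type hints appear in rendered signatures. A `conf.py` fragment:

```python
# Sphinx conf.py fragment for the change described above.
# 'none' hides type hints from rendered signatures entirely;
# 'description' would move them into the parameter descriptions,
# and 'signature' (the default) keeps them inline.
autodoc_typehints = "none"
```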
[jira] [Updated] (SPARK-46116) Adding "Q&A Support" and "Mailing Lists" link into PySpark doc homepage.
[ https://issues.apache.org/jira/browse/SPARK-46116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46116: Description: It is aimed at improving user engagement and providing quick access to community support and discussions. This approach is inspired by the [Pandas documentation](https://pandas.pydata.org/docs/index.html), which effectively uses a similar section for community engagement. The "Q&A Support" will lead users to a curated list of StackOverflow questions tagged with `pyspark`, while the mailing lists will offer platforms for deeper discussions and insights within the Spark community. was:The addition of the "Q&A Support" link provides quick access to the community-driven Q&A platform, StackOverflow, where users can seek help and contribute to discussions about PySpark. It enhances the user experience by connecting the documentation with a dynamic and interactive community resource. > Adding "Q&A Support" and "Mailing Lists" link into PySpark doc homepage. > > > Key: SPARK-46116 > URL: https://issues.apache.org/jira/browse/SPARK-46116 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > It is aimed at improving user engagement and providing quick access to > community support and discussions. This approach is inspired by the [Pandas > documentation](https://pandas.pydata.org/docs/index.html), which effectively > uses a similar section for community engagement. > The "Q&A Support" will lead users to a curated list of StackOverflow > questions tagged with `pyspark`, while the mailing lists will offer platforms > for deeper discussions and insights within the Spark community.
[jira] [Updated] (SPARK-46116) Adding "Q&A Support" and "Mailing Lists" link into PySpark doc homepage.
[ https://issues.apache.org/jira/browse/SPARK-46116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46116: Summary: Adding "Q&A Support" and "Mailing Lists" link into PySpark doc homepage. (was: Enriching PySpark doc with "Useful links" including Q&A Support and Mailing Lists) > Adding "Q&A Support" and "Mailing Lists" link into PySpark doc homepage. > > > Key: SPARK-46116 > URL: https://issues.apache.org/jira/browse/SPARK-46116 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > > The addition of the "Q&A Support" link provides quick access to the > community-driven Q&A platform, StackOverflow, where users can seek help and > contribute to discussions about PySpark. It enhances the user experience by > connecting the documentation with a dynamic and interactive community > resource.
[jira] [Updated] (SPARK-46116) Enriching PySpark doc with "Useful links" including Q&A Support and Mailing Lists
[ https://issues.apache.org/jira/browse/SPARK-46116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46116: Summary: Enriching PySpark doc with "Useful links" including Q&A Support and Mailing Lists (was: Enriching PySpark doc with "Useful links" Including Q&A Support and Mailing Lists) > Enriching PySpark doc with "Useful links" including Q&A Support and Mailing > Lists > - > > Key: SPARK-46116 > URL: https://issues.apache.org/jira/browse/SPARK-46116 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > > The addition of the "Q&A Support" link provides quick access to the > community-driven Q&A platform, StackOverflow, where users can seek help and > contribute to discussions about PySpark. It enhances the user experience by > connecting the documentation with a dynamic and interactive community > resource.
[jira] [Updated] (SPARK-46116) Enriching "Useful links" on PySpark docs including "Q&A Support" and "Mailing Lists"
[ https://issues.apache.org/jira/browse/SPARK-46116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46116: Summary: Enriching "Useful links" on PySpark docs including "Q&A Support" and "Mailing Lists" (was: Enriching PySpark Documentation with "Useful Links" Including Q&A Support and Mailing Lists) > Enriching "Useful links" on PySpark docs including "Q&A Support" and "Mailing > Lists" > > > Key: SPARK-46116 > URL: https://issues.apache.org/jira/browse/SPARK-46116 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > > The addition of the "Q&A Support" link provides quick access to the > community-driven Q&A platform, StackOverflow, where users can seek help and > contribute to discussions about PySpark. It enhances the user experience by > connecting the documentation with a dynamic and interactive community > resource.
[jira] [Updated] (SPARK-46116) Enriching PySpark doc with "Useful links" Including Q&A Support and Mailing Lists
[ https://issues.apache.org/jira/browse/SPARK-46116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-46116: Summary: Enriching PySpark doc with "Useful links" Including Q&A Support and Mailing Lists (was: Enriching "Useful links" on PySpark docs including "Q&A Support" and "Mailing Lists") > Enriching PySpark doc with "Useful links" Including Q&A Support and Mailing > Lists > - > > Key: SPARK-46116 > URL: https://issues.apache.org/jira/browse/SPARK-46116 > Project: Spark > Issue Type: Bug > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > > The addition of the "Q&A Support" link provides quick access to the > community-driven Q&A platform, StackOverflow, where users can seek help and > contribute to discussions about PySpark. It enhances the user experience by > connecting the documentation with a dynamic and interactive community > resource.