Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-05-01 Thread via GitHub


HyukjinKwon commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1587017977


##
python/MANIFEST.in:
##
@@ -14,13 +14,18 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-global-exclude *.py[cod] __pycache__ .DS_Store
+# Reference: https://setuptools.pypa.io/en/latest/userguide/miscellaneous.html
+
+graft pyspark

Review Comment:
   I would appreciate it if you make another PR :-)






Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-05-01 Thread via GitHub


nchammas commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1586990733


##
python/MANIFEST.in:
##
@@ -14,13 +14,18 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-global-exclude *.py[cod] __pycache__ .DS_Store
+# Reference: https://setuptools.pypa.io/en/latest/userguide/miscellaneous.html
+
+graft pyspark

Review Comment:
   OK. Are you planning to address this in #46331 (or some other PR), or would 
you like me to take care of it?






Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-05-01 Thread via GitHub


HyukjinKwon commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1586988988


##
python/MANIFEST.in:
##
@@ -14,13 +14,18 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-global-exclude *.py[cod] __pycache__ .DS_Store
+# Reference: https://setuptools.pypa.io/en/latest/userguide/miscellaneous.html
+
+graft pyspark

Review Comment:
   I agree that it's safer so we don't miss something out ... but let's just add the `json` file alone .. I think it's more important to get rid of unrelated files ..
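   For instance, a single `MANIFEST.in` directive along these lines would pick up just the data file (a sketch, not a line taken from any actual follow-up PR):
   
   ```
   include pyspark/errors/error-conditions.json
   ```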






Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-05-01 Thread via GitHub


nchammas commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1586985679


##
python/MANIFEST.in:
##
@@ -14,13 +14,18 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-global-exclude *.py[cod] __pycache__ .DS_Store
+# Reference: https://setuptools.pypa.io/en/latest/userguide/miscellaneous.html
+
+graft pyspark

Review Comment:
   Yes, `graft` pulls everything.
   
   We can try to just include what we think we need, but it's probably safer 
(and easier) in the long run to instead exclude what we don't want to package, 
like tests. Would that work for you?
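   Concretely, the exclude-based approach could look roughly like this, using standard setuptools `MANIFEST.in` directives (a sketch; the real file would need to list every tests directory we want dropped):
   
   ```
   # Keep everything under pyspark/, then carve out what we don't ship.
   graft pyspark
   prune pyspark/tests
   global-exclude *.py[cod] __pycache__ .DS_Store
   ```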






Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-05-01 Thread via GitHub


HyukjinKwon commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1586979834


##
python/MANIFEST.in:
##
@@ -14,13 +14,18 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-global-exclude *.py[cod] __pycache__ .DS_Store
+# Reference: https://setuptools.pypa.io/en/latest/userguide/miscellaneous.html
+
+graft pyspark

Review Comment:
   @nchammas Seems like this ends up adding all the tests as well. Could we just include that json file alone?






Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-04-30 Thread via GitHub


HyukjinKwon closed pull request #44920: [SPARK-46894][PYTHON] Move PySpark 
error conditions into standalone JSON file
URL: https://github.com/apache/spark/pull/44920





Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-04-30 Thread via GitHub


HyukjinKwon commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-2087787520

   Merged to master.





Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-04-30 Thread via GitHub


nchammas commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1585016463


##
python/pyspark/errors/error_classes.py:
##
@@ -15,1160 +15,15 @@
 # limitations under the License.
 #
 
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
 import json
+import importlib.resources
 
-
-ERROR_CLASSES_JSON = '''
-{
-  "APPLICATION_NAME_NOT_SET": {
-    "message": [
-      "An application name must be set in your configuration."
-    ]
-  },
-  ...

[The remainder of this quoted hunk, which deletes the rest of the inlined ERROR_CLASSES_JSON literal, is truncated in the archive, along with the review comment for this message.]

Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-04-29 Thread via GitHub


HyukjinKwon commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1584127828


##
python/pyspark/errors/error_classes.py:
##
@@ -15,1160 +15,15 @@
 # limitations under the License.
 #
 
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
 import json
+import importlib.resources
 
-
-ERROR_CLASSES_JSON = '''
-{
-  "APPLICATION_NAME_NOT_SET": {
-    "message": [
-      "An application name must be set in your configuration."
-    ]
-  },
-  ...

[The remainder of this quoted hunk, which deletes the rest of the inlined ERROR_CLASSES_JSON literal, is truncated in the archive, along with the review comment for this message.]


Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-04-29 Thread via GitHub


HyukjinKwon commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1584127568


##
python/pyspark/errors/error_classes.py:
##
@@ -15,1110 +15,14 @@
 # limitations under the License.
 #
 
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
 import json
+import importlib.resources
 
-
-ERROR_CLASSES_JSON = '''
-{
-  "APPLICATION_NAME_NOT_SET": {
-    "message": [
-      "An application name must be set in your configuration."
-    ]
-  },
-  ...

[The remainder of this quoted hunk, which deletes the rest of the inlined ERROR_CLASSES_JSON literal, is truncated in the archive, along with the review comment for this message.]


Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-04-29 Thread via GitHub


nchammas commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1584104489


##
python/pyspark/errors/exceptions/__init__.py:
##
@@ -18,39 +18,15 @@
 
 def _write_self() -> None:
 import json
+from pathlib import Path
 from pyspark.errors import error_classes
 
-with open("python/pyspark/errors/error_classes.py", "w") as f:
-error_class_py_file = """#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
-import json
-
-
-ERROR_CLASSES_JSON = '''
-%s
-'''
+ERRORS_DIR = Path(__file__).parents[1]
 
-ERROR_CLASSES_MAP = json.loads(ERROR_CLASSES_JSON)
-""" % json.dumps(
-error_classes.ERROR_CLASSES_MAP, sort_keys=True, indent=2
+with open(ERRORS_DIR / "error-conditions.json", "w") as f:

Review Comment:
   Hmm, I don't understand the concern. This method here is `_write_self()`. 
It's for development only. No user will run this when they install Spark, 
regardless of the installation method. That's what I was saying in my [earlier 
comment on this method][1].
   
   The real code path we care about is [in `error_classes.py`][2], not 
`__init__.py`. And this is the code path that I tested in various ways and 
documented in the PR description.
   
   I tested the zip installation method you were particularly concerned about 
in point 5:
   
   https://github.com/apache/spark/assets/1039369/07884c50-8cd6-4caf-8bb9-b0269f40eb54
   
   Is there something about that test you think is inadequate?
   
   [1]: 
https://github.com/apache/spark/pull/44920/files/010714d00b84d7e9edb61170cf35d176cacfb67d#r1470557657
   
   [2]: 
https://github.com/apache/spark/pull/44920/files/010714d00b84d7e9edb61170cf35d176cacfb67d#diff-2823e146fc0e6bddff3505b5bee6e2b855782d9f71e900e6f9099fc97d1fffa6
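   
   For reference, the load path in `error_classes.py` boils down to just a few lines; a minimal sketch, assuming Python 3.9+ where `importlib.resources.files` is available:
   
   ```python
   import importlib.resources
   import json
   
   # Resolve the packaged JSON through importlib.resources so the lookup
   # works for a wheel install, a source checkout, and a zip import alike.
   ERROR_CLASSES_JSON = (
       importlib.resources.files("pyspark.errors")
       .joinpath("error-conditions.json")
       .read_text()
   )
   ERROR_CLASSES_MAP = json.loads(ERROR_CLASSES_JSON)
   ```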






Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-04-29 Thread via GitHub


HyukjinKwon commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1584091032


##
python/pyspark/errors/exceptions/__init__.py:
##
@@ -18,39 +18,15 @@
 
 def _write_self() -> None:
 import json
+from pathlib import Path
 from pyspark.errors import error_classes
 
-with open("python/pyspark/errors/error_classes.py", "w") as f:
-error_class_py_file = """#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
-import json
-
-
-ERROR_CLASSES_JSON = '''
-%s
-'''
+ERRORS_DIR = Path(__file__).parents[1]
 
-ERROR_CLASSES_MAP = json.loads(ERROR_CLASSES_JSON)
-""" % json.dumps(
-error_classes.ERROR_CLASSES_MAP, sort_keys=True, indent=2
+with open(ERRORS_DIR / "error-conditions.json", "w") as f:

Review Comment:
   We should read `error-conditions.json` from `pyspark.zip` .. and that's the 
real problem ..
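   
   To illustrate (a sketch; the zip path is hypothetical, and a real run would also need the bundled py4j zip on the path):
   
   ```python
   import sys
   
   sys.path.insert(0, "/opt/spark/python/lib/pyspark.zip")  # hypothetical path
   
   import importlib.resources
   from pathlib import Path
   
   import pyspark.errors
   
   # Under zipimport, __file__ points *inside* the archive, so opening a
   # sibling path with plain open() would raise FileNotFoundError:
   json_path = Path(pyspark.errors.__file__).parent / "error-conditions.json"
   # open(json_path)  # fails: the path does not exist on the filesystem
   
   # importlib.resources goes through the zip's resource reader instead:
   text = (
       importlib.resources.files("pyspark.errors")
       .joinpath("error-conditions.json")
       .read_text()
   )
   ```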






Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-04-29 Thread via GitHub


HyukjinKwon commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1584090772


##
python/pyspark/errors/exceptions/__init__.py:
##
@@ -18,39 +18,15 @@
 
 def _write_self() -> None:
 import json
+from pathlib import Path
 from pyspark.errors import error_classes
 
-with open("python/pyspark/errors/error_classes.py", "w") as f:
-error_class_py_file = """#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
-import json
-
-
-ERROR_CLASSES_JSON = '''
-%s
-'''
+ERRORS_DIR = Path(__file__).parents[1]
 
-ERROR_CLASSES_MAP = json.loads(ERROR_CLASSES_JSON)
-""" % json.dumps(
-error_classes.ERROR_CLASSES_MAP, sort_keys=True, indent=2
+with open(ERRORS_DIR / "error-conditions.json", "w") as f:

Review Comment:
   @nchammas I believe it worked because you built Spark at the project root directory, so the `ERRORS_DIR` directory exists ...






Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-04-29 Thread via GitHub


nchammas commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-2084153710

   Friendly ping @HyukjinKwon.





Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-04-08 Thread via GitHub


nchammas commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-204232

   @HyukjinKwon - Anything else you'd like to see done here?





Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-03-29 Thread via GitHub


nchammas commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-2027389233

   OK, I tested that as well and updated the PR description accordingly.
   
   I also tweaked the syntax highlighting for that bit of documentation you linked to, because it was off. This is how it currently looks:
   
   ![Screenshot 2024-03-29 at 11 30 26 AM](https://github.com/apache/spark/assets/1039369/4ee9b28f-768d-478f-980e-3937fa533029)
   
   Note the weird italicization and missing `*`.





Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-03-28 Thread via GitHub


HyukjinKwon commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-2026650601

   People can actually use PySpark directly by importing `pyspark.zip`; see 
https://spark.apache.org/docs/latest/api/python/getting_started/install.html?highlight=pythonpath#manually-downloading
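   
   For instance, the manual setup the linked page describes amounts to something like this (a sketch; the paths and py4j version are illustrative):
   
   ```python
   # Typically configured in the shell before starting Python, e.g.:
   #
   #   export SPARK_HOME=/opt/spark
   #   export PYTHONPATH="$SPARK_HOME/python/lib/pyspark.zip:$SPARK_HOME/python/lib/py4j-<version>-src.zip:$PYTHONPATH"
   #
   # after which the package resolves from the archive via zipimport:
   import pyspark
   
   print(pyspark.__file__)  # points inside pyspark.zip, not a regular directory
   ```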





Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-03-28 Thread via GitHub


nchammas commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-2026612169

   Hmm, so when people install this ZIP, how exactly do they do it? Because it does not install cleanly like the ZIP under `python/dist/`.
   
   ```
   $ pip install .../spark/python/lib/pyspark.zip
   Processing .../spark/python/lib/pyspark.zip
   ERROR: file:///.../spark/python/lib/pyspark.zip does not appear to be a Python project:
     neither 'setup.py' nor 'pyproject.toml' found.
   ```





Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-03-28 Thread via GitHub


HyukjinKwon commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-2026508352

   It's the one `python/lib/pyspark.zip` when you finish building.





Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-03-28 Thread via GitHub


nchammas commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-2025495693

   I've updated the PR description with this additional ZIP test (test 4).
   
   Just to confirm, the ZIP that gets uploaded to the site is the one under 
`python/dist/`. Is that correct?





Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-03-27 Thread via GitHub


HyukjinKwon commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-2024304980

   Just to make sure, does it work when you install PySpark as a ZIP file? 
e.g., downloading it from https://spark.apache.org/downloads.html would install 
PySpark as a ZIP file.





Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-03-24 Thread via GitHub


nchammas commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-2016860163

   We have agreed in SPARK-46810 to rename "error class" to "error condition", so this PR is unblocked: we know the new `error-conditions.json` file won't need to be renamed.
   
   The work to rename all instances of "error class" to "error condition" 
across the board will happen in SPARK-46810 and SPARK-47429. I would like to 
keep this PR focused on simply moving the Python error conditions into a JSON 
file.
   
   @HyukjinKwon - I believe this PR is ready to go. Do you have any outstanding concerns?





Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-01-29 Thread via GitHub


nchammas commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-1916097263

   Converting to draft until 
[SPARK-46810](https://issues.apache.org/jira/browse/SPARK-46810) is resolved.





Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-01-29 Thread via GitHub


nchammas commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1470602522


##
python/pyspark/errors/error_classes.py:
##
@@ -15,1110 +15,14 @@
 # limitations under the License.
 #
 
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
 import json
+import importlib.resources
 
-
-ERROR_CLASSES_JSON = '''
-{
-  "APPLICATION_NAME_NOT_SET": {
-    "message": [
-      "An application name must be set in your configuration."
-    ]
-  },
-  ...

[The remainder of this quoted hunk, which deletes the rest of the inlined ERROR_CLASSES_JSON literal, is truncated in the archive, along with the review comment for this message.]


Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-01-29 Thread via GitHub


HyukjinKwon commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1470588168


##
python/pyspark/errors/error_classes.py:
##
@@ -15,1110 +15,14 @@
 # limitations under the License.
 #
 
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
 import json
+import importlib.resources
 
-
-ERROR_CLASSES_JSON = '''
-{
-  "APPLICATION_NAME_NOT_SET": {
-    "message": [
-      "An application name must be set in your configuration."
-    ]
-  },
-  ...

[The remainder of this quoted hunk, which deletes the rest of the inlined ERROR_CLASSES_JSON literal, is truncated in the archive, along with the review comment for this message.]


Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-01-29 Thread via GitHub


nchammas commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1470557657


##
python/pyspark/errors/exceptions/__init__.py:
##
@@ -18,39 +18,15 @@
 
 def _write_self() -> None:
 import json
+from pathlib import Path
 from pyspark.errors import error_classes
 
-with open("python/pyspark/errors/error_classes.py", "w") as f:
-error_class_py_file = """#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
-import json
-
-
-ERROR_CLASSES_JSON = '''
-%s
-'''
+ERRORS_DIR = Path(__file__).parents[1]

Review Comment:
   I don't know if this particular line will work when PySpark is packaged for 
distribution, but that's OK because `_write_self()` is meant for use by 
developers who are writing to the JSON file during development. Right?
   
   I don't think we want to use `importlib.resources` here because that's for 
loading resources from a potentially read-only volume, which may be the case 
when PySpark is installed from a ZIP file, for example. Since this is a 
development tool, we need a functioning filesystem with write access, so 
`__file__` will work fine.
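   
   Piecing together the hunks quoted in this thread, the rewritten helper plausibly reduces to the following (a sketch; the `json.dump` arguments mirror the old `%`-formatting shown above, and the merged code may differ in detail):
   
   ```python
   def _write_self() -> None:
       # Development-only helper: regenerate error-conditions.json in
       # sorted, consistently indented form from the in-memory map.
       import json
       from pathlib import Path
   
       from pyspark.errors import error_classes
   
       ERRORS_DIR = Path(__file__).parents[1]
       with open(ERRORS_DIR / "error-conditions.json", "w") as f:
           json.dump(error_classes.ERROR_CLASSES_MAP, f, sort_keys=True, indent=2)
   ```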



##
python/pyspark/errors/error_classes.py:
##
@@ -15,1110 +15,14 @@
 # limitations under the License.
 #
 
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
 import json
+import importlib.resources
 
-
-ERROR_CLASSES_JSON = '''
-{
-  "APPLICATION_NAME_NOT_SET": {
-    "message": [
-      "An application name must be set in your configuration."
-    ]
-  },
-  ...

[The remainder of this quoted hunk is truncated in the archive; no review comment for this second block is preserved.]

Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-01-29 Thread via GitHub


HyukjinKwon commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1470387128


##
python/pyspark/errors/exceptions/__init__.py:
##
@@ -18,39 +18,15 @@
 
 def _write_self() -> None:
     import json
+    from pathlib import Path
     from pyspark.errors import error_classes
 
-    with open("python/pyspark/errors/error_classes.py", "w") as f:
-        error_class_py_file = """#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#    http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
-import json
-
-
-ERROR_CLASSES_JSON = '''
-%s
-'''
+    ERRORS_DIR = Path(__file__).parents[1]

Review Comment:
   Just to confirm: does this work both when you `pip install pyspark` and when you download Spark from the Apache Spark channel (https://spark.apache.org/downloads.html)?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-01-29 Thread via GitHub


nchammas commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-1914877483

   I think we should wait for the conversation in SPARK-46810 to resolve before 
merging this in.
   
   But apart from that, is there anything more you'd like me to check here? Do 
you approve of the use of `importlib.resources` (which I think is the "correct" 
solution in our case)?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-01-28 Thread via GitHub


itholic commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-1914109640

   > Is this a reference to this command?
   
   Yes, so you might need to fix the description at https://github.com/apache/spark/blob/master/python/pyspark/errors_doc_gen.py#L44.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-01-28 Thread via GitHub


itholic commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-1914106426

   > This is for a separate potential PR, but if it were possible to use the 
"main" error JSON files from Scala in PySpark automatically, would we want to 
do that?
   
   I don't think so. As I recall, the main reason for not doing it was that, as you said, the error structure on the PySpark side is different from the error structure on the JVM side.
   
   > I will be consolidating the various sql-error-* pages. I will tag you in 
those PRs when I open them.
   
   +1
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-01-28 Thread via GitHub


nchammas commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-1913976049

   The build is still running, but the [pyspark-core tests are 
passing](https://github.com/nchammas/spark/actions/runs/7691032575/job/20955751265).
 I believe `importlib.resources` is what we need to load data files packaged in 
the distribution we upload to PyPI.
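   
   For reference, the read path this enables looks roughly like the following (a sketch, not the exact code in this PR; `files()` needs Python 3.9+, and older interpreters would use `importlib.resources.read_text` instead):
   
   ```python
   import importlib.resources
   import json
   
   
   def load_error_conditions() -> dict:
       # Works even when pyspark is installed from a wheel or zip, where a
       # plain filesystem path to the JSON may not exist.
       text = (
           importlib.resources.files("pyspark.errors")
           .joinpath("error-conditions.json")
           .read_text(encoding="utf-8")
       )
       return json.loads(text)
   ```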


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-01-28 Thread via GitHub


nchammas commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-1913899832

   > since we cannot integrate with the 
[error-classes.json](https://github.com/databricks/runtime/blob/master/common/utils/src/main/resources/error/error-classes.json)
 file on the JVM side
   
   This is for a separate potential PR, but if it were possible to use the 
"main" error JSON files from Scala in PySpark automatically, would we want to 
do that? I see that PySpark's errors don't define a SQLSTATE, so I assumed they 
were a separate thing and we didn't want to reuse the main error definitions.
   
   > So I agree to change to a `json` file if the advantage of using a `json` 
file over using a `py` file is clear, and if there are no issues with 
packaging. Also you might need to take a deeper look at the documentation. For 
example we're pointing the `py` file path from [Error classes in 
PySpark](https://spark.apache.org/docs/latest/api/python/development/errors.html#error-classes-in-pyspark).
   
   Is this a reference to this command? 
https://github.com/apache/spark/blob/8060e7e73170c0122acb2a005f3c54487e226208/python/docs/source/conf.py#L42
   
   Anyway, just FYI, I am in the middle of revamping the main error documentation to make it easier to maintain. First I am cleaning up the terminology in #44902, and then I will be consolidating the various `sql-error-*` pages. I will tag you in those PRs when I open them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-01-28 Thread via GitHub


itholic commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-191354

   IIRC there was no major issue with managing the `json` itself. However, since we cannot integrate with the [error-classes.json](https://github.com/databricks/runtime/blob/master/common/utils/src/main/resources/error/error-classes.json) file on the JVM side (because we didn't want to have a JVM dependency), we simply adopted a `.py` file, which is a more convenient way to manage this in Python.
   
   So I agree to change to a `json` file if the advantage of using a `json` file over a `.py` file is clear, and if there are no issues with packaging. Also, you might need to take a deeper look at the documentation. For example, we're using the `.py` file to build the documentation for [Error classes in PySpark](https://spark.apache.org/docs/latest/api/python/development/errors.html#error-classes-in-pyspark).
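   
   For context, the doc build only needs some way to iterate the error classes. A rough sketch of that dependency, using nothing beyond the `ERROR_CLASSES_JSON` string visible in the diffs here:
   
   ```python
   import json
   
   from pyspark.errors import error_classes
   
   # Roughly what a documentation generator needs from this module: walk the
   # error classes and emit each message template.
   classes = json.loads(error_classes.ERROR_CLASSES_JSON)
   for name in sorted(classes):
       print(name)
       for line in classes[name]["message"]:
           print("  " + line)
   ```
   
   So the docs would keep working after a move to JSON as long as the module still exposes `ERROR_CLASSES_JSON` (or the generator is updated to read the JSON file directly).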


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-01-28 Thread via GitHub


nchammas commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1469030237


##
python/pyspark/errors/error_classes.py:
##
@@ -15,1110 +15,16 @@
 # limitations under the License.
 #
 
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
 import json
+from pathlib import Path
 
+THIS_DIR = Path(__file__).parent

Review Comment:
   Actually, you are right. `MANIFEST.in` needs to be adjusted. I see the JSON 
file added to the `dist/` directory, but it doesn't get installed into the 
virtual environment I created for testing.
   
   To find this, I had to adjust my test from `pip install -e .` to `pip 
install .`, so that the virtual environment gets its own copy of PySpark and 
does not rely on the source repo.
   
   Fix incoming...
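   
   For anyone reproducing this, a sanity check along these lines works (a sketch; run it inside the test venv after `pip install .`, and note it assumes a regular, non-zip install):
   
   ```python
   from pathlib import Path
   
   import pyspark.errors
   
   # The installed package, not the source repo, must contain the JSON file.
   json_path = Path(pyspark.errors.__file__).parent / "error-conditions.json"
   print(json_path, "->", "present" if json_path.is_file() else "MISSING")
   ```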



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-01-28 Thread via GitHub


HyukjinKwon commented on PR #44920:
URL: https://github.com/apache/spark/pull/44920#issuecomment-1913861869

   @itholic can you take a look please? I remember you took a look and failed 
to find a good way to upload JSON.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-01-28 Thread via GitHub


HyukjinKwon commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1469019240


##
python/pyspark/errors/error_classes.py:
##
@@ -15,1110 +15,16 @@
 # limitations under the License.
 #
 
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
 import json
+from pathlib import Path
 
+THIS_DIR = Path(__file__).parent

Review Comment:
   Hmmm .. I thought we should change something in `setup.py` to include the JSON as a data file.
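   
   For what it's worth, the usual setuptools mechanism looks like this (a sketch only, not necessarily what this PR needs; `MANIFEST.in` governs what goes into the sdist, while `package_data` is what gets non-Python files installed from a wheel):
   
   ```python
   from setuptools import setup
   
   # Minimal illustration of shipping a JSON data file with a package; the
   # real setup.py has many more fields than shown here.
   setup(
       name="pyspark",
       packages=["pyspark", "pyspark.errors"],
       package_data={"pyspark.errors": ["error-conditions.json"]},
   )
   ```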



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-01-28 Thread via GitHub


nchammas commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1469003292


##
python/pyspark/errors/error_classes.py:
##
@@ -15,1110 +15,16 @@
 # limitations under the License.
 #
 
-# NOTE: Automatically sort this file via
-# - cd $SPARK_HOME
-# - bin/pyspark
-# - from pyspark.errors.exceptions import _write_self; _write_self()
 import json
+from pathlib import Path
 
+THIS_DIR = Path(__file__).parent
+# Note that though we call them "error classes" here, the proper name is "error conditions",
+# hence why the name of the JSON file is different.
+# For more information, please see: https://issues.apache.org/jira/browse/SPARK-46810
+ERROR_CONDITIONS_PATH = THIS_DIR / "error-conditions.json"

Review Comment:
   This comment is related to the work being done in #44902, by the way.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-01-28 Thread via GitHub


nchammas commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1469000326


##
python/pyspark/errors/error-conditions.json:
##
@@ -0,0 +1,1096 @@
+{

Review Comment:
   Confirmed. I've updated the PR description accordingly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-01-28 Thread via GitHub


nchammas commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1468973275


##
python/pyspark/errors/error-conditions.json:
##
@@ -0,0 +1,1096 @@
+{

Review Comment:
   Yes, good call out. I was looking at `MANIFEST.in` and I believe this file 
should be included, but I will confirm.
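   
   If it turns out not to be covered, the narrowest `MANIFEST.in` change would be a single line like the following (a sketch; whether the existing rules already pick the file up is exactly what I need to confirm):
   
   ```
   include pyspark/errors/error-conditions.json
   ```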



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-46894][PYTHON] Move PySpark error conditions into standalone JSON file [spark]

2024-01-28 Thread via GitHub


HyukjinKwon commented on code in PR #44920:
URL: https://github.com/apache/spark/pull/44920#discussion_r1468965413


##
python/pyspark/errors/error-conditions.json:
##
@@ -0,0 +1,1096 @@
+{

Review Comment:
   The problem is that it has to be packaged together and be uploadable to PyPI. It would be great if we can make sure that still works.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org