[GitHub] [spark] zero323 commented on a change in pull request #27109: [SPARK-30434][PYTHON][SQL] Move pandas related functionalities into 'pandas' sub-package

GitBox Tue, 07 Jan 2020 05:40:40 -0800

zero323 commented on a change in pull request #27109: 
[SPARK-30434][PYTHON][SQL] Move pandas related functionalities into 'pandas' 
sub-package
URL: https://github.com/apache/spark/pull/27109#discussion_r363750892


 ##########
 File path: python/pyspark/sql/dataframe.py
 ##########
 @@ -31,23 +31,23 @@
 
 from pyspark import copy_func, since, _NoValue
 from pyspark.rdd import RDD, _load_from_socket, _local_iterator_from_socket, \
-    ignore_unicode_prefix, PythonEvalType
-from pyspark.serializers import ArrowCollectSerializer, BatchedSerializer, PickleSerializer, \
+    ignore_unicode_prefix
+from pyspark.serializers import BatchedSerializer, PickleSerializer, \
     UTF8Deserializer
 from pyspark.storagelevel import StorageLevel
 from pyspark.traceback_utils import SCCallSiteSync
 from pyspark.sql.types import _parse_datatype_json_string
 from pyspark.sql.column import Column, _to_seq, _to_list, _to_java_column
 from pyspark.sql.readwriter import DataFrameWriter
 from pyspark.sql.streaming import DataStreamWriter
-from pyspark.sql.types import IntegralType
 from pyspark.sql.types import *
-from pyspark.util import _exception_message
+from pyspark.sql.pandas.conversion import PandasConversionMixin
+from pyspark.sql.pandas.map_ops import PandasMapOpsMixin
 
 __all__ = ["DataFrame", "DataFrameNaFunctions", "DataFrameStatFunctions"]
 
 
-class DataFrame(object):
+class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
 
 Review comment:
   @cloud-fan 
   
   > I think both are fine. It's internal so we can change it later.
   
   It is hardly internal: the mixin classes are "public" (as far as it is 
meaningful to talk about access control in Python) and, in contrast to the 
other changes proposed here, all of the moved non-static methods are part of 
the external API. The impact so far is small though, if that's what you mean.
   
   @HyukjinKwon 
   
   > Also, I believe what I am doing is what self type trait is supposed to be 
doing. It's coupled to specific type and other types cannot implement this 
trait.
   
   Such patterns, or their closest Python equivalents (see, for example, Django 
mixins for class-based views), typically indicate two things:
   
   - Potential for inheritance, which is clearly not the case, given both the 
Spark API and the design of the `DataFrame` class.
   - Non-obligatory character, which once again is not the case.
   
   So I guess the question I am trying to ask is: "what planned future changes 
justify such a move?" For now it seems mostly unnecessary, and the lower-effort 
path, given the implied time pressure, would be to keep things as-is.
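
   To make the coupling concrete, here is a minimal sketch. The names 
`ConversionMixin`, `Frame` and `to_records` are hypothetical and are not part 
of Spark; the sketch only assumes a mixin whose methods rely on attributes 
that a single host class provides, which is roughly the situation discussed 
above.

```python
class ConversionMixin:
    """Hypothetical mixin: usable only by a host that defines
    `columns` and `collect()`, much like a self-typed trait."""

    def to_records(self):
        # Relies on attributes the mixin itself never defines.
        return [dict(zip(self.columns, row)) for row in self.collect()]


class Frame(ConversionMixin):
    """Hypothetical host class; the only intended user of the mixin."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self._rows = [tuple(row) for row in rows]

    def collect(self):
        return list(self._rows)


frame = Frame(["id", "name"], [(1, "a"), (2, "b")])
records = frame.to_records()

# The mixed-in method is reachable as ordinary public API of the host.
is_public_on_host = hasattr(Frame, "to_records")

# The mixin is not usable on its own: it lacks `columns` and `collect()`.
try:
    ConversionMixin().to_records()
    standalone_error = None
except AttributeError as exc:
    standalone_error = exc
```

   In this sketch `to_records` becomes part of the public surface of `Frame`, 
while `ConversionMixin` is neither reusable by other classes nor optional for 
`Frame`.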

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
