[GitHub] spark pull request #21654: [SPARK-24671][PySpark] DataFrame length using a d...

rgbkrk Mon, 10 Sep 2018 10:49:53 -0700

Github user rgbkrk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21654#discussion_r216414567
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -375,6 +375,9 @@ def _truncate(self):
             return int(self.sql_ctx.getConf(
                 "spark.sql.repl.eagerEval.truncate", "20"))
     
    +    def __len__(self):
    --- End diff --
    
    I'd argue for bringing this in, as long as you don't think we're handing 
people a footgun where they'd often call `len()` on a DataFrame without 
realizing it. As for making a plan around built-in function support, I'm happy 
to be part of a `_repr_*_` campaign. I don't have the background to participate 
in the others (`__lt__`, etc.), since I couldn't weigh their maintainability, 
performance, and utility the way I could for visual elements like reprs.
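
To make the footgun concern concrete, here is a minimal sketch in plain Python. `ToyFrame` and its `count_calls` counter are hypothetical stand-ins, not PySpark code, and the assumption that `__len__` simply delegates to `count()` is mine, since the diff above is cut off before the method body. It shows that Python calls `__len__` implicitly for truthiness checks when no `__bool__` is defined, so an innocent `if df:` would pay for a full count.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame; not PySpark code."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.count_calls = 0  # how many times the "expensive" count ran

    def count(self):
        # In a real engine this could launch a distributed job.
        self.count_calls += 1
        return len(self._rows)

    def __len__(self):
        # Assumption: __len__ simply delegates to count().
        return self.count()


df = ToyFrame([1, 2, 3])

n = len(df)  # explicit call: one count
if df:       # implicit call: truthiness falls back to __len__
    pass

print(n, df.count_calls)
```

Defining `__bool__` alongside `__len__` would be one way to keep truthiness checks from triggering a count, though whether that is desirable is a separate design question.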



---

