Github user xuanyuanking commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21370#discussion_r192282041
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -351,8 +352,70 @@ def show(self, n=20, truncate=True, vertical=False):
             else:
                 print(self._jdf.showString(n, int(truncate), vertical))
     
    +    @property
    +    def _eager_eval(self):
    +        """Returns true if the eager evaluation enabled.
    +        """
    +        return self.sql_ctx.getConf(
    +            "spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
    +
    +    @property
    +    def _max_num_rows(self):
    +        """Returns the max row number for eager evaluation.
    +        """
    +        return int(self.sql_ctx.getConf(
    +            "spark.sql.repl.eagerEval.maxNumRows", "20"))
    +
    +    @property
    +    def _truncate(self):
    +        """Returns the truncate length for eager evaluation.
    +        """
    +        return int(self.sql_ctx.getConf(
    +            "spark.sql.repl.eagerEval.truncate", "20"))
    +
         def __repr__(self):
    -        return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
    +        if not self._support_repr_html and self._eager_eval:
    +            vertical = False
    +            return self._jdf.showString(
    +                self._max_num_rows, self._truncate, vertical)
    +        else:
    +            return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
    +
    +    def _repr_html_(self):
    +        """Returns a dataframe with html code when you enabled eager 
evaluation
    +        by 'spark.sql.repl.eagerEval.enabled', this only called by REPL 
you are
    +        using support eager evaluation with HTML.
    +        """
    +        import cgi
    +        if not self._support_repr_html:
    +            self._support_repr_html = True
    +        if self._eager_eval:
    +            max_num_rows = self._max_num_rows
    --- End diff --
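
    A minimal usage sketch of the configs in the diff above (not part of the PR, just for context; it assumes a running `SparkSession` named `spark`):

```python
# Sketch: turn on eager evaluation so a bare DataFrame expression in the REPL
# prints its first rows instead of only the schema. The config keys come from
# the diff above; the values here are illustrative.
spark.conf.set("spark.sql.repl.eagerEval.enabled", "true")
spark.conf.set("spark.sql.repl.eagerEval.maxNumRows", "10")
spark.conf.set("spark.sql.repl.eagerEval.truncate", "20")

df = spark.range(3)
df  # __repr__ / _repr_html_ now render the rows, capped at maxNumRows
```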
    
    Yes, but I do this on the Scala side in `getRowsToPython`. Link here:
https://github.com/apache/spark/pull/21370/files/9c6b3bbc430ffbcb752dc9870df877728f356cb8#diff-7a46f10c3cedbf013cf255564d9483cdR3229
    This is because during my test I found that Python's `sys.maxsize` is actually a long equal to 2^63 - 1, while Scala's `Int.MaxValue` is 2^31 - 1.
    
![image](https://user-images.githubusercontent.com/4833765/40816707-fb9f1eee-6580-11e8-9a24-9667aadc5177.png)
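
    To make the numbers concrete, a small illustrative check (not from the PR; the assertion assumes a 64-bit Python build):

```python
import sys

# Python's sys.maxsize on a 64-bit build is 2**63 - 1, which does not fit in a
# JVM Int (max 2**31 - 1). That is why the row-count cap is applied on the
# Scala side in getRowsToPython instead of passing the Python value through.
print(sys.maxsize)   # 9223372036854775807 == 2**63 - 1
print(2**31 - 1)     # 2147483647 == Scala's Int.MaxValue
assert sys.maxsize == 2**63 - 1
```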


