[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194641579
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -3209,6 +3222,19 @@ class Dataset[T] private[sql](
 }
   }
 
+  private[sql] def getRowsToPython(
+  _numRows: Int,
+  truncate: Int,
+  vertical: Boolean): Array[Any] = {
+EvaluatePython.registerPicklers()
+val numRows = _numRows.max(0).min(Int.MaxValue - 1)
+val rows = getRows(numRows, truncate, vertical).map(_.toArray).toArray
+val toJava: (Any) => Any = EvaluatePython.toJava(_, ArrayType(ArrayType(StringType)))
+val iter: Iterator[Array[Byte]] = new SerDeUtil.AutoBatchedPickler(
+  rows.iterator.map(toJava))
+PythonRDD.serveIterator(iter, "serve-GetRows")
--- End diff --

I think `PythonRDD.serveIterator` returns `Array[Any]` too.


https://github.com/apache/spark/blob/628c7b517969c4a7ccb26ea67ab3dd61266073ca/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala#L400

Did I maybe miss something?


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194629747
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -3209,6 +3222,19 @@ class Dataset[T] private[sql](
 }
   }
 
+  private[sql] def getRowsToPython(
+  _numRows: Int,
+  truncate: Int,
+  vertical: Boolean): Array[Any] = {
+EvaluatePython.registerPicklers()
+val numRows = _numRows.max(0).min(Int.MaxValue - 1)
+val rows = getRows(numRows, truncate, vertical).map(_.toArray).toArray
+val toJava: (Any) => Any = EvaluatePython.toJava(_, ArrayType(ArrayType(StringType)))
+val iter: Iterator[Array[Byte]] = new SerDeUtil.AutoBatchedPickler(
+  rows.iterator.map(toJava))
+PythonRDD.serveIterator(iter, "serve-GetRows")
--- End diff --

`PythonRDD.serveIterator(iter, "serve-GetRows")` returns `Int`, but the return type of `getRowsToPython` is `Array[Any]`. How does it work? cc @xuanyuanking @HyukjinKwon


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194292067
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +354,68 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def _eager_eval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+
+@property
+def _max_num_rows(self):
+"""Returns the max row number for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", "20"))
+
+@property
+def _truncate(self):
+"""Returns the truncate length for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", "20"))
+
 def __repr__(self):
--- End diff --

This PR also changed `__repr__`. Thus, we need to update the PR title and 
description. 


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194287915
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +354,68 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def _eager_eval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
--- End diff --

In the ongoing release, a nice-to-have refactoring is to move all the Core confs into a single file, just like what we did for the Spark SQL conf: default values, boundary checking, types, and descriptions in one place. Thus, in PySpark, it would be better to start doing it now.
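
A minimal sketch of what that could look like on the PySpark side, assuming a hypothetical module (not part of this PR) that keeps the eager-evaluation keys, defaults, and conversions in one place instead of scattering string literals:

```python
# pyspark/sql/repl_conf.py -- a hypothetical module, purely illustrative
from collections import namedtuple

ConfEntry = namedtuple("ConfEntry", ["key", "default", "converter"])

EAGER_EVAL_ENABLED = ConfEntry(
    "spark.sql.repl.eagerEval.enabled", "false", lambda v: v.lower() == "true")
EAGER_EVAL_MAX_NUM_ROWS = ConfEntry(
    "spark.sql.repl.eagerEval.maxNumRows", "20", int)
EAGER_EVAL_TRUNCATE = ConfEntry(
    "spark.sql.repl.eagerEval.truncate", "20", int)


def read_conf(sql_ctx, entry):
    """Read one entry through SQLContext.getConf and convert it to a Python value."""
    return entry.converter(sql_ctx.getConf(entry.key, entry.default))
```

With something like this, the `_eager_eval` property would reduce to `read_conf(self.sql_ctx, EAGER_EVAL_ENABLED)` and the key strings would live in a single file.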


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194278100
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +354,68 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def _eager_eval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
--- End diff --

Probably we should access the SQLConf object. 1. I agree with not hardcoding it in general, but 2. IMHO I want to avoid Py4J JVM accesses in the test because, to my knowledge, that tends to make the test flakier (unlike on the Scala or Java side).

Maybe we should take another look at this hardcoding if we see more occurrences next time.


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194277542
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,33 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.sql.repl.eagerEval.enabled
+  false
+  
+Enable eager evaluation or not. If true and the REPL you are using 
supports eager evaluation,
--- End diff --

Just a question: when the REPL does not support eager evaluation, could we do anything better than silently ignoring the user's setting?


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194277082
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -3209,6 +3222,19 @@ class Dataset[T] private[sql](
 }
   }
 
+  private[sql] def getRowsToPython(
--- End diff --

In DataFrameSuite, we have multiple test cases for `showString`, but none yet for `getRows`, which is introduced in this PR.

We also need unit test cases for `getRowsToPython`.
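
The ask here is Scala-side DataFrameSuite coverage; purely as an illustration, a rough PySpark-side smoke test of the same path could look like the sketch below. It assumes `getRowsToPython` stays reachable through `df._jdf` and that `_load_from_socket` lives in `pyspark.rdd`, as in the 2.x code base.

```python
def test_get_rows_to_python_roundtrip(self):
    # Hypothetical smoke test: pull the rendered rows back through the same
    # socket path that _repr_html_ uses and check the header row survives.
    from pyspark.rdd import _load_from_socket
    from pyspark.serializers import BatchedSerializer, PickleSerializer

    df = self.spark.createDataFrame([(1, "1"), (2, "2")], ("key", "value"))
    sock_info = df._jdf.getRowsToPython(10, 20, False)
    rows = list(_load_from_socket(sock_info, BatchedSerializer(PickleSerializer())))
    self.assertEqual(["key", "value"], rows[0])
```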


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194276795
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,33 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
--- End diff --

These confs are not part of `spark.sql("SET -v").show(numRows = 200, truncate = false)`.
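
A quick way to see that from PySpark, assuming an active `spark` session (this just illustrates the observation, it is not a test from the PR):

```python
# The eagerEval keys are read with hard-coded defaults on the Python side and
# are not registered in SQLConf, so they do not show up in the SET -v output.
spark.sql("SET -v") \
    .filter("key LIKE 'spark.sql.repl.eagerEval%'") \
    .show(truncate=False)
```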


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194276735
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +354,68 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def _eager_eval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
--- End diff --

Is that possible we can avoid hard-coding these conf key values? cc @ueshin 
@HyukjinKwon 


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194276557
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,33 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
--- End diff --

All the SQL configurations should follow what we did in the `Spark SQL` section of the docs: https://spark.apache.org/docs/latest/configuration.html#spark-sql.




---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194276329
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +354,68 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def _eager_eval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+
+@property
+def _max_num_rows(self):
+"""Returns the max row number for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", "20"))
+
+@property
+def _truncate(self):
+"""Returns the truncate length for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", "20"))
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+if not self._support_repr_html and self._eager_eval:
+vertical = False
+return self._jdf.showString(
+self._max_num_rows, self._truncate, vertical)
+else:
+return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+
+def _repr_html_(self):
+"""Returns a dataframe with html code when you enabled eager 
evaluation
+by 'spark.sql.repl.eagerEval.enabled', this only called by REPL 
you are
+using support eager evaluation with HTML.
+"""
+import cgi
+if not self._support_repr_html:
+self._support_repr_html = True
+if self._eager_eval:
+max_num_rows = max(self._max_num_rows, 0)
+vertical = False
--- End diff --

Any discussion about this?


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194276298
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -3209,6 +3222,19 @@ class Dataset[T] private[sql](
 }
   }
 
+  private[sql] def getRowsToPython(
+  _numRows: Int,
+  truncate: Int,
+  vertical: Boolean): Array[Any] = {
+EvaluatePython.registerPicklers()
+val numRows = _numRows.max(0).min(Int.MaxValue - 1)
--- End diff --

This should also be part of the conf description.
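
The clamp in the quoted line means any negative `maxNumRows` behaves as 0 and the upper bound stays just below `Int.MaxValue`; a tiny Python illustration of the same rule (the concrete bound is an implementation detail of the Scala side):

```python
def clamp_num_rows(requested):
    # Mirrors `_numRows.max(0).min(Int.MaxValue - 1)` from the quoted diff.
    INT_MAX = 2 ** 31 - 1
    return min(max(requested, 0), INT_MAX - 1)

assert clamp_num_rows(-5) == 0
assert clamp_num_rows(10) == 10
```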


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194276179
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -3074,6 +3074,36 @@ def test_checking_csv_header(self):
 finally:
 shutil.rmtree(path)
 
+def test_repr_html(self):
--- End diff --

This function only covers the most basic positive case. We also need to add more test cases, for example, the results when `spark.sql.repl.eagerEval.enabled` is set to `false`.
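
For instance, a sketch of the negative case (it reuses the helpers visible in the quoted test and assumes `_repr_html_` returns `None` when eager evaluation is off, which is what the existing test asserts before flipping the conf):

```python
def test_repr_html_disabled(self):
    df = self.spark.createDataFrame([(1, "1"), (2, "2")], ("key", "value"))
    # With spark.sql.repl.eagerEval.enabled left at its default of false,
    # _repr_html_ should return None and __repr__ should stay schema-only.
    self.spark.conf.set("spark.sql.repl.eagerEval.enabled", "false")
    self.assertIsNone(df._repr_html_())
    self.assertEqual("DataFrame[key: bigint, value: string]", df.__repr__())
```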


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194275282
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,33 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.sql.repl.eagerEval.enabled
+  false
+  
+Enable eager evaluation or not. If true and the REPL you are using 
supports eager evaluation,
+Dataset will be ran automatically. The HTML table which generated by 
_repl_html_
+called by notebooks like Jupyter will feedback the queries user have 
defined. For plain Python
+REPL, the output will be shown like dataframe.show()
+(see https://issues.apache.org/jira/browse/SPARK-24215";>SPARK-24215 for 
more details).
+  
+
+
+  spark.sql.repl.eagerEval.maxNumRows
+  20
+  
+Default number of rows in eager evaluation output HTML table generated 
by _repr_html_ or plain text,
+this only take effect when 
spark.sql.repl.eagerEval.enabled is set to true.
--- End diff --

take -> takes


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r194275288
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,33 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.sql.repl.eagerEval.enabled
+  false
+  
+Enable eager evaluation or not. If true and the REPL you are using 
supports eager evaluation,
+Dataset will be ran automatically. The HTML table which generated by 
_repl_html_
+called by notebooks like Jupyter will feedback the queries user have 
defined. For plain Python
+REPL, the output will be shown like dataframe.show()
+(see https://issues.apache.org/jira/browse/SPARK-24215";>SPARK-24215 for 
more details).
+  
+
+
+  spark.sql.repl.eagerEval.maxNumRows
+  20
+  
+Default number of rows in eager evaluation output HTML table generated 
by _repr_html_ or plain text,
+this only take effect when 
spark.sql.repl.eagerEval.enabled is set to true.
+  
+
+
+  spark.sql.repl.eagerEval.truncate
+  20
+  
+Default number of truncate in eager evaluation output HTML table 
generated by _repr_html_ or
+plain text, this only take effect when 
spark.sql.repl.eagerEval.enabled set to true.
--- End diff --

take -> takes


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21370


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-04 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192772218
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +354,70 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def _eager_eval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+
+@property
+def _max_num_rows(self):
+"""Returns the max row number for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", "20"))
+
+@property
+def _truncate(self):
+"""Returns the truncate length for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", "20"))
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+if not self._support_repr_html and self._eager_eval:
+vertical = False
+return self._jdf.showString(
+self._max_num_rows, self._truncate, vertical)
+else:
+return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+
+def _repr_html_(self):
+"""Returns a dataframe with html code when you enabled eager 
evaluation
+by 'spark.sql.repl.eagerEval.enabled', this only called by REPL 
you are
+using support eager evaluation with HTML.
+"""
+import cgi
+if not self._support_repr_html:
+self._support_repr_html = True
+if self._eager_eval:
+max_num_rows = max(self._max_num_rows, 0)
+with SCCallSiteSync(self._sc) as css:
+vertical = False
+sock_info = self._jdf.getRowsToPython(
+max_num_rows, self._truncate, vertical)
+rows = list(_load_from_socket(sock_info, 
BatchedSerializer(PickleSerializer(
+head = rows[0]
+row_data = rows[1:]
+has_more_data = len(row_data) > max_num_rows
+row_data = row_data[0:max_num_rows]
+
+html = "\n"
+# generate table head
+html += "".join(map(lambda x: cgi.escape(x), head)) + 
"\n"
+# generate table rows
+for row in row_data:
+data = "" + "".join(map(lambda x: 
cgi.escape(x), row)) + \
+"\n"
+html += data
+html += "\n"
+if has_more_data:
+html += "only showing top %d %s\n" % (
--- End diff --

Maybe we need this? I just want to keep it the same as `df.show()`.


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-04 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192772009
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +354,70 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def _eager_eval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+
+@property
+def _max_num_rows(self):
+"""Returns the max row number for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", "20"))
+
+@property
+def _truncate(self):
+"""Returns the truncate length for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", "20"))
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+if not self._support_repr_html and self._eager_eval:
+vertical = False
+return self._jdf.showString(
+self._max_num_rows, self._truncate, vertical)
+else:
+return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+
+def _repr_html_(self):
+"""Returns a dataframe with html code when you enabled eager 
evaluation
+by 'spark.sql.repl.eagerEval.enabled', this only called by REPL 
you are
+using support eager evaluation with HTML.
+"""
+import cgi
+if not self._support_repr_html:
+self._support_repr_html = True
+if self._eager_eval:
+max_num_rows = max(self._max_num_rows, 0)
+with SCCallSiteSync(self._sc) as css:
+vertical = False
+sock_info = self._jdf.getRowsToPython(
+max_num_rows, self._truncate, vertical)
+rows = list(_load_from_socket(sock_info, 
BatchedSerializer(PickleSerializer(
+head = rows[0]
+row_data = rows[1:]
+has_more_data = len(row_data) > max_num_rows
+row_data = row_data[0:max_num_rows]
+
+html = "\n"
+# generate table head
+html += "".join(map(lambda x: cgi.escape(x), head)) + 
"\n"
+# generate table rows
+for row in row_data:
+data = "" + "".join(map(lambda x: 
cgi.escape(x), row)) + \
+"\n"
--- End diff --

Thanks, much clearer.


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-04 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192771951
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -3040,6 +3040,36 @@ def test_csv_sampling_ratio(self):
 .csv(rdd, samplingRatio=0.5).schema
 self.assertEquals(schema, StructType([StructField("_c0", 
IntegerType(), True)]))
 
+def test_repr_html(self):
+import re
+pattern = re.compile(r'^ *\|', re.MULTILINE)
+df = self.spark.createDataFrame([(1, "1"), (2, "2")], 
("key", "value"))
+self.assertEquals(None, df._repr_html_())
+self.spark.conf.set("spark.sql.repl.eagerEval.enabled", "true")
--- End diff --

Thanks, done in the next commit.


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-04 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192771831
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +354,70 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def _eager_eval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+
+@property
+def _max_num_rows(self):
+"""Returns the max row number for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", "20"))
+
+@property
+def _truncate(self):
+"""Returns the truncate length for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", "20"))
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+if not self._support_repr_html and self._eager_eval:
+vertical = False
+return self._jdf.showString(
+self._max_num_rows, self._truncate, vertical)
+else:
+return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+
+def _repr_html_(self):
+"""Returns a dataframe with html code when you enabled eager 
evaluation
+by 'spark.sql.repl.eagerEval.enabled', this only called by REPL 
you are
+using support eager evaluation with HTML.
+"""
+import cgi
+if not self._support_repr_html:
+self._support_repr_html = True
+if self._eager_eval:
+max_num_rows = max(self._max_num_rows, 0)
+with SCCallSiteSync(self._sc) as css:
+vertical = False
+sock_info = self._jdf.getRowsToPython(
+max_num_rows, self._truncate, vertical)
+rows = list(_load_from_socket(sock_info, 
BatchedSerializer(PickleSerializer(
+head = rows[0]
+row_data = rows[1:]
+has_more_data = len(row_data) > max_num_rows
+row_data = row_data[0:max_num_rows]
+
+html = "\n"
+# generate table head
+html += "".join(map(lambda x: cgi.escape(x), head)) + 
"\n"
--- End diff --

Thanks, much clearer.


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-04 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192771787
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +354,70 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def _eager_eval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+
+@property
+def _max_num_rows(self):
+"""Returns the max row number for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", "20"))
+
+@property
+def _truncate(self):
+"""Returns the truncate length for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", "20"))
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+if not self._support_repr_html and self._eager_eval:
+vertical = False
+return self._jdf.showString(
+self._max_num_rows, self._truncate, vertical)
+else:
+return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+
+def _repr_html_(self):
+"""Returns a dataframe with html code when you enabled eager 
evaluation
+by 'spark.sql.repl.eagerEval.enabled', this only called by REPL 
you are
+using support eager evaluation with HTML.
+"""
+import cgi
+if not self._support_repr_html:
+self._support_repr_html = True
+if self._eager_eval:
+max_num_rows = max(self._max_num_rows, 0)
+with SCCallSiteSync(self._sc) as css:
+vertical = False
+sock_info = self._jdf.getRowsToPython(
+max_num_rows, self._truncate, vertical)
+rows = list(_load_from_socket(sock_info, 
BatchedSerializer(PickleSerializer(
+head = rows[0]
+row_data = rows[1:]
+has_more_data = len(row_data) > max_num_rows
+row_data = row_data[0:max_num_rows]
--- End diff --

Thanks, done in the next commit.


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-04 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192771103
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +354,70 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def _eager_eval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+
+@property
+def _max_num_rows(self):
+"""Returns the max row number for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", "20"))
+
+@property
+def _truncate(self):
+"""Returns the truncate length for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", "20"))
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+if not self._support_repr_html and self._eager_eval:
+vertical = False
+return self._jdf.showString(
+self._max_num_rows, self._truncate, vertical)
+else:
+return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+
+def _repr_html_(self):
+"""Returns a dataframe with html code when you enabled eager 
evaluation
+by 'spark.sql.repl.eagerEval.enabled', this only called by REPL 
you are
+using support eager evaluation with HTML.
+"""
+import cgi
+if not self._support_repr_html:
+self._support_repr_html = True
+if self._eager_eval:
+max_num_rows = max(self._max_num_rows, 0)
+with SCCallSiteSync(self._sc) as css:
--- End diff --

Thanks, deleted it.


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192610559
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +354,70 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def _eager_eval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+
+@property
+def _max_num_rows(self):
+"""Returns the max row number for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", "20"))
+
+@property
+def _truncate(self):
+"""Returns the truncate length for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", "20"))
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+if not self._support_repr_html and self._eager_eval:
+vertical = False
+return self._jdf.showString(
+self._max_num_rows, self._truncate, vertical)
+else:
+return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+
+def _repr_html_(self):
+"""Returns a dataframe with html code when you enabled eager 
evaluation
+by 'spark.sql.repl.eagerEval.enabled', this only called by REPL 
you are
+using support eager evaluation with HTML.
+"""
+import cgi
+if not self._support_repr_html:
+self._support_repr_html = True
+if self._eager_eval:
+max_num_rows = max(self._max_num_rows, 0)
+with SCCallSiteSync(self._sc) as css:
+vertical = False
+sock_info = self._jdf.getRowsToPython(
+max_num_rows, self._truncate, vertical)
+rows = list(_load_from_socket(sock_info, 
BatchedSerializer(PickleSerializer(
+head = rows[0]
+row_data = rows[1:]
+has_more_data = len(row_data) > max_num_rows
+row_data = row_data[0:max_num_rows]
+
+html = "\n"
+# generate table head
+html += "".join(map(lambda x: cgi.escape(x), head)) + 
"\n"
+# generate table rows
+for row in row_data:
+data = "" + "".join(map(lambda x: 
cgi.escape(x), row)) + \
+"\n"
--- End diff --

ditto:

```
"%s\n" % "".join(map(lambda x: cgi.escape(x), 
row))
```
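
The archive has stripped the HTML markup from both the diff and the suggestion above; a hedged reconstruction of the %-formatting style being proposed, where the exact `<tr>`/`<th>`/`<td>` tag strings are assumptions, might look like:

```python
import cgi  # cgi.escape exists in the Python 2 / early Python 3 versions this code targets

head = ["key", "value"]
row = ["1", "a"]

# %-formatting instead of string concatenation; the tag strings are assumed,
# since the archive dropped the original markup.
header_html = "<tr><th>%s</th></tr>\n" % "</th><th>".join(map(lambda x: cgi.escape(x), head))
row_html = "<tr><td>%s</td></tr>\n" % "</td><td>".join(map(lambda x: cgi.escape(x), row))
```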


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192610390
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +354,70 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def _eager_eval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+
+@property
+def _max_num_rows(self):
+"""Returns the max row number for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", "20"))
+
+@property
+def _truncate(self):
+"""Returns the truncate length for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", "20"))
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+if not self._support_repr_html and self._eager_eval:
+vertical = False
+return self._jdf.showString(
+self._max_num_rows, self._truncate, vertical)
+else:
+return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+
+def _repr_html_(self):
+"""Returns a dataframe with html code when you enabled eager 
evaluation
+by 'spark.sql.repl.eagerEval.enabled', this only called by REPL 
you are
+using support eager evaluation with HTML.
+"""
+import cgi
+if not self._support_repr_html:
+self._support_repr_html = True
+if self._eager_eval:
+max_num_rows = max(self._max_num_rows, 0)
+with SCCallSiteSync(self._sc) as css:
+vertical = False
+sock_info = self._jdf.getRowsToPython(
+max_num_rows, self._truncate, vertical)
+rows = list(_load_from_socket(sock_info, 
BatchedSerializer(PickleSerializer(
+head = rows[0]
+row_data = rows[1:]
+has_more_data = len(row_data) > max_num_rows
+row_data = row_data[0:max_num_rows]
--- End diff --

tiny nit: `row_data[:max_num_rows]`


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192610512
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +354,70 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def _eager_eval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+
+@property
+def _max_num_rows(self):
+"""Returns the max row number for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", "20"))
+
+@property
+def _truncate(self):
+"""Returns the truncate length for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", "20"))
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+if not self._support_repr_html and self._eager_eval:
+vertical = False
+return self._jdf.showString(
+self._max_num_rows, self._truncate, vertical)
+else:
+return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+
+def _repr_html_(self):
+"""Returns a dataframe with html code when you enabled eager 
evaluation
+by 'spark.sql.repl.eagerEval.enabled', this only called by REPL 
you are
+using support eager evaluation with HTML.
+"""
+import cgi
+if not self._support_repr_html:
+self._support_repr_html = True
+if self._eager_eval:
+max_num_rows = max(self._max_num_rows, 0)
+with SCCallSiteSync(self._sc) as css:
+vertical = False
+sock_info = self._jdf.getRowsToPython(
+max_num_rows, self._truncate, vertical)
+rows = list(_load_from_socket(sock_info, 
BatchedSerializer(PickleSerializer(
+head = rows[0]
+row_data = rows[1:]
+has_more_data = len(row_data) > max_num_rows
+row_data = row_data[0:max_num_rows]
+
+html = "\n"
+# generate table head
+html += "".join(map(lambda x: cgi.escape(x), head)) + 
"\n"
--- End diff --

maybe:

```
"%s\n" % "".join(map(lambda x: cgi.escape(x), 
head))
```


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192610308
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +354,70 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def _eager_eval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+
+@property
+def _max_num_rows(self):
+"""Returns the max row number for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", "20"))
+
+@property
+def _truncate(self):
+"""Returns the truncate length for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", "20"))
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+if not self._support_repr_html and self._eager_eval:
+vertical = False
+return self._jdf.showString(
+self._max_num_rows, self._truncate, vertical)
+else:
+return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+
+def _repr_html_(self):
+"""Returns a dataframe with html code when you enabled eager 
evaluation
+by 'spark.sql.repl.eagerEval.enabled', this only called by REPL 
you are
+using support eager evaluation with HTML.
+"""
+import cgi
+if not self._support_repr_html:
+self._support_repr_html = True
+if self._eager_eval:
+max_num_rows = max(self._max_num_rows, 0)
+with SCCallSiteSync(self._sc) as css:
--- End diff --

`css` seems unused.


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192610839
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -3040,6 +3040,36 @@ def test_csv_sampling_ratio(self):
 .csv(rdd, samplingRatio=0.5).schema
 self.assertEquals(schema, StructType([StructField("_c0", 
IntegerType(), True)]))
 
+def test_repr_html(self):
+import re
+pattern = re.compile(r'^ *\|', re.MULTILINE)
+df = self.spark.createDataFrame([(1, "1"), (2, "2")], 
("key", "value"))
+self.assertEquals(None, df._repr_html_())
+self.spark.conf.set("spark.sql.repl.eagerEval.enabled", "true")
--- End diff --

Can we use `with self.sql_conf(...)`?
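
That is, something along these lines (a sketch; `self.sql_conf` is the context manager referred to above, assumed here to take a dict of key/value pairs and to restore the previous values on exit):

```python
def test_repr_html(self):
    df = self.spark.createDataFrame([(1, "1"), (2, "2")], ("key", "value"))
    self.assertIsNone(df._repr_html_())
    # Scope the conf change to the block so the test does not leak state.
    with self.sql_conf({"spark.sql.repl.eagerEval.enabled": True}):
        self.assertIsNotNone(df._repr_html_())
```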


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192610620
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +354,70 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def _eager_eval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+
+@property
+def _max_num_rows(self):
+"""Returns the max row number for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", "20"))
+
+@property
+def _truncate(self):
+"""Returns the truncate length for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", "20"))
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+if not self._support_repr_html and self._eager_eval:
+vertical = False
+return self._jdf.showString(
+self._max_num_rows, self._truncate, vertical)
+else:
+return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+
+def _repr_html_(self):
+"""Returns a dataframe with html code when you enabled eager 
evaluation
+by 'spark.sql.repl.eagerEval.enabled', this only called by REPL 
you are
+using support eager evaluation with HTML.
+"""
+import cgi
+if not self._support_repr_html:
+self._support_repr_html = True
+if self._eager_eval:
+max_num_rows = max(self._max_num_rows, 0)
+with SCCallSiteSync(self._sc) as css:
+vertical = False
+sock_info = self._jdf.getRowsToPython(
+max_num_rows, self._truncate, vertical)
+rows = list(_load_from_socket(sock_info, 
BatchedSerializer(PickleSerializer(
+head = rows[0]
+row_data = rows[1:]
+has_more_data = len(row_data) > max_num_rows
+row_data = row_data[0:max_num_rows]
+
+html = "\n"
+# generate table head
+html += "".join(map(lambda x: cgi.escape(x), head)) + 
"\n"
+# generate table rows
+for row in row_data:
+data = "" + "".join(map(lambda x: 
cgi.escape(x), row)) + \
+"\n"
+html += data
+html += "\n"
+if has_more_data:
+html += "only showing top %d %s\n" % (
--- End diff --

I'd just say `row(s)`. We don't have to be super clever about this.
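
That is, roughly (a sketch of the suggested wording, not the exact diff):

```python
html += "only showing top %d row(s)\n" % max_num_rows
```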


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-01 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192548464
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,33 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.sql.repl.eagerEval.enabled
+  false
+  
+Enable eager evaluation or not. If true and REPL you are using 
supports eager evaluation,
+dataframe will be ran automatically. HTML table will feedback the 
queries user have defined if
--- End diff --

The HTML table is generated by `_repr_html_`; it isn't a Jupyter-only term. `_repr_html_` is the rich display hook IPython supports in the notebook and the Qt console. I think it can be used in other places, but currently I have only tested it in Jupyter. I rewrote the doc; please check whether it is appropriate, thanks.
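
A minimal standalone illustration of that protocol, independent of Spark: any object that defines `_repr_html_` is rendered as HTML by IPython front ends such as the Jupyter notebook or the Qt console, while a plain Python REPL only ever sees `__repr__`.

```python
class Example(object):
    def _repr_html_(self):
        # Picked up by IPython's rich display machinery (notebook, Qt console, ...).
        return "<table><tr><td>rendered as an HTML table</td></tr></table>"

    def __repr__(self):
        # Fallback used by the plain Python REPL (and by print()).
        return "Example()"
```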


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-01 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192548359
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,33 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.sql.repl.eagerEval.enabled
+  false
+  
+Enable eager evaluation or not. If true and REPL you are using 
supports eager evaluation,
+dataframe will be ran automatically. HTML table will feedback the 
queries user have defined if
+_repl_html_ called by notebooks like Jupyter, otherwise 
for plain Python REPL, output
--- End diff --

Thanks, done in 5b36604.


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-01 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192548352
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,33 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.sql.repl.eagerEval.enabled
+  false
+  
+Enable eager evaluation or not. If true and REPL you are using 
supports eager evaluation,
+dataframe will be ran automatically. HTML table will feedback the 
queries user have defined if
+_repl_html_ called by notebooks like Jupyter, otherwise 
for plain Python REPL, output
+will be shown like dataframe.show()
+(see https://issues.apache.org/jira/browse/SPARK-24215";>SPARK-24215 for 
more details).
+  
+
+
+  spark.sql.repl.eagerEval.maxNumRows
+  20
+  
+Default number of rows in eager evaluation output HTML table generated 
by _repr_html_ or plain text,
+this only take effect when 
spark.sql.repl.eagerEval.enabled set to true.
--- End diff --

Thanks, done in 5b36604.


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-01 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192548361
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,33 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.sql.repl.eagerEval.enabled
+  false
+  
+Enable eager evaluation or not. If true and REPL you are using 
supports eager evaluation,
--- End diff --

Thanks, done in 5b36604.


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-01 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192446664
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,33 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.sql.repl.eagerEval.enabled
+  false
+  
+Enable eager evaluation or not. If true and REPL you are using 
supports eager evaluation,
+dataframe will be ran automatically. HTML table will feedback the 
queries user have defined if
+_repl_html_ called by notebooks like Jupyter, otherwise 
for plain Python REPL, output
--- End diff --

`output ` -> `the output`


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-01 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192446542
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,33 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.sql.repl.eagerEval.enabled
+  false
+  
+Enable eager evaluation or not. If true and REPL you are using 
supports eager evaluation,
+dataframe will be ran automatically. HTML table will feedback the 
queries user have defined if
+_repl_html_ called by notebooks like Jupyter, otherwise 
for plain Python REPL, output
+will be shown like dataframe.show()
+(see https://issues.apache.org/jira/browse/SPARK-24215";>SPARK-24215 for 
more details).
+  
+
+
+  spark.sql.repl.eagerEval.maxNumRows
+  20
+  
+Default number of rows in eager evaluation output HTML table generated 
by _repr_html_ or plain text,
+this only take effect when 
spark.sql.repl.eagerEval.enabled set to true.
--- End diff --

`set to` -> `is set to`


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-01 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192446886
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,33 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.sql.repl.eagerEval.enabled
+  false
+  
+Enable eager evaluation or not. If true and REPL you are using 
supports eager evaluation,
--- End diff --

`REPL` -> `the REPL`


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-01 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192447943
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,33 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.sql.repl.eagerEval.enabled
+  false
+  
+Enable eager evaluation or not. If true and REPL you are using 
supports eager evaluation,
+dataframe will be ran automatically. HTML table will feedback the 
queries user have defined if
--- End diff --

`dataframe` -> `DataFrame/Dataset`

What is `HTML table`? Is the term used in Jupyter only?


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-01 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192349637
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+def _get_repl_config(self):
+"""Return the configs for eager evaluation each time when __repr__ 
or
+_repr_html_ called by user or notebook.
+"""
+eager_eval = self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+console_row = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", u"20"))
+console_truncate = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", u"20"))
+return (eager_eval, console_row, console_truncate)
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+(eager_eval, console_row, console_truncate) = 
self._get_repl_config()
+if not self._support_repr_html and eager_eval:
--- End diff --

I just want to avoid calling `_jdf` twice here, because the second call made by `__repr__` is useless when `_repr_html_` is supported: the return string of `__repr__` will not ultimately be shown in the notebook.


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-01 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192349210
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,70 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def _eager_eval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+
+@property
+def _max_num_rows(self):
+"""Returns the max row number for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", "20"))
+
+@property
+def _truncate(self):
+"""Returns the truncate length for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", "20"))
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+if not self._support_repr_html and self._eager_eval:
+vertical = False
+return self._jdf.showString(
+self._max_num_rows, self._truncate, vertical)
+else:
+return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+
+def _repr_html_(self):
+"""Returns a dataframe with html code when you enabled eager 
evaluation
+by 'spark.sql.repl.eagerEval.enabled', this only called by REPL 
you are
+using support eager evaluation with HTML.
+"""
+import cgi
+if not self._support_repr_html:
+self._support_repr_html = True
+if self._eager_eval:
+max_num_rows = self._max_num_rows
--- End diff --

Thanks, done in 7f43a8b.


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-01 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192349023
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,29 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.sql.repl.eagerEval.enabled
+  false
+  
+Enable eager evaluation or not. If true and REPL you are using 
supports eager evaluation,
+dataframe will be ran automatically and html table will feedback the 
queries user have defined
--- End diff --

Sorry for this... again. Fixed in 7f43a8b.


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-01 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192349075
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -291,37 +310,30 @@ class Dataset[T] private[sql](
 }
   }
 
+  val paddedRows = rows.map { row =>
+row.zipWithIndex.map { case (cell, i) =>
+  if (truncate > 0) {
+StringUtils.leftPad(cell, colWidths(i))
+  } else {
+StringUtils.rightPad(cell, colWidths(i))
+  }
+}
+  }
+
   // Create SeparateLine
   val sep: String = colWidths.map("-" * _).addString(sb, "+", "+", 
"+\n").toString()
 
   // column names
-  rows.head.zipWithIndex.map { case (cell, i) =>
-if (truncate > 0) {
-  StringUtils.leftPad(cell, colWidths(i))
-} else {
-  StringUtils.rightPad(cell, colWidths(i))
-}
-  }.addString(sb, "|", "|", "|\n")
-
+  paddedRows.head.addString(sb, "|", "|", "|\n")
   sb.append(sep)
 
   // data
-  rows.tail.foreach {
-_.zipWithIndex.map { case (cell, i) =>
-  if (truncate > 0) {
-StringUtils.leftPad(cell.toString, colWidths(i))
-  } else {
-StringUtils.rightPad(cell.toString, colWidths(i))
-  }
-}.addString(sb, "|", "|", "|\n")
-  }
-
+  paddedRows.tail.foreach(_.addString(sb, "|", "|", "|\n"))
   sb.append(sep)
 } else {
   // Extended display mode enabled
   val fieldNames = rows.head
   val dataRows = rows.tail
-
--- End diff --

Thanks, done in 7f43a8b.


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-01 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192349063
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -78,6 +78,7 @@ def __init__(self, jdf, sql_ctx):
 self.is_cached = False
 self._schema = None  # initialized lazily
 self._lazy_rdd = None
+self._support_repr_html = False
--- End diff --

Got it, added more comments in 7f43a8b.


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-06-01 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192348972
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,29 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.sql.repl.eagerEval.enabled
+  false
+  
+Enable eager evaluation or not. If true and REPL you are using 
supports eager evaluation,
+dataframe will be ran automatically and html table will feedback the 
queries user have defined
+(see https://issues.apache.org/jira/browse/SPARK-24215";>SPARK-24215 for 
more details).
+  
+
+
+  spark.sql.repl.eagerEval.maxNumRows
+  20
+  
+Default number of rows in HTML table.
--- End diff --

Got it, added a more detailed description in 7f43a8b. Please check.


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-31 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192292200
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -78,6 +78,7 @@ def __init__(self, jdf, sql_ctx):
 self.is_cached = False
 self._schema = None  # initialized lazily
 self._lazy_rdd = None
+self._support_repr_html = False
--- End diff --

Shall we explain why we need this (as discussed in 
https://github.com/apache/spark/pull/21370#discussion_r191591799)? It took me a 
while to understand too.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-31 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192292453
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,29 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.sql.repl.eagerEval.enabled
+  false
+  
+Enable eager evaluation or not. If true and REPL you are using 
supports eager evaluation,
+dataframe will be ran automatically and html table will feedback the 
queries user have defined
+(see https://issues.apache.org/jira/browse/SPARK-24215";>SPARK-24215 for 
more details).
+  
+
+
+  spark.sql.repl.eagerEval.maxNumRows
+  20
+  
+Default number of rows in HTML table.
--- End diff --

Shall we explain a bit more what the HTML table here is? For example, I think at 
least we should say it's `_repr_html_`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-31 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192292278
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,29 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.sql.repl.eagerEval.enabled
+  false
+  
+Enable eager evaluation or not. If true and REPL you are using 
supports eager evaluation,
+dataframe will be ran automatically and html table will feedback the 
queries user have defined
--- End diff --

html -> HTML


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-31 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192291854
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -291,37 +310,30 @@ class Dataset[T] private[sql](
 }
   }
 
+  val paddedRows = rows.map { row =>
+row.zipWithIndex.map { case (cell, i) =>
+  if (truncate > 0) {
+StringUtils.leftPad(cell, colWidths(i))
+  } else {
+StringUtils.rightPad(cell, colWidths(i))
+  }
+}
+  }
+
   // Create SeparateLine
   val sep: String = colWidths.map("-" * _).addString(sb, "+", "+", 
"+\n").toString()
 
   // column names
-  rows.head.zipWithIndex.map { case (cell, i) =>
-if (truncate > 0) {
-  StringUtils.leftPad(cell, colWidths(i))
-} else {
-  StringUtils.rightPad(cell, colWidths(i))
-}
-  }.addString(sb, "|", "|", "|\n")
-
+  paddedRows.head.addString(sb, "|", "|", "|\n")
   sb.append(sep)
 
   // data
-  rows.tail.foreach {
-_.zipWithIndex.map { case (cell, i) =>
-  if (truncate > 0) {
-StringUtils.leftPad(cell.toString, colWidths(i))
-  } else {
-StringUtils.rightPad(cell.toString, colWidths(i))
-  }
-}.addString(sb, "|", "|", "|\n")
-  }
-
+  paddedRows.tail.foreach(_.addString(sb, "|", "|", "|\n"))
   sb.append(sep)
 } else {
   // Extended display mode enabled
   val fieldNames = rows.head
   val dataRows = rows.tail
-
--- End diff --

Shall we revive this newline back?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-31 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192291498
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,70 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def _eager_eval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+
+@property
+def _max_num_rows(self):
+"""Returns the max row number for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", "20"))
+
+@property
+def _truncate(self):
+"""Returns the truncate length for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", "20"))
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+if not self._support_repr_html and self._eager_eval:
+vertical = False
+return self._jdf.showString(
+self._max_num_rows, self._truncate, vertical)
+else:
+return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+
+def _repr_html_(self):
+"""Returns a dataframe with html code when you enabled eager 
evaluation
+by 'spark.sql.repl.eagerEval.enabled', this only called by REPL 
you are
+using support eager evaluation with HTML.
+"""
+import cgi
+if not self._support_repr_html:
+self._support_repr_html = True
+if self._eager_eval:
+max_num_rows = self._max_num_rows
--- End diff --

I see. I think it's okay with max(0) only.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-31 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192282041
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,70 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def _eager_eval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+
+@property
+def _max_num_rows(self):
+"""Returns the max row number for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", "20"))
+
+@property
+def _truncate(self):
+"""Returns the truncate length for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", "20"))
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+if not self._support_repr_html and self._eager_eval:
+vertical = False
+return self._jdf.showString(
+self._max_num_rows, self._truncate, vertical)
+else:
+return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+
+def _repr_html_(self):
+"""Returns a dataframe with html code when you enabled eager 
evaluation
+by 'spark.sql.repl.eagerEval.enabled', this only called by REPL 
you are
+using support eager evaluation with HTML.
+"""
+import cgi
+if not self._support_repr_html:
+self._support_repr_html = True
+if self._eager_eval:
+max_num_rows = self._max_num_rows
--- End diff --

Yes, but I do this on the Scala side in `getRowsToPython`. Link here: 
https://github.com/apache/spark/pull/21370/files/9c6b3bbc430ffbcb752dc9870df877728f356cb8#diff-7a46f10c3cedbf013cf255564d9483cdR3229
This is because during my test I found that Python's `sys.maxsize` is actually a 
long, 2^63 - 1, while Scala's `Int.MaxValue` is 2^31 - 1.

![image](https://user-images.githubusercontent.com/4833765/40816707-fb9f1eee-6580-11e8-9a24-9667aadc5177.png)
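A rough sketch of the size mismatch, assuming a 64-bit Python build (the clamp helper below is only illustrative, not part of the patch):

```python
import sys

# On a 64-bit build, sys.maxsize is 2**63 - 1, which does not fit into a JVM Int
# (2**31 - 1), so clamping on the Scala side keeps the value safe before it
# crosses the Py4J boundary.
JVM_INT_MAX = 2**31 - 1

print(sys.maxsize)                  # 9223372036854775807, i.e. 2**63 - 1
print(sys.maxsize > JVM_INT_MAX)    # True: the raw value would overflow a JVM Int

# Python equivalent of the Scala clamp `_numRows.max(0).min(Int.MaxValue - 1)`:
def clamp_num_rows(n):
    return max(0, min(n, JVM_INT_MAX - 1))

print(clamp_num_rows(sys.maxsize))  # 2147483646
print(clamp_num_rows(-5))           # 0
```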



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-31 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192209239
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+def _get_repl_config(self):
+"""Return the configs for eager evaluation each time when __repr__ 
or
+_repr_html_ called by user or notebook.
+"""
+eager_eval = self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+console_row = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", u"20"))
+console_truncate = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", u"20"))
+return (eager_eval, console_row, console_truncate)
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+(eager_eval, console_row, console_truncate) = 
self._get_repl_config()
+if not self._support_repr_html and eager_eval:
--- End diff --

I see, thanks.
I think it's okay, but I'm just curious why you want to restrict it?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-31 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192207299
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,70 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def _eager_eval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+
+@property
+def _max_num_rows(self):
+"""Returns the max row number for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", "20"))
+
+@property
+def _truncate(self):
+"""Returns the truncate length for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", "20"))
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+if not self._support_repr_html and self._eager_eval:
+vertical = False
+return self._jdf.showString(
+self._max_num_rows, self._truncate, vertical)
+else:
+return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+
+def _repr_html_(self):
+"""Returns a dataframe with html code when you enabled eager 
evaluation
+by 'spark.sql.repl.eagerEval.enabled', this only called by REPL 
you are
+using support eager evaluation with HTML.
+"""
+import cgi
+if not self._support_repr_html:
+self._support_repr_html = True
+if self._eager_eval:
+max_num_rows = self._max_num_rows
--- End diff --

We need to adjust `max_num_rows` in the same way as the Scala side, like `val 
numRows = _numRows.max(0).min(Int.MaxValue - 1)`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-31 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192167547
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -291,37 +289,57 @@ class Dataset[T] private[sql](
 }
   }
 
+  rows = rows.map {
+_.zipWithIndex.map { case (cell, i) =>
+  if (truncate > 0) {
+StringUtils.leftPad(cell, colWidths(i))
+  } else {
+StringUtils.rightPad(cell, colWidths(i))
+  }
+}
+  }
--- End diff --

Thanks, done in 9c6b3bb.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-31 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192167463
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,70 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def eagerEval(self):
--- End diff --

Thanks, done in 9c6b3bb.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-31 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192150368
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+def _get_repl_config(self):
+"""Return the configs for eager evaluation each time when __repr__ 
or
+_repr_html_ called by user or notebook.
+"""
+eager_eval = self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+console_row = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", u"20"))
+console_truncate = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", u"20"))
+return (eager_eval, console_row, console_truncate)
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+(eager_eval, console_row, console_truncate) = 
self._get_repl_config()
+if not self._support_repr_html and eager_eval:
--- End diff --

Yes, that's right.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-31 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r192147588
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -291,37 +289,57 @@ class Dataset[T] private[sql](
 }
   }
 
+  rows = rows.map {
+_.zipWithIndex.map { case (cell, i) =>
+  if (truncate > 0) {
+StringUtils.leftPad(cell, colWidths(i))
+  } else {
+StringUtils.rightPad(cell, colWidths(i))
+  }
+}
+  }
--- End diff --

Oh, I see, the padded rows are only useful in console mode, so they are not 
needed in the HTML code. I'll do this ASAP.
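A rough sketch of the distinction, with purely illustrative values (the patch imports `cgi` for escaping on the Python side; `html.escape` here is just the Python 3 stand-in):

```python
import html

# Padding matters for the console table because a fixed-width font does the
# alignment; the HTML path only needs to escape cells and let <table> markup
# handle layout.
row = ["Alice", "1"]
col_widths = [10, 5]

console_cells = [cell.rjust(width) for cell, width in zip(row, col_widths)]
print("|" + "|".join(console_cells) + "|")   # |     Alice|    1|

html_cells = "".join("<td>%s</td>" % html.escape(cell) for cell in row)
print("<tr>%s</tr>" % html_cells)            # <tr><td>Alice</td><td>1</td></tr>
```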


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-30 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191870129
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+def _get_repl_config(self):
+"""Return the configs for eager evaluation each time when __repr__ 
or
+_repr_html_ called by user or notebook.
+"""
+eager_eval = self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+console_row = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", u"20"))
+console_truncate = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", u"20"))
+return (eager_eval, console_row, console_truncate)
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+(eager_eval, console_row, console_truncate) = 
self._get_repl_config()
+if not self._support_repr_html and eager_eval:
+vertical = False
+return self._jdf.showString(
+console_row, console_truncate, vertical)
--- End diff --

Oh, I see. Good to know. Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-30 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191869090
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+def _get_repl_config(self):
+"""Return the configs for eager evaluation each time when __repr__ 
or
+_repr_html_ called by user or notebook.
+"""
+eager_eval = self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+console_row = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", u"20"))
+console_truncate = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", u"20"))
+return (eager_eval, console_row, console_truncate)
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+(eager_eval, console_row, console_truncate) = 
self._get_repl_config()
+if not self._support_repr_html and eager_eval:
--- End diff --

So you want to restrict `__repr__` to always return the original string 
like `"DataFrame[key: bigint, value: string]"` after `_repr_html_` is called?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-30 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191854612
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,70 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def eagerEval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+
+@property
+def maxNumRows(self):
--- End diff --

ditto.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-30 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191854703
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,70 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def eagerEval(self):
+"""Returns true if the eager evaluation enabled.
+"""
+return self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+
+@property
+def maxNumRows(self):
+"""Returns the max row number for eager evaluation.
+"""
+return int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", "20"))
+
+@property
+def truncate(self):
--- End diff --

ditto.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-30 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191854585
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,70 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def eagerEval(self):
--- End diff --

Btw, we should use snake case, e.g. `_eager_eval`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-30 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191853613
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -291,37 +289,57 @@ class Dataset[T] private[sql](
 }
   }
 
+  rows = rows.map {
+_.zipWithIndex.map { case (cell, i) =>
+  if (truncate > 0) {
+StringUtils.leftPad(cell, colWidths(i))
+  } else {
+StringUtils.rightPad(cell, colWidths(i))
+  }
+}
+  }
--- End diff --

Seems like the truncation is already done when creating `rows` above?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-30 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191854114
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,70 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+@property
+def eagerEval(self):
--- End diff --

Maybe we need `_`, e.g. `_eagerEval`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-30 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191702754
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -291,37 +289,57 @@ class Dataset[T] private[sql](
 }
   }
 
+  rows = rows.map {
+_.zipWithIndex.map { case (cell, i) =>
+  if (truncate > 0) {
+StringUtils.leftPad(cell, colWidths(i))
+  } else {
+StringUtils.rightPad(cell, colWidths(i))
+  }
+}
+  }
+}
+rows
+  }
+
+  /**
+   * Compose the string representing rows for output
+   *
+   * @param _numRows Number of rows to show
+   * @param truncate If set to more than 0, truncates strings to 
`truncate` characters and
+   *   all cells will be aligned right.
+   * @param vertical If set to true, prints output rows vertically (one 
line per column value).
+   */
+  private[sql] def showString(
+  _numRows: Int,
+  truncate: Int = 20,
+  vertical: Boolean = false): String = {
+val numRows = _numRows.max(0).min(Int.MaxValue - 1)
+// Get rows represented by Seq[Seq[String]], we may get one more line 
if it has more data.
+val rows = getRows(numRows, truncate, vertical)
+val fieldNames = rows.head
+val data = rows.tail
+
+val hasMoreData = data.length > numRows
+val dataRows = data.take(numRows)
+
+val sb = new StringBuilder
+if (!vertical) {
   // Create SeparateLine
-  val sep: String = colWidths.map("-" * _).addString(sb, "+", "+", 
"+\n").toString()
+  val sep: String = fieldNames.map(_.length).toArray
+.map("-" * _).addString(sb, "+", "+", "+\n").toString()
 
   // column names
-  rows.head.zipWithIndex.map { case (cell, i) =>
-if (truncate > 0) {
-  StringUtils.leftPad(cell, colWidths(i))
-} else {
-  StringUtils.rightPad(cell, colWidths(i))
-}
-  }.addString(sb, "|", "|", "|\n")
-
+  fieldNames.addString(sb, "|", "|", "|\n")
   sb.append(sep)
 
   // data
-  rows.tail.foreach {
-_.zipWithIndex.map { case (cell, i) =>
-  if (truncate > 0) {
-StringUtils.leftPad(cell.toString, colWidths(i))
-  } else {
-StringUtils.rightPad(cell.toString, colWidths(i))
-  }
-}.addString(sb, "|", "|", "|\n")
+  dataRows.foreach {
+_.addString(sb, "|", "|", "|\n")
--- End diff --

Thanks, done in d4bf01a.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-30 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191702826
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -231,16 +234,17 @@ class Dataset[T] private[sql](
   }
 
   /**
-   * Compose the string representing rows for output
+   * Get rows represented in Sequence by specific truncate and vertical 
requirement.
*
-   * @param _numRows Number of rows to show
+   * @param numRows Number of rows to return
* @param truncate If set to more than 0, truncates strings to 
`truncate` characters and
*   all cells will be aligned right.
-   * @param vertical If set to true, prints output rows vertically (one 
line per column value).
+   * @param vertical If set to true, the rows to return don't need 
truncate.
--- End diff --

Yep, all abbreviations fixed in d4bf01a.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-30 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191702931
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+def _get_repl_config(self):
+"""Return the configs for eager evaluation each time when __repr__ 
or
+_repr_html_ called by user or notebook.
+"""
+eager_eval = self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+console_row = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", u"20"))
+console_truncate = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", u"20"))
+return (eager_eval, console_row, console_truncate)
--- End diff --

Done in d4bf01a. Please check.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-30 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191702675
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -291,37 +289,57 @@ class Dataset[T] private[sql](
 }
   }
 
+  rows = rows.map {
+_.zipWithIndex.map { case (cell, i) =>
--- End diff --

Thanks, done in d4bf01a.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-30 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191696389
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -291,37 +289,57 @@ class Dataset[T] private[sql](
 }
   }
 
+  rows = rows.map {
+_.zipWithIndex.map { case (cell, i) =>
+  if (truncate > 0) {
+StringUtils.leftPad(cell, colWidths(i))
+  } else {
+StringUtils.rightPad(cell, colWidths(i))
+  }
+}
+  }
--- End diff --

Doing this in getRows here is to reuse the truncate logic. I think it's the 
same problem we discussed here:

![image](https://user-images.githubusercontent.com/4833765/40711061-d0762762-642c-11e8-9249-2465ee3e2536.png)
If we do not need truncate, we can move the logic and `minimumColWidth` into 
`showString`. I would like to hear your suggestions.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191694631
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+def _get_repl_config(self):
+"""Return the configs for eager evaluation each time when __repr__ 
or
+_repr_html_ called by user or notebook.
+"""
+eager_eval = self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+console_row = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", u"20"))
+console_truncate = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", u"20"))
+return (eager_eval, console_row, console_truncate)
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+(eager_eval, console_row, console_truncate) = 
self._get_repl_config()
+if not self._support_repr_html and eager_eval:
+vertical = False
+return self._jdf.showString(
+console_row, console_truncate, vertical)
+else:
+return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+
+def _repr_html_(self):
+"""Returns a dataframe with html code when you enabled eager 
evaluation
+by 'spark.sql.repl.eagerEval.enabled', this only called by REPL 
you're
--- End diff --

ditto for abbreviation


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191694501
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,29 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.sql.repl.eagerEval.enabled
+  false
+  
+Enable eager evaluation or not. If true and repl you're using supports 
eager evaluation,
--- End diff --

ditto for abbreviation `you're`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191694169
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -231,16 +234,17 @@ class Dataset[T] private[sql](
   }
 
   /**
-   * Compose the string representing rows for output
+   * Get rows represented in Sequence by specific truncate and vertical 
requirement.
*
-   * @param _numRows Number of rows to show
+   * @param numRows Number of rows to return
* @param truncate If set to more than 0, truncates strings to 
`truncate` characters and
*   all cells will be aligned right.
-   * @param vertical If set to true, prints output rows vertically (one 
line per column value).
+   * @param vertical If set to true, the rows to return don't need 
truncate.
--- End diff --

I would avoid abbreviation in the documentation. `don't` -> `do not`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191693929
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -291,37 +289,57 @@ class Dataset[T] private[sql](
 }
   }
 
+  rows = rows.map {
+_.zipWithIndex.map { case (cell, i) =>
+  if (truncate > 0) {
+StringUtils.leftPad(cell, colWidths(i))
+  } else {
+StringUtils.rightPad(cell, colWidths(i))
+  }
+}
+  }
+}
+rows
+  }
+
+  /**
+   * Compose the string representing rows for output
+   *
+   * @param _numRows Number of rows to show
+   * @param truncate If set to more than 0, truncates strings to 
`truncate` characters and
+   *   all cells will be aligned right.
+   * @param vertical If set to true, prints output rows vertically (one 
line per column value).
+   */
+  private[sql] def showString(
+  _numRows: Int,
+  truncate: Int = 20,
+  vertical: Boolean = false): String = {
+val numRows = _numRows.max(0).min(Int.MaxValue - 1)
+// Get rows represented by Seq[Seq[String]], we may get one more line 
if it has more data.
+val rows = getRows(numRows, truncate, vertical)
+val fieldNames = rows.head
+val data = rows.tail
+
+val hasMoreData = data.length > numRows
+val dataRows = data.take(numRows)
+
+val sb = new StringBuilder
+if (!vertical) {
   // Create SeparateLine
-  val sep: String = colWidths.map("-" * _).addString(sb, "+", "+", 
"+\n").toString()
+  val sep: String = fieldNames.map(_.length).toArray
+.map("-" * _).addString(sb, "+", "+", "+\n").toString()
 
   // column names
-  rows.head.zipWithIndex.map { case (cell, i) =>
-if (truncate > 0) {
-  StringUtils.leftPad(cell, colWidths(i))
-} else {
-  StringUtils.rightPad(cell, colWidths(i))
-}
-  }.addString(sb, "|", "|", "|\n")
-
+  fieldNames.addString(sb, "|", "|", "|\n")
   sb.append(sep)
 
   // data
-  rows.tail.foreach {
-_.zipWithIndex.map { case (cell, i) =>
-  if (truncate > 0) {
-StringUtils.leftPad(cell.toString, colWidths(i))
-  } else {
-StringUtils.rightPad(cell.toString, colWidths(i))
-  }
-}.addString(sb, "|", "|", "|\n")
+  dataRows.foreach {
+_.addString(sb, "|", "|", "|\n")
--- End diff --

nit: we could just inline it


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-30 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191692934
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -291,37 +289,57 @@ class Dataset[T] private[sql](
 }
   }
 
+  rows = rows.map {
+_.zipWithIndex.map { case (cell, i) =>
--- End diff --

nit:

```
rows.map { row =>
  row.zipWithIndex...
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-30 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191687426
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -231,16 +234,17 @@ class Dataset[T] private[sql](
   }
 
   /**
-   * Compose the string representing rows for output
+   * Get rows represented in Sequence by specific truncate and vertical 
requirement.
*
-   * @param _numRows Number of rows to show
+   * @param numRows Number of rows to return
* @param truncate If set to more than 0, truncates strings to 
`truncate` characters and
*   all cells will be aligned right.
-   * @param vertical If set to true, prints output rows vertically (one 
line per column value).
+   * @param vertical If set to true, the rows to return don't need 
truncate.
*/
-  private[sql] def showString(
-  _numRows: Int, truncate: Int = 20, vertical: Boolean = false): 
String = {
-val numRows = _numRows.max(0).min(Int.MaxValue - 1)
--- End diff --

Yep, thanks, my mistake here.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-30 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191687183
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+def _get_repl_config(self):
+"""Return the configs for eager evaluation each time when __repr__ 
or
+_repr_html_ called by user or notebook.
+"""
+eager_eval = self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+console_row = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", u"20"))
+console_truncate = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", u"20"))
+return (eager_eval, console_row, console_truncate)
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+(eager_eval, console_row, console_truncate) = 
self._get_repl_config()
+if not self._support_repr_html and eager_eval:
+vertical = False
+return self._jdf.showString(
+console_row, console_truncate, vertical)
--- End diff --

Actually I implemented it like this at first, but we get a TypeError exception:
```
TypeError: __call__() got an unexpected keyword argument 'vertical'
```
Named arguments cannot be used when Python calls a `_jdf` method.
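A minimal sketch of the limitation, assuming PySpark is installed and a local session can be started (the DataFrame here is only an example):

```python
from pyspark.sql import SparkSession

# `df._jdf` is a Py4J JavaObject, so its JVM methods only accept positional
# arguments; Python keyword arguments are rejected by Py4J's __call__.
spark = SparkSession.builder.master("local[1]").getOrCreate()
df = spark.range(3)

print(df._jdf.showString(3, 20, False))       # works: all arguments positional
# df._jdf.showString(3, 20, vertical=False)   # raises:
#   TypeError: __call__() got an unexpected keyword argument 'vertical'
```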


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-30 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191686126
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+def _get_repl_config(self):
+"""Return the configs for eager evaluation each time when __repr__ 
or
+_repr_html_ called by user or notebook.
+"""
+eager_eval = self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+console_row = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", u"20"))
+console_truncate = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", u"20"))
+return (eager_eval, console_row, console_truncate)
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+(eager_eval, console_row, console_truncate) = 
self._get_repl_config()
+if not self._support_repr_html and eager_eval:
--- End diff --

As commented before, this is the flag that records whether \_repr_html\_ has 
been called.

![image](https://user-images.githubusercontent.com/4833765/40709259-2cbf6ede-6428-11e8-8cbe-e14e1450ec31.png)
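A toy sketch of the flag behavior being described (class and strings are illustrative, not the real DataFrame code): once the front end has called \_repr_html\_, HTML rendering is known to work, so the plain \_\_repr\_\_ can fall back to the short summary string instead of printing the eager text table a second time.

```python
class EagerReprToy(object):
    def __init__(self):
        self._support_repr_html = False

    def __repr__(self):
        if not self._support_repr_html:
            return "+---+\n| a |\n+---+"   # eager text table for plain REPLs
        return "DataFrame[a: int]"         # short summary once HTML is known to work

    def _repr_html_(self):
        self._support_repr_html = True     # flag flips on the first HTML render
        return "<table><tr><td>a</td></tr></table>"


toy = EagerReprToy()
print(repr(toy))           # eager text table
print(toy._repr_html_())   # HTML table, as a notebook would render it
print(repr(toy))           # now the short summary string
```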



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-30 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191685525
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+def _get_repl_config(self):
+"""Return the configs for eager evaluation each time when __repr__ 
or
+_repr_html_ called by user or notebook.
+"""
+eager_eval = self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+console_row = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", u"20"))
--- End diff --

I just followed the doc in 
https://github.com/apache/spark/blob/master/python/pyspark/sql/context.py#L134, 
but since we cast it to int in the end, the unicode prefix is useless? I will 
remove it in the next commit.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-30 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191685596
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+def _get_repl_config(self):
+"""Return the configs for eager evaluation each time when __repr__ 
or
+_repr_html_ called by user or notebook.
+"""
+eager_eval = self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+console_row = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", u"20"))
+console_truncate = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", u"20"))
+return (eager_eval, console_row, console_truncate)
--- End diff --

OK.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-29 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191594326
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+def _get_repl_config(self):
+"""Return the configs for eager evaluation each time when __repr__ 
or
+_repr_html_ called by user or notebook.
+"""
+eager_eval = self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+console_row = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", u"20"))
--- End diff --

Do we need `u` here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-29 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191594348
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+def _get_repl_config(self):
+"""Return the configs for eager evaluation each time when __repr__ 
or
+_repr_html_ called by user or notebook.
+"""
+eager_eval = self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+console_row = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", u"20"))
+console_truncate = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", u"20"))
--- End diff --

ditto.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-29 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191593987
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+def _get_repl_config(self):
+"""Return the configs for eager evaluation each time when __repr__ 
or
+_repr_html_ called by user or notebook.
+"""
+eager_eval = self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+console_row = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", u"20"))
+console_truncate = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", u"20"))
+return (eager_eval, console_row, console_truncate)
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+(eager_eval, console_row, console_truncate) = 
self._get_repl_config()
+if not self._support_repr_html and eager_eval:
+vertical = False
+return self._jdf.showString(
+console_row, console_truncate, vertical)
+else:
+return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+
+def _repr_html_(self):
+"""Returns a dataframe with html code when you enabled eager 
evaluation
+by 'spark.sql.repl.eagerEval.enabled', this only called by REPL 
you're
+using support eager evaluation with HTML.
+"""
+import cgi
+if not self._support_repr_html:
+self._support_repr_html = True
+(eager_eval, console_row, console_truncate) = 
self._get_repl_config()
+if eager_eval:
+with SCCallSiteSync(self._sc) as css:
+vertical = False
+sock_info = self._jdf.getRowsToPython(
+console_row, console_truncate, vertical)
--- End diff --

ditto.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-29 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191591921
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+def _get_repl_config(self):
+"""Return the configs for eager evaluation each time when __repr__ 
or
+_repr_html_ called by user or notebook.
+"""
+eager_eval = self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+console_row = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", u"20"))
+console_truncate = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", u"20"))
+return (eager_eval, console_row, console_truncate)
--- End diff --

How about declaring those as `@property`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-29 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191591799
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+def _get_repl_config(self):
+"""Return the configs for eager evaluation each time when __repr__ 
or
+_repr_html_ called by user or notebook.
+"""
+eager_eval = self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+console_row = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", u"20"))
+console_truncate = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", u"20"))
+return (eager_eval, console_row, console_truncate)
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+(eager_eval, console_row, console_truncate) = 
self._get_repl_config()
+if not self._support_repr_html and eager_eval:
--- End diff --

What's `_support_repr_html` for?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-29 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191593927
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -351,8 +352,62 @@ def show(self, n=20, truncate=True, vertical=False):
 else:
 print(self._jdf.showString(n, int(truncate), vertical))
 
+def _get_repl_config(self):
+"""Return the configs for eager evaluation each time when __repr__ 
or
+_repr_html_ called by user or notebook.
+"""
+eager_eval = self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
+console_row = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.maxNumRows", u"20"))
+console_truncate = int(self.sql_ctx.getConf(
+"spark.sql.repl.eagerEval.truncate", u"20"))
+return (eager_eval, console_row, console_truncate)
+
 def __repr__(self):
-return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
+(eager_eval, console_row, console_truncate) = 
self._get_repl_config()
+if not self._support_repr_html and eager_eval:
+vertical = False
+return self._jdf.showString(
+console_row, console_truncate, vertical)
--- End diff --

I guess

```python
return self._jdf.showString(
console_row, console_truncate, vertical=False)
```

should work without `vertical` variable.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-29 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191591455
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -291,37 +289,57 @@ class Dataset[T] private[sql](
 }
   }
 
+  rows = rows.map {
+_.zipWithIndex.map { case (cell, i) =>
+  if (truncate > 0) {
+StringUtils.leftPad(cell, colWidths(i))
+  } else {
+StringUtils.rightPad(cell, colWidths(i))
+  }
+}
+  }
--- End diff --

Should we do this in `showString`? And can we move `minimumColWidth` into 
`showString` in that case?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-29 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191595442
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -231,16 +234,17 @@ class Dataset[T] private[sql](
   }
 
   /**
-   * Compose the string representing rows for output
+   * Get rows represented in Sequence by specific truncate and vertical 
requirement.
*
-   * @param _numRows Number of rows to show
+   * @param numRows Number of rows to return
* @param truncate If set to more than 0, truncates strings to 
`truncate` characters and
*   all cells will be aligned right.
-   * @param vertical If set to true, prints output rows vertically (one 
line per column value).
+   * @param vertical If set to true, the rows to return don't need 
truncate.
*/
-  private[sql] def showString(
-  _numRows: Int, truncate: Int = 20, vertical: Boolean = false): 
String = {
-val numRows = _numRows.max(0).min(Int.MaxValue - 1)
--- End diff --

Don't we need to check the `numRows` range when called from 
`getRowsToPython`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-27 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191080316
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,29 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.sql.repl.eagerEval.enabled
+  false
+  
+Enable eager evaluation or not. If true and repl you're using supports 
eager evaluation,
+dataframe will be ran automatically and html table will feedback the 
queries user have defined
+(see https://issues.apache.org/jira/browse/SPARK-24215";>SPARK-24215 for 
more details).
+  
+
+
+  spark.sql.repl.eagerEval.showRows
+  20
+  
+Default number of rows in HTML table.
+  
+
+
+  spark.sql.repl.eagerEval.truncate
--- End diff --

Yep, I just want to keep the same behavior as `dataframe.show`.
```
That's useful for console output, but not so much for notebooks.
```
Notebooks are not afraid of too many characters within a cell, so should I just 
delete this?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-27 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191080194
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -237,9 +238,13 @@ class Dataset[T] private[sql](
* @param truncate If set to more than 0, truncates strings to 
`truncate` characters and
*   all cells will be aligned right.
* @param vertical If set to true, prints output rows vertically (one 
line per column value).
+   * @param html If set to true, return output as html table.
--- End diff --

@viirya @gatorsmile @rdblue Sorry for the late commit; the refactor is done in 
94f3414. I spent some time on testing and implementing the transformation of 
rows between Python and Scala.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-27 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191080082
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -358,6 +357,43 @@ class Dataset[T] private[sql](
 sb.toString()
   }
 
+  /**
+   * Transform current row string and append to builder
+   *
+   * @param row   Current row of string
+   * @param truncate  If set to more than 0, truncates strings to 
`truncate` characters and
+   *all cells will be aligned right.
+   * @param colWidths The width of each column
+   * @param html  If set to true, return output as html table.
+   * @param head  Set to true while current row is table head.
+   * @param sbStringBuilder for current row.
+   */
+  private[sql] def appendRowString(
+  row: Seq[String],
+  truncate: Int,
+  colWidths: Array[Int],
+  html: Boolean,
+  head: Boolean,
+  sb: StringBuilder): Unit = {
+val data = row.zipWithIndex.map { case (cell, i) =>
+  if (truncate > 0) {
+StringUtils.leftPad(cell, colWidths(i))
+  } else {
+StringUtils.rightPad(cell, colWidths(i))
+  }
+}
+(html, head) match {
+  case (true, true) =>
+data.map(StringEscapeUtils.escapeHtml).addString(
+  sb, "", "\n", "\n")
--- End diff --

I changed the format in the Python \_repr\_html\_ in 94f3414.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-27 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191080049
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -347,13 +347,30 @@ def show(self, n=20, truncate=True, vertical=False):
  name | Bob
 """
 if isinstance(truncate, bool) and truncate:
-print(self._jdf.showString(n, 20, vertical))
+print(self._jdf.showString(n, 20, vertical, False))
 else:
-print(self._jdf.showString(n, int(truncate), vertical))
+print(self._jdf.showString(n, int(truncate), vertical, False))
--- End diff --

Fixed in 94f3414.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-27 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191080066
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -347,13 +347,30 @@ def show(self, n=20, truncate=True, vertical=False):
  name | Bob
 """
 if isinstance(truncate, bool) and truncate:
-print(self._jdf.showString(n, 20, vertical))
+print(self._jdf.showString(n, 20, vertical, False))
 else:
-print(self._jdf.showString(n, int(truncate), vertical))
+print(self._jdf.showString(n, int(truncate), vertical, False))
 
 def __repr__(self):
 return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
 
+def _repr_html_(self):
+"""Returns a dataframe with html code when you enabled eager 
evaluation
+by 'spark.sql.repl.eagerEval.enabled', this only called by repr 
you're
--- End diff --

Thanks, changed to REPL in 94f3414.


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-27 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191080057
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -3040,6 +3040,50 @@ def test_csv_sampling_ratio(self):
 .csv(rdd, samplingRatio=0.5).schema
 self.assertEquals(schema, StructType([StructField("_c0", 
IntegerType(), True)]))
 
+def _get_content(self, content):
+"""
+Strips leading spaces from content up to the first '|' in each 
line.
+"""
+import re
+pattern = re.compile(r'^ *\|', re.MULTILINE)
--- End diff --

Thanks! Fixed it in 94f3414.
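
For context, the helper strips the indentation margin from expected output written inline in the tests; a rough standalone illustration (hypothetical, not the exact test utility):

    import re

    # Strip the indentation margin (spaces up to the first '|') from every line,
    # so expected output can be written indented inside the test source.
    pattern = re.compile(r'^ *\|', re.MULTILINE)

    expected = """\
        |+---+-----+
        ||age| name|
        |+---+-----+
        |"""
    print(pattern.sub('', expected))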


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-27 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191080044
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -347,13 +347,30 @@ def show(self, n=20, truncate=True, vertical=False):
  name | Bob
 """
 if isinstance(truncate, bool) and truncate:
-print(self._jdf.showString(n, 20, vertical))
+print(self._jdf.showString(n, 20, vertical, False))
--- End diff --

Thanks, fixed in 94f3414.


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-27 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191080037
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -347,13 +347,30 @@ def show(self, n=20, truncate=True, vertical=False):
  name | Bob
 """
 if isinstance(truncate, bool) and truncate:
-print(self._jdf.showString(n, 20, vertical))
+print(self._jdf.showString(n, 20, vertical, False))
 else:
-print(self._jdf.showString(n, int(truncate), vertical))
+print(self._jdf.showString(n, int(truncate), vertical, False))
 
 def __repr__(self):
 return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
--- End diff --

Thanks for your reply, this is implemented in 94f3414.


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-27 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r191080026
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,29 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.sql.repl.eagerEval.enabled
+  false
+  
+Enable eager evaluation or not. If true and repl you're using supports 
eager evaluation,
+dataframe will be ran automatically and html table will feedback the 
queries user have defined
+(see https://issues.apache.org/jira/browse/SPARK-24215";>SPARK-24215 for 
more details).
+  
+
+
+  spark.sql.repl.eagerEval.showRows
--- End diff --

Thanks, changed it in 94f3414.
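
For anyone trying the feature out, these properties can be set at runtime; a small usage sketch, assuming the renamed `maxNumRows` key and illustrative values:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # Turn on eager evaluation so a bare `df` renders a table in notebooks/REPLs.
    spark.conf.set("spark.sql.repl.eagerEval.enabled", "true")
    spark.conf.set("spark.sql.repl.eagerEval.maxNumRows", "20")  # rows to show
    spark.conf.set("spark.sql.repl.eagerEval.truncate", "20")    # chars per cell

    df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], ["age", "name"])
    df  # Jupyter calls _repr_html_; a plain REPL calls __repr__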


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-24 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r190803873
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -347,13 +347,30 @@ def show(self, n=20, truncate=True, vertical=False):
  name | Bob
 """
 if isinstance(truncate, bool) and truncate:
-print(self._jdf.showString(n, 20, vertical))
+print(self._jdf.showString(n, 20, vertical, False))
 else:
-print(self._jdf.showString(n, int(truncate), vertical))
+print(self._jdf.showString(n, int(truncate), vertical, False))
--- End diff --

use named arguments for boolean flags
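
To spell the suggestion out with a hypothetical pure-Python signature (purely illustrative; the point is readability of boolean flags at the call site):

    def show_string(n, truncate=20, vertical=False, html=False):
        # Stand-in for the JVM showString, only to illustrate call-site readability.
        return "n=%d truncate=%d vertical=%s html=%s" % (n, truncate, vertical, html)

    show_string(20, 20, False, False)                         # what do the Falses mean?
    show_string(20, truncate=20, vertical=False, html=False)  # intent is obvious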




---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-24 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r190803855
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -347,13 +347,30 @@ def show(self, n=20, truncate=True, vertical=False):
  name | Bob
 """
 if isinstance(truncate, bool) and truncate:
-print(self._jdf.showString(n, 20, vertical))
+print(self._jdf.showString(n, 20, vertical, False))
--- End diff --

use named arguments for boolean flags


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-24 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r190803772
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,29 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.sql.repl.eagerEval.enabled
+  false
+  
+Enable eager evaluation or not. If true and repl you're using supports 
eager evaluation,
+dataframe will be ran automatically and html table will feedback the 
queries user have defined
+(see https://issues.apache.org/jira/browse/SPARK-24215";>SPARK-24215 for 
more details).
+  
+
+
+  spark.sql.repl.eagerEval.showRows
+  20
+  
+Default number of rows in HTML table.
+  
+
+
+  spark.sql.repl.eagerEval.truncate
--- End diff --

Maybe he wants to follow what dataframe.show does, which truncates each cell 
to a number of characters. That's useful for console output, but not so much 
for notebooks.
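
A tiny sketch of that per-cell truncation semantics, assuming the usual `...` suffix behaviour of `show()`:

    def truncate_cell(cell, truncate=20):
        # Roughly what show() does per cell: keep the first `truncate` characters,
        # swapping the tail for "..." when there is room for the ellipsis.
        s = str(cell)
        if truncate <= 0 or len(s) <= truncate:
            return s
        if truncate < 4:
            return s[:truncate]
        return s[:truncate - 3] + "..."

    print(truncate_cell("a rather long cell value", truncate=10))  # a rathe...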



---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-24 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r190803641
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,29 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.sql.repl.eagerEval.enabled
+  false
+  
+Enable eager evaluation or not. If true and repl you're using supports 
eager evaluation,
+dataframe will be ran automatically and html table will feedback the 
queries user have defined
+(see https://issues.apache.org/jira/browse/SPARK-24215";>SPARK-24215 for 
more details).
+  
+
+
+  spark.sql.repl.eagerEval.showRows
--- End diff --

maxNumRows


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-24 Thread rdblue
Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r190683568
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -347,13 +347,30 @@ def show(self, n=20, truncate=True, vertical=False):
  name | Bob
 """
 if isinstance(truncate, bool) and truncate:
-print(self._jdf.showString(n, 20, vertical))
+print(self._jdf.showString(n, 20, vertical, False))
 else:
-print(self._jdf.showString(n, int(truncate), vertical))
+print(self._jdf.showString(n, int(truncate), vertical, False))
 
 def __repr__(self):
 return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
--- End diff --

I agree that it would be better to respect 
`spark.sql.repl.eagerEval.enabled` here as well.
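
A rough sketch of what respecting the flag in plain `__repr__` could look like, written as a free function over a DataFrame; hypothetical, using the four-argument `showString` from this PR's diff rather than the merged code:

    def dataframe_repr(df):
        # Hypothetical sketch: honour the eager-eval flag in repr() as well.
        eager = df.sql_ctx.getConf(
            "spark.sql.repl.eagerEval.enabled", "false").lower() == "true"
        if eager:
            # Four-argument showString(n, truncate, vertical, html) as in this PR's diff.
            return df._jdf.showString(20, 20, False, False)
        return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in df.dtypes))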


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-24 Thread rdblue
Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r190683035
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -347,13 +347,30 @@ def show(self, n=20, truncate=True, vertical=False):
  name | Bob
 """
 if isinstance(truncate, bool) and truncate:
-print(self._jdf.showString(n, 20, vertical))
+print(self._jdf.showString(n, 20, vertical, False))
 else:
-print(self._jdf.showString(n, int(truncate), vertical))
+print(self._jdf.showString(n, int(truncate), vertical, False))
 
 def __repr__(self):
 return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in 
self.dtypes))
 
+def _repr_html_(self):
+"""Returns a dataframe with html code when you enabled eager 
evaluation
+by 'spark.sql.repl.eagerEval.enabled', this only called by repr 
you're
--- End diff --

I think it works either way. REPL is better in my opinion because these 
settings should (ideally) apply when using any REPL.


---




[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-24 Thread rdblue
Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/21370#discussion_r190682693
  
--- Diff: docs/configuration.md ---
@@ -456,6 +456,29 @@ Apart from these, the following properties are also 
available, and may be useful
 from JVM to Python worker for every task.
   
 
+
+  spark.sql.repl.eagerEval.enabled
+  false
+  
+Enable eager evaluation or not. If true and repl you're using supports 
eager evaluation,
+dataframe will be ran automatically and html table will feedback the 
queries user have defined
+(see https://issues.apache.org/jira/browse/SPARK-24215";>SPARK-24215 for 
more details).
+  
+
+
+  spark.sql.repl.eagerEval.showRows
+  20
+  
+Default number of rows in HTML table.
+  
+
+
+  spark.sql.repl.eagerEval.truncate
--- End diff --

What is the difference between this and showRows? Why are there two 
properties?


---



