[GitHub] spark pull request #15089: [SPARK-15621] [SQL] Support spilling for Python U...

rxin Thu, 29 Sep 2016 14:38:42 -0700

Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15089#discussion_r81233633
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/BatchEvalPythonExec.scala ---
    @@ -37,9 +40,25 @@ import org.apache.spark.sql.types.{DataType, StructField, StructType}
      * Python evaluation works by sending the necessary (projected) input data via a socket to an
      * external Python process, and combine the result from the Python process with the original row.
      *
    - * For each row we send to Python, we also put it in a queue. For each output row from Python,
    + * For each row we send to Python, we also put it in a queue first. For each output row from Python,
      * we drain the queue to find the original input row. Note that if the Python process is way too
    - * slow, this could lead to the queue growing unbounded and eventually run out of memory.
    + * slow, this could lead to the queue growing unbounded and spill into disk when run out of memory.
    + *
    + * Here is a diagram to show how this works:
    + *
    + *            Upstream (from child)
    --- End diff --
    
    I suggest switching upstream and downstream: put the child operator 
(upstream) at the bottom and the parent operator (downstream) at the top, to be 
more consistent with databases. Basically, the "tree" would face upward.


