[GitHub] spark pull request #14830: [SPARK-16992][PYSPARK][DOCS] import sort and auto...

Stibbons Wed, 21 Dec 2016 03:51:55 -0800

Github user Stibbons commented on a diff in the pull request:

https://github.com/apache/spark/pull/14830#discussion_r93424188

--- Diff: docs/streaming-programming-guide.md ---
@@ -2105,7 +2105,7 @@ documentation), or set the `spark.default.parallelism`
{:.no_toc}
The overheads of data serialization can be reduced by tuning the
serialization formats. In the case of streaming, there are two types of data
that are being serialized.

-* **Input data**: By default, the input data received through Receivers is
stored in the executors' memory with
[StorageLevel.MEMORY_AND_DISK_SER_2](api/scala/index.html#org.apache.spark.storage.StorageLevel$).
That is, the data is serialized into bytes to reduce GC overheads, and
replicated for tolerating executor failures. Also, the data is kept first in
memory, and spilled over to disk only if the memory is insufficient to hold all
of the input data necessary for the streaming computation. This serialization
obviously has overheads -- the receiver must deserialize the received data and
re-serialize it using Spark's serialization format.
+* **Input data**: By default, the input data received through Receivers is
stored in the executors' memory with
[StorageLevel.MEMORY_AND_DISK_SER_2](api/scala/index.html#org.apache.spark.storage.StorageLevel$).
That is, the data is serialized into bytes to reduce GC overheads, and
replicated for tolerating executor failures. Also, the data is kept first in
memory, and spilled over to disk only if the memory is insufficient to hold all
of the input data necessary for the streaming computation. This serialization
obviously has overheads -- the receiver must deserialize the received data and
re-serialize it using Spark's serialization format.
--- End diff --

There is an extra space at the end of the line; GitHub doesn't display it.
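
Since the diff view hides the difference, a small script can make trailing whitespace visible. The following is only a minimal sketch: the helper name `find_trailing_whitespace` and the sample text are hypothetical and not part of the pull request.

```python
def find_trailing_whitespace(text):
    """Return (line_number, line) pairs for lines that end in a space or tab."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        # rstrip(" \t") removes only trailing spaces and tabs, so any change
        # means the line carried invisible trailing whitespace.
        if line != line.rstrip(" \t"):
            hits.append((number, line))
    return hits


# Hypothetical sample: the first line ends with a space, the second does not.
sample = "* **Input data**: By default, the input data \nstored in the executors' memory with\n"

for number, line in find_trailing_whitespace(sample):
    # repr() shows the trailing space explicitly inside the quotes.
    print(f"line {number}: {line!r}")
```

Printing with `repr()` is a deliberate choice here, because the quotes make the otherwise invisible trailing character easy to spot.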



