GitHub user Stibbons commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14830#discussion_r93424188
  
    --- Diff: docs/streaming-programming-guide.md ---
    @@ -2105,7 +2105,7 @@ documentation), or set the `spark.default.parallelism`
     {:.no_toc}
     The overheads of data serialization can be reduced by tuning the 
serialization formats. In the case of streaming, there are two types of data 
that are being serialized.
     
    -* **Input data**: By default, the input data received through Receivers is 
stored in the executors' memory with 
[StorageLevel.MEMORY_AND_DISK_SER_2](api/scala/index.html#org.apache.spark.storage.StorageLevel$).
 That is, the data is serialized into bytes to reduce GC overheads, and 
replicated for tolerating executor failures. Also, the data is kept first in 
memory, and spilled over to disk only if the memory is insufficient to hold all 
of the input data necessary for the streaming computation. This serialization 
obviously has overheads -- the receiver must deserialize the received data and 
re-serialize it using Spark's serialization format. 
    +* **Input data**: By default, the input data received through Receivers is 
stored in the executors' memory with 
[StorageLevel.MEMORY_AND_DISK_SER_2](api/scala/index.html#org.apache.spark.storage.StorageLevel$).
 That is, the data is serialized into bytes to reduce GC overheads, and 
replicated for tolerating executor failures. Also, the data is kept first in 
memory, and spilled over to disk only if the memory is insufficient to hold all 
of the input data necessary for the streaming computation. This serialization 
obviously has overheads -- the receiver must deserialize the received data and 
re-serialize it using Spark's serialization format.
    --- End diff --
    
    There is an extra space at the end of this line; GitHub's diff view doesn't display it.
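Because GitHub's diff view doesn't render trailing whitespace, a local check can make it visible. The following is a minimal sketch in Python that flags lines ending in spaces or tabs. The function name `find_trailing_whitespace` is hypothetical and is not part of Spark's build or lint tooling.

```python
from typing import Iterable, List, Tuple


def find_trailing_whitespace(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, content) pairs for lines ending in spaces or tabs.

    Line numbers are 1-based. The line terminator itself (\\n or \\r\\n)
    is not counted as trailing whitespace.
    """
    hits: List[Tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        content = line.rstrip("\r\n")  # drop only the line terminator
        if content != content.rstrip(" \t"):  # spaces/tabs remained at the end
            hits.append((number, content))
    return hits


# Example: the old and new endings of the line from the diff (abridged).
sample = [
    "re-serialize it using Spark's serialization format. \n",
    "re-serialize it using Spark's serialization format.\n",
]
for number, content in find_trailing_whitespace(sample):
    # repr() makes the trailing space visible in the output
    print(f"line {number}: {content!r}")
```

To check a whole file, pass an open file object, e.g. `find_trailing_whitespace(open("docs/streaming-programming-guide.md"))`; only the first sample line is reported.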

