[GitHub] spark issue #13513: [SPARK-15698][SQL][Streaming] Add the ability to remove ...

jerryshao Wed, 14 Sep 2016 05:42:14 -0700

Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/13513
  
    Thanks a lot @zsxwing and @frreiss for your comments.
    
    Regarding the slow scan of the compact batch: originally I planned not to
merge the latest batch, as I did before and as was also suggested above, but
after several attempts I found this hard to implement with small changes. So
for now I am keeping the same implementation and adding a simple cache layer to
work around the problem. The basic compaction algorithm is still the same as in
`FileStreamSinkLog`, which I think makes it easier to maintain.
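    A minimal sketch of the idea, assuming a sink-log-style compaction rule in which batches whose ids satisfy `(batch_id + 1) % interval == 0` are written as compact files that fold in the previous compact file plus every delta batch since then. `CachedCompactLog`, `add`, `get` and `file_reads` are hypothetical names for illustration, not Spark's API: the dict stands in for the log files and the small LRU cache stands in for the cache layer, so building a compact file can reuse recently written batches instead of re-reading them.

```python
from collections import OrderedDict


def is_compaction_batch(batch_id, interval):
    # Assumed rule: batches interval-1, 2*interval-1, ... become compact files.
    return (batch_id + 1) % interval == 0


class CachedCompactLog:
    """Hypothetical in-memory model of a compactible metadata log with a small
    LRU cache in front of it; not Spark's actual log classes."""

    def __init__(self, interval=3, cache_size=4):
        self.interval = interval
        self.files = {}             # batch_id -> entries, standing in for log files
        self.cache = OrderedDict()  # batch_id -> entries, most recently used last
        self.cache_size = cache_size
        self.file_reads = 0         # how often we had to fall back to a "file"

    def _remember(self, batch_id, entries):
        self.cache[batch_id] = entries
        self.cache.move_to_end(batch_id)
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used batch

    def get(self, batch_id):
        if batch_id in self.cache:
            self.cache.move_to_end(batch_id)
            return self.cache[batch_id]
        self.file_reads += 1  # cache miss: read the log file instead
        entries = self.files[batch_id]
        self._remember(batch_id, entries)
        return entries

    def add(self, batch_id, entries):
        entries = list(entries)
        if is_compaction_batch(batch_id, self.interval):
            # Fold the previous compact file and every delta batch since then
            # into a single file for this batch.
            start = batch_id - self.interval
            merged = list(self.get(start)) if start >= 0 else []
            for b in range(start + 1, batch_id):
                merged.extend(self.get(b))
            entries = merged + entries
        self.files[batch_id] = entries
        self._remember(batch_id, entries)


# Usage: six batches, with a compact file written every third batch.
log = CachedCompactLog(interval=3, cache_size=4)
for b in range(6):
    log.add(b, [f"file-{b}"])
```

In this usage, building the compact file for batch 5 is served entirely from the cache (`file_reads` stays 0), while a later lookup of the evicted batch 0 falls back to a file read.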
    
    Regarding the broken semantics: I agree it is a real problem, although the
current code path does not hit it. So I changed the log to scan the compacted
batch files to retrieve missing batches. This is somewhat time-consuming, but
the current logic of `FileStreamSource` does not exercise this path.
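    A minimal sketch of that slow path, under two assumptions: each log entry records the batch that added it, and a compact file contains every entry up to and including its own batch, so any later compact file can stand in for an earlier one that has been deleted. `Entry`, `get_batch` and the dict of surviving files are hypothetical names for illustration, not the real `FileStreamSource` code, and the compaction rule is the same assumed `(batch_id + 1) % interval == 0`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    # Hypothetical log entry that remembers which batch added the file.
    path: str
    batch_id: int


def is_compaction_batch(batch_id, interval):
    # Assumed rule: batches interval-1, 2*interval-1, ... are compact files.
    return (batch_id + 1) % interval == 0


def get_batch(files, batch_id, interval):
    """Return the entries added in `batch_id`. `files` maps batch ids to the
    entries of each surviving log file; old files may have been deleted."""
    if batch_id in files and not is_compaction_batch(batch_id, interval):
        return list(files[batch_id])  # fast path: the delta file still exists
    # Slow path: find the earliest surviving compact file that covers this
    # batch, scan all of it, and keep only this batch's entries.
    compact_ids = sorted(
        b for b in files if b >= batch_id and is_compaction_batch(b, interval)
    )
    if not compact_ids:
        raise KeyError(f"no log file covers batch {batch_id}")
    return [e for e in files[compact_ids[0]] if e.batch_id == batch_id]


# Usage: batch 5 is a compact file holding batches 0-5, the older files have
# been removed, and batch 6 is a fresh delta file.
interval = 3
files = {
    5: [Entry(f"file-{b}", b) for b in range(6)],
    6: [Entry("file-6", 6)],
}
```

The linear scan over the whole compact file is what makes retrieving a removed batch comparatively slow in this sketch.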