Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/5473#issuecomment-95122243
  
    Overall, I get the code. However, it is hard to read for the following reasons.
    
    1. Confusion over the meaning of "job", which could mean either a streaming job or a Spark 
job. Please make sure that every place in the streaming code that refers 
to a Spark job names it `sparkJob`. This includes ids, infos, etc. in 
`StreamingJobProgressListener` and `BatchPage`.
    
    2. This tuple (OutputOpId, JobId) is being passed around. It may be cleaner 
to create something like a `BatchUIData` object (similar to `JobUIData`, 
`StageUIData`, and `TaskUIData` in `JobProgressListener`). That would encapsulate 
all the batch- and output-op-related data inside a single class, instead of 
spreading it across multiple data structures. So `BatchUIData` would hold the 
outputId --> jobIds mapping internally. I think that is more intuitive, as the output ids 
and job ids are automatically scoped within the `BatchUIData`. This is safer 
for cleanup as well: when the `BatchUIData` object is cleared, all associated 
information is cleared with it.
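    As a rough illustration of point 2, a minimal `BatchUIData` sketch might look like the following. All names and fields here are hypothetical, not the actual Spark API; the point is only that the outputOpId --> sparkJobIds mapping lives inside one object that can be dropped as a unit:
    
    ```scala
    // Hypothetical sketch, not the real Spark implementation.
    // One BatchUIData per batch; it owns the outputOpId -> spark job id mapping,
    // so clearing the batch clears everything associated with it.
    class BatchUIData(val batchTime: Long) {
      import scala.collection.mutable

      // outputOpId -> spark job ids submitted for that output operation
      private val outputOpIdToSparkJobIds = mutable.Map.empty[Int, Seq[Int]]

      def addSparkJob(outputOpId: Int, sparkJobId: Int): Unit = {
        val existing = outputOpIdToSparkJobIds.getOrElse(outputOpId, Seq.empty)
        outputOpIdToSparkJobIds(outputOpId) = existing :+ sparkJobId
      }

      def sparkJobIdsOf(outputOpId: Int): Seq[Int] =
        outputOpIdToSparkJobIds.getOrElse(outputOpId, Seq.empty)

      // Dropping this object (or calling clear()) removes all output op
      // and spark job info for the batch in one step.
      def clear(): Unit = outputOpIdToSparkJobIds.clear()
    }
    ```
    
    With this, the listener would keep a `batchTime -> BatchUIData` map and evict whole `BatchUIData` entries on cleanup, rather than cleaning several parallel maps keyed by (OutputOpId, JobId).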


