Github user liyezhang556520 commented on the pull request:

    https://github.com/apache/spark/pull/5886#issuecomment-99295131
  
    @vanzin, there is a time interval between reading the first file's 
modification time and reading the last file's. Assume there are 3 files: F1, F2, 
and F3, and that before scanning their modification times are TF1=100, TF2=101, 
and TF3=102 respectively. 
    At time T1=103, we start scanning.
    At time T2=104, we finish loading F1's mod time and start loading F2's mod 
time. 
    At time T3=107, we finish loading F2's mod time. At this point, 
`lastModifiedTime` is 101, which is equal to F2's mod time, TF2. While F2's mod 
time is being loaded, two operations happen: 
    First, at time T4=105, contents are written to F1, which changes F1's mod 
time from TF1=100 to TF1'=105.
    Second, at time T5=106, contents are written to F3, which changes F3's mod 
time from TF3=102 to TF3'=106.
    
    Then we continue to load F3's mod time, and at time T6=108 we finish 
loading it. At this point, `lastModifiedTime` is 106.
    
    So in the next round, we would not pick up F1 even though it was modified 
at time T4=105, because its mod time (105) is older than `lastModifiedTime` 
(106).
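
    The timeline above can be replayed with a small simulation. This is only a 
sketch in Python, not the code in the pull request: the `scan` function, the 
`writes_during_read` parameter, and the strict "newer than `lastModifiedTime`" 
filter are assumptions made for illustration.

```python
def scan(read_order, mod_times, last_modified_time, writes_during_read):
    """Simulate one scan round (hypothetical model, not the PR's code).

    writes_during_read maps a file name to the (file, new_mod_time) writes
    that land while that file's mod time is being loaded.
    """
    picked = []
    newest_seen = last_modified_time
    for name in read_order:
        observed = mod_times[name]  # the mod time this scan actually sees
        # Writes that happen while this read is in progress.
        for target, new_time in writes_during_read.get(name, []):
            mod_times[target] = new_time
        # Assumed filter: only files newer than the previous round's maximum.
        if observed > last_modified_time:
            picked.append(name)
        newest_seen = max(newest_seen, observed)
    return picked, newest_seen


mod_times = {"F1": 100, "F2": 101, "F3": 102}
order = ["F1", "F2", "F3"]

# Round 1: F1 and F3 are written to while F2's mod time is being loaded.
first_picked, last_modified_time = scan(
    order, mod_times, 0, {"F2": [("F1", 105), ("F3", 106)]}
)

# Round 2: nothing changes, but F1's write at 105 is older than 106.
second_picked, _ = scan(order, mod_times, last_modified_time, {})

print(first_picked, last_modified_time, second_picked)
```

    In this model the first round observes F1 at 100 and ends with 
`lastModifiedTime` at 106, so the second round skips F1 although its mod time 
is now 105.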


