[ 
https://issues.apache.org/jira/browse/BEAM-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Kirpichov closed BEAM-1822.
----------------------------------
       Resolution: Duplicate
    Fix Version/s: Not applicable

> Improve handling of eventually-consistent filepatterns
> ------------------------------------------------------
>
>                 Key: BEAM-1822
>                 URL: https://issues.apache.org/jira/browse/BEAM-1822
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Eugene Kirpichov
>             Fix For: Not applicable
>
>
> Reading from an eventually consistent filepattern (e.g. one located in a 
> multi-regional Google Cloud Storage bucket) using FileBasedSource is 
> dangerous, because it may silently process less data than the user expects 
> if the match call does not return all files.
> We should improve our handling of this case. I suggest aiming to 
> minimize the chance of silent data loss. Here are a couple of things we could 
> do.
> - Let the user supply an expected number of files to be matched, and fail the 
> pipeline if the actual number is different. For special filepatterns like 
> XXX-of-YYY, we can autodetect the expected number.
> - Poll the filepattern for a while (perhaps for a period determined by the 
> underlying IOChannelFactory that knows the typical eventual consistency 
> convergence times of its filesystem), and either wait until it quiesces, or 
> fail the pipeline if it doesn't quiesce within that period.
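
The two suggestions above could be sketched roughly as follows. This is only an illustration in plain Java, not Beam code: the class FilepatternChecks, its method names, and the parameters stableRounds, maxPolls and sleepMillis are all hypothetical, and the Supplier stands in for whatever match call the filesystem provides. The shard regex assumes names shaped like "out-00003-of-00010".

```java
import java.util.HashSet;
import java.util.List;
import java.util.OptionalInt;
import java.util.Set;
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; none of these names exist in Beam.
final class FilepatternChecks {
  // Assumed shard naming: "...-XXXXX-of-YYYYY...".
  private static final Pattern SHARD = Pattern.compile("-(\\d+)-of-(\\d+)");

  /** Returns YYY if every name carries the same XXX-of-YYY suffix, else empty. */
  public static OptionalInt detectExpectedCount(List<String> names) {
    int expected = -1;
    for (String name : names) {
      Matcher m = SHARD.matcher(name);
      if (!m.find()) {
        return OptionalInt.empty();
      }
      int total = Integer.parseInt(m.group(2));
      if (expected != -1 && expected != total) {
        return OptionalInt.empty(); // Inconsistent totals: do not guess.
      }
      expected = total;
    }
    return expected == -1 ? OptionalInt.empty() : OptionalInt.of(expected);
  }

  /** Fails loudly instead of silently reading fewer files than expected. */
  public static void checkCount(List<String> matched, int expected) {
    if (matched.size() != expected) {
      throw new IllegalStateException(
          "Expected " + expected + " files but matched " + matched.size());
    }
  }

  /**
   * Polls the match call until the result equals the previous poll's result
   * for stableRounds consecutive polls; fails after maxPolls polls.
   */
  public static Set<String> pollUntilQuiescent(
      Supplier<List<String>> match, int stableRounds, int maxPolls, long sleepMillis) {
    Set<String> previous = null;
    int stable = 0;
    for (int i = 0; i < maxPolls; i++) {
      Set<String> current = new HashSet<>(match.get());
      stable = current.equals(previous) ? stable + 1 : 0;
      if (stable >= stableRounds) {
        return current;
      }
      previous = current;
      try {
        Thread.sleep(sleepMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while polling", e);
      }
    }
    throw new IllegalStateException("Filepattern did not quiesce in " + maxPolls + " polls");
  }
}
```

In this sketch a caller would first poll until the listing is stable, then compare the result against either a user-supplied count or the autodetected one. A real implementation would presumably take the polling period from the filesystem layer, as the description suggests.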



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
