Re: [PR] #30903: Do not reemit data from impulse [beam]
je-ik merged PR #30905: URL: https://github.com/apache/beam/pull/30905 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] #30903: Do not reemit data from impulse [beam]
je-ik commented on code in PR #30905: URL: https://github.com/apache/beam/pull/30905#discussion_r1562135025 ## runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/source/bounded/FlinkBoundedSourceReader.java: ## @@ -61,8 +55,6 @@ */ public class FlinkBoundedSourceReader extends FlinkSourceReaderBase> { private static final Logger LOG = LoggerFactory.getLogger(FlinkBoundedSourceReader.class); - private static final VarLongCoder LONG_CODER = VarLongCoder.of(); - private final Map consumedFromSplit = new HashMap<>(); Review Comment: This is related, because this was introduced in an attempt to fix exactly this issue of avoiding re-emission of elements from Impulse. Unfortunately, this is not working as expected and - more seriously - can cause data-loss for sources that are not 'stable' - e.g. NoSQL databases, where the data in a split can change between restarts, which might cause skipping an element that was always present (but changed its ordered position). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] #30903: Do not reemit data from impulse [beam]
JozoVilcek commented on code in PR #30905: URL: https://github.com/apache/beam/pull/30905#discussion_r1562178562 ## runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/source/bounded/FlinkBoundedSourceReader.java: ## @@ -61,8 +55,6 @@ */ public class FlinkBoundedSourceReader extends FlinkSourceReaderBase> { private static final Logger LOG = LoggerFactory.getLogger(FlinkBoundedSourceReader.class); - private static final VarLongCoder LONG_CODER = VarLongCoder.of(); - private final Map consumedFromSplit = new HashMap<>(); Review Comment: Understood. Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] #30903: Do not reemit data from impulse [beam]
je-ik commented on code in PR #30905: URL: https://github.com/apache/beam/pull/30905#discussion_r1562135025 ## runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/source/bounded/FlinkBoundedSourceReader.java: ## @@ -61,8 +55,6 @@ */ public class FlinkBoundedSourceReader extends FlinkSourceReaderBase> { private static final Logger LOG = LoggerFactory.getLogger(FlinkBoundedSourceReader.class); - private static final VarLongCoder LONG_CODER = VarLongCoder.of(); - private final Map consumedFromSplit = new HashMap<>(); Review Comment: This is related, because this was introduce in an attempt to fix exactly this issue of avoiding re-emission of elements from Impulse. Unfortunately, this is not working as expected and - more seriously - can cause data-loss for sources that are not 'stable' - e.g. NoSQL databases, where the data in a split can change between restarts, which might cause skipping an element that was always present (but changed its ordered position). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] #30903: Do not reemit data from impulse [beam]
JozoVilcek commented on code in PR #30905: URL: https://github.com/apache/beam/pull/30905#discussion_r1562080961 ## runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/source/bounded/FlinkBoundedSourceReader.java: ## @@ -61,8 +55,6 @@ */ public class FlinkBoundedSourceReader extends FlinkSourceReaderBase> { private static final Logger LOG = LoggerFactory.getLogger(FlinkBoundedSourceReader.class); - private static final VarLongCoder LONG_CODER = VarLongCoder.of(); - private final Map consumedFromSplit = new HashMap<>(); Review Comment: Can you please elaborate how does removing tracking of consumption from state helps to fix the bug, to not reemit elements from impulse source? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] #30903: Do not reemit data from impulse [beam]
github-actions[bot] commented on PR #30905: URL: https://github.com/apache/beam/pull/30905#issuecomment-2045434378 Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] #30903: Do not reemit data from impulse [beam]
je-ik commented on PR #30905: URL: https://github.com/apache/beam/pull/30905#issuecomment-2045431614 R: @JozoVilcek -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] #30903: Do not reemit data from impulse [beam]
github-actions[bot] commented on PR #30905: URL: https://github.com/apache/beam/pull/30905#issuecomment-2045414641 Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`: R: @Abacn for label build. Available commands: - `stop reviewer notifications` - opt out of the automated review tooling - `remind me after tests pass` - tag the comment author after tests pass - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers) The PR bot will only process comments in the main thread (not review comments). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org