Re: [PR] #30903: Do not reemit data from impulse [beam]

2024-04-15 Thread via GitHub


je-ik merged PR #30905:
URL: https://github.com/apache/beam/pull/30905


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] #30903: Do not reemit data from impulse [beam]

2024-04-12 Thread via GitHub


je-ik commented on code in PR #30905:
URL: https://github.com/apache/beam/pull/30905#discussion_r1562135025


##
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/source/bounded/FlinkBoundedSourceReader.java:
##
@@ -61,8 +55,6 @@
  */
 public class FlinkBoundedSourceReader extends FlinkSourceReaderBase> {
   private static final Logger LOG = 
LoggerFactory.getLogger(FlinkBoundedSourceReader.class);
-  private static final VarLongCoder LONG_CODER = VarLongCoder.of();
-  private final Map consumedFromSplit = new HashMap<>();

Review Comment:
   This is related, because this was introduced in an attempt to fix exactly 
this issue of avoiding re-emission of elements from Impulse. Unfortunately, 
this is not working as expected and - more seriously - can cause data-loss for 
sources that are not 'stable' - e.g. NoSQL databases, where the data in a split 
can change between restarts, which might cause skipping an element that was 
always present (but changed its ordered position).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] #30903: Do not reemit data from impulse [beam]

2024-04-12 Thread via GitHub


JozoVilcek commented on code in PR #30905:
URL: https://github.com/apache/beam/pull/30905#discussion_r1562178562


##
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/source/bounded/FlinkBoundedSourceReader.java:
##
@@ -61,8 +55,6 @@
  */
 public class FlinkBoundedSourceReader extends FlinkSourceReaderBase> {
   private static final Logger LOG = 
LoggerFactory.getLogger(FlinkBoundedSourceReader.class);
-  private static final VarLongCoder LONG_CODER = VarLongCoder.of();
-  private final Map consumedFromSplit = new HashMap<>();

Review Comment:
   Understood. Thanks  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] #30903: Do not reemit data from impulse [beam]

2024-04-12 Thread via GitHub


je-ik commented on code in PR #30905:
URL: https://github.com/apache/beam/pull/30905#discussion_r1562135025


##
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/source/bounded/FlinkBoundedSourceReader.java:
##
@@ -61,8 +55,6 @@
  */
 public class FlinkBoundedSourceReader extends FlinkSourceReaderBase> {
   private static final Logger LOG = 
LoggerFactory.getLogger(FlinkBoundedSourceReader.class);
-  private static final VarLongCoder LONG_CODER = VarLongCoder.of();
-  private final Map consumedFromSplit = new HashMap<>();

Review Comment:
   This is related, because this was introduce in an attempt to fix exactly 
this issue of avoiding re-emission of elements from Impulse. Unfortunately, 
this is not working as expected and - more seriously - can cause data-loss for 
sources that are not 'stable' - e.g. NoSQL databases, where the data in a split 
can change between restarts, which might cause skipping an element that was 
always present (but changed its ordered position).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] #30903: Do not reemit data from impulse [beam]

2024-04-12 Thread via GitHub


JozoVilcek commented on code in PR #30905:
URL: https://github.com/apache/beam/pull/30905#discussion_r1562080961


##
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/io/source/bounded/FlinkBoundedSourceReader.java:
##
@@ -61,8 +55,6 @@
  */
 public class FlinkBoundedSourceReader extends FlinkSourceReaderBase> {
   private static final Logger LOG = 
LoggerFactory.getLogger(FlinkBoundedSourceReader.class);
-  private static final VarLongCoder LONG_CODER = VarLongCoder.of();
-  private final Map consumedFromSplit = new HashMap<>();

Review Comment:
   Can you please elaborate how does removing tracking of consumption from 
state helps to fix the bug, to not reemit elements from impulse source?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] #30903: Do not reemit data from impulse [beam]

2024-04-09 Thread via GitHub


github-actions[bot] commented on PR #30905:
URL: https://github.com/apache/beam/pull/30905#issuecomment-2045434378

   Stopping reviewer notifications for this pull request: review requested by 
someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] #30903: Do not reemit data from impulse [beam]

2024-04-09 Thread via GitHub


je-ik commented on PR #30905:
URL: https://github.com/apache/beam/pull/30905#issuecomment-2045431614

   R: @JozoVilcek 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] #30903: Do not reemit data from impulse [beam]

2024-04-09 Thread via GitHub


github-actions[bot] commented on PR #30905:
URL: https://github.com/apache/beam/pull/30905#issuecomment-2045414641

   Assigning reviewers. If you would like to opt out of this review, comment 
`assign to next reviewer`:
   
   R: @Abacn for label build.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any 
comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review 
comments).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org