[jira] [Updated] (BEAM-9354) How long does PubSubIO message deduplication last?
[ https://issues.apache.org/jira/browse/BEAM-9354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Webb updated BEAM-9354:
    Status: Open (was: Triage Needed)

> How long does PubSubIO message deduplication last?
> --
>
>                 Key: BEAM-9354
>                 URL: https://issues.apache.org/jira/browse/BEAM-9354
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-gcp
>            Reporter: Tianzi Cai
>            Priority: P3
>              Labels: gcp, pubsubio
>
> GCP documentation heavily
> [promotes|https://cloud.google.com/dataflow/docs/concepts/streaming-with-cloud-pubsub]
> Beam's PubSubIO for Pub/Sub message deduplication. Yet nothing in the
> documentation, including the [source
> code|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.java],
> tells users how long this deduplication is supposed to last.
>
> In
> [`PubsubIO.java`|https://github.com/apache/beam/blob/a24bc3bae54f089b93bd66a118bd4bf09dbc9254/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.java#L842-L853]:
> {code:java}
> /**
>  * When reading from Cloud Pub/Sub where unique record identifiers are provided as Pub/Sub
>  * message attributes, specifies the name of the attribute containing the unique identifier. The
>  * value of the attribute can be any string that uniquely identifies this record.
>  *
>  * Pub/Sub cannot guarantee that no duplicate data will be delivered on the Pub/Sub stream.
>  * If {@code idAttribute} is not provided, Beam cannot guarantee that no duplicate data will be
>  * delivered, and deduplication of the stream will be strictly best effort.
>  */
> public Read withIdAttribute(String idAttribute) {
>   return toBuilder().setIdAttribute(idAttribute).build();
> }
> {code}
> This information isn't enough for users to know whether a second message,
> published with the same custom IdAttribute as a first message that was
> published `x` minutes earlier, would be deduplicated by the Dataflow runner.
> Better documentation will help. I imagine a lot of users will wonder about
> this and may even ask how to configure this period, but that will probably
> need a separate ticket.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
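
The question in the description (is a second message with the same ID attribute, published `x` minutes after the first, still dropped?) can be made concrete with a toy model. The sketch below is not Beam or Dataflow code: the class name `TimeBoundedDeduper`, its `accept` method, and the idea that seen IDs are remembered for one fixed window are all assumptions made for illustration. The ticket does not say how any runner actually retains IDs.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Toy model of time-bounded deduplication keyed on an ID attribute. Hypothetical; not Beam code. */
class TimeBoundedDeduper {
  // Assumed retention period for seen IDs; the real value is what the ticket asks to have documented.
  private final Duration window;
  // Publish time of the most recent message that was let through, per ID.
  private final Map<String, Instant> lastAccepted = new HashMap<>();

  TimeBoundedDeduper(Duration window) {
    this.window = window;
  }

  /** Returns true if the message is emitted, false if it is dropped as a duplicate. */
  boolean accept(String id, Instant publishTime) {
    Instant previous = lastAccepted.get(id);
    if (previous != null && Duration.between(previous, publishTime).compareTo(window) < 0) {
      return false; // same ID seen less than one window ago: treated as a duplicate
    }
    lastAccepted.put(id, publishTime); // first sighting, or the earlier record has aged out
    return true;
  }
}
```

Under this model, with a window of 10 minutes (an arbitrary value chosen for the example), a repeat of the same ID after 5 minutes is dropped while a repeat after 15 minutes passes through as a new message. Whether and where PubsubIO on a given runner draws that line is exactly what the ticket asks the documentation to state.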
[jira] [Updated] (BEAM-9354) How long does PubSubIO message deduplication last?
Beam JIRA Bot updated BEAM-9354:
    Priority: P3 (was: P2)

>            Priority: P3
>              Labels: gcp, pubsubio, stale-P2
[jira] [Updated] (BEAM-9354) How long does PubSubIO message deduplication last?
Beam JIRA Bot updated BEAM-9354:
    Labels: gcp pubsubio (was: gcp pubsubio stale-P2)

>            Priority: P3
>              Labels: gcp, pubsubio
[jira] [Updated] (BEAM-9354) How long does PubSubIO message deduplication last?
Beam JIRA Bot updated BEAM-9354:
    Labels: gcp pubsubio stale-P2 (was: gcp pubsubio)

>            Priority: P2
>              Labels: gcp, pubsubio, stale-P2
[jira] [Updated] (BEAM-9354) How long does PubSubIO message deduplication last?
Beam JIRA Bot updated BEAM-9354:
    Labels: gcp pubsubio (was: gcp pubsubio stale-assigned)

>            Priority: P2
>              Labels: gcp, pubsubio
[jira] [Updated] (BEAM-9354) How long does PubSubIO message deduplication last?
Kenneth Knowles updated BEAM-9354:
    Labels: gcp pubsubio stale-assigned (was: gcp pubsubio)

>            Assignee: Reuven Lax
>            Priority: P2
>              Labels: gcp, pubsubio, stale-assigned