[jira] [Updated] (BEAM-9354) How long does PubSubIO message deduplication last?

2021-09-14 Thread Jeff Webb (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Webb updated BEAM-9354:

Status: Open  (was: Triage Needed)

> How long does PubSubIO message deduplication last?
> --
>
> Key: BEAM-9354
> URL: https://issues.apache.org/jira/browse/BEAM-9354
> Project: Beam
> Issue Type: Improvement
> Components: io-java-gcp
> Reporter: Tianzi Cai
> Priority: P3
> Labels: gcp, pubsubio
>
> GCP documentation heavily
> [promotes|https://cloud.google.com/dataflow/docs/concepts/streaming-with-cloud-pubsub]
> Beam's PubsubIO for Pub/Sub message deduplication. Yet nothing in the
> documentation, including the [source
> code|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.java],
> tells users how long this deduplication is supposed to last.
> In
> [`PubsubIO.java`|https://github.com/apache/beam/blob/a24bc3bae54f089b93bd66a118bd4bf09dbc9254/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.java#L842-L853]:
> {code:java}
> /**
>  * When reading from Cloud Pub/Sub where unique record identifiers are provided as Pub/Sub
>  * message attributes, specifies the name of the attribute containing the unique identifier. The
>  * value of the attribute can be any string that uniquely identifies this record.
>  *
>  * Pub/Sub cannot guarantee that no duplicate data will be delivered on the Pub/Sub stream.
>  * If {@code idAttribute} is not provided, Beam cannot guarantee that no duplicate data will be
>  * delivered, and deduplication of the stream will be strictly best effort.
>  */
> public Read withIdAttribute(String idAttribute) {
>   return toBuilder().setIdAttribute(idAttribute).build();
> }
> {code}
> This isn't enough for users to know whether a second message, published with
> the same custom idAttribute value as a first message published `x` minutes
> earlier, would be deduplicated by the Dataflow runner. Better documentation
> would help. I imagine many users will wonder about this and may even ask how
> to configure this period, but that will probably need a separate ticket.
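
To make the question concrete, here is a minimal plain-Java sketch of time-bounded deduplication by record ID. It is an illustration only and is not how Beam or the Dataflow runner implements deduplication. The class `WindowedDeduplicator`, its `accept` method, the ID `order-42`, and the 10-minute window are all hypothetical; the real retention period is exactly what this ticket says is undocumented. The sketch also assumes publish times arrive in non-decreasing order.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not Beam or Dataflow code. */
class WindowedDeduplicator {
  // Assumed retention period for remembered IDs (a placeholder, not a documented value).
  private final Duration window;
  // Maps each remembered ID to the publish time at which it was first seen.
  private final Map<String, Instant> firstSeen = new HashMap<>();

  WindowedDeduplicator(Duration window) {
    this.window = window;
  }

  /** Returns true if the record is emitted, false if it is dropped as a duplicate. */
  boolean accept(String id, Instant publishTime) {
    // Forget every ID whose retention period has expired by publishTime.
    firstSeen.values().removeIf(seen -> !seen.plus(window).isAfter(publishTime));
    // Emit only if the ID is not currently remembered.
    return firstSeen.putIfAbsent(id, publishTime) == null;
  }

  public static void main(String[] args) {
    WindowedDeduplicator dedup = new WindowedDeduplicator(Duration.ofMinutes(10));
    Instant t0 = Instant.parse("2020-02-20T00:00:00Z");

    System.out.println(dedup.accept("order-42", t0));                                // true
    System.out.println(dedup.accept("order-42", t0.plus(Duration.ofMinutes(5))));    // false
    System.out.println(dedup.accept("order-42", t0.plus(Duration.ofMinutes(15))));   // true
  }
}
```

With a bounded window, the same ID is dropped 5 minutes later but emitted again 15 minutes later, which is why users need to know the actual period to reason about whether `x` minutes is safe.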



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9354) How long does PubSubIO message deduplication last?

2020-08-25 Thread Beam JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beam JIRA Bot updated BEAM-9354:

Priority: P3  (was: P2)

> How long does PubSubIO message deduplication last?
> --
>
> Key: BEAM-9354
> URL: https://issues.apache.org/jira/browse/BEAM-9354
> Project: Beam
> Issue Type: Improvement
> Components: io-java-gcp
> Reporter: Tianzi Cai
> Priority: P3
> Labels: gcp, pubsubio, stale-P2
>





[jira] [Updated] (BEAM-9354) How long does PubSubIO message deduplication last?

2020-08-25 Thread Beam JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beam JIRA Bot updated BEAM-9354:

Labels: gcp pubsubio  (was: gcp pubsubio stale-P2)

> How long does PubSubIO message deduplication last?
> --
>
> Key: BEAM-9354
> URL: https://issues.apache.org/jira/browse/BEAM-9354
> Project: Beam
> Issue Type: Improvement
> Components: io-java-gcp
> Reporter: Tianzi Cai
> Priority: P3
> Labels: gcp, pubsubio
>





[jira] [Updated] (BEAM-9354) How long does PubSubIO message deduplication last?

2020-08-10 Thread Beam JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beam JIRA Bot updated BEAM-9354:

Labels: gcp pubsubio stale-P2  (was: gcp pubsubio)

> How long does PubSubIO message deduplication last?
> --
>
> Key: BEAM-9354
> URL: https://issues.apache.org/jira/browse/BEAM-9354
> Project: Beam
> Issue Type: Improvement
> Components: io-java-gcp
> Reporter: Tianzi Cai
> Priority: P2
> Labels: gcp, pubsubio, stale-P2
>





[jira] [Updated] (BEAM-9354) How long does PubSubIO message deduplication last?

2020-06-10 Thread Beam JIRA Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beam JIRA Bot updated BEAM-9354:

Labels: gcp pubsubio  (was: gcp pubsubio stale-assigned)

> How long does PubSubIO message deduplication last?
> --
>
> Key: BEAM-9354
> URL: https://issues.apache.org/jira/browse/BEAM-9354
> Project: Beam
> Issue Type: Improvement
> Components: io-java-gcp
> Reporter: Tianzi Cai
> Priority: P2
> Labels: gcp, pubsubio
>





[jira] [Updated] (BEAM-9354) How long does PubSubIO message deduplication last?

2020-06-01 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles updated BEAM-9354:
Labels: gcp pubsubio stale-assigned  (was: gcp pubsubio)

> How long does PubSubIO message deduplication last?
> --
>
> Key: BEAM-9354
> URL: https://issues.apache.org/jira/browse/BEAM-9354
> Project: Beam
> Issue Type: Improvement
> Components: io-java-gcp
> Reporter: Tianzi Cai
> Assignee: Reuven Lax
> Priority: P2
> Labels: gcp, pubsubio, stale-assigned
>


