[jira] [Commented] (BEAM-3772) BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded PCollection and FILE_LOADS
[ https://issues.apache.org/jira/browse/BEAM-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405139#comment-16405139 ]

Benjamin BENOIST commented on BEAM-3772:

I had the issue with both Beam 2.2.0 and 2.3.0.

> BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded PCollection and FILE_LOADS
>
> Key: BEAM-3772
> URL: https://issues.apache.org/jira/browse/BEAM-3772
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Affects Versions: 2.2.0, 2.3.0
> Environment: Dataflow streaming pipeline
> Reporter: Benjamin BENOIST
> Assignee: Eugene Kirpichov
> Priority: Major
>
> My workflow: Kafka -> Dataflow streaming -> BigQuery
> Since low latency isn't important in my case, I use FILE_LOADS to reduce costs. I'm using _BigQueryIO.Write_ with a _DynamicDestination_ that targets a table with the current hour as a suffix.
> The write is configured as follows:
> {code:java}
> .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
> .withMethod(Method.FILE_LOADS)
> .withTriggeringFrequency(triggeringFrequency)
> .withNumFileShards(100)
> {code}
> The first table is created and written to successfully, but subsequent tables are never created and I get exceptions like this one:
> {code:java}
> (99e5cd8c66414e7a): java.lang.RuntimeException: Failed to create load job with id prefix 5047f71312a94bf3a42ee5d67feede75_5295fbf25e1a7534f85e25dcaa9f4986_1_00023, reached max retries: 3, last failed load job: {
>   "configuration" : {
>     "load" : {
>       "createDisposition" : "CREATE_NEVER",
>       "destinationTable" : {
>         "datasetId" : "dev_mydataset",
>         "projectId" : "myproject-id",
>         "tableId" : "mytable_20180302_16"
>       },
> {code}
> The _CreateDisposition_ used in the load job is _CREATE_NEVER_, not the _CREATE_IF_NEEDED_ that was specified.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-3772) BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded PCollection and FILE_LOADS
Benjamin BENOIST created BEAM-3772:

Summary: BigQueryIO - Can't use DynamicDestination with CREATE_IF_NEEDED for unbounded PCollection and FILE_LOADS
Key: BEAM-3772
URL: https://issues.apache.org/jira/browse/BEAM-3772
Project: Beam
Issue Type: Bug
Components: io-java-gcp
Affects Versions: 2.3.0, 2.2.0
Environment: Dataflow streaming pipeline
Reporter: Benjamin BENOIST
Assignee: Chamikara Jayalath

My workflow: Kafka -> Dataflow streaming -> BigQuery

Since low latency isn't important in my case, I use FILE_LOADS to reduce costs. I'm using _BigQueryIO.Write_ with a _DynamicDestination_ that targets a table with the current hour as a suffix.

The write is configured as follows:
{code:java}
.withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
.withMethod(Method.FILE_LOADS)
.withTriggeringFrequency(triggeringFrequency)
.withNumFileShards(100)
{code}
The first table is created and written to successfully, but subsequent tables are never created and I get exceptions like this one:
{code:java}
(99e5cd8c66414e7a): java.lang.RuntimeException: Failed to create load job with id prefix 5047f71312a94bf3a42ee5d67feede75_5295fbf25e1a7534f85e25dcaa9f4986_1_00023, reached max retries: 3, last failed load job: {
  "configuration" : {
    "load" : {
      "createDisposition" : "CREATE_NEVER",
      "destinationTable" : {
        "datasetId" : "dev_mydataset",
        "projectId" : "myproject-id",
        "tableId" : "mytable_20180302_16"
      },
{code}
The _CreateDisposition_ used in the load job is _CREATE_NEVER_, not the _CREATE_IF_NEEDED_ that was specified.
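For readers unfamiliar with the hourly-suffix scheme described in the report, the table id in the failed load job ("mytable_20180302_16") suggests a name of the form base name, date, hour. A minimal sketch of deriving such a name follows. The class and method names are hypothetical, and the use of UTC and of the pattern yyyyMMdd_HH are assumptions inferred from that single table id, not something the report states.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: builds a table id with the element's hour as a suffix. */
final class HourlyTableName {
    // Assumed pattern and time zone; inferred from "mytable_20180302_16" in the report.
    private static final DateTimeFormatter SUFFIX =
            DateTimeFormatter.ofPattern("yyyyMMdd_HH").withZone(ZoneOffset.UTC);

    private HourlyTableName() {}

    /** Returns baseName + "_" + date and hour of the given timestamp. */
    static String forInstant(String baseName, Instant timestamp) {
        return baseName + "_" + SUFFIX.format(timestamp);
    }

    public static void main(String[] args) {
        System.out.println(forInstant("mytable", Instant.parse("2018-03-02T16:05:00Z")));
    }
}
```

Because the name changes every hour, a streaming pipeline keeps targeting tables that do not exist yet, which is why the create disposition of each load job matters here.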
[jira] [Created] (BEAM-3766) BigQueryIO - Must specify numFileShards when using FILE_LOADS with unbounded PCollection
Benjamin BENOIST created BEAM-3766:

Summary: BigQueryIO - Must specify numFileShards when using FILE_LOADS with unbounded PCollection
Key: BEAM-3766
URL: https://issues.apache.org/jira/browse/BEAM-3766
Project: Beam
Issue Type: Bug
Components: io-java-gcp
Affects Versions: 2.3.0
Reporter: Benjamin BENOIST
Assignee: Chamikara Jayalath

Since Beam v2.2 it's possible to use [FILE_LOADS|https://beam.apache.org/documentation/sdks/javadoc/2.2.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.Method.html#FILE_LOADS].

The documentation states that _withTriggeringFrequency_ must be specified when using it, but does not mention _withNumFileShards_. If we don't specify it, we get the exception below:
{code:java}
Exception in thread "main" java.lang.IllegalArgumentException
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:108)
    at org.apache.beam.sdk.io.gcp.bigquery.BatchLoads.expandTriggered(BatchLoads.java:209)
    at org.apache.beam.sdk.io.gcp.bigquery.BatchLoads.expand(BatchLoads.java:546)
    at org.apache.beam.sdk.io.gcp.bigquery.BatchLoads.expand(BatchLoads.java:79)
    at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
    at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:472)
    at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:286)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$Write.expandTyped(BigQueryIO.java:1550)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$Write.expand(BigQueryIO.java:1497)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$Write.expand(BigQueryIO.java:980)
    at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
    at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:491)
    at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:299)
    at com.travelaudience.data.job.rtbtobigquery.Main$.main(Main.scala:74)
    at com.travelaudience.data.job.rtbtobigquery.Main.main(Main.scala)
{code}
Either a default _numFileShards_ should be used, or the documentation should state that it has to be set.
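The stack trace above carries no message, which is what makes the failure hard to diagnose. As an illustration of the alternative the report asks for, here is a minimal sketch of a validation step that names the missing setting. FileLoadsConfig is a hypothetical stand-in, not a Beam class, and the exact condition checked at BatchLoads.java:209 is an assumption.

```java
import java.time.Duration;

/** Hypothetical stand-in for the write configuration; not part of Beam. */
final class FileLoadsConfig {
    private final Duration triggeringFrequency; // null means "not set"
    private final int numFileShards;            // 0 means "not set"

    FileLoadsConfig(Duration triggeringFrequency, int numFileShards) {
        this.triggeringFrequency = triggeringFrequency;
        this.numFileShards = numFileShards;
    }

    /** Fails with a descriptive message instead of a bare IllegalArgumentException. */
    void validate() {
        // Assumed rule: a triggering frequency requires an explicit shard count.
        if (triggeringFrequency != null && numFileShards <= 0) {
            throw new IllegalArgumentException(
                    "withNumFileShards must be set to a positive value when "
                            + "withTriggeringFrequency is used with FILE_LOADS");
        }
    }

    public static void main(String[] args) {
        new FileLoadsConfig(Duration.ofMinutes(5), 100).validate(); // passes
        new FileLoadsConfig(Duration.ofMinutes(5), 0).validate();   // throws with a message
    }
}
```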
[jira] [Updated] (BEAM-3754) KAFKA - Can't have commitOffsetsInFinalizeEnabled set to false with KafkaIO.readBytes()
[ https://issues.apache.org/jira/browse/BEAM-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin BENOIST updated BEAM-3754:

Summary: KAFKA - Can't have commitOffsetsInFinalizeEnabled set to false with KafkaIO.readBytes() (was: Can't have commitOffsetsInFinalizeEnabled set to false with KafkaIO.readBytes())

> KAFKA - Can't have commitOffsetsInFinalizeEnabled set to false with KafkaIO.readBytes()
>
> Key: BEAM-3754
> URL: https://issues.apache.org/jira/browse/BEAM-3754
> Project: Beam
> Issue Type: Bug
> Components: io-java-kafka
> Affects Versions: 2.3.0
> Environment: Dataflow pipeline using Kafka as a Sink
> Reporter: Benjamin BENOIST
> Assignee: Raghu Angadi
> Priority: Minor
> Labels: patch
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> Beam v2.3 introduces finalized offsets to reduce gaps or duplicate processing of records when a pipeline is restarted.
> _read()_ sets the _commitOffsetsInFinalizeEnabled_ parameter to false [by default|https://github.com/apache/beam/blob/release-2.3.0/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L307], but _readBytes()_ [doesn't|https://github.com/apache/beam/blob/release-2.3.0/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L282], which causes an exception:
> {noformat}
> Exception in thread "main" java.lang.IllegalStateException: Missing required properties: commitOffsetsInFinalizeEnabled
>     at org.apache.beam.sdk.io.kafka.AutoValue_KafkaIO_Read$Builder.build(AutoValue_KafkaIO_Read.java:344)
>     at org.apache.beam.sdk.io.kafka.KafkaIO.readBytes(KafkaIO.java:291)
> {noformat}
> The parameter can be set to true with _commitOffsetsInFinalize()_ but never to false.
> Using _read()_ in the definition of _readBytes()_ could prevent this kind of error in the future:
> {code:java}
> public static Read readBytes() {
>   return read()
>       .setKeyDeserializer(ByteArrayDeserializer.class)
>       .setValueDeserializer(ByteArrayDeserializer.class)
>       .build();
> }
> {code}
[jira] [Updated] (BEAM-3754) KAFKA - Can't set commitOffsetsInFinalizeEnabled to false with KafkaIO.readBytes()
[ https://issues.apache.org/jira/browse/BEAM-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin BENOIST updated BEAM-3754:

Summary: KAFKA - Can't set commitOffsetsInFinalizeEnabled to false with KafkaIO.readBytes() (was: KAFKA - Can't have commitOffsetsInFinalizeEnabled set to false with KafkaIO.readBytes())

> KAFKA - Can't set commitOffsetsInFinalizeEnabled to false with KafkaIO.readBytes()
>
> Key: BEAM-3754
> URL: https://issues.apache.org/jira/browse/BEAM-3754
> Project: Beam
> Issue Type: Bug
> Components: io-java-kafka
> Affects Versions: 2.3.0
> Environment: Dataflow pipeline using Kafka as a Sink
> Reporter: Benjamin BENOIST
> Assignee: Raghu Angadi
> Priority: Minor
> Labels: patch
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> Beam v2.3 introduces finalized offsets to reduce gaps or duplicate processing of records when a pipeline is restarted.
> _read()_ sets the _commitOffsetsInFinalizeEnabled_ parameter to false [by default|https://github.com/apache/beam/blob/release-2.3.0/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L307], but _readBytes()_ [doesn't|https://github.com/apache/beam/blob/release-2.3.0/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L282], which causes an exception:
> {noformat}
> Exception in thread "main" java.lang.IllegalStateException: Missing required properties: commitOffsetsInFinalizeEnabled
>     at org.apache.beam.sdk.io.kafka.AutoValue_KafkaIO_Read$Builder.build(AutoValue_KafkaIO_Read.java:344)
>     at org.apache.beam.sdk.io.kafka.KafkaIO.readBytes(KafkaIO.java:291)
> {noformat}
> The parameter can be set to true with _commitOffsetsInFinalize()_ but never to false.
> Using _read()_ in the definition of _readBytes()_ could prevent this kind of error in the future:
> {code:java}
> public static Read readBytes() {
>   return read()
>       .setKeyDeserializer(ByteArrayDeserializer.class)
>       .setValueDeserializer(ByteArrayDeserializer.class)
>       .build();
> }
> {code}
[jira] [Created] (BEAM-3754) Can't have commitOffsetsInFinalizeEnabled set to false with KafkaIO.readBytes()
Benjamin BENOIST created BEAM-3754:

Summary: Can't have commitOffsetsInFinalizeEnabled set to false with KafkaIO.readBytes()
Key: BEAM-3754
URL: https://issues.apache.org/jira/browse/BEAM-3754
Project: Beam
Issue Type: Bug
Components: io-java-kafka
Affects Versions: 2.3.0
Environment: Dataflow pipeline using Kafka as a Sink
Reporter: Benjamin BENOIST
Assignee: Raghu Angadi

Beam v2.3 introduces finalized offsets to reduce gaps or duplicate processing of records when a pipeline is restarted.

_read()_ sets the _commitOffsetsInFinalizeEnabled_ parameter to false [by default|https://github.com/apache/beam/blob/release-2.3.0/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L307], but _readBytes()_ [doesn't|https://github.com/apache/beam/blob/release-2.3.0/sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaIO.java#L282], which causes an exception:
{noformat}
Exception in thread "main" java.lang.IllegalStateException: Missing required properties: commitOffsetsInFinalizeEnabled
    at org.apache.beam.sdk.io.kafka.AutoValue_KafkaIO_Read$Builder.build(AutoValue_KafkaIO_Read.java:344)
    at org.apache.beam.sdk.io.kafka.KafkaIO.readBytes(KafkaIO.java:291)
{noformat}
The parameter can be set to true with _commitOffsetsInFinalize()_ but never to false.

Using _read()_ in the definition of _readBytes()_ could prevent this kind of error in the future:
{code:java}
public static Read readBytes() {
  return read()
      .setKeyDeserializer(ByteArrayDeserializer.class)
      .setValueDeserializer(ByteArrayDeserializer.class)
      .build();
}
{code}
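The failure mode in this report can be reproduced without Beam. The sketch below uses a hand-written builder that imitates how a generated builder rejects an unset required property, and shows why routing readBytes() through read() keeps the defaults in one place. ReadSketch and all of its members are hypothetical stand-ins, with deserializers reduced to plain strings. The sketch goes through a toBuilder() step because its read() returns a built object; whether the real class needs the same step is not something the report states.

```java
/** Hypothetical, simplified stand-in for a read transform with a required flag. */
final class ReadSketch {
    final String keyDeserializer;
    final String valueDeserializer;
    final boolean commitOffsetsInFinalizeEnabled;

    private ReadSketch(String keyDeserializer, String valueDeserializer, boolean commit) {
        this.keyDeserializer = keyDeserializer;
        this.valueDeserializer = valueDeserializer;
        this.commitOffsetsInFinalizeEnabled = commit;
    }

    /** Copies the current values into a fresh builder. */
    Builder toBuilder() {
        return new Builder()
                .setKeyDeserializer(keyDeserializer)
                .setValueDeserializer(valueDeserializer)
                .setCommitOffsetsInFinalizeEnabled(commitOffsetsInFinalizeEnabled);
    }

    static final class Builder {
        private String keyDeserializer;
        private String valueDeserializer;
        private Boolean commitOffsetsInFinalizeEnabled; // stays null until explicitly set

        Builder setKeyDeserializer(String name) { this.keyDeserializer = name; return this; }
        Builder setValueDeserializer(String name) { this.valueDeserializer = name; return this; }
        Builder setCommitOffsetsInFinalizeEnabled(boolean enabled) {
            this.commitOffsetsInFinalizeEnabled = enabled;
            return this;
        }

        ReadSketch build() {
            if (commitOffsetsInFinalizeEnabled == null) {
                throw new IllegalStateException(
                        "Missing required properties: commitOffsetsInFinalizeEnabled");
            }
            return new ReadSketch(keyDeserializer, valueDeserializer, commitOffsetsInFinalizeEnabled);
        }
    }

    /** The single place where defaults are set. */
    static ReadSketch read() {
        return new Builder().setCommitOffsetsInFinalizeEnabled(false).build();
    }

    /** Delegates to read(), so it inherits every default automatically. */
    static ReadSketch readBytes() {
        return read().toBuilder()
                .setKeyDeserializer("ByteArrayDeserializer")
                .setValueDeserializer("ByteArrayDeserializer")
                .build();
    }

    /** Mirrors the reported bug: a separate builder chain that forgets the default. */
    static ReadSketch readBytesWithoutDefault() {
        return new Builder()
                .setKeyDeserializer("ByteArrayDeserializer")
                .setValueDeserializer("ByteArrayDeserializer")
                .build();
    }
}
```

With this layout, adding a new required property only requires a default in read(); any factory that delegates to it cannot miss it.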
[jira] [Updated] (BEAM-3224) Add support for path with braces for Google Cloud Storage
[ https://issues.apache.org/jira/browse/BEAM-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin BENOIST updated BEAM-3224:

Description:
At the moment we cannot use braces in Google Cloud Storage paths, as explained [here|https://stackoverflow.com/questions/46977552/filebasedsource-not-able-to-understand-a-glob-corresponding-to-several-specific].
The path is backed by a file pattern defined as a Java glob, which is then expanded to a regex by the _wildcardToRegexp_ function in _sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java_.
{{gs://bucket/{file1,file2,file3}}} should match {{gs://bucket/file1}}, {{gs://bucket/file2}} and {{gs://bucket/file3}}

was:
At the moment we cannot use braces in Google Cloud Storage paths, as explained [here|https://stackoverflow.com/questions/46977552/filebasedsource-not-able-to-understand-a-glob-corresponding-to-several-specific].
The path is backed by a file pattern defined as a Java glob, which is then expanded to a regex by the _wildcardToRegexp_ function in _sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java_.
{{gs://bucket/\{file1,file2,file3\\}}} should match {{gs://bucket/file1}}, {{gs://bucket/file2}} and {{gs://bucket/file3}}

> Add support for path with braces for Google Cloud Storage
>
> Key: BEAM-3224
> URL: https://issues.apache.org/jira/browse/BEAM-3224
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-gcp
> Reporter: Benjamin BENOIST
> Assignee: Chamikara Jayalath
> Priority: Minor
> Original Estimate: 3h
> Remaining Estimate: 3h
>
> At the moment we cannot use braces in Google Cloud Storage paths, as explained [here|https://stackoverflow.com/questions/46977552/filebasedsource-not-able-to-understand-a-glob-corresponding-to-several-specific].
> The path is backed by a file pattern defined as a Java glob, which is then expanded to a regex by the _wildcardToRegexp_ function in _sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java_.
> {{gs://bucket/{file1,file2,file3}}} should match {{gs://bucket/file1}}, {{gs://bucket/file2}} and {{gs://bucket/file3}}

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (BEAM-3224) Add support for path with braces for Google Cloud Storage
Benjamin BENOIST created BEAM-3224:

Summary: Add support for path with braces for Google Cloud Storage
Key: BEAM-3224
URL: https://issues.apache.org/jira/browse/BEAM-3224
Project: Beam
Issue Type: Improvement
Components: sdk-java-gcp
Reporter: Benjamin BENOIST
Assignee: Chamikara Jayalath
Priority: Minor

At the moment we cannot use braces in Google Cloud Storage paths, as explained [here|https://stackoverflow.com/questions/46977552/filebasedsource-not-able-to-understand-a-glob-corresponding-to-several-specific].

The path is backed by a file pattern defined as a Java glob, which is then expanded to a regex by the _wildcardToRegexp_ function in _sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/util/GcsUtil.java_.

{{gs://bucket/{file1,file2,file3}}} should match {{gs://bucket/file1}}, {{gs://bucket/file2}} and {{gs://bucket/file3}}
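To make the requested behaviour concrete, here is a minimal sketch of a glob-to-regex conversion that understands brace alternation. It is not the code in GcsUtil.java; BraceGlob and its methods are hypothetical names, and the choice to make * and ? stop at a path separator is an assumption about the desired semantics.

```java
import java.util.regex.Pattern;

/** Hypothetical glob translator that supports {a,b,c} alternation. */
final class BraceGlob {
    private BraceGlob() {}

    /** Converts a glob into an equivalent regular expression string. */
    static String toRegex(String glob) {
        StringBuilder out = new StringBuilder();
        int braceDepth = 0;
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            if (c == '*') {
                out.append("[^/]*");            // assumed: wildcard does not cross '/'
            } else if (c == '?') {
                out.append("[^/]");
            } else if (c == '{') {
                braceDepth++;
                out.append("(?:");              // open a non-capturing alternation group
            } else if (c == '}' && braceDepth > 0) {
                braceDepth--;
                out.append(')');
            } else if (c == ',' && braceDepth > 0) {
                out.append('|');                // a comma inside braces separates alternatives
            } else {
                out.append(Pattern.quote(String.valueOf(c))); // everything else is literal
            }
        }
        if (braceDepth != 0) {
            throw new IllegalArgumentException("Unbalanced '{' in glob: " + glob);
        }
        return out.toString();
    }

    /** Returns true if the whole path matches the glob. */
    static boolean matches(String glob, String path) {
        return Pattern.matches(toRegex(glob), path);
    }

    public static void main(String[] args) {
        String glob = "gs://bucket/{file1,file2,file3}";
        System.out.println(matches(glob, "gs://bucket/file2")); // true
        System.out.println(matches(glob, "gs://bucket/file4")); // false
    }
}
```

Commas outside braces are kept literal, so existing patterns that contain a comma in an object name would keep their meaning under this scheme.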