[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change
[ https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=138073=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138073 ]

ASF GitHub Bot logged work on BEAM-5180:

Author: ASF GitHub Bot
Created on: 25/Aug/18 06:15
Start Date: 25/Aug/18 06:15
Worklog Time Spent: 10m
Work Description: JozoVilcek commented on issue #6251: [BEAM-5180] Relax back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251#issuecomment-415942808

@apilloud, yes, from personal experience, #5808 is not fun. For me, the most straightforward solution would be to get rid of the default `scheme` and never guess. It is not that difficult to always put the `scheme` on paths, even if it is `file`.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 138073)
Time Spent: 2h 40m (was: 2.5h)

> Broken FileResultCoder via parseSchema change
> ---------------------------------------------
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.6.0
> Reporter: Jozef Vilcek
> Assignee: Ankur Goenka
> Priority: Blocker
> Fix For: 2.7.0
>
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> Recently, this commit
> [https://github.com/apache/beam/commit/3fff58c21f94415f3397e185377e36d3df662384]
> introduced stricter scheme parsing, which breaks the contract between
> _FileResultCoder_ and _FileSystems.matchNewResource()_.
> The coder takes a _ResourceId_, serializes it via its `_toString_` method, and then
> relies on the filesystem being able to parse it back again. Requiring a strict
> _scheme://_ prefix breaks this, at least for the Hadoop filesystem, which uses a _URI_
> for its _ResourceId_ and produces _toString()_ output in the form `_hdfs:/some/path_`.
> I guess the _ResourceIdCoder_ is suffering from the same problem.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
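A minimal sketch of the "never guess" proposal in the comment above, written against plain java.util.regex rather than Beam's FileSystems. The class name ExplicitSchemeParser and its parseScheme method are hypothetical, and the scheme pattern is borrowed from the relaxed FILE_SCHEME_PATTERN in PR #6251. Instead of falling back to a default `file` scheme, a spec without an explicit scheme is rejected.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser that requires every path spec to name its scheme explicitly. */
final class ExplicitSchemeParser {
  // Scheme grammar borrowed from the relaxed FILE_SCHEME_PATTERN in PR #6251.
  private static final Pattern SCHEME_PATTERN =
      Pattern.compile("(?<scheme>[a-zA-Z][-a-zA-Z0-9+.]*):/.*");

  private ExplicitSchemeParser() {}

  /** Returns the lower-cased scheme, or throws instead of guessing "file". */
  static String parseScheme(String spec) {
    Matcher matcher = SCHEME_PATTERN.matcher(spec);
    if (!matcher.matches()) {
      throw new IllegalArgumentException("Path spec has no explicit scheme: " + spec);
    }
    return matcher.group("scheme").toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    System.out.println(parseScheme("file:///tmp/f1"));  // file
    System.out.println(parseScheme("hdfs:/some/path")); // hdfs
    try {
      parseScheme("/tmp/f1");
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // Path spec has no explicit scheme: /tmp/f1
    }
  }
}
```

Under this rule a local path would have to be written as `file:///tmp/f1`; the trade-off is that callers passing bare local paths would start failing instead of silently resolving to `file`.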
[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change
[ https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=137917=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-137917 ]

ASF GitHub Bot logged work on BEAM-5180:

Author: ASF GitHub Bot
Created on: 24/Aug/18 18:08
Start Date: 24/Aug/18 18:08
Worklog Time Spent: 10m
Work Description: apilloud closed pull request #6251: [BEAM-5180] Relax back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java
index be89c9ec099..7de41c1174a 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java
@@ -69,7 +69,7 @@
   public static final String DEFAULT_SCHEME = "file";
 
   private static final Pattern FILE_SCHEME_PATTERN =
-      Pattern.compile("(?<scheme>[a-zA-Z][-a-zA-Z0-9+.]*)://.*");
+      Pattern.compile("(?<scheme>[a-zA-Z][-a-zA-Z0-9+.]*):/.*");
   private static final Pattern GLOB_PATTERN = Pattern.compile("[*?{}]");
 
   private static final AtomicReference<Map<String, FileSystem>> SCHEME_TO_FILESYSTEM =
diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/FileSystemsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/FileSystemsTest.java
index 22f71f6e09f..0fbeb71325d 100644
--- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/FileSystemsTest.java
+++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/FileSystemsTest.java
@@ -196,6 +196,7 @@ public void testValidMatchNewResourceForLocalFileSystem() {
   @Test(expected = IllegalArgumentException.class)
   public void testInvalidSchemaMatchNewResource() {
     assertEquals("file", FileSystems.matchNewResource("invalidschema://tmp/f1", false));
+    assertEquals("file", FileSystems.matchNewResource("c:/tmp/f1", false));
   }
 
   private List toResourceIds(List paths, final boolean isDirectory) {

Issue Time Tracking
-------------------

Worklog Id: (was: 137917)
Time Spent: 2.5h (was: 2h 20m)

> Broken FileResultCoder via parseSchema change
> ---------------------------------------------
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.6.0
> Reporter: Jozef Vilcek
> Assignee: Kenneth Knowles
> Priority: Blocker
>
> Time Spent: 2.5h
> Remaining Estimate: 0h
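To see what the one-character change in the diff does, the sketch below runs the old and new FILE_SCHEME_PATTERN regexes over a few sample specs. It is not Beam's FileSystems.parseScheme: the class and helper names are hypothetical, and falling back to `file` when nothing matches is an assumption based on the DEFAULT_SCHEME constant visible in the diff context.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical side-by-side comparison of the two FILE_SCHEME_PATTERN versions. */
final class SchemePatternComparison {
  // Pattern before PR #6251: a scheme must be followed by "://".
  static final Pattern STRICT = Pattern.compile("(?<scheme>[a-zA-Z][-a-zA-Z0-9+.]*)://.*");
  // Pattern after PR #6251: "scheme:/" is enough.
  static final Pattern RELAXED = Pattern.compile("(?<scheme>[a-zA-Z][-a-zA-Z0-9+.]*):/.*");
  // Assumed fallback when no scheme matches, mirroring DEFAULT_SCHEME = "file".
  static final String DEFAULT_SCHEME = "file";

  private SchemePatternComparison() {}

  static String parseScheme(Pattern pattern, String spec) {
    Matcher matcher = pattern.matcher(spec);
    return matcher.matches() ? matcher.group("scheme").toLowerCase(Locale.ROOT) : DEFAULT_SCHEME;
  }

  public static void main(String[] args) {
    String[] specs = {"hdfs:/some/path", "hdfs://namenode/some/path", "/tmp/f1", "c:/tmp/f1"};
    for (String spec : specs) {
      System.out.printf("%-28s strict=%-5s relaxed=%s%n",
          spec, parseScheme(STRICT, spec), parseScheme(RELAXED, spec));
    }
  }
}
```

With the relaxed pattern, `hdfs:/some/path` resolves to `hdfs` again, while a Windows-style `c:/tmp/f1` is now read as scheme `c`; that is the same input that appears in the test change.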
[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change
[ https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=137916=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-137916 ]

ASF GitHub Bot logged work on BEAM-5180:

Author: ASF GitHub Bot
Created on: 24/Aug/18 18:07
Start Date: 24/Aug/18 18:07
Worklog Time Spent: 10m
Work Description: apilloud commented on issue #6251: [BEAM-5180] Relax back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251#issuecomment-415838093

This is effectively a partial rollback of https://github.com/apache/beam/pull/5808. LGTM.

Issue Time Tracking
-------------------

Worklog Id: (was: 137916)
Time Spent: 2h 20m (was: 2h 10m)

> Broken FileResultCoder via parseSchema change
> ---------------------------------------------
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.6.0
> Reporter: Jozef Vilcek
> Assignee: Kenneth Knowles
> Priority: Blocker
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change
[ https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=137909=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-137909 ]

ASF GitHub Bot logged work on BEAM-5180:

Author: ASF GitHub Bot
Created on: 24/Aug/18 17:40
Start Date: 24/Aug/18 17:40
Worklog Time Spent: 10m
Work Description: angoenka commented on issue #6251: [BEAM-5180] Relax back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251#issuecomment-415830568

From the java.net.URI docs: "A hierarchical URI is subject to further parsing according to the syntax `[scheme:][//authority][path][?query][#fragment]`", which enforces `//`.
But to support HDFS and unblock ourselves, we should go with the rollback.

Issue Time Tracking
-------------------

Worklog Id: (was: 137909)
Time Spent: 2h 10m (was: 2h)

> Broken FileResultCoder via parseSchema change
> ---------------------------------------------
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.6.0
> Reporter: Jozef Vilcek
> Assignee: Kenneth Knowles
> Priority: Blocker
>
> Time Spent: 2h 10m
> Remaining Estimate: 0h
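The grammar quoted in the comment above can be checked directly against java.net.URI. This small sketch (class and method names are illustrative only) splits a fully populated hierarchical URI into the five components the Javadoc names; the namenode host, port, query and fragment are made-up sample values.

```java
import java.net.URI;

/** Illustrative helper that reports the components of a hierarchical URI. */
final class UriComponents {
  private UriComponents() {}

  /** Splits a spec along [scheme:][//authority][path][?query][#fragment]. */
  static String describe(String spec) {
    URI uri = URI.create(spec); // throws IllegalArgumentException on a malformed spec
    return "scheme=" + uri.getScheme()
        + " authority=" + uri.getAuthority()
        + " path=" + uri.getPath()
        + " query=" + uri.getQuery()
        + " fragment=" + uri.getFragment();
  }

  public static void main(String[] args) {
    // prints: scheme=hdfs authority=namenode:8020 path=/some/path query=op=list fragment=top
    System.out.println(describe("hdfs://namenode:8020/some/path?op=list#top"));
  }
}
```

Components that are not present in a spec come back as `null` from the corresponding getter.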
[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change
[ https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=137363=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-137363 ]

ASF GitHub Bot logged work on BEAM-5180:

Author: ASF GitHub Bot
Created on: 23/Aug/18 12:32
Start Date: 23/Aug/18 12:32
Worklog Time Spent: 10m
Work Description: JozoVilcek commented on issue #6251: [BEAM-5180] Relax back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251#issuecomment-415397768

I am not sure how the failed `GrpcDataServiceTest` relates to this PR.

Issue Time Tracking
-------------------

Worklog Id: (was: 137363)
Time Spent: 2h (was: 1h 50m)

> Broken FileResultCoder via parseSchema change
> ---------------------------------------------
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.6.0
> Reporter: Jozef Vilcek
> Assignee: Kenneth Knowles
> Priority: Blocker
>
> Time Spent: 2h
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change
[ https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=137352=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-137352 ]

ASF GitHub Bot logged work on BEAM-5180:

Author: ASF GitHub Bot
Created on: 23/Aug/18 12:15
Start Date: 23/Aug/18 12:15
Worklog Time Spent: 10m
Work Description: JozoVilcek opened a new pull request #6251: [BEAM-5180] Relax back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251

Issue Time Tracking
-------------------

Worklog Id: (was: 137352)
Time Spent: 1h 50m (was: 1h 40m)

> Broken FileResultCoder via parseSchema change
> ---------------------------------------------
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.6.0
> Reporter: Jozef Vilcek
> Assignee: Kenneth Knowles
> Priority: Blocker
>
> Time Spent: 1h 50m
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change
[ https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=137351=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-137351 ]

ASF GitHub Bot logged work on BEAM-5180:

Author: ASF GitHub Bot
Created on: 23/Aug/18 12:14
Start Date: 23/Aug/18 12:14
Worklog Time Spent: 10m
Work Description: JozoVilcek closed pull request #6251: [BEAM-5180] Relax back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251

Issue Time Tracking
-------------------

Worklog Id: (was: 137351)
Time Spent: 1h 40m (was: 1.5h)

> Broken FileResultCoder via parseSchema change
> ---------------------------------------------
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.6.0
> Reporter: Jozef Vilcek
> Assignee: Kenneth Knowles
> Priority: Blocker
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change
[ https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=136516=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136516 ]

ASF GitHub Bot logged work on BEAM-5180:

Author: ASF GitHub Bot
Created on: 21/Aug/18 13:47
Start Date: 21/Aug/18 13:47
Worklog Time Spent: 10m
Work Description: JozoVilcek edited a comment on issue #6251: [BEAM-5180] Relax back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251#issuecomment-414679509

Actually, after more thought: is `hdfs:/` an invalid URI or ResourceId? Given that the authority component is optional, the extra `//` can be dropped. `java.net.URI` parses that just fine.

Issue Time Tracking
-------------------

Worklog Id: (was: 136516)
Time Spent: 1.5h (was: 1h 20m)

> Broken FileResultCoder via parseSchema change
> ---------------------------------------------
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.6.0
> Reporter: Jozef Vilcek
> Assignee: Kenneth Knowles
> Priority: Blocker
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
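The claim that the authority is optional is easy to confirm with java.net.URI alone. In this sketch (class and method names are illustrative), `hdfs:/some/path` and `hdfs:///some/path` both parse as hierarchical (non-opaque) URIs with no authority and the same path.

```java
import java.net.URI;

/** Illustrative check of how java.net.URI parses specs that have no authority. */
final class AuthorityOptional {
  private AuthorityOptional() {}

  static String summary(String spec) {
    URI uri = URI.create(spec);
    return "scheme=" + uri.getScheme()
        + ", opaque=" + uri.isOpaque()
        + ", authority=" + uri.getAuthority()
        + ", path=" + uri.getPath();
  }

  public static void main(String[] args) {
    // Both lines print: scheme=hdfs, opaque=false, authority=null, path=/some/path
    System.out.println(summary("hdfs:/some/path"));
    System.out.println(summary("hdfs:///some/path"));
  }
}
```

An empty authority after `//` is accepted by java.net.URI and reported as `null`, which is why the two spellings end up with identical components.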
[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change
[ https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=136515=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136515 ]

ASF GitHub Bot logged work on BEAM-5180:

Author: ASF GitHub Bot
Created on: 21/Aug/18 13:46
Start Date: 21/Aug/18 13:46
Worklog Time Spent: 10m
Work Description: JozoVilcek commented on issue #6251: [BEAM-5180] Relax back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251#issuecomment-414679509

Actually, after more thought: is `hdfs:/` an invalid URI or ResourceId? Given that the authority component is optional, the extra `//` can be dropped. `java.net.URI` parses that just fine.

Issue Time Tracking
-------------------

Worklog Id: (was: 136515)
Time Spent: 1h 20m (was: 1h 10m)

> Broken FileResultCoder via parseSchema change
> ---------------------------------------------
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.6.0
> Reporter: Jozef Vilcek
> Assignee: Kenneth Knowles
> Priority: Blocker
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change
[ https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=136513=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136513 ]

ASF GitHub Bot logged work on BEAM-5180:

Author: ASF GitHub Bot
Created on: 21/Aug/18 13:42
Start Date: 21/Aug/18 13:42
Worklog Time Spent: 10m
Work Description: JozoVilcek opened a new pull request #6251: [BEAM-5180] Relax back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251

Issue Time Tracking
-------------------

Worklog Id: (was: 136513)
Time Spent: 1h 10m (was: 1h)

> Broken FileResultCoder via parseSchema change
> ---------------------------------------------
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.6.0
> Reporter: Jozef Vilcek
> Assignee: Kenneth Knowles
> Priority: Blocker
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change
[ https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=136434=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136434 ]

ASF GitHub Bot logged work on BEAM-5180:

Author: ASF GitHub Bot
Created on: 21/Aug/18 09:26
Start Date: 21/Aug/18 09:26
Worklog Time Spent: 10m
Work Description: JozoVilcek commented on issue #6251: [BEAM-5180] Relax back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251#issuecomment-414611378

True, the code uses `hdfs:/some/path` when I enter paths without an authority, i.e. `hdfs:///some/path`. I looked around, and the Hadoop filesystem does this. I commented about this here: https://issues.apache.org/jira/browse/BEAM-2277?focusedCommentId=16587202=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16587202

Issue Time Tracking
-------------------

Worklog Id: (was: 136434)
Time Spent: 50m (was: 40m)

> Broken FileResultCoder via parseSchema change
> ---------------------------------------------
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.6.0
> Reporter: Jozef Vilcek
> Assignee: Kenneth Knowles
> Priority: Blocker
>
> Time Spent: 50m
> Remaining Estimate: 0h
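One way an input like `hdfs:///some/path` can turn into `hdfs:/some/path` is reproducible with java.net.URI: parsing leaves the empty authority as `null`, and rebuilding the URI from its components then emits no `//`. This is a hedged illustration of that mechanism only, not Hadoop's actual Path code (which this thread does not show); the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Illustrative sketch: rebuilding a URI from its components drops an empty authority. */
final class AuthorityDropOnRebuild {
  private AuthorityDropOnRebuild() {}

  static String rebuild(String spec) {
    URI parsed = URI.create(spec);
    try {
      // The five-argument constructor appends "//" only when the authority is non-null.
      URI rebuilt = new URI(parsed.getScheme(), parsed.getAuthority(), parsed.getPath(),
          parsed.getQuery(), parsed.getFragment());
      return rebuilt.toString();
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("Cannot rebuild " + spec, e);
    }
  }

  public static void main(String[] args) {
    System.out.println(rebuild("hdfs:///some/path"));              // hdfs:/some/path
    System.out.println(rebuild("hdfs://namenode:8020/some/path")); // hdfs://namenode:8020/some/path
  }
}
```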
[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change
[ https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=136435=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136435 ]

ASF GitHub Bot logged work on BEAM-5180:

Author: ASF GitHub Bot
Created on: 21/Aug/18 09:26
Start Date: 21/Aug/18 09:26
Worklog Time Spent: 10m
Work Description: JozoVilcek closed pull request #6251: [BEAM-5180] Relax back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251

Issue Time Tracking
-------------------

Worklog Id: (was: 136435)
Time Spent: 1h (was: 50m)

> Broken FileResultCoder via parseSchema change
> ---------------------------------------------
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.6.0
> Reporter: Jozef Vilcek
> Assignee: Kenneth Knowles
> Priority: Blocker
>
> Time Spent: 1h
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change
[ https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=136327=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136327 ]

ASF GitHub Bot logged work on BEAM-5180:

Author: ASF GitHub Bot
Created on: 20/Aug/18 22:48
Start Date: 20/Aug/18 22:48
Worklog Time Spent: 10m
Work Description: angoenka commented on issue #6251: [BEAM-5180] Relax back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251#issuecomment-414488403

HDFS file names are expected to start with "hdfs://", so I would fix the file name instead of the regex.

Issue Time Tracking
-------------------

Worklog Id: (was: 136327)
Time Spent: 40m (was: 0.5h)

> Broken FileResultCoder via parseSchema change
> ---------------------------------------------
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.6.0
> Reporter: Jozef Vilcek
> Assignee: Kenneth Knowles
> Priority: Blocker
>
> Time Spent: 40m
> Remaining Estimate: 0h
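Fixing the file name rather than the regex, as suggested here, could look like the following sketch: a hypothetical helper that rewrites a `scheme:/path` spec into `scheme:///path` before it reaches the strict `://` pattern, and leaves everything else untouched. The helper name and the rewrite rule are assumptions for illustration, not code from Beam or Hadoop.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: restores an empty authority ("//") on "scheme:/path" specs. */
final class EmptyAuthorityNormalizer {
  // A scheme, then ":/" NOT followed by a second slash, then the rest of the spec.
  private static final Pattern SINGLE_SLASH =
      Pattern.compile("(?<scheme>[a-zA-Z][-a-zA-Z0-9+.]*):/(?!/)(?<rest>.*)");

  private EmptyAuthorityNormalizer() {}

  static String withEmptyAuthority(String spec) {
    Matcher matcher = SINGLE_SLASH.matcher(spec);
    if (!matcher.matches()) {
      return spec; // already "scheme://...", or no scheme at all
    }
    return matcher.group("scheme") + ":///" + matcher.group("rest");
  }

  public static void main(String[] args) {
    System.out.println(withEmptyAuthority("hdfs:/some/path"));     // hdfs:///some/path
    System.out.println(withEmptyAuthority("hdfs://nn/some/path")); // hdfs://nn/some/path
    System.out.println(withEmptyAuthority("/tmp/f1"));             // /tmp/f1
  }
}
```

A regex-based rewrite like this cannot tell a one-letter scheme from a Windows drive letter, so `c:/tmp/f1` would also be rewritten, to `c:///tmp/f1`.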
[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change
[ https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=136277=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136277 ]

ASF GitHub Bot logged work on BEAM-5180:

Author: ASF GitHub Bot
Created on: 20/Aug/18 20:53
Start Date: 20/Aug/18 20:53
Worklog Time Spent: 10m
Work Description: apilloud commented on issue #6251: [BEAM-5180] Relax back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251#issuecomment-414459555

FYI @angoenka @jkff

Issue Time Tracking
-------------------

Worklog Id: (was: 136277)
Time Spent: 0.5h (was: 20m)

> Broken FileResultCoder via parseSchema change
> ---------------------------------------------
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.6.0
> Reporter: Jozef Vilcek
> Assignee: Kenneth Knowles
> Priority: Blocker
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change
[ https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=136276=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136276 ] ASF GitHub Bot logged work on BEAM-5180: Author: ASF GitHub Bot Created on: 20/Aug/18 20:52 Start Date: 20/Aug/18 20:52 Worklog Time Spent: 10m Work Description: apilloud commented on issue #6251: [BEAM-5180] Relax back restriction on parsing file scheme URL: https://github.com/apache/beam/pull/6251#issuecomment-414459331 From your bug report, it looks like you need `hdfs:/some/path` to work? Issue Time Tracking --- Worklog Id: (was: 136276) Time Spent: 20m (was: 10m) > Broken FileResultCoder via parseSchema change > - > > Key: BEAM-5180 > URL: https://issues.apache.org/jira/browse/BEAM-5180 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.6.0 >Reporter: Jozef Vilcek >Assignee: Kenneth Knowles >Priority: Blocker > Time Spent: 20m > Remaining Estimate: 0h
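apilloud's question comes down to which strings a scheme pattern accepts. The sketch below compares two patterns: a strict one that requires `scheme://` and a relaxed one that accepts `scheme:/`. The exact regex in Beam's `FileSystems` is not quoted in this thread, so treat the following as illustrative assumptions: both patterns, the `SchemePatterns` class, and the fallback to a default `file` scheme when a spec does not match. The thread does mention that a default scheme exists.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only; these patterns are assumptions, not copied from Beam. */
public final class SchemePatterns {

  // Strict (assumed): requires "scheme://", so "hdfs:/some/path" does not match.
  static final Pattern STRICT = Pattern.compile("(?<scheme>[a-zA-Z][-a-zA-Z0-9+.]*)://.*");

  // Relaxed (assumed): requires only "scheme:/", which also covers "scheme://".
  static final Pattern RELAXED = Pattern.compile("(?<scheme>[a-zA-Z][-a-zA-Z0-9+.]*):/.*");

  private SchemePatterns() {}

  /** Returns the matched scheme, or defaultScheme when the whole spec does not match. */
  static String parseScheme(Pattern pattern, String spec, String defaultScheme) {
    Matcher m = pattern.matcher(spec);
    return m.matches() ? m.group("scheme") : defaultScheme;
  }

  public static void main(String[] args) {
    String[] specs = {"hdfs:/some/path", "hdfs://nn:8020/some/path", "/tmp/some/path"};
    for (String spec : specs) {
      System.out.printf("%-26s strict=%s relaxed=%s%n", spec,
          parseScheme(STRICT, spec, "file"), parseScheme(RELAXED, spec, "file"));
    }
  }
}
```

Under these assumptions, the strict pattern does not reject `hdfs:/some/path`. Instead, it silently falls back to the default `file` scheme, so a decoded path would be resolved against the wrong filesystem.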
[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change
[ https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=136130=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136130 ] ASF GitHub Bot logged work on BEAM-5180: Author: ASF GitHub Bot Created on: 20/Aug/18 12:44 Start Date: 20/Aug/18 12:44 Worklog Time Spent: 10m Work Description: JozoVilcek opened a new pull request #6251: [BEAM-5180] Relax back restriction on parsing file scheme URL: https://github.com/apache/beam/pull/6251 Issue Time Tracking --- Worklog Id: (was: 136130) Time Spent: 10m Remaining Estimate: 0h > Broken FileResultCoder via parseSchema change > - > > Key: BEAM-5180 > URL: https://issues.apache.org/jira/browse/BEAM-5180 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.6.0 >Reporter: Jozef Vilcek >Assignee: Kenneth Knowles >Priority: Blocker > Time Spent: 10m > Remaining Estimate: 0h