[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change

2018-08-25 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=138073&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-138073
 ]

ASF GitHub Bot logged work on BEAM-5180:


Author: ASF GitHub Bot
Created on: 25/Aug/18 06:15
Start Date: 25/Aug/18 06:15
Worklog Time Spent: 10m 
  Work Description: JozoVilcek commented on issue #6251: [BEAM-5180] Relax 
back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251#issuecomment-415942808
 
 
   @apilloud, yes, from personal experience, #5808 is not fun. For me, the most 
straightforward solution would be to get rid of the default `scheme` and never 
guess. It is not that difficult to always put a `scheme` on paths, even if it is 
`file`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 138073)
Time Spent: 2h 40m  (was: 2.5h)

> Broken FileResultCoder via parseSchema change
> -
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Ankur Goenka
>Priority: Blocker
> Fix For: 2.7.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Recently this commit
> [https://github.com/apache/beam/commit/3fff58c21f94415f3397e185377e36d3df662384]
> introduced stricter scheme parsing, which breaks the contract between
> _FileResultCoder_ and _FileSystems.matchNewResource()_.
> The coder takes a _ResourceId_, serializes it via its `_toString_` method, and
> then relies on the filesystem being able to parse it back again. Requiring a
> strict _scheme://_ breaks this at least for the Hadoop filesystem, which uses
> _URI_ for _ResourceId_ and produces _toString()_ output of the form
> `_hdfs:/some/path_`.
> I guess the _ResourceIdCoder_ suffers from the same problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change

2018-08-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=137917&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-137917
 ]

ASF GitHub Bot logged work on BEAM-5180:


Author: ASF GitHub Bot
Created on: 24/Aug/18 18:08
Start Date: 24/Aug/18 18:08
Worklog Time Spent: 10m 
  Work Description: apilloud closed pull request #6251: [BEAM-5180] Relax 
back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java
index be89c9ec099..7de41c1174a 100644
--- a/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java
+++ b/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java
@@ -69,7 +69,7 @@
 
   public static final String DEFAULT_SCHEME = "file";
   private static final Pattern FILE_SCHEME_PATTERN =
-      Pattern.compile("(?<scheme>[a-zA-Z][-a-zA-Z0-9+.]*)://.*");
+      Pattern.compile("(?<scheme>[a-zA-Z][-a-zA-Z0-9+.]*):/.*");
   private static final Pattern GLOB_PATTERN = Pattern.compile("[*?{}]");
 
   private static final AtomicReference<Map<String, FileSystem>> SCHEME_TO_FILESYSTEM =
diff --git a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/FileSystemsTest.java b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/FileSystemsTest.java
index 22f71f6e09f..0fbeb71325d 100644
--- a/sdks/java/core/src/test/java/org/apache/beam/sdk/io/FileSystemsTest.java
+++ b/sdks/java/core/src/test/java/org/apache/beam/sdk/io/FileSystemsTest.java
@@ -196,6 +196,7 @@ public void testValidMatchNewResourceForLocalFileSystem() {
   @Test(expected = IllegalArgumentException.class)
   public void testInvalidSchemaMatchNewResource() {
     assertEquals("file", FileSystems.matchNewResource("invalidschema://tmp/f1", false));
+    assertEquals("file", FileSystems.matchNewResource("c:/tmp/f1", false));
   }
 
   private List<ResourceId> toResourceIds(List<Path> paths, final boolean isDirectory) {
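To make the effect of the one-character regex change concrete, here is a minimal standalone sketch that compares the two patterns from the diff on a few path strings. The class name `SchemePatternDemo`, the helper `parseScheme`, and the group name `scheme` are assumptions made for illustration; the fallback to `file` only mimics the `DEFAULT_SCHEME` idea and is not Beam's actual lookup code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SchemePatternDemo {
  // Pattern before the PR: a scheme is only recognized in front of "://".
  static final Pattern STRICT = Pattern.compile("(?<scheme>[a-zA-Z][-a-zA-Z0-9+.]*)://.*");
  // Pattern after the PR: a scheme is recognized in front of ":/".
  static final Pattern RELAXED = Pattern.compile("(?<scheme>[a-zA-Z][-a-zA-Z0-9+.]*):/.*");

  /** Returns the matched scheme, or "file" (hypothetical default) when nothing matches. */
  static String parseScheme(Pattern pattern, String spec) {
    Matcher m = pattern.matcher(spec);
    return m.matches() ? m.group("scheme") : "file";
  }

  public static void main(String[] args) {
    String[] specs = {"hdfs://host/path", "hdfs:/some/path", "c:/tmp/f1", "/tmp/f1"};
    for (String spec : specs) {
      System.out.println(
          spec + " strict=" + parseScheme(STRICT, spec) + " relaxed=" + parseScheme(RELAXED, spec));
    }
  }
}
```

Under the strict pattern, `hdfs:/some/path` falls through to the default, which is the mismatch described in the issue. Under the relaxed pattern, `c:/tmp/f1` yields the scheme `c`, which is presumably why the new test line expects an `IllegalArgumentException` for that input.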


 




Issue Time Tracking
---

Worklog Id: (was: 137917)
Time Spent: 2.5h  (was: 2h 20m)

> Broken FileResultCoder via parseSchema change
> -
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Kenneth Knowles
>Priority: Blocker
>  Time Spent: 2.5h
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change

2018-08-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=137916&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-137916
 ]

ASF GitHub Bot logged work on BEAM-5180:


Author: ASF GitHub Bot
Created on: 24/Aug/18 18:07
Start Date: 24/Aug/18 18:07
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #6251: [BEAM-5180] Relax 
back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251#issuecomment-415838093
 
 
   This is effectively a partial rollback of 
https://github.com/apache/beam/pull/5808
   LGTM




Issue Time Tracking
---

Worklog Id: (was: 137916)
Time Spent: 2h 20m  (was: 2h 10m)

> Broken FileResultCoder via parseSchema change
> -
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Kenneth Knowles
>Priority: Blocker
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change

2018-08-24 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=137909&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-137909
 ]

ASF GitHub Bot logged work on BEAM-5180:


Author: ASF GitHub Bot
Created on: 24/Aug/18 17:40
Start Date: 24/Aug/18 17:40
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #6251: [BEAM-5180] Relax 
back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251#issuecomment-415830568
 
 
   From the java.net.URI docs:
   
   A hierarchical URI is subject to further parsing according to the syntax
   
   `[scheme:][//authority][path][?query][#fragment]`
   which enforces `//`.
   
   But to support HDFS and unblock ourselves, we should go with the rollback.




Issue Time Tracking
---

Worklog Id: (was: 137909)
Time Spent: 2h 10m  (was: 2h)

> Broken FileResultCoder via parseSchema change
> -
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Kenneth Knowles
>Priority: Blocker
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change

2018-08-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=137363&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-137363
 ]

ASF GitHub Bot logged work on BEAM-5180:


Author: ASF GitHub Bot
Created on: 23/Aug/18 12:32
Start Date: 23/Aug/18 12:32
Worklog Time Spent: 10m 
  Work Description: JozoVilcek commented on issue #6251: [BEAM-5180] Relax 
back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251#issuecomment-415397768
 
 
   I am not sure how the failed `GrpcDataServiceTest` relates to this PR.




Issue Time Tracking
---

Worklog Id: (was: 137363)
Time Spent: 2h  (was: 1h 50m)

> Broken FileResultCoder via parseSchema change
> -
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Kenneth Knowles
>Priority: Blocker
>  Time Spent: 2h
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change

2018-08-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=137352&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-137352
 ]

ASF GitHub Bot logged work on BEAM-5180:


Author: ASF GitHub Bot
Created on: 23/Aug/18 12:15
Start Date: 23/Aug/18 12:15
Worklog Time Spent: 10m 
  Work Description: JozoVilcek opened a new pull request #6251: [BEAM-5180] 
Relax back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251
 
 
   
   
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 137352)
Time Spent: 1h 50m  (was: 1h 40m)

> Broken FileResultCoder via parseSchema change
> -
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Kenneth Knowles
>Priority: Blocker
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change

2018-08-23 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=137351&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-137351
 ]

ASF GitHub Bot logged work on BEAM-5180:


Author: ASF GitHub Bot
Created on: 23/Aug/18 12:14
Start Date: 23/Aug/18 12:14
Worklog Time Spent: 10m 
  Work Description: JozoVilcek closed pull request #6251: [BEAM-5180] Relax 
back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251
 
 
   



 




Issue Time Tracking
---

Worklog Id: (was: 137351)
Time Spent: 1h 40m  (was: 1.5h)

> Broken FileResultCoder via parseSchema change
> -
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Kenneth Knowles
>Priority: Blocker
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change

2018-08-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=136516&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136516
 ]

ASF GitHub Bot logged work on BEAM-5180:


Author: ASF GitHub Bot
Created on: 21/Aug/18 13:47
Start Date: 21/Aug/18 13:47
Worklog Time Spent: 10m 
  Work Description: JozoVilcek edited a comment on issue #6251: [BEAM-5180] 
Relax back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251#issuecomment-414679509
 
 
   Actually, after more thought: is `hdfs:/` actually an invalid URI or 
ResourceId? Given that the authority component is optional, the extra `//` can 
be dropped. `java.net.URI` parses that just fine.
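The claim about `java.net.URI` can be checked with a short standalone sketch. The class name `UriAuthorityDemo`, the helper `describe`, and the host `namenode:8020` are made up for illustration; only `URI.create`, `getScheme`, `getAuthority`, and `getPath` come from the JDK.

```java
import java.net.URI;

public class UriAuthorityDemo {
  /** Parses the spec with java.net.URI and reports the components relevant here. */
  static String describe(String spec) {
    URI uri = URI.create(spec);
    return "scheme=" + uri.getScheme()
        + " authority=" + uri.getAuthority()
        + " path=" + uri.getPath();
  }

  public static void main(String[] args) {
    // Single slash: no authority component at all.
    System.out.println(describe("hdfs:/some/path"));
    // Triple slash: an empty authority, which java.net.URI reports as null.
    System.out.println(describe("hdfs:///some/path"));
    // Explicit authority (hypothetical host and port).
    System.out.println(describe("hdfs://namenode:8020/some/path"));
  }
}
```

Both `hdfs:/some/path` and `hdfs:///some/path` parse to the same scheme and path with a null authority, so the two spellings carry the same information as far as `java.net.URI` is concerned.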




Issue Time Tracking
---

Worklog Id: (was: 136516)
Time Spent: 1.5h  (was: 1h 20m)

> Broken FileResultCoder via parseSchema change
> -
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Kenneth Knowles
>Priority: Blocker
>  Time Spent: 1.5h
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change

2018-08-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=136515&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136515
 ]

ASF GitHub Bot logged work on BEAM-5180:


Author: ASF GitHub Bot
Created on: 21/Aug/18 13:46
Start Date: 21/Aug/18 13:46
Worklog Time Spent: 10m 
  Work Description: JozoVilcek commented on issue #6251: [BEAM-5180] Relax 
back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251#issuecomment-414679509
 
 
   Actually, after more thought: is `hdfs:/` actually an invalid URI or 
ResourceId? Given that the authority component is optional, the extra `//` can 
be dropped. `java.net.URI` parses that just fine.




Issue Time Tracking
---

Worklog Id: (was: 136515)
Time Spent: 1h 20m  (was: 1h 10m)

> Broken FileResultCoder via parseSchema change
> -
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Kenneth Knowles
>Priority: Blocker
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change

2018-08-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=136513&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136513
 ]

ASF GitHub Bot logged work on BEAM-5180:


Author: ASF GitHub Bot
Created on: 21/Aug/18 13:42
Start Date: 21/Aug/18 13:42
Worklog Time Spent: 10m 
  Work Description: JozoVilcek opened a new pull request #6251: [BEAM-5180] 
Relax back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251
 
 
   
   
   
   
   




Issue Time Tracking
---

Worklog Id: (was: 136513)
Time Spent: 1h 10m  (was: 1h)

> Broken FileResultCoder via parseSchema change
> -
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Kenneth Knowles
>Priority: Blocker
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change

2018-08-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=136434&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136434
 ]

ASF GitHub Bot logged work on BEAM-5180:


Author: ASF GitHub Bot
Created on: 21/Aug/18 09:26
Start Date: 21/Aug/18 09:26
Worklog Time Spent: 10m 
  Work Description: JozoVilcek commented on issue #6251: [BEAM-5180] Relax 
back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251#issuecomment-414611378
 
 
   True, the code uses `hdfs:/some/path` when I enter paths without an 
authority, such as `hdfs:///some/path`. 
   I looked around, and the Hadoop filesystem does this. I commented about it here:
   
   
https://issues.apache.org/jira/browse/BEAM-2277?focusedCommentId=16587202&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16587202
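One plausible way for `hdfs:///some/path` to turn into `hdfs:/some/path` is rebuilding a URI from its parsed components. The sketch below shows that effect with plain `java.net.URI`; it assumes, without confirmation from this thread, that the Hadoop path handling does something similar. The class name `UriRebuildDemo` and the helper `rebuild` are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriRebuildDemo {
  /** Parses the spec, then rebuilds a URI from scheme, authority and path only. */
  static String rebuild(String spec) {
    try {
      URI parsed = new URI(spec);
      // With a null authority, this constructor emits no "//" after the scheme.
      URI rebuilt = new URI(parsed.getScheme(), parsed.getAuthority(), parsed.getPath(), null, null);
      return rebuilt.toString();
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("Not a valid URI: " + spec, e);
    }
  }

  public static void main(String[] args) {
    System.out.println(rebuild("hdfs:///some/path"));
    System.out.println(rebuild("hdfs://namenode:8020/some/path"));
  }
}
```

The empty authority in `hdfs:///some/path` is parsed as null, so the rebuilt string has a single slash. A URI with an explicit authority (the host here is made up) survives the round trip unchanged.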




Issue Time Tracking
---

Worklog Id: (was: 136434)
Time Spent: 50m  (was: 40m)

> Broken FileResultCoder via parseSchema change
> -
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Kenneth Knowles
>Priority: Blocker
>  Time Spent: 50m
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change

2018-08-21 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=136435&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136435
 ]

ASF GitHub Bot logged work on BEAM-5180:


Author: ASF GitHub Bot
Created on: 21/Aug/18 09:26
Start Date: 21/Aug/18 09:26
Worklog Time Spent: 10m 
  Work Description: JozoVilcek closed pull request #6251: [BEAM-5180] Relax 
back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251
 
 
   



 




Issue Time Tracking
---

Worklog Id: (was: 136435)
Time Spent: 1h  (was: 50m)

> Broken FileResultCoder via parseSchema change
> -
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Kenneth Knowles
>Priority: Blocker
>  Time Spent: 1h
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change

2018-08-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=136327&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136327
 ]

ASF GitHub Bot logged work on BEAM-5180:


Author: ASF GitHub Bot
Created on: 20/Aug/18 22:48
Start Date: 20/Aug/18 22:48
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #6251: [BEAM-5180] Relax 
back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251#issuecomment-414488403
 
 
   HDFS file names are expected to start with "hdfs://", so I would fix the file 
name instead of the regex.
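For comparison, "fixing the file name" could look roughly like the sketch below, which rewrites a single-slash spec into the triple-slash form before it reaches a strict `scheme://` check. The class `SchemeNormalizer` and the method `normalize` are hypothetical helpers and not part of Beam or Hadoop.

```java
import java.util.regex.Pattern;

public class SchemeNormalizer {
  // A scheme followed by ":/" where the next character is not another "/".
  private static final Pattern SINGLE_SLASH =
      Pattern.compile("^([a-zA-Z][-a-zA-Z0-9+.]*):/(?!/)");

  /** Rewrites "scheme:/path" to "scheme:///path"; other inputs are returned unchanged. */
  static String normalize(String spec) {
    return SINGLE_SLASH.matcher(spec).replaceFirst("$1:///");
  }

  public static void main(String[] args) {
    System.out.println(normalize("hdfs:/some/path"));
    System.out.println(normalize("hdfs://namenode:8020/some/path"));
    System.out.println(normalize("/tmp/f1"));
    // Caveat: a Windows-style drive letter is also treated as a scheme.
    System.out.println(normalize("c:/tmp/f1"));
  }
}
```

The drive-letter case shows the cost of this approach: any string-level rewrite still has to guess whether `c:` is a scheme, so it would need the same care as the regex itself.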




Issue Time Tracking
---

Worklog Id: (was: 136327)
Time Spent: 40m  (was: 0.5h)

> Broken FileResultCoder via parseSchema change
> -
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.6.0
>Reporter: Jozef Vilcek
>Assignee: Kenneth Knowles
>Priority: Blocker
>  Time Spent: 40m
>  Remaining Estimate: 0h





[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change

2018-08-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=136277&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136277
 ]

ASF GitHub Bot logged work on BEAM-5180:


Author: ASF GitHub Bot
Created on: 20/Aug/18 20:53
Start Date: 20/Aug/18 20:53
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #6251: [BEAM-5180] Relax 
back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251#issuecomment-414459555
 
 
   FYI @angoenka @jkff




Issue Time Tracking
---

Worklog Id: (was: 136277)
Time Spent: 0.5h  (was: 20m)

> Broken FileResultCoder via parseSchema change
> -
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.6.0
> Reporter: Jozef Vilcek
> Assignee: Kenneth Knowles
> Priority: Blocker
> Time Spent: 0.5h
> Remaining Estimate: 0h





[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change

2018-08-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=136276&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136276
 ]

ASF GitHub Bot logged work on BEAM-5180:


Author: ASF GitHub Bot
Created on: 20/Aug/18 20:52
Start Date: 20/Aug/18 20:52
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #6251: [BEAM-5180] Relax 
back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251#issuecomment-414459331
 
 
   From your bug report, it looks like you need `hdfs:/some/path` to work?




Issue Time Tracking
---

Worklog Id: (was: 136276)
Time Spent: 20m  (was: 10m)

> Broken FileResultCoder via parseSchema change
> -
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.6.0
> Reporter: Jozef Vilcek
> Assignee: Kenneth Knowles
> Priority: Blocker
> Time Spent: 20m
> Remaining Estimate: 0h





[jira] [Work logged] (BEAM-5180) Broken FileResultCoder via parseSchema change

2018-08-20 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5180?focusedWorklogId=136130&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-136130
 ]

ASF GitHub Bot logged work on BEAM-5180:


Author: ASF GitHub Bot
Created on: 20/Aug/18 12:44
Start Date: 20/Aug/18 12:44
Worklog Time Spent: 10m 
  Work Description: JozoVilcek opened a new pull request #6251: [BEAM-5180] 
Relax back restriction on parsing file scheme
URL: https://github.com/apache/beam/pull/6251




Issue Time Tracking
---

Worklog Id: (was: 136130)
Time Spent: 10m
Remaining Estimate: 0h

> Broken FileResultCoder via parseSchema change
> -
>
> Key: BEAM-5180
> URL: https://issues.apache.org/jira/browse/BEAM-5180
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.6.0
> Reporter: Jozef Vilcek
> Assignee: Kenneth Knowles
> Priority: Blocker
> Time Spent: 10m
> Remaining Estimate: 0h


