[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"
[ https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=387748&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387748 ]

ASF GitHub Bot logged work on BEAM-8399:
Author: ASF GitHub Bot
Created on: 14/Feb/20 23:50
Start Date: 14/Feb/20 23:50
Worklog Time Spent: 10m
Work Description: udim commented on pull request #10223: [BEAM-8399] Add --hdfs_full_urls option
URL: https://github.com/apache/beam/pull/10223

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 387748)
Time Spent: 3.5h (was: 3h 20m)

> Python HDFS implementation should support filenames of the format
> "hdfs://namenodehost/parent/child"
>
> Key: BEAM-8399
> URL: https://issues.apache.org/jira/browse/BEAM-8399
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Chamikara Madhusanka Jayalath
> Assignee: Udi Meiri
> Priority: Major
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> "hdfs://namenodehost/parent/child" and "/parent/child" seem to be the
> correct filename formats for HDFS based on [1], but we currently support
> the format "hdfs://parent/child".
> To not break existing users, we have to either (1) somehow support both
> versions by default (based on [2], it seems HDFS does not allow colons in
> file paths, so this might be possible) or (2) make
> "hdfs://namenodehost/parent/child" optional for now and make it the default
> after a few versions.
> We should also make sure that the Beam Java and Python HDFS file-system
> implementations are consistent in this regard.
>
> [1] https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html
> [2] https://issues.apache.org/jira/browse/HDFS-13
>
> cc: [~udim]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
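The issue contrasts two readings of the same `hdfs://` string. A minimal sketch of both, assuming a hypothetical helper `parse_hdfs_url(url, full_urls)` (this is not Beam's `_parse_url`; the `full_urls` argument only mirrors the `--hdfs_full_urls` switch named in the PR title). In legacy mode everything after `hdfs:/` is the path; in full mode the first component is taken as the namenode host.

```python
import re

# Legacy layout: everything after "hdfs:/" is the path, so
# "hdfs://parent/child" -> "/parent/child".
_LEGACY_RE = re.compile(r'^hdfs:/(/.*)$')
# Full layout: "hdfs://<server>/<path>"; the path part may be absent.
_FULL_RE = re.compile(r'^hdfs://([^/]+)(/.*)?$')


def parse_hdfs_url(url, full_urls=False):
    """Hypothetical helper: split an hdfs:// URL into (server, path).

    server is None in legacy mode, where no namenode host is encoded.
    """
    if full_urls:
        m = _FULL_RE.match(url)
        if m is None:
            raise ValueError('Could not parse %r as hdfs://server/path' % url)
        # A bare "hdfs://host" is treated as the root directory.
        return m.group(1), m.group(2) or '/'
    m = _LEGACY_RE.match(url)
    if m is None:
        raise ValueError('Could not parse %r as hdfs://path' % url)
    return None, m.group(1)
```

For example, `parse_hdfs_url('hdfs://parent/child')` returns `(None, '/parent/child')`, and `parse_hdfs_url('hdfs://namenodehost/parent/child', full_urls=True)` returns `('namenodehost', '/parent/child')`. The same full-form string parsed in legacy mode yields `(None, '/namenodehost/parent/child')`: the host silently becomes a directory, which is why flipping the default without an opt-in period could break existing pipelines.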
[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"
[ https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=387737&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387737 ]

ASF GitHub Bot logged work on BEAM-8399:
Author: ASF GitHub Bot
Created on: 14/Feb/20 22:58
Start Date: 14/Feb/20 22:58
Worklog Time Spent: 10m
Work Description: udim commented on issue #10223: [BEAM-8399] Add --hdfs_full_urls option
URL: https://github.com/apache/beam/pull/10223#issuecomment-586511225

Run Python PreCommit

Issue Time Tracking
-------------------
Worklog Id: (was: 387737)
Time Spent: 3h 20m (was: 3h 10m)
[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"
[ https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=387017&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387017 ]

ASF GitHub Bot logged work on BEAM-8399:
Author: ASF GitHub Bot
Created on: 14/Feb/20 01:17
Start Date: 14/Feb/20 01:17
Worklog Time Spent: 10m
Work Description: chamikaramj commented on issue #10223: [BEAM-8399] Add --hdfs_full_urls option
URL: https://github.com/apache/beam/pull/10223#issuecomment-586051119

Run Python PreCommit

Issue Time Tracking
-------------------
Worklog Id: (was: 387017)
Time Spent: 3h 10m (was: 3h)
[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"
[ https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=386264&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386264 ]

ASF GitHub Bot logged work on BEAM-8399:
Author: ASF GitHub Bot
Created on: 12/Feb/20 22:03
Start Date: 12/Feb/20 22:03
Worklog Time Spent: 10m
Work Description: udim commented on issue #10223: [BEAM-8399] Add --hdfs_full_urls option
URL: https://github.com/apache/beam/pull/10223#issuecomment-585440374

Run Python PreCommit

Issue Time Tracking
-------------------
Worklog Id: (was: 386264)
Time Spent: 3h (was: 2h 50m)
[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"
[ https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=385421&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385421 ]

ASF GitHub Bot logged work on BEAM-8399:
Author: ASF GitHub Bot
Created on: 11/Feb/20 19:58
Start Date: 11/Feb/20 19:58
Worklog Time Spent: 10m
Work Description: udim commented on issue #10223: [BEAM-8399] Add --hdfs_full_urls option
URL: https://github.com/apache/beam/pull/10223#issuecomment-584822664

Run Python PreCommit

Issue Time Tracking
-------------------
Worklog Id: (was: 385421)
Time Spent: 2h 50m (was: 2h 40m)
[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"
[ https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=385377&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385377 ]

ASF GitHub Bot logged work on BEAM-8399:
Author: ASF GitHub Bot
Created on: 11/Feb/20 19:15
Start Date: 11/Feb/20 19:15
Worklog Time Spent: 10m
Work Description: udim commented on issue #10223: [BEAM-8399] Add --hdfs_full_urls option
URL: https://github.com/apache/beam/pull/10223#issuecomment-584803384

Run PythonFormatter PreCommit

Issue Time Tracking
-------------------
Worklog Id: (was: 385377)
Time Spent: 2.5h (was: 2h 20m)
[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"
[ https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=385378&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385378 ]

ASF GitHub Bot logged work on BEAM-8399:
Author: ASF GitHub Bot
Created on: 11/Feb/20 19:15
Start Date: 11/Feb/20 19:15
Worklog Time Spent: 10m
Work Description: udim commented on issue #10223: [BEAM-8399] Add --hdfs_full_urls option
URL: https://github.com/apache/beam/pull/10223#issuecomment-584803433

Run PythonFormatter PreCommit

Issue Time Tracking
-------------------
Worklog Id: (was: 385378)
Time Spent: 2h 40m (was: 2.5h)
[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"
[ https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=385365&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385365 ]

ASF GitHub Bot logged work on BEAM-8399:
Author: ASF GitHub Bot
Created on: 11/Feb/20 18:52
Start Date: 11/Feb/20 18:52
Worklog Time Spent: 10m
Work Description: udim commented on issue #10223: [BEAM-8399] Add --hdfs_full_urls option
URL: https://github.com/apache/beam/pull/10223#issuecomment-584792946

Run PythonLint PreCommit

Issue Time Tracking
-------------------
Worklog Id: (was: 385365)
Time Spent: 2h 20m (was: 2h 10m)
[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"
[ https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=384610&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-384610 ]

ASF GitHub Bot logged work on BEAM-8399:
Author: ASF GitHub Bot
Created on: 10/Feb/20 18:04
Start Date: 10/Feb/20 18:04
Worklog Time Spent: 10m
Work Description: udim commented on issue #10223: [BEAM-8399] Add --hdfs_full_urls option
URL: https://github.com/apache/beam/pull/10223#issuecomment-584253928

Run Portable_Python PreCommit

Issue Time Tracking
-------------------
Worklog Id: (was: 384610)
Time Spent: 2h 10m (was: 2h)
[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"
[ https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=384607&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-384607 ]

ASF GitHub Bot logged work on BEAM-8399:
Author: ASF GitHub Bot
Created on: 10/Feb/20 17:54
Start Date: 10/Feb/20 17:54
Worklog Time Spent: 10m
Work Description: zhitaoli commented on pull request #10223: [BEAM-8399] Add --hdfs_full_urls option
URL: https://github.com/apache/beam/pull/10223#discussion_r377221348

## File path: sdks/python/apache_beam/io/hadoopfilesystem.py
## @@ -163,19 +181,25 @@ def join(self, base_url, *paths):
     Returns:
       Full url after combining all the passed components.
     """
-    basepath = self._parse_url(base_url)
-    return _HDFS_PREFIX + self._join(basepath, *paths)
+    server, basepath = self._parse_url(base_url)
+    # TODO full_urls check and test
+    return _HDFS_PREFIX + self._join(server, basepath, *paths)
 
   def _join(self, basepath, *paths):
     return posixpath.join(basepath, *paths)
 
   def split(self, url):
-    rel_path = self._parse_url(url)
+    server, rel_path = self._parse_url(url)
+    if server is None:
+      server = ''
+    else:
+      server = '/' + server

Review comment:
Never mind. I think if we consider hdfs as a constraint this is fine.

Issue Time Tracking
-------------------
Worklog Id: (was: 384607)
Time Spent: 2h (was: 1h 50m)
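The `split` hunk prepends `'/'` to a non-`None` server. A minimal sketch of why that composes cleanly, assuming the module prefix is the one-slash string `'hdfs:/'` (an assumption; its value is not shown in the hunk) and that the parsed path always starts with `'/'`. The functions `join_url` and `split_url` are hypothetical stand-ins that take an already-parsed `(server, path)` pair rather than a URL.

```python
import posixpath

# Assumed prefix with a single slash: the parsed path already starts with
# '/', so 'hdfs:/' + '/parent' yields 'hdfs://parent'.
_HDFS_PREFIX = 'hdfs:/'


def _server_part(server):
    """Return '' for legacy URLs (no server) or '/<server>' otherwise."""
    return '' if server is None else '/' + server


def join_url(server, basepath, *paths):
    """Hypothetical: rebuild a URL from a parsed (server, path) plus components."""
    return _HDFS_PREFIX + _server_part(server) + posixpath.join(basepath, *paths)


def split_url(server, rel_path):
    """Hypothetical: split a parsed URL into (parent URL, last component)."""
    prefix, tail = posixpath.split(rel_path)
    return _HDFS_PREFIX + _server_part(server) + prefix, tail
```

With this layout, `join_url(None, '/parent', 'child')` gives `'hdfs://parent/child'` and `join_url('namenodehost', '/parent', 'child')` gives `'hdfs://namenodehost/parent/child'`, so legacy and full URLs share one code path and differ only in the server fragment.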
[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"
[ https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=383877&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-383877 ]

ASF GitHub Bot logged work on BEAM-8399:
Author: ASF GitHub Bot
Created on: 07/Feb/20 23:35
Start Date: 07/Feb/20 23:35
Worklog Time Spent: 10m
Work Description: udim commented on issue #10223: [BEAM-8399] Add --hdfs_full_urls option
URL: https://github.com/apache/beam/pull/10223#issuecomment-583660975

R: @zhitaoli, @chamikaramj

Issue Time Tracking
-------------------
Worklog Id: (was: 383877)
Time Spent: 1h 50m (was: 1h 40m)
[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"
[ https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=383310&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-383310 ]

ASF GitHub Bot logged work on BEAM-8399:
Author: ASF GitHub Bot
Created on: 07/Feb/20 01:37
Start Date: 07/Feb/20 01:37
Worklog Time Spent: 10m
Work Description: udim commented on pull request #10223: [BEAM-8399] Add --hdfs_full_urls option (wip)
URL: https://github.com/apache/beam/pull/10223#discussion_r376174956

## File path: sdks/python/apache_beam/io/hadoopfilesystem_test.py
## @@ -323,7 +375,7 @@ def test_create_success(self):
     url = self.fs.join(self.tmpdir, 'new_file')
     handle = self.fs.create(url)
     self.assertIsNotNone(handle)
-    url = self.fs._parse_url(url)
+    _, url = self.fs._parse_url(url)

Review comment:
There will be a separate `test_parse_url` to test these return values.

Issue Time Tracking
-------------------
Worklog Id: (was: 383310)
Time Spent: 1h 40m (was: 1.5h)
[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"
[ https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=383301&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-383301 ]

ASF GitHub Bot logged work on BEAM-8399:
Author: ASF GitHub Bot
Created on: 07/Feb/20 00:56
Start Date: 07/Feb/20 00:56
Worklog Time Spent: 10m
Work Description: udim commented on pull request #10223: [BEAM-8399] Add --hdfs_full_urls option (wip)
URL: https://github.com/apache/beam/pull/10223#discussion_r375595989

## File path: sdks/python/apache_beam/io/hadoopfilesystem.py
## @@ -115,42 +116,59 @@ def __init__(self, pipeline_options):
       hdfs_host = hdfs_options.hdfs_host
       hdfs_port = hdfs_options.hdfs_port
       hdfs_user = hdfs_options.hdfs_user
+      self.full_urls = hdfs_options.hdfs_full_urls

Review comment:
done (still working on a commit)

Issue Time Tracking
-------------------
Worklog Id: (was: 383301)
Time Spent: 1.5h (was: 1h 20m)
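The hunk reads the new setting from `hdfs_options.hdfs_full_urls`, so the flag has to be declared alongside `--hdfs_host`, `--hdfs_port` and `--hdfs_user`. A sketch of such a declaration using plain `argparse` (Beam option classes register flags through an argparse-style parser, but the parser below, its defaults and its help text are illustrative assumptions, not Beam's code). An off-by-default boolean matches option (2) in the issue: opt in now, change the default later.

```python
import argparse


def build_parser():
    """Hypothetical stand-in for an HDFS option group; not Beam's actual code."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--hdfs_host', default=None)
    parser.add_argument('--hdfs_port', type=int, default=None)
    parser.add_argument('--hdfs_user', default=None)
    # Assumed: a boolean switch that is off by default, so users of the
    # legacy "hdfs://parent/child" form see no change unless they opt in.
    parser.add_argument(
        '--hdfs_full_urls',
        action='store_true',
        default=False,
        help='Parse hdfs:// URLs as hdfs://server/path instead of hdfs://path.')
    return parser
```

With this sketch, `build_parser().parse_args([]).hdfs_full_urls` is `False`, and passing `['--hdfs_full_urls']` sets it to `True`.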
[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"
[ https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=383300&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-383300 ]

ASF GitHub Bot logged work on BEAM-8399:
Author: ASF GitHub Bot
Created on: 07/Feb/20 00:56
Start Date: 07/Feb/20 00:56
Worklog Time Spent: 10m
Work Description: udim commented on pull request #10223: [BEAM-8399] Add --hdfs_full_urls option (wip)
URL: https://github.com/apache/beam/pull/10223#discussion_r375601513

## File path: sdks/python/apache_beam/io/hadoopfilesystem.py
## @@ -163,19 +181,25 @@ def join(self, base_url, *paths):
     Returns:
       Full url after combining all the passed components.
     """
-    basepath = self._parse_url(base_url)
-    return _HDFS_PREFIX + self._join(basepath, *paths)
+    server, basepath = self._parse_url(base_url)
+    # TODO full_urls check and test
+    return _HDFS_PREFIX + self._join(server, basepath, *paths)
 
   def _join(self, basepath, *paths):
     return posixpath.join(basepath, *paths)
 
   def split(self, url):
-    rel_path = self._parse_url(url)
+    server, rel_path = self._parse_url(url)
+    if server is None:
+      server = ''
+    else:
+      server = '/' + server

Review comment:
`hdfs://` URLs always use `/` as separators, hence the use of posixpath.join
instead of os.path.join in this module. Can you give me an example URL with
`\` that you use on Windows, and the name of the tool or client that supports it?

Issue Time Tracking
-------------------
Worklog Id: (was: 383300)
Time Spent: 1.5h (was: 1h 20m)
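The reply above turns on the difference between `posixpath` and the platform-dependent `os.path`. Since the standard library ships both flavours, the separator difference can be shown on any OS (the sample paths are made up for illustration):

```python
import ntpath
import posixpath

# posixpath always uses '/', whatever OS the SDK happens to run on.
print(posixpath.join('/namenodehost/parent', 'child'))  # /namenodehost/parent/child
print(posixpath.split('/parent/child'))                 # ('/parent', 'child')

# ntpath is what os.path resolves to on Windows; it joins with a backslash.
print(ntpath.join('/parent', 'child'))                  # /parent\child
```

Calling `os.path.join` in the filesystem module would therefore pick `ntpath` on Windows and put a backslash inside an `hdfs://` URL, while `posixpath` keeps the separator fixed regardless of where the pipeline is launched.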
[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"
[ https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=383302&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-383302 ]

ASF GitHub Bot logged work on BEAM-8399:
Author: ASF GitHub Bot
Created on: 07/Feb/20 00:56
Start Date: 07/Feb/20 00:56
Worklog Time Spent: 10m
Work Description: udim commented on pull request #10223: [BEAM-8399] Add --hdfs_full_urls option (wip)
URL: https://github.com/apache/beam/pull/10223#discussion_r375607987

##
File path: sdks/python/apache_beam/io/hadoopfilesystem_test.py
##
@@ -579,6 +630,19 @@ def test_dict_options_missing(self):
         }
     )

+  def test_dict_options_full_urls(self):
+    pipeline_options = {
+        'hdfs_host': '',
+        'hdfs_port': 0,
+        'hdfs_user': '',
+        'hdfs_full_urls': 'invalid',
+    }
+
+    with
+    self.fs = hdfs.HadoopFileSystem(pipeline_options=pipeline_options)
+    self.assertFalse(self.fs.full_urls)

Review comment:
This was incomplete code.

Issue Time Tracking
-------------------

Worklog Id: (was: 383302)
Time Spent: 1.5h (was: 1h 20m)
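The dangling `with` in the test above suggests an assertion that a non-boolean `hdfs_full_urls` value is rejected. The sketch below is one possible completion under that assumption only: `HypotheticalHdfsFileSystem` is a stand-in written for illustration, not Beam's `HadoopFileSystem`, and raising `ValueError` for non-bool values is an assumed design choice rather than confirmed behavior.

```python
import unittest


class HypotheticalHdfsFileSystem(object):
  """Stand-in that reads options from a dict (hypothetical, not Beam's class)."""

  def __init__(self, pipeline_options):
    full_urls = pipeline_options.get('hdfs_full_urls', False)
    # Assumption: anything other than a real bool is rejected outright.
    if not isinstance(full_urls, bool):
      raise ValueError(
          'hdfs_full_urls should be bool, got: %r' % (full_urls,))
    # Stored privately; callers are not expected to mutate it.
    self._full_urls = full_urls


class FullUrlsOptionTest(unittest.TestCase):

  def test_dict_options_full_urls_invalid(self):
    pipeline_options = {
        'hdfs_host': '',
        'hdfs_port': 0,
        'hdfs_user': '',
        'hdfs_full_urls': 'invalid',
    }
    # The constructor call sits inside the `with` block that expects failure.
    with self.assertRaisesRegex(ValueError, 'hdfs_full_urls'):
      HypotheticalHdfsFileSystem(pipeline_options=pipeline_options)

  def test_dict_options_full_urls_default(self):
    # With the option absent, full URLs stay disabled.
    fs = HypotheticalHdfsFileSystem(pipeline_options={})
    self.assertFalse(fs._full_urls)
```

Written this way, the test fails if no `ValueError` is raised, so the separate `assertFalse` line from the WIP version would only be needed for a valid or missing value, as in the second test.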
[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"
[ https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=381962&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-381962 ]

ASF GitHub Bot logged work on BEAM-8399:
Author: ASF GitHub Bot
Created on: 05/Feb/20 00:16
Start Date: 05/Feb/20 00:16
Worklog Time Spent: 10m
Work Description: zhitaoli commented on issue #10223: [BEAM-8399] Add --hdfs_full_urls option (wip)
URL: https://github.com/apache/beam/pull/10223#issuecomment-582180187

Ping? Is it possible to get this properly merged?

Issue Time Tracking
-------------------

Worklog Id: (was: 381962)
Time Spent: 1h 20m (was: 1h 10m)
[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"
[ https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=380600&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-380600 ]

ASF GitHub Bot logged work on BEAM-8399:
Author: ASF GitHub Bot
Created on: 02/Feb/20 22:33
Start Date: 02/Feb/20 22:33
Worklog Time Spent: 10m
Work Description: stale[bot] commented on issue #10223: [BEAM-8399] Add --hdfs_full_urls option (wip)
URL: https://github.com/apache/beam/pull/10223#issuecomment-581184946

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the d...@beam.apache.org list. Thank you for your contributions.

Issue Time Tracking
-------------------

Worklog Id: (was: 380600)
Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"
[ https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=353807&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-353807 ]

ASF GitHub Bot logged work on BEAM-8399:
Author: ASF GitHub Bot
Created on: 04/Dec/19 21:34
Start Date: 04/Dec/19 21:34
Worklog Time Spent: 10m
Work Description: zhitaoli commented on pull request #10223: [BEAM-8399] Add --hdfs_full_urls option (wip)
URL: https://github.com/apache/beam/pull/10223#discussion_r353994642

##
File path: sdks/python/apache_beam/io/hadoopfilesystem.py
##
@@ -163,19 +181,25 @@ def join(self, base_url, *paths):
       Returns:
         Full url after combining all the passed components.
       """
-    basepath = self._parse_url(base_url)
-    return _HDFS_PREFIX + self._join(basepath, *paths)
+    server, basepath = self._parse_url(base_url)
+    # TODO full_urls check and test
+    return _HDFS_PREFIX + self._join(server, basepath, *paths)

   def _join(self, basepath, *paths):
     return posixpath.join(basepath, *paths)

   def split(self, url):
-    rel_path = self._parse_url(url)
+    server, rel_path = self._parse_url(url)
+    if server is None:
+      server = ''
+    else:
+      server = '/' + server

Review comment:
Is this only expected to ever work in posix? If not, should we use `os.path.join` (i.e., for Windows)?

Issue Time Tracking
-------------------

Worklog Id: (was: 353807)
Time Spent: 1h (was: 50m)
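The `split` change under discussion prepends `/` to `server` before building the head URL. A minimal sketch of why that prefix is needed, assuming (hypothetically) a module prefix of `hdfs:/` and a parsed `rel_path` that always starts with `/`; `split_parsed` is an illustrative helper, not Beam's API:

```python
import posixpath

# Assumed prefix: 'hdfs:/' + '/parent' yields 'hdfs://parent'.
_HDFS_PREFIX = 'hdfs:/'


def split_parsed(server, rel_path):
  """Splits an already-parsed HDFS URL into (head_url, tail).

  server: None for legacy URLs (hdfs://parent/child), or a host name for
    full URLs (hdfs://namenodehost/parent/child).
  rel_path: the path part, always starting with '/'.
  """
  # With a host, 'hdfs:/' + '/namenodehost' + '/parent' forms the full URL.
  server = '' if server is None else '/' + server
  # HDFS paths are '/'-separated on every OS, so posixpath is used.
  head, tail = posixpath.split(rel_path)
  return _HDFS_PREFIX + server + head, tail


print(split_parsed(None, '/parent/child'))
# ('hdfs://parent', 'child')
print(split_parsed('namenodehost', '/parent/child'))
# ('hdfs://namenodehost/parent', 'child')
```

Under these assumptions the same helper serves both URL formats; only the `server` argument differs.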
[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"
[ https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=353805&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-353805 ]

ASF GitHub Bot logged work on BEAM-8399:
Author: ASF GitHub Bot
Created on: 04/Dec/19 21:33
Start Date: 04/Dec/19 21:33
Worklog Time Spent: 10m
Work Description: zhitaoli commented on pull request #10223: [BEAM-8399] Add --hdfs_full_urls option (wip)
URL: https://github.com/apache/beam/pull/10223#discussion_r353994642

##
File path: sdks/python/apache_beam/io/hadoopfilesystem.py
##
@@ -163,19 +181,25 @@ def join(self, base_url, *paths):
       Returns:
         Full url after combining all the passed components.
       """
-    basepath = self._parse_url(base_url)
-    return _HDFS_PREFIX + self._join(basepath, *paths)
+    server, basepath = self._parse_url(base_url)
+    # TODO full_urls check and test
+    return _HDFS_PREFIX + self._join(server, basepath, *paths)

   def _join(self, basepath, *paths):
     return posixpath.join(basepath, *paths)

   def split(self, url):
-    rel_path = self._parse_url(url)
+    server, rel_path = self._parse_url(url)
+    if server is None:
+      server = ''
+    else:
+      server = '/' + server

Review comment:
Is this only expected to ever work in posix? If not, should we use `os.path.join`?

Issue Time Tracking
-------------------

Worklog Id: (was: 353805)
Time Spent: 50m (was: 40m)
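The same diff leaves a `# TODO` in `join`, which calls `self._join(server, basepath, *paths)` even though `_join` takes `(basepath, *paths)`. Because `posixpath.join` discards every component that precedes an absolute one, passing a host name as the first argument would silently drop it. A hedged sketch that keeps the server outside `posixpath.join`, assuming an `hdfs:/` prefix; `join_parsed` is a hypothetical name, not Beam's API:

```python
import posixpath

# Assumed prefix: 'hdfs:/' followed by an absolute path gives 'hdfs://...'.
_HDFS_PREFIX = 'hdfs:/'

# Pitfall: the absolute '/parent' throws away the preceding 'namenodehost'.
dropped = posixpath.join('namenodehost', '/parent', 'child')  # '/parent/child'


def join_parsed(server, basepath, *paths):
  """Joins path components, keeping the optional server out of posixpath."""
  server = '' if server is None else '/' + server
  return _HDFS_PREFIX + server + posixpath.join(basepath, *paths)


print(join_parsed(None, '/parent', 'child'))            # hdfs://parent/child
print(join_parsed('namenodehost', '/parent', 'child'))  # hdfs://namenodehost/parent/child
```

Keeping the host out of the path join also means `_join` could stay a pure path helper with its current signature.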
[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"
[ https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=353803&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-353803 ]

ASF GitHub Bot logged work on BEAM-8399:
Author: ASF GitHub Bot
Created on: 04/Dec/19 21:32
Start Date: 04/Dec/19 21:32
Worklog Time Spent: 10m
Work Description: zhitaoli commented on pull request #10223: [BEAM-8399] Add --hdfs_full_urls option (wip)
URL: https://github.com/apache/beam/pull/10223#discussion_r353993812

##
File path: sdks/python/apache_beam/io/hadoopfilesystem_test.py
##
@@ -579,6 +630,19 @@ def test_dict_options_missing(self):
         }
     )

+  def test_dict_options_full_urls(self):
+    pipeline_options = {
+        'hdfs_host': '',
+        'hdfs_port': 0,
+        'hdfs_user': '',
+        'hdfs_full_urls': 'invalid',
+    }
+
+    with
+    self.fs = hdfs.HadoopFileSystem(pipeline_options=pipeline_options)
+    self.assertFalse(self.fs.full_urls)

Review comment:
Is this `with` a typo?

Issue Time Tracking
-------------------

Worklog Id: (was: 353803)
Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"
[ https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=353801&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-353801 ]

ASF GitHub Bot logged work on BEAM-8399:
Author: ASF GitHub Bot
Created on: 04/Dec/19 21:30
Start Date: 04/Dec/19 21:30
Worklog Time Spent: 10m
Work Description: zhitaoli commented on pull request #10223: [BEAM-8399] Add --hdfs_full_urls option (wip)
URL: https://github.com/apache/beam/pull/10223#discussion_r353993086

##
File path: sdks/python/apache_beam/io/hadoopfilesystem_test.py
##
@@ -323,7 +375,7 @@ def test_create_success(self):
     url = self.fs.join(self.tmpdir, 'new_file')
     handle = self.fs.create(url)
     self.assertIsNotNone(handle)
-    url = self.fs._parse_url(url)
+    _, url = self.fs._parse_url(url)

Review comment:
Should we also test that `server` is None, here and below?

Issue Time Tracking
-------------------

Worklog Id: (was: 353801)
Time Spent: 0.5h (was: 20m)
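The question above asks for an explicit check that `server` is `None` when full URLs are disabled. A minimal sketch of a parser that accepts both formats from the issue description, followed by that check; the regular expressions, the `parse_url` name, and the `None`-for-legacy convention (inferred from the WIP `split` diff) are assumptions for illustration, not Beam's implementation:

```python
import re

# Hypothetical patterns, for illustration only.
# Legacy form: hdfs://parent/child -> everything after 'hdfs:/' is the path.
_LEGACY_URL_RE = re.compile(r'^hdfs:/(/.*)$')
# Full form: hdfs://namenodehost/parent/child -> host, then optional path.
_FULL_URL_RE = re.compile(r'^hdfs://([^/]+)(/.*)?$')


def parse_url(url, full_urls):
  """Returns (server, path); server is None unless full_urls is True."""
  if full_urls:
    m = _FULL_URL_RE.match(url)
    if m is None:
      raise ValueError('Could not parse url: %s' % url)
    # A bare 'hdfs://host' refers to the root directory.
    return m.group(1), m.group(2) or '/'
  m = _LEGACY_URL_RE.match(url)
  if m is None:
    raise ValueError('Could not parse url: %s' % url)
  return None, m.group(1)


# The check the reviewer asks for: legacy parsing never yields a server.
server, path = parse_url('hdfs://tmp/new_file', full_urls=False)
assert server is None and path == '/tmp/new_file'

print(parse_url('hdfs://namenodehost/parent/child', full_urls=True))
# ('namenodehost', '/parent/child')
```

In legacy mode the same full-format string parses as the path `/namenodehost/parent/child`, which illustrates why the two formats cannot always be told apart from the URL alone and an explicit option is needed.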
[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"
[ https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=353800&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-353800 ]

ASF GitHub Bot logged work on BEAM-8399:
Author: ASF GitHub Bot
Created on: 04/Dec/19 21:27
Start Date: 04/Dec/19 21:27
Worklog Time Spent: 10m
Work Description: zhitaoli commented on pull request #10223: [BEAM-8399] Add --hdfs_full_urls option (wip)
URL: https://github.com/apache/beam/pull/10223#discussion_r353991891

##
File path: sdks/python/apache_beam/io/hadoopfilesystem.py
##
@@ -115,42 +116,59 @@ def __init__(self, pipeline_options):
       hdfs_host = hdfs_options.hdfs_host
       hdfs_port = hdfs_options.hdfs_port
       hdfs_user = hdfs_options.hdfs_user
+      self.full_urls = hdfs_options.hdfs_full_urls

Review comment:
Make this private? `self._full_urls`

Issue Time Tracking
-------------------

Worklog Id: (was: 353800)
Time Spent: 20m (was: 10m)
[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"
[ https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=350010&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-350010 ]

ASF GitHub Bot logged work on BEAM-8399:
Author: ASF GitHub Bot
Created on: 26/Nov/19 20:52
Start Date: 26/Nov/19 20:52
Worklog Time Spent: 10m
Work Description: udim commented on pull request #10223: [BEAM-8399] Add --hdfs_full_urls option (wip)
URL: https://github.com/apache/beam/pull/10223

**Please** add a meaningful description for your change here

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

 - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
 - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
 - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
Post-Commit Tests Status (on master branch)

Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
--- | --- | --- | --- | --- | --- | --- | ---
Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit