[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2020-02-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=387748&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387748
 ]

ASF GitHub Bot logged work on BEAM-8399:


Author: ASF GitHub Bot
Created on: 14/Feb/20 23:50
Start Date: 14/Feb/20 23:50
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #10223: [BEAM-8399] Add 
--hdfs_full_urls option
URL: https://github.com/apache/beam/pull/10223
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 387748)
Time Spent: 3.5h  (was: 3h 20m)

> Python HDFS implementation should support filenames of the format 
> "hdfs://namenodehost/parent/child"
> 
>
> Key: BEAM-8399
> URL: https://issues.apache.org/jira/browse/BEAM-8399
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> "hdfs://namenodehost/parent/child" and "/parent/child" seem to be the 
> correct filename formats for HDFS based on [1], but we currently support 
> the format "hdfs://parent/child".
> To avoid breaking existing users, we have to either (1) somehow support both 
> formats by default (based on [2], it seems HDFS does not allow colons in 
> file paths, so this might be possible) or (2) make 
> "hdfs://namenodehost/parent/child" optional for now and make it the default 
> after a few versions.
> We should also make sure that the Beam Java and Python HDFS file-system 
> implementations are consistent in this regard.
>  
> [1] https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html
> [2] https://issues.apache.org/jira/browse/HDFS-13
>  
> cc: [~udim]
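
The two URL styles in the issue description above can be told apart with a small parser. The following is a minimal sketch only: the helper name `parse_hdfs_url`, its `full_urls` argument (mirroring the proposed `--hdfs_full_urls` option), and both regular expressions are assumptions made for illustration, not Beam's actual implementation.

```python
import re

# Hypothetical patterns for illustration; not taken from the Beam source.
_LEGACY_RE = re.compile(r'^hdfs://(?P<path>.*)$')
_FULL_RE = re.compile(r'^hdfs://(?P<server>[^/]+)(?P<path>/.*)?$')


def parse_hdfs_url(url, full_urls=False):
    """Return a (server, path) tuple for an hdfs:// URL.

    With full_urls=True the first component is treated as the namenode
    host ("hdfs://namenodehost/parent/child"). Otherwise the whole
    remainder is treated as the path ("hdfs://parent/child") and the
    returned server is None.
    """
    if full_urls:
        match = _FULL_RE.match(url)
        if match is None:
            raise ValueError('Could not parse url: %r' % url)
        return match.group('server'), match.group('path') or '/'
    match = _LEGACY_RE.match(url)
    if match is None:
        raise ValueError('Could not parse url: %r' % url)
    return None, '/' + match.group('path').lstrip('/')
```

Under these assumptions, the same string yields different paths depending on the flag, which is why the issue proposes making the new format opt-in at first.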



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2020-02-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=387737&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387737
 ]

ASF GitHub Bot logged work on BEAM-8399:


Author: ASF GitHub Bot
Created on: 14/Feb/20 22:58
Start Date: 14/Feb/20 22:58
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10223: [BEAM-8399] Add 
--hdfs_full_urls option
URL: https://github.com/apache/beam/pull/10223#issuecomment-586511225
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 387737)
Time Spent: 3h 20m  (was: 3h 10m)

> Python HDFS implementation should support filenames of the format 
> "hdfs://namenodehost/parent/child"
> 
>
> Key: BEAM-8399
> URL: https://issues.apache.org/jira/browse/BEAM-8399
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=387017&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387017
 ]

ASF GitHub Bot logged work on BEAM-8399:


Author: ASF GitHub Bot
Created on: 14/Feb/20 01:17
Start Date: 14/Feb/20 01:17
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #10223: [BEAM-8399] Add 
--hdfs_full_urls option
URL: https://github.com/apache/beam/pull/10223#issuecomment-586051119
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 387017)
Time Spent: 3h 10m  (was: 3h)

> Python HDFS implementation should support filenames of the format 
> "hdfs://namenodehost/parent/child"
> 
>
> Key: BEAM-8399
> URL: https://issues.apache.org/jira/browse/BEAM-8399
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=386264&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386264
 ]

ASF GitHub Bot logged work on BEAM-8399:


Author: ASF GitHub Bot
Created on: 12/Feb/20 22:03
Start Date: 12/Feb/20 22:03
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10223: [BEAM-8399] Add 
--hdfs_full_urls option
URL: https://github.com/apache/beam/pull/10223#issuecomment-585440374
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 386264)
Time Spent: 3h  (was: 2h 50m)

> Python HDFS implementation should support filenames of the format 
> "hdfs://namenodehost/parent/child"
> 
>
> Key: BEAM-8399
> URL: https://issues.apache.org/jira/browse/BEAM-8399
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=385421&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385421
 ]

ASF GitHub Bot logged work on BEAM-8399:


Author: ASF GitHub Bot
Created on: 11/Feb/20 19:58
Start Date: 11/Feb/20 19:58
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10223: [BEAM-8399] Add 
--hdfs_full_urls option
URL: https://github.com/apache/beam/pull/10223#issuecomment-584822664
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 385421)
Time Spent: 2h 50m  (was: 2h 40m)

> Python HDFS implementation should support filenames of the format 
> "hdfs://namenodehost/parent/child"
> 
>
> Key: BEAM-8399
> URL: https://issues.apache.org/jira/browse/BEAM-8399
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=385377&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385377
 ]

ASF GitHub Bot logged work on BEAM-8399:


Author: ASF GitHub Bot
Created on: 11/Feb/20 19:15
Start Date: 11/Feb/20 19:15
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10223: [BEAM-8399] Add 
--hdfs_full_urls option
URL: https://github.com/apache/beam/pull/10223#issuecomment-584803384
 
 
   Run PythonFormatter PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 385377)
Time Spent: 2.5h  (was: 2h 20m)

> Python HDFS implementation should support filenames of the format 
> "hdfs://namenodehost/parent/child"
> 
>
> Key: BEAM-8399
> URL: https://issues.apache.org/jira/browse/BEAM-8399
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=385378&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385378
 ]

ASF GitHub Bot logged work on BEAM-8399:


Author: ASF GitHub Bot
Created on: 11/Feb/20 19:15
Start Date: 11/Feb/20 19:15
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10223: [BEAM-8399] Add 
--hdfs_full_urls option
URL: https://github.com/apache/beam/pull/10223#issuecomment-584803433
 
 
   Run PythonFormatter PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 385378)
Time Spent: 2h 40m  (was: 2.5h)

> Python HDFS implementation should support filenames of the format 
> "hdfs://namenodehost/parent/child"
> 
>
> Key: BEAM-8399
> URL: https://issues.apache.org/jira/browse/BEAM-8399
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2020-02-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=385365&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385365
 ]

ASF GitHub Bot logged work on BEAM-8399:


Author: ASF GitHub Bot
Created on: 11/Feb/20 18:52
Start Date: 11/Feb/20 18:52
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10223: [BEAM-8399] Add 
--hdfs_full_urls option
URL: https://github.com/apache/beam/pull/10223#issuecomment-584792946
 
 
   Run PythonLint PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 385365)
Time Spent: 2h 20m  (was: 2h 10m)

> Python HDFS implementation should support filenames of the format 
> "hdfs://namenodehost/parent/child"
> 
>
> Key: BEAM-8399
> URL: https://issues.apache.org/jira/browse/BEAM-8399
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2020-02-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=384610&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-384610
 ]

ASF GitHub Bot logged work on BEAM-8399:


Author: ASF GitHub Bot
Created on: 10/Feb/20 18:04
Start Date: 10/Feb/20 18:04
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10223: [BEAM-8399] Add 
--hdfs_full_urls option
URL: https://github.com/apache/beam/pull/10223#issuecomment-584253928
 
 
   Run Portable_Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 384610)
Time Spent: 2h 10m  (was: 2h)

> Python HDFS implementation should support filenames of the format 
> "hdfs://namenodehost/parent/child"
> 
>
> Key: BEAM-8399
> URL: https://issues.apache.org/jira/browse/BEAM-8399
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2020-02-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=384607&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-384607
 ]

ASF GitHub Bot logged work on BEAM-8399:


Author: ASF GitHub Bot
Created on: 10/Feb/20 17:54
Start Date: 10/Feb/20 17:54
Worklog Time Spent: 10m 
  Work Description: zhitaoli commented on pull request #10223: [BEAM-8399] 
Add --hdfs_full_urls option
URL: https://github.com/apache/beam/pull/10223#discussion_r377221348
 
 

 ##
 File path: sdks/python/apache_beam/io/hadoopfilesystem.py
 ##
 @@ -163,19 +181,25 @@ def join(self, base_url, *paths):
     Returns:
       Full url after combining all the passed components.
     """
-    basepath = self._parse_url(base_url)
-    return _HDFS_PREFIX + self._join(basepath, *paths)
+    server, basepath = self._parse_url(base_url)
+    # TODO full_urls check and test
+    return _HDFS_PREFIX + self._join(server, basepath, *paths)
 
   def _join(self, basepath, *paths):
     return posixpath.join(basepath, *paths)
 
   def split(self, url):
-    rel_path = self._parse_url(url)
+    server, rel_path = self._parse_url(url)
+    if server is None:
+      server = ''
+    else:
+      server = '/' + server
 
 Review comment:
   Never mind. I think if we consider hdfs as a constraint, this is fine.
 



Issue Time Tracking
---

Worklog Id: (was: 384607)
Time Spent: 2h  (was: 1h 50m)

> Python HDFS implementation should support filenames of the format 
> "hdfs://namenodehost/parent/child"
> 
>
> Key: BEAM-8399
> URL: https://issues.apache.org/jira/browse/BEAM-8399
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2020-02-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=383877&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-383877
 ]

ASF GitHub Bot logged work on BEAM-8399:


Author: ASF GitHub Bot
Created on: 07/Feb/20 23:35
Start Date: 07/Feb/20 23:35
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10223: [BEAM-8399] Add 
--hdfs_full_urls option
URL: https://github.com/apache/beam/pull/10223#issuecomment-583660975
 
 
   R: @zhitaoli, @chamikaramj 
 



Issue Time Tracking
---

Worklog Id: (was: 383877)
Time Spent: 1h 50m  (was: 1h 40m)

> Python HDFS implementation should support filenames of the format 
> "hdfs://namenodehost/parent/child"
> 
>
> Key: BEAM-8399
> URL: https://issues.apache.org/jira/browse/BEAM-8399
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2020-02-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=383310&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-383310
 ]

ASF GitHub Bot logged work on BEAM-8399:


Author: ASF GitHub Bot
Created on: 07/Feb/20 01:37
Start Date: 07/Feb/20 01:37
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #10223: [BEAM-8399] Add 
--hdfs_full_urls option (wip)
URL: https://github.com/apache/beam/pull/10223#discussion_r376174956
 
 

 ##
 File path: sdks/python/apache_beam/io/hadoopfilesystem_test.py
 ##
 @@ -323,7 +375,7 @@ def test_create_success(self):
     url = self.fs.join(self.tmpdir, 'new_file')
     handle = self.fs.create(url)
     self.assertIsNotNone(handle)
-    url = self.fs._parse_url(url)
+    _, url = self.fs._parse_url(url)
 
 Review comment:
   There will be a separate `test_parse_url` to test these return values.
 



Issue Time Tracking
---

Worklog Id: (was: 383310)
Time Spent: 1h 40m  (was: 1.5h)

> Python HDFS implementation should support filenames of the format 
> "hdfs://namenodehost/parent/child"
> 
>
> Key: BEAM-8399
> URL: https://issues.apache.org/jira/browse/BEAM-8399
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2020-02-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=383301&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-383301
 ]

ASF GitHub Bot logged work on BEAM-8399:


Author: ASF GitHub Bot
Created on: 07/Feb/20 00:56
Start Date: 07/Feb/20 00:56
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #10223: [BEAM-8399] Add 
--hdfs_full_urls option (wip)
URL: https://github.com/apache/beam/pull/10223#discussion_r375595989
 
 

 ##
 File path: sdks/python/apache_beam/io/hadoopfilesystem.py
 ##
 @@ -115,42 +116,59 @@ def __init__(self, pipeline_options):
   hdfs_host = hdfs_options.hdfs_host
   hdfs_port = hdfs_options.hdfs_port
   hdfs_user = hdfs_options.hdfs_user
+  self.full_urls = hdfs_options.hdfs_full_urls
 
 Review comment:
   done (still working on a commit)
 



Issue Time Tracking
---

Worklog Id: (was: 383301)
Time Spent: 1.5h  (was: 1h 20m)

> Python HDFS implementation should support filenames of the format 
> "hdfs://namenodehost/parent/child"
> 
>
> Key: BEAM-8399
> URL: https://issues.apache.org/jira/browse/BEAM-8399
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h


[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2020-02-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=383300&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-383300
 ]

ASF GitHub Bot logged work on BEAM-8399:


Author: ASF GitHub Bot
Created on: 07/Feb/20 00:56
Start Date: 07/Feb/20 00:56
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #10223: [BEAM-8399] Add 
--hdfs_full_urls option (wip)
URL: https://github.com/apache/beam/pull/10223#discussion_r375601513
 
 

 ##
 File path: sdks/python/apache_beam/io/hadoopfilesystem.py
 ##
 @@ -163,19 +181,25 @@ def join(self, base_url, *paths):
     Returns:
       Full url after combining all the passed components.
     """
-    basepath = self._parse_url(base_url)
-    return _HDFS_PREFIX + self._join(basepath, *paths)
+    server, basepath = self._parse_url(base_url)
+    # TODO full_urls check and test
+    return _HDFS_PREFIX + self._join(server, basepath, *paths)
 
   def _join(self, basepath, *paths):
     return posixpath.join(basepath, *paths)
 
   def split(self, url):
-    rel_path = self._parse_url(url)
+    server, rel_path = self._parse_url(url)
+    if server is None:
+      server = ''
+    else:
+      server = '/' + server
 
 Review comment:
   `hdfs://` URLs always use `/` as the separator, hence the use of 
`posixpath.join` instead of `os.path.join` in this module.
   
   Can you give me an example URL with `\` that you use on Windows, and the 
name of the tool or client that supports it?
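
The difference can be checked from any Python interpreter. The sketch below is illustrative only: `join_hdfs_url` is a hypothetical helper rather than the function in `hadoopfilesystem.py`, the `_HDFS_PREFIX` value is assumed, and `ntpath` is used to simulate what `os.path.join` would do on Windows.

```python
import ntpath      # the module os.path resolves to on Windows
import posixpath   # the module os.path resolves to on POSIX systems

_HDFS_PREFIX = 'hdfs:/'  # assumed prefix, for illustration only


def join_hdfs_url(base_path, *paths):
  """Hypothetical helper: join URL path components using '/' on any OS."""
  return _HDFS_PREFIX + posixpath.join(base_path, *paths)


# posixpath.join always uses '/', regardless of the host platform.
print(join_hdfs_url('/parent', 'child'))
# Windows path rules insert a backslash, which is not a URL separator.
print(ntpath.join('/parent', 'child'))
```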
 



Issue Time Tracking
---

Worklog Id: (was: 383300)
Time Spent: 1.5h  (was: 1h 20m)






[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2020-02-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=383302&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-383302
 ]

ASF GitHub Bot logged work on BEAM-8399:


Author: ASF GitHub Bot
Created on: 07/Feb/20 00:56
Start Date: 07/Feb/20 00:56
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #10223: [BEAM-8399] Add 
--hdfs_full_urls option (wip)
URL: https://github.com/apache/beam/pull/10223#discussion_r375607987
 
 

 ##
 File path: sdks/python/apache_beam/io/hadoopfilesystem_test.py
 ##
 @@ -579,6 +630,19 @@ def test_dict_options_missing(self):
   }
   )
 
+  def test_dict_options_full_urls(self):
+    pipeline_options = {
+        'hdfs_host': '',
+        'hdfs_port': 0,
+        'hdfs_user': '',
+        'hdfs_full_urls': 'invalid',
+    }
+
+    with
+    self.fs = hdfs.HadoopFileSystem(pipeline_options=pipeline_options)
+    self.assertFalse(self.fs.full_urls)
 
 Review comment:
   This was incomplete code.
 



Issue Time Tracking
---

Worklog Id: (was: 383302)
Time Spent: 1.5h  (was: 1h 20m)






[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2020-02-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=381962&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-381962
 ]

ASF GitHub Bot logged work on BEAM-8399:


Author: ASF GitHub Bot
Created on: 05/Feb/20 00:16
Start Date: 05/Feb/20 00:16
Worklog Time Spent: 10m 
  Work Description: zhitaoli commented on issue #10223: [BEAM-8399] Add 
--hdfs_full_urls option (wip)
URL: https://github.com/apache/beam/pull/10223#issuecomment-582180187
 
 
   Ping? Is it possible to get this properly merged?
 



Issue Time Tracking
---

Worklog Id: (was: 381962)
Time Spent: 1h 20m  (was: 1h 10m)






[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2020-02-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=380600&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-380600
 ]

ASF GitHub Bot logged work on BEAM-8399:


Author: ASF GitHub Bot
Created on: 02/Feb/20 22:33
Start Date: 02/Feb/20 22:33
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on issue #10223: [BEAM-8399] Add 
--hdfs_full_urls option (wip)
URL: https://github.com/apache/beam/pull/10223#issuecomment-581184946
 
 
   This pull request has been marked as stale due to 60 days of inactivity. It 
will be closed in 1 week if no further activity occurs. If you think that’s 
incorrect or this pull request requires a review, please simply write any 
comment. If closed, you can revive the PR at any time and @mention a reviewer 
or discuss it on the d...@beam.apache.org list. Thank you for your 
contributions.
   
 



Issue Time Tracking
---

Worklog Id: (was: 380600)
Time Spent: 1h 10m  (was: 1h)






[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2019-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=353807&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-353807
 ]

ASF GitHub Bot logged work on BEAM-8399:


Author: ASF GitHub Bot
Created on: 04/Dec/19 21:34
Start Date: 04/Dec/19 21:34
Worklog Time Spent: 10m 
  Work Description: zhitaoli commented on pull request #10223: [BEAM-8399] 
Add --hdfs_full_urls option (wip)
URL: https://github.com/apache/beam/pull/10223#discussion_r353994642
 
 

 ##
 File path: sdks/python/apache_beam/io/hadoopfilesystem.py
 ##
 @@ -163,19 +181,25 @@ def join(self, base_url, *paths):
     Returns:
       Full url after combining all the passed components.
     """
-    basepath = self._parse_url(base_url)
-    return _HDFS_PREFIX + self._join(basepath, *paths)
+    server, basepath = self._parse_url(base_url)
+    # TODO full_urls check and test
+    return _HDFS_PREFIX + self._join(server, basepath, *paths)
 
   def _join(self, basepath, *paths):
     return posixpath.join(basepath, *paths)
 
   def split(self, url):
-    rel_path = self._parse_url(url)
+    server, rel_path = self._parse_url(url)
+    if server is None:
+      server = ''
+    else:
+      server = '/' + server
 
 Review comment:
   Is this only ever expected to work on POSIX? If not, should we use 
`os.path.join` (i.e., on Windows)?
 



Issue Time Tracking
---

Worklog Id: (was: 353807)
Time Spent: 1h  (was: 50m)






[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2019-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=353805&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-353805
 ]

ASF GitHub Bot logged work on BEAM-8399:


Author: ASF GitHub Bot
Created on: 04/Dec/19 21:33
Start Date: 04/Dec/19 21:33
Worklog Time Spent: 10m 
  Work Description: zhitaoli commented on pull request #10223: [BEAM-8399] 
Add --hdfs_full_urls option (wip)
URL: https://github.com/apache/beam/pull/10223#discussion_r353994642
 
 

 ##
 File path: sdks/python/apache_beam/io/hadoopfilesystem.py
 ##
 @@ -163,19 +181,25 @@ def join(self, base_url, *paths):
     Returns:
       Full url after combining all the passed components.
     """
-    basepath = self._parse_url(base_url)
-    return _HDFS_PREFIX + self._join(basepath, *paths)
+    server, basepath = self._parse_url(base_url)
+    # TODO full_urls check and test
+    return _HDFS_PREFIX + self._join(server, basepath, *paths)
 
   def _join(self, basepath, *paths):
     return posixpath.join(basepath, *paths)
 
   def split(self, url):
-    rel_path = self._parse_url(url)
+    server, rel_path = self._parse_url(url)
+    if server is None:
+      server = ''
+    else:
+      server = '/' + server
 
 Review comment:
   Is this only ever expected to work on POSIX? If not, should we use 
`os.path.join`?
 



Issue Time Tracking
---

Worklog Id: (was: 353805)
Time Spent: 50m  (was: 40m)






[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2019-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=353803&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-353803
 ]

ASF GitHub Bot logged work on BEAM-8399:


Author: ASF GitHub Bot
Created on: 04/Dec/19 21:32
Start Date: 04/Dec/19 21:32
Worklog Time Spent: 10m 
  Work Description: zhitaoli commented on pull request #10223: [BEAM-8399] 
Add --hdfs_full_urls option (wip)
URL: https://github.com/apache/beam/pull/10223#discussion_r353993812
 
 

 ##
 File path: sdks/python/apache_beam/io/hadoopfilesystem_test.py
 ##
 @@ -579,6 +630,19 @@ def test_dict_options_missing(self):
   }
   )
 
+  def test_dict_options_full_urls(self):
+    pipeline_options = {
+        'hdfs_host': '',
+        'hdfs_port': 0,
+        'hdfs_user': '',
+        'hdfs_full_urls': 'invalid',
+    }
+
+    with
+    self.fs = hdfs.HadoopFileSystem(pipeline_options=pipeline_options)
+    self.assertFalse(self.fs.full_urls)
 
 Review comment:
   Is this `with` a typo?
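
For context, a bare `with` is a syntax error in Python: the statement needs a context manager expression and a trailing colon. One plausible way to complete such a test is `assertRaises`; the thread does not say which route the PR took. In the sketch below, `FakeFileSystem` and its validation rule are hypothetical stand-ins so that the example runs without Beam installed.

```python
import unittest


class FakeFileSystem(object):
  """Hypothetical stand-in for the filesystem class under test."""

  def __init__(self, pipeline_options):
    full_urls = pipeline_options.get('hdfs_full_urls', False)
    # Assumed rule for this sketch: the option must be a real bool.
    if not isinstance(full_urls, bool):
      raise ValueError('hdfs_full_urls should be bool, got: %r' % (full_urls,))
    self.full_urls = full_urls


class FullUrlsOptionTest(unittest.TestCase):

  def test_dict_options_full_urls(self):
    pipeline_options = {
        'hdfs_host': '',
        'hdfs_port': 0,
        'hdfs_user': '',
        'hdfs_full_urls': 'invalid',
    }
    # A complete `with` statement: context manager expression plus colon.
    with self.assertRaises(ValueError):
      FakeFileSystem(pipeline_options=pipeline_options)
```

Saved to a file, this can be run with `python -m unittest <module>`.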
 



Issue Time Tracking
---

Worklog Id: (was: 353803)
Time Spent: 40m  (was: 0.5h)






[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2019-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=353801&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-353801
 ]

ASF GitHub Bot logged work on BEAM-8399:


Author: ASF GitHub Bot
Created on: 04/Dec/19 21:30
Start Date: 04/Dec/19 21:30
Worklog Time Spent: 10m 
  Work Description: zhitaoli commented on pull request #10223: [BEAM-8399] 
Add --hdfs_full_urls option (wip)
URL: https://github.com/apache/beam/pull/10223#discussion_r353993086
 
 

 ##
 File path: sdks/python/apache_beam/io/hadoopfilesystem_test.py
 ##
 @@ -323,7 +375,7 @@ def test_create_success(self):
     url = self.fs.join(self.tmpdir, 'new_file')
     handle = self.fs.create(url)
     self.assertIsNotNone(handle)
-    url = self.fs._parse_url(url)
+    _, url = self.fs._parse_url(url)
 
 Review comment:
   Should we test that `server` is None, here and below?
 



Issue Time Tracking
---

Worklog Id: (was: 353801)
Time Spent: 0.5h  (was: 20m)






[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2019-12-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=353800&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-353800
 ]

ASF GitHub Bot logged work on BEAM-8399:


Author: ASF GitHub Bot
Created on: 04/Dec/19 21:27
Start Date: 04/Dec/19 21:27
Worklog Time Spent: 10m 
  Work Description: zhitaoli commented on pull request #10223: [BEAM-8399] 
Add --hdfs_full_urls option (wip)
URL: https://github.com/apache/beam/pull/10223#discussion_r353991891
 
 

 ##
 File path: sdks/python/apache_beam/io/hadoopfilesystem.py
 ##
 @@ -115,42 +116,59 @@ def __init__(self, pipeline_options):
       hdfs_host = hdfs_options.hdfs_host
       hdfs_port = hdfs_options.hdfs_port
       hdfs_user = hdfs_options.hdfs_user
+      self.full_urls = hdfs_options.hdfs_full_urls
 
 Review comment:
   Make this private? `self._full_urls`
 



Issue Time Tracking
---

Worklog Id: (was: 353800)
Time Spent: 20m  (was: 10m)






[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=350010&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-350010
 ]

ASF GitHub Bot logged work on BEAM-8399:


Author: ASF GitHub Bot
Created on: 26/Nov/19 20:52
Start Date: 26/Nov/19 20:52
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #10223: [BEAM-8399] Add 
--hdfs_full_urls option (wip)
URL: https://github.com/apache/beam/pull/10223
 
 
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit