[ https://issues.apache.org/jira/browse/BEAM-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17122896#comment-17122896 ]
Beam JIRA Bot commented on BEAM-6821:
-------------------------------------

This issue is P2 but has been unassigned without any comment for 60 days, so it has been labeled "stale-P2". If this issue is still affecting you, we care! Please comment and remove the label. Otherwise, in 14 days the issue will be moved to P3.

Please see https://beam.apache.org/contribute/jira-priorities/ for a detailed explanation of what these priorities mean.

> FileBasedSink is not creating file paths according to target filesystem
> -----------------------------------------------------------------------
>
>                 Key: BEAM-6821
>                 URL: https://issues.apache.org/jira/browse/BEAM-6821
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>    Affects Versions: 2.11.0
>         Environment: Windows 10
>            Reporter: Gregory Kovelman
>            Priority: P2
>              Labels: stale-P2
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> The file path generated in the _open_writer_ method does not follow the
> conventions of the target filesystem, because os.path.join is used
> instead of FileSystems.join.
> apache_beam\io\filebasedsink.py extract:
>
> {code:java}
>   def _create_temp_dir(self, file_path_prefix):
>     base_path, last_component = FileSystems.split(file_path_prefix)
>     if not last_component:
>       # Trying to re-split the base_path to check if it's a root.
>       new_base_path, _ = FileSystems.split(base_path)
>       if base_path == new_base_path:
>         raise ValueError('Cannot create a temporary directory for root path '
>                          'prefix %s. Please specify a file path prefix with '
>                          'at least two components.' % file_path_prefix)
>     path_components = [base_path,
>                        'beam-temp-' + last_component + '-' + uuid.uuid1().hex]
>     return FileSystems.join(*path_components)
>
>   @check_accessible(['file_path_prefix', 'file_name_suffix'])
>   def open_writer(self, init_result, uid):
>     # A proper suffix is needed for AUTO compression detection.
>     # We also ensure there will be no collisions with uid and a
>     # (possibly unsharded) file_path_prefix and a (possibly empty)
>     # file_name_suffix.
>     file_path_prefix = self.file_path_prefix.get()
>     file_name_suffix = self.file_name_suffix.get()
>     suffix = (
>         '.' + os.path.basename(file_path_prefix) + file_name_suffix)
>     return FileBasedSinkWriter(self, os.path.join(init_result, uid) + suffix)
> {code}
>
>
> This creates incompatibilities between, for example, Windows and GCS:
> when the pipeline runs on Windows, os.path.join inserts a backslash
> into a path that lives on GCS.
> Expected: gs://bucket/beam-temp-result-uuid/uid.result
> Actual: gs://bucket/beam-temp-result-uuid\\uid.result
> Replacing os.path.join with FileSystems.join fixes the issue.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
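
The separator mismatch described in the issue can be reproduced without Beam or a Windows machine. The following is a minimal sketch, assuming only the Python standard library: `ntpath` is the module `os.path` resolves to on Windows, and `posixpath.join` is used here purely as a stand-in for a scheme-aware join that produces forward slashes. It is not Beam's `FileSystems.join` implementation. The helper names and the `temp_dir`, `uid`, and `suffix` values are hypothetical, chosen to mirror the paths quoted in the issue.

```python
import ntpath      # the os.path implementation used on Windows
import posixpath   # the os.path implementation used on POSIX systems

# Hypothetical values mirroring the paths quoted in the issue.
temp_dir = 'gs://bucket/beam-temp-result-uuid'
uid = 'uid'
suffix = '.result'


def join_like_windows(base, name):
    """Mimic os.path.join as it behaves when the pipeline runs on Windows."""
    return ntpath.join(base, name)


def join_with_forward_slash(base, name):
    """Stand-in for a scheme-aware join (assumption, not Beam's code)."""
    return posixpath.join(base, name)


windows_path = join_like_windows(temp_dir, uid) + suffix
gcs_path = join_with_forward_slash(temp_dir, uid) + suffix

print(windows_path)  # contains a backslash before 'uid'
print(gcs_path)      # contains only forward slashes
```

Because the join is chosen by the local operating system rather than by the scheme of the destination path, the same sink code yields different object names depending on where it runs, which is why the issue proposes a filesystem-aware join.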