Hello community,

here is the log from the commit of package python-azure-datalake-store for openSUSE:Factory checked in at 2019-10-10 14:28:38
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-azure-datalake-store (Old)
 and      /work/SRC/openSUSE:Factory/.python-azure-datalake-store.new.2352 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-azure-datalake-store" Thu Oct 10 14:28:38 2019 rev:6 rq:735676 version:0.0.47 Changes: -------- --- /work/SRC/openSUSE:Factory/python-azure-datalake-store/python-azure-datalake-store.changes 2019-05-14 13:31:21.779484100 +0200 +++ /work/SRC/openSUSE:Factory/.python-azure-datalake-store.new.2352/python-azure-datalake-store.changes 2019-10-10 14:28:40.388859911 +0200 @@ -1,0 +2,12 @@ +Fri Oct 4 12:13:06 UTC 2019 - John Paul Adrian Glaubitz <adrian.glaub...@suse.com> + +- New upstream release + + Version 0.0.47 + + For detailed information about changes see the + HISTORY.txt file provided with this package +- Add python-requires to BuildRequires for Python 2.x +- Drop patch to support older versions of setuptools as + SLE-12 is now shipping with a recent enough version + + ads_drop-extras-require.patch + +------------------------------------------------------------------- Old: ---- ads_drop-extras-require.patch azure-datalake-store-0.0.44.tar.gz New: ---- azure-datalake-store-0.0.47.tar.gz ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Other differences: ------------------ ++++++ python-azure-datalake-store.spec ++++++ --- /var/tmp/diff_new_pack.XouKBe/_old 2019-10-10 14:28:41.068858326 +0200 +++ /var/tmp/diff_new_pack.XouKBe/_new 2019-10-10 14:28:41.072858318 +0200 @@ -18,7 +18,7 @@ %{?!python_module:%define python_module() python-%{**} python3-%{**}} Name: python-azure-datalake-store -Version: 0.0.44 +Version: 0.0.47 Release: 0 Summary: Microsoft Azure Data Lake Store Client Library License: MIT @@ -26,11 +26,13 @@ Url: https://github.com/Azure/azure-sdk-for-python Source: https://files.pythonhosted.org/packages/source/a/azure-datalake-store/azure-datalake-store-%{version}.tar.gz Source1: LICENSE.txt -Patch1: ads_drop-extras-require.patch BuildRequires: %{python_module azure-nspkg >= 3.0.0} BuildRequires: %{python_module setuptools} BuildRequires: fdupes BuildRequires: python-rpm-macros +%ifpython2 +BuildRequires: python-futures +%endif Requires: python-adal >= 0.4.2 Requires: python-azure-nspkg >= 3.0.0 Requires: python-cffi @@ -51,7 +53,6 @@ %prep %setup -q -n azure-datalake-store-%{version} -%patch1 -p1 %build install -m 644 %{SOURCE1} %{_builddir}/azure-datalake-store-%{version} ++++++ azure-datalake-store-0.0.44.tar.gz -> azure-datalake-store-0.0.47.tar.gz ++++++ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/azure-datalake-store-0.0.44/HISTORY.rst new/azure-datalake-store-0.0.47/HISTORY.rst --- old/azure-datalake-store-0.0.44/HISTORY.rst 2019-03-07 00:13:59.000000000 +0100 +++ new/azure-datalake-store-0.0.47/HISTORY.rst 2019-08-15 03:34:18.000000000 +0200 @@ -3,6 +3,22 @@ Release History =============== +0.0.47 (2019-08-14) ++++++++++++++++++++ +* Remove logging of bearer token +* Documentation related changes(Add readme.md and correct some formatting) + +0.0.46 (2019-06-25) ++++++++++++++++++++ +* Expose per request timeout. Default to 60. +* Concat will not retry by default. +* Bug fixes. 
+ +0.0.45 (2019-05-10) ++++++++++++++++++++ +* Update open and close ADLFile semantics +* Refactor code and improve performance for opening a file + 0.0.44 (2019-03-05) +++++++++++++++++++ * Add continuation token to LISTSTATUS api call diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/azure-datalake-store-0.0.44/PKG-INFO new/azure-datalake-store-0.0.47/PKG-INFO --- old/azure-datalake-store-0.0.44/PKG-INFO 2019-03-07 00:24:10.000000000 +0100 +++ new/azure-datalake-store-0.0.47/PKG-INFO 2019-08-15 03:46:55.000000000 +0200 @@ -1,6 +1,6 @@ Metadata-Version: 1.1 Name: azure-datalake-store -Version: 0.0.44 +Version: 0.0.47 Summary: Azure Data Lake Store Filesystem Client Library for Python Home-page: https://github.com/Azure/azure-data-lake-store-python Author: Microsoft Corporation @@ -247,6 +247,22 @@ Release History =============== + 0.0.47 (2019-08-14) + +++++++++++++++++++ + * Remove logging of bearer token + * Documentation related changes(Add readme.md and correct some formatting) + + 0.0.46 (2019-06-25) + +++++++++++++++++++ + * Expose per request timeout. Default to 60. + * Concat will not retry by default. + * Bug fixes. + + 0.0.45 (2019-05-10) + +++++++++++++++++++ + * Update open and close ADLFile semantics + * Refactor code and improve performance for opening a file + 0.0.44 (2019-03-05) +++++++++++++++++++ * Add continuation token to LISTSTATUS api call diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/azure-datalake-store-0.0.44/azure/datalake/store/__init__.py new/azure-datalake-store-0.0.47/azure/datalake/store/__init__.py --- old/azure-datalake-store-0.0.44/azure/datalake/store/__init__.py 2019-03-07 00:13:59.000000000 +0100 +++ new/azure-datalake-store-0.0.47/azure/datalake/store/__init__.py 2019-08-15 03:34:18.000000000 +0200 @@ -6,7 +6,7 @@ # license information. # -------------------------------------------------------------------------- -__version__ = "0.0.44" +__version__ = "0.0.47" from .core import AzureDLFileSystem from .multithread import ADLDownloader diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/azure-datalake-store-0.0.44/azure/datalake/store/core.py new/azure-datalake-store-0.0.47/azure/datalake/store/core.py --- old/azure-datalake-store-0.0.44/azure/datalake/store/core.py 2019-03-07 00:13:59.000000000 +0100 +++ new/azure-datalake-store-0.0.47/azure/datalake/store/core.py 2019-08-15 03:34:18.000000000 +0200 @@ -21,14 +21,13 @@ import uuid import json - # local imports from .exceptions import DatalakeBadOffsetException, DatalakeIncompleteTransferException from .exceptions import FileNotFoundError, PermissionError from .lib import DatalakeRESTInterface from .utils import ensure_writable, read_block from .enums import ExpiryOptionType -from .retry import ExponentialRetryPolicy +from .retry import ExponentialRetryPolicy, NoRetryPolicy from .multiprocessor import multi_processor_change_acl if sys.version_info >= (3, 4): @@ -39,15 +38,16 @@ logger = logging.getLogger(__name__) valid_expire_types = [x.value for x in ExpiryOptionType] + class AzureDLFileSystem(object): """ Access Azure DataLake Store as if it were a file-system Parameters ---------- - store_name : str ("") - Store name to connect to - token : credentials object + store_name: str ("") + Store name to connect to. + token: credentials object When setting up a new connection, this contains the authorization credentials (see `lib.auth()`). 
url_suffix: str (None) @@ -57,16 +57,18 @@ The API version to target with requests. Changing this value will change the behavior of the requests, and can cause unexpected behavior or breaking changes. Changes to this value should be undergone with caution. + per_call_timeout_seconds: float(60) + This is the timeout for each requests library call. kwargs: optional key/values See ``lib.auth()``; full list: tenant_id, username, password, client_id, client_secret, resource """ _singleton = [None] - def __init__(self, token=None, **kwargs): - # store instance vars + def __init__(self, token=None, per_call_timeout_seconds=60, **kwargs): self.token = token self.kwargs = kwargs + self.per_call_timeout_seconds = per_call_timeout_seconds self.connect() self.dirs = {} self._emptyDirs = [] @@ -85,14 +87,14 @@ """ Establish connection object. """ - self.azure = DatalakeRESTInterface(token=self.token, **self.kwargs) + self.azure = DatalakeRESTInterface(token=self.token, req_timeout_s=self.per_call_timeout_seconds, **self.kwargs) self.token = self.azure.token def __setstate__(self, state): self.__dict__.update(state) self.connect() - def open(self, path, mode='rb', blocksize=2**25, delimiter=None): + def open(self, path, mode='rb', blocksize=2 ** 25, delimiter=None): """ Open a file for reading or writing Parameters @@ -148,14 +150,16 @@ def ls(self, path="", detail=False, invalidate_cache=True): """ List all elements under directory specified with path + Parameters ---------- - path : str or AzureDLPath + path: str or AzureDLPath Path to query - detail : bool + detail: bool Detailed info or not. - invalidate_cache : bool + invalidate_cache: bool Whether to invalidate cache or not + Returns ------- List of elements under directory specified with path @@ -178,14 +182,16 @@ def info(self, path, invalidate_cache=True, expected_error_code=None): """ File information for path + Parameters ---------- - path : str or AzureDLPath + path: str or AzureDLPath Path to query - invalidate_cache : bool + invalidate_cache: bool Whether to invalidate cache or not - expected_error_code : int + expected_error_code: int Optionally indicates a specific, expected error code, if any. 
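The hunks above thread the new per_call_timeout_seconds argument from AzureDLFileSystem through connect() into DatalakeRESTInterface as req_timeout_s, which then becomes the timeout of every underlying requests call. A minimal sketch of how a caller would set it; the tenant, client and store names are placeholders, not values from this commit:

    from azure.datalake.store import core, lib

    # Placeholder credentials -- substitute real values.
    token = lib.auth(tenant_id='<tenant-id>',
                     client_id='<client-id>',
                     client_secret='<client-secret>')

    # per_call_timeout_seconds (new in 0.0.46) is stored on the filesystem
    # object and forwarded to DatalakeRESTInterface as req_timeout_s, so each
    # requests call runs with timeout=<this value>; the default stays at 60 s.
    adl = core.AzureDLFileSystem(token=token,
                                 store_name='<store-name>',
                                 per_call_timeout_seconds=30)

    print(adl.ls('/'))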
+ Returns ------- File information @@ -198,7 +204,8 @@ # in the case of getting info about the root itself or if the cache won't be hit # simply return the result of a GETFILESTATUS from the service if invalidate_cache or path_as_posix in {'/', '.'}: - to_return = self.azure.call('GETFILESTATUS', path_as_posix, expected_error_code=expected_error_code)['FileStatus'] + to_return = self.azure.call('GETFILESTATUS', path_as_posix, expected_error_code=expected_error_code)[ + 'FileStatus'] to_return['name'] = path_as_posix # add the key/value pair back to the cache so long as it isn't the root @@ -210,7 +217,6 @@ for f in self.dirs[root_as_posix]: if f['name'] == path_as_posix: found = True - f = to_return break if not found: self.dirs[root_as_posix].append(to_return) @@ -225,13 +231,14 @@ def _walk(self, path, invalidate_cache=True, include_dirs=False): """ Walk a path recursively and returns list of files and dirs(if parameter set) + Parameters ---------- - path : str or AzureDLPath + path: str or AzureDLPath Path to query - invalidate_cache : bool + invalidate_cache: bool Whether to invalidate cache - include_dirs : bool + include_dirs: bool Whether to include dirs in return value Returns @@ -269,13 +276,14 @@ def walk(self, path='', details=False, invalidate_cache=True): """ Get all files below given path + Parameters ---------- - path : str or AzureDLPath + path: str or AzureDLPath Path to query - details : bool + details: bool Whether to include file details - invalidate_cache : bool + invalidate_cache: bool Whether to invalidate cache Returns @@ -287,13 +295,14 @@ def glob(self, path, details=False, invalidate_cache=True): """ Find files (not directories) by glob-matching. + Parameters ---------- - path : str or AzureDLPath + path: str or AzureDLPath Path to query - details : bool + details: bool Whether to include file details - invalidate_cache : bool + invalidate_cache: bool Whether to invalidate cache Returns @@ -312,15 +321,16 @@ def du(self, path, total=False, deep=False, invalidate_cache=True): """ Bytes in keys at path + Parameters ---------- - path : str or AzureDLPath + path: str or AzureDLPath Path to query - total : bool + total: bool Return the sum on list - deep : bool + deep: bool Recursively enumerate or just use files under current dir - invalidate_cache : bool + invalidate_cache: bool Whether to invalidate cache Returns @@ -339,7 +349,8 @@ def df(self, path): """ Resource summary of path - Parameters + + Parameters ---------- path: str Path to query @@ -351,7 +362,7 @@ 'spaceConsumed': current_path_info['length'], 'spaceQuota': -1} else: all_files_and_dirs = self._walk(path, include_dirs=True) - dir_count = 1 # 1 as walk doesn't return current directory + dir_count = 1 # 1 as walk doesn't return current directory length = file_count = 0 for item in all_files_and_dirs: length += item['length'] @@ -403,10 +414,13 @@ parms = {} value_to_use = [x for x in valid_expire_types if x.lower() == expiry_option.lower()] if len(value_to_use) != 1: - raise ValueError('expiry_option must be one of: {}. Value given: {}'.format(valid_expire_types, expiry_option)) + raise ValueError( + 'expiry_option must be one of: {}. Value given: {}'.format(valid_expire_types, expiry_option)) if value_to_use[0] != ExpiryOptionType.never_expire.value and not expire_time: - raise ValueError('expire_time must be specified if the expiry_option is not NeverExpire. Value of expiry_option: {}'.format(expiry_option)) + raise ValueError( + 'expire_time must be specified if the expiry_option is not NeverExpire. 
Value of expiry_option: {}'.format( + expiry_option)) path = AzureDLPath(path).trim() parms['expiryOption'] = value_to_use[0] @@ -465,7 +479,8 @@ Specifies whether to set ACLs recursively or not """ if recursive: - multi_processor_change_acl(adl=self, path=path, method_name="set_acl", acl_spec=acl_spec, number_of_sub_process=number_of_sub_process) + multi_processor_change_acl(adl=self, path=path, method_name="set_acl", acl_spec=acl_spec, + number_of_sub_process=number_of_sub_process) else: self._acl_call('SETACL', path, acl_spec, invalidate_cache=True) @@ -488,7 +503,8 @@ Specifies whether to modify ACLs recursively or not """ if recursive: - multi_processor_change_acl(adl=self, path=path, method_name="mod_acl", acl_spec=acl_spec, number_of_sub_process=number_of_sub_process) + multi_processor_change_acl(adl=self, path=path, method_name="mod_acl", acl_spec=acl_spec, + number_of_sub_process=number_of_sub_process) else: self._acl_call('MODIFYACLENTRIES', path, acl_spec, invalidate_cache=True) @@ -512,7 +528,8 @@ Specifies whether to remove ACLs recursively or not """ if recursive: - multi_processor_change_acl(adl=self, path=path, method_name="rem_acl", acl_spec=acl_spec, number_of_sub_process=number_of_sub_process) + multi_processor_change_acl(adl=self, path=path, method_name="rem_acl", acl_spec=acl_spec, + number_of_sub_process=number_of_sub_process) else: self._acl_call('REMOVEACLENTRIES', path, acl_spec, invalidate_cache=True) @@ -585,12 +602,14 @@ def exists(self, path, invalidate_cache=True): """ Does such a file/directory exist? + Parameters ---------- - path : str or AzureDLPath + path: str or AzureDLPath Path to query - invalidate_cache : bool + invalidate_cache: bool Whether to invalidate cache + Returns ------- True or false depending on whether the path exists. 
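The recursive branches above route through multi_processor_change_acl(), which fans the ACL requests out over sub-processes sized by number_of_sub_process. A short sketch of the corresponding public calls, reusing the adl object from the earlier sketch; the paths, object ID and ACL spec strings are made-up examples, not values from this commit:

    # Hypothetical ACL spec in the usual "<scope>:<qualifier>:<permissions>" form.
    acl_spec = 'user::rwx,group::r-x,other::---,user:0123abcd-0000-0000-0000-000000000000:r-x'

    # recursive=True applies the spec to the whole tree via the sub-process pool.
    adl.set_acl('/data/raw', acl_spec, recursive=True, number_of_sub_process=4)

    # Same pattern for modifying or removing individual entries on a tree.
    adl.modify_acl_entries('/data/raw',
                           'user:0123abcd-0000-0000-0000-000000000000:rwx',
                           recursive=True)
    adl.remove_acl_entries('/data/raw',
                           'user:0123abcd-0000-0000-0000-000000000000',
                           recursive=True)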
@@ -604,10 +623,12 @@ def cat(self, path): """ Return contents of file + Parameters ---------- - path : str or AzureDLPath + path: str or AzureDLPath Path to query + Returns ------- Contents of file @@ -618,11 +639,12 @@ def tail(self, path, size=1024): """ Return last bytes of file + Parameters ---------- - path : str or AzureDLPath + path: str or AzureDLPath Path to query - size : int + size: int How many bytes to return Returns @@ -639,11 +661,12 @@ def head(self, path, size=1024): """ Return first bytes of file + Parameters ---------- - path : str or AzureDLPath + path: str or AzureDLPath Path to query - size : int + size: int How many bytes to return Returns @@ -656,12 +679,14 @@ def get(self, path, filename): """ Stream data from file at path to local filename + Parameters ---------- - path : str or AzureDLPath + path: str or AzureDLPath ADL Path to read - filename : str or Path + filename: str or Path Local file path to write to + Returns ------- None @@ -677,14 +702,16 @@ def put(self, filename, path, delimiter=None): """ Stream data from local filename to file at path + Parameters ---------- - filename : str or Path + filename: str or Path Local file path to read from - path : str or AzureDLPath + path: str or AzureDLPath ADL Path to write to - delimiter : + delimiter: Optional delimeter for delimiter-ended blocks + Returns ------- None @@ -700,10 +727,12 @@ def mkdir(self, path): """ Make new directory + Parameters ---------- - path : str or AzureDLPath + path: str or AzureDLPath Path to create directory + Returns ------- None @@ -716,10 +745,12 @@ def rmdir(self, path): """ Remove empty directory + Parameters ---------- - path : str or AzureDLPath + path: str or AzureDLPath Directory path to remove + Returns ------- None @@ -734,11 +765,12 @@ def mv(self, path1, path2): """ Move file between locations on ADL + Parameters ---------- - path1 : + path1: Source Path - path2 : + path2: Destination path Returns @@ -758,12 +790,12 @@ Parameters ---------- - outfile : path + outfile: path The file which will be concatenated to. If it already exists, the extra pieces will be appended. - filelist : list of paths + filelist: list of paths Existing adl files to concatenate, in order - delete_source : bool (False) + delete_source: bool (False) If True, assume that the paths to concatenate exist alone in a directory, and delete that whole directory when done. @@ -776,16 +808,18 @@ sourceList = [AzureDLPath(f).as_posix() for f in filelist] sources = {} sources["sources"] = sourceList + self.azure.call('MSCONCAT', outfile.as_posix(), - data=bytearray(json.dumps(sources,separators=(',', ':')), encoding="utf-8"), + data=bytearray(json.dumps(sources, separators=(',', ':')), encoding="utf-8"), deleteSourceDirectory=delete, - headers={'Content-Type': "application/json"}) + headers={'Content-Type': "application/json"}, + retry_policy=NoRetryPolicy()) self.invalidate_cache(outfile) merge = concat def cp(self, path1, path2): - """ Copy file between locations on ADL """ + """ Not implemented. Copy file between locations on ADL """ # TODO: any implementation for this without download? raise NotImplementedError @@ -795,9 +829,9 @@ Parameters ---------- - path : str or AzureDLPath + path: str or AzureDLPath The location to remove. - recursive : bool (True) + recursive: bool (True) Whether to remove also all entries below, i.e., which are returned by `walk()`. 
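The concat hunk above is the code behind the 0.0.46 note "Concat will not retry by default": the MSCONCAT request is now sent with retry_policy=NoRetryPolicy(), so a failed concat surfaces to the caller instead of being retried. A small usage sketch with hypothetical paths, reusing the adl object from the earlier sketches:

    pieces = ['/staging/part-0000.csv',
              '/staging/part-0001.csv',
              '/staging/part-0002.csv']

    # Concatenate the pieces into one file; delete_source=False is the default.
    adl.concat('/curated/full.csv', pieces)

    # delete_source=True assumes the pieces sit alone in one directory and
    # removes that directory afterwards; 'merge' is an alias for 'concat'.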
@@ -818,10 +852,12 @@ def invalidate_cache(self, path=None): """ Remove entry from object file-cache + Parameters ---------- - path : str or AzureDLPath + path: str or AzureDLPath Remove the path from object file-cache + Returns ------- None @@ -840,8 +876,9 @@ Parameters ---------- - path : str or AzureDLPath + path: str or AzureDLPath Path of file to create + Returns ------- None @@ -911,14 +948,14 @@ Parameters ---------- - azure : azure connection - path : AzureDLPath + azure: azure connection + path: AzureDLPath location of file - mode : str {'wb', 'rb', 'ab'} - blocksize : int - Size of the write or read-ahead buffer. For writing, will be + mode: str {'wb', 'rb', 'ab'} + blocksize: int + Size of the write or read-ahead buffer. For writing(and appending, will be truncated to 4MB (2**22). - delimiter : bytes or None + delimiter: bytes or None If specified and in write mode, each flush will send data terminating on this bytestring, potentially leaving some data in the buffer. @@ -933,7 +970,7 @@ AzureDLFileSystem.open: used to create AzureDLFile objects """ - def __init__(self, azure, path, mode='rb', blocksize=2**25, + def __init__(self, azure, path, mode='rb', blocksize=2 ** 25, delimiter=None): self.mode = mode if mode not in {'rb', 'wb', 'ab'}: @@ -949,26 +986,45 @@ self.trim = True self.buffer = io.BytesIO() self.blocksize = blocksize - self.first_write = True uniqueid = str(uuid.uuid4()) self.filesessionid = uniqueid self.leaseid = uniqueid # always invalidate the cache when checking for existence of a file # that may be created or written to (for the first time). - exists = self.azure.exists(path, invalidate_cache=True) + try: + file_data = self.azure.info(path, invalidate_cache=True, expected_error_code=404) + exists = True + except FileNotFoundError: + exists = False # cannot create a new file object out of a directory - if exists and self.info()['type'] == 'DIRECTORY': - raise IOError('path: {} is a directory, not a file, and cannot be opened for reading or writing'.format(path)) + if exists and file_data['type'] == 'DIRECTORY': + raise IOError( + 'path: {} is a directory, not a file, and cannot be opened for reading or writing'.format(path)) + + if mode == 'ab' or mode == 'wb': + self.blocksize = min(2 ** 22, blocksize) if mode == 'ab' and exists: - self.loc = self.info()['length'] - self.first_write = False - elif mode == 'rb': - self.size = self.info()['length'] - else: - self.blocksize = min(2**22, blocksize) + self.loc = file_data['length'] + elif (mode == 'ab' and not exists) or (mode == 'wb'): + # Create the file + _put_data_with_retry( + rest=self.azure.azure, + op='CREATE', + path=self.path.as_posix(), + data=None, + overwrite='true', + write='true', + syncFlag='DATA', + leaseid=self.leaseid, + filesessionid=self.filesessionid) + logger.debug('Created file %s ' % self.path) + else: # mode == 'rb': + if not exists: + raise FileNotFoundError(path.as_posix()) + self.size = file_data['length'] def info(self): """ File information about this path """ @@ -983,9 +1039,9 @@ Parameters ---------- - loc : int + loc: int byte location - whence : {0, 1, 2} + whence: {0, 1, 2} from start of file, current location or end of file, resp. 
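The rewritten AzureDLFile constructor above implements the 0.0.45 note on open/close semantics: a single info() call replaces the old exists()+info() pair, 'wb'/'ab' now issue the CREATE request up front, the write buffer is capped at 4 MB (2**22), and 'rb' on a missing path raises FileNotFoundError immediately. A sketch of the user-facing effect, with hypothetical paths and the adl object from the earlier sketches:

    with adl.open('/demo/hello.txt', 'wb') as f:
        f.write(b'hello, data lake\n')   # buffered locally, sent in <=4 MB appends

    with adl.open('/demo/hello.txt', 'rb') as f:
        print(f.read())

    try:
        adl.open('/demo/missing.txt', 'rb')
    except IOError:   # FileNotFoundError (a subclass of IOError) per the new 'rb' path
        print('no such file')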
""" if not self.mode == 'rb': @@ -1026,9 +1082,10 @@ found = self.cache[self.loc - self.start:].find(b'\n') + 1 if found: - partialLine = self.cache[self.loc-self.start: min(self.loc-self.start+found, self.loc-self.start+length)] + partialLine = self.cache[ + self.loc - self.start: min(self.loc - self.start + found, self.loc - self.start + length)] else: - partialLine = self.cache[self.loc-self.start:] + partialLine = self.cache[self.loc - self.start:] self.loc += len(partialLine) line += partialLine @@ -1064,9 +1121,12 @@ Parameters ---------- - offset : int (-1) + offset: int (-1) offset from where to read; if <0, last read location or beginning of file. - :return: + + Returns + ------- + None """ if offset < 0: offset = self.loc @@ -1087,7 +1147,7 @@ Parameters ---------- - length : int (-1) + length: int (-1) Number of bytes to read; if <0, all remaining bytes. """ if self.mode != 'rb': @@ -1101,7 +1161,7 @@ while length > 0: self._read_blocksize() data_read = self.cache[self.loc - self.start: - min(self.loc - self.start + length, self.end - self.start)] + min(self.loc - self.start + length, self.end - self.start)] if not data_read: # Check to catch possible server errors. Ideally shouldn't happen. flag += 1 if flag >= 5: @@ -1121,12 +1181,16 @@ def readinto(self, b): """ Reads data into buffer b - Returns number of bytes read. + Parameters ---------- - b : bytearray + b: bytearray Buffer to which bytes are read into + + Returns + ------- + Returns number of bytes read. """ temp = self.read(len(b)) b[:len(temp)] = temp @@ -1141,7 +1205,7 @@ Parameters ---------- - data : bytes + data: bytes Set of bytes to be written. """ if self.mode not in {'wb', 'ab'}: @@ -1154,7 +1218,6 @@ self.flush(syncFlag='DATA') return out - def flush(self, syncFlag='METADATA', force=False): """ Write buffered data to ADL. 
@@ -1172,94 +1235,39 @@ return if not (syncFlag == 'METADATA' or syncFlag == 'DATA' or syncFlag == 'CLOSE'): - raise ValueError('syncFlag must be one of these: METADAT, DATA or CLOSE') - + raise ValueError('syncFlag must be one of these: METADATA, DATA or CLOSE') - if self.buffer.tell() == 0: - if force and self.first_write: - _put_data_with_retry( - self.azure.azure, - 'CREATE', - path=self.path.as_posix(), - data=None, - overwrite='true', - write='true', - syncFlag=syncFlag, - leaseid=self.leaseid, - filesessionid=self.filesessionid) - self.first_write = False - return - - self.buffer.seek(0) + common_args_append = { + 'rest': self.azure.azure, + 'op': 'APPEND', + 'path': self.path.as_posix(), + 'append': 'true', + 'leaseid': self.leaseid, + 'filesessionid': self.filesessionid + } + self.buffer.seek(0) # Go to start of buffer data = self.buffer.read() - syncFlagLocal = 'DATA' while len(data) > self.blocksize: + data_to_write_limit = self.blocksize if self.delimiter: - place = data[:self.blocksize].rfind(self.delimiter) - else: - place = -1 - if place < 0: - # not found - write whole block - limit = self.blocksize - else: - limit = place + len(self.delimiter) - if self.first_write: - _put_data_with_retry( - self.azure.azure, - 'CREATE', - path=self.path.as_posix(), - data=data[:limit], - overwrite='true', - write='true', - syncFlag=syncFlagLocal, - leaseid=self.leaseid, - filesessionid=self.filesessionid) - self.first_write = False - else: - _put_data_with_retry( - self.azure.azure, - 'APPEND', - path=self.path.as_posix(), - data=data[:limit], - append='true', - syncFlag=syncFlagLocal, - leaseid=self.leaseid, - filesessionid=self.filesessionid) - logger.debug('Wrote %d bytes to %s' % (limit, self)) - data = data[limit:] - - - self.buffer = io.BytesIO(data) - self.buffer.seek(0, 2) + delimiter_index = data.rfind(self.delimiter, 0, self.blocksize) + if delimiter_index != -1: # delimiter found + data_to_write_limit = delimiter_index + len(self.delimiter) + + offset = self.tell() - len(data) + _put_data_with_retry(syncFlag='DATA', data=data[:data_to_write_limit], offset=offset, **common_args_append) + logger.debug('Wrote %d bytes to %s' % (data_to_write_limit, self)) + data = data[data_to_write_limit:] if force: - zero_offset = self.tell() - len(data) - if self.first_write: - _put_data_with_retry( - self.azure.azure, - 'CREATE', - path=self.path.as_posix(), - data=data, - overwrite='true', - write='true', - syncFlag=syncFlag, - leaseid=self.leaseid, - filesessionid=self.filesessionid) - self.first_write = False - else: - _put_data_with_retry( - self.azure.azure, - 'APPEND', - path=self.path.as_posix(), - data=data, - offset=zero_offset, - append='true', - syncFlag=syncFlag, - leaseid=self.leaseid, - filesessionid=self.filesessionid) + offset = self.tell() - len(data) + _put_data_with_retry(syncFlag=syncFlag, data=data, offset=offset, **common_args_append) logger.debug('Wrote %d bytes to %s' % (len(data), self)) - self.buffer = io.BytesIO() + data = b'' + + self.buffer = io.BytesIO(data) + self.buffer.seek(0, 2) # seek to end for other writes to buffer def close(self): """ Close file @@ -1303,13 +1311,14 @@ # if the caller gives a bad start/end combination, OPEN will throw and # this call will bubble it up return rest.call( - 'OPEN', path, offset=start, length=end-start, read='true', stream=stream, retry_policy=retry_policy, **kwargs) + 'OPEN', path, offset=start, length=end - start, read='true', stream=stream, retry_policy=retry_policy, **kwargs) def _fetch_range_with_retry(rest, path, start, 
end, stream=False, retries=10, delay=0.01, backoff=3, **kwargs): err = None - retry_policy = ExponentialRetryPolicy(max_retries=retries, exponential_retry_interval=delay, exponential_factor=backoff) + retry_policy = ExponentialRetryPolicy(max_retries=retries, exponential_retry_interval=delay, + exponential_factor=backoff) try: return _fetch_range(rest, path, start, end, stream=False, retry_policy=retry_policy, **kwargs) except Exception as e: @@ -1352,8 +1361,6 @@ rest.log_response_and_raise(None, exception) - - class AzureDLPath(type(pathlib.PurePath())): """ Subclass of native object-oriented filesystem path. @@ -1367,7 +1374,7 @@ Parameters ---------- - path : AzureDLPath or string + path: AzureDLPath or string location of file or directory Examples diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/azure-datalake-store-0.0.44/azure/datalake/store/lib.py new/azure-datalake-store-0.0.47/azure/datalake/store/lib.py --- old/azure-datalake-store-0.0.44/azure/datalake/store/lib.py 2019-03-07 00:13:59.000000000 +0100 +++ new/azure-datalake-store-0.0.47/azure/datalake/store/lib.py 2019-08-15 03:34:18.000000000 +0200 @@ -80,23 +80,23 @@ Parameters ---------- - tenant_id : str + tenant_id: str associated with the user's subscription, or "common" - username : str + username: str active directory user - password : str + password: str sign-in password - client_id : str + client_id: str the service principal client - client_secret : str + client_secret: str the secret associated with the client_id - resource : str + resource: str resource for auth (e.g., https://datalake.azure.net/) - require_2fa : bool + require_2fa: bool indicates this authentication attempt requires two-factor authentication authority: string The full URI of the authentication authority to authenticate against (such as https://login.microsoftonline.com/) - kwargs : key/values + kwargs: key/values Other parameters, for future use Returns @@ -225,6 +225,8 @@ The API version to target with requests. Changing this value will change the behavior of the requests, and can cause unexpected behavior or breaking changes. Changes to this value should be undergone with caution. + req_timeout_s: float(60) + This is the timeout for each requests library call. kwargs: optional arguments to auth See ``auth()``. Includes, e.g., username, password, tenant; will pull values from environment variables if not provided. @@ -256,7 +258,7 @@ } def __init__(self, store_name=default_store, token=None, - url_suffix=default_adls_suffix, api_version='2018-09-01', **kwargs): + url_suffix=default_adls_suffix, api_version='2018-09-01', req_timeout_s=60, **kwargs): # in the case where an empty string is passed for the url suffix, it must be replaced with the default. 
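The retry plumbing above shows the constructor arguments ExponentialRetryPolicy actually takes. A short sketch mirroring the defaults _fetch_range_with_retry builds, plus the NoRetryPolicy now used for concat:

    from azure.datalake.store.retry import ExponentialRetryPolicy, NoRetryPolicy

    # Same defaults as _fetch_range_with_retry: up to 10 attempts,
    # 0.01 s initial interval, exponential back-off factor of 3.
    read_policy = ExponentialRetryPolicy(max_retries=10,
                                         exponential_retry_interval=0.01,
                                         exponential_factor=3)

    # NoRetryPolicy never asks for another attempt; both classes expose
    # should_retry(response, last_exception, retry_count) for the request loop.
    no_retry = NoRetryPolicy()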
url_suffix = url_suffix or default_adls_suffix self.local = threading.local() @@ -278,6 +280,7 @@ platform.platform(), __name__, __version__) + self.req_timeout_s = req_timeout_s @property def session(self): @@ -309,7 +312,8 @@ op, path, " ".join(["{}={}".format(key, params[key]) for key in params])) msg += "\n".join(["{}: {}".format(header, headers[header]) - for header in headers]) + for header in headers if header != 'Authorization']) + msg += "\nAuthorization header length:" + str(len(headers['Authorization'])) if retry_count > 0: msg += "retry-count:{}".format(retry_count) logger.debug(msg) @@ -474,7 +478,7 @@ req_headers['User-Agent'] = self.user_agent req_headers.update(headers) self._log_request(method, url, op, urllib.quote(path), kwargs, req_headers, retry_count) - return func(url, params=params, headers=req_headers, data=data, stream=stream) + return func(url, params=params, headers=req_headers, data=data, stream=stream, timeout=self.req_timeout_s) def __getstate__(self): state = self.__dict__.copy() diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/azure-datalake-store-0.0.44/azure/datalake/store/multithread.py new/azure-datalake-store-0.0.47/azure/datalake/store/multithread.py --- old/azure-datalake-store-0.0.44/azure/datalake/store/multithread.py 2019-03-07 00:13:59.000000000 +0100 +++ new/azure-datalake-store-0.0.47/azure/datalake/store/multithread.py 2019-08-15 03:34:18.000000000 +0200 @@ -468,7 +468,7 @@ Returns ------- - A dictionary of upload instances. The hashes are auto- + A dictionary of upload instances. The hashes are auto generated unique. The state of the chunks completed, errored, etc., can be seen in the status attribute. Instances can be resumed with ``run()``. diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/azure-datalake-store-0.0.44/azure/datalake/store/retry.py new/azure-datalake-store-0.0.47/azure/datalake/store/retry.py --- old/azure-datalake-store-0.0.44/azure/datalake/store/retry.py 2019-03-07 00:13:59.000000000 +0100 +++ new/azure-datalake-store-0.0.47/azure/datalake/store/retry.py 2019-08-15 03:34:18.000000000 +0200 @@ -100,8 +100,8 @@ response = response_from_adal_exception(last_exception) if hasattr(last_exception, 'response'): # HTTP exception i.e 429 response = last_exception.response - - request_successful = last_exception is None or response.status_code == 401 # 401 = Invalid credentials + + request_successful = last_exception is None or (response is not None and response.status_code == 401) # 401 = Invalid credentials if request_successful or not retry_policy.should_retry(response, last_exception, retry_count): break if last_exception is not None: diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/azure-datalake-store-0.0.44/azure/datalake/store/transfer.py new/azure-datalake-store-0.0.47/azure/datalake/store/transfer.py --- old/azure-datalake-store-0.0.44/azure/datalake/store/transfer.py 2019-03-07 00:13:59.000000000 +0100 +++ new/azure-datalake-store-0.0.47/azure/datalake/store/transfer.py 2019-08-15 03:34:18.000000000 +0200 @@ -42,7 +42,7 @@ Parameters ---------- - states : list of valid states + states: list of valid states Managed objects can only use these defined states. 
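The _log_request hunk above is the fix behind "Remove logging of bearer token": every header except Authorization is still written to the debug log, and only the token's length is recorded. A standalone illustration of that pattern; this helper is not the library function, just the same idea:

    import logging

    logger = logging.getLogger('azure.datalake.store')

    def log_headers_without_token(headers):
        """Illustrative only: log all headers except Authorization, then
        record just the length of the Authorization value."""
        msg = "\n".join("{}: {}".format(name, value)
                        for name, value in headers.items()
                        if name != 'Authorization')
        if 'Authorization' in headers:
            msg += "\nAuthorization header length:" + str(len(headers['Authorization']))
        logger.debug(msg)

    log_headers_without_token({'Authorization': 'Bearer <token>',
                               'x-ms-client-request-id': 'abc'})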
Examples diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/azure-datalake-store-0.0.44/azure_datalake_store.egg-info/PKG-INFO new/azure-datalake-store-0.0.47/azure_datalake_store.egg-info/PKG-INFO --- old/azure-datalake-store-0.0.44/azure_datalake_store.egg-info/PKG-INFO 2019-03-07 00:24:10.000000000 +0100 +++ new/azure-datalake-store-0.0.47/azure_datalake_store.egg-info/PKG-INFO 2019-08-15 03:46:55.000000000 +0200 @@ -1,6 +1,6 @@ Metadata-Version: 1.1 Name: azure-datalake-store -Version: 0.0.44 +Version: 0.0.47 Summary: Azure Data Lake Store Filesystem Client Library for Python Home-page: https://github.com/Azure/azure-data-lake-store-python Author: Microsoft Corporation @@ -247,6 +247,22 @@ Release History =============== + 0.0.47 (2019-08-14) + +++++++++++++++++++ + * Remove logging of bearer token + * Documentation related changes(Add readme.md and correct some formatting) + + 0.0.46 (2019-06-25) + +++++++++++++++++++ + * Expose per request timeout. Default to 60. + * Concat will not retry by default. + * Bug fixes. + + 0.0.45 (2019-05-10) + +++++++++++++++++++ + * Update open and close ADLFile semantics + * Refactor code and improve performance for opening a file + 0.0.44 (2019-03-05) +++++++++++++++++++ * Add continuation token to LISTSTATUS api call diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/azure-datalake-store-0.0.44/docs/source/api.rst new/azure-datalake-store-0.0.47/docs/source/api.rst --- old/azure-datalake-store-0.0.44/docs/source/api.rst 2019-03-07 00:13:59.000000000 +0100 +++ new/azure-datalake-store-0.0.47/docs/source/api.rst 2019-08-15 03:34:18.000000000 +0200 @@ -5,21 +5,43 @@ .. autosummary:: AzureDLFileSystem + AzureDLFileSystem.access AzureDLFileSystem.cat + AzureDLFileSystem.chmod + AzureDLFileSystem.chown + AzureDLFileSystem.concat + AzureDLFileSystem.cp + AzureDLFileSystem.df AzureDLFileSystem.du AzureDLFileSystem.exists AzureDLFileSystem.get + AzureDLFileSystem.get_acl_status AzureDLFileSystem.glob + AzureDLFileSystem.head AzureDLFileSystem.info + AzureDLFileSystem.listdir AzureDLFileSystem.ls + AzureDLFileSystem.merge AzureDLFileSystem.mkdir + AzureDLFileSystem.modify_acl_entries AzureDLFileSystem.mv AzureDLFileSystem.open AzureDLFileSystem.put AzureDLFileSystem.read_block + AzureDLFileSystem.remove + AzureDLFileSystem.remove_acl + AzureDLFileSystem.remove_acl_entries + AzureDLFileSystem.remove_default_acl + AzureDLFileSystem.rename AzureDLFileSystem.rm + AzureDLFileSystem.rmdir + AzureDLFileSystem.set_acl + AzureDLFileSystem.set_expiry + AzureDLFileSystem.stat AzureDLFileSystem.tail AzureDLFileSystem.touch + AzureDLFileSystem.unlink + AzureDLFileSystem.walk .. autosummary:: AzureDLFile @@ -44,11 +66,11 @@ .. currentmodule:: azure.datalake.store.multithread -.. autoclass:: AzureDLFile - :members: - .. autoclass:: ADLUploader :members: .. autoclass:: ADLDownloader :members: + +.. currentmodule:: azure.datalake.store.lib +.. 
autofunction:: auth \ No newline at end of file diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/azure-datalake-store-0.0.44/docs/source/index.rst new/azure-datalake-store-0.0.47/docs/source/index.rst --- old/azure-datalake-store-0.0.44/docs/source/index.rst 2019-03-07 00:13:59.000000000 +0100 +++ new/azure-datalake-store-0.0.47/docs/source/index.rst 2019-08-15 03:34:18.000000000 +0200 @@ -18,7 +18,7 @@ * Download the repo from https://github.com/Azure/azure-data-lake-store-python * checkout the ``dev`` branch -* install the requirememnts (``pip install -r dev_requirements.txt``) +* install the requirements (``pip install -r dev_requirements.txt``) * install in develop mode (``python setup.py develop``) * optionally: build the documentation (including this page) by running ``make html`` in the docs directory.