Hello community, here is the log from the commit of package python-csvkit for openSUSE:Factory checked in at 2019-03-26 15:44:11 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Comparing /work/SRC/openSUSE:Factory/python-csvkit (Old) and /work/SRC/openSUSE:Factory/.python-csvkit.new.25356 (New) ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-csvkit" Tue Mar 26 15:44:11 2019 rev:6 rq:688143 version:1.0.4 Changes: -------- --- /work/SRC/openSUSE:Factory/python-csvkit/python-csvkit.changes 2019-02-27 15:10:10.146373292 +0100 +++ /work/SRC/openSUSE:Factory/.python-csvkit.new.25356/python-csvkit.changes 2019-03-26 15:44:29.356145108 +0100 @@ -1,0 +2,11 @@ +Mon Mar 25 09:07:40 UTC 2019 - Tomáš Chvátal <[email protected]> + +- Update to 1.0.4: + * Dropped Python 3.3 support (end-of-life was September 29, 2017). + * :doc:`/scripts/csvsql` adds a --chunk-size option to set the chunk size when batch inserting into a table. + * csvkit is now tested against Python 3.7. + * Dates and datetimes without punctuation can be parsed with --date-format and --datetime-format. + * Error messages about column indices use 1-based numbering unless --zero is set. +- Remove merged patch remove-unittest2.patch + +------------------------------------------------------------------- Old: ---- csvkit-1.0.3.tar.gz remove-unittest2.patch New: ---- csvkit-1.0.4.tar.gz ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Other differences: ------------------ ++++++ python-csvkit.spec ++++++ --- /var/tmp/diff_new_pack.xikxWi/_old 2019-03-26 15:44:30.408143876 +0100 +++ /var/tmp/diff_new_pack.xikxWi/_new 2019-03-26 15:44:30.408143876 +0100 @@ -18,15 +18,13 @@ %{?!python_module:%define python_module() python-%{**} python3-%{**}} Name: python-csvkit -Version: 1.0.3 +Version: 1.0.4 Release: 0 Summary: A library of utilities for working with CSV License: MIT Group: Development/Languages/Python Url: https://github.com/wireservice/csvkit Source: https://files.pythonhosted.org/packages/source/c/csvkit/csvkit-%{version}.tar.gz -# PATCH-FIX-UPSTREAM https://github.com/wireservice/csvkit/pull/979 [email protected] -Patch0: remove-unittest2.patch BuildRequires: %{python_module SQLAlchemy >= 0.9.3} BuildRequires: %{python_module Sphinx >= 1.0.7} BuildRequires: %{python_module aenum} @@ -60,7 +58,6 @@ %prep %setup -q -n csvkit-%{version} -%autopatch -p1 # find and remove unneeded shebangs find csvkit -name "*.py" | xargs sed -i '1 {/^#!/ d}' @@ -73,7 +70,7 @@ %check export LANG=en_US.UTF-8 -%python_expand nosetests-%{$python_bin_suffix} +%python_expand nosetests-%{$python_bin_suffix} -v %files %python_files %license COPYING ++++++ csvkit-1.0.3.tar.gz -> csvkit-1.0.4.tar.gz ++++++ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/AUTHORS.rst new/csvkit-1.0.4/AUTHORS.rst --- old/csvkit-1.0.3/AUTHORS.rst 2018-03-11 15:51:56.000000000 +0100 +++ new/csvkit-1.0.4/AUTHORS.rst 2018-11-21 19:27:32.000000000 +0100 @@ -85,3 +85,8 @@ * Forest Gregg * Aliaksei Urbanski * Reid Beels +* Rodrigo Lemos +* Victor Noagbodji +* Connor McArthur +* Matěj Cepl +* Nicholas Matteo diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/CHANGELOG.rst new/csvkit-1.0.4/CHANGELOG.rst --- old/csvkit-1.0.3/CHANGELOG.rst 2018-03-11 16:11:38.000000000 +0100 +++ new/csvkit-1.0.4/CHANGELOG.rst 2019-03-16 17:25:56.000000000 +0100 @@ -1,3 +1,28 @@ +1.0.4 - March 16, 2019 +---------------------- + +Changes: + +* Dropped Python 3.3 support (end-of-life was September 29, 2017). + +Improvements: + +* :doc:`/scripts/csvsql` adds a :code:`--chunk-size` option to set the chunk size when batch inserting into a table. +* csvkit is now tested against Python 3.7. + +Fixes: + +* :code:`--names` works with :code:`--skip-lines`.
+* Dates and datetimes without punctuation can be parsed with :code:`--date-format` and :code:`--datetime-format`. +* Error messages about column indices use 1-based numbering unless :code:`--zero` is set. +* :doc:`/scripts/csvcut` no longer errors on :code:`--delete-empty-rows` with short rows. +* :doc:`/scripts/csvjoin` no longer errors if given a single file. +* :doc:`/scripts/csvsql` supports UPDATE commands. +* :doc:`/scripts/csvstat` no longer errors on non-finite numbers. +* :doc:`/scripts/csvstat` respects all command-line arguments when :code:`--count` is set. +* :doc:`/scripts/in2csv` CSV-to-CSV conversion respects :code:`--linenumbers` when buffering. +* :doc:`/scripts/in2csv` writes XLS sheets without encoding errors in Python 2. + 1.0.3 - March 11, 2018 ---------------------- diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/PKG-INFO new/csvkit-1.0.4/PKG-INFO --- old/csvkit-1.0.3/PKG-INFO 2018-03-11 16:17:00.000000000 +0100 +++ new/csvkit-1.0.4/PKG-INFO 2019-03-16 17:27:11.000000000 +0100 @@ -1,19 +1,16 @@ Metadata-Version: 1.1 Name: csvkit -Version: 1.0.3 +Version: 1.0.4 Summary: A suite of command-line tools for working with CSV, the king of tabular file formats. Home-page: http://csvkit.rtfd.org/ Author: Christopher Groskopf Author-email: [email protected] License: MIT +Description-Content-Type: UNKNOWN Description: .. image:: https://secure.travis-ci.org/wireservice/csvkit.svg :target: https://travis-ci.org/wireservice/csvkit :alt: Build Status - .. image:: https://gemnasium.com/wireservice/csvkit.svg - :target: https://gemnasium.com/wireservice/csvkit - :alt: Dependency Status - .. image:: https://coveralls.io/repos/wireservice/csvkit/badge.svg?branch=master :target: https://coveralls.io/r/wireservice/csvkit :alt: Coverage Status @@ -32,17 +29,14 @@ csvkit is a suite of command-line tools for converting to and working with CSV, the king of tabular file formats. - It is inspired by pdftk, gdal and the original csvcut tool by Joe Germuska and Aaron Bycoffe. - - If you need to do more complex data analysis than csvkit can handle, use `agate <https://github.com/wireservice/agate>`_. + It is inspired by pdftk, GDAL and the original csvcut tool by Joe Germuska and Aaron Bycoffe.
Important links: + * Documentation: http://csvkit.rtfd.org/ * Repository: https://github.com/wireservice/csvkit * Issues: https://github.com/wireservice/csvkit/issues - * Documentation: http://csvkit.rtfd.org/ * Schemas: https://github.com/wireservice/ffs - * Buildbot: https://travis-ci.org/wireservice/csvkit Platform: UNKNOWN Classifier: Development Status :: 5 - Production/Stable @@ -55,10 +49,10 @@ Classifier: Operating System :: OS Independent Classifier: Programming Language :: Python Classifier: Programming Language :: Python :: 2.7 -Classifier: Programming Language :: Python :: 3.3 Classifier: Programming Language :: Python :: 3.4 Classifier: Programming Language :: Python :: 3.5 Classifier: Programming Language :: Python :: 3.6 +Classifier: Programming Language :: Python :: 3.7 Classifier: Programming Language :: Python :: Implementation :: CPython Classifier: Programming Language :: Python :: Implementation :: PyPy Classifier: Topic :: Scientific/Engineering :: Information Analysis diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/README.rst new/csvkit-1.0.4/README.rst --- old/csvkit-1.0.3/README.rst 2018-03-11 15:51:56.000000000 +0100 +++ new/csvkit-1.0.4/README.rst 2018-09-20 19:09:31.000000000 +0200 @@ -2,10 +2,6 @@ :target: https://travis-ci.org/wireservice/csvkit :alt: Build Status -.. image:: https://gemnasium.com/wireservice/csvkit.svg - :target: https://gemnasium.com/wireservice/csvkit - :alt: Dependency Status - .. image:: https://coveralls.io/repos/wireservice/csvkit/badge.svg?branch=master :target: https://coveralls.io/r/wireservice/csvkit :alt: Coverage Status @@ -24,14 +20,11 @@ csvkit is a suite of command-line tools for converting to and working with CSV, the king of tabular file formats. -It is inspired by pdftk, gdal and the original csvcut tool by Joe Germuska and Aaron Bycoffe. - -If you need to do more complex data analysis than csvkit can handle, use `agate <https://github.com/wireservice/agate>`_. +It is inspired by pdftk, GDAL and the original csvcut tool by Joe Germuska and Aaron Bycoffe. Important links: +* Documentation: http://csvkit.rtfd.org/ * Repository: https://github.com/wireservice/csvkit * Issues: https://github.com/wireservice/csvkit/issues -* Documentation: http://csvkit.rtfd.org/ * Schemas: https://github.com/wireservice/ffs -* Buildbot: https://travis-ci.org/wireservice/csvkit diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/csvkit/cli.py new/csvkit-1.0.4/csvkit/cli.py --- old/csvkit-1.0.3/csvkit/cli.py 2018-03-11 15:51:56.000000000 +0100 +++ new/csvkit-1.0.4/csvkit/cli.py 2019-01-02 21:16:10.000000000 +0100 @@ -185,7 +185,7 @@ help='Specify that the input CSV file has no header row. Will create default headers (a,b,c,...).') if 'K' not in self.override_flags: self.argparser.add_argument('-K', '--skip-lines', dest='skip_lines', type=int, default=0, - help='Specify the number of initial lines to skip (e.g. comments, copyright notices, empty rows).') + help='Specify the number of initial lines to skip before the header row (e.g. 
comments, copyright notices, empty rows).') if 'v' not in self.override_flags: self.argparser.add_argument('-v', '--verbose', dest='verbose', action='store_true', help='Print detailed tracebacks when errors occur.') @@ -200,7 +200,7 @@ self.argparser.add_argument('--zero', dest='zero_based', action='store_true', help='When interpreting or displaying column numbers, use zero-based numbering instead of the default 1-based numbering.') - self.argparser.add_argument('-V', '--version', action='version', version='%(prog)s 1.0.3', + self.argparser.add_argument('-V', '--version', action='version', version='%(prog)s 1.0.4', help='Display version information and exit.') def _open_input_file(self, path): @@ -299,10 +299,11 @@ if not self.args.no_inference: types = [ agate.Boolean(**type_kwargs), - agate.Number(locale=self.args.locale, **type_kwargs), agate.TimeDelta(**type_kwargs), agate.Date(date_format=self.args.date_format, **type_kwargs), agate.DateTime(datetime_format=self.args.datetime_format, **type_kwargs), + # This is a different order than agate's default, in order to parse dates like "20010101". + agate.Number(locale=self.args.locale, **type_kwargs), ] + types return agate.TypeTester(types=types) @@ -354,19 +355,16 @@ if getattr(self.args, 'no_header_row', None): raise RequiredHeaderError('You cannot use --no-header-row with the -n or --names options.') - f = self.input_file - output = self.output_file - if getattr(self.args, 'zero_based', None): start = 0 else: start = 1 - rows = agate.csv.reader(f, **self.reader_kwargs) + rows = agate.csv.reader(self.skip_lines(), **self.reader_kwargs) column_names = next(rows) for i, c in enumerate(column_names, start): - output.write('%3i: %s\n' % (i, c)) + self.output_file.write('%3i: %s\n' % (i, c)) def additional_input_expected(self): return sys.stdin.isatty() and not self.args.input_path @@ -396,11 +394,11 @@ # Fail out if index is 0-based if c < 0: - raise ColumnIdentifierError("Column 0 is invalid. Columns are 1-based.") + raise ColumnIdentifierError("Column %i is invalid. Columns are 1-based." % (c + column_offset)) # Fail out if index is out of range if c >= len(column_names): - raise ColumnIdentifierError("Column %i is invalid. The last column is '%s' at index %i." % (c, column_names[-1], len(column_names) - 1)) + raise ColumnIdentifierError("Column %i is invalid. The last column is '%s' at index %i." % (c + column_offset, column_names[-1], len(column_names) - 1 + column_offset)) return c diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/csvkit/grep.py new/csvkit-1.0.4/csvkit/grep.py --- old/csvkit-1.0.3/csvkit/grep.py 2017-06-15 00:10:26.000000000 +0200 +++ new/csvkit-1.0.4/csvkit/grep.py 2019-02-23 17:57:27.000000000 +0100 @@ -6,7 +6,7 @@ class FilteringCSVReader(six.Iterator): - """ + r""" Given any row iterator, only return rows which pass the filter. If 'header' is False, then all rows must pass the filter; by default, the first row will be passed through untested. 
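The reordering in the cli.py hunk above places agate.Number after the Date and DateTime testers, so that punctuation-free values such as "20010101" are tried as dates before a number tester can claim them. A minimal sketch of the same idea using agate directly (the file name and format strings are illustrative assumptions, not taken from csvkit's tests)::

    import agate

    # Mirror the reordered tester list from cli.py: Date/DateTime before
    # Number, so "20010101" parses as the date 2001-01-01 rather than as
    # the integer 20010101.
    tester = agate.TypeTester(types=[
        agate.Boolean(),
        agate.TimeDelta(),
        agate.Date(date_format='%Y%m%d'),
        agate.DateTime(datetime_format='%Y%m%d%H%M%S'),
        agate.Number(),
    ])

    # 'dates.csv' is a hypothetical file with a punctuation-free date column.
    table = agate.Table.from_csv('dates.csv', column_types=tester)
    print(table.column_types)

With agate's default ordering, Number would win for such values; moving it last is what lets --date-format and --datetime-format handle dates without punctuation.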
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/csvkit/utilities/csvcut.py new/csvkit-1.0.4/csvkit/utilities/csvcut.py --- old/csvkit-1.0.3/csvkit/utilities/csvcut.py 2018-03-11 15:51:56.000000000 +0100 +++ new/csvkit-1.0.4/csvkit/utilities/csvcut.py 2018-11-07 19:21:45.000000000 +0100 @@ -46,7 +46,7 @@ for row in rows: out_row = [row[column_id] if column_id < len(row) else None for column_id in column_ids] - if not self.args.delete_empty or ''.join(out_row): + if not self.args.delete_empty or any(out_row): output.writerow(out_row) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/csvkit/utilities/csvjoin.py new/csvkit-1.0.4/csvkit/utilities/csvjoin.py --- old/csvkit-1.0.3/csvkit/utilities/csvjoin.py 2018-03-11 15:51:56.000000000 +0100 +++ new/csvkit-1.0.4/csvkit/utilities/csvjoin.py 2019-01-02 21:16:10.000000000 +0100 @@ -33,8 +33,8 @@ for path in self.args.input_paths: self.input_files.append(self._open_input_file(path)) - if len(self.input_files) < 2: - self.argparser.error('You must specify at least two files to join.') + if len(self.input_files) < 1: + self.argparser.error('You must specify at least one file to join.') if self.args.columns: join_column_names = self._parse_join_column_names(self.args.columns) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/csvkit/utilities/csvjson.py new/csvkit-1.0.4/csvkit/utilities/csvjson.py --- old/csvkit-1.0.3/csvkit/utilities/csvjson.py 2018-03-11 15:51:56.000000000 +0100 +++ new/csvkit-1.0.4/csvkit/utilities/csvjson.py 2018-11-07 18:55:45.000000000 +0100 @@ -98,10 +98,7 @@ self.stream.write("\n") def can_stream(self): - return (self.args.streamOutput - and self.args.no_inference - and not self.args.skip_lines - and self.args.sniff_limit == 0) + return self.args.streamOutput and self.args.no_inference and self.args.sniff_limit == 0 and not self.args.skip_lines def is_geo(self): return self.args.lat and self.args.lon @@ -141,19 +138,13 @@ def output_geojson(self): table = self.read_csv_to_table() - geojson_generator = self.GeoJsonGenerator(self.args, - table.column_names) + geojson_generator = self.GeoJsonGenerator(self.args, table.column_names) if self.args.streamOutput: for row in table.rows: - self.dump_json( - geojson_generator.feature_for_row(row), - newline=True - ) + self.dump_json(geojson_generator.feature_for_row(row), newline=True) else: - self.dump_json( - geojson_generator.generate_feature_collection(table) - ) + self.dump_json(geojson_generator.generate_feature_collection(table)) def streaming_output_ndjson(self): rows = agate.csv.reader(self.input_file, **self.reader_kwargs) @@ -174,51 +165,31 @@ geojson_generator = self.GeoJsonGenerator(self.args, column_names) for row in rows: - self.dump_json(geojson_generator.feature_for_row(row), - newline=True) + self.dump_json(geojson_generator.feature_for_row(row), newline=True) class GeoJsonGenerator: def __init__(self, args, column_names): self.args = args self.column_names = column_names - self.lat_column = None - self.lon_column = None - self.type_column = None - self.geometry_column = None - self.id_column = None - - self.lat_column = match_column_identifier( - column_names, - self.args.lat, - self.args.zero_based - ) - - self.lon_column = match_column_identifier( - column_names, - self.args.lon, - self.args.zero_based - ) + + self.lat_column = match_column_identifier(column_names, self.args.lat, self.args.zero_based) 
+ + self.lon_column = match_column_identifier(column_names, self.args.lon, self.args.zero_based) if self.args.type: - self.type_column = match_column_identifier( - column_names, - self.args.type, - self.args.zero_based - ) + self.type_column = match_column_identifier(column_names, self.args.type, self.args.zero_based) + else: + self.type_column = None if self.args.geometry: - self.geometry_column = match_column_identifier( - column_names, - self.args.geometry, - self.args.zero_based - ) + self.geometry_column = match_column_identifier(column_names, self.args.geometry, self.args.zero_based) + else: + self.geometry_column = None if self.args.key: - self.id_column = match_column_identifier( - column_names, - self.args.key, - self.args.zero_based - ) + self.id_column = match_column_identifier(column_names, self.args.key, self.args.zero_based) + else: + self.id_column = None def generate_feature_collection(self, table): features = [] @@ -241,34 +212,24 @@ items.insert(1, ('bbox', bounds.bbox())) if self.args.crs: - items.append( - ( - 'crs', - OrderedDict([ - ('type', 'name'), - ('properties', { - 'name': self.args.crs - }) - ]) - )) + items.append(('crs', OrderedDict([ + ('type', 'name'), + ('properties', { + 'name': self.args.crs, + }), + ]))) return OrderedDict(items) def feature_for_row(self, row): feature = OrderedDict([ ('type', 'Feature'), - ('properties', OrderedDict()) + ('properties', OrderedDict()), ]) for i, c in enumerate(row): # Prevent "type" or geo fields from being added to properties. - if ( - c is None or - i == self.type_column or - i == self.lat_column or - i == self.lon_column or - i == self.geometry_column - ): + if c is None or i in (self.type_column, self.lat_column, self.lon_column, self.geometry_column): continue elif i == self.id_column: feature['id'] = c @@ -297,7 +258,7 @@ if lon and lat: return OrderedDict([ ('type', 'Point'), - ('coordinates', [lon, lat]) + ('coordinates', [lon, lat]), ]) class GeoJsonBounds: @@ -311,10 +272,7 @@ return [self.min_lon, self.min_lat, self.max_lon, self.max_lat] def add_feature(self, feature): - if ( - 'geometry' in feature and - 'coordinates' in feature['geometry'] - ): + if 'geometry' in feature and 'coordinates' in feature['geometry']: self.update_coordinates(feature['geometry']['coordinates']) def update_lat(self, lat): @@ -330,10 +288,7 @@ self.max_lon = lon def update_coordinates(self, coordinates): - if ( - len(coordinates) <= 3 and - isinstance(coordinates[0], (float, int)) - ): + if len(coordinates) <= 3 and isinstance(coordinates[0], (float, int)): self.update_lon(coordinates[0]) self.update_lat(coordinates[1]) else: diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/csvkit/utilities/csvsql.py new/csvkit-1.0.4/csvkit/utilities/csvsql.py --- old/csvkit-1.0.3/csvkit/utilities/csvsql.py 2018-03-11 15:51:56.000000000 +0100 +++ new/csvkit-1.0.4/csvkit/utilities/csvsql.py 2018-10-09 20:25:24.000000000 +0200 @@ -46,13 +46,15 @@ self.argparser.add_argument('--create-if-not-exists', dest='create_if_not_exists', action='store_true', help='Create table if it does not exist, otherwise keep going. Only valid when --insert is specified.') self.argparser.add_argument('--overwrite', dest='overwrite', action='store_true', - help='Drop the table before creating. Only valid when --insert is specified.') + help='Drop the table before creating. 
Only valid when --insert is specified and --no-create is not specified.') self.argparser.add_argument('--db-schema', dest='db_schema', help='Optional name of database schema to create table(s) in.') self.argparser.add_argument('-y', '--snifflimit', dest='sniff_limit', type=int, help='Limit CSV dialect sniffing to the specified number of bytes. Specify "0" to disable sniffing entirely.') self.argparser.add_argument('-I', '--no-inference', dest='no_inference', action='store_true', help='Disable type inference when parsing the input.') + self.argparser.add_argument('--chunk-size', dest='chunk_size', type=int, + help='Chunk size for batch insert into the table. Only valid when --insert is specified.') def main(self): if sys.stdin.isatty() and not self.args.input_paths: @@ -84,10 +86,14 @@ self.argparser.error('The --create-if-not-exists option is only valid if --insert is also specified.') if self.args.overwrite and not self.args.insert: self.argparser.error('The --overwrite option is only valid if --insert is also specified.') + if self.args.overwrite and self.args.no_create: + self.argparser.error('The --overwrite option is only valid if --no-create is not specified.') if self.args.before_insert and not self.args.insert: - self.argparser.error('The --before_insert option is only valid if --insert is also specified.') + self.argparser.error('The --before-insert option is only valid if --insert is also specified.') if self.args.after_insert and not self.args.insert: - self.argparser.error('The --after_insert option is only valid if --insert is also specified.') + self.argparser.error('The --after-insert option is only valid if --insert is also specified.') + if self.args.chunk_size and not self.args.insert: + self.argparser.error('The --chunk-size option is only valid if --insert is also specified.') if self.args.no_create and self.args.create_if_not_exists: self.argparser.error('The --no-create and --create-if-not-exists options are mutually exclusive.') @@ -101,7 +107,7 @@ try: engine = create_engine(self.args.connection_string) except ImportError: - raise ImportError('You don\'t appear to have the necessary database backend installed for connection string you\'re trying to use. Available backends include:\n\nPostgresql:\tpip install psycopg2\nMySQL:\t\tpip install MySQL-python\n\nFor details on connection strings and other backends, please see the SQLAlchemy documentation on dialects at: \n\nhttp://www.sqlalchemy.org/docs/dialects/\n\n') + raise ImportError('You don\'t appear to have the necessary database backend installed for connection string you\'re trying to use. 
Available backends include:\n\nPostgresql:\tpip install psycopg2\nMySQL:\t\tpip install mysql-connector-python\n\nFor details on connection strings and other backends, please see the SQLAlchemy documentation on dialects at: \n\nhttp://www.sqlalchemy.org/docs/dialects/\n\n') self.connection = engine.connect() @@ -164,7 +170,8 @@ prefixes=self.args.prefix, db_schema=self.args.db_schema, constraints=not self.args.no_constraints, - unique_constraint=self.unique_constraint + unique_constraint=self.unique_constraint, + chunk_size=self.args.chunk_size ) if self.args.after_insert: @@ -200,10 +207,11 @@ rows = self.connection.execute(q) # Output the result of the last query as CSV - output = agate.csv.writer(self.output_file, **self.writer_kwargs) - output.writerow(rows._metadata.keys) - for row in rows: - output.writerow(row) + if rows.returns_rows: + output = agate.csv.writer(self.output_file, **self.writer_kwargs) + output.writerow(rows._metadata.keys) + for row in rows: + output.writerow(row) transaction.commit() diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/csvkit/utilities/csvstat.py new/csvkit-1.0.4/csvkit/utilities/csvstat.py --- old/csvkit-1.0.3/csvkit/utilities/csvstat.py 2018-03-11 15:51:56.000000000 +0100 +++ new/csvkit-1.0.4/csvkit/utilities/csvstat.py 2018-11-21 21:42:24.000000000 +0100 @@ -124,7 +124,7 @@ self.output_file = codecs.getwriter('utf-8')(self.output_file) if self.args.count_only: - count = len(list(agate.csv.reader(self.input_file))) + count = len(list(agate.csv.reader(self.skip_lines(), **self.reader_kwargs))) if not self.args.no_header_row: count -= 1 @@ -171,6 +171,9 @@ else: self.print_stats(table, column_ids, stats) + def is_finite_decimal(self, value): + return isinstance(value, Decimal) and value.is_finite() + def print_one(self, table, column_id, operation, label=True, **kwargs): """ Print data for a single statistic. 
@@ -190,7 +193,7 @@ op = OPERATIONS[op_name]['aggregation'] stat = table.aggregate(op(column_id)) - if isinstance(stat, Decimal): + if self.is_finite_decimal(stat): stat = format_decimal(stat, locale=agate.config.get_option('default_locale')) except: stat = None @@ -224,7 +227,7 @@ op = op_data['aggregation'] v = table.aggregate(op(column_id)) - if isinstance(v, Decimal): + if self.is_finite_decimal(v): v = format_decimal(v, locale=agate.config.get_option('default_locale')) stats[op_name] = v @@ -268,7 +271,7 @@ if isinstance(column.data_type, agate.Number): v = row[column_name] - if isinstance(v, Decimal): + if self.is_finite_decimal(v): v = format_decimal(v, locale=agate.config.get_option('default_locale')) else: v = six.text_type(row[column_name]) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/csvkit/utilities/in2csv.py new/csvkit-1.0.4/csvkit/utilities/in2csv.py --- old/csvkit-1.0.3/csvkit/utilities/in2csv.py 2018-03-11 15:51:56.000000000 +0100 +++ new/csvkit-1.0.4/csvkit/utilities/in2csv.py 2019-01-02 19:03:46.000000000 +0100 @@ -62,11 +62,14 @@ else: return open(path, 'rb') - def sheet_names(self, filetype): + def sheet_names(self, path, filetype): + input_file = self.open_excel_input_file(path) if filetype == 'xls': - return xlrd.open_workbook(file_contents=self.input_file.read()).sheet_names() + sheet_names = xlrd.open_workbook(file_contents=input_file.read()).sheet_names() elif filetype == 'xlsx': - return openpyxl.load_workbook(self.input_file, read_only=True, data_only=True).sheetnames + sheet_names = openpyxl.load_workbook(input_file, read_only=True, data_only=True).sheetnames + input_file.close() + return sheet_names def main(self): path = self.args.input_path @@ -87,22 +90,21 @@ if not filetype: self.argparser.error('Unable to automatically determine the format of the input file. Try specifying a format with --format.') - # Set the input file. - if filetype in ('xls', 'xlsx'): - self.input_file = self.open_excel_input_file(path) - else: - self.input_file = self._open_input_file(path) - if self.args.names_only: - sheets = self.sheet_names(filetype) + sheets = self.sheet_names(path, filetype) if sheets: for sheet in sheets: self.output_file.write('%s\n' % sheet) else: self.argparser.error('You cannot use the -n or --names options with non-Excel files.') - self.input_file.close() return + # Set the input file. + if filetype in ('xls', 'xlsx'): + self.input_file = self.open_excel_input_file(path) + else: + self.input_file = self._open_input_file(path) + # Set the reader's arguments. kwargs = {} @@ -148,7 +150,7 @@ if not hasattr(self.input_file, 'name'): raise ValueError('DBF files can not be converted from stdin. You must pass a filename.') table = agate.Table.from_dbf(self.input_file.name, **kwargs) - table.to_csv(self.output_file) + table.to_csv(self.output_file, **self.writer_kwargs) if self.args.write_sheets: # Close and re-open the file, as the file object has been mutated or closed. 
@@ -157,19 +159,19 @@ self.input_file = self.open_excel_input_file(path) if self.args.write_sheets == '-': - sheets = self.sheet_names(filetype) + sheets = self.sheet_names(path, filetype) else: sheets = [int(sheet) if sheet.isdigit() else sheet for sheet in self.args.write_sheets.split(',')] if filetype == 'xls': - tables = agate.Table.from_xls(self.input_file, sheet=sheets, **kwargs) + tables = agate.Table.from_xls(self.input_file, sheet=sheets, encoding_override=self.args.encoding_xls, **kwargs) elif filetype == 'xlsx': tables = agate.Table.from_xlsx(self.input_file, sheet=sheets, **kwargs) base = splitext(self.input_file.name)[0] for i, table in enumerate(tables.values()): with open('%s_%d.csv' % (base, i), 'w') as f: - table.to_csv(f) + table.to_csv(f, **self.writer_kwargs) self.input_file.close() diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/csvkit/utilities/sql2csv.py new/csvkit-1.0.4/csvkit/utilities/sql2csv.py --- old/csvkit-1.0.3/csvkit/utilities/sql2csv.py 2018-03-11 15:51:56.000000000 +0100 +++ new/csvkit-1.0.4/csvkit/utilities/sql2csv.py 2018-09-20 19:09:31.000000000 +0200 @@ -44,8 +44,8 @@ except ImportError: raise ImportError("You don't appear to have the necessary database backend installed for connection " "string you're trying to use. Available backends include:\n\nPostgreSQL:\tpip install " - "psycopg2\nMySQL:\t\tpip install MySQL-python\n\nFor details on connection strings and " - "other backends, please see the SQLAlchemy documentation on dialects at:\n\n" + "psycopg2\nMySQL:\t\tpip install mysql-connector-python\n\nFor details on connection " + "strings and other backends, please see the SQLAlchemy documentation on dialects at:\n\n" "http://www.sqlalchemy.org/docs/dialects/\n\n") connection = engine.connect() diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/csvkit.egg-info/PKG-INFO new/csvkit-1.0.4/csvkit.egg-info/PKG-INFO --- old/csvkit-1.0.3/csvkit.egg-info/PKG-INFO 2018-03-11 16:16:59.000000000 +0100 +++ new/csvkit-1.0.4/csvkit.egg-info/PKG-INFO 2019-03-16 17:27:10.000000000 +0100 @@ -1,19 +1,16 @@ Metadata-Version: 1.1 Name: csvkit -Version: 1.0.3 +Version: 1.0.4 Summary: A suite of command-line tools for working with CSV, the king of tabular file formats. Home-page: http://csvkit.rtfd.org/ Author: Christopher Groskopf Author-email: [email protected] License: MIT +Description-Content-Type: UNKNOWN Description: .. image:: https://secure.travis-ci.org/wireservice/csvkit.svg :target: https://travis-ci.org/wireservice/csvkit :alt: Build Status - .. image:: https://gemnasium.com/wireservice/csvkit.svg - :target: https://gemnasium.com/wireservice/csvkit - :alt: Dependency Status - .. image:: https://coveralls.io/repos/wireservice/csvkit/badge.svg?branch=master :target: https://coveralls.io/r/wireservice/csvkit :alt: Coverage Status @@ -32,17 +29,14 @@ csvkit is a suite of command-line tools for converting to and working with CSV, the king of tabular file formats. - It is inspired by pdftk, gdal and the original csvcut tool by Joe Germuska and Aaron Bycoffe. - - If you need to do more complex data analysis than csvkit can handle, use `agate <https://github.com/wireservice/agate>`_. + It is inspired by pdftk, GDAL and the original csvcut tool by Joe Germuska and Aaron Bycoffe. 
Important links: + * Documentation: http://csvkit.rtfd.org/ * Repository: https://github.com/wireservice/csvkit * Issues: https://github.com/wireservice/csvkit/issues - * Documentation: http://csvkit.rtfd.org/ * Schemas: https://github.com/wireservice/ffs - * Buildbot: https://travis-ci.org/wireservice/csvkit Platform: UNKNOWN Classifier: Development Status :: 5 - Production/Stable @@ -55,10 +49,10 @@ Classifier: Operating System :: OS Independent Classifier: Programming Language :: Python Classifier: Programming Language :: Python :: 2.7 -Classifier: Programming Language :: Python :: 3.3 Classifier: Programming Language :: Python :: 3.4 Classifier: Programming Language :: Python :: 3.5 Classifier: Programming Language :: Python :: 3.6 +Classifier: Programming Language :: Python :: 3.7 Classifier: Programming Language :: Python :: Implementation :: CPython Classifier: Programming Language :: Python :: Implementation :: PyPy Classifier: Topic :: Scientific/Engineering :: Information Analysis diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/csvkit.egg-info/SOURCES.txt new/csvkit-1.0.4/csvkit.egg-info/SOURCES.txt --- old/csvkit-1.0.3/csvkit.egg-info/SOURCES.txt 2018-03-11 16:16:59.000000000 +0100 +++ new/csvkit-1.0.4/csvkit.egg-info/SOURCES.txt 2019-03-16 17:27:10.000000000 +0100 @@ -109,6 +109,7 @@ examples/test_literal_order.csv examples/test_locale.csv examples/test_locale_converted.csv +examples/test_numeric_date_format.csv examples/test_query.sql examples/test_skip_lines.csv examples/test_skip_lines.xls diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/docs/cli.rst new/csvkit-1.0.4/docs/cli.rst --- old/csvkit-1.0.3/docs/cli.rst 2017-06-15 00:10:26.000000000 +0200 +++ new/csvkit-1.0.4/docs/cli.rst 2018-09-20 19:09:31.000000000 +0200 @@ -26,6 +26,15 @@ scripts/csvsort scripts/csvstack +To transpose CSVs, consider `csvtool <http://colin.maudry.com/csvtool-manual-page/>`_. Install ``csvtool`` on Linux using your package manager, or on macOS using:: + + brew install ocaml + opam install csv + ln -s ~/.opam/system/bin/csvtool /usr/local/bin/ + csvtool --help + +To run ``sed``-like commands on CSV files, consider `csvsed <https://github.com/metagriffin/csvsed>`_. + Output and Analysis =================== @@ -39,7 +48,7 @@ scripts/csvsql scripts/csvstat -To diff CSVs, consider `daff <http://paulfitz.github.io/daff/>`_. An alternative to :doc:`csvsql` is `q <https://github.com/harelba/q>`_. +To diff CSVs, consider `daff <https://github.com/paulfitz/daff>`_. An alternative to :doc:`csvsql` is `q <https://github.com/harelba/q>`_. Common arguments ================ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/docs/common_arguments.rst new/csvkit-1.0.4/docs/common_arguments.rst --- old/csvkit-1.0.3/docs/common_arguments.rst 2018-03-11 15:51:56.000000000 +0100 +++ new/csvkit-1.0.4/docs/common_arguments.rst 2019-01-02 21:16:10.000000000 +0100 @@ -2,7 +2,7 @@ Arguments common to all tools ============================= -All tools which accept CSV as input share a set of common command-line arguments:: +csvkit's tools share a set of common command-line arguments. Not every argument is supported by every tool, so please check which are supported by the tool you are using with the :code:`--help` flag:: -d DELIMITER, --delimiter DELIMITER Delimiting character of the input CSV file. 
@@ -37,10 +37,11 @@ Specify a strptime datetime format string like "%m/%d/%Y %I:%M %p". -H, --no-header-row Specify that the input CSV file has no header row. - Will create default headers (A,B,C,...). + Will create default headers (a,b,c,...). -K SKIP_LINES, --skip-lines SKIP_LINES - Specify the number of initial lines to skip (e.g. - comments, copyright notices, empty rows). + Specify the number of initial lines to skip before the + header row (e.g. comments, copyright notices, empty + rows). -v, --verbose Print detailed tracebacks when errors occur. -l, --linenumbers Insert a column of line numbers at the front of the output. Useful when piping to grep or as a simple @@ -50,7 +51,10 @@ numbering. -V, --version Display version information and exit. -These arguments may be used to override csvkit's default "smart" parsing of CSV files. This is frequently necessary if the input file uses a particularly unusual style of quoting or is an encoding that is not compatible with utf-8. Not every command is supported by every tool, but the majority of them are. +These arguments can be used to override csvkit's default "smart" parsing of CSV files. This may be necessary, for example, if the input file uses a particularly unusual quoting style or has an encoding that is incompatible with UTF-8. -Note that the output of csvkit's tools is always formatted with "default" formatting options. This means that when executing multiple csvkit commands (either with a pipe or via intermediary files) it is only ever necessary to specify formatting arguments the first time. (And doing so for subsequent commands will likely cause them to fail.) +For example, to disable CSV sniffing, set :code:`--snifflimit 0` and then, if necessary, set the :code:`--delimiter` and :code:`--quotechar` options yourself. To disable type inference, add the :code:`--no-inference` flag. +The output of csvkit's tools is always formatted with "default" formatting options. This means that when executing multiple csvkit commands (either with a pipe or through intermediary files) it is only ever necessary to specify these arguments the first time (and doing so for subsequent commands will likely cause them to fail). + +See the documentation of :doc:`/scripts/csvclean` for a description of the default formatting options. diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/docs/conf.py new/csvkit-1.0.4/docs/conf.py --- old/csvkit-1.0.3/docs/conf.py 2017-06-15 00:10:26.000000000 +0200 +++ new/csvkit-1.0.4/docs/conf.py 2018-09-20 19:09:31.000000000 +0200 @@ -41,9 +41,9 @@ # built documents. # # The short X.Y version. -version = '1.0.3' +version = '1.0.4' # The full version, including alpha/beta/rc tags. -release = '1.0.3' +release = '1.0.4' # List of patterns, relative to source directory, that match files and # directories to ignore when looking for source files. 
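The rewritten common_arguments.rst above recommends pairing :code:`--snifflimit 0` with explicit :code:`--delimiter` and :code:`--quotechar` settings. As the cli.py diff earlier shows, such options travel to agate.csv.reader as reader_kwargs; a rough sketch of that path, assuming a hypothetical semicolon-delimited input file::

    import agate

    # Equivalent of `--snifflimit 0 --delimiter ";" --quotechar '"'`:
    # no sniffing, the dialect is handed straight to the reader.
    with open('data.csv', newline='') as f:
        for row in agate.csv.reader(f, delimiter=';', quotechar='"'):
            print(row)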
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/docs/contributing.rst new/csvkit-1.0.4/docs/contributing.rst --- old/csvkit-1.0.3/docs/contributing.rst 2018-03-11 15:51:56.000000000 +0100 +++ new/csvkit-1.0.4/docs/contributing.rst 2018-11-07 18:54:23.000000000 +0100 @@ -87,13 +87,13 @@ Currently, the following tools buffer: * :doc:`/scripts/csvjoin` -* :doc:`/scripts/csvjson` unless ``--no-inference --stream --snifflimit 0`` is set and ``--skip-lines`` isn't set +* :doc:`/scripts/csvjson` unless :code:`--stream --no-inference --snifflimit 0` is set and :code:`--skip-lines` isn't set * :doc:`/scripts/csvlook` * :doc:`/scripts/csvpy` * :doc:`/scripts/csvsort` * :doc:`/scripts/csvsql` * :doc:`/scripts/csvstat` -* :doc:`/scripts/in2csv` unless ``--format ndjson --no-inference`` is set, or unless ``--format csv --no-inference --snifflimit 0`` is set and ``--no-header-row`` and ``--skip-lines`` aren't set +* :doc:`/scripts/in2csv` unless :code:`--format ndjson --no-inference` is set, or unless :code:`--format csv --no-inference --snifflimit 0` is set and :code:`--no-header-row` and :code:`--skip-lines` aren't set Legalese ======== diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/docs/index.rst new/csvkit-1.0.4/docs/index.rst --- old/csvkit-1.0.3/docs/index.rst 2017-06-15 00:10:26.000000000 +0200 +++ new/csvkit-1.0.4/docs/index.rst 2018-09-20 19:09:31.000000000 +0200 @@ -7,6 +7,24 @@ .. include:: ../README.rst +First time? See :doc:`tutorial`. + +.. note:: + + If you need to do more complex data analysis than csvkit can handle, use `agate <https://github.com/wireservice/agate>`_. + +.. note:: + + To change the field separator, line terminator, etc. of the **output**, you must use :doc:`/scripts/csvformat`. + +.. note:: + + csvkit, by default, `sniffs <https://docs.python.org/3.5/library/csv.html#csv.Sniffer>`_ CSV formats (it deduces whether commas, tabs or spaces delimit fields, for example), and performs type inference (it converts text to numbers, dates, booleans, etc.). These features are useful and work well in most cases, but occasional errors occur. If you don't need these features, set :code:`--snifflimit 0` (:code:`-y 0`) and :code:`--no-inference` (:code:`-I`). + +.. note:: + + If you need csvkit to be faster or to handle larger files, you may be reaching the limits of csvkit. + Why csvkit? 
=========== diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/docs/scripts/csvclean.rst new/csvkit-1.0.4/docs/scripts/csvclean.rst --- old/csvkit-1.0.3/docs/scripts/csvclean.rst 2018-03-11 15:51:56.000000000 +0100 +++ new/csvkit-1.0.4/docs/scripts/csvclean.rst 2018-09-20 19:09:31.000000000 +0200 @@ -14,7 +14,7 @@ * removes optional quote characters, unless the `--quoting` (`-u`) option is set to change this behavior * changes the field delimiter to a comma, if the input delimiter is set with the `--delimiter` (`-d`) or `--tabs` (`-t`) options -* changes the record delimiter to a line feed +* changes the record delimiter to a line feed (LF or ``\n``) * changes the quote character to a double-quotation mark, if the character is set with the `--quotechar` (`-q`) option * changes the character encoding to UTF-8, if the input encoding is set with the `--encoding` (`-e`) option @@ -47,3 +47,7 @@ Line 1: Expected 3 columns, found 4 columns Line 2: Expected 3 columns, found 2 columns + +To change the line ending from line feed (LF or ``\n``) to carriage return and line feed (CRLF or ``\r\n``) use:: + + csvformat -M $'\r\n' examples/dummy.csv diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/docs/scripts/csvcut.rst new/csvkit-1.0.4/docs/scripts/csvcut.rst --- old/csvkit-1.0.3/docs/scripts/csvcut.rst 2018-03-11 15:51:56.000000000 +0100 +++ new/csvkit-1.0.4/docs/scripts/csvcut.rst 2018-11-08 04:18:33.000000000 +0100 @@ -41,6 +41,10 @@ csvcut does not implement row filtering, for this you should pipe data to :doc:`csvgrep`. +.. note:: + + If a data row is longer than the header row, its additional columns are truncated. Use :doc:`csvclean` first to fix such rows. + Examples ======== @@ -71,7 +75,6 @@ Post-Vietnam Era Veteran's Educational Assistance Program TOTAL - Extract the first and third columns:: csvcut -c 1,3 examples/realdata/FY09_EDU_Recipients_by_State.csv @@ -93,3 +96,6 @@ csvcut -c 1 examples/realdata/FY09_EDU_Recipients_by_State.csv | sed 1d | sort | uniq +Or:: + + csvcut -c 1 examples/realdata/FY09_EDU_Recipients_by_State.csv | csvsql --query 'SELECT DISTINCT("State Name") FROM stdin' diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/docs/scripts/csvgrep.rst new/csvkit-1.0.4/docs/scripts/csvgrep.rst --- old/csvkit-1.0.3/docs/scripts/csvgrep.rst 2018-03-11 15:51:56.000000000 +0100 +++ new/csvkit-1.0.4/docs/scripts/csvgrep.rst 2018-09-20 19:11:36.000000000 +0200 @@ -59,3 +59,7 @@ Search for rows that do not contain an empty state cell:: csvgrep -c 1 -r "^$" -i examples/realdata/FY09_EDU_Recipients_by_State.csv + +Perform a case-insensitive search:: + + csvgrep -c 1 -r "(?i)illinois" examples/realdata/FY09_EDU_Recipients_by_State.csv diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/docs/scripts/csvjson.rst new/csvkit-1.0.4/docs/scripts/csvjson.rst --- old/csvkit-1.0.3/docs/scripts/csvjson.rst 2018-03-11 15:51:56.000000000 +0100 +++ new/csvkit-1.0.4/docs/scripts/csvjson.rst 2018-09-20 19:09:31.000000000 +0200 @@ -37,9 +37,16 @@ --lon LON A column index or name containing a longitude. Output will be GeoJSON instead of JSON. Only valid if --lat is also specified. + --type TYPE A column index or name containing a GeoJSON type. + Output will be GeoJSON instead of JSON. Only valid if + --lat and --lon are also specified. 
+ --geometry GEOMETRY A column index or name containing a GeoJSON geometry. + Output will be GeoJSON instead of JSON. Only valid if + --lat and --lon are also specified. --crs CRS A coordinate reference system string to be included with GeoJSON output. Only valid if --lat and --lon are also specified. + --no-bbox Disable the calculation of a bounding box. --stream Output JSON as a stream of newline-separated objects, rather than as an array. -y SNIFF_LIMIT, --snifflimit SNIFF_LIMIT diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/docs/scripts/csvsql.rst new/csvkit-1.0.4/docs/scripts/csvsql.rst --- old/csvkit-1.0.3/docs/scripts/csvsql.rst 2018-03-11 15:51:56.000000000 +0100 +++ new/csvkit-1.0.4/docs/scripts/csvsql.rst 2019-02-23 19:26:21.000000000 +0100 @@ -12,12 +12,12 @@ [-S] [--blanks] [--date-format DATE_FORMAT] [--datetime-format DATETIME_FORMAT] [-H] [-K SKIP_LINES] [-v] [-l] [--zero] [-V] - [-i {firebird,mssql,mysql,oracle,postgresql,sqlite,sybase}] + [-i {firebird,mssql,mysql,oracle,postgresql,sqlite,sybase,crate}] [--db CONNECTION_STRING] [--query QUERY] [--insert] [--prefix PREFIX] [--tables TABLE_NAMES] [--no-constraints] [--unique-constraint UNIQUE_CONSTRAINT] [--no-create] [--create-if-not-exists] [--overwrite] [--db-schema DB_SCHEMA] - [-y SNIFF_LIMIT] [-I] + [-y SNIFF_LIMIT] [-I] [--chunk-size NUM] [FILE [FILE ...]] Generate SQL statements for one or more CSV files, or execute those statements @@ -29,7 +29,7 @@ optional arguments: -h, --help show this help message and exit - -i {firebird,mssql,mysql,oracle,postgresql,sqlite,sybase}, --dialect {firebird,mssql,mysql,oracle,postgresql,sqlite,sybase} + -i {firebird,mssql,mysql,oracle,postgresql,sqlite,sybase,crate}, --dialect {firebird,mssql,mysql,oracle,postgresql,sqlite,sybase,crate} Dialect of SQL to generate. Only valid when --db is not specified. --db CONNECTION_STRING @@ -57,7 +57,8 @@ Create table if it does not exist, otherwise keep going. Only valid when --insert is specified. --overwrite Drop the table before creating. Only valid when - --insert is specified. + --insert is specified and --no-create is not + specified. --db-schema DB_SCHEMA Optional name of database schema to create table(s) in. @@ -65,7 +66,9 @@ Limit CSV dialect sniffing to the specified number of bytes. Specify "0" to disable sniffing entirely. -I, --no-inference Disable type inference when parsing the input. - + --chunk-size NUM Chunk size for batch insert into the table. Only valid when --insert is specified. See also: :doc:`../common_arguments`. @@ -76,7 +79,7 @@ .. note:: - Using the ``--query`` option may cause rounding (in Python 2) or introduce [Python floating point issues](https://docs.python.org/3.4/tutorial/floatingpoint.html) (in Python 3). + Using the :code:`--query` option may cause rounding (in Python 2) or introduce `Python floating point issues <https://docs.python.org/3.4/tutorial/floatingpoint.html>`_ (in Python 3). Examples ======== @@ -90,7 +93,7 @@ createdb test csvsql --db postgresql:///test --tables fy09 --insert examples/realdata/FY09_EDU_Recipients_by_State.csv -For large tables it may not be practical to process the entire table. One solution to this is to analyze a sample of the table.
In this case it can be useful to turn off length limits and null checks with the :code:`--no-constraints` option:: head -n 20 examples/realdata/FY09_EDU_Recipients_by_State.csv | csvsql --no-constraints --tables fy09 @@ -114,4 +117,8 @@ Concatenate two columns:: - csvsql --query "select a||b from 'dummy3'" examples/dummy3.csv + csvsql --query "select a || b from 'dummy3'" examples/dummy3.csv + +If a column contains null values, you must ``COALESCE`` the column:: + + csvsql --query "select a || COALESCE(b, '') from 'sort_ints_nulls'" --no-inference examples/sort_ints_nulls.csv diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/docs/scripts/csvstack.rst new/csvkit-1.0.4/docs/scripts/csvstack.rst --- old/csvkit-1.0.3/docs/scripts/csvstack.rst 2018-03-11 15:51:56.000000000 +0100 +++ new/csvkit-1.0.4/docs/scripts/csvstack.rst 2018-09-20 19:09:31.000000000 +0200 @@ -37,7 +37,7 @@ .. warn:: - If you redirect output to an input file like ``csvstack file.csv > file.csv``, the file will grow indefinitely. + If you redirect output to an input file like :code:`csvstack file.csv > file.csv`, the file will grow indefinitely. Examples ======== diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/docs/scripts/in2csv.rst new/csvkit-1.0.4/docs/scripts/in2csv.rst --- old/csvkit-1.0.3/docs/scripts/in2csv.rst 2018-03-11 15:51:56.000000000 +0100 +++ new/csvkit-1.0.4/docs/scripts/in2csv.rst 2019-03-01 04:19:33.000000000 +0100 @@ -97,3 +97,7 @@ Convert a DBase DBF file to an equivalent CSV:: in2csv examples/testdbf.dbf + +This tool names unnamed headers. To avoid that behavior, run:: + + in2csv --no-header-row examples/test.xlsx | tail -n +2 diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/docs/tricks.rst new/csvkit-1.0.4/docs/tricks.rst --- old/csvkit-1.0.3/docs/tricks.rst 2017-06-15 00:10:26.000000000 +0200 +++ new/csvkit-1.0.4/docs/tricks.rst 2018-09-20 19:09:31.000000000 +0200 @@ -25,13 +25,13 @@ Specifying STDIN as a file -------------------------- -Most tools use ``STDIN`` as input if no filename is given, but tools that accept multiple inputs like :doc:`scripts/csvjoin` and :doc:`scripts/csvstack` don't. To use ``STDIN`` as an input to these tools, use ``-`` as the filename. For example, these three commands produce the same output:: +Most tools use ``STDIN`` as input if no filename is given, but tools that accept multiple inputs like :doc:`/scripts/csvjoin` and :doc:`/scripts/csvstack` don't. To use ``STDIN`` as an input to these tools, use ``-`` as the filename. For example, these three commands produce the same output:: csvstat examples/dummy.csv cat examples/dummy.csv | csvstat cat examples/dummy.csv | csvstat - -``csvstack`` can take a filename and ``STDIN`` as input, for example:: +:doc:`/scripts/csvstack` can take a filename and ``STDIN`` as input, for example:: cat examples/dummy.csv | csvstack examples/dummy3.csv - @@ -49,9 +49,14 @@ * Python 2.7+ * Python 3.3+ -* `PyPy <http://pypy.org/>`_ -It is tested on OS X, and has also been used on Linux and Windows. +It is tested on macOS, and has also been used on Linux and Windows. 
+ + If installing on macOS, you may need to install Homebrew first:: + + /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" + brew install python + pip install csvkit If installing on Ubuntu, you may need to install Python's development headers first:: @@ -63,7 +68,7 @@ pip install --upgrade setuptools pip install --upgrade csvkit -On OS X, if you see `OSError: [Errno 1] Operation not permitted`, try:: +On macOS, if you see ``OSError: [Errno 1] Operation not permitted``, try:: sudo pip install --ignore-installed csvkit @@ -81,7 +86,8 @@ * Are values appearing in incorrect columns? * Does the output combine multiple fields into a single column with double-quotes? * Does the output split a single field into multiple columns? -* Are `csvstat -c 1` and `csvstat --count` reporting inconsistent row counts? +* Are :code:`csvstat -c 1` and :code:`csvstat --count` reporting inconsistent row counts? +* Do you see ``Row # has # values, but Table only has # columns.``? These may be symptoms of CSV sniffing gone wrong. As there is no single, standard CSV format, csvkit uses Python's `csv.Sniffer <https://docs.python.org/3.5/library/csv.html#csv.Sniffer>`_ to deduce the format of a CSV file: that is, the field delimiter and quote character. By default, the entire file is sent for sniffing, which can be slow. You can send a small sample with the :code:`--snifflimit` option. If you're encountering any cases above, you can try setting :code:`--snifflimit 0` to disable sniffing and set the :code:`--delimiter` and :code:`--quotechar` options yourself. @@ -96,26 +102,26 @@ These may be symptoms of csvkit's type inference being too aggressive for your data. CSV is a text format, but it may contain text representing numbers, dates, booleans or other types. csvkit attempts to reverse engineer that text into proper data types—a process called "type inference". -For some data, type inference can be error prone. If necessary you can disable it with the To :code:`--no-inference` switch. This will force all columns to be treated as regular text. +For some data, type inference can be error prone. If necessary you can disable it with the :code:`--no-inference` switch. This will force all columns to be treated as regular text. Slow performance ---------------- -csvkit's tools fall into two categories: Those that load an entire CSV into memory (e.g. :doc:`/scripts/csvstat`) and those that only read data one row at a time (e.g. :doc:`/scripts/csvcut`). Those that stream results will generally be very fast. For those that buffer the entire file, the slowest part of that process is typically the "type inference" described in the previous section. +csvkit's tools fall into two categories: Those that load an entire CSV into memory (e.g. :doc:`/scripts/csvstat`) and those that only read data one row at a time (e.g. :doc:`/scripts/csvcut`). Those that stream results will generally be very fast. See :doc:`contributing` for a full list. For those that buffer the entire file, the slowest part of that process is typically the "type inference" described in the previous section. If a tool is too slow to be practical for your data, try setting the :code:`--snifflimit` option or using the :code:`--no-inference` flag. Database errors --------------- -Are you seeing this error message, even after running :code:`pip install psycopg2` or :code:`pip install MySQL-python`?
+Are you seeing this error message, even after running :code:`pip install psycopg2` or :code:`pip install mysql-connector-python`? :: You don't appear to have the necessary database backend installed for connection string you're trying to use. Available backends include: Postgresql: pip install psycopg2 - MySQL: pip install MySQL-python + MySQL: pip install mysql-connector-python For details on connection strings and other backends, please see the SQLAlchemy documentation on dialects at: diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/docs/tutorial/1_getting_started.rst new/csvkit-1.0.4/docs/tutorial/1_getting_started.rst --- old/csvkit-1.0.3/docs/tutorial/1_getting_started.rst 2017-06-15 00:10:26.000000000 +0200 +++ new/csvkit-1.0.4/docs/tutorial/1_getting_started.rst 2018-09-20 19:09:31.000000000 +0200 @@ -137,11 +137,11 @@ In addition to specifying filenames, all csvkit tools accept an input file via "standard in". This means that, using the ``|`` ("pipe") character we can use the output of one csvkit tool as the input of the next. -In the example above, the output of ``csvcut`` becomes the input to ``csvlook``. This also allow us to pipe output to standard Unix commands such as ``head``, which prints only the first ten lines of its input. Here, the output of ``csvlook`` becomes the input of ``head``. +In the example above, the output of :doc:`/scripts/csvcut` becomes the input to :doc:`/scripts/csvlook`. This also allows us to pipe output to standard Unix commands such as ``head``, which prints only the first ten lines of its input. Here, the output of :doc:`/scripts/csvlook` becomes the input of ``head``. Piping is a core feature of csvkit. Of course, you can always write the output of each command to a file using ``>``. However, it's often faster and more convenient to use pipes to chain several commands together. -We can also pipe ``in2csv``, allowing us to combine all our previous operations into one: +We can also pipe :doc:`/scripts/in2csv`, allowing us to combine all our previous operations into one: .. code-block:: bash diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/docs/tutorial/2_examining_the_data.rst new/csvkit-1.0.4/docs/tutorial/2_examining_the_data.rst --- old/csvkit-1.0.3/docs/tutorial/2_examining_the_data.rst 2017-06-15 00:10:26.000000000 +0200 +++ new/csvkit-1.0.4/docs/tutorial/2_examining_the_data.rst 2018-09-20 19:09:31.000000000 +0200 @@ -7,9 +7,9 @@ In the previous section we saw how we could use :doc:`csvlook` and :doc:`csvcut` to view slices of our data. This is a good tool for exploring a dataset, but in practice we usually need to get the broadest possible view before we can start diving into specifics. -:doc:`/scripts/csvstat` is designed to give us just such a broad understanding of our data. Inspired by the ``summary()`` function from the computational statistics programming language `"R" <http://www.r-project.org/>`_, ``csvstat`` will generate summary statistics for all the data in a CSV file. +:doc:`/scripts/csvstat` is designed to give us just such a broad understanding of our data. Inspired by the ``summary()`` function from the computational statistics programming language `"R" <http://www.r-project.org/>`_, :doc:`/scripts/csvstat` will generate summary statistics for all the data in a CSV file. -Let's examine summary statistics for a few columns from our dataset.
As we learned in the last section, we can use ``csvcut`` and a pipe to pick out the columns we want: +Let's examine summary statistics for a few columns from our dataset. As we learned in the last section, we can use :doc:`/scripts/csvcut` and a pipe to pick out the columns we want: .. code-block:: bash @@ -72,7 +72,7 @@ csvgrep: find the data you need =============================== -After reviewing the summary statistics you might wonder what equipment was received by a particular county. To get a simple answer to the question we can use :doc:`/scripts/csvgrep` to search for the state's name amongst the rows. Let's also use ``csvcut`` to just look at the columns we care about and ``csvlook`` to format the output: +After reviewing the summary statistics you might wonder what equipment was received by a particular county. To get a simple answer to the question we can use :doc:`/scripts/csvgrep` to search for the state's name amongst the rows. Let's also use :doc:`/scripts/csvcut` to just look at the columns we care about and :doc:`/scripts/csvlook` to format the output: .. code-block:: bash diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/docs/tutorial/3_power_tools.rst new/csvkit-1.0.4/docs/tutorial/3_power_tools.rst --- old/csvkit-1.0.3/docs/tutorial/3_power_tools.rst 2018-03-11 15:51:56.000000000 +0100 +++ new/csvkit-1.0.4/docs/tutorial/3_power_tools.rst 2018-09-20 19:09:31.000000000 +0200 @@ -106,7 +106,7 @@ | NANCE | RIFLE,7.62 MILLIMETER | 3,730 | | NANCE | RIFLE,7.62 MILLIMETER | 3,730 | -Two counties with fewer than one-thousand residents were the recipients of 5.56 millimeter assault rifles. This simple example demonstrates the power of joining datasets. Although SQL will always be a more flexible option, ``csvjoin`` will often get you where you need to go faster. +Two counties with fewer than one-thousand residents were the recipients of 5.56 millimeter assault rifles. This simple example demonstrates the power of joining datasets. Although SQL will always be a more flexible option, :doc:`/scripts/csvjoin` will often get you where you need to go faster. csvstack: combining subsets =========================== @@ -164,14 +164,14 @@ Row count: 2611 -If you supply the ``-g`` flag then ``csvstack`` can also add a "grouping column" to each row, so that you can tell which file each row came from. In this case we don't need this, but you can imagine a situation in which instead of having a ``county`` column each of this datasets had simply been named ``nebraska.csv`` and ``kansas.csv``. In that case, using a grouping column would prevent us from losing information when we stacked them. +If you supply the :code:`-g` flag then :doc:`/scripts/csvstack` can also add a "grouping column" to each row, so that you can tell which file each row came from. In this case we don't need this, but you can imagine a situation in which instead of having a ``county`` column each of these datasets had simply been named ``nebraska.csv`` and ``kansas.csv``. In that case, using a grouping column would prevent us from losing information when we stacked them. csvsql and sql2csv: ultimate power ================================== -Sometimes (almost always), the command-line isn't enough. It would be crazy to try to do all your analysis using command-line tools. Often times, the correct tool for data analysis is SQL. :doc:`/scripts/csvsql` and :doc:`/scripts/sql2csv` form a bridge that eases migrating your data into and out of a SQL database.
For smaller datasets ``csvsql`` can also leverage `sqlite <https://www.sqlite.org/>`_ to allow execution of ad hoc SQL queries without ever touching a database. +Sometimes (almost always), the command-line isn't enough. It would be crazy to try to do all your analysis using command-line tools. Often times, the correct tool for data analysis is SQL. :doc:`/scripts/csvsql` and :doc:`/scripts/sql2csv` form a bridge that eases migrating your data into and out of a SQL database. For smaller datasets :doc:`/scripts/csvsql` can also leverage `sqlite <https://www.sqlite.org/>`_ to allow execution of ad hoc SQL queries without ever touching a database. -By default, ``csvsql`` will generate a create table statement for your data. You can specify what sort of database you are using with the ``-i`` flag: +By default, :doc:`/scripts/csvsql` will generate a create table statement for your data. You can specify what sort of database you are using with the ``-i`` flag: .. code-block:: bash @@ -199,27 +199,27 @@ margin_of_error DECIMAL NOT NULL ); -Here we have the sqlite "create table" statement for our joined data. You'll see that, like ``csvstat``, ``csvsql`` has done its best to infer the column types. +Here we have the sqlite "create table" statement for our joined data. You'll see that, like :doc:`/scripts/csvstat`, :doc:`/scripts/csvsql` has done its best to infer the column types. -Often you won't care about storing the SQL statements locally. You can also use ``csvsql`` to create the table directly in the database on your local machine. If you add the ``--insert`` option the data will also be imported: +Often you won't care about storing the SQL statements locally. You can also use :doc:`/scripts/csvsql` to create the table directly in the database on your local machine. If you add the :code:`--insert` option the data will also be imported: .. code-block:: bash csvsql --db sqlite:///leso.db --insert joined.csv -How can we check that our data was imported successfully? We could use the sqlite command-line interface, but rather than worry about the specifics of another tool, we can also use ``sql2csv``: +How can we check that our data was imported successfully? We could use the sqlite command-line interface, but rather than worry about the specifics of another tool, we can also use :doc:`/scripts/sql2csv`: .. code-block:: bash sql2csv --db sqlite:///leso.db --query "select * from joined" -Note that the ``--query`` parameter to ``sql2csv`` accepts any SQL query. For example, to export Douglas county from the ``joined`` table from our sqlite database, we would run: +Note that the :code:`--query` parameter to :doc:`/scripts/sql2csv` accepts any SQL query. For example, to export Douglas county from the ``joined`` table from our sqlite database, we would run: .. code-block:: bash sql2csv --db sqlite:///leso.db --query "select * from joined where county='DOUGLAS';" > douglas.csv -Sometimes, if you will only be running a single query, even constructing the database is a waste of time. For that case, you can actually skip the database entirely and ``csvsql`` will create one in memory for you: +Sometimes, if you will only be running a single query, even constructing the database is a waste of time. For that case, you can actually skip the database entirely and :doc:`/scripts/csvsql` will create one in memory for you: .. code-block:: bash @@ -230,4 +230,4 @@ Summing up ========== -``csvjoin``, ``csvstack``, ``csvsql`` and ``sql2csv`` represent the power tools of csvkit. 
Using these tools can vastly simplify processes that would otherwise require moving data between other systems. But what about cases where these tools still don't cut it? What if you need to move your data onto the web or into a legacy database system? We've got a few solutions for those problems in our final section, :doc:`4_going_elsewhere`. +:doc:`/scripts/csvjoin`, :doc:`/scripts/csvstack`, :doc:`/scripts/csvsql` and :doc:`/scripts/sql2csv` represent the power tools of csvkit. Using these tools can vastly simplify processes that would otherwise require moving data between other systems. But what about cases where these tools still don't cut it? What if you need to move your data onto the web or into a legacy database system? We've got a few solutions for those problems in our final section, :doc:`4_going_elsewhere`. diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/docs/tutorial/4_going_elsewhere.rst new/csvkit-1.0.4/docs/tutorial/4_going_elsewhere.rst --- old/csvkit-1.0.3/docs/tutorial/4_going_elsewhere.rst 2018-03-11 15:51:56.000000000 +0100 +++ new/csvkit-1.0.4/docs/tutorial/4_going_elsewhere.rst 2018-09-20 19:09:31.000000000 +0200 @@ -5,7 +5,7 @@ csvjson: going online ===================== -Very frequently one of the last steps in any data analysis is to get the data onto the web for display as a table, map or chart. CSV is rarely the ideal format for this. More often than not what you want is JSON and that's where :doc:`/scripts/csvjson` comes in. ``csvjson`` takes an input CSV and outputs neatly formatted JSON. For the sake of illustration, let's use ``csvcut`` and ``csvgrep`` to convert just a small slice of our data: +Very frequently one of the last steps in any data analysis is to get the data onto the web for display as a table, map or chart. CSV is rarely the ideal format for this. More often than not what you want is JSON and that's where :doc:`/scripts/csvjson` comes in. :doc:`/scripts/csvjson` takes an input CSV and outputs neatly formatted JSON. For the sake of illustration, let's use :doc:`/scripts/csvcut` and :doc:`/scripts/csvgrep` to convert just a small slice of our data: .. code-block:: bash @@ -50,7 +50,7 @@ } } -For making maps, ``csvjson`` can also output GeoJSON, see its :doc:`/scripts/csvjson` for more details. +For making maps, :doc:`/scripts/csvjson` can also output GeoJSON, see its :doc:`/scripts/csvjson` for more details. 
csvpy: going into code ====================== diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/examples/test_numeric_date_format.csv new/csvkit-1.0.4/examples/test_numeric_date_format.csv --- old/csvkit-1.0.3/examples/test_numeric_date_format.csv 1970-01-01 01:00:00.000000000 +0100 +++ new/csvkit-1.0.4/examples/test_numeric_date_format.csv 2018-11-07 15:23:30.000000000 +0100 @@ -0,0 +1,3 @@ +a +20140102 +20121231 diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/setup.py new/csvkit-1.0.4/setup.py --- old/csvkit-1.0.3/setup.py 2018-03-11 16:11:37.000000000 +0100 +++ new/csvkit-1.0.4/setup.py 2019-02-23 17:57:27.000000000 +0100 @@ -18,7 +18,7 @@ setup( name='csvkit', - version='1.0.3', + version='1.0.4', description='A suite of command-line tools for working with CSV, the king of tabular file formats.', long_description=open('README.rst').read(), author='Christopher Groskopf', @@ -36,10 +36,10 @@ 'Operating System :: OS Independent', 'Programming Language :: Python', 'Programming Language :: Python :: 2.7', - 'Programming Language :: Python :: 3.3', 'Programming Language :: Python :: 3.4', 'Programming Language :: Python :: 3.5', 'Programming Language :: Python :: 3.6', + 'Programming Language :: Python :: 3.7', 'Programming Language :: Python :: Implementation :: CPython', 'Programming Language :: Python :: Implementation :: PyPy', 'Topic :: Scientific/Engineering :: Information Analysis', diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/tests/test_cleanup.py new/csvkit-1.0.4/tests/test_cleanup.py --- old/csvkit-1.0.3/tests/test_cleanup.py 2017-06-15 00:10:26.000000000 +0200 +++ new/csvkit-1.0.4/tests/test_cleanup.py 2018-09-20 19:11:36.000000000 +0200 @@ -1,9 +1,6 @@ #!/usr/bin/env python -try: - import unittest2 as unittest -except ImportError: - import unittest +import unittest from csvkit.cleanup import join_rows diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/tests/test_cli.py new/csvkit-1.0.4/tests/test_cli.py --- old/csvkit-1.0.3/tests/test_cli.py 2017-06-15 00:10:26.000000000 +0200 +++ new/csvkit-1.0.4/tests/test_cli.py 2018-09-20 19:11:36.000000000 +0200 @@ -1,9 +1,6 @@ #!/usr/bin/env python -try: - import unittest2 as unittest -except ImportError: - import unittest +import unittest from csvkit.cli import match_column_identifier, parse_column_identifiers diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/tests/test_convert/test_convert.py new/csvkit-1.0.4/tests/test_convert/test_convert.py --- old/csvkit-1.0.3/tests/test_convert/test_convert.py 2017-06-15 00:10:26.000000000 +0200 +++ new/csvkit-1.0.4/tests/test_convert/test_convert.py 2018-09-20 19:11:36.000000000 +0200 @@ -1,9 +1,6 @@ #!/usr/bin/env python -try: - import unittest2 as unittest -except ImportError: - import unittest +import unittest from csvkit import convert diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/tests/test_grep.py new/csvkit-1.0.4/tests/test_grep.py --- old/csvkit-1.0.3/tests/test_grep.py 2017-06-15 00:10:26.000000000 +0200 +++ new/csvkit-1.0.4/tests/test_grep.py 2018-09-20 19:11:36.000000000 +0200 @@ -1,11 +1,7 @@ #!/usr/bin/env python import re - -try: - import unittest2 as unittest -except ImportError: - import unittest +import unittest from csvkit.grep import 
FilteringCSVReader from csvkit.exceptions import ColumnIdentifierError diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/tests/test_utilities/test_csvcut.py new/csvkit-1.0.4/tests/test_utilities/test_csvcut.py --- old/csvkit-1.0.3/tests/test_utilities/test_csvcut.py 2018-01-28 02:55:31.000000000 +0100 +++ new/csvkit-1.0.4/tests/test_utilities/test_csvcut.py 2019-01-02 21:16:10.000000000 +0100 @@ -62,6 +62,12 @@ ['1'], ]) + def test_delete_empty(self): + self.assertRows(['-c', 'column_c', '--delete-empty-rows', 'examples/bad.csv'], [ + ['column_c'], + ['17'], + ]) + def test_no_header_row(self): self.assertRows(['-c', '2', '--no-header-row', 'examples/no_header_row.csv'], [ ['b'], @@ -71,3 +77,17 @@ def test_ragged(self): # Test that csvcut doesn't error when a row is short. self.get_output(['-c', 'column_c', 'examples/bad.csv']) + + def test_truncate(self): + # Test that csvcut truncates long rows. + self.assertRows(['-C', 'column_a,column_b', '--delete-empty-rows', 'examples/bad.csv'], [ + ['column_c'], + ['17'], + ]) + + def test_names_with_skip_lines(self): + self.assertLines(['--names', '--skip-lines', '3', 'examples/test_skip_lines.csv'], [ + ' 1: a', + ' 2: b', + ' 3: c', + ]) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/tests/test_utilities/test_csvjoin.py new/csvkit-1.0.4/tests/test_utilities/test_csvjoin.py --- old/csvkit-1.0.3/tests/test_utilities/test_csvjoin.py 2017-06-15 00:10:26.000000000 +0200 +++ new/csvkit-1.0.4/tests/test_utilities/test_csvjoin.py 2018-11-21 19:26:58.000000000 +0100 @@ -44,6 +44,12 @@ with open('examples/join_short.csv') as f: self.assertEqual(output.readlines(), f.readlines()) + def test_single(self): + self.assertRows(['examples/dummy.csv', '--no-inference'], [ + ['a', 'b', 'c'], + ['1', '2', '3'], + ]) + def test_no_blanks(self): self.assertRows(['examples/blanks.csv', 'examples/blanks.csv'], [ ['a', 'b', 'c', 'd', 'e', 'f', 'a2', 'b2', 'c2', 'd2', 'e2', 'f2'], diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/tests/test_utilities/test_csvjson.py new/csvkit-1.0.4/tests/test_utilities/test_csvjson.py --- old/csvkit-1.0.3/tests/test_utilities/test_csvjson.py 2018-03-11 16:11:37.000000000 +0100 +++ new/csvkit-1.0.4/tests/test_utilities/test_csvjson.py 2019-02-23 17:57:27.000000000 +0100 @@ -58,7 +58,7 @@ output = self.get_output(['-i', '4', 'examples/dummy.csv']) js = json.loads(output) self.assertDictEqual(js[0], {'a': True, 'c': 3.0, 'b': 2.0}) - self.assertRegex(output, ' "a": true,') + six.assertRegex(self, output, ' "a": true,') def test_keying(self): js = json.loads(self.get_output(['-k', 'a', 'examples/dummy.csv'])) @@ -67,7 +67,9 @@ def test_duplicate_keys(self): output_file = six.StringIO() utility = CSVJSON(['-k', 'a', 'examples/dummy3.csv'], output_file) - self.assertRaisesRegex(ValueError, 'Value True is not unique in the key column.', utility.run) + six.assertRaisesRegex(self, ValueError, + 'Value True is not unique in the key column.', + utility.run) output_file.close() def test_geojson_with_id(self): @@ -212,5 +214,3 @@ '{"type": "Feature", "properties": {"slug": "obeidder", "title": "Obeidder Monster", "description": "Sharpie and Spray Paint", "address": "3319 Seaton St.", "type": "Street Art", "photo_url": "http://i.imgur.com/3aX7E.jpg", "photo_credit": "Photo by Justin Edwards. 
Used with permission.", "last_seen_date": "4/15/12"}, "geometry": {"type": "Point", "coordinates": [-95.334619, 32.314431]}}', '{"type": "Feature", "properties": {"slug": "sensor-device", "title": "Sensor Device", "artist": "Kurt Dyrhaug", "address": "University of Texas, Campus Drive", "type": "Sculpture", "photo_url": "http://media.hacktyler.com/artmap/photos/sensor-device.jpg", "photo_credit": "Photo by Christopher Groskopf. Used with permission.", "last_seen_date": "4/16/12"}, "geometry": {"type": "Point", "coordinates": [-95.250699, 32.317216]}}' ]) - - diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/tests/test_utilities/test_csvsql.py new/csvkit-1.0.4/tests/test_utilities/test_csvsql.py --- old/csvkit-1.0.3/tests/test_utilities/test_csvsql.py 2018-03-11 15:51:56.000000000 +0100 +++ new/csvkit-1.0.4/tests/test_utilities/test_csvsql.py 2018-09-20 19:09:31.000000000 +0200 @@ -176,6 +176,11 @@ "question,text\n" "36,©\n") + def test_query_update(self): + sql = self.get_output(['--query', 'UPDATE dummy SET a=10 WHERE a=1', '--no-inference', 'examples/dummy.csv']) + + self.assertEqual(sql, '') + def test_before_after_insert(self): self.get_output(['--db', 'sqlite:///' + self.db_file, '--insert', 'examples/dummy.csv', '--before-insert', 'SELECT 1; CREATE TABLE foobar (date DATE)', '--after-insert', 'INSERT INTO dummy VALUES (0, 5, 6)']) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/tests/test_utilities/test_csvstat.py new/csvkit-1.0.4/tests/test_utilities/test_csvstat.py --- old/csvkit-1.0.3/tests/test_utilities/test_csvstat.py 2017-06-15 00:10:26.000000000 +0200 +++ new/csvkit-1.0.4/tests/test_utilities/test_csvstat.py 2018-09-20 19:11:36.000000000 +0200 @@ -2,6 +2,8 @@ import sys +import six + import agate try: @@ -47,11 +49,11 @@ def test_unique(self): output = self.get_output(['-c', 'county', 'examples/realdata/ks_1033_data.csv']) - self.assertRegex(output, r'Unique values:\s+73') + six.assertRegex(self, output, r'Unique values:\s+73') def test_max_length(self): output = self.get_output(['-c', 'county', 'examples/realdata/ks_1033_data.csv']) - self.assertRegex(output, r'Longest value:\s+12') + six.assertRegex(self, output, r'Longest value:\s+12') def test_freq_list(self): output = self.get_output(['examples/realdata/ks_1033_data.csv']) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/tests/test_utilities/test_in2csv.py new/csvkit-1.0.4/tests/test_utilities/test_in2csv.py --- old/csvkit-1.0.3/tests/test_utilities/test_in2csv.py 2018-03-11 15:51:56.000000000 +0100 +++ new/csvkit-1.0.4/tests/test_utilities/test_in2csv.py 2019-02-23 17:57:27.000000000 +0100 @@ -47,6 +47,9 @@ def test_date_format(self): self.assertConverted('csv', 'examples/test_date_format.csv', 'examples/test_date_format_converted.csv', ['--date-format', '%d/%m/%Y']) + def test_numeric_date_format(self): + self.assertConverted('csv', 'examples/test_numeric_date_format.csv', 'examples/test_date_format_converted.csv', ['--date-format', '%Y%m%d']) + def test_convert_csv(self): self.assertConverted('csv', 'examples/testfixed_converted.csv', 'examples/testfixed_converted.csv') @@ -104,6 +107,13 @@ def test_convert_xlsx_with_skip_lines(self): self.assertConverted('xlsx', 'examples/test_skip_lines.xlsx', 'examples/testxlsx_converted.csv', ['--skip-lines', '3']) + def test_names(self): + self.assertLines(['--names', 'examples/sheets.xlsx'], [ + 'not this one', 
+ 'data', + u'ʤ', + ]) + def test_csv_no_headers(self): self.assertConverted('csv', 'examples/no_header_row.csv', 'examples/dummy.csv', ['--no-header-row', '--no-inference']) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/csvkit-1.0.3/tests/utils.py new/csvkit-1.0.4/tests/utils.py --- old/csvkit-1.0.3/tests/utils.py 2018-03-11 15:51:56.000000000 +0100 +++ new/csvkit-1.0.4/tests/utils.py 2018-09-20 19:11:36.000000000 +0200 @@ -20,17 +20,13 @@ """ import sys +import unittest import warnings from contextlib import contextmanager import agate import six -try: - import unittest2 as unittest -except ImportError: - import unittest - from csvkit.exceptions import ColumnIdentifierError, RequiredHeaderError
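
For anyone smoke-testing this update by hand: the punctuation-free date parsing covered by the new test_numeric_date_format above can be exercised directly against the example file added in this release. A minimal sketch; the ISO-formatted output shown in the comments is an assumption based on in2csv's usual type inference, not taken from the diff:

    in2csv --date-format '%Y%m%d' examples/test_numeric_date_format.csv
    # assumed output:
    # a
    # 2014-01-02
    # 2012-12-31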

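Likewise, the --names/--skip-lines fix exercised by test_names_with_skip_lines above can be reproduced from the shell. This sketch merely restates what that test asserts, using the example file shipped in the csvkit tarball:

    csvcut --names --skip-lines 3 examples/test_skip_lines.csv
    #  1: a
    #  2: b
    #  3: c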