Hello community,
here is the log from the commit of package python-dateparser for
openSUSE:Factory checked in at 2020-04-07 10:23:18
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-dateparser (Old)
and /work/SRC/openSUSE:Factory/.python-dateparser.new.3248 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-dateparser"
Tue Apr 7 10:23:18 2020 rev:3 rq:790846 version:0.7.4
Changes:
--------
--- /work/SRC/openSUSE:Factory/python-dateparser/python-dateparser.changes 2019-09-30 15:57:09.505574384 +0200
+++ /work/SRC/openSUSE:Factory/.python-dateparser.new.3248/python-dateparser.changes 2020-04-07 10:23:38.449983871 +0200
@@ -1,0 +2,17 @@
+Thu Apr 2 09:44:00 UTC 2020 - Marketa Calabkova <[email protected]>
+
+- update to version 0.7.4
+ * Fixed Python 2.7 tests
+ * Extended Norwegian support
+ * Implement a PARSERS setting
+ * Add support for `PREFER_DATES_FROM` in relative/freshness parser
+ * Add support for `PREFER_DAY_OF_MONTH` in base-formats parser
+ * Added UTC -00:00 as a valid offset
+ * Fix support for “one”
+ * Fix tokenizer for non recognized characters
+ * Prevent installing regex 2019.02.19
+ * Added Hungarian language.
+ * Added setting, `STRICT_PARSING` to ignore incomplete dates.
+ * More simplifications for Russian and Ukrainian languages.
+
+-------------------------------------------------------------------
Old:
----
dateparser-0.7.2.tar.gz
New:
----
dateparser-0.7.4.tar.gz
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Other differences:
------------------
++++++ python-dateparser.spec ++++++
--- /var/tmp/diff_new_pack.NPGXT8/_old 2020-04-07 10:23:39.085984579 +0200
+++ /var/tmp/diff_new_pack.NPGXT8/_new 2020-04-07 10:23:39.085984579 +0200
@@ -1,7 +1,7 @@
#
# spec file for package python-dateparser
#
-# Copyright (c) 2019 SUSE LINUX GmbH, Nuernberg, Germany.
+# Copyright (c) 2020 SUSE LLC
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
@@ -18,7 +18,7 @@
%{?!python_module:%define python_module() python-%{**} python3-%{**}}
Name: python-dateparser
-Version: 0.7.2
+Version: 0.7.4
Release: 0
Summary: Date parsing library designed to parse dates from HTML pages
License: BSD-3-Clause
@@ -69,7 +69,7 @@
%python_expand %fdupes %{buildroot}%{$python_sitelib}
%check
-%python_exec setup.py test
+%python_expand nosetests-%{$python_bin_suffix}
%files %{python_files}
%doc AUTHORS.rst README.rst
++++++ dateparser-0.7.2.tar.gz -> dateparser-0.7.4.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dateparser-0.7.2/CONTRIBUTING.rst new/dateparser-0.7.4/CONTRIBUTING.rst
--- old/dateparser-0.7.2/CONTRIBUTING.rst 2019-09-17 12:57:56.000000000 +0200
+++ new/dateparser-0.7.4/CONTRIBUTING.rst 2020-03-06 12:31:08.000000000 +0100
@@ -41,6 +41,13 @@
official DateParser docs, in docstrings, or even on the web in blog posts,
articles, and such.
+After you make local changes to the documentation, build it with ``tox``::
+
+ tox -e docs
+
+Then open ``.tox/docs/tmp/html/index.html`` in a web browser to see your local
+build of the documentation.
+
Submit Feedback
~~~~~~~~~~~~~~~
@@ -109,16 +116,44 @@
Guidelines for Editing Translation Data
---------------------------------------
-English is the primary language of the dateparser. Dates in all other languages are translated into English equivalents before they are parsed.
-The language data required for parsing dates is contained in *dateparser/data/date_translation_data*.
-It contains variable parts that can be used in dates, language by language: month and week names - and their abbreviations, prepositions, conjunctions and frequently used descriptive words and phrases (like "today").
-The data in *dateparser/data/date_translation_data* is formed by supplementing data retrieved from unicode CLDR, contained in *data/cldr_language_data/date_translation_data*, with supplementary data contributed by the community, contained in *data/supplementary_language_data/date_translation_data*.
-Additional data to supplement existing data or translation data for a new language should be added to *dateparser_data/supplementary_language_data/date_translation_data*.
-The chosen data format is YAML because it is readable and simple to edit.
-After adding or changing any data in YAML files we need to move them to internal data files with *scripts/write_complete_data.py*. Otherwise the changes to YAML files will not have any effect.
-
-Refer to :ref:`language-data-template` for details about its structure and take a look at already implemented languages for examples.
-As we deal with the delicate fabric of interwoven languages, tests are essential to keep the functionality across them.
-Therefore any addition or change should be reflected in tests.
-However, there is nothing to be afraid of: our tests are highly parameterized and in most cases a test fits in one declarative line of data.
-Alternatively, you can provide required information and ask the maintainers to create the tests for you.
+
+English is the primary language of Dateparser. Dates in all other languages are
+translated into English equivalents before they are parsed.
+
+The language data that Dateparser uses to parse dates is in
+``dateparser/data/date_translation_data``. For each supported language, there
+is a Python file containing translation data.
+
+Each translation data Python files contains different kinds of translation data
+for date parsing: month and week names - and their abbreviations, prepositions,
+conjunctions, frequently used descriptive words and phrases (like “today”),
+etc.
+
+Translation data Python files are generated from the following sources:
+
+- `Unicode CLDR <http://cldr.unicode.org/>`_ data in JSON format, located at
+ ``dateparser_data/cldr_language_data/date_translation_data``
+
+- Additional data from the Dateparser community in YAML format, located at
+ ``dateparser_data/supplementary_language_data/date_translation_data``
+
+If you wish to extend the data of an existing language, or add data for a new
+language, you must:
+
+#. Edit or create the corresponding file within
+ ``dateparser_data/supplementary_language_data/date_translation_data``
+
+ See existing files to learn how they are defined, and see
+ :ref:`language-data-template` for details.
+
+#. Regenerate the corresponding file within
+ ``dateparser/data/date_translation_data`` running the following script::
+
+ scripts/write_complete_data.py
+
+#. Write tests that cover your changes
+
+ You should be able to find tests that cover the affected data, and use
+ copy-and-paste to create the corresponding new test.
+
+ If in doubt, ask Dateparser maintainers for help.
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dateparser-0.7.2/HISTORY.rst new/dateparser-0.7.4/HISTORY.rst
--- old/dateparser-0.7.2/HISTORY.rst 2019-09-17 12:57:56.000000000 +0200
+++ new/dateparser-0.7.4/HISTORY.rst 2020-03-06 12:31:08.000000000 +0100
@@ -3,6 +3,39 @@
History
=======
+0.7.4 (2020-03-06)
+------------------
+
+Improvements:
+
+* Fixed Python 2.7 tests
+
+
+0.7.3 (2020-03-06)
+------------------
+
+New features:
+
+* Extended Norwegian support (see https://github.com/scrapinghub/dateparser/pull/598)
+* Implement a PARSERS setting (see https://github.com/scrapinghub/dateparser/pull/603)
+
+
+Improvements:
+
+* Add support for `PREFER_DATES_FROM` in relative/freshness parser (https://github.com/scrapinghub/dateparser/pull/414)
+* Add support for `PREFER_DAY_OF_MONTH` in base-formats parser (see https://github.com/scrapinghub/dateparser/pull/611)
+* Added UTC -00:00 as a valid offset (see https://github.com/scrapinghub/dateparser/pull/574)
+* Fix support for “one” (see https://github.com/scrapinghub/dateparser/pull/593)
+* Fix TypeError when parsing some invalid dates (see https://github.com/scrapinghub/dateparser/pull/536)
+* Fix tokenizer for non recognized characters (see https://github.com/scrapinghub/dateparser/pull/622)
+* Prevent installing regex 2019.02.19 (https://github.com/scrapinghub/dateparser/pull/600)
+* Resolve DeprecationWarning related to raw string escape sequences (see https://github.com/scrapinghub/dateparser/pull/596)
+* Implement a tox environment to build the documentation (https://github.com/scrapinghub/dateparser/pull/604)
+* Improve tests stability (see https://github.com/scrapinghub/dateparser/pull/591, https://github.com/scrapinghub/dateparser/pull/605)
+* Documentation improvements (see https://github.com/scrapinghub/dateparser/pull/510, https://github.com/scrapinghub/dateparser/pull/578, https://github.com/scrapinghub/dateparser/pull/619, https://github.com/scrapinghub/dateparser/pull/614, https://github.com/scrapinghub/dateparser/pull/620)
+* Performance improvements (see https://github.com/scrapinghub/dateparser/pull/570, https://github.com/scrapinghub/dateparser/pull/569, https://github.com/scrapinghub/dateparser/pull/625)
+
+
0.7.2 (2019-09-17)
------------------
@@ -120,8 +153,8 @@
* `DateDataParser` now also returns detected language in the result dictionary.
* Explicit and lucid timezone conversion for a given datestring using
`TIMEZONE`, `TO_TIMEZONE` settings.
-* Added Hungarian langauge.
-* Added setting, `STRICT_PARSING` to ignore imcomplete dates.
+* Added Hungarian language.
+* Added setting, `STRICT_PARSING` to ignore incomplete dates.
Improvements:
@@ -203,7 +236,7 @@
* Fixed problem with caching :func:`datetime.now` in
:class:`FreshnessDateDataParser`.
* Added month names and week day names abbreviations to several languages.
-* More simplifications for Russian and Ukranian languages.
+* More simplifications for Russian and Ukrainian languages.
* Fixed problem with parsing time component of date strings with several kinds
of apostrophes.
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/dateparser-0.7.2/PKG-INFO
new/dateparser-0.7.4/PKG-INFO
--- old/dateparser-0.7.2/PKG-INFO 2019-09-17 12:59:32.000000000 +0200
+++ new/dateparser-0.7.4/PKG-INFO 2020-03-06 12:32:39.000000000 +0100
@@ -1,6 +1,6 @@
Metadata-Version: 1.2
Name: dateparser
-Version: 0.7.2
+Version: 0.7.4
Summary: Date parsing library designed to parse dates from HTML pages
Home-page: https://github.com/scrapinghub/dateparser
Author: Scrapinghub
@@ -34,6 +34,7 @@
`dateparser` provides modules to easily parse localized dates in almost
any string formats commonly found on web pages.
+ .. contents::
Documentation
=============
@@ -121,7 +122,7 @@
>>> # parsing ambiguous date
>>> parse('02-03-2016') # assumes english language, uses MDY date order
- datetime.datetime(2016, 3, 2, 0, 0)
+ datetime.datetime(2016, 2, 3, 0, 0)
>>> parse('le 02-03-2016') # detects french, uses DMY date order
datetime.datetime(2016, 3, 2, 0, 0)
@@ -497,6 +498,39 @@
History
=======
+ 0.7.4 (2020-03-06)
+ ------------------
+
+ Improvements:
+
+ * Fixed Python 2.7 tests
+
+
+ 0.7.3 (2020-03-06)
+ ------------------
+
+ New features:
+
+ * Extended Norwegian support (see https://github.com/scrapinghub/dateparser/pull/598)
+ * Implement a PARSERS setting (see https://github.com/scrapinghub/dateparser/pull/603)
+
+
+ Improvements:
+
+ * Add support for `PREFER_DATES_FROM` in relative/freshness parser (https://github.com/scrapinghub/dateparser/pull/414)
+ * Add support for `PREFER_DAY_OF_MONTH` in base-formats parser (see https://github.com/scrapinghub/dateparser/pull/611)
+ * Added UTC -00:00 as a valid offset (see https://github.com/scrapinghub/dateparser/pull/574)
+ * Fix support for “one” (see https://github.com/scrapinghub/dateparser/pull/593)
+ * Fix TypeError when parsing some invalid dates (see https://github.com/scrapinghub/dateparser/pull/536)
+ * Fix tokenizer for non recognized characters (see https://github.com/scrapinghub/dateparser/pull/622)
+ * Prevent installing regex 2019.02.19 (https://github.com/scrapinghub/dateparser/pull/600)
+ * Resolve DeprecationWarning related to raw string escape sequences (see https://github.com/scrapinghub/dateparser/pull/596)
+ * Implement a tox environment to build the documentation (https://github.com/scrapinghub/dateparser/pull/604)
+ * Improve tests stability (see https://github.com/scrapinghub/dateparser/pull/591, https://github.com/scrapinghub/dateparser/pull/605)
+ * Documentation improvements (see https://github.com/scrapinghub/dateparser/pull/510, https://github.com/scrapinghub/dateparser/pull/578, https://github.com/scrapinghub/dateparser/pull/619, https://github.com/scrapinghub/dateparser/pull/614, https://github.com/scrapinghub/dateparser/pull/620)
+ * Performance improvements (see https://github.com/scrapinghub/dateparser/pull/570, https://github.com/scrapinghub/dateparser/pull/569, https://github.com/scrapinghub/dateparser/pull/625)
+
+
0.7.2 (2019-09-17)
------------------
@@ -614,8 +648,8 @@
* `DateDataParser` now also returns detected language in the result
dictionary.
* Explicit and lucid timezone conversion for a given datestring using
`TIMEZONE`, `TO_TIMEZONE` settings.
- * Added Hungarian langauge.
- * Added setting, `STRICT_PARSING` to ignore imcomplete dates.
+ * Added Hungarian language.
+ * Added setting, `STRICT_PARSING` to ignore incomplete dates.
Improvements:
@@ -697,7 +731,7 @@
* Fixed problem with caching `datetime.now` in
`FreshnessDateDataParser`.
* Added month names and week day names abbreviations to several
languages.
- * More simplifications for Russian and Ukranian languages.
+ * More simplifications for Russian and Ukrainian languages.
* Fixed problem with parsing time component of date strings with
several kinds of apostrophes.
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dateparser-0.7.2/README.rst new/dateparser-0.7.4/README.rst
--- old/dateparser-0.7.2/README.rst 2019-09-17 12:57:56.000000000 +0200
+++ new/dateparser-0.7.4/README.rst 2020-03-06 12:31:08.000000000 +0100
@@ -26,6 +26,7 @@
`dateparser` provides modules to easily parse localized dates in almost
any string formats commonly found on web pages.
+.. contents::
Documentation
=============
@@ -113,7 +114,7 @@
>>> # parsing ambiguous date
>>> parse('02-03-2016') # assumes english language, uses MDY date order
- datetime.datetime(2016, 3, 2, 0, 0)
+ datetime.datetime(2016, 2, 3, 0, 0)
>>> parse('le 02-03-2016') # detects french, uses DMY date order
datetime.datetime(2016, 3, 2, 0, 0)
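
The corrected doctest output in the hunk above follows from reading `02-03-2016` month-first. As a minimal sketch of the two readings, using only the standard library's `datetime.strptime` (not dateparser itself, whose language detection is not reproduced here):

```python
from datetime import datetime

AMBIGUOUS = "02-03-2016"

# Month-first (MDY) reading, as the README states for English: 3 February.
mdy = datetime.strptime(AMBIGUOUS, "%m-%d-%Y")

# Day-first (DMY) reading, as the README states for French: 2 March.
dmy = datetime.strptime(AMBIGUOUS, "%d-%m-%Y")

print(repr(mdy))
print(repr(dmy))
```

The old README line showed the DMY result for the MDY case, which is what the documentation fix addresses.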
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dateparser-0.7.2/dateparser/__init__.py new/dateparser-0.7.4/dateparser/__init__.py
--- old/dateparser-0.7.2/dateparser/__init__.py 2019-09-17 12:57:56.000000000 +0200
+++ new/dateparser-0.7.4/dateparser/__init__.py 2020-03-06 12:31:08.000000000 +0100
@@ -1,5 +1,5 @@
# -*- coding: utf-8 -*-
-__version__ = '0.7.2'
+__version__ = '0.7.4'
from .date import DateDataParser
from .conf import apply_settings
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dateparser-0.7.2/dateparser/conf.py new/dateparser-0.7.4/dateparser/conf.py
--- old/dateparser-0.7.2/dateparser/conf.py 2019-09-17 12:57:56.000000000 +0200
+++ new/dateparser-0.7.4/dateparser/conf.py 2020-03-06 12:31:08.000000000 +0100
@@ -11,7 +11,6 @@
Currently, supported settings are:
* `PREFER_DATES_FROM`: defaults to `current_period`. Options are `future` or `past`.
- * `SUPPORT_BEFORE_COMMON_ERA`: defaults to `False`.
* `PREFER_DAY_OF_MONTH`: defaults to `current`. Could be `first` and `last` day of month.
* `SKIP_TOKENS`: defaults to `['t']`. Can be any string.
* `TIMEZONE`: defaults to `UTC`. Can be timezone abbreviation or any of `tz database name as given here <https://en.wikipedia.org/wiki/List_of_tz_database_time_zones>`_.
@@ -19,6 +18,8 @@
* `RELATIVE_BASE`: count relative date from this base date. Should be datetime object.
* `RETURN_TIME_AS_PERIOD`: returns period as `time` in case time component is detected in the date string.
Default: False.
+ * `PARSERS`: list of date parsers to use, in order of preference. Default:
+ :attr:`dateparser.settings.default_parsers`.
"""
_default = True
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore'
old/dateparser-0.7.2/dateparser/data/date_translation_data/en.py
new/dateparser-0.7.4/dateparser/data/date_translation_data/en.py
--- old/dateparser-0.7.2/dateparser/data/date_translation_data/en.py
2019-09-17 12:57:56.000000000 +0200
+++ new/dateparser-0.7.4/dateparser/data/date_translation_data/en.py
2020-03-06 12:31:08.000000000 +0100
@@ -784,6 +784,9 @@
"less than 1 minute ago": "45 second ago"
},
{
+ "one": "1"
+ },
+ {
"two": "2"
},
{
@@ -817,4 +820,4 @@
"twelve": "12"
}
]
-}
\ No newline at end of file
+}
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore'
old/dateparser-0.7.2/dateparser/data/date_translation_data/nb.py
new/dateparser-0.7.4/dateparser/data/date_translation_data/nb.py
--- old/dateparser-0.7.2/dateparser/data/date_translation_data/nb.py
2019-09-17 12:57:56.000000000 +0200
+++ new/dateparser-0.7.4/dateparser/data/date_translation_data/nb.py
2020-03-06 12:31:08.000000000 +0100
@@ -93,15 +93,18 @@
],
"week": [
"uke",
- "u"
+ "u",
+ "uker"
],
"day": [
"dag",
- "d"
+ "d",
+ "dager"
],
"hour": [
"time",
- "t"
+ "t",
+ "timer"
],
"minute": [
"minutt",
@@ -197,7 +200,8 @@
],
"\\1 day ago": [
"for (\\d+) døgn siden",
- "for (\\d+) d siden"
+ "for (\\d+) d siden",
+ "for (\\d+) dager siden"
],
"in \\1 hour": [
"om (\\d+) time",
@@ -235,6 +239,12 @@
"name": "nb-SJ"
}
},
+ "ago": [
+ "siden"
+ ],
+ "in": [
+ "om"
+ ],
"skip": [
" ",
".",
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/dateparser-0.7.2/dateparser/date.py
new/dateparser-0.7.4/dateparser/date.py
--- old/dateparser-0.7.2/dateparser/date.py 2019-09-17 12:57:56.000000000
+0200
+++ new/dateparser-0.7.4/dateparser/date.py 2020-03-06 12:31:08.000000000
+0100
@@ -1,7 +1,6 @@
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
-import calendar
import collections
from datetime import datetime, timedelta
from warnings import warn
@@ -15,7 +14,8 @@
from dateparser.languages.loader import LocaleDataLoader
from dateparser.conf import apply_settings
from dateparser.timezone_parser import pop_tz_offset_from_string
-from dateparser.utils import apply_timezone_from_settings
+from dateparser.utils import apply_timezone_from_settings, \
+ set_correct_day_from_settings
try:
# Python 3
@@ -128,10 +128,6 @@
return date_obj
-def get_last_day_of_month(year, month):
- return calendar.monthrange(year, month)[1]
-
-
def parse_with_formats(date_string, date_formats, settings):
""" Parse with formats and return a dictionary with 'period' and 'obj_date'.
@@ -148,12 +144,9 @@
except ValueError:
continue
else:
- # If format does not include the day, use last day of the month
- # instead of first, because the first is usually out of range.
if '%d' not in date_format:
period = 'month'
- date_obj = date_obj.replace(
- day=get_last_day_of_month(date_obj.year, date_obj.month))
+ date_obj = set_correct_day_from_settings(date_obj, settings)
if not ('%y' in date_format or '%Y' in date_format):
today = datetime.today()
@@ -182,6 +175,20 @@
self.date_formats = date_formats
self._translated_date = None
self._translated_date_with_formatting = None
+ self._parsers = {
+ 'timestamp': self._try_timestamp,
+ 'relative-time': self._try_freshness_parser,
+ 'custom-formats': self._try_given_formats,
+ 'absolute-time': self._try_parser,
+ 'base-formats': self._try_hardcoded_formats,
+ }
+ unknown_parsers = set(self._settings.PARSERS) - set(self._parsers.keys())
+ if unknown_parsers:
+ raise ValueError(
+ 'Unknown parsers found in the PARSERS setting: {}'.format(
+ ', '.join(sorted(unknown_parsers))
+ )
+ )
@classmethod
def parse(cls, locale, date_string, date_formats=None, settings=None):
@@ -189,14 +196,8 @@
return instance._parse()
def _parse(self):
- for parser in (
- self._try_timestamp,
- self._try_freshness_parser,
- self._try_given_formats,
- self._try_parser,
- self._try_hardcoded_formats,
- ):
- date_obj = parser()
+ for parser_name in self._settings.PARSERS:
+ date_obj = self._parsers[parser_name]()
if self._is_valid_date_obj(date_obj):
return date_obj
else:
@@ -355,7 +356,7 @@
self.languages = languages
self.locales = locales
self.region = region
- self.previous_locales = []
+ self.previous_locales = set()
def get_date_data(self, date_string, date_formats=None):
"""
@@ -418,7 +419,7 @@
if parsed_date:
parsed_date['locale'] = locale.shortname
if self.try_previous_locales:
- self.previous_locales.insert(0, locale)
+ self.previous_locales.add(locale)
return parsed_date
else:
return {'date_obj': None, 'period': 'day', 'locale': None}
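
The `PARSERS` change in the hunks above replaces a hard-coded tuple of methods with a name-to-callable registry that is validated and then walked in the configured order. A minimal, self-contained sketch of that pattern follows. The two parser functions and the `iso-date` name are hypothetical stand-ins, not dateparser's own parsers, and the validation runs at call time here rather than in a constructor as in the diff:

```python
from datetime import datetime, timezone


def try_timestamp(text):
    """Hypothetical parser: treat an all-digit string as a Unix timestamp."""
    if text.isdigit():
        return datetime.fromtimestamp(int(text), tz=timezone.utc)
    return None


def try_iso_date(text):
    """Hypothetical parser: accept only YYYY-MM-DD."""
    try:
        return datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return None


# Name -> callable registry, mirroring the shape of self._parsers in the diff.
PARSER_REGISTRY = {
    "timestamp": try_timestamp,
    "iso-date": try_iso_date,
}


def parse_with(parsers, text):
    """Try the named parsers in order; return the first non-None result."""
    unknown = set(parsers) - set(PARSER_REGISTRY)
    if unknown:
        raise ValueError(
            "Unknown parsers found in the PARSERS setting: {}".format(
                ", ".join(sorted(unknown))
            )
        )
    for name in parsers:
        result = PARSER_REGISTRY[name](text)
        if result is not None:
            return result
    return None
```

Passing a shorter list, for example `parse_with(["iso-date"], "1583452800")`, skips the timestamp parser entirely, which is the kind of control the new setting is meant to give.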
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/dateparser-0.7.2/dateparser/freshness_date_parser.py
new/dateparser-0.7.4/dateparser/freshness_date_parser.py
--- old/dateparser-0.7.2/dateparser/freshness_date_parser.py 2019-09-17
12:57:56.000000000 +0200
+++ new/dateparser-0.7.4/dateparser/freshness_date_parser.py 2020-03-06
12:31:08.000000000 +0100
@@ -34,7 +34,7 @@
return not list(words)
def _parse_time(self, date_string, settings):
- """Attemps to parse time part of date strings like '1 day ago, 2 PM' """
+ """Attempts to parse time part of date strings like '1 day ago, 2 PM' """
date_string = PATTERN.sub('', date_string)
date_string = re.sub(r'\b(?:ago|in)\b', '', date_string)
try:
@@ -93,7 +93,7 @@
else:
self.now = datetime.now(self.get_local_tz())
- date, period = self._parse_date(date_string)
+ date, period = self._parse_date(date_string, settings.PREFER_DATES_FROM)
if date:
date = apply_time(date, _time)
@@ -110,7 +110,7 @@
self.now = None
return date, period
- def _parse_date(self, date_string):
+ def _parse_date(self, date_string, prefer_dates_from):
if not self._are_all_words_units(date_string):
return None, None
@@ -126,7 +126,11 @@
break
td = relativedelta(**kwargs)
- if re.search(r'\bin\b', date_string):
+ if (
+ re.search(r'\bin\b', date_string) or
+ ('future' in prefer_dates_from and
+ not re.search(r'\bago\b', date_string))
+ ):
date = self.now + td
else:
date = self.now - td
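
The new condition in `_parse_date` decides whether a relative offset is added to or subtracted from "now". A small sketch of just that decision is shown below. It assumes `datetime.timedelta` as a stand-in for the `relativedelta` used in the diff, and `resolve_relative` is a hypothetical helper name:

```python
import re
from datetime import datetime, timedelta


def resolve_relative(now, delta, date_string, prefer_dates_from):
    """Apply delta forwards or backwards using the same test as the diff."""
    goes_forward = bool(
        re.search(r"\bin\b", date_string)
        or (
            "future" in prefer_dates_from
            and not re.search(r"\bago\b", date_string)
        )
    )
    return now + delta if goes_forward else now - delta


now = datetime(2020, 3, 6, 12, 0)
one_day = timedelta(days=1)

# A bare "1 day" goes backwards by default and forwards with 'future'.
default_result = resolve_relative(now, one_day, "1 day", "current_period")
future_result = resolve_relative(now, one_day, "1 day", "future")
# An explicit "ago" still wins over the 'future' preference.
ago_result = resolve_relative(now, one_day, "1 day ago", "future")
```

An explicit `in` or `ago` in the string always takes precedence; the setting only affects strings that carry neither word.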
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/dateparser-0.7.2/dateparser/languages/dictionary.py
new/dateparser-0.7.4/dateparser/languages/dictionary.py
--- old/dateparser-0.7.2/dateparser/languages/dictionary.py 2019-09-17
12:57:56.000000000 +0200
+++ new/dateparser-0.7.4/dateparser/languages/dictionary.py 2020-03-06
12:31:08.000000000 +0100
@@ -103,6 +103,10 @@
:return: True if tokens are valid, False otherwise.
"""
+ has_only_keep_tokens = not set(tokens) - set(ALWAYS_KEEP_TOKENS)
+ if has_only_keep_tokens:
+ return False
+
match_relative_regex = self._get_match_relative_regex_cache()
for token in tokens:
if any([match_relative_regex.match(token),
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/dateparser-0.7.2/dateparser/languages/locale.py
new/dateparser-0.7.4/dateparser/languages/locale.py
--- old/dateparser-0.7.2/dateparser/languages/locale.py 2019-09-17
12:57:56.000000000 +0200
+++ new/dateparser-0.7.4/dateparser/languages/locale.py 2020-03-06
12:31:08.000000000 +0100
@@ -248,13 +248,13 @@
for digit_abbreviation in digit_abbreviations:
abbreviation_string += '(?<!' + digit_abbreviation + ')' # negative lookbehind
- splitters_dict = {1: '[\.!?;…\r\n]+(?:\s|$)*', # most European, Tagalog, Hebrew, Georgian,
+ splitters_dict = {1: r'[\.!?;…\r\n]+(?:\s|$)*', # most European, Tagalog, Hebrew, Georgian,
# Indonesian, Vietnamese
- 2: '(?:[¡¿]+|[\.!?;…\r\n]+(?:\s|$))+', # Spanish
- 3: '[|!?;\r\n]+(?:\s|$)+', # Hindi and Bangla
- 4: '[。…‥\.!??!;\r\n]+(?:\s|$)+', # Japanese and Chinese
- 5: '[\r\n]+', # Thai
- 6: '[\r\n؟!\.…]+(?:\s|$)+'} # Arabic and Farsi
+ 2: r'(?:[¡¿]+|[\.!?;…\r\n]+(?:\s|$))+', # Spanish
+ 3: r'[|!?;\r\n]+(?:\s|$)+', # Hindi and Bangla
+ 4: r'[。…‥\.!??!;\r\n]+(?:\s|$)+', # Japanese and Chinese
+ 5: r'[\r\n]+', # Thai
+ 6: r'[\r\n؟!\.…]+(?:\s|$)+'} # Arabic and Farsi
if 'sentence_splitter_group' not in self.info:
split_reg = abbreviation_string + splitters_dict[1]
sentences = re.split(split_reg, string)
@@ -358,17 +358,17 @@
if 'no_word_spacing' in self.info:
return self._join(chunk, separator="", settings=settings)
else:
- return re.sub('\s{2,}', ' ', " ".join(chunk))
+ return re.sub(r'\s{2,}', ' ', " ".join(chunk))
def _token_with_digits_is_ok(self, token):
if 'no_word_spacing' in self.info:
- if re.search('[\d\.:\-/]+', token) is not None:
+ if re.search(r'[\d\.:\-/]+', token) is not None:
return True
else:
return False
else:
- if re.search('\d+', token) is not None:
+ if re.search(r'\d+', token) is not None:
return True
else:
return False
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/dateparser-0.7.2/dateparser/parser.py
new/dateparser-0.7.4/dateparser/parser.py
--- old/dateparser-0.7.2/dateparser/parser.py 2019-09-17 12:57:56.000000000
+0200
+++ new/dateparser-0.7.4/dateparser/parser.py 2020-03-06 12:31:08.000000000
+0100
@@ -7,6 +7,8 @@
from datetime import datetime
from datetime import timedelta
+from dateparser.utils import set_correct_day_from_settings, \
+ get_last_day_of_month
from dateparser.utils.strptime import strptime
@@ -185,7 +187,6 @@
'year': ['%y', '%Y'],
}
-
def __init__(self, tokens, settings):
self.settings = settings
self.tokens = list(tokens)
@@ -271,7 +272,6 @@
for token, type, _ in self.unset_tokens:
if type == 0:
params.update({attr: int(token)})
- datetime(**params)
setattr(self, '_token_%s' % attr, token)
setattr(self, attr, int(token))
@@ -301,8 +301,7 @@
(error_msgs[0] in error_text or error_msgs[1] in error_text) and
not(self._token_day or hasattr(self, '_token_weekday'))
):
- _, tail = calendar.monthrange(params['year'], params['month'])
- params['day'] = tail
+ params['day'] = get_last_day_of_month(params['year'], params['month'])
return datetime(**params)
else:
raise e
@@ -429,17 +428,10 @@
):
return dateobj
- _, tail = calendar.monthrange(dateobj.year, dateobj.month)
- options = {
- 'first': 1,
- 'last': tail,
- 'current': self.now.day
- }
-
- try:
- return dateobj.replace(day=options[self.settings.PREFER_DAY_OF_MONTH])
- except ValueError:
- return dateobj.replace(day=options['last'])
+ dateobj = set_correct_day_from_settings(
+ dateobj, self.settings, current_day=self.now.day
+ )
+ return dateobj
@classmethod
def parse(cls, datestring, settings):
@@ -525,13 +517,12 @@
class tokenizer(object):
digits = u'0123456789:'
letters = u'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
- nonwords = u"./\()\"',.;<>~!@#$%^&*|+=[]{}`~?- "
- def _isletter(self, tkn): return tkn in self.letters
+ def _isletter(self, tkn):
+ return tkn in self.letters
- def _isdigit(self, tkn): return tkn in self.digits
-
- def _isnonword(self, tkn): return tkn in self.nonwords
+ def _isdigit(self, tkn):
+ return tkn in self.digits
def __init__(self, ds):
self.instream = StringIO(ds)
@@ -543,10 +534,7 @@
if self._isletter(chara):
return 1, not self._isletter(charb)
- if self._isnonword(chara):
- return 2, not self._isnonword(charb)
-
- return '', True
+ return 2, self._isdigit(charb) or self._isletter(charb)
def tokenize(self):
token = ''
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/dateparser-0.7.2/dateparser/timezone_parser.py
new/dateparser-0.7.4/dateparser/timezone_parser.py
--- old/dateparser-0.7.2/dateparser/timezone_parser.py 2019-09-17
12:57:56.000000000 +0200
+++ new/dateparser-0.7.4/dateparser/timezone_parser.py 2020-03-06
12:31:08.000000000 +0100
@@ -33,15 +33,17 @@
def pop_tz_offset_from_string(date_string, as_offset=True):
- for name, info in _tz_offsets:
- timezone_re = info['regex']
- timezone_match = timezone_re.search(date_string)
- if timezone_match:
- start, stop = timezone_match.span()
- date_string = date_string[:start + 1] + date_string[stop:]
- return date_string, StaticTzInfo(name, info['offset']) if as_offset else name
- else:
- return date_string, None
+ if _search_regex_ignorecase.search(date_string):
+ for name, info in _tz_offsets:
+ timezone_re = info['regex']
+ timezone_match = timezone_re.search(date_string)
+ if timezone_match:
+ start, stop = timezone_match.span()
+ date_string = date_string[:start + 1] + date_string[stop:]
+ return (
+ date_string,
+ StaticTzInfo(name, info['offset']) if as_offset else name)
+ return date_string, None
def word_is_tz(word):
@@ -85,4 +87,6 @@
_search_regex_parts = []
_tz_offsets = list(build_tz_offsets(_search_regex_parts))
_search_regex = re.compile('|'.join(_search_regex_parts))
+_search_regex_ignorecase = re.compile(
+ '|'.join(_search_regex_parts), re.IGNORECASE)
local_tz_offset = get_local_tz_offset()
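
The `pop_tz_offset_from_string` change above adds a single combined, case-insensitive regex as a gate in front of the per-timezone loop, so strings with no timezone at all skip the loop. A reduced sketch of that gating idea is below. The three zone names and their patterns are made up for illustration; the real patterns come from `build_tz_offsets`, and the real function also strips the match from the string and can return a tzinfo object:

```python
import re

# Hypothetical per-zone patterns standing in for the entries of _tz_offsets.
ZONE_PATTERNS = [
    ("UTC", re.compile(r"\bUTC\b", re.IGNORECASE)),
    ("EST", re.compile(r"\bEST\b", re.IGNORECASE)),
    ("PST", re.compile(r"\bPST\b", re.IGNORECASE)),
]

# One combined pattern used as a cheap gate before the per-zone loop.
ANY_ZONE = re.compile(r"\b(?:UTC|EST|PST)\b", re.IGNORECASE)


def find_zone(date_string):
    """Return the first matching zone name, or None if there is none."""
    if not ANY_ZONE.search(date_string):
        return None  # Fast path: no zone can match, so skip the loop.
    for name, pattern in ZONE_PATTERNS:
        if pattern.search(date_string):
            return name
    return None
```

The gate costs one extra search for strings that do contain a timezone, in exchange for avoiding every per-zone search for strings that do not.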
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/dateparser-0.7.2/dateparser/timezones.py
new/dateparser-0.7.4/dateparser/timezones.py
--- old/dateparser-0.7.2/dateparser/timezones.py 2019-09-17
12:57:56.000000000 +0200
+++ new/dateparser-0.7.4/dateparser/timezones.py 2020-03-06
12:31:08.000000000 +0100
@@ -19,47 +19,48 @@
(r'(?:UTC|GMT)(\\[+-])(\d{2}):(\d{2})\$',
r'(?:UTC|GMT)\1\2\3.*'), # GMT+nnmm
],
'timezones':
- [('UTC\-12:00', -43200),
- ('UTC\-11:00', -39600),
- ('UTC\-10:00', -36000),
- ('UTC\-09:30', -34200),
- ('UTC\-09:00', -32400),
- ('UTC\-08:00', -28800),
- ('UTC\-07:00', -25200),
- ('UTC\-06:00', -21600),
- ('UTC\-05:00', -18000),
- ('UTC\-04:30', -16200),
- ('UTC\-04:00', -14400),
- ('UTC\-03:30', -12600),
- ('UTC\-03:00', -10800),
- ('UTC\-02:30', -9000),
- ('UTC\-02:00', -7200),
- ('UTC\-01:00', -3600),
- ('UTC\+00:00', 0),
- ('UTC\+01:00', 3600),
- ('UTC\+02:00', 7200),
- ('UTC\+03:00', 10800),
- ('UTC\+03:30', 12600),
- ('UTC\+04:00', 14400),
- ('UTC\+04:30', 16200),
- ('UTC\+05:00', 18000),
- ('UTC\+05:30', 19800),
- ('UTC\+05:45', 20700),
- ('UTC\+06:00', 21600),
- ('UTC\+06:30', 23400),
- ('UTC\+07:00', 25200),
- ('UTC\+08:00', 28800),
- ('UTC\+08:45', 31500),
- ('UTC\+09:00', 32400),
- ('UTC\+09:30', 34200),
- ('UTC\+10:00', 36000),
- ('UTC\+10:30', 37800),
- ('UTC\+11:00', 39600),
- ('UTC\+11:30', 41400),
- ('UTC\+12:00', 43200),
- ('UTC\+12:45', 45900),
- ('UTC\+13:00', 46800),
- ('UTC\+14:00', 50400)]
+ [(r'UTC\-12:00', -43200),
+ (r'UTC\-11:00', -39600),
+ (r'UTC\-10:00', -36000),
+ (r'UTC\-09:30', -34200),
+ (r'UTC\-09:00', -32400),
+ (r'UTC\-08:00', -28800),
+ (r'UTC\-07:00', -25200),
+ (r'UTC\-06:00', -21600),
+ (r'UTC\-05:00', -18000),
+ (r'UTC\-04:30', -16200),
+ (r'UTC\-04:00', -14400),
+ (r'UTC\-03:30', -12600),
+ (r'UTC\-03:00', -10800),
+ (r'UTC\-02:30', -9000),
+ (r'UTC\-02:00', -7200),
+ (r'UTC\-01:00', -3600),
+ (r'UTC\-00:00', 0),
+ (r'UTC\+00:00', 0),
+ (r'UTC\+01:00', 3600),
+ (r'UTC\+02:00', 7200),
+ (r'UTC\+03:00', 10800),
+ (r'UTC\+03:30', 12600),
+ (r'UTC\+04:00', 14400),
+ (r'UTC\+04:30', 16200),
+ (r'UTC\+05:00', 18000),
+ (r'UTC\+05:30', 19800),
+ (r'UTC\+05:45', 20700),
+ (r'UTC\+06:00', 21600),
+ (r'UTC\+06:30', 23400),
+ (r'UTC\+07:00', 25200),
+ (r'UTC\+08:00', 28800),
+ (r'UTC\+08:45', 31500),
+ (r'UTC\+09:00', 32400),
+ (r'UTC\+09:30', 34200),
+ (r'UTC\+10:00', 36000),
+ (r'UTC\+10:30', 37800),
+ (r'UTC\+11:00', 39600),
+ (r'UTC\+11:30', 41400),
+ (r'UTC\+12:00', 43200),
+ (r'UTC\+12:45', 45900),
+ (r'UTC\+13:00', 46800),
+ (r'UTC\+14:00', 50400)]
},
{
'regex_patterns':
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn'
'--exclude=.svnignore' old/dateparser-0.7.2/dateparser/utils/__init__.py
new/dateparser-0.7.4/dateparser/utils/__init__.py
--- old/dateparser-0.7.2/dateparser/utils/__init__.py 2019-09-17
12:57:56.000000000 +0200
+++ new/dateparser-0.7.4/dateparser/utils/__init__.py 2020-03-06
12:31:08.000000000 +0100
@@ -1,7 +1,9 @@
# -*- coding: utf-8 -*-
+import calendar
import logging
import types
import unicodedata
+from datetime import datetime
import regex as re
from tzlocal import get_localzone
@@ -133,6 +135,24 @@
return date_obj
+def get_last_day_of_month(year, month):
+ return calendar.monthrange(year, month)[1]
+
+
+def set_correct_day_from_settings(date_obj, settings, current_day=None):
+ """ Set correct day attending the `PREFER_DAY_OF_MONTH` setting."""
+ options = {
+ 'first': 1,
+ 'last': get_last_day_of_month(date_obj.year, date_obj.month),
+ 'current': current_day or datetime.now().day
+ }
+
+ try:
+ return date_obj.replace(day=options[settings.PREFER_DAY_OF_MONTH])
+ except ValueError:
+ return date_obj.replace(day=options['last'])
+
+
def registry(cls):
def choose(creator):
def constructor(cls, *args, **kwargs):
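Note on `set_correct_day_from_settings` above: the fallback in the `except ValueError` branch matters when the preferred day does not exist in the target month (for example, day 31 in February). A minimal standalone sketch of the same logic, assuming the preference is passed as a plain string instead of a settings object (`pick_day` and `last_day_of_month` are hypothetical names used only here):

```python
import calendar
from datetime import datetime


def last_day_of_month(year, month):
    # monthrange returns (weekday of the 1st, number of days in the month).
    return calendar.monthrange(year, month)[1]


def pick_day(date_obj, preference, current_day=None):
    """Replace the day of date_obj according to 'first', 'last' or 'current'."""
    options = {
        'first': 1,
        'last': last_day_of_month(date_obj.year, date_obj.month),
        'current': current_day or datetime.now().day,
    }
    try:
        return date_obj.replace(day=options[preference])
    except ValueError:
        # The preferred day is out of range for this month: clamp to the last day.
        return date_obj.replace(day=options['last'])
```

For instance, `pick_day(datetime(2015, 2, 1), 'current', current_day=31)` cannot produce 31 February, so it falls back to the last day of that month.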
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dateparser-0.7.2/dateparser/utils/strptime.py new/dateparser-0.7.4/dateparser/utils/strptime.py
--- old/dateparser-0.7.2/dateparser/utils/strptime.py 2019-09-17 12:57:56.000000000 +0200
+++ new/dateparser-0.7.4/dateparser/utils/strptime.py 2020-03-06 12:31:08.000000000 +0100
@@ -10,9 +10,9 @@
TIME_MATCHER = re.compile(
r'.*?'
r'(?P<hour>2[0-3]|[0-1]\d|\d):'
- '(?P<minute>[0-5]\d|\d):'
- '(?P<second>6[0-1]|[0-5]\d|\d)'
- '\.(?P<microsecond>[0-9]{1,6})'
+ r'(?P<minute>[0-5]\d|\d):'
+ r'(?P<second>6[0-1]|[0-5]\d|\d)'
+ r'\.(?P<microsecond>[0-9]{1,6})'
)
MS_SEARCHER = re.compile(r'\.(?P<microsecond>[0-9]{1,6})')
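Note on the `TIME_MATCHER` change above: adjacent string literals are concatenated, but the `r` prefix applies to each literal separately, so the old fragments containing `\d` and `\.` were ordinary strings with unrecognized escape sequences. That is presumably the source of the DeprecationWarning mentioned in the 0.7.3 history entry. The sketch below compiles the corrected pattern with the standard-library `re` module (an assumption; the package imports `regex as re`), and `split_time` is a hypothetical helper added for illustration.

```python
import re

# Same pattern as the patched TIME_MATCHER: every fragment is a raw string,
# so sequences such as \d and \. reach the regex engine unchanged.
TIME_MATCHER = re.compile(
    r'.*?'
    r'(?P<hour>2[0-3]|[0-1]\d|\d):'
    r'(?P<minute>[0-5]\d|\d):'
    r'(?P<second>6[0-1]|[0-5]\d|\d)'
    r'\.(?P<microsecond>[0-9]{1,6})'
)


def split_time(text):
    """Return the named time components found in text, or None if absent."""
    match = TIME_MATCHER.match(text)
    return match.groupdict() if match else None
```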
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dateparser-0.7.2/dateparser.egg-info/PKG-INFO new/dateparser-0.7.4/dateparser.egg-info/PKG-INFO
--- old/dateparser-0.7.2/dateparser.egg-info/PKG-INFO 2019-09-17 12:59:32.000000000 +0200
+++ new/dateparser-0.7.4/dateparser.egg-info/PKG-INFO 2020-03-06 12:32:39.000000000 +0100
@@ -1,6 +1,6 @@
Metadata-Version: 1.2
Name: dateparser
-Version: 0.7.2
+Version: 0.7.4
Summary: Date parsing library designed to parse dates from HTML pages
Home-page: https://github.com/scrapinghub/dateparser
Author: Scrapinghub
@@ -34,6 +34,7 @@
`dateparser` provides modules to easily parse localized dates in almost
any string formats commonly found on web pages.
+ .. contents::
Documentation
=============
@@ -121,7 +122,7 @@
>>> # parsing ambiguous date
>>> parse('02-03-2016') # assumes english language, uses MDY date order
- datetime.datetime(2016, 3, 2, 0, 0)
+ datetime.datetime(2016, 2, 3, 0, 0)
>>> parse('le 02-03-2016') # detects french, uses DMY date order
datetime.datetime(2016, 3, 2, 0, 0)
@@ -497,6 +498,39 @@
History
=======
+ 0.7.4 (2020-03-06)
+ ------------------
+
+ Improvements:
+
+ * Fixed Python 2.7 tests
+
+
+ 0.7.3 (2020-03-06)
+ ------------------
+
+ New features:
+
+ * Extended Norwegian support (see https://github.com/scrapinghub/dateparser/pull/598)
+ * Implement a PARSERS setting (see https://github.com/scrapinghub/dateparser/pull/603)
+
+
+ Improvements:
+
+ * Add support for `PREFER_DATES_FROM` in relative/freshness parser (https://github.com/scrapinghub/dateparser/pull/414)
+ * Add support for `PREFER_DAY_OF_MONTH` in base-formats parser (see https://github.com/scrapinghub/dateparser/pull/611)
+ * Added UTC -00:00 as a valid offset (see https://github.com/scrapinghub/dateparser/pull/574)
+ * Fix support for “one” (see https://github.com/scrapinghub/dateparser/pull/593)
+ * Fix TypeError when parsing some invalid dates (see https://github.com/scrapinghub/dateparser/pull/536)
+ * Fix tokenizer for non recognized characters (see https://github.com/scrapinghub/dateparser/pull/622)
+ * Prevent installing regex 2019.02.19 (https://github.com/scrapinghub/dateparser/pull/600)
+ * Resolve DeprecationWarning related to raw string escape sequences (see https://github.com/scrapinghub/dateparser/pull/596)
+ * Implement a tox environment to build the documentation (https://github.com/scrapinghub/dateparser/pull/604)
+ * Improve tests stability (see https://github.com/scrapinghub/dateparser/pull/591, https://github.com/scrapinghub/dateparser/pull/605)
+ * Documentation improvements (see https://github.com/scrapinghub/dateparser/pull/510, https://github.com/scrapinghub/dateparser/pull/578, https://github.com/scrapinghub/dateparser/pull/619, https://github.com/scrapinghub/dateparser/pull/614, https://github.com/scrapinghub/dateparser/pull/620)
+ * Performance improvements (see https://github.com/scrapinghub/dateparser/pull/570, https://github.com/scrapinghub/dateparser/pull/569, https://github.com/scrapinghub/dateparser/pull/625)
+
+
0.7.2 (2019-09-17)
------------------
@@ -614,8 +648,8 @@
* `DateDataParser` now also returns detected language in the result dictionary.
* Explicit and lucid timezone conversion for a given datestring using `TIMEZONE`, `TO_TIMEZONE` settings.
- * Added Hungarian langauge.
- * Added setting, `STRICT_PARSING` to ignore imcomplete dates.
+ * Added Hungarian language.
+ * Added setting, `STRICT_PARSING` to ignore incomplete dates.
Improvements:
@@ -697,7 +731,7 @@
* Fixed problem with caching `datetime.now` in `FreshnessDateDataParser`.
* Added month names and week day names abbreviations to several languages.
- * More simplifications for Russian and Ukranian languages.
+ * More simplifications for Russian and Ukrainian languages.
* Fixed problem with parsing time component of date strings with several kinds of apostrophes.
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dateparser-0.7.2/dateparser.egg-info/requires.txt new/dateparser-0.7.4/dateparser.egg-info/requires.txt
--- old/dateparser-0.7.2/dateparser.egg-info/requires.txt 2019-09-17 12:59:32.000000000 +0200
+++ new/dateparser-0.7.4/dateparser.egg-info/requires.txt 2020-03-06 12:32:39.000000000 +0100
@@ -1,4 +1,4 @@
python-dateutil
pytz
-regex
+regex!=2019.02.19
tzlocal
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dateparser-0.7.2/dateparser_data/settings.py new/dateparser-0.7.4/dateparser_data/settings.py
--- old/dateparser-0.7.2/dateparser_data/settings.py 2019-09-17 12:57:56.000000000 +0200
+++ new/dateparser-0.7.4/dateparser_data/settings.py 2020-03-06 12:31:08.000000000 +0100
@@ -1,6 +1,13 @@
+default_parsers = [
+ 'timestamp',
+ 'relative-time',
+ 'custom-formats',
+ 'absolute-time',
+ 'base-formats',
+]
+
settings = {
'PREFER_DATES_FROM': 'current_period',
- 'SUPPORT_BEFORE_COMMON_ERA': False,
'PREFER_DAY_OF_MONTH': 'current',
'SKIP_TOKENS': ["t"],
'SKIP_TOKENS_PARSER': ["t", "year", "hour", "minute"],
@@ -14,4 +21,5 @@
'FUZZY': False,
'STRICT_PARSING': False,
'RETURN_TIME_AS_PERIOD': False,
+ 'PARSERS': default_parsers,
}
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dateparser-0.7.2/docs/usage.rst new/dateparser-0.7.4/docs/usage.rst
--- old/dateparser-0.7.2/docs/usage.rst 2019-09-17 12:57:56.000000000 +0200
+++ new/dateparser-0.7.4/docs/usage.rst 2020-03-06 12:31:08.000000000 +0100
@@ -55,7 +55,7 @@
>>> # parsing ambiguous date
>>> parse('02-03-2016') # assumes english language, uses MDY date order
- datetime.datetime(2016, 3, 2, 0, 0)
+ datetime.datetime(2016, 2, 3, 0, 0)
>>> parse('le 02-03-2016') # detects french, hence, uses DMY date order
datetime.datetime(2016, 3, 2, 0, 0)
@@ -146,6 +146,45 @@
>>> ddp.get_date_data(u'vr jan 24, 2014 12:49')
{'date_obj': datetime.datetime(2014, 1, 24, 12, 49), 'period': 'time', 'locale': 'nl'}
+``PARSERS`` is a list of names of parsers to try, allowing to customize which
+parsers are tried against the input date string, and in which order they are
+tried.
+
+The following parsers exist:
+
+- ``'timestamp'``: If the input string starts with 10 digits, optionally
+ followed by additional digits or a period (``.``), those first 10 digits
+ are interpreted as `Unix time <https://en.wikipedia.org/wiki/Unix_time>`_.
+
+- ``'relative-time'``: Parses dates and times expressed in relation to the
+ current date and time (e.g. “1 day ago”, “in 2 weeks”).
+
+- ``'custom-formats'``: Parses dates that match one of the date formats in
+ the list of the ``date_formats`` parameter of :func:`dateparser.parse` or
+ :meth:`DateDataParser.get_date_data
+ <dateparser.date.DateDataParser.get_date_data>`.
+
+- ``'absolute-time'``: Parses dates and times expressed in absolute form
+ (e.g. “May 4th”, “1991-05-17”). It takes into account settings such as
+ ``DATE_ORDER`` or ``PREFER_LOCALE_DATE_ORDER``.
+
+- ``'base-formats'``: Parses dates that match one of the following date
+ formats::
+
+ %B %d, %Y, %I:%M:%S %p
+ %b %d, %Y at %I:%M %p
+ %d %B %Y %H:%M:%S
+ %A, %B %d, %Y
+ %Y-%m-%dT%H:%M:%S.%fZ
+
+:data:`dateparser.settings.default_parsers` contains the default value of
+``PARSERS`` (the list above, in that order) and can be used to write code that
+changes the parsers to try without skipping parsers that may be added to
+Dateparser in the future. For example, to ignore relative times:
+
+>>> from dateparser.settings import default_parsers
+>>> parsers = [parser for parser in default_parsers if parser != 'relative-time']
+>>> parse('today', settings={'PARSERS': parsers})
Language Detection
++++++++++++++++++
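Note on the `PARSERS` documentation above: the setting is an ordered list of parser names, and per the new test in tests/test_date.py an unknown name raises `ValueError`. The sketch below imitates that filtering and validation without importing dateparser; it is not the library's implementation. `DEFAULT_PARSERS` copies the list from dateparser_data/settings.py, while `without`, `check_parsers`, and the comma-joined message for several unknown names are assumptions made for illustration.

```python
# Copy of default_parsers from dateparser_data/settings.py in this diff.
DEFAULT_PARSERS = [
    'timestamp',
    'relative-time',
    'custom-formats',
    'absolute-time',
    'base-formats',
]


def without(parsers, *excluded):
    """Return parsers in their original order, minus the excluded names."""
    return [name for name in parsers if name not in excluded]


def check_parsers(parsers, known=DEFAULT_PARSERS):
    """Reject names that are not known parsers (hypothetical validation)."""
    unknown = [name for name in parsers if name not in known]
    if unknown:
        # Message text follows the test expectation for a single unknown name;
        # joining several names with ', ' is an assumption.
        raise ValueError(
            'Unknown parsers found in the PARSERS setting: {}'.format(
                ', '.join(unknown)))
    return list(parsers)
```

Filtering the default list, as opposed to hard-coding the four remaining names, keeps any parser added in a later release enabled, which is the motivation given in the documentation text.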
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dateparser-0.7.2/setup.py new/dateparser-0.7.4/setup.py
--- old/dateparser-0.7.2/setup.py 2019-09-17 12:57:56.000000000 +0200
+++ new/dateparser-0.7.4/setup.py 2020-03-06 12:31:08.000000000 +0100
@@ -28,7 +28,8 @@
install_requires=[
'python-dateutil',
'pytz',
- 'regex',
+ # https://bitbucket.org/mrabarnett/mrab-regex/issues/314/import-error-no-module-named
+ 'regex !=2019.02.19',
'tzlocal',
],
extra_requires={
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dateparser-0.7.2/tests/requirements.txt new/dateparser-0.7.4/tests/requirements.txt
--- old/dateparser-0.7.2/tests/requirements.txt 2019-09-17 12:57:56.000000000 +0200
+++ new/dateparser-0.7.4/tests/requirements.txt 2020-03-06 12:31:08.000000000 +0100
@@ -3,3 +3,4 @@
parameterized
six
coverage
+flake8
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dateparser-0.7.2/tests/test_date.py new/dateparser-0.7.4/tests/test_date.py
--- old/dateparser-0.7.2/tests/test_date.py 2019-09-17 12:57:56.000000000 +0200
+++ new/dateparser-0.7.4/tests/test_date.py 2020-03-06 12:31:08.000000000 +0100
@@ -4,6 +4,7 @@
import unittest
from collections import OrderedDict
+from copy import copy
from datetime import datetime, timedelta
from mock import Mock, patch
@@ -12,7 +13,6 @@
import dateparser
from dateparser import date
-from dateparser.date import get_last_day_of_month
from dateparser.conf import settings
from tests import BaseTestCase
@@ -274,18 +274,29 @@
@parameterized.expand([
param(date_string='August 2014', date_formats=['%B %Y'],
- expected_year=2014, expected_month=8),
+ expected_year=2014, expected_month=8, today_day=12,
+ prefer_day_of_month='first', expected_day=1),
+ param(date_string='August 2014', date_formats=['%B %Y'],
+ expected_year=2014, expected_month=8, today_day=12,
+ prefer_day_of_month='last', expected_day=31),
+ param(date_string='August 2014', date_formats=['%B %Y'],
+ expected_year=2014, expected_month=8, today_day=12,
+ prefer_day_of_month='current', expected_day=12),
])
- def test_should_use_last_day_of_month_for_dates_without_day(
- self, date_string, date_formats, expected_year, expected_month
+ def test_should_use_correct_day_from_settings_for_dates_without_day(
+ self, date_string, date_formats, expected_year, expected_month,
+ today_day, prefer_day_of_month, expected_day
):
- self.given_now(2014, 8, 12)
- self.when_date_is_parsed_with_formats(date_string, date_formats)
+ self.given_now(2014, 8, today_day)
+ settings_mod = copy(settings)
+ settings_mod.PREFER_DAY_OF_MONTH = prefer_day_of_month
+ self.when_date_is_parsed_with_formats(date_string, date_formats, settings_mod)
self.then_date_was_parsed()
self.then_parsed_period_is('month')
self.then_parsed_date_is(datetime(year=expected_year,
month=expected_month,
- day=get_last_day_of_month(expected_year, expected_month)))
+ day=expected_day))
+
@parameterized.expand([
param(date_string='25-03-14', date_formats='%d-%m-%y',
expected_result=datetime(2014, 3, 25)),
@@ -303,9 +314,10 @@
datetime_mock.now = Mock(return_value=now)
datetime_mock.today = Mock(return_value=now)
self.add_patch(patch('dateparser.date.datetime', new=datetime_mock))
+ self.add_patch(patch('dateparser.utils.datetime', new=datetime_mock))
- def when_date_is_parsed_with_formats(self, date_string, date_formats):
- self.result = date.parse_with_formats(date_string, date_formats, settings)
+ def when_date_is_parsed_with_formats(self, date_string, date_formats, custom_settings=None):
+ self.result = date.parse_with_formats(date_string, date_formats, custom_settings or settings)
def then_date_was_not_parsed(self):
self.assertIsNotNone(self.result)
@@ -437,6 +449,12 @@
TypeError, ["Date formats should be list, tuple or set of strings",
"'{}' object is not iterable".format(type(date_formats).__name__)])
+ def test_parsing_date_using_unknown_parsers_must_raise_error(self):
+ self.given_parser(settings={'PARSERS': ['foo']})
+ self.when_date_string_is_parsed('2020-02-19')
+ self.then_error_was_raised(
+ ValueError, ["Unknown parsers found in the PARSERS setting: foo"])
+
@parameterized.expand([
param(date_string={"date": "12/11/1998"}),
param(date_string=[2017, 12, 1]),
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dateparser-0.7.2/tests/test_date_parser.py new/dateparser-0.7.4/tests/test_date_parser.py
--- old/dateparser-0.7.2/tests/test_date_parser.py 2019-09-17 12:57:56.000000000 +0200
+++ new/dateparser-0.7.4/tests/test_date_parser.py 2020-03-06 12:31:08.000000000 +0100
@@ -493,7 +493,7 @@
param('April 2015', today=datetime(2015, 2, 28), expected=datetime(2015, 4, 28)),
param('December 2014', today=datetime(2015, 2, 15), expected=datetime(2014, 12, 15)),
])
- def test_dates_with_day_missing_prefering_current_day_of_month(
+ def test_dates_with_day_missing_preferring_current_day_of_month(
self, date_string, today=None, expected=None):
self.given_parser(settings={'PREFER_DAY_OF_MONTH': 'current', 'RELATIVE_BASE': today})
self.when_date_is_parsed(date_string)
@@ -508,7 +508,7 @@
param('April 2015', today=datetime(2015, 2, 28), expected=datetime(2015, 4, 30)),
param('December 2014', today=datetime(2015, 2, 15), expected=datetime(2014, 12, 31)),
])
- def test_dates_with_day_missing_prefering_last_day_of_month(
+ def test_dates_with_day_missing_preferring_last_day_of_month(
self, date_string, today=None, expected=None):
self.given_parser(settings={'PREFER_DAY_OF_MONTH': 'last', 'RELATIVE_BASE': today})
self.when_date_is_parsed(date_string)
@@ -523,7 +523,7 @@
param('April 2015', today=datetime(2015, 2, 28), expected=datetime(2015, 4, 1)),
param('December 2014', today=datetime(2015, 2, 15), expected=datetime(2014, 12, 1)),
])
- def test_dates_with_day_missing_prefering_first_day_of_month(
+ def test_dates_with_day_missing_preferring_first_day_of_month(
self, date_string, today=None, expected=None):
self.given_parser(settings={'PREFER_DAY_OF_MONTH': 'first', 'RELATIVE_BASE': today})
self.when_date_is_parsed(date_string)
@@ -703,6 +703,20 @@
self.then_date_was_parsed_by_date_parser()
self.then_date_obj_exactly_is(expected)
+ @parameterized.expand([
+ param('::', None),
+ param('..', None),
+ param(' ', None),
+ param('--', None),
+ param('//', None),
+ param('++', None),
+ ])
+ def test_parsing_strings_containing_only_separator_tokens(self, date_string, expected):
+ self.given_parser()
+ self.when_date_is_parsed(date_string)
+ self.then_period_is('day')
+ self.then_date_obj_exactly_is(expected)
+
def given_local_tz_offset(self, offset):
self.add_patch(
patch.object(dateparser.timezone_parser,
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dateparser-0.7.2/tests/test_freshness_date_parser.py new/dateparser-0.7.4/tests/test_freshness_date_parser.py
--- old/dateparser-0.7.2/tests/test_freshness_date_parser.py 2019-09-17 12:57:56.000000000 +0200
+++ new/dateparser-0.7.4/tests/test_freshness_date_parser.py 2020-03-06 12:31:08.000000000 +0100
@@ -1445,8 +1445,10 @@
param('the day before yesterday 16:50', date(2014, 8, 30), time(16, 50)),
param('2 Tage 18:50', date(2014, 8, 30), time(18, 50)),
param('1 day ago at 2 PM', date(2014, 8, 31), time(14, 0)),
+ param('one day ago at 2 PM', date(2014, 8, 31), time(14, 0)),
param('Dnes v 12:40', date(2014, 9, 1), time(12, 40)),
param('1 week ago at 12:00 am', date(2014, 8, 25), time(0, 0)),
+ param('one week ago at 12:00 am', date(2014, 8, 25), time(0, 0)),
param('tomorrow at 2 PM', date(2014, 9, 2), time(14, 0)),
])
def test_freshness_date_with_time(self, date_string, date, time):
@@ -1560,6 +1562,30 @@
self.given_date_string(date_string)
self.when_date_is_parsed()
self.then_date_is(date)
+ self.then_time_is(time)
+
+ @parameterized.expand([
+ param('3 days', date(2010, 6, 1), time(13, 15)),
+ param('2 years', date(2008, 6, 4), time(13, 15)),
+ ])
+ def test_freshness_date_with_relative_base_past(self, date_string, date, time):
+ self.given_parser(settings={'PREFER_DATES_FROM': 'past',
+ 'RELATIVE_BASE': datetime(2010, 6, 4, 13, 15)})
+ self.given_date_string(date_string)
+ self.when_date_is_parsed()
+ self.then_date_is(date)
+ self.then_time_is(time)
+
+ @parameterized.expand([
+ param('3 days', date(2010, 6, 7), time(13, 15)),
+ param('2 years', date(2012, 6, 4), time(13, 15)),
+ ])
+ def test_freshness_date_with_relative_base_future(self, date_string, date, time):
+ self.given_parser(settings={'PREFER_DATES_FROM': 'future',
+ 'RELATIVE_BASE': datetime(2010, 6, 4, 13, 15)})
+ self.given_date_string(date_string)
+ self.when_date_is_parsed()
+ self.then_date_is(date)
self.then_time_is(time)
def given_date_string(self, date_string):
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dateparser-0.7.2/tests/test_languages.py new/dateparser-0.7.4/tests/test_languages.py
--- old/dateparser-0.7.2/tests/test_languages.py 2019-09-17 12:57:56.000000000 +0200
+++ new/dateparser-0.7.4/tests/test_languages.py 2020-03-06 12:31:08.000000000 +0100
@@ -1415,6 +1415,13 @@
param('nb', "om 6 timer", "in 6 hour"),
param('nb', "om 2 måneder", "in 2 month"),
param('nb', "forrige uke", "1 week ago"),
+ param('nb', "for 3 dager siden", "3 day ago"),
+ param('nb', "for 3 timer siden", "3 hour ago"),
+ param('nb', '3 dager siden', '3 day ago'),
+ param('nb', "3 mnd siden", "3 month ago"),
+ param('nb', "2 uker siden", "2 week ago"),
+ param('nb', "1 uke siden", "1 week ago"),
+ param('nb', "10 timer siden", "10 hour ago"),
# nd
param('nd', "kusasa", "in 1 day"),
param('nd', "izolo", "1 day ago"),
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dateparser-0.7.2/tests/test_parser.py new/dateparser-0.7.4/tests/test_parser.py
--- old/dateparser-0.7.2/tests/test_parser.py 2019-09-17 12:57:56.000000000 +0200
+++ new/dateparser-0.7.4/tests/test_parser.py 2020-03-06 12:31:08.000000000 +0100
@@ -1,3 +1,5 @@
+# coding: utf-8
+
from datetime import datetime, time
from parameterized import parameterized, param
@@ -49,6 +51,11 @@
expected_types=[1, 2, 0, 2, 0, 2, 0, 2, 1],
),
param(
+ date_string=u"Oct 1 2018 4:40 PM EST —",
+ expected_tokens=['Oct', ' ', '1', ' ', '2018', ' ', '4:40', ' ', 'PM', ' ', 'EST', u' —'],
+ expected_types=[1, 2, 0, 2, 0, 2, 0, 2, 1, 2, 1, 2],
+ ),
+ param(
date_string=tokenizer.digits,
expected_tokens=[tokenizer.digits],
expected_types=[0],
@@ -59,8 +66,8 @@
expected_types=[1],
),
param(
- date_string=tokenizer.nonwords,
- expected_tokens=[tokenizer.nonwords],
+ date_string=u"./\()\"',.;<>~!@#$%^&*|+=[]{}`~?-—– 😊", # unrecognized characters
+ expected_tokens=[u"./\()\"',.;<>~!@#$%^&*|+=[]{}`~?-—– 😊"],
expected_types=[2],
),
])
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dateparser-0.7.2/tests/test_search.py new/dateparser-0.7.4/tests/test_search.py
--- old/dateparser-0.7.2/tests/test_search.py 2019-09-17 12:57:56.000000000 +0200
+++ new/dateparser-0.7.4/tests/test_search.py 2020-03-06 12:31:08.000000000 +0100
@@ -6,6 +6,7 @@
from dateparser.search.search import DateSearchWithDetection
from dateparser.search import search_dates
from dateparser.conf import Settings, apply_settings
+from dateparser_data.settings import default_parsers
import datetime
@@ -274,6 +275,14 @@
'Aug 06, 2018 05:05 PM CDT',
datetime.datetime(2018, 8, 6, 17, 5,
tzinfo=StaticTzInfo('CDT', datetime.timedelta(seconds=-18000))))],
settings={'RELATIVE_BASE': datetime.datetime(2000, 1, 1)}),
+ param('en', '25th march 2015 , i need this report today.',
+ [('25th march 2015', datetime.datetime(2015, 3, 25))],
+ settings={'PARSERS': [parser for parser in default_parsers
+ if parser != 'relative-time']}),
+ param('en', '25th march 2015 , i need this report today.',
+ [('25th march 2015', datetime.datetime(2015, 3, 25)),
+ ('today', datetime.datetime(2000, 1, 1))],
+ settings={'RELATIVE_BASE': datetime.datetime(2000, 1, 1)}),
# Filipino / Tagalog
param('tl', 'Maraming namatay sa mga Hapon hanggang sila\'y sumuko noong Agosto 15, 1945.',
@@ -426,8 +435,8 @@
('February 1st', datetime.datetime(2017, 2, 1, 0, 0))]),
param('en', '2014 was good! October was excellent!'
' Friday, 21 was especially good!',
- [('2014', datetime.datetime(2014, datetime.datetime.today().month, datetime.datetime.today().day, 0, 0)),
- ('October', datetime.datetime(2014, 10, datetime.datetime.today().day, 0, 0)),
+ [('2014', datetime.datetime(2014, datetime.datetime.utcnow().month, datetime.datetime.utcnow().day, 0, 0)),
+ ('October', datetime.datetime(2014, 10, datetime.datetime.utcnow().day, 0, 0)),
('Friday, 21', datetime.datetime(2014, 10, 21, 0, 0))]),
# Russian
@@ -469,7 +478,7 @@
('July 13th', datetime.datetime(2014, 7, 13, 0, 0)),
('July 14th', datetime.datetime(2014, 7, 14, 0, 0))]),
param('en', '2014. July 13th July 14th',
- [('2014', datetime.datetime(2014, datetime.datetime.today().month, datetime.datetime.today().day, 0, 0)),
+ [('2014', datetime.datetime(2014, datetime.datetime.utcnow().month, datetime.datetime.utcnow().day, 0, 0)),
('July 13th', datetime.datetime(2014, 7, 13, 0, 0)),
('July 14th', datetime.datetime(2014, 7, 14, 0, 0))]),
param('en', 'July 13th 2014 July 14th 2014',
@@ -482,16 +491,16 @@
[('July 13th, 2014', datetime.datetime(2014, 7, 13, 0, 0)),
('July 14th, 2014', datetime.datetime(2014, 7, 14, 0, 0))]),
param('en', '2014. July 12th, July 13th, July 14th',
- [('2014', datetime.datetime(2014, datetime.datetime.today().month, datetime.datetime.today().day, 0, 0)),
+ [('2014', datetime.datetime(2014, datetime.datetime.utcnow().month, datetime.datetime.utcnow().day, 0, 0)),
('July 12th', datetime.datetime(2014, 7, 12, 0, 0)),
('July 13th', datetime.datetime(2014, 7, 13, 0, 0)),
('July 14th', datetime.datetime(2014, 7, 14, 0, 0))]),
# Swedish
param('sv', '1938–1939 marscherade tyska soldater i Österrike samtidigt som '
'österrikiska soldater marscherade i Berlin.',
- [('1938', datetime.datetime(1938, datetime.datetime.today().month, datetime.datetime.today().day, 0, 0)),
+ [('1938', datetime.datetime(1938, datetime.datetime.utcnow().month, datetime.datetime.utcnow().day, 0, 0)),
('1939', datetime.datetime(1939,
- datetime.datetime.today().month, datetime.datetime.today().day, 0, 0))]),
+ datetime.datetime.utcnow().month, datetime.datetime.utcnow().day, 0, 0))]),
# German
param('de', 'Verteidiger der Stadt kapitulierten am 2. Mai 1945. Am 8. Mai 1945 (VE-Day) trat '
'bedingungslose Kapitulation der Wehrmacht in Kraft',
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dateparser-0.7.2/tests/test_timezone_parser.py new/dateparser-0.7.4/tests/test_timezone_parser.py
--- old/dateparser-0.7.2/tests/test_timezone_parser.py 2019-09-17 12:57:56.000000000 +0200
+++ new/dateparser-0.7.4/tests/test_timezone_parser.py 2020-03-06 12:31:08.000000000 +0100
@@ -28,6 +28,7 @@
param('Nov 25 2014 | 10:17 pm -0930', -9.5),
param('20 Oct 2014 | 05:17 am -1200', -12),
param('20 Oct 2014 | 05:17 am +0000', 0),
+ param('20 Oct 2014 | 05:17 am -0000', 0),
param('15 May 2004', None),
param('Wed Aug 05 12:00:00 EDTERR 2015', None),
param('Wed Aug 05 12:00:00 EDT 2015', -4),
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dateparser-0.7.2/tests/test_utils.py new/dateparser-0.7.4/tests/test_utils.py
--- old/dateparser-0.7.2/tests/test_utils.py 2019-09-17 12:57:56.000000000 +0200
+++ new/dateparser-0.7.4/tests/test_utils.py 2020-03-06 12:31:08.000000000 +0100
@@ -5,8 +5,8 @@
from parameterized import parameterized, param
from dateparser.utils import (
find_date_separator, localize_timezone, apply_timezone,
- apply_timezone_from_settings, registry
-)
+ apply_timezone_from_settings, registry,
+ get_last_day_of_month)
from pytz import UnknownTimeZoneError, utc
from dateparser.conf import settings
@@ -20,10 +20,10 @@
def given_date_format(self, date_format):
self.date_format = date_format
- def when_date_seperator_is_parsed(self):
+ def when_date_separator_is_parsed(self):
self.result = find_date_separator(self.date_format)
- def then_date_seperator_is(self, sep):
+ def then_date_separator_is(self, sep):
self.assertEqual(self.result, sep)
@staticmethod
@@ -41,8 +41,8 @@
])
def test_separator_extraction(self, date_format, expected_sep):
self.given_date_format(date_format)
- self.when_date_seperator_is_parsed()
- self.then_date_seperator_is(expected_sep)
+ self.when_date_separator_is_parsed()
+ self.then_date_separator_is(expected_sep)
@parameterized.expand([
param(datetime(2015, 12, 12), timezone='UTC', zone='UTC'),
@@ -104,3 +104,23 @@
def test_registry_when_get_keys_not_implemented(self):
cl = self.make_class_without_get_keys()
self.assertRaises(NotImplementedError, registry, cl)
+
+ @parameterized.expand([
+ param(2111, 1, 31),
+ param(1999, 2, 28), # normal year
+ param(1996, 2, 29), # leap and not centurial year
+ param(2000, 2, 29), # leap and centurial year
+ param(1700, 2, 28), # no leap and centurial year (exception)
+ param(2020, 3, 31),
+ param(1987, 4, 30),
+ param(1000, 5, 31),
+ param(1534, 6, 30),
+ param(1777, 7, 31),
+ param(1234, 8, 31),
+ param(1678, 9, 30),
+ param(1947, 10, 31),
+ param(2015, 11, 30),
+ param(2300, 12, 31),
+ ])
+ def test_get_last_day_of_month(self, year, month, expected_last_day):
+ assert get_last_day_of_month(year, month) == expected_last_day
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dateparser-0.7.2/tests/test_utils_strptime.py new/dateparser-0.7.4/tests/test_utils_strptime.py
--- old/dateparser-0.7.2/tests/test_utils_strptime.py 2019-09-17 12:57:56.000000000 +0200
+++ new/dateparser-0.7.4/tests/test_utils_strptime.py 2020-03-06 12:31:08.000000000 +0100
@@ -1,6 +1,7 @@
import locale
from parameterized import parameterized, param
from datetime import datetime
+from unittest import SkipTest
from tests import BaseTestCase
from dateparser.utils.strptime import strptime
@@ -11,7 +12,10 @@
super(TestStrptime, self).setUp()
def given_system_locale_is(self, locale_str):
- locale.setlocale(locale.LC_ALL, locale_str)
+ try:
+ locale.setlocale(locale.LC_ALL, locale_str)
+ except locale.Error:
+ raise SkipTest('Locale {} is not installed'.format(locale_str))
def when_date_string_is_parsed(self, date_string, fmt):
try: