There is a good talk by Chris Houser, "Condition Systems in an Exceptional
Language" [1], in which he systematically goes through many aspects of
handling exceptions in batch processing (like the problem Josh describes).
An interesting way to build a conditional error handler is outlined towards
the end of that talk. The essential idea is to define dynamically bound
functions which receive the exception and decide how to continue.
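
As a rough illustration of that idea (my own sketch, not code from the talk), the low-level code can detect the problem and call a dynamic function, while the caller decides the policy with binding. All names here (*on-bad-timestamp*, process-record, process-batch) are made up for the example.

```clojure
;; Hypothetical names throughout; a sketch of the idea only.

(def epoch-start #inst "1970-01-01T00:00:00.000Z")

(defn ^:dynamic *on-bad-timestamp*
  "Default handler: no policy has been bound, so treat the problem as fatal."
  [record]
  (throw (ex-info "Suspicious :created timestamp" {:record record})))

(defn process-record
  "Low-level code: detects the problem, but lets the bound handler decide."
  [record]
  (if (= epoch-start (:created record))
    (*on-bad-timestamp* record)
    record))

(defn process-batch
  "High-level code: chooses the policy. Here: flag the record and carry on."
  [records]
  (binding [*on-bad-timestamp* (fn [record]
                                 (assoc record :warning :zero-timestamp))]
    ;; mapv is eager, so all the work happens while the binding is in effect.
    (mapv process-record records)))
```

Note that a lazy map would not do inside binding, since the records could be realized after the binding has been popped.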

If we look at the example outlined in Josh's question, there is some
object with a :created field. The :created value can be suspicious in that
it equals the beginning of the Unix epoch.

One thing to note about detectable errors, like the zero timestamp, is that
the check is probably best represented as a predicate function like
non-zero-timestamp?, and that the call to such a function is actually
a kind of pre-emptive branching.

This branching could be represented as two different tagged values, maybe
like [:valid-date #inst "2017-01-02T10:01:22.231Z"] or [:invalid-date
"1970-01-01T00:00:00.000Z"]. This makes it necessary for all subsequent
logic to handle both variants, but decouples that logic from deciding what
is a valid date. In an application specialized in converting dirty data
this is probably a worthwhile trade-off.
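
A minimal sketch of the predicate and the tagging, assuming the value has already been parsed into a java.util.Date. The names classify-date and describe are made up for the example; inst? and inst-ms are core functions from Clojure 1.9 onwards.

```clojure
(defn non-zero-timestamp?
  "True when d is an inst that is not the start of the Unix epoch."
  [d]
  (and (inst? d) (not (zero? (inst-ms d)))))

(defn classify-date
  "The branching happens once, here; the result carries the decision as a tag."
  [d]
  (if (non-zero-timestamp? d)
    [:valid-date d]
    [:invalid-date d]))

(defn describe
  "Downstream logic handles both variants but never re-checks validity."
  [[tag value]]
  (case tag
    :valid-date   (str "created " (inst-ms value) " ms after the epoch")
    :invalid-date "creation time unknown"))
```

For example, (describe (classify-date #inst "1970-01-01T00:00:00.000Z")) takes the :invalid-date branch.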

I think the most important thing when handling this type of situation is
that the inconsistency has to be embraced, and that the data representation
of the system has to be able to contain both wanted and unwanted values.
In this case, it would mean that the "outer", most general spec for such a
timestamp must be able to contain both valid dates and things which have
been sorted out as not valid. These non-valid dates may still have to live
up to certain properties (being a "correct error object"). This
distinguishes handleable errors from programming errors, which are usually
what we want to detect during tests.
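
One way such an "outer" spec could look, as a sketch only. The spec names are made up, and I am assuming the clojure.spec.alpha namespace that ships with Clojure 1.9 and later (at the time of this thread it was still called clojure.spec).

```clojure
(require '[clojure.spec.alpha :as s])

;; A "correct error object": a map that keeps the raw input that failed to parse.
(s/def ::unparsable-date string?)
(s/def ::error-object (s/keys :req-un [::unparsable-date]))

;; The most general spec accepts wanted and unwanted values alike.
;; s/or tries the branches in order and tags the result with the branch name.
(s/def ::timestamp
  (s/or :error      ::error-object
        :suspicious (s/and inst? #(zero? (inst-ms %)))
        :ok         inst?))

;; Expected problems conform to a tagged value:
;;   (first (s/conform ::timestamp #inst "1970-01-01T00:00:00.000Z"))
;; picks the :suspicious branch, while a value such as 42 matches no branch
;; and conforms to the invalid marker, which points at a programming error.
```

The tag returned by conform plays the same role as the :valid-date / :invalid-date tags above, so downstream code can branch on it without repeating the checks.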

It is very hard to construct specs which are guaranteed to be disjoint
unless they describe different types or similar. When it comes to maps, it
is certainly possible to construct specs for various kinds of maps, but
it's very complicated to dispatch on the values of keys in the map (which
is probably the most common way to distinguish various types of objects in
Clojure).

A very useful feature of clojure.spec is conform. With conform, one can
"upsert" any value into something that accords with our specs.

Example:
parsed-date-spec is a spec accepting {:unparsable-date ...} as well as some
representation of correct timestamps and suspicious timestamps.
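
(defn parse-date
  "Like conform: accepts a broad range of arguments."
  [o]
  (cond
    (s/valid? parsed-date-spec o) o ;; already an internal date representation
    (inst? o) o ;; java.util.Date is ok
    ;; Any string parser would do here; clojure.instant/read-instant-date is
    ;; one option. A parse failure is kept as data instead of being thrown.
    (string? o) (try (clojure.instant/read-instant-date o)
                     (catch Exception e
                       {:unparsable-date o :exception e}))
    ;; Something is wrong with the program if it's not an internal date
    ;; representation, a string, or an inst.
    :else :clojure.spec/invalid))
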
When we know that the thing we have is something which we have control
over, it is much easier to construct a predicate which checks successfully
parsed dates for being zero timestamps (or too far in the future, or
anything else).

I suggest you
1. create a representation that lets you and your application handle
correct and incorrect values in a unified manner, so that "normal errors"
are embraced by the application logic.
2. strive to use predicates or other simple functions (rather than
complicated clojure.spec setups) to identify invalid entities in the
input data.
3. handle such errors inside the scope of the function / part of the
program that handles the correct path of the program.
4. use :clojure.spec/invalid and informative exceptions when there are
unforeseen programming errors which the program cannot possibly recover
from.
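
Points 3 and 4 could be sketched roughly like this. The names check-created and migrate, and the shape of the result map, are just assumptions for the example: expected data problems are collected as warnings next to the normal path, while a value that can only come from a bug raises an informative exception.

```clojure
(defn check-created
  "Returns nil for a good value, or a keyword naming an expected data problem.
  Throws for values that can only be the result of a programming error."
  [created]
  (cond
    (not (inst? created)) (throw (ex-info "Expected an inst in :created"
                                          {:value created}))
    (zero? (inst-ms created)) :zero-timestamp
    :else nil))

(defn migrate
  "Processes every record. Expected problems become warnings, not exceptions."
  [records]
  (reduce (fn [acc record]
            (let [problem (check-created (:created record))]
              (cond-> (update acc :migrated conj record)
                problem (update :warnings conj {:problem problem
                                                :record  record}))))
          {:migrated [] :warnings []}
          records))
```

The caller can then log everything under :warnings at the WARN level after the batch has run, without the migration itself knowing anything about logging.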

The idea of using clojure.spec to classify error messages in
data validation "all the way" is interesting. One problem is that the logic
that wants to make use of the diagnostic data needs to know quite a lot
about the general structure of the data (and the specs).

/Linus
[1] Chris Houser, "Condition Systems in an Exceptional Language"
https://www.youtube.com/watch?v=zp0OEDcAro0
On Saturday, February 18, 2017 at 12:42:25 AM UTC+1, Josh Tilles wrote:
>
> Has anyone explored using spec for “soft” failures? For example, if I’m
> writing an ETL system to migrate legacy customer account data, all I might
> *require* of a record’s :created field is that the value is a
> syntactically valid date-time string. If any record claimed that it was
> created on "1970-01-01T00:00:00.000Z", of course that would almost
> certainly be bad data; but instead of crashing the program or refusing to
> process the record, let’s say I want to log (at the WARN level) so