Re: Spark MySQL Invalid DateTime value killing job

2019-06-05 Thread Anthony May
Murphy's Law striking after asking the question, I just discovered the
solution:
The JDBC URL should set the zeroDateTimeBehavior option.
https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-configuration-properties.html
https://stackoverflow.com/questions/11133759/-00-00-00-can-not-be-represented-as-java-sql-timestamp-error
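Per the Connector/J 5.1 configuration docs linked above, adding zeroDateTimeBehavior=convertToNull to the JDBC URL makes the driver return NULL for zero datetimes instead of throwing. A minimal PySpark sketch of how that could look (the host, database, table, and credentials are placeholders, not from this thread):

```python
# Append zeroDateTimeBehavior=convertToNull so Connector/J maps
# '0000-00-00 00:00:00' to NULL rather than raising SQLException.
# Host, database, and credentials below are illustrative placeholders.
jdbc_url = (
    "jdbc:mysql://db-host:3306/legacy_db"
    "?zeroDateTimeBehavior=convertToNull"
)

# Reading a table with the Spark DataFrame API (sketch; requires a
# running SparkSession and the MySQL JDBC driver on the classpath):
# df = (spark.read.format("jdbc")
#       .option("url", jdbc_url)
#       .option("dbtable", "some_table")
#       .option("user", "reader")
#       .option("password", "secret")
#       .load())
# df.write.json("/output/some_table")
```

Since the option lives in the URL rather than in per-table code, the same URL can be reused across all the tables being scraped, which should address the scaling concern.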

On Wed, Jun 5, 2019 at 6:29 PM Anthony May  wrote:

> Hi,
>
> We have a legacy process of scraping a MySQL Database. The Spark job uses
> the DataFrame API and MySQL JDBC driver to read the tables and save them as
> JSON files. One table has DateTime columns that contain values invalid for
> java.sql.Timestamp so it's throwing the exception:
> java.sql.SQLException: Value '0000-00-00 00:00:00' can not be represented
> as java.sql.Timestamp
>
> Unfortunately, I can't edit the values in the table to make them valid.
> There doesn't seem to be a way to specify row level exception handling in
> the DataFrame API. Is there a way to handle this that would scale for
> hundreds of tables?
>
> Any help is appreciated.
>
> Anthony
>
