Manjunath Hatti created SPARK-32547:
---------------------------------------
             Summary: Unable to process timestamp 0001-01-01T00:00:00.000+0000 with TimestampType
                 Key: SPARK-32547
                 URL: https://issues.apache.org/jira/browse/SPARK-32547
             Project: Spark
          Issue Type: Bug
    Affects Versions: 3.0.0
          Components: PySpark
            Reporter: Manjunath Hatti

Spark version: 3.0.0

Below is a minimal example that reproduces the problem with TimestampType.

{code:java}
from pyspark.sql.functions import lit
from pyspark.sql.types import TimestampType

df = spark.createDataFrame([(1, 'foo'), (2, 'bar')], ['id', 'txt'])
new_df = df.withColumn("test_timestamp",
                       lit("0001-01-01T00:00:00.000+0000").cast(TimestampType()))

new_df.printSchema()
root
 |-- id: long (nullable = true)
 |-- txt: string (nullable = true)
 |-- test_timestamp: timestamp (nullable = true)

new_df.show()
+---+---+-------------------+
| id|txt|     test_timestamp|
+---+---+-------------------+
|  1|foo|0001-01-01 00:00:00|
|  2|bar|0001-01-01 00:00:00|
+---+---+-------------------+
{code}

Calling {{new_df.rdd.isEmpty()}} then fails with *year 0 is out of range*:

{code:java}
new_df.rdd.isEmpty()
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/serializers.py", line 177, in _read_with_length
    return self.loads(obj)
  File "/databricks/spark/python/pyspark/serializers.py", line 466, in loads
    return pickle.loads(obj, encoding=encoding)
  File "/databricks/spark/python/pyspark/sql/types.py", line 1415, in <lambda>
    return lambda *a: dataType.fromInternal(a)
  File "/databricks/spark/python/pyspark/sql/types.py", line 635, in fromInternal
    for f, v, c in zip(self.fields, obj, self._needConversion)]
  File "/databricks/spark/python/pyspark/sql/types.py", line 635, in <listcomp>
    for f, v, c in zip(self.fields, obj, self._needConversion)]
  File "/databricks/spark/python/pyspark/sql/types.py", line 447, in fromInternal
    return self.dataType.fromInternal(obj)
  File "/databricks/spark/python/pyspark/sql/types.py", line 201, in fromInternal
    return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
ValueError: year 0 is out of range
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
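The traceback points at {{TimestampType.fromInternal}}, which calls {{datetime.datetime.fromtimestamp}} without a time zone, i.e. a *local-time* conversion. For the instant 0001-01-01T00:00:00 UTC, any zone west of Greenwich shifts the local result into year 0, which Python's {{datetime}} cannot represent. A minimal, Spark-free sketch of that boundary (the internal-value arithmetic is an assumption based on Spark storing timestamps as microseconds since the Unix epoch):

```python
import datetime

# Python's datetime cannot represent years before 1.
assert datetime.MINYEAR == 1

# Microseconds from the Unix epoch back to 0001-01-01T00:00:00 UTC --
# the kind of internal value Spark hands to TimestampType.fromInternal.
ts = -int((datetime.datetime(1970, 1, 1)
           - datetime.datetime(1, 1, 1)).total_seconds()) * 1_000_000
print("internal value:", ts)

# Shifting the minimum datetime even one hour westward overflows --
# the same boundary a local-time fromtimestamp() hits when the
# session zone is behind UTC.
try:
    datetime.datetime(1, 1, 1) - datetime.timedelta(hours=1)
except OverflowError as exc:
    print("overflow:", exc)

# Converting the same instant with an explicit UTC zone succeeds,
# because no local-time shift is applied.
utc_dt = datetime.datetime.fromtimestamp(ts // 1_000_000,
                                         tz=datetime.timezone.utc)
print(utc_dt)
```

This suggests the value itself is representable; the failure is specific to the zone-naive conversion on the Python side.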