Hi,
We are running on Spark 2.2.1, generating parquet files with code like the
following pseudo code:

  df.write.parquet(...)
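For context, a minimal, self-contained sketch of that kind of job (the session setup, toy schema, and output path are illustrative assumptions, not our actual code):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .appName("parquet-writer")
    .getOrCreate()

  // Build a toy DataFrame and persist it as parquet.
  // The output path below is hypothetical.
  val df = spark.range(0L, 1000000L).selectExpr("id", "id % 100 as bucket")
  df.write.mode("overwrite").parquet("s3://my-bucket/output/")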
We have recently noticed parquet file corruption when reading the parquet
back in Spark or Presto, with errors like the following:

Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at ... in block ... in file ...

Has anyone seen this before? If so, what do you do to
prevent a recurrence?
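(For reference, a minimal sketch of the read side that surfaces the error; the path is hypothetical:)

  // Reading the dataset back; the exception above is thrown while
  // the corrupt file is scanned during the action.
  val df = spark.read.parquet("s3://my-bucket/output/")
  df.count()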
Thanks,
Dong
From: Ryan Blue <rb...@netflix.com>
Reply-To: "rb...@netflix.com" <rb...@netflix.com>
Date: Monday, February 5, 2018 at 12:46 PM
To: Dong Jiang <dji...@dataxu.com>
Cc: Spark Dev List <dev@spark.apache.org>
... back the entire
data set, and then copy from HDFS to S3. Any other thoughts?
From: Steve Loughran <ste...@hortonworks.com>
Date: Monday, February 12, 2018 at 2:27 PM
To: "rb...@netflix.com" <rb...@netflix.com>
Cc: Dong Jiang <dji...@dataxu.com>, Apache Spark Dev <dev@spark.apache.org>
o: "rb...@netflix.com" <rb...@netflix.com>
Date: Monday, February 5, 2018 at 1:34 PM
To: Dong Jiang <dji...@dataxu.com>
Cc: Spark Dev List <dev@spark.apache.org>
Subject: Re: Corrupt parquet file
We ensure the bad node is removed from our cluster and reprocess to replace
the data.

...what do you do to prevent a recurrence? Can you share your experience?
Thanks,
Dong
From: Ryan Blue <rb...@netflix.com>
Reply-To: "rb...@netflix.com" <rb...@netflix.com>
Date: Monday, February 5, 2018 at 12:38 PM
To: Dong Jiang <dji...@dataxu.com>
Cc: Spark Dev List <dev@spark.apache.org>
Hi,
I opened a JIRA ticket, https://issues.apache.org/jira/browse/SPARK-23549.
Could anyone take a look?
Spark SQL unexpected behavior when comparing timestamp to date
scala> spark.version
res1: String = 2.2.1
scala> spark.sql("select cast('2017-03-01 00:00:00' as timestamp)