Sorry for hijacking this thread.
@irving, how do you restart a Spark job from a checkpoint?
Regards
Sumit Chawla
On Fri, Dec 16, 2016 at 2:24 AM, Selvam Raman wrote:
Hi,
Actually my requirement is to read a parquet file which has 100 partitions.
Then I use foreachPartition to read the data and process it.
My sample code:

    import org.apache.spark.sql.SparkSession;

    public static void main(String[] args) {
        SparkSession sparkSession = SparkSession.builder()
            .appName("checkpoint verification")
            .getOrCreate();
    }
I am using Java. I will try it and let you know.
On Dec 15, 2016 8:45 PM, "Irving Duran" wrote:
Not sure what programming language you are using, but in Python you can do
sc.setCheckpointDir('~/apps/spark-2.0.1-bin-hadoop2.7/checkpoint/'). This
will store checkpoints in that directory, which I called checkpoint.
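To make that concrete for a batch job: after setting the checkpoint directory, you checkpoint an RDD and run an action so the checkpoint data is actually written. This is a minimal sketch, assuming pyspark is available; the function name and the passed-in SparkContext `sc` are illustrative, not from this thread:

```python
# Sketch: checkpointing a batch RDD in PySpark.
# `sc` is an existing SparkContext; `checkpoint_dir` should be reliable
# storage (e.g. HDFS) in a real cluster.
def build_checkpointed_rdd(sc, data, checkpoint_dir):
    """Checkpoint an RDD built from `data` and return it."""
    sc.setCheckpointDir(checkpoint_dir)  # where checkpoint files are written
    rdd = sc.parallelize(data)
    rdd.checkpoint()   # marks the RDD; lineage is truncated once materialized
    rdd.count()        # an action forces the checkpoint to be written
    return rdd
```

Note that `checkpoint()` is lazy: nothing is saved until an action (here `count()`) runs on the RDD.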
Thank You,
Irving Duran
On Thu, Dec 15, 2016 at 10:33 AM, Selvam Raman wrote:
Hi,
Is there any provision in Spark batch jobs for checkpointing?
I have a huge amount of data; it takes more than 3 hours to process all of
it. I currently have 100 partitions.
If the job fails after two hours, let's say it has processed 70 partitions.
Should I start the Spark job from the beginning, or is there a way to resume
from where it failed?
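Spark batch jobs have no built-in restart-from-partition feature, so one common workaround (an assumption on my part, not something from this thread) is to write each partition's output under its own path, record which partition IDs finished, and on restart process only the remaining ones. A minimal sketch of the bookkeeping, with the Spark calls left as comments:

```python
# Sketch of manual resume bookkeeping for a batch job.
# In the real job you would write each partition's result to a per-partition
# path (e.g. output/part=<id>) and record <id> as done once the write succeeds;
# the hypothetical paths and names here are illustrative.
def pending_partitions(all_ids, done_ids):
    """Return the partition IDs that still need processing, in order."""
    done = set(done_ids)
    return [p for p in all_ids if p not in done]

# On restart:
#   done = ids recovered from the output directory or a marker file
#   todo = pending_partitions(range(100), done)
#   ...then filter or re-read only the `todo` partitions before
#   calling foreachPartition again.
```

With 100 partitions and 70 already done, `pending_partitions(range(100), range(70))` leaves just the last 30 to redo.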