It's because of a difference in API design.
*RDD.checkpoint* returns Unit, which means it mutates the RDD's state, so
you need an *RDD.isCheckpointed* method to check whether the RDD is
checkpointed.
*Dataset.checkpoint* returns a new Dataset, which means there is no
isCheckpointed state in Dataset, and thus no isCheckpointed method is
needed.
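A quick sketch of the two shapes side by side (Spark 2.x; 'rdd' and 'ds'
are assumed to already exist):

    // RDD: checkpoint() returns Unit and flags this RDD in place,
    // so you can ask the same object about its state afterwards.
    rdd.checkpoint()
    rdd.count()                   // an action materializes the checkpoint
    println(rdd.isCheckpointed)   // true once materialized

    // Dataset: checkpoint() returns a *new* Dataset backed by the
    // checkpointed data; the original ds is left untouched.
    val checkpointed = ds.checkpoint()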
Actually, I realized keeping that info would not be enough, as I also need
to find the checkpoint files again in order to delete them :/
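One blunt workaround I'm considering (just a sketch, untested): since the
files aren't exposed on the Dataset, wipe the whole checkpoint directory
(the one set with sc.setCheckpointDir) once I'm done with it:

    import org.apache.hadoop.fs.{FileSystem, Path}

    val sc = spark.sparkContext
    sc.getCheckpointDir.foreach { dir =>
      val fs = FileSystem.get(new java.net.URI(dir), sc.hadoopConfiguration)
      fs.delete(new Path(dir), true)   // recursive: removes ALL checkpoints
    }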
2017-10-25 19:07 GMT+02:00 Bernard Jesop:
> As far as I understand, Dataset.rdd is not the same as InternalRDD.
> It is just another RDD representation of the same Dataset and is created
> on demand (lazy val) when Dataset.rdd is called.
As far as I understand, Dataset.rdd is not the same as InternalRDD.
It is just another RDD representation of the same Dataset and is created on
demand (lazy val) when Dataset.rdd is called.
This totally explains the observed behavior.
But how would it be possible to know that a Dataset has been checkpointed?
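To make the observed behavior concrete (a sketch; 'ds' assumed to exist):

    ds.checkpoint()                  // returns a new Dataset, discarded here
    println(ds.rdd.isCheckpointed)   // false: this RDD is built on demand
                                     // (lazy val) from the Dataset and was
                                     // itself never checkpointed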
It is a bit more than syntactic sugar, but not much more:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L533
BTW this basically writes all the data out and then creates a new Dataset
to load it back in.
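So conceptually it is close to doing this by hand (a sketch, not Spark's
actual mechanism, which checkpoints the internal RDD; the path is made up):

    ds.write.parquet("/tmp/ckpt")                  // write everything out
    val reloaded = spark.read.parquet("/tmp/ckpt") // fresh lineage from files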
On Wed, Oct 25, 2017 at 6:51 AM, Bernard Jesop wrote:
Hello everyone,
I have a question about checkpointing on Datasets.
It seems that in 2.1.0 there is a Dataset.checkpoint(); however, unlike
RDD, there is no Dataset.isCheckpointed().
I wonder if Dataset.checkpoint is just syntactic sugar for
Dataset.rdd.checkpoint.
When I do:

    Dataset.checkpoint; Dataset.rdd.isCheckpointed

the isCheckpointed call returns false.
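Full repro in case it helps (local session; app name and checkpoint dir
are made up):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder
      .master("local[*]")
      .appName("checkpoint-test")
      .getOrCreate()
    spark.sparkContext.setCheckpointDir("/tmp/spark-ckpt")

    import spark.implicits._
    val ds = Seq(1, 2, 3).toDS()
    ds.checkpoint()                  // the returned Dataset is ignored here
    println(ds.rdd.isCheckpointed)   // prints false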