I originally assumed that persisting is similar to writing, but it's not. Hence I want to change the behavior of intermediate persists.
On Wed, Jul 1, 2015 at 8:46 AM, Raghavendra Pandey <raghavendra.pan...@gmail.com> wrote:
> So do you want to change the behavior of the persist API, or write the RDD to disk?
> On Jul 1, 2015 9:13 PM, "ÐΞ€ρ@Ҝ (๏̯͡๏)" <deepuj...@gmail.com> wrote:
>
>> I think I want to use persist then, and write my intermediate RDDs to disk+mem.
>>
>> On Wed, Jul 1, 2015 at 8:28 AM, Raghavendra Pandey <raghavendra.pan...@gmail.com> wrote:
>>
>>> I think the persist API is internal to the RDD, whereas the write API is for saving content on disk.
>>> RDD persist will dump your object bytes, serialized, on the disk. If you want to change that behavior, you need to override the serialization of the class that you are storing in the RDD.
>>> On Jul 1, 2015 8:50 PM, "ÐΞ€ρ@Ҝ (๏̯͡๏)" <deepuj...@gmail.com> wrote:
>>>
>>>> This is my write API. How do I integrate it here?
>>>>
>>>> protected def writeOutputRecords(detailRecords: RDD[(AvroKey[DetailOutputRecord], NullWritable)], outputDir: String) {
>>>>   val writeJob = new Job()
>>>>   val schema = SchemaUtil.outputSchema(_detail)
>>>>   AvroJob.setOutputKeySchema(writeJob, schema)
>>>>   val outputRecords = detailRecords.coalesce(100)
>>>>   outputRecords.saveAsNewAPIHadoopFile(outputDir,
>>>>     classOf[AvroKey[GenericRecord]],
>>>>     classOf[org.apache.hadoop.io.NullWritable],
>>>>     classOf[AvroKeyOutputFormat[GenericRecord]],
>>>>     writeJob.getConfiguration)
>>>> }
>>>>
>>>> On Wed, Jul 1, 2015 at 8:11 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>>>
>>>>> rdd.persist(StorageLevel.MEMORY_AND_DISK_SER)
>>>>>
>>>>> On Wed, Jul 1, 2015 at 11:01 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com> wrote:
>>>>>
>>>>>> How do I persist an RDD using StorageLevel.MEMORY_AND_DISK_SER?

--
Deepak
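A minimal sketch of the persist-versus-write distinction discussed in this thread, assuming Spark 1.3 or later running in local mode. The object name `PersistVsWrite`, the method `run`, and the toy data are hypothetical; plain text output via `saveAsTextFile` stands in for the Avro output of `writeOutputRecords`. The point it shows: `persist(StorageLevel.MEMORY_AND_DISK_SER)` only caches partitions (as serialized bytes, spilling to local disk when memory runs out) so later actions can reuse them, while an explicit save is what produces files in an output directory.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical helper object illustrating persist (caching) vs. an explicit write.
object PersistVsWrite {

  /** Caches an intermediate RDD, reuses it in two actions, then writes a result.
    * Returns (storage level while cached, record count, sum of values). */
  def run(sc: SparkContext, outputDir: String): (StorageLevel, Long, Long) = {
    val intermediate = sc.parallelize(1 to 1000, 4).map(x => (x % 10, x))

    // Cache partitions as serialized bytes in memory, spilling to local disk
    // if they do not fit. Spark manages these blocks itself; they are not
    // output files and are discarded on unpersist() or when the app stops.
    intermediate.persist(StorageLevel.MEMORY_AND_DISK_SER)

    // The first action computes the partitions and stores them in the cache.
    val count = intermediate.count()
    // A second action reuses the cached partitions instead of recomputing them.
    val total = intermediate.values.map(_.toLong).reduce(_ + _)
    // Record the level before unpersist() resets it to NONE.
    val level = intermediate.getStorageLevel

    // Writing is a separate, explicit step that produces part-* files
    // under outputDir (the directory must not already exist).
    intermediate.reduceByKey(_ + _).saveAsTextFile(outputDir)

    intermediate.unpersist()
    (level, count, total)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("persist-vs-write").setMaster("local[2]"))
    try {
      val (level, count, total) = run(sc, args(0))
      println(s"level=$level count=$count total=$total")
    } finally {
      sc.stop()
    }
  }
}
```

In other words, persisting an intermediate RDD to memory and disk does not replace writing it out: if the intermediate data must survive the application, it still needs an explicit save such as `writeOutputRecords`.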