Re: Efficient approach to store an RDD as a file in HDFS and read it back as an RDD?

Igor Berman Thu, 05 Nov 2015 10:03:27 -0800

Hi,
We are using Avro with Snappy compression. As long as you have enough
partitions, saving shouldn't be a problem, IMHO.
In general HDFS is pretty fast; S3 is less so.
The issue with storing data is that you lose your partitioner at load
time, even if the RDD had one when it was saved. There is a PR that tries
to solve this.



On 5 November 2015 at 01:09, swetha <swethakasire...@gmail.com> wrote:

> Hi,
>
> What is an efficient way to save an RDD as a file in HDFS and retrieve
> it back? I am deciding between Avro, Parquet and the SequenceFile format.
> We currently use the SequenceFile format for one of our use cases.
>
> Any example of how to store and retrieve an RDD in the Avro and Parquet
> file formats would be of great help.
>
> Thanks,
> Swetha
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Efficient-approach-to-store-an-RDD-as-a-file-in-HDFS-and-read-it-back-as-an-RDD-tp25279.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
