When this is implemented, can you load/save an RDD of pickled objects to
HDFS?
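
Until then, one interim workaround is to pickle each record and base64-encode it so it fits on a single text line, then use the existing saveAsTextFile / textFile round trip. This is a sketch, not an official API; the helper names (encode_record, decode_record) and the HDFS paths are hypothetical, and the Spark calls are shown in comments only:

```python
import base64
import pickle

def encode_record(obj):
    """Pickle an object and base64-encode it into one line of text."""
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")

def decode_record(line):
    """Reverse of encode_record: base64-decode a line and unpickle it."""
    return pickle.loads(base64.b64decode(line.encode("ascii")))

# With Spark this would look like (hypothetical usage, not run here):
#   rdd.map(encode_record).saveAsTextFile("hdfs://.../pickled")
#   rdd2 = sc.textFile("hdfs://.../pickled").map(decode_record)

# Round trip of an arbitrary Python structure:
record = {"name": "Bertrand", "values": [1, 2, 3]}
line = encode_record(record)
assert "\n" not in line          # safe to store as a single text line
restored = decode_record(line)
assert restored == record
```

The base64 step matters because raw pickle bytes can contain newlines, which would break the line-oriented textFile format.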


On Thu, Apr 17, 2014 at 1:51 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:

> Hi Bertrand,
>
> We should probably add a SparkContext.pickleFile and RDD.saveAsPickleFile
> that will allow saving pickled objects. Unfortunately this is not in yet,
> but there is an issue up to track it:
> https://issues.apache.org/jira/browse/SPARK-1161.
>
> In 1.0, one feature we do have now is the ability to load binary data from
> Hive using Spark SQL’s Python API. Later we will also be able to save to
> Hive.
>
> Matei
>
> On Apr 16, 2014, at 4:27 AM, Bertrand Dechoux <decho...@gmail.com> wrote:
>
> > Hi,
> >
> > I have browsed the online documentation, and it states that PySpark can
> only read text files as sources. Is that still the case?
> >
> > From what I understand, after this first step the RDD can hold any
> serializable Python structure, provided the class definitions are
> distributed to the workers.
> >
> > Is it not possible to read back those RDDs? That is, create a flow to
> parse everything once and then, e.g. the next week, restart from the
> binary, structured data?
> >
> > Technically, what is the difficulty? I would assume the code for reading
> a binary Python RDD and for reading a binary Python file to be quite
> similar. Where can I learn more about this subject?
> >
> > Thanks in advance
> >
> > Bertrand
>
>


-- 
We don't beat the reaper by living longer. We beat the reaper by living well
and living fully. The reaper will come for all of us. Question is, what do
we do between the time we are born and the time he shows up? -Randy Pausch
