Yes, this JIRA would enable that. The Hive support also handles HDFS.

Matei
On Apr 16, 2014, at 9:55 PM, Jesvin Jose <frank.einst...@gmail.com> wrote:

> When this is implemented, can you load/save an RDD of pickled objects to HDFS?
>
>
> On Thu, Apr 17, 2014 at 1:51 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> Hi Bertrand,
>
> We should probably add a SparkContext.pickleFile and RDD.saveAsPickleFile that will allow saving pickled objects. Unfortunately this is not in yet, but there is an issue up to track it: https://issues.apache.org/jira/browse/SPARK-1161.
>
> In 1.0, one feature we do have now is the ability to load binary data from Hive using Spark SQL’s Python API. Later we will also be able to save to Hive.
>
> Matei
>
> On Apr 16, 2014, at 4:27 AM, Bertrand Dechoux <decho...@gmail.com> wrote:
>
> > Hi,
> >
> > I have browsed the online documentation, and it states that PySpark only reads text files as sources. Is that still the case?
> >
> > From what I understand, after this first step the RDD can hold any serialized Python structure, as long as the class definitions are well distributed.
> >
> > Is it not possible to read those RDDs back? That is, create a flow to parse everything and then, e.g. the next week, start from the binary, structured data?
> >
> > Technically, what is the difficulty? I would assume the code reading a binary Python RDD and the code reading a binary Python file to be quite similar. Where can I learn more about this subject?
> >
> > Thanks in advance
> >
> > Bertrand
>
>
> --
> We don't beat the reaper by living longer. We beat the reaper by living well and living fully. The reaper will come for all of us. Question is, what do we do between the time we are born and the time he shows up? -Randy Pausch
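[Editor's note: to illustrate what the proposed SparkContext.pickleFile / RDD.saveAsPickleFile pair in SPARK-1161 would roughly do per partition, here is a minimal standalone Python sketch. The function names save_pickled and load_pickled are hypothetical, not Spark API; the point is that pickle frames records back-to-back in one binary stream, so writing and reading a "binary Python file" really are symmetric operations, as Bertrand guessed.]

```python
import os
import pickle
import tempfile

def save_pickled(records, path):
    """Write each record as a pickle frame, appended to one binary file
    (analogous to what saveAsPickleFile would do for a single partition)."""
    with open(path, "wb") as f:
        for rec in records:
            pickle.dump(rec, f)  # frames are self-delimiting, so no separator needed

def load_pickled(path):
    """Stream pickled records back until end-of-file
    (analogous to reading one partition in pickleFile)."""
    out = []
    with open(path, "rb") as f:
        while True:
            try:
                out.append(pickle.load(f))
            except EOFError:
                break
    return out

# Arbitrary Python structures round-trip, not just text lines.
data = [{"id": 1, "tags": ["a", "b"]}, ("tuple", 2), 3.14]
path = os.path.join(tempfile.mkdtemp(), "part-00000.pkl")
save_pickled(data, path)
assert load_pickled(path) == data
```

A real distributed version would additionally need to shard the output into one such file per partition and handle HDFS paths, which is exactly the plumbing the JIRA tracks.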