Yes, this JIRA would enable that. The Hive support also handles HDFS.

Matei
On Apr 16, 2014, at 9:55 PM, Jesvin Jose <frank.einst...@gmail.com> wrote:

> When this is implemented, can you load/save an RDD of pickled objects to HDFS?
>
>
> On Thu, Apr 17, 2014 at 1:51 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
> Hi Bertrand,
>
> We should probably add a SparkContext.pickleFile and RDD.saveAsPickleFile that will allow saving pickled objects. Unfortunately this is not in yet, but there is an issue up to track it: https://issues.apache.org/jira/browse/SPARK-1161.
>
> In 1.0, one feature we do have now is the ability to load binary data from Hive using Spark SQL’s Python API. Later we will also be able to save to Hive.
>
> Matei
>
> On Apr 16, 2014, at 4:27 AM, Bertrand Dechoux <decho...@gmail.com> wrote:
>
> > Hi,
> >
> > I have browsed the online documentation, and it states that PySpark only reads text files as sources. Is that still the case?
> >
> > From what I understand, after this first step the RDD can hold any serialized Python structure, as long as the class definitions are well distributed.
> >
> > Is it not possible to read those RDDs back? That is, create a flow to parse everything and then, e.g. the next week, start from the binary, structured data?
> >
> > Technically, what is the difficulty? I would assume the code reading a binary Python RDD and the code reading a binary Python file to be quite similar. Where can I learn more about this subject?
> >
> > Thanks in advance
> >
> > Bertrand
>
>
> --
> We don't beat the reaper by living longer. We beat the reaper by living well and living fully. The reaper will come for all of us. Question is, what do we do between the time we are born and the time he shows up? -Randy Pausch
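[Editor's note: to illustrate what the proposed SparkContext.pickleFile / RDD.saveAsPickleFile pair in SPARK-1161 would roughly do per partition, here is a minimal standalone Python sketch. The function names save_pickled and load_pickled are hypothetical, not Spark API; the point is that pickle frames records back-to-back in one binary stream, so writing and reading a "binary Python file" really are symmetric operations, as Bertrand guessed.]

```python
import os
import pickle
import tempfile

def save_pickled(records, path):
    """Write each record as a pickle frame, appended to one binary file
    (analogous to what saveAsPickleFile would do for a single partition)."""
    with open(path, "wb") as f:
        for rec in records:
            pickle.dump(rec, f)  # frames are self-delimiting, so no separator needed

def load_pickled(path):
    """Stream pickled records back until end-of-file
    (analogous to reading one partition in pickleFile)."""
    out = []
    with open(path, "rb") as f:
        while True:
            try:
                out.append(pickle.load(f))
            except EOFError:
                break
    return out

# Arbitrary Python structures round-trip, not just text lines.
data = [{"id": 1, "tags": ["a", "b"]}, ("tuple", 2), 3.14]
path = os.path.join(tempfile.mkdtemp(), "part-00000.pkl")
save_pickled(data, path)
assert load_pickled(path) == data
```

A real distributed version would additionally need to shard the output into one such file per partition and handle HDFS paths, which is exactly the plumbing the JIRA tracks.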