Re: PySpark still reading only text?

2014-04-22 Thread Bertrand Dechoux
Cool, thanks for the link. Bertrand Dechoux On Mon, Apr 21, 2014 at 7:31 PM, Nick Pentreath nick.pentre...@gmail.com wrote: Also see: https://github.com/apache/spark/pull/455 This will add support for reading SequenceFiles and other InputFormats in PySpark, as long as the Writables are either

PySpark still reading only text?

2014-04-16 Thread Bertrand Dechoux
Hi, I have browsed the online documentation, and it states that PySpark only reads text files as sources. Is that still the case? From what I understand, after this first step the RDD can hold any serialized Python structure, as long as the class definitions are properly distributed. Is it not possible to read

Re: PySpark still reading only text?

2014-04-16 Thread Matei Zaharia
Hi Bertrand, We should probably add a SparkContext.pickleFile and RDD.saveAsPickleFile that will allow saving pickled objects. Unfortunately this is not in yet, but there is an issue up to track it: https://issues.apache.org/jira/browse/SPARK-1161. In 1.0, one feature we do have now is the

Re: PySpark still reading only text?

2014-04-16 Thread Jesvin Jose
When this is implemented, can you load/save an RDD of pickled objects to HDFS? On Thu, Apr 17, 2014 at 1:51 AM, Matei Zaharia matei.zaha...@gmail.com wrote: Hi Bertrand, We should probably add a SparkContext.pickleFile and RDD.saveAsPickleFile that will allow saving pickled objects.
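
To make the idea of saving and reloading pickled objects concrete, here is a minimal sketch in plain Python using only the standard library. It is not the Spark API discussed in this thread: the helper names save_as_pickle_file and pickle_file, the batch_size parameter, and the part-00000 file name are all hypothetical, chosen only to mirror the proposed method names, and the sketch writes to a local temporary directory rather than HDFS.

```python
import os
import pickle
import tempfile


def save_as_pickle_file(records, directory, batch_size=2):
    """Hypothetical helper: write records as pickled batches into one part file."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, "part-00000")  # assumed file name
    with open(path, "wb") as f:
        # Each pickle.dump call appends one batch (a list slice) to the file.
        for start in range(0, len(records), batch_size):
            pickle.dump(records[start:start + batch_size], f)
    return path


def pickle_file(directory):
    """Hypothetical helper: read every pickled batch from every file in directory."""
    records = []
    for name in sorted(os.listdir(directory)):
        with open(os.path.join(directory, name), "rb") as f:
            while True:
                try:
                    records.extend(pickle.load(f))
                except EOFError:
                    # pickle.load raises EOFError once the file is exhausted.
                    break
    return records


# Arbitrary Python structures survive the round trip, not just text lines.
data = [{"id": 1, "tags": ["a", "b"]}, ("tuple", 2.5), [1, 2, 3]]
out_dir = tempfile.mkdtemp()
save_as_pickle_file(data, out_dir)
restored = pickle_file(out_dir)
```

Batching several objects per pickle.dump call is just one possible design; it reduces the number of pickle frames to read back, at the cost of holding a whole batch in memory.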