Hi,

I have browsed the online documentation, and it states that PySpark only
reads text files as sources. Is that still the case?

From what I understand, after this first step the RDD can hold any serialized
Python structure, as long as the class definitions are distributed to the workers.

Is it not possible to read those RDDs back? That is, create a flow to parse
everything once and then, e.g. the next week, start directly from the binary,
structured data?

Technically, what is the difficulty? I would assume the code for reading a
binary Python RDD and for reading a binary Python file to be quite similar.
Where can I learn more about this subject?
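For what it's worth, recent PySpark versions do expose exactly this round trip via `rdd.saveAsPickleFile(path)` and `sc.pickleFile(path)` (worth checking whether your version has them). The underlying mechanism is just Python pickling of the structured records; a minimal standalone sketch of that binary round trip, using only the standard library (file name and record shape are made up for illustration):

```python
import os
import pickle
import tempfile

# First week's job: parse raw input into structured Python objects.
records = [{"id": i, "value": i * i} for i in range(5)]

# Persist the structured data in binary (pickled) form.
path = os.path.join(tempfile.mkdtemp(), "records.pkl")
with open(path, "wb") as f:
    pickle.dump(records, f)

# Next week's job: start directly from the binary, structured data,
# skipping the parsing step entirely.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == records)  # True
```

PySpark's pickle-file support does the same thing per partition, which is why reading a pickled RDD back is indeed close to reading a pickled file.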

Thanks in advance

Bertrand
