Tassilo, newAPIHadoopRDD has been added to PySpark in master and in the yet-to-be-released 1.1 branch. It allows you to specify your custom InputFormat. Examples of its use include hbase_inputformat.py and cassandra_inputformat.py in examples/src/main/python. Check them out.
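For a file-based InputFormat like yours, a rough sketch might look like the following. It assumes Spark 1.1+ on Hadoop 2.x and a hypothetical class com.example.MyBinaryInputFormat that extends the new-API (org.apache.hadoop.mapreduce) FileInputFormat and emits LongWritable keys and BytesWritable values. Those class names are placeholders, so substitute your own. The input path goes in through the conf dict, which PySpark turns into a Hadoop Configuration on the JVM side. That is roughly the Python counterpart of passing job.getConfiguration() from Java.

```python
"""Sketch: read a binary file through a custom Java InputFormat from
PySpark (Spark 1.1+) using SparkContext.newAPIHadoopRDD."""

# Placeholder class names (hypothetical): replace with your own InputFormat
# and the Writable key/value types it actually emits.
INPUT_FORMAT = "com.example.MyBinaryInputFormat"
KEY_CLASS = "org.apache.hadoop.io.LongWritable"
VALUE_CLASS = "org.apache.hadoop.io.BytesWritable"

# Hadoop 2.x name of the FileInputFormat input-path setting
# (Hadoop 1.x used "mapred.input.dir").
INPUT_DIR_KEY = "mapreduce.input.fileinputformat.inputdir"


def hadoop_conf(path, extra=None):
    """Build the configuration dict handed to newAPIHadoopRDD.

    Values are coerced to str because PySpark ships the dict to the JVM
    as a string-to-string map before building a Hadoop Configuration.
    """
    conf = {INPUT_DIR_KEY: str(path)}
    if extra:
        conf.update({str(k): str(v) for k, v in extra.items()})
    return conf


def read_custom_format(sc, path, extra_conf=None):
    """Return an RDD of (key, value) pairs produced by the custom InputFormat.

    `sc` is an existing pyspark.SparkContext. The jar containing
    INPUT_FORMAT must already be on the driver and executor classpath.
    """
    return sc.newAPIHadoopRDD(
        INPUT_FORMAT,
        KEY_CLASS,
        VALUE_CLASS,
        conf=hadoop_conf(path, extra_conf),
    )
```

To try it, you could launch with something like `./bin/spark-submit --driver-class-path my-inputformat.jar --jars my-inputformat.jar read_custom.py` (the jar and script names are hypothetical), create a SparkContext, and call `read_custom_format(sc, "hdfs:///path/to/file.bin").first()`. If your key or value type is not a standard Writable, you will probably also need a keyConverter/valueConverter, as hbase_inputformat.py does.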
On Wed, Aug 13, 2014 at 3:12 PM, Sunny Khatri <sunny.k...@gmail.com> wrote:
> I'm not that familiar with the Python APIs, but you should be able to
> configure a job object with your custom InputFormat and pass the
> required configuration (i.e. job.getConfiguration()) to newAPIHadoopRDD
> to get the required RDD.
>
> On Wed, Aug 13, 2014 at 2:59 PM, Tassilo Klein <tjkl...@gmail.com> wrote:
>
>> Hi,
>>
>> I'd like to read in a (binary) file from Python for which I have defined
>> a Java InputFormat (.java). However, I am stuck on how to use it from
>> Python and didn't find anything in the newsgroups either.
>> As far as I know, I have to use the newAPIHadoopRDD function. However, I
>> am not sure how to use it in combination with my custom InputFormat.
>> Does anybody have a short snippet of code showing how to do it?
>> Thanks in advance.
>> Best,
>> Tassilo
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Using-Hadoop-InputFormat-in-Python-tp12067.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.