Re: Using Hadoop InputFormat in Python

Kan Zhang Wed, 13 Aug 2014 15:36:45 -0700

Tassilo, newAPIHadoopRDD has been added to PySpark in master and the
yet-to-be-released 1.1 branch. It allows you to specify your custom
InputFormat. Examples of using it include hbase_inputformat.py and
cassandra_inputformat.py in examples/src/main/python. Check them out.
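
For a custom InputFormat, a call might look roughly like the sketch below.
This is only an illustration under assumptions, not code from the bundled
examples: com.example.MyBinaryInputFormat is a hypothetical class name
standing in for your own InputFormat, the LongWritable/BytesWritable
key/value types are a guess at what a binary reader would emit, and the
input-directory property assumes your class extends the new-API
FileInputFormat. The helper names are made up for this sketch.

```python
# Sketch: reading through a custom Hadoop InputFormat from PySpark.
# All class names and paths passed in are placeholders (assumptions).


def hadoop_rdd_args(input_format, input_path,
                    key_class="org.apache.hadoop.io.LongWritable",
                    value_class="org.apache.hadoop.io.BytesWritable"):
    """Build the keyword arguments for SparkContext.newAPIHadoopRDD."""
    return {
        "inputFormatClass": input_format,
        "keyClass": key_class,
        "valueClass": value_class,
        # Assumption: the InputFormat extends the new-API FileInputFormat,
        # which reads its input path from this Hadoop property.
        "conf": {"mapreduce.input.fileinputformat.inputdir": input_path},
    }


def read_with_custom_input_format(sc, input_format, input_path):
    """Return an RDD of (key, value) pairs produced by the InputFormat.

    `sc` is an existing pyspark.SparkContext.
    """
    return sc.newAPIHadoopRDD(**hadoop_rdd_args(input_format, input_path))
```

With a live SparkContext sc, you would call something like
read_with_custom_input_format(sc, "com.example.MyBinaryInputFormat",
"hdfs:///path/to/file"). The jar containing your InputFormat has to be on
the classpath, for example via spark-submit --jars. If the key or value
type is not one PySpark can convert on its own, you may also need to pass
keyConverter/valueConverter, as the HBase example does.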



On Wed, Aug 13, 2014 at 3:12 PM, Sunny Khatri <sunny.k...@gmail.com> wrote:

> I'm not that familiar with the Python APIs, but you should be able to
> configure a Job object with your custom InputFormat and pass the
> resulting configuration (i.e. job.getConfiguration()) to newAPIHadoopRDD
> to get the required RDD.
>
>
> On Wed, Aug 13, 2014 at 2:59 PM, Tassilo Klein <tjkl...@gmail.com> wrote:
>
>> Hi,
>>
>> I'd like to read a (binary) file from Python for which I have defined a
>> Java InputFormat (.java). However, I am now stuck on how to use it
>> from Python, and I didn't find anything in the newsgroups either.
>> As far as I know, I have to use the newAPIHadoopRDD function. However, I
>> am not sure how to use it in combination with my custom InputFormat.
>> Does anybody have a short snippet of code showing how to do it?
>> Thanks in advance.
>> Best,
>>  Tassilo
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Using-Hadoop-InputFormat-in-Python-tp12067.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>
