Hi Nick, I finally got around to downloading and building the patch.
I pulled the code from https://github.com/MLnick/spark-1/tree/pyspark-inputformats

I am running on a CDH5 node. While the code in the CDH branch is different from Spark master, I do believe that I have resolved any inconsistencies.

When attempting to connect to an HBase table using SparkContext.newAPIHadoopFile, I receive the following error:

Py4JError: org.apache.spark.api.python.PythonRDDnewAPIHadoopFile does not exist in the JVM

I have searched the pyspark-inputformats branch and cannot find any reference to the class org.apache.spark.api.python.PythonRDDnewAPIHadoopFile

Any ideas?

Also, do you have a working example of HBase access with the new code?

Thanks
Tommer