Hi Colleagues,
We need to call a Scala class from PySpark in an IPython notebook.
We tried something like the following:
from py4j.java_gateway import java_import
java_import(sparkContext._jvm, '<mynamespace>')
myScalaClass = sparkContext._jvm.SimpleScalaClass()
myScalaClass.sayHello("World")   # works fine
But when we try to pass the SparkContext to our class, it fails. The first attempt:

myContext = _jvm.MySQLContext(sparkContext)

fails with:
AttributeError                            Traceback (most recent call last)
<ipython-input-19-34330244f574> in <module>()
----> 1 z = _jvm.MySQLContext(sparkContext)

C:\Users\i033085\spark\python\lib\py4j-0.8.2.1-src.zip\py4j\java_gateway.py in __call__(self, *args)
    690
    691         args_command = ''.join(
--> 692             [get_command_part(arg, self._pool) for arg in new_args])
    693
    694         command = CONSTRUCTOR_COMMAND_NAME +\

C:\Users\i033085\spark\python\lib\py4j-0.8.2.1-src.zip\py4j\protocol.py in get_command_part(parameter, python_proxy_pool)
    263             command_part += ';' + interface
    264         else:
--> 265             command_part = REFERENCE_TYPE + parameter._get_object_id()
    266
    267     command_part += '\n'

AttributeError: 'SparkContext' object has no attribute '_get_object_id'
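Our understanding of this first failure (please correct us if we are wrong) is that the pyspark SparkContext is a pure Python wrapper, not a Py4J handle to a JVM object, so the gateway has no object id to send across; only the _jsc attribute is an actual JVM reference. A quick check in the notebook seems consistent with that:

# the Python-side SparkContext is a plain Python object with no JVM object id
print(type(sparkContext))        # <class 'pyspark.context.SparkContext'>
# _jsc is a Py4J handle to the JavaSparkContext living on the JVM
print(type(sparkContext._jsc))   # <class 'py4j.java_gateway.JavaObject'>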
And the second attempt:

myContext = _jvm.MySQLContext(sparkContext._jsc)

fails with:

Constructor org.apache.spark.sql.MySQLContext([class org.apache.spark.api.java.JavaSparkContext]) does not exist
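Our guess is that this second failure is a type mismatch: sparkContext._jsc is a JavaSparkContext, while our constructor presumably expects a Scala SparkContext. Would unwrapping it be the right approach? A minimal sketch of what we have in mind, assuming MySQLContext takes a Scala org.apache.spark.SparkContext in its constructor (scala_sc is just an illustrative name):

# _jsc wraps a JavaSparkContext; calling .sc() on it should return the
# underlying Scala org.apache.spark.SparkContext, which is the parameter
# type a SQLContext-style constructor normally takes
scala_sc = sparkContext._jsc.sc()
myContext = sparkContext._jvm.org.apache.spark.sql.MySQLContext(scala_sc)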
Would this be possible, or are there serialization issues that make it
impossible? If not, what options do we have to instantiate our own
SQLContext, written in Scala, from PySpark?
Best Regards,
Santosh