Re: Including additional scala libraries in sparkR

2015-07-14 Thread Michal Haris
OK, thanks. It seems that --jars is not behaving as expected - I'm getting a
class-not-found error even for the simplest object from my library. But anyway,
I have to do at least a filter transformation before collecting the HBaseRDD
into R, so I will have to go the route of using the Scala spark shell to
transform, collect and save into the local filesystem, and then visualise the
file with R until custom RDD transformations are exposed in the SparkR API.

On 13 July 2015 at 10:27, Sun, Rui rui@intel.com wrote:

 Hi, Michal,

 SparkR comes with a JVM backend that supports Java object instantiation and
 calling Java instance and static methods from the R side. As defined in
 https://github.com/apache/spark/blob/master/R/pkg/R/backend.R,
 newJObject() creates an instance of a Java class;
 callJMethod() calls an instance method of a Java object;
 callJStatic() calls a static method of a Java class.

 If your goal is as simple as data visualization, you can use the above
 low-level functions to create an instance of your HBase RDD on the JVM side,
 collect the data to the R side, and visualize it.
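
 For illustration, a rough sketch of what such calls might look like from the
 SparkR shell; the HBase factory class and method names below are hypothetical
 placeholders for your own library, and because these helpers are internal in
 the 1.4 release they may need the SparkR::: prefix:

   sc <- sparkR.init()   # returns a reference to the JVM SparkContext

   # hypothetical static factory on your Scala object that builds the HBase RDD
   hbaseRDD <- SparkR:::callJStatic("com.example.hbase.HBaseRDDFactory",
                                    "scanTable", sc, "my_table")

   # call an instance method on the returned JVM object; simple return values
   # are converted back to R, other objects come back as jobj references
   localData <- SparkR:::callJMethod(hbaseRDD, "collect")

   # newJObject() works the same way for constructors, e.g. a scan configuration
   scanConf <- SparkR:::newJObject("com.example.hbase.ScanConfig", "my_table")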

 However, if you want to do HBase RDD transformations and HBase table
 updates, things are quite complex at the moment. SparkR supports the majority
 of the RDD API (though it is not exposed publicly in the 1.4 release), allowing
 transformation functions written in R, but it currently only supports RDDs
 sourced from text files and SparkR DataFrames, so your HBase RDDs can't be used
 with the SparkR RDD API for further processing.

 You can use --jars to include your Scala library so that it can be accessed by
 the JVM backend.
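
 For example (the jar path and class name are illustrative), the library can be
 passed when launching the SparkR shell, and a quick static call can confirm
 that the class is reachable from the backend:

   # launched with:  ./bin/sparkR --jars /path/to/my-hbase-lib.jar
   sc <- sparkR.init()
   # hypothetical static method, used here only to check that the class resolves
   SparkR:::callJStatic("com.example.hbase.HBaseRDDFactory", "version")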

 
 From: Michal Haris [michal.ha...@visualdna.com]
 Sent: Sunday, July 12, 2015 6:39 PM
 To: user@spark.apache.org
 Subject: Including additional scala libraries in sparkR

 I have a Spark program with a custom optimised RDD for HBase scans and
 updates. I have a small library of objects in Scala to support efficient
 serialisation, partitioning etc. I would like to use R as an analysis and
 visualisation front-end. I tried rJava (i.e. not using SparkR) and got as far
 as initialising the Spark context, but I ran into problems with HBase
 dependencies (HBaseConfiguration: Unsupported major.minor version 51.0), so I
 tried SparkR, but I can't figure out how to make my custom Scala classes
 available to SparkR other than re-implementing them in R. Is there a way to
 include and invoke additional Scala objects and RDDs within the SparkR
 shell/job? Something similar to additional jars and an init script in a normal
 spark submit/shell...

 --
 Michal Haris
 Technical Architect
 direct line: +44 (0) 207 749 0229
 www.visualdna.com | t: +44 (0) 207 734 7033
 31 Old Nichol Street
 London
 E2 7HR




-- 
Michal Haris
Technical Architect
direct line: +44 (0) 207 749 0229
www.visualdna.com | t: +44 (0) 207 734 7033
31 Old Nichol Street
London
E2 7HR


RE: Including additional scala libraries in sparkR

2015-07-14 Thread Sun, Rui
Could you give more details about the misbehavior of --jars for SparkR? Maybe
it's a bug.





Re: Including additional scala libraries in sparkR

2015-07-14 Thread Shivaram Venkataraman
There was a fix for `--jars` that went into 1.4.1
https://github.com/apache/spark/commit/2579948bf5d89ac2d822ace605a6a4afce5258d6

Shivaram

On Tue, Jul 14, 2015 at 4:18 AM, Sun, Rui rui@intel.com wrote:

 Could you give more details about the misbehavior of --jars for SparkR?
 Maybe it's a bug.
 





RE: Including additional scala libraries in sparkR

2015-07-13 Thread Sun, Rui
Hi, Michal,

SparkR comes with a JVM backend that supports Java object instantiation and
calling Java instance and static methods from the R side. As defined in
https://github.com/apache/spark/blob/master/R/pkg/R/backend.R,
newJObject() creates an instance of a Java class;
callJMethod() calls an instance method of a Java object;
callJStatic() calls a static method of a Java class.

If your goal is as simple as data visualization, you can use the above
low-level functions to create an instance of your HBase RDD on the JVM side,
collect the data to the R side, and visualize it.

However, if you want to do HBase RDD transformations and HBase table updates,
things are quite complex at the moment. SparkR supports the majority of the RDD
API (though it is not exposed publicly in the 1.4 release), allowing
transformation functions written in R, but it currently only supports RDDs
sourced from text files and SparkR DataFrames, so your HBase RDDs can't be used
with the SparkR RDD API for further processing.

You can use --jars to include your Scala library so that it can be accessed by
the JVM backend.


From: Michal Haris [michal.ha...@visualdna.com]
Sent: Sunday, July 12, 2015 6:39 PM
To: user@spark.apache.org
Subject: Including additional scala libraries in sparkR

I have a Spark program with a custom optimised RDD for HBase scans and updates.
I have a small library of objects in Scala to support efficient serialisation,
partitioning etc. I would like to use R as an analysis and visualisation
front-end. I tried rJava (i.e. not using SparkR) and got as far as initialising
the Spark context, but I ran into problems with HBase dependencies
(HBaseConfiguration: Unsupported major.minor version 51.0), so I tried SparkR,
but I can't figure out how to make my custom Scala classes available to SparkR
other than re-implementing them in R. Is there a way to include and invoke
additional Scala objects and RDDs within the SparkR shell/job? Something
similar to additional jars and an init script in a normal spark submit/shell...

--
Michal Haris
Technical Architect
direct line: +44 (0) 207 749 0229
www.visualdna.com | t: +44 (0) 207 734 7033
31 Old Nichol Street
London
E2 7HR
