Hi All,

I gave https://github.com/elbamos/Zeppelin-With-R another try and was able
to get it working as well; it does indeed work with the official
precompiled Spark with Hadoop 2.6.

I compiled Zeppelin-With-R with:

mvn package install -DskipTests

I had to reset the R home in the Spark interpreter to:

/usr/lib64/R/

All then works fine.
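
(Side note: if you're not sure what the R home path is on a given machine,
running

R RHOME

should print it; /usr/lib64/R is just where it happens to live on this box.)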

Cheers!

On Thu, Dec 17, 2015 at 1:10 AM, Corneau Damien <cornead...@gmail.com>
wrote:

> First of all, if you are having trouble with
> https://github.com/datalayer/zeppelin-R, you had better ask directly on
> that repository, or ask echarles for help, since it isn't part of the
> https://github.com/apache/incubator-zeppelin repository.
> I think he will be able to give you better answers regarding your error.
>
> @Amos
> Zeppelin is an Open Source project, and we welcome any type of
> contribution, including helping on the mailing list, since we can't answer
> every thread.
> Your remark about @FelixCheung does not belong here, and it doesn't help
> resolve @csuser's issue.
> Furthermore, you shouldn't be telling him what to do or not to do;
> whatever grudge you hold has no place on this mailing list.
>
> On Thu, Dec 17, 2015 at 2:30 AM, Amos B. Elberg <amos.elb...@me.com>
> wrote:
>
>> CS: What you’re doing is compiling two versions of Zeppelin from source
>> on top of a binary of a third version. That’s going to give you trouble.
>>
>> The R interpreter you’re using doesn’t interface with Zeppelin’s Spark
>> installation at all; all it shares is the name. So none of the things
>> you’ve been doing with recompiling Zeppelin or Spark is actually having
>> any impact on R working with Hive. Whether R happens to work with Hive
>> for you is incidental.
>>
>> I suggest you start from a clean installation and install this
>> https://github.com/elbamos/Zeppelin-With-R from source.
>>
>> You should not need to specify -Pyarn, -Phive, etc. The R interpreter in
>> the package will use the same Spark as the rest of Zeppelin.
>>
>> Just mvn package install -DskipTests to install.
>>
>> At runtime, set the environment variable SPARK_HOME to point to your
>> existing, separately compiled, installation of Spark.  Zeppelin should try
>> to use Hive by default, and the R interpreter will use whatever the rest of
>> Zeppelin uses.
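>>
>> (A rough sketch of what that usually looks like; the path below is only a
>> placeholder for wherever your own Spark build actually lives:
>>
>> # in conf/zeppelin-env.sh
>> export SPARK_HOME=/path/to/your/spark-build   # placeholder path
>>
>> then restart Zeppelin so the interpreter picks it up.)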
>>
>> Also — @FelixCheung, you have no business trying to provide support for
>> anyone on this project, and you certainly have no business giving anyone
>> advice about using R with it.
>>
>>
>> From: cs user <acldstk...@gmail.com>
>> Reply: users@zeppelin.incubator.apache.org
>> Date: December 16, 2015 at 5:27:20 AM
>> To: users@zeppelin.incubator.apache.org
>> Subject: Re: Zeppelin+spark+R+hive
>>
>> Hi All,
>>
>> Many thanks for getting back to me. I've managed to get this working by
>> downloading the tagged Spark 1.5.2 release and compiling it with:
>>
>> ./make-distribution.sh --name custom-spark --tgz -Phadoop-2.6
>> -Dhadoop.version=2.6.0 -Pyarn -Phive -Phive-thriftserver -Psparkr
>>
>> I've then downloaded the source for this version of Zeppelin:
>>
>> https://github.com/datalayer/zeppelin-R
>>
>> Then I compiled it with (based on the README from the above project):
>>
>> mvn clean install -Pyarn -Pspark-1.5 -Dspark.version=1.5.2
>> -Dhadoop.version=2.6.0 -Phadoop-2.6 -Ppyspark -Dmaven.findbugs.enable=false
>> -Drat.skip=true -Dcheckstyle.skip=true -DskipTests -pl
>> '!flink,!ignite,!phoenix,!postgresql,!tajo,!hive,!cassandra,!lens,!kylin'
>>
>> Within Zeppelin this allows Spark to run on YARN, and also allows the R
>> interpreter to be used with Hive.
>>
>> Hope this helps someone else :-)
>>
>> Cheers!
>>
>> On Tue, Dec 15, 2015 at 5:37 PM, Sourav Mazumder <
>> sourav.mazumde...@gmail.com> wrote:
>>
>>> I believe that is not going to solve the problem.
>>>
>>> If you need to run Spark on YARN (assuming that is your requirement),
>>> ensure that you run it in yarn-client mode. yarn-cluster mode is not
>>> supported by Zeppelin yet.
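>>>
>>> (For reference, yarn-client mode is typically selected either by setting
>>> the spark interpreter's "master" property to yarn-client in the
>>> interpreter settings, or roughly via the environment, e.g.:
>>>
>>> # in conf/zeppelin-env.sh -- the HADOOP_CONF_DIR path is just an example
>>> export MASTER=yarn-client
>>> export HADOOP_CONF_DIR=/etc/hadoop/conf
>>>
>>> so the SparkContext Zeppelin creates talks to YARN as a client.)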
>>>
>>> Regards,
>>> Sourav
>>>
>>>
>>> On Tue, Dec 15, 2015 at 9:32 AM, Felix Cheung <felixcheun...@hotmail.com
>>> > wrote:
>>>
>>>> If you are not using YARN, try building your Spark distribution without
>>>> this:
>>>>  -Pyarn
>>>> ?
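>>>>
>>>> (i.e., roughly the same make-distribution.sh invocation you used,
>>>> minus the YARN profile, something like:
>>>>
>>>> ./make-distribution.sh --name custom-spark --tgz -Phadoop-2.6 \
>>>>   -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver
>>>>
>>>> keeping whichever of the remaining profiles you actually need.)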
>>>>
>>>>
>>>>
>>>> On Tue, Dec 15, 2015 at 12:31 AM -0800, "cs user" <acldstk...@gmail.com
>>>> > wrote:
>>>>
>>>> Hi Folks,
>>>>
>>>> We've been playing around with this project:
>>>>
>>>> https://github.com/datalayer/zeppelin-R
>>>>
>>>> However, when we try to write a notebook using R that requires Hive,
>>>> we run into the following:
>>>>
>>>> Error in value[[3L]](cond): Spark SQL is not built with Hive support
>>>>
>>>> This is when we are using the precompiled Spark with Hadoop 2.6
>>>> support.
>>>>
>>>> To work around this, I've tried recompiling Spark with Hive support.
>>>> Accessing the Hive context within an R notebook now works fine.
>>>>
>>>> However, it is then impossible to run existing notebooks that try to
>>>> submit jobs via YARN; the following error is encountered:
>>>>
>>>> java.lang.NoSuchMethodException:
>>>> org.apache.spark.repl.SparkILoop$SparkILoopInterpreter.classServerUri()
>>>>   at java.lang.Class.getMethod(Class.java:1678)
>>>>   at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:271)
>>>>   at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:145)
>>>>   at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:464)
>>>>   at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
>>>>   at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
>>>>   at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
>>>>   at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:292)
>>>>   at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
>>>>   at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118)
>>>>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>>>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>   at java.lang.Thread.run(Thread.java:745)
>>>>
>>>> If I switch back to the old Spark home, these jobs then work fine
>>>> again.
>>>>
>>>> I am compiling our custom version of Spark with the following:
>>>>
>>>> ./make-distribution.sh --name custom-spark --tgz -Phadoop-2.6
>>>> -Dhadoop.version=2.6.0 -Pyarn -Phive -Phive-thriftserver
>>>>
>>>> Are there any other switches I need to add to overcome the above error?
>>>>
>>>> Thanks!
>>>>
>>>>
>>>>
>>>
>>
>
