I call stop from the console because RStudio warns and advises it. And yes, after stop was called, the whole script was run again in full. That means the init line "hivecontext <- sparkRHive.init(sc)" is always called after stop.
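For reference, a minimal sketch of the restart sequence being described, assuming the same SPARK_HOME and data path used elsewhere in this thread:

Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)

sparkR.stop()                       # invalidates any existing sc and hivecontext
sc <- sparkR.init()                 # a fresh context must be created after stop
hivecontext <- sparkRHive.init(sc)  # and the hive context re-created against it
df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
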
On Tue, Jan 12, 2016 at 8:31 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:

> As you can see from my reply below from Jan 6, calling sparkR.stop()
> invalidates both the sc and the hivecontext you have and results in this
> invalid jobj error.
>
> If you start R and run this, it should work:
>
> Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
> .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
> library(SparkR)
>
> sc <- sparkR.init()
> hivecontext <- sparkRHive.init(sc)
> df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
>
> Is there a reason you want to call stop? If you do, you would need to call
> the line hivecontext <- sparkRHive.init(sc) again.
>
> _____________________________
> From: Sandeep Khurana <sand...@infoworks.io>
> Sent: Tuesday, January 12, 2016 5:20 AM
> Subject: Re: sparkR ORC support.
> To: Felix Cheung <felixcheun...@hotmail.com>
> Cc: spark users <user@spark.apache.org>, Prem Sure <premsure...@gmail.com>,
> Deepak Sharma <deepakmc...@gmail.com>, Yanbo Liang <yblia...@gmail.com>
>
> It worked for some time. Then I did sparkR.stop() and re-ran, only to get
> the same error. Any idea why it ran fine before? (While it was running
> fine, it kept warning that it was reusing the existing spark-context and
> that I should restart.) There is one more R script which instantiates
> Spark; I ran that again too.
>
> On Tue, Jan 12, 2016 at 3:05 PM, Sandeep Khurana <sand...@infoworks.io>
> wrote:
>
>> The complete stacktrace is below. Could it be something with the Java
>> versions?
>>
>> stop("invalid jobj ", value$id)
>> 8
>> writeJobj(con, object)
>> 7
>> writeObject(con, a)
>> 6
>> writeArgs(rc, args)
>> 5
>> invokeJava(isStatic = TRUE, className, methodName, ...)
>> 4
>> callJStatic("org.apache.spark.sql.api.r.SQLUtils", "loadDF", sqlContext,
>> source, options)
>> 3
>> read.df(sqlContext, path, source, schema, ...)
>> 2
>> loadDF(hivecontext, filepath, "orc")
>>
>> On Tue, Jan 12, 2016 at 2:41 PM, Sandeep Khurana <sand...@infoworks.io>
>> wrote:
>>
>>> Running this gave
>>>
>>> 16/01/12 04:06:54 INFO BlockManagerMaster: Registered BlockManager
>>> Error in writeJobj(con, object) : invalid jobj 3
>>>
>>> How does it know which hive schema to connect to?
>>>
>>> On Tue, Jan 12, 2016 at 2:34 PM, Felix Cheung <felixcheun...@hotmail.com>
>>> wrote:
>>>
>>>> It looks like you have overwritten sc. Could you try this:
>>>>
>>>> Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
>>>> .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"),
>>>> .libPaths()))
>>>> library(SparkR)
>>>>
>>>> sc <- sparkR.init()
>>>> hivecontext <- sparkRHive.init(sc)
>>>> df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
>>>>
>>>> ------------------------------
>>>> Date: Tue, 12 Jan 2016 14:28:58 +0530
>>>> Subject: Re: sparkR ORC support.
>>>> From: sand...@infoworks.io
>>>> To: felixcheun...@hotmail.com
>>>> CC: yblia...@gmail.com; user@spark.apache.org; premsure...@gmail.com;
>>>> deepakmc...@gmail.com
>>>>
>>>> The code is very simple, pasted below.
>>>> hive-site.xml is in the spark conf directory already.
>>>> I still see this error
>>>>
>>>> Error in writeJobj(con, object) : invalid jobj 3
>>>>
>>>> after running the script below.
>>>>
>>>> script
>>>> =======
>>>> Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
>>>>
>>>> .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"),
>>>> .libPaths()))
>>>> library(SparkR)
>>>>
>>>> sc <<- sparkR.init()
>>>> sc <<- sparkRHive.init()
>>>> hivecontext <<- sparkRHive.init(sc)
>>>> df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc")
>>>> #View(df)
>>>>
>>>> On Wed, Jan 6, 2016 at 11:08 PM, Felix Cheung <felixcheun...@hotmail.com>
>>>> wrote:
>>>>
>>>> Yes, as Yanbo suggested, it looks like there is something wrong with
>>>> the sqlContext.
>>>>
>>>> Could you forward us your code please?
>>>>
>>>> On Wed, Jan 6, 2016 at 5:52 AM -0800, "Yanbo Liang" <yblia...@gmail.com>
>>>> wrote:
>>>>
>>>> You should ensure your sqlContext is a HiveContext.
>>>>
>>>> sc <- sparkR.init()
>>>> sqlContext <- sparkRHive.init(sc)
>>>>
>>>> 2016-01-06 20:35 GMT+08:00 Sandeep Khurana <sand...@infoworks.io>:
>>>>
>>>> Felix
>>>>
>>>> I tried the option you suggested. It gave the error below. I am going
>>>> to try the option suggested by Prem.
>>>>
>>>> Error in writeJobj(con, object) : invalid jobj 1
>>>> 8
>>>> stop("invalid jobj ", value$id)
>>>> 7
>>>> writeJobj(con, object)
>>>> 6
>>>> writeObject(con, a)
>>>> 5
>>>> writeArgs(rc, args)
>>>> 4
>>>> invokeJava(isStatic = TRUE, className, methodName, ...)
>>>> 3
>>>> callJStatic("org.apache.spark.sql.api.r.SQLUtils", "loadDF",
>>>> sqlContext, source, options)
>>>> 2
>>>> read.df(sqlContext, filepath, "orc") at spark_api.R#108
>>>>
>>>> On Wed, Jan 6, 2016 at 10:30 AM, Felix Cheung <felixcheun...@hotmail.com>
>>>> wrote:
>>>>
>>>> Firstly, I don't have ORC data to verify, but this should work:
>>>>
>>>> df <- loadDF(sqlContext, "data/path", "orc")
>>>>
>>>> Secondly, could you check if sparkR.stop() was called?
>>>> sparkRHive.init() should be called after sparkR.init() - please check
>>>> if there is any error message there.
>>>>
>>>> _____________________________
>>>> From: Prem Sure <premsure...@gmail.com>
>>>> Sent: Tuesday, January 5, 2016 8:12 AM
>>>> Subject: Re: sparkR ORC support.
>>>> To: Sandeep Khurana <sand...@infoworks.io>
>>>> Cc: spark users <user@spark.apache.org>, Deepak Sharma
>>>> <deepakmc...@gmail.com>
>>>>
>>>> Yes Sandeep, also copy hive-site.xml to the spark conf directory.
>>>>
>>>> On Tue, Jan 5, 2016 at 10:07 AM, Sandeep Khurana <sand...@infoworks.io>
>>>> wrote:
>>>>
>>>> Also, do I need to set up hive in spark as per the link
>>>> http://stackoverflow.com/questions/26360725/accesing-hive-tables-in-spark ?
>>>>
>>>> Do we need to copy the hdfs-site.xml file to the spark conf directory?
>>>>
>>>> On Tue, Jan 5, 2016 at 8:28 PM, Sandeep Khurana <sand...@infoworks.io>
>>>> wrote:
>>>>
>>>> Deepak
>>>>
>>>> Tried this. Getting this error now:
>>>>
>>>> Error in sql(hivecontext, "FROM CATEGORIES SELECT category_id", "") :
>>>> unused argument ("")
>>>>
>>>> On Tue, Jan 5, 2016 at 6:48 PM, Deepak Sharma <deepakmc...@gmail.com>
>>>> wrote:
>>>>
>>>> Hi Sandeep
>>>> Can you try this?
>>>>
>>>> results <- sql(hivecontext, "FROM test SELECT id", "")
>>>>
>>>> Thanks
>>>> Deepak
>>>>
>>>> On Tue, Jan 5, 2016 at 5:49 PM, Sandeep Khurana <sand...@infoworks.io>
>>>> wrote:
>>>>
>>>> Thanks Deepak.
>>>>
>>>> I tried this as well. I created a hivecontext with "hivecontext <<-
>>>> sparkRHive.init(sc)".
>>>>
>>>> When I tried to read a hive table from it with
>>>>
>>>> results <- sql(hivecontext, "FROM test SELECT id")
>>>>
>>>> I get the error below:
>>>>
>>>> Error in callJMethod(sqlContext, "sql", sqlQuery) : Invalid jobj 2. If
>>>> SparkR was restarted, Spark operations need to be re-executed.
>>>>
>>>> Not sure what is causing this. Any leads or ideas? I am using RStudio.
>>>>
>>>> On Tue, Jan 5, 2016 at 5:35 PM, Deepak Sharma <deepakmc...@gmail.com>
>>>> wrote:
>>>>
>>>> Hi Sandeep
>>>> I am not sure if ORC can be read directly in R.
>>>> But there can be a workaround: first create a hive table on top of the
>>>> ORC files, and then access that hive table in R.
>>>>
>>>> Thanks
>>>> Deepak
>>>>
>>>> On Tue, Jan 5, 2016 at 4:57 PM, Sandeep Khurana <sand...@infoworks.io>
>>>> wrote:
>>>>
>>>> Hello
>>>>
>>>> I need to read ORC files in hdfs in R using spark. I am not able to
>>>> find a package to do that.
>>>>
>>>> Can anyone help with documentation or an example for this purpose?
>>>>
>>>> --
>>>> Architect
>>>> Infoworks.io <http://infoworks.io>
>>>> http://Infoworks.io
>>>>
>>>> --
>>>> Thanks
>>>> Deepak
>>>> www.bigdatabig.com
>>>> www.keosha.net
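
For reference, a minimal sketch of the workaround Deepak suggests above: expose the ORC files as an external hive table, then query that table through the HiveContext from SparkR. The table name test_orc and the single id INT column are hypothetical; the real schema must match the ORC files, and the DDL runs against whichever metastore the hive-site.xml in the spark conf directory points to:

Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)

sc <- sparkR.init()
hivecontext <- sparkRHive.init(sc)

# One-time DDL: create an external table over the ORC files (hypothetical
# name and schema; adjust to the actual ORC layout).
sql(hivecontext, "CREATE EXTERNAL TABLE IF NOT EXISTS test_orc (id INT)
                  STORED AS ORC LOCATION '/data/ingest/sparktest1/'")

# The data can then be queried like any other hive table.
results <- sql(hivecontext, "SELECT id FROM test_orc")
head(results)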