Re: [SparkR] gapply with strings with arrow

2020-10-10 Thread Hyukjin Kwon
If it works without Arrow optimization, it's likely a bug. Please feel free to file a JIRA for that. On Wed, 7 Oct 2020, 22:44 Jacek Pliszka, wrote: > Hi! > > Is there any place I can find information on how to use gapply with Arrow? > > I've tried something very simple > > collect(gapply( > df,
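For context, a minimal sketch of the kind of gapply call being discussed. The column names, schema, and the Arrow config key are illustrative assumptions, not taken from the thread; check the SparkR docs for your Spark version.

    # Spark 3.x: Arrow-based transfer for gapply/dapply is toggled by a SQL config
    # (the key below is an assumption to verify for your release).
    sparkR.session(sparkConfig = list("spark.sql.execution.arrow.sparkr.enabled" = "true"))
    df <- createDataFrame(data.frame(dept = c("a", "a", "b"), salary = c(1, 2, 3)))
    result <- gapply(
      df,
      "dept",
      function(key, x) {
        # x is a local data.frame holding one group; key holds the grouping value(s)
        data.frame(dept = key[[1]], avg_salary = mean(x$salary), stringsAsFactors = FALSE)
      },
      structType(structField("dept", "string"), structField("avg_salary", "double"))
    )
    head(collect(result))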

Re: SparkR integration with Hive 3 spark-r

2019-11-24 Thread Felix Cheung
. From: Alfredo Marquez Sent: Friday, November 22, 2019 4:26:49 PM To: user@spark.apache.org Subject: Re: SparkR integration with Hive 3 spark-r Does anyone else have some insight to this question? Thanks, Alfredo On Mon, Nov 18, 2019, 3:00 PM Alfredo Marquez

Re: SparkR integration with Hive 3 spark-r

2019-11-22 Thread Alfredo Marquez
Does anyone else have some insight to this question? Thanks, Alfredo On Mon, Nov 18, 2019, 3:00 PM Alfredo Marquez wrote: > Hello Nicolas, > > Well the issue is that with Hive 3, Spark gets its own metastore, > separate from the Hive 3 metastore. So how do you reconcile this > separation of

Re: SparkR integration with Hive 3 spark-r

2019-11-18 Thread Alfredo Marquez
Hello Nicolas, Well the issue is that with Hive 3, Spark gets its own metastore, separate from the Hive 3 metastore. So how do you reconcile this separation of metastores? Can you continue to "enableHivemetastore" and be able to connect to Hive 3? Does this connection take advantage of Hive's

Re: SparkR integration with Hive 3 spark-r

2019-11-18 Thread Nicolas Paris
Hi Alfredo, my 2 cents: To my knowledge, and reading the spark3 pre-release note, it will handle hive metastore 2.3.5 - no mention of hive 3 metastore. I made several tests on this in the past[1] and it seems to handle any hive metastore version. However spark cannot read hive managed table AKA
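For reference, the metastore version Spark talks to is configurable; a hedged sketch of setting it from SparkR (the config keys exist in Spark 2.x/3.x, the values here are placeholders):

    sparkR.session(sparkConfig = list(
      "spark.sql.hive.metastore.version" = "2.3.5",
      "spark.sql.hive.metastore.jars"    = "maven"
    ))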

Re: SparkR + binary type + how to get value

2019-02-19 Thread Felix Cheung
there: From: Thijs Haarhuis Sent: Tuesday, February 19, 2019 5:28 AM To: Felix Cheung; user@spark.apache.org Subject: Re: SparkR + binary type + how to get value Hi Felix, Thanks. I got it working now by using the unlist function. I have another question, maybe you can help me with, since I did
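A small sketch of what "using the unlist function" on a collected binary column might look like; the column name and the text-decoding step are assumptions, not code from the thread:

    local_df <- collect(results)
    # Each cell of a binary column comes back as an R raw vector wrapped in a list.
    raw_bytes <- unlist(local_df$payload[1])
    text <- rawToChar(raw_bytes)  # only meaningful if the bytes are actually text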

Re: SparkR + binary type + how to get value

2019-02-19 Thread Thijs Haarhuis
From: Felix Cheung Sent: Sunday, February 17, 2019 7:18 PM To: Thijs Haarhuis; user@spark.apache.org Subject: Re: SparkR + binary type + how to get value A byte buffer in R is the raw vector type, so seems like it is working as expected. What do you have in the raw

Re: SparkR + binary type + how to get value

2019-02-17 Thread Felix Cheung
: Thijs Haarhuis Sent: Thursday, February 14, 2019 4:01 AM To: Felix Cheung; user@spark.apache.org Subject: Re: SparkR + binary type + how to get value Hi Felix, Sure.. I have the following code: printSchema(results) cat("\n\n\n") firstRow <- first(results

Re: SparkR + binary type + how to get value

2019-02-14 Thread Thijs Haarhuis
Any idea how to get the actual value, or how to process the individual bytes? Thanks Thijs From: Felix Cheung Sent: Thursday, February 14, 2019 5:31 AM To: Thijs Haarhuis; user@spark.apache.org Subject: Re: SparkR + binary type + how to get value Please share

Re: SparkR + binary type + how to get value

2019-02-13 Thread Felix Cheung
Please share your code From: Thijs Haarhuis Sent: Wednesday, February 13, 2019 6:09 AM To: user@spark.apache.org Subject: SparkR + binary type + how to get value Hi all, Does anybody have any experience in accessing the data from a column which has a binary

Re: SparkR issue

2018-10-14 Thread Felix Cheung
1 seems like its spending a lot of time in R (slicing the data I guess?) and not with Spark 2 could you write it into a csv file locally and then read it from Spark? From: ayan guha Sent: Monday, October 8, 2018 11:21 PM To: user Subject: SparkR issue Hi We

Re: SparkR test script issue: unable to run run-tests.h on spark 2.2

2018-02-14 Thread chandan prakash
an earlier version with devtools? will > follow up for a fix. > > _ > From: Hyukjin Kwon <gurwls...@gmail.com> > Sent: Wednesday, February 14, 2018 6:49 PM > Subject: Re: SparkR test script issue: unable to run run-tests.h on spark > 2.2 > To: chand

Re: SparkR test script issue: unable to run run-tests.h on spark 2.2

2018-02-14 Thread Felix Cheung
Yes it is issue with the newer release of testthat. To workaround could you install an earlier version with devtools? will follow up for a fix. _ From: Hyukjin Kwon <gurwls...@gmail.com> Sent: Wednesday, February 14, 2018 6:49 PM Subject: Re: SparkR test script

Re: SparkR test script issue: unable to run run-tests.h on spark 2.2

2018-02-14 Thread Hyukjin Kwon
>From a very quick look, I think testthat version issue with SparkR. I had to fix that version to 1.x before in AppVeyor. There are few details in https://github.com/apache/spark/pull/20003 Can you check and lower testthat version? On 14 Feb 2018 6:09 pm, "chandan prakash"
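A hedged sketch of the suggested workaround; the exact testthat version to pin is an assumption (check the linked PR and the SparkR docs for your release):

    # Pin testthat to a 1.x release so run-tests.sh works with SparkR on Spark 2.2.
    devtools::install_version("testthat", version = "1.0.2",
                              repos = "https://cloud.r-project.org")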

Re: sparkR 3rd library

2017-09-05 Thread Yanbo Liang
I guess you didn't install R package `genalg` for all worker nodes. This is not built-in package for basic R, so you need to install it to all worker nodes manually or running `install.packages` inside of your SparkR UDF. Regards to how to download third party packages and install them inside of
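A rough sketch of the second approach described here, installing the package from inside the function that runs on the workers; the repo URL and the body of the work are placeholders:

    run_one <- function(i) {
      # Install genalg into the worker's library if it is not already there.
      if (!requireNamespace("genalg", quietly = TRUE)) {
        install.packages("genalg", repos = "https://cloud.r-project.org")
      }
      library(genalg)
      # ... call the existing R code that needs genalg here ...
      i
    }
    results <- spark.lapply(1:10, run_one)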

Re: sparkR 3rd library

2017-09-04 Thread Felix Cheung
Can you include the code you call spark.lapply? From: patcharee Sent: Sunday, September 3, 2017 11:46:40 PM To: spar >> user@spark.apache.org Subject: sparkR 3rd library Hi, I am using spark.lapply to execute an existing R script in

Re: [sparkR] [MLlib] : Is word2vec implemented in SparkR MLlib ?

2017-04-21 Thread Felix Cheung
Not currently - how are you planning to use the output from word2vec? From: Radhwane Chebaane Sent: Thursday, April 20, 2017 4:30:14 AM To: user@spark.apache.org Subject: [sparkR] [MLlib] : Is word2vec implemented in SparkR MLlib ? Hi,

Re: SparkR execution hang on when handle a RDD which is converted from DataFrame

2016-10-14 Thread Lantao Jin
40GB 2016-10-14 14:20 GMT+08:00 Felix Cheung : > How big is the metrics_moveing_detection_cube table? > > > > > > On Thu, Oct 13, 2016 at 8:51 PM -0700, "Lantao Jin" > wrote: > > sqlContext <- sparkRHive.init(sc) > sqlString<- > "SELECT > key_id,

Re: SparkR API problem with subsetting distributed data frame

2016-09-11 Thread Bene
I am calling dirs(x, dat) with a number for x and a distributed dataframe for dat, like dirs(3, df). With your logical expression Felix I would get another data frame, right? This is not what I need, I need to extract a single value in a specific cell for my calculations. Is that somehow possible?
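One way to get a single value out of a SparkDataFrame is to filter on the cluster and then collect the (small) result to the driver, where ordinary R indexing works; a sketch with a made-up id column:

    row3  <- collect(filter(dat, dat$id == 3))
    value <- row3[1, 4]   # plain R indexing on the local data.frame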

Re: SparkR error: reference is ambiguous.

2016-09-10 Thread Felix Cheung
t:double] > head(c) speed dist 1 0 2 2 0 10 3 0 4 4 0 22 5 0 16 6 0 10 _ From: Bedrytski Aliaksandr <sp...@bedryt.ski<mailto:sp...@bedryt.ski>> Sent: Friday, September 9, 2016 9:13 PM Subject: Re: SparkR error: reference is ambiguous. To: xingye <t

Re: SparkR API problem with subsetting distributed data frame

2016-09-10 Thread Felix Cheung
How are you calling dirs()? What would be x? Is dat a SparkDataFrame? With SparkR, i in dat[i, 4] should be a logical expression for rows, e.g. df[df$age %in% c(19, 30), 1:2] On Sat, Sep 10, 2016 at 11:02 AM -0700, "Bene" >

Re: SparkR API problem with subsetting distributed data frame

2016-09-10 Thread Bene
Here are a few code snippets. The data frame looks like this:
  kfz  zeit                 datum       latitude  longitude
1 #    2015-02-09 07:18:33  2015-02-09  52.35234  9.881965
2 #    2015-02-09 07:18:34  2015-02-09  52.35233  9.881970
3 #

Re: SparkR API problem with subsetting distributed data frame

2016-09-10 Thread Felix Cheung
Could you include code snippets you are running? On Sat, Sep 10, 2016 at 1:44 AM -0700, "Bene" > wrote: Hi, I am having a problem with the SparkR API. I need to subset a distributed data so I can extract single values from

Re: SparkR error: reference is ambiguous.

2016-09-09 Thread Bedrytski Aliaksandr
Hi, Can you use full-string queries in SparkR? Like (in Scala): df1.registerTempTable("df1") df2.registerTempTable("df2") val df3 = sparkContext.sql("SELECT * FROM df1 JOIN df2 ON df1.ra = df2.ra") explicitly mentioning table names in the query often solves ambiguity problems. Regards --
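The SparkR (1.x/2.0-era API) equivalent of that Scala suggestion looks roughly like this; the join key and column names are made up for illustration:

    registerTempTable(df1, "df1")
    registerTempTable(df2, "df2")
    # Qualifying columns with the table name avoids the "reference is ambiguous" error.
    df3 <- sql(sqlContext, "SELECT df1.*, df2.extra FROM df1 JOIN df2 ON df1.ra = df2.ra")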

Re: SparkR error when repartition is called

2016-08-09 Thread Felix Cheung
nvalid>> Sent: Tuesday, August 9, 2016 12:19 AM Subject: Re: SparkR error when repartition is called To: Sun Rui <sunrise_...@163.com<mailto:sunrise_...@163.com>> Cc: User <user@spark.apache.org<mailto:user@spark.apache.org>> Sun, I am using spark in yarn client mode i

Re: SparkR error when repartition is called

2016-08-09 Thread Shane Lee
Sun, I am using spark in yarn client mode in a 2-node cluster with hadoop-2.7.2. My R version is 3.3.1. I have the following in my spark-defaults.conf: spark.executor.extraJavaOptions = -XX:+PrintGCDetails

Re: SparkR error when repartition is called

2016-08-09 Thread Sun Rui
I can’t reproduce your issue with len=1 in local mode. Could you give more environment information? > On Aug 9, 2016, at 11:35, Shane Lee wrote: > > Hi All, > > I am trying out SparkR 2.0 and have run into an issue with repartition. > > Here is the R code

Re: SparkR : glm model

2016-06-11 Thread Sun Rui
You were looking at some old code. The poisson family is supported in the latest master branch. You can try the Spark 2.0 preview release from http://spark.apache.org/news/spark-2.0.0-preview.html > On Jun 10, 2016, at 12:14, april_ZMQ

Re: SparkR interaction with R libraries (currently 1.5.2)

2016-06-07 Thread Sun Rui
Hi, Ian, You should not use the Spark DataFrame a_df in your closure. For an R function for lapplyPartition, the parameter is a list of lists, representing the rows in the corresponding partition. In Spark 2.0, SparkR provides a new public API called dapply, which can apply an R function to each
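A small sketch of the dapply API mentioned here (Spark 2.0+); the data and the derived column are purely illustrative:

    df <- createDataFrame(data.frame(x = 1:6, y = runif(6)))
    out <- dapply(
      df,
      function(pdf) {
        # pdf is a plain local data.frame holding one partition
        pdf$z <- pdf$x + pdf$y
        pdf
      },
      structType(structField("x", "integer"),
                 structField("y", "double"),
                 structField("z", "double"))
    )
    head(collect(out))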

Re: SparkR query

2016-05-17 Thread Sun Rui
To: Mike Lewis > Cc: user@spark.apache.org > Subject: Re: SparkR query > > Lewis, > 1. Could you check the values of “SPARK_HOME” environment on all of your > worker nodes? > 2. How did you start your SparkR shell? > > On May 17, 2016, at 18:07, Mike Lewis <mle...@nephila

RE: SparkR query

2016-05-17 Thread Mike Lewis
Rui [mailto:sunrise_...@163.com] Sent: 17 May 2016 11:32 To: Mike Lewis Cc: user@spark.apache.org Subject: Re: SparkR query Lewis, 1. Could you check the values of “SPARK_HOME” environment on all of your worker nodes? 2. How did you start your SparkR shell? On May 17, 2016, at 18:07, Mike Lewis

Re: SparkR query

2016-05-17 Thread Sun Rui
Lewis, 1. Could you check the values of “SPARK_HOME” environment on all of your worker nodes? 2. How did you start your SparkR shell? > On May 17, 2016, at 18:07, Mike Lewis wrote: > > Hi, > > I have a SparkR driver process that connects to a master running on

RE: sparkR issues ?

2016-03-18 Thread Sun, Rui
Kozlov <ale...@gmail.com>; roni <roni.epi...@gmail.com> Cc: user@spark.apache.org Subject: RE: sparkR issues ? I have submitted https://issues.apache.org/jira/browse/SPARK-13905 and a PR for it. From: Alex Kozlov [mailto:ale...@gmail.com] Sent: Wednesday, March 16, 2016 12:52

RE: sparkR issues ?

2016-03-15 Thread Sun, Rui
I have submitted https://issues.apache.org/jira/browse/SPARK-13905 and a PR for it. From: Alex Kozlov [mailto:ale...@gmail.com] Sent: Wednesday, March 16, 2016 12:52 AM To: roni <roni.epi...@gmail.com> Cc: Sun, Rui <rui@intel.com>; user@spark.apache.org Subject: Re: sparkR issue

Re: sparkR issues ?

2016-03-15 Thread Alex Kozlov
ame() >> in SparkR to avoid such covering. >> >> >> >> *From:* Alex Kozlov [mailto:ale...@gmail.com] >> *Sent:* Tuesday, March 15, 2016 2:59 PM >> *To:* roni <roni.epi...@gmail.com> >> *Cc:* user@spark.apache.org >> *Subject:* Re: sparkR is

Re: sparkR issues ?

2016-03-15 Thread roni
e if we can change the implementation of as.data.frame() > in SparkR to avoid such covering. > > > > *From:* Alex Kozlov [mailto:ale...@gmail.com] > *Sent:* Tuesday, March 15, 2016 2:59 PM > *To:* roni <roni.epi...@gmail.com> > *Cc:* user@spark.apache.org > *Subject:* Re: sparkR i

Re: sparkR issues ?

2016-03-15 Thread roni
Alex, No, I have not defined the "dataframe"; it's the Spark default DataFrame. That line is just casting a factor as a data frame to send to the function. Thanks -R On Mon, Mar 14, 2016 at 11:58 PM, Alex Kozlov wrote: > This seems to be a very unfortunate name collision. SparkR

RE: sparkR issues ?

2016-03-15 Thread Sun, Rui
epi...@gmail.com> Cc: user@spark.apache.org Subject: Re: sparkR issues ? This seems to be a very unfortunate name collision. SparkR defines its own DataFrame class which shadows what seems to be your own definition. Is DataFrame something you define? Can you rename it? On Mon, Mar 14, 2016 at

Re: sparkR issues ?

2016-03-15 Thread Alex Kozlov
This seems to be a very unfortunate name collision. SparkR defines its own DataFrame class which shadows what seems to be your own definition. Is DataFrame something you define? Can you rename it? On Mon, Mar 14, 2016 at 10:44 PM, roni wrote: > Hi, > I am working

Re: SparkR Count vs Take performance

2016-03-02 Thread Dirceu Semighini Filho
n Owen [mailto:so...@cloudera.com] > Sent: Wednesday, March 2, 2016 3:37 AM > To: Dirceu Semighini Filho <dirceu.semigh...@gmail.com> > Cc: user <user@spark.apache.org> > Subject: Re: SparkR Count vs Take performance > > Yeah one surprising result is that you can't call i

RE: SparkR Count vs Take performance

2016-03-02 Thread Sun, Rui
fetch. -Original Message- From: Sean Owen [mailto:so...@cloudera.com] Sent: Wednesday, March 2, 2016 3:37 AM To: Dirceu Semighini Filho <dirceu.semigh...@gmail.com> Cc: user <user@spark.apache.org> Subject: Re: SparkR Count vs Take performance Yeah one surprising result is th

Re: SparkR Count vs Take performance

2016-03-01 Thread Sean Owen
Yeah one surprising result is that you can't call isEmpty on an RDD of nonserializable objects. You can't do much with an RDD of nonserializable objects anyway, but they can exist as an intermediate stage. We could fix that pretty easily with a little copy and paste of the take() code; right now

Re: SparkR Count vs Take performance

2016-03-01 Thread Dirceu Semighini Filho
Great, I hadn't noticed this isEmpty method. Well, serialization has been a problem in this project; we have noticed a lot of time being spent serializing and deserializing things to send to and get from the cluster. 2016-03-01 15:47 GMT-03:00 Sean Owen : > There is an "isEmpty"

Re: SparkR Count vs Take performance

2016-03-01 Thread Sean Owen
There is an "isEmpty" method that basically does exactly what your second version does. I have seen it be unusually slow at times because it must copy 1 element to the driver, and it's possible that's slow. It still shouldn't be slow in general, and I'd be surprised if it's slower than a count in
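In SparkR DataFrame terms, the cheaper emptiness check being discussed is roughly the following (illustrative only; the thread itself is about the RDD-level isEmpty/take):

    is_empty  <- nrow(take(df, 1)) == 0   # can stop after the first row found
    slow_path <- count(df) == 0           # scans every partition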

Re: sparkR not able to create /append new columns

2016-02-04 Thread Devesh Raj Singh
browse/SPARK-12225) which is still under > discussion. If you desire this feature, you could comment on it. > > > > *From:* Franc Carter [mailto:franc.car...@gmail.com] > *Sent:* Wednesday, February 3, 2016 7:40 PM > *To:* Devesh Raj Singh > *Cc:* user@spark.apache.org > *Subj

Re: sparkR not able to create /append new columns

2016-02-03 Thread Franc Carter
Yes, I didn't work out how to solve that - sorry On 3 February 2016 at 22:37, Devesh Raj Singh wrote: > Hi, > > but "withColumn" will only add once, if i want to add columns to the same > dataframe in a loop it will keep overwriting the added column and in the > end the

Re: sparkR not able to create /append new columns

2016-02-03 Thread Franc Carter
I had problems doing this as well - I ended up using 'withColumn', it's not particularly graceful but it worked (1.5.2 on AWS EMR) cheers On 3 February 2016 at 22:06, Devesh Raj Singh wrote: > Hi, > > i am trying to create dummy variables in sparkR by creating new

Re: sparkR not able to create /append new columns

2016-02-03 Thread Devesh Raj Singh
Hi, but "withColumn" will only add once; if I want to add columns to the same dataframe in a loop it will keep overwriting the added column, and in the end the last added column (in the loop) will be the only column added, like in my code above. On Wed, Feb 3, 2016 at 5:05 PM, Franc Carter
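A hedged sketch of the reassignment pattern that avoids the "last added column wins" behaviour described here; the column name and levels are assumptions:

    levels_to_encode <- c("red", "green", "blue")
    for (lv in levels_to_encode) {
      # Reassign df each time so every dummy column is kept rather than overwritten.
      df <- withColumn(df, paste0("colour_", lv), cast(df$colour == lv, "integer"))
    }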

RE: sparkR not able to create /append new columns

2016-02-03 Thread Sun, Rui
/SPARK-12225) which is still under discussion. If you desire this feature, you could comment on it. From: Franc Carter [mailto:franc.car...@gmail.com] Sent: Wednesday, February 3, 2016 7:40 PM To: Devesh Raj Singh Cc: user@spark.apache.org Subject: Re: sparkR not able to create /append new columns

Re: SparkR works from command line but not from rstudio

2016-01-26 Thread Sandeep Khurana
Resolved this issue after reinstalling R and RStudio. Had issues with the earlier installation. On Jan 22, 2016 6:48 PM, "Sandeep Khurana" wrote: > This problem is fixed by restarting R from R studio. Now see > > 16/01/22 08:08:38 INFO HiveMetaStore: No user is added in admin

Re: SparkR works from command line but not from rstudio

2016-01-22 Thread Sandeep Khurana
This problem is fixed by restarting R from RStudio. Now I see 16/01/22 08:08:38 INFO HiveMetaStore: No user is added in admin role, since config is empty 16/01/22 08:08:38 ERROR RBackendHandler: on org.apache.spark.sql.hive.HiveContext failed Error in value[[3L]](cond) : Spark SQL is not built with

Re: SparkR with Hive integration

2016-01-19 Thread Felix Cheung
You might need hive-site.xml _ From: Peter Zhang <zhangju...@gmail.com> Sent: Monday, January 18, 2016 9:08 PM Subject: Re: SparkR with Hive integration To: Jeff Zhang <zjf...@gmail.com> Cc: <user@spark.apache.org> Thanks, 

Re: SparkR with Hive integration

2016-01-18 Thread Peter Zhang
Thanks,  I will try. Peter --  Google Sent with Airmail On January 19, 2016 at 12:44:46, Jeff Zhang (zjf...@gmail.com) wrote: Please make sure you export environment variable HADOOP_CONF_DIR which contains the core-site.xml On Mon, Jan 18, 2016 at 8:23 PM, Peter Zhang

Re: SparkR with Hive integration

2016-01-18 Thread Jeff Zhang
Please make sure you export environment variable HADOOP_CONF_DIR which contains the core-site.xml On Mon, Jan 18, 2016 at 8:23 PM, Peter Zhang wrote: > Hi all, > > http://spark.apache.org/docs/latest/sparkr.html#sparkr-dataframes > From Hive tables >

Re: sparkR ORC support.

2016-01-12 Thread Sandeep Khurana
at > spark_api.R#108 > > On Wed, Jan 6, 2016 at 10:30 AM, Felix Cheung <felixcheun...@hotmail.com> > wrote: > > Firstly I don't have ORC data to verify but this should work: > > df <- loadDF(sqlContext, "data/path", "orc") > > Secondly, could you check i

Re: sparkR ORC support.

2016-01-12 Thread Sandeep Khurana
t;- sparkR.init() >> hivecontext <- sparkRHive.init(sc) >> df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc") >> >> >> >> -- >> Date: Tue, 12 Jan 2016 14:28:58 +0530 >> Subject: Re: s

RE: sparkR ORC support.

2016-01-12 Thread Felix Cheung
Hive.init(sc)df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc") Date: Tue, 12 Jan 2016 14:28:58 +0530 Subject: Re: sparkR ORC support. From: sand...@infoworks.io To: felixcheun...@hotmail.com CC: yblia...@gmail.com; user@spark.apache.org; premsure...@gmail.com;

Re: sparkR ORC support.

2016-01-12 Thread Sandeep Khurana
t; hivecontext <- sparkRHive.init(sc) > df <- loadDF(hivecontext, "/data/ingest/sparktest1/", "orc") > > > > -- > Date: Tue, 12 Jan 2016 14:28:58 +0530 > Subject: Re: sparkR ORC support. > From: sand...@infoworks.i

Re: sparkR ORC support.

2016-01-12 Thread Sandeep Khurana
;>> >>> >>> Sys.setenv(SPARK_HOME="/usr/hdp/current/spark-client") >>> >>> .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), >>> .libPaths())) >>> library(SparkR) >>> >>> sc <

Re: sparkR ORC support.

2016-01-12 Thread Sandeep Khurana
1/", "orc") > > > Is there a reason you want to call stop? If you do, you would need to call > the line hivecontext <- sparkRHive.init(sc) again. > > > _ > From: Sandeep Khurana <sand...@infoworks.io> > Sent: Tuesday, Jan

Re: sparkR ORC support.

2016-01-12 Thread Felix Cheung
would need to call the line hivecontext <- sparkRHive.init(sc) again. _ From: Sandeep Khurana <sand...@infoworks.io> Sent: Tuesday, January 12, 2016 5:20 AM Subject: Re: sparkR ORC support. To: Felix Cheung <felixcheun...@hotmail.com> Cc: spark users

Re: sparkR ORC support.

2016-01-06 Thread Sandeep Khurana
there is any error > message there. > > _ > From: Prem Sure <premsure...@gmail.com> > Sent: Tuesday, January 5, 2016 8:12 AM > Subject: Re: sparkR ORC support. > To: Sandeep Khurana <sand...@infoworks.io> > Cc: spark users <user@spark.

Re: sparkR ORC support.

2016-01-06 Thread Yanbo Liang
after sparkR.init() - please check if there is any error >> message there. >> >> _ >> From: Prem Sure <premsure...@gmail.com> >> Sent: Tuesday, January 5, 2016 8:12 AM >> Subject: Re: sparkR ORC support. >> To: Sandeep Khurana &

Re: sparkR ORC support.

2016-01-06 Thread Felix Cheung
o verify but this should work: >> >> df <- loadDF(sqlContext, "data/path", "orc") >> >> Secondly, could you check if sparkR.stop() was called? sparkRHive.init() >> should be called after sparkR.init() - please check if there is any error >> messag

Re: sparkR ORC support.

2016-01-05 Thread Prem Sure
Yes Sandeep, also copy hive-site.xml to the Spark conf directory. On Tue, Jan 5, 2016 at 10:07 AM, Sandeep Khurana wrote: > Also, do I need to setup hive in spark as per the link > http://stackoverflow.com/questions/26360725/accesing-hive-tables-in-spark > ? > > We might

Re: sparkR ORC support.

2016-01-05 Thread Sandeep Khurana
Also, do I need to set up hive in spark as per the link http://stackoverflow.com/questions/26360725/accesing-hive-tables-in-spark ? We might need to copy the hdfs-site.xml file to the Spark conf directory? On Tue, Jan 5, 2016 at 8:28 PM, Sandeep Khurana wrote: > Deepak > > Tried

Re: sparkR ORC support.

2016-01-05 Thread Sandeep Khurana
Deepak, Tried this. Getting this error now: Error in sql(hivecontext, "FROM CATEGORIES SELECT category_id", "") : unused argument ("") On Tue, Jan 5, 2016 at 6:48 PM, Deepak Sharma wrote: > Hi Sandeep > can you try this ? > > results <- sql(hivecontext, "FROM test

Re: sparkR ORC support.

2016-01-05 Thread Felix Cheung
re is any error message there. _ From: Prem Sure <premsure...@gmail.com> Sent: Tuesday, January 5, 2016 8:12 AM Subject: Re: sparkR ORC support. To: Sandeep Khurana <sand...@infoworks.io> Cc: spark users <user@spark.apache.org>, Deepak Sharma <deepakmc...@gmail.com>

Re: sparkR ORC support.

2016-01-05 Thread Deepak Sharma
Hi Sandeep can you try this ? results <- sql(hivecontext, "FROM test SELECT id","") Thanks Deepak On Tue, Jan 5, 2016 at 5:49 PM, Sandeep Khurana wrote: > Thanks Deepak. > > I tried this as well. I created a hivecontext with "hivecontext <<- > sparkRHive.init(sc) "

Re: sparkR ORC support.

2016-01-05 Thread Deepak Sharma
Hi Sandeep, I am not sure if ORC can be read directly in R, but there can be a workaround: first create a Hive table on top of the ORC files and then access the Hive table in R. Thanks Deepak On Tue, Jan 5, 2016 at 4:57 PM, Sandeep Khurana wrote: > Hello > > I need to read an ORC

Re: sparkR ORC support.

2016-01-05 Thread Sandeep Khurana
Thanks Deepak. I tried this as well. I created a hivecontext with "hivecontext <<- sparkRHive.init(sc)". When I tried to read a hive table from this, results <- sql(hivecontext, "FROM test SELECT id"), I get the error below: Error in callJMethod(sqlContext, "sql", sqlQuery) : Invalid jobj
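Pulling the advice in this thread together, a minimal sketch of two ways to reach ORC data from SparkR 1.x; paths and table names are placeholders, and hive-site.xml must be on Spark's conf path for the HiveContext to see the metastore:

    hivecontext <- sparkRHive.init(sc)

    # Option 1: read the ORC files directly.
    orc_df <- read.df(hivecontext, "/data/ingest/sparktest1/", source = "orc")

    # Option 2: query a Hive table defined on top of the ORC files.
    results <- sql(hivecontext, "SELECT id FROM test")
    head(results)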

Re: [SparkR] Is rdd in SparkR deprecated ?

2015-12-14 Thread Jeff Zhang
Thanks Felix, Just curious when I read the code. On Tue, Dec 15, 2015 at 1:32 AM, Felix Cheung wrote: > RDD API in SparkR is not officially supported. You could still access them > with the SparkR::: prefix though. > > May I ask what uses you have for them? Would the

Re: [SparkR] Any reason why saveDF's mode is append by default ?

2015-12-14 Thread Shivaram Venkataraman
I think it's just a bug -- I think we originally followed the Python API (in the original PR [1]) but the Python API seems to have been changed to match Scala / Java in https://issues.apache.org/jira/browse/SPARK-6366 Feel free to open a JIRA / PR for this. Thanks Shivaram [1]

Re: [SparkR] Any reason why saveDF's mode is append by default ?

2015-12-14 Thread Jeff Zhang
Thanks Shivaram, created https://issues.apache.org/jira/browse/SPARK-12318 I will work on it. On Mon, Dec 14, 2015 at 4:13 PM, Shivaram Venkataraman < shiva...@eecs.berkeley.edu> wrote: > I think its just a bug -- I think we originally followed the Python > API (in the original PR [1]) but the
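Until the default is settled, the write mode can always be passed explicitly; an illustrative call (path and source are placeholders):

    saveDF(df, path = "/tmp/out", source = "parquet", mode = "overwrite")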

Re: SparkR read.df failed to read file from local directory

2015-12-08 Thread Boyu Zhang
Thanks for the comment Felix, I tried giving "/home/myuser/test_data/sparkR/flights.csv", but it tried to search the path in hdfs, and gave errors: 15/12/08 12:47:10 ERROR r.RBackendHandler: loadDF on org.apache.spark.sql.api.r.SQLUtils failed Error in invokeJava(isStatic = TRUE, className,

Re: SparkR read.df failed to read file from local directory

2015-12-08 Thread Felix Cheung
Have you tried flightsDF <- read.df(sqlContext, "/home/myuser/test_data/sparkR/flights.csv", source = "com.databricks.spark.csv", header = "true")     _ From: Boyu Zhang Sent: Tuesday, December 8, 2015 8:47 AM Subject: SparkR read.df

RE: SparkR read.df failed to read file from local directory

2015-12-08 Thread Sun, Rui
@spark.apache.org Subject: Re: SparkR read.df failed to read file from local directory Thanks for the comment Felix, I tried giving "/home/myuser/test_data/sparkR/flights.csv", but it tried to search the path in hdfs, and gave errors: 15/12/08 12:47:10 ERROR r.RBackendHandl
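One common way to force a local (non-HDFS) path in this situation is an explicit file:// URI; a hedged sketch (the path and CSV options mirror the thread, the rest is assumption, and the file must be readable from every node that runs the read):

    flightsDF <- read.df(sqlContext,
                         "file:///home/myuser/test_data/sparkR/flights.csv",
                         source = "com.databricks.spark.csv",
                         header = "true")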

Re: SparkR in Spark 1.5.2 jsonFile Bug Found

2015-12-04 Thread Yanbo Liang
I have created SPARK-12146 to track this issue. 2015-12-04 9:16 GMT+08:00 Felix Cheung : > It looks like this has been broken around Spark 1.5. > > Please see JIRA SPARK-10185. This has been fixed in pyspark but > unfortunately SparkR was missed. I have confirmed this

Re: SparkR in Spark 1.5.2 jsonFile Bug Found

2015-12-03 Thread Felix Cheung
It looks like this has been broken around Spark 1.5. Please see JIRA SPARK-10185. This has been fixed in pyspark but unfortunately SparkR was missed. I have confirmed this is still broken in Spark 1.6. Could you please open a JIRA? On Thu, Dec 3, 2015 at 2:08 PM -0800, "tomasr3"

Re: SparkR DataFrame , Out of memory exception for very small file.

2015-11-23 Thread Vipul Rai
Hi Jeff, This is only part of the actual code. My questions are mentioned in comments near the code. SALES <- SparkR::sql(hiveContext, "select * from sales") PRICING <- SparkR::sql(hiveContext, "select * from pricing") ## renaming of columns ## #sales file# # Is this right ??? Do we have to

Re: SparkR DataFrame , Out of memory exception for very small file.

2015-11-23 Thread Jeff Zhang
>>> Do I need to create a new DataFrame for every update to the DataFrame like addition of new column or need to update the original sales DataFrame. Yes, DataFrame is immutable, and every mutation of DataFrame will produce a new DataFrame. On Mon, Nov 23, 2015 at 4:44 PM, Vipul Rai
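Illustrating the immutability point with a tiny hypothetical example: the original SparkDataFrame is untouched and the result must be assigned to a new name (or reassigned to the old one).

    sales2 <- withColumn(sales, "net_price", sales$price * 0.9)
    # 'sales' still has its original columns; 'sales2' is a new SparkDataFrame.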

Re: SparkR DataFrame , Out of memory exception for very small file.

2015-11-23 Thread Vipul Rai
Hello Rui, Sorry, what I meant was that adding a new column to the original dataframe gives a new DataFrame as the result. Please check this for more https://spark.apache.org/docs/1.5.1/api/R/index.html Check for WithColumn Thanks, Vipul On 23 November 2015 at 12:42, Sun, Rui

Re: SparkR DataFrame , Out of memory exception for very small file.

2015-11-23 Thread Vipul Rai
Hi Zeff, Thanks for the reply, but could you tell me why it is taking so much time? What could be wrong? Also, when I remove the DataFrame from memory using rm(), it does not clear the memory but the object is deleted. Also, what about the R functions which are not supported in SparkR? Like

Re: SparkR DataFrame , Out of memory exception for very small file.

2015-11-23 Thread Jeff Zhang
If possible, could you share your code ? What kind of operation are you doing on the dataframe ? On Mon, Nov 23, 2015 at 5:10 PM, Vipul Rai wrote: > Hi Zeff, > > Thanks for the reply, but could you tell me why is it taking so much time. > What could be wrong , also when

RE: SparkR DataFrame , Out of memory exception for very small file.

2015-11-22 Thread Sun, Rui
Vipul, Not sure if I understand your question. DataFrame is immutable. You can't update a DataFrame. Could you paste some log info for the OOM error? -Original Message- From: vipulrai [mailto:vipulrai8...@gmail.com] Sent: Friday, November 20, 2015 12:11 PM To: user@spark.apache.org

RE: [sparkR] Any insight on java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-11-07 Thread Sun, Rui
This is probably because your config option actually does not take effect. Please refer to the email thread titled “How to set memory for SparkR with master="local[*]"”, which may answer your question. I recommend you try SparkR built from the master branch, which contains two fixes that may

RE: sparkR 1.5.1 batch yarn-client mode failing on daemon.R not found

2015-11-01 Thread Sun, Rui
Tom, Have you set the “MASTER” env variable on your machine? What is the value if set? From: Tom Stewart [mailto:stewartthom...@yahoo.com.INVALID] Sent: Friday, October 30, 2015 10:11 PM To: user@spark.apache.org Subject: sparkR 1.5.1 batch yarn-client mode failing on daemon.R not found I have

RE: SparkR job with >200 tasks hangs when calling from web server

2015-11-01 Thread Sun, Rui
I guess that this is not related to SparkR, but something wrong in the Spark Core. Could you try your application logic within spark-shell (you have to use the Scala DataFrame API) instead of the SparkR shell and see if this issue still happens? -Original Message- From: rporcio

Re: SparkR 1.5.1 ClassCastException when working with CSV files

2015-10-28 Thread rporcio
It seems that the cause of this exception was the wrong version of the spark-csv package. After I upgraded it to the latest (1.2.0) version, the exception is gone and it works fine. -- View this message in context:

Re: SparkR in yarn-client mode needs sparkr.zip

2015-10-25 Thread Ram Venkatesh
Felix, Missed your reply - agree looks like the same issue, resolved mine as Duplicate. Thanks! Ram On Sun, Oct 25, 2015 at 2:47 PM, Felix Cheung wrote: > > > This might be related to https://issues.apache.org/jira/browse/SPARK-10500 > > > > On Sun, Oct 25, 2015 at

Re: SparkR in yarn-client mode needs sparkr.zip

2015-10-25 Thread Felix Cheung
This might be related to https://issues.apache.org/jira/browse/SPARK-10500 On Sun, Oct 25, 2015 at 9:57 AM -0700, "Ted Yu" wrote: In zipRLibraries(): // create a zip file from scratch, do not append to existing file. val zipFile = new File(dir, name) I guess

Re: SparkR in yarn-client mode needs sparkr.zip

2015-10-25 Thread Ram Venkatesh
Ted Yu, Agree that either picking up sparkr.zip if it already exists, or creating a zip in a local scratch directory will work. This code is called by the client side job submission logic and the resulting zip is already added to the local resources for the YARN job, so I don't think the

Re: SparkR in yarn-client mode needs sparkr.zip

2015-10-25 Thread Ted Yu
In zipRLibraries(): // create a zip file from scratch, do not append to existing file. val zipFile = new File(dir, name) I guess instead of creating sparkr.zip in the same directory as R lib, the zip file can be created under some directory writable by the user launching the app and

RE: SparkR Error in sparkR.init(master=“local”) in RStudio

2015-10-09 Thread Khandeshi, Ami
It seems the problem is with creating Usage: RBackend From: Sun, Rui [mailto:rui@intel.com] Sent: Wednesday, October 07, 2015 10:23 PM To: Khandeshi, Ami; Hossein Cc: akhandeshi; user@spark.apache.org Subject: RE: SparkR Error in sparkR.init(master=“local”) in RStudio Can you extract

RE: SparkR Error in sparkR.init(master=“local”) in RStudio

2015-10-09 Thread Khandeshi, Ami
Thank you for your help! I was able to resolve it by changing my working directory to local. The default was a mapped drive. From: Khandeshi, Ami Sent: Friday, October 09, 2015 11:23 AM To: 'Sun, Rui'; Hossein Cc: akhandeshi; user@spark.apache.org Subject: RE: SparkR Error in sparkR.init(master

RE: SparkR Error in sparkR.init(master=“local”) in RStudio

2015-10-08 Thread Sun, Rui
m] Sent: Wednesday, October 07, 2015 2:35 AM To: Hossein; Khandeshi, Ami Cc: akhandeshi; user@spark.apache.org<mailto:user@spark.apache.org> Subject: RE: SparkR Error in sparkR.init(master=“local”) in RStudio Not sure "/C/DevTools/spark-1.5.1/bin/spark-submit.cmd" is a valid? From: Hos

RE: SparkR Error in sparkR.init(master=“local”) in RStudio

2015-10-07 Thread Sun, Rui
Not sure "/C/DevTools/spark-1.5.1/bin/spark-submit.cmd" is a valid path? From: Hossein [mailto:fal...@gmail.com] Sent: Wednesday, October 7, 2015 12:46 AM To: Khandeshi, Ami Cc: Sun, Rui; akhandeshi; user@spark.apache.org Subject: Re: SparkR Error in sparkR.init(master=“local”) in RStudio

RE: SparkR Error in sparkR.init(master=“local”) in RStudio

2015-10-07 Thread Khandeshi, Ami
5.1/bin/spark-submit.cmd sparkr-shell C:\Users\a554719\AppData\Local\Temp\RtmpkXZVBa\backend_port45ac487f2fbd Error in sparkR.init(master = "local") : JVM is not ready after 10 seconds From: Sun, Rui [mailto:rui@intel.com] Sent: Wednesday, October 07, 2015 2:35 AM To: Hossei

Re: SparkR Error in sparkR.init(master=“local”) in RStudio

2015-10-06 Thread Felix Cheung
Is it possible that your user does not have permission to write the temp file? On Tue, Oct 6, 2015 at 10:26 AM -0700, "akhandeshi" wrote: It seems it is failing at path <- tempfile(pattern = "backend_port") I do not see the backend_port directory created... -- View

RE: SparkR Error in sparkR.init(master=“local”) in RStudio

2015-10-06 Thread Khandeshi, Ami
9\AppData\Local\Temp\Rtmpw11KJ1\backend_port31b0afd4391' had status 127 -Original Message- From: Sun, Rui [mailto:rui@intel.com] Sent: Tuesday, October 06, 2015 9:39 AM To: akhandeshi; user@spark.apache.org Subject: RE: SparkR Error in sparkR.init(master=“local”) in RStudio What
