[jira] [Resolved] (SPARK-8847) String concatenation with column in SparkR
[ https://issues.apache.org/jira/browse/SPARK-8847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivaram Venkataraman resolved SPARK-8847.
Resolution: Duplicate
Fix Version/s: 1.5.0

String concatenation with column in SparkR
Key: SPARK-8847
URL: https://issues.apache.org/jira/browse/SPARK-8847
Project: Spark
Issue Type: New Feature
Components: R
Reporter: Amar Gondaliya
Fix For: 1.5.0

1. String concatenation with the values of a column, i.e. df$newcol <- paste(a, df$column) type functionality.
2. String concatenation between columns, i.e. df$newcol <- paste(df$col1, "-", df$col2)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
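The requested behavior mirrors R's `paste()`: element-wise concatenation with scalar recycling. A plain-Python analogue (the `paste` helper here is hypothetical, written only to illustrate the semantics asked for above, not SparkR's actual API):

```python
def paste(*columns, sep=" "):
    """Element-wise string concatenation, recycling scalars like R's paste()."""
    n = max(len(c) for c in columns if isinstance(c, list))
    # Broadcast scalar arguments to column length, then join row-wise.
    expanded = [c if isinstance(c, list) else [c] * n for c in columns]
    return [sep.join(str(v) for v in row) for row in zip(*expanded)]

col1 = ["a", "b"]
col2 = ["x", "y"]
print(paste("id", col1))            # ['id a', 'id b']  (scalar recycled)
print(paste(col1, col2, sep="-"))   # ['a-x', 'b-y']    (column-to-column)
```

The two calls correspond to the two feature requests: a scalar concatenated against a column, and two columns concatenated with a separator.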
[jira] [Updated] (SPARK-10007) Update `NAMESPACE` file in SparkR for simple parameters functions
[ https://issues.apache.org/jira/browse/SPARK-10007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivaram Venkataraman updated SPARK-10007:
Assignee: Yu Ishikawa

Update `NAMESPACE` file in SparkR for simple parameters functions
Key: SPARK-10007
URL: https://issues.apache.org/jira/browse/SPARK-10007
Project: Spark
Issue Type: Sub-task
Components: SparkR
Reporter: Yu Ishikawa
Assignee: Yu Ishikawa
Fix For: 1.5.0

I'm afraid I forgot to update the {{NAMESPACE}} file for the simple-parameter functions, such as {{ascii}}, {{base64}} and so on.
[jira] [Resolved] (SPARK-10007) Update `NAMESPACE` file in SparkR for simple parameters functions
[ https://issues.apache.org/jira/browse/SPARK-10007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivaram Venkataraman resolved SPARK-10007.
Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 8277
[https://github.com/apache/spark/pull/8277]

Update `NAMESPACE` file in SparkR for simple parameters functions
Key: SPARK-10007
URL: https://issues.apache.org/jira/browse/SPARK-10007
Project: Spark
Issue Type: Sub-task
Components: SparkR
Reporter: Yu Ishikawa
Fix For: 1.5.0

I'm afraid I forgot to update the {{NAMESPACE}} file for the simple-parameter functions, such as {{ascii}}, {{base64}} and so on.
[jira] [Commented] (SPARK-9427) Add expression functions in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700058#comment-14700058 ]

Shivaram Venkataraman commented on SPARK-9427:

Well, half of the functions are already in branch-1.5, and I guess we should have PRs for some of the other simpler parts (like 9856) come in soon. The more complex ones which require changing SerDe might not be appropriate for 1.5, but my plan is to get as many of the simple ones in as we can.

Add expression functions in SparkR
Key: SPARK-9427
URL: https://issues.apache.org/jira/browse/SPARK-9427
Project: Spark
Issue Type: New Feature
Components: SparkR
Reporter: Yu Ishikawa

The list of functions to add is based on SQL's functions, and it would be better to add them in a one-shot PR.
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
[jira] [Commented] (SPARK-10043) Add window functions into SparkR
[ https://issues.apache.org/jira/browse/SPARK-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699032#comment-14699032 ]

Shivaram Venkataraman commented on SPARK-10043:

[~yuu.ishik...@gmail.com] Could you clarify which of these functions need support for better `collect` in SparkR? We only need the collect functionality if we are fetching data back to the driver.

Add window functions into SparkR
Key: SPARK-10043
URL: https://issues.apache.org/jira/browse/SPARK-10043
Project: Spark
Issue Type: Sub-task
Components: SparkR
Reporter: Yu Ishikawa

Add window functions as follows in SparkR. I think we should improve the {{collect}} function in SparkR.
- lead
- cumeDist
- denseRank
- lag
- ntile
- percentRank
- rank
- rowNumber
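To make the semantics of a few of the listed window functions concrete, here is a plain-Python sketch of `lag`, `lead`, and `rank` over a single ordered partition (illustrative only; SparkR would delegate these to Spark SQL's window functions rather than compute them locally):

```python
def lag(xs, offset=1, default=None):
    """Value offset rows before the current row; default where none exists."""
    return [default] * offset + xs[:-offset] if offset else xs[:]

def lead(xs, offset=1, default=None):
    """Value offset rows after the current row; default where none exists."""
    return xs[offset:] + [default] * offset if offset else xs[:]

def rank(xs):
    """Competition ranking: ties share a rank, and the next rank is skipped."""
    ordered = sorted(xs)
    return [ordered.index(x) + 1 for x in xs]

vals = [10, 20, 20, 30]
print(lag(vals))    # [None, 10, 20, 20]
print(lead(vals))   # [20, 20, 30, None]
print(rank(vals))   # [1, 2, 2, 4]
```

`denseRank` would differ from `rank` only in not skipping ranks after ties, and `rowNumber` would simply enumerate rows in order.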
[jira] [Updated] (SPARK-9871) Add expression functions into SparkR which have a variable parameter
[ https://issues.apache.org/jira/browse/SPARK-9871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivaram Venkataraman updated SPARK-9871:
Assignee: Yu Ishikawa

Add expression functions into SparkR which have a variable parameter
Key: SPARK-9871
URL: https://issues.apache.org/jira/browse/SPARK-9871
Project: Spark
Issue Type: Sub-task
Components: SparkR
Reporter: Yu Ishikawa
Assignee: Yu Ishikawa
Fix For: 1.5.0

Add expression functions into SparkR which take a variable number of parameters, like {{concat}}
[jira] [Resolved] (SPARK-9871) Add expression functions into SparkR which have a variable parameter
[ https://issues.apache.org/jira/browse/SPARK-9871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivaram Venkataraman resolved SPARK-9871.
Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 8194
[https://github.com/apache/spark/pull/8194]

Add expression functions into SparkR which have a variable parameter
Key: SPARK-9871
URL: https://issues.apache.org/jira/browse/SPARK-9871
Project: Spark
Issue Type: Sub-task
Components: SparkR
Reporter: Yu Ishikawa
Fix For: 1.5.0

Add expression functions into SparkR which take a variable number of parameters, like {{concat}}
[jira] [Commented] (SPARK-9427) Add expression functions in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699080#comment-14699080 ]

Shivaram Venkataraman commented on SPARK-9427:

Yeah, I think the simplest thing might be to add a version of `rand(seed: Int)` (or `rand(seed: Double)` if we want to maintain precision?) to the API and do a cast in Scala to call the version with Long. cc [~rxin]

Add expression functions in SparkR
Key: SPARK-9427
URL: https://issues.apache.org/jira/browse/SPARK-9427
Project: Spark
Issue Type: New Feature
Components: SparkR
Reporter: Yu Ishikawa

The list of functions to add is based on SQL's functions, and it would be better to add them in a one-shot PR.
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
[jira] [Commented] (SPARK-8684) Update R version in Spark EC2 AMI
[ https://issues.apache.org/jira/browse/SPARK-8684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606293#comment-14606293 ]

Shivaram Venkataraman commented on SPARK-8684:

The create_image.sh script is only used for generating new AMIs, and we don't generate new AMIs very often as it's a pretty expensive process to do this on all zones, regions etc. Instead you could try to add a new directory named rstudio, and in the init.sh file there you could try to upgrade the R package using yum. This would look somewhat like the ganglia file at https://github.com/mesos/spark-ec2/blob/branch-1.4/ganglia/init.sh

BTW the best way to test this is to try out the yum upgrade on a running spark-ec2 cluster and then put the commands into a script. Also you can point spark-ec2 to a custom repository with the flag --spark-ec2-git-repo https://github.com/apache/spark/blob/c6ba2ea341ad23de265d870669b25e6a41f461e5/ec2/spark_ec2.py#L206 (the default is github.com/mesos/spark-ec2). So for example you could point it to your fork kaoning/spark-ec2.

Update R version in Spark EC2 AMI
Key: SPARK-8684
URL: https://issues.apache.org/jira/browse/SPARK-8684
Project: Spark
Issue Type: Improvement
Components: EC2, SparkR
Reporter: Shivaram Venkataraman
Priority: Minor

Right now the R version in the AMI is 3.1. However, a number of R libraries need R version 3.2, and it would be good to update the R version on the AMI while launching an EC2 cluster.
[jira] [Commented] (SPARK-8699) Select command not working for SparkR built on Spark Version: 1.4.0 and R 3.2.0
[ https://issues.apache.org/jira/browse/SPARK-8699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608597#comment-14608597 ]

Shivaram Venkataraman commented on SPARK-8699:

I can't reproduce this. My guess is that this is happening because you have some other packages loaded which are overriding the select function. For example, if you replace select with SparkR::select does it work?

Select command not working for SparkR built on Spark Version: 1.4.0 and R 3.2.0
Key: SPARK-8699
URL: https://issues.apache.org/jira/browse/SPARK-8699
Project: Spark
Issue Type: Bug
Components: R
Affects Versions: 1.4.0
Environment: Windows 7, 64 bit
Reporter: Kamlesh Kumar
Priority: Critical
Labels: test

I can successfully run showDF and head on the rrdd data frame in R, but it throws an unexpected error for select commands. The R console output after running a select command on the rrdd data object is the following:

Command: head(select(df, df$eruptions))
Output:
Error in head(select(df, df$eruptions)) :
  error in evaluating the argument 'x' in selecting a method for function 'head':
  Error in UseMethod("select_") :
  no applicable method for 'select_' applied to an object of class "DataFrame"
[jira] [Commented] (SPARK-8724) Need documentation on how to deploy or use SparkR in Spark 1.4.0+
[ https://issues.apache.org/jira/browse/SPARK-8724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608573#comment-14608573 ]

Shivaram Venkataraman commented on SPARK-8724:

I'm not sure what kind of documentation we need -- could you explain more? Other than YARN cluster mode, SparkR should work in all other modes by just running bin/sparkR (for the shell) and bin/spark-submit (for batch jobs). Feel free to open a PR if you have a good idea of what would be useful.

Need documentation on how to deploy or use SparkR in Spark 1.4.0+
Key: SPARK-8724
URL: https://issues.apache.org/jira/browse/SPARK-8724
Project: Spark
Issue Type: Bug
Components: R
Affects Versions: 1.4.0
Reporter: Felix Cheung
Priority: Minor

As of now there doesn't seem to be any official documentation on how to deploy SparkR with Spark 1.4.0+. Also, cluster-manager-specific documentation (like http://spark.apache.org/docs/latest/spark-standalone.html) does not call out what mode is supported for SparkR or give details on deployment steps.
[jira] [Commented] (SPARK-8277) SparkR createDataFrame is slow
[ https://issues.apache.org/jira/browse/SPARK-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608689#comment-14608689 ]

Shivaram Venkataraman commented on SPARK-8277:

Yeah, so the bottleneck is in converting R data frames from columns to a list of rows. It would be interesting to see if we can serialize each column at a time and then somehow add them as columns to the Scala DataFrame (or do a column-to-row conversion in Scala). [~cafreeman] was looking at some related stuff at some point.

SparkR createDataFrame is slow
Key: SPARK-8277
URL: https://issues.apache.org/jira/browse/SPARK-8277
Project: Spark
Issue Type: Bug
Components: SparkR
Affects Versions: 1.4.0
Reporter: Shivaram Venkataraman

For example calling `createDataFrame` on the data from http://s3-us-west-2.amazonaws.com/sparkr-data/flights.csv takes a really long time. This is mainly because we try to convert a DataFrame to a list in order to parallelize it by rows, and the conversion from DF to list is very slow for large data frames.
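The column-to-row conversion described above is essentially a transpose: an R-style data frame is a set of equal-length columns, and parallelizing by rows means turning it into a list of row records. A minimal plain-Python sketch of that conversion (illustrative only, not SparkR's implementation):

```python
def columns_to_rows(df):
    """Transpose a dict of equal-length columns into a list of row dicts."""
    names = list(df)
    return [dict(zip(names, row)) for row in zip(*df.values())]

df = {"origin": ["PDX", "SEA"], "delay": [5, 12]}
print(columns_to_rows(df))
# [{'origin': 'PDX', 'delay': 5}, {'origin': 'SEA', 'delay': 12}]
```

Doing this row materialization in R for millions of rows is what dominates the cost, which is why serializing whole columns at a time (and transposing on the Scala side, if needed) is attractive.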
[jira] [Commented] (SPARK-7210) Test matrix decompositions for speed vs. numerical stability for Gaussians
[ https://issues.apache.org/jira/browse/SPARK-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609002#comment-14609002 ]

Shivaram Venkataraman commented on SPARK-7210:

A more stable way would probably be to do a QR decomposition and then get the SVD from it. There are a bunch of QR algorithms implemented at https://github.com/amplab/ml-matrix in case anybody wants to take a shot at this.

Test matrix decompositions for speed vs. numerical stability for Gaussians
Key: SPARK-7210
URL: https://issues.apache.org/jira/browse/SPARK-7210
Project: Spark
Issue Type: Improvement
Components: MLlib
Reporter: Joseph K. Bradley
Priority: Minor

We currently use SVD for inverting the Gaussian's covariance matrix and computing the determinant. SVD is numerically stable but slow. We could experiment with Cholesky, etc. to figure out a better option, or a better option for certain settings.
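To illustrate the Cholesky alternative mentioned in the issue: for a symmetric positive-definite covariance matrix S, the factorization S = L Lᵀ yields the determinant as the squared product of L's diagonal, much more cheaply than an SVD. A self-contained sketch (pure Python, for illustration; MLlib would use Breeze/LAPACK):

```python
import math

def cholesky(S):
    """Cholesky factor L of a symmetric positive-definite matrix S = L L^T."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)   # diagonal entry
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]  # below-diagonal entry
    return L

def det_from_cholesky(L):
    """det(S) = prod(diag(L))^2 once S = L L^T is available."""
    d = 1.0
    for i in range(len(L)):
        d *= L[i][i] ** 2
    return d

S = [[4.0, 2.0], [2.0, 3.0]]
print(det_from_cholesky(cholesky(S)))  # ~8.0 (= 4*3 - 2*2)
```

The trade-off the issue asks to measure is exactly this: Cholesky is roughly an order of magnitude cheaper than SVD but fails on near-singular covariances, where SVD degrades gracefully.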
[jira] [Commented] (SPARK-7210) Test matrix decompositions for speed vs. numerical stability for Gaussians
[ https://issues.apache.org/jira/browse/SPARK-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609052#comment-14609052 ]

Shivaram Venkataraman commented on SPARK-7210:

Sorry, I think I misunderstood the JIRA title a little bit. I was commenting on procedures for computing the SVD of a matrix. I am not really sure what the problem setting is inside the GMM.

Test matrix decompositions for speed vs. numerical stability for Gaussians
Key: SPARK-7210
URL: https://issues.apache.org/jira/browse/SPARK-7210
Project: Spark
Issue Type: Improvement
Components: MLlib
Reporter: Joseph K. Bradley
Priority: Minor

We currently use SVD for inverting the Gaussian's covariance matrix and computing the determinant. SVD is numerically stable but slow. We could experiment with Cholesky, etc. to figure out a better option, or a better option for certain settings.
[jira] [Commented] (SPARK-8684) Update R version in Spark EC2 AMI
[ https://issues.apache.org/jira/browse/SPARK-8684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609054#comment-14609054 ]

Shivaram Venkataraman commented on SPARK-8684:

Building from source might take a while, and it wouldn't be a good idea to do it by default. We could put it behind a flag (--r-version=3.2) and then only build from source if the user specifies the flag. But the yum option could be made default if we could get it to work.

Update R version in Spark EC2 AMI
Key: SPARK-8684
URL: https://issues.apache.org/jira/browse/SPARK-8684
Project: Spark
Issue Type: Improvement
Components: EC2, SparkR
Reporter: Shivaram Venkataraman
Priority: Minor

Right now the R version in the AMI is 3.1. However, a number of R libraries need R version 3.2, and it would be good to update the R version on the AMI while launching an EC2 cluster.
[jira] [Commented] (SPARK-8596) Install and configure RStudio server on Spark EC2
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609148#comment-14609148 ]

Shivaram Venkataraman commented on SPARK-8596:

I think the assumption is that the root user is running the scripts in /root/spark/bin -- no other use cases have been tested AFAIK. On the other hand, the Spark master (i.e. the service running at spark://master_host_name:7077) doesn't do any authentication as far as I know. So we should be able to submit jobs from other user accounts, but you might need to copy Spark to that user's account before running things.

Install and configure RStudio server on Spark EC2
Key: SPARK-8596
URL: https://issues.apache.org/jira/browse/SPARK-8596
Project: Spark
Issue Type: Improvement
Components: EC2, SparkR
Reporter: Shivaram Venkataraman

This will make it convenient for R users to use SparkR from their browsers.
[jira] [Updated] (SPARK-6803) [SparkR] Support SparkR Streaming
[ https://issues.apache.org/jira/browse/SPARK-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivaram Venkataraman updated SPARK-6803:
Target Version/s: (was: 1.5.0)

[SparkR] Support SparkR Streaming
Key: SPARK-6803
URL: https://issues.apache.org/jira/browse/SPARK-6803
Project: Spark
Issue Type: New Feature
Components: SparkR, Streaming
Reporter: Hao

Adds R API for Spark Streaming. An experimental version is presented in repo [1], which follows the PySpark streaming design. Also, this PR can be further broken down into sub-task issues.
[1] https://github.com/hlin09/spark/tree/SparkR-streaming/
[jira] [Commented] (SPARK-6823) Add a model.matrix like capability to DataFrames (modelDataFrame)
[ https://issues.apache.org/jira/browse/SPARK-6823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651196#comment-14651196 ]

Shivaram Venkataraman commented on SPARK-6823:

[~ekhliang] [~mengxr] Is this addressed by the StringType PR? I'm wondering if we can resolve this issue.

Add a model.matrix like capability to DataFrames (modelDataFrame)
Key: SPARK-6823
URL: https://issues.apache.org/jira/browse/SPARK-6823
Project: Spark
Issue Type: New Feature
Components: ML, SparkR
Reporter: Shivaram Venkataraman

Currently MLlib modeling tools work only with double data. However, data tables in practice often have a set of categorical fields (factors in R) that need to be converted to a set of 0/1 indicator variables (making the data actually used in a modeling algorithm completely numeric). In R, this is handled in modeling functions using the model.matrix function. Similar functionality needs to be available within Spark.
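The factor-to-indicator expansion the issue describes can be sketched in a few lines. This is a pure-Python illustration of what a model.matrix-style expansion does (the column naming is made up for the example, not Spark's or R's actual scheme):

```python
def one_hot(values):
    """Expand a categorical column into 0/1 indicator columns, one per level."""
    levels = sorted(set(values))
    return {level: [1.0 if v == level else 0.0 for v in values]
            for level in levels}

colors = ["red", "blue", "red"]
print(one_hot(colors))
# {'blue': [0.0, 1.0, 0.0], 'red': [1.0, 0.0, 1.0]}
```

After this expansion every column is numeric, which is the precondition the MLlib modeling tools require. (R's model.matrix additionally drops one level per factor to avoid collinearity with the intercept; this sketch keeps all levels for simplicity.)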
[jira] [Updated] (SPARK-6816) Add SparkConf API to configure SparkR
[ https://issues.apache.org/jira/browse/SPARK-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivaram Venkataraman updated SPARK-6816:
Target Version/s: (was: 1.5.0)

Add SparkConf API to configure SparkR
Key: SPARK-6816
URL: https://issues.apache.org/jira/browse/SPARK-6816
Project: Spark
Issue Type: New Feature
Components: SparkR
Reporter: Shivaram Venkataraman
Priority: Minor

Right now the only way to configure SparkR is to pass in arguments to sparkR.init. The goal is to add an API similar to SparkConf on Scala/Python to make configuration easier.
[jira] [Updated] (SPARK-6832) Handle partial reads in SparkR JVM to worker communication
[ https://issues.apache.org/jira/browse/SPARK-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivaram Venkataraman updated SPARK-6832:
Target Version/s: (was: 1.5.0)

Handle partial reads in SparkR JVM to worker communication
Key: SPARK-6832
URL: https://issues.apache.org/jira/browse/SPARK-6832
Project: Spark
Issue Type: Improvement
Components: SparkR
Reporter: Shivaram Venkataraman
Priority: Minor

After we move to using a socket between the R worker and the JVM, it's possible that readBin() in R will return partial results (for example, when interrupted by a signal).
[jira] [Updated] (SPARK-6821) Refactor SerDe API in SparkR to be more developer friendly
[ https://issues.apache.org/jira/browse/SPARK-6821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivaram Venkataraman updated SPARK-6821:
Target Version/s: (was: 1.5.0)

Refactor SerDe API in SparkR to be more developer friendly
Key: SPARK-6821
URL: https://issues.apache.org/jira/browse/SPARK-6821
Project: Spark
Issue Type: Improvement
Components: SparkR
Reporter: Shivaram Venkataraman

The existing SerDe API we use in the SparkR JVM backend is limited and not very easy to use. We should refactor it to make it use more of Scala's type system and also allow extensions for user-defined S3 or S4 types in R.
[jira] [Commented] (SPARK-9319) Add support for setting column names, types
[ https://issues.apache.org/jira/browse/SPARK-9319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651193#comment-14651193 ]

Shivaram Venkataraman commented on SPARK-9319:

[~falaki] I believe we already added support for setting column names with `names(data) <- c("Date")`? Should we also just make `colnames` a synonym for `names`?

Add support for setting column names, types
Key: SPARK-9319
URL: https://issues.apache.org/jira/browse/SPARK-9319
Project: Spark
Issue Type: Sub-task
Components: SparkR
Reporter: Shivaram Venkataraman

This will help us support functions of the form
{code}
colnames(data) <- c("Date", "Arrival_Delay")
coltypes(data) <- c("numeric", "logical", "character")
{code}
[jira] [Commented] (SPARK-6798) Fix Date serialization in SparkR
[ https://issues.apache.org/jira/browse/SPARK-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651199#comment-14651199 ]

Shivaram Venkataraman commented on SPARK-6798:

[~davies] Do you remember if this is an actual bug or just a clunky implementation detail? I'm thinking of changing the type of this JIRA to `Improvement` and unsetting its target version. Let me know if this sounds good to you.

Fix Date serialization in SparkR
Key: SPARK-6798
URL: https://issues.apache.org/jira/browse/SPARK-6798
Project: Spark
Issue Type: Bug
Components: SparkR
Reporter: Shivaram Venkataraman
Assignee: Davies Liu
Priority: Minor

SparkR's date serialization right now sends strings from R to the JVM. We should convert this to integers and also account for timezones correctly by using DateUtils.
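The integer-based serialization suggested above means shipping a Date as days since the Unix epoch instead of a formatted string. A small sketch of the round trip using Python's standard library (illustrative; SparkR/Spark SQL would do the equivalent in R and Scala):

```python
from datetime import date

EPOCH = date(1970, 1, 1)

def serialize_date(d):
    """Encode a calendar date as an integer: days since 1970-01-01."""
    return (d - EPOCH).days

def deserialize_date(n):
    """Recover the calendar date from the day count."""
    return date.fromordinal(EPOCH.toordinal() + n)

n = serialize_date(date(2015, 8, 18))
print(n)                    # 16665
print(deserialize_date(n))  # 2015-08-18
```

Because the encoding is a calendar-day count rather than a timestamp, it is timezone-free by construction, which is what makes it safer than string parsing on the JVM side.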
[jira] [Updated] (SPARK-6809) Make numPartitions optional in pairRDD APIs
[ https://issues.apache.org/jira/browse/SPARK-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivaram Venkataraman updated SPARK-6809:
Target Version/s: (was: 1.5.0)

Make numPartitions optional in pairRDD APIs
Key: SPARK-6809
URL: https://issues.apache.org/jira/browse/SPARK-6809
Project: Spark
Issue Type: Improvement
Components: SparkR
Reporter: Davies Liu
[jira] [Updated] (SPARK-6815) Support accumulators in R
[ https://issues.apache.org/jira/browse/SPARK-6815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivaram Venkataraman updated SPARK-6815:
Target Version/s: (was: 1.5.0)

Support accumulators in R
Key: SPARK-6815
URL: https://issues.apache.org/jira/browse/SPARK-6815
Project: Spark
Issue Type: New Feature
Components: SparkR
Reporter: Shivaram Venkataraman
Priority: Minor

SparkR doesn't support accumulators right now. It might be good to add support for this to get feature parity with PySpark.
[jira] [Updated] (SPARK-6838) Explore using Reference Classes instead of S4 objects
[ https://issues.apache.org/jira/browse/SPARK-6838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivaram Venkataraman updated SPARK-6838:
Target Version/s: (was: 1.5.0)

Explore using Reference Classes instead of S4 objects
Key: SPARK-6838
URL: https://issues.apache.org/jira/browse/SPARK-6838
Project: Spark
Issue Type: Improvement
Components: SparkR
Reporter: Shivaram Venkataraman
Priority: Minor

The current RDD and PipelinedRDD are represented as S4 objects. R has a newer OO system: Reference Classes (RC or R5). It is a more message-passing style of OO, and instances are mutable objects. It is not an important issue, but it should also require only trivial work. It could also remove the kind-of awkward @ operator in S4.

R6 is also worth checking out; it feels closer to an ordinary object-oriented language. https://github.com/wch/R6
[jira] [Updated] (SPARK-8082) Functionality to Reset DF Schemas/Cast Multiple Columns
[ https://issues.apache.org/jira/browse/SPARK-8082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivaram Venkataraman updated SPARK-8082:
Target Version/s: (was: 1.5.0)

Functionality to Reset DF Schemas/Cast Multiple Columns
Key: SPARK-8082
URL: https://issues.apache.org/jira/browse/SPARK-8082
Project: Spark
Issue Type: New Feature
Components: SparkR
Reporter: Aleksander Eskilson
Priority: Minor

Currently only one column can be cast at a time with the cast() function. Either a cast that takes multiple arguments and/or a function allowing the DF schema to be reset would cut down on the code needed to recast a DF in some cases.
[jira] [Updated] (SPARK-6810) Performance benchmarks for SparkR
[ https://issues.apache.org/jira/browse/SPARK-6810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivaram Venkataraman updated SPARK-6810:
Target Version/s: (was: 1.5.0)

Performance benchmarks for SparkR
Key: SPARK-6810
URL: https://issues.apache.org/jira/browse/SPARK-6810
Project: Spark
Issue Type: New Feature
Components: SparkR
Reporter: Shivaram Venkataraman
Priority: Critical

We should port some performance benchmarks from spark-perf to SparkR for tracking performance regressions / improvements. https://github.com/databricks/spark-perf/tree/master/pyspark-tests has a list of PySpark performance benchmarks.
[jira] [Updated] (SPARK-6825) Data sources implementation to support `sequenceFile`
[ https://issues.apache.org/jira/browse/SPARK-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivaram Venkataraman updated SPARK-6825:
Target Version/s: (was: 1.5.0)

Data sources implementation to support `sequenceFile`
Key: SPARK-6825
URL: https://issues.apache.org/jira/browse/SPARK-6825
Project: Spark
Issue Type: New Feature
Components: SparkR, SQL
Reporter: Shivaram Venkataraman

SequenceFiles are a widely used input format, and right now they are not supported in SparkR. It would be good to add support for SequenceFiles by implementing a new data source that can create a DataFrame from a SequenceFile. However, as SequenceFiles can have arbitrary types, we probably need to map them to user-defined types in SQL.
[jira] [Commented] (SPARK-6831) Document how to use external data sources
[ https://issues.apache.org/jira/browse/SPARK-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651198#comment-14651198 ]

Shivaram Venkataraman commented on SPARK-6831:

[~yhuai] Is this something we plan to do for 1.5? If not, we can unset the target version for this JIRA.

Document how to use external data sources
Key: SPARK-6831
URL: https://issues.apache.org/jira/browse/SPARK-6831
Project: Spark
Issue Type: Improvement
Components: Documentation, PySpark, SparkR, SQL
Reporter: Shivaram Venkataraman
Priority: Critical

We should include some instructions on how to use an external data source for users who are beginners. Do they need to install it on all the machines, or just the master? Are there any special flags they need to pass to `bin/spark-submit`, etc.?
[jira] [Updated] (SPARK-8684) Update R version in Spark EC2 AMI
[ https://issues.apache.org/jira/browse/SPARK-8684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivaram Venkataraman updated SPARK-8684:
Target Version/s: (was: 1.5.0)

Update R version in Spark EC2 AMI
Key: SPARK-8684
URL: https://issues.apache.org/jira/browse/SPARK-8684
Project: Spark
Issue Type: Improvement
Components: EC2, SparkR
Reporter: Shivaram Venkataraman
Priority: Minor
Fix For: 1.5.0

Right now the R version in the AMI is 3.1. However, a number of R libraries need R version 3.2, and it would be good to update the R version on the AMI while launching an EC2 cluster.
[jira] [Updated] (SPARK-9443) Expose sampleByKey in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-9443: - Summary: Expose sampleByKey in SparkR (was: Explose sampleByKey in SparkR) Expose sampleByKey in SparkR Key: SPARK-9443 URL: https://issues.apache.org/jira/browse/SPARK-9443 Project: Spark Issue Type: Bug Components: SparkR Affects Versions: 1.5.0 Reporter: Hossein Falaki There is a pull request for DataFrames (I believe close to merging) that adds sampleByKey. It would be great to expose it in SparkR. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9053) Fix spaces around parens, infix operators etc.
[ https://issues.apache.org/jira/browse/SPARK-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-9053. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7584 [https://github.com/apache/spark/pull/7584] Fix spaces around parens, infix operators etc. -- Key: SPARK-9053 URL: https://issues.apache.org/jira/browse/SPARK-9053 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Shivaram Venkataraman Fix For: 1.5.0 We have a number of style errors which look like {code} Place a space before left parenthesis ... Put spaces around all infix operators. {code} However, some of the warnings are spurious (for example, the space around the infix operator in {code} expect_equal(collect(select(df, hypot(df$a, df$b)))[4, HYPOT(a, b)], sqrt(4^2 + 8^2)) {code}). We should add an ignore rule for these spurious examples. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
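For reference, the two lint rules mentioned in the issue flag code shaped like the following. This is a minimal R sketch of the rule behavior; the exact lintr configuration used by `dev/lint-r` is an assumption.

```r
# Flagged: "Place a space before left parenthesis" in control constructs
if(x > 0) print(x)
# Compliant
if (x > 0) print(x)

# Flagged: "Put spaces around all infix operators"
y <- 4+8
# Compliant
y <- 4 + 8

# The spurious case: `^` inside a test expectation is flagged even though
# R convention keeps exponentiation unspaced, e.g. sqrt(4^2 + 8^2)
```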
[jira] [Updated] (SPARK-9053) Fix spaces around parens, infix operators etc.
[ https://issues.apache.org/jira/browse/SPARK-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-9053: - Assignee: Yu Ishikawa Fix spaces around parens, infix operators etc. -- Key: SPARK-9053 URL: https://issues.apache.org/jira/browse/SPARK-9053 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Shivaram Venkataraman Assignee: Yu Ishikawa Fix For: 1.5.0 We have a number of style errors which look like {code} Place a space before left parenthesis ... Put spaces around all infix operators. {code} However, some of the warnings are spurious (for example, the space around the infix operator in {code} expect_equal(collect(select(df, hypot(df$a, df$b)))[4, HYPOT(a, b)], sqrt(4^2 + 8^2)) {code}). We should add an ignore rule for these spurious examples. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9510) Fix remaining SparkR style violations
Shivaram Venkataraman created SPARK-9510: Summary: Fix remaining SparkR style violations Key: SPARK-9510 URL: https://issues.apache.org/jira/browse/SPARK-9510 Project: Spark Issue Type: Sub-task Reporter: Shivaram Venkataraman lint-r should report no errors / warnings before we can turn it on in Jenkins. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8742) Improve SparkR error messages for DataFrame API
[ https://issues.apache.org/jira/browse/SPARK-8742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-8742: - Assignee: Hossein Falaki Improve SparkR error messages for DataFrame API --- Key: SPARK-8742 URL: https://issues.apache.org/jira/browse/SPARK-8742 Project: Spark Issue Type: Bug Components: SparkR Affects Versions: 1.4.1 Reporter: Hossein Falaki Assignee: Hossein Falaki Priority: Blocker Fix For: 1.5.0 Currently all DataFrame API errors result in the following generic error: {code} Error: returnStatus == 0 is not TRUE {code} This is because invokeJava in backend.R does not inspect error messages. For most use cases it is critical to return better error messages. Initially, we can return the stack trace from the JVM. In the future we can inspect the errors and translate them into human-readable error messages. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-8742) Improve SparkR error messages for DataFrame API
[ https://issues.apache.org/jira/browse/SPARK-8742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-8742. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7742 [https://github.com/apache/spark/pull/7742] Improve SparkR error messages for DataFrame API --- Key: SPARK-8742 URL: https://issues.apache.org/jira/browse/SPARK-8742 Project: Spark Issue Type: Bug Components: SparkR Affects Versions: 1.4.1 Reporter: Hossein Falaki Priority: Blocker Fix For: 1.5.0 Currently all DataFrame API errors result in the following generic error: {code} Error: returnStatus == 0 is not TRUE {code} This is because invokeJava in backend.R does not inspect error messages. For most use cases it is critical to return better error messages. Initially, we can return the stack trace from the JVM. In the future we can inspect the errors and translate them into human-readable error messages. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
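The fix described in the issue amounts to reading an error payload when the status code is non-zero, rather than asserting on the code alone. A rough sketch of the shape of that change; the helper names (`readInt`, `readString`, `readObject`, `conn`) are assumptions modeled on backend.R, and the merged patch in pull request 7742 may differ:

```r
invokeJava <- function(isStatic, objId, methodName, ...) {
  # ... existing request serialization and write to the JVM backend ...
  returnStatus <- readInt(conn)
  if (returnStatus != 0) {
    # Instead of the generic "returnStatus == 0 is not TRUE" assertion
    # failure, read the message sent back by the backend (e.g. the JVM
    # stack trace) and raise it as the R error.
    stop(readString(conn))
  }
  readObject(conn)
}
```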
[jira] [Assigned] (SPARK-9510) Fix remaining SparkR style violations
[ https://issues.apache.org/jira/browse/SPARK-9510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman reassigned SPARK-9510: Assignee: Shivaram Venkataraman Fix remaining SparkR style violations - Key: SPARK-9510 URL: https://issues.apache.org/jira/browse/SPARK-9510 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Shivaram Venkataraman Assignee: Shivaram Venkataraman Fix For: 1.5.0 lint-r should report no errors / warnings before we can turn it on in Jenkins. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9324) Add `unique` as a synonym for `distinct`
[ https://issues.apache.org/jira/browse/SPARK-9324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-9324: - Assignee: Hossein Falaki Add `unique` as a synonym for `distinct` Key: SPARK-9324 URL: https://issues.apache.org/jira/browse/SPARK-9324 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Shivaram Venkataraman Assignee: Hossein Falaki Fix For: 1.5.0 In R unique returns a new data.frame with duplicate rows removed. cc [~rxin] is there some different meaning for `unique` in Spark ? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9510) Fix remaining SparkR style violations
[ https://issues.apache.org/jira/browse/SPARK-9510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-9510. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7834 [https://github.com/apache/spark/pull/7834] Fix remaining SparkR style violations - Key: SPARK-9510 URL: https://issues.apache.org/jira/browse/SPARK-9510 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Shivaram Venkataraman Fix For: 1.5.0 lint-r should report no errors / warnings before we can turn it on in Jenkins. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9322) Add rbind as a synonym for `unionAll`
[ https://issues.apache.org/jira/browse/SPARK-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-9322: - Assignee: Hossein Falaki Add rbind as a synonym for `unionAll` - Key: SPARK-9322 URL: https://issues.apache.org/jira/browse/SPARK-9322 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Shivaram Venkataraman Assignee: Hossein Falaki Fix For: 1.5.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9321) Add nrow, ncol, dim for SparkR data frames
[ https://issues.apache.org/jira/browse/SPARK-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-9321. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7764 [https://github.com/apache/spark/pull/7764] Add nrow, ncol, dim for SparkR data frames -- Key: SPARK-9321 URL: https://issues.apache.org/jira/browse/SPARK-9321 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Shivaram Venkataraman Fix For: 1.5.0 `nrow` will be a synonym for `count` and `ncol` can be implemented using `columns()` or `dtypes` -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9321) Add nrow, ncol, dim for SparkR data frames
[ https://issues.apache.org/jira/browse/SPARK-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-9321: - Assignee: Hossein Falaki Add nrow, ncol, dim for SparkR data frames -- Key: SPARK-9321 URL: https://issues.apache.org/jira/browse/SPARK-9321 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Shivaram Venkataraman Assignee: Hossein Falaki Fix For: 1.5.0 `nrow` will be a synonym for `count` and `ncol` can be implemented using `columns()` or `dtypes` -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
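The mapping sketched in the issue description is direct: `nrow` delegates to `count` (a Spark job), while `ncol` only needs the schema. A hedged R sketch of how these methods could be defined for a SparkR DataFrame; the implementation merged in pull request 7764 may differ in detail:

```r
# nrow as a synonym for count() -- this triggers a Spark job
setMethod("nrow", signature(x = "DataFrame"), function(x) {
  count(x)
})

# ncol needs only the column list, so no job is run
setMethod("ncol", signature(x = "DataFrame"), function(x) {
  length(columns(x))
})

# dim combines the two, mirroring base R's dim() on a data.frame
setMethod("dim", signature(x = "DataFrame"), function(x) {
  c(count(x), length(columns(x)))
})
```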
[jira] [Resolved] (SPARK-9322) Add rbind as a synonym for `unionAll`
[ https://issues.apache.org/jira/browse/SPARK-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-9322. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7764 [https://github.com/apache/spark/pull/7764] Add rbind as a synonym for `unionAll` - Key: SPARK-9322 URL: https://issues.apache.org/jira/browse/SPARK-9322 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Shivaram Venkataraman Fix For: 1.5.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9324) Add `unique` as a synonym for `distinct`
[ https://issues.apache.org/jira/browse/SPARK-9324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-9324. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7764 [https://github.com/apache/spark/pull/7764] Add `unique` as a synonym for `distinct` Key: SPARK-9324 URL: https://issues.apache.org/jira/browse/SPARK-9324 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Shivaram Venkataraman Fix For: 1.5.0 In R unique returns a new data.frame with duplicate rows removed. cc [~rxin] is there some different meaning for `unique` in Spark ? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9562) Move spark-ec2 from mesos to amplab
Shivaram Venkataraman created SPARK-9562: Summary: Move spark-ec2 from mesos to amplab Key: SPARK-9562 URL: https://issues.apache.org/jira/browse/SPARK-9562 Project: Spark Issue Type: Improvement Components: EC2 Reporter: Shivaram Venkataraman See http://apache-spark-developers-list.1001551.n3.nabble.com/Re-Should-spark-ec2-get-its-own-repo-td13151.html for more details -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9248) Closing curly-braces should always be on their own line
[ https://issues.apache.org/jira/browse/SPARK-9248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-9248: - Assignee: Yu Ishikawa Closing curly-braces should always be on their own line --- Key: SPARK-9248 URL: https://issues.apache.org/jira/browse/SPARK-9248 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Assignee: Yu Ishikawa Priority: Minor Fix For: 1.5.0 Closing curly-braces should always be on their own line For example, {noformat} inst/tests/test_sparkSQL.R:606:3: style: Closing curly-braces should always be on their own line, unless it's followed by an else. }, error = function(err) { ^ {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
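The lint message quoted above refers to `tryCatch` handlers where a closing brace shares a line with the next argument. A minimal R illustration of the flagged shape and one compliant rewrite (the exact style adopted by the fix is an assumption):

```r
# Flagged: the closing brace is followed by ", error = ..." on the same line
tryCatch({
  read.df(sqlContext, "no-such-path")
}, error = function(err) {
  print(err)
})

# Compliant: every closing curly-brace stands on its own line
# (unless it is followed by an else)
tryCatch(
  {
    read.df(sqlContext, "no-such-path")
  },
  error = function(err) {
    print(err)
  }
)
```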
[jira] [Resolved] (SPARK-9248) Closing curly-braces should always be on their own line
[ https://issues.apache.org/jira/browse/SPARK-9248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-9248. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7795 [https://github.com/apache/spark/pull/7795] Closing curly-braces should always be on their own line --- Key: SPARK-9248 URL: https://issues.apache.org/jira/browse/SPARK-9248 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Priority: Minor Fix For: 1.5.0 Closing curly-braces should always be on their own line For example, {noformat} inst/tests/test_sparkSQL.R:606:3: style: Closing curly-braces should always be on their own line, unless it's followed by an else. }, error = function(err) { ^ {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9437) SizeEstimator overflows for primitive arrays
[ https://issues.apache.org/jira/browse/SPARK-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648025#comment-14648025 ] Shivaram Venkataraman commented on SPARK-9437: -- Resolved by https://github.com/apache/spark/pull/7750 SizeEstimator overflows for primitive arrays Key: SPARK-9437 URL: https://issues.apache.org/jira/browse/SPARK-9437 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.4.1 Reporter: Imran Rashid Assignee: Imran Rashid Priority: Minor Fix For: 1.5.0 {{SizeEstimator}} can overflow when dealing with large primitive arrays, e.g. if you have an {{Array[Double]}} of size 1 << 28. This means that when you try to broadcast a large primitive array, you get: {noformat} java.lang.IllegalArgumentException: requirement failed: sizeInBytes was negative: -2147483608 at scala.Predef$.require(Predef.scala:233) at org.apache.spark.storage.BlockInfo.markReady(BlockInfo.scala:55) at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:815) at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:638) ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9437) SizeEstimator overflows for primitive arrays
[ https://issues.apache.org/jira/browse/SPARK-9437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-9437. -- Resolution: Fixed Fix Version/s: 1.5.0 SizeEstimator overflows for primitive arrays Key: SPARK-9437 URL: https://issues.apache.org/jira/browse/SPARK-9437 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.4.1 Reporter: Imran Rashid Assignee: Imran Rashid Priority: Minor Fix For: 1.5.0 {{SizeEstimator}} can overflow when dealing with large primitive arrays, e.g. if you have an {{Array[Double]}} of size 1 << 28. This means that when you try to broadcast a large primitive array, you get: {noformat} java.lang.IllegalArgumentException: requirement failed: sizeInBytes was negative: -2147483608 at scala.Predef$.require(Predef.scala:233) at org.apache.spark.storage.BlockInfo.markReady(BlockInfo.scala:55) at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:815) at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:638) ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
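The arithmetic behind the overflow: 2^28 doubles occupy 2^31 bytes of payload, which exceeds a signed 32-bit integer, so a size accumulated in a JVM Int wraps negative. A quick check of the numbers, written in R for consistency with the rest of this digest:

```r
n_elements <- 2^28       # size of the Array[Double], i.e. 1 << 28
bytes <- n_elements * 8  # 2^31 = 2147483648 bytes of payload
int_max <- 2^31 - 1      # Integer.MAX_VALUE on the JVM
bytes > int_max          # TRUE: the payload alone overflows an Int

# Payload plus a small object overhead (40 bytes here), interpreted as a
# signed 32-bit value, wraps around to exactly the figure in the error:
(2^31 + 40) - 2^32       # -2147483608, matching "sizeInBytes was negative"
```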
[jira] [Commented] (SPARK-8724) Need documentation on how to deploy or use SparkR in Spark 1.4.0+
[ https://issues.apache.org/jira/browse/SPARK-8724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682219#comment-14682219 ] Shivaram Venkataraman commented on SPARK-8724: -- [~cantdutchthis] One thing we could do is add a section at the bottom of http://spark.apache.org/docs/latest/sparkr.html titled `Deploying SparkR` or `Where to go from here` and a short description of how to launch EC2 clusters with RStudio (in 1.5) and also link to the RStudio blog post. Need documentation on how to deploy or use SparkR in Spark 1.4.0+ - Key: SPARK-8724 URL: https://issues.apache.org/jira/browse/SPARK-8724 Project: Spark Issue Type: Bug Components: R Affects Versions: 1.4.0 Reporter: Felix Cheung Priority: Minor As of now there doesn't seem to be any official documentation on how to deploy SparkR with Spark 1.4.0+ Also, cluster manager specific documentation (like http://spark.apache.org/docs/latest/spark-standalone.html) does not call out what mode is supported for SparkR and details on deployment steps. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9562) Move spark-ec2 from mesos to amplab
[ https://issues.apache.org/jira/browse/SPARK-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman reassigned SPARK-9562: Assignee: Shivaram Venkataraman Move spark-ec2 from mesos to amplab --- Key: SPARK-9562 URL: https://issues.apache.org/jira/browse/SPARK-9562 Project: Spark Issue Type: Improvement Components: EC2 Reporter: Shivaram Venkataraman Assignee: Shivaram Venkataraman Fix For: 1.5.0 See http://apache-spark-developers-list.1001551.n3.nabble.com/Re-Should-spark-ec2-get-its-own-repo-td13151.html for more details -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9562) Move spark-ec2 from mesos to amplab
[ https://issues.apache.org/jira/browse/SPARK-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-9562. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7899 [https://github.com/apache/spark/pull/7899] Move spark-ec2 from mesos to amplab --- Key: SPARK-9562 URL: https://issues.apache.org/jira/browse/SPARK-9562 Project: Spark Issue Type: Improvement Components: EC2 Reporter: Shivaram Venkataraman Fix For: 1.5.0 See http://apache-spark-developers-list.1001551.n3.nabble.com/Re-Should-spark-ec2-get-its-own-repo-td13151.html for more details -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9603) Re-enable complex R package test in SparkSubmitSuite
[ https://issues.apache.org/jira/browse/SPARK-9603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-9603: - Component/s: SparkR Re-enable complex R package test in SparkSubmitSuite Key: SPARK-9603 URL: https://issues.apache.org/jira/browse/SPARK-9603 Project: Spark Issue Type: Test Components: Deploy, SparkR, Tests Affects Versions: 1.5.0 Reporter: Burak Yavuz For building complex Spark Packages that contain R code in addition to Scala, we have a complex procedure, where R source code is shipped inside a jar. The source code is extracted, built, and is added as a library among SparkR. The end to end test in SparkSubmitSuite (correctly builds R packages included in a jar with --packages) can't run on Jenkins now, because the pull request builder is not built with SparkR. Once the PR Builder is built with SparkR, we should re-enable the test. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9605) SparkR installation error: Error: Invalid or corrupt jarfile sbt/sbt-launch-0.13.6.jar
[ https://issues.apache.org/jira/browse/SPARK-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14654073#comment-14654073 ] Shivaram Venkataraman commented on SPARK-9605: -- The amplab version of SparkR is no longer supported and the SparkR project has become a part of the Apache Spark project. Please follow instructions to download and run Spark ( 1.4) at http://spark.apache.org/docs/latest/#downloading SparkR installation error: Error: Invalid or corrupt jarfile sbt/sbt-launch-0.13.6.jar -- Key: SPARK-9605 URL: https://issues.apache.org/jira/browse/SPARK-9605 Project: Spark Issue Type: Bug Environment: R version 3.1.2 (2014-10-31) Platform: x86_64-apple-darwin13.4.0 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] devtools_1.6.1 rJava_0.9-7 loaded via a namespace (and not attached): [1] bitops_1.0-6 httr_0.5 magrittr_1.5 RCurl_1.95-4.5 stringi_0.5-5 stringr_1.0.0 tools_3.1.2 Reporter: Selcuk Korkmaz I am fairly new to Spark! I am trying to install SparkR package. But I am getting following error: library(devtools) install_github(amplab-extras/SparkR-pkg, subdir=pkg) Downloading github repo amplab-extras/SparkR-pkg@master Installing SparkR Installing dependencies for SparkR: '/Library/Frameworks/R.framework/Resources/bin/R' --vanilla CMD INSTALL \ '/private/var/folders/x_/y8_3xqc130n1q55fwwkmgm00gn/T/RtmpRH9vkn/devtools1ec166a2c628/amplab-extras-SparkR-pkg-e532627/pkg' \ --library='/Library/Frameworks/R.framework/Versions/3.1/Resources/library' --install-tests installing source package 'SparkR' ... libs arch - ./sbt/sbt assembly Attempting to fetch sbt 'SparkR' removing '/Library/Frameworks/R.framework/Versions/3.1/Resources/library/SparkR' Error: Command failed (1) I have installed scala-2.11.7 with following approach. 
$ brew update $ brew install scala $ brew install sbt I could not install scala-2.10. Is this the part of the problem. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9605) SparkR installation error: Error: Invalid or corrupt jarfile sbt/sbt-launch-0.13.6.jar
[ https://issues.apache.org/jira/browse/SPARK-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14654149#comment-14654149 ] Shivaram Venkataraman commented on SPARK-9605: -- You don't need to install SparkR package in R. You can download Spark 1.4.1 from http://spark.apache.org/downloads.html, unzip it and then run ./bin/sparkR. BTW this is a more appropriate question for the Spark user mailing list (http://spark.apache.org/community.html) and not for the JIRA (which is used for bug reports, development tracking etc.) SparkR installation error: Error: Invalid or corrupt jarfile sbt/sbt-launch-0.13.6.jar -- Key: SPARK-9605 URL: https://issues.apache.org/jira/browse/SPARK-9605 Project: Spark Issue Type: Bug Environment: R version 3.1.2 (2014-10-31) Platform: x86_64-apple-darwin13.4.0 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] devtools_1.6.1 rJava_0.9-7 loaded via a namespace (and not attached): [1] bitops_1.0-6 httr_0.5 magrittr_1.5 RCurl_1.95-4.5 stringi_0.5-5 stringr_1.0.0 tools_3.1.2 Reporter: Selcuk Korkmaz I am fairly new to Spark! I am trying to install SparkR package. But I am getting following error: library(devtools) install_github(amplab-extras/SparkR-pkg, subdir=pkg) Downloading github repo amplab-extras/SparkR-pkg@master Installing SparkR Installing dependencies for SparkR: '/Library/Frameworks/R.framework/Resources/bin/R' --vanilla CMD INSTALL \ '/private/var/folders/x_/y8_3xqc130n1q55fwwkmgm00gn/T/RtmpRH9vkn/devtools1ec15ed080d/amplab-extras-SparkR-pkg-e532627/pkg' \ --library='/Library/Frameworks/R.framework/Versions/3.1/Resources/library' --install-tests * installing *source* package ‘SparkR’ ... 
** libs ** arch - ./sbt/sbt assembly Attempting to fetch sbt Launching sbt from sbt/sbt-launch-0.13.6.jar Error: Invalid or corrupt jarfile sbt/sbt-launch-0.13.6.jar make: *** [target/scala-2.10/sparkr-assembly-0.1.jar] Error 1 ERROR: compilation failed for package ‘SparkR’ * removing ‘/Library/Frameworks/R.framework/Versions/3.1/Resources/library/SparkR’ Error: Command failed (1) I have installed scala-2.11.7 with following approach. $ brew update $ brew install scala $ brew install sbt I could not install scala-2.10. Is this the part of the problem. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9605) SparkR installation error: Error: Invalid or corrupt jarfile sbt/sbt-launch-0.13.6.jar
[ https://issues.apache.org/jira/browse/SPARK-9605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-9605. -- Resolution: Not A Problem SparkR installation error: Error: Invalid or corrupt jarfile sbt/sbt-launch-0.13.6.jar -- Key: SPARK-9605 URL: https://issues.apache.org/jira/browse/SPARK-9605 Project: Spark Issue Type: Bug Environment: R version 3.1.2 (2014-10-31) Platform: x86_64-apple-darwin13.4.0 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] devtools_1.6.1 rJava_0.9-7 loaded via a namespace (and not attached): [1] bitops_1.0-6 httr_0.5 magrittr_1.5 RCurl_1.95-4.5 stringi_0.5-5 stringr_1.0.0 tools_3.1.2 Reporter: Selcuk Korkmaz I am fairly new to Spark! I am trying to install SparkR package. But I am getting following error: library(devtools) install_github(amplab-extras/SparkR-pkg, subdir=pkg) Downloading github repo amplab-extras/SparkR-pkg@master Installing SparkR Installing dependencies for SparkR: '/Library/Frameworks/R.framework/Resources/bin/R' --vanilla CMD INSTALL \ '/private/var/folders/x_/y8_3xqc130n1q55fwwkmgm00gn/T/RtmpRH9vkn/devtools1ec15ed080d/amplab-extras-SparkR-pkg-e532627/pkg' \ --library='/Library/Frameworks/R.framework/Versions/3.1/Resources/library' --install-tests * installing *source* package ‘SparkR’ ... ** libs ** arch - ./sbt/sbt assembly Attempting to fetch sbt Launching sbt from sbt/sbt-launch-0.13.6.jar Error: Invalid or corrupt jarfile sbt/sbt-launch-0.13.6.jar make: *** [target/scala-2.10/sparkr-assembly-0.1.jar] Error 1 ERROR: compilation failed for package ‘SparkR’ * removing ‘/Library/Frameworks/R.framework/Versions/3.1/Resources/library/SparkR’ Error: Command failed (1) I have installed scala-2.11.7 with following approach. $ brew update $ brew install scala $ brew install sbt I could not install scala-2.10. 
Is this the part of the problem. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9972) Add `struct` function in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697232#comment-14697232 ] Shivaram Venkataraman commented on SPARK-9972: -- Yeah this can be marked as being blocked by https://issues.apache.org/jira/browse/SPARK-6819 Add `struct` function in SparkR --- Key: SPARK-9972 URL: https://issues.apache.org/jira/browse/SPARK-9972 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Support {{struct}} function on a DataFrame in SparkR. However, I think we need to improve {{collect}} function in SparkR in order to implement {{struct}} function. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7420) Flaky test: o.a.s.streaming.JobGeneratorSuite Do not clear received block data too soon
[ https://issues.apache.org/jira/browse/SPARK-7420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-7420: - Labels: flaky-test (was: ) Flaky test: o.a.s.streaming.JobGeneratorSuite Do not clear received block data too soon - Key: SPARK-7420 URL: https://issues.apache.org/jira/browse/SPARK-7420 Project: Spark Issue Type: Bug Components: Tests Affects Versions: 1.3.1, 1.4.0 Reporter: Andrew Or Assignee: Tathagata Das Priority: Critical Labels: flaky-test {code} The code passed to eventually never returned normally. Attempted 18 times over 10.13803606001 seconds. Last failure message: receiverTracker.hasUnallocatedBlocks was false. {code} It seems to be failing only in maven. https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.3-Maven-pre-YARN/hadoop.version=2.0.0-mr1-cdh4.1.2,label=centos/458/ https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.3-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/459/ https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/hadoop.version=1.0.4,label=centos/2173/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-9710) RPackageUtilsSuite fails if R is not installed
[ https://issues.apache.org/jira/browse/SPARK-9710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-9710. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 8008 [https://github.com/apache/spark/pull/8008] RPackageUtilsSuite fails if R is not installed -- Key: SPARK-9710 URL: https://issues.apache.org/jira/browse/SPARK-9710 Project: Spark Issue Type: Bug Components: Tests Affects Versions: 1.5.0 Reporter: Marcelo Vanzin Fix For: 1.5.0 That's because there's a bug in RUtils.scala. PR soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9710) RPackageUtilsSuite fails if R is not installed
[ https://issues.apache.org/jira/browse/SPARK-9710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-9710: - Assignee: Marcelo Vanzin RPackageUtilsSuite fails if R is not installed -- Key: SPARK-9710 URL: https://issues.apache.org/jira/browse/SPARK-9710 Project: Spark Issue Type: Bug Components: Tests Affects Versions: 1.5.0 Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Fix For: 1.5.0 That's because there's a bug in RUtils.scala. PR soon.
[jira] [Commented] (SPARK-9972) Add `struct`, `encode` and `decode` function in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698012#comment-14698012 ] Shivaram Venkataraman commented on SPARK-9972: -- [~yuu.ishik...@gmail.com] Why does `sort_array` need nested types ? The sorting is only going to happen in the Java side and the return type is only a Column ? Add `struct`, `encode` and `decode` function in SparkR -- Key: SPARK-9972 URL: https://issues.apache.org/jira/browse/SPARK-9972 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Support {{struct}} function on a DataFrame in SparkR. However, I think we need to improve {{collect}} function in SparkR in order to implement {{struct}} function. - struct - encode - decode - array_contains - sort_array
[jira] [Commented] (SPARK-9427) Add expression functions in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692662#comment-14692662 ] Shivaram Venkataraman commented on SPARK-9427: -- [~yuu.ishik...@gmail.com] Breaking it into 3 PRs sounds good to me. Do you have an idea of how many functions there are of each type ? Add expression functions in SparkR -- Key: SPARK-9427 URL: https://issues.apache.org/jira/browse/SPARK-9427 Project: Spark Issue Type: New Feature Components: SparkR Reporter: Yu Ishikawa The list of functions to add is based on SQL's functions. And it would be better to add them in one shot PR. https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
[jira] [Commented] (SPARK-9865) Flaky SparkR test: test_sparkSQL.R: sample on a DataFrame
[ https://issues.apache.org/jira/browse/SPARK-9865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693814#comment-14693814 ] Shivaram Venkataraman commented on SPARK-9865: -- So we sample 10% in a DataFrame with 3 rows and expect to get less than 3 rows. I guess there is a very small chance that you still get back 3 rows. One fix for this might be to just sample 1% ? [~davies] Do you have any other fix in mind ? Flaky SparkR test: test_sparkSQL.R: sample on a DataFrame - Key: SPARK-9865 URL: https://issues.apache.org/jira/browse/SPARK-9865 Project: Spark Issue Type: Bug Components: SparkR Reporter: Davies Liu 1. Failure (at test_sparkSQL.R#525): sample on a DataFrame - count(sampled3) < 3 isn't true Error: Test failures Execution halted https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1468/console
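The failure odds discussed above can be sketched numerically. Assuming the sampler draws each row independently with probability equal to the fraction (a per-row Bernoulli model, not a claim about Spark's exact sampler implementation), the chance that all n rows survive is fraction^n:

```python
def p_all_rows_kept(n_rows: int, fraction: float) -> float:
    """Probability that an independent per-row Bernoulli sample with the
    given fraction keeps every one of n_rows rows -- the flaky case where
    the assertion count(sampled) < n_rows fails."""
    return fraction ** n_rows

# With 3 rows at a 10% fraction the test fails roughly once in a
# thousand runs; at 1% it would fail roughly once in a million.
print(p_all_rows_kept(3, 0.10))
print(p_all_rows_kept(3, 0.01))
```

So lowering the fraction shrinks the flake rate but never eliminates it; only a seeded sample or a deterministic assertion would.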
[jira] [Resolved] (SPARK-8313) Support Spark Packages containing R code with --packages
[ https://issues.apache.org/jira/browse/SPARK-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-8313. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7139 [https://github.com/apache/spark/pull/7139] Support Spark Packages containing R code with --packages Key: SPARK-8313 URL: https://issues.apache.org/jira/browse/SPARK-8313 Project: Spark Issue Type: New Feature Components: Spark Submit, SparkR Reporter: Burak Yavuz Fix For: 1.5.0
[jira] [Updated] (SPARK-8313) Support Spark Packages containing R code with --packages
[ https://issues.apache.org/jira/browse/SPARK-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-8313: - Assignee: Burak Yavuz Support Spark Packages containing R code with --packages Key: SPARK-8313 URL: https://issues.apache.org/jira/browse/SPARK-8313 Project: Spark Issue Type: New Feature Components: Spark Submit, SparkR Reporter: Burak Yavuz Assignee: Burak Yavuz Fix For: 1.5.0
[jira] [Commented] (SPARK-9121) Get rid of the warnings about `no visible global function definition` in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635300#comment-14635300 ] Shivaram Venkataraman commented on SPARK-9121: -- Yeah we can add `install-dev.sh` in Jenkins before dev/lint-r. One unfortunate thing is that we typically do a lint-check before we run the rest of the Jenkins tests (build, unit tests etc.) So it would be good to not have this be the other way around I guess Get rid of the warnings about `no visible global function definition` in SparkR --- Key: SPARK-9121 URL: https://issues.apache.org/jira/browse/SPARK-9121 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa We have a lot of warnings about {{no visible global function definition}} in SparkR. So we should get rid of them. {noformat} R/utils.R:513:5: warning: no visible global function definition for ‘processClosure’ processClosure(func.body, oldEnv, defVars, checkedFuncs, newEnv) ^~ {noformat}
[jira] [Commented] (SPARK-9053) Fix spaces around parens, infix operators etc.
[ https://issues.apache.org/jira/browse/SPARK-9053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636060#comment-14636060 ] Shivaram Venkataraman commented on SPARK-9053: -- Yeah - there are a bunch of real issues to be fixed first and we can discuss the ignore rule after that. Also I don't think we should ignore all warnings of this form -- just, say, on the `^` operator, or we can mark out portions of the code that need to be ignored etc. Fix spaces around parens, infix operators etc. -- Key: SPARK-9053 URL: https://issues.apache.org/jira/browse/SPARK-9053 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Shivaram Venkataraman We have a number of style errors which look like {code} Place a space before left parenthesis ... Put spaces around all infix operators. {code} However some of the warnings are spurious (for example, the space around the infix operator in {code} expect_equal(collect(select(df, hypot(df$a, df$b)))[4, "HYPOT(a, b)"], sqrt(4^2 + 8^2)) {code}). We should add an ignore rule for these spurious examples
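If the style check runs through lintr (as dev/lint-r suggests), one way to scope the exclusion to a single expression is lintr's per-line exclusion comment, leaving the infix-operator rule active everywhere else. A sketch, assuming the Jenkins linter honors lintr's `# nolint` marker:

```r
# The bare `^` inside sqrt() trips a spurious "put spaces around all
# infix operators" warning; exclude only this line from linting:
expect_equal(collect(select(df, hypot(df$a, df$b)))[4, "HYPOT(a, b)"],
             sqrt(4^2 + 8^2))  # nolint
```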
[jira] [Resolved] (SPARK-9121) Get rid of the warnings about `no visible global function definition` in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-9121. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7567 [https://github.com/apache/spark/pull/7567] Get rid of the warnings about `no visible global function definition` in SparkR --- Key: SPARK-9121 URL: https://issues.apache.org/jira/browse/SPARK-9121 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Fix For: 1.5.0 We have a lot of warnings about {{no visible global function definition}} in SparkR. So we should get rid of them. {noformat} R/utils.R:513:5: warning: no visible global function definition for ‘processClosure’ processClosure(func.body, oldEnv, defVars, checkedFuncs, newEnv) ^~ {noformat}
[jira] [Updated] (SPARK-9121) Get rid of the warnings about `no visible global function definition` in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-9121: - Assignee: Yu Ishikawa Get rid of the warnings about `no visible global function definition` in SparkR --- Key: SPARK-9121 URL: https://issues.apache.org/jira/browse/SPARK-9121 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Assignee: Yu Ishikawa Fix For: 1.5.0 We have a lot of warnings about {{no visible global function definition}} in SparkR. So we should get rid of them. {noformat} R/utils.R:513:5: warning: no visible global function definition for ‘processClosure’ processClosure(func.body, oldEnv, defVars, checkedFuncs, newEnv) ^~ {noformat}
[jira] [Commented] (SPARK-9230) SparkR RFormula should support StringType features
[ https://issues.apache.org/jira/browse/SPARK-9230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635776#comment-14635776 ] Shivaram Venkataraman commented on SPARK-9230: -- [~ekhliang] [~mengxr] One more thing that would be good to do is to make these formulas also work with actual columns in R. For example in DataFrames we parse columns with df$col_name. So it will be great to support a formula of the kind df$Sepal_Length ~ df$Sepal_Width SparkR RFormula should support StringType features -- Key: SPARK-9230 URL: https://issues.apache.org/jira/browse/SPARK-9230 Project: Spark Issue Type: New Feature Components: ML, SparkR Reporter: Eric Liang StringType features will need to be encoded using OneHotEncoder to be used for regression. See umbrella design doc https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit?usp=sharing
[jira] [Commented] (SPARK-9230) SparkR RFormula should support StringType features
[ https://issues.apache.org/jira/browse/SPARK-9230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635796#comment-14635796 ] Shivaram Venkataraman commented on SPARK-9230: -- The thing to do there would be to capture it as SparkR DataFrame columns. so df$Sepal_Width actually resolves to a Java column class and then we can parse those in RFormula -- So in some sense we'll have two constructors, one from strings and one from DataFrame columns. SparkR RFormula should support StringType features -- Key: SPARK-9230 URL: https://issues.apache.org/jira/browse/SPARK-9230 Project: Spark Issue Type: New Feature Components: ML, SparkR Reporter: Eric Liang StringType features will need to be encoded using OneHotEncoder to be used for regression. See umbrella design doc https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit?usp=sharing
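The two constructors discussed in these comments could look like this from the R side. The first form is the string-based formula being built for SparkR's glm(); the column-based second form is the proposal and is illustrative only:

```r
# String-based formula: column names are resolved against the DataFrame
# schema by RFormula on the JVM side
model <- glm(Sepal_Length ~ Sepal_Width, data = df, family = "gaussian")

# Proposed column-based formula: each df$col already resolves to a Java
# Column object, which RFormula would accept directly (hypothetical form)
model <- glm(df$Sepal_Length ~ df$Sepal_Width, data = df, family = "gaussian")
```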
[jira] [Created] (SPARK-9322) Add rbind as a synonym for `unionAll`
Shivaram Venkataraman created SPARK-9322: Summary: Add rbind as a synonym for `unionAll` Key: SPARK-9322 URL: https://issues.apache.org/jira/browse/SPARK-9322 Project: Spark Issue Type: Sub-task Reporter: Shivaram Venkataraman
[jira] [Resolved] (SPARK-8364) Add crosstab to SparkR DataFrames
[ https://issues.apache.org/jira/browse/SPARK-8364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-8364. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7318 [https://github.com/apache/spark/pull/7318] Add crosstab to SparkR DataFrames - Key: SPARK-8364 URL: https://issues.apache.org/jira/browse/SPARK-8364 Project: Spark Issue Type: New Feature Components: SparkR Reporter: Xiangrui Meng Assignee: Xiangrui Meng Fix For: 1.5.0 Add `crosstab` to SparkR DataFrames, which takes two column names and returns a local R data.frame. This is similar to `table` in R. However, `table` in SparkR is used for loading SQL tables as DataFrames. The return type is data.frame instead of table for `crosstab` to be compatible with Scala/Python.
[jira] [Resolved] (SPARK-8807) Add between operator in SparkR
[ https://issues.apache.org/jira/browse/SPARK-8807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-8807. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7356 [https://github.com/apache/spark/pull/7356] Add between operator in SparkR -- Key: SPARK-8807 URL: https://issues.apache.org/jira/browse/SPARK-8807 Project: Spark Issue Type: New Feature Components: SparkR Reporter: Yu Ishikawa Fix For: 1.5.0 Add between operator in SparkR ``` df$age between c(1, 2) ```
[jira] [Updated] (SPARK-8807) Add between operator in SparkR
[ https://issues.apache.org/jira/browse/SPARK-8807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-8807: - Assignee: Liang-Chi Hsieh Add between operator in SparkR -- Key: SPARK-8807 URL: https://issues.apache.org/jira/browse/SPARK-8807 Project: Spark Issue Type: New Feature Components: SparkR Reporter: Yu Ishikawa Assignee: Liang-Chi Hsieh Fix For: 1.5.0 Add between operator in SparkR ``` df$age between c(1, 2) ```
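Based on the snippet in the description, the resolved operator can be exercised along these lines — a sketch of the SparkR column API, assuming between() takes the column plus a length-two vector of bounds:

```r
# Keep rows whose age lies in the closed interval [1, 2]
filtered <- filter(df, between(df$age, c(1, 2)))
head(filtered)
```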
[jira] [Created] (SPARK-9053) Fix spaces around parens, infix operators etc.
Shivaram Venkataraman created SPARK-9053: Summary: Fix spaces around parens, infix operators etc. Key: SPARK-9053 URL: https://issues.apache.org/jira/browse/SPARK-9053 Project: Spark Issue Type: Sub-task Reporter: Shivaram Venkataraman We have a number of style errors which look like {code} Place a space before left parenthesis ... Put spaces around all infix operators. {code} However some of the warnings are spurious (for example, the space around the infix operator in {code} expect_equal(collect(select(df, hypot(df$a, df$b)))[4, "HYPOT(a, b)"], sqrt(4^2 + 8^2)) {code}). We should add an ignore rule for these spurious examples
[jira] [Updated] (SPARK-8808) Fix assignments in SparkR
[ https://issues.apache.org/jira/browse/SPARK-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-8808: - Assignee: Sun Rui Fix assignments in SparkR - Key: SPARK-8808 URL: https://issues.apache.org/jira/browse/SPARK-8808 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Assignee: Sun Rui Fix For: 1.5.0 {noformat} inst/tests/test_binary_function.R:79:12: style: Use <-, not =, for assignment. mockFile = c("Spark is pretty.", "Spark is awesome.") {noformat}
[jira] [Resolved] (SPARK-8808) Fix assignments in SparkR
[ https://issues.apache.org/jira/browse/SPARK-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-8808. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7395 [https://github.com/apache/spark/pull/7395] Fix assignments in SparkR - Key: SPARK-8808 URL: https://issues.apache.org/jira/browse/SPARK-8808 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Fix For: 1.5.0 {noformat} inst/tests/test_binary_function.R:79:12: style: Use <-, not =, for assignment. mockFile = c("Spark is pretty.", "Spark is awesome.") {noformat}
[jira] [Created] (SPARK-9052) Fix comments after curly braces
Shivaram Venkataraman created SPARK-9052: Summary: Fix comments after curly braces Key: SPARK-9052 URL: https://issues.apache.org/jira/browse/SPARK-9052 Project: Spark Issue Type: Sub-task Reporter: Shivaram Venkataraman Right now we have a number of style check errors of the form {code} Opening curly braces should never go on their own line and should always be followed by a new line. {code}
[jira] [Commented] (SPARK-9121) Get rid of the warnings about `no visible global function definition` in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631590#comment-14631590 ] Shivaram Venkataraman commented on SPARK-9121: -- [~yuu.ishik...@gmail.com] I think I found a fix for this problem. If we include the SparkR package in dev/lint-r.R before we call lint_package then we don't get these errors {code} library(SparkR, lib.loc = paste(SPARK_ROOT_DIR, "/R", "/lib", sep = "")) {code} Get rid of the warnings about `no visible global function definition` in SparkR --- Key: SPARK-9121 URL: https://issues.apache.org/jira/browse/SPARK-9121 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa We have a lot of warnings about {{no visible global function definition}} in SparkR. So we should get rid of them. {noformat} R/utils.R:513:5: warning: no visible global function definition for ‘processClosure’ processClosure(func.body, oldEnv, defVars, checkedFuncs, newEnv) ^~ {noformat}
[jira] [Resolved] (SPARK-8596) Install and configure RStudio server on Spark EC2
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-8596. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7366 [https://github.com/apache/spark/pull/7366] Install and configure RStudio server on Spark EC2 - Key: SPARK-8596 URL: https://issues.apache.org/jira/browse/SPARK-8596 Project: Spark Issue Type: Improvement Components: EC2, SparkR Reporter: Shivaram Venkataraman Fix For: 1.5.0 This will make it convenient for R users to use SparkR from their browsers
[jira] [Commented] (SPARK-8952) JsonFile() of SQLContext display improper warning message for a S3 path
[ https://issues.apache.org/jira/browse/SPARK-8952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624781#comment-14624781 ] Shivaram Venkataraman commented on SPARK-8952: -- So the reason normalizePath exists is to make the local file paths work correctly (i.e things like ~/spark/README.md) -- Maybe we could have a function that does this on the Scala side but also verifies this with the Hadoop Configuration ? cc [~davies] JsonFile() of SQLContext display improper warning message for a S3 path --- Key: SPARK-8952 URL: https://issues.apache.org/jira/browse/SPARK-8952 Project: Spark Issue Type: Bug Components: SparkR Affects Versions: 1.4.0 Reporter: Sun Rui This is an issue reported by Ben Spark ben_spar...@yahoo.com.au. {quote} Spark 1.4 deployed on AWS EMR jsonFile is working though with some warning message Warning message: In normalizePath(path) : path[1]=s3://rea-consumer-data-dev/cbr/profiler/output/20150618/part-0: No such file or directory {quote}
[jira] [Updated] (SPARK-8596) Install and configure RStudio server on Spark EC2
[ https://issues.apache.org/jira/browse/SPARK-8596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-8596: - Assignee: Vincent Warmerdam Install and configure RStudio server on Spark EC2 - Key: SPARK-8596 URL: https://issues.apache.org/jira/browse/SPARK-8596 Project: Spark Issue Type: Improvement Components: EC2, SparkR Reporter: Shivaram Venkataraman Assignee: Vincent Warmerdam Fix For: 1.5.0 This will make it convenient for R users to use SparkR from their browsers
[jira] [Resolved] (SPARK-6797) Add support for YARN cluster mode
[ https://issues.apache.org/jira/browse/SPARK-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-6797. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 6743 [https://github.com/apache/spark/pull/6743] Add support for YARN cluster mode - Key: SPARK-6797 URL: https://issues.apache.org/jira/browse/SPARK-6797 Project: Spark Issue Type: Improvement Components: SparkR Reporter: Shivaram Venkataraman Assignee: Sun Rui Priority: Critical Fix For: 1.5.0 SparkR currently does not work in YARN cluster mode as the R package is not shipped along with the assembly jar to the YARN AM. We could try to use the support for archives in YARN to send out the R package as a zip file.
[jira] [Resolved] (SPARK-9201) Integrate MLlib with SparkR using RFormula
[ https://issues.apache.org/jira/browse/SPARK-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-9201. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7483 [https://github.com/apache/spark/pull/7483] Integrate MLlib with SparkR using RFormula -- Key: SPARK-9201 URL: https://issues.apache.org/jira/browse/SPARK-9201 Project: Spark Issue Type: New Feature Components: ML, SparkR Reporter: Eric Liang Assignee: Eric Liang Fix For: 1.5.0 We need to interface R glm() and predict() with mllib R formula support. Design doc from umbrella task: https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit
[jira] [Commented] (SPARK-9121) Get rid of the warnings about `no visible global function definition` in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634626#comment-14634626 ] Shivaram Venkataraman commented on SPARK-9121: -- Yeah - we can just call `install-dev.sh` before running the lint script to make sure of that if this is required. Get rid of the warnings about `no visible global function definition` in SparkR --- Key: SPARK-9121 URL: https://issues.apache.org/jira/browse/SPARK-9121 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa We have a lot of warnings about {{no visible global function definition}} in SparkR. So we should get rid of them. {noformat} R/utils.R:513:5: warning: no visible global function definition for ‘processClosure’ processClosure(func.body, oldEnv, defVars, checkedFuncs, newEnv) ^~ {noformat}
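The ordering suggested in these comments — build the SparkR package first, then lint — amounts to the following setup fragment (script paths as laid out in the Spark source tree):

```shell
# Install the SparkR package so lintr can resolve its exported functions,
# then run the R style/lint check
./R/install-dev.sh
./dev/lint-r
```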
[jira] [Commented] (SPARK-10219) Error when additional options provided as variable in write.df
[ https://issues.apache.org/jira/browse/SPARK-10219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711642#comment-14711642 ] Shivaram Venkataraman commented on SPARK-10219: --- I think thats happening because `mode` is actually an argument name that is taken in by the write.df method -- So I am not sure you need option=mode, but just mode=mode or mode="append" should work ? Error when additional options provided as variable in write.df -- Key: SPARK-10219 URL: https://issues.apache.org/jira/browse/SPARK-10219 Project: Spark Issue Type: Bug Components: R Affects Versions: 1.4.0 Environment: SparkR shell Reporter: Samuel Alexander Labels: spark-shell, sparkR Opened a SparkR shell Created a df using df <- jsonFile(sqlContext, "examples/src/main/resources/people.json") Assigned a variable like below mode <- "append" When write.df called using below statement got the mentioned error write.df(df, source="org.apache.spark.sql.parquet", path="par_path", option=mode) Error in writeType(con, type) : Unsupported type for serialization name Whereas mode is passed as "append" itself, i.e. not via mode variable as below, everything works fine write.df(df, source="org.apache.spark.sql.parquet", path="par_path", option="append") Note: For parquet it is not needed to have option. But we are using Spark Salesforce package (http://spark-packages.org/package/springml/spark-salesforce) which requires additional options to be passed.
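Following the comment above, the variable should go through write.df's own mode argument rather than a generic option — a sketch against the SparkR 1.4-era write.df API, with an illustrative path:

```r
mode <- "append"

# `mode` is a formal argument of write.df, so pass the variable directly;
# source-specific settings go in as additional named arguments
write.df(df, path = "par_path", source = "org.apache.spark.sql.parquet",
         mode = mode)
```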
[jira] [Created] (SPARK-10214) Improve SparkR Column, DataFrame API docs
Shivaram Venkataraman created SPARK-10214: - Summary: Improve SparkR Column, DataFrame API docs Key: SPARK-10214 URL: https://issues.apache.org/jira/browse/SPARK-10214 Project: Spark Issue Type: Documentation Components: SparkR Reporter: Shivaram Venkataraman Right now the docs for functions like `agg` and `filter` have duplicate entries like `agg-method` and `filter-method` etc. We should use the `name` Rd tag and remove these duplicates.
[jira] [Commented] (SPARK-10214) Improve SparkR Column, DataFrame API docs
[ https://issues.apache.org/jira/browse/SPARK-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710404#comment-14710404 ] Shivaram Venkataraman commented on SPARK-10214: --- cc [~yuu.ishik...@gmail.com] Improve SparkR Column, DataFrame API docs - Key: SPARK-10214 URL: https://issues.apache.org/jira/browse/SPARK-10214 Project: Spark Issue Type: Documentation Components: SparkR Reporter: Shivaram Venkataraman Right now the docs for functions like `agg` and `filter` have duplicate entries like `agg-method` and `filter-method` etc. We should use the `name` Rd tag and remove these duplicates.
[jira] [Updated] (SPARK-10118) Improve SparkR API docs for 1.5 release
[ https://issues.apache.org/jira/browse/SPARK-10118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-10118: -- Assignee: Yu Ishikawa Improve SparkR API docs for 1.5 release --- Key: SPARK-10118 URL: https://issues.apache.org/jira/browse/SPARK-10118 Project: Spark Issue Type: Documentation Components: Documentation, SparkR Reporter: Shivaram Venkataraman Assignee: Yu Ishikawa Fix For: 1.5.0 This includes checking if the new DataFrame functions expression show up appropriately in the roxygen docs
[jira] [Resolved] (SPARK-10118) Improve SparkR API docs for 1.5 release
[ https://issues.apache.org/jira/browse/SPARK-10118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-10118. --- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 8386 [https://github.com/apache/spark/pull/8386] Improve SparkR API docs for 1.5 release --- Key: SPARK-10118 URL: https://issues.apache.org/jira/browse/SPARK-10118 Project: Spark Issue Type: Documentation Components: Documentation, SparkR Reporter: Shivaram Venkataraman Fix For: 1.5.0 This includes checking if the new DataFrame functions expression show up appropriately in the roxygen docs
[jira] [Commented] (SPARK-11255) R Test build should run on R 3.1.1
[ https://issues.apache.org/jira/browse/SPARK-11255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972765#comment-14972765 ] Shivaram Venkataraman commented on SPARK-11255: --- cc [~shaneknapp] > R Test build should run on R 3.1.1 > -- > > Key: SPARK-11255 > URL: https://issues.apache.org/jira/browse/SPARK-11255 > Project: Spark > Issue Type: Bug > Components: SparkR >Reporter: Felix Cheung >Priority: Minor > > Test should run on R 3.1.1 which is the version listed as supported. > Apparently there are few R changes that can go undetected since Jenkins Test > build is running something newer.
[jira] [Commented] (SPARK-11231) join returns schema with duplicated and ambiguous join columns
[ https://issues.apache.org/jira/browse/SPARK-11231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967364#comment-14967364 ]

Shivaram Venkataraman commented on SPARK-11231:
-----------------------------------------------

[~Narine] [~sunrui] Is this covered by https://github.com/apache/spark/pull/9012, or does this require some changes on the Scala side? cc [~davies]

> join returns schema with duplicated and ambiguous join columns
> --------------------------------------------------------------
>
>              Key: SPARK-11231
>              URL: https://issues.apache.org/jira/browse/SPARK-11231
>          Project: Spark
>       Issue Type: Bug
>       Components: SparkR
> Affects Versions: 1.5.1
>      Environment: R
>         Reporter: Matt Pollock
>
> In the case where the key columns of two data frames are named the same
> thing, join returns a data frame in which that column is duplicated. Since
> the content of the columns is guaranteed to be the same row by row,
> consolidating the identical columns into a single column would replicate
> standard R behavior[1] and help prevent ambiguous names.
>
> Example:
> {code}
> > df1 <- data.frame(key=c("A", "B", "C"), value1=c(1, 2, 3))
> > df2 <- data.frame(key=c("A", "B", "C"), value2=c(4, 5, 6))
> > sdf1 <- createDataFrame(sqlContext, df1)
> > sdf2 <- createDataFrame(sqlContext, df2)
> > sjdf <- join(sdf1, sdf2, sdf1$key == sdf2$key, "inner")
> > schema(sjdf)
> StructType
> |-name = "key", type = "StringType", nullable = TRUE
> |-name = "value1", type = "DoubleType", nullable = TRUE
> |-name = "key", type = "StringType", nullable = TRUE
> |-name = "value2", type = "DoubleType", nullable = TRUE
> {code}
>
> The duplicated key columns cause things like:
> {code}
> > library(magrittr)
> > sjdf %>% select("key")
> 15/10/21 11:04:28 ERROR r.RBackendHandler: select on 1414 failed
> Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
>   org.apache.spark.sql.AnalysisException: Reference 'key' is ambiguous, could
>   be: key#125, key#127.;
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:278)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveChildren(LogicalPlan.scala:162)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8$$anonfun$applyOrElse$4$$anonfun$20.apply(Analyzer.scala:403)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8$$anonfun$applyOrElse$4$$anonfun$20.apply(Analyzer.scala:403)
>   at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8$$anonfun$applyOrElse$4.applyOrElse(Analyzer.scala:403)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$8$$anonfun$applyOrElse$4.applyOrElse(Analyzer.scala:399)
>   at org.apache.spark.sql.catalyst.tree
> {code}
>
> [1] In base R there is no "join", but a similar function, "merge", is
> provided, in which a "by" argument identifies the shared key column in the
> two data frames. In the case where the key column names differ, "by.x" and
> "by.y" arguments can be used. In the case of same-named key columns, the
> consolidation behavior requested above is observed. In the case of differing
> names, the "by.x" name is retained and consolidated with the "by.y" column,
> which is dropped.
> {code}
> > df1 <- data.frame(key=c("A", "B", "C"), value1=c(1, 2, 3))
> > df2 <- data.frame(key=c("A", "B", "C"), value2=c(4, 5, 6))
> > merge(df1, df2, by="key")
>   key value1 value2
> 1   A      1      4
> 2   B      2      5
> 3   C      3      6
> > df3 <- data.frame(akey=c("A", "B", "C"), value1=c(1, 2, 3))
> > merge(df2, df3, by.x="key", by.y="akey")
>   key value2 value1
> 1   A      4      1
> 2   B      5      2
> 3   C      6      3
> > merge(df3, df2, by.x="akey", by.y="key")
>   akey value1 value2
> 1    A      1      4
> 2    B      2      5
> 3    C      3      6
> {code}
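The consolidation behavior the reporter describes for base R's merge() can be sketched outside of R as well. The following is an illustrative Python sketch (not Spark or SparkR code; `merge_on_key` is a hypothetical helper) of an inner join on a shared key that emits the key column exactly once, rather than duplicating it as SparkR's join does:

```python
# Illustrative sketch of the column-consolidation behavior base R's merge()
# performs: when two tables share a key column, the joined result keeps a
# single copy of that column instead of two ambiguous copies.
def merge_on_key(rows1, rows2, key):
    """Inner-join two lists of dicts on `key`, emitting the key once."""
    index = {r[key]: r for r in rows2}
    joined = []
    for r in rows1:
        match = index.get(r[key])
        if match is not None:
            row = dict(r)  # key + columns from the left table
            # Copy right-table columns, skipping the duplicate key.
            row.update({k: v for k, v in match.items() if k != key})
            joined.append(row)
    return joined

df1 = [{"key": k, "value1": v} for k, v in zip("ABC", (1, 2, 3))]
df2 = [{"key": k, "value2": v} for k, v in zip("ABC", (4, 5, 6))]
# Each result row has exactly one "key" field, mirroring merge(df1, df2, by="key").
print(merge_on_key(df1, df2, "key"))
```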
[jira] [Commented] (SPARK-11244) sparkR.stop doesn't clean up .sparkRSQLsc in environment
[ https://issues.apache.org/jira/browse/SPARK-11244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967778#comment-14967778 ]

Shivaram Venkataraman commented on SPARK-11244:
-----------------------------------------------

Good catch -- could you send a PR for this?

> sparkR.stop doesn't clean up .sparkRSQLsc in environment
> --------------------------------------------------------
>
>              Key: SPARK-11244
>              URL: https://issues.apache.org/jira/browse/SPARK-11244
>          Project: Spark
>       Issue Type: Bug
>       Components: SparkR
> Affects Versions: 1.5.1
>         Reporter: Sen Fang
>
> Currently {{sparkR.stop}} removes the relevant variables from
> {{.sparkREnv}} for the SparkContext and the backend. However, it doesn't
> clean up {{.sparkRSQLsc}} and {{.sparkRHivesc}}.
>
> As a result,
> {code}
> sc <- sparkR.init("local")
> sqlContext <- sparkRSQL.init(sc)
> sparkR.stop()
> sc <- sparkR.init("local")
> sqlContext <- sparkRSQL.init(sc)
> sqlContext
> {code}
> produces
> {code}
> Error in callJMethod(x, "getClass") :
>   Invalid jobj 1. If SparkR was restarted, Spark operations need to be
>   re-executed.
> {code}
[jira] [Commented] (SPARK-11238) SparkR: Documentation change for merge function
[ https://issues.apache.org/jira/browse/SPARK-11238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967807#comment-14967807 ]

Shivaram Venkataraman commented on SPARK-11238:
-----------------------------------------------

Also, we should mark this as a breaking API change from 1.5 in the release notes. cc [~rxin] [~pwendell]

> SparkR: Documentation change for merge function
> -----------------------------------------------
>
>          Key: SPARK-11238
>          URL: https://issues.apache.org/jira/browse/SPARK-11238
>      Project: Spark
>   Issue Type: Sub-task
>   Components: SparkR
>     Reporter: Narine Kokhlikyan
>
> As discussed in pull request https://github.com/apache/spark/pull/9012, the
> signature of the merge function will be changed; therefore, a documentation
> change is required.
[jira] [Updated] (SPARK-11294) Improve R doc for read.df, write.df, saveAsTable
[ https://issues.apache.org/jira/browse/SPARK-11294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivaram Venkataraman updated SPARK-11294:
------------------------------------------
 Fix Version/s:     (was: 1.5.2)
                1.5.3

> Improve R doc for read.df, write.df, saveAsTable
> ------------------------------------------------
>
>              Key: SPARK-11294
>              URL: https://issues.apache.org/jira/browse/SPARK-11294
>          Project: Spark
>       Issue Type: Bug
>       Components: SparkR
> Affects Versions: 1.5.1
>         Reporter: Felix Cheung
>         Assignee: Felix Cheung
>         Priority: Minor
>          Fix For: 1.5.3, 1.6.0
>
> The API doc lacks examples and has several formatting issues.
[jira] [Updated] (SPARK-11258) Converting a Spark DataFrame into an R data.frame is slow / requires a lot of memory
[ https://issues.apache.org/jira/browse/SPARK-11258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivaram Venkataraman updated SPARK-11258:
------------------------------------------
    Assignee: Frank Rosner

> Converting a Spark DataFrame into an R data.frame is slow / requires a lot
> of memory
> --------------------------------------------------------------------------
>
>              Key: SPARK-11258
>              URL: https://issues.apache.org/jira/browse/SPARK-11258
>          Project: Spark
>       Issue Type: Improvement
>       Components: SparkR
> Affects Versions: 1.5.1
>         Reporter: Frank Rosner
>         Assignee: Frank Rosner
>          Fix For: 1.6.0
>
> h4. Problem
> We tried to collect a DataFrame with more than 1 million rows and a few
> hundred columns in SparkR. This took a huge amount of time (much more than
> in the Spark REPL). Looking into the code, I found that the
> {{org.apache.spark.sql.api.r.SQLUtils.dfToCols}} method does some map and
> then {{.toArray}}, which might cause the problem.
>
> h4. Solution
> Directly transpose the row-wise representation to the column-wise
> representation with one pass through the data. I will create a pull request
> for this.
>
> h4. Runtime comparison
> On a test data frame with 1 million rows and 22 columns, the old
> {{dfToCols}} method takes 2267 ms on average to complete. My implementation
> takes only 554 ms on average. This effect might be due to garbage
> collection, especially if you consider that the old implementation didn't
> complete on an even bigger data frame.
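The single-pass transpose described in the solution above can be sketched compactly. This is an illustrative Python analogue (not the Scala of `SQLUtils.dfToCols`; `rows_to_cols` is a hypothetical name) of building the column-wise representation in one pass over the rows, instead of extracting one column at a time:

```python
# Minimal sketch: transpose a row-wise table to column-wise in a single pass.
def rows_to_cols(rows, ncols):
    cols = [[] for _ in range(ncols)]  # one output list per column
    for row in rows:                   # one pass through the data
        for j, cell in enumerate(row):
            cols[j].append(cell)
    return cols

rows = [(1, "a"), (2, "b"), (3, "c")]
print(rows_to_cols(rows, 2))  # → [[1, 2, 3], ['a', 'b', 'c']]
```

Each cell is touched exactly once, which avoids the repeated per-column scans that the original map-then-{{.toArray}} approach implies.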
[jira] [Updated] (SPARK-10979) SparkR: Add merge to DataFrame
[ https://issues.apache.org/jira/browse/SPARK-10979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivaram Venkataraman updated SPARK-10979:
------------------------------------------
    Assignee: Narine Kokhlikyan

> SparkR: Add merge to DataFrame
> ------------------------------
>
>          Key: SPARK-10979
>          URL: https://issues.apache.org/jira/browse/SPARK-10979
>      Project: Spark
>   Issue Type: Sub-task
>   Components: SparkR
>     Reporter: Narine Kokhlikyan
>     Assignee: Narine Kokhlikyan
>      Fix For: 1.6.0
>
> Add a merge function to DataFrame which supports the R merge signature.
> https://stat.ethz.ch/R-manual/R-devel/library/base/html/merge.html
[jira] [Resolved] (SPARK-10979) SparkR: Add merge to DataFrame
[ https://issues.apache.org/jira/browse/SPARK-10979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivaram Venkataraman resolved SPARK-10979.
-------------------------------------------
    Resolution: Fixed
 Fix Version/s: 1.6.0

Issue resolved by pull request 9012
[https://github.com/apache/spark/pull/9012]

> SparkR: Add merge to DataFrame
> ------------------------------
>
>          Key: SPARK-10979
>          URL: https://issues.apache.org/jira/browse/SPARK-10979
>      Project: Spark
>   Issue Type: Sub-task
>   Components: SparkR
>     Reporter: Narine Kokhlikyan
>      Fix For: 1.6.0
>
> Add a merge function to DataFrame which supports the R merge signature.
> https://stat.ethz.ch/R-manual/R-devel/library/base/html/merge.html