[jira] [Commented] (SPARK-11046) Pass schema from R to JVM using JSON format

2015-12-07 Thread Nakul Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046004#comment-15046004
 ] 

Nakul Jindal commented on SPARK-11046:
--

[~shivaram], [~sunrui] - Is it ok to depend on / import the 
[jsonlite|https://cran.r-project.org/web/packages/jsonlite/index.html] package?


> Pass schema from R to JVM using JSON format
> ---
>
> Key: SPARK-11046
> URL: https://issues.apache.org/jira/browse/SPARK-11046
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 1.5.1
>Reporter: Sun Rui
>Priority: Minor
>
> Currently, SparkR passes a DataFrame schema from R to the JVM backend using 
> a regular-expression-based string format. However, Spark now supports schemas 
> in JSON format, so enhance SparkR to pass the schema as JSON.






[jira] [Commented] (SPARK-11046) Pass schema from R to JVM using JSON format

2015-12-07 Thread Nakul Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046269#comment-15046269
 ] 

Nakul Jindal commented on SPARK-11046:
--

I am trying to understand the benefit of using JSON as opposed to the current 
format.

We have 3 cases:


Case 1 - Leave things the way they are.
Here is what we have currently:
Let us say our type is
{{array<map<string,struct<a:integer,b:long,c:string>>>}}

- The R function structField.character (in schema.R) is passed this exact string
- In turn it calls checkType to recursively validate the schema string
- The Scala function SQLUtils.getSQLDataType (in SQLUtils.scala) recursively 
converts this to an object of type DataType (a rough sketch follows this list)
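To make Case 1 concrete, here is a simplified Scala sketch of the kind of recursive 
descent that SQLUtils.getSQLDataType performs. This is an illustration only, not the 
actual implementation - the real code handles more types, struct fields, and proper 
error reporting:

{code}
import org.apache.spark.sql.types._

// Simplified: splits naively on the first comma, so nested map key types
// are not handled; the real implementation must respect nesting.
def toDataType(s: String): DataType = s match {
  case "string"  => StringType
  case "integer" => IntegerType
  case "double"  => DoubleType
  case t if t.startsWith("array<") && t.endsWith(">") =>
    ArrayType(toDataType(t.stripPrefix("array<").stripSuffix(">")))
  case t if t.startsWith("map<") && t.endsWith(">") =>
    val inner = t.stripPrefix("map<").stripSuffix(">")
    val comma = inner.indexOf(',')
    MapType(toDataType(inner.take(comma)), toDataType(inner.drop(comma + 1)))
  case other =>
    throw new IllegalArgumentException(s"Unsupported type: $other")
}
{code}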

Case 2 - Expect the user to specify the input schema in JSON
If we converted the schema format to JSON, it would look like this:
{code}
{
  "type": "array",
  "elementType": {
    "type": "map",
    "keyType": "string",
    "valueType": {
      "type": "struct",
      "fields": [{
        "name": "a",
        "type": "integer",
        "nullable": true,
        "metadata": {}
      }, {
        "name": "b",
        "type": "long",
        "nullable": true,
        "metadata": {}
      }, {
        "name": "c",
        "type": "string",
        "nullable": true,
        "metadata": {}
      }]
    },
    "valueContainsNull": false
  },
  "containsNull": true
}
{code}
(based on what DataType.fromJson expects), which places far too much burden on 
the SparkR user.

- I am not entirely sure about this, but I think we do not want to, cannot, or 
simply haven't implemented a way to communicate exceptions encountered in the 
Scala code back to R.
- We'd need to write a way to validate the JSON schema in R code (or use a JSON 
parsing library to do it in some way).
- The code in SQLUtils.getSQLDataType would be significantly reduced, as we 
can just call DataType.fromJson (see the snippet after this list).
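
For what it's worth, the JVM side of Case 2 is tiny - {{DataType.fromJson}} does 
all the parsing and validation. A minimal sketch, using a smaller schema than 
the one above just for brevity:

{code}
import org.apache.spark.sql.types._

// An array of nullable integers, in the JSON format DataType.fromJson expects.
val json = """{"type":"array","elementType":"integer","containsNull":true}"""
val dt: DataType = DataType.fromJson(json)
// dt == ArrayType(IntegerType, true)
{code}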

Case 3 - Convert the schema to JSON in R code before calling the JVM function 
org.apache.spark.sql.api.r.SQLUtils.createStructField
- This is essentially moving the work done in SQLUtils.getSQLDataType to R 
code. This IMHO is significantly more complicated to write and maintain.

TL;DR: At the cost of inconvenience to the SparkR user, we would convert 
schema specification from its current (IMHO simple) form to JSON.

[~shivaram], [~sunrui] - Any thoughts?








[jira] [Commented] (SPARK-11046) Pass schema from R to JVM using JSON format

2015-12-04 Thread Nakul Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15042513#comment-15042513
 ] 

Nakul Jindal commented on SPARK-11046:
--

Hi, I am trying to look into this. 
When you say that SparkR passes a DataFrame schema from R to the JVM backend 
using a regular expression, do you mean formats like these:

map<...>
or
array<...>

Also, is "structField.character" the only function where this "regular 
expression" format is passed from R to JVM (using 
{{"org.apache.spark.sql.api.r.SQLUtils", "createDF"}})?







[jira] [Commented] (SPARK-11439) Optimization of creating sparse feature without dense one

2015-11-16 Thread Nakul Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007487#comment-15007487
 ] 

Nakul Jindal commented on SPARK-11439:
--

Thanks [~lewuathe].
I've also updated the comment in the LinearRegressionSuite.scala file with an R 
snippet to reproduce the results.

> Optimization of creating sparse feature without dense one
> -
>
> Key: SPARK-11439
> URL: https://issues.apache.org/jira/browse/SPARK-11439
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Kai Sasaki
>Priority: Minor
>
> Currently, sparse features generated in {{LinearDataGenerator}} require 
> creating dense vectors first. It would be more efficient to avoid generating 
> dense features when creating sparse ones.






[jira] [Commented] (SPARK-11392) GroupedIterator's hasNext is not idempotent

2015-11-12 Thread Nakul Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003184#comment-15003184
 ] 

Nakul Jindal commented on SPARK-11392:
--

Sorry, it's been a while since I last worked on this. 

[~yhuai] - After looking at the code, I am not entirely clear on what you mean 
when you say
{quote}
If we call GroupedIterator's hasNext immediately after its next, we will 
generate an extra group (CoGroupedIterator has this behavior).
{quote}

The title, however, makes sense to me - {{hasNext}} not being idempotent. In 
my understanding, an iterator's {{hasNext}} should generally not modify the 
underlying iterator, but GroupedIterator's does.

I can think of two things we can do to make {{hasNext}} idempotent, both of 
which are less than ideal (a generic sketch of the caching approach follows 
this list):

* Eagerly evaluate the GroupedIterator - This is probably not what we want to 
do. 
* Do the work done in {{fetchNextGroupIterator}} twice, specifically this loop: 
[L118-L120|https://github.com/apache/spark/blob/14d08b99085d4e609aeae0cf54d4584e860eb552/sql/core/src/main/scala/org/apache/spark/sql/execution/GroupedIterator.scala#L118-L120]
{code}
while (input.hasNext && keyOrdering.compare(currentGroup, currentRow) == 0) {
  currentRow = input.next()
}
{code}
once for {{hasNext}} and once for {{next}}. This obviously introduces some 
inefficiency.
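
For the record, the usual way to make {{hasNext}} idempotent is the 
lookahead-caching pattern below - a generic sketch, not GroupedIterator itself. 
Applied to GroupedIterator it would amount to eagerly pulling (and therefore 
materializing) the next group, which is exactly the cost of the first option:

{code}
// Generic illustration: cache the looked-ahead element so that repeated
// hasNext calls do no extra work on the underlying iterator.
class IdempotentIterator[T](underlying: Iterator[T]) extends Iterator[T] {
  private var cached: Option[T] = None

  override def hasNext: Boolean = cached.isDefined || fetch()

  override def next(): T = {
    if (cached.isEmpty && !fetch()) throw new NoSuchElementException("next on empty iterator")
    val result = cached.get
    cached = None
    result
  }

  // Pull one element into the cache; report whether one was available.
  private def fetch(): Boolean = {
    if (underlying.hasNext) { cached = Some(underlying.next()); true } else false
  }
}
{code}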

*Thoughts?*


> GroupedIterator's hasNext is not idempotent
> ---
>
> Key: SPARK-11392
> URL: https://issues.apache.org/jira/browse/SPARK-11392
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>
> If we call 
> [GroupedIterator|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/GroupedIterator.scala]'s
>  {{hasNext}} immediately after its {{next}}, we will generate an extra group 
> ([CoGroupedIterator|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/CoGroupedIterator.scala]
>  has this behavior). 






[jira] [Commented] (SPARK-11439) Optimization of creating sparse feature without dense one

2015-11-06 Thread Nakul Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994861#comment-14994861
 ] 

Nakul Jindal commented on SPARK-11439:
--

This is the piece of R code that is used as a reference for the test:

{code}
predictions <- predict(fit, newx=features)
residuals <- label - predictions
mean(residuals^2)         # MSE
mean(abs(residuals))      # MAD
cor(predictions, label)^2 # r^2
{code}

How do I create the "fit" object?

NOTE: I have no experience with R and have scrounged together what little 
knowledge I could by asking around and searching the internet.

I tried this:

In a Spark REPL:
{code}
import org.apache.spark.mllib.util.LinearDataGenerator
val data = sc.parallelize(LinearDataGenerator.generateLinearInput(6.3,
  Array(4.7, 7.2), Array(0.9, -1.3), Array(0.7, 1.2), 1, 42, 0.1), 2)
data.map(x => x.label + ", " + x.features(0) + ", " +
  x.features(1)).coalesce(1).saveAsTextFile("path")
{code}

Then, in an R shell:
{code}
library("glmnet")
d1 <- read.csv("path/part-0", header=FALSE, stringsAsFactors=FALSE)
features <- as.matrix(data.frame(as.numeric(d1$V2), as.numeric(d1$V3)))
label <- as.numeric(d1$V1)
fit <- glmnet(features, label, family="gaussian", alpha = 0, lambda = 0)
{code}

I then used this fit object in the earlier snippet of R code. The results were 
way off:
{code}
> mean(residuals^2)
[1] 10885.15
> mean(abs(residuals))
[1] 103.959
> cor(predictions, label)^2
        [,1]
s0 0.9998749
{code}


So, I guess, that is not how you create the "fit" object.

How do you create the "fit" object?

 








[jira] [Commented] (SPARK-11439) Optimization of creating sparse feature without dense one

2015-11-05 Thread Nakul Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14992828#comment-14992828
 ] 

Nakul Jindal commented on SPARK-11439:
--

I seem to be running into a problem.

1. 
[This|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala#L124-L165]
 is the current implementation.
2. [This|https://gist.github.com/nakul02/9341a9ed67cd192d98df] is the 
implementation that I tried first (and it passed all tests).
3. [This|https://gist.github.com/nakul02/4f5392c7d5997871da7b] is an improved 
implementation that doesn't form the "x" array, but it fails tests in these suites:
* org.apache.spark.ml.regression.LinearRegressionSuite
* org.apache.spark.ml.evaluation.RegressionEvaluatorSuite

The difference between 2 and 3 is the order in which the random number 
generator is consumed. Could this alone cause the tests to fail? (A small 
illustration of why draw order matters follows.) Maybe I am doing something 
obviously stupid here. This is frustrating and any insight would help!
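
In case it is relevant, here is a tiny self-contained sketch (not the generator 
code itself) showing that draw order alone changes every generated value, even 
with the same seed:

{code}
import scala.util.Random

// Same seed, same number of draws, different interleaving => different
// values, because nextDouble and nextGaussian consume the same stream.
val rng1 = new Random(42)
val features1 = Array.fill(3)(rng1.nextDouble()) // features first
val noise1 = rng1.nextGaussian()                 // then noise

val rng2 = new Random(42)
val noise2 = rng2.nextGaussian()                 // noise first
val features2 = Array.fill(3)(rng2.nextDouble()) // then features

println(features1.sameElements(features2)) // false: the streams are offset
{code}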









[jira] [Commented] (SPARK-11439) Optimization of creating sparse feature without dense one

2015-11-05 Thread Nakul Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993281#comment-14993281
 ] 

Nakul Jindal commented on SPARK-11439:
--

Here is one of the failures from LinearRegressionSuite:
{code}
[info] - linear regression model training summary *** FAILED *** (966 milliseconds)
[info]   Expected 0.009955579236410212 and 0.00972035 to be within 1.0E-5 using relative tolerance. (TestingUtils.scala:78)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.apache.spark.mllib.util.TestingUtils$DoubleWithAlmostEquals.$tilde$eq$eq(TestingUtils.scala:78)
[info]   at org.apache.spark.ml.regression.LinearRegressionSuite$$anonfun$11$$anonfun$apply$mcV$sp$9.apply(LinearRegressionSuite.scala:606)
[info]   at org.apache.spark.ml.regression.LinearRegressionSuite$$anonfun$11$$anonfun$apply$mcV$sp$9.apply(LinearRegressionSuite.scala:559)
...
{code}









[jira] [Commented] (SPARK-11439) Optimization of creating sparse feature without dense one

2015-11-04 Thread Nakul Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990273#comment-14990273
 ] 

Nakul Jindal commented on SPARK-11439:
--

Yes, this sounds good. Also, for the sake of uniformity, it would make sense to 
convert the other blas.ddot call to the one from BLAS.scala. 







[jira] [Commented] (SPARK-11439) Optimization of creating sparse feature without dense one

2015-11-02 Thread Nakul Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985890#comment-14985890
 ] 

Nakul Jindal commented on SPARK-11439:
--

I will work on this.







[jira] [Commented] (SPARK-11439) Optimization of creating sparse feature without dense one

2015-11-02 Thread Nakul Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14986684#comment-14986684
 ] 

Nakul Jindal commented on SPARK-11439:
--

[~holdenk] [~lewuathe] - A couple of places where there could be work savings:

1. 
[L144|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala#L144]
 - This is where the sparse values are first populated. The index and value 
arrays could be maintained and populated at this line. The problem is that this 
won't sit well with blas.ddot at line 
[L153|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala#L153].
 Either a new weights array would need to be created or the ddot call would 
need to be rewritten.

2. 
[L162|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala#L162]
 - If done here, we would essentially be doing what toSparse does internally. 

Neither of these options seems clearly better to me (a rough sketch of option 
1 follows). 
Suggestions on what direction to take?
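
To make option 1 concrete, here is a hedged sketch (the names are hypothetical, 
not the actual LinearDataGenerator code) of building the sparse indices/values 
arrays directly and computing the label with a dot product over only the stored 
entries, so no dense "x" array and no blas.ddot:

{code}
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

// Hypothetical helper: generate one labeled point with sparse features.
def sparsePoint(weights: Array[Double], sparsity: Double, intercept: Double,
                eps: Double, rnd: Random): (Double, Array[Int], Array[Double]) = {
  val indices = ArrayBuffer[Int]()
  val values = ArrayBuffer[Double]()
  var i = 0
  while (i < weights.length) {
    if (rnd.nextDouble() >= sparsity) { // keep this coordinate
      indices += i
      values += rnd.nextGaussian()
    }
    i += 1
  }
  // Sparse dot product: only stored coordinates contribute.
  var dot = 0.0
  var k = 0
  while (k < indices.length) {
    dot += weights(indices(k)) * values(k)
    k += 1
  }
  (dot + intercept + eps * rnd.nextGaussian(), indices.toArray, values.toArray)
}
{code}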







[jira] [Commented] (SPARK-11392) GroupedIterator's hasNext is not idempotent

2015-10-30 Thread Nakul Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983487#comment-14983487
 ] 

Nakul Jindal commented on SPARK-11392:
--

[~yhuai], [~cloud_fan] - SPARK-11393 works around the problem mentioned in 
this JIRA. Would we need to revert the changes made by the associated PR if 
this JIRA were resolved?







[jira] [Commented] (SPARK-11392) GroupedIterator's hasNext is not idempotent

2015-10-30 Thread Nakul Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982949#comment-14982949
 ] 

Nakul Jindal commented on SPARK-11392:
--

I will work on this







[jira] [Commented] (SPARK-11385) Add foreach API to MLLib's vector API

2015-10-28 Thread Nakul Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978991#comment-14978991
 ] 

Nakul Jindal commented on SPARK-11385:
--

I'll be working on this.

> Add foreach API to MLLib's vector API
> -
>
> Key: SPARK-11385
> URL: https://issues.apache.org/jira/browse/SPARK-11385
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: holdenk
>Priority: Minor
>
> Add a foreach API to MLLib's vector.






[jira] [Commented] (SPARK-11386) Refactor appropriate uses of Vector to use the new foreach API

2015-10-28 Thread Nakul Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14978980#comment-14978980
 ] 

Nakul Jindal commented on SPARK-11386:
--

I'll be working on this.

> Refactor appropriate uses of Vector to use the new foreach API
> --
>
> Key: SPARK-11386
> URL: https://issues.apache.org/jira/browse/SPARK-11386
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: holdenk
>Priority: Minor
>
> Once SPARK-11385 (Add foreach API to MLLib's vector API) is in, look for 
> places where it should be used internally.






[jira] [Commented] (SPARK-11332) WeightedLeastSquares should use ml features generic Instance class instead of private

2015-10-26 Thread Nakul Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975436#comment-14975436
 ] 

Nakul Jindal commented on SPARK-11332:
--

I'll be working on this.

> WeightedLeastSquares should use ml features generic Instance class instead of 
> private
> -
>
> Key: SPARK-11332
> URL: https://issues.apache.org/jira/browse/SPARK-11332
> Project: Spark
>  Issue Type: Improvement
>Reporter: holdenk
>Priority: Minor
>
> WeightedLeastSquares should use the common Instance class in ml.feature 
> instead of a private one.






[jira] [Comment Edited] (SPARK-10436) spark-submit overwrites spark.files defaults with the job script filename

2015-10-02 Thread Nakul Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941225#comment-14941225
 ] 

Nakul Jindal edited comment on SPARK-10436 at 10/2/15 3:01 PM:
---

I am new to Spark and will be working on this.


was (Author: nakul02):
I am new to Spark and will take a look at it too.

> spark-submit overwrites spark.files defaults with the job script filename
> -
>
> Key: SPARK-10436
> URL: https://issues.apache.org/jira/browse/SPARK-10436
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 1.4.0
> Environment: Ubuntu, Spark 1.4.0 Standalone
>Reporter: axel dahl
>Priority: Minor
>  Labels: easyfix, feature
>
> In my spark-defaults.conf I have configured a set of libraries to be 
> uploaded to my Spark 1.4.0 Standalone cluster. The entry appears as:
> spark.files  libarary.zip,file1.py,file2.py
> When I execute spark-submit -v test.py
> I see that spark-submit reads the defaults correctly, but that it overwrites 
> the "spark.files" default entry and replaces it with the name of the job 
> script, i.e. "test.py".
> This behavior doesn't seem intuitive. test.py should be added to the Spark 
> working folder, but it should not overwrite the "spark.files" defaults.






[jira] [Commented] (SPARK-10436) spark-submit overwrites spark.files defaults with the job script filename

2015-10-02 Thread Nakul Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941225#comment-14941225
 ] 

Nakul Jindal commented on SPARK-10436:
--

I am new to Spark and will take a look at it too.



