[jira] [Commented] (SPARK-12804) ml.classification.LogisticRegression fails when FitIntercept with same-label dataset

2016-01-17 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15103742#comment-15103742
 ] 

Feynman Liang commented on SPARK-12804:
---

[~josephkb] the two issues are slightly different; SPARK-12732 addresses the 
case where fitIntercept=false and all labels are the same, which LiR currently 
treats the same as if fitIntercept=true. I'll make sure that my fix doesn't 
introduce that bug.

> ml.classification.LogisticRegression fails when FitIntercept with same-label 
> dataset
> 
>
> Key: SPARK-12804
> URL: https://issues.apache.org/jira/browse/SPARK-12804
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 1.6.0
>Reporter: Feynman Liang
>Assignee: Feynman Liang
>
> When training LogisticRegression on a dataset where the label is all 0 or all 
> 1, an array out of bounds exception is thrown. The problematic code is
> {code}
>   initialCoefficientsWithIntercept.toArray(numFeatures)
> = math.log(histogram(1) / histogram(0))
> }
> {code}
> The correct behaviour is to short-circuit training entirely when only a 
> single label is present (can be detected from {{labelSummarizer}}) and return 
> a classifier which assigns all true/false with infinite weights.
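
The failure described above can be reproduced without Spark. A minimal plain-Python sketch (a stand-in for illustration, not Spark source):

```python
import math

# With a same-label dataset, the label histogram has only one bucket,
# so indexing bucket 1 fails -- the Python analogue of Spark's
# ArrayIndexOutOfBoundsException. Even with two buckets, one count
# would be zero and the log-odds would be degenerate (-Infinity in
# Scala's math.log).
histogram = [100.0]  # counts per label; only label 0 was observed

def init_intercept(hist):
    return math.log(hist[1] / hist[0])  # log(count_1 / count_0)

try:
    init_intercept(histogram)
    failed = False
except IndexError:
    failed = True

assert failed
```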



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12810) PySpark CrossValidatorModel should support avgMetrics

2016-01-13 Thread Feynman Liang (JIRA)
Feynman Liang created SPARK-12810:
-

 Summary: PySpark CrossValidatorModel should support avgMetrics
 Key: SPARK-12810
 URL: https://issues.apache.org/jira/browse/SPARK-12810
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Reporter: Feynman Liang


The Scala {{CrossValidator}} has supported {{avgMetrics}} since 1.5.0, which 
lets the user evaluate how well each {{ParamMap}} in the grid search performed 
and identify the best parameters. We should support this in PySpark as well.
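
To make the request concrete, here is a hypothetical plain-Python sketch of what {{avgMetrics}} exposes (numbers and structure are illustrative, not the PySpark API):

```python
# For each candidate ParamMap in the grid, avgMetrics holds the
# evaluation metric averaged over the k cross-validation folds.
fold_metrics = [
    [0.80, 0.90],  # fold 1: metric per ParamMap
    [0.82, 0.88],  # fold 2
    [0.84, 0.92],  # fold 3
]
num_folds = len(fold_metrics)
avg_metrics = [
    sum(fold[i] for fold in fold_metrics) / num_folds
    for i in range(len(fold_metrics[0]))
]
best_index = avg_metrics.index(max(avg_metrics))

assert best_index == 1  # the second ParamMap performs best on average
```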






[jira] [Updated] (SPARK-12806) Support SQL expressions extracting values from VectorUDT

2016-01-13 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-12806:
--
Description: 
Use cases exist where a specific index within a {{VectorUDT}} column of a 
{{DataFrame}} is required. For example, we may be interested in extracting a 
specific class probability from the {{probabilityCol}} of a 
{{LogisticRegression}} to compute losses. However, if {{probability}} is a 
column of {{df}} with type {{VectorUDT}}, the following code fails:

{code}
df.select("probability.0")

AnalysisException: u"Can't extract value from probability"
{code}

thrown from 
{{sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala}}.

{{VectorUDT}} essentially wraps a {{StructType}}, hence one would expect it to 
support value extraction Expressions in an analogous way.

  was:
Use cases exist where a specific index within a {VectorUDT} column of a 
{DataFrame} is required. For example, we may be interested in extracting a 
specific class probability from the {probabilityCol} of a {LogisticRegression} 
to compute losses. However, if {probability} is a column of {df} with type 
{VectorUDT}, the following code fails:

{code}
df.select("probability.0")

AnalysisException: u"Can't extract value from probability"
{code}

thrown from 
{sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala}.

{VectorUDT} essentially wraps a {StructType}, hence one would expect it to 
support value extraction Expressions in an analogous way.


> Support SQL expressions extracting values from VectorUDT
> 
>
> Key: SPARK-12806
> URL: https://issues.apache.org/jira/browse/SPARK-12806
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, SQL
>Affects Versions: 1.6.0
>Reporter: Feynman Liang
>
> Use cases exist where a specific index within a {{VectorUDT}} column of a 
> {{DataFrame}} is required. For example, we may be interested in extracting a 
> specific class probability from the {{probabilityCol}} of a 
> {{LogisticRegression}} to compute losses. However, if {{probability}} is a 
> column of {{df}} with type {{VectorUDT}}, the following code fails:
> {code}
> df.select("probability.0")
> AnalysisException: u"Can't extract value from probability"
> {code}
> thrown from 
> {{sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala}}.
> {{VectorUDT}} essentially wraps a {{StructType}}, hence one would expect it 
> to support value extraction Expressions in an analogous way.






[jira] [Created] (SPARK-12806) Support SQL expressions extracting values from VectorUDT

2016-01-13 Thread Feynman Liang (JIRA)
Feynman Liang created SPARK-12806:
-

 Summary: Support SQL expressions extracting values from VectorUDT
 Key: SPARK-12806
 URL: https://issues.apache.org/jira/browse/SPARK-12806
 Project: Spark
  Issue Type: Improvement
  Components: MLlib, SQL
Affects Versions: 1.6.0
Reporter: Feynman Liang


Use cases exist where a specific index within a {{VectorUDT}} column of a 
{{DataFrame}} is required. For example, we may be interested in extracting a 
specific class probability from the {{probabilityCol}} of a 
{{LogisticRegression}} to compute losses. However, if {{probability}} is a 
column of {{df}} with type {{VectorUDT}}, the following code fails:

{code}
df.select("probability.0")

AnalysisException: u"Can't extract value from probability"
{code}

thrown from 
{{sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala}}.

{{VectorUDT}} essentially wraps a {{StructType}}, hence one would expect it to 
support value extraction Expressions in an analogous way.
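
Until such extraction expressions exist, users typically fall back on pulling out an index with a function (a UDF in Spark). A plain-Python sketch of the idea (illustrative data, not the Spark API):

```python
# The "probability" column, one vector per row.
rows = [[0.2, 0.8], [0.6, 0.4]]

def extract_index(vector, i):
    # What a "probability.0"-style extraction expression would do.
    return vector[i]

class_one = [extract_index(v, 1) for v in rows]

assert class_one == [0.8, 0.4]
```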






[jira] [Updated] (SPARK-12804) ml.classification.LogisticRegression fails when FitIntercept with same-label dataset

2016-01-13 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-12804:
--
Description: 
When training LogisticRegression on a dataset where the label is all 0 or all 
1, an array out of bounds exception is thrown. The problematic code is

{code:scala}
  initialCoefficientsWithIntercept.toArray(numFeatures)
= math.log(histogram(1) / histogram(0))
}
{code}

The correct behaviour is to short-circuit training entirely when only a single 
label is present (can be detected from {{labelSummarizer}}) and return a 
classifier which assigns all true/false with infinite weights.

  was:
When training LogisticRegression on a dataset where the label is all 0 or all 
1, an array out of bounds exception is thrown. The problematic code is

{code}
  initialCoefficientsWithIntercept.toArray(numFeatures)
= math.log(histogram(1) / histogram(0))
}
{/code}

The correct behaviour is to short-circuit training entirely when only a single 
label is present (can be detected from {{labelSummarizer}}) and return a 
classifier which assigns all true/false with infinite weights.


> ml.classification.LogisticRegression fails when FitIntercept with same-label 
> dataset
> 
>
> Key: SPARK-12804
> URL: https://issues.apache.org/jira/browse/SPARK-12804
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 1.6.0
>Reporter: Feynman Liang
>
> When training LogisticRegression on a dataset where the label is all 0 or all 
> 1, an array out of bounds exception is thrown. The problematic code is
> {code:scala}
>   initialCoefficientsWithIntercept.toArray(numFeatures)
> = math.log(histogram(1) / histogram(0))
> }
> {code}
> The correct behaviour is to short-circuit training entirely when only a 
> single label is present (can be detected from {{labelSummarizer}}) and return 
> a classifier which assigns all true/false with infinite weights.






[jira] [Updated] (SPARK-12804) ml.classification.LogisticRegression fails when FitIntercept with same-label dataset

2016-01-13 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-12804:
--
Description: 
When training LogisticRegression on a dataset where the label is all 0 or all 
1, an array out of bounds exception is thrown. The problematic code is

{code}
  initialCoefficientsWithIntercept.toArray(numFeatures)
= math.log(histogram(1) / histogram(0))
}
{code}

The correct behaviour is to short-circuit training entirely when only a single 
label is present (can be detected from {{labelSummarizer}}) and return a 
classifier which assigns all true/false with infinite weights.

  was:
When training LogisticRegression on a dataset where the label is all 0 or all 
1, an array out of bounds exception is thrown. The problematic code is

{code:scala}
  initialCoefficientsWithIntercept.toArray(numFeatures)
= math.log(histogram(1) / histogram(0))
}
{code}

The correct behaviour is to short-circuit training entirely when only a single 
label is present (can be detected from {{labelSummarizer}}) and return a 
classifier which assigns all true/false with infinite weights.


> ml.classification.LogisticRegression fails when FitIntercept with same-label 
> dataset
> 
>
> Key: SPARK-12804
> URL: https://issues.apache.org/jira/browse/SPARK-12804
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 1.6.0
>Reporter: Feynman Liang
>
> When training LogisticRegression on a dataset where the label is all 0 or all 
> 1, an array out of bounds exception is thrown. The problematic code is
> {code}
>   initialCoefficientsWithIntercept.toArray(numFeatures)
> = math.log(histogram(1) / histogram(0))
> }
> {code}
> The correct behaviour is to short-circuit training entirely when only a 
> single label is present (can be detected from {{labelSummarizer}}) and return 
> a classifier which assigns all true/false with infinite weights.






[jira] [Created] (SPARK-12804) ml.classification.LogisticRegression fails when FitIntercept with same-label dataset

2016-01-13 Thread Feynman Liang (JIRA)
Feynman Liang created SPARK-12804:
-

 Summary: ml.classification.LogisticRegression fails when 
FitIntercept with same-label dataset
 Key: SPARK-12804
 URL: https://issues.apache.org/jira/browse/SPARK-12804
 Project: Spark
  Issue Type: Bug
  Components: ML
Affects Versions: 1.6.0
Reporter: Feynman Liang


When training LogisticRegression on a dataset where the label is all 0 or all 
1, an array out of bounds exception is thrown. The problematic code is

{code}
  initialCoefficientsWithIntercept.toArray(numFeatures)
= math.log(histogram(1) / histogram(0))
}
{code}

The correct behaviour is to short-circuit training entirely when only a single 
label is present (can be detected from {{labelSummarizer}}) and return a 
classifier which assigns all true/false with infinite weights.






[jira] [Commented] (SPARK-12804) ml.classification.LogisticRegression fails when FitIntercept with same-label dataset

2016-01-13 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096064#comment-15096064
 ] 

Feynman Liang commented on SPARK-12804:
---

Please assign this to me.

> ml.classification.LogisticRegression fails when FitIntercept with same-label 
> dataset
> 
>
> Key: SPARK-12804
> URL: https://issues.apache.org/jira/browse/SPARK-12804
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 1.6.0
>Reporter: Feynman Liang
>
> When training LogisticRegression on a dataset where the label is all 0 or all 
> 1, an array out of bounds exception is thrown. The problematic code is
> {code}
>   initialCoefficientsWithIntercept.toArray(numFeatures)
> = math.log(histogram(1) / histogram(0))
> }
> {code}
> The correct behaviour is to short-circuit training entirely when only a 
> single label is present (can be detected from {{labelSummarizer}}) and return 
> a classifier which assigns all true/false with infinite weights.






[jira] [Closed] (SPARK-12779) StringIndexer should handle null

2016-01-12 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang closed SPARK-12779.
-
Resolution: Duplicate

> StringIndexer should handle null
> 
>
> Key: SPARK-12779
> URL: https://issues.apache.org/jira/browse/SPARK-12779
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 1.6.0
>Reporter: Feynman Liang
>
> StringIndexer currently fails with a {{NullPointerException}} when indexing a 
> column containing {{null}}s. It should instead index all {{null}}s into some 
> sentinel value (say -1).






[jira] [Commented] (SPARK-12779) StringIndexer should handle null

2016-01-12 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095057#comment-15095057
 ] 

Feynman Liang commented on SPARK-12779:
---

Yep you're right, thanks!

> StringIndexer should handle null
> 
>
> Key: SPARK-12779
> URL: https://issues.apache.org/jira/browse/SPARK-12779
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 1.6.0
>Reporter: Feynman Liang
>
> StringIndexer currently fails with a {{NullPointerException}} when indexing a 
> column containing {{null}}s. It should instead index all {{null}}s into some 
> sentinel value (say -1).






[jira] [Created] (SPARK-12779) StringIndexer should handle null

2016-01-12 Thread Feynman Liang (JIRA)
Feynman Liang created SPARK-12779:
-

 Summary: StringIndexer should handle null
 Key: SPARK-12779
 URL: https://issues.apache.org/jira/browse/SPARK-12779
 Project: Spark
  Issue Type: Bug
  Components: ML
Affects Versions: 1.6.0
Reporter: Feynman Liang


StringIndexer currently fails with a {{NullPointerException}} when indexing a 
column containing {{null}}s. It should instead index all {{null}}s into some 
sentinel value (say -1).
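
A plain-Python sketch of the proposed behavior (an assumption about the fix, not Spark source): index labels by descending frequency, as StringIndexer does, and send nulls to the sentinel.

```python
from collections import Counter

column = ["a", "b", None, "a", None]

# Order labels by descending frequency, like StringIndexer.
counts = Counter(v for v in column if v is not None)
index = {label: float(i) for i, (label, _) in enumerate(counts.most_common())}

def string_index(value):
    # Nulls map to the sentinel instead of raising.
    return -1.0 if value is None else index[value]

assert string_index(None) == -1.0
assert string_index("a") == 0.0  # most frequent label gets index 0
assert string_index("b") == 1.0
```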






[jira] [Commented] (SPARK-11960) User guide section for streaming a/b testing

2015-11-24 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15025593#comment-15025593
 ] 

Feynman Liang commented on SPARK-11960:
---

[~josephkb] happy to work on it, when is the 1.6 QA deadline?

> User guide section for streaming a/b testing
> 
>
> Key: SPARK-11960
> URL: https://issues.apache.org/jira/browse/SPARK-11960
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, MLlib
>Reporter: Joseph K. Bradley
>Assignee: Feynman Liang
>
> [~fliang] Assigning since you added the feature.  Will you have a chance to 
> do this soon?






[jira] [Commented] (SPARK-9798) CrossValidatorModel Documentation Improvements

2015-09-22 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903657#comment-14903657
 ] 

Feynman Liang commented on SPARK-9798:
--

The actual Scala doc

> CrossValidatorModel Documentation Improvements
> --
>
> Key: SPARK-9798
> URL: https://issues.apache.org/jira/browse/SPARK-9798
> Project: Spark
>  Issue Type: Documentation
>  Components: ML
>Reporter: Feynman Liang
>Priority: Minor
>  Labels: starter
>
> CrossValidatorModel's avgMetrics and bestModel need documentation.






[jira] [Commented] (SPARK-10691) Make LogisticRegressionModel's evaluate method public

2015-09-18 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14875959#comment-14875959
 ] 

Feynman Liang commented on SPARK-10691:
---

Also, +1 for calling it "evaluate".

> Make LogisticRegressionModel's evaluate method public
> -
>
> Key: SPARK-10691
> URL: https://issues.apache.org/jira/browse/SPARK-10691
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 1.5.0
>Reporter: Hao Ren
>
> The following method in {{LogisticRegressionModel}} is marked as {{private}}, 
> which prevents users from creating a summary on any given data set. Check 
> [here|https://github.com/feynmanliang/spark/blob/d219fa4c216e8f35b71a26921561104d15cd6055/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala#L272].
> {code}
> // TODO: decide on a good name before exposing to public API
> private[classification] def evaluate(dataset: DataFrame)
> : LogisticRegressionSummary = {
> new BinaryLogisticRegressionSummary(
> this.transform(dataset), 
> $(probabilityCol), 
> $(labelCol))
> }
> {code}
> This method is definitely necessary to test model performance.
> By the way, the name {{evaluate}} is already pretty good for me.
> [~mengxr] Could you check this? Thanks!






[jira] [Commented] (SPARK-10691) Make LogisticRegressionModel's evaluate method public

2015-09-18 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14875956#comment-14875956
 ] 

Feynman Liang commented on SPARK-10691:
---

We should also create one for linear regression (and link the two issues)

> Make LogisticRegressionModel's evaluate method public
> -
>
> Key: SPARK-10691
> URL: https://issues.apache.org/jira/browse/SPARK-10691
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 1.5.0
>Reporter: Hao Ren
>
> The following method in {{LogisticRegressionModel}} is marked as {{private}}, 
> which prevents users from creating a summary on any given data set. Check 
> [here|https://github.com/feynmanliang/spark/blob/d219fa4c216e8f35b71a26921561104d15cd6055/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala#L272].
> {code}
> // TODO: decide on a good name before exposing to public API
> private[classification] def evaluate(dataset: DataFrame)
> : LogisticRegressionSummary = {
> new BinaryLogisticRegressionSummary(
> this.transform(dataset), 
> $(probabilityCol), 
> $(labelCol))
> }
> {code}
> This method is definitely necessary to test model performance.
> By the way, the name {{evaluate}} is already pretty good for me.
> [~mengxr] Could you check this? Thanks!
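
A simplified plain-Python sketch of the pattern being requested (not the Spark API): a public {{evaluate}} lets users build a summary over any dataset, e.g. held-out data, rather than only the training set.

```python
class BinarySummary:
    def __init__(self, accuracy):
        self.accuracy = accuracy

class Model:
    # Public, so callers can summarize arbitrary (probability, label) data.
    def evaluate(self, dataset):
        hits = sum((p >= 0.5) == (y == 1.0) for p, y in dataset)
        return BinarySummary(hits / len(dataset))

held_out = [(0.9, 1.0), (0.2, 0.0), (0.7, 0.0)]

assert abs(Model().evaluate(held_out).accuracy - 2 / 3) < 1e-9
```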






[jira] [Created] (SPARK-10583) Correctness test for Multilayer Perceptron using Weka Reference

2015-09-13 Thread Feynman Liang (JIRA)
Feynman Liang created SPARK-10583:
-

 Summary: Correctness test for Multilayer Perceptron using Weka 
Reference
 Key: SPARK-10583
 URL: https://issues.apache.org/jira/browse/SPARK-10583
 Project: Spark
  Issue Type: Bug
  Components: ML
Reporter: Feynman Liang


SPARK-9471 adds MLP and a [TODO 
item|https://github.com/apache/spark/blob/6add4eddb39e7748a87da3e921ea3c7881d30a82/mllib/src/test/scala/org/apache/spark/ml/ann/ANNSuite.scala#L28]
 to create a test checking the implementation's learned weights against Weka's 
MLP implementation.

We need to add this as a unit test. The work should include, as a comment, the 
reference Weka code that was run.
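
The test would roughly take this shape (a hypothetical sketch; all weights below are made up for illustration):

```python
# Weights learned by the spark.ml MLP on the fixture dataset.
spark_weights = [0.51, -1.20, 0.33]
# Reference weights, which would come from running the same data
# through Weka's MLP (with the Weka invocation kept as a comment).
weka_weights = [0.50, -1.19, 0.34]

tolerance = 0.05
assert all(
    abs(a - b) < tolerance
    for a, b in zip(spark_weights, weka_weights)
)
```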






[jira] [Commented] (SPARK-9668) ML 1.5 QA: Docs: Check for new APIs

2015-09-11 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741237#comment-14741237
 ] 

Feynman Liang commented on SPARK-9668:
--

Yep, thanks!

> ML 1.5 QA: Docs: Check for new APIs
> ---
>
> Key: SPARK-9668
> URL: https://issues.apache.org/jira/browse/SPARK-9668
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, ML, MLlib
>Reporter: Joseph K. Bradley
>Assignee: Feynman Liang
> Fix For: 1.5.0
>
>
> Check the user guide vs. a list of new APIs (classes, methods, data members) 
> to see what items require updates to the user guide.
> For each feature missing user guide doc:
> * Create a JIRA for that feature, and assign it to the author of the feature
> * Link it to (a) the original JIRA which introduced that feature ("related 
> to") and (b) to this JIRA ("requires").
> Note: Now that we have algorithms in spark.ml which are not in spark.mllib, 
> we should make subsections for the spark.ml API as needed. We can follow the 
> structure of the spark.mllib user guide.
> * The spark.ml user guide can provide: (a) code examples and (b) info on 
> algorithms which do not exist in spark.mllib.
> * We should not duplicate info in the spark.ml guides. Since spark.mllib is 
> still the primary API, we should provide links to the corresponding 
> algorithms in the spark.mllib user guide for more info.






[jira] [Comment Edited] (SPARK-10489) GraphX dataframe wrapper

2015-09-10 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735206#comment-14735206
 ] 

Feynman Liang edited comment on SPARK-10489 at 9/10/15 11:52 PM:
-

Doing this in a separate spark package


was (Author: fliang):
Doing this in a separate spark package 
(https://github.com/databricks/spark-df-graph)

> GraphX dataframe wrapper
> 
>
> Key: SPARK-10489
> URL: https://issues.apache.org/jira/browse/SPARK-10489
> Project: Spark
>  Issue Type: New Feature
>  Components: GraphX
>Reporter: Feynman Liang
>
> We want to wrap GraphX Graph using DataFrames and implement basic high-level 
> algorithms like PageRank. Then we can easily implement Python API, 
> import/export, and other features.
> {code}
> val graph = new GraphF(vDF, eDF)
> {code}






[jira] [Closed] (SPARK-10489) GraphX dataframe wrapper

2015-09-08 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang closed SPARK-10489.
-
Resolution: Won't Fix

Doing this in a separate spark package 
(https://github.com/databricks/spark-df-graph)

> GraphX dataframe wrapper
> 
>
> Key: SPARK-10489
> URL: https://issues.apache.org/jira/browse/SPARK-10489
> Project: Spark
>  Issue Type: New Feature
>  Components: GraphX
>Reporter: Feynman Liang
>
> We want to wrap GraphX Graph using DataFrames and implement basic high-level 
> algorithms like PageRank. Then we can easily implement Python API, 
> import/export, and other features.
> {code}
> val graph = new GraphF(vDF, eDF)
> {code}






[jira] [Created] (SPARK-10489) GraphX dataframe wrapper

2015-09-08 Thread Feynman Liang (JIRA)
Feynman Liang created SPARK-10489:
-

 Summary: GraphX dataframe wrapper
 Key: SPARK-10489
 URL: https://issues.apache.org/jira/browse/SPARK-10489
 Project: Spark
  Issue Type: New Feature
  Components: GraphX
Reporter: Feynman Liang


We want to wrap GraphX Graph using DataFrames and implement basic high-level 
algorithms like PageRank. Then we can easily implement Python API, 
import/export, and other features.

{code}
val graph = new GraphF(vDF, eDF)
{code}






[jira] [Commented] (SPARK-10489) GraphX dataframe wrapper

2015-09-08 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735202#comment-14735202
 ] 

Feynman Liang commented on SPARK-10489:
---

Working on this

> GraphX dataframe wrapper
> 
>
> Key: SPARK-10489
> URL: https://issues.apache.org/jira/browse/SPARK-10489
> Project: Spark
>  Issue Type: New Feature
>  Components: GraphX
>Reporter: Feynman Liang
>
> We want to wrap GraphX Graph using DataFrames and implement basic high-level 
> algorithms like PageRank. Then we can easily implement Python API, 
> import/export, and other features.
> {code}
> val graph = new GraphF(vDF, eDF)
> {code}






[jira] [Commented] (SPARK-10479) LogisticRegression copy should copy model summary if available

2015-09-08 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735053#comment-14735053
 ] 

Feynman Liang commented on SPARK-10479:
---

[~lravindr] Sorry, I didn't know that someone was already working on this. My 
apologies for any work you may have already done.

> LogisticRegression copy should copy model summary if available
> --
>
> Key: SPARK-10479
> URL: https://issues.apache.org/jira/browse/SPARK-10479
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Reporter: Feynman Liang
>Assignee: Yanbo Liang
>Priority: Minor
>  Labels: starter
>
> SPARK-9112 adds LogisticRegressionSummary but [does not copy the model 
> summary if 
> available|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L471]
> We should add behavior similar to that in 
> [LinearRegression.copy|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala#L314]






[jira] [Commented] (SPARK-10479) LogisticRegression copy should copy model summary if available

2015-09-07 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734146#comment-14734146
 ] 

Feynman Liang commented on SPARK-10479:
---

Thanks for your help! Please go ahead and work on it; your comment should be 
enough to let others know. Only committers can assign issues on JIRA.

> LogisticRegression copy should copy model summary if available
> --
>
> Key: SPARK-10479
> URL: https://issues.apache.org/jira/browse/SPARK-10479
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Reporter: Feynman Liang
>Priority: Minor
>  Labels: starter
>
> SPARK-9112 adds LogisticRegressionSummary but [does not copy the model 
> summary if 
> available|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L471]
> We should add behavior similar to that in 
> [LinearRegression.copy|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala#L314]






[jira] [Created] (SPARK-10479) LogisticRegression copy should copy model summary if available

2015-09-07 Thread Feynman Liang (JIRA)
Feynman Liang created SPARK-10479:
-

 Summary: LogisticRegression copy should copy model summary if 
available
 Key: SPARK-10479
 URL: https://issues.apache.org/jira/browse/SPARK-10479
 Project: Spark
  Issue Type: Bug
  Components: ML
Reporter: Feynman Liang
Priority: Minor


SPARK-9112 adds LogisticRegressionSummary but does not update 
[copy|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L471]
 to copy the model summary if available.

We should add behavior similar to that in 
[LinearRegression.copy|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala#L314]
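
A simplified plain-Python sketch of the requested copy behavior (not Spark source): the copy should carry the training summary over when present.

```python
class Model:
    def __init__(self, coefficients, summary=None):
        self.coefficients = coefficients
        self.summary = summary

    def copy(self):
        copied = Model(list(self.coefficients))
        if self.summary is not None:  # the step this issue asks for
            copied.summary = self.summary
        return copied

trained = Model([1.0, 2.0], summary="training summary")

assert trained.copy().summary == "training summary"
```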






[jira] [Updated] (SPARK-10479) LogisticRegression copy should copy model summary if available

2015-09-07 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10479:
--
Description: 
SPARK-9112 adds LogisticRegressionSummary but [does not copy the model summary 
if 
available|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L471]

We should add behavior similar to that in 
[LinearRegression.copy|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala#L314]

  was:
SPARK-9112 adds LogisticRegressionSummary but does not update 
[copy|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L471]
 to copy the model summary if available.

We should add behavior similar to that in 
[LinearRegression.copy|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala#L314]


> LogisticRegression copy should copy model summary if available
> --
>
> Key: SPARK-10479
> URL: https://issues.apache.org/jira/browse/SPARK-10479
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Reporter: Feynman Liang
>Priority: Minor
>  Labels: starter
>
> SPARK-9112 adds LogisticRegressionSummary but [does not copy the model 
> summary if 
> available|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L471]
> We should add behavior similar to that in 
> [LinearRegression.copy|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala#L314]






[jira] [Created] (SPARK-10478) Improve spark.ml.ann implementations for MLP

2015-09-07 Thread Feynman Liang (JIRA)
Feynman Liang created SPARK-10478:
-

 Summary: Improve spark.ml.ann implementations for MLP
 Key: SPARK-10478
 URL: https://issues.apache.org/jira/browse/SPARK-10478
 Project: Spark
  Issue Type: Bug
  Components: ML
Reporter: Feynman Liang
Priority: Critical


SPARK-9471 adds an implementation of multi-layer perceptrons. However, there 
are a few issues with the current code that should be addressed, namely:

* Style guide: 4-space indentation for method arguments, braces around one-line 
{{if}}s, punctuation in scaladocs
* Comments: cryptic variable names (e.g. {{gw}}, {{gb}}, {{gwb}}) should be 
documented
* Performance: parts of the code can be improved using Breeze broadcasts, 
vectorized operations, and Breeze UFuncs to avoid manual iteration and leverage 
BLAS optimizations
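The performance point can be illustrated with a toy example in plain Scala (the real spark.ml.ann code would use Breeze UFuncs and broadcasts, which can also dispatch to optimized BLAS kernels; plain arrays are used here so the snippet is self-contained):

```scala
// A manual index loop over weight/gradient arrays, as in the current code...
def gradStepLoop(w: Array[Double], g: Array[Double], lr: Double): Array[Double] = {
  val out = new Array[Double](w.length)
  var i = 0
  while (i < w.length) {
    out(i) = w(i) - lr * g(i)
    i += 1
  }
  out
}

// ...versus a whole-array formulation; with Breeze DenseVectors this
// would read `w - lr * g` and could hit BLAS-backed implementations.
def gradStepWholeArray(w: Array[Double], g: Array[Double], lr: Double): Array[Double] =
  w.zip(g).map { case (wi, gi) => wi - lr * gi }
```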






[jira] [Commented] (SPARK-10199) Avoid using reflections for parquet model save

2015-08-31 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14724138#comment-14724138
 ] 

Feynman Liang commented on SPARK-10199:
---

CC [~mengxr] [~josephkb]

> Avoid using reflections for parquet model save
> --
>
> Key: SPARK-10199
> URL: https://issues.apache.org/jira/browse/SPARK-10199
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> These items are not high priority since the overhead of writing to Parquet is 
> much greater than that of runtime reflection.
> Multiple model save/load in MLlib use case classes to infer a schema for the 
> data frame saved to Parquet. However, inferring a schema from case classes or 
> tuples uses [runtime 
> reflection|https://github.com/apache/spark/blob/d7b4c095271c36fcc7f9ded267ecf5ec66fac803/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L361]
>  which is unnecessary since the types are already known at the time {{save}} is 
> called.
> It would be better to just specify the schema for the data frame directly 
> using {{sqlContext.createDataFrame(dataRDD, schema)}}
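The contrast can be sketched in plain Scala (no Spark; {{ClusterData}} is a stand-in case class, and the "schema" here is just name/type pairs rather than a real {{StructType}}): reflection walks the fields at runtime, while the explicit form simply states types that are already known when {{save}} is written.

```scala
// Toy contrast (plain Scala, no Spark): deriving a "schema" from a case
// class via runtime reflection vs. declaring it explicitly. The real
// suggestion is a hand-written StructType passed to
// sqlContext.createDataFrame(dataRDD, schema).
case class ClusterData(id: Int, point: Array[Double])

// Reflection-based: inspect the fields at runtime.
def inferredFields: Set[(String, String)] =
  classOf[ClusterData].getDeclaredFields
    .filterNot(_.isSynthetic) // drop compiler-generated fields
    .map(f => f.getName -> f.getType.getSimpleName)
    .toSet

// Explicit: the field types are known when save() is written, so state them.
val explicitFields: Set[(String, String)] =
  Set("id" -> "int", "point" -> "double[]")
```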






[jira] [Commented] (SPARK-7454) Perf test for power iteration clustering (PIC)

2015-08-31 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723666#comment-14723666
 ] 

Feynman Liang commented on SPARK-7454:
--

[~mengxr] [~josephkb] can we close this since PR 86 was merged?

> Perf test for power iteration clustering (PIC)
> --
>
> Key: SPARK-7454
> URL: https://issues.apache.org/jira/browse/SPARK-7454
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Affects Versions: 1.4.0
>Reporter: Xiangrui Meng
>







[jira] [Commented] (SPARK-10199) Avoid using reflections for parquet model save

2015-08-30 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721907#comment-14721907
 ] 

Feynman Liang commented on SPARK-10199:
---

[~vinodkc] Thanks! I think these results are convincing. Let's see what others 
think, but FWIW I'm all for these changes, particularly because they set a 
precedent for future model save/load code to specify the schema explicitly.

> Avoid using reflections for parquet model save
> --
>
> Key: SPARK-10199
> URL: https://issues.apache.org/jira/browse/SPARK-10199
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> These items are not high priority since the overhead of writing to Parquet is 
> much greater than that of runtime reflection.
> Multiple model save/load in MLlib use case classes to infer a schema for the 
> data frame saved to Parquet. However, inferring a schema from case classes or 
> tuples uses [runtime 
> reflection|https://github.com/apache/spark/blob/d7b4c095271c36fcc7f9ded267ecf5ec66fac803/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L361]
>  which is unnecessary since the types are already known at the time {{save}} is 
> called.
> It would be better to just specify the schema for the data frame directly 
> using {{sqlContext.createDataFrame(dataRDD, schema)}}






[jira] [Updated] (SPARK-10351) UnsafeRow.getUTF8String should handle off-heap backed UnsafeRow

2015-08-29 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10351:
--
Summary: UnsafeRow.getUTF8String should handle off-heap backed UnsafeRow  
(was: UnsafeRow.getString should handle off-heap backed UnsafeRow)

> UnsafeRow.getUTF8String should handle off-heap backed UnsafeRow
> ---
>
> Key: SPARK-10351
> URL: https://issues.apache.org/jira/browse/SPARK-10351
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Feynman Liang
>Priority: Critical
>
> {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which 
> returns {{null}} when passed a {{null}} base object, failing to handle 
> off-heap backed {{UnsafeRow}}s correctly.
> This will also cause a {{NullPointerException}} when {{getString}} is called 
> with off-heap storage.
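A toy model of the failure mode in plain Scala (no Spark; {{fromAddressBuggy}} and {{UnsafeRowToy}} are stand-in names, and a {{null}} byte-array base stands in for off-heap storage): returning {{null}} for a null base object, instead of reading through the raw address, surfaces later as an NPE in {{getString}}.

```scala
// Toy model of the bug (names assumed, no Spark): the accessor returns
// null when the base object is null -- the off-heap case -- instead of
// reading through the raw address, so the caller NPEs downstream.
def fromAddressBuggy(base: Array[Byte], offset: Int, numBytes: Int): String =
  if (base == null) null // BUG: off-heap rows have a null base object
  else new String(base, offset, numBytes, "UTF-8")

class UnsafeRowToy(base: Array[Byte], offset: Int, numBytes: Int) {
  def getUTF8String: String = fromAddressBuggy(base, offset, numBytes)
  def getString: String = getUTF8String.toString // NPE for off-heap rows
}
```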






[jira] [Commented] (SPARK-10351) UnsafeRow.getString should handle off-heap backed UnsafeRow

2015-08-29 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721325#comment-14721325
 ] 

Feynman Liang commented on SPARK-10351:
---

Sorry, the fix is for {{getUTF8String}}. {{getString}} is the method which 
causes the {{NullPointerException}}. Updated title.

> UnsafeRow.getString should handle off-heap backed UnsafeRow
> ---
>
> Key: SPARK-10351
> URL: https://issues.apache.org/jira/browse/SPARK-10351
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Feynman Liang
>Priority: Critical
>
> {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which 
> returns {{null}} when passed a {{null}} base object, failing to handle 
> off-heap backed {{UnsafeRow}}s correctly.
> This will also cause a {{NullPointerException}} when {{getString}} is called 
> with off-heap storage.






[jira] [Closed] (SPARK-10352) Replace SQLTestData internal usages of String with UTF8String

2015-08-29 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang closed SPARK-10352.
-
Resolution: Not A Problem

Caused by my code not respecting that {{InternalRow}} may only contain 
{{UTF8String}}, never {{java.lang.String}}.

> Replace SQLTestData internal usages of String with UTF8String
> -
>
> Key: SPARK-10352
> URL: https://issues.apache.org/jira/browse/SPARK-10352
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Feynman Liang
>
> Running the code:
> {code}
> val inputString = "abc"
> val row = InternalRow.apply(inputString)
> val unsafeRow = 
> UnsafeProjection.create(Array[DataType](StringType)).apply(row)
> {code}
> generates the error:
> {code}
> [info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
> org.apache.spark.unsafe.types.UTF8String
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
> ***snip***
> {code}
> Although {{StringType}} should in theory only have internal type 
> {{UTF8String}}, we [are inconsistent with this 
> constraint|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L131]
>  and being more strict would [break existing 
> code|https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala#L41]
>  
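The constraint behind the resolution can be modeled in plain Scala (no Spark; {{Utf8}} and {{InternalRowToy}} are stand-ins for {{UTF8String}} and {{InternalRow}}): internal rows store values as {{Any}} and readers cast to the expected internal type, so storing a raw {{java.lang.String}} only fails at read time with exactly this {{ClassCastException}}.

```scala
// Toy model (plain Scala, no Spark): readers cast to the internal type,
// so a java.lang.String stored where a UTF8String belongs blows up only
// when the field is read.
final case class Utf8(bytes: Array[Byte]) // stand-in for UTF8String
object Utf8 { def fromString(s: String): Utf8 = Utf8(s.getBytes("UTF-8")) }

class InternalRowToy(values: Array[Any]) {
  def getUtf8(i: Int): Utf8 = values(i).asInstanceOf[Utf8] // like getUTF8String
}

val bad  = new InternalRowToy(Array("abc"))                  // raw String: wrong
val good = new InternalRowToy(Array(Utf8.fromString("abc"))) // converted: right
```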






[jira] [Updated] (SPARK-10352) Replace SQLTestData internal usages of String with UTF8String

2015-08-29 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10352:
--
Summary: Replace SQLTestData internal usages of String with UTF8String  
(was: Replace internal usages of String with UTF8String)

> Replace SQLTestData internal usages of String with UTF8String
> -
>
> Key: SPARK-10352
> URL: https://issues.apache.org/jira/browse/SPARK-10352
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Feynman Liang
>
> Running the code:
> {code}
> val inputString = "abc"
> val row = InternalRow.apply(inputString)
> val unsafeRow = 
> UnsafeProjection.create(Array[DataType](StringType)).apply(row)
> {code}
> generates the error:
> {code}
> [info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
> org.apache.spark.unsafe.types.UTF8String
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
> ***snip***
> {code}
> Although {{StringType}} should in theory only have internal type 
> {{UTF8String}}, we [are inconsistent with this 
> constraint|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L131]
>  and being more strict would [break existing 
> code|https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala#L41]
>  






[jira] [Updated] (SPARK-10352) Replace internal usages of String with UTF8String

2015-08-29 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10352:
--
Summary: Replace internal usages of String with UTF8String  (was: 
BaseGenericInternalRow.getUTF8String should support java.lang.String)

> Replace internal usages of String with UTF8String
> -
>
> Key: SPARK-10352
> URL: https://issues.apache.org/jira/browse/SPARK-10352
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Feynman Liang
>
> Running the code:
> {code}
> val inputString = "abc"
> val row = InternalRow.apply(inputString)
> val unsafeRow = 
> UnsafeProjection.create(Array[DataType](StringType)).apply(row)
> {code}
> generates the error:
> {code}
> [info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
> org.apache.spark.unsafe.types.UTF8String
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
> ***snip***
> {code}
> Although {{StringType}} should in theory only have internal type 
> {{UTF8String}}, we [are inconsistent with this 
> constraint|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L131]
>  and being more strict would [break existing 
> code|https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala#L41]
>  






[jira] [Updated] (SPARK-10351) UnsafeRow.getUTF8String should handle off-heap backed UnsafeRow

2015-08-29 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10351:
--
Summary: UnsafeRow.getUTF8String should handle off-heap backed UnsafeRow  
(was: UnsafeRow.getUTF8String should handle off-heap memory)

> UnsafeRow.getUTF8String should handle off-heap backed UnsafeRow
> ---
>
> Key: SPARK-10351
> URL: https://issues.apache.org/jira/browse/SPARK-10351
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Feynman Liang
>Priority: Critical
>
> {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which 
> returns {{null}} when passed a {{null}} base object, failing to handle 
> off-heap backed {{UnsafeRow}}s correctly.
> This will also cause a {{NullPointerException}} when {{getString}} is called 
> with off-heap storage.






[jira] [Updated] (SPARK-10351) UnsafeRow.getString should handle off-heap backed UnsafeRow

2015-08-29 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10351:
--
Summary: UnsafeRow.getString should handle off-heap backed UnsafeRow  (was: 
UnsafeRow.getUTF8String should handle off-heap backed UnsafeRow)

> UnsafeRow.getString should handle off-heap backed UnsafeRow
> ---
>
> Key: SPARK-10351
> URL: https://issues.apache.org/jira/browse/SPARK-10351
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Feynman Liang
>Priority: Critical
>
> {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which 
> returns {{null}} when passed a {{null}} base object, failing to handle 
> off-heap backed {{UnsafeRow}}s correctly.
> This will also cause a {{NullPointerException}} when {{getString}} is called 
> with off-heap storage.






[jira] [Updated] (SPARK-10351) UnsafeRow.getUTF8String should handle off-heap memory

2015-08-29 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10351:
--
Description: 
{{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which 
returns {{null}} when passed a {{null}} base object, failing to handle off-heap 
backed {{UnsafeRow}}s correctly.

This will also cause a {{NullPointerException}} when {{getString}} is called 
with off-heap storage.

  was:
{{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which 
returns {{null}} when passed a {{null}} base object, failing to handle off-heap 
memory correctly.

This will also cause a {{NullPointerException}} when {{getString}} is called 
with off-heap storage.


> UnsafeRow.getUTF8String should handle off-heap memory
> -
>
> Key: SPARK-10351
> URL: https://issues.apache.org/jira/browse/SPARK-10351
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Feynman Liang
>Priority: Critical
>
> {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which 
> returns {{null}} when passed a {{null}} base object, failing to handle 
> off-heap backed {{UnsafeRow}}s correctly.
> This will also cause a {{NullPointerException}} when {{getString}} is called 
> with off-heap storage.






[jira] [Updated] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String

2015-08-29 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10352:
--
Description: 
Running the code:
{code}
val inputString = "abc"
val row = InternalRow.apply(inputString)
val unsafeRow = 
UnsafeProjection.create(Array[DataType](StringType)).apply(row)
{code}
generates the error:
{code}
[info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
org.apache.spark.unsafe.types.UTF8String
[info]   at 
org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
***snip***
{code}

Although {{StringType}} should in theory only have internal type 
{{UTF8String}}, we [are inconsistent with this 
constraint|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L131]
 and being more strict would [break existing 
code|https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala#L41]
 

  was:
Running the code:
{code}
val inputString = "abc"
val row = InternalRow.apply(inputString)
val unsafeRow = 
UnsafeProjection.create(Array[DataType](StringType)).apply(row)
{code}
generates the error:
{code}
[info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
org.apache.spark.unsafe.types.UTF8String
[info]   at 
org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
***snip***
{code}

Although `StringType` should in theory only have internal type `UTF8String`, we 
[are inconsistent with this 
constraint|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L131]
 and being more strict would [break existing 
code|https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala#L41]
 


> BaseGenericInternalRow.getUTF8String should support java.lang.String
> 
>
> Key: SPARK-10352
> URL: https://issues.apache.org/jira/browse/SPARK-10352
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Feynman Liang
>
> Running the code:
> {code}
> val inputString = "abc"
> val row = InternalRow.apply(inputString)
> val unsafeRow = 
> UnsafeProjection.create(Array[DataType](StringType)).apply(row)
> {code}
> generates the error:
> {code}
> [info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
> org.apache.spark.unsafe.types.UTF8String
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
> ***snip***
> {code}
> Although {{StringType}} should in theory only have internal type 
> {{UTF8String}}, we [are inconsistent with this 
> constraint|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L131]
>  and being more strict would [break existing 
> code|https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala#L41]
>  






[jira] [Updated] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String

2015-08-29 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10352:
--
Description: 
Running the code:
{code}
val inputString = "abc"
val row = InternalRow.apply(inputString)
val unsafeRow = 
UnsafeProjection.create(Array[DataType](StringType)).apply(row)
{code}
generates the error:
{code}
[info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
org.apache.spark.unsafe.types.UTF8String
[info]   at 
org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
***snip***
{code}

Although `StringType` should in theory only have internal type `UTF8String`, we 
[are inconsistent with this 
constraint|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L131]
 and being more strict would [break existing 
code|https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala#L41]
 

  was:
Running the code:
{code}
val inputString = "abc"
val row = InternalRow.apply(inputString)
val unsafeRow = 
UnsafeProjection.create(Array[DataType](StringType)).apply(row)
{code}
generates the error:
{code}
[info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
org.apache.spark.unsafe.types.UTF8String
[info]   at 
org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
***snip***
{code}


> BaseGenericInternalRow.getUTF8String should support java.lang.String
> 
>
> Key: SPARK-10352
> URL: https://issues.apache.org/jira/browse/SPARK-10352
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Feynman Liang
>
> Running the code:
> {code}
> val inputString = "abc"
> val row = InternalRow.apply(inputString)
> val unsafeRow = 
> UnsafeProjection.create(Array[DataType](StringType)).apply(row)
> {code}
> generates the error:
> {code}
> [info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
> org.apache.spark.unsafe.types.UTF8String
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
> ***snip***
> {code}
> Although `StringType` should in theory only have internal type `UTF8String`, 
> we [are inconsistent with this 
> constraint|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L131]
>  and being more strict would [break existing 
> code|https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala#L41]
>  






[jira] [Updated] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String

2015-08-29 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10352:
--
Description: 
Running the code:
{code}
val inputString = "abc"
val row = InternalRow.apply(inputString)
val unsafeRow = 
UnsafeProjection.create(Array[DataType](StringType)).apply(row)
{code}
generates the error:
{code}
[info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
org.apache.spark.unsafe.types.UTF8String
[info]   at 
org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
***snip***
{code}

  was:
Running the code:
{code:scala}
val inputString = "abc"
val row = InternalRow.apply(inputString)
val unsafeRow = 
UnsafeProjection.create(Array[DataType](StringType)).apply(row)
{code}
generates the error:
{code}
[info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
org.apache.spark.unsafe.types.UTF8String
[info]   at 
org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
***snip***
{code}


> BaseGenericInternalRow.getUTF8String should support java.lang.String
> 
>
> Key: SPARK-10352
> URL: https://issues.apache.org/jira/browse/SPARK-10352
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Feynman Liang
>
> Running the code:
> {code}
> val inputString = "abc"
> val row = InternalRow.apply(inputString)
> val unsafeRow = 
> UnsafeProjection.create(Array[DataType](StringType)).apply(row)
> {code}
> generates the error:
> {code}
> [info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
> org.apache.spark.unsafe.types.UTF8String
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
> ***snip***
> {code}






[jira] [Updated] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String

2015-08-29 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10352:
--
Description: 
Running the code:
{code:scala}
val inputString = "abc"
val row = InternalRow.apply(inputString)
val unsafeRow = 
UnsafeProjection.create(Array[DataType](StringType)).apply(row)
{code}
generates the error:
{code}
[info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
org.apache.spark.unsafe.types.UTF8String
[info]   at 
org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
***snip***
{code}

  was:
Running the code:
{code}
val inputString = "abc"
val row = InternalRow.apply(inputString)
val unsafeRow = 
UnsafeProjection.create(Array[DataType](StringType)).apply(row)
{code}
generates the error:
{code}
[info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
org.apache.spark.unsafe.types.UTF8String
[info]   at 
org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
***snip***
{code}


> BaseGenericInternalRow.getUTF8String should support java.lang.String
> 
>
> Key: SPARK-10352
> URL: https://issues.apache.org/jira/browse/SPARK-10352
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Feynman Liang
>
> Running the code:
> {code:scala}
> val inputString = "abc"
> val row = InternalRow.apply(inputString)
> val unsafeRow = 
> UnsafeProjection.create(Array[DataType](StringType)).apply(row)
> {code}
> generates the error:
> {code}
> [info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
> org.apache.spark.unsafe.types.UTF8String
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
> ***snip***
> {code}






[jira] [Updated] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String

2015-08-29 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10352:
--
Description: 
Running the code:
{code scala}
val inputString = "abc"
val row = InternalRow.apply(inputString)
val unsafeRow = 
UnsafeProjection.create(Array[DataType](StringType)).apply(row)
{code}
generates the error:
{code}
[info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
org.apache.spark.unsafe.types.UTF8String
[info]   at 
org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
***snip***
{code}

  was:
Running the code:
{code}
val inputString = "abc"
val row = InternalRow.apply(inputString)
val unsafeRow = 
UnsafeProjection.create(Array[DataType](StringType)).apply(row)
{/code}
generates the error:
{code}
[info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
org.apache.spark.unsafe.types.UTF8String
[info]   at 
org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
***snip***
{/code}


> BaseGenericInternalRow.getUTF8String should support java.lang.String
> 
>
> Key: SPARK-10352
> URL: https://issues.apache.org/jira/browse/SPARK-10352
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Feynman Liang
>
> Running the code:
> {code scala}
> val inputString = "abc"
> val row = InternalRow.apply(inputString)
> val unsafeRow = 
> UnsafeProjection.create(Array[DataType](StringType)).apply(row)
> {code}
> generates the error:
> {code}
> [info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
> org.apache.spark.unsafe.types.UTF8String
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
> ***snip***
> {code}






[jira] [Updated] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String

2015-08-29 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10352:
--
Description: 
Running the code:
{{code}}
val inputString = "abc"
val row = InternalRow.apply(inputString)
val unsafeRow = 
UnsafeProjection.create(Array[DataType](StringType)).apply(row)
{{/code}}
generates the error:
{{code}}
[info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
org.apache.spark.unsafe.types.UTF8String
[info]   at 
org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
***snip***
{{/code}}

  was:
Running the code:
{{
val inputString = "abc"
val row = InternalRow.apply(inputString)
val unsafeRow = 
UnsafeProjection.create(Array[DataType](StringType)).apply(row)
}}
generates the error:
{{[info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
org.apache.spark.unsafe.types.UTF8String
[info]   at 
org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
***snip***}}


> BaseGenericInternalRow.getUTF8String should support java.lang.String
> 
>
> Key: SPARK-10352
> URL: https://issues.apache.org/jira/browse/SPARK-10352
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Feynman Liang
>
> Running the code:
> {{code}}
> val inputString = "abc"
> val row = InternalRow.apply(inputString)
> val unsafeRow = 
> UnsafeProjection.create(Array[DataType](StringType)).apply(row)
> {{/code}}
> generates the error:
> {{code}}
> [info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
> org.apache.spark.unsafe.types.UTF8String
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
> ***snip***
> {{/code}}






[jira] [Updated] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String

2015-08-29 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10352:
--
Description: 
Running the code:
{code}
val inputString = "abc"
val row = InternalRow.apply(inputString)
val unsafeRow = 
UnsafeProjection.create(Array[DataType](StringType)).apply(row)
{code}
generates the error:
{code}
[info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
org.apache.spark.unsafe.types.UTF8String
[info]   at 
org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
***snip***
{code}

  was:
Running the code:
{code scala}
val inputString = "abc"
val row = InternalRow.apply(inputString)
val unsafeRow = 
UnsafeProjection.create(Array[DataType](StringType)).apply(row)
{code}
generates the error:
{code}
[info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
org.apache.spark.unsafe.types.UTF8String
[info]   at 
org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
***snip***
{code}


> BaseGenericInternalRow.getUTF8String should support java.lang.String
> 
>
> Key: SPARK-10352
> URL: https://issues.apache.org/jira/browse/SPARK-10352
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Feynman Liang
>
> Running the code:
> {code}
> val inputString = "abc"
> val row = InternalRow.apply(inputString)
> val unsafeRow = 
> UnsafeProjection.create(Array[DataType](StringType)).apply(row)
> {code}
> generates the error:
> {code}
> [info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
> org.apache.spark.unsafe.types.UTF8String
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
> ***snip***
> {code}






[jira] [Updated] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String

2015-08-29 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10352:
--
Description: 
Running the code:
{code}
val inputString = "abc"
val row = InternalRow.apply(inputString)
val unsafeRow = 
UnsafeProjection.create(Array[DataType](StringType)).apply(row)
{/code}
generates the error:
{code}
[info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
org.apache.spark.unsafe.types.UTF8String
[info]   at 
org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
***snip***
{/code}

  was:
Running the code:
{{code}}
val inputString = "abc"
val row = InternalRow.apply(inputString)
val unsafeRow = 
UnsafeProjection.create(Array[DataType](StringType)).apply(row)
{{/code}}
generates the error:
{{code}}
[info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
org.apache.spark.unsafe.types.UTF8String
[info]   at 
org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
***snip***
{{/code}}


> BaseGenericInternalRow.getUTF8String should support java.lang.String
> 
>
> Key: SPARK-10352
> URL: https://issues.apache.org/jira/browse/SPARK-10352
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Feynman Liang
>
> Running the code:
> {code}
> val inputString = "abc"
> val row = InternalRow.apply(inputString)
> val unsafeRow = 
> UnsafeProjection.create(Array[DataType](StringType)).apply(row)
> {/code}
> generates the error:
> {code}
> [info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
> org.apache.spark.unsafe.types.UTF8String
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
> ***snip***
> {/code}






[jira] [Commented] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String

2015-08-29 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721289#comment-14721289
 ] 

Feynman Liang commented on SPARK-10352:
---

Working on a PR.

[~rxin] can you confirm that this is a bug?

> BaseGenericInternalRow.getUTF8String should support java.lang.String
> 
>
> Key: SPARK-10352
> URL: https://issues.apache.org/jira/browse/SPARK-10352
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Feynman Liang
>
> Running the code:
> {{
> val inputString = "abc"
> val row = InternalRow.apply(inputString)
> val unsafeRow = 
> UnsafeProjection.create(Array[DataType](StringType)).apply(row)
> }}
> generates the error:
> {{[info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
> org.apache.spark.unsafe.types.UTF8String
> [info]   at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
> ***snip***}}






[jira] [Created] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String

2015-08-29 Thread Feynman Liang (JIRA)
Feynman Liang created SPARK-10352:
-

 Summary: BaseGenericInternalRow.getUTF8String should support 
java.lang.String
 Key: SPARK-10352
 URL: https://issues.apache.org/jira/browse/SPARK-10352
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Feynman Liang


Running the code:
{{
val inputString = "abc"
val row = InternalRow.apply(inputString)
val unsafeRow = 
UnsafeProjection.create(Array[DataType](StringType)).apply(row)
}}
generates the error:
{{[info]   java.lang.ClassCastException: java.lang.String cannot be cast to 
org.apache.spark.unsafe.types.UTF8String
[info]   at 
org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
***snip***}}






[jira] [Comment Edited] (SPARK-10351) UnsafeRow.getUTF8String should handle off-heap memory

2015-08-29 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721286#comment-14721286
 ] 

Feynman Liang edited comment on SPARK-10351 at 8/29/15 11:12 PM:
-

I'm working on a PR to fix this.

[~rxin] is this a bug or actually intended behavior (and I'm just not 
interpreting correctly)?


was (Author: fliang):
I'm working on a PR to make my use case work.

[~rxin] is this a bug or actually intended behavior (and I'm just not 
interpreting correctly)?

> UnsafeRow.getUTF8String should handle off-heap memory
> -
>
> Key: SPARK-10351
> URL: https://issues.apache.org/jira/browse/SPARK-10351
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Feynman Liang
>Priority: Critical
>
> {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which 
> returns {{null}} when passed a {{null}} base object, failing to handle 
> off-heap memory correctly.
> This will also cause a {{NullPointerException}} when {{getString}} is called 
> with off-heap storage.






[jira] [Updated] (SPARK-10351) UnsafeRow.getUTF8String should handle off-heap memory

2015-08-29 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10351:
--
Description: 
{{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which 
returns {{null}} when passed a {{null}} base object, failing to handle off-heap 
memory correctly.

This will also cause a {{NullPointerException}} when {{getString}} is called 
with off-heap storage.

  was:{{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which 
returns {{null}} when passed a {{null}} base object, failing to handle off-heap 
memory correctly. 


> UnsafeRow.getUTF8String should handle off-heap memory
> -
>
> Key: SPARK-10351
> URL: https://issues.apache.org/jira/browse/SPARK-10351
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Feynman Liang
>Priority: Critical
>
> {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which 
> returns {{null}} when passed a {{null}} base object, failing to handle 
> off-heap memory correctly.
> This will also cause a {{NullPointerException}} when {{getString}} is called 
> with off-heap storage.
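The hazard can be sketched without Spark internals (all names below are hypothetical analogues, not the real API): a {{fromAddress}}-style factory that returns {{null}} for a {{null}} base object defers the failure to whichever caller first dereferences the result, which is exactly the {{NullPointerException}} described above:

```scala
// Hypothetical analogue of a string read from a (base object, offset) pair.
// In the real code a null base object signals off-heap storage; here the
// lenient factory just returns null, as the issue reports.
final case class Str(bytes: Array[Byte]) {
  def asString: String = new String(bytes, "UTF-8")
}

def fromAddressLenient(base: Array[Byte], offset: Int, len: Int): Str =
  if (base == null) null // silently swallows the off-heap case
  else Str(base.slice(offset, offset + len))

// On-heap reads work fine.
val onHeap = fromAddressLenient("xyz-abc".getBytes("UTF-8"), 4, 3)
println(onHeap.asString) // abc

// An off-heap-style read (null base) yields a null that blows up later,
// far from the call that actually went wrong.
val offHeap = fromAddressLenient(null, 0, 3)
val npe =
  try { offHeap.asString; false }
  catch { case _: NullPointerException => true }
println(npe) // true
```

The correct behavior in the real code would be to read from the absolute address in the off-heap case, not to return {{null}}; the sketch only shows why the reported behavior surfaces as a deferred NPE.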






[jira] [Updated] (SPARK-10351) UnsafeRow.getUTF8String should handle off-heap memory

2015-08-29 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10351:
--
Description: {{UnsafeRow.getUTF8String}} delegates to 
{{UTF8String.fromAddress}} which returns {{null}} when passed a {{null}} base 
object, failing to handle off-heap memory correctly.   (was: 
{{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which does 
not handle off-heap memory correctly. )

> UnsafeRow.getUTF8String should handle off-heap memory
> -
>
> Key: SPARK-10351
> URL: https://issues.apache.org/jira/browse/SPARK-10351
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Feynman Liang
>Priority: Critical
>
> {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which 
> returns {{null}} when passed a {{null}} base object, failing to handle 
> off-heap memory correctly. 






[jira] [Commented] (SPARK-10351) UnsafeRow.getUTF8String should handle off-heap memory

2015-08-29 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721286#comment-14721286
 ] 

Feynman Liang commented on SPARK-10351:
---

I'm working on a PR to make my use case work.

[~rxin] is this a bug or actually intended behavior (and I'm just not 
interpreting correctly)?

> UnsafeRow.getUTF8String should handle off-heap memory
> -
>
> Key: SPARK-10351
> URL: https://issues.apache.org/jira/browse/SPARK-10351
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Feynman Liang
>Priority: Critical
>
> {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which 
> does not handle off-heap memory correctly. 






[jira] [Created] (SPARK-10351) UnsafeRow.getUTF8String should handle off-heap memory

2015-08-29 Thread Feynman Liang (JIRA)
Feynman Liang created SPARK-10351:
-

 Summary: UnsafeRow.getUTF8String should handle off-heap memory
 Key: SPARK-10351
 URL: https://issues.apache.org/jira/browse/SPARK-10351
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Feynman Liang
Priority: Critical


{{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which does 
not handle off-heap memory correctly. 






[jira] [Comment Edited] (SPARK-7454) Perf test for power iteration clustering (PIC)

2015-08-28 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720860#comment-14720860
 ] 

Feynman Liang edited comment on SPARK-7454 at 8/29/15 12:54 AM:


[~mengxr] do you mind assigning to me? Thanks!


was (Author: fliang):
[~mengxr] do you mind assigning to me and linking my PR, thanks!

> Perf test for power iteration clustering (PIC)
> --
>
> Key: SPARK-7454
> URL: https://issues.apache.org/jira/browse/SPARK-7454
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Affects Versions: 1.4.0
>Reporter: Xiangrui Meng
>







[jira] [Commented] (SPARK-7454) Perf test for power iteration clustering (PIC)

2015-08-28 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720860#comment-14720860
 ] 

Feynman Liang commented on SPARK-7454:
--

[~mengxr] do you mind assigning to me and linking my PR, thanks!

> Perf test for power iteration clustering (PIC)
> --
>
> Key: SPARK-7454
> URL: https://issues.apache.org/jira/browse/SPARK-7454
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Affects Versions: 1.4.0
>Reporter: Xiangrui Meng
>







[jira] [Commented] (SPARK-7454) Perf test for power iteration clustering (PIC)

2015-08-28 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720854#comment-14720854
 ] 

Feynman Liang commented on SPARK-7454:
--

https://github.com/databricks/spark-perf/pull/86

> Perf test for power iteration clustering (PIC)
> --
>
> Key: SPARK-7454
> URL: https://issues.apache.org/jira/browse/SPARK-7454
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Affects Versions: 1.4.0
>Reporter: Xiangrui Meng
>







[jira] [Commented] (SPARK-10199) Avoid using reflections for parquet model save

2015-08-28 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720277#comment-14720277
 ] 

Feynman Liang commented on SPARK-10199:
---

[~vinodkc] would it be possible to get some microbenchmarks? You can surround 
the call to 
[createDataFrame|https://github.com/apache/spark/pull/8507/files#diff-13d1de98ab7ae677f9b345eb90a8b8e8R237]
 with some timing code before and after the change.
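A generic way to do that (a minimal sketch; `time` is a hypothetical helper, not a Spark utility):

```scala
// Minimal wall-clock timing helper for ad-hoc microbenchmarks.
def time[A](label: String)(block: => A): A = {
  val t0 = System.nanoTime()
  val result = block // run the code under measurement exactly once
  val elapsedMs = (System.nanoTime() - t0) / 1e6
  println(f"$label took $elapsedMs%.2f ms")
  result
}

// Stand-in workload; in the PR one would wrap the createDataFrame call instead.
val total = time("sum") { (1L to 1000000L).sum }
println(total) // 500000500000
```

For anything beyond a sanity check, run the block several times and discard the warm-up iterations, since JIT compilation dominates the first runs.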

> Avoid using reflections for parquet model save
> --
>
> Key: SPARK-10199
> URL: https://issues.apache.org/jira/browse/SPARK-10199
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> These items are not high priority since the overhead of writing to Parquet is 
> much greater than that of runtime reflection.
> Multiple model save/load in MLlib use case classes to infer a schema for the 
> data frame saved to Parquet. However, inferring a schema from case classes or 
> tuples uses [runtime 
> reflection|https://github.com/apache/spark/blob/d7b4c095271c36fcc7f9ded267ecf5ec66fac803/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L361]
>  which is unnecessary since the types are already known at the time `save` is 
> called.
> It would be better to just specify the schema for the data frame directly 
> using {{sqlContext.createDataFrame(dataRDD, schema)}}






[jira] [Created] (SPARK-10333) Add user guide for linear-methods.md columns

2015-08-28 Thread Feynman Liang (JIRA)
Feynman Liang created SPARK-10333:
-

 Summary: Add user guide for linear-methods.md columns
 Key: SPARK-10333
 URL: https://issues.apache.org/jira/browse/SPARK-10333
 Project: Spark
  Issue Type: Documentation
  Components: ML
Reporter: Feynman Liang
Priority: Minor


Add example code to document input output columns based on 
https://github.com/apache/spark/pull/8491 feedback






[jira] [Commented] (SPARK-10253) Remove Guava dependencies in MLlib java tests

2015-08-26 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715175#comment-14715175
 ] 

Feynman Liang commented on SPARK-10253:
---

I believe only committers can assign JIRAs.

Let's keep the discussion about the number of JIRAs vs PR size in SPARK-7751.

> Remove Guava dependencies in MLlib java tests
> -
>
> Key: SPARK-10253
> URL: https://issues.apache.org/jira/browse/SPARK-10253
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> Many tests depend on Google Guava's {{Lists.newArrayList}} when 
> {{java.util.Arrays.asList}} could be used instead.
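The replacement is mechanical; a sketch in Scala for brevity (the affected tests are Java, where the call is identical):

```scala
import java.util.Arrays

// Guava:  List<String> xs = Lists.newArrayList("a", "b");
// JDK equivalent, with no extra dependency:
val xs: java.util.List[String] = Arrays.asList("a", "b")
println(xs.size()) // 2
```

One caveat: {{Arrays.asList}} returns a fixed-size list, unlike Guava's {{newArrayList}}, so the swap is only safe where a test does not add or remove elements.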






[jira] [Commented] (SPARK-7751) Add @Since annotation to stable and experimental methods in MLlib

2015-08-26 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715157#comment-14715157
 ] 

Feynman Liang commented on SPARK-7751:
--

If we are worried about differing solutions across the JIRAs, I think providing an 
example PR and asking contributors to follow suit can address that concern. That's 
what was done in this JIRA, and for the most part it has been effective.

I agree that keeping the noise low on issues@ is important, but a smaller 
number of JIRAs is in direct tension with keeping PRs small and easy to review. 
It's not clear what the solution is here, but IMO discussing the right balance 
is a good direction.

> Add @Since annotation to stable and experimental methods in MLlib
> -
>
> Key: SPARK-7751
> URL: https://issues.apache.org/jira/browse/SPARK-7751
> Project: Spark
>  Issue Type: Umbrella
>  Components: Documentation, MLlib
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>Priority: Minor
>  Labels: starter
>
> This is useful to check whether a feature exists in some version of Spark. 
> This is an umbrella JIRA to track the progress. We want to have -@since tag- 
> @Since annotation for both stable (those without any 
> Experimental/DeveloperApi/AlphaComponent annotations) and experimental 
> methods in MLlib:
> (Do NOT tag private or package private classes or methods, nor local 
> variables and methods.)
> * an example PR for Scala: https://github.com/apache/spark/pull/8309
> We need to dig through the git commit history to figure out which Spark 
> version first introduced a method. Take `NaiveBayes.setModelType` as an 
> example: we can grep for `def setModelType` at different version git tags.
> {code}
> meng@xm:~/src/spark
> $ git show 
> v1.3.0:mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
>  | grep "def setModelType"
> meng@xm:~/src/spark
> $ git show 
> v1.4.0:mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
>  | grep "def setModelType"
>   def setModelType(modelType: String): NaiveBayes = {
> {code}
> If there are better ways, please let us know.
> We cannot add all -@since tags- @Since annotations in a single PR, which would 
> be hard to review, so we made subtasks for each package, for example 
> `org.apache.spark.classification`. Feel free to add more sub-tasks for Python 
> and the `spark.ml` package.
> Plan:
> 1. In 1.5, we try to add @Since annotation to all stable/experimental methods 
> under `spark.mllib`.
> 2. Starting from 1.6, we require @Since annotation in all new PRs.
> 3. In 1.6, we try to add @Since annotation to all stable/experimental methods 
> under `spark.ml`, `pyspark.mllib`, and `pyspark.ml`.






[jira] [Commented] (SPARK-10199) Avoid using reflections for parquet model save

2015-08-26 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715148#comment-14715148
 ] 

Feynman Liang commented on SPARK-10199:
---

Awesome, thanks! You can tag that PR with the parent JIRA (SPARK-10199) then.

> Avoid using reflections for parquet model save
> --
>
> Key: SPARK-10199
> URL: https://issues.apache.org/jira/browse/SPARK-10199
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> These items are not high priority since the overhead of writing to Parquet is 
> much greater than that of runtime reflection.
> Multiple model save/load in MLlib use case classes to infer a schema for the 
> data frame saved to Parquet. However, inferring a schema from case classes or 
> tuples uses [runtime 
> reflection|https://github.com/apache/spark/blob/d7b4c095271c36fcc7f9ded267ecf5ec66fac803/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L361]
>  which is unnecessary since the types are already known at the time `save` is 
> called.
> It would be better to just specify the schema for the data frame directly 
> using {{sqlContext.createDataFrame(dataRDD, schema)}}






[jira] [Updated] (SPARK-10208) Specify schema during LocalLDAModel.save to avoid reflection

2015-08-26 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10208:
--
Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-10199

> Specify schema during LocalLDAModel.save to avoid reflection
> 
>
> Key: SPARK-10208
> URL: https://issues.apache.org/jira/browse/SPARK-10208
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> [LocalLDAModel.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala#L389]
>  currently infers a schema from a case class when the schema is known and 
> should be manually provided.
> See parent JIRA for rationale.






[jira] [Updated] (SPARK-10211) Specify schema during MatrixFactorizationModel.save to avoid reflection

2015-08-26 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10211:
--
Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-10199

> Specify schema during MatrixFactorizationModel.save to avoid reflection
> ---
>
> Key: SPARK-10211
> URL: https://issues.apache.org/jira/browse/SPARK-10211
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> [MatrixFactorizationModel.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala#L361]
>  currently infers a schema from an RDD of tuples when the schema is known and 
> should be manually provided.
> See parent JIRA for rationale.






[jira] [Updated] (SPARK-10205) Specify schema during PowerIterationClustering.save to avoid reflection

2015-08-26 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10205:
--
Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-10199

> Specify schema during PowerIterationClustering.save to avoid reflection
> ---
>
> Key: SPARK-10205
> URL: https://issues.apache.org/jira/browse/SPARK-10205
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> [PowerIterationClustering.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala#L82]
>  currently infers a schema from a case class when the schema is known and 
> should be manually provided.
> See parent JIRA for rationale.






[jira] [Updated] (SPARK-10212) Specify schema during TreeEnsembleModel.save to avoid reflection

2015-08-26 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10212:
--
Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-10199

> Specify schema during TreeEnsembleModel.save to avoid reflection
> 
>
> Key: SPARK-10212
> URL: https://issues.apache.org/jira/browse/SPARK-10212
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> [TreeEnsembleModel.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/tree/model/treeEnsembleModels.scala#L451]
>  currently infers a schema from an RDD of {{NodeData}} case classes when the 
> schema is known and should be manually provided.
> See parent JIRA for rationale.






[jira] [Updated] (SPARK-10206) Specify schema during IsotonicRegression.save to avoid reflection

2015-08-26 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10206:
--
Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-10199

> Specify schema during IsotonicRegression.save to avoid reflection
> -
>
> Key: SPARK-10206
> URL: https://issues.apache.org/jira/browse/SPARK-10206
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> [IsotonicRegression.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala#L184]
>  currently infers a schema from a case class when the schema is known and 
> should be manually provided.
> See parent JIRA for rationale.






[jira] [Updated] (SPARK-10213) Specify schema during DecisionTreeModel.save to avoid reflection

2015-08-26 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10213:
--
Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-10199

> Specify schema during DecisionTreeModel.save to avoid reflection
> 
>
> Key: SPARK-10213
> URL: https://issues.apache.org/jira/browse/SPARK-10213
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> [DecisionTreeModel.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/tree/model/DecisionTreeModel.scala#L238]
>  currently infers a schema from a {{NodeData}} case class when the schema is 
> known and should be manually provided.
> See parent JIRA for rationale.






[jira] [Updated] (SPARK-10209) Specify schema during DistributedLDAModel.save to avoid reflection

2015-08-26 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10209:
--
Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-10199

> Specify schema during DistributedLDAModel.save to avoid reflection
> --
>
> Key: SPARK-10209
> URL: https://issues.apache.org/jira/browse/SPARK-10209
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> [DistributedLDAModel.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala#L783]
>  currently infers a schema from a case class when the schema is known and 
> should be manually provided.
> See parent JIRA for rationale.






[jira] [Updated] (SPARK-10203) Specify schema during GLMClassificationModel.save to avoid reflection

2015-08-26 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10203:
--
Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-10199

> Specify schema during GLMClassificationModel.save to avoid reflection
> -
>
> Key: SPARK-10203
> URL: https://issues.apache.org/jira/browse/SPARK-10203
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> [GLMClassificationModel.save|https://github.com/apache/spark/blob/3c0156899dc1ec1f7dfe6d7c8af47fa6dc7d00bf/mllib/src/main/scala/org/apache/spark/mllib/classification/impl/GLMClassificationModel.scala#L38]
>  currently infers a schema from a case class when the schema is known and 
> should be manually provided.
> See parent JIRA for rationale.






[jira] [Updated] (SPARK-10202) Specify schema during KMeansModel.save to avoid reflection

2015-08-26 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10202:
--
Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-10199

> Specify schema during KMeansModel.save to avoid reflection
> --
>
> Key: SPARK-10202
> URL: https://issues.apache.org/jira/browse/SPARK-10202
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> [KMeansModel.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansModel.scala#L110]
>  currently infers a schema from a case class when the schema is known and 
> should be manually provided.
> See parent JIRA for rationale.






[jira] [Updated] (SPARK-10201) Specify schema during GaussianMixtureModel.save to avoid reflection

2015-08-26 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10201:
--
Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-10199

> Specify schema during GaussianMixtureModel.save to avoid reflection
> ---
>
> Key: SPARK-10201
> URL: https://issues.apache.org/jira/browse/SPARK-10201
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> [GaussianMixtureModel.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala#L140]
>  currently infers a schema from a case class when the schema is known and 
> should be manually provided.
> See parent JIRA for rationale.






[jira] [Updated] (SPARK-10204) Specify schema during NaiveBayes.save to avoid reflection

2015-08-26 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10204:
--
Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-10199

> Specify schema during NaiveBayes.save to avoid reflection
> -
>
> Key: SPARK-10204
> URL: https://issues.apache.org/jira/browse/SPARK-10204
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> [NaiveBayes.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala#L181]
>  currently infers a schema from a case class when the schema is known and 
> should be manually provided.
> See parent JIRA for rationale.






[jira] [Updated] (SPARK-10200) Specify schema during GLMRegressionModel.save to avoid reflection

2015-08-26 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10200:
--
Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-10199

> Specify schema during GLMRegressionModel.save to avoid reflection
> -
>
> Key: SPARK-10200
> URL: https://issues.apache.org/jira/browse/SPARK-10200
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> [GLMRegressionModel.save|https://github.com/apache/spark/blob/d7b4c095271c36fcc7f9ded267ecf5ec66fac803/mllib/src/main/scala/org/apache/spark/mllib/regression/impl/GLMRegressionModel.scala#L44]
>  currently infers a schema from a case class when the schema is known and 
> should be manually provided.
> See parent JIRA for rationale.






[jira] [Updated] (SPARK-10207) Specify schema during Word2Vec.save to avoid reflection

2015-08-26 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10207:
--
Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-10199

> Specify schema during Word2Vec.save to avoid reflection
> ---
>
> Key: SPARK-10207
> URL: https://issues.apache.org/jira/browse/SPARK-10207
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> [Word2Vec.save|https://github.com/apache/spark/blob/7cfc0750e14f2c1b3847e4720cc02150253525a9/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala#L615]
>  currently infers a schema from a case class when the schema is known and 
> should be manually provided.
> See parent JIRA for rationale.






[jira] [Commented] (SPARK-10207) Specify schema during Word2Vec.save to avoid reflection

2015-08-26 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715141#comment-14715141
 ] 

Feynman Liang commented on SPARK-10207:
---

[~srowen] I linked them using "is required by" but in retrospect I think making 
these subtasks of SPARK-10199 is more appropriate, thanks for bringing that up! 
I will make that change.

Let's discuss the issue of large numbers of logically similar issues in 
SPARK-7751.

> Specify schema during Word2Vec.save to avoid reflection
> ---
>
> Key: SPARK-10207
> URL: https://issues.apache.org/jira/browse/SPARK-10207
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> [Word2Vec.save|https://github.com/apache/spark/blob/7cfc0750e14f2c1b3847e4720cc02150253525a9/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala#L615]
>  currently infers a schema from a case class when the schema is known and 
> should be manually provided.
> See parent JIRA for rationale.






[jira] [Commented] (SPARK-10199) Avoid using reflections for parquet model save

2015-08-25 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712492#comment-14712492
 ] 

Feynman Liang commented on SPARK-10199:
---

Hi [~vinodkc], I saw that you took all of these issues. Thanks for your help! 
To make things easier for review, do you mind grouping all the changes into a 
single PR?

> Avoid using reflections for parquet model save
> --
>
> Key: SPARK-10199
> URL: https://issues.apache.org/jira/browse/SPARK-10199
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> These items are not high priority since the overhead of writing to Parquet is 
> much greater than that of runtime reflection.
> Multiple model save/load in MLlib use case classes to infer a schema for the 
> data frame saved to Parquet. However, inferring a schema from case classes or 
> tuples uses [runtime 
> reflection|https://github.com/apache/spark/blob/d7b4c095271c36fcc7f9ded267ecf5ec66fac803/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L361]
>  which is unnecessary since the types are already known at the time `save` is 
> called.
> It would be better to just specify the schema for the data frame directly 
> using {{sqlContext.createDataFrame(dataRDD, schema)}}
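
To illustrate the proposal, here is a minimal sketch of declaring the schema explicitly instead of letting Spark infer it via reflection. The class name, column names, and variables ({{rowRDD}}, {{sqlContext}}) are hypothetical placeholders, not taken from any particular model's {{save}} implementation:

```java
// Illustrative sketch only: the schema below is hypothetical. The point is
// that StructType is declared up front, so SQLContext.createDataFrame never
// needs to reflect over a case class at save() time.
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class ExplicitSchemaSave {
    static DataFrame toDataFrame(SQLContext sqlContext, JavaRDD<Row> rowRDD) {
        // The field types are already known when save() is called, so spell
        // them out rather than inferring them by runtime reflection.
        StructType schema = new StructType(new StructField[] {
            new StructField("weight", DataTypes.DoubleType, false, Metadata.empty()),
            new StructField("index", DataTypes.IntegerType, false, Metadata.empty())
        });
        return sqlContext.createDataFrame(rowRDD, schema);
    }
}
```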






[jira] [Created] (SPARK-10257) Remove Guava dependencies in spark.mllib JavaTests

2015-08-25 Thread Feynman Liang (JIRA)
Feynman Liang created SPARK-10257:
-

 Summary: Remove Guava dependencies in spark.mllib JavaTests
 Key: SPARK-10257
 URL: https://issues.apache.org/jira/browse/SPARK-10257
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Reporter: Feynman Liang
Priority: Minor









[jira] [Updated] (SPARK-10254) Remove Guava dependencies in spark.ml.feature

2015-08-25 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10254:
--
Priority: Minor  (was: Major)

> Remove Guava dependencies in spark.ml.feature
> -
>
> Key: SPARK-10254
> URL: https://issues.apache.org/jira/browse/SPARK-10254
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Feynman Liang
>Priority: Minor
>







[jira] [Created] (SPARK-10256) Remove Guava dependencies in spark.ml.classification

2015-08-25 Thread Feynman Liang (JIRA)
Feynman Liang created SPARK-10256:
-

 Summary: Remove Guava dependencies in spark.ml.classification
 Key: SPARK-10256
 URL: https://issues.apache.org/jira/browse/SPARK-10256
 Project: Spark
  Issue Type: Improvement
  Components: ML
Reporter: Feynman Liang
Priority: Minor









[jira] [Created] (SPARK-10255) Remove Guava dependencies in spark.ml.param

2015-08-25 Thread Feynman Liang (JIRA)
Feynman Liang created SPARK-10255:
-

 Summary: Remove Guava dependencies in spark.ml.param
 Key: SPARK-10255
 URL: https://issues.apache.org/jira/browse/SPARK-10255
 Project: Spark
  Issue Type: Improvement
  Components: ML
Reporter: Feynman Liang
Priority: Minor









[jira] [Created] (SPARK-10254) Remove Guava dependencies in spark.ml.feature

2015-08-25 Thread Feynman Liang (JIRA)
Feynman Liang created SPARK-10254:
-

 Summary: Remove Guava dependencies in spark.ml.feature
 Key: SPARK-10254
 URL: https://issues.apache.org/jira/browse/SPARK-10254
 Project: Spark
  Issue Type: Improvement
  Components: ML
Reporter: Feynman Liang









[jira] [Commented] (SPARK-10253) Remove Guava dependencies in MLlib java tests

2015-08-25 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712443#comment-14712443
 ] 

Feynman Liang commented on SPARK-10253:
---

Working on this

> Remove Guava dependencies in MLlib java tests
> -
>
> Key: SPARK-10253
> URL: https://issues.apache.org/jira/browse/SPARK-10253
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> Many tests depend on Google Guava's {{Lists.newArrayList}} when 
> {{java.util.Arrays.asList}} could be used instead.






[jira] [Created] (SPARK-10253) Remove Guava dependencies in MLlib java tests

2015-08-25 Thread Feynman Liang (JIRA)
Feynman Liang created SPARK-10253:
-

 Summary: Remove Guava dependencies in MLlib java tests
 Key: SPARK-10253
 URL: https://issues.apache.org/jira/browse/SPARK-10253
 Project: Spark
  Issue Type: Improvement
  Components: ML, MLlib
Reporter: Feynman Liang
Priority: Minor


Many tests depend on Google Guava's {{Lists.newArrayList}} when 
{{java.util.Arrays.asList}} could be used instead.
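
The replacement the issue proposes is mechanical; a minimal standalone sketch (list contents are arbitrary):

```java
import java.util.Arrays;
import java.util.List;

public class GuavaToJdkLists {
    public static void main(String[] args) {
        // Before (Guava): List<Double> weights = Lists.newArrayList(1.0, 2.0, 3.0);
        // After (JDK only): Arrays.asList returns a fixed-size list backed by
        // the array, which is sufficient for test fixtures that are never resized.
        List<Double> weights = Arrays.asList(1.0, 2.0, 3.0);
        System.out.println(weights);
    }
}
```

One caveat: the list from {{Arrays.asList}} is fixed-size, so {{add}} and {{remove}} throw {{UnsupportedOperationException}}; any test that mutates the list's length would still need a real {{ArrayList}}.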






[jira] [Commented] (SPARK-9680) Update programming guide section for ml.feature.StopWordsRemover

2015-08-25 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712374#comment-14712374
 ] 

Feynman Liang commented on SPARK-9680:
--

[~holdenk] phew ok, I just wanted to make sure my PR didn't redo anything 
you've already worked on :)

> Update programming guide section for ml.feature.StopWordsRemover
> 
>
> Key: SPARK-9680
> URL: https://issues.apache.org/jira/browse/SPARK-9680
> Project: Spark
>  Issue Type: Documentation
>  Components: ML
>Reporter: yuhao yang
>Assignee: Feynman Liang
>Priority: Minor
>  Labels: document
>







[jira] [Created] (SPARK-10249) Add Python Code Example to StopWordsRemover User GUide

2015-08-25 Thread Feynman Liang (JIRA)
Feynman Liang created SPARK-10249:
-

 Summary: Add Python Code Example to StopWordsRemover User GUide
 Key: SPARK-10249
 URL: https://issues.apache.org/jira/browse/SPARK-10249
 Project: Spark
  Issue Type: Improvement
  Components: ML
Reporter: Feynman Liang
Priority: Minor









[jira] [Updated] (SPARK-10249) Add Python Code Example to StopWordsRemover User Guide

2015-08-25 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10249:
--
Summary: Add Python Code Example to StopWordsRemover User Guide  (was: Add 
Python Code Example to StopWordsRemover User GUide)

> Add Python Code Example to StopWordsRemover User Guide
> --
>
> Key: SPARK-10249
> URL: https://issues.apache.org/jira/browse/SPARK-10249
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Feynman Liang
>Priority: Minor
>







[jira] [Commented] (SPARK-9680) Update programming guide section for ml.feature.StopWordsRemover

2015-08-25 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712125#comment-14712125
 ] 

Feynman Liang commented on SPARK-9680:
--

[~mengxr] please assign to me

> Update programming guide section for ml.feature.StopWordsRemover
> 
>
> Key: SPARK-9680
> URL: https://issues.apache.org/jira/browse/SPARK-9680
> Project: Spark
>  Issue Type: Documentation
>  Components: ML
>Reporter: yuhao yang
>Priority: Minor
>  Labels: document
>







[jira] [Closed] (SPARK-9796) LogisticRegressionModel Docs Completeness

2015-08-25 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang closed SPARK-9796.

Resolution: Not A Problem

> LogisticRegressionModel Docs Completeness
> -
>
> Key: SPARK-9796
> URL: https://issues.apache.org/jira/browse/SPARK-9796
> Project: Spark
>  Issue Type: Documentation
>  Components: ML
>Reporter: Feynman Liang
>Priority: Minor
>  Labels: starter
>
> Add docs for
>  * LogisticRegressionModel$.load
>  * LogisticRegressionModel.save
>  * LogisticRegressionModel.toString()






[jira] [Commented] (SPARK-9796) LogisticRegressionModel Docs Completeness

2015-08-25 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711857#comment-14711857
 ] 

Feynman Liang commented on SPARK-9796:
--

Closing since this problem appears only in how unidoc handles @Since 
annotations for overridden methods; the actual generated scaladocs are fine.

> LogisticRegressionModel Docs Completeness
> -
>
> Key: SPARK-9796
> URL: https://issues.apache.org/jira/browse/SPARK-9796
> Project: Spark
>  Issue Type: Documentation
>  Components: ML
>Reporter: Feynman Liang
>Priority: Minor
>  Labels: starter
>
> Add docs for
>  * LogisticRegressionModel$.load
>  * LogisticRegressionModel.save
>  * LogisticRegressionModel.toString()






[jira] [Commented] (SPARK-9799) SVMModel documentation improvements

2015-08-25 Thread Feynman Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711855#comment-14711855
 ] 

Feynman Liang commented on SPARK-9799:
--

Closing since this problem appears only in how unidoc handles @Since 
annotations for overridden methods; the actual generated scaladocs are fine.

> SVMModel documentation improvements
> ---
>
> Key: SPARK-9799
> URL: https://issues.apache.org/jira/browse/SPARK-9799
> Project: Spark
>  Issue Type: Documentation
>  Components: MLlib
>Reporter: Feynman Liang
>Priority: Minor
>  Labels: starter
>
> SVMModel missing descriptions in documentation for save, load, and toString






[jira] [Closed] (SPARK-9799) SVMModel documentation improvements

2015-08-25 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang closed SPARK-9799.

Resolution: Not A Problem

> SVMModel documentation improvements
> ---
>
> Key: SPARK-9799
> URL: https://issues.apache.org/jira/browse/SPARK-9799
> Project: Spark
>  Issue Type: Documentation
>  Components: MLlib
>Reporter: Feynman Liang
>Priority: Minor
>  Labels: starter
>
> SVMModel missing descriptions in documentation for save, load, and toString






[jira] [Updated] (SPARK-10199) Avoid using reflections for parquet model save

2015-08-25 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10199:
--
Description: 
These items are not high priority since the overhead of writing to Parquet is 
much greater than that of runtime reflection.

Multiple model save/load in MLlib use case classes to infer a schema for the 
data frame saved to Parquet. However, inferring a schema from case classes or 
tuples uses [runtime 
reflection|https://github.com/apache/spark/blob/d7b4c095271c36fcc7f9ded267ecf5ec66fac803/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L361]
 which is unnecessary since the types are already known at the time `save` is 
called.

It would be better to just specify the schema for the data frame directly using 
{{sqlContext.createDataFrame(dataRDD, schema)}}

  was:
Multiple model save/load in MLlib use case classes to infer a schema for the 
data frame saved to Parquet. However, inferring a schema from case classes or 
tuples uses [runtime 
reflection|https://github.com/apache/spark/blob/d7b4c095271c36fcc7f9ded267ecf5ec66fac803/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L361]
 which is unnecessary since the types are already known at the time `save` is 
called.

It would be better to just specify the schema for the data frame directly using 
{{sqlContext.createDataFrame(dataRDD, schema)}}


> Avoid using reflections for parquet model save
> --
>
> Key: SPARK-10199
> URL: https://issues.apache.org/jira/browse/SPARK-10199
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> These items are not high priority since the overhead of writing to Parquet is 
> much greater than that of runtime reflection.
> Multiple model save/load in MLlib use case classes to infer a schema for the 
> data frame saved to Parquet. However, inferring a schema from case classes or 
> tuples uses [runtime 
> reflection|https://github.com/apache/spark/blob/d7b4c095271c36fcc7f9ded267ecf5ec66fac803/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L361]
>  which is unnecessary since the types are already known at the time `save` is 
> called.
> It would be better to just specify the schema for the data frame directly 
> using {{sqlContext.createDataFrame(dataRDD, schema)}}






[jira] [Updated] (SPARK-10205) Specify schema during PowerIterationClustering.save to avoid reflection

2015-08-25 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10205:
--
Priority: Minor  (was: Major)

> Specify schema during PowerIterationClustering.save to avoid reflection
> ---
>
> Key: SPARK-10205
> URL: https://issues.apache.org/jira/browse/SPARK-10205
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> [PowerIterationClustering.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala#L82]
>  currently infers a schema from a case class when the schema is known and 
> should be manually provided.
> See parent JIRA for rationale.






[jira] [Updated] (SPARK-10200) Specify schema during GLMRegressionModel.save to avoid reflection

2015-08-25 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10200:
--
Target Version/s:   (was: 1.6.0)

> Specify schema during GLMRegressionModel.save to avoid reflection
> -
>
> Key: SPARK-10200
> URL: https://issues.apache.org/jira/browse/SPARK-10200
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> [GLMRegressionModel.save|https://github.com/apache/spark/blob/d7b4c095271c36fcc7f9ded267ecf5ec66fac803/mllib/src/main/scala/org/apache/spark/mllib/regression/impl/GLMRegressionModel.scala#L44]
>  currently infers a schema from a case class when the schema is known and 
> should be manually provided.
> See parent JIRA for rationale.






[jira] [Updated] (SPARK-10207) Specify schema during Word2Vec.save to avoid reflection

2015-08-25 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10207:
--
Priority: Minor  (was: Major)

> Specify schema during Word2Vec.save to avoid reflection
> ---
>
> Key: SPARK-10207
> URL: https://issues.apache.org/jira/browse/SPARK-10207
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> [Word2Vec.save|https://github.com/apache/spark/blob/7cfc0750e14f2c1b3847e4720cc02150253525a9/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala#L615]
>  currently infers a schema from a case class when the schema is known and 
> should be manually provided.
> See parent JIRA for rationale.






[jira] [Updated] (SPARK-10202) Specify schema during KMeansModel.save to avoid reflection

2015-08-25 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10202:
--
Target Version/s:   (was: 1.6.0)

> Specify schema during KMeansModel.save to avoid reflection
> --
>
> Key: SPARK-10202
> URL: https://issues.apache.org/jira/browse/SPARK-10202
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> [KMeansModel.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansModel.scala#L110]
>  currently infers a schema from a case class when the schema is known and 
> should be manually provided.
> See parent JIRA for rationale.






[jira] [Updated] (SPARK-10206) Specify schema during IsotonicRegression.save to avoid reflection

2015-08-25 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10206:
--
Target Version/s:   (was: 1.6.0)

> Specify schema during IsotonicRegression.save to avoid reflection
> -
>
> Key: SPARK-10206
> URL: https://issues.apache.org/jira/browse/SPARK-10206
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> [IsotonicRegression.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala#L184]
>  currently infers a schema from a case class when the schema is known and 
> should be manually provided.
> See parent JIRA for rationale.






[jira] [Updated] (SPARK-10201) Specify schema during GaussianMixtureModel.save to avoid reflection

2015-08-25 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10201:
--
Target Version/s:   (was: 1.6.0)

> Specify schema during GaussianMixtureModel.save to avoid reflection
> ---
>
> Key: SPARK-10201
> URL: https://issues.apache.org/jira/browse/SPARK-10201
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> [GaussianMixtureModel.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala#L140]
>  currently infers a schema from a case class when the schema is known and 
> should be manually provided.
> See parent JIRA for rationale.






[jira] [Updated] (SPARK-10201) Specify schema during GaussianMixtureModel.save to avoid reflection

2015-08-25 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10201:
--
Priority: Minor  (was: Major)

> Specify schema during GaussianMixtureModel.save to avoid reflection
> ---
>
> Key: SPARK-10201
> URL: https://issues.apache.org/jira/browse/SPARK-10201
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> [GaussianMixtureModel.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala#L140]
>  currently infers a schema from a case class when the schema is known and 
> should be manually provided.
> See parent JIRA for rationale.






[jira] [Updated] (SPARK-10208) Specify schema during LocalLDAModel.save to avoid reflection

2015-08-25 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10208:
--
Target Version/s:   (was: 1.6.0)

> Specify schema during LocalLDAModel.save to avoid reflection
> 
>
> Key: SPARK-10208
> URL: https://issues.apache.org/jira/browse/SPARK-10208
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> [LocalLDAModel.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala#L389]
>  currently uses runtime reflection on a case class to infer a schema, even 
> though the schema is known in advance and should be specified explicitly.
> See parent JIRA for rationale.






[jira] [Updated] (SPARK-10205) Specify schema during PowerIterationClustering.save to avoid reflection

2015-08-25 Thread Feynman Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Feynman Liang updated SPARK-10205:
--
Target Version/s:   (was: 1.6.0)

> Specify schema during PowerIterationClustering.save to avoid reflection
> ---
>
> Key: SPARK-10205
> URL: https://issues.apache.org/jira/browse/SPARK-10205
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Feynman Liang
>Priority: Minor
>
> [PowerIterationClustering.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala#L82]
>  currently uses runtime reflection on a case class to infer a schema, even 
> though the schema is known in advance and should be specified explicitly.
> See parent JIRA for rationale.





