[jira] [Commented] (SPARK-12804) ml.classification.LogisticRegression fails when FitIntercept with same-label dataset
[ https://issues.apache.org/jira/browse/SPARK-12804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15103742#comment-15103742 ] Feynman Liang commented on SPARK-12804: --- [~josephkb] the two issues are slightly different; SPARK-12732 addresses the case where fitIntercept=false and all labels are the same, which LiR currently treats the same as if fitIntercept=true. I'll make sure that my fix doesn't introduce that bug. > ml.classification.LogisticRegression fails when FitIntercept with same-label > dataset > > > Key: SPARK-12804 > URL: https://issues.apache.org/jira/browse/SPARK-12804 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 1.6.0 >Reporter: Feynman Liang >Assignee: Feynman Liang > > When training LogisticRegression on a dataset where the label is all 0 or all > 1, an array out of bounds exception is thrown. The problematic code is > {code} > initialCoefficientsWithIntercept.toArray(numFeatures) > = math.log(histogram(1) / histogram(0)) > } > {code} > The correct behaviour is to short-circuit training entirely when only a > single label is present (can be detected from {{labelSummarizer}}) and return > a classifier which assigns all true/false with infinite weights. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
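The fix sketched in the ticket can be illustrated without Spark. Below is a minimal plain-Python sketch (the function name and the `[count_0, count_1]` histogram shape are illustrative, not Spark internals) of why the prior log-odds initialization breaks on a same-label dataset and how the proposed short-circuit would behave:

```python
import math

def initial_intercept(histogram):
    # `histogram` holds the per-class label counts [count_of_0s, count_of_1s].
    # Short-circuit when only a single label is present, as the ticket proposes:
    if histogram[1] == 0:
        return float("-inf")  # only label 0 seen: always predict false
    if histogram[0] == 0:
        return float("inf")   # only label 1 seen: always predict true
    # Mixed labels: the prior log-odds, i.e. math.log(histogram(1) / histogram(0))
    return math.log(histogram[1] / histogram[0])
```

Without the two guards, a same-label dataset would hit `log(0)` or a division by zero (the Spark code instead fails earlier with the out-of-bounds access the ticket describes).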
[jira] [Created] (SPARK-12810) PySpark CrossValidatorModel should support avgMetrics
Feynman Liang created SPARK-12810: - Summary: PySpark CrossValidatorModel should support avgMetrics Key: SPARK-12810 URL: https://issues.apache.org/jira/browse/SPARK-12810 Project: Spark Issue Type: Improvement Components: ML, PySpark Reporter: Feynman Liang The {{CrossValidator}} in Scala supports {{avgMetrics}} since 1.5.0, which allows the user to evaluate how well each {{ParamMap}} in the grid search performed and identify the best parameters. We should support this in PySpark as well.
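The use case being requested can be sketched in plain Python (not PySpark; the parameter names and metric values below are hypothetical): `avgMetrics` pairs one cross-validated score with each `ParamMap`, so exposing it lets users recover which grid point won:

```python
# One hypothetical cross-validated score per ParamMap in the grid.
param_maps = [{"regParam": 0.01}, {"regParam": 0.1}, {"regParam": 1.0}]
avg_metrics = [0.82, 0.88, 0.75]

# With both lists exposed, picking the best parameters is a one-liner.
best_params, best_metric = max(zip(param_maps, avg_metrics), key=lambda t: t[1])
```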
[jira] [Updated] (SPARK-12806) Support SQL expressions extracting values from VectorUDT
[ https://issues.apache.org/jira/browse/SPARK-12806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-12806: -- Description: Use cases exist where a specific index within a {{VectorUDT}} column of a {{DataFrame}} is required. For example, we may be interested in extracting a specific class probability from the {{probabilityCol}} of a {{LogisticRegression}} to compute losses. However, if {{probability}} is a column of {{df}} with type {{VectorUDT}}, the following code fails: {code} df.select("probability.0") AnalysisException: u"Can't extract value from probability" {code} thrown from {{sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala}}. {{VectorUDT}} essentially wraps a {{StructType}}, hence one would expect it to support value extraction Expressions in an analogous way. was: Use cases exist where a specific index within a {VectorUDT} column of a {DataFrame} is required. For example, we may be interested in extracting a specific class probability from the {probabilityCol} of a {LogisticRegression} to compute losses. However, if {probability} is a column of {df} with type {VectorUDT}, the following code fails: {code} df.select("probability.0") AnalysisException: u"Can't extract value from probability" {code} thrown from {sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala}. {VectorUDT} essentially wraps a {StructType}, hence one would expect it to support value extraction Expressions in an analogous way. > Support SQL expressions extracting values from VectorUDT > > > Key: SPARK-12806 > URL: https://issues.apache.org/jira/browse/SPARK-12806 > Project: Spark > Issue Type: Improvement > Components: MLlib, SQL >Affects Versions: 1.6.0 >Reporter: Feynman Liang > > Use cases exist where a specific index within a {{VectorUDT}} column of a > {{DataFrame}} is required. 
For example, we may be interested in extracting a > specific class probability from the > {{probabilityCol}} of a > {{LogisticRegression}} to compute losses. However, if {{probability}} is a > column of {{df}} with type {{VectorUDT}}, the following code fails: > {code} > df.select("probability.0") > AnalysisException: u"Can't extract value from probability" > {code} > thrown from > {{sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala}}. > {{VectorUDT}} essentially wraps a {{StructType}}, hence one would expect it > to support value extraction Expressions in an analogous way.
[jira] [Created] (SPARK-12806) Support SQL expressions extracting values from VectorUDT
Feynman Liang created SPARK-12806: - Summary: Support SQL expressions extracting values from VectorUDT Key: SPARK-12806 URL: https://issues.apache.org/jira/browse/SPARK-12806 Project: Spark Issue Type: Improvement Components: MLlib, SQL Affects Versions: 1.6.0 Reporter: Feynman Liang Use cases exist where a specific index within a {VectorUDT} column of a {DataFrame} is required. For example, we may be interested in extracting a specific class probability from the {probabilityCol} of a {LogisticRegression} to compute losses. However, if {probability} is a column of {df} with type {VectorUDT}, the following code fails: {code} df.select("probability.0") AnalysisException: u"Can't extract value from probability" {code} thrown from {sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala}. {VectorUDT} essentially wraps a {StructType}, hence one would expect it to support value extraction Expressions in an analogous way.
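Until such an Expression exists, users typically fall back to a UDF that indexes into the vector. A plain-Python stand-in for that workaround (a real {{VectorUDT}} value is replaced here by a list of floats; the function name is hypothetical):

```python
def extract_index(vector, i):
    # What the desired df.select("probability.0") would compute: one
    # element of a vector column, pulled out by position.
    return float(vector[i])

# e.g. the class-1 probability from a probability vector
class1_prob = extract_index([0.3, 0.7], 1)
```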
[jira] [Updated] (SPARK-12804) ml.classification.LogisticRegression fails when FitIntercept with same-label dataset
[ https://issues.apache.org/jira/browse/SPARK-12804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-12804: -- Description: When training LogisticRegression on a dataset where the label is all 0 or all 1, an array out of bounds exception is thrown. The problematic code is {code:scala} initialCoefficientsWithIntercept.toArray(numFeatures) = math.log(histogram(1) / histogram(0)) } {code} The correct behaviour is to short-circuit training entirely when only a single label is present (can be detected from {{labelSummarizer}}) and return a classifier which assigns all true/false with infinite weights. was: When training LogisticRegression on a dataset where the label is all 0 or all 1, an array out of bounds exception is thrown. The problematic code is {code} initialCoefficientsWithIntercept.toArray(numFeatures) = math.log(histogram(1) / histogram(0)) } {/code} The correct behaviour is to short-circuit training entirely when only a single label is present (can be detected from {{labelSummarizer}}) and return a classifier which assigns all true/false with infinite weights. > ml.classification.LogisticRegression fails when FitIntercept with same-label > dataset > > > Key: SPARK-12804 > URL: https://issues.apache.org/jira/browse/SPARK-12804 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 1.6.0 >Reporter: Feynman Liang > > When training LogisticRegression on a dataset where the label is all 0 or all > 1, an array out of bounds exception is thrown. The problematic code is > {code:scala} > initialCoefficientsWithIntercept.toArray(numFeatures) > = math.log(histogram(1) / histogram(0)) > } > {code} > The correct behaviour is to short-circuit training entirely when only a > single label is present (can be detected from {{labelSummarizer}}) and return > a classifier which assigns all true/false with infinite weights. 
[jira] [Updated] (SPARK-12804) ml.classification.LogisticRegression fails when FitIntercept with same-label dataset
[ https://issues.apache.org/jira/browse/SPARK-12804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-12804: -- Description: When training LogisticRegression on a dataset where the label is all 0 or all 1, an array out of bounds exception is thrown. The problematic code is {code} initialCoefficientsWithIntercept.toArray(numFeatures) = math.log(histogram(1) / histogram(0)) } {code} The correct behaviour is to short-circuit training entirely when only a single label is present (can be detected from {{labelSummarizer}}) and return a classifier which assigns all true/false with infinite weights. was: When training LogisticRegression on a dataset where the label is all 0 or all 1, an array out of bounds exception is thrown. The problematic code is {code:scala} initialCoefficientsWithIntercept.toArray(numFeatures) = math.log(histogram(1) / histogram(0)) } {code} The correct behaviour is to short-circuit training entirely when only a single label is present (can be detected from {{labelSummarizer}}) and return a classifier which assigns all true/false with infinite weights. > ml.classification.LogisticRegression fails when FitIntercept with same-label > dataset > > > Key: SPARK-12804 > URL: https://issues.apache.org/jira/browse/SPARK-12804 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 1.6.0 >Reporter: Feynman Liang > > When training LogisticRegression on a dataset where the label is all 0 or all > 1, an array out of bounds exception is thrown. The problematic code is > {code} > initialCoefficientsWithIntercept.toArray(numFeatures) > = math.log(histogram(1) / histogram(0)) > } > {code} > The correct behaviour is to short-circuit training entirely when only a > single label is present (can be detected from {{labelSummarizer}}) and return > a classifier which assigns all true/false with infinite weights. 
[jira] [Created] (SPARK-12804) ml.classification.LogisticRegression fails when FitIntercept with same-label dataset
Feynman Liang created SPARK-12804: - Summary: ml.classification.LogisticRegression fails when FitIntercept with same-label dataset Key: SPARK-12804 URL: https://issues.apache.org/jira/browse/SPARK-12804 Project: Spark Issue Type: Bug Components: ML Affects Versions: 1.6.0 Reporter: Feynman Liang When training LogisticRegression on a dataset where the label is all 0 or all 1, an array out of bounds exception is thrown. The problematic code is {code} initialCoefficientsWithIntercept.toArray(numFeatures) = math.log(histogram(1) / histogram(0)) } {/code} The correct behaviour is to short-circuit training entirely when only a single label is present (can be detected from {{labelSummarizer}}) and return a classifier which assigns all true/false with infinite weights.
[jira] [Commented] (SPARK-12804) ml.classification.LogisticRegression fails when FitIntercept with same-label dataset
[ https://issues.apache.org/jira/browse/SPARK-12804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096064#comment-15096064 ] Feynman Liang commented on SPARK-12804: --- Please assign to me > ml.classification.LogisticRegression fails when FitIntercept with same-label > dataset > > > Key: SPARK-12804 > URL: https://issues.apache.org/jira/browse/SPARK-12804 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 1.6.0 >Reporter: Feynman Liang > > When training LogisticRegression on a dataset where the label is all 0 or all > 1, an array out of bounds exception is thrown. The problematic code is > {code} > initialCoefficientsWithIntercept.toArray(numFeatures) > = math.log(histogram(1) / histogram(0)) > } > {/code} > The correct behaviour is to short-circuit training entirely when only a > single label is present (can be detected from {{labelSummarizer}}) and return > a classifier which assigns all true/false with infinite weights.
[jira] [Closed] (SPARK-12779) StringIndexer should handle null
[ https://issues.apache.org/jira/browse/SPARK-12779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang closed SPARK-12779. - Resolution: Duplicate > StringIndexer should handle null > > > Key: SPARK-12779 > URL: https://issues.apache.org/jira/browse/SPARK-12779 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 1.6.0 >Reporter: Feynman Liang > > StringIndexer currently fails with a {{NullPointerException}} when indexing a > column containing {{null}}s. It should instead index all {{null}}s into some > sentinel value (say -1).
[jira] [Commented] (SPARK-12779) StringIndexer should handle null
[ https://issues.apache.org/jira/browse/SPARK-12779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095057#comment-15095057 ] Feynman Liang commented on SPARK-12779: --- Yep you're right, thanks! > StringIndexer should handle null > > > Key: SPARK-12779 > URL: https://issues.apache.org/jira/browse/SPARK-12779 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 1.6.0 >Reporter: Feynman Liang > > StringIndexer currently fails with a {{NullPointerException}} when indexing a > column containing {{null}}s. It should instead index all {{null}}s into some > sentinel value (say -1).
[jira] [Created] (SPARK-12779) StringIndexer should handle null
Feynman Liang created SPARK-12779: - Summary: StringIndexer should handle null Key: SPARK-12779 URL: https://issues.apache.org/jira/browse/SPARK-12779 Project: Spark Issue Type: Bug Components: ML Affects Versions: 1.6.0 Reporter: Feynman Liang StringIndexer currently fails with a {{NullPointerException}} when indexing a column containing {{null}}s. It should instead index all {{null}}s into some sentinel value (say -1).
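The behaviour the ticket asks for can be sketched outside Spark. A minimal plain-Python model of StringIndexer's frequency-ordered indexing (function and argument names are hypothetical) with nulls mapped to a sentinel instead of raising:

```python
from collections import Counter

def string_index(values, null_index=-1.0):
    # Index labels by descending frequency, as StringIndexer does,
    # but send None to a sentinel value rather than failing.
    freq = Counter(v for v in values if v is not None)
    index = {label: float(i) for i, (label, _) in enumerate(freq.most_common())}
    return [index[v] if v is not None else null_index for v in values]
```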
[jira] [Commented] (SPARK-11960) User guide section for streaming a/b testing
[ https://issues.apache.org/jira/browse/SPARK-11960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15025593#comment-15025593 ] Feynman Liang commented on SPARK-11960: --- [~josephkb] happy to work on it, when is the 1.6 QA deadline? > User guide section for streaming a/b testing > > > Key: SPARK-11960 > URL: https://issues.apache.org/jira/browse/SPARK-11960 > Project: Spark > Issue Type: Documentation > Components: Documentation, MLlib >Reporter: Joseph K. Bradley >Assignee: Feynman Liang > > [~fliang] Assigning since you added the feature. Will you have a chance to > do this soon?
[jira] [Commented] (SPARK-9798) CrossValidatorModel Documentation Improvements
[ https://issues.apache.org/jira/browse/SPARK-9798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903657#comment-14903657 ] Feynman Liang commented on SPARK-9798: -- The actual scala doc > CrossValidatorModel Documentation Improvements > -- > > Key: SPARK-9798 > URL: https://issues.apache.org/jira/browse/SPARK-9798 > Project: Spark > Issue Type: Documentation > Components: ML >Reporter: Feynman Liang >Priority: Minor > Labels: starter > > CrossValidatorModel's avgMetrics and bestModel need documentation.
[jira] [Commented] (SPARK-10691) Make LogisticRegressionModel's evaluate method public
[ https://issues.apache.org/jira/browse/SPARK-10691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14875959#comment-14875959 ] Feynman Liang commented on SPARK-10691: --- Also, +1 for calling it "evaluate". > Make LogisticRegressionModel's evaluate method public > - > > Key: SPARK-10691 > URL: https://issues.apache.org/jira/browse/SPARK-10691 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 1.5.0 >Reporter: Hao Ren > > The following method in {{LogisticRegressionModel}} is marked as {{private}}, > which prevents users from creating a summary on any given data set. Check > [here|https://github.com/feynmanliang/spark/blob/d219fa4c216e8f35b71a26921561104d15cd6055/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala#L272]. > {code} > // TODO: decide on a good name before exposing to public API > private[classification] def evaluate(dataset: DataFrame) > : LogisticRegressionSummary = { > new BinaryLogisticRegressionSummary( > this.transform(dataset), > $(probabilityCol), > $(labelCol)) > } > {code} > This method is definitely necessary to test model performance. > By the way, the name {{evaluate}} is already pretty good for me. > [~mengxr] Could you check this ? Thx
[jira] [Commented] (SPARK-10691) Make LogisticRegressionModel's evaluate method public
[ https://issues.apache.org/jira/browse/SPARK-10691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14875956#comment-14875956 ] Feynman Liang commented on SPARK-10691: --- We should also create one for linear regression (and link the two issues) > Make LogisticRegressionModel's evaluate method public > - > > Key: SPARK-10691 > URL: https://issues.apache.org/jira/browse/SPARK-10691 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 1.5.0 >Reporter: Hao Ren > > The following method in {{LogisticRegressionModel}} is marked as {{private}}, > which prevents users from creating a summary on any given data set. Check > [here|https://github.com/feynmanliang/spark/blob/d219fa4c216e8f35b71a26921561104d15cd6055/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala#L272]. > {code} > // TODO: decide on a good name before exposing to public API > private[classification] def evaluate(dataset: DataFrame) > : LogisticRegressionSummary = { > new BinaryLogisticRegressionSummary( > this.transform(dataset), > $(probabilityCol), > $(labelCol)) > } > {code} > This method is definitely necessary to test model performance. > By the way, the name {{evaluate}} is already pretty good for me. > [~mengxr] Could you check this ? Thx
[jira] [Created] (SPARK-10583) Correctness test for Multilayer Perceptron using Weka Reference
Feynman Liang created SPARK-10583: - Summary: Correctness test for Multilayer Perceptron using Weka Reference Key: SPARK-10583 URL: https://issues.apache.org/jira/browse/SPARK-10583 Project: Spark Issue Type: Bug Components: ML Reporter: Feynman Liang SPARK-9471 adds MLP and a [TODO item|https://github.com/apache/spark/blob/6add4eddb39e7748a87da3e921ea3c7881d30a82/mllib/src/test/scala/org/apache/spark/ml/ann/ANNSuite.scala#L28] to create a test checking implementation's learned weights against Weka's MLP implementation. We need to add this as a unit test. The reference Weka code that was run should be included as a comment.
[jira] [Commented] (SPARK-9668) ML 1.5 QA: Docs: Check for new APIs
[ https://issues.apache.org/jira/browse/SPARK-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741237#comment-14741237 ] Feynman Liang commented on SPARK-9668: -- Yep, thanks! > ML 1.5 QA: Docs: Check for new APIs > --- > > Key: SPARK-9668 > URL: https://issues.apache.org/jira/browse/SPARK-9668 > Project: Spark > Issue Type: Sub-task > Components: Documentation, ML, MLlib >Reporter: Joseph K. Bradley >Assignee: Feynman Liang > Fix For: 1.5.0 > > > Check the user guide vs. a list of new APIs (classes, methods, data members) > to see what items require updates to the user guide. > For each feature missing user guide doc: > * Create a JIRA for that feature, and assign it to the author of the feature > * Link it to (a) the original JIRA which introduced that feature ("related > to") and (b) to this JIRA ("requires"). > Note: Now that we have algorithms in spark.ml which are not in spark.mllib, > we should make subsections for the spark.ml API as needed. We can follow the > structure of the spark.mllib user guide. > * The spark.ml user guide can provide: (a) code examples and (b) info on > algorithms which do not exist in spark.mllib. > * We should not duplicate info in the spark.ml guides. Since spark.mllib is > still the primary API, we should provide links to the corresponding > algorithms in the spark.mllib user guide for more info.
[jira] [Comment Edited] (SPARK-10489) GraphX dataframe wrapper
[ https://issues.apache.org/jira/browse/SPARK-10489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735206#comment-14735206 ] Feynman Liang edited comment on SPARK-10489 at 9/10/15 11:52 PM: - Doing this in a separate spark package was (Author: fliang): Doing this in a separate spark package (https://github.com/databricks/spark-df-graph) > GraphX dataframe wrapper > > > Key: SPARK-10489 > URL: https://issues.apache.org/jira/browse/SPARK-10489 > Project: Spark > Issue Type: New Feature > Components: GraphX >Reporter: Feynman Liang > > We want to wrap GraphX Graph using DataFrames and implement basic high-level > algorithms like PageRank. Then we can easily implement Python API, > import/export, and other features. > {code} > val graph = new GraphF(vDF, eDF) > {code}
[jira] [Closed] (SPARK-10489) GraphX dataframe wrapper
[ https://issues.apache.org/jira/browse/SPARK-10489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang closed SPARK-10489. - Resolution: Won't Fix Doing this in a separate spark package (https://github.com/databricks/spark-df-graph) > GraphX dataframe wrapper > > > Key: SPARK-10489 > URL: https://issues.apache.org/jira/browse/SPARK-10489 > Project: Spark > Issue Type: New Feature > Components: GraphX >Reporter: Feynman Liang > > We want to wrap GraphX Graph using DataFrames and implement basic high-level > algorithms like PageRank. Then we can easily implement Python API, > import/export, and other features. > {code} > val graph = new GraphF(vDF, eDF) > {code}
[jira] [Created] (SPARK-10489) GraphX dataframe wrapper
Feynman Liang created SPARK-10489: - Summary: GraphX dataframe wrapper Key: SPARK-10489 URL: https://issues.apache.org/jira/browse/SPARK-10489 Project: Spark Issue Type: New Feature Components: GraphX Reporter: Feynman Liang We want to wrap GraphX Graph using DataFrames and implement basic high-level algorithms like PageRank. Then we can easily implement Python API, import/export, and other features. {code} val graph = new GraphF(vDF, eDF) {code}
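The shape of the proposed wrapper can be sketched in plain Python. Only the class name {{GraphF}} and its two-table constructor come from the ticket; lists of dicts stand in for DataFrames and the {{out_degrees}} method is a hypothetical example of the "basic high-level algorithms" layer:

```python
from collections import Counter

class GraphF:
    # A graph held as two tables: one of vertices, one of edges.
    def __init__(self, v_df, e_df):
        self.vertices = v_df  # rows like {"id": 1}
        self.edges = e_df     # rows like {"src": 1, "dst": 2}

    def out_degrees(self):
        # Count outgoing edges per source vertex.
        return Counter(e["src"] for e in self.edges)
```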
[jira] [Commented] (SPARK-10489) GraphX dataframe wrapper
[ https://issues.apache.org/jira/browse/SPARK-10489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735202#comment-14735202 ] Feynman Liang commented on SPARK-10489: --- Working on this > GraphX dataframe wrapper > > > Key: SPARK-10489 > URL: https://issues.apache.org/jira/browse/SPARK-10489 > Project: Spark > Issue Type: New Feature > Components: GraphX >Reporter: Feynman Liang > > We want to wrap GraphX Graph using DataFrames and implement basic high-level > algorithms like PageRank. Then we can easily implement Python API, > import/export, and other features. > {code} > val graph = new GraphF(vDF, eDF) > {code}
[jira] [Commented] (SPARK-10479) LogisticRegression copy should copy model summary if available
[ https://issues.apache.org/jira/browse/SPARK-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735053#comment-14735053 ] Feynman Liang commented on SPARK-10479: --- [~lravindr] Sorry, I didn't know that someone was already working on this. My apologies for any work you may have already done. > LogisticRegression copy should copy model summary if available > -- > > Key: SPARK-10479 > URL: https://issues.apache.org/jira/browse/SPARK-10479 > Project: Spark > Issue Type: Bug > Components: ML >Reporter: Feynman Liang >Assignee: Yanbo Liang >Priority: Minor > Labels: starter > > SPARK-9112 adds LogisticRegressionSummary but [does not copy the model > summary if > available|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L471] > We should add behavior similar to that in > [LinearRegression.copy|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala#L314]
[jira] [Commented] (SPARK-10479) LogisticRegression copy should copy model summary if available
[ https://issues.apache.org/jira/browse/SPARK-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734146#comment-14734146 ] Feynman Liang commented on SPARK-10479: --- Thanks for your help! Please go ahead and work on it; your comment should be enough to let others know. Only committers can assign issues on JIRA. > LogisticRegression copy should copy model summary if available > -- > > Key: SPARK-10479 > URL: https://issues.apache.org/jira/browse/SPARK-10479 > Project: Spark > Issue Type: Bug > Components: ML >Reporter: Feynman Liang >Priority: Minor > Labels: starter > > SPARK-9112 adds LogisticRegressionSummary but [does not copy the model > summary if > available|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L471] > We should add behavior similar to that in > [LinearRegression.copy|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala#L314]
[jira] [Created] (SPARK-10479) LogisticRegression copy should copy model summary if available
Feynman Liang created SPARK-10479: - Summary: LogisticRegression copy should copy model summary if available Key: SPARK-10479 URL: https://issues.apache.org/jira/browse/SPARK-10479 Project: Spark Issue Type: Bug Components: ML Reporter: Feynman Liang Priority: Minor SPARK-9112 adds LogisticRegressionSummary but does not update [copy|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L471] to copy the model summary if available. We should add behavior similar to that in [LinearRegression.copy|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala#L314]
[jira] [Updated] (SPARK-10479) LogisticRegression copy should copy model summary if available
[ https://issues.apache.org/jira/browse/SPARK-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10479: -- Description: SPARK-9112 adds LogisticRegressionSummary but [does not copy the model summary if available|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L471] We should add behavior similar to that in [LinearRegression.copy|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala#L314] was: SPARK-9112 adds LogisticRegressionSummary but does not update [copy|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L471] to copy the model summary if available. We should add behavior similar to that in [LinearRegression.copy|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala#L314] > LogisticRegression copy should copy model summary if available > -- > > Key: SPARK-10479 > URL: https://issues.apache.org/jira/browse/SPARK-10479 > Project: Spark > Issue Type: Bug > Components: ML >Reporter: Feynman Liang >Priority: Minor > Labels: starter > > SPARK-9112 adds LogisticRegressionSummary but [does not copy the model > summary if > available|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L471] > We should add behavior similar to that in > [LinearRegression.copy|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala#L314]
[jira] [Created] (SPARK-10478) Improve spark.ml.ann implementations for MLP
Feynman Liang created SPARK-10478: - Summary: Improve spark.ml.ann implementations for MLP Key: SPARK-10478 URL: https://issues.apache.org/jira/browse/SPARK-10478 Project: Spark Issue Type: Bug Components: ML Reporter: Feynman Liang Priority: Critical SPARK-9471 adds an implementation of multi-layer perceptrons. However, there are a few issues with the current code that should be addressed, namely: * Style guide: 4-space indentation for method arguments, braces around one-line {{if}}s, punctuation in scaladocs * Comments: cryptic variable names (e.g. {{gw}}, {{gb}}, {{gwb}}) should be documented * Performance: parts of the code can be improved using Breeze broadcasts, vectorized operations, and Breeze UFuncs to avoid manual iteration and leverage BLAS optimization -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
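To illustrate the performance point, a small hedged sketch of the Breeze idioms being suggested (the matrix and variable names are made up for the example, not taken from spark.ml.ann):

{code}
import breeze.linalg.{DenseMatrix, sum, *}
import breeze.numerics.sigmoid

// Illustrative only: a UFunc and a broadcast replace manual while-loops.
val weights = DenseMatrix.rand[Double](4, 3)
val activations = sigmoid(weights)        // element-wise UFunc, no explicit iteration
val columnSums = sum(activations(::, *))  // broadcast a reduction over each column
{code}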
[jira] [Commented] (SPARK-10199) Avoid using reflections for parquet model save
[ https://issues.apache.org/jira/browse/SPARK-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14724138#comment-14724138 ] Feynman Liang commented on SPARK-10199: --- CC [~mengxr] [~josephkb] > Avoid using reflections for parquet model save > -- > > Key: SPARK-10199 > URL: https://issues.apache.org/jira/browse/SPARK-10199 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Reporter: Feynman Liang >Priority: Minor > > These items are not high priority since the overhead of writing to Parquet is > much greater than that of runtime reflection. > Multiple model save/load implementations in MLlib use case classes to infer a schema for the > data frame saved to Parquet. However, inferring a schema from case classes or > tuples uses [runtime > reflection|https://github.com/apache/spark/blob/d7b4c095271c36fcc7f9ded267ecf5ec66fac803/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L361], > which is unnecessary since the types are already known at the time {{save}} is > called. > It would be better to specify the schema for the data frame directly > using {{sqlContext.createDataFrame(dataRDD, schema)}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
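A hedged sketch of the explicit-schema approach described above (the field names and toy data are illustrative, not the actual model schema):

{code}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Sketch only: spell out the schema instead of inferring it from case classes.
val schema = StructType(Seq(
  StructField("clusterId", IntegerType, nullable = false),
  StructField("weight", DoubleType, nullable = false)))
val dataRDD: RDD[Row] = sc.parallelize(Seq(Row(0, 0.5), Row(1, 0.5)))
// No runtime reflection: the schema is passed in directly.
val df = sqlContext.createDataFrame(dataRDD, schema)
df.write.parquet(path)  // path: save location, assumed in scope
{code}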
[jira] [Commented] (SPARK-7454) Perf test for power iteration clustering (PIC)
[ https://issues.apache.org/jira/browse/SPARK-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723666#comment-14723666 ] Feynman Liang commented on SPARK-7454: -- [~mengxr] [~josephkb] can we close this since PR 86 was merged? > Perf test for power iteration clustering (PIC) > -- > > Key: SPARK-7454 > URL: https://issues.apache.org/jira/browse/SPARK-7454 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Affects Versions: 1.4.0 >Reporter: Xiangrui Meng > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10199) Avoid using reflections for parquet model save
[ https://issues.apache.org/jira/browse/SPARK-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721907#comment-14721907 ] Feynman Liang commented on SPARK-10199: --- [~vinodkc] Thanks! I think these results are convincing. Let's see what others think, but FWIW I'm all for these changes, particularly because they set a precedent of explicitly specifying the schema for future model save/load implementations. > Avoid using reflections for parquet model save > -- > > Key: SPARK-10199 > URL: https://issues.apache.org/jira/browse/SPARK-10199 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Reporter: Feynman Liang >Priority: Minor > > These items are not high priority since the overhead of writing to Parquet is > much greater than that of runtime reflection. > Multiple model save/load implementations in MLlib use case classes to infer a schema for the > data frame saved to Parquet. However, inferring a schema from case classes or > tuples uses [runtime > reflection|https://github.com/apache/spark/blob/d7b4c095271c36fcc7f9ded267ecf5ec66fac803/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L361], > which is unnecessary since the types are already known at the time {{save}} is > called. > It would be better to specify the schema for the data frame directly > using {{sqlContext.createDataFrame(dataRDD, schema)}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10351) UnsafeRow.getUTF8String should handle off-heap backed UnsafeRow
[ https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10351: -- Summary: UnsafeRow.getUTF8String should handle off-heap backed UnsafeRow (was: UnsafeRow.getString should handle off-heap backed UnsafeRow) > UnsafeRow.getUTF8String should handle off-heap backed UnsafeRow > --- > > Key: SPARK-10351 > URL: https://issues.apache.org/jira/browse/SPARK-10351 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang >Priority: Critical > > {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which > returns {{null}} when passed a {{null}} base object, failing to handle > off-heap backed {{UnsafeRow}}s correctly. > This will also cause a {{NullPointerException}} when {{getString}} is called > with off-heap storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10351) UnsafeRow.getString should handle off-heap backed UnsafeRow
[ https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721325#comment-14721325 ] Feynman Liang commented on SPARK-10351: --- Sorry, the fix is for {{getUTF8String}}. {{getString}} is the method which causes the {{NullPointerException}}. Updated title. > UnsafeRow.getString should handle off-heap backed UnsafeRow > --- > > Key: SPARK-10351 > URL: https://issues.apache.org/jira/browse/SPARK-10351 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang >Priority: Critical > > {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which > returns {{null}} when passed a {{null}} base object, failing to handle > off-heap backed {{UnsafeRow}}s correctly. > This will also cause a {{NullPointerException}} when {{getString}} is called > with off-heap storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-10352) Replace SQLTestData internal usages of String with UTF8String
[ https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang closed SPARK-10352. - Resolution: Not A Problem Caused by my code not respecting the constraint that an {{InternalRow}} may only contain {{UTF8String}}, never {{java.lang.String}}. > Replace SQLTestData internal usages of String with UTF8String > - > > Key: SPARK-10352 > URL: https://issues.apache.org/jira/browse/SPARK-10352 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang > > Running the code: > {code} > val inputString = "abc" > val row = InternalRow.apply(inputString) > val unsafeRow = > UnsafeProjection.create(Array[DataType](StringType)).apply(row) > {code} > generates the error: > {code} > [info] java.lang.ClassCastException: java.lang.String cannot be cast to > org.apache.spark.unsafe.types.UTF8String > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) > ***snip*** > {code} > Although {{StringType}} should in theory only have internal type > {{UTF8String}}, we [are inconsistent with this > constraint|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L131] > and being more strict would [break existing > code|https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala#L41] > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
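For reference, the snippet from the description works once the string is converted to {{UTF8String}} first. A sketch, with imports as they appear in the Spark source of this era:

{code}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.UnsafeProjection
import org.apache.spark.sql.types.{DataType, StringType}
import org.apache.spark.unsafe.types.UTF8String

// Convert java.lang.String to UTF8String BEFORE building the InternalRow;
// InternalRow's StringType slots hold UTF8String, not java.lang.String.
val row = InternalRow.apply(UTF8String.fromString("abc"))
val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row)
{code}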
[jira] [Updated] (SPARK-10352) Replace SQLTestData internal usages of String with UTF8String
[ https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10352: -- Summary: Replace SQLTestData internal usages of String with UTF8String (was: Replace internal usages of String with UTF8String) > Replace SQLTestData internal usages of String with UTF8String > - > > Key: SPARK-10352 > URL: https://issues.apache.org/jira/browse/SPARK-10352 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang > > Running the code: > {code} > val inputString = "abc" > val row = InternalRow.apply(inputString) > val unsafeRow = > UnsafeProjection.create(Array[DataType](StringType)).apply(row) > {code} > generates the error: > {code} > [info] java.lang.ClassCastException: java.lang.String cannot be cast to > org.apache.spark.unsafe.types.UTF8String > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) > ***snip*** > {code} > Although {{StringType}} should in theory only have internal type > {{UTF8String}}, we [are inconsistent with this > constraint|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L131] > and being more strict would [break existing > code|https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala#L41] > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10352) Replace internal usages of String with UTF8String
[ https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10352: -- Summary: Replace internal usages of String with UTF8String (was: BaseGenericInternalRow.getUTF8String should support java.lang.String) > Replace internal usages of String with UTF8String > - > > Key: SPARK-10352 > URL: https://issues.apache.org/jira/browse/SPARK-10352 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang > > Running the code: > {code} > val inputString = "abc" > val row = InternalRow.apply(inputString) > val unsafeRow = > UnsafeProjection.create(Array[DataType](StringType)).apply(row) > {code} > generates the error: > {code} > [info] java.lang.ClassCastException: java.lang.String cannot be cast to > org.apache.spark.unsafe.types.UTF8String > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) > ***snip*** > {code} > Although {{StringType}} should in theory only have internal type > {{UTF8String}}, we [are inconsistent with this > constraint|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L131] > and being more strict would [break existing > code|https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala#L41] > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10351) UnsafeRow.getUTF8String should handle off-heap backed UnsafeRow
[ https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10351: -- Summary: UnsafeRow.getUTF8String should handle off-heap backed UnsafeRow (was: UnsafeRow.getUTF8String should handle off-heap memory) > UnsafeRow.getUTF8String should handle off-heap backed UnsafeRow > --- > > Key: SPARK-10351 > URL: https://issues.apache.org/jira/browse/SPARK-10351 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang >Priority: Critical > > {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which > returns {{null}} when passed a {{null}} base object, failing to handle > off-heap backed {{UnsafeRow}}s correctly. > This will also cause a {{NullPointerException}} when {{getString}} is called > with off-heap storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10351) UnsafeRow.getString should handle off-heap backed UnsafeRow
[ https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10351: -- Summary: UnsafeRow.getString should handle off-heap backed UnsafeRow (was: UnsafeRow.getUTF8String should handle off-heap backed UnsafeRow) > UnsafeRow.getString should handle off-heap backed UnsafeRow > --- > > Key: SPARK-10351 > URL: https://issues.apache.org/jira/browse/SPARK-10351 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang >Priority: Critical > > {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which > returns {{null}} when passed a {{null}} base object, failing to handle > off-heap backed {{UnsafeRow}}s correctly. > This will also cause a {{NullPointerException}} when {{getString}} is called > with off-heap storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10351) UnsafeRow.getUTF8String should handle off-heap memory
[ https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10351: -- Description: {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which returns {{null}} when passed a {{null}} base object, failing to handle off-heap backed {{UnsafeRow}}s correctly. This will also cause a {{NullPointerException}} when {{getString}} is called with off-heap storage. was: {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which returns {{null}} when passed a {{null}} base object, failing to handle off-heap memory correctly. This will also cause a {{NullPointerException}} when {{getString}} is called with off-heap storage. > UnsafeRow.getUTF8String should handle off-heap memory > - > > Key: SPARK-10351 > URL: https://issues.apache.org/jira/browse/SPARK-10351 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang >Priority: Critical > > {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which > returns {{null}} when passed a {{null}} base object, failing to handle > off-heap backed {{UnsafeRow}}s correctly. > This will also cause a {{NullPointerException}} when {{getString}} is called > with off-heap storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String
[ https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10352: -- Description: Running the code: {code} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {code} generates the error: {code} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {code} Although {{StringType}} should in theory only have internal type {{UTF8String}}, we [are inconsistent with this constraint|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L131] and being more strict would [break existing code|https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala#L41] was: Running the code: {code} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {code} generates the error: {code} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {code} Although `StringType` should in theory only have internal type `UTF8String`, we [are inconsistent with this constraint|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L131] and being more strict would [break existing code|https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala#L41] > BaseGenericInternalRow.getUTF8String should support java.lang.String > > > Key: SPARK-10352 > 
URL: https://issues.apache.org/jira/browse/SPARK-10352 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang > > Running the code: > {code} > val inputString = "abc" > val row = InternalRow.apply(inputString) > val unsafeRow = > UnsafeProjection.create(Array[DataType](StringType)).apply(row) > {code} > generates the error: > {code} > [info] java.lang.ClassCastException: java.lang.String cannot be cast to > org.apache.spark.unsafe.types.UTF8String > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) > ***snip*** > {code} > Although {{StringType}} should in theory only have internal type > {{UTF8String}}, we [are inconsistent with this > constraint|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L131] > and being more strict would [break existing > code|https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala#L41] > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String
[ https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10352: -- Description: Running the code: {code} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {code} generates the error: {code} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {code} Although `StringType` should in theory only have internal type `UTF8String`, we [are inconsistent with this constraint|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L131] and being more strict would [break existing code|https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala#L41] was: Running the code: {code} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {code} generates the error: {code} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {code} > BaseGenericInternalRow.getUTF8String should support java.lang.String > > > Key: SPARK-10352 > URL: https://issues.apache.org/jira/browse/SPARK-10352 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang > > Running the code: > {code} > val inputString = "abc" > val row = InternalRow.apply(inputString) > val unsafeRow = > UnsafeProjection.create(Array[DataType](StringType)).apply(row) > {code} > generates the error: > {code} > [info] java.lang.ClassCastException: java.lang.String cannot 
be cast to > org.apache.spark.unsafe.types.UTF8String > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) > ***snip*** > {code} > Although `StringType` should in theory only have internal type `UTF8String`, > we [are inconsistent with this > constraint|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L131] > and being more strict would [break existing > code|https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala#L41] > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String
[ https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10352: -- Description: Running the code: {code} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {code} generates the error: {code} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {code} was: Running the code: {code:scala} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {code} generates the error: {code} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {code} > BaseGenericInternalRow.getUTF8String should support java.lang.String > > > Key: SPARK-10352 > URL: https://issues.apache.org/jira/browse/SPARK-10352 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang > > Running the code: > {code} > val inputString = "abc" > val row = InternalRow.apply(inputString) > val unsafeRow = > UnsafeProjection.create(Array[DataType](StringType)).apply(row) > {code} > generates the error: > {code} > [info] java.lang.ClassCastException: java.lang.String cannot be cast to > org.apache.spark.unsafe.types.UTF8String > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) > ***snip*** > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String
[ https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10352: -- Description: Running the code: {code:scala} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {code} generates the error: {code} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {code} was: Running the code: {code} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {code} generates the error: {code} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {code} > BaseGenericInternalRow.getUTF8String should support java.lang.String > > > Key: SPARK-10352 > URL: https://issues.apache.org/jira/browse/SPARK-10352 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang > > Running the code: > {code:scala} > val inputString = "abc" > val row = InternalRow.apply(inputString) > val unsafeRow = > UnsafeProjection.create(Array[DataType](StringType)).apply(row) > {code} > generates the error: > {code} > [info] java.lang.ClassCastException: java.lang.String cannot be cast to > org.apache.spark.unsafe.types.UTF8String > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) > ***snip*** > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String
[ https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10352: -- Description: Running the code: {code scala} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {code} generates the error: {code} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {code} was: Running the code: {code} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {/code} generates the error: {code} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {/code} > BaseGenericInternalRow.getUTF8String should support java.lang.String > > > Key: SPARK-10352 > URL: https://issues.apache.org/jira/browse/SPARK-10352 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang > > Running the code: > {code scala} > val inputString = "abc" > val row = InternalRow.apply(inputString) > val unsafeRow = > UnsafeProjection.create(Array[DataType](StringType)).apply(row) > {code} > generates the error: > {code} > [info] java.lang.ClassCastException: java.lang.String cannot be cast to > org.apache.spark.unsafe.types.UTF8String > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) > ***snip*** > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String
[ https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10352: -- Description: Running the code: {{code}} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {{/code}} generates the error: {{code}} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {{/code}} was: Running the code: {{ val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) }} generates the error: {{[info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip***}} > BaseGenericInternalRow.getUTF8String should support java.lang.String > > > Key: SPARK-10352 > URL: https://issues.apache.org/jira/browse/SPARK-10352 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang > > Running the code: > {{code}} > val inputString = "abc" > val row = InternalRow.apply(inputString) > val unsafeRow = > UnsafeProjection.create(Array[DataType](StringType)).apply(row) > {{/code}} > generates the error: > {{code}} > [info] java.lang.ClassCastException: java.lang.String cannot be cast to > org.apache.spark.unsafe.types.UTF8String > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) > ***snip*** > {{/code}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String
[ https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10352: -- Description: Running the code: {code} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {code} generates the error: {code} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {code} was: Running the code: {code scala} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {code} generates the error: {code} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {code} > BaseGenericInternalRow.getUTF8String should support java.lang.String > > > Key: SPARK-10352 > URL: https://issues.apache.org/jira/browse/SPARK-10352 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang > > Running the code: > {code} > val inputString = "abc" > val row = InternalRow.apply(inputString) > val unsafeRow = > UnsafeProjection.create(Array[DataType](StringType)).apply(row) > {code} > generates the error: > {code} > [info] java.lang.ClassCastException: java.lang.String cannot be cast to > org.apache.spark.unsafe.types.UTF8String > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) > ***snip*** > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String
[ https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10352: -- Description: Running the code: {code} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {/code} generates the error: {code} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {/code} was: Running the code: {{code}} val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) {{/code}} generates the error: {{code}} [info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip*** {{/code}} > BaseGenericInternalRow.getUTF8String should support java.lang.String > > > Key: SPARK-10352 > URL: https://issues.apache.org/jira/browse/SPARK-10352 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang > > Running the code: > {code} > val inputString = "abc" > val row = InternalRow.apply(inputString) > val unsafeRow = > UnsafeProjection.create(Array[DataType](StringType)).apply(row) > {/code} > generates the error: > {code} > [info] java.lang.ClassCastException: java.lang.String cannot be cast to > org.apache.spark.unsafe.types.UTF8String > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) > ***snip*** > {/code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String
[ https://issues.apache.org/jira/browse/SPARK-10352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721289#comment-14721289 ] Feynman Liang commented on SPARK-10352: --- Working on a PR. [~rxin] can you confirm that this is a bug? > BaseGenericInternalRow.getUTF8String should support java.lang.String > > > Key: SPARK-10352 > URL: https://issues.apache.org/jira/browse/SPARK-10352 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang > > Running the code: > {{ > val inputString = "abc" > val row = InternalRow.apply(inputString) > val unsafeRow = > UnsafeProjection.create(Array[DataType](StringType)).apply(row) > }} > generates the error: > {{[info] java.lang.ClassCastException: java.lang.String cannot be cast to > org.apache.spark.unsafe.types.UTF8String > [info] at > org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) > ***snip***}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10352) BaseGenericInternalRow.getUTF8String should support java.lang.String
Feynman Liang created SPARK-10352: - Summary: BaseGenericInternalRow.getUTF8String should support java.lang.String Key: SPARK-10352 URL: https://issues.apache.org/jira/browse/SPARK-10352 Project: Spark Issue Type: Bug Components: SQL Reporter: Feynman Liang Running the code: {{ val inputString = "abc" val row = InternalRow.apply(inputString) val unsafeRow = UnsafeProjection.create(Array[DataType](StringType)).apply(row) }} generates the error: {{[info] java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String [info] at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46) ***snip***}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
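The ClassCastException reported above comes from an unconditional cast inside `getUTF8String`. The sketch below illustrates the requested behavior — converting a plain `java.lang.String` on the fly instead of casting blindly. It is a self-contained stand-in, not Spark's actual code: the `Utf8` class here is a hypothetical minimal substitute for `org.apache.spark.unsafe.types.UTF8String`.

```java
import java.nio.charset.StandardCharsets;

public class GetUtf8Sketch {
    // Minimal stand-in for UTF8String: just holds UTF-8 encoded bytes.
    static final class Utf8 {
        final byte[] bytes;
        Utf8(byte[] bytes) { this.bytes = bytes; }
        static Utf8 fromString(String s) { return new Utf8(s.getBytes(StandardCharsets.UTF_8)); }
        @Override public String toString() { return new String(bytes, StandardCharsets.UTF_8); }
    }

    // Instead of the blind cast that throws ClassCastException when the row
    // holds a java.lang.String, convert the String to the UTF-8 representation.
    static Utf8 getUTF8String(Object value) {
        if (value instanceof Utf8) return (Utf8) value;
        if (value instanceof String) return Utf8.fromString((String) value);
        throw new ClassCastException("unsupported type: " + value.getClass());
    }

    public static void main(String[] args) {
        Object stored = "abc";                     // row holds a java.lang.String
        System.out.println(getUTF8String(stored)); // prints abc
    }
}
```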
[jira] [Comment Edited] (SPARK-10351) UnsafeRow.getUTF8String should handle off-heap memory
[ https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721286#comment-14721286 ] Feynman Liang edited comment on SPARK-10351 at 8/29/15 11:12 PM: - I'm working on a PR to fix this. [~rxin] is this a bug or actually intended behavior (and I'm just not interpreting correctly)? was (Author: fliang): I'm working on a PR to make my use case work. [~rxin] is this a bug or actually intended behavior (and I'm just not interpreting correctly)? > UnsafeRow.getUTF8String should handle off-heap memory > - > > Key: SPARK-10351 > URL: https://issues.apache.org/jira/browse/SPARK-10351 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang >Priority: Critical > > {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which > returns {{null}} when passed a {{null}} base object, failing to handle > off-heap memory correctly. > This will also cause a {{NullPointerException}} when {{getString}} is called > with off-heap storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10351) UnsafeRow.getUTF8String should handle off-heap memory
[ https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10351: -- Description: {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which returns {{null}} when passed a {{null}} base object, failing to handle off-heap memory correctly. This will also cause a {{NullPointerException}} when {{getString}} is called with off-heap storage. was:{{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which returns {{null}} when passed a {{null}} base object, failing to handle off-heap memory correctly. > UnsafeRow.getUTF8String should handle off-heap memory > - > > Key: SPARK-10351 > URL: https://issues.apache.org/jira/browse/SPARK-10351 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang >Priority: Critical > > {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which > returns {{null}} when passed a {{null}} base object, failing to handle > off-heap memory correctly. > This will also cause a {{NullPointerException}} when {{getString}} is called > with off-heap storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10351) UnsafeRow.getUTF8String should handle off-heap memory
[ https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10351: -- Description: {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which returns {{null}} when passed a {{null}} base object, failing to handle off-heap memory correctly. (was: {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which does not handle off-heap memory correctly. ) > UnsafeRow.getUTF8String should handle off-heap memory > - > > Key: SPARK-10351 > URL: https://issues.apache.org/jira/browse/SPARK-10351 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang >Priority: Critical > > {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which > returns {{null}} when passed a {{null}} base object, failing to handle > off-heap memory correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10351) UnsafeRow.getUTF8String should handle off-heap memory
[ https://issues.apache.org/jira/browse/SPARK-10351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721286#comment-14721286 ] Feynman Liang commented on SPARK-10351: --- I'm working on a PR to make my use case work. [~rxin] is this a bug or actually intended behavior (and I'm just not interpreting correctly)? > UnsafeRow.getUTF8String should handle off-heap memory > - > > Key: SPARK-10351 > URL: https://issues.apache.org/jira/browse/SPARK-10351 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Feynman Liang >Priority: Critical > > {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which > does not handle off-heap memory correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10351) UnsafeRow.getUTF8String should handle off-heap memory
Feynman Liang created SPARK-10351: - Summary: UnsafeRow.getUTF8String should handle off-heap memory Key: SPARK-10351 URL: https://issues.apache.org/jira/browse/SPARK-10351 Project: Spark Issue Type: Bug Components: SQL Reporter: Feynman Liang Priority: Critical {{UnsafeRow.getUTF8String}} delegates to {{UTF8String.fromAddress}} which does not handle off-heap memory correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
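For context on the off-heap case: in Spark's unsafe row layout, a null base object conventionally means the bytes live off-heap at an absolute address, so returning `null` on a null base loses the data. The sketch below is a self-contained illustration of that convention, with a byte array standing in for the raw off-heap memory that Spark would actually read via `sun.misc.Unsafe`; it is not Spark's real implementation.

```java
import java.util.Arrays;

public class OffHeapSketch {
    // Stand-in for raw off-heap memory; real Spark reads this at an absolute
    // address via Platform/Unsafe rather than from a Java array.
    static final byte[] OFF_HEAP = "hello off-heap".getBytes();

    // A null base object signals off-heap storage. The buggy behavior was to
    // give up (return null) on a null base; the fix is to still read the bytes.
    static byte[] readBytes(byte[] base, int offset, int numBytes) {
        byte[] src = (base != null) ? base : OFF_HEAP;  // off-heap: read at the address
        return Arrays.copyOfRange(src, offset, offset + numBytes);
    }

    public static void main(String[] args) {
        byte[] onHeap = "abc".getBytes();
        System.out.println(new String(readBytes(onHeap, 0, 3))); // prints abc
        System.out.println(new String(readBytes(null, 6, 8)));   // prints off-heap
    }
}
```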
[jira] [Comment Edited] (SPARK-7454) Perf test for power iteration clustering (PIC)
[ https://issues.apache.org/jira/browse/SPARK-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720860#comment-14720860 ] Feynman Liang edited comment on SPARK-7454 at 8/29/15 12:54 AM: [~mengxr] do you mind assigning to me? Thanks! was (Author: fliang): [~mengxr] do you mind assigning to me and linking my PR, thanks! > Perf test for power iteration clustering (PIC) > -- > > Key: SPARK-7454 > URL: https://issues.apache.org/jira/browse/SPARK-7454 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Affects Versions: 1.4.0 >Reporter: Xiangrui Meng > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7454) Perf test for power iteration clustering (PIC)
[ https://issues.apache.org/jira/browse/SPARK-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720860#comment-14720860 ] Feynman Liang commented on SPARK-7454: -- [~mengxr] do you mind assigning to me and linking my PR, thanks! > Perf test for power iteration clustering (PIC) > -- > > Key: SPARK-7454 > URL: https://issues.apache.org/jira/browse/SPARK-7454 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Affects Versions: 1.4.0 >Reporter: Xiangrui Meng > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7454) Perf test for power iteration clustering (PIC)
[ https://issues.apache.org/jira/browse/SPARK-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720854#comment-14720854 ] Feynman Liang commented on SPARK-7454: -- https://github.com/databricks/spark-perf/pull/86 > Perf test for power iteration clustering (PIC) > -- > > Key: SPARK-7454 > URL: https://issues.apache.org/jira/browse/SPARK-7454 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Affects Versions: 1.4.0 >Reporter: Xiangrui Meng > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10199) Avoid using reflections for parquet model save
[ https://issues.apache.org/jira/browse/SPARK-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720277#comment-14720277 ] Feynman Liang commented on SPARK-10199: --- [~vinodkc] would it be possible to get some microbenchmarks? You can surround the call to [createDataFrame|https://github.com/apache/spark/pull/8507/files#diff-13d1de98ab7ae677f9b345eb90a8b8e8R237] with some timing code before and after the change. > Avoid using reflections for parquet model save > -- > > Key: SPARK-10199 > URL: https://issues.apache.org/jira/browse/SPARK-10199 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Reporter: Feynman Liang >Priority: Minor > > These items are not high priority since the overhead writing to Parquet is > much greater than for runtime reflections. > Multiple model save/load in MLlib use case classes to infer a schema for the > data frame saved to Parquet. However, inferring a schema from case classes or > tuples uses [runtime > reflection|https://github.com/apache/spark/blob/d7b4c095271c36fcc7f9ded267ecf5ec66fac803/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L361] > which is unnecessary since the types are already known at the time `save` is > called. > It would be better to just specify the schema for the data frame directly > using {{sqlContext.createDataFrame(dataRDD, schema)}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
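The "timing code before and after the change" suggested in the comment above can be as simple as a `System.nanoTime()` harness with a warm-up pass so the JIT does not skew the first measurement. A minimal sketch, with a toy loop standing in for the `createDataFrame` call under test:

```java
public class MicroBench {
    // Toy workload standing in for the call being benchmarked.
    static long workload() {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) sum += i;
        return sum;
    }

    // Average wall-clock milliseconds per call over `reps` repetitions.
    static double timeMillis(int reps) {
        workload();                          // warm up the JIT before measuring
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) workload();
        return (System.nanoTime() - start) / 1e6 / reps;
    }

    public static void main(String[] args) {
        System.out.printf("avg: %.3f ms%n", timeMillis(10));
    }
}
```

Running the same harness against both versions of the code gives the before/after comparison the comment asks for.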
[jira] [Created] (SPARK-10333) Add user guide for linear-methods.md columns
Feynman Liang created SPARK-10333: - Summary: Add user guide for linear-methods.md columns Key: SPARK-10333 URL: https://issues.apache.org/jira/browse/SPARK-10333 Project: Spark Issue Type: Documentation Components: ML Reporter: Feynman Liang Priority: Minor Add example code to document input/output columns based on feedback in https://github.com/apache/spark/pull/8491 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-10253) Remove Guava dependencies in MLlib java tests
[ https://issues.apache.org/jira/browse/SPARK-10253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715175#comment-14715175 ] Feynman Liang commented on SPARK-10253: --- I believe only committers can assign JIRAs. Let's keep discussion about number of JIRAs vs PR size in SPARK-7751. > Remove Guava dependencies in MLlib java tests > - > > Key: SPARK-10253 > URL: https://issues.apache.org/jira/browse/SPARK-10253 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Reporter: Feynman Liang >Priority: Minor > > Many tests depend on Google Guava's {{Lists.newArrayList}} when > {{java.util.Arrays.asList}} could be used instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
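The replacement described in SPARK-10253 is mechanical; a minimal example of the before/after:

```java
import java.util.Arrays;
import java.util.List;

public class NoGuavaExample {
    public static void main(String[] args) {
        // Guava version (pulls in an extra dependency):
        //   List<Double> weights = Lists.newArrayList(1.0, 2.0, 3.0);
        // JDK-only replacement:
        List<Double> weights = Arrays.asList(1.0, 2.0, 3.0);
        System.out.println(weights); // prints [1.0, 2.0, 3.0]
    }
}
```

One caveat worth keeping in mind when swapping: `Lists.newArrayList` returns a growable `ArrayList`, while `Arrays.asList` returns a fixed-size view, so the substitution is only safe where the test never adds or removes elements.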
[jira] [Commented] (SPARK-7751) Add @Since annotation to stable and experimental methods in MLlib
[ https://issues.apache.org/jira/browse/SPARK-7751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715157#comment-14715157 ] Feynman Liang commented on SPARK-7751: -- If we are worried about differing solutions for the JIRAs, I think providing an example PR and asking to follow suit can help that issue. That's what's done in this JIRA and I think for the most part it has been effective. I agree that keeping the noise low on issues@ is important, but a smaller number of JIRAs is in direct tension with keeping PRs small and easy to review. It's not clear what the solution is here, but IMO discussing the right balance is a good direction. > Add @Since annotation to stable and experimental methods in MLlib > - > > Key: SPARK-7751 > URL: https://issues.apache.org/jira/browse/SPARK-7751 > Project: Spark > Issue Type: Umbrella > Components: Documentation, MLlib >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng >Priority: Minor > Labels: starter > > This is useful to check whether a feature exists in some version of Spark. > This is an umbrella JIRA to track the progress. We want to have -@since tag- > @Since annotation for both stable (those without any > Experimental/DeveloperApi/AlphaComponent annotations) and experimental > methods in MLlib: > (Do NOT tag private or package private classes or methods, nor local > variables and methods.) > * an example PR for Scala: https://github.com/apache/spark/pull/8309 > We need to dig the history of git commit to figure out what was the Spark > version when a method was first introduced. Take `NaiveBayes.setModelType` as > an example. We can grep `def setModelType` at different version git tags. 
> {code} > meng@xm:~/src/spark > $ git show > v1.3.0:mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala > | grep "def setModelType" > meng@xm:~/src/spark > $ git show > v1.4.0:mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala > | grep "def setModelType" > def setModelType(modelType: String): NaiveBayes = { > {code} > If there are better ways, please let us know. > We cannot add all -@since tags- @Since annotation in a single PR, which is > hard to review. So we made some subtasks for each package, for example > `org.apache.spark.classification`. Feel free to add more sub-tasks for Python > and the `spark.ml` package. > Plan: > 1. In 1.5, we try to add @Since annotation to all stable/experimental methods > under `spark.mllib`. > 2. Starting from 1.6, we require @Since annotation in all new PRs. > 3. In 1.6, we try to add @Since annotation to all stable/experimental methods > under `spark.ml`, `pyspark.mllib`, and `pyspark.ml`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
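To make the mechanism concrete: a version annotation like the one proposed here is just a retained annotation carrying a version string, readable at runtime. The sketch below defines a stand-in `Since` annotation (Spark's real one lives in `org.apache.spark.annotation` and is a Scala annotation, so this is illustrative only) and reads it back via reflection:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

public class SinceDemo {
    // Stand-in for Spark's @Since annotation: records the first version
    // in which a public method or class appeared.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.METHOD, ElementType.TYPE})
    @interface Since { String value(); }

    @Since("1.4.0")
    public static String setModelType(String modelType) { return modelType; }

    public static void main(String[] args) throws Exception {
        Method m = SinceDemo.class.getMethod("setModelType", String.class);
        System.out.println(m.getAnnotation(Since.class).value()); // prints 1.4.0
    }
}
```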
[jira] [Commented] (SPARK-10199) Avoid using reflections for parquet model save
[ https://issues.apache.org/jira/browse/SPARK-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715148#comment-14715148 ] Feynman Liang commented on SPARK-10199: --- Awesome, thanks! You can tag that PR with the parent JIRA (SPARK-10199) then. > Avoid using reflections for parquet model save > -- > > Key: SPARK-10199 > URL: https://issues.apache.org/jira/browse/SPARK-10199 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Reporter: Feynman Liang >Priority: Minor > > These items are not high priority since the overhead writing to Parquet is > much greater than for runtime reflections. > Multiple model save/load in MLlib use case classes to infer a schema for the > data frame saved to Parquet. However, inferring a schema from case classes or > tuples uses [runtime > reflection|https://github.com/apache/spark/blob/d7b4c095271c36fcc7f9ded267ecf5ec66fac803/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L361] > which is unnecessary since the types are already known at the time `save` is > called. > It would be better to just specify the schema for the data frame directly > using {{sqlContext.createDataFrame(dataRDD, schema)}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
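The core idea of SPARK-10199 — stating the schema explicitly instead of inferring it via runtime reflection — can be sketched without Spark at all. The `Field` class below is a hypothetical minimal stand-in for Spark SQL's `StructField`, and the column names are made up for illustration:

```java
import java.util.Arrays;
import java.util.List;

public class ExplicitSchemaSketch {
    // Minimal stand-in for Spark SQL's StructField: name, type, nullability.
    static final class Field {
        final String name; final String dataType; final boolean nullable;
        Field(String name, String dataType, boolean nullable) {
            this.name = name; this.dataType = dataType; this.nullable = nullable;
        }
    }

    public static void main(String[] args) {
        // Instead of inferring a schema from a case class via runtime reflection,
        // spell it out -- the column types are already known when save() is called.
        List<Field> schema = Arrays.asList(
            new Field("word", "string", false),
            new Field("vector", "array<float>", false));
        System.out.println(schema.size()); // prints 2
    }
}
```

In real Spark code the analogous move is building a `StructType` of `StructField`s by hand and passing it to `sqlContext.createDataFrame(dataRDD, schema)`, as the issue description suggests.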
[jira] [Updated] (SPARK-10208) Specify schema during LocalLDAModel.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10208: -- Issue Type: Sub-task (was: Improvement) Parent: SPARK-10199 > Specify schema during LocalLDAModel.save to avoid reflection > > > Key: SPARK-10208 > URL: https://issues.apache.org/jira/browse/SPARK-10208 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Reporter: Feynman Liang >Priority: Minor > > [LocalLDAModel.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala#L389] > currently infers a schema from a case class when the schema is known and > should be manually provided. > See parent JIRA for rationale. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10211) Specify schema during MatrixFactorizationModel.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10211: -- Issue Type: Sub-task (was: Improvement) Parent: SPARK-10199 > Specify schema during MatrixFactorizationModel.save to avoid reflection > --- > > Key: SPARK-10211 > URL: https://issues.apache.org/jira/browse/SPARK-10211 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Reporter: Feynman Liang >Priority: Minor > > [MatrixFactorizationModel.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala#L361] > currently infers a schema from a RDD of tuples when the schema is known and > should be manually provided. > See parent JIRA for rationale. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10205) Specify schema during PowerIterationClustering.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10205: -- Issue Type: Sub-task (was: Improvement) Parent: SPARK-10199 > Specify schema during PowerIterationClustering.save to avoid reflection > --- > > Key: SPARK-10205 > URL: https://issues.apache.org/jira/browse/SPARK-10205 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Reporter: Feynman Liang >Priority: Minor > > [PowerIterationClustering.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala#L82] > currently infers a schema from a case class when the schema is known and > should be manually provided. > See parent JIRA for rationale. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10212) Specify schema during TreeEnsembleModel.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10212: -- Issue Type: Sub-task (was: Improvement) Parent: SPARK-10199 > Specify schema during TreeEnsembleModel.save to avoid reflection > > > Key: SPARK-10212 > URL: https://issues.apache.org/jira/browse/SPARK-10212 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Reporter: Feynman Liang >Priority: Minor > > [TreeEnsembleModel.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/tree/model/treeEnsembleModels.scala#L451] > currently infers a schema from a RDD of {{NodeData}} case classes when the > schema is known and should be manually provided. > See parent JIRA for rationale. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10206) Specify schema during IsotonicRegression.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10206: -- Issue Type: Sub-task (was: Improvement) Parent: SPARK-10199 > Specify schema during IsotonicRegression.save to avoid reflection > - > > Key: SPARK-10206 > URL: https://issues.apache.org/jira/browse/SPARK-10206 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Reporter: Feynman Liang >Priority: Minor > > [IsotonicRegression.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala#L184] > currently infers a schema from a case class when the schema is known and > should be manually provided. > See parent JIRA for rationale. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10213) Specify schema during DecisionTreeModel.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10213: -- Issue Type: Sub-task (was: Improvement) Parent: SPARK-10199 > Specify schema during DecisionTreeModel.save to avoid reflection > > > Key: SPARK-10213 > URL: https://issues.apache.org/jira/browse/SPARK-10213 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Reporter: Feynman Liang >Priority: Minor > > [DecisionTreeModel.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/tree/model/DecisionTreeModel.scala#L238] > currently infers a schema from a {{NodeData}} case class when the schema is > known and should be manually provided. > See parent JIRA for rationale. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10209) Specify schema during DistributedLDAModel.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10209: -- Issue Type: Sub-task (was: Improvement) Parent: SPARK-10199 > Specify schema during DistributedLDAModel.save to avoid reflection > -- > > Key: SPARK-10209 > URL: https://issues.apache.org/jira/browse/SPARK-10209 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Reporter: Feynman Liang >Priority: Minor > > [DistributedLDAModel.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala#L783] > currently infers a schema from a case class when the schema is known and > should be manually provided. > See parent JIRA for rationale. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10203) Specify schema during GLMClassificationModel.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10203: -- Issue Type: Sub-task (was: Improvement) Parent: SPARK-10199 > Specify schema during GLMClassificationModel.save to avoid reflection > - > > Key: SPARK-10203 > URL: https://issues.apache.org/jira/browse/SPARK-10203 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Reporter: Feynman Liang >Priority: Minor > > [GLMClassificationModel.save|https://github.com/apache/spark/blob/3c0156899dc1ec1f7dfe6d7c8af47fa6dc7d00bf/mllib/src/main/scala/org/apache/spark/mllib/classification/impl/GLMClassificationModel.scala#L38] > currently infers a schema from a case class when the schema is known and > should be manually provided. > See parent JIRA for rationale. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10202) Specify schema during KMeansModel.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10202: -- Issue Type: Sub-task (was: Improvement) Parent: SPARK-10199 > Specify schema during KMeansModel.save to avoid reflection > -- > > Key: SPARK-10202 > URL: https://issues.apache.org/jira/browse/SPARK-10202 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Reporter: Feynman Liang >Priority: Minor > > [KMeansModel.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansModel.scala#L110] > currently infers a schema from a case class when the schema is known and > should be manually provided. > See parent JIRA for rationale. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10201) Specify schema during GaussianMixtureModel.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10201: -- Issue Type: Sub-task (was: Improvement) Parent: SPARK-10199 > Specify schema during GaussianMixtureModel.save to avoid reflection > --- > > Key: SPARK-10201 > URL: https://issues.apache.org/jira/browse/SPARK-10201 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Reporter: Feynman Liang >Priority: Minor > > [GaussianMixtureModel.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala#L140] > currently infers a schema from a case class when the schema is known and > should be manually provided. > See parent JIRA for rationale. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10204) Specify schema during NaiveBayes.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10204: -- Issue Type: Sub-task (was: Improvement) Parent: SPARK-10199 > Specify schema during NaiveBayes.save to avoid reflection > - > > Key: SPARK-10204 > URL: https://issues.apache.org/jira/browse/SPARK-10204 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Reporter: Feynman Liang >Priority: Minor > > [NaiveBayes.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala#L181] > currently infers a schema from a case class when the schema is known and > should be manually provided. > See parent JIRA for rationale. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10200) Specify schema during GLMRegressionModel.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10200: -- Issue Type: Sub-task (was: Improvement) Parent: SPARK-10199 > Specify schema during GLMRegressionModel.save to avoid reflection > - > > Key: SPARK-10200 > URL: https://issues.apache.org/jira/browse/SPARK-10200 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Reporter: Feynman Liang >Priority: Minor > > [GLMRegressionModel.save|https://github.com/apache/spark/blob/d7b4c095271c36fcc7f9ded267ecf5ec66fac803/mllib/src/main/scala/org/apache/spark/mllib/regression/impl/GLMRegressionModel.scala#L44] > currently infers a schema from a case class when the schema is known and > should be manually provided. > See parent JIRA for rationale. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10207) Specify schema during Word2Vec.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10207: -- Issue Type: Sub-task (was: Improvement) Parent: SPARK-10199 > Specify schema during Word2Vec.save to avoid reflection > --- > > Key: SPARK-10207 > URL: https://issues.apache.org/jira/browse/SPARK-10207 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Reporter: Feynman Liang >Priority: Minor > > [Word2Vec.save|https://github.com/apache/spark/blob/7cfc0750e14f2c1b3847e4720cc02150253525a9/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala#L615] > currently infers a schema from a case class when the schema is known and > should be manually provided. > See parent JIRA for rationale. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10207) Specify schema during Word2Vec.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715141#comment-14715141 ] Feynman Liang commented on SPARK-10207: --- [~srowen] I linked them using "is required by" but in retrospect I think making these subtasks of SPARK-10199 is more appropriate, thanks for bringing that up! I will make that change. Let's discuss the issue of large numbers of logically similar issues in SPARK-7751. > Specify schema during Word2Vec.save to avoid reflection > --- > > Key: SPARK-10207 > URL: https://issues.apache.org/jira/browse/SPARK-10207 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Feynman Liang >Priority: Minor > > [Word2Vec.save|https://github.com/apache/spark/blob/7cfc0750e14f2c1b3847e4720cc02150253525a9/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala#L615] > currently infers a schema from a case class when the schema is known and > should be manually provided. > See parent JIRA for rationale. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10199) Avoid using reflections for parquet model save
[ https://issues.apache.org/jira/browse/SPARK-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712492#comment-14712492 ] Feynman Liang commented on SPARK-10199: --- Hi [~vinodkc], I saw that you took all of these issues. Thanks for your help! To make things easier for review, do you mind grouping all the changes into a single PR? > Avoid using reflections for parquet model save > -- > > Key: SPARK-10199 > URL: https://issues.apache.org/jira/browse/SPARK-10199 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Reporter: Feynman Liang >Priority: Minor > > These items are not high priority since the overhead of writing to Parquet is > much greater than that of runtime reflection. > Multiple model save/load in MLlib use case classes to infer a schema for the > data frame saved to Parquet. However, inferring a schema from case classes or > tuples uses [runtime > reflection|https://github.com/apache/spark/blob/d7b4c095271c36fcc7f9ded267ecf5ec66fac803/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L361] > which is unnecessary since the types are already known at the time `save` is > called. > It would be better to just specify the schema for the data frame directly > using {{sqlContext.createDataFrame(dataRDD, schema)}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10257) Remove Guava dependencies in spark.mllib JavaTests
Feynman Liang created SPARK-10257: - Summary: Remove Guava dependencies in spark.mllib JavaTests Key: SPARK-10257 URL: https://issues.apache.org/jira/browse/SPARK-10257 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10254) Remove Guava dependencies in spark.ml.feature
[ https://issues.apache.org/jira/browse/SPARK-10254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10254: -- Priority: Minor (was: Major) > Remove Guava dependencies in spark.ml.feature > - > > Key: SPARK-10254 > URL: https://issues.apache.org/jira/browse/SPARK-10254 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Feynman Liang >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10256) Remove Guava dependencies in spark.ml.classification
Feynman Liang created SPARK-10256: - Summary: Remove Guava dependencies in spark.ml.classification Key: SPARK-10256 URL: https://issues.apache.org/jira/browse/SPARK-10256 Project: Spark Issue Type: Improvement Components: ML Reporter: Feynman Liang Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10255) Remove Guava dependencies in spark.ml.param
Feynman Liang created SPARK-10255: - Summary: Remove Guava dependencies in spark.ml.param Key: SPARK-10255 URL: https://issues.apache.org/jira/browse/SPARK-10255 Project: Spark Issue Type: Improvement Components: ML Reporter: Feynman Liang Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10254) Remove Guava dependencies in spark.ml.feature
Feynman Liang created SPARK-10254: - Summary: Remove Guava dependencies in spark.ml.feature Key: SPARK-10254 URL: https://issues.apache.org/jira/browse/SPARK-10254 Project: Spark Issue Type: Improvement Components: ML Reporter: Feynman Liang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10253) Remove Guava dependencies in MLlib java tests
[ https://issues.apache.org/jira/browse/SPARK-10253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712443#comment-14712443 ] Feynman Liang commented on SPARK-10253: --- Working on this > Remove Guava dependencies in MLlib java tests > - > > Key: SPARK-10253 > URL: https://issues.apache.org/jira/browse/SPARK-10253 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Reporter: Feynman Liang >Priority: Minor > > Many tests depend on Google Guava's {{Lists.newArrayList}} when > {{java.util.Arrays.asList}} could be used instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10253) Remove Guava dependencies in MLlib java tests
Feynman Liang created SPARK-10253: - Summary: Remove Guava dependencies in MLlib java tests Key: SPARK-10253 URL: https://issues.apache.org/jira/browse/SPARK-10253 Project: Spark Issue Type: Improvement Components: ML, MLlib Reporter: Feynman Liang Priority: Minor Many tests depend on Google Guava's {{Lists.newArrayList}} when {{java.util.Arrays.asList}} could be used instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
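The swap SPARK-10253 describes is mechanical. A minimal JDK-only sketch of the idea (class and variable names here are illustrative, not taken from any actual Spark test; the Guava call is shown only in a comment):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListFixtureDemo {
    public static void main(String[] args) {
        // Before (Guava): List<Double> weights = Lists.newArrayList(1.0, 2.0, 3.0);
        // After, JDK only. Note that Arrays.asList returns a fixed-size list,
        // which is fine for read-only test fixtures.
        List<Double> weights = Arrays.asList(1.0, 2.0, 3.0);

        // If a test mutates the list, wrap it to regain the growable
        // semantics of Guava's Lists.newArrayList:
        List<Double> mutable = new ArrayList<>(Arrays.asList(1.0, 2.0, 3.0));
        mutable.add(4.0);

        System.out.println(weights.size() + " " + mutable.size()); // prints "3 4"
    }
}
```

The fixed-size caveat is the one behavioral difference worth checking during such a migration: `Arrays.asList(...).add(...)` throws `UnsupportedOperationException`, whereas Guava's list grows.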
[jira] [Commented] (SPARK-9680) Update programming guide section for ml.feature.StopWordsRemover
[ https://issues.apache.org/jira/browse/SPARK-9680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712374#comment-14712374 ] Feynman Liang commented on SPARK-9680: -- [~holdenk] phew ok, I just wanted to make sure my PR didn't redo anything you've already worked on :) > Update programming guide section for ml.feature.StopWordsRemover > > > Key: SPARK-9680 > URL: https://issues.apache.org/jira/browse/SPARK-9680 > Project: Spark > Issue Type: Documentation > Components: ML >Reporter: yuhao yang >Assignee: Feynman Liang >Priority: Minor > Labels: document > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10249) Add Python Code Example to StopWordsRemover User GUide
Feynman Liang created SPARK-10249: - Summary: Add Python Code Example to StopWordsRemover User GUide Key: SPARK-10249 URL: https://issues.apache.org/jira/browse/SPARK-10249 Project: Spark Issue Type: Improvement Components: ML Reporter: Feynman Liang Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10249) Add Python Code Example to StopWordsRemover User Guide
[ https://issues.apache.org/jira/browse/SPARK-10249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10249: -- Summary: Add Python Code Example to StopWordsRemover User Guide (was: Add Python Code Example to StopWordsRemover User GUide) > Add Python Code Example to StopWordsRemover User Guide > -- > > Key: SPARK-10249 > URL: https://issues.apache.org/jira/browse/SPARK-10249 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Feynman Liang >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9680) Update programming guide section for ml.feature.StopWordsRemover
[ https://issues.apache.org/jira/browse/SPARK-9680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712125#comment-14712125 ] Feynman Liang commented on SPARK-9680: -- [~mengxr] please assign to me > Update programming guide section for ml.feature.StopWordsRemover > > > Key: SPARK-9680 > URL: https://issues.apache.org/jira/browse/SPARK-9680 > Project: Spark > Issue Type: Documentation > Components: ML >Reporter: yuhao yang >Priority: Minor > Labels: document > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-9796) LogisticRegressionModel Docs Completeness
[ https://issues.apache.org/jira/browse/SPARK-9796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang closed SPARK-9796. Resolution: Not A Problem > LogisticRegressionModel Docs Completeness > - > > Key: SPARK-9796 > URL: https://issues.apache.org/jira/browse/SPARK-9796 > Project: Spark > Issue Type: Documentation > Components: ML >Reporter: Feynman Liang >Priority: Minor > Labels: starter > > Add docs for > * LogisticRegressionModel$.load > * LogisticRegressionModel.save > * LogisticRegressionModel.toString() -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9796) LogisticRegressionModel Docs Completeness
[ https://issues.apache.org/jira/browse/SPARK-9796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711857#comment-14711857 ] Feynman Liang commented on SPARK-9796: -- Closing since this problem appears only in how unidoc handles @Since annotations for overridden methods; the actual generated scaladocs are fine. > LogisticRegressionModel Docs Completeness > - > > Key: SPARK-9796 > URL: https://issues.apache.org/jira/browse/SPARK-9796 > Project: Spark > Issue Type: Documentation > Components: ML >Reporter: Feynman Liang >Priority: Minor > Labels: starter > > Add docs for > * LogisticRegressionModel$.load > * LogisticRegressionModel.save > * LogisticRegressionModel.toString() -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9799) SVMModel documentation improvements
[ https://issues.apache.org/jira/browse/SPARK-9799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711855#comment-14711855 ] Feynman Liang commented on SPARK-9799: -- Closing since this problem appears only in how unidoc handles @Since annotations for overridden methods; the actual generated scaladocs are fine. > SVMModel documentation improvements > --- > > Key: SPARK-9799 > URL: https://issues.apache.org/jira/browse/SPARK-9799 > Project: Spark > Issue Type: Documentation > Components: MLlib >Reporter: Feynman Liang >Priority: Minor > Labels: starter > > SVMModel missing descriptions in documentation for save, load, and toString -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-9799) SVMModel documentation improvements
[ https://issues.apache.org/jira/browse/SPARK-9799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang closed SPARK-9799. Resolution: Not A Problem > SVMModel documentation improvements > --- > > Key: SPARK-9799 > URL: https://issues.apache.org/jira/browse/SPARK-9799 > Project: Spark > Issue Type: Documentation > Components: MLlib >Reporter: Feynman Liang >Priority: Minor > Labels: starter > > SVMModel missing descriptions in documentation for save, load, and toString -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10199) Avoid using reflections for parquet model save
[ https://issues.apache.org/jira/browse/SPARK-10199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10199: -- Description: These items are not high priority since the overhead of writing to Parquet is much greater than that of runtime reflection. Multiple model save/load in MLlib use case classes to infer a schema for the data frame saved to Parquet. However, inferring a schema from case classes or tuples uses [runtime reflection|https://github.com/apache/spark/blob/d7b4c095271c36fcc7f9ded267ecf5ec66fac803/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L361] which is unnecessary since the types are already known at the time `save` is called. It would be better to just specify the schema for the data frame directly using {{sqlContext.createDataFrame(dataRDD, schema)}} was: Multiple model save/load in MLlib use case classes to infer a schema for the data frame saved to Parquet. However, inferring a schema from case classes or tuples uses [runtime reflection|https://github.com/apache/spark/blob/d7b4c095271c36fcc7f9ded267ecf5ec66fac803/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L361] which is unnecessary since the types are already known at the time `save` is called. It would be better to just specify the schema for the data frame directly using {{sqlContext.createDataFrame(dataRDD, schema)}} > Avoid using reflections for parquet model save > -- > > Key: SPARK-10199 > URL: https://issues.apache.org/jira/browse/SPARK-10199 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Reporter: Feynman Liang >Priority: Minor > > These items are not high priority since the overhead of writing to Parquet is > much greater than that of runtime reflection. > Multiple model save/load in MLlib use case classes to infer a schema for the > data frame saved to Parquet. 
However, inferring a schema from case classes or > tuples uses [runtime > reflection|https://github.com/apache/spark/blob/d7b4c095271c36fcc7f9ded267ecf5ec66fac803/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L361] > which is unnecessary since the types are already known at the time `save` is > called. > It would be better to just specify the schema for the data frame directly > using {{sqlContext.createDataFrame(dataRDD, schema)}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
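The rationale in SPARK-10199 can be sketched without Spark at all. This is a plain-Java illustration, not the Spark API: `DataRow` stands in for the case class a model's `save` writes out, and `inferSchema` is a hypothetical stand-in for what reflection-based schema inference has to do at runtime, even though the answer is fixed at compile time.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SchemaDemo {
    // Stand-in for the case class a model's save() serializes.
    static class DataRow {
        double label;
        double feature;
    }

    // Derive field names/types at runtime via reflection -- roughly what
    // inferring a schema from a case class or bean must do on every save.
    static List<String> inferSchema(Class<?> cls) {
        List<String> fields = new ArrayList<>();
        for (Field f : cls.getDeclaredFields()) {
            fields.add(f.getName() + ":" + f.getType().getSimpleName());
        }
        Collections.sort(fields); // field order is not guaranteed by the JVM
        return fields;
    }

    public static void main(String[] args) {
        // The explicit schema: the types are already known when save is
        // called, so no reflection is needed to produce it.
        List<String> explicit = Arrays.asList("feature:double", "label:double");
        System.out.println(explicit.equals(inferSchema(DataRow.class))); // prints "true"
    }
}
```

In Spark terms, passing the prebuilt `StructType` to `sqlContext.createDataFrame(dataRDD, schema)` corresponds to the `explicit` list above: same result, no per-save reflection.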
[jira] [Updated] (SPARK-10205) Specify schema during PowerIterationClustering.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10205: -- Priority: Minor (was: Major) > Specify schema during PowerIterationClustering.save to avoid reflection > --- > > Key: SPARK-10205 > URL: https://issues.apache.org/jira/browse/SPARK-10205 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Feynman Liang >Priority: Minor > > [PowerIterationClustering.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala#L82] > currently infers a schema from a case class when the schema is known and > should be manually provided. > See parent JIRA for rationale. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10200) Specify schema during GLMRegressionModel.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10200: -- Target Version/s: (was: 1.6.0) > Specify schema during GLMRegressionModel.save to avoid reflection > - > > Key: SPARK-10200 > URL: https://issues.apache.org/jira/browse/SPARK-10200 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Feynman Liang >Priority: Minor > > [GLMRegressionModel.save|https://github.com/apache/spark/blob/d7b4c095271c36fcc7f9ded267ecf5ec66fac803/mllib/src/main/scala/org/apache/spark/mllib/regression/impl/GLMRegressionModel.scala#L44] > currently infers a schema from a case class when the schema is known and > should be manually provided. > See parent JIRA for rationale. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10207) Specify schema during Word2Vec.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10207: -- Priority: Minor (was: Major) > Specify schema during Word2Vec.save to avoid reflection > --- > > Key: SPARK-10207 > URL: https://issues.apache.org/jira/browse/SPARK-10207 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Feynman Liang >Priority: Minor > > [Word2Vec.save|https://github.com/apache/spark/blob/7cfc0750e14f2c1b3847e4720cc02150253525a9/mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala#L615] > currently infers a schema from a case class when the schema is known and > should be manually provided. > See parent JIRA for rationale. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10202) Specify schema during KMeansModel.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10202: -- Target Version/s: (was: 1.6.0) > Specify schema during KMeansModel.save to avoid reflection > -- > > Key: SPARK-10202 > URL: https://issues.apache.org/jira/browse/SPARK-10202 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Feynman Liang >Priority: Minor > > [KMeansModel.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansModel.scala#L110] > currently infers a schema from a case class when the schema is known and > should be manually provided. > See parent JIRA for rationale. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10206) Specify schema during IsotonicRegression.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10206: -- Target Version/s: (was: 1.6.0) > Specify schema during IsotonicRegression.save to avoid reflection > - > > Key: SPARK-10206 > URL: https://issues.apache.org/jira/browse/SPARK-10206 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Feynman Liang >Priority: Minor > > [IsotonicRegression.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala#L184] > currently infers a schema from a case class when the schema is known and > should be manually provided. > See parent JIRA for rationale. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10201) Specify schema during GaussianMixtureModel.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10201: -- Target Version/s: (was: 1.6.0) > Specify schema during GaussianMixtureModel.save to avoid reflection > --- > > Key: SPARK-10201 > URL: https://issues.apache.org/jira/browse/SPARK-10201 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Feynman Liang >Priority: Minor > > [GaussianMixtureModel.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala#L140] > currently infers a schema from a case class when the schema is known and > should be manually provided. > See parent JIRA for rationale. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10201) Specify schema during GaussianMixtureModel.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10201: -- Priority: Minor (was: Major) > Specify schema during GaussianMixtureModel.save to avoid reflection > --- > > Key: SPARK-10201 > URL: https://issues.apache.org/jira/browse/SPARK-10201 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Feynman Liang >Priority: Minor > > [GaussianMixtureModel.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixtureModel.scala#L140] > currently infers a schema from a case class when the schema is known and > should be manually provided. > See parent JIRA for rationale. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10208) Specify schema during LocalLDAModel.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10208: -- Target Version/s: (was: 1.6.0) > Specify schema during LocalLDAModel.save to avoid reflection > > > Key: SPARK-10208 > URL: https://issues.apache.org/jira/browse/SPARK-10208 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Feynman Liang >Priority: Minor > > [LocalLDAModel.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala#L389] > currently infers a schema from a case class when the schema is known and > should be manually provided. > See parent JIRA for rationale. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10205) Specify schema during PowerIterationClustering.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-10205: -- Target Version/s: (was: 1.6.0) > Specify schema during PowerIterationClustering.save to avoid reflection > --- > > Key: SPARK-10205 > URL: https://issues.apache.org/jira/browse/SPARK-10205 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Feynman Liang >Priority: Minor > > [PowerIterationClustering.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala#L82] > currently infers a schema from a case class when the schema is known and > should be manually provided. > See parent JIRA for rationale. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org