[jira] [Closed] (SPARK-8779) Add documentation for Python's FP-growth
[ https://issues.apache.org/jira/browse/SPARK-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hrishikesh closed SPARK-8779.

Already fixed.

> Add documentation for Python's FP-growth
>
> Key: SPARK-8779
> URL: https://issues.apache.org/jira/browse/SPARK-8779
> Project: Spark
> Issue Type: Documentation
> Components: Documentation, MLlib, PySpark
> Reporter: Hrishikesh
> Priority: Minor
>
> We need to add documentation for Python FP-Growth in the MLlib Programming Guide.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-8467) Add LDAModel.describeTopics() in Python
[ https://issues.apache.org/jira/browse/SPARK-8467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706257#comment-14706257 ]

Hrishikesh edited comment on SPARK-8467 at 9/1/15 6:49 AM:

[~yuu.ishik...@gmail.com], are you still working on this?

was (Author: hrishikesh91):
[~yuu.ishik...@gmail.com], are you still working on this?

> Add LDAModel.describeTopics() in Python
>
> Key: SPARK-8467
> URL: https://issues.apache.org/jira/browse/SPARK-8467
> Project: Spark
> Issue Type: New Feature
> Components: MLlib, PySpark
> Reporter: Yu Ishikawa
>
> Add LDAModel.describeTopics() in Python.
[jira] [Commented] (SPARK-8467) Add LDAModel.describeTopics() in Python
[ https://issues.apache.org/jira/browse/SPARK-8467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706257#comment-14706257 ]

Hrishikesh commented on SPARK-8467:

[~yuu.ishik...@gmail.com], are you still working on this?

> Add LDAModel.describeTopics() in Python
>
> Key: SPARK-8467
> URL: https://issues.apache.org/jira/browse/SPARK-8467
> Project: Spark
> Issue Type: New Feature
> Components: MLlib, PySpark
> Reporter: Yu Ishikawa
>
> Add LDAModel.describeTopics() in Python.
[jira] [Commented] (SPARK-5567) Add prediction methods to LDA
[ https://issues.apache.org/jira/browse/SPARK-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661502#comment-14661502 ]

Hrishikesh commented on SPARK-5567:

Hi [~fliang], is there any way I can use topicDistributions from the master branch via spark-shell, similar to how you call it in the test suite? When I tried to call it in the shell, I got
{{error: value topicDistributions is not a member of org.apache.spark.mllib.clustering.LDAModel}}

> Add prediction methods to LDA
>
> Key: SPARK-5567
> URL: https://issues.apache.org/jira/browse/SPARK-5567
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Affects Versions: 1.3.0
> Reporter: Joseph K. Bradley
> Assignee: Feynman Liang
> Fix For: 1.5.0
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> LDA currently supports prediction on the training set. E.g., you can call logLikelihood and topicDistributions to get that info for the training data. However, it should support the same functionality for new (test) documents.
> This will require inference but should be able to use the same code, with a few modifications to keep the inferred topics fixed.
> Note: The API for these methods is already in the code but is commented out.
[jira] [Commented] (SPARK-6724) Model import/export for FPGrowth
[ https://issues.apache.org/jira/browse/SPARK-6724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616267#comment-14616267 ]

Hrishikesh commented on SPARK-6724:

[~hujiayin] sure!

> Model import/export for FPGrowth
>
> Key: SPARK-6724
> URL: https://issues.apache.org/jira/browse/SPARK-6724
> Project: Spark
> Issue Type: Sub-task
> Components: MLlib
> Affects Versions: 1.3.0
> Reporter: Joseph K. Bradley
> Priority: Minor
>
> Note: experimental model API
[jira] [Commented] (SPARK-6724) Model import/export for FPGrowth
[ https://issues.apache.org/jira/browse/SPARK-6724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614920#comment-14614920 ]

Hrishikesh commented on SPARK-6724:

I am facing some issues in my code: https://github.com/FlytxtRnD/spark/commit/61fc9ee35f2a47402eb977c048fd07289141fe64 .
I get the following error when I try to build:
No TypeTag available for (Array[Item], Long)
val dataRDD = model.map(x => (x.items, x.freq)).toDF()
I tried returning just a tuple of (Array[Item], Long) rather than using the case class, but it still results in the same error.
Also, how do I pass a ClassTag to object FPGrowthModel so that it can extend Loader[FPGrowthModel]?
Can anybody please help?

> Model import/export for FPGrowth
>
> Key: SPARK-6724
> URL: https://issues.apache.org/jira/browse/SPARK-6724
> Project: Spark
> Issue Type: Sub-task
> Components: MLlib
> Affects Versions: 1.3.0
> Reporter: Joseph K. Bradley
> Priority: Minor
>
> Note: experimental model API
[jira] [Created] (SPARK-8779) Add documentation for Python's FP-growth
Hrishikesh created SPARK-8779:

Summary: Add documentation for Python's FP-growth
Key: SPARK-8779
URL: https://issues.apache.org/jira/browse/SPARK-8779
Project: Spark
Issue Type: Documentation
Components: Documentation, MLlib, PySpark
Reporter: Hrishikesh
Priority: Minor

We need to add documentation for Python FP-Growth in the MLlib Programming Guide.
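For readers unfamiliar with what the requested documentation would cover, here is a minimal sketch of frequent-itemset mining with a minimum support threshold. It is plain Python and deliberately brute force: it is not the FP-growth algorithm and does not use Spark. The function name frequent_itemsets and the sample transactions are hypothetical, chosen only for illustration; min_support plays the role of the minSupport parameter that MLlib's FPGrowth takes.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force frequent itemset counting (illustrative, not FP-growth).

    Returns a dict mapping each frozenset of items to the number of
    transactions containing it, keeping only itemsets that occur in at
    least min_support * len(transactions) transactions.
    """
    min_count = min_support * len(transactions)
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        # Count every non-empty subset of this transaction once.
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[frozenset(subset)] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_count}


# Hypothetical sample data: four small transactions.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]]
result = frequent_itemsets(transactions, min_support=0.5)
for itemset, freq in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), freq)
```

FP-growth produces the same kind of (items, frequency) output but avoids enumerating every subset by building a compact prefix tree, which is why it scales to data where this brute-force version would not.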
[jira] [Commented] (SPARK-6724) Model import/export for FPGrowth
[ https://issues.apache.org/jira/browse/SPARK-6724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600639#comment-14600639 ]

Hrishikesh commented on SPARK-6724:

[~MechCoder], yes, I'm working on this. I will let you know if I need any help. Thank you.

> Model import/export for FPGrowth
>
> Key: SPARK-6724
> URL: https://issues.apache.org/jira/browse/SPARK-6724
> Project: Spark
> Issue Type: Sub-task
> Components: MLlib
> Affects Versions: 1.3.0
> Reporter: Joseph K. Bradley
> Priority: Minor
>
> Note: experimental model API
[jira] [Comment Edited] (SPARK-6724) Model import/export for FPGrowth
[ https://issues.apache.org/jira/browse/SPARK-6724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583139#comment-14583139 ]

Hrishikesh edited comment on SPARK-6724 at 6/16/15 4:22 AM:

[~josephkb], please assign this ticket to me.

was (Author: hrishikesh91):
[~josephkb], please assign this ticket to me.

> Model import/export for FPGrowth
>
> Key: SPARK-6724
> URL: https://issues.apache.org/jira/browse/SPARK-6724
> Project: Spark
> Issue Type: Sub-task
> Components: MLlib
> Affects Versions: 1.3.0
> Reporter: Joseph K. Bradley
> Priority: Minor
>
> Note: experimental model API
[jira] [Comment Edited] (SPARK-7106) Support model save/load in Python's FPGrowth
[ https://issues.apache.org/jira/browse/SPARK-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581754#comment-14581754 ]

Hrishikesh edited comment on SPARK-7106 at 6/12/15 7:32 AM:

[~josephkb], shouldn't the save/load methods be added in Scala first in order to work on this?

was (Author: hrishikesh91):
Shouldn't save/load method be added in Scala first in order to work on this?

> Support model save/load in Python's FPGrowth
>
> Key: SPARK-7106
> URL: https://issues.apache.org/jira/browse/SPARK-7106
> Project: Spark
> Issue Type: Sub-task
> Components: MLlib, PySpark
> Reporter: Joseph K. Bradley
> Priority: Minor
[jira] [Commented] (SPARK-6724) Model import/export for FPGrowth
[ https://issues.apache.org/jira/browse/SPARK-6724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583139#comment-14583139 ]

Hrishikesh commented on SPARK-6724:

[~josephkb], please assign this ticket to me.

> Model import/export for FPGrowth
>
> Key: SPARK-6724
> URL: https://issues.apache.org/jira/browse/SPARK-6724
> Project: Spark
> Issue Type: Sub-task
> Components: MLlib
> Affects Versions: 1.3.0
> Reporter: Joseph K. Bradley
> Priority: Minor
>
> Note: experimental model API
[jira] [Issue Comment Deleted] (SPARK-7106) Support model save/load in Python's FPGrowth
[ https://issues.apache.org/jira/browse/SPARK-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hrishikesh updated SPARK-7106:

Comment: was deleted

(was: Shouldn't save/load method be added in Scala first in order to work on this?)

> Support model save/load in Python's FPGrowth
>
> Key: SPARK-7106
> URL: https://issues.apache.org/jira/browse/SPARK-7106
> Project: Spark
> Issue Type: Sub-task
> Components: MLlib, PySpark
> Reporter: Joseph K. Bradley
> Priority: Minor
[jira] [Commented] (SPARK-7106) Support model save/load in Python's FPGrowth
[ https://issues.apache.org/jira/browse/SPARK-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581755#comment-14581755 ]

Hrishikesh commented on SPARK-7106:

Shouldn't the save/load methods be added in Scala first in order to work on this?

> Support model save/load in Python's FPGrowth
>
> Key: SPARK-7106
> URL: https://issues.apache.org/jira/browse/SPARK-7106
> Project: Spark
> Issue Type: Sub-task
> Components: MLlib, PySpark
> Reporter: Joseph K. Bradley
> Priority: Minor
[jira] [Commented] (SPARK-7106) Support model save/load in Python's FPGrowth
[ https://issues.apache.org/jira/browse/SPARK-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581754#comment-14581754 ]

Hrishikesh commented on SPARK-7106:

Shouldn't the save/load methods be added in Scala first in order to work on this?

> Support model save/load in Python's FPGrowth
>
> Key: SPARK-7106
> URL: https://issues.apache.org/jira/browse/SPARK-7106
> Project: Spark
> Issue Type: Sub-task
> Components: MLlib, PySpark
> Reporter: Joseph K. Bradley
> Priority: Minor
[jira] [Comment Edited] (SPARK-7106) Support model save/load in Python's FPGrowth
[ https://issues.apache.org/jira/browse/SPARK-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574225#comment-14574225 ]

Hrishikesh edited comment on SPARK-7106 at 6/11/15 8:56 AM:

[~josephkb], do we have support for save/load in Scala?

was (Author: hrishikesh91):
Do we have support for save/load in scala?

> Support model save/load in Python's FPGrowth
>
> Key: SPARK-7106
> URL: https://issues.apache.org/jira/browse/SPARK-7106
> Project: Spark
> Issue Type: Sub-task
> Components: MLlib, PySpark
> Reporter: Joseph K. Bradley
> Priority: Minor
[jira] [Commented] (SPARK-7106) Support model save/load in Python's FPGrowth
[ https://issues.apache.org/jira/browse/SPARK-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574225#comment-14574225 ]

Hrishikesh commented on SPARK-7106:

Do we have support for save/load in Scala?

> Support model save/load in Python's FPGrowth
>
> Key: SPARK-7106
> URL: https://issues.apache.org/jira/browse/SPARK-7106
> Project: Spark
> Issue Type: Sub-task
> Components: MLlib, PySpark
> Reporter: Joseph K. Bradley
> Priority: Minor
[jira] [Comment Edited] (SPARK-6258) Python MLlib API missing items: Clustering
[ https://issues.apache.org/jira/browse/SPARK-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530438#comment-14530438 ]

Hrishikesh edited comment on SPARK-6258 at 5/6/15 12:34 PM:

[~yanboliang], I got stuck at one stage. You can work on it.

was (Author: hrishikesh91):
[~yanboliang], you can start working on it.

> Python MLlib API missing items: Clustering
>
> Key: SPARK-6258
> URL: https://issues.apache.org/jira/browse/SPARK-6258
> Project: Spark
> Issue Type: Sub-task
> Components: MLlib, PySpark
> Affects Versions: 1.3.0
> Reporter: Joseph K. Bradley
>
> This JIRA lists items missing in the Python API for this sub-package of MLlib.
> This list may be incomplete, so please check again when sending a PR to add these features to the Python API.
> Also, please check for major disparities between documentation; some parts of the Python API are less well-documented than their Scala counterparts. Some items may be listed in the umbrella JIRA linked to this task.
>
> KMeans
> * setEpsilon
> * setInitializationSteps
>
> KMeansModel
> * computeCost
> * k
>
> GaussianMixture
> * setInitialModel
>
> GaussianMixtureModel
> * k
>
> Completely missing items which should be fixed in separate JIRAs (which have been created and linked to the umbrella JIRA)
> * LDA
> * PowerIterationClustering
> * StreamingKMeans
[jira] [Commented] (SPARK-6258) Python MLlib API missing items: Clustering
[ https://issues.apache.org/jira/browse/SPARK-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530438#comment-14530438 ]

Hrishikesh commented on SPARK-6258:

[~yanboliang], you can start working on it.

> Python MLlib API missing items: Clustering
>
> Key: SPARK-6258
> URL: https://issues.apache.org/jira/browse/SPARK-6258
> Project: Spark
> Issue Type: Sub-task
> Components: MLlib, PySpark
> Affects Versions: 1.3.0
> Reporter: Joseph K. Bradley
>
> This JIRA lists items missing in the Python API for this sub-package of MLlib.
> This list may be incomplete, so please check again when sending a PR to add these features to the Python API.
> Also, please check for major disparities between documentation; some parts of the Python API are less well-documented than their Scala counterparts. Some items may be listed in the umbrella JIRA linked to this task.
>
> KMeans
> * setEpsilon
> * setInitializationSteps
>
> KMeansModel
> * computeCost
> * k
>
> GaussianMixture
> * setInitialModel
>
> GaussianMixtureModel
> * k
>
> Completely missing items which should be fixed in separate JIRAs (which have been created and linked to the umbrella JIRA)
> * LDA
> * PowerIterationClustering
> * StreamingKMeans
[jira] [Commented] (SPARK-6258) Python MLlib API missing items: Clustering
[ https://issues.apache.org/jira/browse/SPARK-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386394#comment-14386394 ]

Hrishikesh commented on SPARK-6258:

Hi [~josephkb], I am a newbie to Spark and I would like to contribute. Could you assign this ticket to me?

> Python MLlib API missing items: Clustering
>
> Key: SPARK-6258
> URL: https://issues.apache.org/jira/browse/SPARK-6258
> Project: Spark
> Issue Type: Sub-task
> Components: MLlib, PySpark
> Affects Versions: 1.3.0
> Reporter: Joseph K. Bradley
>
> This JIRA lists items missing in the Python API for this sub-package of MLlib.
> This list may be incomplete, so please check again when sending a PR to add these features to the Python API.
> Also, please check for major disparities between documentation; some parts of the Python API are less well-documented than their Scala counterparts. Some items may be listed in the umbrella JIRA linked to this task.
>
> KMeans
> * setEpsilon
> * setInitializationSteps
>
> KMeansModel
> * computeCost
> * k
>
> GaussianMixture
> * setInitialModel
>
> GaussianMixtureModel
> * k
>
> Completely missing items which should be fixed in separate JIRAs (which have been created and linked to the umbrella JIRA)
> * LDA
> * PowerIterationClustering
> * StreamingKMeans
[jira] [Commented] (SPARK-6258) Python MLlib API missing items: Clustering
[ https://issues.apache.org/jira/browse/SPARK-6258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388003#comment-14388003 ]

Hrishikesh commented on SPARK-6258:

[~josephkb] Thank you for your response and valuable suggestions! I will send the PR ASAP.

> Python MLlib API missing items: Clustering
>
> Key: SPARK-6258
> URL: https://issues.apache.org/jira/browse/SPARK-6258
> Project: Spark
> Issue Type: Sub-task
> Components: MLlib, PySpark
> Affects Versions: 1.3.0
> Reporter: Joseph K. Bradley
>
> This JIRA lists items missing in the Python API for this sub-package of MLlib.
> This list may be incomplete, so please check again when sending a PR to add these features to the Python API.
> Also, please check for major disparities between documentation; some parts of the Python API are less well-documented than their Scala counterparts. Some items may be listed in the umbrella JIRA linked to this task.
>
> KMeans
> * setEpsilon
> * setInitializationSteps
>
> KMeansModel
> * computeCost
> * k
>
> GaussianMixture
> * setInitialModel
>
> GaussianMixtureModel
> * k
>
> Completely missing items which should be fixed in separate JIRAs (which have been created and linked to the umbrella JIRA)
> * LDA
> * PowerIterationClustering
> * StreamingKMeans
[jira] [Commented] (SPARK-6612) Python KMeans parity
[ https://issues.apache.org/jira/browse/SPARK-6612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388000#comment-14388000 ]

Hrishikesh commented on SPARK-6612:

Please assign this ticket to me.

> Python KMeans parity
>
> Key: SPARK-6612
> URL: https://issues.apache.org/jira/browse/SPARK-6612
> Project: Spark
> Issue Type: Improvement
> Components: MLlib, PySpark
> Affects Versions: 1.3.0
> Reporter: Joseph K. Bradley
> Priority: Minor
>
> This is a subtask of [SPARK-6258] for the Python API of KMeans. These items are missing:
>
> KMeans
> * setEpsilon
> * setInitializationSteps
>
> KMeansModel
> * computeCost
> * k
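As context for the two KMeansModel items listed as missing, here is a minimal plain-Python sketch of what they report. In the Scala API, computeCost returns the sum of squared distances from each point to its nearest cluster center, and k is the number of clusters. The helper names nearest_center and compute_cost and the sample data are hypothetical and do not come from Spark; the sketch only illustrates the quantity, not how the Python wrapper would be implemented.

```python
def nearest_center(point, centers):
    """Return (index, squared_distance) of the center closest to point."""
    best_index, best_dist = -1, float("inf")
    for index, center in enumerate(centers):
        dist = sum((p - c) ** 2 for p, c in zip(point, center))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index, best_dist


def compute_cost(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(nearest_center(point, centers)[1] for point in points)


# Hypothetical model: two cluster centers in two dimensions.
centers = [(0.0, 0.0), (10.0, 10.0)]
points = [(0.0, 1.0), (1.0, 0.0), (9.0, 10.0), (10.0, 12.0)]

k = len(centers)                      # number of clusters
cost = compute_cost(points, centers)  # within-cluster sum of squared errors
print(k, cost)
```

A lower cost for the same k means the centers fit the data more tightly, which is why the value is commonly used to compare runs or to pick k.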