[GitHub] spark issue #14653: [SPARK-10931][PYSPARK][ML] PySpark ML Models should cont...
Github user evanyc15 commented on the issue: https://github.com/apache/spark/pull/14653 Hey @jkbradley the checkParams method already exists in the Python side. It's defined in the tests.py DefaultValuesTests class and is being called by test_java_params. I'm removing the param testing from the Python Doctests now and will be implementing the Unit test in one of the classes for now. Once approved, I will then implement the Unit test in the remaining classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14653: [SPARK-10931][PYSPARK][ML] PySpark ML Models should cont...
Github user evanyc15 commented on the issue: https://github.com/apache/spark/pull/14653 @MLnick @jkbradley Do you mind merging the PR? Thank you
[GitHub] spark issue #14653: [SPARK-10931][PYSPARK][ML] PySpark ML Models should cont...
Github user evanyc15 commented on the issue: https://github.com/apache/spark/pull/14653 @davidnavas Hi David, I have rebased and pushed again. Could you tell Jenkins to re-test the PR? Thank you
[GitHub] spark issue #14653: [SPARK-10931][PYSPARK][ML] PySpark ML Models should cont...
Github user evanyc15 commented on the issue: https://github.com/apache/spark/pull/14653 @holdenk I just rebased and pushed again. Hopefully, Jenkins passes this time
[GitHub] spark issue #14653: [SPARK-10931][PYSPARK][ML] PySpark ML Models should cont...
Github user evanyc15 commented on the issue: https://github.com/apache/spark/pull/14653 Hey Joseph, I've resolved the merge conflicts. Can you please test?
[GitHub] spark issue #14653: [SPARK-10931][PYSPARK][ML] PySpark ML Models should cont...
Github user evanyc15 commented on the issue: https://github.com/apache/spark/pull/14653 Sounds good Joseph. I'll resolve the conflicts.
[GitHub] spark issue #14653: [SPARK-10931][PYSPARK][ML] PySpark ML Models should cont...
Github user evanyc15 commented on the issue: https://github.com/apache/spark/pull/14653 @MechCoder I made the changes for emap -> estimator_paramMap, mmap -> model_paramMap, and (param, value) -> param, value.
[GitHub] spark issue #14653: [SPARK-10931][PYSPARK][ML] PySpark ML Models should cont...
Github user evanyc15 commented on the issue: https://github.com/apache/spark/pull/14653 CC @MLnick
[GitHub] spark issue #10270: [SPARK-10931][PYSPARK][ML] PySpark ML Models should cont...
Github user evanyc15 commented on the issue: https://github.com/apache/spark/pull/10270 New Pull Request #14653 created for this.
[GitHub] spark pull request #14653: [SPARK-10931][PYSPARK][ML] PySpark ML Models shou...
GitHub user evanyc15 opened a pull request:

    https://github.com/apache/spark/pull/14653

    [SPARK-10931][PYSPARK][ML] PySpark ML Models should contain Param values

    ## What changes were proposed in this pull request?

    Changed PySpark models to include the Param values. Refer to the closed PR 10270 for additional information.

    ## How was this patch tested?

    Tested using Python doctests

    ## Changesets:

    Estimator UID is being copied correctly to the Transformer model objects and params now, working on Doctests
    Changed the way parameters are copied from the Estimator to Transformer
    Checkpoint, switching back to inheritance method
    Working on DocTests
    Implemented Doctests for Recommendation, Clustering, Classification (except RandomForestClassifier), Evaluation, Tuning, Regression (except RandomRegression)
    Ready for Code Review
    Code Review changeset #1

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/evanyc15/spark SPARK-10931-pyspark-mllib

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14653.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14653

commit 2f9417ca3419afb421f4e86d082325fb5b10bbbf
Author: Evan Chen <ch...@us.ibm.com>
Date: 2015-11-19T03:54:57Z

    Copied parameters over from Estimator to Transformer

    Estimator UID is being copied correctly to the Transformer model objects and params now, working on Doctests
    Changed the way parameters are copied from the Estimator to Transformer
    Checkpoint, switching back to inheritance method
    Working on DocTests
    Implemented Doctests for Recommendation, Clustering, Classification (except RandomForestClassifier), Evaluation, Tuning, Regression (except RandomRegression)
    Ready for Code Review
    Code Review changeset #1
[GitHub] spark pull request #10270: [SPARK-10931][PYSPARK][ML] PySpark ML Models shou...
Github user evanyc15 closed the pull request at: https://github.com/apache/spark/pull/10270
[GitHub] spark issue #10270: [SPARK-10931][PYSPARK][ML] PySpark ML Models should cont...
Github user evanyc15 commented on the issue: https://github.com/apache/spark/pull/10270 Yeah, I'll take a look. Thanks
[GitHub] spark issue #10270: [SPARK-10931][PYSPARK][ML] PySpark ML Models should cont...
Github user evanyc15 commented on the issue: https://github.com/apache/spark/pull/10270 Updated branch to the newest Spark changes
[GitHub] spark pull request: [SPARK-10931][PYSPARK][ML] PySpark ML Models s...
Github user evanyc15 commented on the pull request: https://github.com/apache/spark/pull/10270#issuecomment-218025352 CC @MLnick. Can you please review my code? Thank you
[GitHub] spark pull request: [SPARK-10931][PYSPARK][ML] PySpark ML Models s...
Github user evanyc15 commented on the pull request: https://github.com/apache/spark/pull/10270#issuecomment-217942891 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-10931][PYSPARK][ML] PySpark ML Models s...
Github user evanyc15 commented on the pull request: https://github.com/apache/spark/pull/10270#issuecomment-217027553 Hey Joseph, Just pushed in the new changes. Thank you
[GitHub] spark pull request: [SPARK-10931][PYSPARK][ML] PySpark ML Models s...
Github user evanyc15 commented on the pull request: https://github.com/apache/spark/pull/10270#issuecomment-215257517 Hey Joseph, I'll work on getting the merge conflicts resolved. Thank you
[GitHub] spark pull request: [SPARK-12376][TESTS] Spark Streaming Java8APIS...
GitHub user evanyc15 opened a pull request:

    https://github.com/apache/spark/pull/10336

    [SPARK-12376][TESTS] Spark Streaming Java8APISuite fails in assertOrderInvariantEquals method

    org.apache.spark.streaming.Java8APISuite.java fails because the assertOrderInvariantEquals method tries to sort an immutable list.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/evanyc15/spark SPARK-12376-StreamingJavaAPISuite

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10336.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10336

commit 28ff404c59aeff1b6604b70fb153d4d33e21635b
Author: Evan Chen <ch...@us.ibm.com>
Date: 2015-12-16T20:27:51Z

    Changes to Spark Streaming Java8APISuite.java
[GitHub] spark pull request: [SPARK-12376][TESTS] Spark Streaming Java8APIS...
Github user evanyc15 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10336#discussion_r47838157

    --- Diff: extras/java8-tests/src/test/java/org/apache/spark/streaming/Java8APISuite.java ---
    @@ -440,8 +441,13 @@ public void testPairFlatMap() {
       public static <T extends Comparable<T>> void assertOrderInvariantEquals(
         List<List<T>> expected, List<List<T>> actual) {
         expected.forEach((List<T> list) -> Collections.sort(list));
    -    actual.forEach((List<T> list) -> Collections.sort(list));
    -    Assert.assertEquals(expected, actual);
    +    ArrayList<ArrayList<T>> sortedActual = new ArrayList<ArrayList<T>>();
    +    actual.forEach((List<T> list) -> {
    --- End diff --

    Hey srowen, I feel this could be more confusing to follow. Does your alternative offer any performance improvement? Thank you
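The underlying problem in this diff (an in-place sort fails when the inner lists are immutable) can be illustrated with a small Python analogue. This is only a sketch and not the Java code under review: the function name `assert_order_invariant_equals` mirrors the Java helper but is hypothetical here, and tuples stand in for immutable lists.

```python
from typing import Iterable, Sequence


def assert_order_invariant_equals(
    expected: Iterable[Sequence], actual: Iterable[Sequence]
) -> None:
    """Compare two batches of sequences, ignoring order inside each inner sequence.

    Hypothetical helper: sorted() builds a new list for every inner sequence,
    so the inputs are never mutated and may be immutable (e.g. tuples).
    """
    sorted_expected = [sorted(batch) for batch in expected]
    sorted_actual = [sorted(batch) for batch in actual]
    assert sorted_expected == sorted_actual, (
        f"expected {sorted_expected}, got {sorted_actual}"
    )


# Inner sequences are tuples, which cannot be sorted in place.
assert_order_invariant_equals([(3, 1, 2), (5, 4)], [(2, 3, 1), (4, 5)])
```

Sorting a copy costs one extra allocation per inner list, which is usually negligible in a test helper.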
[GitHub] spark pull request: [SPARK-10931][PYSPARK][ML] PySpark ML Models s...
Github user evanyc15 commented on the pull request: https://github.com/apache/spark/pull/10270#issuecomment-164570713 Hey all, I've resolved the merge conflicts. Thanks,
[GitHub] spark pull request: [SPARK-10931][PYSPARK][ML] PySpark ML Models s...
GitHub user evanyc15 opened a pull request:

    https://github.com/apache/spark/pull/10270

    [SPARK-10931][PYSPARK][ML] PySpark ML Models should contain Param values

    PySpark spark.ml Models are generally wrappers around Java objects and do not even contain Param values. This JIRA is for copying the Param values from the Estimator to the model. This can likely be solved by modifying Estimator.fit to copy Param values, but should also include proper unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/evanyc15/spark SPARK-10931-pyspark-mllib

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10270.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10270

commit 53062d1edc08bf89b7cdb46969c182aa0f26dbe4
Author: Evan Chen <ch...@us.ibm.com>
Date: 2015-11-19T03:54:57Z

    Copied parameters over from Estimator to Transformer

commit f0b124a1f67037f854d1e7891091ba4d1cdcecc8
Author: Evan Chen <ch...@us.ibm.com>
Date: 2015-11-24T00:44:53Z

    Estimator UID is being copied correctly to the Transformer model objects and params now, working on Doctests

commit 1c5a791775f7f078b3a488c5ea88beed29c2a8d7
Author: Evan Chen <ch...@us.ibm.com>
Date: 2015-11-25T00:16:32Z

    Changed the way parameters are copied from the Estimator to Transformer

commit 332cc670b61c5bd19cb5cea705a307440fc92868
Author: Evan Chen <ch...@us.ibm.com>
Date: 2015-12-01T22:51:24Z

    Checkpoint, switching back to inheritance method

commit 07fbbfd91692ecb61b0e8659ee296dfaf3150f13
Author: Evan Chen <ch...@us.ibm.com>
Date: 2015-12-02T00:54:41Z

    Working on DocTests

commit d86e1dfb33aadfae3a151edf0ceaa6593cfa074e
Author: Evan Chen <ch...@us.ibm.com>
Date: 2015-12-03T02:07:05Z

    Implemented Doctests for Recommendation, Clustering, Classification (except RandomForestClassifier), Evaluation, Tuning, Regression (except RandomRegression)

commit a5902cfc6622eb4c6c5d83a489f6693b08f04518
Author: Evan Chen <ch...@us.ibm.com>
Date: 2015-12-04T23:20:42Z

    Ready for Code Review

commit 24dd45a30b75c9b7e33edf37993b2277f5cbe606
Author: Evan Chen <ch...@us.ibm.com>
Date: 2015-12-11T01:35:40Z

    Code Review changeset #1
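The idea described in this pull request, having `fit` copy the Param values and UID from the Estimator to the resulting model, might look roughly like the sketch below. It deliberately avoids PySpark so that it runs on its own: `ToyParams`, `ToyEstimator`, `ToyModel`, and the `maxIter` param are hypothetical stand-ins, not the classes or the mechanism used in the actual patch.

```python
import copy


class ToyParams:
    """Hypothetical stand-in for a Params holder: a uid plus a name -> value map."""

    def __init__(self, uid):
        self.uid = uid
        self._param_map = {}

    def set(self, name, value):
        self._param_map[name] = value
        return self

    def get(self, name):
        return self._param_map[name]


class ToyModel(ToyParams):
    """Hypothetical fitted model; starts without any Param values."""


class ToyEstimator(ToyParams):
    def _fit(self, dataset):
        # Stand-in for real training: returns a bare model with no params.
        return ToyModel(uid="model_without_params")

    def fit(self, dataset):
        model = self._fit(dataset)
        # Copy the estimator's uid and a snapshot of its params onto the model,
        # so later changes to the estimator do not leak into the fitted model.
        model.uid = self.uid
        model._param_map = copy.copy(self._param_map)
        return model


estimator = ToyEstimator("toy_1").set("maxIter", 5)
model = estimator.fit(dataset=[])
print(model.uid, model.get("maxIter"))
```

Copying in `fit` keeps the change in one place in the base class, which is one reason the description suggests modifying `Estimator.fit`; a shallow copy is assumed to be enough here because the toy param values are immutable.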
[GitHub] spark pull request: [SPARK-10779][PYSPARK][MLLIB] Set initialModel...
Github user evanyc15 commented on the pull request: https://github.com/apache/spark/pull/8967#issuecomment-146285890 Jenkins test this
[GitHub] spark pull request: [SPARK-10779][PYSPARK][MLLIB] Set initialModel...
Github user evanyc15 commented on the pull request: https://github.com/apache/spark/pull/8967#issuecomment-146029714 Jenkins test this
[GitHub] spark pull request: [SPARK-10779][PYSPARK][MLLIB] Set initialModel...
Github user evanyc15 commented on the pull request: https://github.com/apache/spark/pull/8967#issuecomment-146021824 Hey jkbradley, I have pushed another commit that removes the excess parameters in the Doctest and adds an if statement that checks whether initialModel is not None and is not an instance of KMeansModel. Thanks
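The type check described in this comment could be sketched as follows. This is an illustration under assumptions, not the code from the patch: the `KMeansModel` class below is a minimal stand-in so the snippet runs without PySpark, and the helper name `validate_initial_model` and the choice of `TypeError` are hypothetical.

```python
class KMeansModel:
    """Minimal stand-in for pyspark.mllib.clustering.KMeansModel (assumption)."""

    def __init__(self, centers):
        self.centers = centers


def validate_initial_model(initial_model):
    """Accept None (no initial model) or a KMeansModel; reject anything else."""
    if initial_model is not None and not isinstance(initial_model, KMeansModel):
        raise TypeError(
            "initialModel is of type %s; it needs to be a KMeansModel"
            % type(initial_model).__name__
        )
    return initial_model


validate_initial_model(None)                         # allowed: no initial model
validate_initial_model(KMeansModel([[0.0], [1.0]]))  # allowed: correct type
```

Checking `is not None` first keeps the parameter optional while still failing early on a wrong type, before any work is handed to the JVM side.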
[GitHub] spark pull request: [SPARK-10779][PYSPARK][MLLIB] Set initialModel...
GitHub user evanyc15 opened a pull request:

    https://github.com/apache/spark/pull/8967

    [SPARK-10779][PYSPARK][MLLIB] Set initialModel for KMeans model in PySpark (spark.mllib)

    Provide initialModel param for pyspark.mllib.clustering.KMeans

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/evanyc15/spark SPARK-10779-pyspark-mllib

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8967.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #8967

commit 3287f2a52410357cb75064e03d2a39e2eb1317d3
Author: Evan Chen <ch...@us.ibm.com>
Date: 2015-09-30T01:17:09Z

    Modified PythonMMLLibAPI to set initialModel for KMeansModel in PySpark

commit ce2e972bbfb1fd899dce4ca7b35aa486f96facce
Author: Evan Chen <ch...@us.ibm.com>
Date: 2015-09-30T20:38:09Z

    Added if statements for type checking

commit 45780d3f7d335b1449528835401022fbf2b1b345
Author: Evan Chen <ch...@us.ibm.com>
Date: 2015-09-30T22:23:12Z

    Working on Doctests now

commit d2380ac11c26d6848889f83321f1703e31efdbf1
Author: Evan Chen <ch...@us.ibm.com>
Date: 2015-10-02T19:36:32Z

    Doctests written, ready for pull request