[jira] [Commented] (SPARK-20040) Python API for ml.stat.ChiSquareTest

2017-03-22 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15936902#comment-15936902 ] Bago Amirbekian commented on SPARK-20040: - I'd like to work on this. > Python API for

[jira] [Created] (SPARK-20861) Pyspark CrossValidator & TrainValidationSplit should delegate parameter looping to estimators

2017-05-23 Thread Bago Amirbekian (JIRA)
Bago Amirbekian created SPARK-20861: --- Summary: Pyspark CrossValidator & TrainValidationSplit should delegate parameter looping to estimators Key: SPARK-20861 URL:

[jira] [Commented] (SPARK-20861) Pyspark CrossValidator & TrainValidationSplit should delegate parameter looping to estimators

2017-05-23 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16022137#comment-16022137 ] Bago Amirbekian commented on SPARK-20861: - [~josephkb] > Pyspark CrossValidator &

[jira] [Issue Comment Deleted] (SPARK-20861) Pyspark CrossValidator & TrainValidationSplit should delegate parameter looping to estimators

2017-05-23 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-20861: Comment: was deleted (was: I've made a PR to address this issue:

[jira] [Commented] (SPARK-20861) Pyspark CrossValidator & TrainValidationSplit should delegate parameter looping to estimators

2017-05-23 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16022134#comment-16022134 ] Bago Amirbekian commented on SPARK-20861: - I've made a PR to address this issue:

[jira] [Created] (SPARK-20862) LogisticRegressionModel throws TypeError

2017-05-23 Thread Bago Amirbekian (JIRA)
Bago Amirbekian created SPARK-20862: --- Summary: LogisticRegressionModel throws TypeError Key: SPARK-20862 URL: https://issues.apache.org/jira/browse/SPARK-20862 Project: Spark Issue Type:

[jira] [Commented] (SPARK-21926) Some transformers in spark.ml.feature fail when trying to transform streaming dataframes

2017-10-05 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193392#comment-16193392 ] Bago Amirbekian commented on SPARK-21926: - [~mslipper] The trickiest thing about 1 (b) is knowing

[jira] [Commented] (SPARK-13030) Change OneHotEncoder to Estimator

2017-10-05 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16193389#comment-16193389 ] Bago Amirbekian commented on SPARK-13030: - Just so I'm clear, does multi-column in this context

[jira] [Updated] (SPARK-22232) Row objects in pyspark created using the `Row(**kwars)` syntax do not get serialized/deserialized properly

2017-10-13 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-22232: Component/s: SQL > Row objects in pyspark created using the `Row(**kwars)` syntax do not

[jira] [Created] (SPARK-21926) Some transformers in spark.ml.feature fail when trying to transform streaming dataframes

2017-09-05 Thread Bago Amirbekian (JIRA)
Bago Amirbekian created SPARK-21926: --- Summary: Some transformers in spark.ml.feature fail when trying to transform streaming dataframes Key: SPARK-21926 URL: https://issues.apache.org/jira/browse/SPARK-21926

[jira] [Updated] (SPARK-22232) Row objects in pyspark using the `Row(**kwars)` syntax do not get serialized/deserialized properly

2017-10-09 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-22232: Description: The fields in a Row object created from a dict (ie `Row(**kwargs)`) should be

[jira] [Commented] (SPARK-22232) Row objects in pyspark using the `Row(**kwars)` syntax do not get serialized/deserialized properly

2017-10-09 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198068#comment-16198068 ] Bago Amirbekian commented on SPARK-22232: - Full trace: {code:none} [Row(a=u'a', c=3.0, b=2),

[jira] [Updated] (SPARK-22232) Row objects in pyspark using the `Row(**kwars)` syntax do not get serialized/deserialized properly

2017-10-09 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-22232: Description: The fields in a Row object created from a dict (ie {{Row(**kwargs)}}) should

[jira] [Updated] (SPARK-22232) Row objects in pyspark using the `Row(**kwars)` syntax do not get serialized/deserialized properly

2017-10-09 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-22232: Description: The fields in a Row object created from a dict (ie `Row(**kwargs)`) should be

[jira] [Created] (SPARK-22232) Row objects in pyspark using the `Row(**kwars)` syntax do not get serialized/deserialized properly

2017-10-09 Thread Bago Amirbekian (JIRA)
Bago Amirbekian created SPARK-22232: --- Summary: Row objects in pyspark using the `Row(**kwars)` syntax do not get serialized/deserialized properly Key: SPARK-22232 URL:

[jira] [Updated] (SPARK-22232) Row objects in pyspark using the `Row(**kwars)` syntax do not get serialized/deserialized properly

2017-10-09 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-22232: Description: The fields in a Row object created from a dict (ie `Row(**kwargs)`) should be

[jira] [Updated] (SPARK-22232) Row objects in pyspark using the `Row(**kwars)` syntax do not get serialized/deserialized properly

2017-10-09 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-22232: Description: The fields in a Row object created from a dict (ie {{Row(**kwargs)}}) should

[jira] [Updated] (SPARK-22232) Row objects in pyspark using the `Row(**kwars)` syntax do not get serialized/deserialized properly

2017-10-09 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-22232: Description: The fields in a Row object created from a dict (ie {{Row(**kwargs)}}) should

[jira] [Updated] (SPARK-22232) Row objects in pyspark using the `Row(**kwars)` syntax do not get serialized/deserialized properly

2017-10-09 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-22232: Description: The fields in a Row object created from a dict (ie {{Row(**kwargs)}}) should

[jira] [Updated] (SPARK-22232) Row objects in pyspark using the `Row(**kwars)` syntax do not get serialized/deserialized properly

2017-10-09 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-22232: Description: The fields in a Row object created from a dict (ie {{Row(**kwargs)}}) should

[jira] [Updated] (SPARK-22232) Row objects in pyspark created using the `Row(**kwars)` syntax do not get serialized/deserialized properly

2017-10-10 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-22232: Summary: Row objects in pyspark created using the `Row(**kwars)` syntax do not get

[jira] [Updated] (SPARK-22811) pyspark.ml.tests is missing a py4j import.

2017-12-15 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-22811: Priority: Minor (was: Major) > pyspark.ml.tests is missing a py4j import. >

[jira] [Created] (SPARK-22811) pyspark.ml.tests is missing a py4j import.

2017-12-15 Thread Bago Amirbekian (JIRA)
Bago Amirbekian created SPARK-22811: --- Summary: pyspark.ml.tests is missing a py4j import. Key: SPARK-22811 URL: https://issues.apache.org/jira/browse/SPARK-22811 Project: Spark Issue Type:

[jira] [Commented] (SPARK-22126) Fix model-specific optimization support for ML tuning

2017-12-18 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295964#comment-16295964 ] Bago Amirbekian commented on SPARK-22126: - Anyone who's following might want to scan the design

[jira] [Comment Edited] (SPARK-22126) Fix model-specific optimization support for ML tuning

2017-12-18 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295964#comment-16295964 ] Bago Amirbekian edited comment on SPARK-22126 at 12/19/17 12:55 AM:

[jira] [Commented] (SPARK-22346) Update VectorAssembler to work with Structured Streaming

2017-11-10 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248049#comment-16248049 ] Bago Amirbekian commented on SPARK-22346: - I think [~josephkb]'s version of Option 3 makes the

[jira] [Commented] (SPARK-20586) Add deterministic to ScalaUDF

2017-11-21 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261377#comment-16261377 ] Bago Amirbekian commented on SPARK-20586: - Is there some documentation somewhere about the right

[jira] [Commented] (SPARK-20586) Add deterministic to ScalaUDF

2017-11-21 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261393#comment-16261393 ] Bago Amirbekian commented on SPARK-20586: - Also a follow-up question, are the performance

[jira] [Created] (SPARK-22734) Create Python API for VectorSizeHint

2017-12-07 Thread Bago Amirbekian (JIRA)
Bago Amirbekian created SPARK-22734: --- Summary: Create Python API for VectorSizeHint Key: SPARK-22734 URL: https://issues.apache.org/jira/browse/SPARK-22734 Project: Spark Issue Type:

[jira] [Created] (SPARK-22735) Add VectorSizeHint to ML features documentation

2017-12-07 Thread Bago Amirbekian (JIRA)
Bago Amirbekian created SPARK-22735: --- Summary: Add VectorSizeHint to ML features documentation Key: SPARK-22735 URL: https://issues.apache.org/jira/browse/SPARK-22735 Project: Spark Issue

[jira] [Comment Edited] (SPARK-22126) Fix model-specific optimization support for ML tuning

2017-12-05 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279218#comment-16279218 ] Bago Amirbekian edited comment on SPARK-22126 at 12/5/17 9:58 PM: -- I

[jira] [Comment Edited] (SPARK-22126) Fix model-specific optimization support for ML tuning

2017-12-05 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279218#comment-16279218 ] Bago Amirbekian edited comment on SPARK-22126 at 12/5/17 9:53 PM: -- I

[jira] [Commented] (SPARK-22126) Fix model-specific optimization support for ML tuning

2017-12-05 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279218#comment-16279218 ] Bago Amirbekian commented on SPARK-22126: - I started a discussion about potential to this issue

[jira] [Commented] (SPARK-22126) Fix model-specific optimization support for ML tuning

2017-12-07 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283145#comment-16283145 ] Bago Amirbekian commented on SPARK-22126: - Joseph, the way I read your comment is to say that we

[jira] [Comment Edited] (SPARK-22126) Fix model-specific optimization support for ML tuning

2017-12-05 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279537#comment-16279537 ] Bago Amirbekian edited comment on SPARK-22126 at 12/6/17 2:19 AM: --

[jira] [Commented] (SPARK-22126) Fix model-specific optimization support for ML tuning

2017-12-05 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16279537#comment-16279537 ] Bago Amirbekian commented on SPARK-22126: - [~WeichenXu123] Sorry I misunderstood, I thought you

[jira] [Updated] (SPARK-22346) Update VectorAssembler to work with StreamingDataframes

2017-10-25 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-22346: Description: The issue In batch mode, VectorAssembler can take multiple columns of

[jira] [Updated] (SPARK-21926) Compatibility between ML Transformers and Structured Streaming

2017-10-25 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-21926: Description: We've run into a few cases where ML components don't play nice with streaming

[jira] [Updated] (SPARK-21926) Compatibility between ML Transformers and Structured Streaming

2017-10-25 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-21926: Description: We've run into a few cases where ML components don't play nice with streaming

[jira] [Updated] (SPARK-21926) Compatibility between ML Transformers and Structured Streaming

2017-10-25 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-21926: Description: We've run into a few cases where ML components don't play nice with streaming

[jira] [Updated] (SPARK-21926) Compatibility between ML Transformers and Structured Streaming

2017-10-25 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-21926: Description: We've run into a few cases where ML components don't play nice with streaming

[jira] [Updated] (SPARK-21926) Compatibility between ML Transformers and Structured Streaming

2017-10-25 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-21926: Description: We've run into a few cases where ML components don't play nice with streaming

[jira] [Commented] (SPARK-22346) Update VectorAssembler to work with StreamingDataframes

2017-10-25 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219109#comment-16219109 ] Bago Amirbekian commented on SPARK-22346: - Nick I see that options as a stepping stone to option

[jira] [Updated] (SPARK-22346) Update VectorAssembler to work with StreamingDataframes

2017-10-24 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-22346: Description: The issue In batch mode, VectorAssembler can take multiple columns of

[jira] [Updated] (SPARK-22346) Update VectorAssembler to work with StreamingDataframes

2017-10-24 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-22346: Description: The issue In batch mode, VectorAssembler can take multiple columns of

[jira] [Updated] (SPARK-22346) Update VectorAssembler to work with StreamingDataframes

2017-10-24 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-22346: Description: The issue In batch mode, VectorAssembler can take multiple columns of

[jira] [Created] (SPARK-22346) Update VectorAssembler to work with StreamingDataframes

2017-10-24 Thread Bago Amirbekian (JIRA)
Bago Amirbekian created SPARK-22346: --- Summary: Update VectorAssembler to work with StreamingDataframes Key: SPARK-22346 URL: https://issues.apache.org/jira/browse/SPARK-22346 Project: Spark

[jira] [Updated] (SPARK-22346) Update VectorAssembler to work with StreamingDataframes

2017-10-24 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-22346: Description: The issue In batch mode, VectorAssembler can take multiple columns of

[jira] [Updated] (SPARK-22346) Update VectorAssembler to work with StreamingDataframes

2017-10-24 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-22346: Description: The issue In batch mode, VectorAssembler can take multiple columns of

[jira] [Commented] (SPARK-22126) Fix model-specific optimization support for ML tuning

2018-01-04 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312264#comment-16312264 ] Bago Amirbekian commented on SPARK-22126: - [~bryanc] thanks for taking the time to put together

[jira] [Comment Edited] (SPARK-22126) Fix model-specific optimization support for ML tuning

2018-01-05 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313958#comment-16313958 ] Bago Amirbekian edited comment on SPARK-22126 at 1/5/18 9:48 PM: - > Do

[jira] [Commented] (SPARK-22126) Fix model-specific optimization support for ML tuning

2018-01-05 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313958#comment-16313958 ] Bago Amirbekian commented on SPARK-22126: - > Do you think it's possible to put this kind of

[jira] [Created] (SPARK-23037) RFormula should not use deprecated OneHotEncoder and should include VectorSizeHint in pipeline

2018-01-10 Thread Bago Amirbekian (JIRA)
Bago Amirbekian created SPARK-23037: --- Summary: RFormula should not use deprecated OneHotEncoder and should include VectorSizeHint in pipeline Key: SPARK-23037 URL:

[jira] [Created] (SPARK-23045) Have RFormula use OneHotEstimator

2018-01-11 Thread Bago Amirbekian (JIRA)
Bago Amirbekian created SPARK-23045: --- Summary: Have RFormula use OneHotEstimator Key: SPARK-23045 URL: https://issues.apache.org/jira/browse/SPARK-23045 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-23037) RFormula should not use deprecated OneHotEncoder and should include VectorSizeHint in pipeline

2018-01-11 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-23037: Affects Version/s: (was: 2.2.0) 2.3.0 > RFormula should not use

[jira] [Updated] (SPARK-23045) Have RFormula use OneHoEncoderEstimator

2018-01-11 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-23045: Summary: Have RFormula use OneHoEncoderEstimator (was: Have RFormula use OneHotEstimator)

[jira] [Created] (SPARK-23046) Have RFormula include VectorSizeHint in pipeline

2018-01-11 Thread Bago Amirbekian (JIRA)
Bago Amirbekian created SPARK-23046: --- Summary: Have RFormula include VectorSizeHint in pipeline Key: SPARK-23046 URL: https://issues.apache.org/jira/browse/SPARK-23046 Project: Spark Issue

[jira] [Commented] (SPARK-23109) ML 2.3 QA: API: Python API coverage

2018-01-25 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340399#comment-16340399 ] Bago Amirbekian commented on SPARK-23109: - [~bryanc] One reason the python API might be different

[jira] [Commented] (SPARK-23105) Spark MLlib, GraphX 2.3 QA umbrella

2018-01-25 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340448#comment-16340448 ] Bago Amirbekian commented on SPARK-23105: - [~mlnick] We can update the sub tasks to target 2.3 if

[jira] [Commented] (SPARK-23106) ML, Graph 2.3 QA: API: Binary incompatible changes

2018-01-25 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340234#comment-16340234 ] Bago Amirbekian commented on SPARK-23106: - I ran mina in branch-2.3 and got the following output:

[jira] [Comment Edited] (SPARK-23106) ML, Graph 2.3 QA: API: Binary incompatible changes

2018-01-25 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340234#comment-16340234 ] Bago Amirbekian edited comment on SPARK-23106 at 1/25/18 10:49 PM: --- I

[jira] [Comment Edited] (SPARK-23106) ML, Graph 2.3 QA: API: Binary incompatible changes

2018-01-25 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340234#comment-16340234 ] Bago Amirbekian edited comment on SPARK-23106 at 1/25/18 10:49 PM: --- I

[jira] [Resolved] (SPARK-23106) ML, Graph 2.3 QA: API: Binary incompatible changes

2018-01-25 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian resolved SPARK-23106. - Resolution: Resolved > ML, Graph 2.3 QA: API: Binary incompatible changes >

[jira] [Comment Edited] (SPARK-23106) ML, Graph 2.3 QA: API: Binary incompatible changes

2018-01-25 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340234#comment-16340234 ] Bago Amirbekian edited comment on SPARK-23106 at 1/25/18 10:49 PM: --- I

[jira] [Created] (SPARK-23048) Update mllib docs to replace OneHotEncoder with OneHotEncoderEstimator

2018-01-11 Thread Bago Amirbekian (JIRA)
Bago Amirbekian created SPARK-23048: --- Summary: Update mllib docs to replace OneHotEncoder with OneHotEncoderEstimator Key: SPARK-23048 URL: https://issues.apache.org/jira/browse/SPARK-23048

[jira] [Created] (SPARK-23377) Bucketizer with multiple columns persistence bug

2018-02-09 Thread Bago Amirbekian (JIRA)
Bago Amirbekian created SPARK-23377: --- Summary: Bucketizer with multiple columns persistence bug Key: SPARK-23377 URL: https://issues.apache.org/jira/browse/SPARK-23377 Project: Spark Issue

[jira] [Updated] (SPARK-23377) Bucketizer with multiple columns persistence bug

2018-02-09 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-23377: Description: A Bucketizer with multiple input/output columns get "inputCol" set to the

[jira] [Commented] (SPARK-23265) Update multi-column error handling logic in QuantileDiscretizer

2018-02-15 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366277#comment-16366277 ] Bago Amirbekian commented on SPARK-23265: - What's the status of this? Will this be a change in

[jira] [Created] (SPARK-22922) Python API for fitMultiple

2017-12-28 Thread Bago Amirbekian (JIRA)
Bago Amirbekian created SPARK-22922: --- Summary: Python API for fitMultiple Key: SPARK-22922 URL: https://issues.apache.org/jira/browse/SPARK-22922 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-22949) Reduce memory requirement for TrainValidationSplit

2018-01-03 Thread Bago Amirbekian (JIRA)
Bago Amirbekian created SPARK-22949: --- Summary: Reduce memory requirement for TrainValidationSplit Key: SPARK-22949 URL: https://issues.apache.org/jira/browse/SPARK-22949 Project: Spark

[jira] [Created] (SPARK-25149) ParallelPersonalizedPageRank raises an error if vertexIDs are > MaxInt

2018-08-17 Thread Bago Amirbekian (JIRA)
Bago Amirbekian created SPARK-25149: --- Summary: ParallelPersonalizedPageRank raises an error if vertexIDs are > MaxInt Key: SPARK-25149 URL: https://issues.apache.org/jira/browse/SPARK-25149

[jira] [Created] (SPARK-25268) runParallelPersonalizedPageRank throws serialization Exception

2018-08-28 Thread Bago Amirbekian (JIRA)
Bago Amirbekian created SPARK-25268: --- Summary: runParallelPersonalizedPageRank throws serialization Exception Key: SPARK-25268 URL: https://issues.apache.org/jira/browse/SPARK-25268 Project: Spark

[jira] [Updated] (SPARK-25149) Personalized Page Rank raises an error if vertexIDs are > MaxInt

2018-08-17 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-25149: Summary: Personalized Page Rank raises an error if vertexIDs are > MaxInt (was:

[jira] [Created] (SPARK-24852) Have spark.ml training use updated `Instrumentation` APIs.

2018-07-18 Thread Bago Amirbekian (JIRA)
Bago Amirbekian created SPARK-24852: --- Summary: Have spark.ml training use updated `Instrumentation` APIs. Key: SPARK-24852 URL: https://issues.apache.org/jira/browse/SPARK-24852 Project: Spark

[jira] [Created] (SPARK-24747) Make spark.ml.util.Instrumentation class more flexible

2018-07-05 Thread Bago Amirbekian (JIRA)
Bago Amirbekian created SPARK-24747: --- Summary: Make spark.ml.util.Instrumentation class more flexible Key: SPARK-24747 URL: https://issues.apache.org/jira/browse/SPARK-24747 Project: Spark

[jira] [Comment Edited] (SPARK-23471) RandomForestClassificationModel save() - incorrect metadata

2018-02-27 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379202#comment-16379202 ] Bago Amirbekian edited comment on SPARK-23471 at 2/27/18 8:04 PM: --

[jira] [Comment Edited] (SPARK-23471) RandomForestClassificationModel save() - incorrect metadata

2018-02-27 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379202#comment-16379202 ] Bago Amirbekian edited comment on SPARK-23471 at 2/27/18 8:04 PM: --

[jira] [Commented] (SPARK-23471) RandomForestClassificationModel save() - incorrect metadata

2018-02-27 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379202#comment-16379202 ] Bago Amirbekian commented on SPARK-23471: - [~Keepun] `train` is a protected API, it's called by

[jira] [Commented] (SPARK-19947) RFormulaModel always throws Exception on transforming data with NULL or Unseen labels

2018-02-27 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379282#comment-16379282 ] Bago Amirbekian commented on SPARK-19947: - I think this was resolved by

[jira] [Comment Edited] (SPARK-23333) SparkML VectorAssembler.transform slow when needing to invoke .first() on sorted DataFrame

2018-02-27 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379296#comment-16379296 ] Bago Amirbekian edited comment on SPARK-23333 at 2/27/18 9:24 PM: --

[jira] [Comment Edited] (SPARK-23471) RandomForestClassificationModel save() - incorrect metadata

2018-02-27 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379202#comment-16379202 ] Bago Amirbekian edited comment on SPARK-23471 at 2/27/18 8:07 PM: --

[jira] [Commented] (SPARK-23333) SparkML VectorAssembler.transform slow when needing to invoke .first() on sorted DataFrame

2018-02-27 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379296#comment-16379296 ] Bago Amirbekian commented on SPARK-23333: - [~MBALearnsToCode] you can use a `VectorSizeHint`

[jira] [Comment Edited] (SPARK-23471) RandomForestClassificationModel save() - incorrect metadata

2018-02-27 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379202#comment-16379202 ] Bago Amirbekian edited comment on SPARK-23471 at 2/27/18 8:06 PM: --

[jira] [Comment Edited] (SPARK-23471) RandomForestClassificationModel save() - incorrect metadata

2018-02-27 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379202#comment-16379202 ] Bago Amirbekian edited comment on SPARK-23471 at 2/27/18 8:06 PM: --

[jira] [Created] (SPARK-23686) Make better usage of org.apache.spark.ml.util.Instrumentation

2018-03-14 Thread Bago Amirbekian (JIRA)
Bago Amirbekian created SPARK-23686: --- Summary: Make better usage of org.apache.spark.ml.util.Instrumentation Key: SPARK-23686 URL: https://issues.apache.org/jira/browse/SPARK-23686 Project: Spark

[jira] [Created] (SPARK-23562) RFormula handleInvalid should handle invalid values in non-string columns.

2018-03-01 Thread Bago Amirbekian (JIRA)
Bago Amirbekian created SPARK-23562: --- Summary: RFormula handleInvalid should handle invalid values in non-string columns. Key: SPARK-23562 URL: https://issues.apache.org/jira/browse/SPARK-23562

[jira] [Created] (SPARK-25921) Python worker reuse causes Barrier tasks to run without BarrierTaskContext

2018-11-01 Thread Bago Amirbekian (JIRA)
Bago Amirbekian created SPARK-25921: --- Summary: Python worker reuse causes Barrier tasks to run without BarrierTaskContext Key: SPARK-25921 URL: https://issues.apache.org/jira/browse/SPARK-25921

[jira] [Commented] (SPARK-25921) Python worker reuse causes Barrier tasks to run without BarrierTaskContext

2018-11-01 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672293#comment-16672293 ] Bago Amirbekian commented on SPARK-25921: - [~mengxr] [~jiangxb1987] Could you have a look. >

[jira] [Updated] (SPARK-25921) Python worker reuse causes Barrier tasks to run without BarrierTaskContext

2018-11-01 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-25921: Description: Running a barrier job after a normal spark job causes the barrier job to run

[jira] [Updated] (SPARK-25921) Python worker reuse causes Barrier tasks to run without BarrierTaskContext

2018-11-01 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-25921: Description: Running a barrier job after a normal spark job causes the barrier job to run

[jira] [Created] (SPARK-27446) RBackend always uses default values for spark confs

2019-04-11 Thread Bago Amirbekian (JIRA)
Bago Amirbekian created SPARK-27446: --- Summary: RBackend always uses default values for spark confs Key: SPARK-27446 URL: https://issues.apache.org/jira/browse/SPARK-27446 Project: Spark

[jira] [Created] (SPARK-29692) SparkContext.defaultParallism should reflect resource limits when resource limits are set

2019-10-31 Thread Bago Amirbekian (Jira)
Bago Amirbekian created SPARK-29692: --- Summary: SparkContext.defaultParallism should reflect resource limits when resource limits are set Key: SPARK-29692 URL: https://issues.apache.org/jira/browse/SPARK-29692