Github user tengpeng commented on the issue:
https://github.com/apache/spark/pull/21524
Let's close it for now.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user tengpeng closed the pull request at:
https://github.com/apache/spark/pull/21524
Github user tengpeng commented on the issue:
https://github.com/apache/spark/pull/21524
Yes, but it may not be soon. Is there a "deadline" (e.g. branch cut)
coming?
On Tue, Sep 18, 2018 at 4:23 PM Sean Owen wrote:
> @tengpeng <https://github.com
Github user tengpeng commented on the issue:
https://github.com/apache/spark/pull/21524
Right, the doc is mostly copied with minor modifications. Let me update my
PR for the Python API this weekend.
On Fri, Sep 7, 2018 at 12:28 PM Sean Owen wrote:
> CC @tengp
Github user tengpeng commented on the issue:
https://github.com/apache/spark/pull/21861
@gatorsmile Got you. I will update the implementation after DataSourceV2
API changes.
GitHub user tengpeng opened a pull request:
https://github.com/apache/spark/pull/21861
[SPARK-24907][WIP] Migrate JDBC DataSource to JDBCDataSourceV2 Read using
DataSourceV2 API
## What changes were proposed in this pull request?
(After the update of DataSourceV2 API, this
Github user tengpeng commented on the issue:
https://github.com/apache/spark/pull/21123
Any updates on this PR? Yes, I know this is a temporary hack, but without
it being merged, it is not possible to migrate other data sources to V2
(experimentally
Github user tengpeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/20933#discussion_r200819285
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -241,39 +240,47 @@ final class DataFrameWriter[T] private[sql](ds
Github user tengpeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/20933#discussion_r198456106
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FilePartitionUtil.scala
---
@@ -0,0 +1,225 @@
+/*
+ * Licensed to
Github user tengpeng commented on the issue:
https://github.com/apache/spark/pull/21524
Gentle ping @jkbradley @WeichenXu123 @mengxr Thanks!
GitHub user tengpeng opened a pull request:
https://github.com/apache/spark/pull/21524
[SPARK-24212][ML][doc] Add the example and user guide for ML PrefixSpan
## What changes were proposed in this pull request?
There is no example or user guide for ML PrefixSpan (not
GitHub user tengpeng opened a pull request:
https://github.com/apache/spark/pull/21522
[SPARK-24467][ML] VectorAssemblerEstimator
Background: See the JIRA ticket.
This PR is at a very early stage, and hopefully it will help us decide
what's the right dire
Github user tengpeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/21125#discussion_r183386685
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala
---
@@ -495,8 +495,8 @@ class
Github user tengpeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/21125#discussion_r183386571
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala
---
@@ -782,8 +782,12 @@ object
Github user tengpeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/21125#discussion_r183386022
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala
---
@@ -495,8 +495,8 @@ class
GitHub user tengpeng opened a pull request:
https://github.com/apache/spark/pull/21125
[Spark-24024] Fix poisson deviance calculations in GLM to handle y = 0
## What changes were proposed in this pull request?
It is reported by Spark users that the deviance calculations
Github user tengpeng commented on the issue:
https://github.com/apache/spark/pull/20632
(A note to me & future readers) It seems this is actually for [SPARK-3159]
Check for reducible DecisionTree, rather than SPARK-3155
Support DecisionTree pruning. The title is confusing t
Github user tengpeng commented on the issue:
https://github.com/apache/spark/pull/20842
Looks good! Thanks!
Github user tengpeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/19666#discussion_r175646373
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/DecisionTreeMetadata.scala
---
@@ -152,15 +152,13 @@ private[spark] object
Github user tengpeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/19666#discussion_r175646335
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/tree/impl/DecisionTreeMetadata.scala
---
@@ -152,15 +152,13 @@ private[spark] object
Github user tengpeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/20732#discussion_r172059794
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Binarizer.scala
---
@@ -45,66 +47,117 @@ final class Binarizer @Since("1.4.0") (@Si
GitHub user tengpeng opened a pull request:
https://github.com/apache/spark/pull/20732
[SPARK-23578][ML] Add multicolumn support for Binarizer
## What changes were proposed in this pull request?
[Spark-20542] added an API that lets Bucketizer bin multiple columns
Github user tengpeng closed the pull request at:
https://github.com/apache/spark/pull/20729
GitHub user tengpeng opened a pull request:
https://github.com/apache/spark/pull/20729
[SPARK-23578][ML]Add multicolumn support for Binarizer
[Spark-20542] added an API that lets Bucketizer bin multiple columns.
Based on this change, multicolumn support is added for Binarizer
Github user tengpeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/17819#discussion_r152891005
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala
---
@@ -108,26 +164,53 @@ final class Bucketizer @Since("1.4.0"
Github user tengpeng commented on the issue:
https://github.com/apache/spark/pull/19638
Not sure what's happening here. The test on my local machine passed:
Running Apache RAT c
Github user tengpeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/19638#discussion_r150394001
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala
---
@@ -764,13 +764,17 @@ class LinearRegressionSuite
Github user tengpeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/19638#discussion_r150393944
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala
---
@@ -764,13 +764,17 @@ class LinearRegressionSuite
Github user tengpeng closed the pull request at:
https://github.com/apache/spark/pull/19660
Github user tengpeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/19638#discussion_r149560345
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala
---
@@ -764,13 +764,17 @@ class LinearRegressionSuite
Github user tengpeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/19638#discussion_r149558607
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala
---
@@ -764,13 +764,17 @@ class LinearRegressionSuite
Github user tengpeng commented on the issue:
https://github.com/apache/spark/pull/19660
@srowen You are absolutely right. That's what 2 aims to accomplish. I
believe implementing 1 & 2 is the goal, like what they did in sklearn. Need
some discussions
GitHub user tengpeng opened a pull request:
https://github.com/apache/spark/pull/19660
[SPARK-18755][WIP][ML] Add Randomized Grid Search to Spark ML
## What changes were proposed in this pull request?
Python sklearn has a randomized grid search for reducing the time for
Github user tengpeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/19638#discussion_r148919520
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala
---
@@ -764,13 +764,17 @@ class LinearRegressionSuite
Github user tengpeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/19638#discussion_r148869278
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala
---
@@ -764,13 +764,17 @@ class LinearRegressionSuite
Github user tengpeng commented on the issue:
https://github.com/apache/spark/pull/19638
I have used @sethah's approach to address the issues we have. Since we are
not adding a new method to the public trait, there is no more binary
compatibility
Github user tengpeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/19638#discussion_r148692614
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/evaluation/RegressionMetrics.scala
---
@@ -125,4 +125,14 @@ class RegressionMetrics @Since("
Github user tengpeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/19638#discussion_r148641672
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/evaluation/RegressionEvaluatorSuite.scala
---
@@ -73,6 +73,11 @@ class RegressionEvaluatorSuite
Github user tengpeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/19638#discussion_r148626297
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/evaluation/RegressionEvaluator.scala
---
@@ -49,8 +49,8 @@ final class RegressionEvaluator @Since
Github user tengpeng commented on the issue:
https://github.com/apache/spark/pull/19638
@srowen I have fixed scaladocs and since issues. I will pay special
attention to this issue next time.
Github user tengpeng commented on the issue:
https://github.com/apache/spark/pull/19638
Would it be possible to add me to the white list for test? Thanks.
On Thu, Nov 2, 2017 at 12:17 AM UCB AMPLab wrote:
> Can one of the admins verify this patch?
>
GitHub user tengpeng opened a pull request:
https://github.com/apache/spark/pull/19638
[SPARK-22422][ML] Add Adjusted R2 to RegressionMetrics
## What changes were proposed in this pull request?
I added adjusted R2 as a regression metric which was implemented in all
major
Github user tengpeng commented on the issue:
https://github.com/apache/spark/pull/19600
I will follow the guideline strictly next time. Thanks.
GitHub user tengpeng opened a pull request:
https://github.com/apache/spark/pull/19600
Added more information to Imputer
Oftentimes we want to impute custom values other than 'NaN'. My addition
helps people locate this function without reading the API.
## Wh