Github user hhbyyh commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-117377174
Thanks @jkbradley. For setting the metadata, could you please give a
specific example (perhaps an existing transformer)?
---
If your project is set up for it, you can
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-117423429
I just added a link to the JIRA. Please ping if you have questions (and if
you have suggestions for simplifying the use of metadata, since that is a WIP).
Thanks!
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-117316488
Thanks!
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-117316468
LGTM merging into master
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/6039
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r33536321
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---
@@ -0,0 +1,166 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-116927872
@hhbyyh Thanks for the updates. It looks fine, except for one to-do: can you
please add a note in the Scala doc saying that this transformer does not yet
set the
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r33536449
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/MinMaxScalerSuite.scala ---
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r33536500
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/MinMaxScalerSuite.scala ---
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-116961864
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-116961793
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-116983289
Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-116982739
[Test build #36092 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36092/console)
for PR 6039 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-116963043
[Test build #36092 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36092/consoleFull)
for PR 6039 at commit
Github user hhbyyh commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-116973685
@jkbradley Thanks. I've added a to-do item.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-114173048
[Test build #35459 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35459/consoleFull)
for PR 6039 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-114172378
Merged build triggered.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-114173959
[Test build #35459 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35459/console)
for PR 6039 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-114173968
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-114172404
Merged build started.
Github user hhbyyh commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-114174051
@jkbradley, I sent an update. I'm not sure I handled the part about not
adding the metadata correctly.
SPARK-8529, SPARK-8530, and SPARK-8531 have been created for the tasks listed.
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r32949758
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---
@@ -0,0 +1,166 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-114345837
[Test build #35510 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35510/console)
for PR 6039 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-114332330
[Test build #35510 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35510/consoleFull)
for PR 6039 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-114332211
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-114332227
Merged build started.
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r32927784
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/MinMaxScalerSuite.scala ---
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software
Github user hhbyyh commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r32938468
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/MinMaxScalerSuite.scala ---
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r32895915
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---
@@ -0,0 +1,166 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r32895918
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/MinMaxScalerSuite.scala ---
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r32895917
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/MinMaxScalerSuite.scala ---
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r32895916
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---
@@ -0,0 +1,166 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-113938339
Very good point about vectors becoming dense. In the Scala doc, can you
please document that all vectors will become dense?
Also, could you go ahead and make
https://github.com/apache/spark/pull/6039#discussion_r32895919
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/MinMaxScalerSuite.scala ---
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-113489197
[Test build #35262 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35262/console)
for PR 6039 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-113435601
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-113470039
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-113470067
Merged build started.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-113470504
[Test build #35262 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35262/consoleFull)
for PR 6039 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-113489308
Merged build finished. Test PASSed.
Github user hhbyyh commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-113436948
@jkbradley Thanks for helping review.
I found an issue. For SparseVector, MinMaxScaler will probably change many
zeros to non-zeros, resulting in a dense
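To make the sparsity concern concrete, here is a plain-Scala sketch (no Spark dependency) of per-feature min-max rescaling applied to a sparse vector held as `(size, indices, values)` arrays. It assumes the rescaling formula documented for spark.ml's `MinMaxScaler`, `(e - E_min) / (E_max - E_min) * (max - min) + min`, with a constant column mapped to the midpoint `0.5 * (max + min)`; `SparseBecomesDense`, `rescale`, and `rescaleSparse` are hypothetical names for illustration only. With the default target range [0, 1], an implicit zero stays zero only when its column minimum is 0 and the column is not constant, so with mixed-sign data most slots end up non-zero.

```scala
// Hypothetical illustration, not Spark's implementation.
object SparseBecomesDense {
  // Assumed min-max formula; a constant column (range 0) maps to the midpoint.
  def rescale(e: Double, eMin: Double, eMax: Double, min: Double, max: Double): Double = {
    val range = eMax - eMin
    if (range != 0.0) (e - eMin) / range * (max - min) + min
    else 0.5 * (max + min)
  }

  // Every slot, including the implicit zeros, has to be rescaled,
  // so the result is a full (dense) Array[Double].
  def rescaleSparse(size: Int, indices: Array[Int], values: Array[Double],
                    colMin: Array[Double], colMax: Array[Double],
                    min: Double, max: Double): Array[Double] = {
    val dense = new Array[Double](size) // starts as all zeros
    indices.zip(values).foreach { case (i, v) => dense(i) = v }
    Array.tabulate(size)(i => rescale(dense(i), colMin(i), colMax(i), min, max))
  }

  def main(args: Array[String]): Unit = {
    // Assumed column statistics from fitting; only index 1 is stored in the input.
    val colMin = Array(-1.0, 0.0, -3.0)
    val colMax = Array(1.0, 10.0, 3.0)
    val out = rescaleSparse(3, Array(1), Array(5.0), colMin, colMax, 0.0, 1.0)
    println(out.mkString("[", ", ", "]")) // [0.5, 0.5, 0.5]
  }
}
```

The input stores a single non-zero entry, yet all three output slots are 0.5, which is why documenting that output vectors become dense is worthwhile.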
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-113436994
Merged build finished. Test FAILed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-113436987
[Test build #35253 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35253/console)
for PR 6039 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-113436707
[Test build #35253 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35253/consoleFull)
for PR 6039 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-113435675
Merged build started.
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r32689854
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r32689867
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r32689872
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r32689874
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/MinMaxScalerSuite.scala ---
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r32689869
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r32689862
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r32689859
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-112986615
A bunch of mostly small cleanups. That's all for now!
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-112980823
@hhbyyh Sorry for the long delay around the release! I'll review this now.
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r32689870
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r32689875
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/MinMaxScalerSuite.scala ---
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r32689865
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r32689873
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r32689849
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r32689852
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r32689861
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r32689868
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r32689880
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/MinMaxScalerSuite.scala ---
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r32689850
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r32689857
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software
Github user hhbyyh commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-105187284
Hi @jkbradley, sorry for the delay.
I've removed the MLlib version and added unit tests.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-105188263
Merged build triggered.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-105189183
[Test build #33466 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33466/consoleFull)
for PR 6039 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-105188316
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-105215836
Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-105215796
[Test build #33466 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33466/consoleFull)
for PR 6039 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-105215843
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user hhbyyh commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-104561300
Thanks @jkbradley for the concrete suggestions.
Sorry, I'm occupied with the AmpCamp China event this Saturday.
I'll provide updates based on the comments around
Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-103177221
@hhbyyh Thanks for the update! Do you think we really need a copy of the
scaler in the spark.mllib package? The problem with adding it is that it's
duplicating the
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/6039#discussion_r30537358
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala ---
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software
Github user hhbyyh commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-101988407
@mengxr
I sent an update; it now follows the pattern of `StandardScaler`. Let me
know if this is what you had in mind. Thanks for your time.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-101630087
Merged build triggered.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-101630105
Merged build started.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-101630212
[Test build #32610 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32610/consoleFull)
for PR 6039 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-101654285
[Test build #32610 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32610/consoleFull)
for PR 6039 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-101654306
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-101654302
Merged build finished. Test PASSed.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-101396078
@hhbyyh The pipeline API is a better place for feature transformers. It
would be nice if you could implement `MinMaxScaler` under `spark.ml` using the
pipeline API.
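For readers unfamiliar with the pipeline pattern, here is a rough plain-Scala sketch of the Estimator/Model split that `StandardScaler` follows: `fit` makes one pass over the data to collect column-wise minima and maxima, and the resulting model applies the rescaling row by row. `MinMaxEstimator` and `MinMaxModel` are hypothetical stand-ins, not the spark.ml classes; rows are plain `Array[Double]` values rather than DataFrame columns, and the constant-column midpoint rule is an assumption carried over from the documented `MinMaxScaler` behavior.

```scala
// Hypothetical stand-ins for the Estimator/Model split, not Spark's classes.
final case class MinMaxModel(originalMin: Array[Double], originalMax: Array[Double],
                             min: Double, max: Double) {
  // Applies the fitted statistics to one row; constant columns map to the midpoint.
  def transform(row: Array[Double]): Array[Double] =
    row.indices.map { i =>
      val range = originalMax(i) - originalMin(i)
      if (range != 0.0) (row(i) - originalMin(i)) / range * (max - min) + min
      else 0.5 * (max + min)
    }.toArray
}

final class MinMaxEstimator(min: Double = 0.0, max: Double = 1.0) {
  require(min < max, s"min ($min) must be less than max ($max)")

  // "fit" makes a single pass to collect column-wise min and max.
  def fit(rows: Seq[Array[Double]]): MinMaxModel = {
    require(rows.nonEmpty, "cannot fit on an empty dataset")
    val n = rows.head.length
    val lo = Array.fill(n)(Double.PositiveInfinity)
    val hi = Array.fill(n)(Double.NegativeInfinity)
    for (r <- rows; i <- 0 until n) {
      lo(i) = math.min(lo(i), r(i))
      hi(i) = math.max(hi(i), r(i))
    }
    MinMaxModel(lo, hi, min, max)
  }
}

object PipelinePatternDemo {
  def main(args: Array[String]): Unit = {
    // Column 0 spans [1, 5]; column 1 is constant.
    val data = Seq(Array(1.0, 10.0), Array(3.0, 10.0), Array(5.0, 10.0))
    val model = new MinMaxEstimator().fit(data)
    println(model.transform(Array(3.0, 10.0)).mkString(", ")) // 0.5, 0.5
  }
}
```

Keeping the fitted statistics in an immutable model is what lets the same scaler be reused on new data without refitting, which is the main benefit of the pipeline API over a one-shot function.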
Github user hhbyyh commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-101464482
Thanks @mengxr. The min/max suggestion aligns with @jkbradley's idea.
I'll migrate it to ml today or tomorrow. Is it appropriate to use the same
PR?
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-100775532
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-100775522
[Test build #32369 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32369/consoleFull)
for PR 6039 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-100775530
Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-100759270
[Test build #32369 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32369/consoleFull)
for PR 6039 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-100759239
Merged build started.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/6039#issuecomment-100759233
Merged build triggered.