Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-85893445
[Test build #29154 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29154/consoleFull)
for PR 4504 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-85890886
[Test build #29153 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29153/consoleFull)
for PR 4504 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-85931868
[Test build #29154 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29154/consoleFull)
for PR 4504 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-85931920
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-85928788
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-85928723
[Test build #29153 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29153/consoleFull)
for PR 4504 at commit
Github user aborsu985 commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-85935456
@mengxr Thank you for your help with the Java unit tests. As you may have
guessed, I'm new to both Scala and Java and I was drowning in it.
---
If your project is set
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-86128922
LGTM. Merged into master. Thanks for contributing!
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/4504
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-85367964
[Test build #29059 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29059/consoleFull)
for PR 4504 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-85391856
[Test build #29059 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29059/consoleFull)
for PR 4504 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-85391870
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-84962071
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-84962033
[Test build #28990 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28990/consoleFull)
for PR 4504 at commit
Github user aborsu985 commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r26925182
--- Diff:
mllib/src/test/java/org/apache/spark/ml/feature/JavaTokenizerSuite.java ---
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-84934562
[Test build #28990 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28990/consoleFull)
for PR 4504 at commit
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r26868590
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/TokenizerSuite.scala ---
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r26868579
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/TokenizerSuite.scala ---
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r26868583
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/TokenizerSuite.scala ---
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r26868564
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
---
@@ -39,3 +39,67 @@ class Tokenizer extends UnaryTransformer[String,
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r26868574
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/TokenizerSuite.scala ---
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r26868587
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/TokenizerSuite.scala ---
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r26868562
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
---
@@ -39,3 +39,67 @@ class Tokenizer extends UnaryTransformer[String,
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r26868568
--- Diff:
mllib/src/test/java/org/apache/spark/ml/feature/JavaTokenizerSuite.java ---
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-84106511
I don't know of a formatter that can do everything correctly. I use IntelliJ
with the default Scala code style (except indent 2). I need to manually
adjust the
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-83475464
[Test build #28862 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28862/consoleFull)
for PR 4504 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-83475548
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user aborsu985 commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-83425231
Sorry, my commit was a bit hasty. Are there any automated style checkers you would recommend?
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-83425118
[Test build #28862 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28862/consoleFull)
for PR 4504 at commit
Github user aborsu985 commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-82969443
Thank you for the tip. I'll look into the Java tests next week when I have
some time.
In the meantime, I changed the RegexTokenizer to extend from Tokenizer
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-82968292
[Test build #28798 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28798/consoleFull)
for PR 4504 at commit
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r26665211
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
---
@@ -39,3 +39,67 @@ class Tokenizer extends UnaryTransformer[String,
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r26665219
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/TokenizerSuite.scala ---
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r26665229
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/TokenizerSuite.scala ---
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r26665224
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/TokenizerSuite.scala ---
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r26665203
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
---
@@ -39,3 +39,67 @@ class Tokenizer extends UnaryTransformer[String,
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r26665234
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/TokenizerSuite.scala ---
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r26665227
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/TokenizerSuite.scala ---
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r26665220
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/TokenizerSuite.scala ---
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-83004046
[Test build #28798 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28798/consoleFull)
for PR 4504 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-83004074
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r26665217
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/feature/TokenizerSuite.scala ---
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-83022231
@aborsu985 Please check the code style and make sure you follow
https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-82405249
[Test build #28727 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28727/consoleFull)
for PR 4504 at commit
Github user aborsu985 commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-82410900
@mengxr I do not think that LowerCase warrants a transformer; rather, it
could be incorporated into a larger string-to-vector transformer that changes a
text into a
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-82421973
[Test build #28728 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28728/consoleFull)
for PR 4504 at commit
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-82541096
You can use
https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
as a template for unit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-82465490
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-82465423
[Test build #28727 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28727/consoleFull)
for PR 4504 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-82481855
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-82481809
[Test build #28728 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28728/consoleFull)
for PR 4504 at commit
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-78750052
add to whitelist
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-78777321
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-78777298
[Test build #28544 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28544/consoleFull)
for PR 4504 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-78750861
[Test build #28544 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28544/consoleFull)
for PR 4504 at commit
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-78709488
@aborsu985 Sorry for the delay! At a high level, I'm a little concerned
about exposing too many parameters in the first version. NLTK's regex tokenizer
Github user aborsu985 commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-76861500
Changed the minimum token length to 1 and removed the excluded bit.
Added a matching param that allows switching from a matching regex to a
splitting regex.
Reduced
Github user aborsu985 commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r25652664
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
---
@@ -39,3 +39,66 @@ class Tokenizer extends UnaryTransformer[String,
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r25232685
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
---
@@ -39,3 +39,66 @@ class Tokenizer extends UnaryTransformer[String,
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r25232690
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
---
@@ -39,3 +39,66 @@ class Tokenizer extends UnaryTransformer[String,
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r25232686
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
---
@@ -39,3 +39,66 @@ class Tokenizer extends UnaryTransformer[String,
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r25232692
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
---
@@ -39,3 +39,66 @@ class Tokenizer extends UnaryTransformer[String,
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r25232704
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
---
@@ -39,3 +39,66 @@ class Tokenizer extends UnaryTransformer[String,
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r25232703
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
---
@@ -39,3 +39,66 @@ class Tokenizer extends UnaryTransformer[String,
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r25232681
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
---
@@ -39,3 +39,66 @@ class Tokenizer extends UnaryTransformer[String,
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r25232679
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
---
@@ -39,3 +39,66 @@ class Tokenizer extends UnaryTransformer[String,
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r25232691
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
---
@@ -39,3 +39,66 @@ class Tokenizer extends UnaryTransformer[String,
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r25232693
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
---
@@ -39,3 +39,66 @@ class Tokenizer extends UnaryTransformer[String,
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r25232697
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
---
@@ -39,3 +39,66 @@ class Tokenizer extends UnaryTransformer[String,
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r25232695
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
---
@@ -39,3 +39,66 @@ class Tokenizer extends UnaryTransformer[String,
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r25232699
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
---
@@ -39,3 +39,66 @@ class Tokenizer extends UnaryTransformer[String,
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/4504#discussion_r25232683
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala
---
@@ -39,3 +39,66 @@ class Tokenizer extends UnaryTransformer[String,
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/4504#issuecomment-75710113
@aborsu985 I made a pass on the code. Besides my inline comments, please
add a unit test. It would be better if you can also add a Java unit test.
Thanks!