Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/9794#issuecomment-19064
ok!!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/9794#issuecomment-196132382
@jkbradley I uploaded the continued work to
https://github.com/Intel-bigdata/CRF. Thanks very much.
Github user HuJiayin closed the pull request at:
https://github.com/apache/spark/pull/9794
GitHub user HuJiayin reopened a pull request:
https://github.com/apache/spark/pull/9794
[SPARK-4036][MLlib]Add Conditional Random Fields (CRF) algorithm to Spark
MLlib
Conditional random fields (CRFs) are a class of statistical modelling
methods often applied in pattern recognition
Github user HuJiayin closed the pull request at:
https://github.com/apache/spark/pull/9794
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/9794#issuecomment-161907963
@Lewuathe It is integrated with Spark. The multiple threads run on the
executor side to process segments in parallel.
cc @mengxr
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/9794#issuecomment-158318631
@Lewuathe Thanks! I will use the Spark API to write a wrapper for it, so
that it can take advantage of Spark. The multiple threads you asked about
will run on the executor side
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/9794#discussion_r45425805
--- Diff: mllib/src/test/scala/org/apache/spark/ml/nlp/CRFTests.scala ---
@@ -0,0 +1,27 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
GitHub user HuJiayin opened a pull request:
https://github.com/apache/spark/pull/9794
[SPARK-4036]Add Conditional Random Fields (CRF) algorithm to Spark MLlib
Conditional random fields (CRFs) are a class of statistical modelling
methods often applied in pattern recognition
Github user HuJiayin closed the pull request at:
https://github.com/apache/spark/pull/8546
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/8546#issuecomment-141855212
On the other hand, newcenters will cause a sudden increase in memory
usage. Although clear is called immediately, I think the memory is only
released once GC runs. Newcenter will still
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/8546#issuecomment-140959166
@mengxr I reduced the storage used for centers and deleted the fallback and
the duplicate code. I tested functionality and performance locally, and it
works. Could you give me
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/8546#issuecomment-138047275
Users don't need to increase memory to run kmeans in 1.5 after applying this
fix. They can still use their 1.2 configuration and get stable runs at the
same time
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/8546#issuecomment-136908702
Kmeans|| is a better algorithm for finding the centers; if the user has
sufficient memory, the performance is better. But sometimes, because of
Kmeans parameters like K
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/8546#issuecomment-136911191
Users can manually adjust memory when they hit a failure, but it may take
them some time to find the root cause. There are many ways to implement
kmeans and they work well
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/8526#issuecomment-136276662
LGTM
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/8546#issuecomment-136566441
cc @mengxr
GitHub user HuJiayin opened a pull request:
https://github.com/apache/spark/pull/8546
SPARK-10329
Kmeans || finds centers more efficiently, based on stochastic processes,
et al. But some users with little memory will have difficulty running it.
The patch will fall back
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/8526#issuecomment-136241091
The fix reduces RDD size by around 50G for the data size below, and
performance improves. The user needs more than 8G of memory to run kmeans in
Spark 1.5 based
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/7812#issuecomment-126670836
The other problem is that the current code fails at z測試
(actual: Z000; expected: z測試)
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7812#discussion_r35956865
--- Diff:
unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -680,4 +680,57 @@ public int hashCode() {
}
return
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7812#discussion_r35939819
--- Diff:
unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -680,4 +680,57 @@ public int hashCode() {
}
return
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7812#discussion_r35942907
--- Diff:
unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -680,4 +680,57 @@ public int hashCode() {
}
return
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/7115#issuecomment-126530603
ok : )
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/7208#issuecomment-125055458
I tested that the codegen works on my local computer, and I am asking for a
review and retest. The central build failed at:
Single command with --database *** FAILED *** (1 minute
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7208#discussion_r35404443
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -1765,6 +1765,24 @@ object functions
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/7208#issuecomment-124508708
I tested it and it works on my side. Did I miss something?
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/7208#issuecomment-124550026
It works in my local test. The central build log is as follows. Do you have
any ideas? It may be a problem on my end, but it doesn't appear on my side.
Could you
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7115#discussion_r35294419
--- Diff:
unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -628,4 +634,93 @@ public int hashCode() {
}
return
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7186#discussion_r35187145
--- Diff:
unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -231,6 +233,36 @@ public UTF8String toLowerCase
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7115#discussion_r35282881
--- Diff:
unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -628,4 +634,93 @@ public int hashCode() {
}
return
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7115#discussion_r35291770
--- Diff:
unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -628,4 +634,93 @@ public int hashCode() {
}
return
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/7208#issuecomment-123985765
Chenghao, you don't need to follow up on the Python and codegen parts. I'll
follow up.
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/7115#issuecomment-123912608
@rxin could you review/merge/close the PR?
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/7208#issuecomment-123912668
@rxin could you review/merge/close the PR?
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7115#discussion_r34965084
--- Diff:
unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -515,4 +521,91 @@ public int hashCode() {
}
return
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7115#discussion_r34965629
--- Diff:
unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -515,4 +521,91 @@ public int hashCode() {
}
return
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7115#discussion_r34965616
--- Diff:
unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -515,4 +521,91 @@ public int hashCode() {
}
return
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/7115#issuecomment-122731221
@rxin @davies @liancheng @admins, 8302b94 contains a merge conflict, so I
closed and reopened the PR to stop Jenkins from continuing the build. The
merge issue
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7208#discussion_r34867329
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
---
@@ -593,6 +593,33 @@ case class Levenshtein(left
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7208#discussion_r34868236
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
---
@@ -593,6 +593,33 @@ case class Levenshtein(left
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7208#discussion_r34863266
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
---
@@ -552,6 +552,34 @@ case class Substring(str
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/7115#issuecomment-121534169
Ok to test Retest Jenkins Thanks Retest Oktotest
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7208#discussion_r34435401
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
---
@@ -570,6 +570,37 @@ case class StringLength
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7208#discussion_r34435724
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
---
@@ -298,3 +299,56 @@ case class StringLength
Github user HuJiayin closed the pull request at:
https://github.com/apache/spark/pull/7115
GitHub user HuJiayin reopened a pull request:
https://github.com/apache/spark/pull/7115
[SPARK-8271][SQL]string function: soundex
Add soundex for SQL
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HuJiayin/spark master
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/7115#issuecomment-120821536
The patch adds soundex support; it does not rebase code.
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/7115#issuecomment-120250661
This patch fails MiMa tests.
[error] running /home/jenkins/workspace/SparkPullRequestBuilder@2/dev/mima
; received return code 255
Can you help with this? Thanks
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/7115#issuecomment-120303334
Could you review and merge this PR? Thanks very much.
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/7208#issuecomment-120303382
Could you review and merge this PR? Thanks very much.
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7236#discussion_r34229711
--- Diff:
unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -254,6 +254,70 @@ public boolean equals(final Object other
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/7208#issuecomment-119924304
I didn't add or remove spaces. IntelliJ IDEA does it for me (Scala: 2
spaces, Java: 4 spaces).
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/7115#issuecomment-119555834
@cloud-fan Soundex is a common string-encoding function. You can find its
definition on the wiki page. The original implementation in Apache Commons
doesn't implement
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7115#discussion_r34214181
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
---
@@ -275,6 +276,24 @@ case class StringLength
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/7208#issuecomment-119089275
@rxin @davies @liancheng can you trigger the unit test?
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7208#discussion_r34012100
--- Diff:
unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -165,6 +165,38 @@ public UTF8String toLowerCase() {
return
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/7115#issuecomment-119115714
@rxin @davies @liancheng
Because of a merge issue, the UTF8String update was not pushed just now.
Can you review and trigger the unit test?
commit code
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/7115#issuecomment-119113518
@rxin @davies @liancheng
I removed the third-party soundex library, and the unit tests pass in my
local test. Can you review and trigger the unit test?
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7186#discussion_r34034920
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
---
@@ -211,6 +211,77 @@ case class EndsWith(left
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/7208#issuecomment-118753913
tarekauel
When you want to merge a small modification next time, please tell me,
Chenghao, so we can avoid repeated code rebase issues.
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7208#discussion_r33910514
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
---
@@ -298,3 +299,56 @@ case class StringLength
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/7214#issuecomment-118767524
You don't need to check whether the left or right string is null, since
getLevenshteinDistance already checks whether the strings are null.
Duplicate checks will make
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7186#discussion_r33915726
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
---
@@ -211,6 +211,77 @@ case class EndsWith(left
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7208#discussion_r33925319
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
---
@@ -298,3 +299,38 @@ case class StringLength
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7208#discussion_r33926169
--- Diff:
unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -165,6 +165,20 @@ public UTF8String toLowerCase() {
return
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/7208#issuecomment-118836177
tarekauel:
Could you provide some valuable suggestions beyond checks such as whether
the string is blank, etc.? I will check all issues of this kind before asking
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7186#discussion_r33927820
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
---
@@ -211,6 +211,77 @@ case class EndsWith(left
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7208#discussion_r33926185
--- Diff:
unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -165,6 +165,20 @@ public UTF8String toLowerCase() {
return
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7208#discussion_r33998167
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
---
@@ -298,3 +299,34 @@ case class StringLength
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/7208#issuecomment-119041940
@rxin @davies @liancheng can you trigger the unit test?
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7115#discussion_r34001741
--- Diff:
unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -211,4 +215,57 @@ public boolean equals(final Object other
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7208#discussion_r34007806
--- Diff:
unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -165,6 +165,38 @@ public UTF8String toLowerCase() {
return
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7208#discussion_r34008326
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
---
@@ -298,3 +299,36 @@ case class StringLength
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7208#discussion_r34008396
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
---
@@ -298,3 +299,36 @@ case class StringLength
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7208#discussion_r34008971
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
---
@@ -298,3 +299,36 @@ case class StringLength
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7208#discussion_r34009123
--- Diff:
unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -165,6 +165,38 @@ public UTF8String toLowerCase() {
return
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/7115#issuecomment-118751684
@rxin the requirements are in 65d8feb
@rxin @davies @liancheng can you trigger the unit test?
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/7208#issuecomment-118877843
@rxin @davies @liancheng can you trigger the unit test?
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7208#discussion_r33901328
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
---
@@ -298,3 +299,56 @@ case class StringLength
Github user HuJiayin commented on the pull request:
https://github.com/apache/spark/pull/7115#issuecomment-118735097
@rxin @davies @liancheng can you trigger the unit test?
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7214#discussion_r33901876
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -1580,22 +1580,37 @@ object functions
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/7208#discussion_r33902172
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
---
@@ -298,3 +299,56 @@ case class StringLength
GitHub user HuJiayin opened a pull request:
https://github.com/apache/spark/pull/7208
[SPARK-8269][SQL]string function: initcap
Returns the string with the first letter of each word in uppercase and all
other letters in lowercase. Words are delimited by whitespace.
You can merge
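
The initcap behaviour described in this pull request (first letter of each
word in uppercase, all other letters in lowercase, words delimited by
whitespace) can be illustrated with a minimal Java sketch. This is not the
code from the pull request: the class name InitcapSketch, the null handling,
and the use of Character.isWhitespace as the delimiter test are assumptions
made only for illustration. It works on UTF-16 chars, so characters outside
the Basic Multilingual Plane are not case-mapped.

```java
// Hypothetical illustration only; not the implementation from the pull request.
final class InitcapSketch {
    private InitcapSketch() {}

    /**
     * Uppercases the first letter of each whitespace-delimited word and
     * lowercases every other letter. Whitespace is copied through unchanged.
     */
    static String initcap(String input) {
        if (input == null) {
            return null; // assumption: null in, null out
        }
        StringBuilder out = new StringBuilder(input.length());
        boolean startOfWord = true;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                // A delimiter: keep it and mark that the next letter starts a word.
                out.append(c);
                startOfWord = true;
            } else if (startOfWord) {
                out.append(Character.toUpperCase(c));
                startOfWord = false;
            } else {
                out.append(Character.toLowerCase(c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(initcap("hello wORLD"));
    }
}
```

Running the main method prints the input with each word capitalized. Because
delimiters are copied through unchanged, leading, trailing, and repeated
whitespace survive as-is.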
Github user HuJiayin commented on a diff in the pull request:
https://github.com/apache/spark/pull/5307#discussion_r33741296
--- Diff:
core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala ---
@@ -123,12 +129,29 @@ private[spark] class DiskBlockObjectWriter
GitHub user HuJiayin opened a pull request:
https://github.com/apache/spark/pull/7115
[SPARK-8271][SQL]string function: soundex
Add soundex for SQL
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HuJiayin/spark master