[GitHub] spark issue #19988: [Spark-22795] [ML] Raise error when line search in First...
Github user mrkm4ntr commented on the issue: https://github.com/apache/spark/pull/19988 This was incorrect. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19988: [Spark-22795] [ML] Raise error when line search i...
Github user mrkm4ntr closed the pull request at: https://github.com/apache/spark/pull/19988
[GitHub] spark pull request #20568: [SPARK-23381][CORE] Murmur3 hash generates a diff...
Github user mrkm4ntr closed the pull request at: https://github.com/apache/spark/pull/20568
[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...
Github user mrkm4ntr commented on the issue: https://github.com/apache/spark/pull/20568 @gatorsmile Thanks! I will close it.
[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...
Github user mrkm4ntr commented on the issue: https://github.com/apache/spark/pull/20568 I cannot reproduce this test failure in my environment. It does not seem to be related to this change...
[GitHub] spark pull request #20568: [SPARK-23381][CORE] Murmur3 hash generates a diff...
Github user mrkm4ntr commented on a diff in the pull request: https://github.com/apache/spark/pull/20568#discussion_r168659153 --- Diff: common/sketch/src/main/java/org/apache/spark/util/sketch/Murmur3_x86_32.java --- @@ -71,6 +73,20 @@ public static int hashUnsafeBytes(Object base, long offset, int lengthInBytes, i return fmix(h1, lengthInBytes); } + public static int hashUnsafeBytes2(Object base, long offset, int lengthInBytes, int seed) { +// This is compatible with original and another implementations. +// Use this method after 2.3.0. --- End diff -- Thanks, fixed it.
[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...
Github user mrkm4ntr commented on the issue: https://github.com/apache/spark/pull/20568 @hvanhovell I added a method and changed the code so that it is called only from FeatureHasher.
[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...
Github user mrkm4ntr commented on the issue: https://github.com/apache/spark/pull/20568 @hvanhovell I sent an e-mail to the topic `[VOTE] Spark 2.3.0 (RC3)`.
[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...
Github user mrkm4ntr commented on the issue: https://github.com/apache/spark/pull/20568 I registered with the same user name on the dev list.
[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...
Github user mrkm4ntr commented on the issue: https://github.com/apache/spark/pull/20568 @hvanhovell The main motivation is online prediction using parameters trained with FeatureHasher in MLlib. If the generated hash value differs from the implementations in other languages, the indices of the coefficients do not match and prediction cannot be done correctly. But I agree that backward compatibility is more important. Since FeatureHasher is being added in Spark 2.3.0, how about adding a new method with this behavior to Murmur3 and using it only from FeatureHasher?
[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...
Github user mrkm4ntr commented on the issue: https://github.com/apache/spark/pull/20568 @kiszk Thank you for your review! I fixed it.
[GitHub] spark pull request #20568: [SPARK-23381][CORE] Murmur3 hash generates a diff...
GitHub user mrkm4ntr opened a pull request: https://github.com/apache/spark/pull/20568 [SPARK-23381][CORE] Murmur3 hash generates a different value from other implementations ## What changes were proposed in this pull request? Murmur3 hash generates a different value from the original and other implementations (such as the Scala standard library and Guava) when the length of a byte array is not a multiple of 4. ## How was this patch tested? Added a unit test. Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mrkm4ntr/spark spark-23381 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20568.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20568 commit 8856e39e44037516080cbe25b7c6bdf52c81dde8 Author: Shintaro Murakami <mrkm4ntr@...> Date: 2018-02-10T14:29:48Z [SPARK-23381][CORE] Fix Murmur3 for byte arrays whose byte length is not a multiple of 4
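The divergence described in the pull request can be illustrated with a stand-alone sketch (hypothetical class and method names, not the actual Spark patch). Reference Murmur3 x86_32 packs the 1-3 trailing bytes into a single unsigned k1 block before mixing, while the pre-fix Spark code mixed each trailing byte as its own sign-extended block:

```java
// Hypothetical demo of the two tail-handling strategies; constants are the
// standard Murmur3 x86_32 constants.
public class Murmur3TailDemo {
  private static int mixK1(int k1) {
    k1 *= 0xcc9e2d51;
    k1 = Integer.rotateLeft(k1, 15);
    return k1 * 0x1b873593;
  }

  private static int mixH1(int h1, int k1) {
    h1 ^= k1;
    h1 = Integer.rotateLeft(h1, 13);
    return h1 * 5 + 0xe6546b64;
  }

  private static int fmix(int h1, int length) {
    h1 ^= length;
    h1 ^= h1 >>> 16;
    h1 *= 0x85ebca6b;
    h1 ^= h1 >>> 13;
    h1 *= 0xc2b2ae35;
    return h1 ^ (h1 >>> 16);
  }

  // Shared path: hash the 4-byte-aligned prefix as little-endian words.
  private static int hashAligned(byte[] bytes, int lengthAligned, int seed) {
    int h1 = seed;
    for (int i = 0; i < lengthAligned; i += 4) {
      int word = (bytes[i] & 0xFF) | ((bytes[i + 1] & 0xFF) << 8)
          | ((bytes[i + 2] & 0xFF) << 16) | ((bytes[i + 3] & 0xFF) << 24);
      h1 = mixH1(h1, mixK1(word));
    }
    return h1;
  }

  // Old behavior: each trailing byte is mixed as a separate, sign-extended block.
  public static int hashLegacy(byte[] bytes, int seed) {
    int lengthAligned = bytes.length - bytes.length % 4;
    int h1 = hashAligned(bytes, lengthAligned, seed);
    for (int i = lengthAligned; i < bytes.length; i++) {
      h1 = mixH1(h1, mixK1(bytes[i]));
    }
    return fmix(h1, bytes.length);
  }

  // Reference-compatible behavior: trailing bytes packed into one unsigned k1 block.
  public static int hashCompatible(byte[] bytes, int seed) {
    int lengthAligned = bytes.length - bytes.length % 4;
    int h1 = hashAligned(bytes, lengthAligned, seed);
    int k1 = 0;
    for (int i = lengthAligned, shift = 0; i < bytes.length; i++, shift += 8) {
      k1 ^= (bytes[i] & 0xFF) << shift;
    }
    h1 ^= mixK1(k1);
    return fmix(h1, bytes.length);
  }

  public static void main(String[] args) {
    // Lengths that are a multiple of 4 never reach the tail loop, so both agree.
    System.out.println(hashLegacy(new byte[]{1, 2, 3, 4}, 42)
        == hashCompatible(new byte[]{1, 2, 3, 4}, 42)); // true
    // A single trailing byte exercises the tail path, and the results diverge.
    System.out.println(hashLegacy(new byte[]{0}, 0)
        == hashCompatible(new byte[]{0}, 0)); // false
  }
}
```

This is why only keys whose byte length is not a multiple of 4 were affected: the aligned prefix is processed identically in both variants.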
[GitHub] spark issue #19988: [Spark-22795] [ML] Raise error when line search in First...
Github user mrkm4ntr commented on the issue: https://github.com/apache/spark/pull/19988 > When gradient nearly zero, and line-search failed. I think this is also possible. This should get valid model. Yes, I agree. > When gradient non-zero, line-search failed, will the model always be meaning-less ? I think the model is not always meaningless. LineSearchFailed occurs when the number of line-search iterations exceeds the maximum because the range satisfying the strong Wolfe condition is far from the initial step size (= 1.0), and maxIter is 10. The upper limit of the line-search range is therefore about 1.5 ^ 10 = 57.665, and the lower limit is about 0.5 ^ 10 = 0.001 (assuming the interpolated value is at the center). Of course, the acceptable range is not necessarily continuous, but I think LineSearchFailed should not occur often if the input and intermediate calculations are correct.
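The back-of-the-envelope bounds in that comment can be checked directly. This sketch assumes, as the comment does, that the step size grows by a factor of 1.5 or shrinks by a factor of 0.5 on each of the 10 line-search iterations (a simplification, not the exact breeze internals):

```java
// Rough bracket for the step sizes reachable in maxIter line-search iterations,
// under the assumed per-iteration growth (1.5x) and shrink (0.5x) factors.
public class LineSearchBracket {
  public static void main(String[] args) {
    int maxIter = 10;
    double upper = 1.0;
    double lower = 1.0;
    for (int i = 0; i < maxIter; i++) {
      upper *= 1.5; // widest growth from the initial step size 1.0
      lower *= 0.5; // narrowest shrink from the initial step size 1.0
    }
    System.out.println(upper); // 57.6650390625, i.e. ~57.665
    System.out.println(lower); // 9.765625E-4, i.e. ~0.001
  }
}
```

Both multiplications are exact in double precision for these values, which confirms the 57.665 and 0.001 figures quoted above.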
[GitHub] spark issue #19988: [Spark-22795] [ML] Raise error when line search in First...
Github user mrkm4ntr commented on the issue: https://github.com/apache/spark/pull/19988 Indeed... I am no longer confident that the model is always meaningless. Instead of raising an error, would it be preferable for the model to carry information that the line search failed? I want a programmatic way to know whether the line search failed.
[GitHub] spark pull request #19988: [Spark-22795] [ML] Raise error when line search i...
Github user mrkm4ntr commented on a diff in the pull request: https://github.com/apache/spark/pull/19988#discussion_r157447946 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -244,9 +244,9 @@ class LinearSVC @Since("2.2.0") ( } bcFeaturesStd.destroy(blocking = false) - if (state == null) { + if (state == null || state.searchFailed) { val msg = s"${optimizer.getClass.getName} failed." -logError(msg) +logWarning(msg) throw new SparkException(msg) --- End diff -- Got it. Fixed.
[GitHub] spark pull request #19988: [Spark-22795] [ML] Raise error when line search i...
Github user mrkm4ntr commented on a diff in the pull request: https://github.com/apache/spark/pull/19988#discussion_r157396794 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala --- @@ -244,9 +244,9 @@ class LinearSVC @Since("2.2.0") ( } bcFeaturesStd.destroy(blocking = false) - if (state == null) { + if (state == null || state.searchFailed) { val msg = s"${optimizer.getClass.getName} failed." -logError(msg) +logWarning(msg) throw new SparkException(msg) --- End diff -- The message that the line search failed is already logged by breeze. https://github.com/scalanlp/breeze/blob/4611fe3f4466ac5b1885e9d6288ba27fc9d205ec/math/src/main/scala/breeze/optimize/FirstOrderMinimizer.scala#L78 I think logging it here as well would be a bit verbose.
[GitHub] spark pull request #19988: [Spark-22795] [ML] Raise error when line search i...
Github user mrkm4ntr commented on a diff in the pull request: https://github.com/apache/spark/pull/19988#discussion_r157345097 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/NormalEquationSolver.scala --- @@ -109,6 +109,11 @@ private[optim] class QuasiNewtonSolver( state = states.next() arrayBuilder += state.adjustedValue } +if (state == null || state.searchFailed) { + val msg = s"${optimizer.getClass.getName} failed." + logError(msg) --- End diff -- I got it. Fixed.
[GitHub] spark pull request #19988: [Spark-22795] [ML] Raise error when line search i...
Github user mrkm4ntr commented on a diff in the pull request: https://github.com/apache/spark/pull/19988#discussion_r157243741 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/NormalEquationSolver.scala --- @@ -109,6 +109,11 @@ private[optim] class QuasiNewtonSolver( state = states.next() arrayBuilder += state.adjustedValue } +if (state == null || state.searchFailed) { + val msg = s"${optimizer.getClass.getName} failed." + logError(msg) --- End diff -- Thanks for your review. Is this a comment about the logging level?
[GitHub] spark pull request #19988: [Spark-22795] Raise error when line search in Fir...
GitHub user mrkm4ntr opened a pull request: https://github.com/apache/spark/pull/19988 [Spark-22795] Raise error when line search in FirstOrderMinimizer failed ## What changes were proposed in this pull request? When the line search in FirstOrderMinimizer (LBFGS, OWLQN, and so on) fails, the estimator's fit method does not fail, and a meaningless transformer is returned. This pull request changes the estimators to raise an error when the line search fails, so that the error is discovered early. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mrkm4ntr/spark spark-22795 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19988.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19988 commit b48debb1f5b7d6ec0a44e5c4806f20ee9d6806d9 Author: Shintaro Murakami <mrkm4...@gmail.com> Date: 2017-12-15T05:40:14Z [Spark-22795] Raise error when line search in FirstOrderMinimizer failed