[GitHub] spark issue #19988: [Spark-22795] [ML] Raise error when line search in First...

2018-10-18 Thread mrkm4ntr
Github user mrkm4ntr commented on the issue:

https://github.com/apache/spark/pull/19988
  
This was incorrect.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19988: [Spark-22795] [ML] Raise error when line search i...

2018-10-18 Thread mrkm4ntr
Github user mrkm4ntr closed the pull request at:

https://github.com/apache/spark/pull/19988


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20568: [SPARK-23381][CORE] Murmur3 hash generates a diff...

2018-02-16 Thread mrkm4ntr
Github user mrkm4ntr closed the pull request at:

https://github.com/apache/spark/pull/20568


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-16 Thread mrkm4ntr
Github user mrkm4ntr commented on the issue:

https://github.com/apache/spark/pull/20568
  
@gatorsmile Thanks! I will close it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-15 Thread mrkm4ntr
Github user mrkm4ntr commented on the issue:

https://github.com/apache/spark/pull/20568
  
I cannot reproduce this failure of the test in my environment.
It seems to me that this is not related to this change...


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20568: [SPARK-23381][CORE] Murmur3 hash generates a diff...

2018-02-15 Thread mrkm4ntr
Github user mrkm4ntr commented on a diff in the pull request:

https://github.com/apache/spark/pull/20568#discussion_r168659153
  
--- Diff: 
common/sketch/src/main/java/org/apache/spark/util/sketch/Murmur3_x86_32.java ---
@@ -71,6 +73,20 @@ public static int hashUnsafeBytes(Object base, long 
offset, int lengthInBytes, i
 return fmix(h1, lengthInBytes);
   }
 
+  public static int hashUnsafeBytes2(Object base, long offset, int 
lengthInBytes, int seed) {
+// This is compatible with original and another implementations.
+// Use this method after 2.3.0.
--- End diff --

Thanks, fixed it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-14 Thread mrkm4ntr
Github user mrkm4ntr commented on the issue:

https://github.com/apache/spark/pull/20568
  
@hvanhovell I added a method and changed it so that we call it only from 
FeatureHasher.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-14 Thread mrkm4ntr
Github user mrkm4ntr commented on the issue:

https://github.com/apache/spark/pull/20568
  
@hvanhovell I sent an e-mail to the topic `[VOTE] Spark 2.3.0 (RC3)`. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-14 Thread mrkm4ntr
Github user mrkm4ntr commented on the issue:

https://github.com/apache/spark/pull/20568
  
I registered with the same user name in dev list.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-12 Thread mrkm4ntr
Github user mrkm4ntr commented on the issue:

https://github.com/apache/spark/pull/20568
  
@hvanhovell The main motivation is making the online prediction of trained 
parameters using FeatureHasher in MLLib.  If the generated hash value is 
different from the implementations in another language, indices of coefficients 
do not match and can not predict correctly.
But I agree backward compatibility is more important. Since FeatureHasher 
will be added from Spark 2.3.0, how about adding a new method of this content 
to Murmur 3 and using it only from FeatureHasher?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...

2018-02-11 Thread mrkm4ntr
Github user mrkm4ntr commented on the issue:

https://github.com/apache/spark/pull/20568
  
@kiszk Thank you for your review! I fixed it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20568: [SPARK-23381][CORE] Murmur3 hash generates a diff...

2018-02-10 Thread mrkm4ntr
GitHub user mrkm4ntr opened a pull request:

https://github.com/apache/spark/pull/20568

[SPARK-23381][CORE] Murmur3 hash generates a different value from other 
implementations

## What changes were proposed in this pull request?
Murmur3 hash generates a different value from the original and other 
implementations (like Scala standard library and Guava or so) when the length 
of a bytes array is not multiple of 4.

## How was this patch tested?

Added a unit test.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mrkm4ntr/spark spark-23381

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20568.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20568


commit 8856e39e44037516080cbe25b7c6bdf52c81dde8
Author: Shintaro Murakami <mrkm4ntr@...>
Date:   2018-02-10T14:29:48Z

[SPARK-23381][CORE] Fix Murmur3 for byte arrays whose byte length is not a 
multiple of 4




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19988: [Spark-22795] [ML] Raise error when line search in First...

2017-12-20 Thread mrkm4ntr
Github user mrkm4ntr commented on the issue:

https://github.com/apache/spark/pull/19988
  
> When gradient nearly zero, and line-search failed. I think this is also 
possible. This should get valid model.

Yes, I agree.

> When gradient non-zero, line-search failed, will the model always be 
meaning-less ?

I think the model is not always meaningless.
LineSeachFailed occurs when there is an iteration exceeding the maximum 
value because the range that satisfies the strong wolfe condition is far away 
from the initial value (= 1.0). 
However, maxIter is 10. It is considered that the upper limit of the line 
search range is 1.5 ^ 10 = 57.665, and the lower limit is 0.5 ^ 10 = 0.001 
(assuming the interpolation value is at the center).
Of course, the range is not necessarily continuous, but I think whether it 
can be thought that LineSearchFailed will not occur so often if the input or 
intermediate calculation process is correct.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19988: [Spark-22795] [ML] Raise error when line search in First...

2017-12-19 Thread mrkm4ntr
Github user mrkm4ntr commented on the issue:

https://github.com/apache/spark/pull/19988
  
Surely... I lost my confidence as to whether the model is always 
meaningless.
Instead of raising an error, would it be preferable that the model has 
information that LineSearch failed? I want to know whether line search failed 
or not programmatically.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19988: [Spark-22795] [ML] Raise error when line search i...

2017-12-18 Thread mrkm4ntr
Github user mrkm4ntr commented on a diff in the pull request:

https://github.com/apache/spark/pull/19988#discussion_r157447946
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
@@ -244,9 +244,9 @@ class LinearSVC @Since("2.2.0") (
   }
 
   bcFeaturesStd.destroy(blocking = false)
-  if (state == null) {
+  if (state == null || state.searchFailed) {
 val msg = s"${optimizer.getClass.getName} failed."
-logError(msg)
+logWarning(msg)
 throw new SparkException(msg)
--- End diff --

Got it. Fixed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19988: [Spark-22795] [ML] Raise error when line search i...

2017-12-17 Thread mrkm4ntr
Github user mrkm4ntr commented on a diff in the pull request:

https://github.com/apache/spark/pull/19988#discussion_r157396794
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
@@ -244,9 +244,9 @@ class LinearSVC @Since("2.2.0") (
   }
 
   bcFeaturesStd.destroy(blocking = false)
-  if (state == null) {
+  if (state == null || state.searchFailed) {
 val msg = s"${optimizer.getClass.getName} failed."
-logError(msg)
+logWarning(msg)
 throw new SparkException(msg)
--- End diff --

The message that line search failed is logged by breeze.

https://github.com/scalanlp/breeze/blob/4611fe3f4466ac5b1885e9d6288ba27fc9d205ec/math/src/main/scala/breeze/optimize/FirstOrderMinimizer.scala#L78
I think it is a bit verbose that logging at this place.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19988: [Spark-22795] [ML] Raise error when line search i...

2017-12-16 Thread mrkm4ntr
Github user mrkm4ntr commented on a diff in the pull request:

https://github.com/apache/spark/pull/19988#discussion_r157345097
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/optim/NormalEquationSolver.scala ---
@@ -109,6 +109,11 @@ private[optim] class QuasiNewtonSolver(
   state = states.next()
   arrayBuilder += state.adjustedValue
 }
+if (state == null || state.searchFailed) {
+  val msg = s"${optimizer.getClass.getName} failed."
+  logError(msg)
--- End diff --

I got it. Fixed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19988: [Spark-22795] [ML] Raise error when line search i...

2017-12-15 Thread mrkm4ntr
Github user mrkm4ntr commented on a diff in the pull request:

https://github.com/apache/spark/pull/19988#discussion_r157243741
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/optim/NormalEquationSolver.scala ---
@@ -109,6 +109,11 @@ private[optim] class QuasiNewtonSolver(
   state = states.next()
   arrayBuilder += state.adjustedValue
 }
+if (state == null || state.searchFailed) {
+  val msg = s"${optimizer.getClass.getName} failed."
+  logError(msg)
--- End diff --

Thanks for your review.
Is this a comment about the logging level?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19988: [Spark-22795] Raise error when line search in Fir...

2017-12-15 Thread mrkm4ntr
GitHub user mrkm4ntr opened a pull request:

https://github.com/apache/spark/pull/19988

[Spark-22795] Raise error when line search in FirstOrderMinimizer failed

## What changes were proposed in this pull request?

When line search in FirstOrderMinimizer (LBFGS or OWLQN so on) failed, the 
fit method of estimator not failed, and a meaning-less transformer is returned. 
This pull request changes estimators to raise an error when line search 
failed.

The error should be discovered early.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mrkm4ntr/spark spark-22795

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19988.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19988


commit b48debb1f5b7d6ec0a44e5c4806f20ee9d6806d9
Author: Shintaro Murakami <mrkm4...@gmail.com>
Date:   2017-12-15T05:40:14Z

[Spark-22795] Raise error when line search in FirstOrderMinimizer failed




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org