[GitHub] [lucene] rmuir commented on a diff in pull request #913: Lucene 10577
rmuir commented on code in PR #913:
URL: https://github.com/apache/lucene/pull/913#discussion_r878641687

## lucene/core/src/java/org/apache/lucene/util/VectorUtil.java: ##

@@ -213,4 +213,21 @@ public static void add(float[] u, float[] v) {
       u[i] += v[i];
     }
   }
+
+  public static float dotProduct(BytesRef a, int aOffset, BytesRef b, int bOffset, int len) {
+    // fixme -- move to codec? What if later we want to access the bytes some other way?
+    int total = 0;
+    for (int i = 0; i < len; i++) {
+      total += a.bytes[aOffset++] * b.bytes[bOffset++];

Review Comment:
I wonder if we can coerce autovectorization of this loop somehow too. It is also worth looking into, as it wouldn't require any vector API at all, maybe just manipulating the code. It is integer math: it doesn't have the restrictions that floating point does (order of operations etc.), so theoretically the compiler could do it.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
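
As an illustration of what "just manipulating the code" might look like, here is a minimal sketch. It assumes the bytes are available as plain `byte[]` arrays and uses a single loop counter instead of two post-incremented offsets. The class and method names are hypothetical and not part of the PR, and whether the JIT actually vectorizes this shape has not been measured here.

```java
final class ScalarByteDot {
  /**
   * Hypothetical helper, not from the PR: integer dot product over two byte[] slices.
   * The loop has one induction variable and no side effects on the offsets.
   */
  static int dotProduct(byte[] a, int aOffset, byte[] b, int bOffset, int len) {
    // Validate both slices once, up front, rather than relying on per-element failures.
    java.util.Objects.checkFromIndexSize(aOffset, len, a.length);
    java.util.Objects.checkFromIndexSize(bOffset, len, b.length);
    int total = 0;
    for (int i = 0; i < len; i++) {
      // Each byte is promoted to int before the multiply, as in the original loop.
      total += a[aOffset + i] * b[bOffset + i];
    }
    return total;
  }
}
```

The arithmetic is the same as in the loop under review; only the loop shape differs.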
[GitHub] [lucene] rmuir commented on a diff in pull request #913: Lucene 10577
rmuir commented on code in PR #913:
URL: https://github.com/apache/lucene/pull/913#discussion_r878562615

## lucene/core/src/java/org/apache/lucene/util/VectorUtil.java: ##

@@ -213,4 +213,21 @@ public static void add(float[] u, float[] v) {
       u[i] += v[i];
     }
   }
+
+  public static float dotProduct(BytesRef a, int aOffset, BytesRef b, int bOffset, int len) {
+    // fixme -- move to codec? What if later we want to access the bytes some other way?
+    int total = 0;
+    for (int i = 0; i < len; i++) {
+      total += a.bytes[aOffset++] * b.bytes[bOffset++];

Review Comment:
I think our result vector could be a ShortVector, and we could try to use a conversion (in some way that isn't slow) such as https://docs.oracle.com/en/java/javase/16/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorOperators.html#B2S. For the adding-up part, with the floats we did reduceLanes(), so I think we'd have to use the same trick, with conversion of the result to IntVector/LongVector to create the sum. There are gigantic paragraphs in the Vector.java javadoc on how to do these conversions (maybe they are actually efficient?), but I haven't tried.
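
The widening steps described above can be modelled in plain scalar Java, without the incubator module. This is only a sketch of the value ranges involved, not of the Vector API itself; the class and method names are hypothetical. It shows why a 16-bit lane is wide enough for each product but not for the running sum.

```java
final class WideningModel {
  /** Models the role B2S would play: widen each byte to 16 bits, then multiply. */
  static short mulInShortLane(byte x, byte y) {
    // The product of two bytes lies in [-16256, 16384], so it always fits in a short.
    return (short) ((short) x * (short) y);
  }

  /** Adding two products while still in a 16-bit lane keeps only the low 16 bits. */
  static short addInShortLane(short p, short q) {
    return (short) (p + q);
  }

  /** Models the second widening: each 16-bit product is added into an int sum. */
  static int dot(byte[] a, byte[] b) {
    int total = 0;
    for (int i = 0; i < a.length; i++) {
      total += mulInShortLane(a[i], b[i]);
    }
    return total;
  }
}
```

With both operands at -128, one product is 16384, which fits in a short, but two of them summed in a 16-bit lane wrap around. That is the case for converting to a wider lane type before the reduction.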
[GitHub] [lucene] rmuir commented on a diff in pull request #913: Lucene 10577
rmuir commented on code in PR #913:
URL: https://github.com/apache/lucene/pull/913#discussion_r878555057

## lucene/core/src/java/org/apache/lucene/util/VectorUtil.java: ##

@@ -213,4 +213,21 @@ public static void add(float[] u, float[] v) {
       u[i] += v[i];
     }
   }
+
+  public static float dotProduct(BytesRef a, int aOffset, BytesRef b, int bOffset, int len) {
+    // fixme -- move to codec? What if later we want to access the bytes some other way?
+    int total = 0;
+    for (int i = 0; i < len; i++) {
+      total += a.bytes[aOffset++] * b.bytes[bOffset++];

Review Comment:
Well, it's going to multiply each `a` byte by each `b` byte and store each result in a `c` byte. So you will overflow right there, before you even do any sum.
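
A small scalar sketch of that overflow, assuming a lane-wise byte multiply keeps only the low 8 bits of each product, as the comment describes. The names are hypothetical.

```java
final class ByteLaneDemo {
  /** Models one lane of a byte-wide multiply: the product is truncated back to 8 bits. */
  static byte mulInByteLane(byte x, byte y) {
    return (byte) (x * y);
  }

  /** What the scalar loop computes for the same operands: a full int product. */
  static int mulPromoted(byte x, byte y) {
    return x * y;
  }
}
```

For `x = y = 100`, `mulPromoted` gives 10000, while `mulInByteLane` gives 16, because 10000 mod 256 is 16.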
[GitHub] [lucene] rmuir commented on a diff in pull request #913: Lucene 10577
rmuir commented on code in PR #913:
URL: https://github.com/apache/lucene/pull/913#discussion_r878550812

## lucene/core/src/java/org/apache/lucene/util/VectorUtil.java: ##

@@ -213,4 +213,21 @@ public static void add(float[] u, float[] v) {
       u[i] += v[i];
     }
   }
+
+  public static float dotProduct(BytesRef a, int aOffset, BytesRef b, int bOffset, int len) {
+    // fixme -- move to codec? What if later we want to access the bytes some other way?
+    int total = 0;
+    for (int i = 0; i < len; i++) {
+      total += a.bytes[aOffset++] * b.bytes[bOffset++];

Review Comment:
Is this really the way the function should work? It is doing the equivalent of casting both `a` and `b` to `int` first, multiplying, and then adding the result to another `int`. That's just how Java promotion works. But is this what we want? What about overflow? Why `int` and not `long`? This is not at all what `ByteVector.mul()` does.
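
To put a number on the `int` versus `long` question, here is a sketch with hypothetical names, using plain `byte[]` inputs as an assumption. The largest possible per-element product is (-128) * (-128) = 16384 = 2^14, so an `int` accumulator can hold at most 131071 worst-case terms; 131072 of them sum to 2^31, one more than `Integer.MAX_VALUE`.

```java
final class PromotionDemo {
  /** Scalar dot product with an int accumulator, as in the loop under review. */
  static int dotInt(byte[] a, byte[] b) {
    int total = 0;
    for (int i = 0; i < a.length; i++) {
      total += a[i] * b[i]; // byte * byte is evaluated as int * int
    }
    return total;
  }

  /** Same sum with a long accumulator, used as the reference value. */
  static long dotLong(byte[] a, byte[] b) {
    long total = 0;
    for (int i = 0; i < a.length; i++) {
      total += a[i] * b[i];
    }
    return total;
  }

  /** A vector of the given length filled with -128, the worst case for this sum. */
  static byte[] worstCase(int len) {
    byte[] v = new byte[len];
    java.util.Arrays.fill(v, Byte.MIN_VALUE);
    return v;
  }
}
```

With `worstCase(131071)` the two methods agree; with `worstCase(131072)` the `long` sum is 2147483648 and the `int` sum wraps to `Integer.MIN_VALUE`.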