[GitHub] [lucene] rmuir commented on a diff in pull request #913: Lucene 10577

2022-05-20 Thread GitBox


rmuir commented on code in PR #913:
URL: https://github.com/apache/lucene/pull/913#discussion_r878641687


##
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##
@@ -213,4 +213,21 @@ public static void add(float[] u, float[] v) {
   u[i] += v[i];
 }
   }
+
+  public static float dotProduct(BytesRef a, int aOffset, BytesRef b, int bOffset, int len) {
+    // fixme -- move to codec? What if later we want to access the bytes some other way?
+    int total = 0;
+    for (int i = 0; i < len; i++) {
+      total += a.bytes[aOffset++] * b.bytes[bOffset++];

Review Comment:
   I wonder if we can coerce autovectorization of this loop somehow, too. It is
also worth looking into because it wouldn't require the vector API at all,
maybe just manipulating the code. It is integer math, so it doesn't have the
restrictions that floating point does (order of operations, etc.), and
theoretically the compiler could do it.
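
One possible "manipulation" of the loop, as a minimal sketch: index both arrays from the loop induction variable instead of post-incrementing two separate offsets. This version takes plain `byte[]` arguments rather than `BytesRef`, and the class name `ByteDotProductSketch` is hypothetical. Whether the JIT actually vectorizes this form is an assumption that would have to be checked with a benchmark or by inspecting the generated assembly.

```java
final class ByteDotProductSketch {
  // Same arithmetic as the loop in the diff: each byte is promoted to int
  // before the multiply, and the products are summed into an int.
  static int dotProduct(byte[] a, int aOffset, byte[] b, int bOffset, int len) {
    int total = 0;
    for (int i = 0; i < len; i++) {
      // Offsets are loop-invariant here; only i changes per iteration.
      total += a[aOffset + i] * b[bOffset + i];
    }
    return total;
  }

  public static void main(String[] args) {
    byte[] a = {1, 2, 3, 4};
    byte[] b = {5, 6, 7, 8};
    System.out.println(dotProduct(a, 1, b, 1, 3));
  }
}
```

Because integer addition is associative even when it wraps, reordering the partial sums cannot change the result, which is what makes this loop a candidate in the first place.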



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene] rmuir commented on a diff in pull request #913: Lucene 10577

2022-05-20 Thread GitBox


rmuir commented on code in PR #913:
URL: https://github.com/apache/lucene/pull/913#discussion_r878562615


##
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##
@@ -213,4 +213,21 @@ public static void add(float[] u, float[] v) {
   u[i] += v[i];
 }
   }
+
+  public static float dotProduct(BytesRef a, int aOffset, BytesRef b, int bOffset, int len) {
+    // fixme -- move to codec? What if later we want to access the bytes some other way?
+    int total = 0;
+    for (int i = 0; i < len; i++) {
+      total += a.bytes[aOffset++] * b.bytes[bOffset++];

Review Comment:
   I think our result vector could be a ShortVector, and we could try a
conversion such as
https://docs.oracle.com/en/java/javase/16/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorOperators.html#B2S
, provided it can be done in a way that isn't slow?
   
   For the adding-up part, with the floats we used reduceLanes(), so I think
we'd have to use the same trick here, converting the result to
IntVector/LongVector to create the sum.
   
   There are gigantic paragraphs in the Vector.java javadoc on how to do these
conversions (maybe they are actually efficient?), but I haven't tried them.
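
To make the widening idea concrete, here is a scalar model of what each lane would do. It is not Vector API code, and the class name `WideningDotProductSketch` is hypothetical. It relies on the fact that the product of two bytes lies in [-16256, 16384] and so fits in a `short`, while the sum of just two such products (for example 16384 + 16384 = 32768) already does not, which is why the accumulation needs a wider type. Equal-length inputs are assumed.

```java
final class WideningDotProductSketch {
  // Scalar stand-in for: widen bytes to short lanes, multiply, then widen
  // again to int before summing. Assumes a.length == b.length.
  static int dotProduct(byte[] a, byte[] b) {
    int total = 0;
    for (int i = 0; i < a.length; i++) {
      // byte * byte always fits in a short, so this cast loses nothing.
      short product = (short) (a[i] * b[i]);
      // Accumulate in int: adding two shorts can exceed Short.MAX_VALUE.
      total += product;
    }
    return total;
  }

  public static void main(String[] args) {
    byte[] a = {-128, -128};
    byte[] b = {-128, -128};
    System.out.println(dotProduct(a, b)); // 32768, one more than Short.MAX_VALUE
  }
}
```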
   






[GitHub] [lucene] rmuir commented on a diff in pull request #913: Lucene 10577

2022-05-20 Thread GitBox


rmuir commented on code in PR #913:
URL: https://github.com/apache/lucene/pull/913#discussion_r878555057


##
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##
@@ -213,4 +213,21 @@ public static void add(float[] u, float[] v) {
   u[i] += v[i];
 }
   }
+
+  public static float dotProduct(BytesRef a, int aOffset, BytesRef b, int bOffset, int len) {
+    // fixme -- move to codec? What if later we want to access the bytes some other way?
+    int total = 0;
+    for (int i = 0; i < len; i++) {
+      total += a.bytes[aOffset++] * b.bytes[bOffset++];

Review Comment:
   Well, it's going to multiply each `a` byte by each `b` byte and store each
result in a `c` byte. So you will overflow right there, before you even do any
sum.
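
A small scalar illustration of that overflow, assuming a byte lane keeps only the low 8 bits of the product as the comment describes. The class and method names are hypothetical.

```java
final class ByteLaneOverflowSketch {
  // Models a multiply whose result is stored back into a byte:
  // only the low 8 bits of the product survive.
  static byte mulInByteLane(byte x, byte y) {
    return (byte) (x * y);
  }

  // Plain Java arithmetic: both operands are promoted to int first,
  // so the full product is kept.
  static int mulPromoted(byte x, byte y) {
    return x * y;
  }

  public static void main(String[] args) {
    System.out.println(mulInByteLane((byte) 100, (byte) 100)); // 16
    System.out.println(mulPromoted((byte) 100, (byte) 100));   // 10000
  }
}
```

Here 100 * 100 = 10000, and 10000 mod 256 = 16, so the byte-sized result is wrong before any summation happens.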






[GitHub] [lucene] rmuir commented on a diff in pull request #913: Lucene 10577

2022-05-20 Thread GitBox


rmuir commented on code in PR #913:
URL: https://github.com/apache/lucene/pull/913#discussion_r878550812


##
lucene/core/src/java/org/apache/lucene/util/VectorUtil.java:
##
@@ -213,4 +213,21 @@ public static void add(float[] u, float[] v) {
   u[i] += v[i];
 }
   }
+
+  public static float dotProduct(BytesRef a, int aOffset, BytesRef b, int bOffset, int len) {
+    // fixme -- move to codec? What if later we want to access the bytes some other way?
+    int total = 0;
+    for (int i = 0; i < len; i++) {
+      total += a.bytes[aOffset++] * b.bytes[bOffset++];

Review Comment:
   Is this really the way the function should work?
   
   It is doing the equivalent of casting both `a` and `b` to `int` first,
multiplying, and then adding the result to another `int`. That's just how Java
promotion works.
   
   But is this what we want? What about overflow? Why `int` and not `long`?
   
   This is not at all what `ByteVector.mul()` does.
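
To put a number on the overflow question, here is a hedged sketch with hypothetical names, using plain `byte[]` inputs of equal length. The largest single product is (-128) * (-128) = 16384 = 2^14, so an `int` accumulator can wrap once the input reaches 2^17 = 131072 elements, since 2^17 * 2^14 = 2^31. A `long` accumulator does not wrap for any array length Java allows.

```java
import java.util.Arrays;

final class AccumulatorWidthSketch {
  // int accumulator, as in the diff: wraps silently on overflow.
  static int dotInt(byte[] a, byte[] b) {
    int total = 0;
    for (int i = 0; i < a.length; i++) {
      total += a[i] * b[i];
    }
    return total;
  }

  // long accumulator: each int product is widened to long when added.
  static long dotLong(byte[] a, byte[] b) {
    long total = 0;
    for (int i = 0; i < a.length; i++) {
      total += a[i] * b[i];
    }
    return total;
  }

  public static void main(String[] args) {
    byte[] v = new byte[131072]; // 2^17 elements, worst case for the int sum
    Arrays.fill(v, (byte) -128);
    System.out.println(dotInt(v, v));  // wraps to Integer.MIN_VALUE
    System.out.println(dotLong(v, v)); // 2147483648
  }
}
```

Whether inputs of that length and magnitude can occur in practice is a separate question that this sketch does not answer.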


