[jira] [Commented] (LUCENE-9838) simd version of VectorUtil.dotProduct

2021-03-20 Thread Uwe Schindler (Jira)


[ https://issues.apache.org/jira/browse/LUCENE-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305576#comment-17305576 ]

Uwe Schindler commented on LUCENE-9838:
---

Hi David,
The problem is mainly that this only works with exactly JDK 16, and only when
you pass a command-line parameter. In JDK 17 it may no longer work, as the API
is still incubating. Once it leaves incubation, the APIs will change again. So
you can only do it fully dynamically. MR-JARs won't work because of the module
system and the instability of the API.
The same applies to the mmap2 directory v2, but that one will likely be
available soon, as its API may be released with JDK 17. But mmap2 is simpler:
we can ship it as a separate JAR file (like lucene-mmap2-incubator-jdk16.jar).
This works for mmap2 because there's an abstract interface in Lucene, so it's
easy to plug in and compile completely separately.
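
A minimal sketch of the "abstract interface plus separately compiled JAR" idea,
assuming the plug-in is discovered with java.util.ServiceLoader (the comment
does not say how the separate JAR would be wired in). DirectoryProvider,
DefaultDirectoryProvider and DirectoryProviders are hypothetical names, not
Lucene APIs.

```java
import java.util.ServiceLoader;

// Hypothetical extension point standing in for an abstract interface in core.
// Only this type has to be visible to both core and the plug-in JAR.
interface DirectoryProvider {
  String name();
}

// Hypothetical built-in implementation that compiles on every supported JDK.
final class DefaultDirectoryProvider implements DirectoryProvider {
  @Override
  public String name() {
    return "default";
  }
}

final class DirectoryProviders {
  private DirectoryProviders() {}

  // Uses the first implementation registered under META-INF/services by a
  // separately compiled JAR (e.g. one built against an incubator API),
  // otherwise falls back to the built-in implementation.
  static DirectoryProvider lookup() {
    return ServiceLoader.load(DirectoryProvider.class)
        .findFirst()
        .orElseGet(DefaultDirectoryProvider::new);
  }
}
```

With no plug-in JAR on the class path, DirectoryProviders.lookup().name()
returns "default". Core never references the incubator classes, so it compiles
without special flags; only the plug-in JAR does.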

> simd version of VectorUtil.dotProduct
> -
>
> Key: LUCENE-9838
> URL: https://issues.apache.org/jira/browse/LUCENE-9838
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Robert Muir
>Priority: Major
> Attachments: LUCENE-9838.patch, LUCENE-9838_standalone.patch
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Followup to LUCENE-9837
> Let's explore using the JDK 16 Vector API to speed this up more. It might be a 
> hassle to try to MR-JAR/package it up for users (adding command-line flags and 
> stuff), but it gives good performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9838) simd version of VectorUtil.dotProduct

2021-03-20 Thread Robert Muir (Jira)


[ https://issues.apache.org/jira/browse/LUCENE-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305573#comment-17305573 ]

Robert Muir commented on LUCENE-9838:
-

No, an MR-JAR cannot be used here. I was the one who originally added the
FutureArrays stuff :)

The problem here is the incubating module (you cannot compile or run against it
without special flags). And now it is going through another round of
incubation: https://openjdk.java.net/jeps/8261663

So it may seriously be Java 21 or something like that before it can be used.
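
Because an incubator module is only resolved when the JVM is started with
--add-modules jdk.incubator.vector, one way to stay "fully dynamic" is to check
the boot layer at runtime and fall back to plain Java. A minimal sketch,
assuming the SIMD code lives in a separately compiled class; DotProduct,
ScalarDotProduct, DotProductSelector and org.example.VectorizedDotProduct are
hypothetical names, not Lucene or JDK APIs.

```java
// Hypothetical abstraction over the dot-product implementations.
interface DotProduct {
  float dotProduct(float[] a, float[] b);
}

// Plain-Java implementation that compiles and runs on every JDK.
final class ScalarDotProduct implements DotProduct {
  @Override
  public float dotProduct(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "vector dimensions differ: " + a.length + " != " + b.length);
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }
}

final class DotProductSelector {
  private DotProductSelector() {}

  // Hypothetical class, compiled separately against jdk.incubator.vector;
  // it is not part of this sketch.
  private static final String VECTOR_IMPL = "org.example.VectorizedDotProduct";

  static final DotProduct INSTANCE = select();

  private static DotProduct select() {
    // The incubator module is only in the boot layer when the JVM was started
    // with --add-modules jdk.incubator.vector.
    boolean vectorModule = ModuleLayer.boot().findModule("jdk.incubator.vector").isPresent();
    if (vectorModule) {
      try {
        return (DotProduct) Class.forName(VECTOR_IMPL).getDeclaredConstructor().newInstance();
      } catch (ReflectiveOperationException | LinkageError e) {
        // Class missing or built against a different incubator API: fall back.
      }
    }
    return new ScalarDotProduct();
  }
}
```

On a JVM started without the flag (or without the hypothetical class on the
class path), DotProductSelector.INSTANCE is the scalar implementation, and
dotProduct(new float[] {1, 2, 3}, new float[] {4, 5, 6}) returns 32.0f.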




[jira] [Commented] (LUCENE-9838) simd version of VectorUtil.dotProduct

2021-03-20 Thread David Smiley (Jira)


[ https://issues.apache.org/jira/browse/LUCENE-9838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305572#comment-17305572 ]

David Smiley commented on LUCENE-9838:
--

Rob, is the concern that it'll be forever before Lucene requires JDK 16 or
later? Yeah. I think there are ways to mitigate that. Lucene could have a
module containing functionality that has different implementations for
different JVMs. It could be published either as a multi-release JAR file or as
separate, compatible JAR files. There's [a post on the Gradle
blog|https://blog.gradle.org/mrjars] discussing these techniques. Today,
Lucene has FutureArrays and FutureObjects classes, which are a baby step
toward these ideas.
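
To illustrate "different implementations for different JVMs" in a single
class, here is a sketch that delegates to java.util.Arrays.mismatch (added in
Java 9) when the running JVM has it and otherwise uses a plain loop. It swaps
in a runtime MethodHandle lookup in place of MR-JAR packaging and is not meant
to reflect how Lucene's FutureArrays is implemented; CompatArrays is a
hypothetical name.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;

final class CompatArrays {
  private CompatArrays() {}

  // Handle to Arrays.mismatch(byte[], int, int, byte[], int, int), which only
  // exists on Java 9+; null when running on an older JVM.
  private static final MethodHandle MISMATCH = lookupMismatch();

  private static MethodHandle lookupMismatch() {
    try {
      return MethodHandles.publicLookup()
          .findStatic(
              Arrays.class,
              "mismatch",
              MethodType.methodType(
                  int.class, byte[].class, int.class, int.class, byte[].class, int.class, int.class));
    } catch (NoSuchMethodException | IllegalAccessException e) {
      return null;
    }
  }

  // Same contract as Arrays.mismatch: index of the first differing byte relative
  // to the range starts, -1 if the ranges are equal, or the shorter length if
  // one range is a prefix of the other.
  static int mismatch(byte[] a, int aFrom, int aTo, byte[] b, int bFrom, int bTo) {
    if (MISMATCH != null) {
      try {
        return (int) MISMATCH.invokeExact(a, aFrom, aTo, b, bFrom, bTo);
      } catch (RuntimeException | Error e) {
        throw e;
      } catch (Throwable t) {
        throw new AssertionError(t); // Arrays.mismatch declares no checked exceptions
      }
    }
    return scalarMismatch(a, aFrom, aTo, b, bFrom, bTo);
  }

  // Fallback for pre-Java 9 runtimes; bounds checks are simplified.
  static int scalarMismatch(byte[] a, int aFrom, int aTo, byte[] b, int bFrom, int bTo) {
    if (aFrom > aTo || bFrom > bTo) {
      throw new IllegalArgumentException("fromIndex > toIndex");
    }
    int aLen = aTo - aFrom;
    int bLen = bTo - bFrom;
    int len = Math.min(aLen, bLen);
    for (int i = 0; i < len; i++) {
      if (a[aFrom + i] != b[bFrom + i]) {
        return i;
      }
    }
    return aLen == bLen ? -1 : len;
  }
}
```

On Java 9 and later the method handle is found and the JDK implementation is
used; the loop only runs on older JVMs (or when scalarMismatch is called
directly).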

CC [~uschindler]




[GitHub] [lucene] rmuir commented on a change in pull request #15: LUCENE-8972: Add ICUTransformCharFilter, to support pre-tokenizer ICU text transformation

2021-03-20 Thread GitBox


rmuir commented on a change in pull request #15:
URL: https://github.com/apache/lucene/pull/15#discussion_r598135382



##
File path: lucene/analysis/icu/src/java/org/apache/lucene/analysis/icu/ICUTransformCharFilterFactory.java
##
@@ -0,0 +1,391 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.analysis.icu;
+
+import com.ibm.icu.impl.Utility;
+import com.ibm.icu.text.Normalizer2;
+import com.ibm.icu.text.Transliterator;
+import com.ibm.icu.text.UTF16;
+import java.io.Reader;
+import java.util.ArrayDeque;
+import java.util.Arrays;
+import java.util.Locale;
+import java.util.Map;
+import java.util.WeakHashMap;
+import org.apache.lucene.analysis.CharFilterFactory;
+
+/**
+ * Factory for {@link ICUTransformCharFilter}.
+ *
+ * Supports the following attributes:
+ *
+ * <ul>
+ *   <li>id (mandatory): A Transliterator ID, one from {@link Transliterator#getAvailableIDs()}
+ *   <li>direction (optional): Either 'forward' or 'reverse'. Default is forward.
+ * </ul>
+ *
+ * @see Transliterator
+ * @since 8.3.0
+ * @lucene.spi {@value #NAME}
+ */
+public class ICUTransformCharFilterFactory extends CharFilterFactory {
+
+  /** SPI name */
+  public static final String NAME = "icuTransform";
+
+  static final String MAX_ROLLBACK_BUFFER_CAPACITY_ARGNAME = "maxRollbackBufferCapacity";
+  static final String FAIL_ON_ROLLBACK_BUFFER_OVERFLOW_ARGNAME = "failOnRollbackBufferOverflow";
+  static final String SUPPRESS_UNICODE_NORMALIZATION_EXTERNALIZATION_ARGNAME =
+      "suppressUnicodeNormalizationExternalization";
+  private final NormType leading;
+  private final Transliterator transliterator;
+  private final NormType trailing;
+  private final int maxRollbackBufferCapacity;
+  private final boolean failOnRollbackBufferOverflow;
+
+  // TODO: add support for custom rules
+  /** Creates a new ICUTransformCharFilterFactory */
+  public ICUTransformCharFilterFactory(Map<String, String> args) {
+    super(args);
+    String id = require(args, "id");
+    String direction =
+        get(args, "direction", Arrays.asList("forward", "reverse"), "forward", false);
+    int dir = "forward".equals(direction) ? Transliterator.FORWARD : Transliterator.REVERSE;
+    int tmpCapacityHint =
+        getInt(
+            args,
+            MAX_ROLLBACK_BUFFER_CAPACITY_ARGNAME,
+            ICUTransformCharFilter.DEFAULT_MAX_ROLLBACK_BUFFER_CAPACITY);
+    this.maxRollbackBufferCapacity = tmpCapacityHint == -1 ? Integer.MAX_VALUE : tmpCapacityHint;
+    this.failOnRollbackBufferOverflow =
+        getBoolean(
+            args,
+            FAIL_ON_ROLLBACK_BUFFER_OVERFLOW_ARGNAME,
+            ICUTransformCharFilter.DEFAULT_FAIL_ON_ROLLBACK_BUFFER_OVERFLOW);
+    boolean suppressUnicodeNormalizationExternalization =
+        getBoolean(args, SUPPRESS_UNICODE_NORMALIZATION_EXTERNALIZATION_ARGNAME, false);
+    Transliterator stockTransliterator = Transliterator.getInstance(id, dir);
+    if (suppressUnicodeNormalizationExternalization) {
+      this.leading = null;
+      this.transliterator = stockTransliterator;
+      this.trailing = null;
+    } else {
+      ExternalNormalization ext = externalizeUnicodeNormalization(stockTransliterator);
+      this.leading = ext.leading;
+      this.transliterator = ext.t;
+      this.trailing = ext.trailing;
+    }
+    if (!args.isEmpty()) {
+      throw new IllegalArgumentException("Unknown parameters: " + args);
+    }
+  }
+
+  private static final Reader wrapReader(NormType normType, Reader r) {
+    if (normType == null) {
+      return r;
+    }
+    switch (normType) {
+      case NFC:
+        return new ICUNormalizer2CharFilter(r, Normalizer2.getNFCInstance());
+      case NFD:
+        return new ICUNormalizer2CharFilter(r, Normalizer2.getNFDInstance());
+      case NFKC:
+        return new ICUNormalizer2CharFilter(r, Normalizer2.getNFKCInstance());
+      case NFKD:
+        return new ICUNormalizer2CharFilter(r, Normalizer2.getNFKDInstance());
+      default:
+        throw new UnsupportedOperationException(
+            "test not yet able to compensate externally for normalization type \""
+                + normType
+                + "\"");
+    }
+  }
+
+  /** Default ctor for compatibility with SPI */
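
The factory constructor in this diff consumes each argument it reads, rejects
whatever is left over, and maps a capacity of -1 to Integer.MAX_VALUE
(unbounded). A minimal plain-Java sketch of that pattern, assuming, as the
trailing args.isEmpty() check implies, that helpers such as require and getInt
remove the keys they read. ParsedArgs and the default capacity passed to
parse are hypothetical, and the sketch does not depend on ICU or Lucene.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for two of the factory's arguments.
final class ParsedArgs {
  final String id;
  final int maxRollbackBufferCapacity;

  private ParsedArgs(String id, int maxRollbackBufferCapacity) {
    this.id = id;
    this.maxRollbackBufferCapacity = maxRollbackBufferCapacity;
  }

  static ParsedArgs parse(Map<String, String> original, int defaultCapacity) {
    // Work on a copy so that consuming keys does not mutate the caller's map.
    Map<String, String> args = new HashMap<>(original);
    String id = args.remove("id");
    if (id == null) {
      throw new IllegalArgumentException("missing parameter 'id'");
    }
    String capacity = args.remove("maxRollbackBufferCapacity");
    int hint = capacity == null ? defaultCapacity : Integer.parseInt(capacity);
    // Anything not consumed above is an unknown parameter.
    if (!args.isEmpty()) {
      throw new IllegalArgumentException("Unknown parameters: " + args);
    }
    // -1 means "no limit", mirroring the factory constructor.
    return new ParsedArgs(id, hint == -1 ? Integer.MAX_VALUE : hint);
  }
}
```

Consuming keys as they are read is what lets a single isEmpty() check at the
end catch misspelled or unsupported parameters.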

[GitHub] [lucene-site] janhoy merged pull request #54: Update ASF banner style

2021-03-20 Thread GitBox


janhoy merged pull request #54:
URL: https://github.com/apache/lucene-site/pull/54


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


