adelapena commented on code in PR #2465:
URL: https://github.com/apache/cassandra/pull/2465#discussion_r1252975681


##########
src/java/org/apache/cassandra/index/sai/analyzer/filter/BasicResultFilters.java:
##########
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.index.sai.analyzer.filter;
+
+import java.text.Normalizer;
+import java.util.Locale;
+
+import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;
+
+/**
+ * Basic/General Token Filters
+ */
+public class BasicResultFilters
+{
+    private static final Locale DEFAULT_LOCALE = Locale.getDefault();
+
+    public static class LowerCase extends FilterPipelineTask
+    {
+        private final Locale locale;
+
+        public LowerCase()
+        {
+            this.locale = DEFAULT_LOCALE;
+        }
+
+        public String process(String input)
+        {
+            return input.toLowerCase(locale);
+        }
+    }
+
+    public static class Normalize extends FilterPipelineTask
+    {
+        public Normalize() { }
+
+        public String process(String input)

Review Comment:
   Nit: add `@Override`
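
For context, here is a minimal sketch of what the annotation buys. `PipelineTaskSketch` and `LowerCaseSketch` are hypothetical, simplified stand-ins (the real `FilterPipelineTask` may declare more than `process(String)`), and `Locale.ROOT` replaces the default locale only to keep the example deterministic.

```java
import java.util.Locale;

// Hypothetical, simplified stand-in for FilterPipelineTask; the real class may differ.
abstract class PipelineTaskSketch
{
    public abstract String process(String input);
}

class LowerCaseSketch extends PipelineTaskSketch
{
    // Assumption: a fixed locale, chosen here only for reproducible output.
    private final Locale locale = Locale.ROOT;

    // With @Override the compiler rejects this method if its signature ever
    // stops matching the parent's, e.g. after a rename or a parameter change.
    @Override
    public String process(String input)
    {
        return input.toLowerCase(locale);
    }
}
```

Without the annotation, a later signature change in the parent would silently turn `process` into an unrelated overload instead of a compile error.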



##########
src/java/org/apache/cassandra/index/sai/analyzer/AbstractAnalyzer.java:
##########
@@ -22,12 +22,22 @@
 import java.util.Iterator;
 import java.util.Map;
 import java.util.NoSuchElementException;
+import java.util.Set;
+
+import com.google.common.collect.ImmutableSet;
 
 import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.db.marshal.AsciiType;
+import org.apache.cassandra.db.marshal.UTF8Type;
+import org.apache.cassandra.exceptions.InvalidRequestException;
+import org.apache.cassandra.index.sai.utils.TypeUtil;
 
 public abstract class AbstractAnalyzer implements Iterator<ByteBuffer>
 {
+    public static final Set<AbstractType<?>> ANALYZABLE_TYPES = ImmutableSet.of(UTF8Type.instance, AsciiType.instance);
+
     protected ByteBuffer next = null;
+    String nextLiteral = null;

Review Comment:
   Nit: `nextLiteral` can be `protected`, like `next` above



##########
src/java/org/apache/cassandra/index/sai/analyzer/filter/BasicResultFilters.java:
##########
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.index.sai.analyzer.filter;
+
+import java.text.Normalizer;
+import java.util.Locale;
+
+import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;
+
+/**
+ * Basic/General Token Filters
+ */
+public class BasicResultFilters
+{
+    private static final Locale DEFAULT_LOCALE = Locale.getDefault();
+
+    public static class LowerCase extends FilterPipelineTask
+    {
+        private final Locale locale;
+
+        public LowerCase()
+        {
+            this.locale = DEFAULT_LOCALE;
+        }
+
+        public String process(String input)

Review Comment:
   Nit: add `@Override`



##########
src/java/org/apache/cassandra/index/sai/analyzer/filter/BasicResultFilters.java:
##########
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.index.sai.analyzer.filter;
+
+import java.text.Normalizer;
+import java.util.Locale;
+
+import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;
+
+/**
+ * Basic/General Token Filters
+ */
+public class BasicResultFilters
+{
+    private static final Locale DEFAULT_LOCALE = Locale.getDefault();
+
+    public static class LowerCase extends FilterPipelineTask
+    {
+        private final Locale locale;
+
+        public LowerCase()
+        {
+            this.locale = DEFAULT_LOCALE;
+        }
+
+        public String process(String input)
+        {
+            return input.toLowerCase(locale);
+        }
+    }
+
+    public static class Normalize extends FilterPipelineTask
+    {
+        public Normalize() { }
+
+        public String process(String input)
+        {
+            if (input == null) return null;
+            return Normalizer.isNormalized(input, Normalizer.Form.NFC) ? input : Normalizer.normalize(input, Normalizer.Form.NFC);
+        }
+    }
+
+    public static class Ascii extends FilterPipelineTask
+    {
+        public Ascii() { }
+
+        public String process(String input)

Review Comment:
   Nit: add `@Override`



##########
src/java/org/apache/cassandra/index/sai/analyzer/filter/BasicResultFilters.java:
##########
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.index.sai.analyzer.filter;
+
+import java.text.Normalizer;
+import java.util.Locale;
+
+import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;
+
+/**
+ * Basic/General Token Filters
+ */
+public class BasicResultFilters
+{
+    private static final Locale DEFAULT_LOCALE = Locale.getDefault();
+
+    public static class LowerCase extends FilterPipelineTask
+    {
+        private final Locale locale;
+
+        public LowerCase()
+        {
+            this.locale = DEFAULT_LOCALE;
+        }
+
+        public String process(String input)
+        {
+            return input.toLowerCase(locale);
+        }
+    }
+
+    public static class Normalize extends FilterPipelineTask
+    {
+        public Normalize() { }
+
+        public String process(String input)
+        {
+            if (input == null) return null;
+            return Normalizer.isNormalized(input, Normalizer.Form.NFC) ? input : Normalizer.normalize(input, Normalizer.Form.NFC);
+        }
+    }
+
+    public static class Ascii extends FilterPipelineTask
+    {
+        public Ascii() { }
+
+        public String process(String input)
+        {
+            if (input == null) return null;
+            char[] inputChars = input.toCharArray();
+            // The output can (potentially) be 4 times the size of the input
+            char[] outputChars = new char[inputChars.length * 4];
+            int outputSize = ASCIIFoldingFilter.foldToASCII(inputChars, 0, outputChars, 0, inputChars.length);
+            return new String(outputChars, 0, outputSize);
+        }
+    }
+
+    public static class NoOperation extends FilterPipelineTask
+    {
+        public String process(String input)

Review Comment:
   Nit: add `@Override`



##########
src/java/org/apache/cassandra/index/sai/analyzer/filter/FilterPipelineExecutor.java:
##########
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.index.sai.analyzer.filter;
+
+/**
+ * Executes all linked Pipeline Tasks serially and returns
+ * output (if exists) from the executed logic

Review Comment:
   Nit: no need to break the line



##########
test/unit/org/apache/cassandra/index/sai/analyzer/NonTokenizingAnalyzerTest.java:
##########
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.index.sai.analyzer;
+
+import java.nio.ByteBuffer;
+
+import org.junit.Test;
+
+import org.apache.cassandra.db.marshal.UTF8Type;
+import org.apache.cassandra.utils.ByteBufferUtil;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotEquals;
+
+/**
+ * Tests for the non-tokenizing analyzer
+ */
+public class NonTokenizingAnalyzerTest

Review Comment:
   There seems to be some code duplication across the tests in this class. Maybe we could use a utility method such as, for example:
   ```java
   // Needs: import java.nio.charset.StandardCharsets;
   private void test(String input, String expected, AbstractAnalyzer analyzer) throws Exception
   {
       // Encode explicitly as UTF-8 instead of relying on the platform default charset
       ByteBuffer toAnalyze = ByteBuffer.wrap(input.getBytes(StandardCharsets.UTF_8));
       analyzer.reset(toAnalyze);
       // Keep the last token produced by the analyzer
       ByteBuffer analyzed = null;
       while (analyzer.hasNext())
       {
           analyzed = analyzer.next();
       }
       String result = ByteBufferUtil.string(analyzed);
       assertEquals(expected, result);
   }
   ```
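
To illustrate how the helper suggested above could collapse each test to a single call, here is a self-contained sketch. `LowerCasingAnalyzerSketch` and `AnalyzerTestHelperSketch` are hypothetical stand-ins, not classes from the PR; the real `AbstractAnalyzer` and `NonTokenizingAnalyzer` are assumed to expose the same `reset`/`hasNext`/`next` cycle.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.Locale;
import java.util.NoSuchElementException;

// Hypothetical stand-in for an analyzer: emits one lowercased token per input.
class LowerCasingAnalyzerSketch implements Iterator<ByteBuffer>
{
    private ByteBuffer next;

    public void reset(ByteBuffer input)
    {
        String text = StandardCharsets.UTF_8.decode(input.duplicate()).toString();
        next = ByteBuffer.wrap(text.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public boolean hasNext()
    {
        return next != null;
    }

    @Override
    public ByteBuffer next()
    {
        if (next == null)
            throw new NoSuchElementException();
        ByteBuffer result = next;
        next = null;
        return result;
    }
}

class AnalyzerTestHelperSketch
{
    // Runs the analyzer over the input and returns the last token it produced,
    // or null if it produced none.
    static String analyze(String input, LowerCasingAnalyzerSketch analyzer)
    {
        analyzer.reset(ByteBuffer.wrap(input.getBytes(StandardCharsets.UTF_8)));
        ByteBuffer analyzed = null;
        while (analyzer.hasNext())
            analyzed = analyzer.next();
        return analyzed == null ? null : StandardCharsets.UTF_8.decode(analyzed).toString();
    }
}
```

With a helper of this shape, each test case reduces to one line that pairs an input with its expected output, for example comparing `analyze("Hello World", analyzer)` against the expected lowercased string.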



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

