[GitHub] [drill] jnturton opened a new pull request, #2635: DRILL-8282: Update hadoop.dll and winutils.exe to 3.2.4.

2022-08-29 Thread GitBox


jnturton opened a new pull request, #2635:
URL: https://github.com/apache/drill/pull/2635

   # [DRILL-8282](https://issues.apache.org/jira/browse/DRILL-8282): Update hadoop.dll and winutils.exe to 3.2.4.
   
   ## Description
   
   Completes #2630 by updating hadoop.dll and winutils.exe to 3.2.4.
   
   ## Documentation
   N/A
   
   ## Testing
   Launch Drill on Windows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [drill] cgivre commented on a diff in pull request #2633: DRILL-8287: Add Support for Keyset Based Pagination

2022-08-29 Thread GitBox


cgivre commented on code in PR #2633:
URL: https://github.com/apache/drill/pull/2633#discussion_r957380889


##
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpPaginatorConfig.java:
##
@@ -137,21 +162,28 @@ public String toString() {
   .field("pageSize", pageSize)
   .field("maxRecords", maxRecords)
   .field("method", method)
+  .field("indexParam", indexParam)
+  .field("hasMoreParam", hasMoreParam)
+  .field("nextPageParam", nextPageParam)
   .toString();
   }
 
   public enum PaginatorMethod {
 OFFSET,
-PAGE
+PAGE,
+INDEX
   }
 
-  private HttpPaginatorConfig(HttpPaginatorConfig.HttpPaginatorBuilder builder) {
+  /*private HttpPaginatorConfig(HttpPaginatorConfig.HttpPaginatorConfigBuilder builder) {

Review Comment:
   Oops... Fixed.






[GitHub] [drill] cgivre commented on pull request #2633: DRILL-8287: Add Support for Keyset Based Pagination

2022-08-29 Thread GitBox


cgivre commented on PR #2633:
URL: https://github.com/apache/drill/pull/2633#issuecomment-1230366234

   > I'm not sure that the concept of pagination from the HTTP plugin should 
spill into the JSON reader. Can you abstract it, e.g. by renaming paginationMap 
to, say, listenerColumnMap?
   
   Fixed. 





[GitHub] [drill] jnturton merged pull request #2630: DRILL-8282: Bump Hadoop-Common Version to 3.2.4 (CVE)

2022-08-29 Thread GitBox


jnturton merged PR #2630:
URL: https://github.com/apache/drill/pull/2630





[GitHub] [drill] jnturton merged pull request #2635: DRILL-8282: Update hadoop.dll and winutils.exe to 3.2.4.

2022-08-29 Thread GitBox


jnturton merged PR #2635:
URL: https://github.com/apache/drill/pull/2635





[GitHub] [drill] jnturton commented on a diff in pull request #2633: DRILL-8287: Add Support for Keyset Based Pagination

2022-08-29 Thread GitBox


jnturton commented on code in PR #2633:
URL: https://github.com/apache/drill/pull/2633#discussion_r957433471


##
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/values/ScalarListener.java:
##
@@ -76,4 +79,33 @@ protected void setArrayNull() {
   protected UserException typeConversionError(String jsonType) {
 return loader.typeConversionError(schema(), jsonType);
   }
+
+  /**
+   * Adds a field's most recent value to the pagination map.  This is necessary for the HTTP plugin
+   * for index or keyset pagination where the API transmits values in the results that are used to
+   * generate the next page.
+   *
+   * This data is only stored if the pagination map is defined, and has keys.

Review Comment:
   Can this be rewritten in terms of generic column listeners rather than 
pagination and the HTTP plugin?
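For readers unfamiliar with the technique the Javadoc mentions, here is a minimal sketch of keyset (index) pagination, assuming an API that returns rows after a given key plus a "has more" flag. The class, the `fetch` stand-in, and the `Page` type are hypothetical and are not taken from the PR; only the idea that values in one response drive the request for the next page comes from the Javadoc above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of keyset pagination; not Drill code.
final class KeysetPaginationSketch {

  /** One page of a hypothetical API response. */
  static final class Page {
    final List<Integer> ids;
    final boolean hasMore;

    Page(List<Integer> ids, boolean hasMore) {
      this.ids = ids;
      this.hasMore = hasMore;
    }
  }

  /**
   * Stand-in for the remote API. Assumes the table is sorted ascending and
   * returns up to pageSize ids strictly greater than afterId.
   */
  static Page fetch(List<Integer> table, int afterId, int pageSize) {
    List<Integer> ids = new ArrayList<>();
    boolean hasMore = false;
    for (int id : table) {
      if (id <= afterId) {
        continue;
      }
      if (ids.size() == pageSize) {
        hasMore = true; // at least one more row exists beyond this page
        break;
      }
      ids.add(id);
    }
    return new Page(ids, hasMore);
  }

  /** Reads every page, using the last key of each page to request the next one. */
  static List<Integer> readAll(List<Integer> table, int pageSize) {
    List<Integer> all = new ArrayList<>();
    int lastSeen = Integer.MIN_VALUE;
    while (true) {
      Page page = fetch(table, lastSeen, pageSize);
      all.addAll(page.ids);
      if (!page.hasMore || page.ids.isEmpty()) {
        break;
      }
      lastSeen = page.ids.get(page.ids.size() - 1);
    }
    return all;
  }
}
```

The client keeps no offset; it only remembers the most recent key it saw, which is why the reader has to retain that value from the response.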



##
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/parser/SimpleMessageParser.java:
##
@@ -66,11 +68,13 @@
 public class SimpleMessageParser implements MessageParser {
 
   private final String[] path;
+  private final Map paginationFields;
 
-  public SimpleMessageParser(String dataPath) {
+  public SimpleMessageParser(String dataPath, Map paginationFields) {

Review Comment:
   Can we rename "pagination" here too?






[GitHub] [drill] cgivre commented on pull request #2633: DRILL-8287: Add Support for Keyset Based Pagination

2022-08-29 Thread GitBox


cgivre commented on PR #2633:
URL: https://github.com/apache/drill/pull/2633#issuecomment-1230390598

   @jnturton Thanks for the quick review!  I addressed your comments.   I 
actually reinserted the commented out block as that was intended to make sure 
that the user properly populates the pagination fields.  Not sure why I 
commented that out in the first place.





[GitHub] [drill] jnturton commented on a diff in pull request #2633: DRILL-8287: Add Support for Keyset Based Pagination

2022-08-29 Thread GitBox


jnturton commented on code in PR #2633:
URL: https://github.com/apache/drill/pull/2633#discussion_r957255881


##
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpPaginatorConfig.java:
##
@@ -137,21 +162,28 @@ public String toString() {
   .field("pageSize", pageSize)
   .field("maxRecords", maxRecords)
   .field("method", method)
+  .field("indexParam", indexParam)
+  .field("hasMoreParam", hasMoreParam)
+  .field("nextPageParam", nextPageParam)
   .toString();
   }
 
   public enum PaginatorMethod {
 OFFSET,
-PAGE
+PAGE,
+INDEX
   }
 
-  private HttpPaginatorConfig(HttpPaginatorConfig.HttpPaginatorBuilder builder) {
+  /*private HttpPaginatorConfig(HttpPaginatorConfig.HttpPaginatorConfigBuilder builder) {

Review Comment:
   Is this commented out code meant to be included?






[GitHub] [drill] cgivre commented on a diff in pull request #2633: DRILL-8287: Add Support for Keyset Based Pagination

2022-08-29 Thread GitBox


cgivre commented on code in PR #2633:
URL: https://github.com/apache/drill/pull/2633#discussion_r957448436


##
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/values/ScalarListener.java:
##
@@ -76,4 +79,33 @@ protected void setArrayNull() {
   protected UserException typeConversionError(String jsonType) {
 return loader.typeConversionError(schema(), jsonType);
   }
+
+  /**
+   * Adds a field's most recent value to the pagination map.  This is necessary for the HTTP plugin
+   * for index or keyset pagination where the API transmits values in the results that are used to
+   * generate the next page.
+   *
+   * This data is only stored if the pagination map is defined, and has keys.

Review Comment:
   Done!



##
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/parser/SimpleMessageParser.java:
##
@@ -66,11 +68,13 @@
 public class SimpleMessageParser implements MessageParser {
 
   private final String[] path;
+  private final Map paginationFields;
 
-  public SimpleMessageParser(String dataPath) {
+  public SimpleMessageParser(String dataPath, Map paginationFields) {

Review Comment:
   Done!






[GitHub] [drill] jnturton commented on pull request #2633: DRILL-8287: Add Support for Keyset Based Pagination

2022-08-29 Thread GitBox


jnturton commented on PR #2633:
URL: https://github.com/apache/drill/pull/2633#issuecomment-1230204078

   I'm not sure that the concept of pagination from the HTTP plugin should 
spill into the JSON reader. Can you abstract it, e.g. by renaming paginationMap 
to, say, listenerColumnMap?  





[GitHub] [drill] jnturton commented on pull request #2630: DRILL-8282: Bump Hadoop-Common Version to 3.2.4 (CVE)

2022-08-29 Thread GitBox


jnturton commented on PR #2630:
URL: https://github.com/apache/drill/pull/2630#issuecomment-1229820379

   We also need to update hadoop.dll and winutils.exe.





[GitHub] [drill] pjfanning commented on a diff in pull request #2634: DRILL-8289: Add Threat Hunting Functions

2022-08-29 Thread GitBox


pjfanning commented on code in PR #2634:
URL: https://github.com/apache/drill/pull/2634#discussion_r957792365


##
contrib/udfs/src/main/java/org/apache/drill/exec/udfs/ThreatHuntingFunctions.java:
##
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.udfs;
+
+import io.netty.buffer.DrillBuf;
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.holders.Float8Holder;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+
+import javax.inject.Inject;
+
+public class ThreatHuntingFunctions {
+  /**
+   * Punctuation pattern is useful for comparing log entries.  It extracts the all the punctuation and returns
+   * that pattern.  Spaces are replaced with an underscore.
+   * 
+   * Usage: SELECT punctuation_pattern( string ) FROM...
+   */
+  @FunctionTemplate(names = {"punctuation_pattern", "punctuationPattern"},
+scope = FunctionTemplate.FunctionScope.SIMPLE,
+nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
+  public static class PunctuationPatternFunction implements DrillSimpleFunc {
+
+@Param
+VarCharHolder rawInput;
+
+@Output
+VarCharHolder out;
+
+@Inject
+DrillBuf buffer;
+
+@Override
+public void setup() {
+}
+
+@Override
+public void eval() {
+
+  String input = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.toStringFromUTF8(rawInput.start, rawInput.end, rawInput.buffer);
+
+  String punctuationPattern = input.replaceAll("[a-zA-Z0-9]", "");
+  punctuationPattern = punctuationPattern.replaceAll(" ", "_");
+
+  out.buffer = buffer;
+  out.start = 0;
+  out.end = punctuationPattern.getBytes().length;

Review Comment:
   `getBytes` is safer if you specify a charset; otherwise you get the JVM 
default, which can differ from machine to machine (unless Drill's startup shell 
scripts specify `-Dfile.encoding=...`).
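A minimal sketch of the suggestion, assuming UTF-8 is the intended encoding (the input in the quoted hunk is decoded with `toStringFromUTF8`, so UTF-8 seems the natural choice, but that is an assumption). The class and method names are hypothetical; `String.getBytes(Charset)` and `StandardCharsets.UTF_8` are standard JDK APIs.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of the PR.
final class Utf8LengthSketch {

  /** Length of the string in bytes when encoded as UTF-8, independent of the JVM default charset. */
  static int utf8Length(String s) {
    return s.getBytes(StandardCharsets.UTF_8).length;
  }

  public static void main(String[] args) {
    // "\u00e9" (e with acute accent) takes two bytes in UTF-8, so byte length and char count differ.
    String pattern = "_/._\u00e9";
    System.out.println(pattern.length() + " chars, " + utf8Length(pattern) + " bytes");
  }
}
```

Passing the charset explicitly keeps the computed length consistent with the bytes that are eventually written, whatever `file.encoding` happens to be on the host.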






[GitHub] [drill] pjfanning commented on a diff in pull request #2634: DRILL-8289: Add Threat Hunting Functions

2022-08-29 Thread GitBox


pjfanning commented on code in PR #2634:
URL: https://github.com/apache/drill/pull/2634#discussion_r957792833


##
contrib/udfs/src/main/java/org/apache/drill/exec/udfs/ThreatHuntingFunctions.java:
##
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.udfs;
+
+import io.netty.buffer.DrillBuf;
+import org.apache.drill.exec.expr.DrillSimpleFunc;
+import org.apache.drill.exec.expr.annotations.FunctionTemplate;
+import org.apache.drill.exec.expr.annotations.Output;
+import org.apache.drill.exec.expr.annotations.Param;
+import org.apache.drill.exec.expr.holders.Float8Holder;
+import org.apache.drill.exec.expr.holders.VarCharHolder;
+
+import javax.inject.Inject;
+
+public class ThreatHuntingFunctions {
+  /**
+   * Punctuation pattern is useful for comparing log entries.  It extracts the all the punctuation and returns

Review Comment:
   `the all the` should probably be `all the`
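As a quick illustration of what the Javadoc under review describes, the two `replaceAll` calls from the quoted `eval()` body can be tried outside Drill. This is a standalone sketch: the class name, method name, and sample log line are hypothetical, and only the two regular-expression replacements are taken from the hunk.

```java
// Hypothetical standalone version of the transformation; not the UDF itself.
final class PunctuationPatternSketch {

  /** Removes ASCII letters and digits, then replaces each space with an underscore. */
  static String punctuationPattern(String input) {
    String pattern = input.replaceAll("[a-zA-Z0-9]", "");
    return pattern.replaceAll(" ", "_");
  }

  public static void main(String[] args) {
    // A made-up access-log fragment, used only to show the shape of the result.
    System.out.println(punctuationPattern("GET /index.html HTTP/1.1 200"));
  }
}
```

Two log lines with the same layout but different values reduce to the same pattern, which is what makes the function useful for grouping entries.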






[GitHub] [drill] cgivre merged pull request #2633: DRILL-8287: Add Support for Keyset Based Pagination

2022-08-29 Thread GitBox


cgivre merged PR #2633:
URL: https://github.com/apache/drill/pull/2633





[GitHub] [drill] jnturton commented on a diff in pull request #2633: DRILL-8287: Add Support for Keyset Based Pagination

2022-08-29 Thread GitBox


jnturton commented on code in PR #2633:
URL: https://github.com/apache/drill/pull/2633#discussion_r957489232


##
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/parser/SimpleMessageParser.java:
##
@@ -129,6 +135,44 @@ private boolean parseInnerLevel(TokenIterator tokenizer, int level) throws Messa
 return parseToElement(tokenizer, level + 1);
   }
 
+  /**
+   * This function is called when a storage plugin needs to retrieve values which have been read.  This logic
+   * enables use of the data path in these situations.  Normally, when the datapath is defined, the JSON reader
+   * will "free-wheel" over unprojected columns or columns outside of the datapath.  However, in this case, often
+   * the values which are being read, are outside the dataPath.  This logic offers a way to capture these values
+   * without creating a ValueVector for them.
+   *
+   * @param tokenizer A {@link TokenIterator} of the parsed JSON data.
+   * @param fieldName A {@link String} of the pagination field name.

Review Comment:
   ```suggestion
  * @param fieldName A {@link String} of the listener column name.
   ```



##
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/loader/TupleParser.java:
##
@@ -127,10 +127,19 @@ public TupleParser(JsonLoaderImpl loader, TupleWriter tupleWriter, TupleMetadata
 
   @Override
   public ElementParser onField(String key, TokenIterator tokenizer) {
-if (!tupleWriter.isProjected(key)) {
+if (projectField(key)) {
+  return fieldParserFor(key, tokenizer);
+} else {
   return fieldFactory().ignoredFieldParser();
+}
+  }
+
+  private boolean projectField(String key) {
+// This method makes sure that fields necessary for pagination are read.

Review Comment:
   ```suggestion
   // This method makes sure that fields necessary for column listeners are read.
   ```



##
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/values/ScalarListener.java:
##
@@ -76,4 +79,30 @@ protected void setArrayNull() {
   protected UserException typeConversionError(String jsonType) {
 return loader.typeConversionError(schema(), jsonType);
   }
+
+  /**
+   * Adds a field's most recent value to the column listener map.
+   * This data is only stored if the listener column map is defined, and has keys.
+   * @param key The key of the pagination field

Review Comment:
   ```suggestion
  * @param key The key of the listener field
   ```



##
exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/values/ScalarListener.java:
##
@@ -76,4 +79,30 @@ protected void setArrayNull() {
   protected UserException typeConversionError(String jsonType) {
 return loader.typeConversionError(schema(), jsonType);
   }
+
+  /**
+   * Adds a field's most recent value to the column listener map.
+   * This data is only stored if the listener column map is defined, and has keys.
+   * @param key The key of the pagination field
+   * @param value The value of to be retained
+   */
+  protected void addValueToListenerMap(String key, String value) {
+Map listenerColumnMap = loader.listenerColumnMap();
+
+if (listenerColumnMap == null || listenerColumnMap.isEmpty()) {
+  return;
+} else if (listenerColumnMap.containsKey(key) && StringUtils.isNotEmpty(value)) {
+  listenerColumnMap.put(key, value);
+}
+  }
+
+  protected void addValueToListenerMap(String key, Object value) {
+Map paginationMap = loader.listenerColumnMap();

Review Comment:
   ```suggestion
   Map listenerMap = loader.listenerColumnMap();
   ```
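The behaviour of the quoted `addValueToListenerMap(String, String)` overload can be sketched without Drill or `StringUtils`. In this hypothetical standalone class the map is passed in directly rather than obtained from `loader.listenerColumnMap()`, and the `Map<String, Object>` type parameters are an assumption, since the quoted diff shows a raw `Map`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical standalone sketch of the listener column map logic; not Drill code.
final class ListenerColumnMapSketch {

  private final Map<String, Object> listenerColumnMap;

  ListenerColumnMapSketch(Map<String, Object> listenerColumnMap) {
    this.listenerColumnMap = listenerColumnMap;
  }

  /** Records the most recent non-empty value, but only for keys registered in advance. */
  void addValueToListenerMap(String key, String value) {
    if (listenerColumnMap == null || listenerColumnMap.isEmpty()) {
      return; // nobody is listening, so nothing is stored
    }
    if (listenerColumnMap.containsKey(key) && value != null && !value.isEmpty()) {
      listenerColumnMap.put(key, value);
    }
  }

  public static void main(String[] args) {
    Map<String, Object> map = new HashMap<>();
    map.put("next_page", ""); // the caller registers interest by pre-populating the key
    ListenerColumnMapSketch listener = new ListenerColumnMapSketch(map);
    listener.addValueToListenerMap("next_page", "abc");
    listener.addValueToListenerMap("unrelated", "ignored");
    System.out.println(map);
  }
}
```

Because the keys are supplied by the caller, the JSON reader needs no knowledge of pagination; it only reports values for whichever columns were registered.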






[GitHub] [drill] pjfanning commented on pull request #2631: [MINOR UPDATE]: Disable lgtm build

2022-08-29 Thread GitBox


pjfanning commented on PR #2631:
URL: https://github.com/apache/drill/pull/2631#issuecomment-1230765965

   @cgivre thanks for sorting out the INFRA ticket. This PR is now really just 
for removing a file that is no longer needed.

