[jira] [Commented] (DRILL-8459) bump avro to 1.11.3 due to cve

2023-10-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781196#comment-17781196
 ] 

ASF GitHub Bot commented on DRILL-8459:
---

cgivre merged PR #2841:
URL: https://github.com/apache/drill/pull/2841




> bump avro to 1.11.3 due to cve
> --
>
> Key: DRILL-8459
> URL: https://issues.apache.org/jira/browse/DRILL-8459
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: PJ Fanning
> Priority: Major
>
> https://github.com/apache/drill/security/dependabot/49



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8460) bump zookeeper jar to 3.7.2 due to cve

2023-10-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781130#comment-17781130
 ] 

ASF GitHub Bot commented on DRILL-8460:
---

pjfanning opened a new pull request, #2842:
URL: https://github.com/apache/drill/pull/2842

   # [DRILL-8460](https://issues.apache.org/jira/browse/DRILL-8460): upgrade zookeeper jar to 3.7.2
   
   ## Description
   
   CVE issue
   
   ## Documentation
   (Please describe user-visible changes similar to what should appear in the 
Drill documentation.)
   
   ## Testing
   (Please describe how this PR has been tested.)
   




> bump zookeeper jar to 3.7.2 due to cve
> --
>
> Key: DRILL-8460
> URL: https://issues.apache.org/jira/browse/DRILL-8460
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: PJ Fanning
> Priority: Major
>
> https://github.com/apache/drill/security/dependabot/51





[jira] [Updated] (DRILL-8460) bump zookeeper jar to 3.7.2 due to cve

2023-10-30 Thread PJ Fanning (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning updated DRILL-8460:
--
Parent: DRILL-8452
Issue Type: Sub-task  (was: Improvement)

> bump zookeeper jar to 3.7.2 due to cve
> --
>
> Key: DRILL-8460
> URL: https://issues.apache.org/jira/browse/DRILL-8460
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: PJ Fanning
> Priority: Major
>
> https://github.com/apache/drill/security/dependabot/51





[jira] [Created] (DRILL-8460) bump zookeeper jar to 3.7.2 due to cve

2023-10-30 Thread PJ Fanning (Jira)
PJ Fanning created DRILL-8460:
-

 Summary: bump zookeeper jar to 3.7.2 due to cve
 Key: DRILL-8460
 URL: https://issues.apache.org/jira/browse/DRILL-8460
 Project: Apache Drill
 Issue Type: Improvement
 Reporter: PJ Fanning


https://github.com/apache/drill/security/dependabot/51





[jira] [Commented] (DRILL-8459) bump avro to 1.11.3 due to cve

2023-10-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781127#comment-17781127
 ] 

ASF GitHub Bot commented on DRILL-8459:
---

pjfanning opened a new pull request, #2841:
URL: https://github.com/apache/drill/pull/2841

   # [DRILL-8459](https://issues.apache.org/jira/browse/DRILL-8459): bump avro to 1.11.3
   
   ## Description
   
   CVE issue
   
   ## Documentation
   (Please describe user-visible changes similar to what should appear in the 
Drill documentation.)
   
   ## Testing
   (Please describe how this PR has been tested.)
   




> bump avro to 1.11.3 due to cve
> --
>
> Key: DRILL-8459
> URL: https://issues.apache.org/jira/browse/DRILL-8459
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: PJ Fanning
> Priority: Major
>
> https://github.com/apache/drill/security/dependabot/49





[jira] [Updated] (DRILL-8459) bump avro to 1.11.3 due to cve

2023-10-30 Thread PJ Fanning (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning updated DRILL-8459:
--
Parent: DRILL-8452
Issue Type: Sub-task  (was: Improvement)

> bump avro to 1.11.3 due to cve
> --
>
> Key: DRILL-8459
> URL: https://issues.apache.org/jira/browse/DRILL-8459
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: PJ Fanning
> Priority: Major
>
> https://github.com/apache/drill/security/dependabot/49





[jira] [Created] (DRILL-8459) bump avro to 1.11.3 due to cve

2023-10-30 Thread PJ Fanning (Jira)
PJ Fanning created DRILL-8459:
-

 Summary: bump avro to 1.11.3 due to cve
 Key: DRILL-8459
 URL: https://issues.apache.org/jira/browse/DRILL-8459
 Project: Apache Drill
 Issue Type: Improvement
 Reporter: PJ Fanning


https://github.com/apache/drill/security/dependabot/49





[jira] [Commented] (DRILL-8457) Allow configuring csv parser in http storage plugin configuration

2023-10-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781046#comment-17781046
 ] 

ASF GitHub Bot commented on DRILL-8457:
---

ztomanek-dw commented on PR #2840:
URL: https://github.com/apache/drill/pull/2840#issuecomment-1785556883

   @cgivre 
   Thanks for the clarification. I wasn't sure whether I could push multiple commits for a single Jira issue.
   I've applied your suggestions and rebased the branch onto current master.




> Allow configuring csv parser in http storage plugin configuration
> -
>
> Key: DRILL-8457
> URL: https://issues.apache.org/jira/browse/DRILL-8457
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - HTTP
> Affects Versions: Future
> Reporter: Zbigniew Tomanek
> Priority: Minor
> Fix For: Future
>
>
> Currently there is no way to configure the CSV parser when the HTTP plugin is used.
> As a result, some files cannot be parsed (e.g. when any column has
> more than 4096 chars or the file uses a delimiter other than `,`).
> Since we use the HTTP plugin quite often at DataWalk, we've changed our
> internal fork of Drill so that the following parser/format properties can be
> configured using an additional `csvOptions` field:
>  
> {code:json}
> {
>   "csvOptions": {
>     "delimiter": "\t",
>     "quote": "\"",
>     "quote_escape": "\"",
>     "line_separator": "\n",
>     "header_extraction_enabled": null,
>     "number_of_rows_to_skip": 0,
>     "number_of_records_to_read": -1,
>     "line_separator_detection_enabled": true,
>     "max_columns": 512,
>     "max_chars_per_column": 4096,
>     "skip_empty_lines": true,
>     "ignore_leading_whitespaces": true,
>     "ignore_trailing_whitespaces": true,
>     "null_value": null
>   }
> }{code}
> I'd be glad to get feedback on whether creating a PR with these changes would
> bring any value to Drill.
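
To make the effect of a few of these properties concrete, here is a minimal sketch in plain Java. It is not Drill's parser and not the code from the PR: the class `CsvOptionsSketch` and its method `splitLine` are hypothetical names, only three of the proposed properties (`delimiter`, `quote`, `max_chars_per_column`) are modelled, and a doubled quote is assumed to act as the quote escape, as in the sample configuration above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for the proposed csvOptions; not Drill's parser. */
final class CsvOptionsSketch {
  private final char delimiter;
  private final char quote;
  private final int maxCharsPerColumn;

  CsvOptionsSketch(char delimiter, char quote, int maxCharsPerColumn) {
    this.delimiter = delimiter;
    this.quote = quote;
    this.maxCharsPerColumn = maxCharsPerColumn;
  }

  /** Splits one line into fields using the configured delimiter and quote. */
  List<String> splitLine(String line) {
    List<String> fields = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    boolean inQuotes = false;
    for (int i = 0; i < line.length(); i++) {
      char c = line.charAt(i);
      if (c == quote) {
        // Assumption: a doubled quote inside a quoted field is an escaped quote.
        if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == quote) {
          append(current, quote);
          i++;
        } else {
          inQuotes = !inQuotes;
        }
      } else if (c == delimiter && !inQuotes) {
        // An unquoted delimiter ends the current field.
        fields.add(current.toString());
        current.setLength(0);
      } else {
        append(current, c);
      }
    }
    fields.add(current.toString());
    return fields;
  }

  /** Appends a character and enforces the per-column length limit. */
  private void append(StringBuilder field, char c) {
    field.append(c);
    if (field.length() > maxCharsPerColumn) {
      throw new IllegalStateException(
          "Column exceeds " + maxCharsPerColumn + " characters");
    }
  }

  public static void main(String[] args) {
    // Tab-delimited input, which a parser fixed to ',' would read as one column.
    CsvOptionsSketch tsv = new CsvOptionsSketch('\t', '"', 4096);
    System.out.println(tsv.splitLine("id\t\"name\twith tab\"\tcity"));
  }
}
```

With a fixed `,` delimiter the tab-separated line would come back as a single field, and with a fixed 4096-character limit a longer column would be rejected; making both values configurable is what the issue asks for.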





[jira] [Commented] (DRILL-8457) Allow configuring csv parser in http storage plugin configuration

2023-10-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781042#comment-17781042
 ] 

ASF GitHub Bot commented on DRILL-8457:
---

cgivre commented on code in PR #2840:
URL: https://github.com/apache/drill/pull/2840#discussion_r1376443490


##
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpCSVOptions.java:
##
@@ -0,0 +1,287 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.http;
+
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.databind.annotation.JsonDeserialize;
+import com.fasterxml.jackson.databind.annotation.JsonPOJOBuilder;
+
+import java.util.Objects;
+
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+@JsonDeserialize(builder = HttpCSVOptions.HttpCSVOptionsBuilder.class)
+public class HttpCSVOptions {
+
+
+  @JsonProperty
+  private final String delimiter;
+
+  @JsonProperty
+  private final char quote;
+
+  @JsonProperty
+  private final char quoteEscape;
+
+  @JsonProperty
+  private final String lineSeparator;
+
+  @JsonProperty
+  private final Boolean headerExtractionEnabled;
+
+  @JsonProperty
+  private final long numberOfRowsToSkip;
+
+  @JsonProperty
+  private final long numberOfRecordsToRead;
+
+  @JsonProperty
+  private final boolean lineSeparatorDetectionEnabled;
+
+  @JsonProperty
+  private final int maxColumns;
+
+  @JsonProperty
+  private final int maxCharsPerColumn;
+
+  @JsonProperty
+  private final boolean skipEmptyLines;
+
+  @JsonProperty
+  private final boolean ignoreLeadingWhitespaces;
+
+  @JsonProperty
+  private final boolean ignoreTrailingWhitespaces;
+
+  @JsonProperty
+  private final String nullValue;
+
+  HttpCSVOptions(HttpCSVOptionsBuilder builder) {
+    this.delimiter = builder.delimiter;
+    this.quote = builder.quote;
+    this.quoteEscape = builder.quoteEscape;
+    this.lineSeparator = builder.lineSeparator;
+    this.headerExtractionEnabled = builder.headerExtractionEnabled;
+    this.numberOfRowsToSkip = builder.numberOfRowsToSkip;
+    this.numberOfRecordsToRead = builder.numberOfRecordsToRead;
+    this.lineSeparatorDetectionEnabled = builder.lineSeparatorDetectionEnabled;
+    this.maxColumns = builder.maxColumns;
+    this.maxCharsPerColumn = builder.maxCharsPerColumn;
+    this.skipEmptyLines = builder.skipEmptyLines;
+    this.ignoreLeadingWhitespaces = builder.ignoreLeadingWhitespaces;
+    this.ignoreTrailingWhitespaces = builder.ignoreTrailingWhitespaces;
+    this.nullValue = builder.nullValue;
+  }
+
+  public static HttpCSVOptionsBuilder builder() {
+    return new HttpCSVOptionsBuilder();
+  }
+
+  public String getDelimiter() {
+    return delimiter;
+  }
+
+  public char getQuote() {
+    return quote;
+  }
+
+  public char getQuoteEscape() {
+    return quoteEscape;
+  }
+
+  public String getLineSeparator() {
+    return lineSeparator;
+  }
+
+  public Boolean getHeaderExtractionEnabled() {
+    return headerExtractionEnabled;
+  }
+
+  public long getNumberOfRowsToSkip() {
+    return numberOfRowsToSkip;
+  }
+
+  public long getNumberOfRecordsToRead() {
+    return numberOfRecordsToRead;
+  }
+
+  public boolean isLineSeparatorDetectionEnabled() {
+    return lineSeparatorDetectionEnabled;
+  }
+
+  public int getMaxColumns() {
+    return maxColumns;
+  }
+
+  public int getMaxCharsPerColumn() {
+    return maxCharsPerColumn;
+  }
+
+  public boolean isSkipEmptyLines() {
+    return skipEmptyLines;
+  }
+
+  public boolean isIgnoreLeadingWhitespaces() {
+    return ignoreLeadingWhitespaces;
+  }
+
+  public boolean isIgnoreTrailingWhitespaces() {
+    return ignoreTrailingWhitespaces;
+  }
+
+  public String getNullValue() {
+    return nullValue;
+  }
+
+  @Override
+  public boolean equals(Object o) {
+    if (this == o) {
+      return true;
+    }
+    if (o == null || getClass() != o.getClass()) {
+      return false;
+    }
+    HttpCSVOptions that = (HttpCSVOptions) o;
+    return quote == that.quote && quoteEscape == that.quoteEscape && numberOfRowsToSkip == that.numberOfRowsToSkip && 
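
The archived diff is cut off inside `equals()`. As a rough illustration of the pattern the class follows (immutable fields, a builder, and value-based `equals`/`hashCode`), here is a reduced sketch with only two fields. The class name `CsvOptionsExample`, its nested `Builder`, and the default values are assumptions made for this example; the Jackson annotations from the PR are left out so that the snippet compiles with the JDK alone, and it does not reproduce the remainder of the PR's code.

```java
import java.util.Objects;

/** Hypothetical two-field reduction of an options class; not the PR's code. */
final class CsvOptionsExample {
  private final String delimiter;
  private final int maxCharsPerColumn;

  private CsvOptionsExample(Builder builder) {
    this.delimiter = builder.delimiter;
    this.maxCharsPerColumn = builder.maxCharsPerColumn;
  }

  static Builder builder() {
    return new Builder();
  }

  String getDelimiter() {
    return delimiter;
  }

  int getMaxCharsPerColumn() {
    return maxCharsPerColumn;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (o == null || getClass() != o.getClass()) {
      return false;
    }
    CsvOptionsExample that = (CsvOptionsExample) o;
    // Primitives are compared with ==, reference types with Objects.equals.
    return maxCharsPerColumn == that.maxCharsPerColumn
        && Objects.equals(delimiter, that.delimiter);
  }

  @Override
  public int hashCode() {
    // Must use the same fields as equals() to keep the two consistent.
    return Objects.hash(delimiter, maxCharsPerColumn);
  }

  /** Mutable builder; the defaults below are assumed for illustration. */
  static final class Builder {
    private String delimiter = ",";
    private int maxCharsPerColumn = 4096;

    Builder delimiter(String delimiter) {
      this.delimiter = delimiter;
      return this;
    }

    Builder maxCharsPerColumn(int maxCharsPerColumn) {
      this.maxCharsPerColumn = maxCharsPerColumn;
      return this;
    }

    CsvOptionsExample build() {
      return new CsvOptionsExample(this);
    }
  }
}
```

Two instances built with the same values, for example `CsvOptionsExample.builder().delimiter("\t").build()` called twice, compare equal and share a hash code, which is the property unit tests for such a configuration class would typically check.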

[jira] [Commented] (DRILL-8457) Allow configuring csv parser in http storage plugin configuration

2023-10-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781003#comment-17781003
 ] 

ASF GitHub Bot commented on DRILL-8457:
---

ztomanek-dw commented on PR #2840:
URL: https://github.com/apache/drill/pull/2840#issuecomment-1785165480

   @cgivre 
   Thanks for your feedback!
   
   Following your comments, I have:
   - written unit tests for `HttpCSVOptions`
   - written unit tests for `HttpApiConfig`, fixing a small bug in `HttpMethod` validation along the way
   - added a TSV parsing test to `TestHttpPlugin`
   - documented the `csvOptions` configuration in `CSV_Options.md`
   
   Let me know if you see anything else to cover :)




> Allow configuring csv parser in http storage plugin configuration
> -
>
> Key: DRILL-8457
> URL: https://issues.apache.org/jira/browse/DRILL-8457
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - HTTP
> Affects Versions: Future
> Reporter: Zbigniew Tomanek
> Priority: Minor
> Fix For: Future
>
>
> Currently there is no way to configure the CSV parser when the HTTP plugin is used.
> As a result, some files cannot be parsed (e.g. when any column has
> more than 4096 chars or the file uses a delimiter other than `,`).
> Since we use the HTTP plugin quite often at DataWalk, we've changed our
> internal fork of Drill so that the following parser/format properties can be
> configured using an additional `csvOptions` field:
>  
> {code:json}
> {
>   "csvOptions": {
>     "delimiter": "\t",
>     "quote": "\"",
>     "quote_escape": "\"",
>     "line_separator": "\n",
>     "header_extraction_enabled": null,
>     "number_of_rows_to_skip": 0,
>     "number_of_records_to_read": -1,
>     "line_separator_detection_enabled": true,
>     "max_columns": 512,
>     "max_chars_per_column": 4096,
>     "skip_empty_lines": true,
>     "ignore_leading_whitespaces": true,
>     "ignore_trailing_whitespaces": true,
>     "null_value": null
>   }
> }{code}
> I'd be glad to get feedback on whether creating a PR with these changes would
> bring any value to Drill.


