[jira] [Commented] (DRILL-8459) bump avro to 1.11.3 due to cve
[ https://issues.apache.org/jira/browse/DRILL-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781196#comment-17781196 ]

ASF GitHub Bot commented on DRILL-8459:
---

cgivre merged PR #2841:
URL: https://github.com/apache/drill/pull/2841

> bump avro to 1.11.3 due to cve
> --
>
> Key: DRILL-8459
> URL: https://issues.apache.org/jira/browse/DRILL-8459
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: PJ Fanning
> Priority: Major
>
> https://github.com/apache/drill/security/dependabot/49

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8460) bump zookeeper jar to 3.7.2 due to cve
[ https://issues.apache.org/jira/browse/DRILL-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781130#comment-17781130 ]

ASF GitHub Bot commented on DRILL-8460:
---

pjfanning opened a new pull request, #2842:
URL: https://github.com/apache/drill/pull/2842

# [DRILL-8460](https://issues.apache.org/jira/browse/DRILL-8460): upgrade zookeeper jar to 3.7.2

## Description

CVE issue

## Documentation
(Please describe user-visible changes similar to what should appear in the Drill documentation.)

## Testing
(Please describe how this PR has been tested.)

> bump zookeeper jar to 3.7.2 due to cve
> --
>
> Key: DRILL-8460
> URL: https://issues.apache.org/jira/browse/DRILL-8460
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: PJ Fanning
> Priority: Major
>
> https://github.com/apache/drill/security/dependabot/51
[jira] [Updated] (DRILL-8460) bump zookeeper jar to 3.7.2 due to cve
[ https://issues.apache.org/jira/browse/DRILL-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

PJ Fanning updated DRILL-8460:
--
Parent: DRILL-8452
Issue Type: Sub-task (was: Improvement)

> bump zookeeper jar to 3.7.2 due to cve
> --
>
> Key: DRILL-8460
> URL: https://issues.apache.org/jira/browse/DRILL-8460
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: PJ Fanning
> Priority: Major
>
> https://github.com/apache/drill/security/dependabot/51
[jira] [Created] (DRILL-8460) bump zookeeper jar to 3.7.2 due to cve
PJ Fanning created DRILL-8460:
-

Summary: bump zookeeper jar to 3.7.2 due to cve
Key: DRILL-8460
URL: https://issues.apache.org/jira/browse/DRILL-8460
Project: Apache Drill
Issue Type: Improvement
Reporter: PJ Fanning

https://github.com/apache/drill/security/dependabot/51
[jira] [Commented] (DRILL-8459) bump avro to 1.11.3 due to cve
[ https://issues.apache.org/jira/browse/DRILL-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781127#comment-17781127 ]

ASF GitHub Bot commented on DRILL-8459:
---

pjfanning opened a new pull request, #2841:
URL: https://github.com/apache/drill/pull/2841

# [DRILL-8459](https://issues.apache.org/jira/browse/DRILL-8459): bump avro to 1.11.3

## Description

CVE issue

## Documentation
(Please describe user-visible changes similar to what should appear in the Drill documentation.)

## Testing
(Please describe how this PR has been tested.)

> bump avro to 1.11.3 due to cve
> --
>
> Key: DRILL-8459
> URL: https://issues.apache.org/jira/browse/DRILL-8459
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: PJ Fanning
> Priority: Major
>
> https://github.com/apache/drill/security/dependabot/49
[jira] [Updated] (DRILL-8459) bump avro to 1.11.3 due to cve
[ https://issues.apache.org/jira/browse/DRILL-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

PJ Fanning updated DRILL-8459:
--
Parent: DRILL-8452
Issue Type: Sub-task (was: Improvement)

> bump avro to 1.11.3 due to cve
> --
>
> Key: DRILL-8459
> URL: https://issues.apache.org/jira/browse/DRILL-8459
> Project: Apache Drill
> Issue Type: Sub-task
> Reporter: PJ Fanning
> Priority: Major
>
> https://github.com/apache/drill/security/dependabot/49
[jira] [Created] (DRILL-8459) bump avro to 1.11.3 due to cve
PJ Fanning created DRILL-8459:
-

Summary: bump avro to 1.11.3 due to cve
Key: DRILL-8459
URL: https://issues.apache.org/jira/browse/DRILL-8459
Project: Apache Drill
Issue Type: Improvement
Reporter: PJ Fanning

https://github.com/apache/drill/security/dependabot/49
[jira] [Commented] (DRILL-8457) Allow configuring csv parser in http storage plugin configuration
[ https://issues.apache.org/jira/browse/DRILL-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781046#comment-17781046 ]

ASF GitHub Bot commented on DRILL-8457:
---

ztomanek-dw commented on PR #2840:
URL: https://github.com/apache/drill/pull/2840#issuecomment-1785556883

@cgivre Thanks for the clarification, I was not sure whether I could push multiple commits for one Jira issue. I've applied your suggestions and made sure the branch is rebased onto current master.

> Allow configuring csv parser in http storage plugin configuration
> -
>
> Key: DRILL-8457
> URL: https://issues.apache.org/jira/browse/DRILL-8457
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - HTTP
> Affects Versions: Future
> Reporter: Zbigniew Tomanek
> Priority: Minor
> Fix For: Future
>
>
> Currently there is no way to configure the CSV parser when the HTTP plugin is used.
> Because of that, some kinds of files cannot be parsed (e.g. when any column has
> more than 4096 chars or the file has a delimiter other than `,`).
> Since we use the HTTP plugin quite often at DataWalk, we've changed our
> internal fork of Drill so that the following parser/format properties can be
> configured using an additional `csvOptions` field:
>
> {code:json}
> {
>   "csvOptions": {
>     "delimiter": "\t",
>     "quote": "\"",
>     "quote_escape": "\"",
>     "line_separator": "\n",
>     "header_extraction_enabled": null,
>     "number_of_rows_to_skip": 0,
>     "number_of_records_to_read": -1,
>     "line_separator_detection_enabled": true,
>     "max_columns": 512,
>     "max_chars_per_column": 4096,
>     "skip_empty_lines": true,
>     "ignore_leading_whitespaces": true,
>     "ignore_trailing_whitespaces": true,
>     "null_value": null
>   }
> }{code}
> I'd be glad to get feedback on whether creating a PR with these changes would
> bring any value to Drill.
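
The two failure cases named in the DRILL-8457 description (a delimiter other than `,` and a column longer than the configured limit) can be illustrated with a small stdlib-only sketch. This is not Drill's parser: the class `DelimitedLineSketch` and its `split` method are hypothetical names introduced here, quoting and escaping are ignored, and the 4096 limit is simply the value quoted in the issue.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Drill: splits one unquoted line on a configurable delimiter. */
final class DelimitedLineSketch {

  private DelimitedLineSketch() {
  }

  /**
   * Splits a line on the given delimiter and rejects any column longer than
   * maxCharsPerColumn. Quoting and escaping are deliberately ignored.
   */
  static List<String> split(String line, char delimiter, int maxCharsPerColumn) {
    List<String> columns = new ArrayList<>();
    int start = 0;
    for (int i = 0; i <= line.length(); i++) {
      // A column ends at each delimiter and at the end of the line.
      if (i == line.length() || line.charAt(i) == delimiter) {
        if (i - start > maxCharsPerColumn) {
          throw new IllegalArgumentException(
              "column " + columns.size() + " exceeds " + maxCharsPerColumn + " chars");
        }
        columns.add(line.substring(start, i));
        start = i + 1;
      }
    }
    return columns;
  }

  public static void main(String[] args) {
    String tsvLine = "id\tname\tcity";
    // With a fixed comma delimiter the whole line comes back as a single column.
    System.out.println(split(tsvLine, ',', 4096).size());
    // With the delimiter set to a tab the three columns are separated.
    System.out.println(split(tsvLine, '\t', 4096));
  }
}
```

Making both values parameters rather than constants is the point of the proposal: a tab-separated response or a very wide column then becomes a configuration change instead of a parse failure.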
[jira] [Commented] (DRILL-8457) Allow configuring csv parser in http storage plugin configuration
[ https://issues.apache.org/jira/browse/DRILL-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781042#comment-17781042 ]

ASF GitHub Bot commented on DRILL-8457:
---

cgivre commented on code in PR #2840:
URL: https://github.com/apache/drill/pull/2840#discussion_r1376443490

##
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpCSVOptions.java:
##
@@ -0,0 +1,287 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.http;
+
+
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.databind.annotation.JsonDeserialize;
+import com.fasterxml.jackson.databind.annotation.JsonPOJOBuilder;
+
+import java.util.Objects;
+
+@JsonInclude(JsonInclude.Include.NON_DEFAULT)
+@JsonDeserialize(builder = HttpCSVOptions.HttpCSVOptionsBuilder.class)
+public class HttpCSVOptions {
+
+
+  @JsonProperty
+  private final String delimiter;
+
+  @JsonProperty
+  private final char quote;
+
+  @JsonProperty
+  private final char quoteEscape;
+
+  @JsonProperty
+  private final String lineSeparator;
+
+  @JsonProperty
+  private final Boolean headerExtractionEnabled;
+
+  @JsonProperty
+  private final long numberOfRowsToSkip;
+
+  @JsonProperty
+  private final long numberOfRecordsToRead;
+
+  @JsonProperty
+  private final boolean lineSeparatorDetectionEnabled;
+
+  @JsonProperty
+  private final int maxColumns;
+
+  @JsonProperty
+  private final int maxCharsPerColumn;
+
+  @JsonProperty
+  private final boolean skipEmptyLines;
+
+  @JsonProperty
+  private final boolean ignoreLeadingWhitespaces;
+
+  @JsonProperty
+  private final boolean ignoreTrailingWhitespaces;
+
+  @JsonProperty
+  private final String nullValue;
+
+  HttpCSVOptions(HttpCSVOptionsBuilder builder) {
+    this.delimiter = builder.delimiter;
+    this.quote = builder.quote;
+    this.quoteEscape = builder.quoteEscape;
+    this.lineSeparator = builder.lineSeparator;
+    this.headerExtractionEnabled = builder.headerExtractionEnabled;
+    this.numberOfRowsToSkip = builder.numberOfRowsToSkip;
+    this.numberOfRecordsToRead = builder.numberOfRecordsToRead;
+    this.lineSeparatorDetectionEnabled = builder.lineSeparatorDetectionEnabled;
+    this.maxColumns = builder.maxColumns;
+    this.maxCharsPerColumn = builder.maxCharsPerColumn;
+    this.skipEmptyLines = builder.skipEmptyLines;
+    this.ignoreLeadingWhitespaces = builder.ignoreLeadingWhitespaces;
+    this.ignoreTrailingWhitespaces = builder.ignoreTrailingWhitespaces;
+    this.nullValue = builder.nullValue;
+  }
+
+  public static HttpCSVOptionsBuilder builder() {
+    return new HttpCSVOptionsBuilder();
+  }
+
+  public String getDelimiter() {
+    return delimiter;
+  }
+
+  public char getQuote() {
+    return quote;
+  }
+
+  public char getQuoteEscape() {
+    return quoteEscape;
+  }
+
+  public String getLineSeparator() {
+    return lineSeparator;
+  }
+
+  public Boolean getHeaderExtractionEnabled() {
+    return headerExtractionEnabled;
+  }
+
+  public long getNumberOfRowsToSkip() {
+    return numberOfRowsToSkip;
+  }
+
+  public long getNumberOfRecordsToRead() {
+    return numberOfRecordsToRead;
+  }
+
+  public boolean isLineSeparatorDetectionEnabled() {
+    return lineSeparatorDetectionEnabled;
+  }
+
+  public int getMaxColumns() {
+    return maxColumns;
+  }
+
+  public int getMaxCharsPerColumn() {
+    return maxCharsPerColumn;
+  }
+
+  public boolean isSkipEmptyLines() {
+    return skipEmptyLines;
+  }
+
+  public boolean isIgnoreLeadingWhitespaces() {
+    return ignoreLeadingWhitespaces;
+  }
+
+  public boolean isIgnoreTrailingWhitespaces() {
+    return ignoreTrailingWhitespaces;
+  }
+
+  public String getNullValue() {
+    return nullValue;
+  }
+
+  @Override
+  public boolean equals(Object o) {
+    if (this == o) {
+      return true;
+    }
+    if (o == null || getClass() != o.getClass()) {
+      return false;
+    }
+    HttpCSVOptions that = (HttpCSVOptions) o;
+    return quote == that.quote && quoteEscape == that.quoteEscape && numberOfRowsToSkip == that.numberOfRowsToSkip &&
[jira] [Commented] (DRILL-8457) Allow configuring csv parser in http storage plugin configuration
[ https://issues.apache.org/jira/browse/DRILL-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17781003#comment-17781003 ]

ASF GitHub Bot commented on DRILL-8457:
---

ztomanek-dw commented on PR #2840:
URL: https://github.com/apache/drill/pull/2840#issuecomment-1785165480

@cgivre Thanks for your feedback! Following your comments, I have:
- written unit tests for `HttpCSVOptions`
- written unit tests for `HttpApiConfig`, fixing a small bug in `HttpMethod` validation along the way
- added a TSV parsing test to `TestHttpPlugin`
- documented the `csvOptions` configuration in `CSV_Options.md`

Let me know if you see anything else to cover :)

> Allow configuring csv parser in http storage plugin configuration
> -
>
> Key: DRILL-8457
> URL: https://issues.apache.org/jira/browse/DRILL-8457
> Project: Apache Drill
> Issue Type: Improvement
> Components: Storage - HTTP
> Affects Versions: Future
> Reporter: Zbigniew Tomanek
> Priority: Minor
> Fix For: Future
>
>
> Currently there is no way to configure the CSV parser when the HTTP plugin is used.
> Because of that, some kinds of files cannot be parsed (e.g. when any column has
> more than 4096 chars or the file has a delimiter other than `,`).
> Since we use the HTTP plugin quite often at DataWalk, we've changed our
> internal fork of Drill so that the following parser/format properties can be
> configured using an additional `csvOptions` field:
>
> {code:json}
> {
>   "csvOptions": {
>     "delimiter": "\t",
>     "quote": "\"",
>     "quote_escape": "\"",
>     "line_separator": "\n",
>     "header_extraction_enabled": null,
>     "number_of_rows_to_skip": 0,
>     "number_of_records_to_read": -1,
>     "line_separator_detection_enabled": true,
>     "max_columns": 512,
>     "max_chars_per_column": 4096,
>     "skip_empty_lines": true,
>     "ignore_leading_whitespaces": true,
>     "ignore_trailing_whitespaces": true,
>     "null_value": null
>   }
> }{code}
> I'd be glad to get feedback on whether creating a PR with these changes would
> bring any value to Drill.
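
The kind of check a unit test for a builder-based options class might make can be sketched as follows. This is a trimmed-down, hypothetical stand-in rather than the `HttpCSVOptions` class from the PR: the name `CsvOptionsSketch`, the three fields kept, and the default values (borrowed from the example JSON in the issue, plus `,` as the assumed default delimiter) are all assumptions made for illustration.

```java
import java.util.Objects;

/** Hypothetical, trimmed-down options holder; not the HttpCSVOptions class from the PR. */
final class CsvOptionsSketch {

  private final String delimiter;
  private final int maxColumns;
  private final int maxCharsPerColumn;

  private CsvOptionsSketch(Builder builder) {
    this.delimiter = builder.delimiter;
    this.maxColumns = builder.maxColumns;
    this.maxCharsPerColumn = builder.maxCharsPerColumn;
  }

  static Builder builder() {
    return new Builder();
  }

  String getDelimiter() {
    return delimiter;
  }

  int getMaxColumns() {
    return maxColumns;
  }

  int getMaxCharsPerColumn() {
    return maxCharsPerColumn;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (o == null || getClass() != o.getClass()) {
      return false;
    }
    CsvOptionsSketch that = (CsvOptionsSketch) o;
    return maxColumns == that.maxColumns
        && maxCharsPerColumn == that.maxCharsPerColumn
        && Objects.equals(delimiter, that.delimiter);
  }

  @Override
  public int hashCode() {
    return Objects.hash(delimiter, maxColumns, maxCharsPerColumn);
  }

  /** Mutable builder; unset values fall back to the assumed defaults below. */
  static final class Builder {
    // Assumed defaults, taken from the example values in the issue text.
    private String delimiter = ",";
    private int maxColumns = 512;
    private int maxCharsPerColumn = 4096;

    Builder delimiter(String delimiter) {
      this.delimiter = delimiter;
      return this;
    }

    Builder maxColumns(int maxColumns) {
      this.maxColumns = maxColumns;
      return this;
    }

    Builder maxCharsPerColumn(int maxCharsPerColumn) {
      this.maxCharsPerColumn = maxCharsPerColumn;
      return this;
    }

    CsvOptionsSketch build() {
      return new CsvOptionsSketch(this);
    }
  }

  public static void main(String[] args) {
    CsvOptionsSketch tsv = builder().delimiter("\t").maxCharsPerColumn(65536).build();
    CsvOptionsSketch same = builder().delimiter("\t").maxCharsPerColumn(65536).build();
    // Two instances built from the same values compare equal.
    System.out.println(tsv.equals(same));
    // A value that was never set keeps its assumed default.
    System.out.println(tsv.getMaxColumns());
  }
}
```

Keeping the fields final and routing construction through the builder means a plugin configuration only needs to name the values it overrides, and value-based `equals`/`hashCode` lets two configurations be compared directly in tests.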