[jira] [Resolved] (DRILL-2171) Test framework throws IOOB for tests changing schema

2021-11-18 Thread Vitalii Diravka (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka resolved DRILL-2171.

Resolution: Done

> Test framework throws IOOB for tests changing schema
> 
>
> Key: DRILL-2171
> URL: https://issues.apache.org/jira/browse/DRILL-2171
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Reporter: Hanifi Gunes
>Assignee: Vitalii Diravka
>Priority: Major
> Fix For: Future
>
>
> I added a unit test as part of DRILL-1605 that resolves a problem with schema 
> change. Unfortunately test framework suffers from a similar problem throwing 
> IOOB while trying to verify the results. 
> TestSchemaChange#testMultiFilesWithDifferentSchema is currently ignored until 
> a patch is available for this issue.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (DRILL-1896) Unit tests failing due to string based comparison at JsonStringHashMap & JsonStringArrayList #equals methods

2021-11-18 Thread Vitalii Diravka (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka resolved DRILL-1896.

Resolution: Fixed

_jsonBaselineFile_ runs a new Drill query. Its result differs from the result 
of the original query because of the _SchemaChange_ in the original query, so 
the query result needs to be compared with baselineValues instead. Resolved in 
DRILL-8046.

> Unit tests failing due to string based comparison at JsonStringHashMap & 
> JsonStringArrayList #equals methods
> 
>
> Key: DRILL-1896
> URL: https://issues.apache.org/jira/browse/DRILL-1896
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Hanifi Gunes
>Assignee: Vitalii Diravka
>Priority: Major
> Fix For: 0.8.0
>
> Attachments: DRILL-1896-v3.patch, DRILL-1896.patch, RILL-1896-v2.patch
>
>
> Unit test framework relies on JsonString*#equals methods to compare actual 
> and expected results. We should properly implement these to prevent unit 
> tests from failing.
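
To illustrate why a string-based comparison can mislead a test framework, here 
is a minimal sketch. It uses a plain LinkedHashMap and a hypothetical 
EqualsSketch helper, not Drill's JsonStringHashMap, and it assumes key order is 
one source of the mismatch; the issue text does not say which differences 
actually caused the failures.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not Drill's JsonStringHashMap.
class EqualsSketch {
  // Fragile: compares the rendered text, so key order leaks into the result.
  static boolean stringEquals(Map<String, Object> a, Map<String, Object> b) {
    return a.toString().equals(b.toString());
  }

  // Structural: Map.equals compares the entries, independent of insertion order.
  static boolean structuralEquals(Map<String, Object> a, Map<String, Object> b) {
    return a.equals(b);
  }
}
```

For two maps holding the same entries inserted in a different order, 
stringEquals reports a difference while structuralEquals reports a match.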





[jira] [Resolved] (DRILL-5612) Random failure in TestMergeJoinWithSchemaChanges

2021-11-18 Thread Vitalii Diravka (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-5612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka resolved DRILL-5612.

Resolution: Fixed

> Random failure in TestMergeJoinWithSchemaChanges
> 
>
> Key: DRILL-5612
> URL: https://issues.apache.org/jira/browse/DRILL-5612
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.11.0
>Reporter: Paul Rogers
>Assignee: Vitalii Diravka
>Priority: Major
> Attachments: image-2021-11-16-02-35-25-690.png
>
>
> The unit test 
> {{org.apache.drill.exec.physical.impl.join.TestMergeJoinWithSchemaChanges#testMissingAndNewColumns}}
>  is subject to random failures, perhaps due to changes in file order in 
> readers.
> The test builds a number of input files, then executes queries against them. 
> On most runs, the output is fine:
> {code}
> Running 
> org.apache.drill.exec.physical.impl.join.TestMergeJoinWithSchemaChanges#testMissingAndNewColumns
> /home/.../target/1498606483211-0/mergejoin-schemachanges-left
> /home/.../target/1498606483211-1/mergejoin-schemachanges-right
> {code}
> But, on occasion, the query fails:
> {code}
> org.apache.drill.exec.physical.impl.join.TestMergeJoinWithSchemaChanges
> testMissingAndNewColumns(org.apache.drill.exec.physical.impl.join.TestMergeJoinWithSchemaChanges)
>   Time elapsed: 0.569 sec  <<< ERROR!
> ...: UNSUPPORTED_OPERATION ERROR: Sort doesn't currently support sorts with 
> changing schemas
> Fragment 0:0
>   (org.apache.drill.exec.exception.SchemaChangeException) Sort currently only 
> supports a single schema.
> 
> org.apache.drill.exec.physical.impl.sort.SortRecordBatchBuilder.build():152
> 
> org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext():476
> ...
> {code}
> The line in the exception above:
> {code}
>   public void build(VectorContainer outputContainer) throws SchemaChangeException {
>     outputContainer.clear();
>     if (batches.keySet().size() > 1) {
>       throw new SchemaChangeException("Sort currently only supports a single schema.");
>     }
> {code}
> The above code has not changed in quite some time. The failure is in the 
> "legacy" external sort.
> Although the external sort does support schema changes, it only does so in 
> the form of a union vector, which must be enabled. (Other tests validate that 
> schema changes work.)
> What is likely happening here is that the sort sometimes sees two files with 
> differing schemas, while at other times multiple threads run so that a single 
> sort sees only one file. This speculation can be verified by looking at a log 
> file (not available in the test run that failed) to see if the scan under the 
> sort read more than one file.
> Or, perhaps the order of the JSON files matters. Perhaps file order varies 
> across machines (since the Linux command to list directories does not 
> guarantee order.)
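
If file order is indeed the cause, one way to rule it out is to sort the 
directory listing explicitly. The sketch below uses a hypothetical 
SortedListing helper that is not part of Drill's test framework; whether 
ordering the inputs would actually remove the random failure is not established 
by the report above.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper for illustration; not part of Drill's test framework.
class SortedListing {
  // Returns a copy of the given files ordered by file name.
  static File[] sortByName(File[] files) {
    File[] sorted = Arrays.copyOf(files, files.length);
    Arrays.sort(sorted, Comparator.comparing(File::getName));
    return sorted;
  }

  // File.listFiles() documents no ordering guarantee, so sort explicitly.
  static File[] listSorted(File dir) {
    File[] files = dir.listFiles();  // null if dir is not a readable directory
    return files == null ? new File[0] : sortByName(files);
  }
}
```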





[jira] [Resolved] (DRILL-2933) RecordBatchLoader.load(...) calls catch SchemaChangeException that load(...) never actually throws

2021-11-18 Thread Vitalii Diravka (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka resolved DRILL-2933.

Resolution: Fixed

> RecordBatchLoader.load(...) calls catch SchemaChangeException that load(...) 
> never actually throws
> --
>
> Key: DRILL-2933
> URL: https://issues.apache.org/jira/browse/DRILL-2933
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Daniel Barclay
>Assignee: Vitalii Diravka
>Priority: Major
> Fix For: Future
>
>
> There are about 9 calls to RecordBatchLoader.load(...) that catch 
> SchemaChangeException because it is declared to be thrown by 
> RecordBatchLoader.load(...).
> However, RecordBatchLoader.load(...) never actually throws 
> SchemaChangeException.
> (To find those calls, comment out the "throws SchemaChangeException" on 
> RecordBatchLoader.load(...) and follow the compilation errors.)
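
The pattern described above can be sketched with stand-in classes. All names 
below (SketchSchemaChangeException, LoaderSketch, CallerSketch) and the int 
parameter are hypothetical and do not reflect Drill's actual signatures. The 
sketch also shows why removing the throws clause surfaces the callers: javac 
rejects a catch of a checked exception that the try body cannot throw.

```java
// Hypothetical stand-ins for illustration; not Drill's actual classes or signatures.
class SketchSchemaChangeException extends Exception {
  SketchSchemaChangeException(String message) {
    super(message);
  }
}

class LoaderSketch {
  // Declares a checked exception that the body never throws.
  static boolean load(int recordCount) throws SketchSchemaChangeException {
    return recordCount > 0;
  }

  // Same body with the unused throws clause removed.
  static boolean loadWithoutThrows(int recordCount) {
    return recordCount > 0;
  }
}

class CallerSketch {
  // Callers of load(...) must carry a handler that can never run.
  static boolean callWithDeadCatch(int recordCount) {
    try {
      return LoaderSketch.load(recordCount);
    } catch (SketchSchemaChangeException e) {
      throw new IllegalStateException(e);
    }
  }

  // Once the throws clause is gone, the try/catch has to be removed as well,
  // because the compiler reports the now-unreachable catch as an error.
  static boolean callDirect(int recordCount) {
    return LoaderSketch.loadWithoutThrows(recordCount);
  }
}
```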





[GitHub] [drill] lgtm-com[bot] commented on pull request #2282: DRILL-7978: Fixed Width Format Plugin

2021-11-18 Thread GitBox


lgtm-com[bot] commented on pull request #2282:
URL: https://github.com/apache/drill/pull/2282#issuecomment-973244742


   This pull request **introduces 2 alerts** when merging 
428a512ec35309c90b254c71735f88b97768f7ca into 
14d96d1b6a847f3c07a453f6641993da21a4167c - [view on 
LGTM.com](https://lgtm.com/projects/g/apache/drill/rev/pr-0dea4b746255df65147d73ec7e5b0297db4bb759)
   
   **new alerts:**
   
   * 2 for Unused format argument


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [drill] fromrymail commented on issue #2380: JQuery Vulnerability issue (JQuery Update request)

2021-11-18 Thread GitBox


fromrymail commented on issue #2380:
URL: https://github.com/apache/drill/issues/2380#issuecomment-973244112


   > Hi, I can try to update it.
   
   Thank you so much.






[GitHub] [drill] cgivre commented on issue #2381: Extension information_schema.columns Drill

2021-11-18 Thread GitBox


cgivre commented on issue #2381:
URL: https://github.com/apache/drill/issues/2381#issuecomment-972931342


   I think it is a good idea. :-). We are getting ready to release Drill 1.20, 
so maybe for the next release? Unless it is an easy fix.
   
   
   
   > On Nov 18, 2021, at 9:40 AM, Ильшат ***@***.***> wrote:
   > 
   > 
   > Guys, what do you think about extending information_schema.columns with 
   > the "comment" column?
   > At this moment I need to search the database itself.
   > This was discussed here 
   > https://apache-drill.slack.com/archives/CG380K519/p1636905663050200

   






[GitHub] [drill] imamerkhanov opened a new issue #2381: Extension information_schema.columns Drill

2021-11-18 Thread GitBox


imamerkhanov opened a new issue #2381:
URL: https://github.com/apache/drill/issues/2381


   Guys, what do you think about extending information_schema.columns with the 
"comment" column?
   At this moment I need to search the database itself.
   This was discussed here 
https://apache-drill.slack.com/archives/CG380K519/p1636905663050200
   






Re: [DISCUSS] Refactoring Drill's CSV (Text) Reader

2021-11-18 Thread Дмитрий Владимирович
Please exclude me from this conversation.

Thu, 18 Nov 2021, 13:30 Charles Givre :

> Hi James,
> I do think it might be time to start considering creating a wiki of
> breaking changes for a Drill 2.0.  I'd also concur that having tons of
> config options that don't really add value is not a good use of config
> options as it leads to the creation of a lot of technical debt. I'll start
> a wiki page and put this on there.
>
> In the meantime, I may submit a PR that changes the default value of
> extractHeaders for CSV to true.  I don't really see that as a breaking
> change in that a user can simply change that flag and the previous behavior
> is restored.
> Best,
> -- C
>
>
>
> > On Nov 18, 2021, at 2:34 AM, James Turton  wrote:
> >
> > Definitely a +1 for this friendlier default behaviour and another +1 for
> the prospect of increased consistency across format plugins.
> >
> > My follow-up questions to the community.
> > Since these are examples of user-breaking changes, and not just in niche
> areas, are we approaching a point when we want to start working on Drill
> 2.x?
> > Do we have other user-breaking or significant refactoring ideas that
> we've been keeping stashed away in our heads, that would get their chance
> at life from the fact that a 2.x Drill can defensibly exhibit some
> incompatibilities with Drill 1.x?
> > Should we make a "Drill v2 Parking Lot" page in the Dev Wiki where we
> record such ideas?
> > Would we be fine in terms of dev resources with supporting both bug fix
> releases to a 1.x series and also pushing forward in a 2.x series?
> > My own feeling is that to get the most value from a good proposal such
> as the below, we don't want to conceal everything behind default-false
> options in order to avoid breaking Drill 1.x users, we want to embrace the
> breakage which (to me) points to Drill 2.x.
> >
> > On 2021/11/18 02:30, Charles Givre wrote:
> >> Hello Drill Community,
> >> I would like to put forward some thoughts I've had relating to the CSV
> reader in Drill.  I would like to propose a few changes which could
> actually be breaking changes, so I wanted to see if there are any strongly
> held opinions in the community.  Here goes:
> >>
> >> The Problems:
> >> 1.  The default behavior for Drill is to leave the extractColumnHeaders
> option as false.  When a user queries a CSV file this way, the results are
> returned in a list of columns called columns.  Thus if a user wants the
> first column, they would project columns[0].  I have never been a fan of
> this behavior.  Even though Drill ships with the csvh file extension which
> enables the header extraction, this is not a commonly used file format.
Furthermore, the returned results (the column list) do not work well with
> BI tools.
> >>
> >> 2.  The CSV reader does not attempt to do any kind of data type
> discovery.
> >>
> >> Proposed Changes:
> >> The overall goal is to make it easier to query CSV data and also to
> make the behavior more consistent across format plugins.
> >> 1.  Change the default behavior and set the extractHeaders to true.
> >> 2.  Other formats, like the excel reader, read tables directly into
> columns.  If the header is not known, Drill assigns a name of field_n.  I
> would propose replacing the `columns` array with a model similar to the
> Excel reader.
> >> 3.  Implement schema discovery (data types) with an allTextMode option
> similar to the JSON reader.  When the allTextMode is disabled, the CSV
> reader would attempt to infer data types.
> >>
> >> Since there are some breaking changes here, I'd like to ask if people
> have any strong feelings on this topic or suggestions.
> >> Thanks!
> >> -- C
> >>
> >>
> >>
> >
> > 
>
>


Re: [DISCUSS] Refactoring Drill's CSV (Text) Reader

2021-11-18 Thread Charles Givre
Hi James, 
I do think it might be time to start considering creating a wiki of breaking 
changes for a Drill 2.0.  I'd also concur that having tons of config options 
that don't really add value is not a good use of config options as it leads to 
the creation of a lot of technical debt. I'll start a wiki page and put this on 
there.  

In the meantime, I may submit a PR that changes the default value of 
extractHeaders for CSV to true.  I don't really see that as a breaking change 
in that a user can simply change that flag and the previous behavior is 
restored.
Best,
-- C



> On Nov 18, 2021, at 2:34 AM, James Turton  wrote:
> 
> Definitely a +1 for this friendlier default behaviour and another +1 for the 
> prospect of increased consistency across format plugins.
> 
> My follow-up questions to the community.
> Since these are examples of user-breaking changes, and not just in niche 
> areas, are we approaching a point when we want to start working on Drill 2.x?
> Do we have other user-breaking or significant refactoring ideas that we've 
> been keeping stashed away in our heads, that would get their chance at life 
> from the fact that a 2.x Drill can defensibly exhibit some incompatibilities 
> with Drill 1.x?
> Should we make a "Drill v2 Parking Lot" page in the Dev Wiki where we record 
> such ideas?
> Would we be fine in terms of dev resources with supporting both bug fix 
> releases to a 1.x series and also pushing forward in a 2.x series?
> My own feeling is that to get the most value from a good proposal such as the 
> below, we don't want to conceal everything behind default-false options in 
> order to avoid breaking Drill 1.x users, we want to embrace the breakage 
> which (to me) points to Drill 2.x.
> 
> On 2021/11/18 02:30, Charles Givre wrote:
>> Hello Drill Community, 
>> I would like to put forward some thoughts I've had relating to the CSV 
>> reader in Drill.  I would like to propose a few changes which could actually 
>> be breaking changes, so I wanted to see if there are any strongly held 
>> opinions in the community.  Here goes:
>> 
>> The Problems:
>> 1.  The default behavior for Drill is to leave the extractColumnHeaders 
>> option as false.  When a user queries a CSV file this way, the results are 
>> returned in a list of columns called columns.  Thus if a user wants the 
>> first column, they would project columns[0].  I have never been a fan of 
>> this behavior.  Even though Drill ships with the csvh file extension which 
>> enables the header extraction, this is not a commonly used file format.  
>> Furthermore, the returned results (the column list) do not work well with 
>> BI tools. 
>> 
>> 2.  The CSV reader does not attempt to do any kind of data type discovery.
>> 
>> Proposed Changes:
>> The overall goal is to make it easier to query CSV data and also to make the 
>> behavior more consistent across format plugins.
>> 1.  Change the default behavior and set the extractHeaders to true. 
>> 2.  Other formats, like the excel reader, read tables directly into columns. 
>>  If the header is not known, Drill assigns a name of field_n.  I would 
>> propose replacing the `columns` array with a model similar to the Excel 
>> reader. 
>> 3.  Implement schema discovery (data types) with an allTextMode option 
>> similar to the JSON reader.  When the allTextMode is disabled, the CSV 
>> reader would attempt to infer data types. 
>> 
>> Since there are some breaking changes here, I'd like to ask if people have 
>> any strong feelings on this topic or suggestions. 
>> Thanks!
>> -- C
>> 
>> 
>> 
> 
> 
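
As a rough illustration of proposed changes 2 and 3 in the quoted message, the 
sketch below generates field_n names for a headerless file and infers a coarse 
column type unless allTextMode is set. Everything in it is an assumption made 
for illustration: the CsvSketch class, the 0-based numbering, the type labels, 
and the inference rules are hypothetical and are not taken from Drill's CSV, 
Excel, or JSON readers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names, type labels, and rules are assumptions, not Drill code.
class CsvSketch {
  // Generates field_0, field_1, ... for a headerless file (0-based is an assumption).
  static List<String> defaultNames(int columnCount) {
    List<String> names = new ArrayList<>();
    for (int i = 0; i < columnCount; i++) {
      names.add("field_" + i);
    }
    return names;
  }

  static boolean isLong(String s) {
    try {
      Long.parseLong(s);
      return true;
    } catch (NumberFormatException e) {
      return false;
    }
  }

  static boolean isDouble(String s) {
    if (s == null) {
      return false;  // Double.parseDouble(null) would throw NullPointerException
    }
    try {
      Double.parseDouble(s);
      return true;
    } catch (NumberFormatException e) {
      return false;
    }
  }

  // Picks the narrowest label that fits every sampled value of one column.
  static String inferType(List<String> values, boolean allTextMode) {
    if (allTextMode || values.isEmpty()) {
      return "VARCHAR";
    }
    boolean allLong = true;
    boolean allDouble = true;
    for (String v : values) {
      allLong = allLong && isLong(v);
      allDouble = allDouble && isDouble(v);
    }
    if (allLong) {
      return "BIGINT";
    }
    if (allDouble) {
      return "DOUBLE";
    }
    return "VARCHAR";
  }
}
```

A real reader would also have to decide how many rows to sample and what to do 
when a later batch contradicts the inferred type, which this sketch leaves out.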



[GitHub] [drill] dzamo commented on a change in pull request #2351: DRILL-1282: Add read and write support for Parquet v2

2021-11-18 Thread GitBox


dzamo commented on a change in pull request #2351:
URL: https://github.com/apache/drill/pull/2351#discussion_r752001140



##
File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java
##
@@ -78,6 +80,8 @@
 
 public class ParquetFormatPlugin implements FormatPlugin {
 
+  public static final String[] PARQUET_VERSIONS = {"PARQUET_1_0", "PARQUET_2_0"};

Review comment:
   @vdiravka I followed `ParquetProperties#WriterVersion`, that's where 
`PARQUET_1_0` and `PARQUET_2_0` come from.  I agree it's clunky, but on the 
plus side I did not have to introduce any new version format strings or case 
statements.  Which do you think is preferable?
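
For comparison, one alternative to a hand-maintained string array is to derive 
the names from an enum. The sketch below uses a local stand-in enum whose 
constant names mirror those quoted in the comment; it does not import or 
reproduce `ParquetProperties`, and the WriterVersionSketch class and its 
methods are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch; Version is a local stand-in, not the Parquet library's enum.
class WriterVersionSketch {
  enum Version { PARQUET_1_0, PARQUET_2_0 }

  // Derives the accepted names from the enum so the two cannot drift apart.
  static String[] versionNames() {
    return Arrays.stream(Version.values()).map(Enum::name).toArray(String[]::new);
  }

  // Validates a configured name without a separate list or case statement.
  static boolean isSupported(String name) {
    if (name == null) {
      return false;  // Enum.valueOf would throw NullPointerException on null
    }
    try {
      Version.valueOf(name);
      return true;
    } catch (IllegalArgumentException e) {
      return false;
    }
  }
}
```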








[GitHub] [drill] dzamo commented on a change in pull request #2351: DRILL-1282: Add read and write support for Parquet v2

2021-11-18 Thread GitBox


dzamo commented on a change in pull request #2351:
URL: https://github.com/apache/drill/pull/2351#discussion_r752001140



##
File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java
##
@@ -78,6 +80,8 @@
 
 public class ParquetFormatPlugin implements FormatPlugin {
 
+  public static final String[] PARQUET_VERSIONS = {"PARQUET_1_0", "PARQUET_2_0"};

Review comment:
   @vdiravka I followed `ParquetProperties#WriterVersion`, that's where 
`PARQUET_1_0` and `PARQUET_2_0` come from.  I agree it's clunky, but on the 
other hand I did not have to introduce any new version format strings or case 
statements.  Which do you think is preferable?



