[jira] [Commented] (DRILL-6312) Enable pushing of cast expressions to the scanner for better schema discovery.
[ https://issues.apache.org/jira/browse/DRILL-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429635#comment-16429635 ]

Paul Rogers commented on DRILL-6312:

While we are focussing on the type of pesky fields, data processing systems often allow other forms of column definitions. For example, it is often helpful to combine or split columns. Suppose I have a field like the following from a web log:

{noformat}
GET http://mySite.com/path/to/asset
{noformat}

I may want to split this into four fields: HTTP operation ("GET"), service type ("http"), host ("mySite.com") and asset ("/path/to/asset").

Or, I may have two fields that give the date and time:

{noformat}
2018-04-07, 10:13:43.345
{noformat}

And I may want to combine them into a single date-time type.

A handy technique is to define a computed column that does the work. If the computed column can call a UDF, then pretty much any transform is possible. Here is a very simple case for a line item:

{noformat}
price * quantity AS extendedPrice
{noformat}

> Enable pushing of cast expressions to the scanner for better schema discovery.
> --
>
> Key: DRILL-6312
> URL: https://issues.apache.org/jira/browse/DRILL-6312
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators, Query Planning & Optimization
> Affects Versions: 1.13.0
> Reporter: Hanumath Rao Maduri
> Priority: Major
>
> Drill is a schema-less engine which tries to infer the schema from disparate sources at read time. Currently the scanners infer the schema for each batch depending upon the data for that column in the corresponding batch. This solves many use cases but can error out when the data is too different between batches, like int and array[int] (there are other cases as well; this is just one example).
> There is also a mechanism to create a view by type casting the columns to the appropriate type. This solves issues in some cases but fails in many others. This is because the cast expression is not pushed down to the scanner but stays at the project or filter operators up the query plan.
> This JIRA is to fix this by propagating the type information embedded in the cast function to the scanners so that scanners can cast the incoming data appropriately.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
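The split, combine and computed-column transforms described in the comment above can be sketched outside Drill. This is only an illustration in Python, not Drill code or a Drill UDF; the function names and the dictionary keys are assumptions made up for the example.

```python
from datetime import datetime
from urllib.parse import urlsplit


def split_request(field):
    """Split a web-log field such as 'GET http://mySite.com/path/to/asset'."""
    operation, url = field.split(" ", 1)
    parts = urlsplit(url)
    return {
        "operation": operation,   # HTTP operation, e.g. "GET"
        "service": parts.scheme,  # service type, e.g. "http"
        "host": parts.netloc,     # host, e.g. "mySite.com"
        "asset": parts.path,      # asset, e.g. "/path/to/asset"
    }


def combine_date_time(date_field, time_field):
    """Combine separate date and time strings into one date-time value."""
    return datetime.strptime(f"{date_field} {time_field}", "%Y-%m-%d %H:%M:%S.%f")


def extended_price(row):
    """Computed column equivalent to: price * quantity AS extendedPrice."""
    return row["price"] * row["quantity"]
```

In a real engine each of these would be a UDF or expression attached to a column definition rather than a free function.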
[jira] [Comment Edited] (DRILL-6312) Enable pushing of cast expressions to the scanner for better schema discovery.
[ https://issues.apache.org/jira/browse/DRILL-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429634#comment-16429634 ]

Paul Rogers edited comment on DRILL-6312 at 4/8/18 5:50 AM:

While type inference (using Cast and other hints) is a very good idea, it cannot be the full answer. Here is why:

* The only way to express a type is to include the column in a SELECT clause. If a column is not projected, no hint can be provided, and we can end up with possible read-time problems as discussed in the original e-mail thread ("Death of Schema on Read").
* The only way to express the type of a column is to explicitly include it in the SELECT clause. Using a wildcard ("*") query will bypass the type rules unless there is a view underneath that applies the rules.
* There is no way to type just the pesky, troublesome columns, leaving the others to be detected automatically. If we must use a view, and we have to, say, use a cast for column x, then we have to include all other columns in the SELECT clause or we end up projecting only x. We can't use a wildcard for the other columns.
* Putting the type information in the query puts the burden on the query writer (and, ultimately, something like Tableau). But the schema is a property of the data, not the query, so this is not a good model of reality.

For this reason, the cast idea, though elegant and a very good enhancement, cannot be the full answer. It will reduce the number of cases where type ambiguity occurs, but it is not a general-purpose solution.

A general-purpose solution would be to provide some means to explicitly apply type information. For example, in a view or query, provide explicit hint syntax:

{noformat}
SELECT * FROM myFunkyTable WITH HINTS (f: INT, m.x: BIGINT NOT NULL, a[]: VARCHAR NULL)
{noformat}

The hints say that, if fields "f", "m.x" and "a" appear, they are of the type specified. If the fields don't appear, just ignore the hints.

Most systems put this information in metadata, but Drill is very hostile to metadata, so it must be in the query (or, equivalently, a view).

Lore has it that the early Drill designers proposed a ".drill" file to hold schema information. In this case, schema information would be an add-on file, much as views are. As proposed in the e-mail thread, perhaps both forms of information can be combined in a single file.
[jira] [Commented] (DRILL-6312) Enable pushing of cast expressions to the scanner for better schema discovery.
[ https://issues.apache.org/jira/browse/DRILL-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429634#comment-16429634 ]

Paul Rogers commented on DRILL-6312:

While type inference (using Cast and other hints) is a very good idea, it cannot be the full answer. Here is why:

* The only way to express a type is to include the column in a SELECT clause. If a column is not projected, no hint can be provided, and we can end up with possible read-time problems as discussed in the original e-mail thread ("Death of Schema on Read").
* The only way to express the type of a column is to explicitly include it in the SELECT clause. Using a wildcard ("*") query will bypass the type rules unless there is a view underneath that applies the rules.
* There is no way to type just the pesky, troublesome columns, leaving the others to be detected automatically. If we must use a view, and we have to, say, use a cast for column x, then we have to include all other columns in the SELECT clause or we end up projecting only x.

For this reason, the cast idea, though elegant and a very good enhancement, cannot be the full answer. It will reduce the number of cases where type ambiguity occurs, but it is not a general-purpose solution.

A general-purpose solution would be to provide some means to explicitly apply type information. For example, in a view or query, provide explicit hint syntax:

{noformat}
SELECT * FROM myFunkyTable WITH HINTS (f: INT, m.x: BIGINT NOT NULL, a[]: VARCHAR NULL)
{noformat}

The hints say that, if fields "f", "m.x" and "a" appear, they are of the type specified. If the fields don't appear, just ignore the hints.

Most systems put this information in metadata, but Drill is very hostile to metadata, so it must be in the query (or, equivalently, a view).
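The semantics of the proposed hint list ("use the hinted type if the field appears, otherwise ignore the hint") can be sketched as follows. The WITH HINTS syntax is only a proposal in the comment above and does not exist in Drill; the parser, the function names and the dictionary layout below are assumptions for illustration.

```python
import re

# One hint looks like "m.x: BIGINT NOT NULL" or "a[]: VARCHAR NULL".
HINT_RE = re.compile(
    r"^\s*(?P<name>[\w.]+)(?P<array>\[\])?\s*:\s*(?P<type>\w+)"
    r"(?:\s+(?P<nullability>NOT NULL|NULL))?\s*$"
)


def parse_hints(text):
    """Parse the body of a hypothetical WITH HINTS (...) clause into a dict."""
    hints = {}
    for item in text.split(","):
        match = HINT_RE.match(item)
        if match is None:
            raise ValueError(f"bad hint: {item!r}")
        hints[match["name"]] = {
            "type": match["type"],
            "array": match["array"] is not None,
            # None means the hint did not state nullability.
            "nullable": {None: None, "NULL": True, "NOT NULL": False}[match["nullability"]],
        }
    return hints


def apply_hints(discovered, hints):
    """Override inferred types only for hinted columns that actually appear."""
    return {
        name: hints[name]["type"] if name in hints else inferred
        for name, inferred in discovered.items()
    }
```

For example, with `hints = parse_hints("f: INT, m.x: BIGINT NOT NULL, a[]: VARCHAR NULL")`, a file whose discovered columns are only `f` and `b` keeps the inferred type of `b`, takes `INT` for `f`, and silently ignores the hints for `m.x` and `a`.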
[jira] [Commented] (DRILL-6312) Enable pushing of cast expressions to the scanner for better schema discovery.
[ https://issues.apache.org/jira/browse/DRILL-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429632#comment-16429632 ]

Paul Rogers commented on DRILL-6312:

The idea of using the cast statement came from [~tdunning], and is a very good one. The idea can be generalized using ideas from [this paper|https://blog.acolyer.org/2015/08/03/towards-practical-gradual-typing/].

Cast is just a special case of a more general idea: top-down, then bottom-up typing. Drill already implements bottom-up typing: Drill starts with columns, then infers the overloaded versions of functions based on arguments, and eventually arrives at the type of each column in the result set.

For example, if we have an expression {{a + b}}, the reader will figure out the types of {{a}} and {{b}}. Perhaps {{a}} is an {{INT}} and {{b}} is a {{Float8}}. Through type inference, Drill will find a version of the {{add}} function that takes two {{Float8}} arguments. Next, Drill will infer that it can convert an {{INT}} to a {{Float8}}.

The idea here is to run the system in reverse, from the result set back out to the scan columns. For each expression (function) in the SELECT clause, infer the types of the input. If we have the expression above, {{a + b}}, then we can scan all the available versions of the {{add}} function to determine the set of possible argument types. Since {{add}} has many versions, one for each numeric type, we'll need a way to say that the arguments must be numeric, though we don't care about the specific type. So, label the inputs as the new abstract type {{Numeric}}.

We've now labeled the arguments {{a}} and {{b}} as {{Numeric}}. We pass that information into the Scan operator, say the JSON reader. Now, when JSON sees the first value of {{a}} as null, and finds that {{b}} is missing, JSON has context to choose the correct type; say {{Float8}} or {{BigInt}} (the two numeric types that JSON uses).

As we can see, Cast is just a special case: one in which the type is narrowed down to one very specific type. That is, {{CAST(a AS INT)}} says not just that {{a}} is numeric, but that it is {{Int}}.

While this is all very useful, it still leads to ambiguity. In the case above, if all we know is that {{a}} is numeric, the first reader, the one that sees a {{null}} value, can choose {{BigInt}}. But, if another reader (or a later record) actually has the value as {{Float8}}, we've still got problems.

The result is a "bounce" algorithm: do a top-down tree traversal of the parse tree to infer possible expression types. Then, at runtime, continue to use the bottom-up traversal to infer actual types.
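A toy model of the "bounce" idea may make the two passes concrete. This is a sketch, not Drill's function resolver: the `OVERLOADS` registry, the three type names and the preference order `BIGINT` before `FLOAT8` are assumptions chosen to mirror the example in the comment above.

```python
# Hypothetical registry: function name -> list of overloads (argument type tuples).
OVERLOADS = {
    "add": [("INT", "INT"), ("BIGINT", "BIGINT"), ("FLOAT8", "FLOAT8")],
}


def top_down(func, columns):
    """Plan time: for each argument column, collect the types some overload accepts."""
    allowed = {}
    for position, column in enumerate(columns):
        types = {overload[position] for overload in OVERLOADS[func]}
        # A column used twice must satisfy both argument positions.
        allowed[column] = allowed.get(column, types) & types
    return allowed


def cast_constraint(target):
    """CAST(col AS target) is the special case that narrows to exactly one type."""
    return {target}


def bottom_up(values, allowed):
    """Read time: pick a concrete type from the data, guided by the allowed set."""
    if len(allowed) == 1:
        return next(iter(allowed))
    has_float = any(isinstance(v, float) for v in values)
    # With no evidence (all nulls) the reader has to guess; here it prefers BIGINT.
    preferred = ("FLOAT8",) if has_float else ("BIGINT", "FLOAT8")
    for candidate in preferred:
        if candidate in allowed:
            return candidate
    raise TypeError("no allowed type fits the data")
```

With `allowed = top_down("add", ["a", "b"])`, a reader that sees only nulls for `a` picks `BIGINT` while a reader that sees `1.5` picks `FLOAT8`, which is the residual ambiguity the comment describes; `cast_constraint("INT")` removes it by leaving a single choice.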
[jira] [Created] (DRILL-6313) ScanBatch.Mutator does not report new schema for empty first batch
Paul Rogers created DRILL-6313:

Summary: ScanBatch.Mutator does not report new schema for empty first batch
Key: DRILL-6313
URL: https://issues.apache.org/jira/browse/DRILL-6313
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.13.0
Reporter: Paul Rogers
Assignee: Paul Rogers
Fix For: 1.14.0

Create a format plugin that honors an empty select list by returning no columns. This case occurs in a {{COUNT(\*)}} query. When run, the query fails with:

{noformat}
SYSTEM ERROR: IllegalStateException: next() returned OK without first returning OK_NEW_SCHEMA [#2, ScanBatch]
{noformat}

The reason is that the {{Mutator}} class uses a flag, {{schemaChanged}}, which defaults to {{false}}. It is set to {{true}} only when a field is added. But, since the query requested no fields, no field is added, so the first batch never reports a new schema.

The fix is simple: default {{schemaChanged}} to {{true}}.
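The failure mode can be reproduced with a small stand-alone model. This is not Drill's Java {{ScanBatch.Mutator}}; the class and method names below are hypothetical and only mimic the flag behavior described in the issue.

```python
class Mutator:
    """Toy model of a mutator that tracks whether the schema has changed."""

    def __init__(self, schema_changed_default):
        self.schema_changed = schema_changed_default
        self.fields = []

    def add_field(self, name):
        # Adding a field is the only event that sets the flag.
        self.fields.append(name)
        self.schema_changed = True

    def is_new_schema(self):
        # Report the flag once, then reset it for the next batch.
        changed = self.schema_changed
        self.schema_changed = False
        return changed


def first_batch_outcome(mutator, projected):
    """Return the outcome the scan would report for its first batch."""
    for name in projected:
        mutator.add_field(name)
    return "OK_NEW_SCHEMA" if mutator.is_new_schema() else "OK"
```

With an empty projection list, a default of `False` yields `OK` on the first batch (the condition behind the `IllegalStateException` above), while a default of `True` yields `OK_NEW_SCHEMA`.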
[jira] [Commented] (DRILL-6312) Enable pushing of cast expressions to the scanner for better schema discovery.
[ https://issues.apache.org/jira/browse/DRILL-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429403#comment-16429403 ]

Hanumath Rao Maduri commented on DRILL-6312:

Please find below the mail thread which discusses various issues and approaches to deal with schema discovery.

{noformat}
Hi Hanu,

The problem with views as is, even with casts, is that the casting comes too late to resolve the issues I highlighted in earlier messages. Ted's cast push-down idea causes the conversion to happen at read time so that we can, say, cast a string to an int, or cast a null to the proper type.

Today, if we use a cast, such as SELECT cast(a AS INT) FROM myTable then we get a DAG that has three parts (to keep things simple):

* Scan the data, using types inferred from the data itself
* In a Filter operator, convert the type of data to INT
* In Screen, return the result to the user

If the type is ambiguous in the file, then the first step above fails; data never gets far enough for the Filter to kick in and apply the cast. Also, if a file contains a run of nulls, the scanner will choose Nullable Int, then fail when it finds, say, a string.

The key point is that the cast push-down means that the query will not fail due to dicey files: the cast resolves the ambiguity. If we push the cast down, then it is the SCAN operator that resolves the conflict and does the cast, avoiding the failures we've been discussing.

I like the idea you seem to be proposing: cascading views. Have a table view that cleans up each table. Then, these can be combined in higher-order views for specialized purposes.

The beauty of the cast push-down idea is that no metadata is needed other than the query. If the user wants metadata, they use existing views (that contain the casts and cause the cast push-down).

This seems like such a simple, elegant solution that we could try it out quickly (if we get past the planner issues Aman mentioned). In fact, the new scan operator code (done as part of the batch sizing work) already has a prototype mechanism for type hints. If a type hint is provided to the scanner, it uses it; otherwise it infers the type. We'd just hook up the cast push-down data to that prototype and we could try out the result quickly. (The new scan operator is still in my private branch, in case anyone goes looking for it...)

Some of your discussion talks about automatically inferring the schema. I really don't think we need to do that. The hint (cast push-down) is sufficient to resolve ambiguities in the existing scan-time schema inference.

The syntax trick would be to find a way to provide hints just for those columns that are issues. If I have a table with columns a, b, ... z, but only b is a problem, I don't want to have to do:

SELECT a, CAST(b AS INT), c, ... z FROM myTable

Would be great if we could just do:

SELECT *, CAST(b AS INT) FROM myTable

I realize the above has issues; the key idea is: provide casts only for the problem fields without spelling out all fields.

If we really want to get fancy, we can do UDF push down for the complex cases you mentioned. Maybe:

SELECT *, CAST(b AS INT), parseCode(c) ...

We are diving into design here; maybe you can file a JIRA and we can shift detailed design discussion to that JIRA. Salim already has one related to schema change errors, which was why the "Death" article caught my eye.

Thanks,
- Paul

On Friday, April 6, 2018, 4:59:40 PM PDT, Hanumath Rao Maduri wrote:

Hello,

Thanks to Ted & Paul for clarifying my questions. Sorry for not being clear in my previous post. When I said create view, I was thinking of simple views where we currently use cast expressions to cast columns to types. In this case the planner can use this information to force the scans to use this as the schema.

If the query fails, then it fails at the scan and not after the scanner has inferred the schema.

I know that views can get complicated with joins and expressions. For schema hinting through views I assume they should be created on single tables with the corresponding columns one wants to project from the table.

Regarding the same question, today we had a discussion with Aman. Here a view can be considered as a "view" of the table with the schema in place.

We can change some syntax to suit it for specifying schema. Something like this:

create schema[optional] view(/virtual table ) v1 as (a: int, b : int) select a, b from t1 with some other rules as to conversion of scalar to complex types.

Then queries on this view (below) should enable the scanner to use this type information to convert the data into the appropriate types.

select * from v1

For the possibility of schema information not being known by the user, maybe use something like this:

create schema[optional] view(/virtual table) v1 as select a, b from t1 infer
[jira] [Created] (DRILL-6312) Enable pushing of cast expressions to the scanner for better schema discovery.
Hanumath Rao Maduri created DRILL-6312:

Summary: Enable pushing of cast expressions to the scanner for better schema discovery.
Key: DRILL-6312
URL: https://issues.apache.org/jira/browse/DRILL-6312
Project: Apache Drill
Issue Type: Bug
Components: Execution - Relational Operators, Query Planning & Optimization
Affects Versions: 1.13.0
Reporter: Hanumath Rao Maduri

Drill is a schema-less engine which tries to infer the schema from disparate sources at read time. Currently the scanners infer the schema for each batch depending upon the data for that column in the corresponding batch. This solves many use cases but can error out when the data is too different between batches, like int and array[int] (there are other cases as well; this is just one example).

There is also a mechanism to create a view by type casting the columns to the appropriate type. This solves issues in some cases but fails in many others. This is because the cast expression is not pushed down to the scanner but stays at the project or filter operators up the query plan.

This JIRA is to fix this by propagating the type information embedded in the cast function to the scanners so that scanners can cast the incoming data appropriately.
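The difference between per-batch inference and a pushed-down cast can be shown with a toy scanner. This is a sketch under stated assumptions, not Drill's scan operator: the `scan` function, its `cast_to` parameter and the two-type converter table are invented for illustration.

```python
def infer_type(values):
    """Infer a column type from one batch, as a schema-less scanner might."""
    for value in values:
        if isinstance(value, list):
            return "ARRAY"
        if isinstance(value, str):
            return "VARCHAR"
        if isinstance(value, int):
            return "INT"
    return "INT"  # all nulls: guess nullable INT


def scan(batches, column, cast_to=None):
    """Read one column batch by batch, optionally applying a pushed-down cast."""
    converters = {"INT": int, "VARCHAR": str}
    column_type = None
    out = []
    for batch in batches:
        values = [row.get(column) for row in batch]
        if cast_to is not None:
            # The cast is applied at read time, so no inference is needed.
            convert = converters[cast_to]
            out.append([None if v is None else convert(v) for v in values])
            continue
        batch_type = infer_type(values)
        if column_type is None:
            column_type = batch_type
        elif batch_type != column_type:
            raise TypeError(f"schema change on {column}: {column_type} -> {batch_type}")
        out.append(values)
    return out
```

Given a first batch of nulls followed by a batch of strings such as `"10"`, `scan(batches, "a")` raises on the second batch, whereas `scan(batches, "a", cast_to="INT")` returns integers throughout.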
[jira] [Commented] (DRILL-6289) Cluster view should show more relevant information
[ https://issues.apache.org/jira/browse/DRILL-6289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429347#comment-16429347 ]

ASF GitHub Bot commented on DRILL-6289:

Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1203

Before the review I guess we need to clarify one thing. After DRILL-6044, the Shutdown button was shown only for the current drillbit. As far as I understand, you cannot shut down drillbits other than the current one from the Web UI. @dvjyothsna please confirm.

> Cluster view should show more relevant information
> --
>
> Key: DRILL-6289
> URL: https://issues.apache.org/jira/browse/DRILL-6289
> Project: Apache Drill
> Issue Type: Improvement
> Components: Web Server
> Affects Versions: 1.13.0
> Reporter: Kunal Khatua
> Assignee: Kunal Khatua
> Priority: Major
> Fix For: 1.14.0
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> When fixing DRILL-6224, I noticed that the same information can be very useful to have in the cluster view shown on a Drillbit's homepage.
> The proposal is to show the following:
> # Heap Memory in use
> # Direct Memory (actively) in use - Since we're not able to get the total memory held by Netty at the moment, but only what is currently allocated to running queries
> # Process CPU
> # Average (System) Load Factor
> Information such as the port numbers doesn't help much when checking general cluster health, so it might be worth removing this information if more real estate is needed.
[jira] [Resolved] (DRILL-6296) Add operator metrics for batch sizing for merge join
[ https://issues.apache.org/jira/browse/DRILL-6296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva resolved DRILL-6296.

Resolution: Fixed

Merged with commit id da241134fb88464139437b05b1feaafbb3014bb0.

> Add operator metrics for batch sizing for merge join
>
> Key: DRILL-6296
> URL: https://issues.apache.org/jira/browse/DRILL-6296
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Relational Operators
> Affects Versions: 1.13.0
> Reporter: Padma Penumarthy
> Assignee: Padma Penumarthy
> Priority: Major
> Fix For: 1.14.0
>
> Add operator metrics for batch sizing stats for merge join.
[jira] [Updated] (DRILL-6287) apache-release profile should be disabled by default
[ https://issues.apache.org/jira/browse/DRILL-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva updated DRILL-6287:

Fix Version/s: 1.14.0

> apache-release profile should be disabled by default
>
> Key: DRILL-6287
> URL: https://issues.apache.org/jira/browse/DRILL-6287
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Vlad Rozov
> Assignee: Vlad Rozov
> Priority: Minor
> Labels: ready-to-commit
> Fix For: 1.14.0
[jira] [Commented] (DRILL-6230) Extend row set readers to handle hyper vectors
[ https://issues.apache.org/jira/browse/DRILL-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429337#comment-16429337 ]

ASF GitHub Bot commented on DRILL-6230:

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/1161

> Extend row set readers to handle hyper vectors
> --
>
> Key: DRILL-6230
> URL: https://issues.apache.org/jira/browse/DRILL-6230
> Project: Apache Drill
> Issue Type: Improvement
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Labels: ready-to-commit
> Fix For: 1.14.0
>
> The current row set readers have incomplete support for hyper-vectors. To add full support, we need an interface that supports either single batches or hyper batches. Accessing vectors in hyper batches differs depending on whether the vector is at the top level or is nested. See [this post|https://github.com/paul-rogers/drill/wiki/BH-Column-Readers] for details. Also includes a simpler reader template: replaces the original three classes with one, in parallel with the writers.
[jira] [Commented] (DRILL-6303) Provide a button to copy the Drillbit's JStack shown in /threads
[ https://issues.apache.org/jira/browse/DRILL-6303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429332#comment-16429332 ]

ASF GitHub Bot commented on DRILL-6303:

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/1199

> Provide a button to copy the Drillbit's JStack shown in /threads
>
> Key: DRILL-6303
> URL: https://issues.apache.org/jira/browse/DRILL-6303
> Project: Apache Drill
> Issue Type: Improvement
> Components: Web Server
> Reporter: Kunal Khatua
> Assignee: Kunal Khatua
> Priority: Trivial
> Labels: ready-to-commit
> Fix For: 1.14.0
>
> Attachments: mouseOnClick.png, mouseOver.png
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Currently, when using the WebUI inspecting the JStack for the state of threads within a Drillbit (via +{{http://:8047/threads}}+ ), the contents of the `div` element refreshes automatically and resets any selection, making it harder to freeze the contents for inspection.
> Pausing the refresh is not recommended, so the alternative is to copy the contents to the user's clipboard for separately viewing in a text editor.
[jira] [Commented] (DRILL-6016) Error reading INT96 created by Apache Spark
[ https://issues.apache.org/jira/browse/DRILL-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429335#comment-16429335 ] ASF GitHub Bot commented on DRILL-6016: --- Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/1166 > Error reading INT96 created by Apache Spark > --- > > Key: DRILL-6016 > URL: https://issues.apache.org/jira/browse/DRILL-6016 > Project: Apache Drill > Issue Type: Bug > Affects Versions: 1.13.0 > Reporter: Rahul Raj > Assignee: Rahul Raj > Priority: Major > Labels: ready-to-commit > Fix For: 1.14.0 > > > Hi, > I am getting the error - SYSTEM ERROR : ClassCastException: org.apache.drill.exec.vector.TimeStampVector cannot be cast to org.apache.drill.exec.vector.VariableWidthVector while trying to read a Spark INT96 datetime field on Drill 1.11, in spite of setting the property store.parquet.reader.int96_as_timestamp to true. > I believe this was fixed in Drill 1.10 (https://issues.apache.org/jira/browse/DRILL-4373). What could be wrong? > I have attached the dataset at https://github.com/rajrahul/files/blob/master/result.tar.gz
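For readers unfamiliar with the INT96 datetime field mentioned in the report above, the following is a minimal sketch of how such a value is commonly laid out by writers such as Spark: 8 little-endian bytes of nanoseconds within the day, followed by 4 little-endian bytes holding the Julian day number. It is an illustration only, not Drill's reader code; the function name `int96_to_datetime` is hypothetical, and the signed interpretation of both fields is an assumption.

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number that corresponds to 1970-01-01 (the Unix epoch).
JULIAN_DAY_OF_UNIX_EPOCH = 2440588


def int96_to_datetime(raw: bytes) -> datetime:
    """Decode a 12-byte INT96 timestamp into a UTC datetime (hypothetical helper)."""
    if len(raw) != 12:
        raise ValueError("an INT96 value must be exactly 12 bytes")
    # "<qi": little-endian, 8-byte nanoseconds of day, then 4-byte Julian day.
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # datetime only has microsecond resolution, so sub-microsecond digits are dropped.
    return epoch + timedelta(
        days=julian_day - JULIAN_DAY_OF_UNIX_EPOCH,
        microseconds=nanos_of_day // 1000,
    )
```

A value whose Julian day is 2440588 and whose nanosecond field is zero therefore decodes to midnight UTC on 1970-01-01. A reader that treats the same 12 bytes as a plain variable-width binary value would instead hand back the raw bytes.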
[jira] [Commented] (DRILL-6279) Web UI should indicate when operators have spilled in-memory data to disk
[ https://issues.apache.org/jira/browse/DRILL-6279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429331#comment-16429331 ] ASF GitHub Bot commented on DRILL-6279: --- Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/1197 > Web UI should indicate when operators have spilled in-memory data to disk > - > > Key: DRILL-6279 > URL: https://issues.apache.org/jira/browse/DRILL-6279 > Project: Apache Drill > Issue Type: Improvement > Affects Versions: 1.13.0 > Reporter: Kunal Khatua > Assignee: Kunal Khatua > Priority: Major > Labels: ready-to-commit > Fix For: 1.14.0 > > Attachments: spillToDiskSnapshot.png > > > Currently, there is no indication of when an operator is spilling to disk, which would help explain a slow-running query. > Suggestions are welcome, but the current proposal is to simply update the Operators Overview section to show average and max spill cycles, preferably with a color code (or formatting).
[jira] [Commented] (DRILL-6287) apache-release profile should be disabled by default
[ https://issues.apache.org/jira/browse/DRILL-6287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429333#comment-16429333 ] ASF GitHub Bot commented on DRILL-6287: --- Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/1182 > apache-release profile should be disabled by default > > > Key: DRILL-6287 > URL: https://issues.apache.org/jira/browse/DRILL-6287 > Project: Apache Drill > Issue Type: Bug >Reporter: Vlad Rozov >Assignee: Vlad Rozov >Priority: Minor > Labels: ready-to-commit >
[jira] [Commented] (DRILL-6271) Update copyright range in NOTICE
[ https://issues.apache.org/jira/browse/DRILL-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429334#comment-16429334 ] ASF GitHub Bot commented on DRILL-6271: --- Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/1188 > Update copyright range in NOTICE > > > Key: DRILL-6271 > URL: https://issues.apache.org/jira/browse/DRILL-6271 > Project: Apache Drill > Issue Type: Task >Reporter: Vlad Rozov >Assignee: Venkata Jyothsna Donapati >Priority: Major > Labels: ready-to-commit > Fix For: 1.14.0 > >
[jira] [Commented] (DRILL-6284) Add operator metrics for batch sizing for flatten
[ https://issues.apache.org/jira/browse/DRILL-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429336#comment-16429336 ] ASF GitHub Bot commented on DRILL-6284: --- Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/1181 > Add operator metrics for batch sizing for flatten > - > > Key: DRILL-6284 > URL: https://issues.apache.org/jira/browse/DRILL-6284 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Flow > Affects Versions: 1.13.0 > Reporter: Padma Penumarthy > Assignee: Padma Penumarthy > Priority: Critical > Labels: ready-to-commit > Fix For: 1.14.0 > > > Add the following operator metrics for flatten: INPUT_BATCH_COUNT, AVG_INPUT_BATCH_BYTES, AVG_INPUT_ROW_BYTES, INPUT_RECORD_COUNT, OUTPUT_BATCH_COUNT, AVG_OUTPUT_BATCH_BYTES, AVG_OUTPUT_ROW_BYTES, OUTPUT_RECORD_COUNT.
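The issue above only names the eight metrics, so the following is a rough sketch of one way such counters could be accumulated per batch, not a description of Drill's implementation. The class `BatchSizeStats`, the function `flatten_metrics`, and the definitions of the averages (total bytes divided by batch count or record count, using integer division) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BatchSizeStats:
    """Running totals for one side (input or output) of an operator (hypothetical)."""
    batch_count: int = 0
    record_count: int = 0
    total_bytes: int = 0

    def update(self, records: int, batch_bytes: int) -> None:
        # Called once per batch seen on this side of the operator.
        self.batch_count += 1
        self.record_count += records
        self.total_bytes += batch_bytes

    @property
    def avg_batch_bytes(self) -> int:
        # Assumed definition: total bytes over number of batches.
        return self.total_bytes // self.batch_count if self.batch_count else 0

    @property
    def avg_row_bytes(self) -> int:
        # Assumed definition: total bytes over number of records.
        return self.total_bytes // self.record_count if self.record_count else 0


def flatten_metrics(inp: BatchSizeStats, out: BatchSizeStats) -> dict:
    """Map the running totals onto the metric names listed in the issue."""
    return {
        "INPUT_BATCH_COUNT": inp.batch_count,
        "AVG_INPUT_BATCH_BYTES": inp.avg_batch_bytes,
        "AVG_INPUT_ROW_BYTES": inp.avg_row_bytes,
        "INPUT_RECORD_COUNT": inp.record_count,
        "OUTPUT_BATCH_COUNT": out.batch_count,
        "AVG_OUTPUT_BATCH_BYTES": out.avg_batch_bytes,
        "AVG_OUTPUT_ROW_BYTES": out.avg_row_bytes,
        "OUTPUT_RECORD_COUNT": out.record_count,
    }
```

Keeping only totals and deriving the averages on demand avoids storing per-batch history, which is one plausible reason to expose averages rather than raw sizes.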