[jira] [Updated] (DRILL-3091) Cancelled query continues to list on Drill UI with CANCELLATION_REQUESTED state
[ https://issues.apache.org/jira/browse/DRILL-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Ollala updated DRILL-3091:
---------------------------------
    Reviewer: Khurram Faraaz

> Cancelled query continues to list on Drill UI with CANCELLATION_REQUESTED state
> -------------------------------------------------------------------------------
>
>                 Key: DRILL-3091
>                 URL: https://issues.apache.org/jira/browse/DRILL-3091
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Client - HTTP
>    Affects Versions: 1.0.0
>            Reporter: Abhishek Girish
>             Fix For: Future
>
>         Attachments: drillbit.log
>
> A long-running query (TPC-DS SF 100, query 2) continues to be listed on the Drill UI query profile page, among the list of running queries. It has been more than 30 minutes as of this report.
> top -p showed no activity after the cancellation, and jstack on all nodes did not contain the query ID.
> I can share more details for repro.
> Git.Commit.ID: 583ca4a (May 14 build)

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Updated] (DRILL-5293) Poor performance of Hash Table due to same hash value as distribution below
[ https://issues.apache.org/jira/browse/DRILL-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Ollala updated DRILL-5293:
---------------------------------
    Reviewer: Kunal Khatua  (was: Chunhui Shi)

> Poor performance of Hash Table due to same hash value as distribution below
> ---------------------------------------------------------------------------
>
>                 Key: DRILL-5293
>                 URL: https://issues.apache.org/jira/browse/DRILL-5293
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Codegen
>    Affects Versions: 1.8.0
>            Reporter: Boaz Ben-Zvi
>            Assignee: Boaz Ben-Zvi
>              Labels: ready-to-commit
>             Fix For: 1.10.0
>
> The computation of the hash value is basically the same whether for the Hash Table (used by Hash Agg and Hash Join) or for the distribution of rows at the exchange. As a result, a specific Hash Table (in a parallel minor fragment) gets only rows "filtered out" by the partition below ("upstream"), so the pattern of this filtering leads to non-uniform usage of the hash buckets in the table.
> Here is a simplified example: an exchange partitions into TWO minor fragments, each running a Hash Agg. The partition sends rows with EVEN hash values to the first and rows with ODD hash values to the second. The first then recomputes the _same_ hash value for its Hash Table, so only the even buckets get used! (With a partition into EIGHT, possibly only one eighth of the buckets would be used.) This leads to longer hash chains and thus _poor performance_.
> A possible solution: add a distribution function distFunc (only for partitioning) that takes the hash value and "scrambles" it so that the entropy in all the bits affects the low bits of the output. This function should be applied (in HashPrelUtil) over the generated code that produces the hash value, like:
>     distFunc( hash32(field1, hash32(field2, hash32(field3, 0))) );
> Tested with a huge hash aggregate (64M rows) and a parallelism of 8 (planner.width.max_per_node = 8): minor fragments 0 and 4 used only 1/8 of their buckets, while the others used 1/4 of their buckets. Maybe the reason for this variance is that distribution uses "hash32AsDouble" while hash agg uses "hash32".

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
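The "scramble" step proposed above can be any finalization mix that lets entropy from all bits of the hash reach the low bits. A minimal sketch, assuming Murmur3's 32-bit finalizer as the mixing function (the name distFunc and the choice of mixer are illustrative assumptions, not Drill's committed implementation):

```java
public class DistFuncSketch {
    // Illustrative mixer: Murmur3's 32-bit finalization step (fmix32).
    // After mixing, rows that shared a low-bit pattern at the exchange
    // (e.g. all-even hash values) no longer map onto the same subset of
    // hash-table buckets downstream.
    static int distFunc(int hash) {
        int h = hash;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }
}
```

Because fmix32 is a bijection on 32-bit integers, distinct hash values stay distinct; only the bucket pattern changes, which is exactly what the hash table needs.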
[jira] [Updated] (DRILL-5290) Provide an option to build operator table once for built-in static functions and reuse it across queries.
[ https://issues.apache.org/jira/browse/DRILL-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Ollala updated DRILL-5290:
---------------------------------
    Reviewer: Kunal Khatua  (was: Sudheesh Katkam)

> Provide an option to build operator table once for built-in static functions and reuse it across queries.
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-5290
>                 URL: https://issues.apache.org/jira/browse/DRILL-5290
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.9.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>              Labels: doc-impacting, ready-to-commit
>             Fix For: 1.10.0
>
> Currently, DrillOperatorTable, which contains standard SQL operators and functions plus Drill User Defined Functions (UDFs, both built-in and dynamic), gets built for each query as part of creating the QueryContext. This is an expensive operation (~30 ms to build) and allocates ~2 MB on heap per query. For high-throughput, low-latency operational queries, we quickly run out of heap memory, causing JVM hangs. Build the operator table once during startup for static built-in functions and save it in DrillbitContext, so it can be reused across queries.
> Provide a system/session option to not use dynamic UDFs, so the operator table saved in DrillbitContext can be used, avoiding the rebuild each time.
> *Please note, the changes add a new option exec.udf.use_dynamic which needs to be documented.*

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
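The build-once idea can be sketched as a shared table constructed at startup and reused per query unless dynamic UDFs are requested. The class and map entries below are hypothetical stand-ins for DrillOperatorTable and DrillbitContext, not Drill's actual code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class OperatorTableCache {
    // Built once at startup and kept in (the equivalent of) DrillbitContext.
    static final Map<String, String> STARTUP_TABLE = buildStaticTable();

    static Map<String, String> buildStaticTable() {
        Map<String, String> t = new ConcurrentHashMap<>();
        t.put("concat", "built-in");   // illustrative entries only
        t.put("substr", "built-in");
        return t;
    }

    // Per query: when dynamic UDFs are disabled (the proposed
    // exec.udf.use_dynamic = false), return the shared table and skip
    // the ~30 ms build and ~2 MB heap allocation entirely.
    static Map<String, String> tableForQuery(boolean useDynamicUdfs) {
        if (!useDynamicUdfs) {
            return STARTUP_TABLE;                // no per-query build
        }
        Map<String, String> t = new ConcurrentHashMap<>(STARTUP_TABLE);
        t.put("my_udf", "dynamic");              // hypothetical dynamic UDF
        return t;
    }
}
```

The key property is that the non-dynamic path returns the same shared instance every time, so per-query heap cost drops to zero for that path.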
[jira] [Updated] (DRILL-5287) Provide option to skip updates of ephemeral state changes in Zookeeper
[ https://issues.apache.org/jira/browse/DRILL-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Ollala updated DRILL-5287:
---------------------------------
    Reviewer: Kunal Khatua  (was: Sudheesh Katkam)

> Provide option to skip updates of ephemeral state changes in Zookeeper
> ----------------------------------------------------------------------
>
>                 Key: DRILL-5287
>                 URL: https://issues.apache.org/jira/browse/DRILL-5287
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.9.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>              Labels: doc-impacting, ready-to-commit
>             Fix For: 1.10.0
>
> We put transient profiles in ZooKeeper and update state as the query progresses and changes states. It has been observed that this adds a latency of ~45 ms for each update in the query execution path. This gets even worse when a high number of concurrent queries is in progress. At concurrency=100, the average query response time even for short queries is 8 sec, vs. 0.2 sec with these updates disabled. For short-lived queries in a high-throughput scenario, there is no value in updating state changes in ZooKeeper. We need an option to disable these updates for short-running operational queries.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
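The proposed option amounts to gating every ephemeral-state write behind a flag checked on the execution path. A minimal sketch, with hypothetical class and field names (the counter stands in for the actual ZooKeeper write):

```java
public class ProfileStateNotifier {
    private final boolean zkUpdatesEnabled;
    int zkWrites = 0;   // counts simulated ZooKeeper writes, for the sketch

    ProfileStateNotifier(boolean zkUpdatesEnabled) {
        this.zkUpdatesEnabled = zkUpdatesEnabled;
    }

    // Called on every query state transition. With updates disabled
    // (short operational queries), the ~45 ms ZooKeeper round trip is
    // skipped entirely; the transient profile stays local.
    void onStateChange(String newState) {
        if (!zkUpdatesEnabled) {
            return;      // no round trip to ZooKeeper
        }
        zkWrites++;      // stand-in for the actual ephemeral-node update
    }
}
```

With the flag off, the execution path pays only a branch per transition instead of a network round trip.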
[jira] [Updated] (DRILL-5304) Queries fail intermittently when there is skew in data distribution
[ https://issues.apache.org/jira/browse/DRILL-5304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Ollala updated DRILL-5304:
---------------------------------
    Reviewer: Abhishek Girish  (was: Jinfeng Ni)

> Queries fail intermittently when there is skew in data distribution
> -------------------------------------------------------------------
>
>                 Key: DRILL-5304
>                 URL: https://issues.apache.org/jira/browse/DRILL-5304
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.10.0
>            Reporter: Abhishek Girish
>            Assignee: Padma Penumarthy
>              Labels: ready-to-commit
>             Fix For: 1.10.0
>
>         Attachments: query1_drillbit.log.txt, query2_drillbit.log.txt
>
> In a distributed environment, we have observed certain queries fail intermittently with an assignment-logic issue when the underlying data is skewed with respect to distribution.
> For example, the TPC-H [query 7|https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Advanced/tpch/tpch_sf100/parquet/07.q] failed with the below error:
> {code}
> java.sql.SQLException: SYSTEM ERROR: IllegalArgumentException: MinorFragmentId 105 has no read entries assigned
> ...
> (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception during fragment initialization: MinorFragmentId 105 has no read entries assigned
>     org.apache.drill.exec.work.foreman.Foreman.run():281
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>     java.lang.Thread.run():744
> Caused By (java.lang.IllegalArgumentException) MinorFragmentId 105 has no read entries assigned
> {code}
> A log containing the full stack trace is attached.
> For this query, the underlying TPC-H SF100 Parquet dataset was observed to be located mostly on only 2-3 nodes of an 8-node DFS environment. The data distribution skew on this cluster is most likely the triggering factor, as the same query on the same dataset does not show this failure on a different test cluster (with possibly different data distribution).
> Also, another [query|https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/limit0/window_functions/bugs/data/drill-3700.sql] failed with a similar error when the slice target was set to 1.
> {code}
> Failed with exception
> java.sql.SQLException: SYSTEM ERROR: IllegalArgumentException: MinorFragmentId 66 has no read entries assigned
> ...
> (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception during fragment initialization: MinorFragmentId 66 has no read entries assigned
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
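The failure mode is a minor fragment created with zero read entries. A defensive assignment sketch, round-robin over work units (hypothetical code, not Drill's actual AssignmentCreator):

```java
import java.util.ArrayList;
import java.util.List;

public class RoundRobinAssigner {
    // Distribute work units round-robin across minor fragments so that,
    // as long as there are at least as many units as fragments, every
    // fragment receives at least one read entry -- the invariant whose
    // violation produces the IllegalArgumentException above. (If there
    // are fewer units than fragments, the planner should instead create
    // fewer fragments.)
    static List<List<Integer>> assign(int numFragments, int numUnits) {
        List<List<Integer>> out = new ArrayList<>();
        for (int f = 0; f < numFragments; f++) {
            out.add(new ArrayList<>());
        }
        for (int u = 0; u < numUnits; u++) {
            out.get(u % numFragments).add(u);
        }
        return out;
    }
}
```

A locality-aware assigner would first prefer local units per node, then fall back to something like this to guarantee non-empty assignments under skew.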
[jira] [Updated] (DRILL-5273) CompliantTextReader exhausts 4 GB memory when reading 5000 small files
[ https://issues.apache.org/jira/browse/DRILL-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Ollala updated DRILL-5273:
---------------------------------
    Reviewer: Kunal Khatua  (was: Chunhui Shi)

> CompliantTextReader exhausts 4 GB memory when reading 5000 small files
> ----------------------------------------------------------------------
>
>                 Key: DRILL-5273
>                 URL: https://issues.apache.org/jira/browse/DRILL-5273
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>              Labels: ready-to-commit
>             Fix For: 1.10.0
>
> A test case was created that consists of 5000 text files, each with a single line containing the file number: 1 to 5001. Each file has a single record, with at most 4 characters per record.
> Run the following query:
> {code}
> SELECT * FROM `dfs.data`.`5000files/text`
> {code}
> The query will fail with an OOM in the scan batch at around record 3700 on a Mac with 4 GB of direct memory.
> The code to read records in {{ScanBatch}} is complex. The following appears to occur:
> * Iterate over the record readers for each file.
> * For each, call setup.
> The setup code is:
> {code}
> public void setup(OperatorContext context, OutputMutator outputMutator) throws ExecutionSetupException {
>   oContext = context;
>   readBuffer = context.getManagedBuffer(READ_BUFFER);
>   whitespaceBuffer = context.getManagedBuffer(WHITE_SPACE_BUFFER);
> {code}
> The two buffers are in direct memory. There is no code that releases the buffers.
> The sizes are:
> {code}
> private static final int READ_BUFFER = 1024*1024;
> private static final int WHITE_SPACE_BUFFER = 64*1024;
> = 1,048,576 + 65,536 = 1,114,112
> {code}
> This is exactly the amount of memory that accumulates per call to {{ScanBatch.next()}}:
> {code}
> Ctor:         0        -- Initial memory in constructor
> Init setup:   1114112  -- After call to first record reader setup
> Entry Memory: 1114112  -- first next() call, returns one record
> Entry Memory: 1114112  -- second next(), eof and start second reader
> Entry Memory: 2228224  -- third next(), second reader returns EOF
> ...
> {code}
> If we leak ~1 MB per file, with 5000 files we would leak ~5 GB of memory, which would explain the OOM when given only 4 GB.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
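The missing piece described above is a release path paired with setup. A sketch using a toy ref-counted buffer (the Buf class stands in for the DrillBuf returned by getManagedBuffer, and the close() hook is the assumed fix, not the committed patch):

```java
public class TextReaderBuffers {
    // Toy stand-in for a ref-counted direct buffer (DrillBuf).
    static class Buf {
        boolean released = false;
        void release() { released = true; }
    }

    Buf readBuffer;
    Buf whitespaceBuffer;

    // Mirrors the setup() quoted above: two direct buffers (1 MB + 64 KB
    // in the real reader) are allocated per file.
    void setup() {
        readBuffer = new Buf();
        whitespaceBuffer = new Buf();
    }

    // The fix sketch: release both buffers when this file's reader is
    // closed, before the next file's reader runs setup(). Without this,
    // ~1.1 MB accumulates per file, which over 5000 files exhausts the
    // 4 GB of direct memory.
    void close() {
        if (readBuffer != null) readBuffer.release();
        if (whitespaceBuffer != null) whitespaceBuffer.release();
    }
}
```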
[jira] [Updated] (DRILL-5263) Prevent left NLJoin with non scalar subqueries
[ https://issues.apache.org/jira/browse/DRILL-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Ollala updated DRILL-5263:
---------------------------------
    Reviewer: Abhishek Girish  (was: Aman Sinha)

> Prevent left NLJoin with non scalar subqueries
> ----------------------------------------------
>
>                 Key: DRILL-5263
>                 URL: https://issues.apache.org/jira/browse/DRILL-5263
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Serhii Harnyk
>            Assignee: Serhii Harnyk
>             Fix For: 1.10.0
>
>         Attachments: tmp.tar.gz
>
> The nested loop join operator in Drill supports only inner join and returns incorrect results for queries with a left join and non-scalar sub-queries. Drill should throw an error in this case.
> Example:
> {code:sql}
> alter session set planner.enable_nljoin_for_scalar_only=false;
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> {code}
> Result:
> {noformat}
> +-------------+----------+----------+--------------------+
> |     dt      |   fyq    |   who    |       event        |
> +-------------+----------+----------+--------------------+
> | 2016-12-26  | 2016-Q2  | aperson  | had chrsitmas      |
> | 2017-01-06  | 2016-Q3  | aperson  | did somthing       |
> | 2017-01-12  | 2016-Q3  | aperson  | did somthing else  |
> +-------------+----------+----------+--------------------+
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Updated] (DRILL-5221) cancel message is delayed until queryid or data is received
[ https://issues.apache.org/jira/browse/DRILL-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Ollala updated DRILL-5221:
---------------------------------
    Reviewer: Khurram Faraaz

> cancel message is delayed until queryid or data is received
> -----------------------------------------------------------
>
>                 Key: DRILL-5221
>                 URL: https://issues.apache.org/jira/browse/DRILL-5221
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Client - C++
>    Affects Versions: 1.9.0
>            Reporter: Laurent Goujon
>            Assignee: Laurent Goujon
>              Labels: ready-to-commit
>             Fix For: 1.10.0
>
> When the user calls the cancel method of the C++ client, the client waits for a message from the server before replying back with a cancellation message.
> For queries that take a long time to return batch results, this means the cancellation will not take effect until the next batch is received, instead of cancelling the query right away (assuming the query id has already been received, which is generally the case).
> It seems this was foreseen by [~vkorukanti] in his initial patch (https://github.com/vkorukanti/drill/commit/e0ef6349aac48de5828b6d725c2cf013905d18eb) but was omitted when I backported it post metadata changes.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Updated] (DRILL-5207) Improve Parquet scan pipelining
[ https://issues.apache.org/jira/browse/DRILL-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Ollala updated DRILL-5207:
---------------------------------
    Reviewer: Kunal Khatua  (was: Sudheesh Katkam)

> Improve Parquet scan pipelining
> -------------------------------
>
>                 Key: DRILL-5207
>                 URL: https://issues.apache.org/jira/browse/DRILL-5207
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.9.0
>            Reporter: Parth Chandra
>            Assignee: Parth Chandra
>              Labels: doc-impacting
>             Fix For: 1.10.0
>
> The Parquet reader's async page reader is not efficiently pipelined. The default size of the disk read buffer is 4 MB, while the page reader reads ~1 MB at a time; the Parquet decode also processes 1 MB at a time. This means the disk is idle while the data is being processed. Reducing the buffer to 1 MB will reduce the time the processing thread waits for the disk read thread.
> Additionally, since the data needed to process a page may be more or less than 1 MB, a queue of pages will help, so that the disk scan does not block (until the queue is full) waiting for the processing thread.
> Additionally, the BufferedDirectBufInputStream class reads from disk as soon as it is initialized. Since this is called at setup time, it increases the setup time for the query, and query execution does not begin until it completes.
> There are a few other inefficiencies: options are read every time a page reader is created, and reading options can be expensive.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
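The queue-of-pages idea above can be sketched as a bounded blocking queue between the disk-read thread and the decode thread. The capacity of 4 and the class shape are illustrative assumptions, not Drill's chosen values:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PageQueue {
    // The disk thread reads ahead until the queue is full; the decode
    // thread drains it. Neither side idles unless the other falls far
    // behind, which is the pipelining the issue asks for.
    private final BlockingQueue<byte[]> pages;

    PageQueue(int capacity) {
        this.pages = new ArrayBlockingQueue<>(capacity);
    }

    // Disk-read thread: blocks only once `capacity` pages are buffered.
    void offerPage(byte[] page) throws InterruptedException {
        pages.put(page);
    }

    // Decode thread: blocks only when no page is ready yet.
    byte[] takePage() throws InterruptedException {
        return pages.take();
    }
}
```

The bound matters: an unbounded queue would let the disk thread race ahead and hold arbitrarily many direct-memory pages.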
[jira] [Updated] (DRILL-5123) Write query profile after sending final response to client to improve latency
[ https://issues.apache.org/jira/browse/DRILL-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Ollala updated DRILL-5123:
---------------------------------
    Reviewer: Kunal Khatua  (was: Padma Penumarthy)

> Write query profile after sending final response to client to improve latency
> -----------------------------------------------------------------------------
>
>                 Key: DRILL-5123
>                 URL: https://issues.apache.org/jira/browse/DRILL-5123
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>              Labels: ready-to-commit
>             Fix For: 1.10.0
>
> In testing a particular query, I used a test setup that does not write to the "persistent store", causing query profiles to not be saved. I then changed the config to save them (to local disk). This produced about a 200 ms difference in query run time as perceived by the client.
> I then moved writing the query profile to _after_ sending the client the final message. This resulted in an approximately 100 ms savings, as perceived by the client, in query run time on short (~3 sec.) queries.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Updated] (DRILL-5121) A memory leak is observed when exact case is not specified for a column in a filter condition
[ https://issues.apache.org/jira/browse/DRILL-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Ollala updated DRILL-5121:
---------------------------------
    Reviewer: Chun Chang  (was: Paul Rogers)

> A memory leak is observed when exact case is not specified for a column in a filter condition
> ---------------------------------------------------------------------------------------------
>
>                 Key: DRILL-5121
>                 URL: https://issues.apache.org/jira/browse/DRILL-5121
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.6.0, 1.8.0
>            Reporter: Karthikeyan Manivannan
>            Assignee: Karthikeyan Manivannan
>              Labels: ready-to-commit
>             Fix For: 1.10.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> When the query SELECT XYZ from dfs.`/tmp/foo` where xYZ like "abc" is executed on a setup where /tmp/foo has 2 Parquet files, 1.parquet and 2.parquet, and 1.parquet has the column XYZ but 2.parquet does not, there is a memory leak.
> This seems to happen because xYZ is treated as a new column.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Updated] (DRILL-5097) Using store.parquet.reader.int96_as_timestamp gives IOOB whereas convert_from works
[ https://issues.apache.org/jira/browse/DRILL-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Ollala updated DRILL-5097:
---------------------------------
    Reviewer: Krystal  (was: Karthikeyan Manivannan)

> Using store.parquet.reader.int96_as_timestamp gives IOOB whereas convert_from works
> -----------------------------------------------------------------------------------
>
>                 Key: DRILL-5097
>                 URL: https://issues.apache.org/jira/browse/DRILL-5097
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types, Storage - Parquet
>    Affects Versions: 1.9.0
>            Reporter: Vitalii Diravka
>            Assignee: Vitalii Diravka
>              Labels: ready-to-commit
>             Fix For: 1.10.0
>
>         Attachments: data.snappy.parquet
>
> Using store.parquet.reader.int96_as_timestamp gives an IOOB, whereas convert_from works.
> The below query succeeds:
> {code}
> select c, convert_from(d, 'TIMESTAMP_IMPALA') from dfs.`/drill/testdata/parquet_timestamp/spark_generated/d3`;
> {code}
> The below query fails:
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set `store.parquet.reader.int96_as_timestamp` = true;
> +-------+---------------------------------------------------+
> |  ok   |                      summary                      |
> +-------+---------------------------------------------------+
> | true  | store.parquet.reader.int96_as_timestamp updated.  |
> +-------+---------------------------------------------------+
> 1 row selected (0.231 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select c, d from dfs.`/drill/testdata/parquet_timestamp/spark_generated/d3`;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 131076 (expected: 0 <= readerIndex <= writerIndex <= capacity(131072))
> Fragment 0:0
> [Error Id: bd94f477-7c01-420f-8920-06263212177b on qa-node190.qa.lab:31010] (state=,code=0)
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Updated] (DRILL-5104) Foreman sets external sort memory allocation even for a physical plan
[ https://issues.apache.org/jira/browse/DRILL-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Ollala updated DRILL-5104:
---------------------------------
    Reviewer: Rahul Challapalli  (was: Boaz Ben-Zvi)

> Foreman sets external sort memory allocation even for a physical plan
> ---------------------------------------------------------------------
>
>                 Key: DRILL-5104
>                 URL: https://issues.apache.org/jira/browse/DRILL-5104
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>              Labels: ready-to-commit
>             Fix For: 1.10.0
>
> Consider the (disabled) unit test {{TestSimpleExternalSort.outOfMemoryExternalSort}}, which uses the physical plan {{xsort/oom_sort_test.json}} that contains a setting for the amount of memory to allocate:
> {code}
>    {
>      ...
>      pop: "external-sort",
>      ...
>      initialAllocation: 100,
>      maxAllocation: 3000
>    },
> {code}
> When run, the amount of memory is set to 715827882. The reason is that code was added to {{Foreman}} to compute the memory to allocate to the external sort:
> {code}
>  private void runPhysicalPlan(final PhysicalPlan plan) throws ExecutionSetupException {
>    validatePlan(plan);
>    MemoryAllocationUtilities.setupSortMemoryAllocations(plan, queryContext);
> {code}
> The problem is that a physical plan should execute as provided, to enable detailed testing.
> To solve this problem, move the sort memory setup to the path taken by SQL queries, but not by physical plans.
> This change is necessary to re-enable the previously-disabled external sort tests.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Updated] (DRILL-5098) Improving fault tolerance for connection between client and foreman node.
[ https://issues.apache.org/jira/browse/DRILL-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Ollala updated DRILL-5098:
---------------------------------
    Reviewer: Chun Chang  (was: Paul Rogers)

> Improving fault tolerance for connection between client and foreman node.
> -------------------------------------------------------------------------
>
>                 Key: DRILL-5098
>                 URL: https://issues.apache.org/jira/browse/DRILL-5098
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Client - JDBC
>            Reporter: Sorabh Hamirwasia
>            Assignee: Sorabh Hamirwasia
>              Labels: doc-impacting, ready-to-commit
>             Fix For: 1.10.0
>
> With DRILL-5015 we added support for specifying multiple Drillbits in the connection string and randomly choosing one of them. Over time, some of the Drillbits specified in the connection string may die, and the client can fail to connect to the Foreman node if the random selection happens to pick a dead Drillbit.
> Even if ZooKeeper is used to select a random Drillbit from the registered ones, there is a small window in which the client selects a Drillbit and that Drillbit then goes down. The client will fail to connect to this Drillbit and error out.
> If instead we try multiple Drillbits (with a configurable tries count in the connection string), the probability of hitting this error window is reduced in both cases, improving fault tolerance. During further investigation it was also found that an authentication failure is thrown as a generic RpcException. We need to improve that as well, to capture this case explicitly: on an auth failure we don't want to try multiple Drillbits.
> Connection string example with the new parameter (host/port placeholders were stripped from the original message and are reconstructed here):
> jdbc:drill:drillbit=<host>[:<port>][,<host>[:<port>]]...;tries=5

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
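The "tries" behavior described above can be sketched as a retry loop over endpoints that fails fast on authentication errors. Class and method names are hypothetical, and SecurityException stands in for a dedicated auth-failure exception type:

```java
import java.util.List;

public class DrillbitConnector {
    // Attempt up to `tries` endpoints (assumed already randomly ordered),
    // but stop immediately on an authentication failure, since retrying
    // other nodes will not help there.
    static String connect(List<String> endpoints, int tries) {
        Exception last = null;
        int attempts = Math.min(tries, endpoints.size());
        for (int i = 0; i < attempts; i++) {
            String ep = endpoints.get(i);
            try {
                return tryConnect(ep);
            } catch (SecurityException auth) {
                throw auth;              // auth failure: fail fast
            } catch (Exception e) {
                last = e;                // dead node: try the next one
            }
        }
        throw new RuntimeException("no drillbit reachable", last);
    }

    // Toy connector for the sketch: "dead*" endpoints are unreachable,
    // "auth*" endpoints reject credentials, anything else connects.
    static String tryConnect(String ep) {
        if (ep.startsWith("dead")) throw new IllegalStateException("node down");
        if (ep.startsWith("auth")) throw new SecurityException("bad credentials");
        return ep;
    }
}
```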
[jira] [Updated] (DRILL-5080) Create a memory-managed version of the External Sort operator
[ https://issues.apache.org/jira/browse/DRILL-5080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Ollala updated DRILL-5080:
---------------------------------
    Reviewer: Rahul Challapalli  (was: Boaz Ben-Zvi)

> Create a memory-managed version of the External Sort operator
> -------------------------------------------------------------
>
>                 Key: DRILL-5080
>                 URL: https://issues.apache.org/jira/browse/DRILL-5080
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>              Labels: ready-to-commit
>             Fix For: 1.10.0
>
>         Attachments: ManagedExternalSortDesign.pdf
>
> We propose to create a "managed" version of the external sort operator that works to a clearly-defined memory limit. Attached is a design specification for the work.
> The project will include fixing a number of bugs related to the external sort, included as sub-tasks of this umbrella task.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Updated] (DRILL-5081) Excessive info level logging introduced in DRILL-4203
[ https://issues.apache.org/jira/browse/DRILL-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Ollala updated DRILL-5081:
---------------------------------
    Reviewer: Krystal  (was: Sudheesh Katkam)

> Excessive info level logging introduced in DRILL-4203
> -----------------------------------------------------
>
>                 Key: DRILL-5081
>                 URL: https://issues.apache.org/jira/browse/DRILL-5081
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Sudheesh Katkam
>            Assignee: Vitalii Diravka
>             Fix For: 1.10.0
>
> Excessive info level logging introduced in [8461d10|https://github.com/apache/drill/commit/8461d10b4fd6ce56361d1d826bb3a38b6dc8473c]. A line is printed for every row group being read, and for every metadata file.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Updated] (DRILL-5065) Optimize count(*) queries on MapR-DB JSON Tables
[ https://issues.apache.org/jira/browse/DRILL-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Ollala updated DRILL-5065:
---------------------------------
    Reviewer: Rahul Challapalli

> Optimize count(*) queries on MapR-DB JSON Tables
> ------------------------------------------------
>
>                 Key: DRILL-5065
>                 URL: https://issues.apache.org/jira/browse/DRILL-5065
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - MapRDB
>    Affects Versions: 1.9.0
>         Environment: Clusters with MapR v5.2.0 and above
>            Reporter: Abhishek Girish
>            Assignee: Smidth Panchamia
>              Labels: ready-to-commit
>             Fix For: 1.10.0
>
> The JSON FileReader optimizes count(*) queries by only counting the number of records in the files and discarding the data. This makes query execution faster and more efficient.
> We need a similar feature in the MapR format plugin (maprdb) to optimize _id-only projection and count(*) queries on MapR-DB JSON Tables.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Updated] (DRILL-5048) Fix type mismatch error in case statement with null timestamp
[ https://issues.apache.org/jira/browse/DRILL-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Ollala updated DRILL-5048:
---------------------------------
    Reviewer: Krystal  (was: Gautam Kumar Parai)

> Fix type mismatch error in case statement with null timestamp
> -------------------------------------------------------------
>
>                 Key: DRILL-5048
>                 URL: https://issues.apache.org/jira/browse/DRILL-5048
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.9.0
>            Reporter: Serhii Harnyk
>            Assignee: Serhii Harnyk
>             Fix For: 1.10.0
>
> AssertionError when we use case with timestamp and null:
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT res, CASE res WHEN true THEN CAST('1990-10-10 22:40:50' AS TIMESTAMP) ELSE null END
> . . . . . . . . . . . . . . > FROM
> . . . . . . . . . . . . . . > (
> . . . . . . . . . . . . . . > SELECT
> . . . . . . . . . . . . . . > (CASE WHEN (false) THEN null ELSE CAST('1990-10-10 22:40:50' AS TIMESTAMP) END) res
> . . . . . . . . . . . . . . > FROM (values(1)) foo
> . . . . . . . . . . . . . . > ) foobar;
> Error: SYSTEM ERROR: AssertionError: Type mismatch:
> rowtype of new rel:
> RecordType(TIMESTAMP(0) NOT NULL res, TIMESTAMP(0) EXPR$1) NOT NULL
> rowtype of set:
> RecordType(TIMESTAMP(0) res, TIMESTAMP(0) EXPR$1) NOT NULL
> [Error Id: b56e0a4d-2f9e-4afd-8c60-5bc2f9d31f8f on centos-01.qa.lab:31010] (state=,code=0)
> {noformat}
> Stack trace from drillbit.log:
> {noformat}
> Caused by: java.lang.AssertionError: Type mismatch:
> rowtype of new rel:
> RecordType(TIMESTAMP(0) NOT NULL res, TIMESTAMP(0) EXPR$1) NOT NULL
> rowtype of set:
> RecordType(TIMESTAMP(0) res, TIMESTAMP(0) EXPR$1) NOT NULL
>   at org.apache.calcite.plan.RelOptUtil.equal(RelOptUtil.java:1696) ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
>   at org.apache.calcite.plan.volcano.RelSubset.add(RelSubset.java:295) ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
>   at org.apache.calcite.plan.volcano.RelSet.add(RelSet.java:147) ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
>   at org.apache.calcite.plan.volcano.VolcanoPlanner.addRelToSet(VolcanoPlanner.java:1818) ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
>   at org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1760) ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
>   at org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:1017) ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
>   at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1037) ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
>   at org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1940) ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
>   at org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:138) ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
>   ... 16 common frames omitted
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Updated] (DRILL-5051) DRILL-5051: Fix incorrect result returned in nest query with offset specified
[ https://issues.apache.org/jira/browse/DRILL-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Ollala updated DRILL-5051:
---------------------------------
    Reviewer: Rahul Challapalli  (was: Sudheesh Katkam)

> DRILL-5051: Fix incorrect result returned in nest query with offset specified
> -----------------------------------------------------------------------------
>
>                 Key: DRILL-5051
>                 URL: https://issues.apache.org/jira/browse/DRILL-5051
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.8.0
>         Environment: Fedora 24 / OpenJDK 8
>            Reporter: Hongze Zhang
>            Assignee: Hongze Zhang
>              Labels: ready-to-commit
>             Fix For: 1.10.0
>
> My SQL:
> select count(1) from (select id from (select id from cp.`tpch/lineitem.parquet` LIMIT 2) limit 1 offset 1)
> This SQL returns nothing.
> Something goes wrong in LimitRecordBatch.java, and the reason is different from [DRILL-4884|https://issues.apache.org/jira/browse/DRILL-4884?filter=-2].

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Assigned] (DRILL-5043) Function that returns a unique id per session/connection similar to MySQL's CONNECTION_ID()
[ https://issues.apache.org/jira/browse/DRILL-5043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Ollala reassigned DRILL-5043:
------------------------------------
    Assignee: Arina Ielchiieva
    Reviewer: Krystal  (was: Arina Ielchiieva)

> Function that returns a unique id per session/connection similar to MySQL's CONNECTION_ID()
> -------------------------------------------------------------------------------------------
>
>                 Key: DRILL-5043
>                 URL: https://issues.apache.org/jira/browse/DRILL-5043
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Functions - Drill
>    Affects Versions: 1.8.0
>            Reporter: Nagarajan Chinnasamy
>            Assignee: Arina Ielchiieva
>            Priority: Minor
>              Labels: CONNECTION_ID, SESSION, UDF, doc-impacting
>             Fix For: 1.10.0
>
>         Attachments: 01_session_id_sqlline.png, 02_session_id_webconsole_query.png, 03_session_id_webconsole_result.png
>
> Design and implement a function that returns a unique id per session/connection similar to MySQL's CONNECTION_ID().
> *Implementation details*
> A function *session_id* will be added. The function returns the current session's unique id represented as a string. A parameter {{boolean isNiladic}} will be added to the UDF FunctionTemplate to indicate that a function is niladic (a function to be called without any parameters and parentheses).
> Please note, this function will override columns that have the same name. A table alias should be used to retrieve the column value from a table.
> Example:
> {code:sql}select session_id from        // returns the value of niladic function session_id {code}
> {code:sql}select t1.session_id from t1  // returns session_id column value from table {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Updated] (DRILL-5032) Drill query on hive parquet table failed with OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/DRILL-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-5032: - Reviewer: Rahul Challapalli (was: Jinfeng Ni) > Drill query on hive parquet table failed with OutOfMemoryError: Java heap > space > --- > > Key: DRILL-5032 > URL: https://issues.apache.org/jira/browse/DRILL-5032 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Hive >Affects Versions: 1.8.0 >Reporter: Serhii Harnyk >Assignee: Serhii Harnyk > Fix For: 1.10.0 > > Attachments: plan, plan with fix > > > Following query on hive parquet table failed with OOM Java heap space: > {code} > select distinct(businessdate) from vmdr_trades where trade_date='2016-04-12' > 2016-08-31 08:02:03,597 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO > o.a.drill.exec.work.foreman.Foreman - Query text for query id > 283938c3-fde8-0fc6-37e1-9a568c7f5913: select distinct(businessdate) from > vmdr_trades where trade_date='2016-04-12' > 2016-08-31 08:05:58,502 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO > o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning > class: > org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$2 > 2016-08-31 08:05:58,506 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO > o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze > filter tree: 1 ms > 2016-08-31 08:05:58,506 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO > o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for > partition pruning.Total pruning elapsed time: 3 ms > 2016-08-31 08:05:58,663 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO > o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning > class: > org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$2 > 2016-08-31 08:05:58,663 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO > o.a.d.e.p.l.partition.PruneScanRule - Total elapsed 
time to build and analyze > filter tree: 0 ms > 2016-08-31 08:05:58,663 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO > o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for > partition pruning.Total pruning elapsed time: 0 ms > 2016-08-31 08:05:58,664 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO > o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning > class: > org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$1 > 2016-08-31 08:05:58,665 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO > o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze > filter tree: 0 ms > 2016-08-31 08:05:58,665 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO > o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for > partition pruning.Total pruning elapsed time: 0 ms > 2016-08-31 08:09:42,355 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] ERROR > o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, > exiting. Information message: Unable to handle out of memory condition in > Foreman. 
> java.lang.OutOfMemoryError: Java heap space > at java.util.Arrays.copyOf(Arrays.java:3332) ~[na:1.8.0_74] > at > java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137) > ~[na:1.8.0_74] > at > java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121) > ~[na:1.8.0_74] > at > java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421) > ~[na:1.8.0_74] > at java.lang.StringBuilder.append(StringBuilder.java:136) > ~[na:1.8.0_74] > at java.lang.StringBuilder.append(StringBuilder.java:76) > ~[na:1.8.0_74] > at > java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:457) > ~[na:1.8.0_74] > at java.lang.StringBuilder.append(StringBuilder.java:166) > ~[na:1.8.0_74] > at java.lang.StringBuilder.append(StringBuilder.java:76) > ~[na:1.8.0_74] > at > com.google.protobuf.TextFormat$TextGenerator.write(TextFormat.java:538) > ~[protobuf-java-2.5.0.jar:na] > at > com.google.protobuf.TextFormat$TextGenerator.print(TextFormat.java:526) > ~[protobuf-java-2.5.0.jar:na] > at > com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:389) > ~[protobuf-java-2.5.0.jar:na] > at > com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:327) > ~[protobuf-java-2.5.0.jar:na] > at > com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:286) > ~[protobuf-java-2.5.0.jar:na] > at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:273) >
[jira] [Updated] (DRILL-5034) Select timestamp from hive generated parquet always returns in UTC
[ https://issues.apache.org/jira/browse/DRILL-5034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-5034: - Reviewer: Krystal (was: Karthikeyan Manivannan) > Select timestamp from hive generated parquet always returns in UTC > - > > Key: DRILL-5034 > URL: https://issues.apache.org/jira/browse/DRILL-5034 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Affects Versions: 1.9.0 >Reporter: Krystal >Assignee: Vitalii Diravka > Labels: doc-impacting, ready-to-commit > Fix For: 1.10.0 > > > commit id: 5cea9afa6278e21574c6a982ae5c3d82085ef904 > Reading timestamp data against a hive parquet table from drill automatically > converts the timestamp data to UTC. > {code} > SELECT TIMEOFDAY() FROM (VALUES(1)); > +--+ > |EXPR$0| > +--+ > | 2016-11-10 12:33:26.547 America/Los_Angeles | > +--+ > {code} > data schema: > {code} > message hive_schema { > optional int32 voter_id; > optional binary name (UTF8); > optional int32 age; > optional binary registration (UTF8); > optional fixed_len_byte_array(3) contributions (DECIMAL(6,2)); > optional int32 voterzone; > optional int96 create_timestamp; > optional int32 create_date (DATE); > } > {code} > Using drill-1.8, the returned timestamps match the table data: > {code} > select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from > `/user/hive/warehouse/voter_hive_parquet` limit 5; > ++ > | EXPR$0 | > ++ > | 2016-10-23 20:03:58.0 | > | null | > | 2016-09-09 12:01:18.0 | > | 2017-03-06 20:35:55.0 | > | 2017-01-20 22:32:43.0 | > ++ > 5 rows selected (1.032 seconds) > {code} > If the user timezone is changed to UTC, then the timestamp data is returned in > UTC time. > Using drill-1.9, the returned timestamps are converted to UTC even though the > user timezone is in PST. 
> {code} > select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from > dfs.`/user/hive/warehouse/voter_hive_parquet` limit 5; > ++ > | EXPR$0 | > ++ > | 2016-10-24 03:03:58.0 | > | null | > | 2016-09-09 19:01:18.0 | > | 2017-03-07 04:35:55.0 | > | 2017-01-21 06:32:43.0 | > ++ > {code} > {code} > alter session set `store.parquet.reader.int96_as_timestamp`=true; > +---+---+ > | ok | summary | > +---+---+ > | true | store.parquet.reader.int96_as_timestamp updated. | > +---+---+ > select create_timestamp from dfs.`/user/hive/warehouse/voter_hive_parquet` > limit 5; > ++ > |create_timestamp| > ++ > | 2016-10-24 03:03:58.0 | > | null | > | 2016-09-09 19:01:18.0 | > | 2017-03-07 04:35:55.0 | > | 2017-01-21 06:32:43.0 | > ++ > {code} > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DRILL-4987) Use ImpersonationUtil in RemoteFunctionRegistry
[ https://issues.apache.org/jira/browse/DRILL-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4987: - Reviewer: Chun Chang > Use ImpersonationUtil in RemoteFunctionRegistry > --- > > Key: DRILL-4987 > URL: https://issues.apache.org/jira/browse/DRILL-4987 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Reporter: Sudheesh Katkam >Assignee: Sudheesh Katkam >Priority: Minor > Fix For: 1.10.0 > > > + Use ImpersonationUtil#getProcessUserName rather than > UserGroupInformation#getCurrentUser#getUserName in RemoteFunctionRegistry > + Expose process users' group info in ImpersonationUtil and use that in > RemoteFunctionRegistry, rather than > UserGroupInformation#getCurrentUser#getGroupNames -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DRILL-4980) Upgrading of the approach of parquet date correctness status detection
[ https://issues.apache.org/jira/browse/DRILL-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4980: - Reviewer: (was: Parth Chandra) > Upgrading of the approach of parquet date correctness status detection > -- > > Key: DRILL-4980 > URL: https://issues.apache.org/jira/browse/DRILL-4980 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Parquet >Affects Versions: 1.9.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka > Fix For: 1.10.0 > > > This jira is an addition for the > [DRILL-4203|https://issues.apache.org/jira/browse/DRILL-4203]. > The date correctness label for the new generated parquet files should be > upgraded. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DRILL-4938) Report UserException when constant expression reduction fails
[ https://issues.apache.org/jira/browse/DRILL-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4938: - Reviewer: Khurram Faraaz (was: Boaz Ben-Zvi) > Report UserException when constant expression reduction fails > - > > Key: DRILL-4938 > URL: https://issues.apache.org/jira/browse/DRILL-4938 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Affects Versions: 1.9.0 >Reporter: Khurram Faraaz >Assignee: Serhii Harnyk >Priority: Minor > Fix For: 1.10.0 > > > We need a better error message instead of DrillRuntimeException > Drill 1.9.0 git commit ID : 4edabe7a > {noformat} > 0: jdbc:drill:schema=dfs.tmp> select (res1 = 2016/09/22) res2 > . . . . . . . . . . . . . . > from > . . . . . . . . . . . . . . > ( > . . . . . . . . . . . . . . > select (case when (false) then null else > cast('2016/09/22' as date) end) res1 > . . . . . . . . . . . . . . > from (values(1)) foo > . . . . . . . . . . . . . . > ) foobar; > Error: SYSTEM ERROR: DrillRuntimeException: Failure while materializing > expression in constant expression evaluator [CASE(false, =(null, /(/(2016, > 9), 22)), =(CAST('2016/09/22'):DATE NOT NULL, /(/(2016, 9), 22)))]. Errors: > Error in expression at index -1. Error: Missing function implementation: > [castTIMESTAMP(INT-REQUIRED)]. Full expression: --UNKNOWN EXPRESSION--. > Error in expression at index -1. Error: Missing function implementation: > [castTIMESTAMP(INT-REQUIRED)]. Full expression: --UNKNOWN EXPRESSION--. > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DRILL-4956) Temporary tables support
[ https://issues.apache.org/jira/browse/DRILL-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4956: - Reviewer: Khurram Faraaz (was: Paul Rogers) > Temporary tables support > > > Key: DRILL-4956 > URL: https://issues.apache.org/jira/browse/DRILL-4956 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.8.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Labels: doc-impacting, ready-to-commit > Fix For: 1.10.0 > > > Link to design doc - > https://docs.google.com/document/d/1gSRo_w6q2WR5fPx7SsQ5IaVmJXJ6xCOJfYGyqpVOC-g/edit > Gist - > https://gist.github.com/arina-ielchiieva/50158175867a18eee964b5ba36455fbf#file-temporarytablessupport-md > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (DRILL-4935) Allow drillbits to advertise a configurable host address to Zookeeper
[ https://issues.apache.org/jira/browse/DRILL-4935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala reassigned DRILL-4935: Assignee: Abhishek Girish Reviewer: Abhishek Girish (was: Khurram Faraaz) > Allow drillbits to advertise a configurable host address to Zookeeper > - > > Key: DRILL-4935 > URL: https://issues.apache.org/jira/browse/DRILL-4935 > Project: Apache Drill > Issue Type: New Feature > Components: Execution - RPC >Affects Versions: 1.8.0 >Reporter: Harrison Mebane >Assignee: Abhishek Girish >Priority: Minor > Labels: ready-to-commit > Fix For: 1.10.0 > > > There are certain situations, such as running Drill in distributed Docker > containers, in which it is desirable to advertise a different hostname to > Zookeeper than would be output by InetAddress.getLocalHost(). I propose > adding a configuration variable 'drill.exec.rpc.bit.advertised.host' and > passing this address to Zookeeper when the configuration variable is > populated, otherwise falling back to the present behavior. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
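The fallback proposed in DRILL-4935 is simple to sketch. The property name comes from the ticket; the helper below is hypothetical and not Drill's configuration API:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch of the proposed behavior: use the configured
// advertised host ('drill.exec.rpc.bit.advertised.host') when set,
// otherwise fall back to the locally detected hostname, as today.
public class AdvertisedHost {
    static String advertisedHost(String configured) {
        if (configured != null && !configured.isEmpty()) {
            return configured;          // value to register in Zookeeper
        }
        try {
            return InetAddress.getLocalHost().getCanonicalHostName();
        } catch (UnknownHostException e) {
            return "localhost";         // conservative fallback for the sketch
        }
    }
}
```

In a Docker setup the container would set the property to the externally routable name, so other drillbits and clients can reach it.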
[jira] [Updated] (DRILL-4272) When sort runs out of memory and query fails, resources are seemingly not freed
[ https://issues.apache.org/jira/browse/DRILL-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4272: - Reviewer: Rahul Challapalli > When sort runs out of memory and query fails, resources are seemingly not > freed > --- > > Key: DRILL-4272 > URL: https://issues.apache.org/jira/browse/DRILL-4272 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Relational Operators >Affects Versions: 1.5.0 >Reporter: Victoria Markman >Assignee: Paul Rogers >Priority: Critical > Fix For: 1.10.0 > > > Executed query11.sql from resources/Advanced/tpcds/tpcds_sf1/original/parquet > Query runs out of memory: > {code} > Error: RESOURCE ERROR: One or more nodes ran out of memory while executing > the query. > Unable to allocate sv2 for 32768 records, and not enough batchGroups to spill. > batchGroups.size 1 > spilledBatchGroups.size 0 > allocated memory 19961472 > allocator limit 2000 > Fragment 19:0 > [Error Id: 87aa32b8-17eb-488e-90cb-5f5b9aec on atsqa4-133.qa.lab:31010] > (state=,code=0) > {code} > And leaves fragments running, holding resources: > {code} > 2016-01-14 22:46:32,435 [Drillbit-ShutdownHook#0] INFO > o.apache.drill.exec.server.Drillbit - Received shutdown request. > 2016-01-14 22:46:32,546 [Curator-ServiceCache-0] WARN > o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-136.qa.lab no longer > active. Cancelling fragment 2967db08-cd38-925a-4960-9e881f537af8:19:0. 
> 2016-01-14 22:46:32,547 [Curator-ServiceCache-0] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 2967db08-cd38-925a-4960-9e881f537af8:19:0: State change requested > CANCELLATION_REQUESTED --> CANCELLATION_REQUESTED > 2016-01-14 22:46:32,547 [Curator-ServiceCache-0] WARN > o.a.d.e.w.fragment.FragmentExecutor - > 2967db08-cd38-925a-4960-9e881f537af8:19:0: Ignoring unexpected state > transition CANCELLATION_REQUESTED --> CANCELLATION_REQUESTED > 2016-01-14 22:46:32,547 [Curator-ServiceCache-0] WARN > o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-136.qa.lab no longer > active. Cancelling fragment 2967db08-cd38-925a-4960-9e881f537af8:17:0. > 2016-01-14 22:46:32,547 [Curator-ServiceCache-0] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 2967db08-cd38-925a-4960-9e881f537af8:17:0: State change requested > CANCELLATION_REQUESTED --> CANCELLATION_REQUESTED > 2016-01-14 22:46:32,547 [Curator-ServiceCache-0] WARN > o.a.d.e.w.fragment.FragmentExecutor - > 2967db08-cd38-925a-4960-9e881f537af8:17:0: Ignoring unexpected state > transition CANCELLATION_REQUESTED --> CANCELLATION_REQUESTED > 2016-01-14 22:46:33,563 [BitServer-1] INFO > o.a.d.exec.rpc.control.ControlClient - Channel closed /10.10.88.134:59069 > <--> atsqa4-136.qa.lab/10.10.88.136:31011. > 2016-01-14 22:46:33,563 [BitClient-1] INFO > o.a.drill.exec.rpc.data.DataClient - Channel closed /10.10.88.134:34802 <--> > atsqa4-136.qa.lab/10.10.88.136:31012. > 2016-01-14 22:46:33,590 [BitClient-1] INFO > o.a.drill.exec.rpc.data.DataClient - Channel closed /10.10.88.134:36937 <--> > atsqa4-135.qa.lab/10.10.88.135:31012. > 2016-01-14 22:46:33,595 [BitClient-1] INFO > o.a.drill.exec.rpc.data.DataClient - Channel closed /10.10.88.134:53860 <--> > atsqa4-133.qa.lab/10.10.88.133:31012. > 2016-01-14 22:46:38,467 [BitClient-1] INFO > o.a.drill.exec.rpc.data.DataClient - Channel closed /10.10.88.134:48276 <--> > atsqa4-134.qa.lab/10.10.88.134:31012. 
> 2016-01-14 22:46:39,470 [pool-6-thread-1] INFO > o.a.drill.exec.rpc.user.UserServer - closed eventLoopGroup > io.netty.channel.nio.NioEventLoopGroup@6fb32dfb in 1003 ms > 2016-01-14 22:46:39,470 [pool-6-thread-2] INFO > o.a.drill.exec.rpc.data.DataServer - closed eventLoopGroup > io.netty.channel.nio.NioEventLoopGroup@5c93dd80 in 1003 ms > 2016-01-14 22:46:39,470 [pool-6-thread-1] INFO > o.a.drill.exec.service.ServiceEngine - closed userServer in 1004 ms > 2016-01-14 22:46:39,470 [pool-6-thread-2] INFO > o.a.drill.exec.service.ServiceEngine - closed dataPool in 1005 ms > 2016-01-14 22:46:39,483 [Drillbit-ShutdownHook#0] WARN > o.apache.drill.exec.work.WorkManager - Closing WorkManager but there are 2 > running fragments. > 2016-01-14 22:46:41,489 [Drillbit-ShutdownHook#0] ERROR > o.a.d.exec.server.BootStrapContext - Pool did not terminate > 2016-01-14 22:46:41,498 [Drillbit-ShutdownHook#0] WARN > o.apache.drill.exec.server.Drillbit - Failure on close() > java.lang.RuntimeException: Exception while closing > at > org.apache.drill.common.DrillAutoCloseables.closeNoChecked(DrillAutoCloseables.java:46) > ~[drill-common-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT] > at > org.apache.drill.exec.server.BootStrapContext.close(BootStrapContext.java:127) >
[jira] [Updated] (DRILL-4919) Fix select count(1) / count(*) on csv with header
[ https://issues.apache.org/jira/browse/DRILL-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4919: - Reviewer: Krystal (was: Gautam Kumar Parai) > Fix select count(1) / count(*) on csv with header > - > > Key: DRILL-4919 > URL: https://issues.apache.org/jira/browse/DRILL-4919 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.8.0 >Reporter: F Méthot >Assignee: Arina Ielchiieva >Priority: Minor > Labels: ready-to-commit > Fix For: 1.10.0 > > > This happens since 1.8 > Dataset (I used extended char for display purpose) test.csvh: > a,b,c,d\n > 1,2,3,4\n > 5,6,7,8\n > Storage config: > "csvh": { > "type": "text", > "extensions" : [ > "csvh" >], >"extractHeader": true, >"delimiter": "," > } > select count(1) from dfs.`test.csvh` > Error: UNSUPPORTED_OPERATION ERROR: With extractHeader enabled, only header > names are supported > column name columns > column index > Fragment 0:0 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DRILL-4864) Add ANSI format for date/time functions
[ https://issues.apache.org/jira/browse/DRILL-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4864: - Reviewer: Krystal (was: Paul Rogers) > Add ANSI format for date/time functions > --- > > Key: DRILL-4864 > URL: https://issues.apache.org/jira/browse/DRILL-4864 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.8.0 >Reporter: Serhii Harnyk >Assignee: Serhii Harnyk > Labels: doc-impacting > Fix For: 1.10.0 > > > The TO_DATE() is exposing the Joda string formatting conventions into the SQL > layer. This is not following SQL conventions used by ANSI and many other > database engines on the market. > Add new UDFs: > * sql_to_date(String, Format), > * sql_to_time(String, Format), > * sql_to_timestamp(String, Format) > that requires Postgres datetime format. > Table of supported Postgres patterns > ||Pattern name||Postgres format > |Full name of day|day > |Day of year|ddd > |Day of month|dd > |Day of week|d > |Name of month|month > |Abr name of month|mon > |Full era name|ee > |Name of day|dy > |Time zone|tz > |Hour 12 |hh > |Hour 12 |hh12 > |Hour 24|hh24 > |Minute of hour|mi > |Second of minute|ss > |Millisecond of minute|ms > |Week of year|ww > |Month|mm > |Halfday am|am > |Year | y > |ref.| > https://www.postgresql.org/docs/8.2/static/functions-formatting.html | > Table of acceptable Postgres pattern modifiers, which may be used in Format > string > ||Description||Pattern|| > |fill mode (suppress padding blanks and zeroes)|fm | > |fixed format global option (see usage notes)|fx | > |translation mode (print localized day and month names based on > lc_messages)|tm | > |spell mode (not yet implemented)|sp| > |ref.| > https://www.postgresql.org/docs/8.2/static/functions-formatting.html| -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DRILL-4812) Wildcard queries fail on Windows
[ https://issues.apache.org/jira/browse/DRILL-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4812: - Reviewer: Kunal Khatua (was: Paul Rogers) > Wildcard queries fail on Windows > > > Key: DRILL-4812 > URL: https://issues.apache.org/jira/browse/DRILL-4812 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Other >Affects Versions: 1.7.0 > Environment: Windows 7 >Reporter: Mike Lavender > Labels: easyfix, easytest, ready-to-commit, windows > Fix For: 1.10.0 > > > Wildcards within the path of a query are not handled on windows and result in > a "String index out of range" exception. > for example: > {noformat} > 0: jdbc:drill:zk=local> SELECT SUM(qty) as num FROM > dfs.parquet.`/trends/2016/1/*/*/3701`; > Error: VALIDATION ERROR: String index out of range: -1 > SQL Query null > {noformat} > > The problem exists within: > exec\java-exec\src\main\java\org\apache\drill\exec\store\dfs\FileSelection.java > private static Path handleWildCard(final String root) > This function looks for the index of the system-specific PATH_SEPARATOR, > which on windows is '\' (from System.getProperty("file.separator")). The > path passed in to handleWildCard will never contain that type of path > separator, as the Path constructor (from org.apache.hadoop.fs.Path) sets all > the path separators to '/'. > NOTE: > private static String removeLeadingSlash(String path) > in that same file explicitly looks for '/' and does not use the > system-specific PATH_SEPARATOR. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
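The fix described in DRILL-4812 amounts to searching for '/' explicitly, since org.apache.hadoop.fs.Path normalizes all separators to '/'. A simplified Java sketch (the real handleWildCard logic differs; this only illustrates the separator choice):

```java
// Hypothetical sketch: extract the directory prefix before a wildcard in a
// Path-normalized string. Using System.getProperty("file.separator") here
// would return '\' on Windows, which never appears in a normalized Path
// string, so lastIndexOf would yield -1 and substring would throw
// "String index out of range: -1" -- the failure shown in the ticket.
public class WildcardPath {
    static String prefixBeforeWildcard(String normalizedPath) {
        int star = normalizedPath.indexOf('*');
        // Search for '/' explicitly, never the platform separator.
        int slash = star < 0 ? -1 : normalizedPath.lastIndexOf('/', star);
        if (slash < 0) {
            throw new IllegalArgumentException("not a wildcard path: " + normalizedPath);
        }
        return normalizedPath.substring(0, slash);
    }
}
```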
[jira] [Updated] (DRILL-4764) Parquet file with INT_16, etc. logical types not supported by simple SELECT
[ https://issues.apache.org/jira/browse/DRILL-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4764: - Reviewer: Rahul Challapalli (was: Parth Chandra) > Parquet file with INT_16, etc. logical types not supported by simple SELECT > --- > > Key: DRILL-4764 > URL: https://issues.apache.org/jira/browse/DRILL-4764 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Affects Versions: 1.6.0 >Reporter: Paul Rogers >Assignee: Serhii Harnyk > Fix For: 1.10.0 > > Attachments: int_16.parquet, int_8.parquet, uint_16.parquet, > uint_32.parquet, uint_8.parquet > > > Create a Parquet file with the following schema: > message int16Data { required int32 index; required int32 value (INT_16); } > Store it as int_16.parquet in the local file system. Query it with: > SELECT * from `local`.`root`.`int_16.parquet`; > The result, in the web UI, is this error: > org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: > UnsupportedOperationException: unsupported type: INT32 INT_16 Fragment 0:0 > [Error Id: c63f66b4-e5a9-4a35-9ceb-546b74645dd4 on 172.30.1.28:31010] > The INT_16 logical (or "original") type simply tells consumers of the file > that the data is actually a 16-bit signed int. Presumably, this should tell > Drill to use the SmallIntVector (or NullableSmallIntVector) class for > storage. Without supporting this annotation, even 16-bit integers must be > stored as 32-bits within Drill. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DRILL-4301) OOM : Unable to allocate sv2 for 1000 records, and not enough batchGroups to spill.
[ https://issues.apache.org/jira/browse/DRILL-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4301: - Reviewer: Rahul Challapalli > OOM : Unable to allocate sv2 for 1000 records, and not enough batchGroups to > spill. > --- > > Key: DRILL-4301 > URL: https://issues.apache.org/jira/browse/DRILL-4301 > Project: Apache Drill > Issue Type: Sub-task > Components: Execution - Flow >Affects Versions: 1.5.0 > Environment: 4 node cluster >Reporter: Khurram Faraaz >Assignee: Paul Rogers > Fix For: 1.10.0 > > > Query below in Functional tests, fails due to OOM > {code} > select * from dfs.`/drill/testdata/metadata_caching/fewtypes_boolpartition` > where bool_col = true; > {code} > Drill version : drill-1.5.0 > JAVA_VERSION=1.8.0 > {noformat} > version commit_id commit_message commit_time build_email > build_time > 1.5.0-SNAPSHOT2f0e3f27e630d5ac15cdaef808564e01708c3c55 > DRILL-4190 Don't hold on to batches from left side of merge join. > 20.01.2016 @ 22:30:26 UTC Unknown 20.01.2016 @ 23:48:33 UTC > framework/framework/resources/Functional/metadata_caching/data/bool_partition1.q > (connection: 808078113) > [#1378] Query failed: > oadd.org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: > One or more nodes ran out of memory while executing the query. > Unable to allocate sv2 for 1000 records, and not enough batchGroups to spill. 
> batchGroups.size 0 > spilledBatchGroups.size 0 > allocated memory 48326272 > allocator limit 46684427 > Fragment 0:0 > [Error Id: 97d58ea3-8aff-48cf-a25e-32363b8e0ecd on drill-demod2:31010] > at > oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119) > at > oadd.org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113) > at > oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46) > at > oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31) > at oadd.org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:67) > at > oadd.org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:374) > at > oadd.org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89) > at > oadd.org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:252) > at > oadd.org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:123) > at > oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:285) > at > oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:257) > at > oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > at > oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > at > oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > at > 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > at > oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > at > oadd.io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > at > oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > at > oadd.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847) > at > oadd.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) > at >
[jira] [Updated] (DRILL-4280) Kerberos Authentication
[ https://issues.apache.org/jira/browse/DRILL-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4280: - Reviewer: Chun Chang (was: Chunhui Shi) > Kerberos Authentication > --- > > Key: DRILL-4280 > URL: https://issues.apache.org/jira/browse/DRILL-4280 > Project: Apache Drill > Issue Type: Improvement >Reporter: Keys Botzum >Assignee: Sudheesh Katkam > Labels: security > Fix For: 1.10.0 > > > Drill should support Kerberos based authentication from clients. This means > that both the ODBC and JDBC drivers as well as the web/REST interfaces should > support inbound Kerberos. For Web this would most likely be SPNEGO while for > ODBC and JDBC this will be more generic Kerberos. > Since Hive and much of Hadoop supports Kerberos there is a potential for a > lot of reuse of ideas if not implementation. > Note that this is related to but not the same as > https://issues.apache.org/jira/browse/DRILL-3584 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DRILL-4217) Query parquet file treats INT_16 & INT_8 as INT32
[ https://issues.apache.org/jira/browse/DRILL-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4217: - Reviewer: Rahul Challapalli > Query parquet file treats INT_16 & INT_8 as INT32 > > > Key: DRILL-4217 > URL: https://issues.apache.org/jira/browse/DRILL-4217 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Data Types >Reporter: Low Chin Wei >Assignee: Serhii Harnyk > Fix For: 1.10.0 > > > Encountered this issue while trying to query a parquet file: > org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: > UnsupportedOperationException: unsupported type: INT32 INT_16 Fragment 1:1 > We can treat the following Field Type as INTEGER before support of Short & > Byte is implemented: > - INT32 INT_16 > - INT32 INT_8 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array
[ https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-3562: - Reviewer: Rahul Challapalli (was: Arina Ielchiieva) > Query fails when using flatten on JSON data where some documents have an > empty array > > > Key: DRILL-3562 > URL: https://issues.apache.org/jira/browse/DRILL-3562 > Project: Apache Drill > Issue Type: Bug > Components: Storage - JSON >Affects Versions: 1.1.0 >Reporter: Philip Deegan >Assignee: Serhii Harnyk > Fix For: 1.10.0 > > > Drill query fails when using flatten when some records contain an empty array > {noformat} > SELECT COUNT(*) FROM (SELECT FLATTEN(t.a.b.c) AS c FROM dfs.`flat.json` t) > flat WHERE flat.c.d.e = 'f' limit 1; > {noformat} > Succeeds on > { "a": { "b": { "c": [ { "d": { "e": "f" } } ] } } } > Fails on > { "a": { "b": { "c": [] } } } > Error > {noformat} > Error: SYSTEM ERROR: ClassCastException: Cannot cast > org.apache.drill.exec.vector.NullableIntVector to > org.apache.drill.exec.vector.complex.RepeatedValueVector > {noformat} > Is it possible to ignore the empty arrays, or do they need to be populated > with dummy data? -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (DRILL-5316) C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children completed with ZOK
[ https://issues.apache.org/jira/browse/DRILL-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala reassigned DRILL-5316: Assignee: Chun Chang Reviewer: Chun Chang (was: Sorabh Hamirwasia) > C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children > completed with ZOK > > > Key: DRILL-5316 > URL: https://issues.apache.org/jira/browse/DRILL-5316 > Project: Apache Drill > Issue Type: Bug > Components: Client - C++ >Reporter: Rob Wu >Assignee: Chun Chang >Priority: Critical > Labels: ready-to-commit > Fix For: 1.11.0 > > > When connecting to a drillbit with Zookeeper, occasionally the C++ client > would crash for no apparent reason. > A further look into the code revealed that during this call > rc=zoo_get_children(p_zh.get(), m_path.c_str(), 0, ); > zoo_get_children returns ZOK (0) but drillbitsVector.count is 0. > This causes drillbits to stay empty and thus > causes err = zook.getEndPoint(drillbits[drillbits.size() -1], endpoint); to > crash > A size check should be done to prevent this from happening -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DRILL-4872) NPE from CTAS partitioned by a projected casted null
[ https://issues.apache.org/jira/browse/DRILL-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4872: - Reviewer: Rahul Challapalli > NPE from CTAS partitioned by a projected casted null > > > Key: DRILL-4872 > URL: https://issues.apache.org/jira/browse/DRILL-4872 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.7.0 >Reporter: Boaz Ben-Zvi >Assignee: Arina Ielchiieva > Labels: NPE > Fix For: 1.10.0 > > > Extracted from DRILL-3898 : Running the same test case on a smaller table ( > store_sales.dat from TPCDS SF 1) has no space issues, but there is a Null > Pointer Exception from the projection: > Caused by: java.lang.NullPointerException: null > at > org.apache.drill.exec.expr.fn.impl.ByteFunctionHelpers.compare(ByteFunctionHelpers.java:100) > ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > at > org.apache.drill.exec.test.generated.ProjectorGen1.doEval(ProjectorTemplate.java:49) > ~[na:na] > at > org.apache.drill.exec.test.generated.ProjectorGen1.projectRecords(ProjectorTemplate.java:62) > ~[na:na] > at > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:199) > ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT] > A simplified version of the test case: > 0: jdbc:drill:zk=local> create table dfs.tmp.ttt partition by ( x ) as select > case when columns[8] = '' then cast(null as varchar(10)) else cast(columns[8] > as varchar(10)) end as x FROM > dfs.`/Users/boazben-zvi/data/store_sales/store_sales.dat`; > Error: SYSTEM ERROR: NullPointerException > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (DRILL-4969) Basic implementation for displaySize
[ https://issues.apache.org/jira/browse/DRILL-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4969: - Reviewer: Chun Chang (was: Sudheesh Katkam) > Basic implementation for displaySize > > > Key: DRILL-4969 > URL: https://issues.apache.org/jira/browse/DRILL-4969 > Project: Apache Drill > Issue Type: Sub-task > Components: Metadata >Reporter: Laurent Goujon >Assignee: Laurent Goujon > Fix For: 1.9.0 > > > display size is fixed to 10, but for most types display size is well defined > as shown in the ODBC table: > https://msdn.microsoft.com/en-us/library/ms713974(v=vs.85).aspx -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4560) ZKClusterCoordinator does not call DrillbitStatusListener.drillbitRegistered for new bits
[ https://issues.apache.org/jira/browse/DRILL-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4560: - Reviewer: Abhishek Girish (was: Sorabh Hamirwasia) > ZKClusterCoordinator does not call DrillbitStatusListener.drillbitRegistered > for new bits > - > > Key: DRILL-4560 > URL: https://issues.apache.org/jira/browse/DRILL-4560 > Project: Apache Drill > Issue Type: Sub-task > Components: Server >Affects Versions: 1.6.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Minor > Fix For: 1.9.0 > > > ZKClusterCoordinator notifies listeners of type DrillbitStatusListener when > drillbits disappear from ZooKeeper. The YARN Application Master (AM) also > needs to know when bits register themselves with ZK. So, ZKClusterCoordinator > should change to detect new Drill-bits, then call > DrillbitStatusListener.drillbitRegistered with the new Drill-bits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4581) Various problems in the Drill startup scripts
[ https://issues.apache.org/jira/browse/DRILL-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4581: - Reviewer: Abhishek Girish (was: Sudheesh Katkam) [~agirish] assigning for verification > Various problems in the Drill startup scripts > - > > Key: DRILL-4581 > URL: https://issues.apache.org/jira/browse/DRILL-4581 > Project: Apache Drill > Issue Type: Sub-task > Components: Server >Affects Versions: 1.6.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Minor > Fix For: 1.8.0 > > > Noticed the following in drillbit.sh: > 1) Comment: DRILL_LOG_DIR: Where log files are stored. PWD by default. > Code: DRILL_LOG_DIR=/var/log/drill or, if it does not exist, $DRILL_HOME/log > 2) Comment: DRILL_PID_DIR: Where the pid files are stored. /tmp by default. > Code: DRILL_PID_DIR=$DRILL_HOME > 3) Redundant checking of JAVA_HOME. drillbit.sh sources drill-config.sh which > checks JAVA_HOME. Later, drillbit.sh checks it again. The second check is > both unnecessary and prints a less informative message than the > drill-config.sh check. Suggestion: Remove the JAVA_HOME check in drillbit.sh. > 4) Though drill-config.sh carefully checks JAVA_HOME, it does not export the > JAVA_HOME variable. Perhaps this is why drillbit.sh repeats the check? > Recommended: export JAVA_HOME from drill-config.sh. > 5) Both drillbit.sh and the sourced drill-config.sh check DRILL_LOG_DIR and > set the default value. Drill-config.sh defaults to /var/log/drill, or if that > fails, to $DRILL_HOME/log. Drillbit.sh just sets /var/log/drill and does not > handle the case where that directory is not writable. Suggested: remove the > check in drillbit.sh. > 6) Drill-config.sh checks the writability of the DRILL_LOG_DIR by touching > sqlline.log, but does not delete that file, leaving a bogus, empty client log > file on the drillbit server. Recommendation: use bash commands instead. > 7) The implementation of the above check is a bit awkward.
It has a fallback > case with somewhat awkward logic. Clean this up. > 8) drillbit.sh, but not drill-config.sh, attempts to create /var/log/drill if > it does not exist. Recommended: decide on a single choice, implement it in > drill-config.sh. > 9) drill-config.sh checks if $DRILL_CONF_DIR is a directory. If not, defaults > it to $DRILL_HOME/conf. This can lead to subtle errors. If I use > drillbit.sh --config /misspelled/path > where I mistype the path, I won't get an error, I get the default config, > which may not at all be what I want to run. Recommendation: if the value of > DRILL_CONF_DIR is passed into the script (as a variable or via --config), > then that directory must exist. Else, use the default. > 10) drill-config.sh exports, but may not set, HADOOP_HOME. This may be left > over from the original Hadoop script that the Drill script was based upon. > Recommendation: export only in the case that HADOOP_HOME is set for cygwin. > 11) Drill-config.sh checks JAVA_HOME and prints a big, bold error message to > stderr if JAVA_HOME is not set. Then, it checks the Java version and prints a > different message (to stdout) if the version is wrong. Recommendation: use > the same format (and stderr) for both. > 12) Similarly, other Java checks later in the script produce messages to > stdout, not stderr. > 13) Drill-config.sh searches $JAVA_HOME to find java/java.exe and verifies > that it is executable. The script then throws away what we just found. Then, > drill-bit.sh tries to recreate this information as: > JAVA=$JAVA_HOME/bin/java > This is wrong in two ways: 1) it ignores the actual java location and assumes > it, and 2) it does not handle the java.exe case that drill-config.sh > carefully worked out. > Recommendation: export JAVA from drill-config.sh and remove the above line > from drillbit.sh.
> 14) drillbit.sh presumably takes extra arguments like this: > drillbit.sh -Dvar0=value0 --config /my/conf/dir start -Dvar1=value1 > -Dvar2=value2 -Dvar3=value3 > The -D bit allows the user to override config variables at the command line. > But, the scripts don't use the values. > A) drill-config.sh consumes --config /my/conf/dir after consuming the leading > arguments: > while [ $# -gt 1 ]; do > if [ "--config" = "$1" ]; then > shift > confdir=$1 > shift > DRILL_CONF_DIR=$confdir > else > # Presume we are at end of options and break > break > fi > done > B) drill-bit.sh will discard the var1: > startStopStatus=$1 <-- grabs "start" > shift > command=drillbit > shift <-- Consumes -Dvar1=value1 > C) Remaining values passed back into drillbit.sh: > args=$@ > nohup $thiscmd internal_start $command $args > D) Second invocation discards -Dvar2=value2 as described above. > E) Remaining values
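A sketch of argument handling that preserves the -D overrides described in item 14 (Python for illustration only; the real scripts are bash, and `parse_launch_args` is a hypothetical name, not part of Drill):

```python
def parse_launch_args(argv):
    """Parse drillbit-style launch arguments without dropping options.

    Keeps every -Dname=value pair instead of silently consuming them,
    which is the failure mode items A-D above describe.
    """
    conf_dir = None
    command = None
    java_props = []
    rest = []
    args = list(argv)
    while args:
        arg = args.pop(0)
        if arg == "--config" and args:
            conf_dir = args.pop(0)       # value following --config
        elif arg.startswith("-D"):
            java_props.append(arg)       # preserve, never treat as the command
        elif command is None:
            command = arg                # e.g. "start", "stop", "status"
        else:
            rest.append(arg)
    return conf_dir, command, java_props, rest
```

The point of the sketch is that options are classified by shape rather than consumed positionally, so a -D override survives no matter where it appears on the command line.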
[jira] [Updated] (DRILL-4857) When no partition pruning occurs with metadata caching there's a performance regression
[ https://issues.apache.org/jira/browse/DRILL-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4857: - Reviewer: Dechang Gu > When no partition pruning occurs with metadata caching there's a performance > regression > --- > > Key: DRILL-4857 > URL: https://issues.apache.org/jira/browse/DRILL-4857 > Project: Apache Drill > Issue Type: Bug > Components: Metadata, Query Planning & Optimization >Affects Versions: 1.7.0 >Reporter: Aman Sinha >Assignee: Aman Sinha > Fix For: 1.8.0 > > > After DRILL-4530, we see the (expected) performance improvements in planning > time with metadata cache for cases where partition pruning got applied. > However, in cases where it did not get applied and for a sufficiently large > number of files (tested with up to 400K files), there's a performance > regression. Part of this was addressed by DRILL-4846. This JIRA is to > track some remaining fixes to address the regression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4854) Incorrect logic in log directory checks in drill-config.sh
[ https://issues.apache.org/jira/browse/DRILL-4854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4854: - Reviewer: Abhishek Girish > Incorrect logic in log directory checks in drill-config.sh > -- > > Key: DRILL-4854 > URL: https://issues.apache.org/jira/browse/DRILL-4854 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.8.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Minor > Fix For: 1.8.0 > > > The recent changes to the launch scripts introduced a subtle bug in the logic > that verifies the log directory: > if [[ ! -d "$DRILL_LOG_DIR" && ! -w "$DRILL_LOG_DIR" ]]; then > ... > if [[ ! -d "$DRILL_LOG_DIR" && ! -w "$DRILL_LOG_DIR" ]]; then > In both cases, the operator should be or ("||"). > That is, if either the item is not a directory, or it is a directory but is > not writable, then do the fall-back steps. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4846) Eliminate extra operations during metadata cache pruning
[ https://issues.apache.org/jira/browse/DRILL-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4846: - Reviewer: Dechang Gu > Eliminate extra operations during metadata cache pruning > > > Key: DRILL-4846 > URL: https://issues.apache.org/jira/browse/DRILL-4846 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.7.0 >Reporter: Aman Sinha >Assignee: Aman Sinha > Fix For: 1.8.0 > > > While doing performance testing for DRILL-4530 using a new data set and > queries, we found two potential performance issues: (a) the metadata cache > was being read twice in some cases and (b) the checking for directory > modification time was being done twice, once as part of the first phase of > directory-based pruning and subsequently after the second phase pruning. > This check gets expensive for a large number of directories. Creating this > JIRA to track fixes for these issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4727) Exclude netty from HBase Client's transitive dependencies
[ https://issues.apache.org/jira/browse/DRILL-4727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4727: - Reviewer: Abhishek Girish > Exclude netty from HBase Client's transitive dependencies > - > > Key: DRILL-4727 > URL: https://issues.apache.org/jira/browse/DRILL-4727 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Affects Versions: 1.7.0 >Reporter: Aditya Kishore >Assignee: Aditya Kishore >Priority: Critical > Fix For: 1.7.0 > > > Reported on dev/user list after moving to HBase 1.1 > {noformat} > Hi Aditya, > I tested the latest version and got this exception and the drillbit fail to > startup . > Exception in thread "main" java.lang.NoSuchMethodError: > io.netty.util.UniqueName.<init>(Ljava/lang/String;)V > at io.netty.channel.ChannelOption.<init>(ChannelOption.java:136) > at io.netty.channel.ChannelOption.valueOf(ChannelOption.java:99) > at io.netty.channel.ChannelOption.<clinit>(ChannelOption.java:42) > at org.apache.drill.exec.rpc.BasicServer.<init>(BasicServer.java:63) > at > org.apache.drill.exec.rpc.user.UserServer.<init>(UserServer.java:74) > at > org.apache.drill.exec.service.ServiceEngine.<init>(ServiceEngine.java:78) > at org.apache.drill.exec.server.Drillbit.<init>(Drillbit.java:108) > at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:285) > at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:271) > at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:267) > It will working if I remove jars/3rdparty/netty-all-4.0.23.Final.jar, the > drill can startup. I think there have some package dependency version issue, > do you think so ? > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4768) Drill may leak hive meta store connection if hive meta store client call hits error
[ https://issues.apache.org/jira/browse/DRILL-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4768: - Reviewer: Chun Chang > Drill may leak hive meta store connection if hive meta store client call hits > error > --- > > Key: DRILL-4768 > URL: https://issues.apache.org/jira/browse/DRILL-4768 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Hive >Reporter: Jinfeng Ni >Assignee: Jinfeng Ni > Fix For: 1.8.0 > > > We are seeing one drillbit create hundreds of connections to hive meta > store. This indicates that drill is leaking those connections and not > closing them properly. When such leaking happens, it may prevent > other applications from connecting to hive meta store. > It seems one cause of the leak is when a hive meta store client > call hits an exception. > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
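The close-on-exception pattern that prevents this kind of leak can be sketched as follows (illustrative Python; `MetastoreClient` and `fetch_table` are hypothetical stand-ins, not Drill's hive client classes):

```python
class MetastoreClient:
    """Hypothetical stand-in for a hive metastore client connection."""
    def __init__(self):
        self.closed = False

    def get_table(self, name):
        # Simulates a metastore call failing mid-flight.
        raise RuntimeError("metastore call failed")

    def close(self):
        self.closed = True

def fetch_table(client, name):
    """Leak-free pattern: the connection is closed in 'finally', so an
    exception raised by the metastore call cannot leave it open."""
    try:
        return client.get_table(name)
    except RuntimeError:
        return None        # surface the failure however the caller prefers
    finally:
        client.close()     # runs whether the call succeeded or raised
```

In Java the equivalent is a try/finally (or try-with-resources) around each metastore client call, so an error path can never skip the close.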
[jira] [Updated] (DRILL-4801) Setting extractHeader attribute for CSV format does not propagate to all drillbits
[ https://issues.apache.org/jira/browse/DRILL-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4801: - Reviewer: Krystal > Setting extractHeader attribute for CSV format does not propagate to all > drillbits > --- > > Key: DRILL-4801 > URL: https://issues.apache.org/jira/browse/DRILL-4801 > Project: Apache Drill > Issue Type: Bug > Components: Client - CLI, Client - HTTP >Affects Versions: 1.7.0 >Reporter: Krystal >Assignee: Arina Ielchiieva > Fix For: 1.8.0 > > > I have multiple drillbits running. From web UI of one drillbit, I added > "extractHeader": true to the csv format. I logged in to the Web UI of a > different drillbit and did not see the added attribute. > I tried the same for the TSV format and that worked as expected: the change > got propagated to all drillbits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3149) TextReader should support multibyte line delimiters
[ https://issues.apache.org/jira/browse/DRILL-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-3149: - Reviewer: Krystal > TextReader should support multibyte line delimiters > --- > > Key: DRILL-3149 > URL: https://issues.apache.org/jira/browse/DRILL-3149 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Text & CSV >Affects Versions: 1.0.0, 1.1.0 >Reporter: Jim Scott >Assignee: Arina Ielchiieva >Priority: Minor > Labels: doc-impacting > Fix For: 1.8.0 > > > lineDelimiter in the TextFormatConfig doesn't support \r\n for record > delimiters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4786) Improve metadata cache performance for queries with multiple partitions
[ https://issues.apache.org/jira/browse/DRILL-4786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4786: - Reviewer: Rahul Challapalli > Improve metadata cache performance for queries with multiple partitions > --- > > Key: DRILL-4786 > URL: https://issues.apache.org/jira/browse/DRILL-4786 > Project: Apache Drill > Issue Type: Improvement > Components: Metadata, Query Planning & Optimization >Affects Versions: 1.7.0 >Reporter: Aman Sinha >Assignee: Aman Sinha > Fix For: 1.8.0 > > > Consider queries of the following type run against Parquet data with > metadata caching: > {noformat} > SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 IN ('1', '2', '3') > {noformat} > For such queries, Drill will read the metadata cache file from the top level > directory 'A', which is not very efficient since we are only interested in > the files from some subdirectories of 'B'. DRILL-4530 improves the > performance of such queries when the leaf level directory is a single > partition. Here, there are 3 subpartitions due to the IN list. We can > build upon the DRILL-4530 enhancement by at least reading the cache file from > the immediate parent level `/A/B` instead of the top level. > The goal of this JIRA is to improve performance for such types of queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4816) sqlline -f failed to read the query file
[ https://issues.apache.org/jira/browse/DRILL-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4816: - Reviewer: Abhishek Girish > sqlline -f failed to read the query file > > > Key: DRILL-4816 > URL: https://issues.apache.org/jira/browse/DRILL-4816 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.8.0 > Environment: redha 2.6.32-358.el6 >Reporter: Dechang Gu >Assignee: Parth Chandra > Fix For: 1.8.0 > > > Installed Apache Drill master (commit id: 4e1bdac) on a 10 node cluster. > sqlline -u "jdbc:drill:schema=dfs.xxxParquet" -f refresh_meta_dirs.sql > hit the "No such file or directory" error: > 16:34:47,956 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could > NOT find resource [logback.groovy] > 16:34:47,956 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could > NOT find resource [logback-test.xml] > 16:34:47,960 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found > resource [logback.xml] at [file:/mapr/drillPerf/ > drillbit/conf/logback.xml] > 16:34:47,961 |-WARN in ch.qos.logback.classic.LoggerContext[default] - > Resource [logback.xml] occurs multiple times on the cl > asspath. 
> 16:34:47,961 |-WARN in ch.qos.logback.classic.LoggerContext[default] - > Resource [logback.xml] occurs at [file:/opt/mapr/drill > /drill-1.8.0/conf/logback.xml] > 16:34:47,961 |-WARN in ch.qos.logback.classic.LoggerContext[default] - > Resource [logback.xml] occurs at [file:/mapr/drillPerf > /drillbit/conf/logback.xml] > 16:34:48,163 |-INFO in > ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not > set > 16:34:48,168 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - > About to instantiate appender of type [ch.qos.logbac > k.core.rolling.RollingFileAppender] > 16:34:48,182 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - > Naming appender as [FILE] > 16:34:48,246 |-INFO in > ch.qos.logback.core.rolling.FixedWindowRollingPolicy@29989e7c - No > compression will be used > 16:34:48,246 |-WARN in > ch.qos.logback.core.rolling.FixedWindowRollingPolicy@29989e7c - Large window > sizes are not allowed. > 16:34:48,246 |-WARN in > ch.qos.logback.core.rolling.FixedWindowRollingPolicy@29989e7c - MaxIndex > reduced to 21 > 16:34:48,257 |-INFO in > ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default > type [ch.qos.logback.class > ic.encoder.PatternLayoutEncoder] for [encoder] property > 16:34:48,319 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[FILE] > - Active log file name: /var/log/drill/drillbit_ > ucs-node1.perf.lab.log > 16:34:48,319 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[FILE] > - File property is set to [/var/log/drill/drillb > it_ucs-node1.perf.lab.log] > 16:34:48,321 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - > Setting additivity of logger [org.apache.drill] to > false > 16:34:48,321 |-INFO in ch.qos.logback.classic.joran.action.LevelAction - > org.apache.drill level set to INFO > 16:34:48,322 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - > Attaching appender named [FILE] to Logger[org.apa > che.drill] > 16:34:48,323 |-INFO in 
ch.qos.logback.classic.joran.action.LevelAction - ROOT > level set to INFO > 16:34:48,323 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - > Attaching appender named [FILE] to Logger[ROOT] > 16:34:48,323 |-INFO in > ch.qos.logback.classic.joran.action.ConfigurationAction - End of > configuration. > 16:34:48,324 |-INFO in > ch.qos.logback.classic.joran.JoranConfigurator@62ccf439 - Registering current > configuration as safe fa > llback point > -u[i+1] (No such file or directory) > Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl > running the query from inside the sqlline connection is ok. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3710) Make the 20 in-list optimization configurable
[ https://issues.apache.org/jira/browse/DRILL-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-3710: - Reviewer: Chun Chang > Make the 20 in-list optimization configurable > - > > Key: DRILL-3710 > URL: https://issues.apache.org/jira/browse/DRILL-3710 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.1.0 >Reporter: Hao Zhu >Assignee: Gautam Kumar Parai > Fix For: 1.8.0 > > > If an IN list has more than 20 values, Drill can do an optimization to convert > the IN list into a small hash table in memory, and then do a table join > instead. > This can improve the performance of a query with a long IN list. > Could we make "20" configurable, so that we do not need to add duplicate/junk > values to an IN list just to push it past 20? > Sample query is : > select count(*) from table where col in > (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
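The threshold decision being discussed can be sketched like this (hypothetical Python, illustrating only the choice of strategy; the names are not Drill's actual planner API):

```python
IN_LIST_THRESHOLD = 20  # hard-coded today; this JIRA asks to make it configurable

def plan_in_list(values):
    """Pick a strategy for `col IN (v1, v2, ...)`.

    The decision is made on the raw list length, which is why users pad
    queries with duplicate values just to cross the threshold.
    """
    if len(values) > IN_LIST_THRESHOLD:
        # build an in-memory hash table and probe it like a join
        return ("hash_table", set(values))
    # keep the IN list as a chain of OR'ed comparisons
    return ("or_chain", list(values))
```

Making `IN_LIST_THRESHOLD` a session or system option would let the sample query trigger the hash-table plan without the duplicate padding.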
[jira] [Updated] (DRILL-4751) Remove dumpcat script from Drill distribution
[ https://issues.apache.org/jira/browse/DRILL-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4751: - Reviewer: Abhishek Girish > Remove dumpcat script from Drill distribution > - > > Key: DRILL-4751 > URL: https://issues.apache.org/jira/browse/DRILL-4751 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.6.0 >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Minor > Fix For: 1.8.0 > > > The Drill distribution includes a "dumpcat" script in the $DRILL_HOME/bin > directory. However, no documentation exists for the tool. The only reference > on Apache Drill is from a JIRA. > This appears to be a script used by developers in years past to diagnose > customer issues. In case the tool may be useful in the future, we will leave > it in the Git source tree, but omit it from the distribution (since we will > not test or document it.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4825) Wrong data with UNION ALL when querying different sub-directories under the same table
[ https://issues.apache.org/jira/browse/DRILL-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4825: - Reviewer: Rahul Challapalli > Wrong data with UNION ALL when querying different sub-directories under the > same table > -- > > Key: DRILL-4825 > URL: https://issues.apache.org/jira/browse/DRILL-4825 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.6.0, 1.7.0, 1.8.0 >Reporter: Rahul Challapalli >Assignee: Jinfeng Ni >Priority: Critical > Fix For: 1.8.0 > > Attachments: l_3level.tgz > > > git.commit.id.abbrev=0700c6b > The below query returns wrong results > {code} > select count (*) from ( > select l_orderkey, dir0 from l_3level t1 where t1.dir0 = 1 and > t1.dir1='one' and t1.dir2 = '2015-7-12' > union all > select l_orderkey, dir0 from l_3level t2 where t2.dir0 = 1 and > t2.dir1='two' and t2.dir2 = '2015-8-12') data; > +-+ > | EXPR$0 | > +-+ > | 20 | > +-+ > {code} > The wrong result is evident from the output of the below queries > {code} > 0: jdbc:drill:zk=10.10.100.190:5181> select count (*) from (select > l_orderkey, dir0 from l_3level t2 where t2.dir0 = 1 and t2.dir1='two' and > t2.dir2 = '2015-8-12'); > +-+ > | EXPR$0 | > +-+ > | 30 | > +-+ > 1 row selected (0.258 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> select count (*) from (select > l_orderkey, dir0 from l_3level t2 where t2.dir0 = 1 and t2.dir1='one' and > t2.dir2 = '2015-7-12'); > +-+ > | EXPR$0 | > +-+ > | 10 | > +-+ > {code} > I attached the data set. Let me know if you need anything more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4759) Drill throwing array index out of bound exception when reading a parquet file written by map reduce program.
[ https://issues.apache.org/jira/browse/DRILL-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4759: - Reviewer: Chun Chang > Drill throwing array index out of bound exception when reading a parquet file > written by map reduce program. > > > Key: DRILL-4759 > URL: https://issues.apache.org/jira/browse/DRILL-4759 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Affects Versions: 1.7.0 >Reporter: Padma Penumarthy >Assignee: Padma Penumarthy > Fix For: 1.8.0 > > > An ArrayIndexOutOfBound exception is thrown while reading bigInt data type > from dictionary encoded parquet data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4819) Update MapR version to 5.2.0
[ https://issues.apache.org/jira/browse/DRILL-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4819: - Reviewer: Abhishek Girish > Update MapR version to 5.2.0 > > > Key: DRILL-4819 > URL: https://issues.apache.org/jira/browse/DRILL-4819 > Project: Apache Drill > Issue Type: New Feature > Components: Tools, Build & Test >Affects Versions: 1.8.0 >Reporter: Patrick Wong >Assignee: Patrick Wong > Fix For: 1.8.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4147) Union All operator runs in a single fragment
[ https://issues.apache.org/jira/browse/DRILL-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4147: - Reviewer: Robert Hou > Union All operator runs in a single fragment > > > Key: DRILL-4147 > URL: https://issues.apache.org/jira/browse/DRILL-4147 > Project: Apache Drill > Issue Type: Bug >Reporter: amit hadke >Assignee: Aman Sinha > Fix For: 1.8.0 > > > A user noticed that running select from a single directory is much faster > than union all on two directories. > (https://drill.apache.org/blog/2014/12/09/running-sql-queries-on-amazon-s3/#comment-2349732267) > > It seems like the UNION ALL operator doesn't parallelize sub scans (it's using > SINGLETON for distribution type). Everything runs in a single fragment. > We may have to use SubsetTransformer in UnionAllPrule. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4704) select statement behavior is inconsistent for decimal values in parquet
[ https://issues.apache.org/jira/browse/DRILL-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4704: - Reviewer: Rahul Challapalli > select statement behavior is inconsistent for decimal values in parquet > --- > > Key: DRILL-4704 > URL: https://issues.apache.org/jira/browse/DRILL-4704 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 1.6.0 > Environment: Windows 7 Pro, Java 1.8.0_91 >Reporter: Dave Oshinsky > Fix For: 1.8.0 > > > A select statement that searches a parquet file for a decimal value matching > a specific value behaves inconsistently. The query expressed most simply > finds nothing: > 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where > employee_id = 100; > +--+-+++---+---+ > | EMPLOYEE_ID | FIRST_NAME | LAST_NAME | EMAIL | PHONE_NUMBER | > HIRE_DATE | > +--+-+++---+---+ > +--+-+++---+---+ > No rows selected (0.348 seconds) > The query can be modified to find the matching row in a few ways, such as the > following (using between instead of '=', changing 100 to 100.0, or casting as > decimal: > 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where > employee_id between 100 and 100; > +--+-+++---+---+ > | EMPLOYEE_ID | FIRST_NAME | LAST_NAME | EMAIL | PHONE_NUMBER | > HIR | > +--+-+++---+---+ > | 100 | Steven | King | SKING | 515.123.4567 | > 2003-06-1 | > +--+-+++---+---+ > 1 row selected (0.226 seconds) > 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where > employee_id = 100.0; > +--+-+++---+---+ > | EMPLOYEE_ID | FIRST_NAME | LAST_NAME | EMAIL | PHONE_NUMBER | > HIR | > +--+-+++---+---+ > | 100 | Steven | King | SKING | 515.123.4567 | > 2003-06-1 | > +--+-+++---+---+ > 1 row selected (0.259 seconds) > 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where > cast(employee_id AS DECIMAL) = 100; > +--+-+++---+---+ > | EMPLOYEE_ID | FIRST_NAME | LAST_NAME | EMAIL | 
PHONE_NUMBER | > HIR | > +--+-+++---+---+ > | 100 | Steven | King | SKING | 515.123.4567 | > 2003-06-1 | > +--+-+++---+---+ > 1 row selected (0.232 seconds) > 0: jdbc:drill:zk=local> > The schema of the parquet data that is being searched is as follows: > $ java -jar parquet-tools*1.jar meta c:/archiveHR/HR.EMPLOYEES/1.parquet > file: file:/c:/archiveHR/HR.EMPLOYEES/1.parquet > creator:parquet-mr version 1.8.1 (build > 4aba4dae7bb0d4edbcf7923ae1339f28fd3f7fcf) > . > file schema:HR.EMPLOYEES > > EMPLOYEE_ID:REQUIRED FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:0 > FIRST_NAME: OPTIONAL BINARY O:UTF8 R:0 D:1 > LAST_NAME: REQUIRED BINARY O:UTF8 R:0 D:0 > EMAIL: REQUIRED BINARY O:UTF8 R:0 D:0 > PHONE_NUMBER: OPTIONAL BINARY O:UTF8 R:0 D:1 > HIRE_DATE: REQUIRED BINARY O:UTF8 R:0 D:0 > JOB_ID: REQUIRED BINARY O:UTF8 R:0 D:0 > SALARY: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1 > COMMISSION_PCT: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1 > MANAGER_ID: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1 > DEPARTMENT_ID: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1 > row group 1:RC:107 TS:9943 OFFSET:4 > > EMPLOYEE_ID: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:4 SZ:360/355/0.99 > VC:107 ENC:PLAIN,BIT_PACKED > FIRST_NAME: BINARY SNAPPY DO:0 FPO:364 SZ:902/1058/1.17 VC:107 > ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED > LAST_NAME: BINARY SNAPPY DO:0 FPO:1266 SZ:913//1.22 VC:107 > ENC:PLAIN,BIT_PACKED > EMAIL: BINARY SNAPPY DO:0 FPO:2179 SZ:977/1184/1.21 VC:107 > ENC:PLAIN,BIT_PACKED > PHONE_NUMBER:BINARY SNAPPY DO:0 FPO:3156 SZ:750/1987/2.65 VC:107 > ENC:PLAIN,RLE,BIT_PACKED > HIRE_DATE: BINARY SNAPPY DO:0
[jira] [Updated] (DRILL-4766) FragmentExecutor should use EventProcessor and avoid blocking rpc threads
[ https://issues.apache.org/jira/browse/DRILL-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4766: - Reviewer: Rahul Challapalli > FragmentExecutor should use EventProcessor and avoid blocking rpc threads > - > > Key: DRILL-4766 > URL: https://issues.apache.org/jira/browse/DRILL-4766 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Flow >Affects Versions: 1.7.0 >Reporter: Deneche A. Hakim >Assignee: Sudheesh Katkam >Priority: Minor > Fix For: 1.8.0 > > > Currently, an rpc thread can block when trying to deliver a cancel or early > termination message to a blocked fragment executor. > Foreman already uses an EventProcessor to avoid such scenarios. > FragmentExecutor could be improved to avoid blocking rpc threads as well -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4822) Extend distrib-env.sh search to consider site directory
[ https://issues.apache.org/jira/browse/DRILL-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4822: - Reviewer: Abhishek Girish > Extend distrib-env.sh search to consider site directory > --- > > Key: DRILL-4822 > URL: https://issues.apache.org/jira/browse/DRILL-4822 > Project: Apache Drill > Issue Type: Improvement >Reporter: Paul Rogers >Priority: Minor > Fix For: 1.8.0 > > > DRILL-4581 provided revisions to the Drill launch scripts. As part of that > fix, we introduced a new distrib-env.sh file to hold settings created by > custom Drill installers (that is, by custom distributions.) The original > version of this feature looks for distrib-env.sh only in $DRILL_HOME/env. > Experience suggests that installers will write site-specific values to > distrib-env.sh and so the file must then be copied to $DRILL_SITE when > running under YARN. Add $DRILL_SITE to the search path in drill-config.sh for > distrib-env.sh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4836) ZK Issue during Drillbit startup, possibly due to race condition
[ https://issues.apache.org/jira/browse/DRILL-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4836: - Reviewer: Abhishek Girish > ZK Issue during Drillbit startup, possibly due to race condition > > > Key: DRILL-4836 > URL: https://issues.apache.org/jira/browse/DRILL-4836 > Project: Apache Drill > Issue Type: Bug > Components: Server >Reporter: Abhishek Girish >Assignee: Paul Rogers > Fix For: 1.8.0 > > > During a parallel launch of Drillbits on a 4 node cluster, I hit this issue > during startup: > {code} > Exception in thread "main" > org.apache.drill.exec.exception.DrillbitStartupException: Failure during > initial startup of Drillbit. > at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:284) > at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:261) > at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:257) > Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: unable > to put > at > org.apache.drill.exec.coord.zk.ZookeeperClient.put(ZookeeperClient.java:196) > at > org.apache.drill.exec.store.sys.store.ZookeeperPersistentStore.putIfAbsent(ZookeeperPersistentStore.java:94) > ... > at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:113) > at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:281) > ... 2 more > Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: > KeeperErrorCode = NodeExists for /drill/sys.storage_plugins/dfs > at > org.apache.drill.exec.coord.zk.ZookeeperClient.put(ZookeeperClient.java:191) > ... 7 more > {code} > And similarly, > {code} > Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: > KeeperErrorCode = NodeExists for /drill/sys.storage_plugins/kudu > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
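One way to make bootstrap-time store writes tolerant of this race is to treat "node already exists" as success: whichever Drillbit wins the create, the others observe the node and continue. The sketch below is a hypothetical stand-in, not Drill's actual `ZookeeperClient` — it models the ZooKeeper namespace with a `ConcurrentHashMap` so the idempotent-put pattern can be shown without a live ZK ensemble:

```java
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for a ZooKeeper namespace: putIfAbsent plays the role of a
// create() call that may lose the race to another Drillbit.
class IdempotentPut {
    static final ConcurrentHashMap<String, byte[]> zk = new ConcurrentHashMap<>();

    // Returns true if this caller created the node, false if it already
    // existed. Either outcome is success for startup initialization, so a
    // NodeExists-style condition never propagates as a startup failure.
    static boolean ensureNode(String path, byte[] data) {
        return zk.putIfAbsent(path, data) == null;
    }

    public static void main(String[] args) {
        boolean first = ensureNode("/drill/sys.storage_plugins/dfs", new byte[]{1});
        boolean second = ensureNode("/drill/sys.storage_plugins/dfs", new byte[]{2});
        System.out.println(first + " " + second);
    }
}
```

With a real ZooKeeper client the equivalent move is catching the NodeExists error from the create call and returning normally, rather than wrapping it into a startup exception as the stack trace above shows.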
[jira] [Updated] (DRILL-4623) Disable Epoll by Default
[ https://issues.apache.org/jira/browse/DRILL-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4623: - Reviewer: Dechang Gu > Disable Epoll by Default > > > Key: DRILL-4623 > URL: https://issues.apache.org/jira/browse/DRILL-4623 > Project: Apache Drill > Issue Type: Bug >Reporter: Sudheesh Katkam >Assignee: Sudheesh Katkam > Fix For: 1.8.0 > > > At higher concurrency (and/or spuriously), we hit [netty issue > #3539|https://github.com/netty/netty/issues/3539]. This is an issue with the > version of Netty Drill currently uses. Once Drill moves to a later version of > Netty, epoll should be reenabled by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3726) Drill is not properly interpreting CRLF (0d0a). CR gets read as content.
[ https://issues.apache.org/jira/browse/DRILL-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-3726: - Reviewer: Krystal > Drill is not properly interpreting CRLF (0d0a). CR gets read as content. > > > Key: DRILL-3726 > URL: https://issues.apache.org/jira/browse/DRILL-3726 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Text & CSV >Affects Versions: 1.1.0 > Environment: Linux RHEL 6.6, OSX 10.9 >Reporter: Edmon Begoli >Assignee: Arina Ielchiieva > Fix For: 1.8.0 > > Original Estimate: 120h > Remaining Estimate: 120h > > When we query the last attribute of a text file, we get missing characters. > Looking at the row through Drill, a \r is included at the end of the last > attribute. > Looking in a text editor, it's not embedded into that attribute. > I'm thinking that Drill is not interpreting CRLF (0d0a) as a new line, only > the LF, resulting in the CR becoming part of the last attribute. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
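The symptom above — a `\r` clinging to the last attribute — is what happens when a reader splits on `\n` alone. A minimal sketch of the fix (a hypothetical helper, not Drill's actual text reader): after splitting on LF, strip a trailing CR so CRLF files parse the same as LF files.

```java
// Sketch: when lines are split on '\n' only, a CRLF file leaves '\r' on the
// end of every line, which then leaks into the last parsed field.
class CrlfStrip {
    static String stripTrailingCr(String line) {
        return line.endsWith("\r")
                ? line.substring(0, line.length() - 1)
                : line;
    }

    public static void main(String[] args) {
        String raw = "a,b,c\r";  // last attribute would otherwise carry the CR
        System.out.println("[" + stripTrailingCr(raw) + "]");
    }
}
```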
[jira] [Updated] (DRILL-4658) cannot specify tab as a fieldDelimiter in table function
[ https://issues.apache.org/jira/browse/DRILL-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4658: - Reviewer: Krystal > cannot specify tab as a fieldDelimiter in table function > > > Key: DRILL-4658 > URL: https://issues.apache.org/jira/browse/DRILL-4658 > Project: Apache Drill > Issue Type: Bug > Components: SQL Parser >Affects Versions: 1.6.0 > Environment: Mac OS X, Java 8 >Reporter: Vince Gonzalez >Assignee: Arina Ielchiieva > Fix For: 1.8.0 > > > I can't specify a tab delimiter in the table function because it maybe counts > the characters rather than trying to interpret as a character escape code? > {code} > 0: jdbc:drill:zk=local> select columns[0] as a, cast(columns[1] as bigint) as > b from table(dfs.tmp.`sample_cast.tsv`(type => 'text', fieldDelimiter => > '\t', skipFirstLine => true)); > Error: PARSE ERROR: Expected single character but was String: \t > table sample_cast.tsv > parameter fieldDelimiter > SQL Query null > [Error Id: 3efa82e1-3810-4d4a-b23c-32d6658dffcf on 172.30.1.144:31010] > (state=,code=0) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
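The reporter's guess is right: the parameter is length-checked as a raw string, so the two characters `\` and `t` fail the single-character rule before any escape interpretation happens. A minimal sketch of the fix (hypothetical helper, not Drill's actual parameter handling): translate a small set of escape sequences first, then enforce the single-character rule.

```java
// Sketch: interpret common escape sequences before enforcing the
// single-character rule, so the two-character string "\t" maps to a real tab.
class DelimiterUnescape {
    static char toDelimiter(String s) {
        if ("\\t".equals(s)) return '\t';
        if ("\\n".equals(s)) return '\n';
        if (s.length() == 1) return s.charAt(0);
        throw new IllegalArgumentException(
                "Expected single character but was String: " + s);
    }

    public static void main(String[] args) {
        System.out.println((int) toDelimiter("\\t"));
    }
}
```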
[jira] [Updated] (DRILL-4175) IOBE may occur in Calcite RexProgramBuilder when queries are submitted concurrently
[ https://issues.apache.org/jira/browse/DRILL-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4175: - Reviewer: Rahul Challapalli > IOBE may occur in Calcite RexProgramBuilder when queries are submitted > concurrently > --- > > Key: DRILL-4175 > URL: https://issues.apache.org/jira/browse/DRILL-4175 > Project: Apache Drill > Issue Type: Bug > Environment: distribution >Reporter: huntersjm > Fix For: 1.8.0 > > > I queried a SQL statement like `select v from table limit 1` and got an error: > org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: > IndexOutOfBoundsException: Index: 68, Size: 67 > After debugging, I found a bug in Calcite's parsing: > first, look at line 72 in org.apache.calcite.rex.RexProgramBuilder > {noformat} >registerInternal(RexInputRef.of(i, fields), false); > {noformat} > there we get a RexInputRef from RexInputRef.of, which has a method named > createName(int index); here NAMES is a SelfPopulatingList. > SelfPopulatingList is described as a thread-safe list, but in fact it is > thread-unsafe: when NAMES.get(index) is called concurrently, it produces errors. > We expect SelfPopulatingList to contain {$0 $1 $2 ... $n}, but when called > concurrently it may become {$0,$1...$29,$30...$59,$30,$31...$59...}. 
> We see method registerInternal > {noformat} > private RexLocalRef registerInternal(RexNode expr, boolean force) { > expr = simplify(expr); > RexLocalRef ref; > final Pairkey; > if (expr instanceof RexLocalRef) { > key = null; > ref = (RexLocalRef) expr; > } else { > key = RexUtil.makeKey(expr); > ref = exprMap.get(key); > } > if (ref == null) { > if (validating) { > validate( > expr, > exprList.size()); > } > {noformat} > Here makeKey(expr) hope to get different key, however it get same key, so > addExpr(expr) called less, in this method > {noformat} > RexLocalRef ref; > final int index = exprList.size(); > exprList.add(expr); > ref = > new RexLocalRef( > index, > expr.getType()); > localRefList.add(ref); > return ref; > {noformat} > localRefList get error size, so in line 939, > {noformat} > final RexLocalRef ref = localRefList.get(index); > {noformat} > throw IndexOutOfBoundsException > bugfix: > We can't change origin code of calcite before they fix this bug, so we can > init NAMEs in RexLocalRef on start. Just add > {noformat} > RexInputRef.createName(2048); > {noformat} > on Bootstrap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
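The race the reporter describes — two threads lazily extending the name list at the same time and duplicating a whole range of names — can be avoided by growing the list under a lock so each index is appended exactly once. The sketch below is a hypothetical stand-in, not Calcite's actual SelfPopulatingList:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a lazily-growing "$0, $1, $2, ..." name list whose growth is
// serialized, so concurrent get() calls can never append the same index
// range twice (the duplicated "$30,$31...$59" pattern described above).
class SelfPopulatingNames {
    private final List<String> names = new ArrayList<>();

    synchronized String get(int index) {
        while (names.size() <= index) {
            names.add("$" + names.size());
        }
        return names.get(index);
    }

    public static void main(String[] args) {
        SelfPopulatingNames n = new SelfPopulatingNames();
        System.out.println(n.get(3));
    }
}
```

The workaround proposed in the report — pre-populating the list with `RexInputRef.createName(2048)` at bootstrap — sidesteps the race by ensuring the growth path is never exercised concurrently, at the cost of a fixed upper bound.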
[jira] [Updated] (DRILL-4673) Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on command return
[ https://issues.apache.org/jira/browse/DRILL-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4673: - Reviewer: Chun Chang > Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on > command return > - > > Key: DRILL-4673 > URL: https://issues.apache.org/jira/browse/DRILL-4673 > Project: Apache Drill > Issue Type: New Feature > Components: Functions - Drill >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Minor > Labels: doc-impacting, drill > Fix For: 1.8.0 > > > Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on > command "DROP TABLE" return if table doesn't exist. > The same for "DROP VIEW IF EXISTS" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4783) Flatten on CONVERT_FROM fails with ClassCastException if resultset is empty
[ https://issues.apache.org/jira/browse/DRILL-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4783: - Reviewer: Rahul Challapalli > Flatten on CONVERT_FROM fails with ClassCastException if resultset is empty > --- > > Key: DRILL-4783 > URL: https://issues.apache.org/jira/browse/DRILL-4783 > Project: Apache Drill > Issue Type: Bug >Reporter: Chunhui Shi >Assignee: Chunhui Shi >Priority: Critical > Fix For: 1.8.0 > > > Flatten failed to work on top of convert_from when the resultset is empty. > For a HBase table like this: > 0: jdbc:drill:zk=localhost:5181> select convert_from(t.address.cities,'json') > from hbase.`/tmp/flattentest` t; > +--+ > | EXPR$0 > | > +--+ > | {"list":[{"city":"SunnyVale"},{"city":"Palo Alto"},{"city":"Mountain > View"}]}| > | {"list":[{"city":"Seattle"},{"city":"Bellevue"},{"city":"Renton"}]} > | > | {"list":[{"city":"Minneapolis"},{"city":"Falcon Heights"},{"city":"San > Paul"}]} | > +--+ > Flatten works when row_key is in (1,2,3) > 0: jdbc:drill:zk=localhost:5181> select flatten(t1.json.list) from (select > convert_from(t.address.cities,'json') json from hbase.`/tmp/flattentest` t > where row_key=1) t1; > +---+ > | EXPR$0 | > +---+ > | {"city":"SunnyVale"} | > | {"city":"Palo Alto"} | > | {"city":"Mountain View"} | > +---+ > But Flatten throws exception if the resultset is empty > 0: jdbc:drill:zk=localhost:5181> select flatten(t1.json.list) from (select > convert_from(t.address.cities,'json') json from hbase.`/tmp/flattentest` t > where row_key=4) t1; > Error: SYSTEM ERROR: ClassCastException: Cannot cast > org.apache.drill.exec.vector.NullableIntVector to > org.apache.drill.exec.vector.complex.RepeatedValueVector > Fragment 0:0 > [Error Id: 07fd0cab-d1e6-4259-bfec-ad80f02d93a2 on atsqa4-127.qa.lab:31010] > (state=,code=0) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4664) ScanBatch.isNewSchema() returns wrong result for map datatype
[ https://issues.apache.org/jira/browse/DRILL-4664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4664: - Reviewer: Chun Chang > ScanBatch.isNewSchema() returns wrong result for map datatype > - > > Key: DRILL-4664 > URL: https://issues.apache.org/jira/browse/DRILL-4664 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.6.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka >Priority: Minor > Fix For: 1.8.0 > > > isNewSchema() method checks if top-level schema or any of the deeper map > schemas has changed. The last one doesn't work properly with count function. > "deeperSchemaChanged" equals true even when two map strings have the same > children fields. > Discovered while trying to fix [DRILL-2385|DRILL-2385]. > Dataset test.json for reproducing (MAP datatype object): > {code}{"oooi":{"oa":{"oab":{"oabc":1{code} > Example of query: > {code}select count(t.oooi) from dfs.tmp.`test.json` t{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4794) Regression: Wrong result for query with disjunctive partition filters
[ https://issues.apache.org/jira/browse/DRILL-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4794: - Reviewer: Rahul Challapalli > Regression: Wrong result for query with disjunctive partition filters > - > > Key: DRILL-4794 > URL: https://issues.apache.org/jira/browse/DRILL-4794 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.7.0 >Reporter: Aman Sinha >Assignee: Aman Sinha > Fix For: 1.8.0 > > > For a query that contains certain types of disjunctive filter conditions such > as 'dir0=x OR dir1=y' we get wrong result when metadata caching is used. > This is a regression due to DRILL-4530. > Note that the filter involves OR of 2 different directory levels. For the > normal case of OR condition at the same level the problem does not occur. > Correct result (without metadata cache) > {noformat} > 0: jdbc:drill:zk=local> select count(*) from dfs.`orders` where dir0=1994 or > dir1='Q3' ; > +-+ > | EXPR$0 | > +-+ > | 60 | > +-+ > {noformat} > Wrong result (with metadata cache): > {noformat} > 0: jdbc:drill:zk=local> select count(*) from dfs.`orders` where dir0=1994 or > dir1='Q3' ; > +-+ > | EXPR$0 | > +-+ > | 50 | > +-+ > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4744) Fully Qualified JDBC Plugin Tables return Table not Found via Rest API
[ https://issues.apache.org/jira/browse/DRILL-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4744: - Reviewer: Chun Chang > Fully Qualified JDBC Plugin Tables return Table not Found via Rest API > -- > > Key: DRILL-4744 > URL: https://issues.apache.org/jira/browse/DRILL-4744 > Project: Apache Drill > Issue Type: Bug > Components: Storage - JDBC >Affects Versions: 1.6.0 >Reporter: John Omernik >Assignee: Roman Lozovyk >Priority: Minor > Fix For: 1.7.0 > > > When trying to query a JDBC table via authenticated Rest API, using a fully > qualified table name returns table not found. This does not occur in > sqlline, and a workaround is to "use pluginname.mysqldatabase" prior to the > query. (Then the fully qualified table name will work) > Plugin Name: mysql > Mysql Database: events > Mysql Table: curevents > Via Rest: > select * from mysql.events.curevents limit 10; > Fail with "VALIDATION ERROR "Table 'mysql.events.curevents' not found > Via Rest: > use mysql.events; > select * from mysql.events.curevents limit 10; > - Success. > Via SQL line, authenticating with the same username, you can connect, and run > select * from mysql.events.curevents limit 10; > without issue. (and without the use mysql.events) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4743) HashJoin's not fully parallelized in query plan
[ https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4743: - Reviewer: Robert Hou > HashJoin's not fully parallelized in query plan > --- > > Key: DRILL-4743 > URL: https://issues.apache.org/jira/browse/DRILL-4743 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.5.0 >Reporter: Gautam Kumar Parai >Assignee: Gautam Kumar Parai > Labels: doc-impacting > Fix For: 1.8.0 > > > The underlying problem is filter selectivity under-estimate for a query with > complicated predicates e.g. deeply nested and/or predicates. This leads to > under parallelization of the major fragment doing the join. > To really resolve this problem we need table/column statistics to correctly > estimate the selectivity. However, in the absence of statistics OR even when > existing statistics are insufficient to get a correct estimate of selectivity > this will serve as a workaround. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4746) Verification Failures (Decimal values) in drill's regression tests
[ https://issues.apache.org/jira/browse/DRILL-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4746: - Reviewer: Khurram Faraaz > Verification Failures (Decimal values) in drill's regression tests > -- > > Key: DRILL-4746 > URL: https://issues.apache.org/jira/browse/DRILL-4746 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types, Storage - Text & CSV >Affects Versions: 1.7.0 >Reporter: Rahul Challapalli >Assignee: Arina Ielchiieva >Priority: Critical > Fix For: 1.8.0 > > > We started seeing the below 4 functional test failures in drill's extended > tests [1]. The data for the below tests can be downloaded from [2] > {code} > framework/resources/Functional/aggregates/tpcds_variants/text/aggregate28.q > framework/resources/Functional/tpcds/impala/text/q43.q > framework/resources/Functional/tpcds/variants/text/q6_1.sql > framework/resources/Functional/aggregates/tpcds_variants/text/aggregate29.q > {code} > The failures started showing up from the commit [3] > [1] https://github.com/mapr/drill-test-framework > [2] http://apache-drill.s3.amazonaws.com/files/tpcds-sf1-text.tgz > [3] > https://github.com/apache/drill/commit/223507b76ff6c2227e667ae4a53f743c92edd295 > Let me know if more information is needed to reproduce this issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4530) Improve metadata cache performance for queries with single partition
[ https://issues.apache.org/jira/browse/DRILL-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4530: - Reviewer: Rahul Challapalli > Improve metadata cache performance for queries with single partition > - > > Key: DRILL-4530 > URL: https://issues.apache.org/jira/browse/DRILL-4530 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.6.0 >Reporter: Aman Sinha >Assignee: Aman Sinha > Fix For: 1.8.0 > > > Consider two types of queries which are run with Parquet metadata caching: > {noformat} > query 1: > SELECT col FROM `A/B/C`; > query 2: > SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 = 'C'; > {noformat} > For a certain dataset, the query1 elapsed time is 1 sec whereas query2 > elapsed time is 9 sec even though both are accessing the same amount of data. > The user expectation is that they should perform roughly the same. The main > difference comes from reading the bigger metadata cache file at the root > level 'A' for query2 and then applying the partitioning filter. query1 reads > a much smaller metadata cache file at the subdirectory level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2330) Add support for nested aggregate expressions for window aggregates
[ https://issues.apache.org/jira/browse/DRILL-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-2330: - Reviewer: Khurram Faraaz > Add support for nested aggregate expressions for window aggregates > -- > > Key: DRILL-2330 > URL: https://issues.apache.org/jira/browse/DRILL-2330 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 0.8.0 >Reporter: Abhishek Girish >Assignee: Gautam Kumar Parai > Fix For: 1.8.0 > > Attachments: drillbit.log > > > Aggregate expressions currently cannot be nested. > *The following query fails to validate:* > {code:sql} > select avg(sum(i_item_sk)) from item; > {code} > Error: > Query failed: SqlValidatorException: Aggregate expressions cannot be nested > Log attached. > Reference: TPCDS queries (20, 63, 98, ...) fail to execute. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4199) Add Support for HBase 1.X
[ https://issues.apache.org/jira/browse/DRILL-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4199: - Reviewer: Abhishek Girish (was: Jacques Nadeau) > Add Support for HBase 1.X > - > > Key: DRILL-4199 > URL: https://issues.apache.org/jira/browse/DRILL-4199 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - HBase >Affects Versions: 1.7.0 >Reporter: Divjot singh >Assignee: Aditya Kishore > Fix For: 1.7.0 > > > Is there any Road map to upgrade the Hbase version to 1.x series. Currently > drill supports Hbase 0.98 version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2593) 500 error when crc for a query profile is out of sync
[ https://issues.apache.org/jira/browse/DRILL-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-2593: - Reviewer: Krystal > 500 error when crc for a query profile is out of sync > - > > Key: DRILL-2593 > URL: https://issues.apache.org/jira/browse/DRILL-2593 > Project: Apache Drill > Issue Type: Bug > Components: Client - HTTP >Affects Versions: 0.7.0 >Reporter: Jason Altekruse >Assignee: Arina Ielchiieva > Fix For: 1.7.0 > > Attachments: warning1.JPG, warning2.JPG > > > To reproduce, on a machine where an embedded drillbit has been run, edit one > of the profiles stored in /tmp/drill/profiles and try to navigate to the > profiles page on the Web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4716) status.json doesn't work in drill ui
[ https://issues.apache.org/jira/browse/DRILL-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4716: - Reviewer: Krystal > status.json doesn't work in drill ui > > > Key: DRILL-4716 > URL: https://issues.apache.org/jira/browse/DRILL-4716 > Project: Apache Drill > Issue Type: Bug > Components: Client - HTTP >Affects Versions: 1.6.0 >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva >Priority: Minor > Fix For: 1.7.0 > > > 1. http://localhost:8047/status returns "Running!" > But http://localhost:8047/status.json gives error. > {code} > { > "errorMessage" : "HTTP 404 Not Found" > } > {code} > 2. Remove link to System Options on page http://localhost:8047/status as > redundant. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2385) count on complex objects failed with missing function implementation
[ https://issues.apache.org/jira/browse/DRILL-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-2385: - Reviewer: Chun Chang > count on complex objects failed with missing function implementation > > > Key: DRILL-2385 > URL: https://issues.apache.org/jira/browse/DRILL-2385 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill >Affects Versions: 0.8.0 >Reporter: Chun Chang >Assignee: Vitalii Diravka >Priority: Minor > Fix For: 1.7.0 > > > #Wed Mar 04 01:23:42 EST 2015 > git.commit.id.abbrev=71b6bfe > Have a complex type looks like the following: > {code} > 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.sia from > `complex.json` t limit 1; > ++ > |sia | > ++ > | [1,11,101,1001] | > ++ > {code} > A count on the complex type will fail with missing function implementation: > {code} > 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.gbyi, count(t.sia) > countsia from `complex.json` t group by t.gbyi; > Query failed: RemoteRpcException: Failure while running fragment., Schema is > currently null. You must call buildSchema(SelectionVectorMode) before this > container can return a schema. [ 12856530-3133-45be-bdf4-ef8cc784f7b3 on > qa-node119.qa.lab:31010 ] > [ 12856530-3133-45be-bdf4-ef8cc784f7b3 on qa-node119.qa.lab:31010 ] > Error: exception while executing query: Failure while executing query. > (state=,code=0) > {code} > drillbit.log > {code} > 2015-03-04 13:44:51,383 [2b08832b-9247-e90c-785d-751f02fc1548:frag:2:0] ERROR > o.a.drill.exec.ops.FragmentContext - Fragment Context received failure. > org.apache.drill.exec.exception.SchemaChangeException: Failure while > materializing expression. > Error in expression at index 0. Error: Missing function implementation: > [count(BIGINT-REPEATED)]. Full expression: null. 
> at > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregatorInternal(HashAggBatch.java:210) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregator(HashAggBatch.java:158) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema(HashAggBatch.java:101) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:130) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:67) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:114) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:57) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:121) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303) > [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_45] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > [na:1.7.0_45] > at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45] > 2015-03-04 13:44:51,383 [2b08832b-9247-e90c-785d-751f02fc1548:frag:2:0] WARN > 
o.a.d.e.w.fragment.FragmentExecutor - Error while initializing or executing > fragment > java.lang.NullPointerException: Schema is currently null. You must call > buildSchema(SelectionVectorMode) before this container can return a schema. > at > com.google.common.base.Preconditions.checkNotNull(Preconditions.java:208) > ~[guava-14.0.1.jar:na] > at > org.apache.drill.exec.record.VectorContainer.getSchema(VectorContainer.java:261) > ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at > org.apache.drill.exec.record.AbstractRecordBatch.getSchema(AbstractRecordBatch.java:155) > ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT] > at >
[jira] [Updated] (DRILL-3623) Limit 0 should avoid execution when querying a known schema
[ https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-3623: - Reviewer: Krystal > Limit 0 should avoid execution when querying a known schema > --- > > Key: DRILL-3623 > URL: https://issues.apache.org/jira/browse/DRILL-3623 > Project: Apache Drill > Issue Type: Sub-task > Components: Storage - Hive >Affects Versions: 1.1.0 > Environment: MapR cluster >Reporter: Andries Engelbrecht >Assignee: Sudheesh Katkam > Labels: doc-impacting > Fix For: 1.7.0 > > > Running a select * from hive.table limit 0 does not return (hangs). > Select * from hive.table limit 1 works fine > Hive table is about 6GB with 330 files with parquet using snappy compression. > Data types are int, bigint, string and double. > Querying directory with parquet files through the DFS plugin works fine > select * from dfs.root.`/user/hive/warehouse/database/table` limit 0; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3474) Add implicit file columns support
[ https://issues.apache.org/jira/browse/DRILL-3474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-3474: - Reviewer: Krystal > Add implicit file columns support > - > > Key: DRILL-3474 > URL: https://issues.apache.org/jira/browse/DRILL-3474 > Project: Apache Drill > Issue Type: New Feature > Components: Metadata >Affects Versions: 1.1.0 >Reporter: Jim Scott >Assignee: Arina Ielchiieva > Labels: doc-impacting > Fix For: 1.7.0 > > > I could not find another ticket which talks about this ... > The file name should be a column which can be selected or filtered when > querying a directory just like dir0, dir1 are available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4707) Conflicting columns names under case-insensitive policy lead to either memory leak or incorrect result
[ https://issues.apache.org/jira/browse/DRILL-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4707: - Reviewer: Chun Chang > Conflicting columns names under case-insensitive policy lead to either memory > leak or incorrect result > -- > > Key: DRILL-4707 > URL: https://issues.apache.org/jira/browse/DRILL-4707 > Project: Apache Drill > Issue Type: Bug >Reporter: Jinfeng Ni >Assignee: Jinfeng Ni >Priority: Critical > Fix For: 1.8.0 > > > On latest master branch: > {code} > select version, commit_id, commit_message from sys.version; > +-+---+-+ > | version | commit_id | > commit_message | > +-+---+-+ > | 1.7.0-SNAPSHOT | 3186217e5abe3c6c2c7e504cdb695567ff577e4c | DRILL-4607: > Add a split function that allows to separate string by a delimiter | > +-+---+-+ > {code} > If a query has two conflicting column names under case-insensitive policy, > Drill will either hit memory leak, or incorrect issue. > Q1. > {code} > select r_regionkey as XYZ, r_name as xyz FROM cp.`tpch/region.parquet`; > Error: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. > Memory leaked: (131072) > Allocator(op:0:0:1:Project) 100/131072/2490368/100 > (res/actual/peak/limit) > Fragment 0:0 > {code} > Q2: return only one column in the result. > {code} > select n_nationkey as XYZ, n_regionkey as xyz FROM cp.`tpch/nation.parquet`; > +--+ > | XYZ | > +--+ > | 0| > | 1| > | 1| > | 1| > | 4| > | 0| > | 3| > {code} > The cause of the problem seems to be that the Project thinks the two incoming > columns as identical (since Drill adopts case-insensitive for column names in > execution). > The planner should make sure that the conflicting columns are resolved, since > execution is name-based. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
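Since execution is name-based and case-insensitive, the planner has to notice that `XYZ` and `xyz` are the same output column before the Project is built. A minimal sketch of such a check (hypothetical helper, not Drill's planner code), comparing names the way the executor will see them:

```java
import java.util.Set;
import java.util.TreeSet;

// Sketch: detect output-column collisions under the executor's
// case-insensitive name policy at planning time.
class ColumnConflicts {
    static boolean hasConflict(String... columns) {
        Set<String> seen = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        for (String c : columns) {
            if (!seen.add(c)) {
                return true;  // e.g. "XYZ" and "xyz" collide
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasConflict("XYZ", "xyz"));
    }
}
```

A planner that detects the collision can either reject the query with a clear validation error or rename one alias, instead of letting the Project treat the two incoming columns as identical.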
[jira] [Updated] (DRILL-3559) Make filename available to sql statments just like dirN
[ https://issues.apache.org/jira/browse/DRILL-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-3559: - Reviewer: Krystal > Make filename available to sql statments just like dirN > --- > > Key: DRILL-3559 > URL: https://issues.apache.org/jira/browse/DRILL-3559 > Project: Apache Drill > Issue Type: Improvement > Components: SQL Parser >Affects Versions: 1.1.0 >Reporter: Stefán Baxter >Assignee: Arina Ielchiieva >Priority: Minor > Labels: doc-impacting > Fix For: 1.7.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4693) Incorrect column ordering when CONVERT_FROM() json is used
[ https://issues.apache.org/jira/browse/DRILL-4693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4693: - Reviewer: Chun Chang > Incorrect column ordering when CONVERT_FROM() json is used > --- > > Key: DRILL-4693 > URL: https://issues.apache.org/jira/browse/DRILL-4693 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators, Query Planning & > Optimization >Affects Versions: 1.6.0 >Reporter: Aman Sinha >Assignee: Aman Sinha > Fix For: 1.7.0 > > > For the following query, the column order in the results is wrong..it should > be col1, col2, col3. > {noformat} > 0: jdbc:drill:zk=local> select 'abc' as col1, convert_from('{"x" : "y"}', > 'json') as col2, 'xyz' as col3 from cp.`tpch/region.parquet`; > +---+---++ > | col1 | col3 |col2| > +---+---++ > | abc | xyz | {"x":"y"} | > | abc | xyz | {"x":"y"} | > | abc | xyz | {"x":"y"} | > | abc | xyz | {"x":"y"} | > | abc | xyz | {"x":"y"} | > +---+---++ > {noformat} > The EXPLAIN plan: > {noformat} > 0: jdbc:drill:zk=local> explain plan for select 'abc' as col1, > convert_from('{"x" : "y"}', 'json') as col2, 'xyz' as col3 from > cp.`tpch/region.parquet`; > +--+--+ > | text | json | > +--+--+ > | 00-00Screen > 00-01 Project(col1=['abc'], col2=[CONVERT_FROMJSON('{"x" : "y"}')], > col3=['xyz']) > 00-02Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath > [path=classpath:/tpch/region.parquet]], > selectionRoot=classpath:/tpch/region.parquet, numFiles=1, > usedMetadataFile=false, columns=[]]]) > {noformat} > This happens on current master branch as well as 1.6.0 and even earlier (I > checked 1.4.0 as well which also has the same behavior). So it is a > pre-existing bug. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4733) max(dir0) reading more columns than necessary
[ https://issues.apache.org/jira/browse/DRILL-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4733: - Reviewer: Chun Chang > max(dir0) reading more columns than necessary > - > > Key: DRILL-4733 > URL: https://issues.apache.org/jira/browse/DRILL-4733 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization, Storage - Parquet >Affects Versions: 1.7.0 >Reporter: Rahul Challapalli >Assignee: Arina Ielchiieva >Priority: Critical > Fix For: 1.7.0 > > Attachments: bug.tgz > > > The query below started to fail with this commit: > 3209886a8548eea4a2f74c059542672f8665b8d2 > {code} > select max(dir0) from dfs.`/drill/testdata/bug/2016`; > Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support > schema changes > Fragment 0:0 > [Error Id: b0060205-e9a6-428a-9803-7b4312b2c6f4 on qa-node190.qa.lab:31010] > (state=,code=0) > {code} > The sub-folders contain files which have a schema change for one column > "contributions" (int32 vs double). However, prior to this commit we did not fail in > this scenario. Log files and test data are attached -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4701) Fix log name and missing lines in logs on Web UI
[ https://issues.apache.org/jira/browse/DRILL-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4701: - Reviewer: Krystal > Fix log name and missing lines in logs on Web UI > > > Key: DRILL-4701 > URL: https://issues.apache.org/jira/browse/DRILL-4701 > Project: Apache Drill > Issue Type: Bug >Reporter: Arina Ielchiieva >Assignee: Arina Ielchiieva > Fix For: 1.7.0 > > > 1. When the log files are downloaded from the ui, the name of the downloaded > file is "download". We should save the file with the same name as the log > file (ie. drillbit.log) > 2. The last N lines of the log file displayed in the web UI do not match the > log file itself. Some lines are missing compared with actual log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4588) Enable JMXReporter to Expose Metrics
[ https://issues.apache.org/jira/browse/DRILL-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4588: - Reviewer: Krystal > Enable JMXReporter to Expose Metrics > > > Key: DRILL-4588 > URL: https://issues.apache.org/jira/browse/DRILL-4588 > Project: Apache Drill > Issue Type: Bug >Reporter: Sudheesh Katkam >Assignee: Sudheesh Katkam > Fix For: 1.7.0 > > > -There is a static initialization order issue that needs to be fixed.- > The code is commented out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4679) CONVERT_FROM() json format fails if 0 rows are received from upstream operator
[ https://issues.apache.org/jira/browse/DRILL-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4679: - Reviewer: Chun Chang > CONVERT_FROM() json format fails if 0 rows are received from upstream > operator > --- > > Key: DRILL-4679 > URL: https://issues.apache.org/jira/browse/DRILL-4679 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.6.0 >Reporter: Aman Sinha >Assignee: Aman Sinha > Fix For: 1.7.0 > > > CONVERT_FROM() json format fails as below if the underlying Filter produces 0 > rows: > {noformat} > 0: jdbc:drill:zk=local> select convert_from('{"abc":"xyz"}', 'json') as x > from cp.`tpch/region.parquet` where r_regionkey = ; > Error: SYSTEM ERROR: IllegalStateException: next() returned NONE without > first returning OK_NEW_SCHEMA [#16, ProjectRecordBatch] > Fragment 0:0 > {noformat} > If the conversion is applied as UTF8 format, the same query succeeds: > {noformat} > 0: jdbc:drill:zk=local> select convert_from('{"abc":"xyz"}', 'utf8') as x > from cp.`tpch/region.parquet` where r_regionkey = ; > ++ > | x | > ++ > ++ > No rows selected (0.241 seconds) > {noformat} > The reason for this is the special handling in the ProjectRecordBatch for > JSON. The output schema is not known for this until the run time and the > ComplexWriter in the Project relies on seeing the input data to determine the > output schema - this could be a MapVector or ListVector etc. > If the input data has 0 rows due to a filter condition, we should at least > produce a default output schema, e.g an empty MapVector ? Need to decide a > good default. Note that the CONVERT_FROM(x, 'json') could occur on 2 > branches of a UNION-ALL and if one input is empty while the other side is > not, it may still cause incompatibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
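The runtime-schema problem in DRILL-4679 can be modeled with a small sketch (hypothetical Python, not Drill's ComplexWriter): the output schema is only discovered from the first parsed row, so an empty input leaves no schema at all for the downstream operator.

```python
import json

# Toy model of schema-on-read projection: the schema is discovered from
# the first row, mirroring how the ComplexWriter needs to see input data
# before it can emit OK_NEW_SCHEMA.
def project_convert_from_json(rows):
    schema = None
    out = []
    for raw in rows:
        parsed = json.loads(raw)
        if schema is None:
            schema = sorted(parsed.keys())  # discovered on the first row only
        out.append(parsed)
    return schema, out

schema, out = project_convert_from_json([])  # upstream filter produced 0 rows
# schema stays None: nothing was ever discovered, the analogue of
# "next() returned NONE without first returning OK_NEW_SCHEMA"
```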
[jira] [Updated] (DRILL-4676) Foreman.moveToState can block forever if called by the foreman thread while the query is still being set up
[ https://issues.apache.org/jira/browse/DRILL-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4676: - Reviewer: Chun Chang > Foreman.moveToState can block forever if called by the foreman thread while > the query is still being set up > -- > > Key: DRILL-4676 > URL: https://issues.apache.org/jira/browse/DRILL-4676 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.6.0 >Reporter: Deneche A. Hakim >Assignee: Deneche A. Hakim > Fix For: 1.7.0 > > > While the query is being set up, the foreman has a special CountDownLatch that > blocks rpc threads from delivering external events; this latch is released > at the end of the query setup. > In some cases though, when the foreman is submitting remote fragments, a > failure in RpcBus.send() causes an exception to be thrown that is reported to > Foreman.FragmentSubmitListener and blocks on the CountDownLatch. This causes > the foreman thread to block forever, and can cause rpc threads to be blocked too. > This seems to happen more frequently under high concurrency load, and can also > prevent clients from connecting to the Drillbits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4654) Expose New System Metrics
[ https://issues.apache.org/jira/browse/DRILL-4654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4654: - Reviewer: Krystal > Expose New System Metrics > - > > Key: DRILL-4654 > URL: https://issues.apache.org/jira/browse/DRILL-4654 > Project: Apache Drill > Issue Type: Improvement >Reporter: Sudheesh Katkam >Assignee: Sudheesh Katkam > Fix For: 1.8.0 > > > + Add more metrics to the DrillMetrics registry (exposed through web UI and > jconsole, through JMX): pending queries, running queries, completed queries, > current memory usage (root allocator) > + Clean up and document metric registration API > -+ Deprecate getMetrics() method in contextual objects; use > DrillMetrics.getRegistry() directly- > + Make JMX reporting and log reporting configurable through system properties > (since config file is not meant to be used in common module) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4298) SYSTEM ERROR: ChannelClosedException
[ https://issues.apache.org/jira/browse/DRILL-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4298: - Assignee: Deneche A. Hakim Reviewer: Chun Chang > SYSTEM ERROR: ChannelClosedException > > > Key: DRILL-4298 > URL: https://issues.apache.org/jira/browse/DRILL-4298 > Project: Apache Drill > Issue Type: Bug > Components: Execution - RPC >Affects Versions: 1.5.0 >Reporter: Chun Chang >Assignee: Deneche A. Hakim > Fix For: 1.7.0 > > > 1.5.0-SNAPSHOT2f0e3f27e630d5ac15cdaef808564e01708c3c55 > Running functional regression, hit this error, seems random and not > associated with any particular query. > From client side: > {noformat} > 1/5 create table `existing_partition_pruning/lineitempart` partition > by (dir0) as select * from > dfs.`/drill/testdata/partition_pruning/dfs/lineitempart`; > [1;31mError: SYSTEM ERROR: ChannelClosedException: Channel closed > /10.10.100.171:31010 <--> /10.10.100.171:33713. > Fragment 0:0 > [Error Id: 772d90b8-c5e6-4ecc-8776-68ccc6b57d49 on drillats1.qa.lab:31010] > (state=,code=0)[m > java.sql.SQLException: SYSTEM ERROR: ChannelClosedException: Channel closed > /10.10.100.171:31010 <--> /10.10.100.171:33713. 
> Fragment 0:0 > [Error Id: 772d90b8-c5e6-4ecc-8776-68ccc6b57d49 on drillats1.qa.lab:31010] > at > org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:247) > at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:321) > at > net.hydromatic.avatica.AvaticaResultSet.next(AvaticaResultSet.java:187) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:172) > at sqlline.IncrementalRows.hasNext(IncrementalRows.java:62) > at > sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87) > at sqlline.TableOutputFormat.print(TableOutputFormat.java:118) > at sqlline.SqlLine.print(SqlLine.java:1593) > at sqlline.Commands.execute(Commands.java:852) > at sqlline.Commands.sql(Commands.java:751) > at sqlline.SqlLine.dispatch(SqlLine.java:746) > at sqlline.SqlLine.runCommands(SqlLine.java:1651) > at sqlline.Commands.run(Commands.java:1304) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:36) > at sqlline.SqlLine.dispatch(SqlLine.java:742) > at sqlline.SqlLine.initArgs(SqlLine.java:553) > at sqlline.SqlLine.begin(SqlLine.java:596) > at sqlline.SqlLine.start(SqlLine.java:375) > at sqlline.SqlLine.main(SqlLine.java:268) > Caused by: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM > ERROR: ChannelClosedException: Channel closed /10.10.100.171:31010 <--> > /10.10.100.171:33713. 
> Fragment 0:0 > [Error Id: 772d90b8-c5e6-4ecc-8776-68ccc6b57d49 on drillats1.qa.lab:31010] > at > org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119) > at > org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113) > at > org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46) > at > org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31) > at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:67) > at org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:374) > at > org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89) > at > org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:252) > at > org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:123) > at > org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:285) > at > org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:257) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > at > io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > at >
[jira] [Updated] (DRILL-4479) JsonReader should pick a less restrictive type when creating the default column
[ https://issues.apache.org/jira/browse/DRILL-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4479: - Reviewer: Chun Chang > JsonReader should pick a less restrictive type when creating the default > column > --- > > Key: DRILL-4479 > URL: https://issues.apache.org/jira/browse/DRILL-4479 > Project: Apache Drill > Issue Type: Bug > Components: Storage - JSON >Affects Versions: 1.5.0 >Reporter: Aman Sinha >Assignee: Aman Sinha > Fix For: 1.7.0 > > Attachments: mostlynulls.json > > > This JIRA is related to DRILL-3806 but has a narrower scope, so I decided to > create separate one. > The JsonReader has the method ensureAtLeastOneField() (see > https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java#L91) > that ensures that when no columns are found, create an empty one and it > chooses to create a nullable int column. One consequence is that queries of > the following type fail: > {noformat} > select c1 from dfs.`mostlynulls.json`; > ... > ... > | null | > | null | > Error: DATA_READ ERROR: Error parsing JSON - You tried to write a VarChar > type when you are using a ValueWriter of type NullableIntWriterImpl. > File /Users/asinha/data/mostlynulls.json > Record 4097 > {noformat} > In this file the first 4096 rows have NULL values for c1 followed by rows > that have a valid string. > It would be useful for the Json reader to choose a less restrictive type such > as varchar in order to allow more types of queries to run. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
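The type-promotion idea behind DRILL-4479 can be illustrated with a sketch (hypothetical writer types, not Drill's NullableIntWriterImpl or value vectors): a column defaulted to a nullable int breaks as soon as a string value finally appears, while a less restrictive varchar default absorbs both.

```python
# Toy column writer: the default type chosen for an all-null prefix
# determines whether a later non-null value can still be written.
def write_column(values, default_type="INT"):
    col_type = default_type
    out = []
    for v in values:
        if v is None:
            out.append(None)                  # nulls fit any column type
        elif isinstance(v, str):
            if col_type != "VARCHAR":
                raise TypeError(
                    "tried to write a VarChar into a %s writer" % col_type)
            out.append(v)
        else:
            out.append(v)
    return out

rows = [None] * 4096 + ["abc"]                # 4096 nulls, then a string
try:
    write_column(rows, default_type="INT")    # mirrors the DATA_READ error
    failed = False
except TypeError:
    failed = True
ok = write_column(rows, default_type="VARCHAR")  # less restrictive default
```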
[jira] [Updated] (DRILL-3826) Concurrent Query Submission leads to Channel Closed Exception
[ https://issues.apache.org/jira/browse/DRILL-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-3826: - Assignee: Deneche A. Hakim Reviewer: Rahul Challapalli > Concurrent Query Submission leads to Channel Closed Exception > - > > Key: DRILL-3826 > URL: https://issues.apache.org/jira/browse/DRILL-3826 > Project: Apache Drill > Issue Type: Bug > Components: Client - JDBC, Execution - RPC >Affects Versions: 1.1.0, 1.2.0 > Environment: - CentOS release 6.6 (Final) > - hadoop-2.7.1 > - hbase-1.0.1.1 > - drill-1.1.0 > - jdk-1.8.0_45 >Reporter: Yiyi Hu >Assignee: Deneche A. Hakim > Labels: filesystem, hadoop, hbase, jdbc, rpc > Fix For: 1.7.0 > > Attachments: jdbc-test-client-drillbit.log, shell-sqlline.log, > shell-test-drillbit.log > > > Frequently seen CHANNEL CLOSED EXCEPTION while running concurrent queries with > a relatively large LIMIT. > Here are the details: > SET UP: > - Single drillbit running on a single zookeeper node > - 4G heap size, 8G direct memory > - Storage plugins: local filesystem, hdfs, hbase > TEST DATA: > - A 50,000,000-record json file test.json, with two fields id, > title (approximately 3G). > SHELL TEST: > - Running 4 drill shells concurrently with query: > SELECT id, title from dfs.`test.json` LIMIT 500. > - Queries got canceled. Channel closing between client and server was seen > randomly, as in the example shown below: > {noformat} > java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: > ChannelClosedException: Channel closed /192.168.4.201:31010 <--> > /192.168.4.201:48829. 
> Fragment 0:0 > [Error Id: 0bd2b500-155e-46e0-9f26-bd89fea47a25 on TEST-101:31010] > at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73) > at > sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87) > at sqlline.TableOutputFormat.print(TableOutputFormat.java:118) > at sqlline.SqlLine.print(SqlLine.java:1583) > at sqlline.Commands.execute(Commands.java:852) > at sqlline.Commands.sql(Commands.java:751) > at sqlline.SqlLine.dispatch(SqlLine.java:738) > at sqlline.SqlLine.begin(SqlLine.java:612) > at sqlline.SqlLine.start(SqlLine.java:366) > at sqlline.SqlLine.main(SqlLine.java:259) > {noformat} > JDBC TEST: > - 6 separate threads running the same query: SELECT id, title from > dfs.`test.json` LIMIT 1000, each maintains its own connection. ResultSet, > statement and connection are closed finally. > - Throws the same channel closed exception randomly. Log file were enclosed > for review. > - Memory usage was monitored, all good. > CROSS STORAGE PLUGINS: > - The same issue can be found not only in JSON on a file system (local/hdfs), > but also in HBASE. > - The issue was not found in a single thread application. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3833) Concurrent Queries Failed Unexpectedly
[ https://issues.apache.org/jira/browse/DRILL-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-3833: - Assignee: Deneche A. Hakim Reviewer: Rahul Challapalli > Concurrent Queries Failed Unexpectedly > -- > > Key: DRILL-3833 > URL: https://issues.apache.org/jira/browse/DRILL-3833 > Project: Apache Drill > Issue Type: Bug > Components: Client - JDBC, Execution - RPC >Affects Versions: 1.1.0, 1.2.0 > Environment: CentOS release 6.6 (Final) > Hadoop-2.7.1 > Drill-1.1.0 > Single drillbit on single zookeeper >Reporter: Yiyi Hu >Assignee: Deneche A. Hakim > Labels: hdfs, jdbc, rpc > Fix For: 1.7.0 > > Attachments: drillbit.log > > > Concurrent queries with a relatively large LIMIT size *failed*, where the > failure occurred randomly. > To reproduce: > - Test data: a JSON file test.json (at least 10,000,000 records, two fields > id, title); > - Submit 5 queries with 5 separate threads using jdbc, where the query is: > {panel} > SELECT id, title FROM dfs.`test.json` LIMIT 1000; > {panel} > - The error message in drillbit.log: > {noformat} > 2015-09-24 19:15:15,393 [Client-1] INFO > o.a.d.j.i.DrillResultSetImpl$ResultsListener - [#4] Query failed: > org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: > ChannelClosedException: Channel closed /192.168.4.201:31010 <--> > /192.168.4.201:58795. 
> Fragment 0:0 > [Error Id: 60a7baa8-a2ed-47e6-b7ca-68afd82c852a on TEST-101:31010] > at > org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:118) > [drill-java-exec-1.1.0.jar:1.1.0] > at > org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:111) > [drill-java-exec-1.1.0.jar:1.1.0] > at > org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:47) > [drill-java-exec-1.1.0.jar:1.1.0] > at > org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:32) > [drill-java-exec-1.1.0.jar:1.1.0] > at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:61) > [drill-java-exec-1.1.0.jar:1.1.0] > at > org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:233) > [drill-java-exec-1.1.0.jar:1.1.0] > at > org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:205) > [drill-java-exec-1.1.0.jar:1.1.0] > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) > [netty-codec-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254) > [netty-handler-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > 
[netty-codec-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242) > [netty-codec-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) > [netty-transport-4.0.27.Final.jar:4.0.27.Final] > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) >
[jira] [Updated] (DRILL-4657) Rank() will return wrong results if a frame of data is too big (more than 2 batches)
[ https://issues.apache.org/jira/browse/DRILL-4657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4657: - Reviewer: Abhishek Girish > Rank() will return wrong results if a frame of data is too big (more than 2 > batches) > > > Key: DRILL-4657 > URL: https://issues.apache.org/jira/browse/DRILL-4657 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.3.0 >Reporter: Deneche A. Hakim >Assignee: Deneche A. Hakim >Priority: Critical > Fix For: 1.7.0 > > > When you run a query with RANK, and one particular frame is too long to fit > in 2 batches of data, you will get wrong result. > I was able to reproduce the issue in a unit test, thanks to the fact that we > can control the size of the batches processed by the window operator. I will > post a fix soon along with the unit test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4478) binary_string cannot correctly convert buffers that do not start at offset 0
[ https://issues.apache.org/jira/browse/DRILL-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4478: - Reviewer: Khurram Faraaz (was: Aman Sinha) > binary_string cannot correctly convert buffers that do not start at offset 0 > > > Key: DRILL-4478 > URL: https://issues.apache.org/jira/browse/DRILL-4478 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Codegen >Reporter: Chunhui Shi >Assignee: Chunhui Shi > Fix For: 1.7.0 > > > When binary_string is called multiple times, only the first call converts > correctly, because its drillbuf starts at offset 0. The second and subsequent > calls receive a drillbuf that does not start at 0, so > DrillStringUtils.parseBinaryString cannot do the work correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
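The offset bug in DRILL-4478 reduces to a familiar pattern, sketched here with a plain byte buffer standing in for the DrillBuf (hypothetical code, not Drill's): a parser that always reads from index 0 is only correct for the first value written into a shared buffer, because every later value lives at a non-zero start offset.

```python
# Two values appended into one shared buffer; each is addressed by a
# (start, end) slice, as with values packed into a shared DrillBuf.
shared = bytearray()
slices = []
for value in (b"first", b"second"):
    start = len(shared)
    shared.extend(value)
    slices.append((start, len(shared)))

def parse_buggy(buf, start, end):
    return bytes(buf[0:end - start])  # wrongly assumes the value starts at 0

def parse_fixed(buf, start, end):
    return bytes(buf[start:end])      # honors the slice's start offset

s0, s1 = slices
# For the first value both parsers agree; for the second, the buggy
# parser reads bytes belonging to the first value instead.
```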
[jira] [Updated] (DRILL-3845) PartitionSender doesn't send last batch for receivers that already terminated
[ https://issues.apache.org/jira/browse/DRILL-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-3845: - Reviewer: Kunal Khatua (was: Victoria Markman) > PartitionSender doesn't send last batch for receivers that already terminated > - > > Key: DRILL-3845 > URL: https://issues.apache.org/jira/browse/DRILL-3845 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Reporter: Deneche A. Hakim >Assignee: Deneche A. Hakim > Fix For: 1.5.0 > > Attachments: 29c45a5b-e2b9-72d6-89f2-d49ba88e2939.sys.drill > > > Even if a receiver has finished and informed the corresponding partition > sender, the sender will still try to send a "last batch" to the receiver when > it's done. In most cases this is fine as those batches will be silently > dropped by the receiving DataServer, but if a receiver has finished +10 > minutes ago, DataServer will throw an exception as it couldn't find the > corresponding FragmentManager (WorkEventBus has a 10 minutes recentlyFinished > cache). > DRILL-2274 is a reproduction for this case (after the corresponding fix is > applied). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4121) External Sort may not spill if above a receiver
[ https://issues.apache.org/jira/browse/DRILL-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4121: - Reviewer: Kunal Khatua (was: Victoria Markman) > External Sort may not spill if above a receiver > --- > > Key: DRILL-4121 > URL: https://issues.apache.org/jira/browse/DRILL-4121 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Deneche A. Hakim >Assignee: Deneche A. Hakim > Fix For: 1.5.0 > > > If external sort is above a receiver, all received batches will contain non > root buffers. The sort operator doesn't account for non root buffers when > estimating how much memory it is using and whether it needs to spill to disk. This > may delay the spill and cause the corresponding Drillbit to use large amounts of > memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
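The accounting gap in DRILL-4121 can be shown with a toy spill-decision model (hypothetical code, not the ExternalSort operator): if only "root" buffer bytes are counted, batches whose memory mostly lives in non-root buffers never push the estimate past the limit, so the spill comes late or not at all.

```python
# Toy memory accounting for a sort: each batch is (root_bytes,
# non_root_bytes); the spill decision depends on which bytes we count.
def should_spill(batches, limit, count_non_root=True):
    used = 0
    for root_bytes, non_root_bytes in batches:
        used += root_bytes + (non_root_bytes if count_non_root else 0)
    return used > limit

# Batches from a receiver: most of the memory is in non-root buffers.
received = [(1_000, 9_000)] * 5
late = should_spill(received, 40_000, count_non_root=False)  # 5,000 counted
on_time = should_spill(received, 40_000, count_non_root=True)  # 50,000 counted
```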
[jira] [Updated] (DRILL-4163) Support schema changes for MergeJoin operator.
[ https://issues.apache.org/jira/browse/DRILL-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4163: - Reviewer: Khurram Faraaz (was: Victoria Markman) > Support schema changes for MergeJoin operator. > -- > > Key: DRILL-4163 > URL: https://issues.apache.org/jira/browse/DRILL-4163 > Project: Apache Drill > Issue Type: Improvement >Reporter: amit hadke >Assignee: Jason Altekruse > Fix For: 1.5.0 > > > Since the external sort operator supports schema changes, allow use of union > types in merge join to support schema changes. > For now, we assume that merge join always works on record batches from the sort > operator. Thus merging schemas and promoting to union vectors is already > taken care of by the sort operator. > Test Cases: > 1) Only one side changes schema (join on union type and primitive type) > 2) Both sides change schema on all columns. > 3) Join between numeric types and string types. > 4) Missing columns - each batch has different columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4187) Introduce a state to separate queries pending execution from those pending in the queue.
[ https://issues.apache.org/jira/browse/DRILL-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4187: - Reviewer: Chun Chang (was: Sudheesh Katkam) > Introduce a state to separate queries pending execution from those pending in > the queue. > > > Key: DRILL-4187 > URL: https://issues.apache.org/jira/browse/DRILL-4187 > Project: Apache Drill > Issue Type: Sub-task >Reporter: Hanifi Gunes >Assignee: Hanifi Gunes > Fix For: 1.5.0 > > > Currently queries pending in the queue are not listed in the web UI; besides, > we use the state PENDING to mean pending execution. This issue proposes i) > to list enqueued queries in the web UI and ii) to introduce a new state for > queries sitting in the queue, differentiating them from those pending > execution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-2517) Apply Partition pruning before reading files during planning
[ https://issues.apache.org/jira/browse/DRILL-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-2517: - Reviewer: Kunal Khatua (was: Victoria Markman) > Apply Partition pruning before reading files during planning > > > Key: DRILL-2517 > URL: https://issues.apache.org/jira/browse/DRILL-2517 > Project: Apache Drill > Issue Type: New Feature > Components: Query Planning & Optimization >Affects Versions: 0.7.0, 0.8.0 >Reporter: Adam Gilmore >Assignee: Jinfeng Ni > Fix For: 1.6.0, Future > > > Partition pruning still tries to read Parquet files during the planning stage > even though they don't match the partition filter. > For example, if there were an invalid Parquet file in a directory that should > not be queried: > {code} > 0: jdbc:drill:zk=local> select sum(price) from dfs.tmp.purchases where dir0 = > 1; > Query failed: IllegalArgumentException: file:/tmp/purchases/4/0_0_0.parquet > is not a Parquet file (too small) > {code} > The reason is that the partition pruning happens after the Parquet plugin > tries to read the footer of each file. > Ideally, partition pruning would happen first before the format plugin gets > involved. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
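The ordering issue in DRILL-2517 can be sketched as follows (hypothetical planner and footer reader, not Drill's Parquet plugin): when partition pruning by dir0 runs before any file is opened, the corrupt file in an unselected directory is never touched; when footers are read first, planning fails exactly as reported.

```python
# files: list of (dir0, path) pairs; the hypothetical footer reader
# raises on a corrupt file, like "is not a Parquet file (too small)".
def scan(files, dir0_value, prune_first):
    def read_footer(entry):
        dir0, path = entry
        if "corrupt" in path:
            raise ValueError(path + " is not a Parquet file (too small)")
        return entry

    if prune_first:
        files = [e for e in files if e[0] == dir0_value]  # prune, then read
        return [read_footer(e) for e in files]
    footers = [read_footer(e) for e in files]             # read everything
    return [e for e in footers if e[0] == dir0_value]     # prune too late

layout = [("1", "/tmp/purchases/1/0_0_0.parquet"),
          ("4", "/tmp/purchases/4/corrupt.parquet")]
pruned = scan(layout, "1", prune_first=True)   # corrupt file never opened
# scan(layout, "1", prune_first=False) raises, like the reported failure
```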
[jira] [Updated] (DRILL-4490) Count(*) function returns as optional instead of required
[ https://issues.apache.org/jira/browse/DRILL-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4490: - Reviewer: Krystal > Count(*) function returns as optional instead of required > - > > Key: DRILL-4490 > URL: https://issues.apache.org/jira/browse/DRILL-4490 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types >Affects Versions: 1.6.0 >Reporter: Krystal >Assignee: Sean Hsuan-Yi Chu > Fix For: 1.7.0 > > > git.commit.id.abbrev=c8a7840 > I have the following CTAS query: > create table test as select count(*) as col1 from cp.`tpch/orders.parquet`; > The schema of the test table shows col1 as optional: > message root { > optional int64 col1; > } -- This message was sent by Atlassian JIRA (v6.3.4#6332)