[jira] [Updated] (DRILL-3091) Cancelled query continues to list on Drill UI with CANCELLATION_REQUESTED state

2017-08-08 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-3091:
-
Reviewer: Khurram Faraaz

> Cancelled query continues to list on Drill UI with CANCELLATION_REQUESTED 
> state
> ---
>
> Key: DRILL-3091
> URL: https://issues.apache.org/jira/browse/DRILL-3091
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 1.0.0
>Reporter: Abhishek Girish
> Fix For: Future
>
> Attachments: drillbit.log
>
>
> A long running query (TPC-DS SF 100, query 2) continues to be listed on the 
> Drill UI query profile page, among the list of running queries. It has been 
> more than 30 minutes as of this report. 
> TOP -p  showed no activity after the cancellation, and 
> jstack on all nodes did not contain the query ID. 
> I can share more details for a repro. 
> Git.Commit.ID: 583ca4a (May 14 build)
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5293) Poor performance of Hash Table due to same hash value as distribution below

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5293:
-
Reviewer: Kunal Khatua  (was: Chunhui Shi)

> Poor performance of Hash Table due to same hash value as distribution below
> ---
>
> Key: DRILL-5293
> URL: https://issues.apache.org/jira/browse/DRILL-5293
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Affects Versions: 1.8.0
>Reporter: Boaz Ben-Zvi
>Assignee: Boaz Ben-Zvi
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> The computation of the hash value is basically the same whether for the Hash 
> Table (used by Hash Agg and Hash Join) or for the distribution of rows at the 
> exchange. As a result, a specific Hash Table (in a parallel minor fragment) 
> gets only rows "filtered out" by the partition below ("upstream"), so the 
> pattern of this filtering leads to non-uniform usage of the hash buckets in 
> the table.
> Here is a simplified example: an exchange partitions into TWO minor 
> fragments, each running a Hash Agg, so the partition sends rows with EVEN 
> hash values to the first and rows with ODD hash values to the second. The 
> first then recomputes the _same_ hash value for its Hash Table, and only the 
> even buckets get used. (With a partition into EIGHT, possibly only one eighth 
> of the buckets would be used.) This leads to longer hash chains and thus 
> _poor performance_.
> A possible solution: add a distribution function distFunc (used only for 
> partitioning) that takes the hash value and "scrambles" it so that the 
> entropy in all the bits affects the low bits of the output. This function 
> should be applied (in HashPrelUtil) over the generated code that produces the 
> hash value, like:
>    distFunc( hash32(field1, hash32(field2, hash32(field3, 0))) );
> Tested with a huge hash aggregate (64M rows) and a parallelism of 8 
> (planner.width.max_per_node = 8): minor fragments 0 and 4 used only 1/8 of 
> their buckets, while the others used 1/4 of theirs. Maybe the reason for this 
> variance is that distribution uses "hash32AsDouble" while hash agg uses 
> "hash32".  
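The "scrambling" idea can be illustrated with a MurmurHash3-style finalizer. This is a hypothetical sketch: the class and method names below are invented for illustration, and Drill's actual distFunc implementation may differ.

```java
public class DistFuncSketch {
    // Hypothetical distFunc: a MurmurHash3-style finalizer that mixes
    // entropy from all bits of the hash into the low bits, so two values
    // that differ only in high bits land in different buckets.
    static int distFunc(int hash) {
        int h = hash;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        // Without scrambling, even inputs would hit only even buckets of
        // an 8-bucket table; after scrambling, the buckets spread out.
        for (int i = 0; i < 16; i += 2) {
            System.out.println(Math.floorMod(distFunc(i), 8)); // scrambled bucket
        }
    }
}
```

Because each step (xor-shift, multiply by an odd constant) is invertible, the mix is a bijection: distinct hash values stay distinct, only their bucket assignment changes.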





[jira] [Updated] (DRILL-5290) Provide an option to build operator table once for built-in static functions and reuse it across queries.

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5290:
-
Reviewer: Kunal Khatua  (was: Sudheesh Katkam)

> Provide an option to build operator table once for built-in static functions 
> and reuse it across queries.
> -
>
> Key: DRILL-5290
> URL: https://issues.apache.org/jira/browse/DRILL-5290
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.10.0
>
>
> Currently, DrillOperatorTable, which contains the standard SQL operators and 
> functions as well as Drill User Defined Functions (UDFs, both built-in and 
> dynamic), gets built for each query as part of creating the QueryContext. 
> This is an expensive operation (~30 ms to build) and allocates ~2 MB on the 
> heap per query. For high-throughput, low-latency operational queries, we 
> quickly run out of heap memory, causing JVM hangs. Build the operator table 
> once during startup for the static built-in functions and save it in 
> DrillbitContext, so it can be reused across queries.
> Provide a system/session option to not use dynamic UDFs so we can use the 
> operator table saved in DrillbitContext and avoid rebuilding it each time.
> *Please note, the changes add a new option, exec.udf.use_dynamic, which 
> needs to be documented.*
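The caching idea described above can be sketched as follows. The class, method, and map contents are illustrative stand-ins, not Drill's actual DrillOperatorTable API; only the build-once-then-reuse pattern is the point.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: build the (expensive) table of built-in functions
// once and reuse it, falling back to a per-query build only when dynamic
// UDFs must be visible to the query.
public class OperatorTableCache {
    private static volatile Map<String, String> STATIC_TABLE;

    static Map<String, String> buildTable() {
        Map<String, String> t = new ConcurrentHashMap<>();
        t.put("sum", "builtin");  // stand-in for real operator registration
        return t;
    }

    static Map<String, String> tableFor(boolean useDynamicUdfs) {
        if (useDynamicUdfs) {
            return buildTable();              // per-query: sees dynamic UDFs
        }
        Map<String, String> t = STATIC_TABLE;
        if (t == null) {
            synchronized (OperatorTableCache.class) {
                if (STATIC_TABLE == null) {
                    STATIC_TABLE = buildTable();  // built once per Drillbit
                }
                t = STATIC_TABLE;
            }
        }
        return t;
    }

    public static void main(String[] args) {
        System.out.println(tableFor(false) == tableFor(false)); // true: cached
        System.out.println(tableFor(true) == tableFor(true));   // false: rebuilt
    }
}
```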





[jira] [Updated] (DRILL-5287) Provide option to skip updates of ephemeral state changes in Zookeeper

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5287:
-
Reviewer: Kunal Khatua  (was: Sudheesh Katkam)

> Provide option to skip updates of ephemeral state changes in Zookeeper
> --
>
> Key: DRILL-5287
> URL: https://issues.apache.org/jira/browse/DRILL-5287
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.10.0
>
>
> We put transient profiles in ZooKeeper and update the state as a query 
> progresses through its states. This has been observed to add ~45 ms of 
> latency per update on the query execution path, and it gets even worse when 
> a high number of concurrent queries is in progress: with concurrency=100, 
> the average query response time even for short queries is 8 sec, versus 
> 0.2 sec with these updates disabled. For short-lived queries in a 
> high-throughput scenario, there is no value in recording state changes in 
> ZooKeeper. We need an option to disable these updates for short running 
> operational queries.
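A minimal model of the proposed option, with invented names and a simple write counter standing in for the real ZooKeeper calls:

```java
// Hypothetical sketch: count ZooKeeper writes for a query that goes
// through `intermediate` nonterminal state changes plus one terminal
// state. With the skip option on, only the terminal state is persisted.
public class ProfileUpdater {
    static int writesFor(boolean skipIntermediate, int intermediate) {
        int zkWrites = 0;
        for (int i = 0; i < intermediate; i++) {
            if (!skipIntermediate) {
                zkWrites++;   // each write costs ~45 ms on the query path
            }
        }
        zkWrites++;           // terminal state is always persisted
        return zkWrites;
    }

    public static void main(String[] args) {
        System.out.println(writesFor(false, 3)); // 4: 3 intermediate + terminal
        System.out.println(writesFor(true, 3));  // 1: terminal only
    }
}
```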





[jira] [Updated] (DRILL-5304) Queries fail intermittently when there is skew in data distribution

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5304:
-
Reviewer: Abhishek Girish  (was: Jinfeng Ni)

> Queries fail intermittently when there is skew in data distribution
> ---
>
> Key: DRILL-5304
> URL: https://issues.apache.org/jira/browse/DRILL-5304
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.10.0
>Reporter: Abhishek Girish
>Assignee: Padma Penumarthy
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
> Attachments: query1_drillbit.log.txt, query2_drillbit.log.txt
>
>
> In a distributed environment, we've observed certain queries fail 
> intermittently with an assignment-logic issue when the underlying data 
> distribution is skewed. 
> For example, the TPC-H [query 
> 7|https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Advanced/tpch/tpch_sf100/parquet/07.q]
>  failed with the below error:
> {code}
> java.sql.SQLException: SYSTEM ERROR: IllegalArgumentException: 
> MinorFragmentId 105 has no read entries assigned
> ...
>   (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
> during fragment initialization: MinorFragmentId 105 has no read entries 
> assigned
> org.apache.drill.exec.work.foreman.Foreman.run():281
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():744
>   Caused By (java.lang.IllegalArgumentException) MinorFragmentId 105 has no 
> read entries assigned
> {code}
> Log containing full stack trace is attached.
> For this query, the underlying TPC-H SF100 Parquet dataset was observed to 
> be located mostly on 2-3 nodes of an 8-node DFS cluster. The data 
> distribution skew on this cluster is most likely the triggering factor, as 
> the same query on the same dataset does not show this failure on a different 
> test cluster (with possibly different data distribution). 
> Also, another 
> [query|https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/limit0/window_functions/bugs/data/drill-3700.sql]
>  failed with a similar error when slice target was set to 1. 
> {code}
> Failed with exception
> java.sql.SQLException: SYSTEM ERROR: IllegalArgumentException: 
> MinorFragmentId 66 has no read entries assigned
> ...
>   (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
> during fragment initialization: MinorFragmentId 66 has no read entries 
> assigned
> {code}





[jira] [Updated] (DRILL-5273) CompliantTextReader exhausts 4 GB memory when reading 5000 small files

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5273:
-
Reviewer: Kunal Khatua  (was: Chunhui Shi)

> CompliantTextReader exhausts 4 GB memory when reading 5000 small files
> --
>
> Key: DRILL-5273
> URL: https://issues.apache.org/jira/browse/DRILL-5273
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> A test case was created that consists of 5000 text files, each with a single 
> line with the file number: 1 to 5001. Each file has a single record, and at 
> most 4 characters per record.
> Run the following query:
> {code}
> SELECT * FROM `dfs.data`.`5000files/text`
> {code}
> The query will fail with an OOM in the scan batch on around record 3700 on a 
> Mac with 4GB of direct memory.
> The code to read records in {{ScanBatch}} is complex. The following appears 
> to occur:
> * Iterate over the record readers for each file.
> * For each, call setup.
> The setup code is:
> {code}
>   public void setup(OperatorContext context, OutputMutator outputMutator) 
> throws ExecutionSetupException {
> oContext = context;
> readBuffer = context.getManagedBuffer(READ_BUFFER);
> whitespaceBuffer = context.getManagedBuffer(WHITE_SPACE_BUFFER);
> {code}
> The two buffers are in direct memory. There is no code that releases the 
> buffers.
> The sizes are:
> {code}
>   private static final int READ_BUFFER = 1024*1024;
>   private static final int WHITE_SPACE_BUFFER = 64*1024;
> = 1,048,576 + 65,536 = 1,114,112
> {code}
> This is exactly the amount of memory that accumulates per call to 
> {{ScanBatch.next()}}:
> {code}
> Ctor: 0  -- Initial memory in constructor
> Init setup: 1114112  -- After call to first record reader setup
> Entry Memory: 1114112  -- first next() call, returns one record
> Entry Memory: 1114112  -- second next(), eof and start second reader
> Entry Memory: 2228224 -- third next(), second reader returns EOF
> ...
> {code}
> If we leak 1 MB per file, with 5000 files we would leak 5 GB of memory, which 
> would explain the OOM when given only 4 GB.
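The leak arithmetic above can be checked directly (pure arithmetic, not Drill code):

```java
// Back-of-the-envelope check: READ_BUFFER + WHITE_SPACE_BUFFER leaked
// per record reader, times the number of files scanned.
public class LeakMath {
    static long leakedBytes(int files) {
        long perReader = 1024 * 1024 + 64 * 1024;  // READ_BUFFER + WHITE_SPACE_BUFFER
        return perReader * files;
    }

    public static void main(String[] args) {
        System.out.println(leakedBytes(5000));             // 5570560000 bytes
        System.out.println(leakedBytes(5000) / (1 << 30)); // 5 GiB, > 4 GB limit
    }
}
```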





[jira] [Updated] (DRILL-5263) Prevent left NLJoin with non scalar subqueries

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5263:
-
Reviewer: Abhishek Girish  (was: Aman Sinha)

> Prevent left NLJoin with non scalar subqueries
> --
>
> Key: DRILL-5263
> URL: https://issues.apache.org/jira/browse/DRILL-5263
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Serhii Harnyk
>Assignee: Serhii Harnyk
> Fix For: 1.10.0
>
> Attachments: tmp.tar.gz
>
>
> The nested loop join operator in Drill supports only inner join and returns 
> incorrect results for queries with a left join and non-scalar subqueries. 
> Drill should throw an error in this case. 
> Example:
> {code:sql}
> alter session set planner.enable_nljoin_for_scalar_only=false;
> select t2.dt, t1.fyq, t2.who, t2.event
> from t2
> left join t1 on t2.dt between t1.dts and t1.dte
> order by t2.dt;
> {code}
> Result:
> {noformat}
> +-+--+--++
> | dt  |   fyq|   who|   event|
> +-+--+--++
> | 2016-12-26  | 2016-Q2  | aperson  | had chrsitmas  |
> | 2017-01-06  | 2016-Q3  | aperson  | did somthing   |
> | 2017-01-12  | 2016-Q3  | aperson  | did somthing else  |
> +-+--+--++
> {noformat}





[jira] [Updated] (DRILL-5221) cancel message is delayed until queryid or data is received

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5221:
-
Reviewer: Khurram Faraaz

> cancel message is delayed until queryid or data is received
> ---
>
> Key: DRILL-5221
> URL: https://issues.apache.org/jira/browse/DRILL-5221
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - C++
>Affects Versions: 1.9.0
>Reporter: Laurent Goujon
>Assignee: Laurent Goujon
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> When the user calls the cancel method of the C++ client, the client waits 
> for a message from the server before replying back with a cancellation 
> message.
> For queries that take a long time to return batch results, this means 
> cancellation won't be effective until the next batch is received, instead of 
> cancelling the query right away (assuming the query id has already been 
> received, which is generally the case).
> It seems this was foreseen by [~vkorukanti] in his initial patch 
> (https://github.com/vkorukanti/drill/commit/e0ef6349aac48de5828b6d725c2cf013905d18eb)
>  but was omitted when I backported it post metadata changes.





[jira] [Updated] (DRILL-5207) Improve Parquet scan pipelining

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5207:
-
Reviewer: Kunal Khatua  (was: Sudheesh Katkam)

> Improve Parquet scan pipelining
> ---
>
> Key: DRILL-5207
> URL: https://issues.apache.org/jira/browse/DRILL-5207
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.9.0
>Reporter: Parth Chandra
>Assignee: Parth Chandra
>  Labels: doc-impacting
> Fix For: 1.10.0
>
>
> The parquet reader's async page reader is not efficiently pipelined. 
> The default size of the disk read buffer is 4 MB while the page reader reads 
> ~1 MB at a time, and the Parquet decode also processes 1 MB at a time. This 
> means the disk is idle while the data is being processed. Reducing the 
> buffer to 1 MB will reduce the time the processing thread waits for the disk 
> read thread.
> Additionally, since the data needed to process a page may be more or less 
> than 1 MB, a queue of pages will help so that the disk scan does not block 
> waiting for the processing thread (until the queue is full).
> Additionally, the BufferedDirectBufInputStream class reads from disk as soon 
> as it is initialized. Since this is called at setup time, this increases the 
> setup time for the query and query execution does not begin until this is 
> completed.
> There are a few other inefficiencies - options are read every time a page 
> reader is created. Reading options can be expensive.
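The page-queue idea can be sketched with a bounded BlockingQueue. This is an illustrative producer/consumer model, not Drill's actual AsyncPageReader: the disk thread reads ahead and blocks only when the queue is full, so I/O and decode overlap.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PagePipeline {
    // Produce n ~1 MB "pages" on a reader thread into a small bounded
    // queue; consume ("decode") them on the caller thread. Returns the
    // number of pages decoded.
    static int run(int n) throws InterruptedException {
        BlockingQueue<byte[]> pages = new ArrayBlockingQueue<>(4);
        Thread diskReader = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) {
                    pages.put(new byte[1 << 20]); // blocks only when queue full
                }
                pages.put(new byte[0]);           // zero-length end-of-stream marker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        diskReader.start();
        int decoded = 0;
        for (byte[] page = pages.take(); page.length > 0; page = pages.take()) {
            decoded++;                            // stand-in for page decode work
        }
        diskReader.join();
        return decoded;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(16)); // 16
    }
}
```

The queue capacity (4 here) bounds read-ahead memory; a larger capacity absorbs more variance in page decode time at the cost of more buffered pages.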





[jira] [Updated] (DRILL-5123) Write query profile after sending final response to client to improve latency

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5123:
-
Reviewer: Kunal Khatua  (was: Padma Penumarthy)

> Write query profile after sending final response to client to improve latency
> -
>
> Key: DRILL-5123
> URL: https://issues.apache.org/jira/browse/DRILL-5123
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> In testing a particular query, I used a test setup that does not write to the 
> "persistent store", causing query profiles to not be saved. I then changed 
> the config to save them (to local disk). This produced about a 200ms 
> difference in query run time as perceived by the client.
> I then moved writing the query profile _after_ sending the client the final 
> message. This resulted in an approximately 100ms savings, as perceived by the 
> client, in query run time on short (~3 sec.) queries.
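The latency reasoning can be modeled in a few lines. The millisecond figures are illustrative stand-ins for the ~100-200 ms effect observed above, not measurements:

```java
// Hypothetical model: the client stops waiting once the final response is
// sent, so work moved after that point no longer counts toward
// client-perceived latency.
public class QueryCompletion {
    static int clientLatencyMs(boolean profileBeforeResponse) {
        int queryMs = 100;    // illustrative execution time
        int profileMs = 150;  // illustrative profile-write cost
        return profileBeforeResponse ? queryMs + profileMs : queryMs;
    }

    public static void main(String[] args) {
        System.out.println(clientLatencyMs(true));  // 250: profile on the client path
        System.out.println(clientLatencyMs(false)); // 100: profile written afterwards
    }
}
```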





[jira] [Updated] (DRILL-5121) A memory leak is observed when exact case is not specified for a column in a filter condition

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5121:
-
Reviewer: Chun Chang  (was: Paul Rogers)

> A memory leak is observed when exact case is not specified for a column in a 
> filter condition
> -
>
> Key: DRILL-5121
> URL: https://issues.apache.org/jira/browse/DRILL-5121
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.6.0, 1.8.0
>Reporter: Karthikeyan Manivannan
>Assignee: Karthikeyan Manivannan
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> When the query SELECT XYZ FROM dfs.`/tmp/foo` WHERE xYZ LIKE 'abc' is 
> executed on a setup where /tmp/foo has 2 Parquet files, 1.parquet and 
> 2.parquet, of which 1.parquet has the column XYZ but 2.parquet does not, 
> there is a memory leak. 
> This seems to happen because xYZ is treated as a new column. 





[jira] [Updated] (DRILL-5097) Using store.parquet.reader.int96_as_timestamp gives IOOB whereas convert_from works

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5097:
-
Reviewer: Krystal  (was: Karthikeyan Manivannan)

> Using store.parquet.reader.int96_as_timestamp gives IOOB whereas convert_from 
> works
> ---
>
> Key: DRILL-5097
> URL: https://issues.apache.org/jira/browse/DRILL-5097
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types, Storage - Parquet
>Affects Versions: 1.9.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
> Attachments: data.snappy.parquet
>
>
> Using store.parquet.reader.int96_as_timestamp gives IOOB whereas convert_from 
> works. 
> The below query succeeds:
> {code}
> select c, convert_from(d, 'TIMESTAMP_IMPALA') from 
> dfs.`/drill/testdata/parquet_timestamp/spark_generated/d3`;
> {code}
> The below query fails:
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `store.parquet.reader.int96_as_timestamp` = true;
> +---+---+
> |  ok   |  summary  |
> +---+---+
> | true  | store.parquet.reader.int96_as_timestamp updated.  |
> +---+---+
> 1 row selected (0.231 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select c, d from 
> dfs.`/drill/testdata/parquet_timestamp/spark_generated/d3`;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: readerIndex: 0, writerIndex: 
> 131076 (expected: 0 <= readerIndex <= writerIndex <= capacity(131072))
> Fragment 0:0
> [Error Id: bd94f477-7c01-420f-8920-06263212177b on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}





[jira] [Updated] (DRILL-5104) Foreman sets external sort memory allocation even for a physical plan

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5104:
-
Reviewer: Rahul Challapalli  (was: Boaz Ben-Zvi)

> Foreman sets external sort memory allocation even for a physical plan
> -
>
> Key: DRILL-5104
> URL: https://issues.apache.org/jira/browse/DRILL-5104
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> Consider the (disabled) unit test 
> {{TestSimpleExternalSort.outOfMemoryExternalSort}} which uses the physical 
> plan {{xsort/oom_sort_test.json}} that contains a setting for the amount of 
> memory to allocate:
> {code}
>{
> ...
> pop:"external-sort",
> ...
> initialAllocation: 100,
> maxAllocation: 3000
> },
> {code}
> When run, the amount of memory is set to 715827882. The reason is that code 
> was added to {{Foreman}} to compute the memory to allocate to the external 
> sort:
> {code}
>   private void runPhysicalPlan(final PhysicalPlan plan) throws 
> ExecutionSetupException {
> validatePlan(plan);
> MemoryAllocationUtilities.setupSortMemoryAllocations(plan, queryContext);
> {code}
> The problem is that a physical plan should execute exactly as provided, to 
> enable detailed testing.
> To solve this problem, move the sort memory setup to the code path taken by 
> SQL queries, but not by physical plans.
> This change is necessary to re-enable the previously-disabled external sort 
> tests.





[jira] [Updated] (DRILL-5098) Improving fault tolerance for connection between client and foreman node.

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5098:
-
Reviewer: Chun Chang  (was: Paul Rogers)

> Improving fault tolerance for connection between client and foreman node.
> -
>
> Key: DRILL-5098
> URL: https://issues.apache.org/jira/browse/DRILL-5098
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - JDBC
>Reporter: Sorabh Hamirwasia
>Assignee: Sorabh Hamirwasia
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.10.0
>
>
> With DRILL-5015 we added support for specifying multiple Drillbits in the 
> connection string and randomly choosing one of them. Over time some of the 
> Drillbits specified in the connection string may die, and the client can 
> fail to connect to the Foreman node if the random selection happens to pick 
> a dead Drillbit.
> Even if ZooKeeper is used to select a random Drillbit from the registered 
> ones, there is a small window in which the client selects a Drillbit and 
> that Drillbit then goes down. The client will fail to connect to this 
> Drillbit and error out. 
> If instead we try multiple Drillbits (with a configurable tries count in the 
> connection string), the probability of hitting this error window is reduced 
> in both cases, improving fault tolerance. During further investigation it 
> was also found that an authentication failure is thrown as a generic 
> RpcException. We need to improve that as well, capturing this case 
> explicitly, since in case of an auth failure we don't want to try multiple 
> Drillbits.
> Connection string example with the new parameter:
> jdbc:drill:drillbit=<host1>[:<port1>][,<host2>[:<port2>]]...;tries=5
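The retry behavior, including the auth-failure short-circuit, can be sketched as follows. All names here (DrillbitConnector, Dialer, AuthException) are hypothetical illustrations, not the real JDBC driver's API:

```java
import java.util.List;

public class DrillbitConnector {
    static class AuthException extends RuntimeException {}

    interface Dialer { void connect(String endpoint); } // throws on failure

    // Try up to `tries` endpoints; skip dead nodes, but propagate an
    // authentication failure immediately, since retrying cannot fix it.
    static String connectAny(List<String> drillbits, int tries, Dialer dialer) {
        RuntimeException last = new RuntimeException("no drillbits to try");
        for (int i = 0; i < Math.min(tries, drillbits.size()); i++) {
            String endpoint = drillbits.get(i);
            try {
                dialer.connect(endpoint);
                return endpoint;            // connected
            } catch (AuthException e) {
                throw e;                    // bad credentials: stop at once
            } catch (RuntimeException e) {
                last = e;                   // dead drillbit: try the next one
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        String ep = connectAny(List.of("dead:31010", "live:31010"), 5,
            e -> { if (e.startsWith("dead")) throw new RuntimeException("refused"); });
        System.out.println(ep); // live:31010
    }
}
```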





[jira] [Updated] (DRILL-5080) Create a memory-managed version of the External Sort operator

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5080:
-
Reviewer: Rahul Challapalli  (was: Boaz Ben-Zvi)

> Create a memory-managed version of the External Sort operator
> -
>
> Key: DRILL-5080
> URL: https://issues.apache.org/jira/browse/DRILL-5080
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
> Attachments: ManagedExternalSortDesign.pdf
>
>
> We propose to create a "managed" version of the external sort operator that 
> works to a clearly-defined memory limit. Attached is a design specification 
> for the work.
> The project will include fixing a number of bugs related to the external 
> sort, included as sub-tasks of this umbrella task.





[jira] [Updated] (DRILL-5081) Excessive info level logging introduced in DRILL-4203

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5081:
-
Reviewer: Krystal  (was: Sudheesh Katkam)

> Excessive info level logging introduced in DRILL-4203
> -
>
> Key: DRILL-5081
> URL: https://issues.apache.org/jira/browse/DRILL-5081
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Sudheesh Katkam
>Assignee: Vitalii Diravka
> Fix For: 1.10.0
>
>
> Excessive info level logging introduced in 
> [8461d10|https://github.com/apache/drill/commit/8461d10b4fd6ce56361d1d826bb3a38b6dc8473c].
>  A line is printed for every row group being read, and for every metadata 
> file.





[jira] [Updated] (DRILL-5065) Optimize count(*) queries on MapR-DB JSON Tables

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5065:
-
Reviewer: Rahul Challapalli

> Optimize count(*) queries on MapR-DB JSON Tables
> 
>
> Key: DRILL-5065
> URL: https://issues.apache.org/jira/browse/DRILL-5065
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - MapRDB
>Affects Versions: 1.9.0
> Environment: Clusters with MapR v5.2.0 and above
>Reporter: Abhishek Girish
>Assignee: Smidth Panchamia
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> The JSON FileReader optimizes count(* ) queries by only counting the number 
> of records in the files and discarding the data. This makes query execution 
> faster and more efficient. 
> We need a similar feature in the MapR format plugin (maprdb) to optimize 
> _id-only projection and count(* ) queries on MapR-DB JSON Tables.





[jira] [Updated] (DRILL-5048) Fix type mismatch error in case statement with null timestamp

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5048:
-
Reviewer: Krystal  (was: Gautam Kumar Parai)

> Fix type mismatch error in case statement with null timestamp
> -
>
> Key: DRILL-5048
> URL: https://issues.apache.org/jira/browse/DRILL-5048
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Serhii Harnyk
>Assignee: Serhii Harnyk
> Fix For: 1.10.0
>
>
> An AssertionError occurs when we use CASE with a timestamp and null:
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> SELECT res, CASE res WHEN true THEN 
> CAST('1990-10-10 22:40:50' AS TIMESTAMP) ELSE null END
> . . . . . . . . . . . . . . > FROM
> . . . . . . . . . . . . . . > (
> . . . . . . . . . . . . . . > SELECT
> . . . . . . . . . . . . . . > (CASE WHEN (false) THEN null ELSE 
> CAST('1990-10-10 22:40:50' AS TIMESTAMP) END) res
> . . . . . . . . . . . . . . > FROM (values(1)) foo
> . . . . . . . . . . . . . . > ) foobar;
> Error: SYSTEM ERROR: AssertionError: Type mismatch:
> rowtype of new rel:
> RecordType(TIMESTAMP(0) NOT NULL res, TIMESTAMP(0) EXPR$1) NOT NULL
> rowtype of set:
> RecordType(TIMESTAMP(0) res, TIMESTAMP(0) EXPR$1) NOT NULL
> [Error Id: b56e0a4d-2f9e-4afd-8c60-5bc2f9d31f8f on centos-01.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> Caused by: java.lang.AssertionError: Type mismatch:
> rowtype of new rel:
> RecordType(TIMESTAMP(0) NOT NULL res, TIMESTAMP(0) EXPR$1) NOT NULL
> rowtype of set:
> RecordType(TIMESTAMP(0) res, TIMESTAMP(0) EXPR$1) NOT NULL
> at org.apache.calcite.plan.RelOptUtil.equal(RelOptUtil.java:1696) 
> ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at org.apache.calcite.plan.volcano.RelSubset.add(RelSubset.java:295) 
> ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at org.apache.calcite.plan.volcano.RelSet.add(RelSet.java:147) 
> ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.addRelToSet(VolcanoPlanner.java:1818)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1760)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:1017)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1037)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1940)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:138)
>  ~[calcite-core-1.4.0-drill-r18.jar:1.4.0-drill-r18]
> ... 16 common frames omitted
> {noformat}





[jira] [Updated] (DRILL-5051) DRILL-5051: Fix incorrect result returned in nest query with offset specified

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5051:
-
Reviewer: Rahul Challapalli  (was: Sudheesh Katkam)

> DRILL-5051: Fix incorrect result returned in nest query with offset specified
> -
>
> Key: DRILL-5051
> URL: https://issues.apache.org/jira/browse/DRILL-5051
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.8.0
> Environment: Fedora 24 / OpenJDK 8
>Reporter: Hongze Zhang
>Assignee: Hongze Zhang
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> My SQL:
> {code:sql}
> select count(1) from (select id from (select id from 
> cp.`tpch/lineitem.parquet` LIMIT 2) limit 1 offset 1)
> {code}
> This SQL returns nothing.
> Something goes wrong in LimitRecordBatch.java, and the reason is different 
> from [DRILL-4884|https://issues.apache.org/jira/browse/DRILL-4884?filter=-2]





[jira] [Assigned] (DRILL-5043) Function that returns a unique id per session/connection similar to MySQL's CONNECTION_ID()

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala reassigned DRILL-5043:


Assignee: Arina Ielchiieva
Reviewer: Krystal  (was: Arina Ielchiieva)

> Function that returns a unique id per session/connection similar to MySQL's 
> CONNECTION_ID()
> ---
>
> Key: DRILL-5043
> URL: https://issues.apache.org/jira/browse/DRILL-5043
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Affects Versions: 1.8.0
>Reporter: Nagarajan Chinnasamy
>Assignee: Arina Ielchiieva
>Priority: Minor
>  Labels: CONNECTION_ID, SESSION, UDF, doc-impacting
> Fix For: 1.10.0
>
> Attachments: 01_session_id_sqlline.png, 
> 02_session_id_webconsole_query.png, 03_session_id_webconsole_result.png
>
>
> Design and implement a function that returns a unique id per 
> session/connection similar to MySQL's CONNECTION_ID().
> *Implementation details*
> A function *session_id* will be added that returns the current session's 
> unique id as a string. A parameter {code:java} boolean isNiladic{code} will 
> be added to the UDF FunctionTemplate to indicate whether a function is 
> niladic (callable without parameters or parentheses).
> Please note, this function shadows columns that have the same name; a table 
> alias should be used to retrieve the column value from a table.
> Example:
> {code:sql}select session_id from   // returns the value of niladic 
> function session_id {code} 
> {code:sql}select t1.session_id from  t1 // returns session_id column 
> value from table {code}





[jira] [Updated] (DRILL-5032) Drill query on hive parquet table failed with OutOfMemoryError: Java heap space

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5032:
-
Reviewer: Rahul Challapalli  (was: Jinfeng Ni)

> Drill query on hive parquet table failed with OutOfMemoryError: Java heap 
> space
> ---
>
> Key: DRILL-5032
> URL: https://issues.apache.org/jira/browse/DRILL-5032
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Hive
>Affects Versions: 1.8.0
>Reporter: Serhii Harnyk
>Assignee: Serhii Harnyk
> Fix For: 1.10.0
>
> Attachments: plan, plan with fix
>
>
> The following query on a Hive parquet table failed with an OOM (Java heap space) error:
> {code}
> select distinct(businessdate) from vmdr_trades where trade_date='2016-04-12'
> 2016-08-31 08:02:03,597 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.drill.exec.work.foreman.Foreman - Query text for query id 
> 283938c3-fde8-0fc6-37e1-9a568c7f5913: select distinct(businessdate) from 
> vmdr_trades where trade_date='2016-04-12'
> 2016-08-31 08:05:58,502 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning 
> class: 
> org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$2
> 2016-08-31 08:05:58,506 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze 
> filter tree: 1 ms
> 2016-08-31 08:05:58,506 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for 
> partition pruning.Total pruning elapsed time: 3 ms
> 2016-08-31 08:05:58,663 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning 
> class: 
> org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$2
> 2016-08-31 08:05:58,663 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze 
> filter tree: 0 ms
> 2016-08-31 08:05:58,663 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for 
> partition pruning.Total pruning elapsed time: 0 ms
> 2016-08-31 08:05:58,664 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning 
> class: 
> org.apache.drill.exec.planner.sql.logical.HivePushPartitionFilterIntoScan$1
> 2016-08-31 08:05:58,665 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze 
> filter tree: 0 ms
> 2016-08-31 08:05:58,665 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] INFO  
> o.a.d.e.p.l.partition.PruneScanRule - No conditions were found eligible for 
> partition pruning.Total pruning elapsed time: 0 ms
> 2016-08-31 08:09:42,355 [283938c3-fde8-0fc6-37e1-9a568c7f5913:foreman] ERROR 
> o.a.drill.common.CatastrophicFailure - Catastrophic Failure Occurred, 
> exiting. Information message: Unable to handle out of memory condition in 
> Foreman.
> java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:3332) ~[na:1.8.0_74]
> at 
> java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
>  ~[na:1.8.0_74]
> at 
> java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
>  ~[na:1.8.0_74]
> at 
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421) 
> ~[na:1.8.0_74]
> at java.lang.StringBuilder.append(StringBuilder.java:136) 
> ~[na:1.8.0_74]
> at java.lang.StringBuilder.append(StringBuilder.java:76) 
> ~[na:1.8.0_74]
> at 
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:457) 
> ~[na:1.8.0_74]
> at java.lang.StringBuilder.append(StringBuilder.java:166) 
> ~[na:1.8.0_74]
> at java.lang.StringBuilder.append(StringBuilder.java:76) 
> ~[na:1.8.0_74]
> at 
> com.google.protobuf.TextFormat$TextGenerator.write(TextFormat.java:538) 
> ~[protobuf-java-2.5.0.jar:na]
> at 
> com.google.protobuf.TextFormat$TextGenerator.print(TextFormat.java:526) 
> ~[protobuf-java-2.5.0.jar:na]
> at 
> com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:389) 
> ~[protobuf-java-2.5.0.jar:na]
> at 
> com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:327) 
> ~[protobuf-java-2.5.0.jar:na]
> at 
> com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:286) 
> ~[protobuf-java-2.5.0.jar:na]
> at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:273) 
> 

[jira] [Updated] (DRILL-5034) Select timestamp from hive generated parquet always return in UTC

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-5034:
-
Reviewer: Krystal  (was: Karthikeyan Manivannan)

> Select timestamp from hive generated parquet always return in UTC
> -
>
> Key: DRILL-5034
> URL: https://issues.apache.org/jira/browse/DRILL-5034
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.9.0
>Reporter: Krystal
>Assignee: Vitalii Diravka
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.10.0
>
>
> commit id: 5cea9afa6278e21574c6a982ae5c3d82085ef904
> Reading timestamp data from a Hive-generated parquet table in Drill 
> automatically converts the timestamps to UTC. 
> {code}
> SELECT TIMEOFDAY() FROM (VALUES(1));
> +--+
> |EXPR$0|
> +--+
> | 2016-11-10 12:33:26.547 America/Los_Angeles  |
> +--+
> {code}
> data schema:
> {code}
> message hive_schema {
>   optional int32 voter_id;
>   optional binary name (UTF8);
>   optional int32 age;
>   optional binary registration (UTF8);
>   optional fixed_len_byte_array(3) contributions (DECIMAL(6,2));
>   optional int32 voterzone;
>   optional int96 create_timestamp;
>   optional int32 create_date (DATE);
> }
> {code}
> Using drill-1.8, the returned timestamps match the table data:
> {code}
> select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from 
> `/user/hive/warehouse/voter_hive_parquet` limit 5;
> ++
> | EXPR$0 |
> ++
> | 2016-10-23 20:03:58.0  |
> | null   |
> | 2016-09-09 12:01:18.0  |
> | 2017-03-06 20:35:55.0  |
> | 2017-01-20 22:32:43.0  |
> ++
> 5 rows selected (1.032 seconds)
> {code}
> If the user timezone is changed to UTC, then the timestamp data is returned 
> in UTC time.
> Using drill-1.9, the returned timestamps are converted to UTC even though 
> the user timezone is PST.
> {code}
> select convert_from(create_timestamp, 'TIMESTAMP_IMPALA') from 
> dfs.`/user/hive/warehouse/voter_hive_parquet` limit 5;
> ++
> | EXPR$0 |
> ++
> | 2016-10-24 03:03:58.0  |
> | null   |
> | 2016-09-09 19:01:18.0  |
> | 2017-03-07 04:35:55.0  |
> | 2017-01-21 06:32:43.0  |
> ++
> {code}
> {code}
> alter session set `store.parquet.reader.int96_as_timestamp`=true;
> +---+---+
> |  ok   |  summary  |
> +---+---+
> | true  | store.parquet.reader.int96_as_timestamp updated.  |
> +---+---+
> select create_timestamp from dfs.`/user/hive/warehouse/voter_hive_parquet` 
> limit 5;
> ++
> |create_timestamp|
> ++
> | 2016-10-24 03:03:58.0  |
> | null   |
> | 2016-09-09 19:01:18.0  |
> | 2017-03-07 04:35:55.0  |
> | 2017-01-21 06:32:43.0  |
> ++
> {code}
>  





[jira] [Updated] (DRILL-4987) Use ImpersonationUtil in RemoteFunctionRegistry

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4987:
-
Reviewer: Chun Chang

> Use ImpersonationUtil in RemoteFunctionRegistry
> ---
>
> Key: DRILL-4987
> URL: https://issues.apache.org/jira/browse/DRILL-4987
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Functions - Drill
>Reporter: Sudheesh Katkam
>Assignee: Sudheesh Katkam
>Priority: Minor
> Fix For: 1.10.0
>
>
> + Use ImpersonationUtil#getProcessUserName rather than  
> UserGroupInformation#getCurrentUser#getUserName in RemoteFunctionRegistry
> + Expose process users' group info in ImpersonationUtil and use that in 
> RemoteFunctionRegistry, rather than 
> UserGroupInformation#getCurrentUser#getGroupNames





[jira] [Updated] (DRILL-4980) Upgrading of the approach of parquet date correctness status detection

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4980:
-
Reviewer:   (was: Parth Chandra)

> Upgrading of the approach of parquet date correctness status detection
> --
>
> Key: DRILL-4980
> URL: https://issues.apache.org/jira/browse/DRILL-4980
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Parquet
>Affects Versions: 1.9.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
> Fix For: 1.10.0
>
>
> This jira is an addition to 
> [DRILL-4203|https://issues.apache.org/jira/browse/DRILL-4203].
> The date correctness label for newly generated parquet files should be 
> upgraded. 





[jira] [Updated] (DRILL-4938) Report UserException when constant expression reduction fails

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4938:
-
Reviewer: Khurram Faraaz  (was: Boaz Ben-Zvi)

> Report UserException when constant expression reduction fails
> -
>
> Key: DRILL-4938
> URL: https://issues.apache.org/jira/browse/DRILL-4938
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.9.0
>Reporter: Khurram Faraaz
>Assignee: Serhii Harnyk
>Priority: Minor
> Fix For: 1.10.0
>
>
> We need a better error message instead of DrillRuntimeException
> Drill 1.9.0 git commit ID : 4edabe7a
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select (res1 = 2016/09/22) res2
> . . . . . . . . . . . . . . > from
> . . . . . . . . . . . . . . > (
> . . . . . . . . . . . . . . > select (case when (false) then null else 
> cast('2016/09/22' as date) end) res1
> . . . . . . . . . . . . . . > from (values(1)) foo
> . . . . . . . . . . . . . . > ) foobar;
> Error: SYSTEM ERROR: DrillRuntimeException: Failure while materializing 
> expression in constant expression evaluator [CASE(false, =(null, /(/(2016, 
> 9), 22)), =(CAST('2016/09/22'):DATE NOT NULL, /(/(2016, 9), 22)))].  Errors:
> Error in expression at index -1.  Error: Missing function implementation: 
> [castTIMESTAMP(INT-REQUIRED)].  Full expression: --UNKNOWN EXPRESSION--.
> Error in expression at index -1.  Error: Missing function implementation: 
> [castTIMESTAMP(INT-REQUIRED)].  Full expression: --UNKNOWN EXPRESSION--.
> {noformat}





[jira] [Updated] (DRILL-4956) Temporary tables support

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4956:
-
Reviewer: Khurram Faraaz  (was: Paul Rogers)

> Temporary tables support
> 
>
> Key: DRILL-4956
> URL: https://issues.apache.org/jira/browse/DRILL-4956
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.10.0
>
>
> Link to design doc - 
> https://docs.google.com/document/d/1gSRo_w6q2WR5fPx7SsQ5IaVmJXJ6xCOJfYGyqpVOC-g/edit
> Gist - 
> https://gist.github.com/arina-ielchiieva/50158175867a18eee964b5ba36455fbf#file-temporarytablessupport-md
>  





[jira] [Assigned] (DRILL-4935) Allow drillbits to advertise a configurable host address to Zookeeper

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala reassigned DRILL-4935:


Assignee: Abhishek Girish
Reviewer: Abhishek Girish  (was: Khurram Faraaz)

> Allow drillbits to advertise a configurable host address to Zookeeper
> -
>
> Key: DRILL-4935
> URL: https://issues.apache.org/jira/browse/DRILL-4935
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Execution - RPC
>Affects Versions: 1.8.0
>Reporter: Harrison Mebane
>Assignee: Abhishek Girish
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> There are certain situations, such as running Drill in distributed Docker 
> containers, in which it is desirable to advertise a different hostname to 
> Zookeeper than would be output by INetAddress.getLocalHost().  I propose 
> adding a configuration variable 'drill.exec.rpc.bit.advertised.host' and 
> passing this address to Zookeeper when the configuration variable is 
> populated, otherwise falling back to the present behavior.
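A minimal drill-override.conf sketch of this proposal, assuming the option name given above and a hypothetical hostname (the option is not yet part of Drill):
{code}
# Hypothetical snippet for conf/drill-override.conf, per the proposal above.
# When set, the drillbit registers this address in Zookeeper instead of the
# result of INetAddress.getLocalHost().
drill.exec.rpc.bit.advertised.host: "drillbit-1.example.com"
{code}
When the option is unset, the drillbit would fall back to the present behavior.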





[jira] [Updated] (DRILL-4272) When sort runs out of memory and query fails, resources are seemingly not freed

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4272:
-
Reviewer: Rahul Challapalli

> When sort runs out of memory and query fails, resources are seemingly not 
> freed
> ---
>
> Key: DRILL-4272
> URL: https://issues.apache.org/jira/browse/DRILL-4272
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Relational Operators
>Affects Versions: 1.5.0
>Reporter: Victoria Markman
>Assignee: Paul Rogers
>Priority: Critical
> Fix For: 1.10.0
>
>
> Executed query11.sql from resources/Advanced/tpcds/tpcds_sf1/original/parquet
> Query runs out of memory:
> {code}
> Error: RESOURCE ERROR: One or more nodes ran out of memory while executing 
> the query.
> Unable to allocate sv2 for 32768 records, and not enough batchGroups to spill.
> batchGroups.size 1
> spilledBatchGroups.size 0
> allocated memory 19961472
> allocator limit 2000
> Fragment 19:0
> [Error Id: 87aa32b8-17eb-488e-90cb-5f5b9aec on atsqa4-133.qa.lab:31010] 
> (state=,code=0)
> {code}
> And leaves fragments running, holding resources:
> {code}
> 2016-01-14 22:46:32,435 [Drillbit-ShutdownHook#0] INFO  
> o.apache.drill.exec.server.Drillbit - Received shutdown request.
> 2016-01-14 22:46:32,546 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-136.qa.lab no longer 
> active.  Cancelling fragment 2967db08-cd38-925a-4960-9e881f537af8:19:0.
> 2016-01-14 22:46:32,547 [Curator-ServiceCache-0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 2967db08-cd38-925a-4960-9e881f537af8:19:0: State change requested 
> CANCELLATION_REQUESTED --> CANCELLATION_REQUESTED
> 2016-01-14 22:46:32,547 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 2967db08-cd38-925a-4960-9e881f537af8:19:0: Ignoring unexpected state 
> transition CANCELLATION_REQUESTED --> CANCELLATION_REQUESTED
> 2016-01-14 22:46:32,547 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Foreman atsqa4-136.qa.lab no longer 
> active.  Cancelling fragment 2967db08-cd38-925a-4960-9e881f537af8:17:0.
> 2016-01-14 22:46:32,547 [Curator-ServiceCache-0] INFO  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 2967db08-cd38-925a-4960-9e881f537af8:17:0: State change requested 
> CANCELLATION_REQUESTED --> CANCELLATION_REQUESTED
> 2016-01-14 22:46:32,547 [Curator-ServiceCache-0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - 
> 2967db08-cd38-925a-4960-9e881f537af8:17:0: Ignoring unexpected state 
> transition CANCELLATION_REQUESTED --> CANCELLATION_REQUESTED
> 2016-01-14 22:46:33,563 [BitServer-1] INFO  
> o.a.d.exec.rpc.control.ControlClient - Channel closed /10.10.88.134:59069 
> <--> atsqa4-136.qa.lab/10.10.88.136:31011.
> 2016-01-14 22:46:33,563 [BitClient-1] INFO  
> o.a.drill.exec.rpc.data.DataClient - Channel closed /10.10.88.134:34802 <--> 
> atsqa4-136.qa.lab/10.10.88.136:31012.
> 2016-01-14 22:46:33,590 [BitClient-1] INFO  
> o.a.drill.exec.rpc.data.DataClient - Channel closed /10.10.88.134:36937 <--> 
> atsqa4-135.qa.lab/10.10.88.135:31012.
> 2016-01-14 22:46:33,595 [BitClient-1] INFO  
> o.a.drill.exec.rpc.data.DataClient - Channel closed /10.10.88.134:53860 <--> 
> atsqa4-133.qa.lab/10.10.88.133:31012.
> 2016-01-14 22:46:38,467 [BitClient-1] INFO  
> o.a.drill.exec.rpc.data.DataClient - Channel closed /10.10.88.134:48276 <--> 
> atsqa4-134.qa.lab/10.10.88.134:31012.
> 2016-01-14 22:46:39,470 [pool-6-thread-1] INFO  
> o.a.drill.exec.rpc.user.UserServer - closed eventLoopGroup 
> io.netty.channel.nio.NioEventLoopGroup@6fb32dfb in 1003 ms
> 2016-01-14 22:46:39,470 [pool-6-thread-2] INFO  
> o.a.drill.exec.rpc.data.DataServer - closed eventLoopGroup 
> io.netty.channel.nio.NioEventLoopGroup@5c93dd80 in 1003 ms
> 2016-01-14 22:46:39,470 [pool-6-thread-1] INFO  
> o.a.drill.exec.service.ServiceEngine - closed userServer in 1004 ms
> 2016-01-14 22:46:39,470 [pool-6-thread-2] INFO  
> o.a.drill.exec.service.ServiceEngine - closed dataPool in 1005 ms
> 2016-01-14 22:46:39,483 [Drillbit-ShutdownHook#0] WARN  
> o.apache.drill.exec.work.WorkManager - Closing WorkManager but there are 2 
> running fragments.
> 2016-01-14 22:46:41,489 [Drillbit-ShutdownHook#0] ERROR 
> o.a.d.exec.server.BootStrapContext - Pool did not terminate
> 2016-01-14 22:46:41,498 [Drillbit-ShutdownHook#0] WARN  
> o.apache.drill.exec.server.Drillbit - Failure on close()
> java.lang.RuntimeException: Exception while closing
> at 
> org.apache.drill.common.DrillAutoCloseables.closeNoChecked(DrillAutoCloseables.java:46)
>  ~[drill-common-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
> at 
> org.apache.drill.exec.server.BootStrapContext.close(BootStrapContext.java:127)
>  

[jira] [Updated] (DRILL-4919) Fix select count(1) / count(*) on csv with header

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4919:
-
Reviewer: Krystal  (was: Gautam Kumar Parai)

> Fix select count(1) / count(*) on csv with header
> -
>
> Key: DRILL-4919
> URL: https://issues.apache.org/jira/browse/DRILL-4919
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.8.0
>Reporter: F Méthot
>Assignee: Arina Ielchiieva
>Priority: Minor
>  Labels: ready-to-commit
> Fix For: 1.10.0
>
>
> This happens since 1.8.
> Dataset (I used extended char for display purpose) test.csvh:
> a,b,c,d\n
> 1,2,3,4\n
> 5,6,7,8\n
> Storage config:
> "csvh": {
>   "type": "text",
>   "extensions" : [
>   "csvh"
>],
>"extractHeader": true,
>"delimiter": ","
>   }
> select count(1) from dfs.`test.csvh`
> Error: UNSUPPORTED_OPERATION ERROR: With extractHeader enabled, only header 
> names are supported
> column name columns
> column index
> Fragment 0:0





[jira] [Updated] (DRILL-4864) Add ANSI format for date/time functions

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4864:
-
Reviewer: Krystal  (was: Paul Rogers)

> Add ANSI format for date/time functions
> ---
>
> Key: DRILL-4864
> URL: https://issues.apache.org/jira/browse/DRILL-4864
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Serhii Harnyk
>Assignee: Serhii Harnyk
>  Labels: doc-impacting
> Fix For: 1.10.0
>
>
> TO_DATE() exposes the Joda string formatting conventions in the SQL layer. 
> This does not follow the SQL conventions used by ANSI and many other 
> database engines on the market.
> Add new UDFs: 
> * sql_to_date(String, Format), 
> * sql_to_time(String, Format), 
> * sql_to_timestamp(String, Format)
> that require the Postgres datetime format.
> Table of supported Postgres patterns
> ||Pattern name||Postgres format||
> |Full name of day|day|
> |Day of year|ddd|
> |Day of month|dd|
> |Day of week|d|
> |Name of month|month|
> |Abbreviated name of month|mon|
> |Full era name|ee|
> |Name of day|dy|
> |Time zone|tz|
> |Hour 12|hh|
> |Hour 12|hh12|
> |Hour 24|hh24|
> |Minute of hour|mi|
> |Second of minute|ss|
> |Millisecond of minute|ms|
> |Week of year|ww|
> |Month|mm|
> |Halfday am|am|
> |Year|y|
> |ref.|https://www.postgresql.org/docs/8.2/static/functions-formatting.html|
> Table of acceptable Postgres pattern modifiers, which may be used in Format 
> string
> ||Description||Pattern||
> |fill mode (suppress padding blanks and zeroes)|fm |
> |fixed format global option (see usage notes)|fx |
> |translation mode (print localized day and month names based on 
> lc_messages)|tm |
> |spell mode (not yet implemented)|sp|
> |ref.|https://www.postgresql.org/docs/8.2/static/functions-formatting.html|
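A hedged usage sketch of the proposed UDFs, contrasting them with the existing Joda-style TO_DATE (the sql_to_date call is hypothetical until the functions above land):
{code:sql}
-- Existing Joda-style function vs. proposed Postgres-style function
SELECT TO_DATE('2016-09-22', 'yyyy-MM-dd')     AS joda_style,
       sql_to_date('2016-09-22', 'yyyy-mm-dd') AS postgres_style
FROM (VALUES(1));
{code}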





[jira] [Updated] (DRILL-4812) Wildcard queries fail on Windows

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4812:
-
Reviewer: Kunal Khatua  (was: Paul Rogers)

> Wildcard queries fail on Windows
> 
>
> Key: DRILL-4812
> URL: https://issues.apache.org/jira/browse/DRILL-4812
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.7.0
> Environment: Windows 7
>Reporter: Mike Lavender
>  Labels: easyfix, easytest, ready-to-commit, windows
> Fix For: 1.10.0
>
>
> Wildcards within the path of a query are not handled on Windows and result in 
> a "String index out of range" exception.
> for example:
> {noformat}
> 0: jdbc:drill:zk=local> SELECT SUM(qty) as num FROM 
> dfs.parquet.`/trends/2016/1/*/*/3701`;
> Error: VALIDATION ERROR: String index out of range: -1
> SQL Query null
> {noformat}
> 
> The problem exists within:
> exec\java-exec\src\main\java\org\apache\drill\exec\store\dfs\FileSelection.java
> private static Path handleWildCard(final String root)
> This function looks for the index of the system-specific PATH_SEPARATOR, 
> which on Windows is '\' (from System.getProperty("file.separator")). The 
> path passed to handleWildCard will never contain that separator, because the 
> Path constructor (from org.apache.hadoop.fs.Path) normalizes all path 
> separators to '/'.
> NOTE:
> private static String removeLeadingSlash(String path)
> in that same file explicitly looks for '/' and does not use the system 
> specific PATH_SEPARATOR.
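A minimal self-contained sketch of the indexing bug described above. The helper `parentOfWildcard` is a hypothetical stand-in for `FileSelection.handleWildCard`; it shows why indexing on the platform separator fails on Windows while indexing on '/' works:

```java
public class WildcardPathDemo {
    // Hypothetical stand-in for FileSelection.handleWildCard: strip the
    // wildcard tail and return the parent-directory portion of the path.
    static String parentOfWildcard(String path, char separator) {
        int firstWildcard = path.indexOf('*');
        // On Windows the system separator is '\\', but Hadoop's Path
        // normalizes every separator to '/', so searching for '\\'
        // yields -1 and substring() throws "String index out of range: -1".
        int lastSep = path.lastIndexOf(separator, firstWildcard);
        return path.substring(0, lastSep); // throws if lastSep == -1
    }

    public static void main(String[] args) {
        String normalized = "/trends/2016/1/*/*/3701"; // as Path stores it
        // Fix: always index on '/', never on the platform separator.
        System.out.println(parentOfWildcard(normalized, '/')); // /trends/2016/1
        try {
            parentOfWildcard(normalized, '\\'); // simulates Windows behavior
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("String index out of range, as reported above");
        }
    }
}
```

This mirrors the note above that removeLeadingSlash already hard-codes '/'; the same convention fixes handleWildCard.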





[jira] [Updated] (DRILL-4764) Parquet file with INT_16, etc. logical types not supported by simple SELECT

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4764:
-
Reviewer: Rahul Challapalli  (was: Parth Chandra)

> Parquet file with INT_16, etc. logical types not supported by simple SELECT
> ---
>
> Key: DRILL-4764
> URL: https://issues.apache.org/jira/browse/DRILL-4764
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
>Reporter: Paul Rogers
>Assignee: Serhii Harnyk
> Fix For: 1.10.0
>
> Attachments: int_16.parquet, int_8.parquet, uint_16.parquet, 
> uint_32.parquet, uint_8.parquet
>
>
> Create a Parquet file with the following schema:
> message int16Data { required int32 index; required int32 value (INT_16); }
> Store it as int_16.parquet in the local file system. Query it with:
> SELECT * from `local`.`root`.`int_16.parquet`;
> The result, in the web UI, is this error:
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> UnsupportedOperationException: unsupported type: INT32 INT_16 Fragment 0:0 
> [Error Id: c63f66b4-e5a9-4a35-9ceb-546b74645dd4 on 172.30.1.28:31010]
> The INT_16 logical (or "original") type simply tells consumers of the file 
> that the data is actually a 16-bit signed int. Presumably, this should tell 
> Drill to use the SmallIntVector (or NullableSmallIntVector) class for 
> storage. Without supporting this annotation, even 16-bit integers must be 
> stored as 32-bits within Drill.





[jira] [Updated] (DRILL-4301) OOM : Unable to allocate sv2 for 1000 records, and not enough batchGroups to spill.

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4301:
-
Reviewer: Rahul Challapalli

> OOM : Unable to allocate sv2 for 1000 records, and not enough batchGroups to 
> spill.
> ---
>
> Key: DRILL-4301
> URL: https://issues.apache.org/jira/browse/DRILL-4301
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Flow
>Affects Versions: 1.5.0
> Environment: 4 node cluster
>Reporter: Khurram Faraaz
>Assignee: Paul Rogers
> Fix For: 1.10.0
>
>
> Query below in Functional tests, fails due to OOM 
> {code}
> select * from dfs.`/drill/testdata/metadata_caching/fewtypes_boolpartition` 
> where bool_col = true;
> {code}
> Drill version : drill-1.5.0
> JAVA_VERSION=1.8.0
> {noformat}
> version   commit_id   commit_message  commit_time build_email 
> build_time
> 1.5.0-SNAPSHOT2f0e3f27e630d5ac15cdaef808564e01708c3c55
> DRILL-4190 Don't hold on to batches from left side of merge join.   
> 20.01.2016 @ 22:30:26 UTC   Unknown 20.01.2016 @ 23:48:33 UTC
> framework/framework/resources/Functional/metadata_caching/data/bool_partition1.q
>  (connection: 808078113)
> [#1378] Query failed: 
> oadd.org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: 
> One or more nodes ran out of memory while executing the query.
> Unable to allocate sv2 for 1000 records, and not enough batchGroups to spill.
> batchGroups.size 0
> spilledBatchGroups.size 0
> allocated memory 48326272
> allocator limit 46684427
> Fragment 0:0
> [Error Id: 97d58ea3-8aff-48cf-a25e-32363b8e0ecd on drill-demod2:31010]
>   at 
> oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119)
>   at 
> oadd.org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113)
>   at 
> oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46)
>   at 
> oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31)
>   at oadd.org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:67)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:374)
>   at 
> oadd.org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:252)
>   at 
> oadd.org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:123)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:285)
>   at 
> oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:257)
>   at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> oadd.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
>   at 
> oadd.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>   at 
> 

[jira] [Updated] (DRILL-4280) Kerberos Authentication

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4280:
-
Reviewer: Chun Chang  (was: Chunhui Shi)

> Kerberos Authentication
> ---
>
> Key: DRILL-4280
> URL: https://issues.apache.org/jira/browse/DRILL-4280
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Keys Botzum
>Assignee: Sudheesh Katkam
>  Labels: security
> Fix For: 1.10.0
>
>
> Drill should support Kerberos based authentication from clients. This means 
> that both the ODBC and JDBC drivers as well as the web/REST interfaces should 
> support inbound Kerberos. For Web this would most likely be SPNEGO while for 
> ODBC and JDBC this will be more generic Kerberos.
> Since Hive and much of Hadoop supports Kerberos there is a potential for a 
> lot of reuse of ideas if not implementation.
> Note that this is related to but not the same as 
> https://issues.apache.org/jira/browse/DRILL-3584 





[jira] [Updated] (DRILL-4217) Query parquet file treat INT_16 & INT_8 as INT32

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4217:
-
Reviewer: Rahul Challapalli

> Query parquet file treat INT_16 & INT_8 as INT32
> 
>
> Key: DRILL-4217
> URL: https://issues.apache.org/jira/browse/DRILL-4217
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Data Types
>Reporter: Low Chin Wei
>Assignee: Serhii Harnyk
> Fix For: 1.10.0
>
>
> Encounter this issue while trying to query a parquet file:
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> UnsupportedOperationException: unsupported type: INT32 INT_16 Fragment 1:1 
> We can treat the following field types as INTEGER before support of Short & 
> Byte is implemented: 
> - INT32 INT_16
> - INT32 INT_8





[jira] [Updated] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-3562:
-
Reviewer: Rahul Challapalli  (was: Arina Ielchiieva)

> Query fails when using flatten on JSON data where some documents have an 
> empty array
> 
>
> Key: DRILL-3562
> URL: https://issues.apache.org/jira/browse/DRILL-3562
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.1.0
>Reporter: Philip Deegan
>Assignee: Serhii Harnyk
> Fix For: 1.10.0
>
>
> Drill query fails when using flatten if some records contain an empty array 
> {noformat}
> SELECT COUNT(*) FROM (SELECT FLATTEN(t.a.b.c) AS c FROM dfs.`flat.json` t) 
> flat WHERE flat.c.d.e = 'f' limit 1;
> {noformat}
> Succeeds on 
> { "a": { "b": { "c": [  { "d": {  "e": "f" } } ] } } }
> Fails on
> { "a": { "b": { "c": [] } } }
> Error
> {noformat}
> Error: SYSTEM ERROR: ClassCastException: Cannot cast 
> org.apache.drill.exec.vector.NullableIntVector to 
> org.apache.drill.exec.vector.complex.RepeatedValueVector
> {noformat}
> Is it possible to ignore the empty arrays, or do they need to be populated 
> with dummy data?





[jira] [Assigned] (DRILL-5316) C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children completed with ZOK

2017-03-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala reassigned DRILL-5316:


Assignee: Chun Chang
Reviewer: Chun Chang  (was: Sorabh Hamirwasia)

> C++ Client Crashes When drillbitsVector.count is 0 after zoo_get_children 
> completed with ZOK
> 
>
> Key: DRILL-5316
> URL: https://issues.apache.org/jira/browse/DRILL-5316
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - C++
>Reporter: Rob Wu
>Assignee: Chun Chang
>Priority: Critical
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> When connecting to drillbit with Zookeeper, occasionally the C++ client would 
> crash without any reason.
> A further look into the code revealed that during this call 
> rc = zoo_get_children(p_zh.get(), m_path.c_str(), 0, &drillbitsVector); 
> zoo_get_children returns ZOK (0) but drillbitsVector.count is 0.
> This causes drillbits to stay empty and thus 
> causes err = zook.getEndPoint(drillbits[drillbits.size() -1], endpoint); to 
> crash
> Size check should be done to prevent this from happening
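The missing guard is small. A minimal sketch of the idea in Java (the actual fix belongs in the C++ client's ZooKeeper handling; `lastOrNull` is an illustrative name, not Drill code):

```java
import java.util.List;

public class LastDrillbit {
    // Guard against the empty-list case: zoo_get_children can return ZOK
    // while handing back zero children, so blindly indexing size() - 1
    // (as the C++ client did) reads past the end of the vector.
    static String lastOrNull(List<String> drillbits) {
        if (drillbits == null || drillbits.isEmpty()) {
            return null;  // caller should report "no drillbits found" instead of crashing
        }
        return drillbits.get(drillbits.size() - 1);
    }
}
```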





[jira] [Updated] (DRILL-4872) NPE from CTAS partitioned by a projected casted null

2017-03-20 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4872:
-
Reviewer: Rahul Challapalli

> NPE from CTAS partitioned by a projected casted null
> 
>
> Key: DRILL-4872
> URL: https://issues.apache.org/jira/browse/DRILL-4872
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.7.0
>Reporter: Boaz Ben-Zvi
>Assignee: Arina Ielchiieva
>  Labels: NPE
> Fix For: 1.10.0
>
>
> Extracted from DRILL-3898 : Running the same test case on a smaller table ( 
> store_sales.dat from TPCDS SF 1) has no space issues, but there is a Null 
> Pointer Exception from the projection:
> Caused by: java.lang.NullPointerException: null
>   at 
> org.apache.drill.exec.expr.fn.impl.ByteFunctionHelpers.compare(ByteFunctionHelpers.java:100)
>  ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.test.generated.ProjectorGen1.doEval(ProjectorTemplate.java:49)
>  ~[na:na]
>   at 
> org.apache.drill.exec.test.generated.ProjectorGen1.projectRecords(ProjectorTemplate.java:62)
>  ~[na:na]
>   at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork(ProjectRecordBatch.java:199)
>  ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
> A simplified version of the test case:
> 0: jdbc:drill:zk=local> create table dfs.tmp.ttt partition by ( x ) as select 
> case when columns[8] = '' then cast(null as varchar(10)) else cast(columns[8] 
> as varchar(10)) end as x FROM 
> dfs.`/Users/boazben-zvi/data/store_sales/store_sales.dat`;
> Error: SYSTEM ERROR: NullPointerException
>  





[jira] [Updated] (DRILL-4969) Basic implementation for displaySize

2016-11-07 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4969:
-
Reviewer: Chun Chang  (was: Sudheesh Katkam)

> Basic implementation for displaySize
> 
>
> Key: DRILL-4969
> URL: https://issues.apache.org/jira/browse/DRILL-4969
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Laurent Goujon
>Assignee: Laurent Goujon
> Fix For: 1.9.0
>
>
> Display size is fixed at 10, but for most types the display size is well 
> defined, as shown in the ODBC table:
> https://msdn.microsoft.com/en-us/library/ms713974(v=vs.85).aspx



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4560) ZKClusterCoordinator does not call DrillbitStatusListener.drillbitRegistered for new bits

2016-11-07 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4560:
-
Reviewer: Abhishek Girish  (was: Sorabh Hamirwasia)

> ZKClusterCoordinator does not call DrillbitStatusListener.drillbitRegistered 
> for new bits
> -
>
> Key: DRILL-4560
> URL: https://issues.apache.org/jira/browse/DRILL-4560
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components:  Server
>Affects Versions: 1.6.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.9.0
>
>
> ZKClusterCoordinator notifies listeners of type DrillbitStatusListener when 
> drillbits disappear from ZooKeeper. The YARN Application Master (AM) also 
> needs to know when bits register themselves with ZK. So, ZKClusterCoordinator 
> should change to detect new Drill-bits, then call 
> DrillbitStatusListener.drillbitRegistered with the new Drill-bits.
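A minimal sketch of the proposed behavior, using a hypothetical stand-in for the DrillbitStatusListener interface (names and signatures here are illustrative, not Drill's actual API):

```java
import java.util.HashSet;
import java.util.Set;

public class EndpointDiff {
    // Hypothetical stand-in for Drill's DrillbitStatusListener callbacks.
    interface StatusListener {
        void drillbitRegistered(Set<String> newBits);
        void drillbitUnregistered(Set<String> goneBits);
    }

    // On each ZooKeeper refresh, diff the freshly read endpoint set against
    // the previously known set and fire callbacks for additions as well as
    // removals; the ticket notes that only removals were being reported.
    static void updateEndpoints(Set<String> known, Set<String> current,
                                StatusListener listener) {
        Set<String> registered = new HashSet<>(current);
        registered.removeAll(known);            // drillbits that just appeared
        Set<String> unregistered = new HashSet<>(known);
        unregistered.removeAll(current);        // drillbits that went away
        if (!registered.isEmpty()) {
            listener.drillbitRegistered(registered);
        }
        if (!unregistered.isEmpty()) {
            listener.drillbitUnregistered(unregistered);
        }
        known.clear();                          // remember the new state
        known.addAll(current);
    }
}
```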





[jira] [Updated] (DRILL-4581) Various problems in the Drill startup scripts

2016-09-14 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4581:
-
Reviewer: Abhishek Girish  (was: Sudheesh Katkam)

[~agirish] assigning for verification

> Various problems in the Drill startup scripts
> -
>
> Key: DRILL-4581
> URL: https://issues.apache.org/jira/browse/DRILL-4581
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components:  Server
>Affects Versions: 1.6.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.8.0
>
>
> Noticed the following in drillbit.sh:
> 1) Comment: DRILL_LOG_DIR   Where log files are stored.  PWD by default.
> Code: DRILL_LOG_DIR=/var/log/drill or, if it does not exist, $DRILL_HOME/log
> 2) Comment: DRILL_PID_DIR   The pid files are stored. /tmp by default.
> Code: DRILL_PID_DIR=$DRILL_HOME
> 3) Redundant checking of JAVA_HOME. drillbit.sh sources drill-config.sh which 
> checks JAVA_HOME. Later, drillbit.sh checks it again. The second check is 
> both unnecessary and prints a less informative message than the 
> drill-config.sh check. Suggestion: Remove the JAVA_HOME check in drillbit.sh.
> 4) Though drill-config.sh carefully checks JAVA_HOME, it does not export the 
> JAVA_HOME variable. Perhaps this is why drillbit.sh repeats the check? 
> Recommended: export JAVA_HOME from drill-config.sh.
> 5) Both drillbit.sh and the sourced drill-config.sh check DRILL_LOG_DIR and 
> set the default value. Drill-config.sh defaults to /var/log/drill, or if that 
> fails, to $DRILL_HOME/log. Drillbit.sh just sets /var/log/drill and does not 
> handle the case where that directory is not writable. Suggested: remove the 
> check in drillbit.sh.
> 6) Drill-config.sh checks the writability of the DRILL_LOG_DIR by touching 
> sqlline.log, but does not delete that file, leaving a bogus, empty client log 
> file on the drillbit server. Recommendation: use bash commands instead.
> 7) The implementation of the above check is a bit awkward. It has a fallback 
> case with somewhat awkward logic. Clean this up.
> 8) drillbit.sh, but not drill-config.sh, attempts to create /var/log/drill if 
> it does not exist. Recommended: decide on a single choice, implement it in 
> drill-config.sh.
> 9) drill-config.sh checks if $DRILL_CONF_DIR is a directory. If not, defaults 
> it to $DRILL_HOME/conf. This can lead to subtle errors. If I use
> drillbit.sh --config /misspelled/path
> where I mistype the path, I won't get an error, I get the default config, 
> which may not at all be what I want to run. Recommendation: if the value of 
> DRILL_CONF_DIR is passed into the script (as a variable or via --config), 
> then that directory must exist. Else, use the default.
> 10) drill-config.sh exports, but may not set, HADOOP_HOME. This may be left 
> over from the original Hadoop script that the Drill script was based upon. 
> Recommendation: export only in the case that HADOOP_HOME is set for Cygwin.
> 11) Drill-config.sh checks JAVA_HOME and prints a big, bold error message to 
> stderr if JAVA_HOME is not set. Then, it checks the Java version and prints a 
> different message (to stdout) if the version is wrong. Recommendation: use 
> the same format (and stderr) for both.
> 12) Similarly, other Java checks later in the script produce messages to 
> stdout, not stderr.
> 13) Drill-config.sh searches $JAVA_HOME to find java/java.exe and verifies 
> that it is executable. The script then throws away what we just found. Then, 
> drill-bit.sh tries to recreate this information as:
> JAVA=$JAVA_HOME/bin/java
> This is wrong in two ways: 1) it ignores the actual java location and assumes 
> it, and 2) it does not handle the java.exe case that drill-config.sh 
> carefully worked out.
> Recommendation: export JAVA from drill-config.sh and remove the above line 
> from drillbit.sh.
> 14) drillbit.sh presumably takes extra arguments like this:
> drillbit.sh -Dvar0=value0 --config /my/conf/dir start -Dvar1=value1 
> -Dvar2=value2 -Dvar3=value3
> The -D bit allows the user to override config variables at the command line. 
> But, the scripts don't use the values.
> A) drill-config.sh consumes --config /my/conf/dir after consuming the leading 
> arguments:
> while [ $# -gt 1 ]; do
>   if [ "--config" = "$1" ]; then
> shift
> confdir=$1
> shift
> DRILL_CONF_DIR=$confdir
>   else
> # Presume we are at end of options and break
> break
>   fi
> done
> B) drill-bit.sh will discard the var1:
> startStopStatus=$1 <-- grabs "start"
> shift
> command=drillbit
> shift   <-- Consumes -Dvar1=value1
> C) Remaining values passed back into drillbit.sh:
> args=$@
> nohup $thiscmd internal_start $command $args
> D) Second invocation discards -Dvar2=value2 as described above.
> E) Remaining values 

[jira] [Updated] (DRILL-4857) When no partition pruning occurs with metadata caching there's a performance regression

2016-08-30 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4857:
-
Reviewer: Dechang Gu

> When no partition pruning occurs with metadata caching there's a performance 
> regression
> ---
>
> Key: DRILL-4857
> URL: https://issues.apache.org/jira/browse/DRILL-4857
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata, Query Planning & Optimization
>Affects Versions: 1.7.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 1.8.0
>
>
> After DRILL-4530, we see the (expected) performance improvements in planning 
> time with metadata cache for cases where partition pruning got applied.  
> However, in cases where it did not get applied and for sufficiently large 
> number of files (tested with up to 400K files),  there's performance 
> regression.  Part of this was addressed by DRILL-4846.   This JIRA is to 
> track some remaining fixes to address the regression.  





[jira] [Updated] (DRILL-4854) Incorrect logic in log directory checks in drill-config.sh

2016-08-30 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4854:
-
Reviewer: Abhishek Girish

> Incorrect logic in log directory checks in drill-config.sh
> --
>
> Key: DRILL-4854
> URL: https://issues.apache.org/jira/browse/DRILL-4854
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.8.0
>
>
> The recent changes to the launch scripts introduced a subtle bug in the logic 
> that verifies the log directory:
>   if [[ ! -d "$DRILL_LOG_DIR" && ! -w "$DRILL_LOG_DIR" ]]; then
> ...
> if [[ ! -d "$DRILL_LOG_DIR" && ! -w "$DRILL_LOG_DIR" ]]; then
> In both cases, the operator should be or ("||").
> That is, if either the item is not a directory, or it is a directory but is 
> not writable, then do the fall-back steps.





[jira] [Updated] (DRILL-4846) Eliminate extra operations during metadata cache pruning

2016-08-30 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4846:
-
Reviewer: Dechang Gu

> Eliminate extra operations during metadata cache pruning
> 
>
> Key: DRILL-4846
> URL: https://issues.apache.org/jira/browse/DRILL-4846
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Metadata
>Affects Versions: 1.7.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 1.8.0
>
>
> While doing performance testing for DRILL-4530 using a new data set and 
> queries, we found two potential performance issues: (a) the metadata cache 
> was being read twice in some cases and (b) the checking for directory 
> modification time was being done twice, once as part of the first phase of 
> directory-based pruning and subsequently after the second phase pruning.   
> This check gets expensive for large number of directories.   Creating this 
> JIRA to track fixes for these issues. 





[jira] [Updated] (DRILL-4727) Exclude netty from HBase Client's transitive dependencies

2016-08-16 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4727:
-
Reviewer: Abhishek Girish

> Exclude netty from HBase Client's transitive dependencies
> -
>
> Key: DRILL-4727
> URL: https://issues.apache.org/jira/browse/DRILL-4727
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.7.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 1.7.0
>
>
> Reported on dev/user list after moving to HBase 1.1
> {noformat}
> Hi Aditya,
> I tested the latest version and got this exception and the drillbit fail to 
> startup .
> Exception in thread "main" java.lang.NoSuchMethodError: 
> io.netty.util.UniqueName.<init>(Ljava/lang/String;)V
> at io.netty.channel.ChannelOption.(ChannelOption.java:136)
> at io.netty.channel.ChannelOption.valueOf(ChannelOption.java:99)
> at io.netty.channel.ChannelOption.(ChannelOption.java:42)
> at org.apache.drill.exec.rpc.BasicServer.(BasicServer.java:63)
> at 
> org.apache.drill.exec.rpc.user.UserServer.(UserServer.java:74)
> at 
> org.apache.drill.exec.service.ServiceEngine.(ServiceEngine.java:78)
> at org.apache.drill.exec.server.Drillbit.(Drillbit.java:108)
> at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:285)
> at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:271)
> at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:267)
> It will working if I remove jars/3rdparty/netty-all-4.0.23.Final.jar, the 
> drill can startup. I think there have some package dependency version issue, 
> do you think so ?
> {noformat}





[jira] [Updated] (DRILL-4768) Drill may leak hive meta store connection if hive meta store client call hits error

2016-08-16 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4768:
-
Reviewer: Chun Chang

> Drill may leak hive meta store connection if hive meta store client call hits 
> error
> ---
>
> Key: DRILL-4768
> URL: https://issues.apache.org/jira/browse/DRILL-4768
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
> Fix For: 1.8.0
>
>
> We are seeing one drillbit create hundreds of connections to the hive meta 
> store. This indicates that drill is leaking those connections, and did not 
> close them properly. When such leaking happens, it may prevent 
> other applications from connecting to the hive meta store. 
> It seems one cause of the connection leak is when a hive meta store client 
> call hits an exception. 
>  
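The usual remedy for this class of leak is to tie the client's lifetime to the call with try-with-resources, so the connection is closed even when the metastore call throws. A generic sketch with a counting stand-in client (not Hive's actual HiveMetaStoreClient):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class MetaStoreCallDemo {
    // Stand-in for a metastore client; counts live connections so a leak
    // (or its absence) is observable.
    static final AtomicInteger OPEN = new AtomicInteger();

    static class FakeClient implements AutoCloseable {
        FakeClient() { OPEN.incrementAndGet(); }
        void getTable(String name) {
            throw new RuntimeException("metastore call failed");  // simulate the error path
        }
        @Override public void close() { OPEN.decrementAndGet(); }
    }

    // Leak-free pattern: try-with-resources guarantees close() runs even
    // when the metastore call throws mid-flight.
    static void safeCall(String table) {
        try (FakeClient client = new FakeClient()) {
            client.getTable(table);
        } catch (RuntimeException e) {
            // handle or rethrow; the connection is already closed either way
        }
    }
}
```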





[jira] [Updated] (DRILL-4801) Setting extractHeader attribute for CSV format does not propagate to all drillbits

2016-08-16 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4801:
-
Reviewer: Krystal

> Setting extractHeader attribute for CSV format does not propagate to all 
> drillbits 
> ---
>
> Key: DRILL-4801
> URL: https://issues.apache.org/jira/browse/DRILL-4801
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - CLI, Client - HTTP
>Affects Versions: 1.7.0
>Reporter: Krystal
>Assignee: Arina Ielchiieva
> Fix For: 1.8.0
>
>
> I have multiple drillbits running.  From the web UI of one drillbit, I added 
> "extractHeader": true to the csv format.  I logged in to the Web UI of a 
> different drillbit and did not see the added attribute.
> I tried the same for the TSV format and that worked as expected, as the change 
> got propagated to all drillbits. 





[jira] [Updated] (DRILL-3149) TextReader should support multibyte line delimiters

2016-08-16 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-3149:
-
Reviewer: Krystal

> TextReader should support multibyte line delimiters
> ---
>
> Key: DRILL-3149
> URL: https://issues.apache.org/jira/browse/DRILL-3149
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Text & CSV
>Affects Versions: 1.0.0, 1.1.0
>Reporter: Jim Scott
>Assignee: Arina Ielchiieva
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.8.0
>
>
> lineDelimiter in the TextFormatConfig doesn't support \r\n for record 
> delimiters.





[jira] [Updated] (DRILL-4786) Improve metadata cache performance for queries with multiple partitions

2016-08-16 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4786:
-
Reviewer: Rahul Challapalli

> Improve metadata cache performance for queries with multiple partitions
> ---
>
> Key: DRILL-4786
> URL: https://issues.apache.org/jira/browse/DRILL-4786
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Metadata, Query Planning & Optimization
>Affects Versions: 1.7.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 1.8.0
>
>
> Consider  queries of the following type run against Parquet data with 
> metadata caching:   
> {noformat}
> SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 IN ('1', '2', '3')
> {noformat}
> For such queries, Drill will read the metadata cache file from the top level 
> directory 'A', which is not very efficient since we are only interested in 
> the files  from some subdirectories of 'B'.   DRILL-4530 improves the 
> performance of such queries when the leaf level directory is a single 
> partition.  Here, there are 3 subpartitions due to the IN list.   We can 
> build upon the DRILL-4530 enhancement by at least reading the cache file from 
> the immediate parent level  `/A/B`  instead of the top level.  
> The goal of this JIRA is to improve performance for such types of queries.  





[jira] [Updated] (DRILL-4816) sqlline -f failed to read the query file

2016-08-16 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4816:
-
Reviewer: Abhishek Girish

> sqlline -f failed to read the query file
> 
>
> Key: DRILL-4816
> URL: https://issues.apache.org/jira/browse/DRILL-4816
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.8.0
> Environment: Red Hat 2.6.32-358.el6
>Reporter: Dechang Gu
>Assignee: Parth Chandra
> Fix For: 1.8.0
>
>
> Installed Apache Drill master (commit id: 4e1bdac) on a 10 node cluster.
> sqlline -u "jdbc:drill:schema=dfs.xxxParquet" -f refresh_meta_dirs.sql 
> hit the "No such file or directory" error:
> 16:34:47,956 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could 
> NOT find resource [logback.groovy]
> 16:34:47,956 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could 
> NOT find resource [logback-test.xml]
> 16:34:47,960 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found 
> resource [logback.xml] at [file:/mapr/drillPerf/
> drillbit/conf/logback.xml]
> 16:34:47,961 |-WARN in ch.qos.logback.classic.LoggerContext[default] - 
> Resource [logback.xml] occurs multiple times on the cl
> asspath.
> 16:34:47,961 |-WARN in ch.qos.logback.classic.LoggerContext[default] - 
> Resource [logback.xml] occurs at [file:/opt/mapr/drill
> /drill-1.8.0/conf/logback.xml]
> 16:34:47,961 |-WARN in ch.qos.logback.classic.LoggerContext[default] - 
> Resource [logback.xml] occurs at [file:/mapr/drillPerf
> /drillbit/conf/logback.xml]
> 16:34:48,163 |-INFO in 
> ch.qos.logback.classic.joran.action.ConfigurationAction - debug attribute not 
> set
> 16:34:48,168 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - 
> About to instantiate appender of type [ch.qos.logbac
> k.core.rolling.RollingFileAppender]
> 16:34:48,182 |-INFO in ch.qos.logback.core.joran.action.AppenderAction - 
> Naming appender as [FILE]
> 16:34:48,246 |-INFO in 
> ch.qos.logback.core.rolling.FixedWindowRollingPolicy@29989e7c - No 
> compression will be used
> 16:34:48,246 |-WARN in 
> ch.qos.logback.core.rolling.FixedWindowRollingPolicy@29989e7c - Large window 
> sizes are not allowed.
> 16:34:48,246 |-WARN in 
> ch.qos.logback.core.rolling.FixedWindowRollingPolicy@29989e7c - MaxIndex 
> reduced to 21
> 16:34:48,257 |-INFO in 
> ch.qos.logback.core.joran.action.NestedComplexPropertyIA - Assuming default 
> type [ch.qos.logback.class
> ic.encoder.PatternLayoutEncoder] for [encoder] property
> 16:34:48,319 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[FILE] 
> - Active log file name: /var/log/drill/drillbit_
> ucs-node1.perf.lab.log
> 16:34:48,319 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[FILE] 
> - File property is set to [/var/log/drill/drillb
> it_ucs-node1.perf.lab.log]
> 16:34:48,321 |-INFO in ch.qos.logback.classic.joran.action.LoggerAction - 
> Setting additivity of logger [org.apache.drill] to 
> false
> 16:34:48,321 |-INFO in ch.qos.logback.classic.joran.action.LevelAction - 
> org.apache.drill level set to INFO
> 16:34:48,322 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - 
> Attaching appender named [FILE] to Logger[org.apa
> che.drill]
> 16:34:48,323 |-INFO in ch.qos.logback.classic.joran.action.LevelAction - ROOT 
> level set to INFO
> 16:34:48,323 |-INFO in ch.qos.logback.core.joran.action.AppenderRefAction - 
> Attaching appender named [FILE] to Logger[ROOT]
> 16:34:48,323 |-INFO in 
> ch.qos.logback.classic.joran.action.ConfigurationAction - End of 
> configuration.
> 16:34:48,324 |-INFO in 
> ch.qos.logback.classic.joran.JoranConfigurator@62ccf439 - Registering current 
> configuration as safe fa
> llback point
> -u[i+1] (No such file or directory)
> Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
> Running the query from inside the sqlline connection is OK.





[jira] [Updated] (DRILL-3710) Make the 20 in-list optimization configurable

2016-08-16 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-3710:
-
Reviewer: Chun Chang

> Make the 20 in-list optimization configurable
> -
>
> Key: DRILL-3710
> URL: https://issues.apache.org/jira/browse/DRILL-3710
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.1.0
>Reporter: Hao Zhu
>Assignee: Gautam Kumar Parai
> Fix For: 1.8.0
>
>
> If Drill has more than 20 items in an IN list, Drill can do an optimization to 
> convert that IN list into a small hash table in memory, and then do a table 
> join instead.
> This can improve the performance of queries that have long IN lists.
> Could we make "20" configurable, so that we do not need to add duplicate/junk 
> IN-list entries just to push the count past 20?
> Sample query is :
> select count(*) from table where col in 
> (1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1);
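For context, the optimization in question replaces a chain of OR-ed equality tests with a single hash lookup once the list passes the threshold. A language-neutral sketch of the difference (illustrative only, not Drill's planner code):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class InListDemo {
    // Linear scan: what a chain of OR-ed comparisons effectively does,
    // costing O(n) comparisons per row for an n-item IN list.
    static boolean inListScan(int value, List<Integer> inList) {
        for (int candidate : inList) {
            if (candidate == value) return true;
        }
        return false;
    }

    // Hash lookup: what the in-list-to-hash-table rewrite buys,
    // an expected O(1) probe per row regardless of list length.
    static boolean inListHash(int value, Set<Integer> inSet) {
        return inSet.contains(value);
    }

    public static void main(String[] args) {
        List<Integer> inList = Arrays.asList(1, 5, 9, 42);
        Set<Integer> inSet = new HashSet<>(inList);
        System.out.println(inListScan(42, inList));  // true either way
        System.out.println(inListHash(42, inSet));
    }
}
```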





[jira] [Updated] (DRILL-4751) Remove dumpcat script from Drill distribution

2016-08-16 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4751:
-
Reviewer: Abhishek Girish

> Remove dumpcat script from Drill distribution
> -
>
> Key: DRILL-4751
> URL: https://issues.apache.org/jira/browse/DRILL-4751
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.6.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
> Fix For: 1.8.0
>
>
> The Drill distribution includes a "dumpcat" script in the $DRILL_HOME/bin 
> directory. However, no documentation exists for the tool. The only reference 
> on Apache Drill is from a JIRA.
> This appears to be a script used by developers in years past to diagnose 
> customer issues. In case the tool may be useful in the future, we will leave 
> it in the Git source tree, but omit it from the distribution (since we will 
> not test or document it.)





[jira] [Updated] (DRILL-4825) Wrong data with UNION ALL when querying different sub-directories under the same table

2016-08-16 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4825:
-
Reviewer: Rahul Challapalli

> Wrong data with UNION ALL when querying different sub-directories under the 
> same table
> --
>
> Key: DRILL-4825
> URL: https://issues.apache.org/jira/browse/DRILL-4825
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.6.0, 1.7.0, 1.8.0
>Reporter: Rahul Challapalli
>Assignee: Jinfeng Ni
>Priority: Critical
> Fix For: 1.8.0
>
> Attachments: l_3level.tgz
>
>
> git.commit.id.abbrev=0700c6b
> The below query returns wrong results: 
> {code}
> select count (*) from (
>   select l_orderkey, dir0 from l_3level t1 where t1.dir0 = 1 and 
> t1.dir1='one' and t1.dir2 = '2015-7-12'
>   union all 
>   select l_orderkey, dir0 from l_3level t2 where t2.dir0 = 1 and 
> t2.dir1='two' and t2.dir2 = '2015-8-12') data;
> +---------+
> | EXPR$0  |
> +---------+
> | 20      |
> +---------+
> {code}
> The wrong result is evident from the output of the below queries
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> select count (*) from (select 
> l_orderkey, dir0 from l_3level t2 where t2.dir0 = 1 and t2.dir1='two' and 
> t2.dir2 = '2015-8-12');
> +---------+
> | EXPR$0  |
> +---------+
> | 30      |
> +---------+
> 1 row selected (0.258 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select count (*) from (select 
> l_orderkey, dir0 from l_3level t2 where t2.dir0 = 1 and t2.dir1='one' and 
> t2.dir2 = '2015-7-12');
> +---------+
> | EXPR$0  |
> +---------+
> | 10      |
> +---------+
> {code}
> I attached the data set. Let me know if you need anything more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4759) Drill throwing array index out of bound exception when reading a parquet file written by map reduce program.

2016-08-16 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4759:
-
Reviewer: Chun Chang

> Drill throwing array index out of bound exception when reading a parquet file 
> written by map reduce program.
> 
>
> Key: DRILL-4759
> URL: https://issues.apache.org/jira/browse/DRILL-4759
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.7.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
> Fix For: 1.8.0
>
>
> An ArrayIndexOutOfBound exception is thrown while reading bigInt data type 
> from dictionary encoded parquet data. 





[jira] [Updated] (DRILL-4819) Update MapR version to 5.2.0

2016-08-16 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4819:
-
Reviewer: Abhishek Girish

> Update MapR version to 5.2.0
> 
>
> Key: DRILL-4819
> URL: https://issues.apache.org/jira/browse/DRILL-4819
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Tools, Build & Test
>Affects Versions: 1.8.0
>Reporter: Patrick Wong
>Assignee: Patrick Wong
> Fix For: 1.8.0
>
>






[jira] [Updated] (DRILL-4147) Union All operator runs in a single fragment

2016-08-16 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4147:
-
Reviewer: Robert Hou

> Union All operator runs in a single fragment
> 
>
> Key: DRILL-4147
> URL: https://issues.apache.org/jira/browse/DRILL-4147
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: amit hadke
>Assignee: Aman Sinha
> Fix For: 1.8.0
>
>
> A user noticed that running a select from a single directory is much faster 
> than a union all over two directories.
> (https://drill.apache.org/blog/2014/12/09/running-sql-queries-on-amazon-s3/#comment-2349732267)
>  
> It seems the UNION ALL operator doesn't parallelize sub scans (it uses 
> SINGLETON as its distribution type); everything runs in a single fragment.
> We may have to use SubsetTransformer in UnionAllPrule.





[jira] [Updated] (DRILL-4704) select statement behavior is inconsistent for decimal values in parquet

2016-08-16 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4704:
-
Reviewer: Rahul Challapalli

> select statement behavior is inconsistent for decimal values in parquet
> ---
>
> Key: DRILL-4704
> URL: https://issues.apache.org/jira/browse/DRILL-4704
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.6.0
> Environment: Windows 7 Pro, Java 1.8.0_91
>Reporter: Dave Oshinsky
> Fix For: 1.8.0
>
>
> A select statement that searches a parquet file for a decimal value matching 
> a specific value behaves inconsistently.  The query expressed most simply 
> finds nothing:
> 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where 
> employee_id = 100;
> +--+-+++---+---+
> | EMPLOYEE_ID  | FIRST_NAME  | LAST_NAME  | EMAIL  | PHONE_NUMBER  | 
> HIRE_DATE |
> +--+-+++---+---+
> +--+-+++---+---+
> No rows selected (0.348 seconds)
> The query can be modified to find the matching row in a few ways, such as the 
> following (using between instead of '=', changing 100 to 100.0, or casting as 
> decimal):
> 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where 
> employee_id between 100 and 100;
> +--+-+++---+---+
> | EMPLOYEE_ID  | FIRST_NAME  | LAST_NAME  | EMAIL  | PHONE_NUMBER  |   
> HIR |
> +--+-+++---+---+
> | 100  | Steven  | King   | SKING  | 515.123.4567  | 
> 2003-06-1 |
> +--+-+++---+---+
> 1 row selected (0.226 seconds)
> 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where 
> employee_id = 100.0;
> +--+-+++---+---+
> | EMPLOYEE_ID  | FIRST_NAME  | LAST_NAME  | EMAIL  | PHONE_NUMBER  |   
> HIR |
> +--+-+++---+---+
> | 100  | Steven  | King   | SKING  | 515.123.4567  | 
> 2003-06-1 |
> +--+-+++---+---+
> 1 row selected (0.259 seconds)
> 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where 
> cast(employee_id AS DECIMAL) = 100;
> +--+-+++---+---+
> | EMPLOYEE_ID  | FIRST_NAME  | LAST_NAME  | EMAIL  | PHONE_NUMBER  |   
> HIR |
> +--+-+++---+---+
> | 100  | Steven  | King   | SKING  | 515.123.4567  | 
> 2003-06-1 |
> +--+-+++---+---+
> 1 row selected (0.232 seconds)
> 0: jdbc:drill:zk=local>
> The schema of the parquet data that is being searched is as follows:
> $ java -jar parquet-tools*1.jar meta c:/archiveHR/HR.EMPLOYEES/1.parquet
> file:   file:/c:/archiveHR/HR.EMPLOYEES/1.parquet
> creator:parquet-mr version 1.8.1 (build 
> 4aba4dae7bb0d4edbcf7923ae1339f28fd3f7fcf)
> .
> file schema:HR.EMPLOYEES
> 
> EMPLOYEE_ID:REQUIRED FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:0
> FIRST_NAME: OPTIONAL BINARY O:UTF8 R:0 D:1
> LAST_NAME:  REQUIRED BINARY O:UTF8 R:0 D:0
> EMAIL:  REQUIRED BINARY O:UTF8 R:0 D:0
> PHONE_NUMBER:   OPTIONAL BINARY O:UTF8 R:0 D:1
> HIRE_DATE:  REQUIRED BINARY O:UTF8 R:0 D:0
> JOB_ID: REQUIRED BINARY O:UTF8 R:0 D:0
> SALARY: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> COMMISSION_PCT: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> MANAGER_ID: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> DEPARTMENT_ID:  OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> row group 1:RC:107 TS:9943 OFFSET:4
> 
> EMPLOYEE_ID: FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:4 SZ:360/355/0.99 
> VC:107 ENC:PLAIN,BIT_PACKED
> FIRST_NAME:  BINARY SNAPPY DO:0 FPO:364 SZ:902/1058/1.17 VC:107 
> ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
> LAST_NAME:   BINARY SNAPPY DO:0 FPO:1266 SZ:913//1.22 VC:107 
> ENC:PLAIN,BIT_PACKED
> EMAIL:   BINARY SNAPPY DO:0 FPO:2179 SZ:977/1184/1.21 VC:107 
> ENC:PLAIN,BIT_PACKED
> PHONE_NUMBER:BINARY SNAPPY DO:0 FPO:3156 SZ:750/1987/2.65 VC:107 
> ENC:PLAIN,RLE,BIT_PACKED
> HIRE_DATE:   BINARY SNAPPY DO:0 

[jira] [Updated] (DRILL-4766) FragmentExecutor should use EventProcessor and avoid blocking rpc threads

2016-08-16 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4766:
-
Reviewer: Rahul Challapalli

> FragmentExecutor should use EventProcessor and avoid blocking rpc threads
> -
>
> Key: DRILL-4766
> URL: https://issues.apache.org/jira/browse/DRILL-4766
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Flow
>Affects Versions: 1.7.0
>Reporter: Deneche A. Hakim
>Assignee: Sudheesh Katkam
>Priority: Minor
> Fix For: 1.8.0
>
>
> Currently, an RPC thread can block when trying to deliver a cancel or early 
> termination message to a blocked fragment executor.
> Foreman already uses an EventProcessor to avoid such scenarios; 
> FragmentExecutor could be improved to avoid blocking RPC threads as well.
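The non-blocking delivery pattern described above can be sketched in a few lines. This is an illustrative sketch only; the class and method names below are hypothetical, not Drill's actual API:

```python
import queue

class FragmentEventProcessor:
    """Minimal sketch of the EventProcessor pattern: the RPC thread only
    enqueues an event and returns immediately; the fragment's own thread
    drains the queue later. Names are hypothetical, not Drill code."""

    def __init__(self):
        self._events = queue.Queue()
        self.handled = []

    def send(self, event):
        # Called from the RPC thread: enqueue and return, never block on
        # fragment state.
        self._events.put(event)

    def drain(self):
        # Called from the fragment executor's own thread between batches.
        while not self._events.empty():
            self.handled.append(self._events.get())
```

The key property is that `send` never waits on the fragment, so a blocked fragment executor cannot stall the RPC thread that delivers a cancellation.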





[jira] [Updated] (DRILL-4822) Extend distrib-env.sh search to consider site directory

2016-08-16 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4822:
-
Reviewer: Abhishek Girish

> Extend distrib-env.sh search to consider site directory
> ---
>
> Key: DRILL-4822
> URL: https://issues.apache.org/jira/browse/DRILL-4822
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Paul Rogers
>Priority: Minor
> Fix For: 1.8.0
>
>
> DRILL-4581 provided revisions to the Drill launch scripts. As part of that 
> fix, we introduced a new distrib-env.sh file to hold settings created by 
> custom Drill installers (that is, by custom distributions). The original 
> version of this feature looks for distrib-env.sh only in $DRILL_HOME/env.
> Experience suggests that installers will write site-specific values to 
> distrib-env.sh and so the file must then be copied to $DRILL_SITE when 
> running under YARN. Add $DRILL_SITE to the search path in drill-config.sh for 
> distrib-env.sh.
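The extended search order can be sketched as follows. The helper and its parameters are illustrative assumptions, not the actual drill-config.sh logic; only the lookup order (site directory first, then the distribution directory) comes from the report:

```python
import os

def find_distrib_env(drill_site, drill_home, isfile=os.path.isfile):
    """Return the first distrib-env.sh found, preferring $DRILL_SITE over
    $DRILL_HOME. The isfile predicate is injectable for testing; the
    function itself is a sketch, not Drill's shell code."""
    for base in (drill_site, drill_home):
        if base:
            path = os.path.join(base, "distrib-env.sh")
            if isfile(path):
                return path
    return None
```

With this order, a site-specific copy made for a YARN deployment shadows the distribution's own file.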





[jira] [Updated] (DRILL-4836) ZK Issue during Drillbit startup, possibly due to race condition

2016-08-16 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4836:
-
Reviewer: Abhishek Girish

> ZK Issue during Drillbit startup, possibly due to race condition
> 
>
> Key: DRILL-4836
> URL: https://issues.apache.org/jira/browse/DRILL-4836
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Reporter: Abhishek Girish
>Assignee: Paul Rogers
> Fix For: 1.8.0
>
>
> During a parallel launch of Drillbits on a 4-node cluster, I hit this issue 
> during startup:
> {code}
> Exception in thread "main" 
> org.apache.drill.exec.exception.DrillbitStartupException: Failure during 
> initial startup of Drillbit.
> at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:284)
> at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:261)
> at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:257)
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: unable 
> to put
> at 
> org.apache.drill.exec.coord.zk.ZookeeperClient.put(ZookeeperClient.java:196)
> at 
> org.apache.drill.exec.store.sys.store.ZookeeperPersistentStore.putIfAbsent(ZookeeperPersistentStore.java:94)
> ...
> at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:113)
> at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:281)
> ... 2 more
> Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: 
> KeeperErrorCode = NodeExists for /drill/sys.storage_plugins/dfs
> at 
> org.apache.drill.exec.coord.zk.ZookeeperClient.put(ZookeeperClient.java:191)
> ... 7 more
> {code}
> And similarly,
> {code}
> Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: 
> KeeperErrorCode = NodeExists for /drill/sys.storage_plugins/kudu
> {code}





[jira] [Updated] (DRILL-4623) Disable Epoll by Default

2016-08-16 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4623:
-
Reviewer: Dechang Gu

> Disable Epoll by Default
> 
>
> Key: DRILL-4623
> URL: https://issues.apache.org/jira/browse/DRILL-4623
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Sudheesh Katkam
>Assignee: Sudheesh Katkam
> Fix For: 1.8.0
>
>
> At higher concurrency (and/or spuriously), we hit [netty issue 
> #3539|https://github.com/netty/netty/issues/3539]. This is an issue with the 
> version of Netty that Drill currently uses. Once Drill moves to a later 
> version of Netty, epoll should be re-enabled by default.





[jira] [Updated] (DRILL-3726) Drill is not properly interpreting CRLF (0d0a). CR gets read as content.

2016-08-16 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-3726:
-
Reviewer: Krystal

> Drill is not properly interpreting CRLF (0d0a). CR gets read as content.
> 
>
> Key: DRILL-3726
> URL: https://issues.apache.org/jira/browse/DRILL-3726
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Text & CSV
>Affects Versions: 1.1.0
> Environment: Linux RHEL 6.6, OSX 10.9
>Reporter: Edmon Begoli
>Assignee: Arina Ielchiieva
> Fix For: 1.8.0
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
>   When we query the last attribute of a text file, we get unexpected 
> characters: viewed through Drill, a \r appears at the end of the last 
> attribute, while in a text editor no \r is embedded in that attribute.
> It appears Drill is interpreting only the LF of a CRLF (0d0a) line ending as 
> a new line, so the CR becomes part of the last attribute.
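The described behavior can be reproduced outside Drill in a few lines; this is an illustrative sketch of the two line-ending interpretations, not Drill's text reader:

```python
data = "name,city\r\nalice,york\r\n"   # CRLF-terminated CSV content

# Buggy interpretation: only LF terminates a record, so the CR stays
# attached to the last attribute of every row.
rows_lf_only = [line.split(",") for line in data.split("\n") if line]

# Correct interpretation: treat CRLF (0d0a) as a single line ending.
rows_crlf = [line.split(",")
             for line in data.replace("\r\n", "\n").split("\n") if line]
```

Under the buggy interpretation the last field of each row ends with a stray `\r`, which is exactly what shows up at the end of the last attribute in Drill.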





[jira] [Updated] (DRILL-4658) cannot specify tab as a fieldDelimiter in table function

2016-08-16 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4658:
-
Reviewer: Krystal

> cannot specify tab as a fieldDelimiter in table function
> 
>
> Key: DRILL-4658
> URL: https://issues.apache.org/jira/browse/DRILL-4658
> Project: Apache Drill
>  Issue Type: Bug
>  Components: SQL Parser
>Affects Versions: 1.6.0
> Environment: Mac OS X, Java 8
>Reporter: Vince Gonzalez
>Assignee: Arina Ielchiieva
> Fix For: 1.8.0
>
>
> I can't specify a tab delimiter in the table function, possibly because the 
> parser counts the characters in '\t' rather than interpreting it as a 
> character escape code:
> {code}
> 0: jdbc:drill:zk=local> select columns[0] as a, cast(columns[1] as bigint) as 
> b from table(dfs.tmp.`sample_cast.tsv`(type => 'text', fieldDelimiter => 
> '\t', skipFirstLine => true));
> Error: PARSE ERROR: Expected single character but was String: \t
> table sample_cast.tsv
> parameter fieldDelimiter
> SQL Query null
> [Error Id: 3efa82e1-3810-4d4a-b23c-32d6658dffcf on 172.30.1.144:31010] 
> (state=,code=0)
> {code}





[jira] [Updated] (DRILL-4175) IOBE may occur in Calcite RexProgramBuilder when queries are submitted concurrently

2016-07-25 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4175:
-
Reviewer: Rahul Challapalli

> IOBE may occur in Calcite RexProgramBuilder when queries are submitted 
> concurrently
> ---
>
> Key: DRILL-4175
> URL: https://issues.apache.org/jira/browse/DRILL-4175
> Project: Apache Drill
>  Issue Type: Bug
> Environment: distribution
>Reporter: huntersjm
> Fix For: 1.8.0
>
>
> I queried SQL like `select v from table limit 1` and got an error:
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IndexOutOfBoundsException: Index: 68, Size: 67
> After debugging, I found a bug in Calcite's parsing. First, look at line 72 
> in org.apache.calcite.rex.RexProgramBuilder:
> {noformat}
>    registerInternal(RexInputRef.of(i, fields), false);
> {noformat}
> Here we get a RexInputRef from RexInputRef.of, which builds names via a 
> method named createName(int index), where NAMES is a SelfPopulatingList. 
> SelfPopulatingList is described as a thread-safe list, but in fact it is 
> thread-unsafe: when NAMES.get(index) is called concurrently, it can go wrong. 
> We expect SelfPopulatingList to contain {$0, $1, $2, ..., $n}, but under 
> concurrent access it may contain {$0, $1, ..., $29, $30, ..., $59, $30, $31, ..., $59, ...}.
> Now look at the method registerInternal:
> {noformat}
> private RexLocalRef registerInternal(RexNode expr, boolean force) {
> expr = simplify(expr);
> RexLocalRef ref;
> final Pair key;
> if (expr instanceof RexLocalRef) {
>   key = null;
>   ref = (RexLocalRef) expr;
> } else {
>   key = RexUtil.makeKey(expr);
>   ref = exprMap.get(key);
> }
> if (ref == null) {
>   if (validating) {
> validate(
> expr,
> exprList.size());
>   }
> {noformat}
> Here makeKey(expr) is expected to produce distinct keys, but it returns the 
> same key for different expressions, so addExpr(expr) is called fewer times 
> than it should be. In that method:
> {noformat}
> RexLocalRef ref;
> final int index = exprList.size();
> exprList.add(expr);
> ref =
> new RexLocalRef(
> index,
> expr.getType());
> localRefList.add(ref);
> return ref;
> {noformat}
> localRefList ends up with the wrong size, so line 939,
> {noformat}
> final RexLocalRef ref = localRefList.get(index);
> {noformat}
> throws an IndexOutOfBoundsException.
> Workaround:
> We can't change Calcite's code before they fix this bug, but we can 
> initialize NAMES in RexInputRef at startup. Just add 
> {noformat}
> RexInputRef.createName(2048);
> {noformat}
> during bootstrap.
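The failure mode and the pre-initialization workaround can be illustrated with a small sketch (Python here for brevity; `LazyNameList` is a stand-in, not Calcite's class):

```python
class LazyNameList:
    """Stand-in for Calcite's SelfPopulatingList of field names "$0", "$1", ...
    The grow-on-demand step in get() is not atomic, so two threads that both
    observe a too-short list can both append, producing duplicated chunks
    such as {$0, ..., $29, $30, ..., $59, $30, $31, ...} -- the corruption
    described in the report."""

    def __init__(self):
        self._names = []

    def get(self, index):
        # Lazy growth: extend the list until the requested index exists.
        while len(self._names) <= index:
            self._names.append("$%d" % len(self._names))
        return self._names[index]

def bootstrap_prefill(names, n=2048):
    # The reported workaround: touch the highest index once at startup
    # (analogous to calling RexInputRef.createName(2048) during bootstrap),
    # so the list never needs to grow under concurrent access later.
    names.get(n - 1)
```

After the eager prefill, concurrent `get()` calls only read the list, which sidesteps the race without changing Calcite.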





[jira] [Updated] (DRILL-4673) Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on command return

2016-07-25 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4673:
-
Reviewer: Chun Chang

> Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on 
> command return
> -
>
> Key: DRILL-4673
> URL: https://issues.apache.org/jira/browse/DRILL-4673
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Minor
>  Labels: doc-impacting, drill
> Fix For: 1.8.0
>
>
> Implement "DROP TABLE IF EXISTS" for Drill to prevent a FAILED status when 
> "DROP TABLE" is issued for a table that doesn't exist.
> The same for "DROP VIEW IF EXISTS".





[jira] [Updated] (DRILL-4783) Flatten on CONVERT_FROM fails with ClassCastException if resultset is empty

2016-07-25 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4783:
-
Reviewer: Rahul Challapalli

> Flatten on CONVERT_FROM fails with ClassCastException if resultset is empty
> ---
>
> Key: DRILL-4783
> URL: https://issues.apache.org/jira/browse/DRILL-4783
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>Priority: Critical
> Fix For: 1.8.0
>
>
> Flatten fails on top of convert_from when the result set is empty. 
> For an HBase table like this:
> 0: jdbc:drill:zk=localhost:5181> select convert_from(t.address.cities,'json') 
> from hbase.`/tmp/flattentest` t;
> +--+
> |  EXPR$0 
>  |
> +--+
> | {"list":[{"city":"SunnyVale"},{"city":"Palo Alto"},{"city":"Mountain 
> View"}]}|
> | {"list":[{"city":"Seattle"},{"city":"Bellevue"},{"city":"Renton"}]} 
>  |
> | {"list":[{"city":"Minneapolis"},{"city":"Falcon Heights"},{"city":"San 
> Paul"}]}  |
> +--+
> Flatten works when row_key is in (1,2,3)
> 0: jdbc:drill:zk=localhost:5181> select flatten(t1.json.list) from (select 
> convert_from(t.address.cities,'json') json from hbase.`/tmp/flattentest` t 
> where row_key=1) t1;
> +---+
> |  EXPR$0   |
> +---+
> | {"city":"SunnyVale"}  |
> | {"city":"Palo Alto"}  |
> | {"city":"Mountain View"}  |
> +---+
> But Flatten throws exception if the resultset is empty
> 0: jdbc:drill:zk=localhost:5181> select flatten(t1.json.list) from (select 
> convert_from(t.address.cities,'json') json from hbase.`/tmp/flattentest` t 
> where row_key=4) t1;
> Error: SYSTEM ERROR: ClassCastException: Cannot cast 
> org.apache.drill.exec.vector.NullableIntVector to 
> org.apache.drill.exec.vector.complex.RepeatedValueVector
> Fragment 0:0
> [Error Id: 07fd0cab-d1e6-4259-bfec-ad80f02d93a2 on atsqa4-127.qa.lab:31010] 
> (state=,code=0)





[jira] [Updated] (DRILL-4664) ScanBatch.isNewSchema() returns wrong result for map datatype

2016-07-25 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4664:
-
Reviewer: Chun Chang

> ScanBatch.isNewSchema() returns wrong result for map datatype
> -
>
> Key: DRILL-4664
> URL: https://issues.apache.org/jira/browse/DRILL-4664
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.6.0
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Minor
> Fix For: 1.8.0
>
>
> The isNewSchema() method checks whether the top-level schema or any of the 
> deeper map schemas has changed. The latter check doesn't work properly with 
> the count function: "deeperSchemaChanged" is true even when two map schemas 
> have the same child fields.
> Discovered while trying to fix [DRILL-2385|DRILL-2385].
> Dataset test.json for reproducing (MAP datatype object):
> {code}{"oooi":{"oa":{"oab":{"oabc":1}}}}{code}
> Example of query:
> {code}select count(t.oooi) from dfs.tmp.`test.json` t{code}





[jira] [Updated] (DRILL-4794) Regression: Wrong result for query with disjunctive partition filters

2016-07-25 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4794:
-
Reviewer: Rahul Challapalli

> Regression: Wrong result for query with disjunctive partition filters
> -
>
> Key: DRILL-4794
> URL: https://issues.apache.org/jira/browse/DRILL-4794
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.7.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 1.8.0
>
>
> For a query that contains certain types of disjunctive filter conditions, such 
> as 'dir0=x OR dir1=y', we get a wrong result when metadata caching is used. 
> This is a regression due to DRILL-4530.  
> Note that the filter involves OR of 2 different directory levels. For the 
> normal case of OR condition at the same level the problem does not occur. 
> Correct result (without metadata cache) 
> {noformat}
> 0: jdbc:drill:zk=local> select count(*) from dfs.`orders` where dir0=1994 or 
> dir1='Q3' ;
> +-+
> | EXPR$0  |
> +-+
> | 60  |
> +-+
> {noformat}
> Wrong result (with metadata cache):
> {noformat}
> 0: jdbc:drill:zk=local> select count(*) from dfs.`orders` where dir0=1994 or 
> dir1='Q3' ;
> +-+
> | EXPR$0  |
> +-+
> | 50  |
> +-+
> {noformat}





[jira] [Updated] (DRILL-4744) Fully Qualified JDBC Plugin Tables return Table not Found via Rest API

2016-07-25 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4744:
-
Reviewer: Chun Chang

> Fully Qualified JDBC Plugin Tables return Table not Found via Rest API
> --
>
> Key: DRILL-4744
> URL: https://issues.apache.org/jira/browse/DRILL-4744
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.6.0
>Reporter: John Omernik
>Assignee: Roman Lozovyk
>Priority: Minor
> Fix For: 1.7.0
>
>
> When trying to query a JDBC table via the authenticated REST API, using a 
> fully qualified table name returns "table not found". This does not occur in 
> sqlline, and a workaround is to run "use pluginname.mysqldatabase" prior to 
> the query (then the fully qualified table name works).
> Plugin Name: mysql
> Mysql Database: events
> Mysql Table: curevents
> Via Rest:
> select * from mysql.events.curevents limit 10;
> Fails with: VALIDATION ERROR: Table 'mysql.events.curevents' not found
> Via Rest:
> use mysql.events;
> select * from mysql.events.curevents limit 10;
> - Success. 
> Via sqlline, authenticating with the same username, you can connect and run 
> select * from mysql.events.curevents limit 10;
> without issue (and without the use mysql.events).





[jira] [Updated] (DRILL-4743) HashJoin's not fully parallelized in query plan

2016-07-25 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4743:
-
Reviewer: Robert Hou

> HashJoin's not fully parallelized in query plan
> ---
>
> Key: DRILL-4743
> URL: https://issues.apache.org/jira/browse/DRILL-4743
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Gautam Kumar Parai
>Assignee: Gautam Kumar Parai
>  Labels: doc-impacting
> Fix For: 1.8.0
>
>
> The underlying problem is a filter-selectivity under-estimate for queries 
> with complicated predicates, e.g. deeply nested and/or predicates. This leads 
> to under-parallelization of the major fragment doing the join. 
> To really resolve this problem we need table/column statistics to correctly 
> estimate the selectivity. However, in the absence of statistics, or when 
> existing statistics are insufficient for a correct selectivity estimate, 
> this will serve as a workaround.





[jira] [Updated] (DRILL-4746) Verification Failures (Decimal values) in drill's regression tests

2016-07-25 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4746:
-
Reviewer: Khurram Faraaz

> Verification Failures (Decimal values) in drill's regression tests
> --
>
> Key: DRILL-4746
> URL: https://issues.apache.org/jira/browse/DRILL-4746
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types, Storage - Text & CSV
>Affects Versions: 1.7.0
>Reporter: Rahul Challapalli
>Assignee: Arina Ielchiieva
>Priority: Critical
> Fix For: 1.8.0
>
>
> We started seeing the below 4 functional test failures in drill's extended 
> tests [1]. The data for the below tests can be downloaded from [2]
> {code}
> framework/resources/Functional/aggregates/tpcds_variants/text/aggregate28.q
> framework/resources/Functional/tpcds/impala/text/q43.q
> framework/resources/Functional/tpcds/variants/text/q6_1.sql
> framework/resources/Functional/aggregates/tpcds_variants/text/aggregate29.q
> {code}
> The failures started showing up from the commit [3]
> [1] https://github.com/mapr/drill-test-framework
> [2] http://apache-drill.s3.amazonaws.com/files/tpcds-sf1-text.tgz
> [3] 
> https://github.com/apache/drill/commit/223507b76ff6c2227e667ae4a53f743c92edd295
> Let me know if more information is needed to reproduce this issue.





[jira] [Updated] (DRILL-4530) Improve metadata cache performance for queries with single partition

2016-07-25 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4530:
-
Reviewer: Rahul Challapalli

> Improve metadata cache performance for queries with single partition 
> -
>
> Key: DRILL-4530
> URL: https://issues.apache.org/jira/browse/DRILL-4530
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 1.6.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 1.8.0
>
>
> Consider two types of queries which are run with Parquet metadata caching: 
> {noformat}
> query 1:
> SELECT col FROM  `A/B/C`;
> query 2:
> SELECT col FROM `A` WHERE dir0 = 'B' AND dir1 = 'C';
> {noformat}
> For a certain dataset, the query1 elapsed time is 1 sec whereas query2 
> elapsed time is 9 sec even though both are accessing the same amount of data. 
>  The user expectation is that they should perform roughly the same.  The main 
> difference comes from reading the bigger metadata cache file at the root 
> level 'A' for query2 and then applying the partitioning filter.  query1 reads 
> a much smaller metadata cache file at the subdirectory level. 
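The intended optimization can be sketched as: when the filter pins consecutive partition columns to constants, read the smaller cache file of the corresponding subdirectory instead of the root cache. The helper below is illustrative logic under that assumption, not Drill's planner code:

```python
import os

def cache_file_for_query(root, partition_filter):
    """Pick the metadata cache file to read. partition_filter maps a dir
    level index to its pinned constant, e.g. {0: 'B', 1: 'C'} for
    WHERE dir0 = 'B' AND dir1 = 'C'. Illustrative sketch only."""
    path = root
    level = 0
    # Descend while consecutive dir levels are pinned to single values;
    # stop at the first unconstrained level.
    while level in partition_filter:
        path = os.path.join(path, partition_filter[level])
        level += 1
    return os.path.join(path, ".drill.parquet_metadata")
```

With this, query2 in the example would read the same subdirectory-level cache file as query1 instead of the larger root-level one.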





[jira] [Updated] (DRILL-2330) Add support for nested aggregate expressions for window aggregates

2016-07-25 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-2330:
-
Reviewer: Khurram Faraaz

> Add support for nested aggregate expressions for window aggregates
> --
>
> Key: DRILL-2330
> URL: https://issues.apache.org/jira/browse/DRILL-2330
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Affects Versions: 0.8.0
>Reporter: Abhishek Girish
>Assignee: Gautam Kumar Parai
> Fix For: 1.8.0
>
> Attachments: drillbit.log
>
>
> Aggregate expressions currently cannot be nested. 
> *The following query fails to validate:*
> {code:sql}
> select avg(sum(i_item_sk)) from item;
> {code}
> Error:
> Query failed: SqlValidatorException: Aggregate expressions cannot be nested
> Log attached. 
> Reference: TPCDS queries (20, 63, 98, ...) fail to execute.





[jira] [Updated] (DRILL-4199) Add Support for HBase 1.X

2016-07-08 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4199:
-
Reviewer: Abhishek Girish  (was: Jacques Nadeau)

> Add Support for HBase 1.X
> -
>
> Key: DRILL-4199
> URL: https://issues.apache.org/jira/browse/DRILL-4199
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - HBase
>Affects Versions: 1.7.0
>Reporter: Divjot singh
>Assignee: Aditya Kishore
> Fix For: 1.7.0
>
>
> Is there any roadmap to upgrade the HBase version to the 1.x series? 
> Currently Drill supports HBase 0.98.





[jira] [Updated] (DRILL-2593) 500 error when crc for a query profile is out of sync

2016-07-08 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-2593:
-
Reviewer: Krystal

> 500 error when crc for a query profile is out of sync
> -
>
> Key: DRILL-2593
> URL: https://issues.apache.org/jira/browse/DRILL-2593
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 0.7.0
>Reporter: Jason Altekruse
>Assignee: Arina Ielchiieva
> Fix For: 1.7.0
>
> Attachments: warning1.JPG, warning2.JPG
>
>
> To reproduce, on a machine where an embedded drillbit has been run, edit one 
> of the profiles stored in /tmp/drill/profiles and try to navigate to the 
> profiles page on the Web UI.





[jira] [Updated] (DRILL-4716) status.json doesn't work in drill ui

2016-07-08 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4716:
-
Reviewer: Krystal

> status.json doesn't work in drill ui
> 
>
> Key: DRILL-4716
> URL: https://issues.apache.org/jira/browse/DRILL-4716
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP
>Affects Versions: 1.6.0
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
>Priority: Minor
> Fix For: 1.7.0
>
>
> 1. http://localhost:8047/status returns "Running!", 
> but http://localhost:8047/status.json gives an error:
> {code}
> {
>   "errorMessage" : "HTTP 404 Not Found"
> }
> {code}
> 2. Remove link to System Options on page http://localhost:8047/status as 
> redundant.





[jira] [Updated] (DRILL-2385) count on complex objects failed with missing function implementation

2016-07-08 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-2385:
-
Reviewer: Chun Chang

> count on complex objects failed with missing function implementation
> 
>
> Key: DRILL-2385
> URL: https://issues.apache.org/jira/browse/DRILL-2385
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 0.8.0
>Reporter: Chun Chang
>Assignee: Vitalii Diravka
>Priority: Minor
> Fix For: 1.7.0
>
>
> #Wed Mar 04 01:23:42 EST 2015
> git.commit.id.abbrev=71b6bfe
> Have a complex type that looks like the following:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.sia from 
> `complex.json` t limit 1;
> ++
> |sia |
> ++
> | [1,11,101,1001] |
> ++
> {code}
> A count on the complex type fails with a missing function implementation:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.gbyi, count(t.sia) 
> countsia from `complex.json` t group by t.gbyi;
> Query failed: RemoteRpcException: Failure while running fragment., Schema is 
> currently null.  You must call buildSchema(SelectionVectorMode) before this 
> container can return a schema. [ 12856530-3133-45be-bdf4-ef8cc784f7b3 on 
> qa-node119.qa.lab:31010 ]
> [ 12856530-3133-45be-bdf4-ef8cc784f7b3 on qa-node119.qa.lab:31010 ]
> Error: exception while executing query: Failure while executing query. 
> (state=,code=0)
> {code}
> drillbit.log
> {code}
> 2015-03-04 13:44:51,383 [2b08832b-9247-e90c-785d-751f02fc1548:frag:2:0] ERROR 
> o.a.drill.exec.ops.FragmentContext - Fragment Context received failure.
> org.apache.drill.exec.exception.SchemaChangeException: Failure while 
> materializing expression.
> Error in expression at index 0.  Error: Missing function implementation: 
> [count(BIGINT-REPEATED)].  Full expression: null.
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregatorInternal(HashAggBatch.java:210)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.createAggregator(HashAggBatch.java:158)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.buildSchema(HashAggBatch.java:101)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:130)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:67) 
> [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:114)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:57) 
> [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:121)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303)
>  [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_45]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_45]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> 2015-03-04 13:44:51,383 [2b08832b-9247-e90c-785d-751f02fc1548:frag:2:0] WARN  
> o.a.d.e.w.fragment.FragmentExecutor - Error while initializing or executing 
> fragment
> java.lang.NullPointerException: Schema is currently null.  You must call 
> buildSchema(SelectionVectorMode) before this container can return a schema.
> at 
> com.google.common.base.Preconditions.checkNotNull(Preconditions.java:208) 
> ~[guava-14.0.1.jar:na]
> at 
> org.apache.drill.exec.record.VectorContainer.getSchema(VectorContainer.java:261)
>  ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.getSchema(AbstractRecordBatch.java:155)
>  ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
> at 
> 

[jira] [Updated] (DRILL-3623) Limit 0 should avoid execution when querying a known schema

2016-07-08 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-3623:
-
Reviewer: Krystal

> Limit 0 should avoid execution when querying a known schema
> ---
>
> Key: DRILL-3623
> URL: https://issues.apache.org/jira/browse/DRILL-3623
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Storage - Hive
>Affects Versions: 1.1.0
> Environment: MapR cluster
>Reporter: Andries Engelbrecht
>Assignee: Sudheesh Katkam
>  Labels: doc-impacting
> Fix For: 1.7.0
>
>
> Running select * from hive.table limit 0 does not return (it hangs).
> Select * from hive.table limit 1 works fine.
> Hive table is about 6GB with 330 files with parquet using snappy compression.
> Data types are int, bigint, string and double.
> Querying directory with parquet files through the DFS plugin works fine
> select * from dfs.root.`/user/hive/warehouse/database/table` limit 0;
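
The requested fix is to short-circuit LIMIT 0 when the source schema is already known (as it is for Hive tables). A minimal sketch of that idea, with hypothetical names rather than Drill's actual planner API:

```python
# Hypothetical sketch (not Drill's implementation): when the LIMIT is 0 and the
# source schema is already known (e.g. from Hive metadata), skip execution
# entirely and return an empty result that still carries the schema.
def run_query(schema, rows, limit):
    if limit == 0 and schema is not None:
        # Short-circuit: no need to scan any files.
        return {"schema": schema, "rows": []}
    return {"schema": schema, "rows": rows[:limit]}

result = run_query(schema=["id", "amount"], rows=[(1, 9.5), (2, 3.0)], limit=0)
print(result)  # {'schema': ['id', 'amount'], 'rows': []}
```

The client still sees the column metadata it needs, without the scan ever starting.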



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-3474) Add implicit file columns support

2016-07-08 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-3474:
-
Reviewer: Krystal

> Add implicit file columns support
> -
>
> Key: DRILL-3474
> URL: https://issues.apache.org/jira/browse/DRILL-3474
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Metadata
>Affects Versions: 1.1.0
>Reporter: Jim Scott
>Assignee: Arina Ielchiieva
>  Labels: doc-impacting
> Fix For: 1.7.0
>
>
> I could not find another ticket which talks about this ...
> The file name should be a column which can be selected or filtered when 
> querying a directory, just as dir0 and dir1 are.





[jira] [Updated] (DRILL-4707) Conflicting columns names under case-insensitive policy lead to either memory leak or incorrect result

2016-07-08 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4707:
-
Reviewer: Chun Chang

> Conflicting columns names under case-insensitive policy lead to either memory 
> leak or incorrect result
> --
>
> Key: DRILL-4707
> URL: https://issues.apache.org/jira/browse/DRILL-4707
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jinfeng Ni
>Priority: Critical
> Fix For: 1.8.0
>
>
> On latest master branch:
> {code}
> select version, commit_id, commit_message from sys.version;
> +-+---+-+
> | version | commit_id |   
>   commit_message  |
> +-+---+-+
> | 1.7.0-SNAPSHOT  | 3186217e5abe3c6c2c7e504cdb695567ff577e4c  | DRILL-4607: 
> Add a split function that allows to separate string by a delimiter  |
> +-+---+-+
> {code}
> If a query has two conflicting column names under the case-insensitive policy, 
> Drill will either hit a memory leak or return an incorrect result.
> Q1.
> {code}
> select r_regionkey as XYZ, r_name as xyz FROM cp.`tpch/region.parquet`;
> Error: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. 
> Memory leaked: (131072)
> Allocator(op:0:0:1:Project) 100/131072/2490368/100 
> (res/actual/peak/limit)
> Fragment 0:0
> {code}
> Q2: returns only one column in the result. 
> {code}
> select n_nationkey as XYZ, n_regionkey as xyz FROM cp.`tpch/nation.parquet`;
> +--+
> | XYZ  |
> +--+
> | 0|
> | 1|
> | 1|
> | 1|
> | 4|
> | 0|
> | 3|
> {code}
> The cause of the problem seems to be that the Project treats the two incoming 
> columns as identical (since Drill uses case-insensitive column names during 
> execution). 
> The planner should make sure that the conflicting columns are resolved, since 
> execution is name-based. 
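
The collapse can be illustrated with a small sketch (illustrative only — Drill's execution is vector-based, not dict-based): under case-insensitive, name-based resolution the two projected columns fold to the same key, and one overwrites the other.

```python
# Sketch of why name-based, case-insensitive resolution collapses the two
# output columns from "select r_regionkey as XYZ, r_name as xyz".
class CaseInsensitiveRow(dict):
    def __setitem__(self, key, value):
        # Fold the name before storing, as a case-insensitive engine would.
        super().__setitem__(key.lower(), value)

    def __getitem__(self, key):
        return super().__getitem__(key.lower())

row = CaseInsensitiveRow()
row["XYZ"] = 0   # first projected column
row["xyz"] = 3   # second column folds to the same key and overwrites the first
print(len(row))  # 1 — only one column survives, matching Q2's output
```

This is why the planner, not execution, has to resolve the conflict: once names are folded, the two columns are indistinguishable.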





[jira] [Updated] (DRILL-3559) Make filename available to sql statements just like dirN

2016-07-08 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-3559:
-
Reviewer: Krystal

> Make filename available to sql statements just like dirN
> ---
>
> Key: DRILL-3559
> URL: https://issues.apache.org/jira/browse/DRILL-3559
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: SQL Parser
>Affects Versions: 1.1.0
>Reporter: Stefán Baxter
>Assignee: Arina Ielchiieva
>Priority: Minor
>  Labels: doc-impacting
> Fix For: 1.7.0
>
>






[jira] [Updated] (DRILL-4693) Incorrect column ordering when CONVERT_FROM() json is used

2016-07-08 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4693:
-
Reviewer: Chun Chang

> Incorrect column ordering when CONVERT_FROM() json is used 
> ---
>
> Key: DRILL-4693
> URL: https://issues.apache.org/jira/browse/DRILL-4693
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators, Query Planning & 
> Optimization
>Affects Versions: 1.6.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 1.7.0
>
>
> For the following query, the column order in the results is wrong; it should 
> be col1, col2, col3. 
> {noformat}
> 0: jdbc:drill:zk=local> select 'abc' as col1, convert_from('{"x" : "y"}', 
> 'json') as col2, 'xyz' as col3 from cp.`tpch/region.parquet`;
> +---+---++
> | col1  | col3  |col2|
> +---+---++
> | abc   | xyz   | {"x":"y"}  |
> | abc   | xyz   | {"x":"y"}  |
> | abc   | xyz   | {"x":"y"}  |
> | abc   | xyz   | {"x":"y"}  |
> | abc   | xyz   | {"x":"y"}  |
> +---+---++
> {noformat}
> The EXPLAIN plan:
> {noformat}
> 0: jdbc:drill:zk=local> explain plan for select 'abc' as col1, 
> convert_from('{"x" : "y"}', 'json') as col2, 'xyz' as col3 from 
> cp.`tpch/region.parquet`;
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  Project(col1=['abc'], col2=[CONVERT_FROMJSON('{"x" : "y"}')], 
> col3=['xyz'])
> 00-02Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
> [path=classpath:/tpch/region.parquet]], 
> selectionRoot=classpath:/tpch/region.parquet, numFiles=1, 
> usedMetadataFile=false, columns=[]]])
> {noformat}
> This happens on current master branch as well as 1.6.0 and even earlier (I 
> checked 1.4.0 as well which also has the same behavior).  So it is a 
> pre-existing bug.  





[jira] [Updated] (DRILL-4733) max(dir0) reading more columns than necessary

2016-07-08 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4733:
-
Reviewer: Chun Chang

> max(dir0) reading more columns than necessary
> -
>
> Key: DRILL-4733
> URL: https://issues.apache.org/jira/browse/DRILL-4733
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization, Storage - Parquet
>Affects Versions: 1.7.0
>Reporter: Rahul Challapalli
>Assignee: Arina Ielchiieva
>Priority: Critical
> Fix For: 1.7.0
>
> Attachments: bug.tgz
>
>
> The query below started failing with this commit: 
> 3209886a8548eea4a2f74c059542672f8665b8d2
> {code}
> select max(dir0) from dfs.`/drill/testdata/bug/2016`;
> Error: UNSUPPORTED_OPERATION ERROR: Streaming aggregate does not support 
> schema changes
> Fragment 0:0
> [Error Id: b0060205-e9a6-428a-9803-7b4312b2c6f4 on qa-node190.qa.lab:31010] 
> (state=,code=0)
> {code}
> The sub-folders contain files which have a schema change for one column, 
> "contributions" (int32 vs double). However, prior to this commit we did not 
> fail in this scenario. Log files and test data are attached.





[jira] [Updated] (DRILL-4701) Fix log name and missing lines in logs on Web UI

2016-07-08 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4701:
-
Reviewer: Krystal

> Fix log name and missing lines in logs on Web UI
> 
>
> Key: DRILL-4701
> URL: https://issues.apache.org/jira/browse/DRILL-4701
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Arina Ielchiieva
>Assignee: Arina Ielchiieva
> Fix For: 1.7.0
>
>
> 1. When the log files are downloaded from the web UI, the name of the downloaded 
> file is "download". We should save the file with the same name as the log 
> file (i.e. drillbit.log).
> 2. The last N lines of the log file displayed in the web UI do not match the 
> log file itself. Some lines are missing compared with actual log.





[jira] [Updated] (DRILL-4588) Enable JMXReporter to Expose Metrics

2016-07-08 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4588:
-
Reviewer: Krystal

> Enable JMXReporter to Expose Metrics
> 
>
> Key: DRILL-4588
> URL: https://issues.apache.org/jira/browse/DRILL-4588
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Sudheesh Katkam
>Assignee: Sudheesh Katkam
> Fix For: 1.7.0
>
>
> -There is a static initialization order issue that needs to be fixed.-
> The code is commented out.





[jira] [Updated] (DRILL-4679) CONVERT_FROM() json format fails if 0 rows are received from upstream operator

2016-05-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4679:
-
Reviewer: Chun Chang

> CONVERT_FROM()  json format fails if 0 rows are received from upstream 
> operator
> ---
>
> Key: DRILL-4679
> URL: https://issues.apache.org/jira/browse/DRILL-4679
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.6.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 1.7.0
>
>
> CONVERT_FROM() json format fails as below if the underlying Filter produces 0 
> rows: 
> {noformat}
> 0: jdbc:drill:zk=local> select convert_from('{"abc":"xyz"}', 'json') as x 
> from cp.`tpch/region.parquet` where r_regionkey = ;
> Error: SYSTEM ERROR: IllegalStateException: next() returned NONE without 
> first returning OK_NEW_SCHEMA [#16, ProjectRecordBatch]
> Fragment 0:0
> {noformat}
> If the conversion is applied as UTF8 format,  the same query succeeds: 
> {noformat}
> 0: jdbc:drill:zk=local> select convert_from('{"abc":"xyz"}', 'utf8') as x 
> from cp.`tpch/region.parquet` where r_regionkey = ;
> ++
> | x  |
> ++
> ++
> No rows selected (0.241 seconds)
> {noformat}
> The reason for this is the special handling in the ProjectRecordBatch for 
> JSON.  The output schema is not known until run time, and the ComplexWriter in 
> the Project relies on seeing the input data to determine the output schema - 
> this could be a MapVector or ListVector, etc.  
> If the input data has 0 rows due to a filter condition, we should at least 
> produce a default output schema, e.g. an empty MapVector?  We need to decide on 
> a good default.  Note that CONVERT_FROM(x, 'json') could occur on both 
> branches of a UNION-ALL, and if one input is empty while the other side is 
> not, it may still cause an incompatibility.
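
The failure mode — an output schema that can only be inferred from data that never arrives — can be sketched as follows (hypothetical names, not Drill's ComplexWriter API):

```python
# Illustration of the failure mode: when the output schema is inferred from the
# data itself, zero input rows leave nothing to infer from. Falling back to a
# default schema (here a dict, standing in for an empty MapVector) avoids the
# error. All names here are hypothetical.
def infer_schema(parsed_rows, default=None):
    if not parsed_rows:
        if default is None:
            # Mirrors the "NONE without first returning OK_NEW_SCHEMA" failure.
            raise RuntimeError("no rows seen: schema unknown")
        return default
    # With at least one row, the shape of the parsed value fixes the schema.
    return type(parsed_rows[0]).__name__

print(infer_schema([{"abc": "xyz"}]))        # 'dict'
print(infer_schema([], default="dict"))      # 'dict' — default schema kicks in
```

The UNION-ALL caveat from the description still applies: a default picked on the empty branch may disagree with the schema inferred on the non-empty one.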





[jira] [Updated] (DRILL-4676) Foreman.moveToState can block forever if called by the foreman thread while the query is still being setup

2016-05-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4676:
-
Reviewer: Chun Chang

> Foreman.moveToState can block forever if called by the foreman thread while 
> the query is still being setup
> --
>
> Key: DRILL-4676
> URL: https://issues.apache.org/jira/browse/DRILL-4676
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.6.0
>Reporter: Deneche A. Hakim
>Assignee: Deneche A. Hakim
> Fix For: 1.7.0
>
>
> When the query is being set up, the foreman has a special CountDownLatch that 
> blocks rpc threads from delivering external events; this latch is released 
> at the end of the query setup.
> In some cases though, when the foreman is submitting remote fragments, a 
> failure in RpcBus.send() causes an exception to be thrown that is reported to 
> Foreman.FragmentSubmitListener and blocks on the CountDownLatch. This causes 
> the foreman thread to block forever, and can cause rpc threads to be blocked too.
> This seems to happen more frequently under high concurrency load, and can also 
> prevent clients from connecting to the Drillbits.
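
The deadlock pattern — a thread waiting on a latch that only it can release — can be sketched like this (illustrative Python, not Drill code; a timeout stands in for "blocks forever" so the sketch terminates):

```python
# Minimal sketch of the self-deadlock: the setup thread owns the only path that
# releases the latch, so if a failure routes that same thread into the
# event-delivery path before release, the wait can never be satisfied.
import threading

setup_done = threading.Event()   # plays the role of Foreman's CountDownLatch

def deliver_event():
    # External events must wait until setup completes.
    return setup_done.wait(timeout=0.2)

# Failure during setup re-enters the event path on the *same* thread, before
# setup_done is ever set: nothing else will ever set it.
delivered = deliver_event()
print(delivered)  # False — timed out; without the timeout this blocks forever

setup_done.set()               # what the setup path should have done first
print(deliver_event())         # True — events flow once the latch is released
```

The usual fix for this shape of bug is to guarantee the latch is released (e.g. in a finally block) before any code path can wait on it from the owning thread.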





[jira] [Updated] (DRILL-4654) Expose New System Metrics

2016-05-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4654:
-
Reviewer: Krystal

> Expose New System Metrics
> -
>
> Key: DRILL-4654
> URL: https://issues.apache.org/jira/browse/DRILL-4654
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Sudheesh Katkam
>Assignee: Sudheesh Katkam
> Fix For: 1.8.0
>
>
> + Add more metrics to the DrillMetrics registry (exposed through web UI and 
> jconsole, through JMX): pending queries, running queries, completed queries, 
> current memory usage (root allocator)
> + Clean up and document metric registration API
> -+ Deprecate getMetrics() method in contextual objects; use 
> DrillMetrics.getRegistry() directly-
> + Make JMX reporting and log reporting configurable through system properties 
> (since config file is not meant to be used in common module)





[jira] [Updated] (DRILL-4298) SYSTEM ERROR: ChannelClosedException

2016-05-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4298:
-
Assignee: Deneche A. Hakim
Reviewer: Chun Chang

> SYSTEM ERROR: ChannelClosedException
> 
>
> Key: DRILL-4298
> URL: https://issues.apache.org/jira/browse/DRILL-4298
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - RPC
>Affects Versions: 1.5.0
>Reporter: Chun Chang
>Assignee: Deneche A. Hakim
> Fix For: 1.7.0
>
>
> 1.5.0-SNAPSHOT2f0e3f27e630d5ac15cdaef808564e01708c3c55
> Running functional regression, hit this error, seems random and not 
> associated with any particular query.
> From client side:
> {noformat}
> 1/5  create table `existing_partition_pruning/lineitempart` partition 
> by (dir0) as select * from 
> dfs.`/drill/testdata/partition_pruning/dfs/lineitempart`;
> Error: SYSTEM ERROR: ChannelClosedException: Channel closed 
> /10.10.100.171:31010 <--> /10.10.100.171:33713.
> Fragment 0:0
> [Error Id: 772d90b8-c5e6-4ecc-8776-68ccc6b57d49 on drillats1.qa.lab:31010] 
> (state=,code=0)
> java.sql.SQLException: SYSTEM ERROR: ChannelClosedException: Channel closed 
> /10.10.100.171:31010 <--> /10.10.100.171:33713.
> Fragment 0:0
> [Error Id: 772d90b8-c5e6-4ecc-8776-68ccc6b57d49 on drillats1.qa.lab:31010]
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:247)
>   at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:321)
>   at 
> net.hydromatic.avatica.AvaticaResultSet.next(AvaticaResultSet.java:187)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:172)
>   at sqlline.IncrementalRows.hasNext(IncrementalRows.java:62)
>   at 
> sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
>   at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
>   at sqlline.SqlLine.print(SqlLine.java:1593)
>   at sqlline.Commands.execute(Commands.java:852)
>   at sqlline.Commands.sql(Commands.java:751)
>   at sqlline.SqlLine.dispatch(SqlLine.java:746)
>   at sqlline.SqlLine.runCommands(SqlLine.java:1651)
>   at sqlline.Commands.run(Commands.java:1304)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:36)
>   at sqlline.SqlLine.dispatch(SqlLine.java:742)
>   at sqlline.SqlLine.initArgs(SqlLine.java:553)
>   at sqlline.SqlLine.begin(SqlLine.java:596)
>   at sqlline.SqlLine.start(SqlLine.java:375)
>   at sqlline.SqlLine.main(SqlLine.java:268)
> Caused by: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM 
> ERROR: ChannelClosedException: Channel closed /10.10.100.171:31010 <--> 
> /10.10.100.171:33713.
> Fragment 0:0
> [Error Id: 772d90b8-c5e6-4ecc-8776-68ccc6b57d49 on drillats1.qa.lab:31010]
>   at 
> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119)
>   at 
> org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113)
>   at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46)
>   at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31)
>   at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:67)
>   at org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:374)
>   at 
> org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89)
>   at 
> org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:252)
>   at 
> org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:123)
>   at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:285)
>   at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:257)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>   at 
> 

[jira] [Updated] (DRILL-4479) JsonReader should pick a less restrictive type when creating the default column

2016-05-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4479:
-
Reviewer: Chun Chang

> JsonReader should pick a less restrictive type when creating the default 
> column
> ---
>
> Key: DRILL-4479
> URL: https://issues.apache.org/jira/browse/DRILL-4479
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.5.0
>Reporter: Aman Sinha
>Assignee: Aman Sinha
> Fix For: 1.7.0
>
> Attachments: mostlynulls.json
>
>
> This JIRA is related to DRILL-3806 but has a narrower scope, so I decided to 
> create a separate one. 
> The JsonReader has the method ensureAtLeastOneField() (see 
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java#L91)
>  which ensures that when no columns are found an empty one is created; it 
> chooses to create a nullable int column.  One consequence is that queries of 
> the following type fail:
> {noformat}
> select c1 from dfs.`mostlynulls.json`;
> ...
> ...
> | null  |
> | null  |
> Error: DATA_READ ERROR: Error parsing JSON - You tried to write a VarChar 
> type when you are using a ValueWriter of type NullableIntWriterImpl.
> File  /Users/asinha/data/mostlynulls.json
> Record  4097
> {noformat}
> In this file the first 4096 rows have NULL values for c1, followed by rows 
> that have a valid string.  
> It would be useful for the Json reader to choose a less restrictive type such 
> as varchar in order to allow more types of queries to run.  
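
The proposal can be sketched as follows (hypothetical names, not the JsonReader API): a writer created with a permissive varchar-like default accepts a late-arriving string, while a nullable-int default rejects it, mirroring the DATA_READ error above.

```python
# Sketch of the proposed behavior: when a column has been all NULLs so far,
# pick a permissive default type (a varchar-like str) instead of a nullable
# int, so a later string value can still be written. Hypothetical names only.
def writer_for(default_type):
    def write(value):
        if value is None:
            return None            # NULLs are fine under any default
        if not isinstance(value, default_type):
            raise TypeError(f"tried to write {type(value).__name__} "
                            f"into a {default_type.__name__} writer")
        return value
    return write

strict = writer_for(int)   # current behavior: nullable-int default
loose = writer_for(str)    # proposed: less restrictive varchar-like default

print(loose("a string after 4096 nulls"))   # accepted
try:
    strict("a string after 4096 nulls")     # mirrors the DATA_READ error
except TypeError as e:
    print(e)
```

The trade-off is that a varchar default may force casts when the column later turns out to be numeric, which is why the JIRA frames this as "less restrictive" rather than universally correct.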





[jira] [Updated] (DRILL-3826) Concurrent Query Submission leads to Channel Closed Exception

2016-05-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-3826:
-
Assignee: Deneche A. Hakim
Reviewer: Rahul Challapalli

> Concurrent Query Submission leads to Channel Closed Exception
> -
>
> Key: DRILL-3826
> URL: https://issues.apache.org/jira/browse/DRILL-3826
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC, Execution - RPC
>Affects Versions: 1.1.0, 1.2.0
> Environment: - CentOS release 6.6 (Final)
> - hadoop-2.7.1
> - hbase-1.0.1.1
> - drill-1.1.0
> - jdk-1.8.0_45
>Reporter: Yiyi Hu
>Assignee: Deneche A. Hakim
>  Labels: filesystem, hadoop, hbase, jdbc, rpc
> Fix For: 1.7.0
>
> Attachments: jdbc-test-client-drillbit.log, shell-sqlline.log, 
> shell-test-drillbit.log
>
>
> A ChannelClosedException is frequently seen while running concurrent queries 
> with a relatively large LIMIT.
> Here are the details,
> SET UP:
> - Single drillbit running on a single zookeeper node
> - 4G heap size, 8G direct memory
> - Storage plugins: local filesystem, hdfs, hbase
> TEST DATA:
> - A JSON file test.json with 50,000,000 records and two fields, id and 
> title (approximately 3 GB).
> SHELL TEST:
> - Running 4 drill shells concurrently with query:
>   SELECT id, title from dfs.`test.json` LIMIT 500.
> - Queries got canceled. Channel closures between client and server were seen 
> randomly, as in the example shown below:
> {noformat}
> java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: 
> ChannelClosedException: Channel closed /192.168.4.201:31010 <--> 
> /192.168.4.201:48829.
> Fragment 0:0
> [Error Id: 0bd2b500-155e-46e0-9f26-bd89fea47a25 on TEST-101:31010]
>   at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
>   at 
> sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
>   at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
>   at sqlline.SqlLine.print(SqlLine.java:1583)
>   at sqlline.Commands.execute(Commands.java:852)
>   at sqlline.Commands.sql(Commands.java:751)
>   at sqlline.SqlLine.dispatch(SqlLine.java:738)
>   at sqlline.SqlLine.begin(SqlLine.java:612)
>   at sqlline.SqlLine.start(SqlLine.java:366)
>   at sqlline.SqlLine.main(SqlLine.java:259)
> {noformat}
> JDBC TEST:
> - 6 separate threads running the same query: SELECT id, title from 
> dfs.`test.json` LIMIT 1000, each maintaining its own connection. The ResultSet, 
> Statement and Connection are closed at the end.
> - The same channel closed exception is thrown randomly. Log files were enclosed 
> for review.
> - Memory usage was monitored; all good.
> CROSS STORAGE PLUGINS:
> - The same issue can be found not only with JSON on a file system 
> (local/HDFS), but also with HBase.
> - The issue was not found in a single-threaded application.





[jira] [Updated] (DRILL-3833) Concurrent Queries Failed Unexpectedly

2016-05-27 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-3833:
-
Assignee: Deneche A. Hakim
Reviewer: Rahul Challapalli

> Concurrent Queries Failed Unexpectedly
> --
>
> Key: DRILL-3833
> URL: https://issues.apache.org/jira/browse/DRILL-3833
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC, Execution - RPC
>Affects Versions: 1.1.0, 1.2.0
> Environment: CentOS release 6.6 (Final)
> Hadoop-2.7.1
> Drill-1.1.0
> Single drillbit on single zookeeper
>Reporter: Yiyi Hu
>Assignee: Deneche A. Hakim
>  Labels: hdfs, jdbc, rpc
> Fix For: 1.7.0
>
> Attachments: drillbit.log
>
>
> Concurrent queries with a relatively large LIMIT *failed*; the failures 
> occurred randomly.
> To reproduce:
> - Test data: a JSON file test.json (at least 10,000,000 records, two fields 
> id, title);
> - Submit 5 queries with 5 separate threads using jdbc, where the query is:
> {panel}
> SELECT id, title FROM dfs.`test.json` LIMIT 1000;
> {panel}
> - The error message in drillbit.log:
> {noformat}
> 2015-09-24 19:15:15,393 [Client-1] INFO  
> o.a.d.j.i.DrillResultSetImpl$ResultsListener - [#4] Query failed: 
> org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: 
> ChannelClosedException: Channel closed /192.168.4.201:31010 <--> 
> /192.168.4.201:58795.
> Fragment 0:0
> [Error Id: 60a7baa8-a2ed-47e6-b7ca-68afd82c852a on TEST-101:31010]
>   at 
> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:118)
>  [drill-java-exec-1.1.0.jar:1.1.0]
>   at 
> org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:111) 
> [drill-java-exec-1.1.0.jar:1.1.0]
>   at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:47)
>  [drill-java-exec-1.1.0.jar:1.1.0]
>   at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:32)
>  [drill-java-exec-1.1.0.jar:1.1.0]
>   at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:61) 
> [drill-java-exec-1.1.0.jar:1.1.0]
>   at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:233) 
> [drill-java-exec-1.1.0.jar:1.1.0]
>   at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:205) 
> [drill-java-exec-1.1.0.jar:1.1.0]
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
>  [netty-codec-4.0.27.Final.jar:4.0.27.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>   at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
>  [netty-handler-4.0.27.Final.jar:4.0.27.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>  [netty-codec-4.0.27.Final.jar:4.0.27.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
>  [netty-codec-4.0.27.Final.jar:4.0.27.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>   at 
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
>  [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
>  

[jira] [Updated] (DRILL-4657) Rank() will return wrong results if a frame of data is too big (more than 2 batches)

2016-05-16 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4657:
-
Reviewer: Abhishek Girish

> Rank() will return wrong results if a frame of data is too big (more than 2 
> batches)
> 
>
> Key: DRILL-4657
> URL: https://issues.apache.org/jira/browse/DRILL-4657
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.3.0
>Reporter: Deneche A. Hakim
>Assignee: Deneche A. Hakim
>Priority: Critical
> Fix For: 1.7.0
>
>
> When you run a query with RANK and one particular frame is too large to fit 
> in 2 batches of data, you will get wrong results.
> I was able to reproduce the issue in a unit test, thanks to the fact that we 
> can control the size of the batches processed by the window operator. I will 
> post a fix soon along with the unit test.
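As a hedged illustration of the class of bug described above (a toy sketch, not Drill's actual window-operator code), a batch-wise RANK() computation must carry the current frame's rank across batch boundaries, so that a frame spanning any number of batches keeps a single rank:

```python
def streaming_rank(batches):
    """Yield (order_key, rank) for one partition processed batch by batch.

    The current frame's rank and the previous order key are carried across
    batch boundaries; resetting them per batch (or after two batches) would
    restart the rank mid-frame and produce wrong results.
    """
    rank = 0
    row_number = 0
    prev = object()  # sentinel: no previous order key seen yet
    for batch in batches:
        for key in batch:
            row_number += 1
            if key != prev:  # a new frame starts: rank jumps to the row number
                rank = row_number
                prev = key
            yield key, rank
```

Here the frame of 1s spans the first two batches; because `rank` and `prev` survive the batch boundary, every row of that frame gets rank 1.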



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4478) binary_string cannot correctly convert buffers that do not start at offset 0

2016-05-06 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4478:
-
Reviewer: Khurram Faraaz  (was: Aman Sinha)

> binary_string cannot correctly convert buffers that do not start at offset 0
> 
>
> Key: DRILL-4478
> URL: https://issues.apache.org/jira/browse/DRILL-4478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Codegen
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
> Fix For: 1.7.0
>
>
> When binary_string is called multiple times, it converts only the first 
> buffer correctly, because that drillbuf starts at offset 0. For the second 
> and subsequent calls the drillbuf does not start at offset 0, so 
> DrillStringUtils.parseBinaryString cannot do the work correctly.
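A hedged sketch of the indexing pitfall (names and escape format hypothetical; this is not the actual DrillStringUtils code): a parser walking a buffer slice must index relative to the slice's own start offset, because hard-coding 0 as the starting index only works for the first buffer an allocator hands out:

```python
def parse_binary_string(buf, start, end):
    """Decode \\xNN escapes from buf[start:end].

    All reads are relative to `start`; a version that began at index 0
    would misread every buffer whose slice does not start at offset 0.
    """
    out = bytearray()
    i = start
    while i < end:
        if buf[i:i + 2] == b"\\x":  # two-character escape marker
            out.append(int(buf[i + 2:i + 4], 16))  # two hex digits follow
            i += 4
        else:
            out.append(buf[i])
            i += 1
    return bytes(out)
```

With `start=4`, the leading bytes are skipped and the escape is decoded from its true position; starting at 0 would treat `junk` as payload.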





[jira] [Updated] (DRILL-3845) PartitionSender doesn't send last batch for receivers that already terminated

2016-05-06 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-3845:
-
Reviewer: Kunal Khatua  (was: Victoria Markman)

> PartitionSender doesn't send last batch for receivers that already terminated
> -
>
> Key: DRILL-3845
> URL: https://issues.apache.org/jira/browse/DRILL-3845
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Deneche A. Hakim
>Assignee: Deneche A. Hakim
> Fix For: 1.5.0
>
> Attachments: 29c45a5b-e2b9-72d6-89f2-d49ba88e2939.sys.drill
>
>
> Even if a receiver has finished and informed the corresponding partition 
> sender, the sender will still try to send a "last batch" to the receiver when 
> it's done. In most cases this is fine, as those batches will be silently 
> dropped by the receiving DataServer, but if a receiver finished more than 10 
> minutes ago, DataServer will throw an exception because it can't find the 
> corresponding FragmentManager (WorkEventBus has a 10-minute recentlyFinished 
> cache).
> DRILL-2274 is a reproduction for this case (after the corresponding fix is 
> applied).
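A minimal sketch of the intended behavior, under the assumption that the sender tracks which receivers have already reported completion (names hypothetical, not Drill's actual sender code):

```python
def last_batch_targets(receivers, terminated):
    """Return the receiver ids that should still get a 'last batch'.

    Receivers that already told the sender they are done are skipped, so no
    stale message reaches a server whose fragment state has been evicted.
    """
    return [r for r in receivers if r not in terminated]
```

The design point is simply that termination notices must be consulted at send time, not only while data batches are flowing.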





[jira] [Updated] (DRILL-4121) External Sort may not spill if above a receiver

2016-05-06 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4121:
-
Reviewer: Kunal Khatua  (was: Victoria Markman)

> External Sort may not spill if above a receiver
> ---
>
> Key: DRILL-4121
> URL: https://issues.apache.org/jira/browse/DRILL-4121
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Deneche A. Hakim
>Assignee: Deneche A. Hakim
> Fix For: 1.5.0
>
>
> If external sort is above a receiver, all received batches will contain 
> non-root buffers. The sort operator doesn't account for non-root buffers when 
> estimating how much memory it uses and whether it needs to spill to disk. 
> This may delay the spill and cause the corresponding Drillbit to use large 
> amounts of memory.
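A hedged sketch of the accounting problem (field names hypothetical, a toy model rather than Drill's allocator code): the spill decision must sum every buffer a batch holds, not just the root ones, or the estimate undershoots and the spill comes too late:

```python
def batch_memory(batch, include_non_root=True):
    """Bytes held by a record batch; non-root buffers are easy to miss."""
    total = batch["root_bytes"]
    if include_non_root:
        total += sum(batch["non_root_bytes"])
    return total

def should_spill(batches, limit_bytes):
    """Spill once the true total, root plus non-root, exceeds the budget."""
    return sum(batch_memory(b) for b in batches) > limit_bytes
```

In this toy model, counting only root buffers reports 100 bytes for a batch that really holds 220, so the operator sails past its budget without spilling.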





[jira] [Updated] (DRILL-4163) Support schema changes for MergeJoin operator.

2016-05-06 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4163:
-
Reviewer: Khurram Faraaz  (was: Victoria Markman)

> Support schema changes for MergeJoin operator.
> --
>
> Key: DRILL-4163
> URL: https://issues.apache.org/jira/browse/DRILL-4163
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: amit hadke
>Assignee: Jason Altekruse
> Fix For: 1.5.0
>
>
> Since the external sort operator supports schema changes, allow the use of 
> union types in merge join to support schema changes.
> For now, we assume that merge join always works on record batches from the 
> sort operator. Thus merging schemas and promoting to union vectors is already 
> taken care of by the sort operator.
> Test Cases:
> 1) Only one side changes schema (join on union type and primitive type)
> 2) Both sides change schema on all columns.
> 3) Join between numeric types and string types.
> 4) Missing columns - each batch has different columns.
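A hedged sketch of the schema-merging idea the test cases exercise (a toy column-map model, not Drill's vector code): mismatched column types are promoted to a union, and a column present on only one side is carried into the merged schema:

```python
def merge_schemas(left, right):
    """Merge two column->type maps, promoting type conflicts to a union."""
    merged = dict(left)
    for name, typ in right.items():
        if name not in merged:
            merged[name] = typ  # missing column: carried over from one side
        elif merged[name] != typ:
            merged[name] = frozenset({merged[name], typ})  # union promotion
    return merged
```

This covers the listed cases in miniature: one side changing type yields a union column, and differing column sets yield the superset of columns.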





[jira] [Updated] (DRILL-4187) Introduce a state to separate queries pending execution from those pending in the queue.

2016-05-06 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4187:
-
Reviewer: Chun Chang  (was: Sudheesh Katkam)

> Introduce a state to separate queries pending execution from those pending in 
> the queue.
> 
>
> Key: DRILL-4187
> URL: https://issues.apache.org/jira/browse/DRILL-4187
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Hanifi Gunes
>Assignee: Hanifi Gunes
> Fix For: 1.5.0
>
>
> Currently, queries pending in the queue are not listed in the web UI; 
> besides, we use the state PENDING to mean pending execution. This issue 
> proposes i) to list enqueued queries in the web UI and ii) to introduce a new 
> state for queries sitting in the queue, differentiating them from those 
> pending execution.
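A hedged sketch of the proposed split (state names assumed for illustration, not taken from Drill's code): one state for queries still waiting in the admission queue, a distinct one for queries admitted but not yet executing, so the UI can list both:

```python
from enum import Enum

class QueryState(Enum):
    ENQUEUED = "waiting in the admission queue"      # proposed new state
    PENDING = "admitted, waiting to begin execution"
    RUNNING = "executing"

def ui_listing(queries):
    """List every query with its state name, including enqueued ones."""
    return [(qid, state.name) for qid, state in queries]
```

The point of the extra state is that a UI filter on "running or pending" no longer silently hides queries that are merely queued.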





[jira] [Updated] (DRILL-2517) Apply Partition pruning before reading files during planning

2016-05-06 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-2517:
-
Reviewer: Kunal Khatua  (was: Victoria Markman)

> Apply Partition pruning before reading files during planning
> 
>
> Key: DRILL-2517
> URL: https://issues.apache.org/jira/browse/DRILL-2517
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Query Planning & Optimization
>Affects Versions: 0.7.0, 0.8.0
>Reporter: Adam Gilmore
>Assignee: Jinfeng Ni
> Fix For: 1.6.0, Future
>
>
> Partition pruning still tries to read Parquet files during the planning stage 
> even though they don't match the partition filter.
> For example, if there were an invalid Parquet file in a directory that should 
> not be queried:
> {code}
> 0: jdbc:drill:zk=local> select sum(price) from dfs.tmp.purchases where dir0 = 
> 1;
> Query failed: IllegalArgumentException: file:/tmp/purchases/4/0_0_0.parquet 
> is not a Parquet file (too small)
> {code}
> The reason is that the partition pruning happens after the Parquet plugin 
> tries to read the footer of each file.
> Ideally, partition pruning would happen first before the format plugin gets 
> involved.
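A hedged sketch of the desired planning order (file and partition fields hypothetical, a toy model of the planner): the directory-level partition filter runs first, and the expensive footer read happens only for surviving files, so an unreadable file in a pruned directory can never fail the query:

```python
def plan_scan(files, keep_partition):
    """Prune by partition value before any file content is touched."""
    footers_read = []

    def read_footer(path):
        # Stands in for the costly (and possibly failing) Parquet footer
        # read; it must never run for a pruned file.
        footers_read.append(path)
        return {"path": path}

    survivors = [f for f in files if keep_partition(f["dir0"])]
    return [read_footer(f["path"]) for f in survivors], footers_read
```

With the filter `dir0 = 1`, the invalid file under directory 4 is pruned before its footer is ever opened, matching the behavior the report asks for.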





[jira] [Updated] (DRILL-4490) Count(*) function returns as optional instead of required

2016-05-06 Thread Suresh Ollala (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Ollala updated DRILL-4490:
-
Reviewer: Krystal

> Count(*) function returns as optional instead of required
> -
>
> Key: DRILL-4490
> URL: https://issues.apache.org/jira/browse/DRILL-4490
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
>Reporter: Krystal
>Assignee: Sean Hsuan-Yi Chu
> Fix For: 1.7.0
>
>
> git.commit.id.abbrev=c8a7840
> I have the following CTAS query:
> create table test as select count(*) as col1 from cp.`tpch/orders.parquet`;
> The schema of the test table shows col1 as optional:
> message root {
>   optional int64 col1;
> }
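A hedged illustration of why the column should be declared required (non-nullable) rather than optional: COUNT(*) is total over its input, returning 0 for an empty table and never NULL, so there is no input for which the result is absent:

```python
def count_star(rows):
    """COUNT(*) counts rows unconditionally; it is defined for every input,
    including the empty one, so its result type need not admit NULL."""
    return sum(1 for _ in rows)
```

By contrast, an aggregate like MAX genuinely yields NULL on empty input, which is what an `optional` declaration is for.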




