[jira] [Commented] (DRILL-4674) Can not cast 0 , 1 to boolean inside value constructor

2016-05-13 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15282805#comment-15282805
 ] 

Jason Altekruse commented on DRILL-4674:


Do you have a tool that is generating this query, so that you need the provided 
syntax to work? I believe this should be possible using the keywords TRUE and 
FALSE instead of 1 and 0, and in that case you also should not have to cast 
them.
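
For example, the keyword form might look like this (an untested sketch; output 
not shown):

{noformat}
0: jdbc:drill:schema=dfs.tmp> values(true);
0: jdbc:drill:schema=dfs.tmp> values(false);
{noformat}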

> Can not cast 0 , 1 to boolean inside value constructor
> --
>
> Key: DRILL-4674
> URL: https://issues.apache.org/jira/browse/DRILL-4674
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.7.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>
> Drill does not return results when we try to cast 0 and 1 to boolean inside a 
> value constructor.
> Drill version : 1.7.0-SNAPSHOT  commit ID : 09b26277
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> values(cast(1 as boolean));
> Error: SYSTEM ERROR: IllegalArgumentException: Invalid value for boolean: 1
> Fragment 0:0
> [Error Id: 35dcc4bb-0c5d-466f-8fb5-cf7f0a892155 on centos-02.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> values(cast(0 as boolean));
> Error: SYSTEM ERROR: IllegalArgumentException: Invalid value for boolean: 0
> Fragment 0:0
> [Error Id: 2dbcafe2-92c7-475e-a2aa-9745ef72c1cc on centos-02.qa.lab:31010] 
> (state=,code=0)
> {noformat}
> Whereas we get results on Postgres for the same query.
> {noformat}
> postgres=# values(cast(1 as boolean));
>  column1
> -
>  t
> (1 row)
> postgres=# values(cast(0 as boolean));
>  column1
> -
>  f
> (1 row)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2016-05-13 07:16:16,578 [28ca80bf-0af9-bc05-258b-6b5744739ed8:frag:0:0] ERROR 
> o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalArgumentException: 
> Invalid value for boolean: 0
> Fragment 0:0
> [Error Id: 2dbcafe2-92c7-475e-a2aa-9745ef72c1cc on centos-02.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalArgumentException: Invalid value for boolean: 0
> Fragment 0:0
> [Error Id: 2dbcafe2-92c7-475e-a2aa-9745ef72c1cc on centos-02.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
>  ~[drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:318)
>  [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:185)
>  [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:287)
>  [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_45]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_45]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> Caused by: java.lang.IllegalArgumentException: Invalid value for boolean: 0
> at 
> org.apache.drill.exec.test.generated.ProjectorGen9.doSetup(ProjectorTemplate.java:95)
>  ~[na:na]
> at 
> org.apache.drill.exec.test.generated.ProjectorGen9.setup(ProjectorTemplate.java:93)
>  ~[na:na]
> at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema(ProjectRecordBatch.java:444)
>  ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78)
>  ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129)
>  ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
>  ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) 
> ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81)
>  ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) 
> ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> at 
> 

[jira] [Created] (DRILL-4663) FileSystem properties Config block from filesystem plugin are not being applied for file writers

2016-05-10 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4663:
--

 Summary: FileSystem properties Config block from filesystem plugin 
are not being applied for file writers
 Key: DRILL-4663
 URL: https://issues.apache.org/jira/browse/DRILL-4663
 Project: Apache Drill
  Issue Type: Bug
Reporter: Jason Altekruse


Currently all of the record writers create their own empty filesystem 
configuration upon initialization. They do not currently apply the custom 
configurations that are included in the plugin configuration, which prevents 
users from setting custom properties on the write path. If possible this 
configuration should be shared with the readers. If there is a need to isolate 
this from the configuration used for the readers, we should still add the 
configurations from the storage plugin config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4659) Specify, as part of the query, table information: data format (CSV, parquet, JSON. etc.), field delimiter, etc.

2016-05-09 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15276627#comment-15276627
 ] 

Jason Altekruse commented on DRILL-4659:


This feature was added last fall. I think we may want to duplicate this 
information in the "Querying Data" section to make it easier to find, but 
the feature is documented here:

https://drill.apache.org/docs/plugin-configuration-basics/#using-the-formats-attributes-as-table-function-parameters

If you would like to see more usage examples or information about the 
feature's development, this was the JIRA for the feature: 
https://issues.apache.org/jira/browse/DRILL-4047
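
For reference, a table-function query of the kind described on that page might 
look like the following (the file path, column names, and delimiter here are 
hypothetical):

{noformat}
SELECT a, b
FROM table(dfs.`/data/sample_file`(type => 'text', fieldDelimiter => '|'));
{noformat}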

> Specify, as part of the query, table information: data format (CSV, parquet, 
> JSON. etc.), field delimiter, etc.
> ---
>
> Key: DRILL-4659
> URL: https://issues.apache.org/jira/browse/DRILL-4659
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization, SQL Parser
>Reporter: Roger Dielrton
>Priority: Minor
>
> I have a file that I would like to use in a query, and it can have one or 
> more of the following properties:
> * Has no extension ==> Drill is unable to handle it.
> * I know it contains data in CSV format, but the field separator is a 
> non-standard character ==> Drill is unable to parse it (without modifying 
> the storage plugin configuration).
> * Is located in an Amazon S3 bucket ==> I can't rename it.
> * Is large ==> It would be expensive to make a copy of it. 
> It would be nice if you could specify, as part of the "select" query, as 
> metadata, relevant table information such as:
> * Data format (CSV, parquet, JSON. etc.)
> * Field delimiter.





[jira] [Commented] (DRILL-2100) Drill not deleting spooling files

2016-05-03 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15269288#comment-15269288
 ] 

Jason Altekruse commented on DRILL-2100:


[~vitalii] Sorry for missing this during the review, but won't using deleteOnExit 
wait until JVM shutdown to delete the directories? That isn't the desired 
behavior, is it?

> Drill not deleting spooling files
> -
>
> Key: DRILL-2100
> URL: https://issues.apache.org/jira/browse/DRILL-2100
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 0.8.0
>Reporter: Abhishek Girish
>Assignee: Deneche A. Hakim
> Fix For: 1.7.0
>
>
> Currently, forcing queries to use an external sort by switching off 
> hash join/agg causes spill-to-disk files to accumulate. 
> This causes issues with disk space availability when the spill is configured 
> to be on the local file system (/tmp/drill). It is also not optimal when 
> configured to use DFS (custom). 
> Drill must clean up all temporary files after a query completes or 
> after a drillbit restart. 





[jira] [Commented] (DRILL-4641) Support for lzo compression

2016-04-25 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256672#comment-15256672
 ] 

Jason Altekruse commented on DRILL-4641:


Are you looking to query compressed text (csv/tsv) or parquet files that 
internally use LZO? I think it might be possible to do both of these things by 
including the correct LZO jars on your classpath when you run Drill. If I 
remember correctly, the library everyone uses for LZO is GPL-licensed, so we (as 
well as other Apache projects like Parquet, Hadoop, etc.) cannot include it in 
our official distributions.

> Support for lzo compression
> ---
>
> Key: DRILL-4641
> URL: https://issues.apache.org/jira/browse/DRILL-4641
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: Future
> Environment: Not specific to platform
>Reporter: subbu srinivasan
>
> Would love support for querying LZO-compressed files.





[jira] [Resolved] (DRILL-4445) Remove extra code to work around mixture of arrays and Lists used in Logical and Physical query plan nodes

2016-04-20 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4445.

Resolution: Fixed

Fixed in d24205d4e795a1aab54b64708dde1e7deeca668b

> Remove extra code to work around mixture of arrays and Lists used in Logical 
> and Physical query plan nodes
> --
>
> Key: DRILL-4445
> URL: https://issues.apache.org/jira/browse/DRILL-4445
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jason Altekruse
>Assignee: Jason Altekruse
>
> The physical plan node classes for all of the operators currently use a mix 
> of arrays and Lists to refer to lists of incoming operators, expressions, and 
> other operator properties. This has led to the introduction of several 
> utility methods for translating between the two representations, examples can 
> be seen in common/logical/data/Abstractbuilder.
> This isn't a major problem, but the new operator test framework uses these 
> classes as a primary interface for setting up the tests. It seemed worthwhile 
> to just refactor the classes to be consistent so that the tests would all be 
> similar. There are a few changes to execution code, but they are all just 
> trivial changes to use the list based interfaces (length vs size(), set() 
> instead of arr[i] = foo, etc.) as Jackson just transparently handles both 
> types the same (which is why this hasn't really been a problem).





[jira] [Resolved] (DRILL-4448) Specification of Ordering (ASC, DESC) on a sort plan node uses Strings for construction, should also allow for use of the corresponding Calcite Enums

2016-04-20 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4448.

Resolution: Fixed

Fixed in 1d1acc09ec30167f0653d99cee6f30c7b1413859

> Specification of Ordering (ASC, DESC) on a sort plan node uses Strings for 
> construction, should also allow for use of the corresponding Calcite Enums
> -
>
> Key: DRILL-4448
> URL: https://issues.apache.org/jira/browse/DRILL-4448
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jason Altekruse
>
> Small change to provide a cleaner interface when constructing sort 
> configurations in tests. The current class mixes together two tasks, 
> converting between the strings we chose to put in the plans (ASC, DESC) and 
> the Calcite enums, as well as validating the allowed values. Calcite has 
> several values that we do not currently use like 
> STRICTLY_ASCENDING/DESCENDING and CLUSTERED. We can break these two tasks 
> apart to allow for construction directly from the Enum, but still provide 
> validation for only the Drill-allowed values.





[jira] [Resolved] (DRILL-4442) Improve VectorAccessible and RecordBatch interfaces to provide only necessary information to the correct consumers

2016-04-20 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4442.

Resolution: Fixed

Fixed in 01e04cddd6ad57f9ae146fe479e30bebcd7cc432

> Improve VectorAccessible and RecordBatch interfaces to provide only necessary 
> information to the correct consumers
> --
>
> Key: DRILL-4442
> URL: https://issues.apache.org/jira/browse/DRILL-4442
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jason Altekruse
>Assignee: Jason Altekruse
>
> During the creation of the operator test framework I ran into a small snag 
> trying to share code between the existing test infrastructure and the new 
> features to allow directly consuming the output of an operator rather than 
> that of a query.
> I needed to move the getSelectionVector2 and getSelectionVector4 methods up 
> to the VectorAccessible interface.





[jira] [Issue Comment Deleted] (DRILL-4437) Implement framework for testing operators in isolation

2016-04-20 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse updated DRILL-4437:
---
Comment: was deleted

(was: d93a3633815ed1c7efd6660eae62b7351a2c9739)

> Implement framework for testing operators in isolation
> --
>
> Key: DRILL-4437
> URL: https://issues.apache.org/jira/browse/DRILL-4437
> Project: Apache Drill
>  Issue Type: Test
>  Components: Tools, Build & Test
>Reporter: Jason Altekruse
>Assignee: Jason Altekruse
> Fix For: 1.7.0
>
>
> Most of the tests written for Drill are end-to-end. We spin up a full 
> instance of the server, submit one or more SQL queries and check the results.
> While integration tests like this are useful for ensuring that all features 
> are guaranteed not to break end-user functionality, overuse of this approach 
> has caused a number of pain points.
> Overall the tests end up running a lot of the exact same code, parsing and 
> planning many similar queries.
> Creating consistent reproductions of issues, especially edge cases found in 
> clustered environments can be extremely difficult. Even the simpler case of 
> testing cases where operators are able to handle a particular series of 
> incoming batches of records has required hacks like generating large enough 
> files so that the scanners happen to break them up into separate batches. 
> These tests are brittle as they make assumptions about how the scanners will 
> work in the future. As an example of when this could break, we might do a 
> perf evaluation and find that we should produce larger batches in some cases. 
> Existing tests that are trying to test multiple batches by producing a few 
> more records than the current threshold for batch size would not be testing 
> the same code paths.
> We need to make more parts of the system testable without initializing the 
> entire Drill server, as well as making the different internal settings and 
> state of the server configurable for tests.
> This is a first effort to enable testing the physical operators in Drill by 
> mocking the components of the system necessary to enable operators to 
> initialize and execute.





[jira] [Resolved] (DRILL-4437) Implement framework for testing operators in isolation

2016-04-20 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4437.

Resolution: Fixed

Fixed in d93a3633815ed1c7efd6660eae62b7351a2c9739

> Implement framework for testing operators in isolation
> --
>
> Key: DRILL-4437
> URL: https://issues.apache.org/jira/browse/DRILL-4437
> Project: Apache Drill
>  Issue Type: Test
>  Components: Tools, Build & Test
>Reporter: Jason Altekruse
>Assignee: Jason Altekruse
> Fix For: 1.7.0
>
>
> Most of the tests written for Drill are end-to-end. We spin up a full 
> instance of the server, submit one or more SQL queries and check the results.
> While integration tests like this are useful for ensuring that all features 
> are guaranteed not to break end-user functionality, overuse of this approach 
> has caused a number of pain points.
> Overall the tests end up running a lot of the exact same code, parsing and 
> planning many similar queries.
> Creating consistent reproductions of issues, especially edge cases found in 
> clustered environments can be extremely difficult. Even the simpler case of 
> testing cases where operators are able to handle a particular series of 
> incoming batches of records has required hacks like generating large enough 
> files so that the scanners happen to break them up into separate batches. 
> These tests are brittle as they make assumptions about how the scanners will 
> work in the future. As an example of when this could break, we might do a 
> perf evaluation and find that we should produce larger batches in some cases. 
> Existing tests that are trying to test multiple batches by producing a few 
> more records than the current threshold for batch size would not be testing 
> the same code paths.
> We need to make more parts of the system testable without initializing the 
> entire Drill server, as well as making the different internal settings and 
> state of the server configurable for tests.
> This is a first effort to enable testing the physical operators in Drill by 
> mocking the components of the system necessary to enable operators to 
> initialize and execute.





[jira] [Commented] (DRILL-4437) Implement framework for testing operators in isolation

2016-04-20 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15250409#comment-15250409
 ] 

Jason Altekruse commented on DRILL-4437:


d93a3633815ed1c7efd6660eae62b7351a2c9739

> Implement framework for testing operators in isolation
> --
>
> Key: DRILL-4437
> URL: https://issues.apache.org/jira/browse/DRILL-4437
> Project: Apache Drill
>  Issue Type: Test
>  Components: Tools, Build & Test
>Reporter: Jason Altekruse
>Assignee: Jason Altekruse
> Fix For: 1.7.0
>
>
> Most of the tests written for Drill are end-to-end. We spin up a full 
> instance of the server, submit one or more SQL queries and check the results.
> While integration tests like this are useful for ensuring that all features 
> are guaranteed not to break end-user functionality, overuse of this approach 
> has caused a number of pain points.
> Overall the tests end up running a lot of the exact same code, parsing and 
> planning many similar queries.
> Creating consistent reproductions of issues, especially edge cases found in 
> clustered environments can be extremely difficult. Even the simpler case of 
> testing cases where operators are able to handle a particular series of 
> incoming batches of records has required hacks like generating large enough 
> files so that the scanners happen to break them up into separate batches. 
> These tests are brittle as they make assumptions about how the scanners will 
> work in the future. As an example of when this could break, we might do a 
> perf evaluation and find that we should produce larger batches in some cases. 
> Existing tests that are trying to test multiple batches by producing a few 
> more records than the current threshold for batch size would not be testing 
> the same code paths.
> We need to make more parts of the system testable without initializing the 
> entire Drill server, as well as making the different internal settings and 
> state of the server configurable for tests.
> This is a first effort to enable testing the physical operators in Drill by 
> mocking the components of the system necessary to enable operators to 
> initialize and execute.





[jira] [Created] (DRILL-4551) Add some missing functions that are generated by Tableau (cot, regex_matches, split_part, isdate)

2016-03-29 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4551:
--

 Summary: Add some missing functions that are generated by Tableau 
(cot, regex_matches, split_part, isdate)
 Key: DRILL-4551
 URL: https://issues.apache.org/jira/browse/DRILL-4551
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Jason Altekruse
Assignee: Jason Altekruse


Several of these functions do not appear to be standard SQL functions, but they 
are available in several other popular databases like SQL Server, Oracle and 
Postgres.
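
For reference, these are the forms such functions take in other databases 
(hedged examples, not Drill syntax; the first three follow Postgres, the last 
SQL Server, and the names may differ slightly from what Tableau emits):

{noformat}
SELECT cot(1.0);                        -- cotangent
SELECT split_part('a,b,c', ',', 2);     -- extracts the second field, 'b'
SELECT regexp_matches('abc', '(b)');    -- regex match groups (Postgres name)
SELECT ISDATE('2016-03-29');            -- 1 if the string parses as a date
{noformat}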





[jira] [Commented] (DRILL-4505) Can't group by or sort across files with different schema

2016-03-14 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15193881#comment-15193881
 ] 

Jason Altekruse commented on DRILL-4505:


[~tobad357] Can you try to add a cast to your column APPLICATION_ID? Work is 
ongoing to fully support changing schema, which includes a concept of an 
untyped null that tries to defer materialization until it is needed. In this 
case I believe it is possible that we are materializing the column that does 
not appear in some of the files to a default type (we arbitrarily chose 
nullable bigint before starting work on the full changing schema support). 
Casting these automatically materialized nulls to the correct type may resolve 
the issue you are seeing.
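
For example, the cast might look like this (a sketch based on the query in the 
report, assuming APPLICATION_ID should be a nullable BIGINT):

{noformat}
SELECT max(CAST(APPLICATION_ID AS BIGINT)), dir0 AS year_
FROM dfs.`/PRO/UTC/1`
WHERE dir2 >= '2016-01-01' AND dir2 < '2016-04-02'
GROUP BY dir0;
{noformat}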

If this doesn't fix the issue, you can try to enable the union type, but it is 
currently considered an experimental feature and needs to be more thoroughly 
tested.

alter session set `exec.enable_union_type` = true

https://issues.apache.org/jira/browse/DRILL-3229

> Can't group by or sort across files with different schema
> -
>
> Key: DRILL-4505
> URL: https://issues.apache.org/jira/browse/DRILL-4505
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.5.0
> Environment: Java 1.8
>Reporter: Tobias
>
> We are currently trying out the support for querying across parquet files 
> with different schemas.
> Simple selects work well, but when we want to do a sort or group by, Drill 
> returns "UNSUPPORTED_OPERATION ERROR: Sort doesn't currently support sorts 
> with changing schemas Fragment 0:0 [Error Id: 
> ff490670-64c1-4fb8-990e-a02aa44ac010 on zookeeper-1:31010]"
> This is despite not even including the new columns in the query.
> The expected result would be to treat the non-existent columns in certain 
> files as either null or a default value and allow them to be grouped and sorted.
> Example
> SELECT APPLICATION_ID ,dir0 AS year_ FROM dfs.`/PRO/UTC/1` WHERE dir2 
> >='2016-01-01' AND dir2<'2016-04-02' work with changing schema
> but SELECT max(APPLICATION_ID ),dir0 AS year_ FROM dfs.`/PRO/UTC/1` WHERE 
> dir2 >='2016-01-01' AND dir2<'2016-04-02'  group by dir0 does not work
> For us this hampers any possibility of having an evolving schema with 
> moderately complex queries.





[jira] [Resolved] (DRILL-4482) Avro no longer selects data correctly from a sub-structure

2016-03-09 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4482.

Resolution: Fixed
  Assignee: Jason Altekruse  (was: Stefán Baxter)

Fixed in 64ab0a8ec9d98bf96f4d69274dddc180b8efe263

> Avro no longer selects data correctly from a sub-structure
> --
>
> Key: DRILL-4482
> URL: https://issues.apache.org/jira/browse/DRILL-4482
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Avro
>Affects Versions: 1.6.0
>Reporter: Stefán Baxter
>Assignee: Jason Altekruse
>Priority: Blocker
> Fix For: 1.6.0
>
>
> Parquet:
> 0: jdbc:drill:zk=local> select s.client_ip.ip from 
> dfs.asa.`/processed/<>/transactions` as s limit 1;
> ++
> | EXPR$0 |
> ++
> | 87.55.171.210  |
> ++
> 1 row selected (1.184 seconds)
> Avro:
> 0: jdbc:drill:zk=local> select s.client_ip.ip from 
> dfs.asa.`/streaming/<>/transactions` as s limit 1;
> +-+
> | EXPR$0  |
> +-+
> | null|
> +-+
> 1 row selected (0.29 seconds)





[jira] [Commented] (DRILL-4482) Avro no longer selects data correctly from a sub-structure

2016-03-08 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185880#comment-15185880
 ] 

Jason Altekruse commented on DRILL-4482:


I definitely found and fixed the issue; the regression was introduced by 
DRILL-4382, but the tests were not written properly to catch the change. I am 
adding more tests now, and a patch should be posted soon.

> Avro no longer selects data correctly from a sub-structure
> --
>
> Key: DRILL-4482
> URL: https://issues.apache.org/jira/browse/DRILL-4482
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Avro
>Affects Versions: 1.6.0
>Reporter: Stefán Baxter
>Assignee: Stefán Baxter
>Priority: Blocker
> Fix For: 1.6.0
>
>
> Parquet:
> 0: jdbc:drill:zk=local> select s.client_ip.ip from 
> dfs.asa.`/processed/<>/transactions` as s limit 1;
> ++
> | EXPR$0 |
> ++
> | 87.55.171.210  |
> ++
> 1 row selected (1.184 seconds)
> Avro:
> 0: jdbc:drill:zk=local> select s.client_ip.ip from 
> dfs.asa.`/streaming/<>/transactions` as s limit 1;
> +-+
> | EXPR$0  |
> +-+
> | null|
> +-+
> 1 row selected (0.29 seconds)





[jira] [Commented] (DRILL-4482) Avro no longer selects data correctly from a sub-structure

2016-03-08 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185658#comment-15185658
 ] 

Jason Altekruse commented on DRILL-4482:


I think I may have reproduced the issue; is this field in your dataset a Map or 
a Record in the Avro schema? I am seeing nulls in a case with maps and am trying 
to figure out the cause right now.

I will be improving our test coverage for Avro as a part of this change to make 
sure we don't have regressions like this in the future.

> Avro no longer selects data correctly from a sub-structure
> --
>
> Key: DRILL-4482
> URL: https://issues.apache.org/jira/browse/DRILL-4482
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Avro
>Affects Versions: 1.6.0
>Reporter: Stefán Baxter
>Assignee: Stefán Baxter
>Priority: Blocker
> Fix For: 1.6.0
>
>
> Parquet:
> 0: jdbc:drill:zk=local> select s.client_ip.ip from 
> dfs.asa.`/processed/<>/transactions` as s limit 1;
> ++
> | EXPR$0 |
> ++
> | 87.55.171.210  |
> ++
> 1 row selected (1.184 seconds)
> Avro:
> 0: jdbc:drill:zk=local> select s.client_ip.ip from 
> dfs.asa.`/streaming/<>/transactions` as s limit 1;
> +-+
> | EXPR$0  |
> +-+
> | null|
> +-+
> 1 row selected (0.29 seconds)





[jira] [Commented] (DRILL-4482) Avro no longer selects data correctly from a sub-structure

2016-03-08 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185392#comment-15185392
 ] 

Jason Altekruse commented on DRILL-4482:


[~acmeguy] Thanks for the quick response, I will continue to try to reproduce 
the failure.

> Avro no longer selects data correctly from a sub-structure
> --
>
> Key: DRILL-4482
> URL: https://issues.apache.org/jira/browse/DRILL-4482
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Avro
>Affects Versions: 1.6.0
>Reporter: Stefán Baxter
>Assignee: Stefán Baxter
>Priority: Blocker
> Fix For: 1.6.0
>
>
> Parquet:
> 0: jdbc:drill:zk=local> select s.client_ip.ip from 
> dfs.asa.`/processed/<>/transactions` as s limit 1;
> ++
> | EXPR$0 |
> ++
> | 87.55.171.210  |
> ++
> 1 row selected (1.184 seconds)
> Avro:
> 0: jdbc:drill:zk=local> select s.client_ip.ip from 
> dfs.asa.`/streaming/<>/transactions` as s limit 1;
> +-+
> | EXPR$0  |
> +-+
> | null|
> +-+
> 1 row selected (0.29 seconds)





[jira] [Commented] (DRILL-4482) Avro no longer selects data correctly from a sub-structure

2016-03-08 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185364#comment-15185364
 ] 

Jason Altekruse commented on DRILL-4482:


[~acmeguy] I'm trying to reproduce this issue and am not seeing it on a small 
Avro file. There is no guarantee about read order when reading a directory, so 
running a limit 1 query over the same table in two formats (or even over the 
same list of files two different times) is not guaranteed to give the same 
result. Is transactions a directory or a file without an extension?

Are you sure that there are no null values in this column? Could you try to 
run a query with a predictable result, like a max/min on the column or a limit 
with a sort? 

It is still possible that this is a Drill bug, and I will try with a 
distributed query to see if I can reproduce it, but if you have time to try to 
confirm any of these things it could help with creating a reproduction.

> Avro no longer selects data correctly from a sub-structure
> --
>
> Key: DRILL-4482
> URL: https://issues.apache.org/jira/browse/DRILL-4482
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Avro
>Affects Versions: 1.6.0
>Reporter: Stefán Baxter
>Assignee: Stefán Baxter
>Priority: Blocker
> Fix For: 1.6.0
>
>
> Parquet:
> 0: jdbc:drill:zk=local> select s.client_ip.ip from 
> dfs.asa.`/processed/<>/transactions` as s limit 1;
> ++
> | EXPR$0 |
> ++
> | 87.55.171.210  |
> ++
> 1 row selected (1.184 seconds)
> Avro:
> 0: jdbc:drill:zk=local> select s.client_ip.ip from 
> dfs.asa.`/streaming/<>/transactions` as s limit 1;
> +-+
> | EXPR$0  |
> +-+
> | null|
> +-+
> 1 row selected (0.29 seconds)





[jira] [Resolved] (DRILL-4332) tests in TestFrameworkTest fail in Java 8

2016-03-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4332.

   Resolution: Fixed
Fix Version/s: (was: Future)
   1.6.0

Fixed in 447b093cd2b05bfeae001844a7e3573935e84389

> tests in TestFrameworkTest fail in Java 8
> -
>
> Key: DRILL-4332
> URL: https://issues.apache.org/jira/browse/DRILL-4332
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Tools, Build & Test
>Affects Versions: 1.5.0
>Reporter: Deneche A. Hakim
>Assignee: Laurent Goujon
> Fix For: 1.6.0
>
>
> the following unit tests fail in Java 8:
> {noformat}
> TestFrameworkTest.testRepeatedColumnMatching
> TestFrameworkTest.testCSVVerificationOfOrder_checkFailure
> {noformat}
> The tests expect the query to fail with a specific error message. The message 
> generated by DrillTestWrapper.compareMergedVectors assumes a specific order 
> in a map keySet (which we shouldn't assume). In Java 8 it seems the order 
> changed, which causes a slightly different error message.





[jira] [Resolved] (DRILL-4486) Expression serializer incorrectly serializes escaped characters

2016-03-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4486.

   Resolution: Fixed
Fix Version/s: 1.6.0

Fixed in 80316f3f8bef866720f99e609fe758ec8e0c4612

> Expression serializer incorrectly serializes escaped characters
> ---
>
> Key: DRILL-4486
> URL: https://issues.apache.org/jira/browse/DRILL-4486
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Steven Phillips
>Assignee: Steven Phillips
> Fix For: 1.6.0
>
>
> the drill expression parser requires backslashes to be escaped. But the 
> ExpressionStringBuilder is not properly escaping them. This causes problems, 
> especially in the case of regex expressions run with parallel execution.





[jira] [Resolved] (DRILL-4375) Fix the maven release profile, broken by jdbc jar size enforcer added in DRILL-4291

2016-03-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4375.

   Resolution: Fixed
Fix Version/s: 1.6.0

Fixed in 1f29914fc5c7d1e36651ac28167804c4012501fe

> Fix the maven release profile, broken by jdbc jar size enforcer added in 
> DRILL-4291
> ---
>
> Key: DRILL-4375
> URL: https://issues.apache.org/jira/browse/DRILL-4375
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jason Altekruse
>Assignee: Jason Altekruse
> Fix For: 1.6.0
>
>






[jira] [Assigned] (DRILL-2048) Malformed drill storage config stored in zookeeper will prevent Drill from starting

2016-03-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse reassigned DRILL-2048:
--

Assignee: Jason Altekruse

> Malformed drill storage config stored in zookeeper will prevent Drill from 
> starting
> --
>
> Key: DRILL-2048
> URL: https://issues.apache.org/jira/browse/DRILL-2048
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Reporter: Jason Altekruse
>Assignee: Jason Altekruse
> Fix For: 1.7.0
>
>
> We noticed this problem while trying to test dev builds on a common cluster. 
> When applying changes that added a field to the configuration of a storage 
> plugin, the new format of the configuration would be persisted in zookeeper. 
> When a different dev build that did not include the change set was deployed 
> on the same cluster, the config stored in zookeeper would fail to parse and 
> the drillbit would not be able to start. This is not system-critical 
> configuration, so the drillbit should still be able to start with the 
> plugin disabled.
> This fix could also include changing the jackson mapper to allow ignoring 
> unexpected fields in the configuration. This would give a little better 
> chance for interoperability between future versions of Drill as we add new 
> configuration options as necessary.





[jira] [Updated] (DRILL-2048) Malformed drill storage config stored in zookeeper will prevent Drill from starting

2016-03-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse updated DRILL-2048:
---
Fix Version/s: (was: Future)

> Malformed drill storage config stored in zookeeper will prevent Drill from 
> starting
> --
>
> Key: DRILL-2048
> URL: https://issues.apache.org/jira/browse/DRILL-2048
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Reporter: Jason Altekruse
>Assignee: Jason Altekruse
> Fix For: 1.7.0
>
>
> We noticed this problem while trying to test dev builds on a common cluster. 
> When applying changes that added a field to the configuration of a storage 
> plugin, the new format of the configuration would be persisted in zookeeper. 
> When a different dev build that did not include the change set was deployed 
> on the same cluster, the config stored in zookeeper would fail to parse and 
> the drillbit would not be able to start. This is not system-critical 
> configuration, so the drillbit should still be able to start with the 
> plugin disabled.
> This fix could also include changing the jackson mapper to allow ignoring 
> unexpected fields in the configuration. This would give a little better 
> chance for interoperability between future versions of Drill as we add new 
> configuration options as necessary.





[jira] [Updated] (DRILL-2048) Malformed drill storage config stored in zookeeper will prevent Drill from starting

2016-03-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse updated DRILL-2048:
---
Fix Version/s: 1.7.0

> Malformed drill storage config stored in zookeeper will prevent Drill from 
> starting
> --
>
> Key: DRILL-2048
> URL: https://issues.apache.org/jira/browse/DRILL-2048
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Reporter: Jason Altekruse
>Assignee: Jason Altekruse
> Fix For: 1.7.0
>
>
> We noticed this problem while trying to test dev builds on a common cluster. 
> When applying changes that added a field to the configuration of a storage 
> plugin, the new format of the configuration would be persisted in zookeeper. 
> When a different dev build that did not include the change set was deployed 
> on the same cluster, the config stored in zookeeper would fail to parse and 
> the drillbit would not be able to start. This is not system-critical 
> configuration, so the drillbit should still be able to start with the 
> plugin disabled.
> This fix could also include changing the jackson mapper to allow ignoring 
> unexpected fields in the configuration. This would give a little better 
> chance for interoperability between future versions of Drill as we add new 
> configuration options as necessary.





[jira] [Commented] (DRILL-4334) tests in TestMongoFilterPushDown fail in Java 8

2016-03-07 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183421#comment-15183421
 ] 

Jason Altekruse commented on DRILL-4334:


[~laurentgo] Was this superseded by DRILL-4467? Or does another change still 
need to be made to fix this issue?

> tests in TestMongoFilterPushDown fail in Java 8
> ---
>
> Key: DRILL-4334
> URL: https://issues.apache.org/jira/browse/DRILL-4334
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Tools, Build & Test
>Affects Versions: 1.5.0
>Reporter: Deneche A. Hakim
>Assignee: Laurent Goujon
> Fix For: Future
>
>
> All tests in TestMongoFilterPushDown fail in Java8. It looks like the filter 
> is not pushed down to the Mongo storage plugin





[jira] [Commented] (DRILL-4375) Fix the maven release profile, broken by jdbc jar size enforcer added in DRILL-4291

2016-03-07 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15183337#comment-15183337
 ] 

Jason Altekruse commented on DRILL-4375:


Because a fix of this issue is necessary to run a release at all, I would like 
to merge the current patch and try to come up with a cleaner solution after the 
release.

I will reflect your comments onto the JIRA I had opened up as a follow-up to 
this one and make sure we find the most complete and correct solution to the 
issue soon.

I'm considering "For now, I am good with your current patch too." a +1, with 
the assumption that we should definitely make a follow up change soon.

> Fix the maven release profile, broken by jdbc jar size enforcer added in 
> DRILL-4291
> ---
>
> Key: DRILL-4375
> URL: https://issues.apache.org/jira/browse/DRILL-4375
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jason Altekruse
>Assignee: Jason Altekruse
>






[jira] [Commented] (DRILL-4384) Query profile is missing important information on WebUi

2016-03-03 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178324#comment-15178324
 ] 

Jason Altekruse commented on DRILL-4384:


Fixed in c95b5432301fe487d64a1fc06e765228469fc3a2

> Query profile is missing important information on WebUi
> ---
>
> Key: DRILL-4384
> URL: https://issues.apache.org/jira/browse/DRILL-4384
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jacques Nadeau
>Priority: Blocker
> Fix For: 1.6.0
>
> Attachments: DRILL-4384.patch
>
>
> Built drill from master branch (0a2518d7cf01a92a27a82e29edac5424bedf31d5) and 
> started in embedded mode. Then,
> ran a query and checked the query profile through the WebUI. However, it 
> seems that the fragment profiles, operator profiles, and visualized 
> plan sections are all empty. Tried both Mac and CentOS and hit the same 
> problem.
> After doing a binary search over recent commits, seems the patch of
> "DRILL-3581: Upgrade HPPC to 0.7.1" is the cause of broken query
> profiles [1].  The query profile on the commits before DRILL-3581
> looks fine.
> [1] 
> https://github.com/apache/drill/commit/d27127c94d5c08306697a5627a1bac5f144abb22





[jira] [Created] (DRILL-4471) Add unit test for the Drill Web UI

2016-03-03 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4471:
--

 Summary: Add unit test for the Drill Web UI
 Key: DRILL-4471
 URL: https://issues.apache.org/jira/browse/DRILL-4471
 Project: Apache Drill
  Issue Type: Test
Reporter: Jason Altekruse
Assignee: Jason Altekruse


While the Web UI isn't under very active development, changes to the Drill 
build or to internal parts of the server have broken parts of the Web UI a few 
times.

As the web UI is a primary interface for viewing cluster information, 
cancelling queries, configuring storage and other tasks, we really should add 
automated tests for it.





[jira] [Commented] (DRILL-4384) Query profile is missing important information on WebUi

2016-03-03 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178321#comment-15178321
 ] 

Jason Altekruse commented on DRILL-4384:


[~jni] Venki merged this yesterday along with some other outstanding patches. I 
do agree with you about the automated tests for the UI. I have opened 
DRILL-4471 to track this task.

> Query profile is missing important information on WebUi
> ---
>
> Key: DRILL-4384
> URL: https://issues.apache.org/jira/browse/DRILL-4384
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jinfeng Ni
>Assignee: Jacques Nadeau
>Priority: Blocker
> Fix For: 1.6.0
>
> Attachments: DRILL-4384.patch
>
>
> Built drill from master branch (0a2518d7cf01a92a27a82e29edac5424bedf31d5) and 
> started in embedded mode. Then,
> ran a query and checked the query profile through the WebUI. However, it 
> seems that the fragment profiles, operator profiles, and visualized 
> plan sections are all empty. Tried both Mac and CentOS and hit the same 
> problem.
> After doing a binary search over recent commits, seems the patch of
> "DRILL-3581: Upgrade HPPC to 0.7.1" is the cause of broken query
> profiles [1].  The query profile on the commits before DRILL-3581
> looks fine.
> [1] 
> https://github.com/apache/drill/commit/d27127c94d5c08306697a5627a1bac5f144abb22





[jira] [Commented] (DRILL-4441) IN operator does not work with Avro reader

2016-03-03 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15178020#comment-15178020
 ] 

Jason Altekruse commented on DRILL-4441:


I need to fix this for varbinary too, but to make the test case work I had to 
fix casts from varchar to varbinary, as it does not appear we support binary 
literals.
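For illustration only (the table and column names below are made up), the kind of predicate involved looks like:

{code:sql}
-- No binary literal syntax, so varchar literals must be cast to varbinary
SELECT * FROM dfs.tmp.`example_table` t
WHERE t.bin_col IN (CAST('abc' AS VARBINARY), CAST('def' AS VARBINARY));
{code}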

> IN operator does not work with Avro reader
> --
>
> Key: DRILL-4441
> URL: https://issues.apache.org/jira/browse/DRILL-4441
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Avro
>Affects Versions: 1.5.0
> Environment: Ubuntu
>Reporter: Stefán Baxter
>Assignee: Jason Altekruse
>Priority: Critical
> Fix For: 1.6.0
>
>
> IN operator simply does not work. 
> (And I find it interesting that Storage-Avro is not available here in Jira as 
> a Storage component)





[jira] [Assigned] (DRILL-4441) IN operator does not work with Avro reader

2016-03-03 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse reassigned DRILL-4441:
--

Assignee: Jason Altekruse

> IN operator does not work with Avro reader
> --
>
> Key: DRILL-4441
> URL: https://issues.apache.org/jira/browse/DRILL-4441
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Avro
>Affects Versions: 1.5.0
> Environment: Ubuntu
>Reporter: Stefán Baxter
>Assignee: Jason Altekruse
>Priority: Critical
> Fix For: 1.6.0
>
>
> IN operator simply does not work. 
> (And I find it interesting that Storage-Avro is not available here in Jira as 
> a Storage component)





[jira] [Assigned] (DRILL-4447) Drill seems to ignore TO_DATE(timestamp) when used inside DISTINCT() and GROUP BY

2016-02-29 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse reassigned DRILL-4447:
--

Assignee: Jason Altekruse

> Drill seems to ignore TO_DATE(timestamp) when used inside DISTINCT() and 
> GROUP BY
> -
>
> Key: DRILL-4447
> URL: https://issues.apache.org/jira/browse/DRILL-4447
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.5.0
> Environment: Centos 6.2/Distributed/CDH5.4.9
>Reporter: Ryan Clough
>Assignee: Jason Altekruse
>Priority: Critical
> Attachments: timestamps.txt, timestamps_parquet.tar.gz
>
>
> The issue comes from a larger query, but I've managed to narrow it down to 
> what is a minimally reproducible issue.
> Given a list of timestamps (will attach files) associated with 3 days, we 
> want to select the distinct dates (total: 3 days) from this list. To do this, 
> I decided to use the TO_DATE function, which does exactly what I want.
> Note, there are 47 distinct timestamps in the data set.
> {code:sql}
> jdbc:drill:> SELECT DISTINCT(TO_DATE(data_date)) AS data_date
> . . . . . . . > FROM timestamps;
> +-+
> |  data_date  |
> +-+
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-23  |
> | 2016-02-23  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-23  |
> +-+
> 47 rows selected (11.057 seconds)
> {code}
> As you can see, DRILL has ignored the TO_DATE function when checking for 
> distinct records (note that the 47 rows match the 47 rows of distinct 
> timestamps).
> My testing has also shown that this affects GROUP BY. I wouldn't be surprised 
> if it manifested itself elsewhere.
> I tried to get around the problem by converting the dates to a string using 
> TO_CHAR: surely drill will use the resulting strings to do the DISTINCT 
> comparison?
> {code:sql}
> drill:> SELECT DISTINCT(TO_CHAR(TO_DATE(data_date))) FROM timestamps;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: 
> [to_char(DATE-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> Fragment 0:0
> [Error Id: bcad87f0-3353-4a3b-842e-c68a02b394c3 on 
> lvimhdpa14.lv.vimeows.com:31010] (state=,code=0)
> {code}
> As far as I can tell from the docs, you SHOULD be able to convert a date to a 
> string with TO_CHAR(). I'm not sure what the underlying issue is here, but I 
> thought it good to report the issue.
> Please let me know if you need any further info, query plans, etc, but it 
> should be reproducible with the timestamps data I'll attach in a minute.





[jira] [Updated] (DRILL-4447) Drill seems to ignore TO_DATE(timestamp) when used inside DISTINCT() and GROUP BY

2016-02-29 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse updated DRILL-4447:
---
Priority: Critical  (was: Major)

> Drill seems to ignore TO_DATE(timestamp) when used inside DISTINCT() and 
> GROUP BY
> -
>
> Key: DRILL-4447
> URL: https://issues.apache.org/jira/browse/DRILL-4447
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.5.0
> Environment: Centos 6.2/Distributed/CDH5.4.9
>Reporter: Ryan Clough
>Priority: Critical
> Attachments: timestamps.txt, timestamps_parquet.tar.gz
>
>
> The issue comes from a larger query, but I've managed to narrow it down to 
> what is a minimally reproducible issue.
> Given a list of timestamps (will attach files) associated with 3 days, we 
> want to select the distinct dates (total: 3 days) from this list. To do this, 
> I decided to use the TO_DATE function, which does exactly what I want.
> Note, there are 47 distinct timestamps in the data set.
> {code:sql}
> jdbc:drill:> SELECT DISTINCT(TO_DATE(data_date)) AS data_date
> . . . . . . . > FROM timestamps;
> +-+
> |  data_date  |
> +-+
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-23  |
> | 2016-02-23  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-23  |
> +-+
> 47 rows selected (11.057 seconds)
> {code}
> As you can see, DRILL has ignored the TO_DATE function when checking for 
> distinct records (note that the 47 rows match the 47 rows of distinct 
> timestamps).
> My testing has also shown that this affects GROUP BY. I wouldn't be surprised 
> if it manifested itself elsewhere.
> I tried to get around the problem by converting the dates to a string using 
> TO_CHAR: surely drill will use the resulting strings to do the DISTINCT 
> comparison?
> {code:sql}
> drill:> SELECT DISTINCT(TO_CHAR(TO_DATE(data_date))) FROM timestamps;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: 
> [to_char(DATE-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> Fragment 0:0
> [Error Id: bcad87f0-3353-4a3b-842e-c68a02b394c3 on 
> lvimhdpa14.lv.vimeows.com:31010] (state=,code=0)
> {code}
> As far as I can tell from the docs, you SHOULD be able to convert a date to a 
> string with TO_CHAR(). I'm not sure what the underlying issue is here, but I 
> thought it good to report the issue.
> Please let me know if you need any further info, query plans, etc, but it 
> should be reproducible with the timestamps data I'll attach in a minute.





[jira] [Commented] (DRILL-4447) Drill seems to ignore TO_DATE(timestamp) when used inside DISTINCT() and GROUP BY

2016-02-29 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172690#comment-15172690
 ] 

Jason Altekruse commented on DRILL-4447:


A couple of things here: the reason the to_char function was not found is 
that it requires a second parameter to specify the desired format. More info 
can be found on this doc page: 
https://drill.apache.org/docs/data-type-conversion/#to_char

Using to_char correctly, I was able to get the 3 distinct values back in the 
query, but your original query looks like a bug in running a distinct aggregate 
over date values. I am going to raise the priority, as this is a wrong-result 
issue, and try to look into it soon.
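For reference, the working form of the workaround query looks like this (the format string is my choice; any pattern from the doc page above should work):

{code:sql}
-- TO_CHAR requires a second argument giving the output format
SELECT DISTINCT(TO_CHAR(TO_DATE(data_date), 'yyyy-MM-dd')) FROM timestamps;
{code}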

I do not understand why writing the data out to a file changes the result of 
the aggregation, that seems pretty puzzling.



> Drill seems to ignore TO_DATE(timestamp) when used inside DISTINCT() and 
> GROUP BY
> -
>
> Key: DRILL-4447
> URL: https://issues.apache.org/jira/browse/DRILL-4447
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.5.0
> Environment: Centos 6.2/Distributed/CDH5.4.9
>Reporter: Ryan Clough
> Attachments: timestamps.txt, timestamps_parquet.tar.gz
>
>
> The issue comes from a larger query, but I've managed to narrow it down to 
> what is a minimally reproducible issue.
> Given a list of timestamps (will attach files) associated with 3 days, we 
> want to select the distinct dates (total: 3 days) from this list. To do this, 
> I decided to use the TO_DATE function, which does exactly what I want.
> Note, there are 47 distinct timestamps in the data set.
> {code:sql}
> jdbc:drill:> SELECT DISTINCT(TO_DATE(data_date)) AS data_date
> . . . . . . . > FROM timestamps;
> +-+
> |  data_date  |
> +-+
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-23  |
> | 2016-02-23  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-23  |
> +-+
> 47 rows selected (11.057 seconds)
> {code}
> As you can see, DRILL has ignored the TO_DATE function when checking for 
> distinct records (note that the 47 rows match the 47 rows of distinct 
> timestamps).
> My testing has also shown that this affects GROUP BY. I wouldn't be surprised 
> if it manifested itself elsewhere.
> I tried to get around the problem by converting the dates to a string using 
> TO_CHAR: surely drill will use the resulting strings to do the DISTINCT 
> comparison?
> {code:sql}
> drill:> SELECT DISTINCT(TO_CHAR(TO_DATE(data_date))) FROM timestamps;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: 
> [to_char(DATE-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> Fragment 0:0
> [Error Id: bcad87f0-3353-4a3b-842e-c68a02b394c3 on 
> lvimhdpa14.lv.vimeows.com:31010] (state=,code=0)
> {code}
> As far as I can tell from the docs, you SHOULD be able to convert a date to a 
> string with TO_CHAR(). I'm not sure what the underlying issue is here, but I 
> thought it good to report the issue.
> Please let me know if you need any further info, query plans, etc, but it 
> should be reproducible with the timestamps data I'll attach in a minute.





[jira] [Created] (DRILL-4451) Improve operator unit tests to allow for direct inspection of the sequence of result batches

2016-02-26 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4451:
--

 Summary: Improve operator unit tests to allow for direct 
inspection of the sequence of result batches
 Key: DRILL-4451
 URL: https://issues.apache.org/jira/browse/DRILL-4451
 Project: Apache Drill
  Issue Type: Test
  Components: Tools, Build & Test
Reporter: Jason Altekruse
Assignee: Jason Altekruse


The first version of the operator test framework allows for comparison of the 
result set with a baseline, but does not give a way to specify the expected 
batch boundaries. All of the batches are combined together before they are 
compared (sharing code with the existing test infrastructure for complete SQL 
queries).

The framework should also include a way to directly inspect SV2 and SV4 batches 
that are produced by operators like filter and sort. These structures are used 
to store a view into the incoming data (an SV2 is a bitmask for everything that 
matched the filter and an SV4 is used to represent cross-batch pointers to 
reflect the sorted order of a series of batches without rewriting them). 
Currently the test just follows the pointers to iterate over the values as they 
would appear after a rewrite of the data (by the SelectionVectorRemover 
operator).





[jira] [Created] (DRILL-4450) Improve operator unit tests to allow for setting custom options on a test

2016-02-26 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4450:
--

 Summary: Improve operator unit tests to allow for setting custom 
options on a test
 Key: DRILL-4450
 URL: https://issues.apache.org/jira/browse/DRILL-4450
 Project: Apache Drill
  Issue Type: Test
Reporter: Jason Altekruse
Assignee: Jason Altekruse


The initial work done on the operator test framework included mocking of the 
system/session options just complete enough to get the first ~10 operators to 
execute a single query. These values are currently shared across all tests. To 
test all code paths we will need a way to set options from individual tests.





[jira] [Updated] (DRILL-4448) Specification of Ordering (ASC, DESC) on a sort plan node uses Strings for construction, should also allow for use of the corresponding Calcite Enums

2016-02-26 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse updated DRILL-4448:
---
Description: Small change to provide a cleaner interface when constructing 
sort configurations in tests. The current class mixes together two tasks, 
converting between the strings we chose to put in the plans (ASC, DESC) and the 
Calcite enums, as well as validating the allowed values. Calcite has several 
values that we do not currently use like STRICTLY_ASCENDING/DESCENDING and 
CLUSTERED. We can break these two tasks apart to allow for construction 
directly from the Enum, but still provide validation for only the Drill-allowed 
values.

> Specification of Ordering (ASC, DESC) on a sort plan node uses Strings for 
> construction, should also allow for use of the corresponding Calcite Enums
> -
>
> Key: DRILL-4448
> URL: https://issues.apache.org/jira/browse/DRILL-4448
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jason Altekruse
>
> Small change to provide a cleaner interface when constructing sort 
> configurations in tests. The current class mixes together two tasks, 
> converting between the strings we chose to put in the plans (ASC, DESC) and 
> the Calcite enums, as well as validating the allowed values. Calcite has 
> several values that we do not currently use like 
> STRICTLY_ASCENDING/DESCENDING and CLUSTERED. We can break these two tasks 
> apart to allow for construction directly from the Enum, but still provide 
> validation for only the Drill-allowed values.





[jira] [Updated] (DRILL-4448) Specification of Ordering (ASC, DESC) on a sort plan node uses Strings for construction, should also allow for use of the corresponding Calcite Enums

2016-02-26 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse updated DRILL-4448:
---
Summary: Specification of Ordering (ASC, DESC) on a sort plan node uses 
Strings for construction, should also allow for use of the corresponding 
Calcite Enums  (was: Specification of Ordering (ASC, DESC) on a sort plan node 
uses Strings for it construction, should also allow for use of the 
corresponding Calcite Enums)

> Specification of Ordering (ASC, DESC) on a sort plan node uses Strings for 
> construction, should also allow for use of the corresponding Calcite Enums
> -
>
> Key: DRILL-4448
> URL: https://issues.apache.org/jira/browse/DRILL-4448
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jason Altekruse
>






[jira] [Created] (DRILL-4448) Specification of Ordering (ASC, DESC) on a sort plan node uses Strings for it construction, should also allow for use of the corresponding Calcite Enums

2016-02-26 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4448:
--

 Summary: Specification of Ordering (ASC, DESC) on a sort plan node 
uses Strings for it construction, should also allow for use of the 
corresponding Calcite Enums
 Key: DRILL-4448
 URL: https://issues.apache.org/jira/browse/DRILL-4448
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Jason Altekruse








[jira] [Created] (DRILL-4445) Remove extra code to work around mixture of arrays and Lists used in Logical and Physical query plan nodes

2016-02-26 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4445:
--

 Summary: Remove extra code to work around mixture of arrays and 
Lists used in Logical and Physical query plan nodes
 Key: DRILL-4445
 URL: https://issues.apache.org/jira/browse/DRILL-4445
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Jason Altekruse
Assignee: Jason Altekruse


The physical plan node classes for all of the operators currently use a mix of 
arrays and Lists to refer to lists of incoming operators, expressions, and 
other operator properties. This has led to the introduction of several utility 
methods for translating between the two representations; examples can be seen 
in common/logical/data/Abstractbuilder.

This isn't a major problem, but the new operator test framework uses these 
classes as a primary interface for setting up the tests. It seemed worthwhile 
to refactor the classes to be consistent so that the tests would all be 
similar. There are a few changes to execution code, but they are all trivial 
changes to use the list-based interfaces (size() instead of length, set() 
instead of arr[i] = foo, etc.), as Jackson transparently handles both types 
the same way (which is why this hasn't really been a problem).
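The call-site changes are mechanical, roughly as in this sketch (the field names are illustrative, not Drill's actual plan node members):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListRefactor {
    // Before: an operator property held as an array.
    static String[] exprsArr = {"a", "b", "c"};

    // After: the same data as a List; Jackson serializes both forms the
    // same way, so only the call sites change.
    static List<String> exprs = new ArrayList<>(Arrays.asList(exprsArr));

    static String demo() {
        // arr.length     -> list.size()
        // arr[i] = value -> list.set(i, value)
        exprs.set(1, "b2");
        return exprs.size() + ":" + exprs.get(1);
    }
}
```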





[jira] [Commented] (DRILL-4441) IN operator does not work with Avro reader

2016-02-26 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169439#comment-15169439
 ] 

Jason Altekruse commented on DRILL-4441:


I also added a new component for Storage - Avro

> IN operator does not work with Avro reader
> --
>
> Key: DRILL-4441
> URL: https://issues.apache.org/jira/browse/DRILL-4441
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Avro
>Affects Versions: 1.5.0
> Environment: Ubuntu
>Reporter: Stefán Baxter
>Priority: Critical
> Fix For: 1.6.0
>
>
> IN operator simply does not work. 
> (And I find it interesting that Storage-Avro is not available here in Jira as 
> a Storage component)





[jira] [Updated] (DRILL-4441) IN operator does not work with Avro reader

2016-02-26 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse updated DRILL-4441:
---
Component/s: (was: Storage - Other)
 Storage - Avro

> IN operator does not work with Avro reader
> --
>
> Key: DRILL-4441
> URL: https://issues.apache.org/jira/browse/DRILL-4441
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Avro
>Affects Versions: 1.5.0
> Environment: Ubuntu
>Reporter: Stefán Baxter
>Priority: Critical
> Fix For: 1.6.0
>
>
> IN operator simply does not work. 
> (And I find it interesting that Storage-Avro is not available here in Jira as 
> a Storage component)





[jira] [Updated] (DRILL-4441) IN operator does not work with Avro reader

2016-02-26 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse updated DRILL-4441:
---
Priority: Critical  (was: Major)

> IN operator does not work with Avro reader
> --
>
> Key: DRILL-4441
> URL: https://issues.apache.org/jira/browse/DRILL-4441
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.5.0
> Environment: Ubuntu
>Reporter: Stefán Baxter
>Priority: Critical
> Fix For: 1.6.0
>
>
> IN operator simply does not work. 
> (And I find it interesting that Storage-Avro is not available here in Jira as 
> a Storage component)





[jira] [Commented] (DRILL-4441) IN operator does not work with Avro reader

2016-02-26 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169423#comment-15169423
 ] 

Jason Altekruse commented on DRILL-4441:


[~acmeguy] A workaround you can use today is to cast the value coming out of 
Avro to varchar and provide a maximum length (it is incorrectly not being set 
when we create the table metadata based on the file, so it defaults to 1). We 
also need to file a follow-up bug on why we are inconsistently applying a 
truncation based on this metadata, which is why you see different results.

I will be posting a patch shortly that fixes the bug without a cast.

cast(s.sold_to as varchar(65000))

> IN operator does not work with Avro reader
> --
>
> Key: DRILL-4441
> URL: https://issues.apache.org/jira/browse/DRILL-4441
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.5.0
> Environment: Ubuntu
>Reporter: Stefán Baxter
> Fix For: 1.6.0
>
>
> IN operator simply does not work. 
> (And I find it interesting that Storage-Avro is not available here in Jira as 
> a Storage component)





[jira] [Created] (DRILL-4442) Improve VectorAccessible and RecordBatch interfaces to provide only necessary information to the correct consumers

2016-02-25 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4442:
--

 Summary: Improve VectorAccessible and RecordBatch interfaces to 
provide only necessary information to the correct consumers
 Key: DRILL-4442
 URL: https://issues.apache.org/jira/browse/DRILL-4442
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Jason Altekruse
Assignee: Jason Altekruse


During the creation of the operator test framework I ran into a small snag 
trying to share code between the existing test infrastructure and the new 
features to allow directly consuming the output of an operator rather than that 
of a query.

I needed to move the getSelectionVector2 and getSelectionVector4 methods up to 
the VectorAccessible interface.





[jira] [Updated] (DRILL-4436) Result data gets mixed up when various tables have a column "label"

2016-02-25 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse updated DRILL-4436:
---
Component/s: Storage - JDBC

> Result data gets mixed up when various tables have a column "label"
> ---
>
> Key: DRILL-4436
> URL: https://issues.apache.org/jira/browse/DRILL-4436
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JDBC
>Affects Versions: 1.5.0
> Environment: Drill 1.5.0 with Zookeeper on CentOS 7.0 
>Reporter: Vincent Uribe
>
> We have two tables in a MySQL database:
> CREATE TABLE `Gender` (
>   `genderId` bigint(20) NOT NULL AUTO_INCREMENT,
>   `label` varchar(15) NOT NULL,
>   PRIMARY KEY (`genderId`)
> ) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=latin1;
> CREATE TABLE `Civility` (
>   `civilityId` bigint(20) NOT NULL AUTO_INCREMENT,
>   `abbreviation` varchar(15) NOT NULL,
>   `label` varchar(60) DEFAULT NULL,
>   PRIMARY KEY (`civilityId`)
> ) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=latin1;
> With a query on these two tables, with Gender.label as 'gender' and 
> Civility.label as 'civility', we obtain, depending on the query:
> * gender in civility
> * civility in the gender
> * NULL in the other column (gender or civility)
> If we drop the table Gender and recreate it like this:
> CREATE TABLE `Gender` (
>   `genderId` bigint(20) NOT NULL AUTO_INCREMENT,
>   `label2` varchar(15) NOT NULL,
>   PRIMARY KEY (`genderId`)
> ) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=latin1;
> Everything is fine.
> I guess something is wrong with the metadata...





[jira] [Created] (DRILL-4439) Improve new unit operator tests to handle operators that expect RawBatchBuffers off of the wire, such as the UnorderedReciever and MergingReciever

2016-02-25 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4439:
--

 Summary: Improve new unit operator tests to handle operators that 
expect RawBatchBuffers off of the wire, such as the UnorderedReciever and 
MergingReciever
 Key: DRILL-4439
 URL: https://issues.apache.org/jira/browse/DRILL-4439
 Project: Apache Drill
  Issue Type: Test
Reporter: Jason Altekruse
Assignee: Jason Altekruse








[jira] [Commented] (DRILL-4438) Fix out of memory failure identified by new operator unit tests

2016-02-25 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167597#comment-15167597
 ] 

Jason Altekruse commented on DRILL-4438:


These failing tests will be checked in with the patch for the new unit test 
framework, they will be annotated with @Ignore and in the 
BasicPhysicalOpUnitTest class.

> Fix out of memory failure identified by new operator unit tests
> ---
>
> Key: DRILL-4438
> URL: https://issues.apache.org/jira/browse/DRILL-4438
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jason Altekruse
>Assignee: Jason Altekruse
>Priority: Critical
>






[jira] [Created] (DRILL-4437) Implement framework for testing operators in isolation

2016-02-25 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4437:
--

 Summary: Implement framework for testing operators in isolation
 Key: DRILL-4437
 URL: https://issues.apache.org/jira/browse/DRILL-4437
 Project: Apache Drill
  Issue Type: Test
  Components: Tools, Build & Test
Reporter: Jason Altekruse
Assignee: Jason Altekruse
 Fix For: 1.6.0


Most of the tests written for Drill are end-to-end. We spin up a full instance 
of the server, submit one or more SQL queries and check the results.

While integration tests like this are useful for ensuring that all features are 
guaranteed not to break end-user functionality, overuse of this approach has 
caused a number of pain points.

Overall the tests end up running a lot of the exact same code, parsing and 
planning many similar queries.

Creating consistent reproductions of issues, especially edge cases found in 
clustered environments, can be extremely difficult. Even the simpler case of 
testing whether operators can handle a particular series of incoming batches 
of records has required hacks like generating files large enough that the 
scanners happen to break them up into separate batches. These tests are 
brittle, as they make assumptions about how the scanners will work in the 
future. As an example of how this could break: a performance evaluation might 
lead us to produce larger batches in some cases, and existing tests that try 
to exercise multiple batches by producing a few more records than the current 
batch-size threshold would no longer be testing the same code paths.

We need to make more parts of the system testable without initializing the 
entire Drill server, as well as making the different internal settings and 
state of the server configurable for tests.

This is a first effort to enable testing the physical operators in Drill by 
mocking the components of the system necessary to enable operators to 
initialize and execute.
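The shape of such a test might be sketched generically as follows (everything here is an invented stand-in, not Drill's actual operator, RecordBatch, or mock classes; the point is only that the operator under test consumes hand-built batches with no server involved):

```java
import java.util.List;

public class OperatorHarness {
    // Illustrative stand-in for a record batch of a single integer column.
    record Batch(List<Integer> values) {}

    // A trivial stand-in "operator": keeps only positive values per batch.
    static Batch filterPositive(Batch in) {
        return new Batch(in.values().stream().filter(v -> v > 0).toList());
    }

    // Drive the operator over a scripted sequence of incoming batches,
    // which lets a test control batch boundaries exactly.
    static int totalRowsOut(List<Batch> incoming) {
        int rows = 0;
        for (Batch b : incoming) {
            rows += filterPositive(b).values().size();
        }
        return rows;
    }
}
```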





[jira] [Created] (DRILL-4438) Fix out of memory failure identified by new operator unit tests

2016-02-25 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4438:
--

 Summary: Fix out of memory failure identified by new operator unit 
tests
 Key: DRILL-4438
 URL: https://issues.apache.org/jira/browse/DRILL-4438
 Project: Apache Drill
  Issue Type: Bug
Reporter: Jason Altekruse
Assignee: Jason Altekruse
Priority: Critical








[jira] [Resolved] (DRILL-3930) Remove direct references to TopLevelAllocator from unit tests

2016-02-25 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-3930.

   Resolution: Fixed
 Assignee: (was: Chris Westin)
Fix Version/s: 1.3.0

> Remove direct references to TopLevelAllocator from unit tests
> -
>
> Key: DRILL-3930
> URL: https://issues.apache.org/jira/browse/DRILL-3930
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
>Reporter: Chris Westin
> Fix For: 1.3.0
>
>
> The RootAllocatorFactory should be used throughout the code to allow us to 
> change allocators via configuration or other software choices. Some unit 
> tests still reference TopLevelAllocator directly. We also need to do a better 
> job of handling exceptions that can be handled by close()ing an allocator 
> that isn't in the proper state (remaining open child allocators, outstanding 
> buffers, etc).





[jira] [Commented] (DRILL-4425) Handle blank column names in Hbase in CONVERT_FROM

2016-02-25 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15167510#comment-15167510
 ] 

Jason Altekruse commented on DRILL-4425:


I think that makes a lot of sense. Something like hbase_empty_col_name, or, if 
this can occur in other systems, a name that doesn't include the source. We 
can make it configurable, just in case someone manages to collide with 
whatever we pick.

So I guess we would need to add this to the schema registration of sources, as 
well as the record readers that will actually grab data from the source, and we 
would assume that only the sentinel would appear in the rest of planning and 
execution?
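A sentinel substitution of this sort might look like the following (the sentinel string and the helper are illustrative, not an agreed-on design):

```java
public class EmptyColumnNames {
    // Proposed sentinel for blank column names; the exact string would be
    // configurable in case a real dataset collides with it (assumption).
    static final String EMPTY_COL_SENTINEL = "hbase_empty_col_name";

    // Applied by record readers (and schema registration) so that only the
    // sentinel ever appears in planning and execution.
    static String normalize(String columnName) {
        return (columnName == null || columnName.isEmpty())
            ? EMPTY_COL_SENTINEL
            : columnName;
    }
}
```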

> Handle blank column names in Hbase in CONVERT_FROM
> --
>
> Key: DRILL-4425
> URL: https://issues.apache.org/jira/browse/DRILL-4425
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - HBase
>Affects Versions: 1.3.0
> Environment: Apache Drill 1.3 on HortonWorks HDP VM 2.1
>Reporter: Saurabh Nigam
>  Labels: easyfix
>
> Hbase table may contain blank column names & blank column family names. Drill 
> needs to handle such a situation.
> I faced the issue when I had a column with blank column name in my Hbase 
> Table. To reproduce it :
> -Create a column without any name in Hbase
> -Try to access it via Drill console
> -Try to use CONVERT_FROM function to convert that data from Base64 encoding 
> to make it readable. You won't be able to convert the blank column because 
> you cannot use a blank in your query after a dot.
> Something like  this
> SELECT CONVERT_FROM(students. , 'UTF8') AS zipcode 
>  FROM students;
> Where column name is blank
> We need to provide a placeholder for blank column names





[jira] [Resolved] (DRILL-4394) Can’t build the custom functions for Apache Drill 1.5.0

2016-02-24 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4394.

Resolution: Fixed
  Assignee: Jason Altekruse

> Can’t build the custom functions for Apache Drill 1.5.0
> ---
>
> Key: DRILL-4394
> URL: https://issues.apache.org/jira/browse/DRILL-4394
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Kumiko Yada
>Assignee: Jason Altekruse
>Priority: Critical
>
> I tried to build the custom functions for Drill 1.5.0, but I got the below 
> error:
> Failure to find org.apache.drill.exec:drill-java-exec:jar:1.5.0 in 
> http://repo.maven.apache.org/maven2 was cached in the local repository.





[jira] [Created] (DRILL-4435) Add YARN jar required for running Drill on cluster with Kerberos

2016-02-24 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4435:
--

 Summary: Add YARN jar required for running Drill on cluster with 
Kerberos
 Key: DRILL-4435
 URL: https://issues.apache.org/jira/browse/DRILL-4435
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Jason Altekruse
Assignee: Jason Altekruse


As described here, Drill currently requires adding a YARN jar to the classpath 
to run on Kerberos. If it doesn't conflict with any jars currently included 
with Drill, we should just include it in the distribution so this works out of 
the box.

http://www.dremio.com/blog/securing-sql-on-hadoop-part-2-installing-and-configuring-drill/





[jira] [Commented] (DRILL-3584) Drill Kerberos HDFS Support / Documentation

2016-02-24 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166597#comment-15166597
 ] 

Jason Altekruse commented on DRILL-3584:


It looks like actually getting Drill working on this setup requires including a 
YARN jar; we should include it with Drill by default so this works out of the 
box.

> Drill Kerberos HDFS Support / Documentation
> ---
>
> Key: DRILL-3584
> URL: https://issues.apache.org/jira/browse/DRILL-3584
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.1.0
>Reporter: Hari Sekhon
>Assignee: Jacques Nadeau
>Priority: Blocker
>
> I'm trying to find Drill docs for Kerberos support for secure HDFS clusters 
> and it doesn't appear to be well tested / supported / documented yet.
> This product is Dead-on-Arrival if it doesn't integrate well with secure 
> Hadoop clusters, specifically HDFS + Kerberos (plus obviously secure 
> kerberized Hive/HCatalog etc.)





[jira] [Commented] (DRILL-3584) Drill Kerberos HDFS Support / Documentation

2016-02-24 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166595#comment-15166595
 ] 

Jason Altekruse commented on DRILL-3584:


[~ngriffith] wrote a nice pair of blog posts on this; we might want to put some 
of the information in the Drill docs as well.

http://www.dremio.com/blog/securing-sql-on-hadoop-part-1-installing-cdh-and-kerberos/
http://www.dremio.com/blog/securing-sql-on-hadoop-part-2-installing-and-configuring-drill/

> Drill Kerberos HDFS Support / Documentation
> ---
>
> Key: DRILL-3584
> URL: https://issues.apache.org/jira/browse/DRILL-3584
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.1.0
>Reporter: Hari Sekhon
>Assignee: Jacques Nadeau
>Priority: Blocker
>
> I'm trying to find Drill docs for Kerberos support for secure HDFS clusters 
> and it doesn't appear to be well tested / supported / documented yet.
> This product is Dead-on-Arrival if it doesn't integrate well with secure 
> Hadoop clusters, specifically HDFS + Kerberos (plus obviously secure 
> kerberized Hive/HCatalog etc.)





[jira] [Resolved] (DRILL-3229) Create a new EmbeddedVector

2016-02-24 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-3229.

   Resolution: Fixed
Fix Version/s: (was: Future)
   1.4.0

> Create a new EmbeddedVector
> ---
>
> Key: DRILL-3229
> URL: https://issues.apache.org/jira/browse/DRILL-3229
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Codegen, Execution - Data Types, Execution - 
> Relational Operators, Functions - Drill
>Reporter: Jacques Nadeau
>Assignee: Steven Phillips
> Fix For: 1.4.0
>
>
> Embedded Vector will leverage a binary encoding for holding information about 
> type for each individual field.





[jira] [Resolved] (DRILL-284) Publish artifacts to maven for Drill

2016-02-24 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-284.
---
   Resolution: Fixed
Fix Version/s: (was: Future)
   1.1.0

> Publish artifacts to maven for Drill
> 
>
> Key: DRILL-284
> URL: https://issues.apache.org/jira/browse/DRILL-284
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Timothy Chen
> Fix For: 1.1.0
>
>
> We need to publish our artifacts and version to maven so other dependencies 
> (Whirr, or other ones that wants maven include) can use.





[jira] [Updated] (DRILL-4426) Review storage and format plugins like parquet, JSON, Avro, Hive, etc. to ensure they fail with useful error messages including filename, column, etc.

2016-02-23 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse updated DRILL-4426:
---
Description: 
A number of these issues have been fixed in the past in individual instances, 
but we should review any remaining cases where a failure does not produce an 
error message with as much useful context information as possible. Filename 
should always be possible; column or record/line number would be good where 
possible.

One such case with a low level parquet failure was reported here.

http://search-hadoop.com/m/qRVAX48ao4xTDne/drill+Query+Return+Error+because+of+a+single+file=Query+Return+Error+because+of+a+single+file

  was:
A number of these issues have been fixed in the past in individual instances. 
but we should review any remaining cases where a failure does not produce an 
error message with as much useful context information as possible. Filename 
should always be possible, column or record/line number where possible would be 
good.

One such case with a low level parquet failure was reported here.

http://search-hadoop.com/m/qRVAX48ao4xTDne/drill+Query+Return+Error+because+of+a+single+file=Query+Return+Error+because+of+a+single+file


> Review storage and format plugins like parquet, JSON, Avro, Hive, etc. to 
> ensure they fail with useful error messages including filename, column, etc.
> --
>
> Key: DRILL-4426
> URL: https://issues.apache.org/jira/browse/DRILL-4426
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jason Altekruse
>Assignee: Jason Altekruse
>
> A number of these issues have been fixed in the past in individual instances, 
> but we should review any remaining cases where a failure does not produce an 
> error message with as much useful context information as possible. Filename 
> should always be possible; column or record/line number would be good where 
> possible.
> One such case with a low level parquet failure was reported here.
> http://search-hadoop.com/m/qRVAX48ao4xTDne/drill+Query+Return+Error+because+of+a+single+file=Query+Return+Error+because+of+a+single+file





[jira] [Created] (DRILL-4426) Review storage and format plugins like parquet, JSON, Avro, Hive, etc. to ensure they fail with useful error messages including filename, column, etc.

2016-02-23 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4426:
--

 Summary: Review storage and format plugins like parquet, JSON, 
Avro, Hive, etc. to ensure they fail with useful error messages including 
filename, column, etc.
 Key: DRILL-4426
 URL: https://issues.apache.org/jira/browse/DRILL-4426
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Jason Altekruse
Assignee: Jason Altekruse


A number of these issues have been fixed in the past in individual instances, 
but we should review any remaining cases where a failure does not produce an 
error message with as much useful context information as possible. Filename 
should always be possible; column or record/line number would be good where 
possible.

One such case with a low level parquet failure was reported here.

http://search-hadoop.com/m/qRVAX48ao4xTDne/drill+Query+Return+Error+because+of+a+single+file=Query+Return+Error+because+of+a+single+file





[jira] [Commented] (DRILL-4425) Handle blank column names in Hbase in CONVERT_FROM

2016-02-23 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159166#comment-15159166
 ] 

Jason Altekruse commented on DRILL-4425:


I'm not sure that a blank column name is valid SQL, I am pretty sure most 
databases reject it.

It looks like `` gets rejected at the validation stage in Drill, but I believe 
this is the right behavior.

Why would anyone ever use a blank column name? Is this a commonly used feature 
of HBase?

While we want to fully support the systems we plug into, some patterns found in 
particular systems just don't conform to the SQL model, and implementing this 
would mean removing a reasonable check from the SQL validator.

> Handle blank column names in Hbase in CONVERT_FROM
> --
>
> Key: DRILL-4425
> URL: https://issues.apache.org/jira/browse/DRILL-4425
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Storage - HBase
>Affects Versions: 1.3.0
> Environment: Apache Drill 1.3 on HortonWorks HDP VM 2.1
>Reporter: Saurabh Nigam
>Priority: Critical
>  Labels: easyfix
>
> Hbase table may contain blank column names & blank column family names. Drill 
> needs to handle such a situation.
> I faced the issue when I had a column with blank column name in my Hbase 
> Table. To reproduce it :
> -Create a column without any name in Hbase
> -Try to access it via Drill console
> -Try to use CONVERT_FROM function to convert that data from Base64 encoding 
> to make it readable. You won't be able to convert the blank column because 
> you cannot use a blank in your query after a dot.
> Something like  this
> SELECT CONVERT_FROM(students. , 'UTF8') AS zipcode 
>  FROM students;
> Where column name is blank
> We need to provide a placeholder for blank column names





[jira] [Commented] (DRILL-4394) Can’t build the custom functions for Apache Drill 1.5.0

2016-02-17 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151023#comment-15151023
 ] 

Jason Altekruse commented on DRILL-4394:


Sorry about the oversight; I had not completed the last part of releasing the 
artifacts. Please try again, as they should now be available.

> Can’t build the custom functions for Apache Drill 1.5.0
> ---
>
> Key: DRILL-4394
> URL: https://issues.apache.org/jira/browse/DRILL-4394
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.5.0
>Reporter: Kumiko Yada
>Priority: Critical
>
> I tried to build the custom functions for Drill 1.5.0, but I got the below 
> error:
> Failure to find org.apache.drill.exec:drill-java-exec:jar:1.5.0 in 
> http://repo.maven.apache.org/maven2 was cached in the local repository.





[jira] [Created] (DRILL-4383) Allow passing custom configuration options to a file system through the storage plugin config

2016-02-11 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4383:
--

 Summary: Allow passing custom configuration options to a file 
system through the storage plugin config
 Key: DRILL-4383
 URL: https://issues.apache.org/jira/browse/DRILL-4383
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Other
Reporter: Jason Altekruse
Assignee: Jason Altekruse
 Fix For: 1.6.0


A similar feature already exists in the Hive and HBase plugins; it simply 
provides a key/value map for passing custom configuration options to the 
underlying storage system.

This would be useful for the filesystem plugin to configure S3 without needing 
to create a core-site.xml file or restart Drill.
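Such a key/value map in a filesystem plugin definition might look roughly like this (illustrative shape only; the "config" property name and the s3a keys shown are assumptions, not a finalized schema):

```json
{
  "type": "file",
  "connection": "s3a://my-bucket",
  "config": {
    "fs.s3a.access.key": "ACCESS_KEY_HERE",
    "fs.s3a.secret.key": "SECRET_KEY_HERE"
  },
  "workspaces": {},
  "formats": {}
}
```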





[jira] [Commented] (DRILL-3522) IllegalStateException from Mongo storage plugin

2016-02-09 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15139462#comment-15139462
 ] 

Jason Altekruse commented on DRILL-3522:


This applied cleanly to master and did not cause any mongo test failures. I am 
planning to merge it to master shortly; there should be no need for a new 
patch to be uploaded.

> IllegalStateException from Mongo storage plugin
> ---
>
> Key: DRILL-3522
> URL: https://issues.apache.org/jira/browse/DRILL-3522
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - MongoDB
>Affects Versions: 1.1.0
>Reporter: Adam Gilmore
>Assignee: Adam Gilmore
>Priority: Critical
> Attachments: DRILL-3522.1.patch.txt
>
>
> With a Mongo storage plugin enabled, we are sporadically getting the 
> following exception when running queries (even not against the Mongo storage 
> plugin):
> {code}
> SYSTEM ERROR: IllegalStateException: state should be: open
>   (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
> during fragment initialization: 
> org.apache.drill.common.exceptions.DrillRuntimeException: state should be: 
> open
> org.apache.drill.exec.work.foreman.Foreman.run():253
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745
>   Caused By (com.google.common.util.concurrent.UncheckedExecutionException) 
> org.apache.drill.common.exceptions.DrillRuntimeException: state should be: 
> open
> com.google.common.cache.LocalCache$Segment.get():2263
> com.google.common.cache.LocalCache.get():4000
> com.google.common.cache.LocalCache.getOrLoad():4004
> com.google.common.cache.LocalCache$LocalLoadingCache.get():4874
> 
> org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory$MongoSchema.getSubSchemaNames():172
> 
> org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory$MongoSchema.setHolder():159
> 
> org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory.registerSchemas():127
> org.apache.drill.exec.store.mongo.MongoStoragePlugin.registerSchemas():86
> 
> org.apache.drill.exec.store.StoragePluginRegistry$DrillSchemaFactory.registerSchemas():328
> org.apache.drill.exec.ops.QueryContext.getRootSchema():165
> org.apache.drill.exec.ops.QueryContext.getRootSchema():154
> org.apache.drill.exec.ops.QueryContext.getRootSchema():142
> org.apache.drill.exec.ops.QueryContext.getNewDefaultSchema():128
> org.apache.drill.exec.planner.sql.DrillSqlWorker.():91
> org.apache.drill.exec.work.foreman.Foreman.runSQL():901
> org.apache.drill.exec.work.foreman.Foreman.run():242
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745
>   Caused By (org.apache.drill.common.exceptions.DrillRuntimeException) state 
> should be: open
> 
> org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory$DatabaseLoader.load():98
> 
> org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory$DatabaseLoader.load():82
> com.google.common.cache.LocalCache$LoadingValueReference.loadFuture():3599
> com.google.common.cache.LocalCache$Segment.loadSync():2379
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad():2342
> com.google.common.cache.LocalCache$Segment.get():2257
> com.google.common.cache.LocalCache.get():4000
> com.google.common.cache.LocalCache.getOrLoad():4004
> com.google.common.cache.LocalCache$LocalLoadingCache.get():4874
> 
> org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory$MongoSchema.getSubSchemaNames():172
> 
> org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory$MongoSchema.setHolder():159
> 
> org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory.registerSchemas():127
> org.apache.drill.exec.store.mongo.MongoStoragePlugin.registerSchemas():86
> 
> org.apache.drill.exec.store.StoragePluginRegistry$DrillSchemaFactory.registerSchemas():328
> org.apache.drill.exec.ops.QueryContext.getRootSchema():165
> org.apache.drill.exec.ops.QueryContext.getRootSchema():154
> org.apache.drill.exec.ops.QueryContext.getRootSchema():142
> org.apache.drill.exec.ops.QueryContext.getNewDefaultSchema():128
> org.apache.drill.exec.planner.sql.DrillSqlWorker.&lt;init&gt;():91
> org.apache.drill.exec.work.foreman.Foreman.runSQL():901
> org.apache.drill.exec.work.foreman.Foreman.run():242
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745
>   Caused By (java.lang.IllegalStateException) state should be: open
> com.mongodb.assertions.Assertions.isTrue():70
>   

[jira] [Commented] (DRILL-3522) IllegalStateException from Mongo storage plugin

2016-02-09 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139178#comment-15139178
 ] 

Jason Altekruse commented on DRILL-3522:


+1 Thanks for the fix Adam, sorry this sat out for so long.

Please feel free to reach out on the list if you are waiting for a review. I am 
trying to clean up JIRA so that the list of REVIEWABLE issues actually 
reflects what currently needs review, and I hope this will be less of an 
issue going forward.

> IllegalStateException from Mongo storage plugin
> ---
>
> Key: DRILL-3522
> URL: https://issues.apache.org/jira/browse/DRILL-3522
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - MongoDB
>Affects Versions: 1.1.0
>Reporter: Adam Gilmore
>Assignee: Adam Gilmore
>Priority: Critical
> Attachments: DRILL-3522.1.patch.txt
>
>
> With a Mongo storage plugin enabled, we are sporadically getting the 
> following exception when running queries (even not against the Mongo storage 
> plugin):
> {code}
> SYSTEM ERROR: IllegalStateException: state should be: open
>   (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
> during fragment initialization: 
> org.apache.drill.common.exceptions.DrillRuntimeException: state should be: 
> open
> org.apache.drill.exec.work.foreman.Foreman.run():253
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745
>   Caused By (com.google.common.util.concurrent.UncheckedExecutionException) 
> org.apache.drill.common.exceptions.DrillRuntimeException: state should be: 
> open
> com.google.common.cache.LocalCache$Segment.get():2263
> com.google.common.cache.LocalCache.get():4000
> com.google.common.cache.LocalCache.getOrLoad():4004
> com.google.common.cache.LocalCache$LocalLoadingCache.get():4874
> 
> org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory$MongoSchema.getSubSchemaNames():172
> 
> org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory$MongoSchema.setHolder():159
> 
> org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory.registerSchemas():127
> org.apache.drill.exec.store.mongo.MongoStoragePlugin.registerSchemas():86
> 
> org.apache.drill.exec.store.StoragePluginRegistry$DrillSchemaFactory.registerSchemas():328
> org.apache.drill.exec.ops.QueryContext.getRootSchema():165
> org.apache.drill.exec.ops.QueryContext.getRootSchema():154
> org.apache.drill.exec.ops.QueryContext.getRootSchema():142
> org.apache.drill.exec.ops.QueryContext.getNewDefaultSchema():128
> org.apache.drill.exec.planner.sql.DrillSqlWorker.&lt;init&gt;():91
> org.apache.drill.exec.work.foreman.Foreman.runSQL():901
> org.apache.drill.exec.work.foreman.Foreman.run():242
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745
>   Caused By (org.apache.drill.common.exceptions.DrillRuntimeException) state 
> should be: open
> 
> org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory$DatabaseLoader.load():98
> 
> org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory$DatabaseLoader.load():82
> com.google.common.cache.LocalCache$LoadingValueReference.loadFuture():3599
> com.google.common.cache.LocalCache$Segment.loadSync():2379
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad():2342
> com.google.common.cache.LocalCache$Segment.get():2257
> com.google.common.cache.LocalCache.get():4000
> com.google.common.cache.LocalCache.getOrLoad():4004
> com.google.common.cache.LocalCache$LocalLoadingCache.get():4874
> 
> org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory$MongoSchema.getSubSchemaNames():172
> 
> org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory$MongoSchema.setHolder():159
> 
> org.apache.drill.exec.store.mongo.schema.MongoSchemaFactory.registerSchemas():127
> org.apache.drill.exec.store.mongo.MongoStoragePlugin.registerSchemas():86
> 
> org.apache.drill.exec.store.StoragePluginRegistry$DrillSchemaFactory.registerSchemas():328
> org.apache.drill.exec.ops.QueryContext.getRootSchema():165
> org.apache.drill.exec.ops.QueryContext.getRootSchema():154
> org.apache.drill.exec.ops.QueryContext.getRootSchema():142
> org.apache.drill.exec.ops.QueryContext.getNewDefaultSchema():128
> org.apache.drill.exec.planner.sql.DrillSqlWorker.&lt;init&gt;():91
> org.apache.drill.exec.work.foreman.Foreman.runSQL():901
> org.apache.drill.exec.work.foreman.Foreman.run():242
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> 
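The failure mode in the trace above boils down to a long-lived schema cache handing back a MongoDB client that has since been closed, so the next schema lookup trips the driver's "state should be: open" assertion. A minimal Python sketch of that pattern (all class and method names are illustrative stand-ins, not Drill's or the Mongo driver's actual API):

```python
class MongoClientStub:
    """Stand-in for a MongoDB client with an open/closed state."""
    def __init__(self):
        self.open = True

    def close(self):
        self.open = False

    def list_database_names(self):
        # Mirrors the driver assertion that produced "state should be: open".
        if not self.open:
            raise RuntimeError("state should be: open")
        return ["local", "test"]

# A long-lived cache keyed by plugin name keeps the client object itself.
schema_cache = {"mongo": MongoClientStub()}

schema_cache["mongo"].list_database_names()  # fine while the client is open
schema_cache["mongo"].close()                # e.g. the plugin is re-registered

try:
    schema_cache["mongo"].list_database_names()
except RuntimeError as e:
    print(e)  # the sporadic failure seen in the trace above
```

The fix direction is the obvious one: either evict cache entries when the underlying client is closed, or cache only the loaded data rather than the live connection handle.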

[jira] [Resolved] (DRILL-4230) NullReferenceException when SELECTing from empty mongo collection

2016-02-09 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4230.

   Resolution: Fixed
Fix Version/s: 1.5.0

Fixed in ed2f1ca8ed3c0ebac7e33494db6749851fc2c970

This was applied separately to the 1.5 release branch, so the commit there has 
identical content and the same commit message, but will have a different hash.

> NullReferenceException when SELECTing from empty mongo collection
> -
>
> Key: DRILL-4230
> URL: https://issues.apache.org/jira/browse/DRILL-4230
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - MongoDB
>Affects Versions: 1.3.0
>Reporter: Brick Shitting Bird Jr.
>Assignee: Jason Altekruse
> Fix For: 1.5.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4349) parquet reader returns wrong results when reading a nullable column that starts with a large number of nulls (>30k)

2016-02-09 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140236#comment-15140236
 ] 

Jason Altekruse commented on DRILL-4349:


As I was rolling the rc3 release candidate for 1.5.0, I decided to apply this 
fix to the release branch, as it seemed useful to get into the release. The 
commit hash will be different, but the patch applied cleanly and represents an 
identical diff.

> parquet reader returns wrong results when reading a nullable column that 
> starts with a large number of nulls (>30k)
> ---
>
> Key: DRILL-4349
> URL: https://issues.apache.org/jira/browse/DRILL-4349
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.4.0
>Reporter: Deneche A. Hakim
>Assignee: Deneche A. Hakim
>Priority: Critical
> Fix For: 1.5.0
>
> Attachments: drill4349.tar.gz
>
>
> While reading a nullable column, if in a single pass we only read null 
> values, the parquet reader resets the value of pageReader.readPosInBytes, 
> which leads to wrong data being read from the file.
> To reproduce the issue, create a csv file (repro.csv) with 2 columns (id, 
> val) and 50100 rows, where id equals the row number and val is empty for 
> the first 50k rows and equal to id for the remaining rows.
> Create a parquet table from the csv file:
> {noformat}
> CREATE TABLE `repro_parquet` AS SELECT CAST(columns[0] AS INT) AS id, 
> CAST(NULLIF(columns[1], '') AS DOUBLE) AS val from `repro.csv`;
> {noformat}
> Now if you query any of the non null values you will get wrong results:
> {noformat}
> 0: jdbc:drill:zk=local> select * from `repro_parquet` where id>=5 limit 
> 10;
> ++---+
> |   id   |val|
> ++---+
> | 5  | 9.11337776337441E-309 |
> | 50001  | 3.26044E-319  |
> | 50002  | 1.4916681476489723E-154   |
> | 50003  | 2.18890676|
> | 50004  | 2.681561588521345E154 |
> | 50005  | -2.1016574E-317   |
> | 50006  | -1.4916681476489723E-154  |
> | 50007  | -2.18890676   |
> | 50008  | -2.681561588521345E154|
> | 50009  | 2.1016574E-317|
> ++---+
> 10 rows selected (0.238 seconds)
> {noformat}
> and here are the expected values:
> {noformat}
> 0: jdbc:drill:zk=local> select * from `repro.csv` where cast(columns[0] as 
> int)>=5 limit 10;
> ++
> |  columns   |
> ++
> | ["5","5"]  |
> | ["50001","50001"]  |
> | ["50002","50002"]  |
> | ["50003","50003"]  |
> | ["50004","50004"]  |
> | ["50005","50005"]  |
> | ["50006","50006"]  |
> | ["50007","50007"]  |
> | ["50008","50008"]  |
> | ["50009","50009"]  |
> ++
> {noformat}
> I confirmed that the file is written correctly and the issue is in the 
> parquet reader (already have a fix for it)
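The repro input described above (50,100 rows, `val` empty for the first 50k) can be generated with a short script; a sketch, using the file name `repro.csv` from the report:

```python
# Build repro.csv: id = row number 1..50100; val is empty for rows 1..50000
# and equal to id afterwards, matching the reproduction steps above.
with open("repro.csv", "w") as f:
    for row_id in range(1, 50101):
        val = "" if row_id <= 50000 else str(row_id)
        f.write(f"{row_id},{val}\n")
```

Running the reported CTAS over this file, then filtering on `id >= 50000`, should reproduce the garbage DOUBLE values shown above.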





[jira] [Updated] (DRILL-4380) Fix performance regression: in creation of FileSelection in ParquetFormatPlugin to not set files if metadata cache is available.

2016-02-09 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse updated DRILL-4380:
---
Fix Version/s: 1.5.0

> Fix performance regression: in creation of FileSelection in 
> ParquetFormatPlugin to not set files if metadata cache is available.
> 
>
> Key: DRILL-4380
> URL: https://issues.apache.org/jira/browse/DRILL-4380
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Parth Chandra
> Fix For: 1.5.0
>
>
> The regression has been caused by the changes in 
> 367d74a65ce2871a1452361cbd13bbd5f4a6cc95 (DRILL-2618: handle queries over 
> empty folders consistently so that they report table not found rather than 
> failing.)
> In ParquetFormatPlugin, the original code created a FileSelection object in 
> the following code:
> {code}
> return new FileSelection(fileNames, metaRootPath.toString(), metadata, 
> selection.getFileStatusList(fs));
> {code}
> The selection.getFileStatusList call made an inexpensive call to 
> FileSelection.init(). The call was inexpensive because the 
> FileSelection.files member was not set, so the code did not need to make an 
> expensive call to get the file statuses corresponding to the files in the 
> FileSelection.files member.
> In the new code, this is replaced by 
> {code}
>   final FileSelection newSelection = FileSelection.create(null, fileNames, 
> metaRootPath.toString());
> return ParquetFileSelection.create(newSelection, metadata);
> {code}
> This sets the FileSelection.files member but not the FileSelection.statuses 
> member. A subsequent call to FileSelection.getStatuses ( in 
> ParquetGroupScan() ) now makes an expensive call to get all the statuses.
> It appears that there was an implicit assumption that the 
> FileSelection.statuses member should be set before the FileSelection.files 
> member is set. This assumption is no longer true.
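The regression amounts to eager versus deferred materialization of file statuses: once only `files` is populated, a later status lookup fans out into one (potentially remote, expensive) stat per file. A minimal Python sketch of the pattern (class and method names are illustrative, not Drill's actual API):

```python
class FileSelection:
    """Illustrative stand-in: statuses may be supplied up front or fetched lazily."""
    def __init__(self, files=None, statuses=None):
        self.files = files          # list of path strings
        self.statuses = statuses    # file statuses; None means "not yet fetched"

    def get_statuses(self, stat_fn):
        # Cheap if statuses were provided up front; otherwise every file
        # triggers a stat call, which is the regression described above.
        if self.statuses is None:
            self.statuses = [stat_fn(p) for p in (self.files or [])]
        return self.statuses

stat_calls = []
def fake_stat(path):
    stat_calls.append(path)
    return {"path": path}

# Old code path: statuses known, so a later get_statuses() is free.
old = FileSelection(files=["a.parquet", "b.parquet"],
                    statuses=[{"path": "a.parquet"}, {"path": "b.parquet"}])
old.get_statuses(fake_stat)

# New code path: only files set, so get_statuses() fans out per file.
new = FileSelection(files=["a.parquet", "b.parquet"])
new.get_statuses(fake_stat)
print(len(stat_calls))  # prints 2
```

With a metadata cache covering thousands of files, that per-file fan-out is what turns the formerly cheap call into a measurable planning-time regression.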





[jira] [Updated] (DRILL-4235) Hit IllegalStateException when exec.queue.enable=true

2016-02-09 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse updated DRILL-4235:
---
Fix Version/s: (was: 1.6.0)
   1.5.0

> Hit IllegalStateException when exec.queue.enable=true 
> --
>
> Key: DRILL-4235
> URL: https://issues.apache.org/jira/browse/DRILL-4235
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.5.0
> Environment: git.commit.id=6dea429949a3d6a68aefbdb3d78de41e0955239b
>Reporter: Dechang Gu
>Assignee: Deneche A. Hakim
>Priority: Critical
> Fix For: 1.5.0
>
>
> 0: jdbc:drill:schema=dfs.parquet> select * from sys.options;
> Error: SYSTEM ERROR: IllegalStateException: Failure trying to change states: 
> ENQUEUED --> RUNNING
> [Error Id: 6ac8167c-6fb7-4274-9e5c-bf62a195c06e on ucs-node5.perf.lab:31010]
>   (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
> during fragment initialization: Exceptions caught during event processing
> org.apache.drill.exec.work.foreman.Foreman.run():261
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745
>   Caused By (java.lang.RuntimeException) Exceptions caught during event 
> processing
> org.apache.drill.common.EventProcessor.sendEvent():93
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState():792
> org.apache.drill.exec.work.foreman.Foreman.moveToState():909
> org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan():420
> org.apache.drill.exec.work.foreman.Foreman.runSQL():926
> org.apache.drill.exec.work.foreman.Foreman.run():250
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745
>   Caused By (java.lang.IllegalStateException) Failure trying to change 
> states: ENQUEUED --> RUNNING
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent():896
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent():790
> org.apache.drill.common.EventProcessor.sendEvent():73
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState():792
> org.apache.drill.exec.work.foreman.Foreman.moveToState():909
> org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan():420
> org.apache.drill.exec.work.foreman.Foreman.runSQL():926
> org.apache.drill.exec.work.foreman.Foreman.run():250
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745 (state=,code=0)
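The root error above is a state machine rejecting a transition it has no edge for (ENQUEUED directly to RUNNING). A sketch of that check in Python; the state names come from the trace, but the transition table itself is hypothetical, not Drill's actual Foreman table:

```python
# Allowed (from_state, to_state) pairs; illustrative, not Drill's real table.
ALLOWED = {
    ("PENDING", "ENQUEUED"),
    ("ENQUEUED", "STARTING"),
    ("STARTING", "RUNNING"),
}

class Foreman:
    def __init__(self):
        self.state = "PENDING"

    def move_to_state(self, new_state):
        # Reject any transition without an edge in the table.
        if (self.state, new_state) not in ALLOWED:
            raise RuntimeError(
                f"Failure trying to change states: {self.state} --> {new_state}")
        self.state = new_state

f = Foreman()
f.move_to_state("ENQUEUED")
try:
    f.move_to_state("RUNNING")   # skips an intermediate state, as in the bug
except RuntimeError as e:
    print(e)
```

When queueing is enabled the query enters an extra ENQUEUED state, so any code path that jumps straight to RUNNING without the intermediate hop hits exactly this kind of guard.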





[jira] [Reopened] (DRILL-4349) parquet reader returns wrong results when reading a nullable column that starts with a large number of nulls (>30k)

2016-02-09 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse reopened DRILL-4349:

  Assignee: Jason Altekruse  (was: Deneche A. Hakim)

> parquet reader returns wrong results when reading a nullable column that 
> starts with a large number of nulls (>30k)
> ---
>
> Key: DRILL-4349
> URL: https://issues.apache.org/jira/browse/DRILL-4349
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.4.0
>Reporter: Deneche A. Hakim
>Assignee: Jason Altekruse
>Priority: Critical
> Fix For: 1.5.0
>
> Attachments: drill4349.tar.gz
>
>





[jira] [Updated] (DRILL-4349) parquet reader returns wrong results when reading a nullable column that starts with a large number of nulls (>30k)

2016-02-09 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse updated DRILL-4349:
---
Fix Version/s: (was: 1.6.0)
   1.5.0

> parquet reader returns wrong results when reading a nullable column that 
> starts with a large number of nulls (>30k)
> ---
>
> Key: DRILL-4349
> URL: https://issues.apache.org/jira/browse/DRILL-4349
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.4.0
>Reporter: Deneche A. Hakim
>Assignee: Jason Altekruse
>Priority: Critical
> Fix For: 1.5.0
>
> Attachments: drill4349.tar.gz
>
>





[jira] [Closed] (DRILL-4349) parquet reader returns wrong results when reading a nullable column that starts with a large number of nulls (>30k)

2016-02-09 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse closed DRILL-4349.
--
Resolution: Fixed
  Reviewer:   (was: Chun Chang)

> parquet reader returns wrong results when reading a nullable column that 
> starts with a large number of nulls (>30k)
> ---
>
> Key: DRILL-4349
> URL: https://issues.apache.org/jira/browse/DRILL-4349
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.4.0
>Reporter: Deneche A. Hakim
>Assignee: Jason Altekruse
>Priority: Critical
> Fix For: 1.5.0
>
> Attachments: drill4349.tar.gz
>
>





[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-02-08 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138293#comment-15138293
 ] 

Jason Altekruse commented on DRILL-4373:


Just to isolate the issue a little more, can you read the table with Hive 
itself (rather than reading Hive through the Drill plugin)?

> Drill and Hive have incompatible timestamp representations in parquet
> -
>
> Key: DRILL-4373
> URL: https://issues.apache.org/jira/browse/DRILL-4373
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive, Storage - Parquet
>Reporter: Rahul Challapalli
>
> git.commit.id.abbrev=83d460c
> I created a parquet file with a timestamp type using Drill. Now if I define a 
> hive table on top of the parquet file and use "timestamp" as the column type, 
> drill fails to read the hive table through the hive storage plugin





[jira] [Commented] (DRILL-4373) Drill and Hive have incompatible timestamp representations in parquet

2016-02-08 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138294#comment-15138294
 ] 

Jason Altekruse commented on DRILL-4373:


Could you also post the failure message?

> Drill and Hive have incompatible timestamp representations in parquet
> -
>
> Key: DRILL-4373
> URL: https://issues.apache.org/jira/browse/DRILL-4373
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive, Storage - Parquet
>Reporter: Rahul Challapalli
>





[jira] [Updated] (DRILL-4331) TestFlattenPlanning.testFlattenPlanningAvoidUnnecessaryProject fail in Java 8

2016-02-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse updated DRILL-4331:
---
Fix Version/s: (was: Future)
   1.6.0

> TestFlattenPlanning.testFlattenPlanningAvoidUnnecessaryProject fail in Java 8
> -
>
> Key: DRILL-4331
> URL: https://issues.apache.org/jira/browse/DRILL-4331
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Tools, Build & Test
>Affects Versions: 1.5.0
>Reporter: Deneche A. Hakim
> Fix For: 1.6.0
>
>
> This test expects the following Project in the query plan:
> {noformat}
> Project(EXPR$0=[$1], rownum=[$0])
> {noformat}
> In Java 8, for some reason the scan operator exposes the columns in reverse 
> order, which causes the project to be different from the one expected:
> {noformat}
> Project(EXPR$0=[$0], rownum=[$1])
> {noformat}
> The plan is still correct, so the test must be fixed





[jira] [Created] (DRILL-4375) Fix the maven release profile, broken by jdbc jar size enforcer added in DRILL-4291

2016-02-08 Thread Jason Altekruse (JIRA)
Jason Altekruse created DRILL-4375:
--

 Summary: Fix the maven release profile, broken by jdbc jar size 
enforcer added in DRILL-4291
 Key: DRILL-4375
 URL: https://issues.apache.org/jira/browse/DRILL-4375
 Project: Apache Drill
  Issue Type: Bug
Reporter: Jason Altekruse
Assignee: Jason Altekruse








[jira] [Commented] (DRILL-4375) Fix the maven release profile, broken by jdbc jar size enforcer added in DRILL-4291

2016-02-08 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138094#comment-15138094
 ] 

Jason Altekruse commented on DRILL-4375:


Rather than duplicate the info, the fix applied for this issue is described in 
a related JIRA for further investigation into the root problem here: 
https://issues.apache.org/jira/browse/DRILL-4336?focusedCommentId=15138022&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15138022

> Fix the maven release profile, broken by jdbc jar size enforcer added in 
> DRILL-4291
> ---
>
> Key: DRILL-4375
> URL: https://issues.apache.org/jira/browse/DRILL-4375
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jason Altekruse
>Assignee: Jason Altekruse
>






[jira] [Commented] (DRILL-4336) Fix weird interaction between maven-release, maven-enforcer and RAT plugins

2016-02-08 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138120#comment-15138120
 ] 

Jason Altekruse commented on DRILL-4336:


I have filed DRILL-4375, linked as related to this issue, for merging the fix 
that was applied to the 1.5 release branch. I think this may still need to be 
investigated further, as the reason this is only failing in the release profile 
is unclear.

> Fix weird interaction between maven-release, maven-enforcer and RAT plugins
> ---
>
> Key: DRILL-4336
> URL: https://issues.apache.org/jira/browse/DRILL-4336
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jason Altekruse
>
> While trying to make the 1.5.0 release I ran into a bizarre failure from RAT 
> complaining about a file it should have been ignoring according to the plugin 
> configuration.
> Disabling the newly added maven-enforcer plugin "fixed" the issue, but we 
> need to keep this in the build to make sure new dependencies don't creep into 
> the JDBC driver that is supposed to be as small as possible.
> For the sake of the release the jdbc-all jar's size was checked manually.





[jira] [Updated] (DRILL-4295) Obsolete protobuf generated files under protocol/

2016-02-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse updated DRILL-4295:
---
Fix Version/s: 1.6.0

> Obsolete protobuf generated files under protocol/
> -
>
> Key: DRILL-4295
> URL: https://issues.apache.org/jira/browse/DRILL-4295
> Project: Apache Drill
>  Issue Type: Task
>  Components: Tools, Build & Test
>Reporter: Laurent Goujon
>Assignee: Laurent Goujon
>Priority: Trivial
> Fix For: 1.6.0
>
>
> The following two files don't have a protobuf definition anymore, and are not 
> generated when running {{mvn process-sources -P proto-compile}} under 
> {{protocol/}}:
> {noformat}
> src/main/java/org/apache/drill/exec/proto/beans/RpcFailure.java
> src/main/java/org/apache/drill/exec/proto/beans/ViewPointer.java
> {noformat}





[jira] [Resolved] (DRILL-4295) Obsolete protobuf generated files under protocol/

2016-02-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4295.

Resolution: Fixed

Fixed in fbb0165def5e23b6b2f6a690d47dc5fbeb2bdbcb

> Obsolete protobuf generated files under protocol/
> -
>
> Key: DRILL-4295
> URL: https://issues.apache.org/jira/browse/DRILL-4295
> Project: Apache Drill
>  Issue Type: Task
>  Components: Tools, Build & Test
>Reporter: Laurent Goujon
>Assignee: Laurent Goujon
>Priority: Trivial
> Fix For: 1.6.0
>
>





[jira] [Resolved] (DRILL-4331) TestFlattenPlanning.testFlattenPlanningAvoidUnnecessaryProject fail in Java 8

2016-02-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4331.

Resolution: Fixed

Fixed in 32da4675e8bf1358b863532daadd2769f380600f

> TestFlattenPlanning.testFlattenPlanningAvoidUnnecessaryProject fail in Java 8
> -
>
> Key: DRILL-4331
> URL: https://issues.apache.org/jira/browse/DRILL-4331
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Tools, Build & Test
>Affects Versions: 1.5.0
>Reporter: Deneche A. Hakim
> Fix For: 1.6.0
>
>





[jira] [Comment Edited] (DRILL-4336) Fix weird interaction between maven-release, maven-enforcer and RAT plugins

2016-02-08 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138022#comment-15138022
 ] 

Jason Altekruse edited comment on DRILL-4336 at 2/8/16 11:47 PM:
-

Update on this issue: after more investigation the real cause of the problem 
was determined to be the maven shade plugin. As a part of shading, a list of 
dependencies to exclude from the jdbc-all.jar is applied to make it smaller.

The resulting jar is expected to require no external dependencies, everything 
needed should be bundled within it and shaded to avoid conflicts with any 
shared dependencies used by a client application. To express the changed list 
of dependencies the plugin generates a dependency-reduced-pom.xml file, which 
contains an empty list of dependencies.

The issue came about with the location chosen for this new pom file. By default 
the shade plugin puts the file in the module root, which violates a maven 
design principle of keeping generated files in the 'target' directory, so that 
they will be cleared upon a 'mvn clean'. This is considered a known limitation 
of the shade plugin as stated here 
https://maven.apache.org/plugins/maven-shade-plugin/shade-mojo.html#dependencyReducedPomLocation.
 We had tried to follow best practices by overriding the default location and 
putting it in target.

Unfortunately this seemed to have a weird interaction with the maven-enforcer 
plugin added to enforce the JAR size, and the release maven profile. With the 
newly generated pom file in a directory below the jdbc-all module, maven was 
trying to run the 'target' directory as a submodule. This failed initially with 
a complaint that the newly-generated POM was lacking an apache header (from 
RAT), but beyond this the target directory is not going to be structured 
properly to run as a maven module and it isn't supposed to be. It hasn't been 
completely investigated to determine why this only happened when running the 
release profile.

The fix applied on the 1.5 release branch for this issue was to move the file 
back to its default (but maven-convention-breaking) location. As part of the 
change the file needed to be added to the .gitignore and RAT files, as it was 
no longer excluded from either of these simply by being inside of a 'target' 
directory.


was (Author: jaltekruse):
Update on this issue: after more investigation the real cause of the problem 
was determined to be the maven shade plugin. As a part of shading, a list of 
dependencies to exclude from the jdbc-all.jar is applied to make it smaller.

The resulting jar is expected to require no external dependencies, everything 
needed should be bundled within it and shaded to avoid conflicts with any 
shared dependencies used by a client application. To express the changed list 
of dependencies the plugin generates a dependency-reduced-pom.xml file, which 
contains an empty list of dependencies.

The issue came about with the location chosen for this new pom file. By default 
the shade plugin puts the file in the module root, which violates a maven 
design principle of keeping generated files in the 'target' directory, so that 
they will be cleared upon a 'mvn clean'. This is considered a known limitation 
of the shade plugin as stated here 
https://maven.apache.org/plugins/maven-shade-plugin/shade-mojo.html#dependencyReducedPomLocation.
 We had tried to follow best practices by overriding the default location and 
putting it in target.

Unfortunately this seemed to have a weird interaction with the maven-enforcer 
plugin added to enforce the JAR size, and the release maven profile. With the 
newly generated pom file in a directory below the jdbc-all module, maven was 
trying to run the 'target' directory as a submodule. This failed initially with 
a complaint that the newly-generated POM was lacking an apache header (from 
RAT), but beyond this the target directory is not going to be structured 
properly to run as a maven module and it isn't supposed to be. It hasn't been 
completely investigated to determine why this only happened when running the 
release profile.

The fix applied on the 1.5 release branch for this issue was to move the file 
back to its default (but maven-convention-breaking) location. As part of the 
change the file needed to be added to the .gitignore and RAT files, as it was 
no longer excluded from either of these simply by being inside of a 'target' 
directory.

> Fix weird interaction between maven-release, maven-enforcer and RAT plugins
> ---
>
> Key: DRILL-4336
> URL: https://issues.apache.org/jira/browse/DRILL-4336
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jason Altekruse
>
> While trying to make the 1.5.0 release I ran into a bizarre failure from RAT 
> complaining 

[jira] [Commented] (DRILL-4336) Fix weird interaction between maven-release, maven-enforcer and RAT plugins

2016-02-08 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138022#comment-15138022
 ] 

Jason Altekruse commented on DRILL-4336:


Update on this issue: after more investigation the real cause of the problem 
was determined to be the maven shade plugin. As a part of shading, a list of 
dependencies to exclude from the jdbc-all.jar is applied to make it smaller.

The resulting jar is expected to require no external dependencies, everything 
needed should be bundled within it and shaded to avoid conflicts with any 
shared dependencies used by a client application. To express the changed list 
of dependencies the plugin generates a dependency-reduced-pom.xml file, which 
contains an empty list of dependencies.

The issue came about with the location chosen for this new pom file. By default 
the shade plugin puts the file in the module root, which violates a maven 
design principle of keeping generated files in the 'target' directory, so that 
they will be cleared upon a 'mvn clean'. This is considered a known limitation 
of the shade plugin as stated here 
https://maven.apache.org/plugins/maven-shade-plugin/shade-mojo.html#dependencyReducedPomLocation.
 We had tried to follow best practices by overriding the default location and 
putting it in target.

Unfortunately this seemed to have a weird interaction with the maven-enforcer 
plugin added to enforce the JAR size, and the release maven profile. With the 
newly generated pom file in a directory below the jdbc-all module, maven was 
trying to run the 'target' directory as a submodule. This failed initially with 
a complaint that the newly-generated POM was lacking an apache header (from 
RAT), but beyond this the target directory is not going to be structured 
properly to run as a maven module and it isn't supposed to be. It hasn't been 
completely investigated to determine why this only happened when running the 
release profile.

The fix applied on the 1.5 release branch for this issue was to move the file 
back to its default (but maven-convention-breaking) location. As part of the 
change the file needed to be added to the .gitignore and RAT files, as it was 
no longer excluded from either of these simply by being inside of a 'target' 
directory.

> Fix weird interaction between maven-release, maven-enforcer and RAT plugins
> ---
>
> Key: DRILL-4336
> URL: https://issues.apache.org/jira/browse/DRILL-4336
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jason Altekruse
>
> While trying to make the 1.5.0 release I ran into a bizarre failure from RAT 
> complaining about a file it should have been ignoring according to the plugin 
> configuration.
> Disabling the newly added maven-enforcer plugin "fixed" the issue, but we 
> need to keep this in the build to make sure new dependencies don't creep into 
> the JDBC driver that is supposed to be as small as possible.
> For the sake of the release the jdbc-all jar's size was checked manually.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
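
The override discussed above can be sketched as a shade-plugin configuration. This is an illustrative fragment only, not Drill's actual jdbc-all pom; the `dependencyReducedPomLocation` parameter is the documented shade-plugin knob, but the surrounding plugin wiring here is assumed:

```xml
<!-- Illustrative sketch: overriding where maven-shade-plugin writes the
     dependency-reduced POM. Placing it under target/ follows maven
     convention, but this was the override that triggered the release-time
     submodule problem described in the comment above. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <dependencyReducedPomLocation>
      ${project.build.directory}/dependency-reduced-pom.xml
    </dependencyReducedPomLocation>
  </configuration>
</plugin>
```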


[jira] [Resolved] (DRILL-4359) EndpointAffinity missing equals method

2016-02-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4359.

   Resolution: Fixed
Fix Version/s: 1.6.0

Fixed in 6b1b4d257b89e5579140e75388cd37db5563a6a8

> EndpointAffinity missing equals method
> --
>
> Key: DRILL-4359
> URL: https://issues.apache.org/jira/browse/DRILL-4359
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Laurent Goujon
>Assignee: Laurent Goujon
>Priority: Trivial
> Fix For: 1.6.0
>
>
> EndpointAffinity is a placeholder class, but has no equals method to allow 
> comparison.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
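
A minimal sketch of what such an equals/hashCode pair could look like; the fields used here (endpoint, affinity) are illustrative assumptions, not the actual members of Drill's EndpointAffinity class:

```java
import java.util.Objects;

// Sketch of an equals/hashCode pair for a placeholder class like
// EndpointAffinity. Field names are assumptions for illustration only.
public final class EndpointAffinitySketch {
    private final String endpoint;
    private final double affinity;

    public EndpointAffinitySketch(String endpoint, double affinity) {
        this.endpoint = endpoint;
        this.affinity = affinity;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof EndpointAffinitySketch)) return false;
        EndpointAffinitySketch that = (EndpointAffinitySketch) o;
        return Double.compare(affinity, that.affinity) == 0
                && Objects.equals(endpoint, that.endpoint);
    }

    @Override
    public int hashCode() {
        return Objects.hash(endpoint, affinity);
    }

    public static void main(String[] args) {
        EndpointAffinitySketch a = new EndpointAffinitySketch("node1", 1.5);
        EndpointAffinitySketch b = new EndpointAffinitySketch("node1", 1.5);
        System.out.println(a.equals(b));                   // true
        System.out.println(a.hashCode() == b.hashCode());  // true
    }
}
```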


[jira] [Resolved] (DRILL-4353) Expired sessions in web server are not cleaning up resources, leading to resource leak

2016-02-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4353.

Resolution: Fixed

Fixed in 282dfd762f1bd6628b293c68b20cdff321bd70a3

This was also merged into the 1.5 release branch. That commit has a different 
hash because master already contained other merged changes that we didn't want 
to include in the release.

> Expired sessions in web server are not cleaning up resources, leading to 
> resource leak
> --
>
> Key: DRILL-4353
> URL: https://issues.apache.org/jira/browse/DRILL-4353
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - HTTP, Web Server
>Affects Versions: 1.5.0
>Reporter: Venki Korukanti
>Assignee: Venki Korukanti
>Priority: Blocker
> Fix For: 1.5.0
>
>
> Currently we store the session resources (including DrillClient) in attribute 
> {{SessionAuthentication}} object which implements 
> {{HttpSessionBindingListener}}. Whenever a session is invalidated, all 
> attributes are removed and if an attribute class implements 
> {{HttpSessionBindingListener}}, the listener is informed. 
> {{SessionAuthentication}} implementation of {{HttpSessionBindingListener}} 
> logs out the user which includes cleaning up the resources as well, but 
> {{SessionAuthentication}} relies on ServletContext stored in thread local 
> variable (see 
> [here|https://github.com/eclipse/jetty.project/blob/jetty-9.1.5.v20140505/jetty-security/src/main/java/org/eclipse/jetty/security/authentication/SessionAuthentication.java#L88]).
> In the case of the thread that cleans up expired sessions there is no 
> {{ServletContext}} in the thread-local variable, so the user is not logged 
> out properly, leading to a resource leak.
> Fix: Add {{HttpSessionEventListener}} to clean up the 
> {{SessionAuthentication}} and resources every time an HttpSession is expired 
> or invalidated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4361) Allow for FileSystemPlugin subclasses to override FormatCreator

2016-02-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4361.

   Resolution: Fixed
Fix Version/s: 1.6.0

Fixed in 5e57b0e3b44f46aa93bf82f366eb3a3f61990da3

> Allow for FileSystemPlugin subclasses to override FormatCreator
> ---
>
> Key: DRILL-4361
> URL: https://issues.apache.org/jira/browse/DRILL-4361
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Laurent Goujon
>Assignee: Laurent Goujon
>Priority: Minor
> Fix For: 1.6.0
>
>
> FileSystemPlugin subclasses are not able to customize plugins, because 
> FormatCreator is created in the FileSystemPlugin constructor and immediately 
> used to create the SchemaFactory instance.
> FormatCreator instantiation should be moved to a protected method so that 
> subclasses can choose to implement it differently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
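
The proposed change is an instance of the template-method pattern. A hedged sketch under assumed names and signatures (the real FormatCreator constructor and FileSystemPlugin wiring differ):

```java
// Hedged sketch of the proposed extension point: FormatCreator construction
// moves into a protected factory method that subclasses can override.
// All names and signatures here are illustrative, not Drill's actual API.
public class FormatCreatorSketch {
    static class BasePlugin {
        // Subclasses override this hook to supply their own creator.
        protected String createFormatCreator() {
            return "default-creator";
        }
    }

    static class CustomPlugin extends BasePlugin {
        @Override
        protected String createFormatCreator() {
            return "custom-creator";
        }
    }

    public static void main(String[] args) {
        System.out.println(new BasePlugin().createFormatCreator());   // default-creator
        System.out.println(new CustomPlugin().createFormatCreator()); // custom-creator
    }
}
```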


[jira] [Resolved] (DRILL-4225) TestDateFunctions#testToChar fails when the locale is non-English

2016-02-08 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4225.

   Resolution: Fixed
Fix Version/s: 1.6.0

Fixed in 4e9b82562cf0fc46e759b89857ffb85e129a178b

> TestDateFunctions#testToChar fails when the locale is non-English
> -
>
> Key: DRILL-4225
> URL: https://issues.apache.org/jira/browse/DRILL-4225
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.4.0
> Environment: Mac OS X 10.10.5
>Reporter: Akihiko Kusanagi
> Fix For: 1.6.0
>
>
> Set the locale to ja_JP on Mac OS X: 
> {noformat}
> $ defaults read -g AppleLocale
> ja_JP
> {noformat}
> TestDateFunctions#testToChar fails with the following output:
> {noformat}
> Running org.apache.drill.exec.fn.impl.TestDateFunctions#testToChar
> 2008-2-23
> 12 20 30
> 2008 2 23 12:00:00
> ...
> Tests run: 6, Failures: 1, Errors: 0, Skipped: 2, Time elapsed: 14.333 sec 
> <<< FAILURE! - in org.apache.drill.exec.fn.impl.TestDateFunctions
> testToChar(org.apache.drill.exec.fn.impl.TestDateFunctions)  Time elapsed: 
> 2.793 sec  <<< FAILURE!
> org.junit.ComparisonFailure: expected:<2008-[Feb]-23> but was:<2008-[2]-23>
>   at 
> org.apache.drill.exec.fn.impl.TestDateFunctions.testCommon(TestDateFunctions.java:66)
>   at 
> org.apache.drill.exec.fn.impl.TestDateFunctions.testToChar(TestDateFunctions.java:139)
> ...
> Failed tests: 
>   TestDateFunctions.testToChar:139->testCommon:66 expected:<2008-[Feb]-23> 
> but was:<2008-[2]-23>
> {noformat}
> Test queries are like this:
> {noformat}
> to_char((cast('2008-2-23' as date)), '-MMM-dd')
> to_char(cast('12:20:30' as time), 'HH mm ss')
> to_char(cast('2008-2-23 12:00:00' as timestamp), ' MMM dd HH:mm:ss')
> {noformat}
> This failure occurs because org.joda.time.format.DateTimeFormat interprets 
> the pattern 'MMM' differently depending on the locale. This will probably 
> occur in other OS platforms.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
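
The locale dependence of the 'MMM' pattern can be reproduced with the JDK's own formatter (shown here with java.time as an illustrative stand-in for the Joda-Time formatter the test actually exercises):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Demonstrates why 'MMM' is locale-sensitive: the same pattern renders the
// month differently per locale. Pinning an explicit locale (here ENGLISH)
// makes the output deterministic, which is the kind of fix such a test needs.
public class MonthPatternDemo {
    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2008, 2, 23);
        DateTimeFormatter en = DateTimeFormatter.ofPattern("yyyy-MMM-dd", Locale.ENGLISH);
        DateTimeFormatter ja = DateTimeFormatter.ofPattern("yyyy-MMM-dd", Locale.JAPANESE);
        System.out.println(d.format(en)); // 2008-Feb-23
        System.out.println(d.format(ja)); // month abbreviation differs by locale
    }
}
```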


[jira] [Commented] (DRILL-4342) Drill fails to read a date column from hive generated parquet

2016-02-04 Thread Jason Altekruse (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15133739#comment-15133739
 ] 

Jason Altekruse commented on DRILL-4342:


Yes, it would be a problem for Hive as well. This can be marked as a duplicate.

> Drill fails to read a date column from hive generated parquet
> -
>
> Key: DRILL-4342
> URL: https://issues.apache.org/jira/browse/DRILL-4342
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive, Storage - Parquet
>Reporter: Rahul Challapalli
> Attachments: fewtypes_null.parquet
>
>
> git.commit.id.abbrev=576271d
> Below is the hive ddl (using hive 1.2 which supports date in parquet)
> {code}
> create external table hive1dot2_fewtypes_null (
>   int_col int,
>   bigint_col bigint,
>   date_col date,
>   time_col string,
>   timestamp_col timestamp,
>   interval_col string,
>   varchar_col string,
>   float_col float,
>   double_col double,
>   bool_col boolean
> )
> stored as parquet
> location '/drill/testdata/hive_storage/hive1dot2_fewtypes_null';
> {code}
> Query using the hive storage plugin
> {code}
> select date_col from hive.hive1dot2_fewtypes_null;
> +-+
> |  date_col   |
> +-+
> | null|
> | null|
> | null|
> | 1996-01-29  |
> | 1996-03-01  |
> | 1996-03-02  |
> | 1997-02-28  |
> | null|
> | 1997-03-01  |
> | 1997-03-02  |
> | 2000-04-01  |
> | 2000-04-03  |
> | 2038-04-08  |
> | 2039-04-09  |
> | 2040-04-10  |
> | null|
> | 1999-02-08  |
> | 1999-03-08  |
> | 1999-01-18  |
> | 2003-01-02  |
> | null|
> +-+
> {code}
> Below is the output reading through dfs parquet reader. 
> {code}
> 0: jdbc:drill:zk=10.10.10.41:5181> select date_col from 
> dfs.`/drill/testdata/hive_storage/hive1dot2_fewtypes_null`;
> +-+
> |  date_col   |
> +-+
> | null|
> | null|
> | null|
> | 369-02-09  |
> | 369-03-12  |
> | 369-03-13  |
> | 368-03-11  |
> | null|
> | 368-03-12  |
> | 368-03-13  |
> | 365-04-12  |
> | 365-04-14  |
> | 327-04-19  |
> | 326-04-20  |
> | 325-04-21  |
> | null|
> | 366-02-19  |
> | 366-03-19  |
> | 366-01-29  |
> | 362-01-13  |
> | null|
> +-+
> {code}
> I attached the parquet file generated from hive. Let me know if anything else 
> is needed for reproducing this issue



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (DRILL-4347) Planning time for query64 from TPCDS test suite has increased 10 times compared to 1.4 release

2016-02-03 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse updated DRILL-4347:
---
Affects Version/s: (was: 0.5.0)
   1.5.0

> Planning time for query64 from TPCDS test suite has increased 10 times 
> compared to 1.4 release
> --
>
> Key: DRILL-4347
> URL: https://issues.apache.org/jira/browse/DRILL-4347
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.5.0
>Reporter: Victoria Markman
> Attachments: 294e9fb9-cdda-a89f-d1a7-b852878926a1.sys.drill_1.4.0, 
> 294ea418-9fb8-3082-1725-74e3cfe38fe9.sys.drill_1.5.0
>
>
> mapr-drill-1.5.0.201602012001-1.noarch.rpm
> {code}
> 0: jdbc:drill:schema=dfs> WITH cs_ui
> . . . . . . . . . . . . >  AS (SELECT cs_item_sk,
> . . . . . . . . . . . . > Sum(cs_ext_list_price) AS sale,
> . . . . . . . . . . . . > Sum(cr_refunded_cash + 
> cr_reversed_charge
> . . . . . . . . . . . . > + cr_store_credit) AS refund
> . . . . . . . . . . . . >  FROM   catalog_sales,
> . . . . . . . . . . . . > catalog_returns
> . . . . . . . . . . . . >  WHERE  cs_item_sk = cr_item_sk
> . . . . . . . . . . . . > AND cs_order_number = 
> cr_order_number
> . . . . . . . . . . . . >  GROUP  BY cs_item_sk
> . . . . . . . . . . . . >  HAVING Sum(cs_ext_list_price) > 2 * Sum(
> . . . . . . . . . . . . > cr_refunded_cash + 
> cr_reversed_charge
> . . . . . . . . . . . . > + cr_store_credit)),
> . . . . . . . . . . . . >  cross_sales
> . . . . . . . . . . . . >  AS (SELECT i_product_name product_name,
> . . . . . . . . . . . . > i_item_sk  item_sk,
> . . . . . . . . . . . . > s_store_name   store_name,
> . . . . . . . . . . . . > s_zip  store_zip,
> . . . . . . . . . . . . > ad1.ca_street_number   
> b_street_number,
> . . . . . . . . . . . . > ad1.ca_street_name 
> b_streen_name,
> . . . . . . . . . . . . > ad1.ca_cityb_city,
> . . . . . . . . . . . . > ad1.ca_zip b_zip,
> . . . . . . . . . . . . > ad2.ca_street_number   
> c_street_number,
> . . . . . . . . . . . . > ad2.ca_street_name 
> c_street_name,
> . . . . . . . . . . . . > ad2.ca_cityc_city,
> . . . . . . . . . . . . > ad2.ca_zip c_zip,
> . . . . . . . . . . . . > d1.d_year  AS syear,
> . . . . . . . . . . . . > d2.d_year  AS fsyear,
> . . . . . . . . . . . . > d3.d_year  s2year,
> . . . . . . . . . . . . > Count(*)   cnt,
> . . . . . . . . . . . . > Sum(ss_wholesale_cost) s1,
> . . . . . . . . . . . . > Sum(ss_list_price) s2,
> . . . . . . . . . . . . > Sum(ss_coupon_amt) s3
> . . . . . . . . . . . . >  FROM   store_sales,
> . . . . . . . . . . . . > store_returns,
> . . . . . . . . . . . . > cs_ui,
> . . . . . . . . . . . . > date_dim d1,
> . . . . . . . . . . . . > date_dim d2,
> . . . . . . . . . . . . > date_dim d3,
> . . . . . . . . . . . . > store,
> . . . . . . . . . . . . > customer,
> . . . . . . . . . . . . > customer_demographics cd1,
> . . . . . . . . . . . . > customer_demographics cd2,
> . . . . . . . . . . . . > promotion,
> . . . . . . . . . . . . > household_demographics hd1,
> . . . . . . . . . . . . > household_demographics hd2,
> . . . . . . . . . . . . > customer_address ad1,
> . . . . . . . . . . . . > customer_address ad2,
> . . . . . . . . . . . . > income_band ib1,
> . . . . . . . . . . . . > income_band ib2,
> . . . . . . . . . . . . > item
> . . . . . . . . . . . . >  WHERE  ss_store_sk = s_store_sk
> . . . . . . . . . . . . > AND ss_sold_date_sk = d1.d_date_sk
> . . . . . . . . . . . . > AND ss_customer_sk = c_customer_sk
> . . . . . . . . . . . . > AND ss_cdemo_sk = cd1.cd_demo_sk
> . . . . . . . . . . . . > AND ss_hdemo_sk = hd1.hd_demo_sk
> . . . . . . . . . . . . > AND ss_addr_sk = ad1.ca_address_sk
> . . . . . . . . . . . . > AND ss_item_sk = i_item_sk
> . . . . . . . . . 

[jira] [Resolved] (DRILL-4128) null pointer at org.apache.drill.exec.vector.accessor.AbstractSqlAccessor.getString(AbstractSqlAccessor.java:101)

2016-02-02 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4128.

Resolution: Fixed
  Assignee: Jason Altekruse

Fixed in 1b96174b1e5bafb13a873dd79f03467802d7c929

> null pointer at 
> org.apache.drill.exec.vector.accessor.AbstractSqlAccessor.getString(AbstractSqlAccessor.java:101)
> -
>
> Key: DRILL-4128
> URL: https://issues.apache.org/jira/browse/DRILL-4128
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.0.0, 1.1.0, 1.2.0, 1.3.0, 1.4.0
>Reporter: Devender Yadav 
>Assignee: Jason Altekruse
>Priority: Blocker
> Fix For: 1.5.0
>
>
> While fetching data from a ResultSet in JDBC, I got a null pointer exception. Details: 
> java.lang.NullPointerException
> at 
> org.apache.drill.exec.vector.accessor.AbstractSqlAccessor.getString(AbstractSqlAccessor.java:101)
> at 
> org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getString(BoundCheckingAccessor.java:124)
> at 
> org.apache.drill.jdbc.impl.TypeConvertingSqlAccessor.getString(TypeConvertingSqlAccessor.java:649)
> at 
> org.apache.drill.jdbc.impl.AvaticaDrillSqlAccessor.getString(AvaticaDrillSqlAccessor.java:95)
> at 
> net.hydromatic.avatica.AvaticaResultSet.getString(AvaticaResultSet.java:205)
> at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.getString(DrillResultSetImpl.java:182)
> The method below throws the null pointer exception because getObject(rowOffset) 
> returns null for null values, and null.toString() then throws the NPE.
>  @Override
>   public String getString(int rowOffset) throws InvalidAccessException{
> return getObject(rowOffset).toString();
>   }
> It should be like:
>  @Override
>   public String getString(int rowOffset) throws InvalidAccessException{
> return getObject(rowOffset)==null? null:getObject(rowOffset).toString();
>   }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
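
The null-safe pattern from the report can be sketched standalone (the class here is a stand-in for illustration, not Drill's actual AbstractSqlAccessor):

```java
// Stand-in sketch of the null-safe getString fix described in the issue:
// return null for a SQL NULL instead of calling toString() on a null object.
public class NullSafeGetString {
    static String getString(Object value) {
        // equivalent to: java.util.Objects.toString(value, null)
        return value == null ? null : value.toString();
    }

    public static void main(String[] args) {
        System.out.println(getString(null)); // null
        System.out.println(getString(42));   // 42
    }
}
```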


[jira] [Updated] (DRILL-4128) null pointer at org.apache.drill.exec.vector.accessor.AbstractSqlAccessor.getString(AbstractSqlAccessor.java:101)

2016-02-02 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse updated DRILL-4128:
---
Fix Version/s: (was: Future)
   1.5.0

> null pointer at 
> org.apache.drill.exec.vector.accessor.AbstractSqlAccessor.getString(AbstractSqlAccessor.java:101)
> -
>
> Key: DRILL-4128
> URL: https://issues.apache.org/jira/browse/DRILL-4128
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.0.0, 1.1.0, 1.2.0, 1.3.0, 1.4.0
>Reporter: Devender Yadav 
>Priority: Blocker
> Fix For: 1.5.0
>
>
> While fetching data from a ResultSet in JDBC, I got a null pointer exception. Details: 
> java.lang.NullPointerException
> at 
> org.apache.drill.exec.vector.accessor.AbstractSqlAccessor.getString(AbstractSqlAccessor.java:101)
> at 
> org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getString(BoundCheckingAccessor.java:124)
> at 
> org.apache.drill.jdbc.impl.TypeConvertingSqlAccessor.getString(TypeConvertingSqlAccessor.java:649)
> at 
> org.apache.drill.jdbc.impl.AvaticaDrillSqlAccessor.getString(AvaticaDrillSqlAccessor.java:95)
> at 
> net.hydromatic.avatica.AvaticaResultSet.getString(AvaticaResultSet.java:205)
> at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.getString(DrillResultSetImpl.java:182)
> The method below throws the null pointer exception because getObject(rowOffset) 
> returns null for null values, and null.toString() then throws the NPE.
>  @Override
>   public String getString(int rowOffset) throws InvalidAccessException{
> return getObject(rowOffset).toString();
>   }
> It should be like:
>  @Override
>   public String getString(int rowOffset) throws InvalidAccessException{
> return getObject(rowOffset)==null? null:getObject(rowOffset).toString();
>   }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (DRILL-4032) Drill unable to parse json files with schema changes

2016-02-02 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse reopened DRILL-4032:


> Drill unable to parse json files with schema changes
> 
>
> Key: DRILL-4032
> URL: https://issues.apache.org/jira/browse/DRILL-4032
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types, Storage - JSON
>Affects Versions: 1.3.0
>Reporter: Rahul Challapalli
>Assignee: Steven Phillips
>Priority: Blocker
> Fix For: 1.4.0
>
>
> git.commit.id.abbrev=bb69f22
> {code}
> select d.col2.col3  from reg1 d;
> Error: DATA_READ ERROR: Error parsing JSON - index: 0, length: 4 (expected: 
> range(0, 0))
> File  /drill/testdata/reg1/a.json
> Record  2
> Fragment 0:0
> {code}
> The folder reg1 contains 2 files
> File 1 : a.json
> {code}
> {"col1": "val1","col2": null}
> {"col1": "val1","col2": {"col3":"abc", "col4":"xyz"}}
> {code}
> File 2 : b.json
> {code}
> {"col1": "val1","col2": null}
> {"col1": "val1","col2": null}
> {code}
> Exception from the log file :
> {code}
> [Error Id: a7e3c716-838d-4f8f-9361-3727b98f04cd ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
>  ~[drill-common-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.store.easy.json.JSONRecordReader.handleAndRaise(JSONRecordReader.java:165)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.store.easy.json.JSONRecordReader.next(JSONRecordReader.java:205)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:183) 
> [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:119)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:113)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:103)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:130)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:156)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:119)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) 
> [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:80)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) 
> [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:256)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:250)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at java.security.AccessController.doPrivileged(Native Method) 
> [na:1.7.0_71]
> at javax.security.auth.Subject.doAs(Subject.java:415) [na:1.7.0_71]
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
>  [hadoop-common-2.7.0-mapr-1506.jar:na]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:250)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_71]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_71]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: java.lang.IndexOutOfBoundsException: index: 0, length: 4 
> (expected: range(0, 0))
> at io.netty.buffer.DrillBuf.checkIndexD(DrillBuf.java:189) 
> 

[jira] [Reopened] (DRILL-4048) Parquet reader corrupts dictionary encoded binary columns

2016-02-02 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse reopened DRILL-4048:


> Parquet reader corrupts dictionary encoded binary columns
> -
>
> Key: DRILL-4048
> URL: https://issues.apache.org/jira/browse/DRILL-4048
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.3.0
>Reporter: Rahul Challapalli
>Assignee: Jason Altekruse
>Priority: Blocker
> Fix For: 1.4.0
>
> Attachments: lineitem_dic_enc.parquet
>
>
> git.commit.id.abbrev=04c01bd
> The below query returns corrupted data (not even showing up here) for binary 
> columns
> {code}
> select * from `lineitem_dic_enc.parquet` limit 1;
> +-+++---+-+--+-++---+---+-+---+++-+--+
> | l_orderkey  | l_partkey  | l_suppkey  | l_linenumber  | l_quantity  | 
> l_extendedprice  | l_discount  | l_tax  | l_returnflag  | l_linestatus  | 
> l_shipdate  | l_commitdate  | l_receiptdate  |   l_shipinstruct   | 
> l_shipmode  |l_comment |
> +-+++---+-+--+-++---+---+-+---+++-+--+
> | 1   | 1552   | 93 | 1 | 17.0| 
> 24710.35 | 0.04| 0.02   |  |  | 
> 1996-03-13  | 1996-02-12| 1996-03-22 | DELIVER IN PE  | T   | 
> egular courts above the  |
> +-+++---+-+--+-++---+---+-+---+++-+--+
> {code}
> The same query from an older build (git.commit.id.abbrev=839f8da)
> {code}
> select * from `lineitem_dic_enc.parquet` limit 1;
> +-+++---+-+--+-++---+---+-+---+++-+--+
> | l_orderkey  | l_partkey  | l_suppkey  | l_linenumber  | l_quantity  | 
> l_extendedprice  | l_discount  | l_tax  | l_returnflag  | l_linestatus  | 
> l_shipdate  | l_commitdate  | l_receiptdate  |   l_shipinstruct   | 
> l_shipmode  |l_comment |
> +-+++---+-+--+-++---+---+-+---+++-+--+
> | 1   | 1552   | 93 | 1 | 17.0| 
> 24710.35 | 0.04| 0.02   | N | O | 
> 1996-03-13  | 1996-02-12| 1996-03-22 | DELIVER IN PERSON  | TRUCK 
>   | egular courts above the  |
> +-+++---+-+--+-++---+---+-+---+++-+--+
> {code}
> Below is the output of the parquet-meta command for this dataset
> {code}
> creator: parquet-mr 
> file schema: root 
> ---
> l_orderkey:  REQUIRED INT32 R:0 D:0
> l_partkey:   REQUIRED INT32 R:0 D:0
> l_suppkey:   REQUIRED INT32 R:0 D:0
> l_linenumber:REQUIRED INT32 R:0 D:0
> l_quantity:  REQUIRED DOUBLE R:0 D:0
> l_extendedprice: REQUIRED DOUBLE R:0 D:0
> l_discount:  REQUIRED DOUBLE R:0 D:0
> l_tax:   REQUIRED DOUBLE R:0 D:0
> l_returnflag:REQUIRED BINARY O:UTF8 R:0 D:0
> l_linestatus:REQUIRED BINARY O:UTF8 R:0 D:0
> l_shipdate:  REQUIRED INT32 O:DATE R:0 D:0
> l_commitdate:REQUIRED INT32 O:DATE R:0 D:0
> l_receiptdate:   REQUIRED INT32 O:DATE R:0 D:0
> l_shipinstruct:  REQUIRED BINARY O:UTF8 R:0 D:0
> l_shipmode:  REQUIRED BINARY O:UTF8 R:0 D:0
> l_comment:   REQUIRED BINARY O:UTF8 R:0 D:0
> row group 1: RC:60175 TS:3049610 
> 
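Parquet's dictionary encoding stores each distinct column value once in a dictionary page and replaces the values themselves with integer indices; the truncated strings above ("DELIVER IN PE", "T") are consistent with the reader mis-slicing the dictionary bytes. A minimal sketch of the encode/decode round trip (illustrative only; this is not Drill's or parquet-mr's code):

```python
def dict_encode(values):
    # Build a dictionary of distinct values in first-seen order,
    # then replace each value with its dictionary index.
    dictionary = []
    index = {}
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
    return dictionary, [index[v] for v in values]

def dict_decode(dictionary, indices):
    # Decoding is a plain lookup; a reader bug that corrupts either
    # the dictionary or the indices corrupts every decoded value.
    return [dictionary[i] for i in indices]

shipmodes = ["TRUCK", "MAIL", "TRUCK", "AIR"]
dictionary, idx = dict_encode(shipmodes)
assert dict_decode(dictionary, idx) == shipmodes
```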

[jira] [Resolved] (DRILL-4032) Drill unable to parse json files with schema changes

2016-02-02 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4032.

   Resolution: Fixed
Fix Version/s: 1.4.0

> Drill unable to parse json files with schema changes
> 
>
> Key: DRILL-4032
> URL: https://issues.apache.org/jira/browse/DRILL-4032
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types, Storage - JSON
>Affects Versions: 1.3.0
>Reporter: Rahul Challapalli
>Assignee: Steven Phillips
>Priority: Blocker
> Fix For: 1.4.0
>
>
> git.commit.id.abbrev=bb69f22
> {code}
> select d.col2.col3  from reg1 d;
> Error: DATA_READ ERROR: Error parsing JSON - index: 0, length: 4 (expected: 
> range(0, 0))
> File  /drill/testdata/reg1/a.json
> Record  2
> Fragment 0:0
> {code}
> The folder reg1 contains 2 files
> File 1 : a.json
> {code}
> {"col1": "val1","col2": null}
> {"col1": "val1","col2": {"col3":"abc", "col4":"xyz"}}
> {code}
> File 2 : b.json
> {code}
> {"col1": "val1","col2": null}
> {"col1": "val1","col2": null}
> {code}
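The schema conflict Drill trips over — `col2` is always null in b.json but a map in a.json — can be made visible by inferring a naive per-file schema with the standard library (`infer_schema` is an illustrative helper, not Drill's inference):

```python
import json

def infer_schema(lines):
    # Map each field name to the set of JSON types observed for it.
    schema = {}
    for line in lines:
        for key, value in json.loads(line).items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return schema

a_json = ['{"col1": "val1","col2": null}',
          '{"col1": "val1","col2": {"col3":"abc", "col4":"xyz"}}']
b_json = ['{"col1": "val1","col2": null}',
          '{"col1": "val1","col2": null}']

# col2 is seen as both NoneType and dict in a.json, but only
# NoneType in b.json -- the schema change across the two files.
print(infer_schema(a_json))
print(infer_schema(b_json))
```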
> Exception from the log file :
> {code}
> [Error Id: a7e3c716-838d-4f8f-9361-3727b98f04cd ]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
>  ~[drill-common-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.store.easy.json.JSONRecordReader.handleAndRaise(JSONRecordReader.java:165)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.store.easy.json.JSONRecordReader.next(JSONRecordReader.java:205)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:183) 
> [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:119)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:113)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:103)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:130)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:156)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:119)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) 
> [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:80)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) 
> [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:256)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:250)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at java.security.AccessController.doPrivileged(Native Method) 
> [na:1.7.0_71]
> at javax.security.auth.Subject.doAs(Subject.java:415) [na:1.7.0_71]
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
>  [hadoop-common-2.7.0-mapr-1506.jar:na]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:250)
>  [drill-java-exec-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_71]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_71]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: java.lang.IndexOutOfBoundsException: index: 0, length: 4 
> (expected: range(0, 0))
> at 

[jira] [Closed] (DRILL-4032) Drill unable to parse json files with schema changes

2016-02-02 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse closed DRILL-4032.
--


[jira] [Resolved] (DRILL-4243) CTAS with partition by, results in Out Of Memory

2016-02-02 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse resolved DRILL-4243.

   Resolution: Fixed
Fix Version/s: 1.5.0

> CTAS with partition by, results in Out Of Memory
> 
>
> Key: DRILL-4243
> URL: https://issues.apache.org/jira/browse/DRILL-4243
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.5.0
> Environment: 4 node cluster
>Reporter: Khurram Faraaz
> Fix For: 1.5.0
>
>
> CTAS with PARTITION BY results in an Out Of Memory error; it appears to 
> originate in ExternalSortBatch.
> Details of Drill are
> {noformat}
> version: 1.5.0-SNAPSHOT
> commit_id: e4372f224a4b474494388356355a53808092a67a
> commit_message: DRILL-4242: Updates to storage-mongo
> commit_time: 03.01.2016 @ 15:31:13 PST
> build_email: Unknown
> build_time: 04.01.2016 @ 01:02:29 PST
>  create table `tpch_single_partition/lineitem` partition by (l_moddate) as 
> select l.*, l_shipdate - extract(day from l_shipdate) + 1 l_moddate from 
> cp.`tpch/lineitem.parquet` l;
> Error: RESOURCE ERROR: One or more nodes ran out of memory while 
> executing the query.
> Fragment 0:0
> [Error Id: 3323fd1c-4b78-42a7-b311-23ee73c7d550 on atsqa4-193.qa.lab:31010] 
> (state=,code=0)
> java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory 
> while executing the query.
> Fragment 0:0
> [Error Id: 3323fd1c-4b78-42a7-b311-23ee73c7d550 on atsqa4-193.qa.lab:31010]
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:247)
>   at 
> org.apache.drill.jdbc.impl.DrillCursor.loadInitialSchema(DrillCursor.java:290)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:1923)
>   at 
> org.apache.drill.jdbc.impl.DrillResultSetImpl.execute(DrillResultSetImpl.java:73)
>   at 
> net.hydromatic.avatica.AvaticaConnection.executeQueryInternal(AvaticaConnection.java:404)
>   at 
> net.hydromatic.avatica.AvaticaStatement.executeQueryInternal(AvaticaStatement.java:351)
>   at 
> net.hydromatic.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:338)
>   at 
> net.hydromatic.avatica.AvaticaStatement.execute(AvaticaStatement.java:69)
>   at 
> org.apache.drill.jdbc.impl.DrillStatementImpl.execute(DrillStatementImpl.java:101)
>   at sqlline.Commands.execute(Commands.java:841)
>   at sqlline.Commands.sql(Commands.java:751)
>   at sqlline.SqlLine.dispatch(SqlLine.java:746)
>   at sqlline.SqlLine.runCommands(SqlLine.java:1651)
>   at sqlline.Commands.run(Commands.java:1304)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:36)
>   at sqlline.SqlLine.dispatch(SqlLine.java:742)
>   at sqlline.SqlLine.initArgs(SqlLine.java:553)
>   at sqlline.SqlLine.begin(SqlLine.java:596)
>   at sqlline.SqlLine.start(SqlLine.java:375)
>   at sqlline.SqlLine.main(SqlLine.java:268)
> Caused by: org.apache.drill.common.exceptions.UserRemoteException: RESOURCE 
> ERROR: One or more nodes ran out of memory while executing the query.
> Fragment 0:0
> [Error Id: 3323fd1c-4b78-42a7-b311-23ee73c7d550 on atsqa4-193.qa.lab:31010]
>   at 
> org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119)
>   at 
> org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113)
>   at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46)
>   at 
> org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31)
>   at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:69)
>   at org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:400)
>   at 
> org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:105)
>   at 
> org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:264)
>   at 
> org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:142)
>   at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:298)
>   at 
> org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:269)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
>   at 
> 
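The derived partition column in the CTAS above, `l_shipdate - extract(day from l_shipdate) + 1`, normalizes each ship date to the first day of its month (subtract the day-of-month, then add one day). A stdlib sketch of that arithmetic:

```python
from datetime import date, timedelta

def mod_date(ship_date):
    # Equivalent of: l_shipdate - extract(day from l_shipdate) + 1.
    # Subtracting the day-of-month lands on the last day of the
    # previous month; adding one day gives the first of the month.
    return ship_date - timedelta(days=ship_date.day) + timedelta(days=1)

assert mod_date(date(1996, 3, 13)) == date(1996, 3, 1)
```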

[jira] [Closed] (DRILL-4243) CTAS with partition by, results in Out Of Memory

2016-02-02 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse closed DRILL-4243.
--


[jira] [Reopened] (DRILL-4243) CTAS with partition by, results in Out Of Memory

2016-02-02 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse reopened DRILL-4243:



[jira] [Reopened] (DRILL-4205) Simple query hit IndexOutOfBoundException

2016-02-02 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse reopened DRILL-4205:


>  Simple query hit IndexOutOfBoundException
> --
>
> Key: DRILL-4205
> URL: https://issues.apache.org/jira/browse/DRILL-4205
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.4.0
>Reporter: Dechang Gu
>Assignee: Dechang Gu
> Fix For: 1.5.0
>
>
> The following query failed due to IOB:
> 0: jdbc:drill:schema=wf_pigprq100> select * from 
> `store_sales/part-m-00073.parquet`;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: srcIndex: 1048587
> Fragment 0:0
> [Error Id: ad8d2bc0-259f-483c-9024-93865963541e on ucs-node4.perf.lab:31010]
>   (org.apache.drill.common.exceptions.DrillRuntimeException) Error in parquet 
> record reader.
> Message: 
> Hadoop path: /tpcdsPigParq/SF100/store_sales/part-m-00073.parquet
> Total records read: 135280
> Mock records read: 0
> Records to read: 1424
> Row group index: 0
> Records in row group: 3775712
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message pig_schema {
>   optional int64 ss_sold_date_sk;
>   optional int64 ss_sold_time_sk;
>   optional int64 ss_item_sk;
>   optional int64 ss_customer_sk;
>   optional int64 ss_cdemo_sk;
>   optional int64 ss_hdemo_sk;
>   optional int64 ss_addr_sk;
>   optional int64 ss_store_sk;
>   optional int64 ss_promo_sk;
>   optional int64 ss_ticket_number;
>   optional int64 ss_quantity;
>   optional double ss_wholesale_cost;
>   optional double ss_list_price;
>   optional double ss_sales_price;
>   optional double ss_ext_discount_amt;
>   optional double ss_ext_sales_price;
>   optional double ss_ext_wholesale_cost;
>   optional double ss_ext_list_price;
>   optional double ss_ext_tax;
>   optional double ss_coupon_amt;
>   optional double ss_net_paid;
>   optional double ss_net_paid_inc_tax;
>   optional double ss_net_profit;
> }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (DRILL-4205) Simple query hit IndexOutOfBoundException

2016-02-02 Thread Jason Altekruse (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Altekruse closed DRILL-4205.
--





