date:20160706

[jira] [Commented] (DRILL-4673) Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on command return

2016-07-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364090#comment-15364090
 ] 

ASF GitHub Bot commented on DRILL-4673:
---

GitHub user vdiravka opened a pull request:

https://github.com/apache/drill/pull/541

DRILL-4673: Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on command return

- implement DROP TABLE IF EXISTS and DROP VIEW IF EXISTS;
- added unit test for DROP TABLE IF EXISTS;
- added unit test for DROP VIEW IF EXISTS;
- added unit test for "IF" hive UDF.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vdiravka/drill DRILL-4673

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/541.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #541


commit 4aa2e98cfd2c87953b9081ccc163fbc8cf4f7ce4
Author: Vitalii Diravka 
Date:   2016-05-13T19:24:31Z

DRILL-4673: Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED 
status on command return
- implement DROP TABLE IF EXISTS and DROP VIEW IF EXISTS;
- added unit test for DROP TABLE IF EXISTS;
- added unit test for DROP VIEW IF EXISTS;
- added unit test for "IF" hive UDF.




> Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on 
> command return
> -
>
> Key: DRILL-4673
> URL: https://issues.apache.org/jira/browse/DRILL-4673
> Project: Apache Drill
>  Issue Type: New Feature
>  Components: Functions - Drill
>Reporter: Vitalii Diravka
>Assignee: Vitalii Diravka
>Priority: Minor
>  Labels: drill
>
> Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on 
> command "DROP TABLE" return if table doesn't exist.
> The same for "DROP VIEW IF EXISTS"
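
The requested behaviour can be sketched with a toy in-memory catalog. This is only an illustration of the semantics, not Drill's implementation: the class ToyCatalog and its methods are hypothetical names invented for this sketch.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy in-memory catalog (hypothetical, not Drill code). */
final class ToyCatalog {
    private final Set<String> tables = new HashSet<>();

    void create(String name) {
        tables.add(name);
    }

    /** Plain DROP TABLE: a missing table is reported as a failure. */
    void drop(String name) {
        if (!tables.remove(name)) {
            throw new IllegalStateException("Table [" + name + "] not found");
        }
    }

    /** DROP TABLE IF EXISTS: a missing table is not an error; returns whether anything was dropped. */
    boolean dropIfExists(String name) {
        return tables.remove(name);
    }

    public static void main(String[] args) {
        ToyCatalog catalog = new ToyCatalog();
        catalog.create("t1");
        System.out.println(catalog.dropIfExists("t1")); // the table existed and was dropped
        System.out.println(catalog.dropIfExists("t1")); // already gone, still no failure
    }
}
```

The point of the second method is that a script can issue the drop unconditionally without having to handle a FAILED status when the table is absent.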



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (DRILL-4673) Implement "DROP TABLE IF EXISTS" for drill to prevent FAILED status on command return

2016-07-06 Thread Khurram Faraaz (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364100#comment-15364100
 ] 

Khurram Faraaz commented on DRILL-4673:
---

If there is a table T1 and the user issues the command DROP TABLE IF EXISTS T1, 
and T1 has views defined over it, will those views also be dropped 
after the command executes?





[jira] [Commented] (DRILL-4763) Parquet file with DATE logical type produces wrong results for simple SELECT

2016-07-06 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364556#comment-15364556
 ] 

Paul Rogers commented on DRILL-4763:


It seems that Drill's policy for dates is to treat the data as a "bucket of 
bits." The user is required to tell Drill that, in this query, for this file, 
please treat the data as a date. The user does this using a convert function. 
(I have, however, not yet fully tested the conversions to see if they help in 
this specific case.)

The specific request here is for Drill to do the conversion automatically for 
the reasons cited above: there is only one "right" way to do the conversion 
of a date, so Drill might as well do it rather than leaving it to each and 
every query or view.

Note that this is a symptom of a larger problem: Drill does not understand 
Parquet logical types. A similar problem occurs with Parquet interval types 
(which I have not yet fully tested).

> Parquet file with DATE logical type produces wrong results for simple SELECT
> 
>
> Key: DRILL-4763
> URL: https://issues.apache.org/jira/browse/DRILL-4763
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
>Reporter: Paul Rogers
>Assignee: Vitalii Diravka
> Attachments: date.parquet, int_16.parquet
>
>
> Created a simple Parquet file with the following schema:
> message test { required int32 index; required int32 value (DATE); required 
> int32 raw; }
> That is, a file with an int32 storage type and a DATE logical type. Then, 
> created a number of test values:
> 0 (which should be interpreted as 1970-01-01) and
> (int) (System.currentTimeMillis() / (24*60*60*1000) ) Which should be 
> interpreted as the number of days since 1970-01-01 and today.
> According to the Parquet spec 
> (https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md), 
> Parquet dates are expressed as "the number of days from the Unix epoch, 1 
> January 1970."
> Java timestamps are expressed as "measured in milliseconds, between the 
> current time and midnight, January 1, 1970 UTC."
> There is ambiguity here: Parquet dates are presumably local times not 
> absolute times, so the math above will actually tell us the date in London 
> right now, but that's close enough.
> Generate the local file to date.parquet. Query it with:
> SELECT * from `local`.`root`.`date.parquet`;
> The results are incorrect:
> index value raw
> 1 -11395-10-18T00:00:00.000-07:52:58  0
> Here, we have a value of 0. The displayed date is decidedly not 
> 1970-01-01T00:00:00. We actually have many problems:
> 1. The date is far off.
> 2. The output shows time. But, the Parquet DATE format explicitly does NOT 
> include time, so it makes no sense to include it.
> 3. The output attempts to show a time zone, but a time zone of -07:52:58, 
> while close to PST, is not right (there is no time zone that is off by 
> 7:52:58 from UTC.)
> 4. The data has no time zone, Parquet DATE explicitly is a local time, so it 
> is impossible to know the relationship between that date and UTC.
> The correct output (in ISO format) would be: 1970-01-01
> The last line should be today's date, but instead is:
> 6 -11348-04-20T00:00:00.000-07:52:58  16986
> Expected:
> 2016-07-04
> Note that all the information to produce the right information is available 
> to Drill:
> 1. The DATE annotation says the meaning of the signed 32-bit integer.
> 2. Given the starting point and duration in days, the conversion to Drill's 
> own internal date format is unambiguous.
> 3. The DATE annotation says that the date is local, so Drill should not 
> attempt to convert to UTC. (That is, a Java Date object can't be used, 
> instead a Joda/Java 8 LocalDate is necessary.)
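
A minimal sketch of the conversion described in the points above, assuming only the Parquet definition of DATE as a day count from 1970-01-01 and the Java 8 java.time API. The class and method names are hypothetical and are not Drill's reader code.

```java
import java.time.LocalDate;

/** Hypothetical helper: interprets a Parquet DATE value without any time zone. */
final class ParquetDateSketch {

    /** Parquet DATE is a signed 32-bit count of days since 1970-01-01. */
    static LocalDate fromParquetDays(int days) {
        // LocalDate carries no time and no zone, matching the DATE annotation.
        return LocalDate.ofEpochDay(days);
    }

    public static void main(String[] args) {
        System.out.println(fromParquetDays(0));     // prints 1970-01-01
        System.out.println(fromParquetDays(16986)); // prints 2016-07-04
    }
}
```

The two inputs are the raw values 0 and 16986 from the query output above; the results match the expected dates stated in the report.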




[jira] [Commented] (DRILL-4765) Missing, incorrect information in Drill data types page

2016-07-06 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365189#comment-15365189
 ] 

Paul Rogers commented on DRILL-4765:


Suggestion on solution: for Parquet, the Date logical type declares the meaning 
of the int32 field. The Parquet reader should do a Parquet-to-Drill conversion 
step for each field. For dates, that conversion means to change the Parquet 
date format to Drill's (by converting units and/or 0-point.)
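
To illustrate what a change of 0-point amounts to, here is a sketch that shifts a Parquet day count to a Julian day number and back. It assumes, purely for illustration, a target format that counts days from the Julian day zero point (the 4713 BC basis mentioned in the issue description); whether Drill's internal DATE actually uses that layout is not established here. EpochShiftSketch and its methods are hypothetical names.

```java
import java.time.LocalDate;
import java.time.temporal.JulianFields;

/** Hypothetical sketch: moving a day count between two zero points. */
final class EpochShiftSketch {

    /** Julian day number of 1970-01-01, the Parquet DATE zero point. */
    static final long UNIX_EPOCH_JULIAN_DAY =
        LocalDate.of(1970, 1, 1).getLong(JulianFields.JULIAN_DAY);

    /** Read direction: Parquet days since 1970-01-01 to a Julian day number. */
    static long parquetDaysToJulianDay(int parquetDays) {
        return parquetDays + UNIX_EPOCH_JULIAN_DAY;
    }

    /** Write direction: the inverse shift, failing loudly on int overflow. */
    static int julianDayToParquetDays(long julianDay) {
        return Math.toIntExact(julianDay - UNIX_EPOCH_JULIAN_DAY);
    }

    public static void main(String[] args) {
        System.out.println(UNIX_EPOCH_JULIAN_DAY);         // prints 2440588
        System.out.println(parquetDaysToJulianDay(16986)); // prints 2457574
    }
}
```

Because both formats count whole days, the conversion is a constant offset in each direction and involves no time zone.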

Ideally, the solution should be generic so that it will also work for Parquet 
interval types, and for other Parquet logical types, in the future. One option 
is a table of conversion functions, keyed by Parquet logical type.

An advantage of this approach is that we can then easily support the "no-op" 
Parquet logical types of int_8, int_16, int_32, uint_8, and uint_16.
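
The table-of-conversions idea could look roughly like the sketch below. Everything in it is hypothetical: the LogicalType enum is a small stand-in for the Parquet annotations on an int32 column, and LogicalTypeConverters is not an existing Drill or Parquet class.

```java
import java.time.LocalDate;
import java.util.EnumMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical sketch: per-logical-type conversion functions for int32 columns. */
final class LogicalTypeConverters {

    /** Stand-in for a subset of Parquet logical types stored as int32. */
    enum LogicalType { DATE, INT_8, INT_16, INT_32 }

    private static final Map<LogicalType, IntFunction<Object>> CONVERTERS =
        new EnumMap<>(LogicalType.class);

    static {
        // DATE needs a real conversion: days since 1970-01-01 to a zone-free date.
        CONVERTERS.put(LogicalType.DATE, days -> LocalDate.ofEpochDay(days));
        // The "no-op" types pass the stored value through unchanged.
        IntFunction<Object> noOp = value -> Integer.valueOf(value);
        CONVERTERS.put(LogicalType.INT_8, noOp);
        CONVERTERS.put(LogicalType.INT_16, noOp);
        CONVERTERS.put(LogicalType.INT_32, noOp);
    }

    /** Looks up the converter for the column's logical type and applies it. */
    static Object convert(LogicalType type, int raw) {
        IntFunction<Object> converter = CONVERTERS.get(type);
        if (converter == null) {
            throw new IllegalArgumentException("No converter for " + type);
        }
        return converter.apply(raw);
    }

    public static void main(String[] args) {
        System.out.println(convert(LogicalType.DATE, 0));    // prints 1970-01-01
        System.out.println(convert(LogicalType.INT_16, 42)); // prints 42
    }
}
```

Supporting a further logical type then means adding one table entry rather than touching the reader loop.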

> Missing, incorrect information in Drill data types page
> ---
>
> Key: DRILL-4765
> URL: https://issues.apache.org/jira/browse/DRILL-4765
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.6.0
>Reporter: Paul Rogers
>Assignee: Bridget Bevens
>
> Consider the Drill Supported Types page: 
> https://drill.apache.org/docs/supported-data-types/
> A number of issues can be seen.
> For BIGINT, it would be clearer to express the range as: -2^63 to 2^63-1.
> For INTEGER, it would be clearer to express the range as: -2^31 to 2^31-1.
> DATE: The statement "in YYYY-MM-DD format" is wrong. The internal 
> representation has no format, it is just a number representing the day count. 
> The format is applied only on output and varies depending on the tool used. 
> Perhaps for the Drill web UI it is in ISO format.
> DATE: Presumably the date is not time-zone specific. That is, 2016-07-01 is 
> the first of July in both the US and India, though a given absolute time may 
> be on two different dates in these locations.
> DATE: We use 4713 BC as a 0-point. But, the calendar system has changed many 
> times since that date. (Indeed, the current system did not even exist on that 
> date.) Is this a simple projection of the current system back in time, or 
> does it adjust for the discontinuities in the Gregorian calendar? This should 
> be stated as it is important for any data files that contain historical 
> dates. (And is why choosing a 20th-century 0-point would have been better...)
> FLOAT, DOUBLE: presumably these are in the standard IEEE Standard 754 format? 
> If so, let's state that.
> INTERVAL: there are many ways that intervals have been represented in DB 
> systems. Parquet represents data as a triple: months, days and 
> (milli)seconds. Does Drill use a similar format? If not, what is the format? 
> A normal DB can declare the interval as part of the data declaration. How 
> does Drill infer the format? How does the user access the parts of the range?
> INTERVAL: the footnote says, "Internally, INTERVAL is represented as 
> INTERVALDAY or INTERVALYEAR." But, if so, then INTERVAL can't represent a 
> time interval: a serious limitation. Also, we can't convert a Parquet 
> Interval to a Drill interval since there is no mapping to Drill that includes 
> months, days and seconds. This is a huge limitation and should be explained.
> SMALLINT: This is a supported types table, but the footnotes say SMALLINT is 
> not supported. We also do not list the many internal Value Vector types we 
> don't support (int8, uint8, int16, uint16, uint32 and so on.) Should we list 
> SMALLINT if we don't actually support it?
> TIME: the format is actually the number of seconds since 2001-01-01. The "24-hour 
> based time ... in hours, minutes, seconds format" confuses display format 
> with internal representation. See DATE above.
> TIME: Presumably the time is in local time, not UTC. That is, the time is 
> 12:34:56 with the time zone left unspecified.
> TIME: The example for TIME is, "22:55:55.23". But, note that the example 
> shows milliseconds, but the description says the time unit is seconds. Which 
> is right?
> TIME: The example shows just a time (seconds since midnight), but the 
> description says that this is a timestamp: number of seconds since 
> 2010-01-01. If so, then is TIME like TIMESTAMP (with a different basis)? Or 
> is it really a time-only value (so that the description is wrong)?
> TIMESTAMP: The description says "JDBC timestamp", but this is not accurate. 
> JDBC is a layer on top of a DB. So, we could say, "JDBC timestamp format".
> TIMESTAMP: Explain the basis. A JDBC timestamp 
> (https://docs.oracle.com/javase/8/docs/api/java/sql/Timestamp.html) expresses 
> time in nanoseconds since 1970-01-01. So, does Drill also have nanosecond 
> precision? The docs say, "optional milliseconds", so presumably Drill only 
> keeps milliseconds. As a result, the

[jira] [Commented] (DRILL-4765) Missing, incorrect information in Drill data types page

2016-07-06 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365240#comment-15365240
 ] 

Paul Rogers commented on DRILL-4765:


I've not personally tested Drill's Parquet file write support. However, if a 
date conversion is needed on write from Drill format to Parquet format, then 
that change should also be handled by a fix for this issue.

> Missing, incorrect information in Drill data types page
> ---
>
> Key: DRILL-4765
> URL: https://issues.apache.org/jira/browse/DRILL-4765
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.6.0
>Reporter: Paul Rogers
>Assignee: Bridget Bevens
>
> Consider the Drill Supported Types page: 
> https://drill.apache.org/docs/supported-data-types/
> A number of issues can be seen.
> For BIGINT, it would be clearer to express the range as: -2^63 to 2^63-1.
> For INTEGER, it would be clearer to express the range as: -2^31 to 2^31-1.
> DATE: The statement "in YYYY-MM-DD format" is wrong. The internal 
> representation has no format, it is just a number representing the day count. 
> The format is applied only on output and varies depending on the tool used. 
> Perhaps for the Drill web UI it is in ISO format.
> DATE: Presumably the date is not time-zone specific. That is, 2016-07-01 is 
> the first of July in both the US and India, though a given absolute time may 
> be on two different dates in these locations.
> DATE: We use 4713 BC as a 0-point. But, the calendar system has changed many 
> times since that date. (Indeed, the current system did not even exist on that 
> date.) Is this a simple projection of the current system back in time, or 
> does it adjust for the discontinuities in the Gregorian calendar? This should 
> be stated as it is important for any data files that contain historical 
> dates. (And is why choosing a 20th-century 0-point would have been better...)
> FLOAT, DOUBLE: presumably these are in the standard IEEE Standard 754 format? 
> If so, let's state that.
> INTERVAL: there are many ways that intervals have been represented in DB 
> systems. Parquet represents data as a triple: months, days and 
> (milli)seconds. Does Drill use a similar format? If not, what is the format? 
> A normal DB can declare the interval as part of the data declaration. How 
> does Drill infer the format? How does the user access the parts of the range?
> INTERVAL: the footnote says, "Internally, INTERVAL is represented as 
> INTERVALDAY or INTERVALYEAR." But, if so, then INTERVAL can't represent a 
> time interval: a serious limitation. Also, we can't convert a Parquet 
> Interval to a Drill interval since there is no mapping to Drill that includes 
> months, days and seconds. This is a huge limitation and should be explained.
> SMALLINT: This is a supported types table, but the footnotes say SMALLINT is 
> not supported. We also do not list the many internal Value Vector types we 
> don't support (int8, uint8, int16, uint16, uint32 and so on.) Should we list 
> SMALLINT if we don't actually support it?
> TIME: the format is actually the number of seconds since 2001-01-01. The "24-hour 
> based time ... in hours, minutes, seconds format" confuses display format 
> with internal representation. See DATE above.
> TIME: Presumably the time is in local time, not UTC. That is, the time is 
> 12:34:56 with the time zone left unspecified.
> TIME: The example for TIME is, "22:55:55.23". But, note that the example 
> shows milliseconds, but the description says the time unit is seconds. Which 
> is right?
> TIME: The example shows just a time (seconds since midnight), but the 
> description says that this is a timestamp: number of seconds since 
> 2010-01-01. If so, then is TIME like TIMESTAMP (with a different basis)? Or 
> is it really a time-only value (so that the description is wrong)?
> TIMESTAMP: The description says "JDBC timestamp", but this is not accurate. 
> JDBC is a layer on top of a DB. So, we could say, "JDBC timestamp format".
> TIMESTAMP: Explain the basis. A JDBC timestamp 
> (https://docs.oracle.com/javase/8/docs/api/java/sql/Timestamp.html) expresses 
> time in nanoseconds since 1970-01-01. So, does Drill also have nanosecond 
> precision? The docs say, "optional milliseconds", so presumably Drill only 
> keeps milliseconds. As a result, the Drill timestamp is NOT a JDBC timestamp.
> TIMESTAMP: JDBC timestamps are vague. They are based on a Java Date which is 
> defined as milliseconds since 1970-01-01T00:00:00 UTC. But, it seems a JDBC 
> timestamp is local (it has no implied timezone). Does Drill assume that a 
> TIMESTAMP is UTC (like java.util.Date) or local (like java.sql.Date)?
> TIMESTAMP & DATE/TIME: We've created an incompatibility between the date & 
> time format on the
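
On the TIMESTAMP precision question raised in the description above, the following sketch shows how java.sql.Timestamp exposes its value: a millisecond count since 1970-01-01 plus a separate nanoseconds-of-second field. It says nothing about what Drill stores; it only shows what is lost if a consumer keeps the millisecond count alone. TimestampPrecisionSketch and its methods are hypothetical names.

```java
import java.sql.Timestamp;

/** Hypothetical sketch: what millisecond-only storage drops from a JDBC timestamp. */
final class TimestampPrecisionSketch {

    /** Builds a timestamp from epoch milliseconds and an explicit nanos-of-second value. */
    static Timestamp withNanos(long epochMillis, int nanosOfSecond) {
        Timestamp ts = new Timestamp(epochMillis);
        ts.setNanos(nanosOfSecond);
        return ts;
    }

    /** The value a millisecond-precision store would retain. */
    static long truncatedToMillis(Timestamp ts) {
        return ts.getTime();
    }

    public static void main(String[] args) {
        Timestamp ts = withNanos(0L, 123_456_789);
        System.out.println(ts.getNanos());         // prints 123456789
        System.out.println(truncatedToMillis(ts)); // prints 123
    }
}
```

The trailing 456789 nanoseconds survive only in the Timestamp object, so a round trip through a millisecond-only column would not return the original value.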

[jira] [Updated] (DRILL-4767) Parquet reader throw IllegalArgumentException for int32 type with GZIP compression

2016-07-06 Thread Chun Chang (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun Chang updated DRILL-4767:
--
Attachment: int32_10_bs10k_ps1k_gzip.parquet

> Parquet reader throw IllegalArgumentException for int32 type with GZIP 
> compression
> --
>
> Key: DRILL-4767
> URL: https://issues.apache.org/jira/browse/DRILL-4767
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.7.0
>Reporter: Chun Chang
> Attachments: int32_10_bs10k_ps1k_gzip.parquet
>
>
> Created a small parquet file with the following schema:
> {noformat}
> [root@perfnode166 parquet-mr]# java -jar 
> parquet-tools/target/parquet-tools-1.8.2-SNAPSHOT.jar schema 
> /mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
> message test {
>   required int32 int32_field_required;
>   optional int32 int32_field_optional;
>   repeated int32 int32_field_repeated;
> }
> {noformat}
> and meta
> {noformat}
> [root@perfnode166 parquet-mr]# java -jar 
> parquet-tools/target/parquet-tools-1.8.2-SNAPSHOT.jar meta 
> /mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
> file: 
> file:/mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
> creator:  parquet-mr version 1.8.2-SNAPSHOT (build 
> 0cfa025d6ffeee07cb0fa2125c977185b849e5c9)
> extra:writer.model.name = example
> file schema:  test
> 
> int32_field_required: REQUIRED INT32 R:0 D:0
> int32_field_optional: OPTIONAL INT32 R:0 D:1
> int32_field_repeated: REPEATED INT32 R:1 D:1
> row group 1:  RC:10 TS:147 OFFSET:4
> 
> int32_field_required:  INT32 GZIP DO:0 FPO:4 SZ:65/47/0.72 VC:10 
> ENC:DELTA_BINARY_PACKED
> int32_field_optional:  INT32 GZIP DO:0 FPO:69 SZ:67/49/0.73 VC:10 
> ENC:DELTA_BINARY_PACKED
> int32_field_repeated:  INT32 GZIP DO:0 FPO:136 SZ:69/51/0.74 VC:10 
> ENC:DELTA_BINARY_PACKED
> {noformat}
> and dump
> {noformat}
> [root@perfnode166 parquet-mr]# java -jar 
> parquet-tools/target/parquet-tools-1.8.2-SNAPSHOT.jar dump 
> /mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
> row group 0
> 
> int32_field_required:  INT32 GZIP DO:0 FPO:4 SZ:65/47/0.72 VC:10 ENC:D 
> [more]...
> int32_field_optional:  INT32 GZIP DO:0 FPO:69 SZ:67/49/0.73 VC:10 ENC: 
> [more]...
> int32_field_repeated:  INT32 GZIP DO:0 FPO:136 SZ:69/51/0.74 VC:10 ENC 
> [more]...
> int32_field_required TV=10 RL=0 DL=0
> 
> 
> page 0:  DLE:RLE RLE:RLE VLE:DELTA_BINARY_PACKED ST:[min: 0, max:  
> [more]... VC:10
> int32_field_optional TV=10 RL=0 DL=1
> 
> 
> page 0:  DLE:RLE RLE:RLE VLE:DELTA_BINARY_PACKED ST:[min: 1, max:  
> [more]... VC:10
> int32_field_repeated TV=10 RL=1 DL=1
> 
> 
> page 0:  DLE:RLE RLE:RLE VLE:DELTA_BINARY_PACKED ST:[min: 2, max:  
> [more]... VC:10
> INT32 int32_field_required
> 
> *** row group 1 of 1, values 1 to 10 ***
> value 1:  R:0 D:0 V:0
> value 2:  R:0 D:0 V:3
> value 3:  R:0 D:0 V:6
> value 4:  R:0 D:0 V:9
> value 5:  R:0 D:0 V:12
> value 6:  R:0 D:0 V:15
> value 7:  R:0 D:0 V:18
> value 8:  R:0 D:0 V:21
> value 9:  R:0 D:0 V:24
> value 10: R:0 D:0 V:27
> INT32 int32_field_optional
> 
> *** row group 1 of 1, values 1 to 10 ***
> value 1:  R:0 D:1 V:1
> value 2:  R:0 D:1 V:4
> value 3:  R:0 D:1 V:7
> value 4:  R:0 D:1 V:10
> value 5:  R:0 D:1 V:13
> value 6:  R:0 D:1 V:16
> value 7:  R:0 D:1 V:19
> value 8:  R:0 D:1 V:22
> value 9:  R:0 D:1 V:25
> value 10: R:0 D:1 V:28
> INT32 int32_field_repeated
> 
> *** row group 1 of 1, values 1 to 10 ***
> value 1:  R:0 D:1 V:2
> value 2:  R:0 D:1 V:5
> value 3:  R:0 D:1 V:8
> value 4:  R:0 D:1 V:11
> value 5:  R:0 D:1 V:14
> value 6:  R:0 D:1 V:17
> value 7:  R:0 D:1 V:20
> value 8:  R:0 D:1 V:23
> value 9:  R:0 D:1 V:26
> value 10: R:0 D:1 V:29
> {noformat}
> But when querying through Drill, I got the following error:
> {noformat}
> 0: jdbc:drill:schema=dfs.drillTestDir> select * from 
>

[jira] [Created] (DRILL-4769) forman spins query int32 data with snappy compression

2016-07-06 Thread Chun Chang (JIRA)

Chun Chang created DRILL-4769:
-

 Summary: forman spins query int32 data with snappy compression
 Key: DRILL-4769
 URL: https://issues.apache.org/jira/browse/DRILL-4769
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.7.0
Reporter: Chun Chang


Similar data structure to DRILL-4767, but with SNAPPY compression. With the 
same query, the Drill foreman just spins. 

{noformat}
[root@perfnode166 parquet-mr]# java -jar 
parquet-tools/target/parquet-tools-1.8.2-SNAPSHOT.jar dump 
/mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_snappy.parquet
row group 0

int32_field_required:  INT32 SNAPPY DO:0 FPO:4 SZ:49/47/0.96 VC:10 ENC [more]...
int32_field_optional:  INT32 SNAPPY DO:0 FPO:53 SZ:51/49/0.96 VC:10 EN [more]...
int32_field_repeated:  INT32 SNAPPY DO:0 FPO:104 SZ:53/51/0.96 VC:10 E [more]...

int32_field_required TV=10 RL=0 DL=0

page 0:  DLE:RLE RLE:RLE VLE:DELTA_BINARY_PACKED ST:[min: 0, max:  
[more]... VC:10

int32_field_optional TV=10 RL=0 DL=1

page 0:  DLE:RLE RLE:RLE VLE:DELTA_BINARY_PACKED ST:[min: 1, max:  
[more]... VC:10

int32_field_repeated TV=10 RL=1 DL=1

page 0:  DLE:RLE RLE:RLE VLE:DELTA_BINARY_PACKED ST:[min: 2, max:  
[more]... VC:10

INT32 int32_field_required

*** row group 1 of 1, values 1 to 10 ***
value 1:  R:0 D:0 V:0
value 2:  R:0 D:0 V:3
value 3:  R:0 D:0 V:6
value 4:  R:0 D:0 V:9
value 5:  R:0 D:0 V:12
value 6:  R:0 D:0 V:15
value 7:  R:0 D:0 V:18
value 8:  R:0 D:0 V:21
value 9:  R:0 D:0 V:24
value 10: R:0 D:0 V:27

INT32 int32_field_optional

*** row group 1 of 1, values 1 to 10 ***
value 1:  R:0 D:1 V:1
value 2:  R:0 D:1 V:4
value 3:  R:0 D:1 V:7
value 4:  R:0 D:1 V:10
value 5:  R:0 D:1 V:13
value 6:  R:0 D:1 V:16
value 7:  R:0 D:1 V:19
value 8:  R:0 D:1 V:22
value 9:  R:0 D:1 V:25
value 10: R:0 D:1 V:28

INT32 int32_field_repeated

*** row group 1 of 1, values 1 to 10 ***
value 1:  R:0 D:1 V:2
value 2:  R:0 D:1 V:5
value 3:  R:0 D:1 V:8
value 4:  R:0 D:1 V:11
value 5:  R:0 D:1 V:14
value 6:  R:0 D:1 V:17
value 7:  R:0 D:1 V:20
value 8:  R:0 D:1 V:23
value 9:  R:0 D:1 V:26
value 10: R:0 D:1 V:29
{noformat}

Here is the drillbit thread dump:

{noformat}
[root@perfnode169 ~]# jstack -l 7355
2016-07-06 17:25:56
Full thread dump OpenJDK 64-Bit Server VM (24.79-b02 mixed mode):

"Attach Listener" daemon prio=10 tid=0x7fbae45a0800 nid=0x2a52 waiting on 
condition [0x]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
- None

"qtp239614979-176" prio=10 tid=0x016bd000 nid=0x2329 waiting on 
condition [0x7fbab749a000]
   java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0006f7700410> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at 
org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:389)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:513)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.access$700(QueuedThreadPool.java:48)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:569)
at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
- None

"qtp239614979-174" prio=10 tid=0x01b3c800 nid=0x2327 waiting on 
condition [0x7fbabd7e5000]
   java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x0006f7700410> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at 
org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:389)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:513)
at

[jira] [Updated] (DRILL-4769) forman spins query int32 data with snappy compression

2016-07-06 Thread Chun Chang (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun Chang updated DRILL-4769:
--
Attachment: int32_10_bs10k_ps1k_snappy.parquet

> forman spins query int32 data with snappy compression
> -
>
> Key: DRILL-4769
> URL: https://issues.apache.org/jira/browse/DRILL-4769
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.7.0
>Reporter: Chun Chang
> Attachments: int32_10_bs10k_ps1k_snappy.parquet

[jira] [Updated] (DRILL-4767) Parquet reader throw IllegalArgumentException for int32 type with GZIP compression

2016-07-06 Thread Abhishek Girish (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish updated DRILL-4767:
---
Description: 
Created a small parquet file with the following schema:

{noformat}
[root@perfnode166 parquet-mr]# java -jar 
parquet-tools/target/parquet-tools-1.8.2-SNAPSHOT.jar schema 
/mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
message test {
  required int32 int32_field_required;
  optional int32 int32_field_optional;
  repeated int32 int32_field_repeated;
}
{noformat}

and meta

{noformat}
[root@perfnode166 parquet-mr]# java -jar 
parquet-tools/target/parquet-tools-1.8.2-SNAPSHOT.jar meta 
/mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
file: 
file:/mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
creator:  parquet-mr version 1.8.2-SNAPSHOT (build 
0cfa025d6ffeee07cb0fa2125c977185b849e5c9)
extra:writer.model.name = example

file schema:  test

int32_field_required: REQUIRED INT32 R:0 D:0
int32_field_optional: OPTIONAL INT32 R:0 D:1
int32_field_repeated: REPEATED INT32 R:1 D:1

row group 1:  RC:10 TS:147 OFFSET:4

int32_field_required:  INT32 GZIP DO:0 FPO:4 SZ:65/47/0.72 VC:10 
ENC:DELTA_BINARY_PACKED
int32_field_optional:  INT32 GZIP DO:0 FPO:69 SZ:67/49/0.73 VC:10 
ENC:DELTA_BINARY_PACKED
int32_field_repeated:  INT32 GZIP DO:0 FPO:136 SZ:69/51/0.74 VC:10 
ENC:DELTA_BINARY_PACKED
{noformat}

and dump

{noformat}
[root@perfnode166 parquet-mr]# java -jar 
parquet-tools/target/parquet-tools-1.8.2-SNAPSHOT.jar dump 
/mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
row group 0

int32_field_required:  INT32 GZIP DO:0 FPO:4 SZ:65/47/0.72 VC:10 ENC:D [more]...
int32_field_optional:  INT32 GZIP DO:0 FPO:69 SZ:67/49/0.73 VC:10 ENC: [more]...
int32_field_repeated:  INT32 GZIP DO:0 FPO:136 SZ:69/51/0.74 VC:10 ENC [more]...

int32_field_required TV=10 RL=0 DL=0

page 0:  DLE:RLE RLE:RLE VLE:DELTA_BINARY_PACKED ST:[min: 0, max:  
[more]... VC:10

int32_field_optional TV=10 RL=0 DL=1

page 0:  DLE:RLE RLE:RLE VLE:DELTA_BINARY_PACKED ST:[min: 1, max:  
[more]... VC:10

int32_field_repeated TV=10 RL=1 DL=1

page 0:  DLE:RLE RLE:RLE VLE:DELTA_BINARY_PACKED ST:[min: 2, max:  
[more]... VC:10

INT32 int32_field_required

*** row group 1 of 1, values 1 to 10 ***
value 1:  R:0 D:0 V:0
value 2:  R:0 D:0 V:3
value 3:  R:0 D:0 V:6
value 4:  R:0 D:0 V:9
value 5:  R:0 D:0 V:12
value 6:  R:0 D:0 V:15
value 7:  R:0 D:0 V:18
value 8:  R:0 D:0 V:21
value 9:  R:0 D:0 V:24
value 10: R:0 D:0 V:27

INT32 int32_field_optional

*** row group 1 of 1, values 1 to 10 ***
value 1:  R:0 D:1 V:1
value 2:  R:0 D:1 V:4
value 3:  R:0 D:1 V:7
value 4:  R:0 D:1 V:10
value 5:  R:0 D:1 V:13
value 6:  R:0 D:1 V:16
value 7:  R:0 D:1 V:19
value 8:  R:0 D:1 V:22
value 9:  R:0 D:1 V:25
value 10: R:0 D:1 V:28

INT32 int32_field_repeated

*** row group 1 of 1, values 1 to 10 ***
value 1:  R:0 D:1 V:2
value 2:  R:0 D:1 V:5
value 3:  R:0 D:1 V:8
value 4:  R:0 D:1 V:11
value 5:  R:0 D:1 V:14
value 6:  R:0 D:1 V:17
value 7:  R:0 D:1 V:20
value 8:  R:0 D:1 V:23
value 9:  R:0 D:1 V:26
value 10: R:0 D:1 V:29
{noformat}

But when I query the file through Drill, I get the following error:

{noformat}
0: jdbc:drill:schema=dfs.drillTestDir> select * from 
dfs.`drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet`;
Error: SYSTEM ERROR: IllegalArgumentException

Fragment 0:0

[Error Id: d91ec9fe-0ce3-4d05-9e5b-d53cebb99726 on 10.10.30.169:31010] 
(state=,code=0)

0: jdbc:drill:schema=dfs.drillTestDir> select * from sys.version;
+---------+-----------+----------------+-------------+-------------+------------+
| version | commit_id | commit_message | commit_time | build_email | build_time |

[jira] [Created] (DRILL-4767) Parquet reader throw IllegalArgumentException for int32 type with GZIP compression

2016-07-06 Thread Chun Chang (JIRA)

Chun Chang created DRILL-4767:
-

 Summary: Parquet reader throw IllegalArgumentException for int32 
type with GZIP compression
 Key: DRILL-4767
 URL: https://issues.apache.org/jira/browse/DRILL-4767
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.7.0
Reporter: Chun Chang


Created a small parquet file with the following schema:

{noformat}
[root@perfnode166 parquet-mr]# java -jar 
parquet-tools/target/parquet-tools-1.8.2-SNAPSHOT.jar schema 
/mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
message test {
  required int32 int32_field_required;
  optional int32 int32_field_optional;
  repeated int32 int32_field_repeated;
}
{noformat}

and meta

{noformat}
[root@perfnode166 parquet-mr]# java -jar 
parquet-tools/target/parquet-tools-1.8.2-SNAPSHOT.jar meta 
/mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
file: 
file:/mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
creator:  parquet-mr version 1.8.2-SNAPSHOT (build 
0cfa025d6ffeee07cb0fa2125c977185b849e5c9)
extra:writer.model.name = example

file schema:  test

int32_field_required: REQUIRED INT32 R:0 D:0
int32_field_optional: OPTIONAL INT32 R:0 D:1
int32_field_repeated: REPEATED INT32 R:1 D:1

row group 1:  RC:10 TS:147 OFFSET:4

int32_field_required:  INT32 GZIP DO:0 FPO:4 SZ:65/47/0.72 VC:10 
ENC:DELTA_BINARY_PACKED
int32_field_optional:  INT32 GZIP DO:0 FPO:69 SZ:67/49/0.73 VC:10 
ENC:DELTA_BINARY_PACKED
int32_field_repeated:  INT32 GZIP DO:0 FPO:136 SZ:69/51/0.74 VC:10 
ENC:DELTA_BINARY_PACKED
{noformat}

and dump

{noformat}
[root@perfnode166 parquet-mr]# java -jar 
parquet-tools/target/parquet-tools-1.8.2-SNAPSHOT.jar dump 
/mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
row group 0

int32_field_required:  INT32 GZIP DO:0 FPO:4 SZ:65/47/0.72 VC:10 ENC:D [more]...
int32_field_optional:  INT32 GZIP DO:0 FPO:69 SZ:67/49/0.73 VC:10 ENC: [more]...
int32_field_repeated:  INT32 GZIP DO:0 FPO:136 SZ:69/51/0.74 VC:10 ENC [more]...

int32_field_required TV=10 RL=0 DL=0

page 0:  DLE:RLE RLE:RLE VLE:DELTA_BINARY_PACKED ST:[min: 0, max:  
[more]... VC:10

int32_field_optional TV=10 RL=0 DL=1

page 0:  DLE:RLE RLE:RLE VLE:DELTA_BINARY_PACKED ST:[min: 1, max:  
[more]... VC:10

int32_field_repeated TV=10 RL=1 DL=1

page 0:  DLE:RLE RLE:RLE VLE:DELTA_BINARY_PACKED ST:[min: 2, max:  
[more]... VC:10

INT32 int32_field_required

*** row group 1 of 1, values 1 to 10 ***
value 1:  R:0 D:0 V:0
value 2:  R:0 D:0 V:3
value 3:  R:0 D:0 V:6
value 4:  R:0 D:0 V:9
value 5:  R:0 D:0 V:12
value 6:  R:0 D:0 V:15
value 7:  R:0 D:0 V:18
value 8:  R:0 D:0 V:21
value 9:  R:0 D:0 V:24
value 10: R:0 D:0 V:27

INT32 int32_field_optional

*** row group 1 of 1, values 1 to 10 ***
value 1:  R:0 D:1 V:1
value 2:  R:0 D:1 V:4
value 3:  R:0 D:1 V:7
value 4:  R:0 D:1 V:10
value 5:  R:0 D:1 V:13
value 6:  R:0 D:1 V:16
value 7:  R:0 D:1 V:19
value 8:  R:0 D:1 V:22
value 9:  R:0 D:1 V:25
value 10: R:0 D:1 V:28

INT32 int32_field_repeated

*** row group 1 of 1, values 1 to 10 ***
value 1:  R:0 D:1 V:2
value 2:  R:0 D:1 V:5
value 3:  R:0 D:1 V:8
value 4:  R:0 D:1 V:11
value 5:  R:0 D:1 V:14
value 6:  R:0 D:1 V:17
value 7:  R:0 D:1 V:20
value 8:  R:0 D:1 V:23
value 9:  R:0 D:1 V:26
value 10: R:0 D:1 V:29
{noformat}

But when I query the file through Drill, I get the following error:

{noformat}
0: jdbc:drill:schema=dfs.drillTestDir> select * from 
dfs.`drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet`;
Error: SYSTEM ERROR: IllegalArgumentException

Fragment 0:0

[Error Id: d91ec9fe-0ce3-4d05-9e5b-d53cebb99726 on 10.10.30.169:31010] 
(state=,code=0)

0: jdbc:drill:schema=dfs.drillTestDir> select * from sys.version;
+-+---+---++-++
| version | commit_id |

[jira] [Updated] (DRILL-4767) Parquet reader throw IllegalArgumentException for int32 type with GZIP compression

2016-07-06 Thread Chun Chang (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun Chang updated DRILL-4767:
--
Description: 
Created a small parquet file with the following schema:

{noformat}
[root@perfnode166 parquet-mr]# java -jar 
parquet-tools/target/parquet-tools-1.8.2-SNAPSHOT.jar schema 
/mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
message test {
  required int32 int32_field_required;
  optional int32 int32_field_optional;
  repeated int32 int32_field_repeated;
}
{noformat}

and meta

{noformat}
[root@perfnode166 parquet-mr]# java -jar 
parquet-tools/target/parquet-tools-1.8.2-SNAPSHOT.jar meta 
/mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
file: 
file:/mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
creator:  parquet-mr version 1.8.2-SNAPSHOT (build 
0cfa025d6ffeee07cb0fa2125c977185b849e5c9)
extra:writer.model.name = example

file schema:  test

int32_field_required: REQUIRED INT32 R:0 D:0
int32_field_optional: OPTIONAL INT32 R:0 D:1
int32_field_repeated: REPEATED INT32 R:1 D:1

row group 1:  RC:10 TS:147 OFFSET:4

int32_field_required:  INT32 GZIP DO:0 FPO:4 SZ:65/47/0.72 VC:10 
ENC:DELTA_BINARY_PACKED
int32_field_optional:  INT32 GZIP DO:0 FPO:69 SZ:67/49/0.73 VC:10 
ENC:DELTA_BINARY_PACKED
int32_field_repeated:  INT32 GZIP DO:0 FPO:136 SZ:69/51/0.74 VC:10 
ENC:DELTA_BINARY_PACKED
{noformat}

and dump

{noformat}
[root@perfnode166 parquet-mr]# java -jar 
parquet-tools/target/parquet-tools-1.8.2-SNAPSHOT.jar dump 
/mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
row group 0

int32_field_required:  INT32 GZIP DO:0 FPO:4 SZ:65/47/0.72 VC:10 ENC:D [more]...
int32_field_optional:  INT32 GZIP DO:0 FPO:69 SZ:67/49/0.73 VC:10 ENC: [more]...
int32_field_repeated:  INT32 GZIP DO:0 FPO:136 SZ:69/51/0.74 VC:10 ENC [more]...

int32_field_required TV=10 RL=0 DL=0

page 0:  DLE:RLE RLE:RLE VLE:DELTA_BINARY_PACKED ST:[min: 0, max:  
[more]... VC:10

int32_field_optional TV=10 RL=0 DL=1

page 0:  DLE:RLE RLE:RLE VLE:DELTA_BINARY_PACKED ST:[min: 1, max:  
[more]... VC:10

int32_field_repeated TV=10 RL=1 DL=1

page 0:  DLE:RLE RLE:RLE VLE:DELTA_BINARY_PACKED ST:[min: 2, max:  
[more]... VC:10

INT32 int32_field_required

*** row group 1 of 1, values 1 to 10 ***
value 1:  R:0 D:0 V:0
value 2:  R:0 D:0 V:3
value 3:  R:0 D:0 V:6
value 4:  R:0 D:0 V:9
value 5:  R:0 D:0 V:12
value 6:  R:0 D:0 V:15
value 7:  R:0 D:0 V:18
value 8:  R:0 D:0 V:21
value 9:  R:0 D:0 V:24
value 10: R:0 D:0 V:27

INT32 int32_field_optional

*** row group 1 of 1, values 1 to 10 ***
value 1:  R:0 D:1 V:1
value 2:  R:0 D:1 V:4
value 3:  R:0 D:1 V:7
value 4:  R:0 D:1 V:10
value 5:  R:0 D:1 V:13
value 6:  R:0 D:1 V:16
value 7:  R:0 D:1 V:19
value 8:  R:0 D:1 V:22
value 9:  R:0 D:1 V:25
value 10: R:0 D:1 V:28

INT32 int32_field_repeated

*** row group 1 of 1, values 1 to 10 ***
value 1:  R:0 D:1 V:2
value 2:  R:0 D:1 V:5
value 3:  R:0 D:1 V:8
value 4:  R:0 D:1 V:11
value 5:  R:0 D:1 V:14
value 6:  R:0 D:1 V:17
value 7:  R:0 D:1 V:20
value 8:  R:0 D:1 V:23
value 9:  R:0 D:1 V:26
value 10: R:0 D:1 V:29
{noformat}

But when I query the file through Drill, I get the following error:

{noformat}
0: jdbc:drill:schema=dfs.drillTestDir> select * from 
dfs.`drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet`;
Error: SYSTEM ERROR: IllegalArgumentException

Fragment 0:0

[Error Id: d91ec9fe-0ce3-4d05-9e5b-d53cebb99726 on 10.10.30.169:31010] 
(state=,code=0)

0: jdbc:drill:schema=dfs.drillTestDir> select * from sys.version;
+---------+-----------+----------------+-------------+-------------+------------+
| version | commit_id | commit_message | commit_time | build_email | build_time |

[jira] [Updated] (DRILL-4767) Parquet reader throw IllegalArgumentException for int32 type with GZIP compression

2016-07-06 Thread Chun Chang (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chun Chang updated DRILL-4767:
--
Description: 
Created a small parquet file with the following schema:

{noformat}
[root@perfnode166 parquet-mr]# java -jar 
parquet-tools/target/parquet-tools-1.8.2-SNAPSHOT.jar schema 
/mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
message test {
  required int32 int32_field_required;
  optional int32 int32_field_optional;
  repeated int32 int32_field_repeated;
}
{noformat}

and meta

{noformat}
[root@perfnode166 parquet-mr]# java -jar 
parquet-tools/target/parquet-tools-1.8.2-SNAPSHOT.jar meta 
/mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
file: 
file:/mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
creator:  parquet-mr version 1.8.2-SNAPSHOT (build 
0cfa025d6ffeee07cb0fa2125c977185b849e5c9)
extra:writer.model.name = example

file schema:  test

int32_field_required: REQUIRED INT32 R:0 D:0
int32_field_optional: OPTIONAL INT32 R:0 D:1
int32_field_repeated: REPEATED INT32 R:1 D:1

row group 1:  RC:10 TS:147 OFFSET:4

int32_field_required:  INT32 GZIP DO:0 FPO:4 SZ:65/47/0.72 VC:10 
ENC:DELTA_BINARY_PACKED
int32_field_optional:  INT32 GZIP DO:0 FPO:69 SZ:67/49/0.73 VC:10 
ENC:DELTA_BINARY_PACKED
int32_field_repeated:  INT32 GZIP DO:0 FPO:136 SZ:69/51/0.74 VC:10 
ENC:DELTA_BINARY_PACKED
{noformat}

and dump

{noformat}
[root@perfnode166 parquet-mr]# java -jar 
parquet-tools/target/parquet-tools-1.8.2-SNAPSHOT.jar dump 
/mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
row group 0

int32_field_required:  INT32 GZIP DO:0 FPO:4 SZ:65/47/0.72 VC:10 ENC:D [more]...
int32_field_optional:  INT32 GZIP DO:0 FPO:69 SZ:67/49/0.73 VC:10 ENC: [more]...
int32_field_repeated:  INT32 GZIP DO:0 FPO:136 SZ:69/51/0.74 VC:10 ENC [more]...

int32_field_required TV=10 RL=0 DL=0

page 0:  DLE:RLE RLE:RLE VLE:DELTA_BINARY_PACKED ST:[min: 0, max:  
[more]... VC:10

int32_field_optional TV=10 RL=0 DL=1

page 0:  DLE:RLE RLE:RLE VLE:DELTA_BINARY_PACKED ST:[min: 1, max:  
[more]... VC:10

int32_field_repeated TV=10 RL=1 DL=1

page 0:  DLE:RLE RLE:RLE VLE:DELTA_BINARY_PACKED ST:[min: 2, max:  
[more]... VC:10

INT32 int32_field_required

*** row group 1 of 1, values 1 to 10 ***
value 1:  R:0 D:0 V:0
value 2:  R:0 D:0 V:3
value 3:  R:0 D:0 V:6
value 4:  R:0 D:0 V:9
value 5:  R:0 D:0 V:12
value 6:  R:0 D:0 V:15
value 7:  R:0 D:0 V:18
value 8:  R:0 D:0 V:21
value 9:  R:0 D:0 V:24
value 10: R:0 D:0 V:27

INT32 int32_field_optional

*** row group 1 of 1, values 1 to 10 ***
value 1:  R:0 D:1 V:1
value 2:  R:0 D:1 V:4
value 3:  R:0 D:1 V:7
value 4:  R:0 D:1 V:10
value 5:  R:0 D:1 V:13
value 6:  R:0 D:1 V:16
value 7:  R:0 D:1 V:19
value 8:  R:0 D:1 V:22
value 9:  R:0 D:1 V:25
value 10: R:0 D:1 V:28

INT32 int32_field_repeated

*** row group 1 of 1, values 1 to 10 ***
value 1:  R:0 D:1 V:2
value 2:  R:0 D:1 V:5
value 3:  R:0 D:1 V:8
value 4:  R:0 D:1 V:11
value 5:  R:0 D:1 V:14
value 6:  R:0 D:1 V:17
value 7:  R:0 D:1 V:20
value 8:  R:0 D:1 V:23
value 9:  R:0 D:1 V:26
value 10: R:0 D:1 V:29
{noformat}

But when I query the file through Drill, I get the following error:

{noformat}
0: jdbc:drill:schema=dfs.drillTestDir> select * from 
dfs.`drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet`;
Error: SYSTEM ERROR: IllegalArgumentException

Fragment 0:0

[Error Id: d91ec9fe-0ce3-4d05-9e5b-d53cebb99726 on 10.10.30.169:31010] 
(state=,code=0)

0: jdbc:drill:schema=dfs.drillTestDir> select * from sys.version;
+---------+-----------+----------------+-------------+-------------+------------+
| version | commit_id | commit_message | commit_time | build_email | build_time |

[jira] [Created] (DRILL-4768) Drill may leak hive meta store connection if hive meta store client call hits error

2016-07-06 Thread Jinfeng Ni (JIRA)

Jinfeng Ni created DRILL-4768:
-

 Summary: Drill may leak hive meta store connection if hive meta 
store client call hits error
 Key: DRILL-4768
 URL: https://issues.apache.org/jira/browse/DRILL-4768
 Project: Apache Drill
  Issue Type: Bug
Reporter: Jinfeng Ni


We are seeing a single drillbit create hundreds of connections to the Hive 
metastore. This indicates that Drill is leaking those connections by not 
closing them properly. When such a leak happens, it may prevent other 
applications from connecting to the Hive metastore. 

It seems that one cause of the leak is a Hive metastore client call that 
hits an exception. 
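
One common way a connection can leak on an error path is that the close call sits after the client call, so an exception skips it. The sketch below contrasts that pattern with try-with-resources. `MetaStoreConnection` and `listTables` are hypothetical stand-ins, not Hive or Drill APIs, and this is only an illustration of the general failure mode, not the confirmed cause or fix in Drill.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class MetastoreLeakSketch {
    // Counts connections that are currently open (for illustration only).
    static final AtomicInteger OPEN = new AtomicInteger();

    // Hypothetical stand-in for a metastore client; not a Hive or Drill class.
    static class MetaStoreConnection implements AutoCloseable {
        MetaStoreConnection() { OPEN.incrementAndGet(); }

        // Simulates a client call that hits an error.
        String listTables(String db) {
            throw new IllegalStateException("metastore call failed for " + db);
        }

        @Override
        public void close() { OPEN.decrementAndGet(); }
    }

    // Leaky pattern: close() is skipped when listTables() throws.
    static void leaky() {
        MetaStoreConnection c = new MetaStoreConnection();
        c.listTables("default");
        c.close();
    }

    // Safe pattern: try-with-resources closes the connection on every path.
    static void safe() {
        try (MetaStoreConnection c = new MetaStoreConnection()) {
            c.listTables("default");
        }
    }

    public static void main(String[] args) {
        try { leaky(); } catch (IllegalStateException e) { /* expected */ }
        System.out.println("open after leaky(): " + OPEN.get());
        try { safe(); } catch (IllegalStateException e) { /* expected */ }
        System.out.println("open after safe():  " + OPEN.get());
    }
}
```

In this sketch the open count grows by one on each failing `leaky()` call and is unchanged by `safe()`, which matches the symptom of connections piling up after errors.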



 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (DRILL-4768) Drill may leak hive meta store connection if hive meta store client call hits error

2016-07-06 Thread Jinfeng Ni (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni updated DRILL-4768:
--
Component/s: Storage - Hive

> Drill may leak hive meta store connection if hive meta store client call hits 
> error
> ---
>
> Key: DRILL-4768
> URL: https://issues.apache.org/jira/browse/DRILL-4768
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Hive
>Reporter: Jinfeng Ni
>
> We are seeing a single drillbit create hundreds of connections to the Hive 
> metastore. This indicates that Drill is leaking those connections by not 
> closing them properly. When such a leak happens, it may prevent other 
> applications from connecting to the Hive metastore. 
> It seems that one cause of the leak is a Hive metastore client call that 
> hits an exception. 
>  




[jira] [Assigned] (DRILL-4763) Parquet file with DATE logical type produces wrong results for simple SELECT

2016-07-06 Thread Vitalii Diravka (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka reassigned DRILL-4763:
--

Assignee: Vitalii Diravka

> Parquet file with DATE logical type produces wrong results for simple SELECT
> 
>
> Key: DRILL-4763
> URL: https://issues.apache.org/jira/browse/DRILL-4763
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.6.0
>Reporter: Paul Rogers
>Assignee: Vitalii Diravka
> Attachments: date.parquet, int_16.parquet
>
>
> Created a simple Parquet file with the following schema:
> message test { required int32 index; required int32 value (DATE); required 
> int32 raw; }
> That is, a file with an int32 storage type and a DATE logical type. Then, 
> created a number of test values:
> 0 (which should be interpreted as 1970-01-01) and
> (int) (System.currentTimeMillis() / (24*60*60*1000)), which should be 
> interpreted as the number of days between 1970-01-01 and today.
> According to the Parquet spec 
> (https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md), 
> Parquet dates are expressed as "the number of days from the Unix epoch, 1 
> January 1970."
> Java timestamps are expressed as "measured in milliseconds, between the 
> current time and midnight, January 1, 1970 UTC."
> There is ambiguity here: Parquet dates are presumably local times not 
> absolute times, so the math above will actually tell us the date in London 
> right now, but that's close enough.
> Generate the local file to date.parquet. Query it with:
> SELECT * from `local`.`root`.`date.parquet`;
> The results are incorrect:
> index value raw
> 1 -11395-10-18T00:00:00.000-07:52:58  0
> Here, we have a value of 0. The displayed date is decidedly not 
> 1970-01-01T00:00:00. We actually have many problems:
> 1. The date is far off.
> 2. The output shows time. But, the Parquet DATE format explicitly does NOT 
> include time, so it makes no sense to include it.
> 3. The output attempts to show a time zone, but a time zone of -07:52:58, 
> while close to PST, is not right (there is no time zone that is off by 7:02 
> from UTC.)
> 4. The data has no time zone; Parquet DATE explicitly is a local time, so it 
> is impossible to know the relationship between that date and UTC.
> The correct output (in ISO format) would be: 1970-01-01
> The last line should be today's date, but instead is:
> 6 -11348-04-20T00:00:00.000-07:52:58  16986
> Expected:
> 2016-07-04
> Note that all the information needed to produce the right result is 
> available to Drill:
> 1. The DATE annotation gives the meaning of the signed 32-bit integer.
> 2. Given the starting point and duration in days, the conversion to Drill's 
> own internal date format is unambiguous.
> 3. The DATE annotation says that the date is local, so Drill should not 
> attempt to convert to UTC. (That is, a Java Date object can't be used, 
> instead a Joda/Java 8 LocalDate is necessary.)
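
The interpretation the report asks for can be sketched with the Java 8 `java.time.LocalDate` class. This only illustrates the expected conversion and is not Drill's reader code; the class and method names are made up for the example, and the raw values 0 and 16986 are the ones quoted in the report.

```java
import java.time.LocalDate;

public class ParquetDateSketch {
    // Interprets a Parquet DATE value (days since 1970-01-01) as a
    // zone-less calendar date; no time-of-day or UTC conversion is applied.
    static LocalDate fromParquetDate(int daysSinceEpoch) {
        return LocalDate.ofEpochDay(daysSinceEpoch);
    }

    public static void main(String[] args) {
        System.out.println(fromParquetDate(0));      // 1970-01-01
        System.out.println(fromParquetDate(16986));  // 2016-07-04
    }
}
```

Because `LocalDate` carries neither a time nor a zone, its default ISO output has the `1970-01-01` form the report expects.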




[jira] [Assigned] (DRILL-4309) Make this option store.hive.optimize_scan_with_native_readers=true default

2016-07-06 Thread Vitalii Diravka (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka reassigned DRILL-4309:
--

Assignee: Vitalii Diravka  (was: Arina Ielchiieva)

> Make this option store.hive.optimize_scan_with_native_readers=true default
> --
>
> Key: DRILL-4309
> URL: https://issues.apache.org/jira/browse/DRILL-4309
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Query Planning & Optimization
>Reporter: Sean Hsuan-Yi Chu
>Assignee: Vitalii Diravka
> Fix For: Future
>
>
> This new feature has been around for a while and has been used and tested 
> in many scenarios. We should enable it by default.




