[jira] [Created] (DRILL-7130) IllegalStateException: Read batch count [0] should be greater than zero

2019-03-21 Thread salim achouche (JIRA)
salim achouche created DRILL-7130:
-

 Summary: IllegalStateException: Read batch count [0] should be 
greater than zero
 Key: DRILL-7130
 URL: https://issues.apache.org/jira/browse/DRILL-7130
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.15.0
Reporter: salim achouche
Assignee: salim achouche
 Fix For: 1.17.0


The following exception is being hit when reading parquet data:

Caused by: java.lang.IllegalStateException: Read batch count [0] should be greater than zero
    at org.apache.drill.shaded.guava.com.google.common.base.Preconditions.checkState(Preconditions.java:509) ~[drill-shaded-guava-23.0.jar:23.0]
    at org.apache.drill.exec.store.parquet.columnreaders.VarLenNullableFixedEntryReader.getEntry(VarLenNullableFixedEntryReader.java:49) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.VarLenBulkPageReader.getFixedEntry(VarLenBulkPageReader.java:167) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.VarLenBulkPageReader.getEntry(VarLenBulkPageReader.java:132) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.VarLenColumnBulkInput.next(VarLenColumnBulkInput.java:154) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.VarLenColumnBulkInput.next(VarLenColumnBulkInput.java:38) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.vector.VarCharVector$Mutator.setSafe(VarCharVector.java:624) ~[vector-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.vector.NullableVarCharVector$Mutator.setSafe(NullableVarCharVector.java:716) ~[vector-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.VarLengthColumnReaders$NullableVarCharColumn.setSafe(VarLengthColumnReaders.java:215) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.VarLengthValuesColumn.readRecordsInBulk(VarLengthValuesColumn.java:98) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.VarLenBinaryReader.readRecordsInBulk(VarLenBinaryReader.java:114) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.VarLenBinaryReader.readFields(VarLenBinaryReader.java:92) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.BatchReader$VariableWidthReader.readRecords(BatchReader.java:156) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.BatchReader.readBatch(BatchReader.java:43) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:288) ~[drill-java-exec-1.15.0.0.jar:1.15.0.0]
    ... 29 common frames omitted
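A minimal sketch of the failing precondition, assuming a zero-length read batch is computed before the check. The class and method names mirror the stack trace, but the fields and arithmetic below are illustrative and are not copied from the Drill sources:

{code:java}
import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;

// Illustrative only: shows how a remaining-buffer / entry-size calculation that
// rounds down to zero would trip the checkState() seen at the top of the trace.
public class ReadBatchCountSketch {

  static int computeReadBatchCount(int valuesToRead, int entrySizeBytes, int bytesRemaining) {
    // Integer division rounds down; a buffer smaller than one entry yields 0.
    return Math.min(valuesToRead, bytesRemaining / entrySizeBytes);
  }

  static void getEntry(int valuesToRead, int entrySizeBytes, int bytesRemaining) {
    final int readBatch = computeReadBatchCount(valuesToRead, entrySizeBytes, bytesRemaining);
    Preconditions.checkState(readBatch > 0,
        "Read batch count [%s] should be greater than zero", readBatch);
    // ... a real reader would bulk-copy readBatch fixed-width entries here ...
  }

  public static void main(String[] args) {
    // bytesRemaining < entrySizeBytes -> readBatch == 0 -> IllegalStateException
    getEntry(1024, 8, 4);
  }
}
{code}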

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-7129) Join with more than 1 condition is not using stats to compute row count estimate

2019-03-21 Thread Gautam Parai (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gautam Parai reassigned DRILL-7129:
---

Assignee: Gautam Parai

> Join with more than 1 condition is not using stats to compute row count 
> estimate
> 
>
> Key: DRILL-7129
> URL: https://issues.apache.org/jira/browse/DRILL-7129
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Anisha Reddy
>Assignee: Gautam Parai
>Priority: Major
> Fix For: 1.17.0
>
>
> Below are the details: 
>  
> {code:java}
> 0: jdbc:drill:drillbit=10.10.101.108> select count(*) from `table_stats/Tpch0.01/parquet/lineitem`;
> +---------+
> | EXPR$0  |
> +---------+
> | 57068   |
> +---------+
> 1 row selected (0.179 seconds)
> 0: jdbc:drill:drillbit=10.10.101.108> select count(*) from `table_stats/Tpch0.01/parquet/partsupp`;
> +---------+
> | EXPR$0  |
> +---------+
> | 7474    |
> +---------+
> 1 row selected (0.171 seconds)
> 0: jdbc:drill:drillbit=10.10.101.108> select count(*) from `table_stats/Tpch0.01/parquet/lineitem` l, `table_stats/Tpch0.01/parquet/partsupp` ps where l.l_partkey = ps.ps_partkey and l.l_suppkey = ps.ps_suppkey;
> +---------+
> | EXPR$0  |
> +---------+
> | 53401   |
> +---------+
> 1 row selected (0.769 seconds)
> 0: jdbc:drill:drillbit=10.10.101.108> explain plan including all attributes for select * from `table_stats/Tpch0.01/parquet/lineitem` l, `table_stats/Tpch0.01/parquet/partsupp` ps where l.l_partkey = ps.ps_partkey and l.l_suppkey = ps.ps_suppkey;
> +------+------+
> | text | json |
> +------+------+
> | 00-00 Screen : rowType = RecordType(DYNAMIC_STAR **, DYNAMIC_STAR **0): rowcount = 57068.0, cumulative cost = {313468.8 rows, 2110446.8 cpu, 193626.0 io, 0.0 network, 197313.6 memory}, id = 107578
> 00-01 ProjectAllowDup(**=[$0], **0=[$1]) : rowType = RecordType(DYNAMIC_STAR **, DYNAMIC_STAR **0): rowcount = 57068.0, cumulative cost = {307762.0 rows, 2104740.0 cpu, 193626.0 io, 0.0 network, 197313.6 memory}, id = 107577
> 00-02 Project(T10¦¦**=[$0], T11¦¦**=[$3]) : rowType = RecordType(DYNAMIC_STAR T10¦¦**, DYNAMIC_STAR T11¦¦**): rowcount = 57068.0, cumulative cost = {250694.0 rows, 1990604.0 cpu, 193626.0 io, 0.0 network, 197313.6 memory}, id = 107576
> 00-03 HashJoin(condition=[AND(=($1, $4), =($2, $5))], joinType=[inner], semi-join: =[false]) : rowType = RecordType(DYNAMIC_STAR T10¦¦**, ANY l_partkey, ANY l_suppkey, DYNAMIC_STAR T11¦¦**, ANY ps_partkey, ANY ps_suppkey): rowcount = 57068.0, cumulative cost = {193626.0 rows, 1876468.0 cpu, 193626.0 io, 0.0 network, 197313.6 memory}, id = 107575
> 00-05 Project(T10¦¦**=[$0], l_partkey=[$1], l_suppkey=[$2]) : rowType = RecordType(DYNAMIC_STAR T10¦¦**, ANY l_partkey, ANY l_suppkey): rowcount = 57068.0, cumulative cost = {114136.0 rows, 342408.0 cpu, 171204.0 io, 0.0 network, 0.0 memory}, id = 107572
> 00-07 Scan(table=[[dfs, drilltestdir, table_stats/Tpch0.01/parquet/lineitem]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/table_stats/Tpch0.01/parquet/lineitem]], selectionRoot=maprfs:/drill/testdata/table_stats/Tpch0.01/parquet/lineitem, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`**`, `l_partkey`, `l_suppkey`]]]) : rowType = RecordType(DYNAMIC_STAR **, ANY l_partkey, ANY l_suppkey): rowcount = 57068.0, cumulative cost = {57068.0 rows, 171204.0 cpu, 171204.0 io, 0.0 network, 0.0 memory}, id = 107571
> 00-04 Project(T11¦¦**=[$0], ps_partkey=[$1], ps_suppkey=[$2]) : rowType = RecordType(DYNAMIC_STAR T11¦¦**, ANY ps_partkey, ANY ps_suppkey): rowcount = 7474.0, cumulative cost = {14948.0 rows, 44844.0 cpu, 22422.0 io, 0.0 network, 0.0 memory}, id = 107574
> 00-06 Scan(table=[[dfs, drilltestdir, table_stats/Tpch0.01/parquet/partsupp]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/table_stats/Tpch0.01/parquet/partsupp]], selectionRoot=maprfs:/drill/testdata/table_stats/Tpch0.01/parquet/partsupp, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`**`, `ps_partkey`, `ps_suppkey`]]]) : rowType = RecordType(DYNAMIC_STAR **, ANY ps_partkey, ANY ps_suppkey): rowcount = 7474.0, cumulative cost = {7474.0 rows, 22422.0 cpu, 22422.0 io, 0.0 network, 0.0 memory}, id = 107573
> {code}
> The NDV for l_partkey = 2000
> ps_partkey = 1817
> l_suppkey = 100
> ps_suppkey = 100
> We see that such joins are just taking the max of the left-side and the right-side table row counts.

[jira] [Created] (DRILL-7129) Join with more than 1 condition is not using stats to compute row count estimate

2019-03-21 Thread Anisha Reddy (JIRA)
Anisha Reddy created DRILL-7129:
---

 Summary: Join with more than 1 condition is not using stats to 
compute row count estimate
 Key: DRILL-7129
 URL: https://issues.apache.org/jira/browse/DRILL-7129
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.16.0
Reporter: Anisha Reddy
 Fix For: 1.17.0


Below are the details: 

 
{code:java}
0: jdbc:drill:drillbit=10.10.101.108> select count(*) from `table_stats/Tpch0.01/parquet/lineitem`;
+---------+
| EXPR$0  |
+---------+
| 57068   |
+---------+
1 row selected (0.179 seconds)

0: jdbc:drill:drillbit=10.10.101.108> select count(*) from `table_stats/Tpch0.01/parquet/partsupp`;
+---------+
| EXPR$0  |
+---------+
| 7474    |
+---------+
1 row selected (0.171 seconds)

0: jdbc:drill:drillbit=10.10.101.108> select count(*) from `table_stats/Tpch0.01/parquet/lineitem` l, `table_stats/Tpch0.01/parquet/partsupp` ps where l.l_partkey = ps.ps_partkey and l.l_suppkey = ps.ps_suppkey;
+---------+
| EXPR$0  |
+---------+
| 53401   |
+---------+
1 row selected (0.769 seconds)

0: jdbc:drill:drillbit=10.10.101.108> explain plan including all attributes for select * from `table_stats/Tpch0.01/parquet/lineitem` l, `table_stats/Tpch0.01/parquet/partsupp` ps where l.l_partkey = ps.ps_partkey and l.l_suppkey = ps.ps_suppkey;
+------+------+
| text | json |
+------+------+
| 00-00 Screen : rowType = RecordType(DYNAMIC_STAR **, DYNAMIC_STAR **0): rowcount = 57068.0, cumulative cost = {313468.8 rows, 2110446.8 cpu, 193626.0 io, 0.0 network, 197313.6 memory}, id = 107578
00-01 ProjectAllowDup(**=[$0], **0=[$1]) : rowType = RecordType(DYNAMIC_STAR **, DYNAMIC_STAR **0): rowcount = 57068.0, cumulative cost = {307762.0 rows, 2104740.0 cpu, 193626.0 io, 0.0 network, 197313.6 memory}, id = 107577
00-02 Project(T10¦¦**=[$0], T11¦¦**=[$3]) : rowType = RecordType(DYNAMIC_STAR T10¦¦**, DYNAMIC_STAR T11¦¦**): rowcount = 57068.0, cumulative cost = {250694.0 rows, 1990604.0 cpu, 193626.0 io, 0.0 network, 197313.6 memory}, id = 107576
00-03 HashJoin(condition=[AND(=($1, $4), =($2, $5))], joinType=[inner], semi-join: =[false]) : rowType = RecordType(DYNAMIC_STAR T10¦¦**, ANY l_partkey, ANY l_suppkey, DYNAMIC_STAR T11¦¦**, ANY ps_partkey, ANY ps_suppkey): rowcount = 57068.0, cumulative cost = {193626.0 rows, 1876468.0 cpu, 193626.0 io, 0.0 network, 197313.6 memory}, id = 107575
00-05 Project(T10¦¦**=[$0], l_partkey=[$1], l_suppkey=[$2]) : rowType = RecordType(DYNAMIC_STAR T10¦¦**, ANY l_partkey, ANY l_suppkey): rowcount = 57068.0, cumulative cost = {114136.0 rows, 342408.0 cpu, 171204.0 io, 0.0 network, 0.0 memory}, id = 107572
00-07 Scan(table=[[dfs, drilltestdir, table_stats/Tpch0.01/parquet/lineitem]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/table_stats/Tpch0.01/parquet/lineitem]], selectionRoot=maprfs:/drill/testdata/table_stats/Tpch0.01/parquet/lineitem, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`**`, `l_partkey`, `l_suppkey`]]]) : rowType = RecordType(DYNAMIC_STAR **, ANY l_partkey, ANY l_suppkey): rowcount = 57068.0, cumulative cost = {57068.0 rows, 171204.0 cpu, 171204.0 io, 0.0 network, 0.0 memory}, id = 107571
00-04 Project(T11¦¦**=[$0], ps_partkey=[$1], ps_suppkey=[$2]) : rowType = RecordType(DYNAMIC_STAR T11¦¦**, ANY ps_partkey, ANY ps_suppkey): rowcount = 7474.0, cumulative cost = {14948.0 rows, 44844.0 cpu, 22422.0 io, 0.0 network, 0.0 memory}, id = 107574
00-06 Scan(table=[[dfs, drilltestdir, table_stats/Tpch0.01/parquet/partsupp]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/table_stats/Tpch0.01/parquet/partsupp]], selectionRoot=maprfs:/drill/testdata/table_stats/Tpch0.01/parquet/partsupp, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`**`, `ps_partkey`, `ps_suppkey`]]]) : rowType = RecordType(DYNAMIC_STAR **, ANY ps_partkey, ANY ps_suppkey): rowcount = 7474.0, cumulative cost = {7474.0 rows, 22422.0 cpu, 22422.0 io, 0.0 network, 0.0 memory}, id = 107573
{code}

The NDV for l_partkey = 2000
ps_partkey = 1817
l_suppkey = 100
ps_suppkey = 100

We see that such joins are just taking the max of the left-side and the right-side table row counts.
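For comparison, below is a rough sketch of the estimate the NDV statistics would drive under the standard independent-keys selectivity model. This is the textbook formula, not necessarily the exact one Drill's planner uses, and it can itself misestimate when the join keys are correlated, as they are for partsupp's composite key:

{code:java}
// Textbook NDV-based cardinality estimate for
//   l_partkey = ps_partkey AND l_suppkey = ps_suppkey
// versus the max(left, right) rowcount the plan above actually shows.
public class JoinCardinalitySketch {
  public static void main(String[] args) {
    double lineitemRows = 57068.0;
    double partsuppRows = 7474.0;

    // NDVs reported in this issue.
    double ndvLPartkey = 2000.0, ndvPsPartkey = 1817.0;
    double ndvLSuppkey = 100.0,  ndvPsSuppkey = 100.0;

    // Independent-keys model: multiply the per-condition selectivities.
    double selectivity = (1.0 / Math.max(ndvLPartkey, ndvPsPartkey))
                       * (1.0 / Math.max(ndvLSuppkey, ndvPsSuppkey));
    double statsBasedEstimate = lineitemRows * partsuppRows * selectivity;

    // Observed plan rowcount is simply max(left, right) = 57068, i.e. the stats were ignored.
    double observedPlanEstimate = Math.max(lineitemRows, partsuppRows);

    System.out.printf("stats-based estimate ~ %.0f rows, plan rowcount = %.0f rows%n",
        statsBasedEstimate, observedPlanEstimate);
  }
}
{code}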




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (DRILL-7126) Contrib format-ltsv is not being included in distribution

2019-03-21 Thread Abhishek Girish (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Girish closed DRILL-7126.
--

> Contrib format-ltsv is not being included in distribution
> -
>
> Key: DRILL-7126
> URL: https://issues.apache.org/jira/browse/DRILL-7126
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.16.0
>Reporter: Abhishek Girish
>Assignee: Abhishek Girish
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> Unable to add the ltsv format in the dfs storage plugin. Looks like it's a 
> build distribution issue. 
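> For context, this is roughly the format entry that fails to register in the dfs
> storage plugin configuration until the module is bundled (attribute names follow
> the usual Drill format-plugin pattern and are assumed, not verified against the
> ltsv module):
> {noformat}
> "formats": {
>   "ltsv": {
>     "type": "ltsv",
>     "extensions": ["ltsv"]
>   }
> }
> {noformat}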



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7110) Skip writing profile when an ALTER SESSION is executed

2019-03-21 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7110:

Labels: doc-impacting ready-to-commit  (was: doc-impacting)

> Skip writing profile when an ALTER SESSION is executed
> --
>
> Key: DRILL-7110
> URL: https://issues.apache.org/jira/browse/DRILL-7110
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Execution - Monitoring
>Affects Versions: 1.16.0
>Reporter: Kunal Khatua
>Assignee: Kunal Khatua
>Priority: Minor
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.16.0
>
>
> Currently, any {{ALTER}} query will be logged. While this is useful, it can
> potentially add up to a lot of profiles being written unnecessarily, since
> those changes are also reflected in the queries that follow.
> This JIRA proposes an option to skip writing such profiles to the profile
> store.
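> An example of the kind of statement this targets (standard Drill syntax; the new
> option name is not shown here since it is not part of this description):
> {noformat}
> ALTER SESSION SET `planner.enable_hashjoin` = false;
> {noformat}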



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7128) IllegalStateException: Read batch count [0] should be greater than zero

2019-03-21 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-7128:
-

 Summary: IllegalStateException: Read batch count [0] should be 
greater than zero
 Key: DRILL-7128
 URL: https://issues.apache.org/jira/browse/DRILL-7128
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Parquet
Affects Versions: 1.15.0
Reporter: Khurram Faraaz


Source table is a Hive table stored as parquet.
The issue is seen only when querying the datacapturekey column, which is of
VARCHAR type.

Hive 2.3
MapR Drill : 1.15.0.0-mapr 
commit id : 951ef156fb1025677a2ca2dcf84e11002bf4b513

{noformat}
0: jdbc:drill:drillbit=test.a.node1> describe bt_br_cc_invalid_leads ;
+-------------------------------------+-------------------+-------------+
| COLUMN_NAME                         | DATA_TYPE         | IS_NULLABLE |
+-------------------------------------+-------------------+-------------+
| wrapup                              | CHARACTER VARYING | YES         |
| datacapturekey                      | CHARACTER VARYING | YES         |
| leadgendate                         | CHARACTER VARYING | YES         |
| crla1                               | CHARACTER VARYING | YES         |
| crla2                               | CHARACTER VARYING | YES         |
| invalid_lead                        | INTEGER           | YES         |
| destination_advertiser_vendor_name  | CHARACTER VARYING | YES         |
| source_program_key                  | CHARACTER VARYING | YES         |
| publisher_publisher                 | CHARACTER VARYING | YES         |
| areaname                            | CHARACTER VARYING | YES         |
| data_abertura_ficha                 | CHARACTER VARYING | YES         |
+-------------------------------------+-------------------+-------------+
11 rows selected (1.85 seconds)
0: jdbc:drill:drillbit=test.a.node1>

// from the view definition, note that column datacapturekey is of type
VARCHAR with precision 2000
{
"name" : "bt_br_cc_invalid_leads",
"sql" : "SELECT CAST(`wrapup` AS VARCHAR(2000)) AS `wrapup`, 
CAST(`datacapturekey` AS VARCHAR(2000)) AS `datacapturekey`, CAST(`leadgendate` 
AS VARCHAR(2000)) AS `leadgendate`, CAST(`crla1` AS VARCHAR(2000)) AS `crla1`, 
CAST(`crla2` AS VARCHAR(2000)) AS `crla2`, CAST(`invalid_lead` AS INTEGER) AS 
`invalid_lead`, CAST(`destination_advertiser_vendor_name` AS VARCHAR(2000)) AS 
`destination_advertiser_vendor_name`, CAST(`source_program_key` AS 
VARCHAR(2000)) AS `source_program_key`, CAST(`publisher_publisher` AS 
VARCHAR(2000)) AS `publisher_publisher`, CAST(`areaname` AS VARCHAR(2000)) AS 
`areaname`, CAST(`data_abertura_ficha` AS VARCHAR(2000)) AS 
`data_abertura_ficha`\nFROM 
`dfs`.`root`.`/user/bigtable/logs/hive/warehouse/bt_br_cc_invalid_leads`",
"fields" : [ {
"name" : "wrapup",
"type" : "VARCHAR",
"precision" : 2000,
"isNullable" : true
}, {
"name" : "datacapturekey",
"type" : "VARCHAR",
"precision" : 2000,
"isNullable" : true
...
...

// total number of rows in bt_br_cc_invalid_leads
0: jdbc:drill:drillbit=test.a.node1> select count(*) from 
bt_br_cc_invalid_leads ;
+---------+
| EXPR$0  |
+---------+
| 20599   |
+---------+
1 row selected (0.173 seconds)
{noformat}

Stack trace from drillbit.log
{noformat}
2019-03-18 12:19:01,610 [237010da-6eda-a913-0424-32f63fbe01be:foreman] INFO 
o.a.drill.exec.work.foreman.Foreman - Query text for query with id 
237010da-6eda-a913-0424-32f63fbe01be issued by bigtable: SELECT 
`bt_br_cc_invalid_leads`.`datacapturekey` AS `datacapturekey`
FROM `dfs.drill_views`.`bt_br_cc_invalid_leads` `bt_br_cc_invalid_leads`
GROUP BY `bt_br_cc_invalid_leads`.`datacapturekey`

2019-03-18 12:19:02,495 [237010da-6eda-a913-0424-32f63fbe01be:frag:0:0] INFO 
o.a.d.e.w.fragment.FragmentExecutor - 237010da-6eda-a913-0424-32f63fbe01be:0:0: 
State change requested AWAITING_ALLOCATION --> RUNNING
2019-03-18 12:19:02,495 [237010da-6eda-a913-0424-32f63fbe01be:frag:0:0] INFO 
o.a.d.e.w.f.FragmentStatusReporter - 237010da-6eda-a913-0424-32f63fbe01be:0:0: 
State to report: RUNNING
2019-03-18 12:19:02,502 [237010da-6eda-a913-0424-32f63fbe01be:frag:0:0] INFO 
o.a.d.exec.physical.impl.ScanBatch - User Error Occurred: Error in parquet 
record reader.
Message:
Hadoop path: /user/bigtable/logs/hive/warehouse/bt_br_cc_invalid_leads/08_0
Total records read: 0
Row group index: 0
Records in row group: 1551
Parquet Metadata: ParquetMetaData{FileMetaData{schema: message hive_schema {
 optional binary wrapup (UTF8);
 optional binary datacapturekey (UTF8);
 optional binary leadgendate (UTF8);
 optional binary crla1 (UTF8);
 optional binary crla2 (UTF8);
 optional binary invalid_lead (UTF8);
 optional binary destination_advertiser_vendor_name (UTF8);
 optional binary source_program_key (UTF8);
 optional binary publisher_publisher (UTF8);
 optional binary areaname (UTF8);
 optional binary data_abertura_ficha (UTF8);
}
, metadata: {}}, blocks: [BlockMetaData{1551, 139906 
[ColumnMetaData{UNCOMPRESSED [wrapup] optional binary wrapup (UTF8) 
[PLAIN_DICTIONARY, RLE, BIT_PACKED], 4}, ColumnMetaData{UNCOMPRESSED 
[datacapturekey] optional binary datacapturekey (UTF8) [RLE, PLAIN, 
BIT_PACKED], 656}, ColumnMetaData{UNCOMPRESSED [leadgendate] optional binary 
leadgendate (UTF8) [PLAIN_DICTIONARY, RLE, BIT_PACKED], 23978}, 

[jira] [Assigned] (DRILL-7128) IllegalStateException: Read batch count [0] should be greater than zero

2019-03-21 Thread Khurram Faraaz (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz reassigned DRILL-7128:
-

Assignee: salim achouche

> IllegalStateException: Read batch count [0] should be greater than zero
> ---
>
> Key: DRILL-7128
> URL: https://issues.apache.org/jira/browse/DRILL-7128
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.15.0
>Reporter: Khurram Faraaz
>Assignee: salim achouche
>Priority: Major
>
> Source table is a Hive table stored as parquet.
> The issue is seen only when querying the datacapturekey column, which is of
> VARCHAR type.
> Hive 2.3
> MapR Drill : 1.15.0.0-mapr 
> commit id : 951ef156fb1025677a2ca2dcf84e11002bf4b513
> {noformat}
> 0: jdbc:drill:drillbit=test.a.node1> describe bt_br_cc_invalid_leads ;
> +-------------------------------------+-------------------+-------------+
> | COLUMN_NAME                         | DATA_TYPE         | IS_NULLABLE |
> +-------------------------------------+-------------------+-------------+
> | wrapup                              | CHARACTER VARYING | YES         |
> | datacapturekey                      | CHARACTER VARYING | YES         |
> | leadgendate                         | CHARACTER VARYING | YES         |
> | crla1                               | CHARACTER VARYING | YES         |
> | crla2                               | CHARACTER VARYING | YES         |
> | invalid_lead                        | INTEGER           | YES         |
> | destination_advertiser_vendor_name  | CHARACTER VARYING | YES         |
> | source_program_key                  | CHARACTER VARYING | YES         |
> | publisher_publisher                 | CHARACTER VARYING | YES         |
> | areaname                            | CHARACTER VARYING | YES         |
> | data_abertura_ficha                 | CHARACTER VARYING | YES         |
> +-------------------------------------+-------------------+-------------+
> 11 rows selected (1.85 seconds)
> 0: jdbc:drill:drillbit=test.a.node1>
> // from the view definition, note that column datacapturekey is of type
> VARCHAR with precision 2000
> {
> "name" : "bt_br_cc_invalid_leads",
> "sql" : "SELECT CAST(`wrapup` AS VARCHAR(2000)) AS `wrapup`, 
> CAST(`datacapturekey` AS VARCHAR(2000)) AS `datacapturekey`, 
> CAST(`leadgendate` AS VARCHAR(2000)) AS `leadgendate`, CAST(`crla1` AS 
> VARCHAR(2000)) AS `crla1`, CAST(`crla2` AS VARCHAR(2000)) AS `crla2`, 
> CAST(`invalid_lead` AS INTEGER) AS `invalid_lead`, 
> CAST(`destination_advertiser_vendor_name` AS VARCHAR(2000)) AS 
> `destination_advertiser_vendor_name`, CAST(`source_program_key` AS 
> VARCHAR(2000)) AS `source_program_key`, CAST(`publisher_publisher` AS 
> VARCHAR(2000)) AS `publisher_publisher`, CAST(`areaname` AS VARCHAR(2000)) AS 
> `areaname`, CAST(`data_abertura_ficha` AS VARCHAR(2000)) AS 
> `data_abertura_ficha`\nFROM 
> `dfs`.`root`.`/user/bigtable/logs/hive/warehouse/bt_br_cc_invalid_leads`",
> "fields" : [ {
> "name" : "wrapup",
> "type" : "VARCHAR",
> "precision" : 2000,
> "isNullable" : true
> }, {
> "name" : "datacapturekey",
> "type" : "VARCHAR",
> "precision" : 2000,
> "isNullable" : true
> ...
> ...
> // total number of rows in bt_br_cc_invalid_leads
> 0: jdbc:drill:drillbit=test.a.node1> select count(*) from 
> bt_br_cc_invalid_leads ;
> +---------+
> | EXPR$0  |
> +---------+
> | 20599   |
> +---------+
> 1 row selected (0.173 seconds)
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2019-03-18 12:19:01,610 [237010da-6eda-a913-0424-32f63fbe01be:foreman] INFO 
> o.a.drill.exec.work.foreman.Foreman - Query text for query with id 
> 237010da-6eda-a913-0424-32f63fbe01be issued by bigtable: SELECT 
> `bt_br_cc_invalid_leads`.`datacapturekey` AS `datacapturekey`
> FROM `dfs.drill_views`.`bt_br_cc_invalid_leads` `bt_br_cc_invalid_leads`
> GROUP BY `bt_br_cc_invalid_leads`.`datacapturekey`
> 2019-03-18 12:19:02,495 [237010da-6eda-a913-0424-32f63fbe01be:frag:0:0] INFO 
> o.a.d.e.w.fragment.FragmentExecutor - 
> 237010da-6eda-a913-0424-32f63fbe01be:0:0: State change requested 
> AWAITING_ALLOCATION --> RUNNING
> 2019-03-18 12:19:02,495 [237010da-6eda-a913-0424-32f63fbe01be:frag:0:0] INFO 
> o.a.d.e.w.f.FragmentStatusReporter - 
> 237010da-6eda-a913-0424-32f63fbe01be:0:0: State to report: RUNNING
> 2019-03-18 12:19:02,502 [237010da-6eda-a913-0424-32f63fbe01be:frag:0:0] INFO 
> o.a.d.exec.physical.impl.ScanBatch - User Error Occurred: Error in parquet 
> record reader.
> Message:
> Hadoop path: 
> /user/bigtable/logs/hive/warehouse/bt_br_cc_invalid_leads/08_0
> Total records read: 0
> Row group index: 0
> Records in row group: 1551
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message hive_schema {
>  optional binary wrapup (UTF8);
>  optional binary datacapturekey (UTF8);
>  optional binary leadgendate (UTF8);
>  optional binary crla1 (UTF8);
>  optional binary crla2 (UTF8);
>  optional binary invalid_lead (UTF8);
>  optional binary destination_advertiser_vendor_name (UTF8);
>  optional binary source_program_key (UTF8);
>  optional binary publisher_publisher (UTF8);
>  optional binary 

[jira] [Updated] (DRILL-7118) Filter not getting pushed down on MapR-DB tables.

2019-03-21 Thread Pritesh Maker (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pritesh Maker updated DRILL-7118:
-
Reviewer: Aman Sinha

> Filter not getting pushed down on MapR-DB tables.
> -
>
> Key: DRILL-7118
> URL: https://issues.apache.org/jira/browse/DRILL-7118
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.15.0
>Reporter: Hanumath Rao Maduri
>Assignee: Hanumath Rao Maduri
>Priority: Major
> Fix For: 1.16.0
>
>
> A simple IS NULL filter is not being pushed down for MapR-DB tables. Here is
> the repro:
> {code:java}
> 0: jdbc:drill:zk=local> explain plan for select * from dfs.`/tmp/js` where b 
> is null;
> ANTLR Tool version 4.5 used for code generation does not match the current 
> runtime version 4.7.1ANTLR Runtime version 4.5 used for parser compilation 
> does not match the current runtime version 4.7.1ANTLR Tool version 4.5 used 
> for code generation does not match the current runtime version 4.7.1ANTLR 
> Runtime version 4.5 used for parser compilation does not match the current 
> runtime version 4.7.1
> +------+------+
> | text | json |
> +------+------+
> | 00-00 Screen
> 00-01 Project(**=[$0])
> 00-02 Project(T0¦¦**=[$0])
> 00-03 SelectionVectorRemover
> 00-04 Filter(condition=[IS NULL($1)])
> 00-05 Project(T0¦¦**=[$0], b=[$1])
> 00-06 Scan(table=[[dfs, /tmp/js]], groupscan=[JsonTableGroupScan 
> [ScanSpec=JsonScanSpec [tableName=/tmp/js, condition=null], columns=[`**`, 
> `b`], maxwidth=1]])
> {code}
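> For comparison, a sketch of the plan shape one would expect once the filter is
> pushed into the MapR-DB scan (illustrative only; the exact JsonScanSpec
> condition rendering may differ):
> {noformat}
> 00-00 Screen
> 00-01   Project(**=[$0])
> 00-02     Scan(table=[[dfs, /tmp/js]], groupscan=[JsonTableGroupScan
>             [ScanSpec=JsonScanSpec [tableName=/tmp/js, condition=(b = null)],
>             columns=[`**`, `b`], maxwidth=1]])
> {noformat}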



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6562) Plugin Management improvements

2019-03-21 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6562:

Labels: doc-impacting ready-to-commit  (was: doc-impacting)

> Plugin Management improvements
> --
>
> Key: DRILL-6562
> URL: https://issues.apache.org/jira/browse/DRILL-6562
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Client - HTTP, Web Server
>Affects Versions: 1.14.0
>Reporter: Abhishek Girish
>Assignee: Vitalii Diravka
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.16.0
>
> Attachments: Export.png, ExportAll.png, Screenshot from 2019-03-21 
> 01-18-17.png, Screenshot from 2019-03-21 02-52-50.png, Storage.png, 
> UpdateExport.png, create.png, image-2018-07-23-02-55-02-024.png, 
> image-2018-10-22-20-20-24-658.png, image-2018-10-22-20-20-59-105.png
>
>
> Follow-up to DRILL-4580.
> Provide the ability to export all storage plugin configurations at once, with a
> new "Export All" option on the Storage page of the Drill web UI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7127) Update hbase version for mapr profile

2019-03-21 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-7127:
-
Labels: ready-to-commit  (was: )

> Update hbase version for mapr profile
> -
>
> Key: DRILL-7127
> URL: https://issues.apache.org/jira/browse/DRILL-7127
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - HBase, Tools, Build & Test
>Affects Versions: 1.16.0
>Reporter: Abhishek Girish
>Assignee: Abhishek Girish
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> The current hbase version for the mapr profile is {{1.1.1-mapr-1602-m7-5.2.0}},
> which is over 3 years old. It needs to be updated to {{1.1.8-mapr-1808}}.
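> A sketch of the corresponding build change (the {{mapr}} profile id matches the
> issue; the exact property name that carries the HBase version in the root
> pom.xml is assumed):
> {noformat}
> <profile>
>   <id>mapr</id>
>   <properties>
>     <!-- was 1.1.1-mapr-1602-m7-5.2.0 -->
>     <hbase.version>1.1.8-mapr-1808</hbase.version>
>   </properties>
> </profile>
> {noformat}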



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7126) Contrib format-ltsv is not being included in distribution

2019-03-21 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-7126:
-
Labels: ready-to-commit  (was: )

> Contrib format-ltsv is not being included in distribution
> -
>
> Key: DRILL-7126
> URL: https://issues.apache.org/jira/browse/DRILL-7126
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Affects Versions: 1.16.0
>Reporter: Abhishek Girish
>Assignee: Abhishek Girish
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> Unable to add the ltsv format in the dfs storage plugin. Looks like it's a 
> build distribution issue. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)