[jira] [Updated] (DRILL-7161) Aggregation with group by clause

2019-04-09 Thread Gayathri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gayathri updated DRILL-7161:

Priority: Blocker  (was: Critical)

> Aggregation with group by clause
> 
>
> Key: DRILL-7161
> URL: https://issues.apache.org/jira/browse/DRILL-7161
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.14.0
>Reporter: Gayathri
>Priority: Blocker
>  Labels: Drill, issue
>
> We are facing an issue with the following case:
> A JSON file (*sample.json*) has the following content:
> {"a":2,"b":null}
> {"a":2,"b":null}
> {"a":3,"b":null}
> {"a":4,"b":null}
> *Query:*
> SELECT a, sum(b) FROM dfs.`C:\\Users\\user\\Desktop\\sample.json` group by a;
> *Error:*
> UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate functions 
> supported for VarChar type
> *Observation:*
> If we query without GROUP BY, it works fine without any error. If GROUP BY is
> used, the SUM over the all-null column throws the above error.
>  
> Can anyone please let us know the solution for this, or whether there is an
> alternative?
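> 
> A possible workaround (an untested sketch, assuming the all-null column is
> being typed as VarChar, per the error above) is to cast the column to a
> numeric type before aggregating:
> {code:sql}
> -- Sketch: cast the all-null column b to a numeric type so that SUM
> -- sees a supported type instead of VarChar.
> SELECT a, SUM(CAST(b AS DOUBLE)) AS sum_b
> FROM dfs.`C:\\Users\\user\\Desktop\\sample.json`
> GROUP BY a{code}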



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7077) Add Function to Facilitate Time Series Analysis

2019-04-09 Thread Bridget Bevens (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens updated DRILL-7077:
--
Labels: doc-complete ready-to-commit  (was: doc-impacting ready-to-commit)

> Add Function to Facilitate Time Series Analysis
> ---
>
> Key: DRILL-7077
> URL: https://issues.apache.org/jira/browse/DRILL-7077
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-complete, ready-to-commit
> Fix For: 1.16.0
>
>
> When analyzing time-based data, you will often have to aggregate by time
> grains. While some time grains are easy to calculate, others, such as
> quarter, can be quite difficult. These functions enable a user to quickly and
> easily aggregate data by various units of time. Usage is as follows:
> {code:java}
> SELECT <fields>
> FROM <table>
> GROUP BY nearestDate(<date field>, <time unit>){code}
> So, say a user wanted to count the number of hits on a web server per 15
> minutes; the query might look like this:
> {code:java}
> SELECT nearestDate(`eventDate`, '15MINUTE' ) AS eventDate,
> COUNT(*) AS hitCount
> FROM dfs.`log.httpd`
> GROUP BY nearestDate(`eventDate`, '15MINUTE'){code}
> The function currently supports the following time units (see the quarter example after this list):
>  * YEAR
>  * QUARTER
>  * MONTH
>  * WEEK_SUNDAY
>  * WEEK_MONDAY
>  * DAY
>  * HOUR
>  * HALF_HOUR / 30MIN
>  * QUARTER_HOUR / 15MIN
>  * MINUTE
>  * 30SECOND
>  * 15SECOND
>  * SECOND
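> 
> For example, a sketch that counts hits per calendar quarter, using the same
> hypothetical table as above:
> {code:sql}
> -- Sketch: aggregate web server hits by the QUARTER grain.
> SELECT nearestDate(`eventDate`, 'QUARTER') AS eventQuarter,
> COUNT(*) AS hitCount
> FROM dfs.`log.httpd`
> GROUP BY nearestDate(`eventDate`, 'QUARTER'){code}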
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7077) Add Function to Facilitate Time Series Analysis

2019-04-09 Thread Bridget Bevens (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813987#comment-16813987
 ] 

Bridget Bevens commented on DRILL-7077:
---

[~cgivre] the info is posted here: 
https://drill.apache.org/docs/date-time-functions-and-arithmetic/#nearestdate 
Let me know if I need to change anything.

Thanks,
Bridget

> Add Function to Facilitate Time Series Analysis
> ---
>
> Key: DRILL-7077
> URL: https://issues.apache.org/jira/browse/DRILL-7077
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.16.0
>
>
> When analyzing time-based data, you will often have to aggregate by time
> grains. While some time grains are easy to calculate, others, such as
> quarter, can be quite difficult. These functions enable a user to quickly and
> easily aggregate data by various units of time. Usage is as follows:
> {code:java}
> SELECT <fields>
> FROM <table>
> GROUP BY nearestDate(<date field>, <time unit>){code}
> So, say a user wanted to count the number of hits on a web server per 15
> minutes; the query might look like this:
> {code:java}
> SELECT nearestDate(`eventDate`, '15MINUTE' ) AS eventDate,
> COUNT(*) AS hitCount
> FROM dfs.`log.httpd`
> GROUP BY nearestDate(`eventDate`, '15MINUTE'){code}
> The function currently supports the following time units:
>  * YEAR
>  * QUARTER
>  * MONTH
>  * WEEK_SUNDAY
>  * WEEK_MONDAY
>  * DAY
>  * HOUR
>  * HALF_HOUR / 30MIN
>  * QUARTER_HOUR / 15MIN
>  * MINUTE
>  * 30SECOND
>  * 15SECOND
>  * SECOND
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7064) Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files)

2019-04-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813963#comment-16813963
 ] 

ASF GitHub Bot commented on DRILL-7064:
---

sohami commented on pull request #1736: DRILL-7064: Leverage the summary 
metadata for plain COUNT aggregates.
URL: https://github.com/apache/drill/pull/1736
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Leverage the summary's totalRowCount and totalNullCount for COUNT() queries 
> (also prevent eager expansion of files)
> ---
>
> Key: DRILL-7064
> URL: https://issues.apache.org/jira/browse/DRILL-7064
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Aman Sinha
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> This sub-task is meant to leverage the Parquet metadata cache's summary 
> stats: totalRowCount (across all files and row groups) and the per-column 
> totalNullCount (across all files and row groups) to answer plain COUNT 
> aggregation queries without Group-By. Such queries are currently converted to a
> DirectScan by the ConvertCountToDirectScanRule, which utilizes the row-group
> metadata; however, this rule is applied on Drill logical rels and converts the
> logical plan to a physical plan with a DirectScanPrel. This is too late, since
> the DrillScanRel created during logical planning has already read the entire
> metadata cache file along with its full list of row-group entries. The metadata
> cache file can grow quite large, and this does not scale.
> The solution is to use the Metadata Summary file that is created in
> DRILL-7063 and to create a new rule that applies early on, such that it
> operates on the Calcite logical rels instead of the Drill logical rels and
> prevents eager expansion of the list of files/row groups.
> We will not remove the existing rule; it will continue to operate as before,
> because it is possible that after some transformations we still want to apply
> the optimizations for COUNT queries.
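> 
> For illustration, the class of queries this optimization targets (plain COUNT
> aggregates without Group-By) looks like the following sketch; the table path
> is hypothetical:
> {code:sql}
> -- COUNT(*) can be answered from the summary's totalRowCount;
> -- COUNT(col1) from totalRowCount minus that column's totalNullCount.
> SELECT COUNT(*) AS total_rows,
> COUNT(col1) AS non_null_col1
> FROM dfs.`/data/parquet_table`{code}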



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7089) Implement caching of BaseMetadata classes

2019-04-09 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-7089:
-
Component/s: Metadata

> Implement caching of BaseMetadata classes
> -
>
> Key: DRILL-7089
> URL: https://issues.apache.org/jira/browse/DRILL-7089
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> In the scope of DRILL-6852, new classes for metadata usage were introduced.
> These classes may be reused in other GroupScan instances to reduce heap
> usage when metadata is large.
> The idea is to store {{BaseMetadata}} inheritors in {{DrillTable}} and pass 
> them to the {{GroupScan}}, so in the scope of the single query, it will be 
> possible to reuse them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7089) Implement caching of BaseMetadata classes

2019-04-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813964#comment-16813964
 ] 

ASF GitHub Bot commented on DRILL-7089:
---

sohami commented on pull request #1728: DRILL-7089: Implement caching for 
TableMetadataProvider at query level and adapt statistics to use Drill 
metastore API
URL: https://github.com/apache/drill/pull/1728
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Implement caching of BaseMetadata classes
> -
>
> Key: DRILL-7089
> URL: https://issues.apache.org/jira/browse/DRILL-7089
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> In the scope of DRILL-6852, new classes for metadata usage were introduced.
> These classes may be reused in other GroupScan instances to reduce heap
> usage when metadata is large.
> The idea is to store {{BaseMetadata}} inheritors in {{DrillTable}} and pass 
> them to the {{GroupScan}}, so in the scope of the single query, it will be 
> possible to reuse them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7077) Add Function to Facilitate Time Series Analysis

2019-04-09 Thread Bridget Bevens (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813895#comment-16813895
 ] 

Bridget Bevens commented on DRILL-7077:
---

Hi [~cgivre],

I'm trying this function and may be doing something wrong, but 15SECOND and 
30SECOND are not working for me:

select nearestdate(CAST(COLUMNS[2] as timestamp), '30SECOND') as nearest_second 
from dfs.samples.`/bee/time.csv`;

Error: SYSTEM ERROR: DrillRuntimeException: [30SECOND] is not a valid time 
statement. Expecting: [YEAR, QUARTER, MONTH, WEEK_SUNDAY, WEEK_MONDAY, DAY, 
HOUR, HALF_HOUR, QUARTER_HOUR, MINUTE, HALF_MINUTE, QUARTER_MINUTE, SECOND]

Fragment 0:0

Please, refer to logs for more information.

[Error Id: f119202e-ec24-4670-83c2-14b4a7f83ebf on doc23.lab:31010] 
(state=,code=0)

apache drill> select nearestdate(CAST(COLUMNS[2] as timestamp), 'SECOND') as 
nearest_second from dfs.samples.`/bee/time.csv`;
+---+
|nearest_second |
+---+
| 2018-01-01 05:10:15.0 |
| 2017-02-02 01:02:03.0 |
| 2003-04-06 07:11:11.0 |
+---+
3 rows selected (0.191 seconds)  

Are 15SECOND and 30SECOND supported?
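
Based on the error message, HALF_MINUTE and QUARTER_MINUTE appear to be the
accepted tokens for those grains, so an untested sketch of the 30-second query
would be:

select nearestdate(CAST(COLUMNS[2] as timestamp), 'HALF_MINUTE') as
nearest_half_minute from dfs.samples.`/bee/time.csv`;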

Thanks,
Bridget


> Add Function to Facilitate Time Series Analysis
> ---
>
> Key: DRILL-7077
> URL: https://issues.apache.org/jira/browse/DRILL-7077
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.16.0
>
>
> When analyzing time-based data, you will often have to aggregate by time
> grains. While some time grains are easy to calculate, others, such as
> quarter, can be quite difficult. These functions enable a user to quickly and
> easily aggregate data by various units of time. Usage is as follows:
> {code:java}
> SELECT <fields>
> FROM <table>
> GROUP BY nearestDate(<date field>, <time unit>){code}
> So, say a user wanted to count the number of hits on a web server per 15
> minutes; the query might look like this:
> {code:java}
> SELECT nearestDate(`eventDate`, '15MINUTE' ) AS eventDate,
> COUNT(*) AS hitCount
> FROM dfs.`log.httpd`
> GROUP BY nearestDate(`eventDate`, '15MINUTE'){code}
> The function currently supports the following time units:
>  * YEAR
>  * QUARTER
>  * MONTH
>  * WEEK_SUNDAY
>  * WEEK_MONDAY
>  * DAY
>  * HOUR
>  * HALF_HOUR / 30MIN
>  * QUARTER_HOUR / 15MIN
>  * MINUTE
>  * 30SECOND
>  * 15SECOND
>  * SECOND
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7164) KafkaFilterPushdownTest is sometimes failing to pattern match correctly.

2019-04-09 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-7164:
---
Description: 
On my private build, I am intermittently hitting a Kafka storage test failure.
Here is the issue I came across.
{code}
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]
15:01:39.852 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: -292 
B(75.4 KiB), h: -391.1 MiB(240.7 MiB), nh: 824.5 KiB(129.0 MiB)): 
testPushdownOffsetOneRecordReturnedWithBoundaryConditions(org.apache.drill.exec.store.kafka.KafkaFilterPushdownTest)
java.lang.AssertionError: Unable to find expected string "kafkaScanSpec" : {
  "topicName" : "drill-pushdown-topic"
},
.*
.*
"cost" in plan: {
  "head" : {
"version" : 1,
"generator" : {
  "type" : "ExplainHandler",
  "info" : ""
},
"type" : "APACHE_DRILL_PHYSICAL",
"options" : [ {
  "kind" : "STRING",
  "accessibleScopes" : "ALL",
  "name" : "store.kafka.record.reader",
  "string_val" : 
"org.apache.drill.exec.store.kafka.decoders.JsonMessageReader",
  "scope" : "SESSION"
}, {
  "kind" : "BOOLEAN",
  "accessibleScopes" : "ALL",
  "name" : "exec.errors.verbose",
  "bool_val" : true,
  "scope" : "SESSION"
}, {
  "kind" : "LONG",
  "accessibleScopes" : "ALL",
  "name" : "store.kafka.poll.timeout",
  "num_val" : 5000,
  "scope" : "SESSION"
}, {
  "kind" : "LONG",
  "accessibleScopes" : "ALL",
  "name" : "planner.width.max_per_node",
  "num_val" : 2,
  "scope" : "SESSION"
} ],
"queue" : 0,
"hasResourcePlan" : false,
"resultMode" : "EXEC"
  },
  "graph" : [ {
"pop" : "kafka-scan",
"@id" : 6,
"userName" : "",
"kafkaStoragePluginConfig" : {
  "type" : "kafka",
  "kafkaConsumerProps" : {
"bootstrap.servers" : "127.0.0.1:56524",
"group.id" : "drill-test-consumer"
  },
  "enabled" : true
},
"columns" : [ "`**`", "`kafkaMsgOffset`" ],
"kafkaScanSpec" : {
  "topicName" : "drill-pushdown-topic"
},
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 5.0
}
  }, {
"pop" : "project",
"@id" : 5,
"exprs" : [ {
  "ref" : "`T23¦¦**`",
  "expr" : "`**`"
}, {
  "ref" : "`kafkaMsgOffset`",
  "expr" : "`kafkaMsgOffset`"
} ],
"child" : 6,
"outputProj" : false,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 5.0
}
  }, {
"pop" : "filter",
"@id" : 4,
"child" : 5,
"expr" : "equal(`kafkaMsgOffset`, 9) ",
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 0.75
}
  }, {
"pop" : "selection-vector-remover",
"@id" : 3,
"child" : 4,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  }, {
"pop" : "project",
"@id" : 2,
"exprs" : [ {
  "ref" : "`T23¦¦**`",
  "expr" : "`T23¦¦**`"
} ],
"child" : 3,
"outputProj" : false,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  }, {
"pop" : "project",
"@id" : 1,
"exprs" : [ {
  "ref" : "`**`",
  "expr" : "`T23¦¦**`"
} ],
"child" : 2,
"outputProj" : true,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  }, {
"pop" : "screen",
"@id" : 0,
"child" : 1,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  } ]
}!
{code}

In an earlier check-in (d22e68b83d1d0cc0539d79ae0cb3aa70ae3242ad) there is a
change in the way cost is represented. It also changed the test, which I think
is not right. The pattern compared against the plan should be made smarter to
fix this issue generically.

  was:
On my private build, I am intermittently hitting a Kafka storage test failure.
Here is the issue I came across.
{code}
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]
15:01:39.852 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: -292 
B(75.4 KiB), h: -391.1 MiB(240.7 MiB), nh: 824.5 KiB(129.0 MiB)): 
testPushdownOffsetOneRecordReturnedWithBoundaryConditions(org.apache.drill.exec.store.kafka.KafkaFilterPushdownTest)
java.lang.AssertionError: Unable to find expected string "kafkaScanSpec" : {
  "topicName" : "drill-pushdown-topic"
},
.*
.*
"cost" 

[jira] [Updated] (DRILL-7164) KafkaFilterPushdownTest is sometimes failing to pattern match correctly.

2019-04-09 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-7164:
---
Description: 
On my private build, I am intermittently hitting a Kafka storage test failure.
Here is the issue I came across.
{code}
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]
15:01:39.852 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: -292 
B(75.4 KiB), h: -391.1 MiB(240.7 MiB), nh: 824.5 KiB(129.0 MiB)): 
testPushdownOffsetOneRecordReturnedWithBoundaryConditions(org.apache.drill.exec.store.kafka.KafkaFilterPushdownTest)
java.lang.AssertionError: Unable to find expected string "kafkaScanSpec" : {
  "topicName" : "drill-pushdown-topic"
},
"cost" in plan: {
  "head" : {
"version" : 1,
"generator" : {
  "type" : "ExplainHandler",
  "info" : ""
},
"type" : "APACHE_DRILL_PHYSICAL",
"options" : [ {
  "kind" : "STRING",
  "accessibleScopes" : "ALL",
  "name" : "store.kafka.record.reader",
  "string_val" : 
"org.apache.drill.exec.store.kafka.decoders.JsonMessageReader",
  "scope" : "SESSION"
}, {
  "kind" : "BOOLEAN",
  "accessibleScopes" : "ALL",
  "name" : "exec.errors.verbose",
  "bool_val" : true,
  "scope" : "SESSION"
}, {
  "kind" : "LONG",
  "accessibleScopes" : "ALL",
  "name" : "store.kafka.poll.timeout",
  "num_val" : 5000,
  "scope" : "SESSION"
}, {
  "kind" : "LONG",
  "accessibleScopes" : "ALL",
  "name" : "planner.width.max_per_node",
  "num_val" : 2,
  "scope" : "SESSION"
} ],
"queue" : 0,
"hasResourcePlan" : false,
"resultMode" : "EXEC"
  },
  "graph" : [ {
"pop" : "kafka-scan",
"@id" : 6,
"userName" : "",
"kafkaStoragePluginConfig" : {
  "type" : "kafka",
  "kafkaConsumerProps" : {
"bootstrap.servers" : "127.0.0.1:56524",
"group.id" : "drill-test-consumer"
  },
  "enabled" : true
},
"columns" : [ "`**`", "`kafkaMsgOffset`" ],
"kafkaScanSpec" : {
  "topicName" : "drill-pushdown-topic"
},
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 5.0
}
  }, {
"pop" : "project",
"@id" : 5,
"exprs" : [ {
  "ref" : "`T23¦¦**`",
  "expr" : "`**`"
}, {
  "ref" : "`kafkaMsgOffset`",
  "expr" : "`kafkaMsgOffset`"
} ],
"child" : 6,
"outputProj" : false,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 5.0
}
  }, {
"pop" : "filter",
"@id" : 4,
"child" : 5,
"expr" : "equal(`kafkaMsgOffset`, 9) ",
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 0.75
}
  }, {
"pop" : "selection-vector-remover",
"@id" : 3,
"child" : 4,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  }, {
"pop" : "project",
"@id" : 2,
"exprs" : [ {
  "ref" : "`T23¦¦**`",
  "expr" : "`T23¦¦**`"
} ],
"child" : 3,
"outputProj" : false,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  }, {
"pop" : "project",
"@id" : 1,
"exprs" : [ {
  "ref" : "`**`",
  "expr" : "`T23¦¦**`"
} ],
"child" : 2,
"outputProj" : true,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  }, {
"pop" : "screen",
"@id" : 0,
"child" : 1,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  } ]
}!
{code}

In an earlier check-in (d22e68b83d1d0cc0539d79ae0cb3aa70ae3242ad) there is a
change in the way cost is represented. It also changed the test, which I think
is not right. The pattern compared against the plan should be made smarter to
fix this issue generically.

  was:
On my private build, I am intermittently hitting a Kafka storage test failure.
Here is the issue I came across.
{code}
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]
15:01:39.852 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: -292 
B(75.4 KiB), h: -391.1 MiB(240.7 MiB), nh: 824.5 KiB(129.0 MiB)): 
testPushdownOffsetOneRecordReturnedWithBoundaryConditions(org.apache.drill.exec.store.kafka.KafkaFilterPushdownTest)
java.lang.AssertionError: Unable to find expected string "kafkaScanSpec" : {
  "topicName" : "drill-pushdown-topic"
},
.*
.*
"cost" in 

[jira] [Updated] (DRILL-7164) KafkaFilterPushdownTest is sometimes failing to pattern match correctly.

2019-04-09 Thread Hanumath Rao Maduri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hanumath Rao Maduri updated DRILL-7164:
---
Description: 
On my private build, I am intermittently hitting a Kafka storage test failure.
Here is the issue I came across.
{code}
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]
15:01:39.852 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: -292 
B(75.4 KiB), h: -391.1 MiB(240.7 MiB), nh: 824.5 KiB(129.0 MiB)): 
testPushdownOffsetOneRecordReturnedWithBoundaryConditions(org.apache.drill.exec.store.kafka.KafkaFilterPushdownTest)
java.lang.AssertionError: Unable to find expected string "kafkaScanSpec" : {
  "topicName" : "drill-pushdown-topic"
},
.*
.*
"cost" in plan: {
  "head" : {
"version" : 1,
"generator" : {
  "type" : "ExplainHandler",
  "info" : ""
},
"type" : "APACHE_DRILL_PHYSICAL",
"options" : [ {
  "kind" : "STRING",
  "accessibleScopes" : "ALL",
  "name" : "store.kafka.record.reader",
  "string_val" : 
"org.apache.drill.exec.store.kafka.decoders.JsonMessageReader",
  "scope" : "SESSION"
}, {
  "kind" : "BOOLEAN",
  "accessibleScopes" : "ALL",
  "name" : "exec.errors.verbose",
  "bool_val" : true,
  "scope" : "SESSION"
}, {
  "kind" : "LONG",
  "accessibleScopes" : "ALL",
  "name" : "store.kafka.poll.timeout",
  "num_val" : 5000,
  "scope" : "SESSION"
}, {
  "kind" : "LONG",
  "accessibleScopes" : "ALL",
  "name" : "planner.width.max_per_node",
  "num_val" : 2,
  "scope" : "SESSION"
} ],
"queue" : 0,
"hasResourcePlan" : false,
"resultMode" : "EXEC"
  },
  "graph" : [ {
"pop" : "kafka-scan",
"@id" : 6,
"userName" : "",
"kafkaStoragePluginConfig" : {
  "type" : "kafka",
  "kafkaConsumerProps" : {
"bootstrap.servers" : "127.0.0.1:56524",
"group.id" : "drill-test-consumer"
  },
  "enabled" : true
},
"columns" : [ "`**`", "`kafkaMsgOffset`" ],
"kafkaScanSpec" : {
  "topicName" : "drill-pushdown-topic"
},
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 5.0
}
  }, {
"pop" : "project",
"@id" : 5,
"exprs" : [ {
  "ref" : "`T23¦¦**`",
  "expr" : "`**`"
}, {
  "ref" : "`kafkaMsgOffset`",
  "expr" : "`kafkaMsgOffset`"
} ],
"child" : 6,
"outputProj" : false,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 5.0
}
  }, {
"pop" : "filter",
"@id" : 4,
"child" : 5,
"expr" : "equal(`kafkaMsgOffset`, 9) ",
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 0.75
}
  }, {
"pop" : "selection-vector-remover",
"@id" : 3,
"child" : 4,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  }, {
"pop" : "project",
"@id" : 2,
"exprs" : [ {
  "ref" : "`T23¦¦**`",
  "expr" : "`T23¦¦**`"
} ],
"child" : 3,
"outputProj" : false,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  }, {
"pop" : "project",
"@id" : 1,
"exprs" : [ {
  "ref" : "`**`",
  "expr" : "`T23¦¦**`"
} ],
"child" : 2,
"outputProj" : true,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  }, {
"pop" : "screen",
"@id" : 0,
"child" : 1,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  } ]
}!
{code}

In an earlier check-in (d22e68b83d1d0cc0539d79ae0cb3aa70ae3242ad) there is a
change in the way cost is represented. This changed the test, which I think
is not right. The pattern compared against the plan should be made smarter to
fix this issue generically.

  was:
On my private build, I am intermittently hitting a Kafka storage test failure.
Here is the issue I came across.
{code}
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]
15:01:39.852 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: -292 
B(75.4 KiB), h: -391.1 MiB(240.7 MiB), nh: 824.5 KiB(129.0 MiB)): 
testPushdownOffsetOneRecordReturnedWithBoundaryConditions(org.apache.drill.exec.store.kafka.KafkaFilterPushdownTest)
java.lang.AssertionError: Unable to find expected string "kafkaScanSpec" : {
  "topicName" : "drill-pushdown-topic"
},
.*
.*
"cost" in 

[jira] [Created] (DRILL-7164) KafkaFilterPushdownTest is sometimes failing to pattern match correctly.

2019-04-09 Thread Hanumath Rao Maduri (JIRA)
Hanumath Rao Maduri created DRILL-7164:
--

 Summary: KafkaFilterPushdownTest is sometimes failing to pattern 
match correctly.
 Key: DRILL-7164
 URL: https://issues.apache.org/jira/browse/DRILL-7164
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Kafka
Affects Versions: 1.16.0
Reporter: Hanumath Rao Maduri
Assignee: Abhishek Ravi
 Fix For: 1.17.0


On my private build, I am intermittently hitting a Kafka storage test failure.
Here is the issue I came across.
{code}
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_91]
15:01:39.852 [main] ERROR org.apache.drill.TestReporter - Test Failed (d: -292 
B(75.4 KiB), h: -391.1 MiB(240.7 MiB), nh: 824.5 KiB(129.0 MiB)): 
testPushdownOffsetOneRecordReturnedWithBoundaryConditions(org.apache.drill.exec.store.kafka.KafkaFilterPushdownTest)
java.lang.AssertionError: Unable to find expected string "kafkaScanSpec" : {
  "topicName" : "drill-pushdown-topic"
},
.*
.*
"cost" in plan: {
  "head" : {
"version" : 1,
"generator" : {
  "type" : "ExplainHandler",
  "info" : ""
},
"type" : "APACHE_DRILL_PHYSICAL",
"options" : [ {
  "kind" : "STRING",
  "accessibleScopes" : "ALL",
  "name" : "store.kafka.record.reader",
  "string_val" : 
"org.apache.drill.exec.store.kafka.decoders.JsonMessageReader",
  "scope" : "SESSION"
}, {
  "kind" : "BOOLEAN",
  "accessibleScopes" : "ALL",
  "name" : "exec.errors.verbose",
  "bool_val" : true,
  "scope" : "SESSION"
}, {
  "kind" : "LONG",
  "accessibleScopes" : "ALL",
  "name" : "store.kafka.poll.timeout",
  "num_val" : 5000,
  "scope" : "SESSION"
}, {
  "kind" : "LONG",
  "accessibleScopes" : "ALL",
  "name" : "planner.width.max_per_node",
  "num_val" : 2,
  "scope" : "SESSION"
} ],
"queue" : 0,
"hasResourcePlan" : false,
"resultMode" : "EXEC"
  },
  "graph" : [ {
"pop" : "kafka-scan",
"@id" : 6,
"userName" : "",
"kafkaStoragePluginConfig" : {
  "type" : "kafka",
  "kafkaConsumerProps" : {
"bootstrap.servers" : "127.0.0.1:56524",
"group.id" : "drill-test-consumer"
  },
  "enabled" : true
},
"columns" : [ "`**`", "`kafkaMsgOffset`" ],
"kafkaScanSpec" : {
  "topicName" : "drill-pushdown-topic"
},
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 5.0
}
  }, {
"pop" : "project",
"@id" : 5,
"exprs" : [ {
  "ref" : "`T23¦¦**`",
  "expr" : "`**`"
}, {
  "ref" : "`kafkaMsgOffset`",
  "expr" : "`kafkaMsgOffset`"
} ],
"child" : 6,
"outputProj" : false,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 5.0
}
  }, {
"pop" : "filter",
"@id" : 4,
"child" : 5,
"expr" : "equal(`kafkaMsgOffset`, 9) ",
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 0.75
}
  }, {
"pop" : "selection-vector-remover",
"@id" : 3,
"child" : 4,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  }, {
"pop" : "project",
"@id" : 2,
"exprs" : [ {
  "ref" : "`T23¦¦**`",
  "expr" : "`T23¦¦**`"
} ],
"child" : 3,
"outputProj" : false,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  }, {
"pop" : "project",
"@id" : 1,
"exprs" : [ {
  "ref" : "`**`",
  "expr" : "`T23¦¦**`"
} ],
"child" : 2,
"outputProj" : true,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  }, {
"pop" : "screen",
"@id" : 0,
"child" : 1,
"initialAllocation" : 100,
"maxAllocation" : 100,
"cost" : {
  "memoryCost" : 1.6777216E7,
  "outputRowCount" : 1.0
}
  } ]
}!
{code}

In an earlier check-in there is a change in the way cost is represented. This
changed the test, which I think is not right. The pattern compared against the
plan should be made smarter to fix this issue generically.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7163) Join query fails with java.lang.IllegalArgumentException: null

2019-04-09 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-7163:
-

 Summary: Join query fails with java.lang.IllegalArgumentException: 
null
 Key: DRILL-7163
 URL: https://issues.apache.org/jira/browse/DRILL-7163
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.15.0
Reporter: Khurram Faraaz


Join query fails with java.lang.IllegalArgumentException: null

Drill : 1.15.0

Failing query is

{noformat}
Select * 
From 
( 
select 
convert_from(t.itm.iUUID, 'UTF8') iUUID, 
convert_from(t.UPC.UPC14, 'UTF8') UPC14, 
convert_from(t.itm.upcDesc, 'UTF8') upcDesc, 
convert_from(t.ris.mstBrdOid, 'UTF8') mstBrdOid, 
convert_from(t.ris.vrfLgyMtch, 'UTF8') vrfLgyMtch, 
convert_from(t.itm.mtch.cfdMtch, 'UTF8') cfdMtch, 
convert_from(t.itm.uoM, 'UTF8') uoM, 
convert_from(t.uomRec.valVal, 'UTF8') uomVal, 
case when a.iUUID is null then 0 else 1 end as keyind 
from hbase.`/mapr/tables/item-master` t 
left outer join 
( 
select distinct 
convert_from(t.m.iUUID, 'UTF8') iUUID 
from hbase.`/mapr/tables/items` t 
) a 
on t.itm.iUUID = a.iUUID 
) i 
where (i.mstBrdOid is null 
or i.vrfLgyMtch is null) 
and i.keyind=1 
{noformat}

Stack trace from drillbit.log
{noformat}
2019-03-27 11:45:44,563 [23646564-3d23-f32b-6f68-11d7c4dd7a19:frag:1:0] ERROR 
o.a.d.e.physical.impl.BaseRootExec - Batch dump started: dumping last 2 failed 
batches
2019-03-27 11:45:44,564 [23646564-3d23-f32b-6f68-11d7c4dd7a19:frag:1:0] ERROR 
o.a.d.e.p.i.p.ProjectRecordBatch - 
ProjectRecordBatch[projector=Projector[vector2=null, selectionVectorMode=NONE], 
hasRemainder=false, remainderIndex=0, recordCount=0, 
container=org.apache.drill.exec.record.VectorContainer@2133fd0e[recordCount = 
0, schemaChanged = false, schema = BatchSchema [fields=[[`row_key` 
(VARBINARY:REQUIRED)], [`clnDesc` (MAP:REQUIRED), children=([`bndlCnt` 
(VARBINARY:OPTIONAL)], [`by` (VARBINARY:OPTIONAL)], [`desc` 
(VARBINARY:OPTIONAL)], [`dt` (VARBINARY:OPTIONAL)], [`descExt` 
(VARBINARY:OPTIONAL)])], [`dup` (MAP:REQUIRED), children=([`dupBy` 
(VARBINARY:OPTIONAL)], [`dupDt` (VARBINARY:OPTIONAL)], [`duplicate` 
(VARBINARY:OPTIONAL)], [`preferred` (VARBINARY:OPTIONAL)])], [`itm` 
(MAP:REQUIRED), children=([`iUUID` (VARBINARY:OPTIONAL)], [`cfdLgyMtch` 
(VARBINARY:OPTIONAL)], [`uoM` (VARBINARY:OPTIONAL)], [`upcCd` 
(VARBINARY:OPTIONAL)], [`upcDesc` (VARBINARY:OPTIONAL)], [`promo` 
(VARBINARY:OPTIONAL)])], [`lckSts` (MAP:REQUIRED), children=([`lckBy` 
(VARBINARY:OPTIONAL)], [`lckDt` (VARBINARY:OPTIONAL)])], [`lgy` (MAP:REQUIRED), 
children=([`lgyBr` (VARBINARY:OPTIONAL)])], [`obs` (MAP:REQUIRED), 
children=([`POSFile` (VARBINARY:OPTIONAL)])], [`prmRec` (MAP:REQUIRED)], [`ris` 
(MAP:REQUIRED), children=([`UPC` (VARBINARY:OPTIONAL)], [`brdDesc` 
(VARBINARY:OPTIONAL)], [`brdExtDesc` (VARBINARY:OPTIONAL)], [`brdFamDesc` 
(VARBINARY:OPTIONAL)], [`brdTypeCd` (VARBINARY:OPTIONAL)], [`flvDesc` 
(VARBINARY:OPTIONAL)], [`mfgDesc` (VARBINARY:OPTIONAL)], [`modBy` 
(VARBINARY:OPTIONAL)], [`modDt` (VARBINARY:OPTIONAL)], [`msaCatCd` 
(VARBINARY:OPTIONAL)])], [`rjr` (MAP:REQUIRED)], [`uomRec` (MAP:REQUIRED), 
children=([`valBy` (VARBINARY:OPTIONAL)], [`valDt` (VARBINARY:OPTIONAL)], 
[`valVal` (VARBINARY:OPTIONAL)], [`recBy` (VARBINARY:OPTIONAL)], [`recDt` 
(VARBINARY:OPTIONAL)], [`recRat` (VARBINARY:OPTIONAL)], [`recVal` 
(VARBINARY:OPTIONAL)])], [`upc` (MAP:REQUIRED), children=([`UPC14` 
(VARBINARY:OPTIONAL)], [`allUPCVar` (VARBINARY:OPTIONAL)])], [`$f12` 
(VARBINARY:OPTIONAL)], [`iUUID` (VARCHAR:OPTIONAL)]], selectionVector=NONE], 
wrappers = [org.apache.drill.exec.vector.VarBinaryVector@b23a384[field = 
[`row_key` (VARBINARY:REQUIRED)], ...], 
org.apache.drill.exec.vector.complex.MapVector@61c779ff, 
org.apache.drill.exec.vector.complex.MapVector@575c0f96, 
org.apache.drill.exec.vector.complex.MapVector@69b943fe, 
org.apache.drill.exec.vector.complex.MapVector@7f90e2ce, 
org.apache.drill.exec.vector.complex.MapVector@25c27442, 
org.apache.drill.exec.vector.complex.MapVector@12d5ffd3, 
org.apache.drill.exec.vector.complex.MapVector@3150f8c4, 
org.apache.drill.exec.vector.complex.MapVector@49aefab2, 
org.apache.drill.exec.vector.complex.MapVector@7f78e7a1, 
org.apache.drill.exec.vector.complex.MapVector@426ea4fa, 
org.apache.drill.exec.vector.complex.MapVector@74cee2ab, 
org.apache.drill.exec.vector.NullableVarBinaryVector@4a0bfdea[field = [`$f12` 
(VARBINARY:OPTIONAL)], ...], 
org.apache.drill.exec.vector.NullableVarCharVector@72f64ee5[field = [`iUUID` 
(VARCHAR:OPTIONAL)], ...]], ...]]
2019-03-27 11:45:44,565 [23646564-3d23-f32b-6f68-11d7c4dd7a19:frag:1:0] ERROR 
o.a.d.e.p.impl.join.HashJoinBatch - 
HashJoinBatch[container=org.apache.drill.exec.record.VectorContainer@45887d35[recordCount
 = 0, schemaChanged = false, schema = BatchSchema [fields=[[`row_key` 
(VARBINARY:REQUIRED)], [`clnDesc` (MAP:REQUIRED), children=([`bndlCnt` 
(VARBINARY:OPTIONAL)], [`by` (VARBINARY:OPTIONAL)], [`desc` 
(VARBINARY:OPTIONAL)], 

[jira] [Commented] (DRILL-7065) Ensure backward compatibility is maintained

2019-04-09 Thread Sorabh Hamirwasia (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813660#comment-16813660
 ] 

Sorabh Hamirwasia commented on DRILL-7065:
--

Merged with DRILL-7063

> Ensure backward compatibility is maintained 
> 
>
> Key: DRILL-7065
> URL: https://issues.apache.org/jira/browse/DRILL-7065
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7065) Ensure backward compatibility is maintained

2019-04-09 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-7065:
-
Reviewer: Volodymyr Vysotskyi

> Ensure backward compatibility is maintained 
> 
>
> Key: DRILL-7065
> URL: https://issues.apache.org/jira/browse/DRILL-7065
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Fix For: 1.16.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7065) Ensure backward compatibility is maintained

2019-04-09 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-7065:
-
Labels: ready-to-commit  (was: )

> Ensure backward compatibility is maintained 
> 
>
> Key: DRILL-7065
> URL: https://issues.apache.org/jira/browse/DRILL-7065
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7066) Auto-refresh should pick up existing columns from metadata cache

2019-04-09 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-7066:
-
Labels: ready-to-commit  (was: )

> Auto-refresh should pick up existing columns from metadata cache
> 
>
> Key: DRILL-7066
> URL: https://issues.apache.org/jira/browse/DRILL-7066
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7066) Auto-refresh should pick up existing columns from metadata cache

2019-04-09 Thread Sorabh Hamirwasia (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sorabh Hamirwasia updated DRILL-7066:
-
Reviewer: Aman Sinha

> Auto-refresh should pick up existing columns from metadata cache
> 
>
> Key: DRILL-7066
> URL: https://issues.apache.org/jira/browse/DRILL-7066
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
> Fix For: 1.16.0
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7066) Auto-refresh should pick up existing columns from metadata cache

2019-04-09 Thread Sorabh Hamirwasia (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813656#comment-16813656
 ] 

Sorabh Hamirwasia commented on DRILL-7066:
--

Merged with DRILL-7063

> Auto-refresh should pick up existing columns from metadata cache
> 
>
> Key: DRILL-7066
> URL: https://issues.apache.org/jira/browse/DRILL-7066
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7064) Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files)

2019-04-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813655#comment-16813655
 ] 

ASF GitHub Bot commented on DRILL-7064:
---

amansinha100 commented on issue #1736: DRILL-7064: Leverage the summary 
metadata for plain COUNT aggregates.
URL: https://github.com/apache/drill/pull/1736#issuecomment-481356363
 
 
   Rebased on master. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Leverage the summary's totalRowCount and totalNullCount for COUNT() queries 
> (also prevent eager expansion of files)
> ---
>
> Key: DRILL-7064
> URL: https://issues.apache.org/jira/browse/DRILL-7064
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Aman Sinha
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> This sub-task is meant to leverage the Parquet metadata cache's summary 
> stats: totalRowCount (across all files and row groups) and the per-column 
> totalNullCount (across all files and row groups) to answer plain COUNT 
> aggregation queries without Group-By. Such queries are currently converted to a
> DirectScan by the ConvertCountToDirectScanRule, which utilizes the row-group
> metadata; however, this rule is applied on Drill logical rels and converts the
> logical plan to a physical plan with a DirectScanPrel. This is too late, since
> the DrillScanRel created during logical planning has already read the entire
> metadata cache file along with its full list of row-group entries. The metadata
> cache file can grow quite large, and this does not scale.
> The solution is to use the Metadata Summary file that is created in
> DRILL-7063 and to create a new rule that applies early on, such that it
> operates on the Calcite logical rels instead of the Drill logical rels and
> prevents eager expansion of the list of files/row groups.
> We will not remove the existing rule; it will continue to operate as before,
> because it is possible that after some transformations we still want to apply
> the optimizations for COUNT queries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7064) Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files)

2019-04-09 Thread Volodymyr Vysotskyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-7064:
---
Labels: ready-to-commit  (was: )

> Leverage the summary's totalRowCount and totalNullCount for COUNT() queries 
> (also prevent eager expansion of files)
> ---
>
> Key: DRILL-7064
> URL: https://issues.apache.org/jira/browse/DRILL-7064
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Aman Sinha
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> This sub-task is meant to leverage the Parquet metadata cache's summary 
> stats: totalRowCount (across all files and row groups) and the per-column 
> totalNullCount (across all files and row groups) to answer plain COUNT 
> aggregation queries without Group-By. Such queries are currently converted to a
> DirectScan by the ConvertCountToDirectScanRule, which utilizes the row-group
> metadata; however, this rule is applied on Drill logical rels and converts the
> logical plan to a physical plan with a DirectScanPrel. This is too late, since
> the DrillScanRel created during logical planning has already read the entire
> metadata cache file along with its full list of row-group entries. The metadata
> cache file can grow quite large, and this does not scale.
> The solution is to use the Metadata Summary file that is created in
> DRILL-7063 and to create a new rule that applies early on, such that it
> operates on the Calcite logical rels instead of the Drill logical rels and
> prevents eager expansion of the list of files/row groups.
> We will not remove the existing rule; it will continue to operate as before,
> because it is possible that after some transformations we still want to apply
> the optimizations for COUNT queries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7089) Implement caching of BaseMetadata classes

2019-04-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813614#comment-16813614
 ] 

ASF GitHub Bot commented on DRILL-7089:
---

vvysotskyi commented on issue #1728: DRILL-7089: Implement caching for 
TableMetadataProvider at query level and adapt statistics to use Drill 
metastore API
URL: https://github.com/apache/drill/pull/1728#issuecomment-481335547
 
 
   @amansinha100, thanks for pointing this, they passed, so I have added 
`ready-to-commit` label.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Implement caching of BaseMetadata classes
> -
>
> Key: DRILL-7089
> URL: https://issues.apache.org/jira/browse/DRILL-7089
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> In the scope of DRILL-6852, new classes for metadata usage were introduced.
> These classes may be reused in other GroupScan instances to reduce heap
> usage when metadata is large.
> The idea is to store {{BaseMetadata}} inheritors in {{DrillTable}} and pass 
> them to the {{GroupScan}}, so in the scope of the single query, it will be 
> possible to reuse them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7089) Implement caching of BaseMetadata classes

2019-04-09 Thread Volodymyr Vysotskyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-7089:
---
Labels: ready-to-commit  (was: )

> Implement caching of BaseMetadata classes
> -
>
> Key: DRILL-7089
> URL: https://issues.apache.org/jira/browse/DRILL-7089
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> In the scope of DRILL-6852, new classes for metadata usage were introduced.
> These classes may be reused in other GroupScan instances to reduce heap
> usage when metadata is large.
> The idea is to store {{BaseMetadata}} inheritors in {{DrillTable}} and pass 
> them to the {{GroupScan}}, so in the scope of the single query, it will be 
> possible to reuse them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7089) Implement caching of BaseMetadata classes

2019-04-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813611#comment-16813611
 ] 

ASF GitHub Bot commented on DRILL-7089:
---

amansinha100 commented on issue #1728: DRILL-7089: Implement caching for 
TableMetadataProvider at query level and adapt statistics to use Drill 
metastore API
URL: https://github.com/apache/drill/pull/1728#issuecomment-481333534
 
 
   @vvysotskyi if the regression tests pass you can update the ready-to-commit
label in the JIRA.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Implement caching of BaseMetadata classes
> -
>
> Key: DRILL-7089
> URL: https://issues.apache.org/jira/browse/DRILL-7089
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.16.0
>
>
> In the scope of DRILL-6852, new classes for metadata usage were introduced.
> These classes may be reused in other GroupScan instances to reduce heap
> usage when metadata is large.
> The idea is to store {{BaseMetadata}} inheritors in {{DrillTable}} and pass 
> them to the {{GroupScan}}, so in the scope of the single query, it will be 
> possible to reuse them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7064) Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files)

2019-04-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813591#comment-16813591
 ] 

ASF GitHub Bot commented on DRILL-7064:
---

amansinha100 commented on issue #1736: DRILL-7064: Leverage the summary 
metadata for plain COUNT aggregates.
URL: https://github.com/apache/drill/pull/1736#issuecomment-481324430
 
 
   @vvysotskyi I have addressed the missed comment.  Pls take a look.  Also, I 
haven't yet rebased on latest master since master already has the DRILL-7063 
commit, so I will need to decouple this PR such that only  the changes for 
DRILL-7064 can be applied. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Leverage the summary's totalRowCount and totalNullCount for COUNT() queries 
> (also prevent eager expansion of files)
> ---
>
> Key: DRILL-7064
> URL: https://issues.apache.org/jira/browse/DRILL-7064
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Aman Sinha
>Priority: Major
> Fix For: 1.16.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> This sub-task is meant to leverage the Parquet metadata cache's summary 
> stats: totalRowCount (across all files and row groups) and the per-column 
> totalNullCount (across all files and row groups) to answer plain COUNT 
> aggregation queries without Group-By. Such queries are currently converted to a
> DirectScan by the ConvertCountToDirectScanRule, which utilizes the row-group
> metadata; however, this rule is applied on Drill logical rels and converts the
> logical plan to a physical plan with a DirectScanPrel. This is too late, since
> the DrillScanRel created during logical planning has already read the entire
> metadata cache file along with its full list of row-group entries. The metadata
> cache file can grow quite large, and this does not scale.
> The solution is to use the Metadata Summary file that is created in
> DRILL-7063 and to create a new rule that applies early on, such that it
> operates on the Calcite logical rels instead of the Drill logical rels and
> prevents eager expansion of the list of files/row groups.
> We will not remove the existing rule; it will continue to operate as before,
> because it is possible that after some transformations we still want to apply
> the optimizations for COUNT queries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7064) Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files)

2019-04-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813561#comment-16813561
 ] 

ASF GitHub Bot commented on DRILL-7064:
---

amansinha100 commented on pull request #1736: DRILL-7064: Leverage the summary 
metadata for plain COUNT aggregates.
URL: https://github.com/apache/drill/pull/1736#discussion_r273561165
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/ConvertCountToDirectScanRule.java
 ##
 @@ -0,0 +1,296 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.logical;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.rel.core.Aggregate;
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.core.TableScan;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.logical.FormatPluginConfig;
+
+import org.apache.drill.exec.physical.base.ScanStats;
+import org.apache.drill.exec.planner.common.CountToDirectScanUtils;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.store.ColumnExplorer;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.exec.store.dfs.FormatSelection;
+import org.apache.drill.exec.store.dfs.NamedFormatPluginConfig;
+import org.apache.drill.exec.store.direct.MetadataDirectGroupScan;
+import org.apache.drill.exec.store.parquet.ParquetFormatConfig;
+import org.apache.drill.exec.store.parquet.ParquetReaderConfig;
+import org.apache.drill.exec.store.parquet.metadata.Metadata;
+import org.apache.drill.exec.store.parquet.metadata.Metadata_V4;
+import org.apache.drill.exec.store.pojo.DynamicPojoRecordReader;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableMap;
+import org.apache.hadoop.fs.Path;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.LinkedHashMap;
+import java.util.Set;
+
+/**
+ *  This rule is a logical planning counterpart to a corresponding 
ConvertCountToDirectScanPrule
+ * physical rule
+ * 
+ * 
+ * This rule will convert " select count(*)  as mycount from table "
+ * or " select count(not-nullable-expr) as mycount from table " into
+ * 
+ *Project(mycount)
+ * \
+ *DirectGroupScan ( PojoRecordReader ( rowCount ))
+ *
+ * or " select count(column) as mycount from table " into
+ * 
+ *  Project(mycount)
+ *   \
+ *DirectGroupScan (PojoRecordReader (columnValueCount))
+ *
+ * Rule can be applied if query contains multiple count expressions.
+ * " select count(column1), count(column2), count(*) from table "
+ * 
+ *
+ * 
+ * The rule utilizes the Parquet Metadata Cache's summary information to 
retrieve the total row count
+ * and the per-column null count.  As such, the rule is only applicable for 
Parquet tables and only if the
+ * metadata cache has been created with the summary information.
+ * 
+ */
+public class ConvertCountToDirectScanRule extends RelOptRule {
+
+  public static final RelOptRule AGG_ON_PROJ_ON_SCAN = new ConvertCountToDirectScanRule(
+      RelOptHelper.some(Aggregate.class,
+          RelOptHelper.some(Project.class,
+              RelOptHelper.any(TableScan.class))), "Agg_on_proj_on_scan:logical");
+
+  public static final RelOptRule AGG_ON_SCAN = new ConvertCountToDirectScanRule(
+      RelOptHelper.some(Aggregate.class,
+          RelOptHelper.any(TableScan.class)), "Agg_on_scan:logical");
+
+  
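
For context, a minimal SQL sketch of how this rewrite is exercised; the table path
dfs.`/data/parquet_table` is hypothetical, and the rewrite only applies when the Parquet
metadata cache has been created with the summary:

{code:java}
-- Build the Parquet metadata cache (including the summary) for the table:
REFRESH TABLE METADATA dfs.`/data/parquet_table`;

-- With the summary available, plain COUNT aggregates like these can be answered
-- from the cached rowCount / columnValueCount instead of scanning the files:
SELECT COUNT(*), COUNT(col1) FROM dfs.`/data/parquet_table`;
{code}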

[jira] [Comment Edited] (DRILL-7162) Apache Drill uses 3rd Party with Highest CVEs

2019-04-09 Thread Vitalii Diravka (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813486#comment-16813486
 ] 

Vitalii Diravka edited comment on DRILL-7162 at 4/9/19 2:35 PM:


The Jetty version is updated to 9.3 in the latest master, see DRILL-7051. There 
is an issue with the Jetty 9.4 version, see DRILL-7135.
 [~er.ayushsha...@gmail.com] Regarding the other CVEs, if you are able to fix them, 
please open the PRs.
 Thanks 


was (Author: vitalii):
The Jetty version is updated to 9.3 in the latest master, see DRILL-7051. There 
is an issue with the Jetty 9.4 version, see DRILL-7135.
[~er.ayushsha...@gmail.com] Regarding the other CVEs, please publish the list here, 
and if you are able to fix them please open the PRs.
Thanks 

>  Apache Drill uses 3rd Party with Highest CVEs
> --
>
> Key: DRILL-7162
> URL: https://issues.apache.org/jira/browse/DRILL-7162
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0, 1.14.0, 1.15.0
>Reporter: Ayush Sharma
>Priority: Major
>
> Apache Drill uses 3rd party libraries with 250+ CVEs.
> Most of the CVEs are in the older version of Jetty (9.1.x), whereas the 
> current version of Jetty is 9.4.x.
> Also, many of the other libraries are at end-of-life (EOL) versions and they 
> are not patched even in the latest release.
> This creates a security issue when we use it in production.
> We are able to replace many older versions of libraries with the latest 
> versions with no CVEs; however, many of them are not replaceable as-is and 
> would require some changes in the source code.
> The Jetty version is the highest priority and needs migration to the 9.4.x 
> version immediately.
>  
> Please look into this issue with immediate priority, as it compromises the 
> security of applications utilizing Apache Drill.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7162) Apache Drill uses 3rd Party with Highest CVEs

2019-04-09 Thread Vitalii Diravka (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813486#comment-16813486
 ] 

Vitalii Diravka commented on DRILL-7162:


Jetty version is updated in latest master version to 9.3, see DRILL-7051. There 
an issue with Jetty 9.4 version, see DRILL-7135.
[~er.ayushsha...@gmail.com] Regarding other CVEs, please publish here the list 
and if you are able to fix them please open the PRs.
Thanks 

>  Apache Drill uses 3rd Party with Highest CVEs
> --
>
> Key: DRILL-7162
> URL: https://issues.apache.org/jira/browse/DRILL-7162
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0, 1.14.0, 1.15.0
>Reporter: Ayush Sharma
>Priority: Blocker
>
> Apache Drill uses 3rd party libraries with 250+ CVEs.
> Most of the CVEs are in the older version of Jetty (9.1.x), whereas the 
> current version of Jetty is 9.4.x.
> Also, many of the other libraries are at end-of-life (EOL) versions and they 
> are not patched even in the latest release.
> This creates a security issue when we use it in production.
> We are able to replace many older versions of libraries with the latest 
> versions with no CVEs; however, many of them are not replaceable as-is and 
> would require some changes in the source code.
> The Jetty version is the highest priority and needs migration to the 9.4.x 
> version immediately.
>  
> Please look into this issue with immediate priority, as it compromises the 
> security of applications utilizing Apache Drill.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7162) Apache Drill uses 3rd Party with Highest CVEs

2019-04-09 Thread Vitalii Diravka (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-7162:
---
Priority: Major  (was: Blocker)

>  Apache Drill uses 3rd Party with Highest CVEs
> --
>
> Key: DRILL-7162
> URL: https://issues.apache.org/jira/browse/DRILL-7162
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.13.0, 1.14.0, 1.15.0
>Reporter: Ayush Sharma
>Priority: Major
>
> Apache Drill uses 3rd party libraries with 250+ CVEs.
> Most of the CVEs are in the older version of Jetty (9.1.x), whereas the 
> current version of Jetty is 9.4.x.
> Also, many of the other libraries are at end-of-life (EOL) versions and they 
> are not patched even in the latest release.
> This creates a security issue when we use it in production.
> We are able to replace many older versions of libraries with the latest 
> versions with no CVEs; however, many of them are not replaceable as-is and 
> would require some changes in the source code.
> The Jetty version is the highest priority and needs migration to the 9.4.x 
> version immediately.
>  
> Please look into this issue with immediate priority, as it compromises the 
> security of applications utilizing Apache Drill.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7162) Apache Drill uses 3rd Party with Highest CVEs

2019-04-09 Thread Ayush Sharma (JIRA)
Ayush Sharma created DRILL-7162:
---

 Summary:  Apache Drill uses 3rd Party with Highest 
CVEs
 Key: DRILL-7162
 URL: https://issues.apache.org/jira/browse/DRILL-7162
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.15.0, 1.14.0, 1.13.0
Reporter: Ayush Sharma


Apache Drill uses 3rd party libraries with 250+ CVEs.

Most of the CVEs are in the older version of Jetty (9.1.x), whereas the current 
version of Jetty is 9.4.x.

Also, many of the other libraries are at end-of-life (EOL) versions and they are 
not patched even in the latest release.

This creates a security issue when we use it in production.

We are able to replace many older versions of libraries with the latest versions 
with no CVEs; however, many of them are not replaceable as-is and would require 
some changes in the source code.

The Jetty version is the highest priority and needs migration to the 9.4.x 
version immediately.

 

Please look into this issue with immediate priority, as it compromises the 
security of applications utilizing Apache Drill.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7161) Aggregation with group by clause

2019-04-09 Thread Gayathri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gayathri updated DRILL-7161:

Affects Version/s: 1.14.0

> Aggregation with group by clause
> 
>
> Key: DRILL-7161
> URL: https://issues.apache.org/jira/browse/DRILL-7161
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.14.0
>Reporter: Gayathri
>Priority: Critical
>  Labels: Drill, issue
>
> Facing some issues with the following case:
> The JSON file (*sample.json*) has the following content:
> {"a":2,"b":null}
> {"a":2,"b":null}
> {"a":3,"b":null}
> {"a":4,"b":null}
> *Query:*
> SELECT a, sum(b) FROM dfs.`C:\\Users\\user\\Desktop\\sample.json` group by a;
> *Error:*
> UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate functions 
> supported for VarChar type
> *Observation:*
> If we query without group by, it works fine without any error. If group by 
> is used, then the sum of null values throws the above error.
>  
> Can anyone please let us know the solution for this, or whether there are any 
> alternatives?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7161) Aggregation with group by clause

2019-04-09 Thread Gayathri (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gayathri updated DRILL-7161:

Labels: Drill issue  (was: )

> Aggregation with group by clause
> 
>
> Key: DRILL-7161
> URL: https://issues.apache.org/jira/browse/DRILL-7161
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Reporter: Gayathri
>Priority: Critical
>  Labels: Drill, issue
>
> Facing some issues with the following case:
> The JSON file (*sample.json*) has the following content:
> {"a":2,"b":null}
> {"a":2,"b":null}
> {"a":3,"b":null}
> {"a":4,"b":null}
> *Query:*
> SELECT a, sum(b) FROM dfs.`C:\\Users\\user\\Desktop\\sample.json` group by a;
> *Error:*
> UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate functions 
> supported for VarChar type
> *Observation:*
> If we query without group by, it works fine without any error. If group by 
> is used, then the sum of null values throws the above error.
>  
> Can anyone please let us know the solution for this, or whether there are any 
> alternatives?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (DRILL-7153) Drill Fails to Build using JDK 1.8.0_65

2019-04-09 Thread Charles Givre (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Givre closed DRILL-7153.


Fixed.

> Drill Fails to Build using JDK 1.8.0_65
> ---
>
> Key: DRILL-7153
> URL: https://issues.apache.org/jira/browse/DRILL-7153
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.16.0
>Reporter: Charles Givre
>Assignee: Charles Givre
>Priority: Blocker
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> Drill fails to build when using Java 1.8.0_65.  Throws the following error:
> [{{ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.8.0:compile 
> (default-compile) on project drill-java-exec: Compilation failure
> [ERROR] 
> /Users/cgivre/github/drill-dev/drill/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/FilterEvaluatorUtils.java:[59,68]
>  error: unreported exception E; must be caught or declared to be thrown
> [ERROR]   where E,T,V are type-variables:
> [ERROR] E extends Exception declared in method 
> accept(ExprVisitor,V)
> [ERROR] T extends Object declared in method 
> accept(ExprVisitor,V)
> [ERROR] V extends Object declared in method 
> accept(ExprVisitor,V)
> [ERROR]
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the 
> command
> [ERROR]   mvn  -rf :drill-java-exec}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7089) Implement caching of BaseMetadata classes

2019-04-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813263#comment-16813263
 ] 

ASF GitHub Bot commented on DRILL-7089:
---

vvysotskyi commented on issue #1728: DRILL-7089: Implement caching for 
TableMetadataProvider at query level and adapt statistics to use Drill 
metastore API
URL: https://github.com/apache/drill/pull/1728#issuecomment-481200553
 
 
   Rebased onto the master and resolved merge conflicts.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Implement caching of BaseMetadata classes
> -
>
> Key: DRILL-7089
> URL: https://issues.apache.org/jira/browse/DRILL-7089
> Project: Apache Drill
>  Issue Type: Sub-task
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.16.0
>
>
> In the scope of DRILL-6852, new classes were introduced for metadata usage. 
> These classes may be reused in other GroupScan instances to reduce heap 
> usage for the case when metadata is large.
> The idea is to store {{BaseMetadata}} inheritors in {{DrillTable}} and pass 
> them to the {{GroupScan}}, so that within the scope of a single query it is 
> possible to reuse them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-4946) org.objectweb.asm.tree.analysis.AnalyzerException printed to console in embedded mode

2019-04-09 Thread Volodymyr Vysotskyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi resolved DRILL-4946.

   Resolution: Cannot Reproduce
Fix Version/s: 1.16.0

Resolving this Jira, since it is no longer reproducible using the query from the 
description.

> org.objectweb.asm.tree.analysis.AnalyzerException printed to console in 
> embedded mode
> -
>
> Key: DRILL-4946
> URL: https://issues.apache.org/jira/browse/DRILL-4946
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>Priority: Critical
> Fix For: 1.16.0
>
>
> Testing by querying a JSON file got an AnalyzerException printed. 
> The problem is that the scalar_replacement mode defaults to 'try', and 
> org.objectweb.asm.util.CheckMethodAdapter prints the stack trace to stderr. 
> [shi@cshi-centos1 private-drill]$ cat /tmp/conv.json 
> {"row": "0", "key": "\\x4a\\x31\\x39\\x38", "key2": "4a313938", "kp1": 
> "4a31", "kp2": "38"}
> {"row": "1", "key": null, "key2": null, "kp1": null, "kp2": null}
> {"row": "2", "key": "\\x4e\\x4f\\x39\\x51", "key2": "4e4f3951", "kp1": 
> "4e4f", "kp2": "51"}
> {"row": "3", "key": "\\x6e\\x6f\\x39\\x31", "key2": "6e6f3931", "kp1": 
> "6e6f", "kp2": "31"}
> 0: jdbc:drill:zk=local> SELECT convert_from(binary_string(key), 'INT_BE') as 
> intkey from dfs.`/tmp/conv.json`;
> org.objectweb.asm.tree.analysis.AnalyzerException: Error at instruction 158: 
> Expected an object reference, but found .
>   at org.objectweb.asm.tree.analysis.Analyzer.analyze(Analyzer.java:294)
>   at 
> org.objectweb.asm.util.CheckMethodAdapter$1.visitEnd(CheckMethodAdapter.java:450)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.objectweb.asm.util.CheckMethodAdapter.visitEnd(CheckMethodAdapter.java:1028)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.apache.drill.exec.compile.CheckMethodVisitorFsm.visitEnd(CheckMethodVisitorFsm.java:114)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.apache.drill.exec.compile.CheckMethodVisitorFsm.visitEnd(CheckMethodVisitorFsm.java:114)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.apache.drill.exec.compile.bytecode.InstructionModifier.visitEnd(InstructionModifier.java:508)
>   at org.objectweb.asm.tree.MethodNode.accept(MethodNode.java:837)
>   at 
> org.apache.drill.exec.compile.bytecode.ScalarReplacementNode.visitEnd(ScalarReplacementNode.java:87)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.apache.drill.exec.compile.bytecode.AloadPopRemover.visitEnd(AloadPopRemover.java:136)
>   at org.objectweb.asm.tree.MethodNode.accept(MethodNode.java:837)
>   at org.objectweb.asm.tree.MethodNode.accept(MethodNode.java:726)
>   at org.objectweb.asm.tree.ClassNode.accept(ClassNode.java:412)
>   at 
> org.apache.drill.exec.compile.MergeAdapter.getMergedClass(MergeAdapter.java:223)
>   at 
> org.apache.drill.exec.compile.ClassTransformer.getImplementationClass(ClassTransformer.java:263)
>   at 
> org.apache.drill.exec.compile.CodeCompiler$Loader.load(CodeCompiler.java:78)
>   at 
> org.apache.drill.exec.compile.CodeCompiler$Loader.load(CodeCompiler.java:74)
>   at 
> com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
>   at 
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
>   at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
>   at com.google.common.cache.LocalCache.get(LocalCache.java:3937)
>   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941)
>   at 
> com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
>   at 
> org.apache.drill.exec.compile.CodeCompiler.getImplementationClass(CodeCompiler.java:63)
>   at 
> org.apache.drill.exec.compile.CodeCompiler.getImplementationClass(CodeCompiler.java:56)
>   at 
> org.apache.drill.exec.ops.FragmentContext.getImplementationClass(FragmentContext.java:310)
>   at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema(ProjectRecordBatch.java:484)
>   at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78)
>   at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135)
>   at 
> 
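
For readers hitting this on older builds: the description above attributes the console output to
the scalar_replacement mode defaulting to 'try'. As a hedged sketch, the mode can be changed per
session; the option name below is taken from the ClassTransformer code in the stack trace, so
verify it against your Drill version:

{code:java}
-- Sketch: disable scalar replacement for the session so the 'try' fallback
-- (and its stderr stack trace) is avoided, then re-run the failing query.
ALTER SESSION SET `org.apache.drill.exec.compile.ClassTransformer.scalar_replacement` = 'off';
SELECT convert_from(binary_string(key), 'INT_BE') AS intkey FROM dfs.`/tmp/conv.json`;
{code}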

[jira] [Commented] (DRILL-4946) org.objectweb.asm.tree.analysis.AnalyzerException printed to console in embedded mode

2019-04-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813226#comment-16813226
 ] 

ASF GitHub Bot commented on DRILL-4946:
---

vvysotskyi commented on issue #619: DRILL-4946: redirect System.err so users 
under embedded mode won't se…
URL: https://github.com/apache/drill/pull/619#issuecomment-481192148
 
 
   Closing this PR, since most of the problems that caused errors during 
scalar replacement have been resolved.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> org.objectweb.asm.tree.analysis.AnalyzerException printed to console in 
> embedded mode
> -
>
> Key: DRILL-4946
> URL: https://issues.apache.org/jira/browse/DRILL-4946
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>Priority: Critical
>
> Testing by querying a JSON file got an AnalyzerException printed. 
> The problem is that the scalar_replacement mode defaults to 'try', and 
> org.objectweb.asm.util.CheckMethodAdapter prints the stack trace to stderr. 
> [shi@cshi-centos1 private-drill]$ cat /tmp/conv.json 
> {"row": "0", "key": "\\x4a\\x31\\x39\\x38", "key2": "4a313938", "kp1": 
> "4a31", "kp2": "38"}
> {"row": "1", "key": null, "key2": null, "kp1": null, "kp2": null}
> {"row": "2", "key": "\\x4e\\x4f\\x39\\x51", "key2": "4e4f3951", "kp1": 
> "4e4f", "kp2": "51"}
> {"row": "3", "key": "\\x6e\\x6f\\x39\\x31", "key2": "6e6f3931", "kp1": 
> "6e6f", "kp2": "31"}
> 0: jdbc:drill:zk=local> SELECT convert_from(binary_string(key), 'INT_BE') as 
> intkey from dfs.`/tmp/conv.json`;
> org.objectweb.asm.tree.analysis.AnalyzerException: Error at instruction 158: 
> Expected an object reference, but found .
>   at org.objectweb.asm.tree.analysis.Analyzer.analyze(Analyzer.java:294)
>   at 
> org.objectweb.asm.util.CheckMethodAdapter$1.visitEnd(CheckMethodAdapter.java:450)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.objectweb.asm.util.CheckMethodAdapter.visitEnd(CheckMethodAdapter.java:1028)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.apache.drill.exec.compile.CheckMethodVisitorFsm.visitEnd(CheckMethodVisitorFsm.java:114)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.apache.drill.exec.compile.CheckMethodVisitorFsm.visitEnd(CheckMethodVisitorFsm.java:114)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.apache.drill.exec.compile.bytecode.InstructionModifier.visitEnd(InstructionModifier.java:508)
>   at org.objectweb.asm.tree.MethodNode.accept(MethodNode.java:837)
>   at 
> org.apache.drill.exec.compile.bytecode.ScalarReplacementNode.visitEnd(ScalarReplacementNode.java:87)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.apache.drill.exec.compile.bytecode.AloadPopRemover.visitEnd(AloadPopRemover.java:136)
>   at org.objectweb.asm.tree.MethodNode.accept(MethodNode.java:837)
>   at org.objectweb.asm.tree.MethodNode.accept(MethodNode.java:726)
>   at org.objectweb.asm.tree.ClassNode.accept(ClassNode.java:412)
>   at 
> org.apache.drill.exec.compile.MergeAdapter.getMergedClass(MergeAdapter.java:223)
>   at 
> org.apache.drill.exec.compile.ClassTransformer.getImplementationClass(ClassTransformer.java:263)
>   at 
> org.apache.drill.exec.compile.CodeCompiler$Loader.load(CodeCompiler.java:78)
>   at 
> org.apache.drill.exec.compile.CodeCompiler$Loader.load(CodeCompiler.java:74)
>   at 
> com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
>   at 
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
>   at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
>   at com.google.common.cache.LocalCache.get(LocalCache.java:3937)
>   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941)
>   at 
> com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
>   at 
> org.apache.drill.exec.compile.CodeCompiler.getImplementationClass(CodeCompiler.java:63)
>   at 
> org.apache.drill.exec.compile.CodeCompiler.getImplementationClass(CodeCompiler.java:56)
>   at 
> 

[jira] [Commented] (DRILL-4946) org.objectweb.asm.tree.analysis.AnalyzerException printed to console in embedded mode

2019-04-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-4946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813227#comment-16813227
 ] 

ASF GitHub Bot commented on DRILL-4946:
---

vvysotskyi commented on pull request #619: DRILL-4946: redirect System.err so 
users under embedded mode won't se…
URL: https://github.com/apache/drill/pull/619
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> org.objectweb.asm.tree.analysis.AnalyzerException printed to console in 
> embedded mode
> -
>
> Key: DRILL-4946
> URL: https://issues.apache.org/jira/browse/DRILL-4946
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Chunhui Shi
>Assignee: Chunhui Shi
>Priority: Critical
>
> Testing by querying a JSON file got an AnalyzerException printed. 
> The problem is that the scalar_replacement mode defaults to 'try', and 
> org.objectweb.asm.util.CheckMethodAdapter prints the stack trace to stderr. 
> [shi@cshi-centos1 private-drill]$ cat /tmp/conv.json 
> {"row": "0", "key": "\\x4a\\x31\\x39\\x38", "key2": "4a313938", "kp1": 
> "4a31", "kp2": "38"}
> {"row": "1", "key": null, "key2": null, "kp1": null, "kp2": null}
> {"row": "2", "key": "\\x4e\\x4f\\x39\\x51", "key2": "4e4f3951", "kp1": 
> "4e4f", "kp2": "51"}
> {"row": "3", "key": "\\x6e\\x6f\\x39\\x31", "key2": "6e6f3931", "kp1": 
> "6e6f", "kp2": "31"}
> 0: jdbc:drill:zk=local> SELECT convert_from(binary_string(key), 'INT_BE') as 
> intkey from dfs.`/tmp/conv.json`;
> org.objectweb.asm.tree.analysis.AnalyzerException: Error at instruction 158: 
> Expected an object reference, but found .
>   at org.objectweb.asm.tree.analysis.Analyzer.analyze(Analyzer.java:294)
>   at 
> org.objectweb.asm.util.CheckMethodAdapter$1.visitEnd(CheckMethodAdapter.java:450)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.objectweb.asm.util.CheckMethodAdapter.visitEnd(CheckMethodAdapter.java:1028)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.apache.drill.exec.compile.CheckMethodVisitorFsm.visitEnd(CheckMethodVisitorFsm.java:114)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.apache.drill.exec.compile.CheckMethodVisitorFsm.visitEnd(CheckMethodVisitorFsm.java:114)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.apache.drill.exec.compile.bytecode.InstructionModifier.visitEnd(InstructionModifier.java:508)
>   at org.objectweb.asm.tree.MethodNode.accept(MethodNode.java:837)
>   at 
> org.apache.drill.exec.compile.bytecode.ScalarReplacementNode.visitEnd(ScalarReplacementNode.java:87)
>   at org.objectweb.asm.MethodVisitor.visitEnd(MethodVisitor.java:877)
>   at 
> org.apache.drill.exec.compile.bytecode.AloadPopRemover.visitEnd(AloadPopRemover.java:136)
>   at org.objectweb.asm.tree.MethodNode.accept(MethodNode.java:837)
>   at org.objectweb.asm.tree.MethodNode.accept(MethodNode.java:726)
>   at org.objectweb.asm.tree.ClassNode.accept(ClassNode.java:412)
>   at 
> org.apache.drill.exec.compile.MergeAdapter.getMergedClass(MergeAdapter.java:223)
>   at 
> org.apache.drill.exec.compile.ClassTransformer.getImplementationClass(ClassTransformer.java:263)
>   at 
> org.apache.drill.exec.compile.CodeCompiler$Loader.load(CodeCompiler.java:78)
>   at 
> org.apache.drill.exec.compile.CodeCompiler$Loader.load(CodeCompiler.java:74)
>   at 
> com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527)
>   at 
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319)
>   at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282)
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197)
>   at com.google.common.cache.LocalCache.get(LocalCache.java:3937)
>   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941)
>   at 
> com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
>   at 
> org.apache.drill.exec.compile.CodeCompiler.getImplementationClass(CodeCompiler.java:63)
>   at 
> org.apache.drill.exec.compile.CodeCompiler.getImplementationClass(CodeCompiler.java:56)
>   at 
> org.apache.drill.exec.ops.FragmentContext.getImplementationClass(FragmentContext.java:310)
>   at 
> 

[jira] [Commented] (DRILL-5679) Document JAVA_HOME requirements for installing Drill in distributed mode

2019-04-09 Thread Arina Ielchiieva (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813214#comment-16813214
 ] 

Arina Ielchiieva commented on DRILL-5679:
-

[~bbevens] I think this was meant for both, but there is no need to mention 
that; it's just a general prerequisite for Windows installation.

{quote}
Click New, and enter JAVA_HOME as the variable name. For the variable value, 
enter the path to your JDK installation. Instead of using Program Files in the 
path name, use progra~1. This is required because Drill cannot use file paths 
with spaces.
{quote}
Not all users have Java installed in the Program Files directory; a user can 
choose any directory during Java installation. I guess you should just mention that.



> Document JAVA_HOME requirements for installing Drill in distributed mode
> 
>
> Key: DRILL-5679
> URL: https://issues.apache.org/jira/browse/DRILL-5679
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.10.0
>Reporter: Arina Ielchiieva
>Assignee: Bridget Bevens
>Priority: Major
>  Labels: doc-complete
> Fix For: Future
>
>
> There is a general requirement that the JAVA_HOME variable should not contain 
> spaces.
> For example, during Drill installation in distributed mode on Windows, a user 
> can see the following error:
> {noformat}
> C:\Drill/bin/runbit: line 107: exec: C:\Program: not found
> {noformat}
> There are two options to fix this problem:
> {noformat}
> 1. Install Java in a directory without spaces.
> 2. Replace "Program Files" in your JAVA_HOME variable with progra~1 or progra~2 
> (if in x86).
> Example: JAVA_HOME="C:\progra~1\Java\jdk1.7.0_71"
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (DRILL-5679) Document JAVA_HOME requirements for installing Drill in distributed mode

2019-04-09 Thread Arina Ielchiieva (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813214#comment-16813214
 ] 

Arina Ielchiieva edited comment on DRILL-5679 at 4/9/19 10:04 AM:
--

[~bbevens] I think this was meant for both, but there is no need to mention 
that; it's just a general prerequisite for Windows installation.

{quote}
Click New, and enter JAVA_HOME as the variable name. For the variable value, 
enter the path to your JDK installation. Instead of using Program Files in the 
path name, use progra~1. This is required because Drill cannot use file paths 
with spaces.
{quote}
Not all users have Java installed in the Program Files directory; a user can 
choose any directory during Java installation. I guess you should just mention that.




was (Author: arina):
[~bbevens] I think this was meant for both, but there is no need to mention 
that; it's just a general prerequisite for Windows installation.

{quote}
Click New, and enter JAVA_HOME as the variable name. For the variable value, 
enter the path to your JDK installation. Instead of using Program Files in the 
path name, use progra~1. This is required because Drill cannot use file paths 
with spaces.
{quote}
Not all users have Java installed in the Program Files directory; a user can 
choose any directory during Java installation. I guess you should just mention that.



> Document JAVA_HOME requirements for installing Drill in distributed mode
> 
>
> Key: DRILL-5679
> URL: https://issues.apache.org/jira/browse/DRILL-5679
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.10.0
>Reporter: Arina Ielchiieva
>Assignee: Bridget Bevens
>Priority: Major
>  Labels: doc-complete
> Fix For: Future
>
>
> There is a general requirement that the JAVA_HOME variable should not contain 
> spaces.
> For example, during Drill installation in distributed mode on Windows, a user 
> can see the following error:
> {noformat}
> C:\Drill/bin/runbit: line 107: exec: C:\Program: not found
> {noformat}
> There are two options to fix this problem:
> {noformat}
> 1. Install Java in a directory without spaces.
> 2. Replace "Program Files" in your JAVA_HOME variable with progra~1 or progra~2 
> (if in x86).
> Example: JAVA_HOME="C:\progra~1\Java\jdk1.7.0_71"
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-7161) Aggregation with group by clause

2019-04-09 Thread Gayathri (JIRA)
Gayathri created DRILL-7161:
---

 Summary: Aggregation with group by clause
 Key: DRILL-7161
 URL: https://issues.apache.org/jira/browse/DRILL-7161
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Reporter: Gayathri


Facing some issues with the following case:

The JSON file (*sample.json*) has the following content:
{"a":2,"b":null}
{"a":2,"b":null}
{"a":3,"b":null}
{"a":4,"b":null}

*Query:*

SELECT a, sum(b) FROM dfs.`C:\\Users\\user\\Desktop\\sample.json` group by a;

*Error:*

UNSUPPORTED_OPERATION ERROR: Only COUNT, MIN and MAX aggregate functions 
supported for VarChar type

*Observation:*

If we query without group by, it works fine without any error. If group by is 
used, then the sum of null values throws the above error.

 

Can anyone please let us know the solution for this, or whether there are any 
alternatives?
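
A possible workaround, as an untested sketch and assuming the all-null column is indeed read as 
VarChar: cast the column to a numeric type before aggregating, so that SUM no longer sees a 
VarChar input:

{code:java}
-- Untested sketch: cast the all-null VarChar column to DOUBLE before summing
SELECT a, SUM(CAST(b AS DOUBLE)) FROM dfs.`C:\\Users\\user\\Desktop\\sample.json` GROUP BY a;
{code}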



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6985) Fix sqlline.bat issues on Windows and add drill-embedded.bat

2019-04-09 Thread Volodymyr Vysotskyi (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813186#comment-16813186
 ] 

Volodymyr Vysotskyi commented on DRILL-6985:


Hi [~bbevens],

Updated pages look good, thanks!

> Fix sqlline.bat issues on Windows and add drill-embedded.bat
> 
>
> Key: DRILL-6985
> URL: https://issues.apache.org/jira/browse/DRILL-6985
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
> Environment: Windows 10
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
>  Labels: doc-complete, ready-to-commit
> Fix For: 1.16.0
>
>
> *For documentation*
> {{drill-embedded.bat}} was added as a handy script to start Drill on Windows 
> without passing any params.
> Please update the following section: 
> https://drill.apache.org/docs/starting-drill-on-windows/
> Other issues covered in this Jira:
> {{sqlline.bat}} fails in the following cases:
>  1. Specified file in the argument:
> {noformat}
> apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" -f /tmp/q.sql
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> HADOOP_HOME not detected...
> HBASE_HOME not detected...
> Calculating Drill classpath...
> Error: Could not find or load main class sqlline.SqlLine
> {noformat}
> 2. Specified file path that contains spaces:
> {noformat}
> apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" -f "/tmp/q q.sql"
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> HADOOP_HOME not detected...
> HBASE_HOME not detected...
> Calculating Drill classpath...
> q.sql""=="test" was unexpected at this time.
> {noformat}
> 3. Specified query in the argument:
> {noformat}
> apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" -e "select * 
> from sys.version"
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> HADOOP_HOME not detected...
> HBASE_HOME not detected...
> Calculating Drill classpath...
> * was unexpected at this time.
> {noformat}
> {noformat}
> apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" -q "select 'a' 
> from sys.version"
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> HADOOP_HOME not detected...
> HBASE_HOME not detected...
> Calculating Drill classpath...
> 'a' was unexpected at this time.
> {noformat}
> 4. Specified custom config location:
> {noformat}
> apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" 
> --config=/tmp/conf
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> HADOOP_HOME not detected...
> HBASE_HOME not detected...
> Calculating Drill classpath...
> Error: Could not find or load main class sqlline.SqlLine
> {noformat}
> 5. Specified custom config location with spaces in the path:
> {noformat}
> apache-drill-1.15.0\bin>sqlline.bat -u "jdbc:drill:zk=local" 
> --config="/tmp/conf test"
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> test"" was unexpected at this time.
> {noformat}
> 6. Sqlline was run from non-bin directory:
> {noformat}
> apache-drill-1.15.0>bin\sqlline.bat -u "jdbc:drill:zk=local"
> DRILL_ARGS - " -u jdbc:drill:zk=local"
> HADOOP_HOME not detected...
> HBASE_HOME not detected...
> Calculating Drill classpath...
> Error: Could not find or load main class sqlline.SqlLine
> {noformat}
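
For the documentation update, a hypothetical invocation of the new script (install directory and 
Drill version assumed) is simply:

{noformat}
apache-drill-1.16.0\bin>drill-embedded.bat
{noformat}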



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7160) exec.query.max_rows QUERY-level options are shown on Profiles tab

2019-04-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813174#comment-16813174
 ] 

ASF GitHub Bot commented on DRILL-7160:
---

vvysotskyi commented on issue #1742: DRILL-7160: e.q.max_rows QUERY-level 
option shown even if not set
URL: https://github.com/apache/drill/pull/1742#issuecomment-481175705
 
 
   @kkhatua, the `exec.query.max_rows` query-level option should be present in 
the query profile only when the auto-limit is applied or can be applied, but 
currently it is present even when a non-select query is submitted and 
`exec.query.max_rows` is set to a non-zero value. Please fix this case and 
verify that other corner cases are handled correctly.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> exec.query.max_rows QUERY-level options are shown on Profiles tab
> -
>
> Key: DRILL-7160
> URL: https://issues.apache.org/jira/browse/DRILL-7160
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Web Server
>Affects Versions: 1.16.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Kunal Khatua
>Priority: Blocker
> Fix For: 1.16.0
>
>
> As [~arina] has noticed, the option {{exec.query.max_rows}} is shown on the 
> Web UI's Profiles tab even when it was not set explicitly. The issue is that 
> the option is being set at the query level internally.
> From the code, it looks like it is set in 
> {{DrillSqlWorker.checkAndApplyAutoLimit()}}, and perhaps a check whether the 
> value differs from the existing one should be added.
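
As a hedged illustration of the expected behavior (option name from this Jira; cp.`employee.json` 
is the sample table bundled with Drill):

{code:java}
-- With the session-level limit set, a subsequent SELECT is auto-limited, and only
-- then should the internally applied QUERY-level option appear in its profile.
ALTER SESSION SET `exec.query.max_rows` = 100;
SELECT * FROM cp.`employee.json`;
{code}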



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7064) Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files)

2019-04-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813124#comment-16813124
 ] 

ASF GitHub Bot commented on DRILL-7064:
---

vvysotskyi commented on pull request #1736: DRILL-7064: Leverage the summary 
metadata for plain COUNT aggregates.
URL: https://github.com/apache/drill/pull/1736#discussion_r273380948
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/ConvertCountToDirectScanRule.java
 ##
 @@ -0,0 +1,296 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.logical;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.plan.RelOptRuleOperand;
+import org.apache.calcite.rel.core.Aggregate;
+import org.apache.calcite.rel.core.AggregateCall;
+import org.apache.calcite.rel.core.Project;
+import org.apache.calcite.rel.core.TableScan;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.rex.RexInputRef;
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.logical.FormatPluginConfig;
+
+import org.apache.drill.exec.physical.base.ScanStats;
+import org.apache.drill.exec.planner.common.CountToDirectScanUtils;
+import org.apache.drill.exec.planner.common.DrillRelOptUtil;
+
+import org.apache.drill.exec.planner.physical.PlannerSettings;
+import org.apache.drill.exec.store.ColumnExplorer;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.FileSystemPlugin;
+import org.apache.drill.exec.store.dfs.FormatSelection;
+import org.apache.drill.exec.store.dfs.NamedFormatPluginConfig;
+import org.apache.drill.exec.store.direct.MetadataDirectGroupScan;
+import org.apache.drill.exec.store.parquet.ParquetFormatConfig;
+import org.apache.drill.exec.store.parquet.ParquetReaderConfig;
+import org.apache.drill.exec.store.parquet.metadata.Metadata;
+import org.apache.drill.exec.store.parquet.metadata.Metadata_V4;
+import org.apache.drill.exec.store.pojo.DynamicPojoRecordReader;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableMap;
+import org.apache.hadoop.fs.Path;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.LinkedHashMap;
+import java.util.Set;
+
+/**
+ * This rule is a logical planning counterpart to the corresponding ConvertCountToDirectScanPrule
+ * physical rule.
+ * <p>
+ * This rule will convert " select count(*) as mycount from table "
+ * or " select count(not-nullable-expr) as mycount from table " into
+ * <pre>
+ *    Project(mycount)
+ *        \
+ *    DirectGroupScan ( PojoRecordReader ( rowCount ))
+ * </pre>
+ * or " select count(column) as mycount from table " into
+ * <pre>
+ *    Project(mycount)
+ *        \
+ *    DirectGroupScan (PojoRecordReader (columnValueCount))
+ * </pre>
+ * The rule can also be applied if the query contains multiple count expressions, e.g.
+ * " select count(column1), count(column2), count(*) from table ".
+ * <p>
+ * The rule utilizes the Parquet Metadata Cache's summary information to retrieve the total row count
+ * and the per-column null count. As such, the rule is only applicable to Parquet tables, and only if the
+ * metadata cache has been created with the summary information.
+ */
+public class ConvertCountToDirectScanRule extends RelOptRule {
+
+  public static final RelOptRule AGG_ON_PROJ_ON_SCAN = new ConvertCountToDirectScanRule(
+      RelOptHelper.some(Aggregate.class,
+          RelOptHelper.some(Project.class,
+              RelOptHelper.any(TableScan.class))), "Agg_on_proj_on_scan:logical");
+
+  public static final RelOptRule AGG_ON_SCAN = new ConvertCountToDirectScanRule(
+      RelOptHelper.some(Aggregate.class,
+          RelOptHelper.any(TableScan.class)), "Agg_on_scan:logical");
+
+  private 
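
To check whether the logical rewrite described in the Javadoc above actually fired, a hedged 
sketch (hypothetical table path, and it presumes a metadata cache built with the summary):

{code:java}
-- The resulting plan should contain a DirectGroupScan over the cached counts
-- rather than a scan of the Parquet files themselves.
EXPLAIN PLAN FOR SELECT COUNT(*) AS mycount FROM dfs.`/data/parquet_table`;
{code}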

[jira] [Commented] (DRILL-7063) Create separate summary file for schema, totalRowCount, totalNullCount (includes maintenance)

2019-04-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813055#comment-16813055
 ] 

ASF GitHub Bot commented on DRILL-7063:
---

sohami commented on pull request #1723: DRILL-7063: Seperate metadata cache 
file into summary, file metadata
URL: https://github.com/apache/drill/pull/1723
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Create separate summary file for schema, totalRowCount, totalNullCount 
> (includes maintenance)
> -
>
> Key: DRILL-7063
> URL: https://issues.apache.org/jira/browse/DRILL-7063
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Metadata
>Reporter: Venkata Jyothsna Donapati
>Assignee: Venkata Jyothsna Donapati
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>   Original Estimate: 252h
>  Remaining Estimate: 252h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)