[GitHub] drill pull request #982: Fixed queued time calculation

2017-10-09 Thread prasadns14
GitHub user prasadns14 opened a pull request:

https://github.com/apache/drill/pull/982

Fixed queued time calculation

@paul-rogers please review

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/prasadns14/drill DRILL-5716

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/982.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #982


commit b5be0b5dd91c73a8a0374bd28b0f1232c2e35e00
Author: Prasad Nagaraj Subramanya 
Date:   2017-10-03T03:05:05Z

Fixed queued time calculation




---


[GitHub] drill pull request #928: DRILL-5716: Queue-driven memory allocation

2017-10-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/928


---


[GitHub] drill pull request #965: DRILL-5811 reduced repeated log messages further.

2017-10-09 Thread vrozov
Github user vrozov commented on a diff in the pull request:

https://github.com/apache/drill/pull/965#discussion_r14359
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/schedule/BlockMapBuilder.java ---
@@ -104,12 +104,16 @@ public BlockMapReader(FileStatus status, boolean blockify) {
 @Override
 protected List runInner() throws Exception {
   final List work = Lists.newArrayList();
+
+  final Set noDrillbitHosts = logger.isDebugEnabled() ? Sets.newHashSet() : null;
--- End diff --

Consider moving `noDrillbitHosts` to the `BlockMapBuilder` class (use 
`Sets.newConcurrentHashSet()` in this case), as it does not seem to belong to 
`BlockMapReader`. With such a change, the other changes are not necessary, and 
this will likely allow reducing repeated log messages even further. Also drop the 
explicit type argument from `Sets.newHashSet()`.
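For illustration, the refactor suggested above could look roughly like the
following sketch. This is not the actual `BlockMapBuilder` code: Guava's
`Sets.newConcurrentHashSet()` is replaced by the equivalent JDK construct, and
the method names (`reportMissingHost`, `missingHostCount`) are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class BlockMapBuilderSketch {
  // Shared across concurrent BlockMapReader tasks, hence a concurrent set.
  private final Set<String> noDrillbitHosts = ConcurrentHashMap.newKeySet();

  // Logs each host without a Drillbit endpoint at most once per builder,
  // since Set.add returns false for hosts already seen.
  public void reportMissingHost(String host) {
    if (noDrillbitHosts.add(host)) {
      System.out.println("No Drillbit endpoint on host: " + host);
    }
  }

  public int missingHostCount() {
    return noDrillbitHosts.size();
  }

  public static void main(String[] args) {
    BlockMapBuilderSketch builder = new BlockMapBuilderSketch();
    builder.reportMissingHost("host-a");
    builder.reportMissingHost("host-a");  // duplicate: not logged again
    builder.reportMissingHost("host-b");
    System.out.println("distinct missing hosts: " + builder.missingHostCount());
  }
}
```

Because the set is owned by the builder rather than each reader, deduplication
spans all reader threads, which is what reduces the repeated log messages.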


---


[GitHub] drill pull request #981: DRILL-5854: IllegalStateException when empty batch ...

2017-10-09 Thread ppadma
GitHub user ppadma opened a pull request:

https://github.com/apache/drill/pull/981

DRILL-5854: IllegalStateException when empty batch with valid schema …

…is received

The problem is that the merge receiver reads from the wrong sender when the 
first batch from one of the senders is empty: when the first batch is empty, we 
continue without moving to the next sender.
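As an illustration of the behavior described above (this is a simplified sketch,
not Drill's merge receiver: the batch representation and method names are
hypothetical), the receiver must keep draining a sender's empty batches but
still advance to the next sender once that sender yields data:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeReceiverSketch {
  // Returns the first value from each sender's first non-empty batch.
  public static List<Integer> firstValuePerSender(List<List<int[]>> senders) {
    List<Integer> result = new ArrayList<>();
    for (List<int[]> batches : senders) {   // advance to the next sender...
      for (int[] batch : batches) {
        if (batch.length > 0) {             // ...once a non-empty batch is found
          result.add(batch[0]);
          break;
        }
        // empty batch: keep reading from the same sender until data arrives
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Sender 0 delivers an empty first batch; sender 1 starts with data.
    List<List<int[]>> senders = Arrays.asList(
        Arrays.asList(new int[]{}, new int[]{5, 6}),
        Arrays.asList(new int[]{7}));
    System.out.println(firstValuePerSender(senders));  // prints [5, 7]
  }
}
```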

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ppadma/drill DRILL-5854

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/981.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #981


commit eb25c2f2ec879c369c7deb84a4273bec1fad42cc
Author: Padma Penumarthy 
Date:   2017-10-09T22:05:17Z

DRILL-5854: IllegalStateException when empty batch with valid schema is 
received




---


Re: Implicit columns and maps

2017-10-09 Thread Paul Rogers
As you point out, naming is a separate issue. I believe we inherited the names 
from some other system. But it is an issue that we use “good” names for implicit 
columns. If we add more names (“createDate”, “modificationDate”, “owner”, or 
whatever), we end up breaking someone’s queries that have columns with those 
names.

Would be good to have a prefix, as you suggest, to separate Drill names from 
user names. As it turns out, Drill supports maps (AKA structs) and arrays, so 
perhaps we could have:

$meta$
|- filename
|- fqn
|- …
|- dir[]

Where “dir” is an array, rather than the current separate scalar dir0, dir1, 
etc.

The above map lets us add any number of metadata columns without potentially 
breaking existing queries.

Note that we already have a problem where we can hit a hard schema change 
because one reader sees “/a/b/c/foo.csv” while another sees “a/b/bar.csv”, 
resulting in different numbers of “dirx” columns from the two readers.

Yet another issue is that, in wildcard queries (e.g. “SELECT *”), we add all 
implicit columns, then remove them later. We should optimize this case.

But, even if we keep the original names, and defer the other issues, the 
question about map semantics still stands…

- Paul

> On Oct 9, 2017, at 12:06 PM, Boaz Ben-Zvi  wrote:
> 
>How about changing all those “implicit” columns to have some 
> “unconventional” prefix, like an underscore (or two _ _ ); e.g. _suffix, 
> _dir0, etc .
> 
> With such a change we may need to handle the transition of existing users’ 
> code ; e.g., maybe change the priority (mentioned below) so that an existing 
> “suffix” column takes precedence over the implicit one.
> Or just go “cold turkey” and force the users to change.
> 
> Just an idea,
> 
>Boaz  
> 
> On 10/9/17, 10:45 AM, "Paul Rogers"  wrote:
> 
>Hi All,
> 
>Drill provides a set of “implicit” columns to describe files: filename, 
> suffix, fqn and filepath. Drill also provides an open-ended set of partition 
> columns: dir0, dir1, dir2, etc.
> 
>Not all readers support the above: some do and some don’t.
> 
>Drill semantics seem to treat these as semi-reserved words when a reader 
> supports implicit columns. If a table has a “suffix” column, then Drill will 
> treat “suffix” as an implicit column, ignoring the table column. If the user 
> wants that table column, they can use a session option to temporarily rename 
> the implicit column. A bit odd, perhaps, but it is our solution.
> 
>What is our desired behavior, however, if the user asks for a column that 
> includes an implicit column as a prefix: “suffix.a”? Clearly, here, “suffix” 
> is a map (i.e. structure) and “a” is a field within that map. Since the 
> implicit “suffix” is never a map, should we:
> 
>1) Assume that, here, “suffix” is a map column projected from the table?
>2) Issue an error?
>3) Ignore the “.a” part and just return “suffix” as an implicit column?
>4) Something else?
> 
>The code is murky on this point because JSON is implemented far 
> differently than text files and so on. Each has its own rules. Do we need 
> consistency of behavior, or is reader-specific behavior the expected design?
> 
>Thanks,
> 
>- Paul
> 
> 
> 



Re: Implicit columns and maps

2017-10-09 Thread Boaz Ben-Zvi
How about changing all those “implicit” columns to have some 
“unconventional” prefix, like an underscore (or two _ _ ); e.g. _suffix, _dir0, 
etc .

With such a change we may need to handle the transition of existing users’ code 
; e.g., maybe change the priority (mentioned below) so that an existing 
“suffix” column takes precedence over the implicit one.
Or just go “cold turkey” and force the users to change.

 Just an idea,

Boaz  

On 10/9/17, 10:45 AM, "Paul Rogers"  wrote:

Hi All,

Drill provides a set of “implicit” columns to describe files: filename, 
suffix, fqn and filepath. Drill also provides an open-ended set of partition 
columns: dir0, dir1, dir2, etc.

Not all readers support the above: some do and some don’t.

Drill semantics seem to treat these as semi-reserved words when a reader 
supports implicit columns. If a table has a “suffix” column, then Drill will 
treat “suffix” as an implicit column, ignoring the table column. If the user 
wants that table column, they can use a session option to temporarily rename 
the implicit column. A bit odd, perhaps, but it is our solution.

What is our desired behavior, however, if the user asks for a column that 
includes an implicit column as a prefix: “suffix.a”? Clearly, here, “suffix” is 
a map (i.e. structure) and “a” is a field within that map. Since the implicit 
“suffix” is never a map, should we:

1) Assume that, here, “suffix” is a map column projected from the table?
2) Issue an error?
3) Ignore the “.a” part and just return “suffix” as an implicit column?
4) Something else?

The code is murky on this point because JSON is implemented far differently 
than text files and so on. Each has its own rules. Do we need consistency of 
behavior, or is reader-specific behavior the expected design?

Thanks,

- Paul





[jira] [Resolved] (DRILL-5840) A query that includes sort completes, and then loses Drill connection. Drill becomes unresponsive, and cannot restart because it cannot communicate with Zookeeper

2017-10-09 Thread Robert Hou (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou resolved DRILL-5840.
---
Resolution: Not A Problem

> A query that includes sort completes, and then loses Drill connection. Drill 
> becomes unresponsive, and cannot restart because it cannot communicate with 
> Zookeeper
> --
>
> Key: DRILL-5840
> URL: https://issues.apache.org/jira/browse/DRILL-5840
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Paul Rogers
> Fix For: 1.12.0
>
>
> Query is:
> {noformat}
> ALTER SESSION SET `exec.sort.disable_managed` = false;
> select count(*) from (select * from 
> dfs.`/drill/testdata/resource-manager/250wide.tbl` order by columns[0])d 
> where d.columns[0] = 'ljdfhwuehnoiueyf';
> {noformat}
> Query tries to complete, but cannot.  It takes 20 hours from the time the 
> query tries to complete, to the time Drill finally loses its connection.
> From the drillbit.log:
> {noformat}
> 2017-10-03 16:28:14,892 [262bec7f-3539-0dd7-6fea-f2959f9df3b6:frag:0:0] DEBUG 
> o.a.drill.exec.work.foreman.Foreman - 262bec7f-3539-0dd7-6fea-f2959f9df3b6: 
> State change requested RUNNING --> COMPLETED
> 2017-10-04 01:47:27,698 [UserServer-1] DEBUG 
> o.a.d.e.r.u.UserServerRequestHandler - Received query to run.  Returning 
> query handle.
> 2017-10-04 03:30:02,916 [262bec7f-3539-0dd7-6fea-f2959f9df3b6:frag:0:0] WARN  
> o.a.d.exec.work.foreman.QueryManager - Failure while trying to delete the 
> estore profile for this query.
> org.apache.drill.common.exceptions.DrillRuntimeException: unable to delete 
> node at /running/262bec7f-3539-0dd7-6fea-f2959f9df3b6
>   at 
> org.apache.drill.exec.coord.zk.ZookeeperClient.delete(ZookeeperClient.java:343)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.coord.zk.ZkEphemeralStore.remove(ZkEphemeralStore.java:108)
>  ~[drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.foreman.QueryManager.updateEphemeralState(QueryManager.java:293)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.foreman.Foreman.recordNewState(Foreman.java:1043) 
> [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:964) 
> [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.foreman.Foreman.access$2600(Foreman.java:113) 
> [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:1025)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:1018)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.common.EventProcessor.processEvents(EventProcessor.java:107) 
> [drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:65) 
> [drill-common-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.addEvent(Foreman.java:1020)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.foreman.Foreman.addToEventQueue(Foreman.java:1038) 
> [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.foreman.QueryManager.nodeComplete(QueryManager.java:498)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.foreman.QueryManager.access$100(QueryManager.java:66)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.foreman.QueryManager$NodeTracker.fragmentComplete(QueryManager.java:462)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.foreman.QueryManager.fragmentDone(QueryManager.java:147)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.foreman.QueryManager.access$400(QueryManager.java:66)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.work.foreman.QueryManager$1.statusUpdate(QueryManager.java:525)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> org.apache.drill.exec.rpc.control.WorkEventBus.statusUpdate(WorkEventBus.java:71)
>  [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
>   at 
> 

Implicit columns and maps

2017-10-09 Thread Paul Rogers
Hi All,

Drill provides a set of “implicit” columns to describe files: filename, suffix, 
fqn and filepath. Drill also provides an open-ended set of partition columns: 
dir0, dir1, dir2, etc.

Not all readers support the above: some do and some don’t.

Drill semantics seem to treat these as semi-reserved words when a reader 
supports implicit columns. If a table has a “suffix” column, then Drill will 
treat “suffix” as an implicit column, ignoring the table column. If the user 
wants that table column, they can use a session option to temporarily rename 
the implicit column. A bit odd, perhaps, but it is our solution.

What is our desired behavior, however, if the user asks for a column that 
includes an implicit column as a prefix: “suffix.a”? Clearly, here, “suffix” is 
a map (i.e. structure) and “a” is a field within that map. Since the implicit 
“suffix” is never a map, should we:

1) Assume that, here, “suffix” is a map column projected from the table?
2) Issue an error?
3) Ignore the “.a” part and just return “suffix” as an implicit column?
4) Something else?

The code is murky on this point because JSON is implemented far differently 
than text files and so on. Each has its own rules. Do we need consistency of 
behavior, or is reader-specific behavior the expected design?

Thanks,

- Paul



Re: [DISCUSS] Drill 1.12.0 release

2017-10-09 Thread Paul Rogers
Hi Arina,

In addition to my own PRs, there are several in the “active” queue that we 
could get in if we can just push them over the line and clear the queue. The 
owners of the PRs should check if we are waiting on them to take action.

977 DRILL-5849: Add freemarker lib to dependencyManagement to ensure prop…
976 DRILL-5797: Choose parquet reader from read columns
975 DRILL-5743: Handling column family and column scan for hbase
973 DRILL-5775: Select * query on a maprdb binary table fails
972 DRILL-5838: Fix MaprDB filter pushdown for the case of nested field 
(reg. of DRILL-4264)
950 Drill 5431: SSL Support
949 DRILL-5795: Parquet Filter push down at rowgroup level
936 DRILL-5772: Add unit tests to indicate how utf-8 support can be enabled 
/ disabled in Drill
904 DRILL-5717: change some date time test cases with specific timezone or 
Local
892 DRILL-5645: negation of expression causes null pointer exception
889 DRILL-5691: enhance scalar sub queries checking for the cartesian join

(Items not on the list above have become “inactive” for a variety of reasons.)

Thanks,

- Paul

> On Oct 9, 2017, at 9:57 AM, Paul Rogers  wrote:
> 
> Hi Arina,
> 
> I’d like to include the following that are needed to finish up the “managed” 
> sort and spill-to-disk for hash agg:
> 
> #928: DRILL-5716: Queue-driven memory allocation 
> #958, DRILL-5808: Reduce memory allocator strictness for "managed" operators 
> #960, DRILL-5815: Option to set query memory as percent of total 
> 
> The following is needed to resolve issues with HBase support in empty batches:
> 
> #968, DRILL-5830: Resolve regressions to MapR DB from DRILL-5546 
> 
> The following are nice-to-haves that build on work already done in this 
> release, and that some of my own work depends on:
> 
> #970, DRILL-5832: Migrate OperatorFixture to use SystemOptionManager rather 
> than mock 
> #978: DRILL-5842: Refactor and simplify the fragment, operator contexts for 
> testing
> 
> The following is not needed for 1.12 per-se, but is the foundation for a 
> project I’m working on; would be good to get this in after 2-3 months of 
> review time:
> 
> #921, foundation for batch size limitation
> 
> The key issue with each of the above is that they each need a committer to 
> review. Some have reviews from non-committers. Any volunteers?
> 
> Thanks,
> 
> - Paul
> 
>> On Oct 9, 2017, at 9:38 AM, Charles Givre  wrote:
>> 
>> Hi Arina,
>> I’d like to include Drill-5834, Adding network functions.
>> https://github.com/apache/drill/pull/971 
>> 
>> 
>> Hopefully I didn’t violate too many coding standards this time ;-)
>> —C
>> 
>>> On Oct 9, 2017, at 9:09 AM, Arina Yelchiyeva  
>>> wrote:
>>> 
>>> I want to include DRILL-5337 (OpenTSDB plugin), I'll try to finish code
>>> review during this week.
>>> 
>>> Kind regards
>>> Arina
>>> 
>>> On Mon, Oct 9, 2017 at 4:08 PM, Arina Ielchiieva  wrote:
>>> 
 Hi Drillers,
 
 It's been several months since the last release and it is time to do the
 next one. I am volunteering to be the release manager.
 
 If there are any issues on which work is in progress, that you feel we
 *must* include in the release, please post in reply to this thread. Based
 on your input we'll define release cut off date.
 
 Kind regards
 Arina
 
>> 
> 



[GitHub] drill issue #949: DRILL-5795: Parquet Filter push down at rowgroup level

2017-10-09 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/949
  
This change causes one of our functional tests to fail. We will have to 
track down the issue and either update the test, or post the problem here.


---


Re: [DISCUSS] Drill 1.12.0 release

2017-10-09 Thread Paul Rogers
Hi Arina,

I’d like to include the following that are needed to finish up the “managed” 
sort and spill-to-disk for hash agg:

#928: DRILL-5716: Queue-driven memory allocation 
#958, DRILL-5808: Reduce memory allocator strictness for "managed" operators 
#960, DRILL-5815: Option to set query memory as percent of total 

The following is needed to resolve issues with HBase support in empty batches:

#968, DRILL-5830: Resolve regressions to MapR DB from DRILL-5546 

The following are nice-to-haves that build on work already done in this 
release, and that some of my own work depends on:

#970, DRILL-5832: Migrate OperatorFixture to use SystemOptionManager rather 
than mock 
#978: DRILL-5842: Refactor and simplify the fragment, operator contexts for 
testing

The following is not needed for 1.12 per-se, but is the foundation for a 
project I’m working on; would be good to get this in after 2-3 months of review 
time:

#921, foundation for batch size limitation

The key issue with each of the above is that they each need a committer to 
review. Some have reviews from non-committers. Any volunteers?

Thanks,

- Paul

> On Oct 9, 2017, at 9:38 AM, Charles Givre  wrote:
> 
> Hi Arina,
> I’d like to include Drill-5834, Adding network functions.
> https://github.com/apache/drill/pull/971 
> 
> 
> Hopefully I didn’t violate too many coding standards this time ;-)
> —C
> 
>> On Oct 9, 2017, at 9:09 AM, Arina Yelchiyeva  
>> wrote:
>> 
>> I want to include DRILL-5337 (OpenTSDB plugin), I'll try to finish code
>> review during this week.
>> 
>> Kind regards
>> Arina
>> 
>> On Mon, Oct 9, 2017 at 4:08 PM, Arina Ielchiieva  wrote:
>> 
>>> Hi Drillers,
>>> 
>>> It's been several months since the last release and it is time to do the
>>> next one. I am volunteering to be the release manager.
>>> 
>>> If there are any issues on which work is in progress, that you feel we
>>> *must* include in the release, please post in reply to this thread. Based
>>> on your input we'll define release cut off date.
>>> 
>>> Kind regards
>>> Arina
>>> 
> 



Re: [DISCUSS] Drill 1.12.0 release

2017-10-09 Thread Charles Givre
Hi Arina,
I’d like to include Drill-5834, Adding network functions.
https://github.com/apache/drill/pull/971 


Hopefully I didn’t violate too many coding standards this time ;-)
—C

> On Oct 9, 2017, at 9:09 AM, Arina Yelchiyeva  
> wrote:
> 
> I want to include DRILL-5337 (OpenTSDB plugin), I'll try to finish code
> review during this week.
> 
> Kind regards
> Arina
> 
> On Mon, Oct 9, 2017 at 4:08 PM, Arina Ielchiieva  wrote:
> 
>> Hi Drillers,
>> 
>> It's been several months since the last release and it is time to do the
>> next one. I am volunteering to be the release manager.
>> 
>> If there are any issues on which work is in progress, that you feel we
>> *must* include in the release, please post in reply to this thread. Based
>> on your input we'll define release cut off date.
>> 
>> Kind regards
>> Arina
>> 



[GitHub] drill issue #973: DRILL-5775: Select * query on a maprdb binary table fails

2017-10-09 Thread vdiravka
Github user vdiravka commented on the issue:

https://github.com/apache/drill/pull/973
  
@paul-rogers Those changes are connected to MapR-DB JSON tables, but this PR 
is connected to MapR-DB binary tables, so it looks like the functionality of the 
two PRs does not overlap.

The branch has been rebased onto the latest master.


---


Re: [DISCUSS] Drill 1.12.0 release

2017-10-09 Thread Arina Yelchiyeva
I want to include DRILL-5337 (OpenTSDB plugin), I'll try to finish code
review during this week.

Kind regards
Arina

On Mon, Oct 9, 2017 at 4:08 PM, Arina Ielchiieva  wrote:

> Hi Drillers,
>
> It's been several months since the last release and it is time to do the
> next one. I am volunteering to be the release manager.
>
> If there are any issues on which work is in progress, that you feel we
> *must* include in the release, please post in reply to this thread. Based
> on your input we'll define release cut off date.
>
> Kind regards
> Arina
>


[DISCUSS] Drill 1.12.0 release

2017-10-09 Thread Arina Ielchiieva
Hi Drillers,

It's been several months since the last release and it is time to do the
next one. I am volunteering to be the release manager.

If there are any issues on which work is in progress, that you feel we
*must* include in the release, please post in reply to this thread. Based
on your input we'll define release cut off date.

Kind regards
Arina


[GitHub] drill issue #892: DRILL-5645: negation of expression causes null pointer exc...

2017-10-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/892
  
+1, LGTM.


---


[GitHub] drill issue #980: DRILL-5857: Fix NumberFormatException in Hive unit tests

2017-10-09 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/980
  
+1, LGTM.


---


[GitHub] drill pull request #980: DRILL-5857: Fix NumberFormatException in Hive unit ...

2017-10-09 Thread vvysotskyi
GitHub user vvysotskyi opened a pull request:

https://github.com/apache/drill/pull/980

DRILL-5857: Fix NumberFormatException in Hive unit tests

There is no unit test since, with or without this change, the tests pass 
and the query plan does not change. 

The exception that appeared was caught and written to the logs.
After the check on [this 
line](https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveMetadataProvider.java#L90),
 correct stats were created by calling the 
[getStatsEstimateFromInputSplits()](https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/HiveMetadataProvider.java#L95)
 method.
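The failure mode behind this fix is that a stats property can be absent, and
`Long.valueOf(null)` throws `NumberFormatException`. A hedged sketch of the
defensive pattern follows; the method name, the `"numRows"` key, and the `-1`
sentinel are illustrative, not the actual `HiveMetadataProvider` API.

```java
import java.util.Properties;

public class HiveStatsSketch {
  public static long statFromProps(Properties props, String key) {
    String value = props.getProperty(key);
    if (value == null) {
      return -1;  // unknown: caller falls back to an input-split based estimate
    }
    try {
      return Long.parseLong(value.trim());
    } catch (NumberFormatException e) {
      return -1;  // malformed stat: same fallback
    }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println(statFromProps(props, "numRows"));   // prints -1
    props.setProperty("numRows", "42");
    System.out.println(statFromProps(props, "numRows"));   // prints 42
  }
}
```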

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vvysotskyi/drill DRILL-5857

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/980.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #980


commit 24be497469a976fb51c16c0474774229e44fcc71
Author: Volodymyr Vysotskyi 
Date:   2017-10-09T14:28:42Z

DRILL-5857: Fix NumberFormatException in Hive unit tests




---


[jira] [Created] (DRILL-5857) Fix NumberFormatException in Hive unit tests

2017-10-09 Thread Volodymyr Vysotskyi (JIRA)
Volodymyr Vysotskyi created DRILL-5857:
--

 Summary: Fix NumberFormatException in Hive unit tests
 Key: DRILL-5857
 URL: https://issues.apache.org/jira/browse/DRILL-5857
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Hive
Reporter: Volodymyr Vysotskyi
Assignee: Volodymyr Vysotskyi
Priority: Minor


Though all unit tests pass, it seems that a number of Hive tests have errors:
{noformat}
11:40:02.558 [262831fd-444d-57ee-b528-8a01a6a1c0a1:foreman] ERROR 
o.a.d.e.s.hive.HiveMetadataProvider - Failed to parse Hive stats in metastore.
java.lang.NumberFormatException: null
at java.lang.Long.parseLong(Long.java:404) ~[na:1.7.0_131]
at java.lang.Long.valueOf(Long.java:540) ~[na:1.7.0_131]
at 
org.apache.drill.exec.store.hive.HiveMetadataProvider.getStatsFromProps(HiveMetadataProvider.java:211)
 [classes/:na]
at 
org.apache.drill.exec.store.hive.HiveMetadataProvider.getStats(HiveMetadataProvider.java:100)
 [classes/:na]
at 
org.apache.drill.exec.store.hive.HiveScan.getScanStats(HiveScan.java:229) 
[classes/:na]
at 
org.apache.drill.exec.physical.base.AbstractGroupScan.getScanStats(AbstractGroupScan.java:79)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.planner.logical.DrillScanRel.computeSelfCost(DrillScanRel.java:159)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.calcite.rel.metadata.RelMdPercentageOriginalRows.getNonCumulativeCost(RelMdPercentageOriginalRows.java:162)
 [calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21]
at 
GeneratedMetadataHandler_NonCumulativeCost.getNonCumulativeCost_$(Unknown 
Source) [janino-2.7.6.jar:na]
at 
GeneratedMetadataHandler_NonCumulativeCost.getNonCumulativeCost(Unknown Source) 
[janino-2.7.6.jar:na]
at 
org.apache.calcite.rel.metadata.RelMetadataQuery.getNonCumulativeCost(RelMetadataQuery.java:258)
 [calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21]
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.getCost(VolcanoPlanner.java:1122)
 [calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21]
at 
org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements0(RelSubset.java:365)
 [calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21]
at 
org.apache.calcite.plan.volcano.RelSubset.propagateCostImprovements(RelSubset.java:348)
 [calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21]
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.addRelToSet(VolcanoPlanner.java:1840)
 [calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21]
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1772)
 [calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21]
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:1026)
 [calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21]
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1046)
 [calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21]
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:1953)
 [calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21]
at 
org.apache.calcite.plan.volcano.VolcanoRuleCall.transformTo(VolcanoRuleCall.java:138)
 [calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21]
at 
org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:213) 
[calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21]
at 
org.apache.drill.exec.planner.logical.DrillPushProjIntoScan.onMatch(DrillPushProjIntoScan.java:90)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:228)
 [calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21]
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:811)
 [calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21]
at 
org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:310) 
[calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:401)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:343)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRawDrel(DefaultSqlHandler.java:242)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:292)
 [drill-java-exec-1.12.0-SNAPSHOT.jar:1.12.0-SNAPSHOT]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:169)
 

[GitHub] drill pull request #976: DRILL-5797: Choose parquet reader from read columns

2017-10-09 Thread dprofeta
Github user dprofeta commented on a diff in the pull request:

https://github.com/apache/drill/pull/976#discussion_r143403559
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetScanBatchCreator.java ---
@@ -156,18 +160,39 @@ public ScanBatch getBatch(FragmentContext context, ParquetRowGroupScan rowGroupS
 return new ScanBatch(rowGroupScan, context, oContext, readers, implicitColumns);
   }
 
-  private static boolean isComplex(ParquetMetadata footer) {
-MessageType schema = footer.getFileMetaData().getSchema();
+  private static boolean isComplex(ParquetMetadata footer, List columns) {
+if (Utilities.isStarQuery(columns)) {
+  MessageType schema = footer.getFileMetaData().getSchema();
 
-for (Type type : schema.getFields()) {
-  if (!type.isPrimitive()) {
-return true;
+  for (Type type : schema.getFields()) {
+if (!type.isPrimitive()) {
+  return true;
+}
   }
-}
-for (ColumnDescriptor col : schema.getColumns()) {
-  if (col.getMaxRepetitionLevel() > 0) {
-return true;
+  for (ColumnDescriptor col : schema.getColumns()) {
+if (col.getMaxRepetitionLevel() > 0) {
+  return true;
+}
+  }
+  return false;
+} else {
+  for (SchemaPath column : columns) {
+if (isColumnComplex(footer.getFileMetaData().getSchema(), column)) {
+  return true;
+}
   }
+  return false;
+}
+  }
+
+  private static boolean isColumnComplex(GroupType grouptype, SchemaPath column) {
+PathSegment.NameSegment root = column.getRootSegment();
+if (!grouptype.containsField(root.getPath().toLowerCase())) {
+  return false;
+}
+Type type = grouptype.getType(root.getPath().toLowerCase());
+if (type.isRepetition(Type.Repetition.REPEATED) || !type.isPrimitive()) {
--- End diff --

Yes, sure. I wanted to check it in a loop first, but `ParquetRecordReader` 
doesn't handle any nested types, so the loop is not needed now. I just didn't 
refactor it enough.
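For illustration, the loop-based variant discussed above (walking every segment
of a dotted column path instead of only its root) could look like the following
sketch. The `Node` type is a simplified, hypothetical stand-in for Parquet's
`GroupType`/`Type` schema API, not the real thing.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class ColumnComplexitySketch {
  // Simplified stand-in for a Parquet schema field: either primitive or a
  // group with children; "repeated" marks REPEATED repetition.
  static final class Node {
    final boolean primitive;
    final boolean repeated;
    final Map<String, Node> children;
    Node(boolean primitive, boolean repeated, Map<String, Node> children) {
      this.primitive = primitive;
      this.repeated = repeated;
      this.children = children;
    }
  }

  // Walks the column path one segment at a time; any repeated or non-primitive
  // level makes the column "complex".
  public static boolean isColumnComplex(Map<String, Node> schema, List<String> path) {
    Map<String, Node> current = schema;
    for (String segment : path) {
      Node field = current.get(segment.toLowerCase());
      if (field == null) {
        return false;              // column not in the file: not complex
      }
      if (field.repeated || !field.primitive) {
        return true;               // repeated or nested at this level: complex
      }
      current = field.children;    // descend for the next path segment
    }
    return false;
  }

  public static void main(String[] args) {
    Map<String, Node> schema = Map.of(
        "a", new Node(false, false,
            Map.of("b", new Node(true, false, Collections.emptyMap()))),
        "c", new Node(true, false, Collections.emptyMap()));
    System.out.println(isColumnComplex(schema, List.of("a", "b")));  // true: "a" is a group
    System.out.println(isColumnComplex(schema, List.of("c")));       // false: flat primitive
  }
}
```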


---


[GitHub] drill pull request #976: DRILL-5797: Choose parquet reader from read columns

2017-10-09 Thread dprofeta
Github user dprofeta commented on a diff in the pull request:

https://github.com/apache/drill/pull/976#discussion_r143403657
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetScanBatchCreator.java ---
@@ -156,18 +160,39 @@ public ScanBatch getBatch(FragmentContext context, ParquetRowGroupScan rowGroupS
 return new ScanBatch(rowGroupScan, context, oContext, readers, implicitColumns);
   }
 
-  private static boolean isComplex(ParquetMetadata footer) {
-MessageType schema = footer.getFileMetaData().getSchema();
+  private static boolean isComplex(ParquetMetadata footer, List columns) {
+if (Utilities.isStarQuery(columns)) {
--- End diff --

Added in a new commit.


---


[GitHub] drill pull request #976: DRILL-5797: Choose parquet reader from read columns

2017-10-09 Thread dprofeta
Github user dprofeta commented on a diff in the pull request:

https://github.com/apache/drill/pull/976#discussion_r143403232
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetScanBatchCreator.java ---
@@ -156,18 +160,39 @@ public ScanBatch getBatch(FragmentContext context, ParquetRowGroupScan rowGroupS
 return new ScanBatch(rowGroupScan, context, oContext, readers, implicitColumns);
   }
 
-  private static boolean isComplex(ParquetMetadata footer) {
-MessageType schema = footer.getFileMetaData().getSchema();
+  private static boolean isComplex(ParquetMetadata footer, List columns) {
+if (Utilities.isStarQuery(columns)) {
+  MessageType schema = footer.getFileMetaData().getSchema();
 
-for (Type type : schema.getFields()) {
-  if (!type.isPrimitive()) {
-return true;
+  for (Type type : schema.getFields()) {
+if (!type.isPrimitive()) {
+  return true;
+}
   }
-}
-for (ColumnDescriptor col : schema.getColumns()) {
-  if (col.getMaxRepetitionLevel() > 0) {
-return true;
+  for (ColumnDescriptor col : schema.getColumns()) {
+if (col.getMaxRepetitionLevel() > 0) {
+  return true;
+}
+  }
+  return false;
+} else {
+  for (SchemaPath column : columns) {
+if (isColumnComplex(footer.getFileMetaData().getSchema(), column)) {
+  return true;
+}
   }
+  return false;
+}
+  }
+
+  private static boolean isColumnComplex(GroupType grouptype, SchemaPath column) {
+PathSegment.NameSegment root = column.getRootSegment();
+if (!grouptype.containsField(root.getPath().toLowerCase())) {
+  return false;
+}
+Type type = grouptype.getType(root.getPath().toLowerCase());
--- End diff --

OK for `getType()`: it throws an exception, so I will catch it.
I don't see any `getName()` in the `SchemaPath`/`PathSegment` classes. Can 
you tell me which `getName()` you mean?


---