[GitHub] drill pull request #1101: DRILL-6032: Made the batch sizing for HashAgg more...

2018-01-31 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1101#discussion_r165263759
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
 ---
@@ -215,6 +206,7 @@ public BatchHolder() {
   MaterializedField outputField = materializedValueFields[i];
   // Create a type-specific ValueVector for this value
   vector = TypeHelper.getNewVector(outputField, allocator);
+  int columnSize = new RecordBatchSizer.ColumnSize(vector).estSize;
--- End diff --

I can think of three reasons to use the sizer:

* Type logic is complex: we have multiple sets of rules depending on the 
data type. Best to encapsulate the logic in a single place. So, either 1) use 
the "sizer", or 2) move the logic from the "sizer" to a common utility.
* Column size is tricky as it depends on `DataMode`. The size of a 
`Required INT` is 4. The (total memory) size of an `Optional INT` is 5. For a 
`Repeated INT`? You need to know the average array cardinality, which the 
"sizer" provides (by analyzing an input batch.)
* As discussed, variable-width columns (`VARCHAR`, `VARBINARY` for HBase) 
have no known size. We really have to completely forget about that awful "50" 
estimate. We can only estimate size from input, which is, again, what the 
"sizer" does.

Of course, all the above only works if you actually sample the input.
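
To make the `DataMode` point concrete, here is a minimal sketch of a per-row width estimate for an INT column. The `Mode` enum and `estimateIntWidth` helper are hypothetical and are not the `RecordBatchSizer` API. Only the 4- and 5-byte figures come from the comment above; the 4-byte offset entry charged to the repeated case is an assumption made for illustration.

```java
public class IntWidthSketch {
  /** Hypothetical stand-in for Drill's DataMode. */
  public enum Mode { REQUIRED, OPTIONAL, REPEATED }

  static final int INT_WIDTH = 4;     // bytes of data per INT value
  static final int IS_SET_WIDTH = 1;  // one "is set" byte per nullable value
  static final int OFFSET_WIDTH = 4;  // assumed: one offset entry per row for arrays

  /**
   * Estimated bytes per row for an INT column. avgCardinality is consulted
   * only for REPEATED and has to come from a sampled input batch.
   */
  public static double estimateIntWidth(Mode mode, double avgCardinality) {
    switch (mode) {
      case REQUIRED:
        return INT_WIDTH;
      case OPTIONAL:
        return INT_WIDTH + IS_SET_WIDTH;
      case REPEATED:
        return OFFSET_WIDTH + avgCardinality * INT_WIDTH;
      default:
        throw new IllegalArgumentException("Unknown mode: " + mode);
    }
  }

  public static void main(String[] args) {
    System.out.println(estimateIntWidth(Mode.REQUIRED, 0));    // 4.0
    System.out.println(estimateIntWidth(Mode.OPTIONAL, 0));    // 5.0
    System.out.println(estimateIntWidth(Mode.REPEATED, 2.5));  // 14.0
  }
}
```

The repeated case is the one that cannot be answered from the type alone, which is why the cardinality has to be measured.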

A current limitation (and good enhancement) is that the Sizer is aware of 
just one batch. The sort (the first user of the "sizer") needed only aggregate 
row size, so it just kept track of the widest row ever seen. If you need 
detailed column information, you may want another layer: one that aggregates 
information across batches. (For arrays and variable-width columns, you can 
take the weighted average or the maximum depending on your needs.)
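
One possible shape for that extra layer is sketched below. `ColumnWidthAggregator` is a hypothetical class, not something that exists in Drill; it simply keeps a row-count-weighted average and a maximum of the per-batch average widths, so the caller can pick whichever suits its needs.

```java
public class ColumnWidthAggregator {
  private long totalRows;
  private double weightedWidthSum;
  private double maxWidth;

  /** Record one batch: its row count and the average column width (bytes) seen in it. */
  public void addBatch(int rowCount, double avgWidth) {
    totalRows += rowCount;
    weightedWidthSum += rowCount * avgWidth;
    maxWidth = Math.max(maxWidth, avgWidth);
  }

  /** Average width across all batches, weighted by each batch's row count. */
  public double weightedAverage() {
    return totalRows == 0 ? 0 : weightedWidthSum / totalRows;
  }

  /** Largest per-batch average width seen so far. */
  public double max() {
    return maxWidth;
  }

  public static void main(String[] args) {
    ColumnWidthAggregator agg = new ColumnWidthAggregator();
    agg.addBatch(1000, 10.0);
    agg.addBatch(3000, 30.0);
    System.out.println(agg.weightedAverage());  // 25.0
    System.out.println(agg.max());              // 30.0
  }
}
```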

Remember, if the purpose of this number is to estimate memory use, then you 
have to add a 33% (average) allowance for internal fragmentation. (Each vector 
is, on average, 75% full.)
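
As a quick check of that arithmetic: dividing the data size by a 75% fill factor is the same as adding one third on top. The helper name below is made up for illustration.

```java
public class FragmentationSketch {
  /** Average fill factor of a vector, per the comment above. */
  static final double AVG_FILL = 0.75;

  /** Memory estimate for a given payload size, including internal fragmentation. */
  public static long estimateMemory(long dataBytes) {
    return (long) Math.ceil(dataBytes / AVG_FILL);
  }

  public static void main(String[] args) {
    // 3000 bytes of data -> 4000 bytes of memory, i.e. a 33% allowance.
    System.out.println(estimateMemory(3000));  // 4000
  }
}
```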


---


Re: [ANNOUNCE] New PMC member: Paul Rogers

2018-01-31 Thread Paul Rogers
Thanks everyone! Glad to be of service.
Drill fills a unique niche: the ability to run distributed SQL queries on a 
large variety of data sources. I look forward to doing my part to bring that to 
a wider audience.
- Paul

 

On Wednesday, January 31, 2018, 7:21:06 PM PST, Saurabh Mahapatra 
 wrote:  
 
 Congrats Paul!

On January 31, 2018, at 3:52 PM, Vlad Rozov  wrote:

Congrats Paul!

Thank you,

Vlad

On 1/31/18 04:37, Vitalii Diravka wrote:
> Congratulations, Paul!
> Well deserved.
>
> Kind regards
> Vitalii
>
> On Wed, Jan 31, 2018 at 9:58 AM, Arina Yelchiyeva <
> arina.yelchiy...@gmail.com> wrote:
>
>> Congratulations, Paul!
>> Well deserved.
>>
>> Kind regards
>> Arina
>>
>> On Wed, Jan 31, 2018 at 10:13 AM, Robert Hou  wrote:
>>
>>> Congratulations, Paul!
>>>
>>>
>>> --Robert
>>>
>>> 
>>> From: Abhishek Girish 
>>> Sent: Tuesday, January 30, 2018 9:31 PM
>>> To: dev@drill.apache.org
>>> Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers
>>>
>>> Congratulations, Paul!
>>>
>>> On Tue, Jan 30, 2018 at 2:48 PM, Sorabh Hamirwasia >>
>>> wrote:
>>>
 Congratulations Paul!


 Thanks,
 Sorabh

 
 From: AnilKumar B 
 Sent: Tuesday, January 30, 2018 2:43:07 PM
 To: dev@drill.apache.org
 Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers

 Congratulations, Paul.

 Thanks & Regards,
 B Anil Kumar.

 On Tue, Jan 30, 2018 at 2:34 PM, Chunhui Shi  wrote:

> Congrats Paul! Well deserved!
>
> 
> From: Kunal Khatua 
> Sent: Tuesday, January 30, 2018 2:05:56 PM
> To: dev@drill.apache.org
> Subject: RE: [ANNOUNCE] New PMC member: Paul Rogers
>
> Congratulations, Paul !
>
> -Original Message-
> From: salim achouche [mailto:sachouc...@gmail.com]
> Sent: Tuesday, January 30, 2018 2:00 PM
> To: dev@drill.apache.org; Padma Penumarthy 
> Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers
>
> Congrats Paul!
>
> Regards,
> Salim
>
>> On Jan 30, 2018, at 1:58 PM, Padma Penumarthy <
>> ppenumar...@mapr.com>
> wrote:
>> Congratulations Paul.
>>
>> Thanks
>> Padma
>>
>>
>>> On Jan 30, 2018, at 1:55 PM, Gautam Parai 
>> wrote:
>>> Congratulations Paul!
>>>
>>> 
>>> From: Timothy Farkas 
>>> Sent: Tuesday, January 30, 2018 1:54:43 PM
>>> To: dev@drill.apache.org
>>> Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers
>>>
>>> Congrats!
>>>
>>> 
>>> From: Aman Sinha 
>>> Sent: Tuesday, January 30, 2018 1:50:07 PM
>>> To: dev@drill.apache.org
>>> Subject: [ANNOUNCE] New PMC member: Paul Rogers
>>>
>>> I am pleased to announce that Drill PMC invited Paul Rogers to the
>>> PMC and he has accepted the invitation.
>>>
>>> Congratulations Paul and thanks for your contributions !
>>>
>>> -Aman
>>> (on behalf of Drill PMC)
>

  

[GitHub] drill issue #1106: DRILL-6129: Fixed query failure due to nested column data...

2018-01-31 Thread paul-rogers
Github user paul-rogers commented on the issue:

https://github.com/apache/drill/pull/1106
  
Note that a similar bug was recently fixed in (as I recall) the Merge 
Receiver. As part of this fix, would be good to either:

1. Determine if we have more copies of this logic besides the Merge 
Receiver (previously fixed) and the client code (fixed here.)
2. Refactor the code so that all use cases use a common set of code for 
this task.

In any event, would be good to compare this code with that done in the 
Merge Receiver to ensure that we are using a common approach. See 
`exec/java-exec/src/main/java/org/apache/drill/exec/record/BatchSchema.java` in 
PR #968.

The two sets of code appear similar, depending on what `isSameSchema()` 
does with a list of `MaterializedField`s. But, please take a look.


---


[GitHub] drill pull request #1103: DRILL-6124: Fixed possible NPE when no injection s...

2018-01-31 Thread ilooner
Github user ilooner closed the pull request at:

https://github.com/apache/drill/pull/1103


---


[GitHub] drill issue #1103: DRILL-6124: Fixed possible NPE when no injection site is ...

2018-01-31 Thread ilooner
Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1103
  
You are right @arina-ielchiieva . Thanks for catching this, I will close 
the PR and mark the jira as invalid.


---


Re: [ANNOUNCE] New PMC member: Paul Rogers

2018-01-31 Thread Saurabh Mahapatra
Congrats Paul!

On January 31, 2018, at 3:52 PM, Vlad Rozov  wrote:

Congrats Paul!

Thank you,

Vlad

On 1/31/18 04:37, Vitalii Diravka wrote:
> Congratulations, Paul!
> Well deserved.
>
> Kind regards
> Vitalii
>
> On Wed, Jan 31, 2018 at 9:58 AM, Arina Yelchiyeva <
> arina.yelchiy...@gmail.com> wrote:
>
>> Congratulations, Paul!
>> Well deserved.
>>
>> Kind regards
>> Arina
>>
>> On Wed, Jan 31, 2018 at 10:13 AM, Robert Hou  wrote:
>>
>>> Congratulations, Paul!
>>>
>>>
>>> --Robert
>>>
>>> 
>>> From: Abhishek Girish 
>>> Sent: Tuesday, January 30, 2018 9:31 PM
>>> To: dev@drill.apache.org
>>> Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers
>>>
>>> Congratulations, Paul!
>>>
>>> On Tue, Jan 30, 2018 at 2:48 PM, Sorabh Hamirwasia >>
>>> wrote:
>>>
 Congratulations Paul!


 Thanks,
 Sorabh

 
 From: AnilKumar B 
 Sent: Tuesday, January 30, 2018 2:43:07 PM
 To: dev@drill.apache.org
 Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers

 Congratulations, Paul.

 Thanks & Regards,
 B Anil Kumar.

 On Tue, Jan 30, 2018 at 2:34 PM, Chunhui Shi  wrote:

> Congrats Paul! Well deserved!
>
> 
> From: Kunal Khatua 
> Sent: Tuesday, January 30, 2018 2:05:56 PM
> To: dev@drill.apache.org
> Subject: RE: [ANNOUNCE] New PMC member: Paul Rogers
>
> Congratulations, Paul !
>
> -Original Message-
> From: salim achouche [mailto:sachouc...@gmail.com]
> Sent: Tuesday, January 30, 2018 2:00 PM
> To: dev@drill.apache.org; Padma Penumarthy 
> Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers
>
> Congrats Paul!
>
> Regards,
> Salim
>
>> On Jan 30, 2018, at 1:58 PM, Padma Penumarthy <
>> ppenumar...@mapr.com>
> wrote:
>> Congratulations Paul.
>>
>> Thanks
>> Padma
>>
>>
>>> On Jan 30, 2018, at 1:55 PM, Gautam Parai 
>> wrote:
>>> Congratulations Paul!
>>>
>>> 
>>> From: Timothy Farkas 
>>> Sent: Tuesday, January 30, 2018 1:54:43 PM
>>> To: dev@drill.apache.org
>>> Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers
>>>
>>> Congrats!
>>>
>>> 
>>> From: Aman Sinha 
>>> Sent: Tuesday, January 30, 2018 1:50:07 PM
>>> To: dev@drill.apache.org
>>> Subject: [ANNOUNCE] New PMC member: Paul Rogers
>>>
>>> I am pleased to announce that Drill PMC invited Paul Rogers to the
>>> PMC and he has accepted the invitation.
>>>
>>> Congratulations Paul and thanks for your contributions !
>>>
>>> -Aman
>>> (on behalf of Drill PMC)
>



[GitHub] drill issue #1106: DRILL-6129: Fixed query failure due to nested column data...

2018-01-31 Thread priteshm
Github user priteshm commented on the issue:

https://github.com/apache/drill/pull/1106
  
@amansinha100 can you please review it?


---


[GitHub] drill pull request #1106: DRILL-6129: Fixed query failure due to nested colu...

2018-01-31 Thread sachouche
GitHub user sachouche opened a pull request:

https://github.com/apache/drill/pull/1106

DRILL-6129: Fixed query failure due to nested column data type change

Problem Description -
- The Drillbit was able to successfully send batches containing different 
metadata (for nested columns)
- This was the case when one or multiple scanners were involved
- The issue happened within the client where value vectors are cached 
across batches
- The load(...) API is responsible for updating value vectors when a new 
batch arrives
- The RecordBatchLoader class is used to detect schema changes; if this is 
the case, then previous value vectors are discarded and new ones created
- There is a bug in the current implementation where only first-level 
columns are compared

Fix -
- The fix is to improve the schema diff logic by including nested columns
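
The difference between a first-level and a nested comparison can be sketched as follows. This is only an illustration: `Field`, `sameTopLevel` and `sameDeep` are hypothetical names, and the code is not the actual change in this pull request nor Drill's `MaterializedField` API.

```java
import java.util.Arrays;
import java.util.List;

public class NestedSchemaDiffSketch {
  /** Hypothetical, simplified column description: name, type, nested children. */
  public static final class Field {
    final String name;
    final String type;
    final List<Field> children;

    public Field(String name, String type, Field... children) {
      this.name = name;
      this.type = type;
      this.children = Arrays.asList(children);
    }
  }

  /** Shallow check: compares only the name and type of the column itself. */
  public static boolean sameTopLevel(Field a, Field b) {
    return a.name.equals(b.name) && a.type.equals(b.type);
  }

  /** Deep check: additionally recurses into nested columns, in order. */
  public static boolean sameDeep(Field a, Field b) {
    if (!sameTopLevel(a, b) || a.children.size() != b.children.size()) {
      return false;
    }
    for (int i = 0; i < a.children.size(); i++) {
      if (!sameDeep(a.children.get(i), b.children.get(i))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Field before = new Field("field2", "MAP", new Field("child_field", "BIGINT"));
    Field after = new Field("field2", "MAP",
        new Field("child_field", "MAP", new Field("child_field_f1", "BIGINT")));
    System.out.println(sameTopLevel(before, after));  // true: the change goes unnoticed
    System.out.println(sameDeep(before, after));      // false: the nested change is detected
  }
}
```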

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sachouche/drill DRILL-6129

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1106.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1106


commit 9ffb41f509cd2531e7f3cdf89a66605ec0fdf7a4
Author: Salim Achouche 
Date:   2018-02-01T02:59:58Z

DRILL-6129: Fixed query failure due to nested column data type change




---


Build failed in Jenkins: drill-scm #930

2018-01-31 Thread Apache Jenkins Server
See 

Changes:

[bben-zvi] DRILL-6106: Use valueOf method instead of constructor since valueOf 
has

--
[...truncated 731.62 KB...]
Downloading: 
http://repo.dremio.com/release/com/stumbleupon/async/1.4.1/async-1.4.1.jar
Downloading: 
http://repo.dremio.com/release/org/apache/kudu/interface-annotations/1.3.0/interface-annotations-1.3.0.jar
 Downloading: 
http://repository.mapr.com/nexus/content/repositories/drill/org/apache/kudu/kudu-client/1.3.0/kudu-client-1.3.0.jar
Downloading: 
http://repository.mapr.com/nexus/content/repositories/drill/org/apache/kudu/interface-annotations/1.3.0/interface-annotations-1.3.0.jar
Downloading: 
http://repository.mapr.com/nexus/content/repositories/drill/com/stumbleupon/async/1.4.1/async-1.4.1.jar
 Downloading: 
https://repo.maven.apache.org/maven2/org/apache/kudu/kudu-client/1.3.0/kudu-client-1.3.0.jar
Downloading: 
https://repo.maven.apache.org/maven2/org/apache/kudu/interface-annotations/1.3.0/interface-annotations-1.3.0.jar
Downloading: 
https://repo.maven.apache.org/maven2/com/stumbleupon/async/1.4.1/async-1.4.1.jar
Downloaded: 
https://repo.maven.apache.org/maven2/org/apache/kudu/interface-annotations/1.3.0/interface-annotations-1.3.0.jar
 (21 KB at 743.3 KB/sec)
Downloaded: 
https://repo.maven.apache.org/maven2/com/stumbleupon/async/1.4.1/async-1.4.1.jar
 (18 KB at 706.0 KB/sec)

[jira] [Created] (DRILL-6129) Query fails on nested data type schema change

2018-01-31 Thread salim achouche (JIRA)
salim achouche created DRILL-6129:
-

 Summary: Query fails on nested data type schema change
 Key: DRILL-6129
 URL: https://issues.apache.org/jira/browse/DRILL-6129
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - CLI
Affects Versions: 1.10.0
Reporter: salim achouche
Assignee: salim achouche
 Fix For: 1.13.0


Use-Case -
 * Assume two parquet files with similar schemas except for a nested column
 * Schema file1
 ** int64 field1
 ** optional group field2
 *** optional group field2.1 (LIST)
 **** repeated group list
 ***** optional group element
 ****** optional int64 child_field
 * Schema file2
 ** int64 field1
 ** optional group field2
 *** optional group field2.1 (LIST)
 **** repeated group list
 ***** optional group element
 ****** optional group child_field
 ******* optional int64 child_field_f1
 ******* optional int64 child_field_f1
 * Essentially child_field changed from an int64 to a group of fields

 

Observed Query Failure

select * from ;
Error: Unexpected RuntimeException: java.lang.IllegalArgumentException: The 
field $bits$(UINT1:REQUIRED) doesn't match the provided metadata major_type {
  minor_type: MAP
  mode: REQUIRED

Note that selecting one file at a time succeeds, which seems to indicate that 
the issue lies in the schema change logic. 

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] drill pull request #1099: DRILL-6106: Use valueOf method instead of construc...

2018-01-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/1099


---


Re: [ANNOUNCE] New PMC member: Paul Rogers

2018-01-31 Thread Vlad Rozov

Congrats Paul!

Thank you,

Vlad

On 1/31/18 04:37, Vitalii Diravka wrote:

Congratulations, Paul!
Well deserved.

Kind regards
Vitalii

On Wed, Jan 31, 2018 at 9:58 AM, Arina Yelchiyeva <
arina.yelchiy...@gmail.com> wrote:


Congratulations, Paul!
Well deserved.

Kind regards
Arina

On Wed, Jan 31, 2018 at 10:13 AM, Robert Hou  wrote:


Congratulations, Paul!


--Robert


From: Abhishek Girish 
Sent: Tuesday, January 30, 2018 9:31 PM
To: dev@drill.apache.org
Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers

Congratulations, Paul!

On Tue, Jan 30, 2018 at 2:48 PM, Sorabh Hamirwasia 
wrote:

Congratulations Paul!


Thanks,
Sorabh


From: AnilKumar B 
Sent: Tuesday, January 30, 2018 2:43:07 PM
To: dev@drill.apache.org
Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers

Congratulations, Paul.

Thanks & Regards,
B Anil Kumar.

On Tue, Jan 30, 2018 at 2:34 PM, Chunhui Shi  wrote:


Congrats Paul! Well deserved!


From: Kunal Khatua 
Sent: Tuesday, January 30, 2018 2:05:56 PM
To: dev@drill.apache.org
Subject: RE: [ANNOUNCE] New PMC member: Paul Rogers

Congratulations, Paul !

-Original Message-
From: salim achouche [mailto:sachouc...@gmail.com]
Sent: Tuesday, January 30, 2018 2:00 PM
To: dev@drill.apache.org; Padma Penumarthy 
Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers

Congrats Paul!

Regards,
Salim


On Jan 30, 2018, at 1:58 PM, Padma Penumarthy <

ppenumar...@mapr.com>

wrote:

Congratulations Paul.

Thanks
Padma



On Jan 30, 2018, at 1:55 PM, Gautam Parai 

wrote:

Congratulations Paul!


From: Timothy Farkas 
Sent: Tuesday, January 30, 2018 1:54:43 PM
To: dev@drill.apache.org
Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers

Congrats!


From: Aman Sinha 
Sent: Tuesday, January 30, 2018 1:50:07 PM
To: dev@drill.apache.org
Subject: [ANNOUNCE] New PMC member: Paul Rogers

I am pleased to announce that Drill PMC invited Paul Rogers to the
PMC and he has accepted the invitation.

Congratulations Paul and thanks for your contributions !

-Aman
(on behalf of Drill PMC)






Re: [ANNOUNCE] New PMC member: Paul Rogers

2018-01-31 Thread Karthikeyan Manivannan
Congrats !



On January 31, 2018 at 1:54:20 PM, Khurram Faraaz 
(kfar...@mapr.com) wrote:

Congratulations Paul.


From: Vova Vysotskyi 
Sent: Wednesday, January 31, 2018 12:51:25 PM
To: dev@drill.apache.org
Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers

Congrats, Paul!

2018-01-31 22:40 GMT+02:00 Rob Wu :

> Congratulations, Paul!
>
> Best regards,
>
> Rob
>
> Best regards,
>
> Rob
> 
> From: Charles Givre 
> Sent: Wednesday, January 31, 2018 9:47:25 AM
> To: dev@drill.apache.org
> Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers
>
> Congrats Paul! Very well deserved!
>
> > On Jan 30, 2018, at 16:50, Aman Sinha  wrote:
> >
> > I am pleased to announce that Drill PMC invited Paul Rogers to the PMC
> and
> > he has accepted the invitation.
> >
> > Congratulations Paul and thanks for your contributions !
> >
> > -Aman
> > (on behalf of Drill PMC)
>
>


--
Kind regards,
Volodymyr Vysotskyi


Re: Batch handling project next steps

2018-01-31 Thread Karthikeyan Manivannan
Hi Paul,

Looking at your PR plan, I am guessing that the CSV reader changes and the JSON 
reader changes are independent, right?
If yes, I would like to propose that you open a PR for the CSV reader first. 
That will give us one more way to start exercising and maturing the accessor 
framework while the other layers are changing to adapt to it.

Thanks.

Karthik


On January 29, 2018 at 11:04:05 PM, Paul Rogers 
(par0...@yahoo.com.invalid) wrote:

Hi All,
Let's discuss the next step for the "batch handling" project. [1]
Thanks to Aman for committing the "hygiene" PR.
I'm rebasing the remaining code on the updated master. I'll keep the 
"RowSetRev3" [2] branch unchanged since [1] has many links to it. Instead I'll 
create a new branch.
It seems to work best to keep the PRs small as outlined in [3]. So, I'm 
thinking to carve off just the metadata enhancements for the next PR. (Or, if 
that is awkward, I'll slice off some other small piece.) Doing this work is a 
bit tedious because I'll have to do temporary edits to files outside the merged 
changes; and those changes will show up as code conflicts in later PRs. Still, 
this bit-by-bit approach may work better than a single monster PR.
Thoughts or suggestions?
Thanks,
- Paul
[1] https://github.com/paul-rogers/drill/wiki/Batch-Handling-Upgrades
[2] https://github.com/paul-rogers/drill/tree/RowSetRev3
[3] https://github.com/paul-rogers/drill/wiki/BH-Code-Intro#pull-request-plan



Re: [ANNOUNCE] New PMC member: Paul Rogers

2018-01-31 Thread Khurram Faraaz
Congratulations Paul.


From: Vova Vysotskyi 
Sent: Wednesday, January 31, 2018 12:51:25 PM
To: dev@drill.apache.org
Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers

Congrats, Paul!

2018-01-31 22:40 GMT+02:00 Rob Wu :

> Congratulations, Paul!
>
> Best regards,
>
> Rob
>
> Best regards,
>
> Rob
> 
> From: Charles Givre 
> Sent: Wednesday, January 31, 2018 9:47:25 AM
> To: dev@drill.apache.org
> Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers
>
> Congrats Paul!  Very well deserved!
>
> > On Jan 30, 2018, at 16:50, Aman Sinha  wrote:
> >
> > I am pleased to announce that Drill PMC invited Paul Rogers to the PMC
> and
> > he has accepted the invitation.
> >
> > Congratulations Paul and thanks for your contributions !
> >
> > -Aman
> > (on behalf of Drill PMC)
>
>


--
Kind regards,
Volodymyr Vysotskyi


[jira] [Created] (DRILL-6128) Wrong Result with Nested Loop Join

2018-01-31 Thread Sorabh Hamirwasia (JIRA)
Sorabh Hamirwasia created DRILL-6128:


 Summary: Wrong Result with Nested Loop Join
 Key: DRILL-6128
 URL: https://issues.apache.org/jira/browse/DRILL-6128
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Reporter: Sorabh Hamirwasia
Assignee: Sorabh Hamirwasia


Nested Loop Join produces wrong results if there are multiple batches on the 
right side. It builds an ExpandableHyperContainer to hold all the right-side 
batches. Then, for each record on the left-side input, it evaluates the 
condition against all records on the right side and emits the output if the 
condition is satisfied. The main loop inside 
[populateOutgoingBatch|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/NestedLoopJoinTemplate.java#L106]
 calls *doEval* with the correct indexes to evaluate records on both sides. In 
the generated code of *doEval*, for some reason a right shift of 16 is applied 
to the rightBatchIndex (sample shared below).
{code:java}
public boolean doEval(int leftIndex, int rightBatchIndex, int 
rightRecordIndexWithinBatch)
 throws SchemaChangeException
{
  {
   IntHolder out3 = new IntHolder();
   {
 out3 .value = vv0 .getAccessor().get((leftIndex));
   }
   IntHolder out7 = new IntHolder();
   {
 out7 .value =  
 
vv4[((rightBatchIndex)>>>16)].getAccessor().get(((rightRecordIndexWithinBatch)& 
65535));
   }

..
..
}{code}
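
The effect of that shift on a plain batch index can be reproduced in isolation. The sketch below is standalone Java, not Drill code. The `pack` helper reflects an assumption that the shift was written for a composite index carrying the batch number in the upper 16 bits and the record number in the lower 16 bits; the report itself does not state where the shift comes from.

```java
public class BatchIndexShiftSketch {
  /** Packs a batch index (upper 16 bits) and a record index (lower 16 bits) into one int. */
  public static int pack(int batchIndex, int recordIndex) {
    return (batchIndex << 16) | (recordIndex & 0xFFFF);
  }

  /** Batch lookup as it appears in the generated code: unsigned shift right by 16. */
  public static int batchFromShift(int index) {
    return index >>> 16;
  }

  public static void main(String[] args) {
    // A plain batch index collapses to 0 under the shift ...
    System.out.println(batchFromShift(1));           // 0
    System.out.println(batchFromShift(65535));       // 0
    // ... whereas a packed composite index survives it.
    System.out.println(batchFromShift(pack(1, 0)));  // 1
  }
}
```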
 

When the actual loop is processing the second batch, the index with the right 
shift applied becomes 0 inside the eval method, so it ends up evaluating the 
condition against the first right batch again. So if there is more than one 
batch (up to 65535) on the right side, doEval will always consider the first 
batch for condition evaluation. But the output data will be based on the 
correct batch, so there will be issues like OutOfBound and WrongData. Cases 
can be:

Let's say: *rightBatchIndex*: index of right batch to consider, 
*rightRecordIndexWithinBatch*: index of record in right batch at rightBatchIndex

1) The first right batch comes with zero data and with OK_NEW_SCHEMA (let's say 
because of a filter in the operator tree). The next right batch has > 0 data. 
So when we call doEval for the second batch (*rightBatchIndex = 1*) and the 
first record in it (i.e. *rightRecordIndexWithinBatch = 0*), the actual 
evaluation will happen using the first batch (since *rightBatchIndex >>> 16 = 
0*). On accessing the record at *rightRecordIndexWithinBatch* in the first 
batch, it will throw *IndexOutOfBoundsException* since the first batch has no 
records.

2) Let's say there are 2 batches on the right side. Also let's say the first 
batch contains 3 records (with id_right=1/2/3) and the 2nd batch also contains 
3 records (with id_right=10/20/30). Also let's say there is 1 batch on the left 
side with 3 records (with id_left=1/2/3). Then in this case the NestedLoopJoin 
(with equality condition) will end up producing 6 records instead of 3. It 
produces the first 3 records based on matches between the left records and the 
first right batch records. But while processing the 2nd right batch, it will 
evaluate id_left=id_right against the first batch instead, will again find 
matches, and will produce another 3 records. *Example:*

*Left Batch Data:*

 
{code:java}
Batch1:

{
 "id_left": 1,
 "cost_left": 11,
 "name_left": "item11"
}
{
 "id_left": 2,
 "cost_left": 21,
 "name_left": "item21"
}
{
 "id_left": 3,
 "cost_left": 31,
 "name_left": "item31"
}{code}
 

*Right Batch Data:*

 
{code:java}
Batch 1:
{
 "id_right": 1,
 "cost_right": 10,
 "name_right": "item1"
}
{
 "id_right": 2,
 "cost_right": 20,
 "name_right": "item2"
}
{
 "id_right": 3,
 "cost_right": 30,
 "name_right": "item3"
}
{code}
 

 
{code:java}
Batch 2:
{
 "id_right": 4,
 "cost_right": 40,
 "name_right": "item4"
}
{
 "id_right": 4,
 "cost_right": 40,
 "name_right": "item4"
}
{
 "id_right": 4,
 "cost_right": 40,
 "name_right": "item4"
}{code}
 

*Produced output:*
{code:java}
{
 "id_left": 1,
 "cost_left": 11,
 "name_left": "item11",
 "id_right": 1,
 "cost_right": 10,
 "name_right": "item1"
}
{
 "id_left": 1,
 "cost_left": 11,
 "name_left": "item11",
 "id_right": 4,
 "cost_right": 40,
 "name_right": "item4"
}
{
 "id_left": 2,
 "cost_left": 21,
 "name_left": "item21",
 "id_right": 2,
 "cost_right": 20,
 "name_right": "item2"
}
{
 "id_left": 2,
 "cost_left": 21,
 "name_left": "item21",
 "id_right": 4,
 "cost_right": 40,
 "name_right": "item4"
}
{
 "id_left": 3,
 "cost_left": 31,
 "name_left": "item31",
 "id_right": 3,
 "cost_right": 30,
 "name_right": "item3"
}
{
 "id_left": 3,
 "cost_left": 31,
 "name_left": "item31",
 "id_right": 4,
 "cost_right": 40,
 "name_right": "item4"
}
}{code}
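
The doubled output can be reproduced with a small standalone simulation. This is a sketch, not Drill's NestedLoopJoinTemplate: `countMatches` is a hypothetical helper that runs the same triple loop over plain int arrays and, when asked to, reads the condition from batch (b >>> 16) as the generated code does.

```java
public class NestedLoopJoinBugSketch {
  /**
   * Counts id_left = id_right matches. The loop bounds always follow the real
   * right batch b; with buggyShift the condition reads batch (b >>> 16) instead.
   */
  public static int countMatches(int[] left, int[][] rightBatches, boolean buggyShift) {
    int matches = 0;
    for (int l : left) {
      for (int b = 0; b < rightBatches.length; b++) {
        int evalBatch = buggyShift ? (b >>> 16) : b;
        for (int r = 0; r < rightBatches[b].length; r++) {
          if (l == rightBatches[evalBatch][r]) {
            matches++;
          }
        }
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    int[] left = {1, 2, 3};
    int[][] right = {{1, 2, 3}, {4, 4, 4}};  // id values from the example data above
    System.out.println(countMatches(left, right, true));   // 6
    System.out.println(countMatches(left, right, false));  // 3
  }
}
```

With an empty first right batch followed by a non-empty one, the buggy variant indexes into the empty array and throws ArrayIndexOutOfBoundsException, which mirrors case 1.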
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] New PMC member: Paul Rogers

2018-01-31 Thread Vova Vysotskyi
Congrats, Paul!

2018-01-31 22:40 GMT+02:00 Rob Wu :

> Congratulations, Paul!
>
> Best regards,
>
> Rob
>
> Best regards,
>
> Rob
> 
> From: Charles Givre 
> Sent: Wednesday, January 31, 2018 9:47:25 AM
> To: dev@drill.apache.org
> Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers
>
> Congrats Paul!  Very well deserved!
>
> > On Jan 30, 2018, at 16:50, Aman Sinha  wrote:
> >
> > I am pleased to announce that Drill PMC invited Paul Rogers to the PMC
> and
> > he has accepted the invitation.
> >
> > Congratulations Paul and thanks for your contributions !
> >
> > -Aman
> > (on behalf of Drill PMC)
>
>


-- 
Kind regards,
Volodymyr Vysotskyi


Re: [ANNOUNCE] New PMC member: Paul Rogers

2018-01-31 Thread Rob Wu
Congratulations, Paul!

Best regards,

Rob

Best regards,

Rob

From: Charles Givre 
Sent: Wednesday, January 31, 2018 9:47:25 AM
To: dev@drill.apache.org
Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers

Congrats Paul!  Very well deserved!

> On Jan 30, 2018, at 16:50, Aman Sinha  wrote:
>
> I am pleased to announce that Drill PMC invited Paul Rogers to the PMC and
> he has accepted the invitation.
>
> Congratulations Paul and thanks for your contributions !
>
> -Aman
> (on behalf of Drill PMC)



[GitHub] drill issue #1099: DRILL-6106: Use valueOf method instead of constructor sin...

2018-01-31 Thread vrozov
Github user vrozov commented on the issue:

https://github.com/apache/drill/pull/1099
  
@reudismam Travis fails in other PRs as well. See #1105.


---


[GitHub] drill pull request #1101: DRILL-6032: Made the batch sizing for HashAgg more...

2018-01-31 Thread ilooner
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1101#discussion_r165168294
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
 ---
@@ -84,13 +85,6 @@
 public abstract class HashAggTemplate implements HashAggregator {
   protected static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(HashAggregator.class);
 
-  private static final int VARIABLE_MAX_WIDTH_VALUE_SIZE = 50;
-  private static final int VARIABLE_MIN_WIDTH_VALUE_SIZE = 8;
-
-  private static final boolean EXTRA_DEBUG_1 = false;
--- End diff --

Oh but there is! slf4j and logback have a feature called markers, which 
allows you to associate a tag with a statement. When you print logs you can 
specify to filter by level and by marker. There is a working example here 
https://examples.javacodegeeks.com/enterprise-java/slf4j/slf4j-markers-example/ 
. I will update the log statements to use markers in this PR.


---


[GitHub] drill pull request #1101: DRILL-6032: Made the batch sizing for HashAgg more...

2018-01-31 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1101#discussion_r165166234
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
 ---
@@ -255,7 +254,6 @@ private HashAggregator createAggregatorInternal() 
throws SchemaChangeException,
   groupByOutFieldIds[i] = container.add(vv);
 }
 
-int extraNonNullColumns = 0; // each of SUM, MAX and MIN gets an extra 
bigint column
--- End diff --

Maybe do this work as a separate PR (for DRILL-5728) ?  Else it would delay 
this PR, and overload it ...


---


[GitHub] drill pull request #1101: DRILL-6032: Made the batch sizing for HashAgg more...

2018-01-31 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/1101#discussion_r164617150
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
 ---
@@ -84,13 +85,6 @@
 public abstract class HashAggTemplate implements HashAggregator {
   protected static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(HashAggregator.class);
 
-  private static final int VARIABLE_MAX_WIDTH_VALUE_SIZE = 50;
-  private static final int VARIABLE_MIN_WIDTH_VALUE_SIZE = 8;
-
-  private static final boolean EXTRA_DEBUG_1 = false;
--- End diff --

The logging framework only gives error/warning/debug/trace ... there is no 
option for a user configurable level 


---


[GitHub] drill pull request #1101: DRILL-6032: Made the batch sizing for HashAgg more...

2018-01-31 Thread ilooner
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1101#discussion_r165161146
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggBatch.java
 ---
@@ -255,7 +254,6 @@ private HashAggregator createAggregatorInternal() 
throws SchemaChangeException,
   groupByOutFieldIds[i] = container.add(vv);
 }
 
-int extraNonNullColumns = 0; // each of SUM, MAX and MIN gets an extra 
bigint column
--- End diff --

Thanks for catching this. Then we should fix the underlying problem instead 
of passing around additional parameters to work around the issue. I will work 
on fixing the codegen for the BatchHolder as part of this PR.


---


[GitHub] drill pull request #1101: DRILL-6032: Made the batch sizing for HashAgg more...

2018-01-31 Thread ppadma
Github user ppadma commented on a diff in the pull request:

https://github.com/apache/drill/pull/1101#discussion_r165156589
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
 ---
@@ -215,6 +206,7 @@ public BatchHolder() {
   MaterializedField outputField = materializedValueFields[i];
   // Create a type-specific ValueVector for this value
   vector = TypeHelper.getNewVector(outputField, allocator);
+  int columnSize = new RecordBatchSizer.ColumnSize(vector).estSize;
--- End diff --

@ilooner That is the point. If we know the exact value, why do we need 
RecordBatchSizer? We should use RecordBatchSizer when we need sizing 
information for a batch (in most cases, the incoming batch). In this case, you 
are allocating memory for value vectors for the batch you are building. For 
fixed-width columns, you can get the column width for each type you are 
allocating memory for using TypeHelper.getSize. For variable-width columns, 
TypeHelper.getSize assumes 50 bytes. If you want to adjust the memory you 
allocate for variable-width columns in the outgoing batch based on the incoming 
batch, that's when you use RecordBatchSizer on the actual incoming batch to 
figure out the average size of that column. You can also use RecordBatchSizer 
on the incoming batch to figure out how many values to allocate memory for in 
the outgoing batch. Note that, with your change, variable-width columns in 
just-created value vectors will return an estSize of 1, which is not what you 
want.
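
The decision described above can be sketched roughly as follows. This is a 
standalone illustration, not Drill code: `ColumnWidthEstimator`, 
`estimateWidth`, and the convention of passing -1 for variable-width types are 
all assumptions made for the example. Only the 50-byte fallback comes from the 
comment.

```java
import java.util.List;

/** Hypothetical helper, not part of Drill: picks a per-value width for an outgoing column. */
final class ColumnWidthEstimator {
    /** Static guess for variable-width columns when no incoming data was sampled. */
    static final int DEFAULT_VARIABLE_WIDTH = 50;

    /**
     * @param fixedWidth      byte width of a fixed-width type, or -1 for a variable-width type
     * @param observedLengths value lengths sampled from an incoming batch (may be empty)
     * @return the number of bytes to plan for per value
     */
    static int estimateWidth(int fixedWidth, List<Integer> observedLengths) {
        if (fixedWidth > 0) {
            return fixedWidth; // exact width is known from the type alone
        }
        if (observedLengths.isEmpty()) {
            return DEFAULT_VARIABLE_WIDTH; // nothing sampled: fall back to the static guess
        }
        long total = 0;
        for (int length : observedLengths) {
            total += length;
        }
        int count = observedLengths.size();
        // Round the average up so the allocation is not systematically too small.
        return (int) ((total + count - 1) / count);
    }
}
```

The point of the split is that a freshly created, empty vector carries no 
length information, so sampling it cannot improve on the static guess. Only an 
incoming batch with real data can.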


---


[GitHub] drill issue #1105: DRILL-6125: Fix possible memory leak when query is cancel...

2018-01-31 Thread ilooner
Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/1105
  
@sachouche @arina-ielchiieva 


---


[GitHub] drill pull request #1105: DRILL-6125: Fix possible memory leak when query is...

2018-01-31 Thread ilooner
GitHub user ilooner opened a pull request:

https://github.com/apache/drill/pull/1105

DRILL-6125: Fix possible memory leak when query is cancelled.

A detailed description of the problem and solution can be found here: 

https://issues.apache.org/jira/browse/DRILL-6125

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ilooner/drill DRILL-6125

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1105.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1105


commit 1d1725a276c058e8c09e456963bac928d1f062ed
Author: Timothy Farkas 
Date:   2018-01-30T23:55:41Z

DRILL-6125: Fix possible memory leak when query is cancelled.




---


[GitHub] drill issue #1099: DRILL-6106: Use valueOf method instead of constructor sin...

2018-01-31 Thread reudismam
Github user reudismam commented on the issue:

https://github.com/apache/drill/pull/1099
  
It only passes Travis CI if I remove the edits to SSLConfigClient.java.


---


[GitHub] drill pull request #1101: DRILL-6032: Made the batch sizing for HashAgg more...

2018-01-31 Thread ilooner
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1101#discussion_r165137630
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/spill/RecordBatchSizer.java
 ---
@@ -232,9 +251,8 @@ else if (width > 0) {
 }
   }
 
-  public static final int MAX_VECTOR_SIZE = ValueVector.MAX_BUFFER_SIZE; 
// 16 MiB
-
   private List columnSizes = new ArrayList<>();
+  private Map columnSizeMap = 
CaseInsensitiveMap.newHashMap();
--- End diff --

Thanks for the explanation here and on the dev list @paul-rogers. 


---


[GitHub] drill pull request #1101: DRILL-6032: Made the batch sizing for HashAgg more...

2018-01-31 Thread ilooner
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1101#discussion_r165136635
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
 ---
@@ -397,11 +384,9 @@ private void delayedSetup() {
 }
 numPartitions = BaseAllocator.nextPowerOfTwo(numPartitions); // in 
case not a power of 2
 
-if ( schema == null ) { estValuesBatchSize = estOutgoingAllocSize = 
estMaxBatchSize = 0; } // incoming was an empty batch
--- End diff --

All the unit and functional tests passed without an NPE. The null check was 
redundant because the code in **doWork** that calls **delayedSetup** sets the 
schema if it is null.

```
  // This would be called only once - first time actual data arrives on incoming
  if ( schema == null && incoming.getRecordCount() > 0 ) {
    this.schema = incoming.getSchema();
    currentBatchRecordCount = incoming.getRecordCount(); // initialize for first non empty batch
    // Calculate the number of partitions based on actual incoming data
    delayedSetup();
  }
```

So schema will never be null when delayedSetup is called.


---


[jira] [Created] (DRILL-6127) NullPointerException happens when submitting physical plan to the Hive storage plugin

2018-01-31 Thread Anton Gozhiy (JIRA)
Anton Gozhiy created DRILL-6127:
---

 Summary: NullPointerException happens when submitting physical 
plan to the Hive storage plugin
 Key: DRILL-6127
 URL: https://issues.apache.org/jira/browse/DRILL-6127
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.13.0
Reporter: Anton Gozhiy


*Prerequisites:*
*1.* Create some test table in Hive:
{code:sql}
create external table if not exists hive_storage.test (key string, value 
string) stored as parquet
location '/hive_storage/test';
insert into table test values ("key", "value");
{code}
*2.* Hive plugin config:

{code:json}
{
  "type": "hive",
  "enabled": true,
  "configProps": {
"hive.metastore.uris": "thrift://localhost:9083",
"fs.default.name": "maprfs:///",
"hive.metastore.sasl.enabled": "false"
  }
}
{code}

*Steps:*
*1.* From the Drill web UI, run the following query:
{code:sql}
explain plan for select * from hive.hive_storage.`test`
{code}

*2.* Copy the json part of the plan
*3.* On the Query page set checkbox to the PHYSICAL
*4.* Submit the copied plan  

*Expected result:*
Drill should return normal result: "key", "value"

*Actual result:*
NPE happens:
{noformat}
[Error Id: 8b45c27e-bddd-4552-b7ea-e5af6f40866a on node1:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
NullPointerException


[Error Id: 8b45c27e-bddd-4552-b7ea-e5af6f40866a on node1:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
 ~[drill-common-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
at 
org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:761)
 [drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
at 
org.apache.drill.exec.work.foreman.QueryStateProcessor.checkCommonStates(QueryStateProcessor.java:327)
 [drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
at 
org.apache.drill.exec.work.foreman.QueryStateProcessor.planning(QueryStateProcessor.java:223)
 [drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
at 
org.apache.drill.exec.work.foreman.QueryStateProcessor.moveToState(QueryStateProcessor.java:83)
 [drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:279) 
[drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[na:1.8.0_161]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[na:1.8.0_161]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
Caused by: org.apache.drill.exec.work.foreman.ForemanSetupException: Failure 
while parsing physical plan.
at 
org.apache.drill.exec.work.foreman.Foreman.parseAndRunPhysicalPlan(Foreman.java:393)
 [drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:257) 
[drill-java-exec-1.13.0-SNAPSHOT.jar:1.13.0-SNAPSHOT]
... 3 common frames omitted
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Instantiation 
of [simple type, class org.apache.drill.exec.store.hive.HiveScan] value failed 
(java.lang.NullPointerException): null
 at [Source: { "head" : { "version" : 1, "generator" : { "type" : 
"ExplainHandler", "info" : "" }, "type" : "APACHE_DRILL_PHYSICAL", "options" : 
[ ], "queue" : 0, "hasResourcePlan" : false, "resultMode" : "EXEC" }, "graph" : 
[ { "pop" : "hive-scan", "@id" : 2, "userName" : "mapr", "hive-table" : { 
"table" : { "tableName" : "test", "dbName" : "hive_storage", "owner" : "mapr", 
"createTime" : 1517417959, "lastAccessTime" : 0, "retention" : 0, "sd" : { 
"location" : "maprfs:/hive_storage/test", "inputFormat" : 
"org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat", "outputFormat" 
: "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat", 
"compressed" : false, "numBuckets" : -1, "serDeInfo" : { "name" : null, 
"serializationLib" : 
"org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe", "parameters" : { 
"serialization.format" : "1" } }, "sortCols" : [ ], "parameters" : { } }, 
"partitionKeys" : [ ], "parameters" : { "totalSize" : "0", "EXTERNAL" : "TRUE", 
"numRows" : "1", "rawDataSize" : "2", "COLUMN_STATS_ACCURATE" : "true", 
"numFiles" : "0", "transient_lastDdlTime" : "1517418363" }, "viewOriginalText" 
: null, "viewExpandedText" : null, "tableType" : "EXTERNAL_TABLE", 
"columnsCache" : { "keys" : [ [ { "name" : "key", "type" : "string", "comment" 
: null }, { "name" : "value", "type" : "string", "comment" : null } ] ] } }, 
"partitions" : null }, "columns" : [ "`key`", "`value`" ], "cost" : 0.0 }, { 
"pop" : "project", "@id" : 1, "exprs" : [ { "ref" : "`key`", "expr" : "`key`" 
}, { "ref" : "`value`", "expr" : "`value`" } ], "child" : 2, "outputProj" : 
true, "initialAllocation" : 100, "maxAllocation" : 100, 

[GitHub] drill pull request #1101: DRILL-6032: Made the batch sizing for HashAgg more...

2018-01-31 Thread ilooner
Github user ilooner commented on a diff in the pull request:

https://github.com/apache/drill/pull/1101#discussion_r165135291
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
 ---
@@ -215,6 +206,7 @@ public BatchHolder() {
   MaterializedField outputField = materializedValueFields[i];
   // Create a type-specific ValueVector for this value
   vector = TypeHelper.getNewVector(outputField, allocator);
+  int columnSize = new RecordBatchSizer.ColumnSize(vector).estSize;
--- End diff --

@ppadma I thought estSize represented the estimated column width. For 
FixedWidth vectors we know the exact column width, so why can't we use the 
exact value? Also, why are there two different things for measuring column 
sizes? When do you use RecordBatchSizer and when do you use TypeHelper? 


---


[GitHub] drill issue #897: Drill-5703 Added Syntax Highlighting and Limited Autocompl...

2018-01-31 Thread cgivre
Github user cgivre commented on the issue:

https://github.com/apache/drill/pull/897
  
Yes, it can be closed.  Nice work on the syntax highlighting!

> On Jan 29, 2018, at 16:46, Kunal Khatua  wrote:
> 
> @cgivre can we close this PR if #1043 met the objectives?




---


Re: [ANNOUNCE] New PMC member: Paul Rogers

2018-01-31 Thread Charles Givre
Congrats Paul!  Very well deserved!

> On Jan 30, 2018, at 16:50, Aman Sinha  wrote:
> 
> I am pleased to announce that Drill PMC invited Paul Rogers to the PMC and
> he has accepted the invitation.
> 
> Congratulations Paul and thanks for your contributions !
> 
> -Aman
> (on behalf of Drill PMC)



[GitHub] drill issue #1099: DRILL-6106: Use valueOf method instead of constructor sin...

2018-01-31 Thread reudismam
Github user reudismam commented on the issue:

https://github.com/apache/drill/pull/1099
  
I have squashed the commits, but I’m getting an error in Travis CI 
similar to the previous one when I reverted some changes.
Column a-offsets of type UInt4Vector: Offset (0) must be 0 but was 1



---


[GitHub] drill issue #916: DRILL-5377: Five-digit year dates are displayed incorrectl...

2018-01-31 Thread vdiravka
Github user vdiravka commented on the issue:

https://github.com/apache/drill/pull/916
  
@arina-ielchiieva You are right.
In line with the SQL spec, after 
[CALCITE-2055](https://issues.apache.org/jira/browse/CALCITE-2055) was resolved 
and Drill was upgraded to the new Calcite, neither Drill nor Calcite supports 
five-digit years. Please find more details in the Jira description.


---


[GitHub] drill pull request #916: DRILL-5377: Five-digit year dates are displayed inc...

2018-01-31 Thread vdiravka
Github user vdiravka closed the pull request at:

https://github.com/apache/drill/pull/916


---


[jira] [Resolved] (DRILL-5377) Five-digit year dates are displayed incorrectly via jdbc

2018-01-31 Thread Vitalii Diravka (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka resolved DRILL-5377.

Resolution: Not A Problem

[~vvysotskyi] Thank you.
So for now test cases from jira description will fail with:
{code}
java.sql.SQLException: SYSTEM ERROR: IllegalArgumentException: Year out of 
range: [11356]
{code}
This is an expected exception. Nothing should be fixed.

> Five-digit year dates are displayed incorrectly via jdbc
> 
>
> Key: DRILL-5377
> URL: https://issues.apache.org/jira/browse/DRILL-5377
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Parquet
>Affects Versions: 1.10.0
>Reporter: Rahul Challapalli
>Assignee: Vitalii Diravka
>Priority: Minor
> Fix For: 1.13.0
>
>
> git.commit.id.abbrev=38ef562
> The issue is connected to displaying five-digit year dates via jdbc
> Below is the output, I get from test framework when I disable auto correction 
> for date fields
> {code}
> select l_shipdate from table(cp.`tpch/lineitem.parquet` (type => 'parquet', 
> autoCorrectCorruptDates => false)) order by l_shipdate limit 10;
> ^@356-03-19
> ^@356-03-21
> ^@356-03-21
> ^@356-03-23
> ^@356-03-24
> ^@356-03-24
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> {code}
> Or a simpler case:
> {code}
> 0: jdbc:drill:> select cast('11356-02-16' as date) as FUTURE_DATE from 
> (VALUES(1));
> +--------------+
> | FUTURE_DATE  |
> +--------------+
> | 356-02-16   |
> +--------------+
> 1 row selected (0.293 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] drill issue #916: DRILL-5377: Five-digit year dates are displayed incorrectl...

2018-01-31 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/916
  
It seems that this PR is no longer relevant after the Calcite upgrade.
@vdiravka please confirm and close the PR.


---


[GitHub] drill issue #1104: DRILL-6118: Handle item star columns during project / fil...

2018-01-31 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1104
  
@chunhui-shi please review.


---


[GitHub] drill pull request #1104: DRILL-6118: Handle item star columns during projec...

2018-01-31 Thread arina-ielchiieva
GitHub user arina-ielchiieva opened a pull request:

https://github.com/apache/drill/pull/1104

DRILL-6118: Handle item star columns during project / filter push down and 
directory pruning

1. Added DrillFilterItemStarReWriterRule to re-write item star fields to 
regular field references.
2. Refactored DrillPushProjectIntoScanRule to handle item star fields, 
factored out helper classes and methods from PrelUtil.class.
3. Fixed issue with dynamic star usage (after Calcite upgrade old usage of 
star was still present, replaced WILDCARD -> DYNAMIC_STAR  for clarity).
4. Added unit tests to check project / filter push down and directory 
pruning with item star.

Details in [DRILL-6118](https://issues.apache.org/jira/browse/DRILL-6118).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arina-ielchiieva/drill DRILL-6118

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1104.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1104


commit 4673bfb593ca6422d58fa9e0e6eb281a69f1ed69
Author: Arina Ielchiieva 
Date:   2017-12-21T17:31:00Z

DRILL-6118: Handle item star columns during project / filter push down and 
directory pruning

1. Added DrillFilterItemStarReWriterRule to re-write item star fields to 
regular field references.
2. Refactored DrillPushProjectIntoScanRule to handle item star fields, 
factored out helper classes and methods from PrelUtil.class.
3. Fixed issue with dynamic star usage (after Calcite upgrade old usage of 
star was still present, replaced WILDCARD -> DYNAMIC_STAR  for clarity).
4. Added unit tests to check project / filter push down and directory 
pruning with item star.




---


Re: [ANNOUNCE] New PMC member: Paul Rogers

2018-01-31 Thread Vitalii Diravka
Congratulations, Paul!
Well deserved.

Kind regards
Vitalii

On Wed, Jan 31, 2018 at 9:58 AM, Arina Yelchiyeva <
arina.yelchiy...@gmail.com> wrote:

> Congratulations, Paul!
> Well deserved.
>
> Kind regards
> Arina
>
> On Wed, Jan 31, 2018 at 10:13 AM, Robert Hou  wrote:
>
> > Congratulations, Paul!
> >
> >
> > --Robert
> >
> > 
> > From: Abhishek Girish 
> > Sent: Tuesday, January 30, 2018 9:31 PM
> > To: dev@drill.apache.org
> > Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers
> >
> > Congratulations, Paul!
> >
> > On Tue, Jan 30, 2018 at 2:48 PM, Sorabh Hamirwasia  >
> > wrote:
> >
> > > Congratulations Paul!
> > >
> > >
> > > Thanks,
> > > Sorabh
> > >
> > > 
> > > From: AnilKumar B 
> > > Sent: Tuesday, January 30, 2018 2:43:07 PM
> > > To: dev@drill.apache.org
> > > Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers
> > >
> > > Congratulations, Paul.
> > >
> > > Thanks & Regards,
> > > B Anil Kumar.
> > >
> > > On Tue, Jan 30, 2018 at 2:34 PM, Chunhui Shi  wrote:
> > >
> > > > Congrats Paul! Well deserved!
> > > >
> > > > 
> > > > From: Kunal Khatua 
> > > > Sent: Tuesday, January 30, 2018 2:05:56 PM
> > > > To: dev@drill.apache.org
> > > > Subject: RE: [ANNOUNCE] New PMC member: Paul Rogers
> > > >
> > > > Congratulations, Paul !
> > > >
> > > > -Original Message-
> > > > From: salim achouche [mailto:sachouc...@gmail.com]
> > > > Sent: Tuesday, January 30, 2018 2:00 PM
> > > > To: dev@drill.apache.org; Padma Penumarthy 
> > > > Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers
> > > >
> > > > Congrats Paul!
> > > >
> > > > Regards,
> > > > Salim
> > > >
> > > > > On Jan 30, 2018, at 1:58 PM, Padma Penumarthy <
> ppenumar...@mapr.com>
> > > > wrote:
> > > > >
> > > > > Congratulations Paul.
> > > > >
> > > > > Thanks
> > > > > Padma
> > > > >
> > > > >
> > > > >> On Jan 30, 2018, at 1:55 PM, Gautam Parai 
> wrote:
> > > > >>
> > > > >> Congratulations Paul!
> > > > >>
> > > > >> 
> > > > >> From: Timothy Farkas 
> > > > >> Sent: Tuesday, January 30, 2018 1:54:43 PM
> > > > >> To: dev@drill.apache.org
> > > > >> Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers
> > > > >>
> > > > >> Congrats!
> > > > >>
> > > > >> 
> > > > >> From: Aman Sinha 
> > > > >> Sent: Tuesday, January 30, 2018 1:50:07 PM
> > > > >> To: dev@drill.apache.org
> > > > >> Subject: [ANNOUNCE] New PMC member: Paul Rogers
> > > > >>
> > > > >> I am pleased to announce that Drill PMC invited Paul Rogers to the
> > > > >> PMC and he has accepted the invitation.
> > > > >>
> > > > >> Congratulations Paul and thanks for your contributions !
> > > > >>
> > > > >> -Aman
> > > > >> (on behalf of Drill PMC)
> > > > >
> > > >
> > > >
> > >
> >
>


[GitHub] drill issue #1083: DRILL-4185: UNION ALL involving empty directory on any si...

2018-01-31 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1083
  
+1, LGTM. Thanks for making the changes.


---


[GitHub] drill pull request #1083: DRILL-4185: UNION ALL involving empty directory on...

2018-01-31 Thread vdiravka
Github user vdiravka commented on a diff in the pull request:

https://github.com/apache/drill/pull/1083#discussion_r165023581
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestJoinNullable.java ---
@@ -568,6 +570,22 @@ public void nullMixedComparatorEqualJoinHelper(final 
String query) throws Except
 .go();
   }
 
+  /** InnerJoin with empty dir table on nullable cols, MergeJoin */
+  // TODO: the same tests should be added for HashJoin operator, DRILL-6070
+  @Test
--- End diff --

The bug was found for NLJ with empty tables. I have resolved that issue.
A separate test class has been added for empty dir tables and the different 
join operators.

I have also refactored the TestHashJoinAdvanced, TestMergeJoinAdvanced, and 
TestNestedLoopJoin classes.


---


[GitHub] drill issue #1099: DRILL-6106: Use valueOf method instead of constructor sin...

2018-01-31 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1099
  
Well, you can always use force push to override your previous changes or 
even replace your remote branch with a new local one.


---


[GitHub] drill issue #1103: DRILL-6124: Fixed possible NPE when no injection site is ...

2018-01-31 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1103
  
@ilooner it looks like if the latch is not found, execution controls will 
return a dummy latch [1]? If I am missing something, please explain.

[1] 
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/testing/ExecutionControls.java#L206
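
For readers following along, the behavior the question refers to (a lookup 
that hands back a no-op latch instead of null) looks roughly like the sketch 
below. It is a generic illustration of that pattern under the assumption that 
the lookup does fall back to a dummy. `LatchRegistry`, `Latch`, and 
`DUMMY_LATCH` are names invented here, not the classes in the linked file.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a latch registry; names are illustrative, not Drill's API. */
final class LatchRegistry {
    /** Minimal latch contract used by this sketch. */
    interface Latch {
        void countDown();
        boolean isDummy();
    }

    /** No-op latch handed out when nothing was registered for a site. */
    static final Latch DUMMY_LATCH = new Latch() {
        @Override
        public void countDown() {
            // intentionally does nothing
        }

        @Override
        public boolean isDummy() {
            return true;
        }
    };

    private final Map<String, Latch> latchesBySite = new HashMap<>();

    void register(String site, Latch latch) {
        latchesBySite.put(site, latch);
    }

    /** Never returns null, so callers can use the result without a null check. */
    Latch lookup(String site) {
        return latchesBySite.getOrDefault(site, DUMMY_LATCH);
    }
}
```

If the lookup behaves this way, a caller that immediately invokes a method on 
the returned latch cannot hit an NPE at that call, which is the crux of the 
question.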


---


[GitHub] drill issue #1099: DRILL-6106: Use valueOf method instead of constructor sin...

2018-01-31 Thread reudismam
Github user reudismam commented on the issue:

https://github.com/apache/drill/pull/1099
  
Maybe it has not worked as expected. The squash did combine the commits into 
the first commit, but because the branch also contains commits from other 
people, those came along with it. It may be better to create a patch file for 
the desired commit and apply that patch in a new pull request. 


---


Re: [ANNOUNCE] New PMC member: Paul Rogers

2018-01-31 Thread Arina Yelchiyeva
Congratulations, Paul!
Well deserved.

Kind regards
Arina

On Wed, Jan 31, 2018 at 10:13 AM, Robert Hou  wrote:

> Congratulations, Paul!
>
>
> --Robert
>
> 
> From: Abhishek Girish 
> Sent: Tuesday, January 30, 2018 9:31 PM
> To: dev@drill.apache.org
> Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers
>
> Congratulations, Paul!
>
> On Tue, Jan 30, 2018 at 2:48 PM, Sorabh Hamirwasia 
> wrote:
>
> > Congratulations Paul!
> >
> >
> > Thanks,
> > Sorabh
> >
> > 
> > From: AnilKumar B 
> > Sent: Tuesday, January 30, 2018 2:43:07 PM
> > To: dev@drill.apache.org
> > Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers
> >
> > Congratulations, Paul.
> >
> > Thanks & Regards,
> > B Anil Kumar.
> >
> > On Tue, Jan 30, 2018 at 2:34 PM, Chunhui Shi  wrote:
> >
> > > Congrats Paul! Well deserved!
> > >
> > > 
> > > From: Kunal Khatua 
> > > Sent: Tuesday, January 30, 2018 2:05:56 PM
> > > To: dev@drill.apache.org
> > > Subject: RE: [ANNOUNCE] New PMC member: Paul Rogers
> > >
> > > Congratulations, Paul !
> > >
> > > -Original Message-
> > > From: salim achouche [mailto:sachouc...@gmail.com]
> > > Sent: Tuesday, January 30, 2018 2:00 PM
> > > To: dev@drill.apache.org; Padma Penumarthy 
> > > Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers
> > >
> > > Congrats Paul!
> > >
> > > Regards,
> > > Salim
> > >
> > > > On Jan 30, 2018, at 1:58 PM, Padma Penumarthy 
> > > wrote:
> > > >
> > > > Congratulations Paul.
> > > >
> > > > Thanks
> > > > Padma
> > > >
> > > >
> > > >> On Jan 30, 2018, at 1:55 PM, Gautam Parai  wrote:
> > > >>
> > > >> Congratulations Paul!
> > > >>
> > > >> 
> > > >> From: Timothy Farkas 
> > > >> Sent: Tuesday, January 30, 2018 1:54:43 PM
> > > >> To: dev@drill.apache.org
> > > >> Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers
> > > >>
> > > >> Congrats!
> > > >>
> > > >> 
> > > >> From: Aman Sinha 
> > > >> Sent: Tuesday, January 30, 2018 1:50:07 PM
> > > >> To: dev@drill.apache.org
> > > >> Subject: [ANNOUNCE] New PMC member: Paul Rogers
> > > >>
> > > >> I am pleased to announce that Drill PMC invited Paul Rogers to the
> > > >> PMC and he has accepted the invitation.
> > > >>
> > > >> Congratulations Paul and thanks for your contributions !
> > > >>
> > > >> -Aman
> > > >> (on behalf of Drill PMC)
> > > >
> > >
> > >
> >
>


Re: [ANNOUNCE] New PMC member: Paul Rogers

2018-01-31 Thread Robert Hou
Congratulations, Paul!


--Robert


From: Abhishek Girish 
Sent: Tuesday, January 30, 2018 9:31 PM
To: dev@drill.apache.org
Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers

Congratulations, Paul!

On Tue, Jan 30, 2018 at 2:48 PM, Sorabh Hamirwasia 
wrote:

> Congratulations Paul!
>
>
> Thanks,
> Sorabh
>
> 
> From: AnilKumar B 
> Sent: Tuesday, January 30, 2018 2:43:07 PM
> To: dev@drill.apache.org
> Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers
>
> Congratulations, Paul.
>
> Thanks & Regards,
> B Anil Kumar.
>
> On Tue, Jan 30, 2018 at 2:34 PM, Chunhui Shi  wrote:
>
> > Congrats Paul! Well deserved!
> >
> > 
> > From: Kunal Khatua 
> > Sent: Tuesday, January 30, 2018 2:05:56 PM
> > To: dev@drill.apache.org
> > Subject: RE: [ANNOUNCE] New PMC member: Paul Rogers
> >
> > Congratulations, Paul !
> >
> > -Original Message-
> > From: salim achouche [mailto:sachouc...@gmail.com]
> > Sent: Tuesday, January 30, 2018 2:00 PM
> > To: dev@drill.apache.org; Padma Penumarthy 
> > Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers
> >
> > Congrats Paul!
> >
> > Regards,
> > Salim
> >
> > > On Jan 30, 2018, at 1:58 PM, Padma Penumarthy 
> > wrote:
> > >
> > > Congratulations Paul.
> > >
> > > Thanks
> > > Padma
> > >
> > >
> > >> On Jan 30, 2018, at 1:55 PM, Gautam Parai  wrote:
> > >>
> > >> Congratulations Paul!
> > >>
> > >> 
> > >> From: Timothy Farkas 
> > >> Sent: Tuesday, January 30, 2018 1:54:43 PM
> > >> To: dev@drill.apache.org
> > >> Subject: Re: [ANNOUNCE] New PMC member: Paul Rogers
> > >>
> > >> Congrats!
> > >>
> > >> 
> > >> From: Aman Sinha 
> > >> Sent: Tuesday, January 30, 2018 1:50:07 PM
> > >> To: dev@drill.apache.org
> > >> Subject: [ANNOUNCE] New PMC member: Paul Rogers
> > >>
> > >> I am pleased to announce that Drill PMC invited Paul Rogers to the
> > >> PMC and he has accepted the invitation.
> > >>
> > >> Congratulations Paul and thanks for your contributions !
> > >>
> > >> -Aman
> > >> (on behalf of Drill PMC)
> > >
> >
> >
>