[GitHub] drill pull request: Drill 4581

2016-04-13 Thread paul-rogers
Github user paul-rogers commented on the pull request:

https://github.com/apache/drill/pull/477#issuecomment-209731896
  
Thanks; I'll do that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request: Drill 4581

2016-04-13 Thread jacques-n
Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/477#issuecomment-209730105
  
Once you've gone through reviews, you should also collapse your change into 
a single commit and name it using the pattern below. This makes it easier for a 
committer to merge the change correctly.

DRILL-: short description

Longer description...






[GitHub] drill pull request: Drill 4581

2016-04-13 Thread paul-rogers
Github user paul-rogers commented on the pull request:

https://github.com/apache/drill/pull/477#issuecomment-209728811
  
This fix leaves the external interface unchanged. It is preparation for the 
work to add a site configuration level, but none of that work is in this 
fix. Instead, this particular fix just corrects problems in the existing public 
interface. I ran tests to ensure that the scripts continue to work as before 
where they previously worked, and that they now work properly where the previous 
version was broken.

Thanks for the comment on merge; I expected I'd miss something on this 
first checkin...




Re: Drill on YARN

2016-04-13 Thread Paul Rogers
Hi Jacques,

Thanks for the comments and the links to the documents.

In the context of YARN, Resource Management divides into two (mostly) 
independent parts: external and internal.

YARN (via a user request) sets the external limits: x cores and y MB of RAM. 
YARN kills processes that exceed the memory limit, and (optionally) uses 
cgroups to enforce the vcores limit.
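
For concreteness, a minimal sketch (assuming the standard Hadoop YARN AMRMClient 
API; the sizes are placeholders, not a recommendation) of how an application 
master states that external limit as a container request:

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class DrillbitContainerRequest {
  // Ask YARN for a container sized for one drillbit: memoryMb of RAM and vcores cores.
  // YARN kills the process if it exceeds the memory limit; the vcores limit is
  // enforced only where the NodeManagers are configured to use cgroups.
  public static ContainerRequest forDrillbit(int memoryMb, int vcores) {
    Resource size = Resource.newInstance(memoryMb, vcores);
    return new ContainerRequest(size, null /* nodes */, null /* racks */, Priority.newInstance(0));
  }
}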

Drill’s job is to manage its internal resources to best “live within” that 
external limit. Being new, I was not aware of the two documents. Sounds like a 
good solution is already in place for memory, and good plans exist for CPU.

If we assume the current threading model, as you suggest, we’ll still be fine 
in terms of CPU usage. The current model can be a bit exuberant in its use of 
CPU, but cgroups will ensure that Drill cannot exceed the YARN-imposed CPU 
limit.

Once the Drill-on-YARN work gets a bit further along, we will run tests to 
validate that cgroups does work as promised. I’ll let the group know as we get 
some results.

Thanks,

- Paul

 
> On Apr 13, 2016, at 3:03 PM, Jacques Nadeau  wrote:
> 
> It sounds like Paul and John would both benefit from reviewing [1] & [2].
> 
> Drill has memory management, respects limits and has a hierarchy of
> allocators to do this. The framework for constraining certain operations,
> fragments, or queries already exists. (Note that this is entirely focused on
> off-heap memory, in general Drill tries to avoid ever moving data on heap.)
> 
> Workload management is another topic and there is an initial proposal out
> on that for comment here: [2]
> 
> The parallelization algorithms don't currently support heterogeneous nodes.
> I'd suggest that initial work be done on adding or removing same sized
> nodes. A separate substantial effort would be involved in better lopsided
> parallelization and workload decisions. (Let's get the basics right first.)
> 
> With regards to Paul's comments on 'inside Drill' threading, I think you're
> jumping to some incorrect conclusions. There haven't been any formal
> proposals to change the threading model. There was a very short discussion
> a month or two back where Hanifi said he'd throw out some prototype code
> but nothing has been shared since. I suggest you assume the current
> threading model until there is a consensus around something new.
> 
> [1]
> https://github.com/apache/drill/blob/master/exec/memory/base/src/main/java/org/apache/drill/exec/memory/README.md
> [2]
> https://docs.google.com/document/d/1xK6CyxwzpEbOrjOdmkd9GXf37dVaf7z0BsvBNLgsZWs/edit
> 
> 
> 
> 
> 
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
> 
> On Mon, Mar 28, 2016 at 8:43 AM, John Omernik  wrote:
> 
>> Great summary.  I'll fill in some "non-technical" explanations of some
>> challenges with Memory as I see them. Drill Devs, please keep Paul and me
>> accurate in our understanding.
>> 
>> First,  Memory is already set at the drillbit level... sorta.  It's set via
>> ENV in drill-env, and is not a cluster-specific thing. However, I believe
>> there are some challenges that come into play when you have bits of
>> different sizes. Drill "may" assume that bits are all the same size, and
>> thus, if you run a query, depending on which bit is the foreman, and which
>> fragments land where, the query may succeed or fail. That's not an ideal
>> situation. I think for a holistic discussion on memory, we need to get some
>> definitive answers about how Drill handles memory, especially different-sized
>> nodes, and what changes would need to be made for bits of different size to
>> work well together on a production cluster.
>> 
>> This discussion forms the basis of almost all work around memory
>> management. If we can realistically only have bits of one size in its
>> current form, then static allocations are where we are going to be for the
>> initial YARN work. I love the idea of scaling up and down, but it will be
>> difficult to scale an entire cluster worth of bits up and down, so
>> heterogeneous resource allocations must be a prerequisite to dynamic
>> allocation discussions (other than just adding and removing whole bits).
>> 
>> Second, this also plays into the multiple drillbits per node discussion.
>> If static sized bits are our only approach, then the initial reaction is to
>> make them smaller so you have some granularity in scaling up and down.
>> This may actually hurt a cluster.  Large queries may be challenged by
>> trying to fit its fragments on 3 nodes of say 8GB of direct RAM, but that
>> query would run fine on bits of 24GB of direct RAM.  Drill Devs: Keep me
>> honest here. I am going off of lots of participation in these memory/CPU
>> discussions when I first started Drill/Marathon integration, and that is
>> the feeling I got in talking to folks on and off list about memory
>> management.
>> 
>> This is a hard topic, but one that I am glad you are spearheading Paul,
>> because as we see more and more clusters get folded together, 

Re: Proposal: Create v2 branch to work on breaking changes

2016-04-13 Thread Jacques Nadeau
A number of the vector problems are an issue for the Java client but not
the C client, since the C client doesn't support complex or union types yet.

Agree on your other points:

- We need to get to rolling upgrades. (Note that I suggest that we try to
get to "minor" compatibility for Drillbit <--> Drillbit by the end of the
2.x series in my notes)
- Also agree that we should always work to avoid changing any of the
interfaces described in the doc, no matter what the external commitment is.
The performance analog is a good one.



--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Wed, Apr 13, 2016 at 5:54 PM, Parth Chandra  wrote:

> Thanks for putting this doc together Jacques. This gives us a clear
> framework for discussion.
> Just as a clarification (I haven't yet been able to do more than glance at
> the doc), for 2.0, I was suggesting client-server compatibility not
> drillbit-drillbit compatibility. It seems some of the items you noted
> earlier (null lists, union vectors, etc.) may break drillbit-drillbit
> compatibility but not necessarily affect client-server compatibility. So we
> may be in agreement on some things here.
> In general, though, as the size of users' clusters grows, it will be
> required that we permit rolling upgrades. As I said in the hangout, it's
> like performance: we have to consider it at every turn, and decide not to
> support backward compatibility only after due
> consideration. At the moment, some of the functionality we are talking
> about might justify breaking drillbit-drillbit compatibility. Our design
> decisions for these implementations, though, must keep the requirement for
> future backward compatibility in mind.
> I'll add further comments in the JIRA.
>
> On Tue, Apr 12, 2016 at 6:47 PM, Jacques Nadeau 
> wrote:
>
> > A general policy shouldn't hold up a specific decision. Even after we
> > establish a guiding policy, there will be exceptions that we will
> consider.
> > I'm looking for concrete counterpoint to the cost of maintaining
> backwards
> > compatibility.
> >
> > That being said, I have put together an initial proposal of the
> > compatibility commitments we should make to the users. It is important to
> > note that my outline is about our public commitment. As a development
> > community, we should always work to avoid disruptive or backwards
> > incompatible changes on public APIs even if our public commitment
> > policy doesn't dictate it.
> >
> > The proposal is attached here:
> > https://issues.apache.org/jira/browse/DRILL-4600
> >
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Tue, Apr 12, 2016 at 5:54 PM, Neeraja Rentachintala <
> > nrentachint...@maprtech.com> wrote:
> >
> > > Makes sense to postpone the debate : )
> > > Will look forward to the proposal.
> > >
> > > On Tuesday, April 12, 2016, Zelaine Fong  wrote:
> > >
> > > > As we discussed at this morning's hangout, Jacques took the action to
> > put
> > > > together a strawman compatibility points document.  Would it be
> better
> > to
> > > > wait for that document before we debate this further?
> > > >
> > > > -- Zelaine
> > > >
> > > > On Tue, Apr 12, 2016 at 4:39 PM, Jacques Nadeau  > > > > wrote:
> > > >
> > > > > I agree with Paul, too. Perfect compatibility would be great. I
> > > recognize
> > > > > the issues that a version break could cause.  These are some of the
> > > > issues
> > > > > that I believe require a version break to address:
> > > > > - Support nulls in lists.
> > > > > - Distinguish null maps from empty maps.
> > > > > - Distinguish null arrays from empty arrays.
> > > > > - Support sparse maps (analogous to Parquet maps instead of our
> > current
> > > > > approach analogous to structs in Parquet lingo).
> > > > > - Clean up decimal and enable it by default.
> > > > > - Support full Avro <> Parquet roundtrip (and Parquet files
> generated
> > > by
> > > > > other tools).
> > > > > - Enable union type by default.
> > > > > - Improve execution performance of nullable values.
> > > > >
> > > > > I think these things need to be addressed in the 2.x line (let's
> say
> > > that
> > > > > is ~12 months). This is all about tradeoffs which is why I keep
> > asking
> > > > > people to provide concrete impact. If you think at least one of
> these
> > > > > should be resolved, you're arguing for breaking wire compatibility
> > > > between
> > > > > 1.x and 2.x.
> > > > >
> > > > > So let's get concrete:
> > > > >
> > > > > - How many users are running multiple clusters and using a single
> > > client
> > > > to
> > > > > connect them?
> > > > > - What BI tools are most users using? What is the primary driver
> they
> > > are
> > > > > using?
> > > > > - What BI tools are packaging a Drill driver? If any, what is the
> > > update
> > > > > process and lead time?
> > > > > - How many users are skipping multiple Drill versions 

[GitHub] drill pull request: Drill 4581

2016-04-13 Thread jacques-n
Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/477#issuecomment-209714087
  
A quick question and an FYI. (I haven't looked at the details of the patch.)

q) Does this change the external interface of the scripts? If so, we should 
probably put it in the v2 branch.
fyi) In general, you should make sure that your patch is rebased as opposed 
to merged.




[jira] [Resolved] (DRILL-4570) Revise description of Drill ports in documentation

2016-04-13 Thread Bridget Bevens (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens resolved DRILL-4570.
---
Resolution: Fixed

updated doc based on comments

> Revise description of Drill ports in documentation
> --
>
> Key: DRILL-4570
> URL: https://issues.apache.org/jira/browse/DRILL-4570
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Paul Rogers
>Assignee: Bridget Bevens
>Priority: Minor
>
> The documentation describes five Drillbit ports. See 
> http://drill.apache.org/docs/ports-used-by-drill/
> As it turns out, the fifth port (JGroups and Infinispan) was used by a 
> distributed cache feature which is, at present, disabled. Perhaps note this 
> in the documentation. Or, better, remove this port until the distributed 
> cache feature is redesigned.
> The ports are configurable. The table lists only the default ports. Please 
> add a column that lists Configuration Parameter Name and change the "Port" 
> column to "Default Port."
> The config option names are:
> drill.exec.http.port (Default 8047)
> drill.exec.rpc.bit.server.port (This is the control port, 31011)
> drill.exec.rpc.user.server.port (Default 31010)
> The data port is set internally to the control port (given by ...server.port) 
> + 1. So, for the config parameter column, say
> drill.exec.rpc.bit.server.port + 1
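
For reference, a sketch of how those parameters map to ports (assuming 
drill-override.conf is ordinary HOCON readable with the Typesafe Config 
library; Drill's own DrillConfig wrapper is not shown):

{code}
import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

public class DrillPorts {
  public static void main(String[] args) {
    // Loads drill-override.conf (plus reference defaults) from the classpath.
    Config c = ConfigFactory.load("drill-override");
    int httpPort    = c.getInt("drill.exec.http.port");            // default 8047
    int userPort    = c.getInt("drill.exec.rpc.user.server.port"); // default 31010
    int controlPort = c.getInt("drill.exec.rpc.bit.server.port");  // default 31011
    int dataPort    = controlPort + 1;                             // set internally, not configurable
    System.out.printf("http=%d user=%d control=%d data=%d%n", httpPort, userPort, controlPort, dataPort);
  }
}
{code}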



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Proposal: Create v2 branch to work on breaking changes

2016-04-13 Thread Parth Chandra
Thanks for putting this doc together Jacques. This gives us a clear
framework for discussion.
Just as a clarification (I haven't yet been able to do more than glance at
the doc), for 2.0, I was suggesting client-server compatibility not
drillbit-drillbit compatibility. It seems some of the items you noted
earlier (null lists, union vectors, etc.) may break drillbit-drillbit
compatibility but not necessarily affect client-server compatibility. So we
may be in agreement on some things here.
In general, though, as the size of users' clusters grows, it will be
required that we permit rolling upgrades. As I said in the hangout, it's
like performance: we have to consider it at every turn, and decide not to
support backward compatibility only after due
consideration. At the moment, some of the functionality we are talking
about might justify breaking drillbit-drillbit compatibility. Our design
decisions for these implementations, though, must keep the requirement for
future backward compatibility in mind.
I'll add further comments in the JIRA.

On Tue, Apr 12, 2016 at 6:47 PM, Jacques Nadeau  wrote:

> A general policy shouldn't hold up a specific decision. Even after we
> establish a guiding policy, there will be exceptions that we will consider.
> I'm looking for concrete counterpoint to the cost of maintaining backwards
> compatibility.
>
> That being said, I have put together an initial proposal of the
> compatibility commitments we should make to the users. It is important to
> note that my outline is about our public commitment. As a development
> community, we should always work to avoid disruptive or backwards
> incompatible changes on public APIs even if our public commitment
> policy doesn't dictate it.
>
> The proposal is attached here:
> https://issues.apache.org/jira/browse/DRILL-4600
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Tue, Apr 12, 2016 at 5:54 PM, Neeraja Rentachintala <
> nrentachint...@maprtech.com> wrote:
>
> > Makes sense to postpone the debate : )
> > Will look forward to the proposal.
> >
> > On Tuesday, April 12, 2016, Zelaine Fong  wrote:
> >
> > > As we discussed at this morning's hangout, Jacques took the action to
> put
> > > together a strawman compatibility points document.  Would it be better
> to
> > > wait for that document before we debate this further?
> > >
> > > -- Zelaine
> > >
> > > On Tue, Apr 12, 2016 at 4:39 PM, Jacques Nadeau  > > > wrote:
> > >
> > > > I agree with Paul, too. Perfect compatibility would be great. I
> > recognize
> > > > the issues that a version break could cause.  These are some of the
> > > issues
> > > > that I believe require a version break to address:
> > > > - Support nulls in lists.
> > > > - Distinguish null maps from empty maps.
> > > > - Distinguish null arrays from empty arrays.
> > > > - Support sparse maps (analogous to Parquet maps instead of our
> current
> > > > approach analogous to structs in Parquet lingo).
> > > > - Clean up decimal and enable it by default.
> > > > - Support full Avro <> Parquet roundtrip (and Parquet files generated
> > by
> > > > other tools).
> > > > - Enable union type by default.
> > > > - Improve execution performance of nullable values.
> > > >
> > > > I think these things need to be addressed in the 2.x line (let's say
> > that
> > > > is ~12 months). This is all about tradeoffs which is why I keep
> asking
> > > > people to provide concrete impact. If you think at least one of these
> > > > should be resolved, you're arguing for breaking wire compatibility
> > > between
> > > > 1.x and 2.x.
> > > >
> > > > So let's get concrete:
> > > >
> > > > - How many users are running multiple clusters and using a single
> > client
> > > to
> > > > connect them?
> > > > - What BI tools are most users using? What is the primary driver they
> > are
> > > > using?
> > > > - What BI tools are packaging a Drill driver? If any, what is the
> > update
> > > > process and lead time?
> > > > - How many users are skipping multiple Drill versions (e.g. going
> from
> > > 1.2
> > > > to 1.6)? (Beyond the MapR tick-tock pattern)
> > > > - How many users are delaying driver upgrade substantially? Are there
> > > > customers using the 1.0 driver?
> > > > - What is the average number of deployed clients per Drillbit
> cluster?
> > > >
> > > > These are some of the things that need to be evaluated to determine
> > > whether
> > > > we choose to implement a compatibility layer or simply make a full
> > break.
> > > > (And in reality, I'm not sure we have the resources to build and
> carry
> > a
> > > > complex compatibility layer for these changes.)
> > > >
> > > > Whatever the policy we agree upon for future commitments to the user
> > > base,
> > > > we're in a situation where there are very important reasons to move
> the
> > > > codebase forward and change the wire protocol for 2.x.
> > > >
> > > > I 

[jira] [Created] (DRILL-4606) Create DrillClient.Builder class

2016-04-13 Thread Sudheesh Katkam (JIRA)
Sudheesh Katkam created DRILL-4606:
--

 Summary: Create DrillClient.Builder class
 Key: DRILL-4606
 URL: https://issues.apache.org/jira/browse/DRILL-4606
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Sudheesh Katkam
Assignee: Sudheesh Katkam


+ Create a helper class to build DrillClient instances, and deprecate 
DrillClient constructors
+ Allow DrillClient to specify an event loop group (so user event loop can be 
used for queries from Web API calls)
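
For illustration only, a rough sketch of what such a builder might look like 
(the names below are hypothetical, not the final DRILL-4606 API):

{code}
import io.netty.channel.EventLoopGroup;
import org.apache.drill.exec.client.DrillClient;

// Hypothetical builder sketch; method and field names are illustrative only.
public final class DrillClientBuilder {
  private String zkConnect;
  private boolean direct;                 // connect straight to a drillbit instead of via ZooKeeper
  private EventLoopGroup eventLoopGroup;  // optional caller-supplied loop (e.g. the web server's)

  public DrillClientBuilder zkConnect(String zkConnect) { this.zkConnect = zkConnect; return this; }
  public DrillClientBuilder direct(boolean direct) { this.direct = direct; return this; }
  public DrillClientBuilder eventLoopGroup(EventLoopGroup group) { this.eventLoopGroup = group; return this; }

  public DrillClient build() {
    // Construct the client from the collected settings; left out of this sketch.
    throw new UnsupportedOperationException("sketch only");
  }
}
{code}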





[GitHub] drill pull request: Drill 4581

2016-04-13 Thread paul-rogers
GitHub user paul-rogers opened a pull request:

https://github.com/apache/drill/pull/477

Drill 4581

Provides fixes to the Drill launch scripts for multiple issues in 
DRILL-4581.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/paul-rogers/drill DRILL-4581-Fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/477.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #477


commit abbfe84e35517c37f59a507694a4f0224137d2b8
Author: Paul Rogers 
Date:   2016-04-08T21:04:40Z

Merge remote-tracking branch 'apache/master'

commit e68ab2d8c50ce4d64631f410cbcbe7f4e98aeb8c
Author: Paul Rogers 
Date:   2016-04-13T23:27:06Z

Merge remote-tracking branch 'apache/master'

commit 6b4bec80e5473f10c05006da58a32bf9bd3f6b13
Author: Paul Rogers 
Date:   2016-04-14T00:06:06Z

Fixed issues from DRILL-4581






Re: Drill on YARN

2016-04-13 Thread Jacques Nadeau
It sounds like Paul and John would both benefit from reviewing [1] & [2].

Drill has memory management, respects limits and has a hierarchy of
allocators to do this. The framework for constraining certain operations,
fragments, or queries already exists. (Note that this is entirely focused on
off-heap memory, in general Drill tries to avoid ever moving data on heap.)
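
For concreteness, a minimal sketch of the hierarchy [1] describes: a root
allocator bounded by the drillbit-wide direct memory limit, with child
allocators carving bounded sub-budgets out of their parent (the factory and
method names below are an approximation of that API, not verbatim Drill code):

import io.netty.buffer.DrillBuf;
import org.apache.drill.common.config.DrillConfig;
import org.apache.drill.exec.memory.BufferAllocator;
import org.apache.drill.exec.memory.RootAllocatorFactory;

public class AllocatorHierarchySketch {
  public static void main(String[] args) throws Exception {
    // Root allocator, bounded by the drillbit's configured direct memory limit.
    BufferAllocator root = RootAllocatorFactory.newRoot(DrillConfig.create());
    // Child allocators impose their own limits on top of every ancestor's limit.
    BufferAllocator fragment = root.newChildAllocator("fragment", 0, 2L << 30);
    BufferAllocator operator = fragment.newChildAllocator("sort", 0, 512L << 20);

    DrillBuf buf = operator.buffer(64 * 1024); // off-heap; counted against operator, fragment and root
    buf.release();
    operator.close();
    fragment.close();
    root.close();
  }
}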

Workload management is another topic and there is an initial proposal out
on that for comment here: [2]

The parallelization algorithms don't currently support heterogeneous nodes.
I'd suggest that initial work be done on adding or removing same sized
nodes. A separate substantial effort would be involved in better lopsided
parallelization and workload decisions. (Let's get the basics right first.)

With regards to Paul's comments on 'inside Drill' threading, I think you're
jumping to some incorrect conclusions. There haven't been any formal
proposals to change the threading model. There was a very short discussion
a month or two back where Hanifi said he'd throw out some prototype code
but nothing has been shared since. I suggest you assume the current
threading model until there is a consensus around something new.

[1]
https://github.com/apache/drill/blob/master/exec/memory/base/src/main/java/org/apache/drill/exec/memory/README.md
[2]
https://docs.google.com/document/d/1xK6CyxwzpEbOrjOdmkd9GXf37dVaf7z0BsvBNLgsZWs/edit





--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Mon, Mar 28, 2016 at 8:43 AM, John Omernik  wrote:

> Great summary.  I'll fill in some "non-technical" explanations of some
> challenges with Memory as I see them. Drill Devs, please keep Paul and me
> accurate in our understanding.
>
> First,  Memory is already set at the drillbit level... sorta.  It's set via
> ENV in drill-env, and is not a cluster-specific thing. However, I believe
> there are some challenges that come into play when you have bits of
> different sizes. Drill "may" assume that bits are all the same size, and
> thus, if you run a query, depending on which bit is the foreman, and which
> fragments land where, the query may succeed or fail. That's not an ideal
> situation. I think for a holistic discussion on memory, we need to get some
> definitive answers about how Drill handles memory, especially different-sized
> nodes, and what changes would need to be made for bits of different size to
> work well together on a production cluster.
>
> This discussion forms the basis of almost all work around memory
> management. If we can realistically only have bits of one size in its
> current form, then static allocations are where we are going to be for the
> initial YARN work. I love the idea of scaling up and down, but it will be
> difficult to scale an entire cluster worth of bits up and down, so
> heterogeneous resource allocations must be a prerequisite to dynamic
> allocation discussions (other than just adding and removing whole bits).
>
> Second, this also plays into the multiple drillbits per node discussion.
> If static sized bits are our only approach, then the initial reaction is to
> make them smaller so you have some granularity in scaling up and down.
> This may actually hurt a cluster.  Large queries may be challenged by
> trying to fit its fragments on 3 nodes of say 8GB of direct RAM, but that
> query would run fine on bits of 24GB of direct RAM.  Drill Devs: Keep me
> honest here. I am going off of lots of participation in these memory/CPU
> discussions when I first started Drill/Marathon integration, and that is
> the feeling I got in talking to folks on and off list about memory
> management.
>
> This is a hard topic, but one that I am glad you are spearheading Paul,
>  because as we see more and more clusters get folded together, having a
> citizen that plays nice with others, and provides flexibility with regards
> to performance vs resource tradeoffs will be a huge selling/implementation
> point of any analytics tool.  If it's hard to implement and test at scale
> without dedicated hardware, it won't get a fair shake.
>
> John
>
>
> On Sun, Mar 27, 2016 at 3:25 PM, Paul Rogers  wrote:
>
> > Hi John,
> >
> > The other main topic of your discussion is memory management. Here we
> seem
> > to have 6 topics:
> >
> > 1. Setting the limits for Drill.
> > 2. Drill respects the limits.
> > 3. Drill lives within its memory “budget.”
> > 4. Drill throttles work based on available memory.
> > 5. Drill adapts memory usage to available memory.
> > 6. Some means to inform Drill of increases (or decreases) in memory
> > allocation.
> >
> > YARN, via container requests, solves the first problem. Someone (the
> > network admin) has to decide on the size of each drill-bit container, but
> > YARN handles allocating the space, preventing memory oversubscription,
> and
> > enforcing the limit (by killing processes that exceed their allocation).
> >
> > As you pointed out, memory management is different than CPU: we can’t
> just
> > 

[GitHub] drill pull request: DRILL-3714: Avoid cascading disconnection when...

2016-04-13 Thread jacques-n
Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/463#discussion_r59631885
  
--- Diff: exec/rpc/src/main/java/org/apache/drill/exec/rpc/BasicClient.java 
---
@@ -282,15 +283,19 @@ public void interrupted(final InterruptedException 
ex) {
 
   private class ClientHandshakeHandler extends 
AbstractHandshakeHandler {
 
-public ClientHandshakeHandler() {
+private final R connection;
+
+public ClientHandshakeHandler(R connection) {
   super(BasicClient.this.handshakeType, 
BasicClient.this.handshakeParser);
+  Preconditions.checkNotNull(connection);
--- End diff --

Note that I dislike this construct so I didn't include this change.




[GitHub] drill pull request: DRILL-3714: Avoid cascading disconnection when...

2016-04-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/463




[jira] [Created] (DRILL-4605) Flatten doesn't return nested arrays correctly when Union is enabled

2016-04-13 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created DRILL-4605:
-

 Summary: Flatten doesn't return nested arrays correctly when Union 
is enabled
 Key: DRILL-4605
 URL: https://issues.apache.org/jira/browse/DRILL-4605
 Project: Apache Drill
  Issue Type: Bug
Reporter: Jacques Nadeau


File: 
{code}
{a:[[1,2,3], [4]]}
{code}

{code}
set `exec.enable_union_type` = false;
{code}

{code}
select flatten(a) as a from dfs.tmp.`blue.json`;
+--+
|a |
+--+
| [1,2,3]  |
| [4]  |
+--+
{code}

{code}
set `exec.enable_union_type` = true;
{code}

{code}
select flatten(a) as a from dfs.tmp.`blue.json`;
+---+
|   a   |
+---+
| null  |
| null  |
+---+
{code}







[jira] [Created] (DRILL-4604) Generate warning on Web UI if drillbits have different versions

2016-04-13 Thread Arina Ielchiieva (JIRA)
Arina Ielchiieva created DRILL-4604:
---

 Summary: Generate warning on Web UI if drillbits have different 
versions
 Key: DRILL-4604
 URL: https://issues.apache.org/jira/browse/DRILL-4604
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.6.0
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva
 Fix For: 1.7.0


Display the drillbit version on the Web UI. If any drillbit's version doesn't match 
the current drillbit's version, generate a warning.
Screenshots - TBA.





Re: Proposal: Create v2 branch to work on breaking changes

2016-04-13 Thread George Chow
Hi Jacques,

Thanks for keeping the list in the loop. I'll be on the lookout for this
sanitized list.

George

On Wed, Apr 13, 2016 at 11:40 AM, Jacques Nadeau  wrote:

> For anyone following this thread, some of the people here reached out to me
> privately to better detail some concerns that they don't feel comfortable
> sharing publicly.
>
> I'm working with them to come up with a sanitized way to share the specific
> requirements that they are seeing so that we can come to a consensus on
> what is the best thing to do for v2 on the list.
>
> thanks,
> Jacques
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Tue, Apr 12, 2016 at 6:47 PM, Jacques Nadeau 
> wrote:
>
> > A general policy shouldn't hold up a specific decision. Even after we
> > establish a guiding policy, there will be exceptions that we will
> consider.
> > I'm looking for concrete counterpoint to the cost of maintaining
> backwards
> > compatibility.
> >
> > That being said, I have put together an initial proposal of the
> > compatibility commitments we should make to the users. It is important to
> > note that my outline is about our public commitment. As a development
> > community, we should always work to avoid disruptive or backwards
> > incompatible changes on public APIs even if our public commitment
> > policy doesn't dictate it.
> >
> > The proposal is attached here:
> > https://issues.apache.org/jira/browse/DRILL-4600
> >
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Tue, Apr 12, 2016 at 5:54 PM, Neeraja Rentachintala <
> > nrentachint...@maprtech.com> wrote:
> >
> >> Makes sense to postpone the debate : )
> >> Will look forward to the proposal.
> >>
> >> On Tuesday, April 12, 2016, Zelaine Fong  wrote:
> >>
> >> > As we discussed at this morning's hangout, Jacques took the action to
> >> put
> >> > together a strawman compatibility points document.  Would it be better
> >> to
> >> > wait for that document before we debate this further?
> >> >
> >> > -- Zelaine
> >> >
> >> > On Tue, Apr 12, 2016 at 4:39 PM, Jacques Nadeau  >> > > wrote:
> >> >
> >> > > I agree with Paul, too. Perfect compatibility would be great. I
> >> recognize
> >> > > the issues that a version break could cause.  These are some of the
> >> > issues
> >> > > that I believe require a version break to address:
> >> > > - Support nulls in lists.
> >> > > - Distinguish null maps from empty maps.
> >> > > - Distinguish null arrays from empty arrays.
> >> > > - Support sparse maps (analogous to Parquet maps instead of our
> >> current
> >> > > approach analogous to structs in Parquet lingo).
> >> > > - Clean up decimal and enable it by default.
> >> > > - Support full Avro <> Parquet roundtrip (and Parquet files
> generated
> >> by
> >> > > other tools).
> >> > > - Enable union type by default.
> >> > > - Improve execution performance of nullable values.
> >> > >
> >> > > I think these things need to be addressed in the 2.x line (let's say
> >> that
> >> > > is ~12 months). This is all about tradeoffs which is why I keep
> asking
> >> > > people to provide concrete impact. If you think at least one of
> these
> >> > > should be resolved, you're arguing for breaking wire compatibility
> >> > between
> >> > > 1.x and 2.x.
> >> > >
> >> > > So let's get concrete:
> >> > >
> >> > > - How many users are running multiple clusters and using a single
> >> client
> >> > to
> >> > > connect them?
> >> > > - What BI tools are most users using? What is the primary driver
> they
> >> are
> >> > > using?
> >> > > - What BI tools are packaging a Drill driver? If any, what is the
> >> update
> >> > > process and lead time?
> >> > > - How many users are skipping multiple Drill versions (e.g. going
> from
> >> > 1.2
> >> > > to 1.6)? (Beyond the MapR tick-tock pattern)
> >> > > - How many users are delaying driver upgrade substantially? Are
> there
> >> > > customers using the 1.0 driver?
> >> > > - What is the average number of deployed clients per Drillbit
> cluster?
> >> > >
> >> > > These are some of the things that need to be evaluated to determine
> >> > whether
> >> > > we choose to implement a compatibility layer or simply make a full
> >> break.
> >> > > (And in reality, I'm not sure we have the resources to build and
> >> carry a
> >> > > complex compatibility layer for these changes.)
> >> > >
> >> > > Whatever the policy we agree upon for future commitments to the user
> >> > base,
> >> > > we're in a situation where there are very important reasons to move
> >> the
> >> > > codebase forward and change the wire protocol for 2.x.
> >> > >
> >> > > I think it is noble to strive towards backwards compatibility. We
> >> should
> >> > > always do this. However, I also think that--especially early in a
> >> > product's
> >> > > life--it is better to resolve technical debt issues and break a few
> >> eggs
> >> > > than defer and 

[GitHub] drill pull request: DRILL-4446: Support mandatory work assignment ...

2016-04-13 Thread vkorukanti
Github user vkorukanti closed the pull request at:

https://github.com/apache/drill/pull/403




[GitHub] drill pull request: DRILL-4562: Fix bug with nested union expressi...

2016-04-13 Thread vkorukanti
Github user vkorukanti commented on the pull request:

https://github.com/apache/drill/pull/455#issuecomment-209641593
  
LGTM, +1




median, quantile

2016-04-13 Thread Steven Phillips
I submitted a pull request a little while ago that introduces (approximate)
median and quantile functions using the tdigest library.

https://github.com/apache/drill/pull/456

It would be great if I could get some feedback on this. Specifically, is it
ok to call these functions median and quantile, given that they are not
exact?
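
For anyone who wants to poke at the underlying library, a minimal sketch of the
kind of usage involved (assuming the com.tdunning t-digest Java API; this is
not the Drill UDF code from the pull request):

import com.tdunning.math.stats.TDigest;

public class QuantileSketch {
  public static void main(String[] args) {
    // The compression parameter trades digest size against accuracy.
    TDigest digest = TDigest.createAvlTreeDigest(100);
    for (int i = 1; i <= 1000000; i++) {
      digest.add(i);
    }
    System.out.println(digest.quantile(0.5));   // approximate median (~500000)
    System.out.println(digest.quantile(0.99));  // approximate 99th percentile
  }
}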


[GitHub] drill pull request: DRILL-4603: Refactor FileSystem plugin code to...

2016-04-13 Thread adityakishore
Github user adityakishore commented on a diff in the pull request:

https://github.com/apache/drill/pull/476#discussion_r59603934
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSystemPluginImplementationProvider.java
 ---
@@ -0,0 +1,141 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * 
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * 
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.dfs;
+
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.Lists;
+import org.apache.drill.common.exceptions.ExecutionSetupException;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.store.ClassPathFileSystem;
+import org.apache.drill.exec.store.LocalSyncableFileSystem;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+
+import java.io.IOException;
+import java.util.List;
+import java.util.Map;
+
+import static 
org.apache.drill.exec.store.dfs.FileSystemSchemaFactory.DEFAULT_WS_NAME;
+
+/**
+ * Provides needed component by the {@link FileSystemPlugin}. This can be 
overridden to supply customized components
+ * (such as custom schema factory) to {@link FileSystemPlugin}.
+ */
+public class FileSystemPluginImplementationProvider {
+
+  protected final FileSystemConfig fsConfig;
+  protected final String fsPluginName;
+  protected final DrillbitContext dContext;
+
+  private FormatCreator formatCreator; /** Don't use this directly, use 
{@link #getFormatCreator(Configuration)} */
+
+  /**
+   * Instantiate an object
+   * @param dContext {@link DrillbitContext} instance.
+   * @param fsPluginName Name of the FileSystemPlugin storage
+   * @param fsConfig FileSystemPlugin configuration
+   */
+  public FileSystemPluginImplementationProvider(
+  final DrillbitContext dContext,
+  final String fsPluginName,
+  final FileSystemConfig fsConfig) {
+this.dContext = dContext;
+this.fsConfig = fsConfig;
+this.fsPluginName = fsPluginName;
+  }
+
+  /**
+   * @return Return any properties needed for {@link Configuration} object 
that are not set in
+   * {@link FileSystemPlugin}'s configuration.
+   */
+  public Map getFsProps() {
+return ImmutableMap.of(
+"fs.classpath.impl", ClassPathFileSystem.class.getName(),
+"fs.drill-local.impl", LocalSyncableFileSystem.class.getName()
+);
+  }
+
+  /**
+   * @return Create and return {@link FormatCreator} based on the storage 
plugin configuration.
+   */
+  public FormatCreator getFormatCreator(final Configuration fsConf) {
+if (formatCreator == null) {
--- End diff --

Is the `FormatCreator` instance tied to some settings in `fsConf`?

If yes, wouldn't a subsequent invocation of this function ignore the passed 
`fsConf` with different values of such settings?
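
To make the concern concrete, a self-contained illustration (not Drill code) of
the lazy-caching pitfall, where a value computed from the first argument
silently ignores different arguments on later calls:

import java.util.concurrent.atomic.AtomicReference;

public class LazyCachePitfall {
  private final AtomicReference<String> cached = new AtomicReference<>();

  // Caches a value derived from the first 'conf' it sees; later calls ignore 'conf'.
  String get(String conf) {
    cached.compareAndSet(null, "derived-from-" + conf);
    return cached.get();
  }

  public static void main(String[] args) {
    LazyCachePitfall p = new LazyCachePitfall();
    System.out.println(p.get("confA")); // derived-from-confA
    System.out.println(p.get("confB")); // still derived-from-confA -- confB is silently ignored
  }
}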




Re: Proposal: Create v2 branch to work on breaking changes

2016-04-13 Thread Jacques Nadeau
For anyone following this thread, some of the people here reached out to me
privately to better detail some concerns that they don't feel comfortable
sharing publicly.

I'm working with them to come up with a sanitized way to share the specific
requirements that they are seeing so that we can come to a consensus on
what is the best thing to do for v2 on the list.

thanks,
Jacques


--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Tue, Apr 12, 2016 at 6:47 PM, Jacques Nadeau  wrote:

> A general policy shouldn't hold up a specific decision. Even after we
> establish a guiding policy, there will be exceptions that we will consider.
> I'm looking for concrete counterpoint to the cost of maintaining backwards
> compatibility.
>
> That being said, I have put together an initial proposal of the
> compatibility commitments we should make to the users. It is important to
> note that my outline is about our public commitment. As a development
> community, we should always work to avoid disruptive or backwards
> incompatible changes on public APIs even if our public commitment
> policy doesn't dictate it.
>
> The proposal is attached here:
> https://issues.apache.org/jira/browse/DRILL-4600
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Tue, Apr 12, 2016 at 5:54 PM, Neeraja Rentachintala <
> nrentachint...@maprtech.com> wrote:
>
>> Makes sense to postpone the debate : )
>> Will look forward to the proposal.
>>
>> On Tuesday, April 12, 2016, Zelaine Fong  wrote:
>>
>> > As we discussed at this morning's hangout, Jacques took the action to
>> put
>> > together a strawman compatibility points document.  Would it be better
>> to
>> > wait for that document before we debate this further?
>> >
>> > -- Zelaine
>> >
>> > On Tue, Apr 12, 2016 at 4:39 PM, Jacques Nadeau > > > wrote:
>> >
>> > > I agree with Paul, too. Perfect compatibility would be great. I
>> recognize
>> > > the issues that a version break could cause.  These are some of the
>> > issues
>> > > that I believe require a version break to address:
>> > > - Support nulls in lists.
>> > > - Distinguish null maps from empty maps.
>> > > - Distinguish null arrays from empty arrays.
>> > > - Support sparse maps (analogous to Parquet maps instead of our
>> current
>> > > approach analogous to structs in Parquet lingo).
>> > > - Clean up decimal and enable it by default.
>> > > - Support full Avro <> Parquet roundtrip (and Parquet files generated
>> by
>> > > other tools).
>> > > - Enable union type by default.
>> > > - Improve execution performance of nullable values.
>> > >
>> > > I think these things need to be addressed in the 2.x line (let's say
>> that
>> > > is ~12 months). This is all about tradeoffs which is why I keep asking
>> > > people to provide concrete impact. If you think at least one of these
>> > > should be resolved, you're arguing for breaking wire compatibility
>> > between
>> > > 1.x and 2.x.
>> > >
>> > > So let's get concrete:
>> > >
>> > > - How many users are running multiple clusters and using a single
>> client
>> > to
>> > > connect them?
>> > > - What BI tools are most users using? What is the primary driver they
>> are
>> > > using?
>> > > - What BI tools are packaging a Drill driver? If any, what is the
>> update
>> > > process and lead time?
>> > > - How many users are skipping multiple Drill versions (e.g. going from
>> > 1.2
>> > > to 1.6)? (Beyond the MapR tick-tock pattern)
>> > > - How many users are delaying driver upgrade substantially? Are there
>> > > customers using the 1.0 driver?
>> > > - What is the average number of deployed clients per Drillbit cluster?
>> > >
>> > > These are some of the things that need to be evaluated to determine
>> > whether
>> > > we choose to implement a compatibility layer or simply make a full
>> break.
>> > > (And in reality, I'm not sure we have the resources to build and
>> carry a
>> > > complex compatibility layer for these changes.)
>> > >
>> > > Whatever the policy we agree upon for future commitments to the user
>> > base,
>> > > we're in a situation where there are very important reasons to move
>> the
>> > > codebase forward and change the wire protocol for 2.x.
>> > >
>> > > I think it is noble to strive towards backwards compatibility. We
>> should
>> > > always do this. However, I also think that--especially early in a
>> > product's
>> > > life--it is better to resolve technical debt issues and break a few
>> eggs
>> > > than defer and carry a bunch of extra code around.
>> > >
>> > > Yes, it can suck for users. Luckily, we should also be giving users a
>> > bunch
>> > > of positive reasons that it is worth upgrading and dealing with this
>> > > version break. These include better perf, better compatibility with
>> other
>> > > tools, union type support, faster bi tool behaviors and a number of
>> other
>> > > things.
>> > >
>> > > I for one vote for moving forward and making 

[GitHub] drill pull request: DRILL-4603: Refactor FileSystem plugin code to...

2016-04-13 Thread vkorukanti
GitHub user vkorukanti opened a pull request:

https://github.com/apache/drill/pull/476

DRILL-4603: Refactor FileSystem plugin code to allow customizations

- Add a FileSystemPluginImplementationProvider to allow customizing:
  - Configuration
  - WorkspaceSchemaFactory lists
  - FormatCreator

- Separate out WorkspaceSchema from WorkspaceSchemaFactory

- Create a Configuration object and reuse it wherever new copies of 
Configuration are needed

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vkorukanti/drill DRILL-4603

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/476.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #476


commit d43f1de57cb012929e37d11f2b3f57885af8c92d
Author: Venki Korukanti 
Date:   2016-04-07T02:55:29Z

DRILL-4603: Refactor FileSystem plugin code to allow customizations

- Add a FileSystemPluginImplementationProvider to allow customizing:
  - Configuration
  - WorkspaceSchemaFactory lists
  - FormatCreator

- Separate out WorkspaceSchema from WorkspaceSchemaFactory

- Create a Configuration object and reuse it wherever new copies of 
Configuration are needed






[jira] [Resolved] (DRILL-4446) Improve current fragment parallelization module

2016-04-13 Thread Venki Korukanti (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti resolved DRILL-4446.

Resolution: Fixed

> Improve current fragment parallelization module
> ---
>
> Key: DRILL-4446
> URL: https://issues.apache.org/jira/browse/DRILL-4446
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.5.0
>Reporter: Venki Korukanti
>Assignee: Venki Korukanti
> Fix For: 1.7.0
>
>
> The current fragment parallelizer {{SimpleParallelizer.java}} can't correctly 
> handle the case where an operator has a mandatory scheduling requirement for 
> a set of DrillbitEndpoints and an affinity for each DrillbitEndpoint (i.e., how 
> much of the total work should be scheduled on each DrillbitEndpoint). It 
> assumes that scheduling requirements are soft (except the Mux and DeMux case, 
> which has a mandatory parallelization requirement of 1 unit). 
> An example is:
> Cluster has 3 nodes running Drillbits and storage service on each. Data for a 
> table is only present at storage services in two nodes. So a GroupScan needs 
> to be scheduled on these two nodes in order to read the data. Storage service 
> doesn't support (or costly) reading data from remote node.
> Inserting the mandatory scheduling requirements within existing 
> SimpleParallelizer is not sufficient as you may end up with a plan that has a 
> fragment with two GroupScans each having its own hard parallelization 
> requirements.
> Proposal is:
> Add a property to each operator which tells what parallelization 
> implementation to use. Most operators don't have any particular strategy 
> (such as Project or Filter), they depend on incoming operator. Current 
> existing operators which have requirements (all existing GroupScans) default 
> to current parallelizer {{SimpleParallelizer}}. {{Screen}} defaults to new 
> mandatory assignment parallelizer. It is possible that PhysicalPlan generated 
> can have a fragment with operators having different parallelization 
> strategies. In that case an exchange is inserted in between operators where a 
> change in parallelization strategy is required.
> Will send a detailed design doc.





[GitHub] drill pull request: DRILL-4593: Remove OldAssignmentCreator in Fil...

2016-04-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/473




[GitHub] drill pull request: DRILL-4593: Remove OldAssignmentCreator in Fil...

2016-04-13 Thread StevenMPhillips
Github user StevenMPhillips commented on the pull request:

https://github.com/apache/drill/pull/473#issuecomment-209580405
  
+1




[jira] [Created] (DRILL-4603) Refactor FileSystem plugin code to allow customizations

2016-04-13 Thread Venki Korukanti (JIRA)
Venki Korukanti created DRILL-4603:
--

 Summary: Refactor FileSystem plugin code to allow customizations
 Key: DRILL-4603
 URL: https://issues.apache.org/jira/browse/DRILL-4603
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Other
Reporter: Venki Korukanti
Assignee: Venki Korukanti
 Fix For: 1.7.0


Currently FileSystemPlugin is hard to extend: a lot of the logic for creating 
component implementations ({{WorkspaceSchemaFactory}}s, {{FormatCreator}}, 
defining default workspaces and configuration implicit to the FileSystem 
implementation) is hard-coded in the constructor.
 
This JIRA is to track 
 * refactoring the FileSystemPlugin to allow custom component implementations 
(Configuration, WorkSpaceSchemaFactory, FileSystemSchemaFactory or 
FormatCreator).
 * sharing a single Hadoop {{Configuration}} object to create new 
{{Configuration}} objects. Creating a new {{Configuration}} without an existing 
copy is not efficient, because it involves scanning the classpath for *-site 
files.





Re: [jira] [Created] (DRILL-4602) Avro files dont work if the union format is ["some-type", "null"]

2016-04-13 Thread Stefán Baxter
Hi,

I'm using the union format [ "null", "some-type"] and it fails.
I have been meaning to get you test data but we have a product soft-launch
in 2 weeks and I doubt we will have time until after that is done.
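
For reference, a sketch of the two union orderings under discussion, built with
the Avro Java API ("string" stands in for "some-type" here):

import java.util.Arrays;
import org.apache.avro.Schema;

public class UnionOrdering {
  public static void main(String[] args) {
    // ["null", "string"] -- null branch first
    Schema nullFirst = Schema.createUnion(Arrays.asList(
        Schema.create(Schema.Type.NULL), Schema.create(Schema.Type.STRING)));
    // ["string", "null"] -- null branch last, the form DRILL-4602 reports as failing
    Schema nullLast = Schema.createUnion(Arrays.asList(
        Schema.create(Schema.Type.STRING), Schema.create(Schema.Type.NULL)));
    System.out.println(nullFirst);
    System.out.println(nullLast);
  }
}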

Regards,
 -Stefán

On Wed, Apr 13, 2016 at 3:30 PM, Christian (JIRA)  wrote:

> Christian created DRILL-4602:
> 
>
>  Summary: Avro files dont work if the union format is
> ["some-type", "null"]
>  Key: DRILL-4602
>  URL: https://issues.apache.org/jira/browse/DRILL-4602
>  Project: Apache Drill
>   Issue Type: Bug
>   Components: Storage - Avro
> Affects Versions: 1.6.0
> Reporter: Christian
>  Fix For: 1.7.0
>  Attachments: 0001-Fixing-avro-union-types.patch
>
> An Avro file generated by a different system (e.g. Spark) can have a
> slightly different union format that is not understood by Drill. For
> example ["some-type", "null"] will cause an error while [ "null",
> "some-type"] still works.
>
>
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>


[GitHub] drill pull request: DRILL-4576: Add PlannerCallback interface for ...

2016-04-13 Thread jacques-n
Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/466#discussion_r59574906
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/PlannerCallback.java 
---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner;
+
+import java.util.Collection;
+
+import org.apache.calcite.plan.RelOptPlanner;
+
+/**
+ * A callback that StoragePlugins can initialize to allow further 
configuration
+ * of the Planner at initialization time. Examples could be to allow 
adding lattices,
+ * materializations or additional traits to the planner that will be used 
in
+ * planning.
+ */
+public abstract class PlannerCallback {
--- End diff --

Generally, I think up through Java 7, trying to do functional programming 
creates unmaintainable and hard to understand code. I've seen some projects 
which especially suffer from this. If we deprecated Java 7, I'd likely change 
my tune. I'm a fan of functional programming when the language has good support 
for it.




[GitHub] drill pull request: DRILL-4574: Avro Plugin: Flatten does not work...

2016-04-13 Thread jacques-n
Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/459#issuecomment-209521505
  
Nice first patch. Thanks for the contribution.

+1

Can you collapse this into a single commit and make sure the commit message 
matches the Drill style: 

DRILL-: A short description

Longer description






[GitHub] drill pull request: DRILL-4576: Add PlannerCallback interface for ...

2016-04-13 Thread laurentgo
Github user laurentgo commented on a diff in the pull request:

https://github.com/apache/drill/pull/466#discussion_r59572951
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/PlannerCallback.java 
---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner;
+
+import java.util.Collection;
+
+import org.apache.calcite.plan.RelOptPlanner;
+
+/**
+ * A callback that StoragePlugins can initialize to allow further 
configuration
+ * of the Planner at initialization time. Examples could be to allow 
adding lattices,
+ * materializations or additional traits to the planner that will be used 
in
+ * planning.
+ */
+public abstract class PlannerCallback {
--- End diff --

Hope you can share your thoughts regarding `Function<>`; I might have spent 
too much time doing functional programming in the past years :) Interfaces are 
usually easier to mock. Also, with Java 8 you can annotate them with 
`@FunctionalInterface` and use lambda expressions directly (and, also in Java 8, 
you can add default methods to interfaces, which makes abstract classes less 
compulsory as a way of not breaking backward compatibility).
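
For readers following along, a minimal sketch of the Java 8 style described 
above; the interface name, the hook methods, and the lambda body are 
illustrative assumptions, not part of the patch:

    import org.apache.calcite.plan.ConventionTraitDef;
    import org.apache.calcite.plan.RelOptPlanner;

    // Hypothetical illustration only: a single-method interface marked
    // @FunctionalInterface can be implemented with a lambda, and default
    // methods can be added later without breaking existing implementors.
    @FunctionalInterface
    interface PlannerHook {
      void onInitialization(RelOptPlanner planner);

      // Added later with a no-op body; existing implementations keep compiling.
      default void onDispose(RelOptPlanner planner) { }
    }

    class PlannerHookExample {
      static final PlannerHook ADD_CONVENTION_TRAIT =
          planner -> planner.addRelTraitDef(ConventionTraitDef.INSTANCE);
    }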


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request: DRILL-4571: Add link to local Drill logs from ...

2016-04-13 Thread jacques-n
Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/472#discussion_r59571508
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java ---
@@ -303,4 +303,9 @@
   StringValidator IMPERSONATION_POLICY_VALIDATOR =
   new 
InboundImpersonationManager.InboundImpersonationPolicyValidator(IMPERSONATION_POLICIES_KEY,
 "[]");
 
+  /**
+   * Web settings
+   */
+  String WEB_LOGS_MAX_LINES = "drill.web.logs.max_lines";
--- End diff --

Should use a system option. Anything that isn't bootstrap should be kept 
out of *.conf.
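
A rough sketch of what a system option could look like here, assuming the 
PositiveLongValidator from TypeValidators that other options use; the option 
name and default value below are assumptions, not the actual change:

    import org.apache.drill.exec.server.options.TypeValidators.PositiveLongValidator;

    // Hypothetical sketch only - name, default and validator choice are
    // illustrative, mirroring how other system options are declared.
    interface WebLogOptionSketch {
      String WEB_LOGS_MAX_LINES = "web.logs.max_lines";
      PositiveLongValidator WEB_LOGS_MAX_LINES_VALIDATOR =
          new PositiveLongValidator(WEB_LOGS_MAX_LINES, Integer.MAX_VALUE, 10000);
    }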


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request: DRILL-4576: Add PlannerCallback interface for ...

2016-04-13 Thread jacques-n
Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/466#discussion_r59570703
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/PlannerCallback.java 
---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner;
+
+import java.util.Collection;
+
+import org.apache.calcite.plan.RelOptPlanner;
+
+/**
+ * A callback that StoragePlugins can initialize to allow further 
configuration
+ * of the Planner at initialization time. Examples could be to allow 
adding lattices,
+ * materializations or additional traits to the planner that will be used 
in
+ * planning.
+ */
+public abstract class PlannerCallback {
+
+  /**
+   * Method that will be called before a planner is used to further 
configure the planner.
+   * @param planner The planner to be configured.
+   */
+  public abstract void initializePlanner(RelOptPlanner planner);
--- End diff --

onInitialization is better. Will update.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request: DRILL-4576: Add PlannerCallback interface for ...

2016-04-13 Thread jacques-n
Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/466#discussion_r59570591
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/PlannerCallback.java 
---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner;
+
+import java.util.Collection;
+
+import org.apache.calcite.plan.RelOptPlanner;
+
+/**
+ * A callback that StoragePlugins can initialize to allow further 
configuration
+ * of the Planner at initialization time. Examples could be to allow 
adding lattices,
+ * materializations or additional traits to the planner that will be used 
in
+ * planning.
+ */
+public abstract class PlannerCallback {
+
+  /**
+   * Method that will be called before a planner is used to further 
configure the planner.
+   * @param planner The planner to be configured.
+   */
+  public abstract void initializePlanner(RelOptPlanner planner);
+
+
+  public static PlannerCallback merge(Collection<PlannerCallback> callbacks){
--- End diff --

Probably has to do with whether Eclipse cleans up my formatting. We should 
have checkstyles for these things. Let's try to do something with v2 so we 
remove this inconsistency from the code.
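
Since the diff above cuts off before the body of merge(), a plausible sketch of 
such a composite callback, offered only as an assumption rather than the actual 
patch:

    // Assumed implementation for illustration; the real body is not shown in the diff.
    public static PlannerCallback merge(final Collection<PlannerCallback> callbacks) {
      return new PlannerCallback() {
        @Override
        public void initializePlanner(RelOptPlanner planner) {
          // Delegate to each registered callback in turn.
          for (PlannerCallback callback : callbacks) {
            callback.initializePlanner(planner);
          }
        }
      };
    }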


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request: DRILL-4576: Add PlannerCallback interface for ...

2016-04-13 Thread jacques-n
Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/466#discussion_r59570320
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/PlannerCallback.java 
---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner;
+
+import java.util.Collection;
+
+import org.apache.calcite.plan.RelOptPlanner;
+
+/**
+ * A callback that StoragePlugins can initialize to allow further 
configuration
+ * of the Planner at initialization time. Examples could be to allow 
adding lattices,
+ * materializations or additional traits to the planner that will be used 
in
+ * planning.
+ */
+public abstract class PlannerCallback {
--- End diff --

I'm against using the Function<> pattern in general. I thought a class would be 
easier to extend without breaking backwards compatibility. I could do an 
interface + abstract class, but that just seems overly complex/heavy.
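
To make the backwards-compatibility argument concrete, a small illustration 
(the added method name is hypothetical): an abstract class can grow a new hook 
with a no-op body, and subclasses that only override the original method keep 
compiling unchanged.

    // Illustration only - onPlanningComplete is a made-up later addition.
    public abstract class PlannerCallback {
      public abstract void initializePlanner(RelOptPlanner planner);

      // Added in a later release with a no-op body; existing subclasses
      // that only override initializePlanner() are unaffected.
      public void onPlanningComplete(RelOptPlanner planner) { }
    }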


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (DRILL-4602) Avro files don't work if the union format is ["some-type", "null"]

2016-04-13 Thread Christian (JIRA)
Christian created DRILL-4602:


 Summary: Avro files don't work if the union format is ["some-type", 
"null"]
 Key: DRILL-4602
 URL: https://issues.apache.org/jira/browse/DRILL-4602
 Project: Apache Drill
  Issue Type: Bug
  Components: Storage - Avro
Affects Versions: 1.6.0
Reporter: Christian
 Fix For: 1.7.0
 Attachments: 0001-Fixing-avro-union-types.patch

An Avro file generated by a different system (e.g. Spark) can have a slightly 
different union format that is not understood by Drill. For example, 
["some-type", "null"] will cause an error while ["null", "some-type"] still 
works. 
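
To illustrate the two orderings with the Avro Java API (a hedged example; the 
chosen member type is arbitrary):

    import java.util.Arrays;
    import org.apache.avro.Schema;

    public class UnionOrderExample {
      public static void main(String[] args) {
        // Read correctly by Drill 1.6: null listed first, ["null", "some-type"].
        Schema nullFirst = Schema.createUnion(Arrays.asList(
            Schema.create(Schema.Type.NULL),
            Schema.create(Schema.Type.STRING)));

        // Causes an error in Drill 1.6 even though it is a legal Avro schema:
        // ["some-type", "null"], the ordering written by e.g. Spark.
        Schema nullLast = Schema.createUnion(Arrays.asList(
            Schema.create(Schema.Type.STRING),
            Schema.create(Schema.Type.NULL)));

        System.out.println(nullFirst + "\n" + nullLast);
      }
    }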





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-3940) Make RecordBatch AutoCloseable

2016-04-13 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved DRILL-3940.
---
Resolution: Fixed
  Assignee: Jacques Nadeau  (was: Chris Westin)

RecordBatch wasn't made AutoCloseable. Instead, CloseableRecordBatch was 
created; it is managed by the framework so the interface doesn't leak into the 
operators.
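
For readers unfamiliar with the change, the resolution amounts to roughly this 
shape; a sketch based on the comment above, not the exact source:

    // Sketch only - the actual interface may declare additional methods.
    public interface CloseableRecordBatch extends RecordBatch, AutoCloseable {
    }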

> Make RecordBatch AutoCloseable
> --
>
> Key: DRILL-3940
> URL: https://issues.apache.org/jira/browse/DRILL-3940
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
>Reporter: Chris Westin
>Assignee: Jacques Nadeau
>
> This made it easier to find RecordBatches that were not cleaned up (because 
> the compiler complains about AutoCloseable resources that haven't been 
> closed).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Fwd: Reading Avro Arrays

2016-04-13 Thread Johannes Schulte
And again with the right dev mailing list...

-- Forwarded message --
From: Johannes Schulte 
Date: Wed, Apr 13, 2016 at 2:21 PM
Subject: Fwd: Reading Avro Arrays
To: drill-...@apache.org


Hi!

This pull request fixes a problem with FLATTEN on nested avro records.
Please see posts from the user list and the issue
https://issues.apache.org/jira/browse/DRILL-4574 for documentation.

I would love to get some feedback!

Johannes

https://github.com/apache/drill/pull/459



-- Forwarded message --
From: Johannes Schulte 
Date: Tue, Apr 12, 2016 at 11:33 PM
Subject: Re: Reading Avro Arrays
To: u...@drill.apache.org


After some evenings of digging into the code I more or less had a lucky
moment and was able to fix the problem. I wonder why nobody else ran into
this problem until now - for me it was a blocker to Drill adoption, and I am
really surprised nobody else ever encountered this issue. I hope that
somebody with more knowledge of the codebase can review this and integrate
it soon.


On Sun, Apr 3, 2016 at 11:29 AM, Johannes Schulte <
johannes.schu...@gmail.com> wrote:

> Alright, thanks! I created a pull request and am very open to any input
>
> https://github.com/apache/drill/pull/459
>
> Cheers,
>
> Johannes
>
> On Sun, Apr 3, 2016 at 9:10 AM, Abdel Hakim Deneche  > wrote:
>
>> pull requests are fine. You still need a JIRA though
>>
>> On Sun, Apr 3, 2016 at 8:03 AM, Johannes Schulte <
>> johannes.schu...@gmail.com
>> > wrote:
>>
>> > I now extended the AvroFormatTest-Suite by two unit tests that show that
>> >
>> > * Flattening of primitive array works as expected
>> > * Flattening of arrays of records does not work properly
>> >
>> > I spent some time trying to find the reason but it's my first contact
>> with
>> > the drill-codebase.
>> >
>> > Is the recommended way of making this unit test available still to
>> attach a
>> > patch in an issue or is a pull-request also an option?
>> >
>> > In the context of the recent avro maturity discussion I would love to
>> fix
>> > this error myself but I would need some hints what goes wrong there
>> > internally.
>> >
>> > Johannes
>> >
>> > On Fri, Mar 25, 2016 at 10:50 PM, Johannes Schulte <
>> > johannes.schu...@gmail.com> wrote:
>> >
>> > > Hi Stefan, hi Jacques, thanks for going after this - I had almost
>> > > given up, but now I think it was because I accessed the data over JDBC
>> > > with SQuirreL and got irritated by the unknown type column there.
>> > > Nonetheless, if the schema looks like this:
>> > >
>> > >
>> > > {
>> > >   "type" : "record",
>> > >   "name" : "MainRecord",
>> > >   "namespace" : "drizz.WriteAvroTestFileForDrill$",
>> > >   "fields" : [ {
>> > > "name" : "elements",
>> > > "type" : {
>> > >   "type" : "array",
>> > >   "items" : {
>> > > "type" : "record",
>> > > "name" : "NestedRecord",
>> > > "fields" : [ {
>> > >   "name" : "field1",
>> > >   "type" : "int"
>> > > } ]
>> > >   },
>> > >   "java-class" : "java.util.List"
>> > > }
>> > >   } ]
>> > > }
>> > >
>> > > and the contents looks like this (according to avro tojson command
>> line
>> > > utility)
>> > >
>> > >
>> > >
>> >
>> {"elements":[{"field1":0},{"field1":1},{"field1":2},{"field1":3},{"field1":4},{"field1":5},{"field1":6},{"field1":7},{"field1":8},{"field1":9}]}
>> > >
>> > >
>> >
>> {"elements":[{"field1":0},{"field1":1},{"field1":2},{"field1":3},{"field1":4},{"field1":5},{"field1":6},{"field1":7},{"field1":8},{"field1":9}]}
>> > >
>> > > a query like
>> > >
>> > > select flatten(elements) from
>> > > dfs.`/Users/j.schulte/data/avro-drill/no-union/`;
>> > >
>> > > yields exactly two rows:
>> > > +---+
>> > > |EXPR$0 |
>> > > +---+
>> > > | {"field1":9}  |
>> > > | {"field1":9}  |
>> > > +---+
>> > >
>> > > as if only the last element in the array would survive.
>> > >
>> > > Thanks for your help so far..
>> > >
>> > > On Fri, Mar 25, 2016 at 5:45 PM, Stefán Baxter <
>> > ste...@activitystream.com>
>> > > wrote:
>> > >
>> > >> Johannes, Jacques is right.
>> > >>
>> > >> I only tested the flattening of maps and not the flattening of
>> > >> list-of-maps.
>> > >>
>> > >> -Stefan
>> > >>
>> > >> On Fri, Mar 25, 2016 at 4:12 PM, Jacques Nadeau 
>> > >> wrote:
>> > >>
>> > >> > I think there is some incorrect information and confusion in this
>> > >> thread.
>> > >> > Could you please share a piece of sample data and a specific query?
>> > The
>> > >> > error message shown in your original email is suggesting that you
>> were
>> > >> > trying to flatten a map rather than an array of maps. Flatten is
>> for
>> > >> arrays
>> > >> > only. The arrays can have scalars or complex objects in them.
>> > >> >
>> > >> > --
>> > >> > Jacques Nadeau
>> > >> > CTO and Co-Founder, Dremio
>> > >> >
>> > >> > On Fri, Mar 25, 

[GitHub] drill pull request: DRILL-4584: JDBC/ODBC Client IP in Drill audit...

2016-04-13 Thread vdiravka
GitHub user vdiravka opened a pull request:

https://github.com/apache/drill/pull/475

DRILL-4584: JDBC/ODBC Client IP in Drill audit logs

The format of the added field in the log files is 
"remoteAddress":"192.168.121.1:58984"

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vdiravka/drill DRILL-4584

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/475.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #475


commit 92269e33c56f627671f266c47aca52141c288bda
Author: Vitalii Diravka 
Date:   2016-04-08T08:01:37Z

DRILL-4584: JDBC/ODBC Client IP in Drill audit logs
- the format of added field in log files is 
"remoteAddress":"192.168.121.1:58984"




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (DRILL-4601) Partitioning based on the parquet statistics

2016-04-13 Thread Miroslav Holubec (JIRA)
Miroslav Holubec created DRILL-4601:
---

 Summary: Partitioning based on the parquet statistics
 Key: DRILL-4601
 URL: https://issues.apache.org/jira/browse/DRILL-4601
 Project: Apache Drill
  Issue Type: Improvement
  Components: Query Planning & Optimization
Reporter: Miroslav Holubec


It could really help performance to extend the current partitioning idea 
implemented in DRILL- even further.
Currently, partitioning is based on statistics: when the min value equals the 
max value for a whole file, the file is removed from the scan in the planning 
phase. The problem with this is that it leads to many small Parquet files, 
which is not ideal in the HDFS world. Also, only a few columns are partitioned.

I would like to extend this idea to use all statistics for all columns: if a 
column must equal a constant, remove from the plan all files whose statistics 
rule it out. This will really help performance for scans over many Parquet 
files.

I have an initial patch ready, currently just to give an idea.
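
A rough sketch of the pruning rule being proposed, under the assumption that 
per-column min/max statistics are available per file; the class and field names 
below are illustrative, not from the patch:

    // Illustrative sketch only. Prunes files whose min/max range cannot
    // contain the constant from an equality predicate (col = value).
    class ColumnStats {
      final Long min;   // null means "no statistics available"
      final Long max;
      ColumnStats(Long min, Long max) { this.min = min; this.max = max; }
    }

    class PruningSketch {
      // Returns true if the file can safely be dropped from the scan.
      static boolean canPrune(ColumnStats stats, long value) {
        if (stats == null || stats.min == null || stats.max == null) {
          return false;   // be conservative when statistics are missing
        }
        return value < stats.min || stats.max < value;
      }
    }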



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)