Re: [ANNOUNCE] New Committer: Jyothsna Donapati

2019-05-09 Thread Timothy Farkas
Congrats!!


On Thu, May 9, 2019 at 2:54 PM Bridget Bevens  wrote:

> Congratulations, Jyothsna!!! :-)
>
> On Thu, May 9, 2019 at 2:46 PM Khurram Faraaz  wrote:
>
> > Congratulations Jyothsna!
> >
> > On Thu, May 9, 2019 at 2:38 PM salim achouche 
> > wrote:
> >
> > > Congratulations Jyothsna!
> > >
> > > On Thu, May 9, 2019 at 2:28 PM Aman Sinha 
> wrote:
> > >
> > > > The Project Management Committee (PMC) for Apache Drill has invited
> > > > Jyothsna
> > > > Donapati to become a committer, and we are pleased to announce that
> she
> > > has
> > > > accepted.
> > > >
> > > > Jyothsna has been contributing to Drill for about 1 1/2 years.  She
> > > > initially contributed the graceful shutdown capability and more
> > recently
> > > > has made several crucial improvements in the parquet metadata caching
> > > which
> > > > have gone into the 1.16 release.  She also co-authored the design
> > > document
> > > > for this feature.
> > > >
> > > > Welcome Jyothsna, and thank you for your contributions.  Keep up the
> > good
> > > > work
> > > > !
> > > >
> > > > -Aman
> > > > (on behalf of Drill PMC)
> > > >
> > >
> > >
> > > --
> > > Regards,
> > > Salim
> > >
> >
>


Re: [ANNOUNCE] New PMC member: Sorabh Hamirwasia

2019-04-05 Thread Timothy Farkas
Congrats!

Tim

On Fri, Apr 5, 2019 at 9:06 AM Arina Ielchiieva  wrote:

> I am pleased to announce that Drill PMC invited Sorabh Hamirwasia to
> the PMC and
> he has accepted the invitation.
>
> Congratulations Sorabh and welcome!
>
> - Arina
> (on behalf of Drill PMC)
>


Re: [ANNOUNCE] New Committer: Salim Achouche

2018-12-17 Thread Timothy Farkas
Congrats!

On Mon, Dec 17, 2018 at 9:37 AM Aman Sinha  wrote:

> Congratulations Salim !  Thanks for your contributions !
>
> Aman
>
> On Mon, Dec 17, 2018 at 3:20 AM Vitalii Diravka 
> wrote:
>
> > Congratulations Salim!
> > Well deserved!
> >
> > Kind regards
> > Vitalii
> >
> >
> > On Mon, Dec 17, 2018 at 12:40 PM Arina Ielchiieva 
> > wrote:
> >
> > > The Project Management Committee (PMC) for Apache Drill has invited
> Salim
> > > Achouche to become a committer, and we are pleased to announce that he
> > has
> > > accepted.
> > >
> > > Salim Achouche [1] started contributing to the Drill project in 2017.
> > > He has made many improvements to the Parquet reader, including
> > > performance improvements for flat data types and columnar Parquet batch
> > > sizing functionality, and has fixed various bugs and memory leaks. He
> > > also optimized implicit column handling in the scanner and improved SQL
> > > pattern-contains performance.
> > >
> > > Welcome Salim, and thank you for your contributions!
> > >
> > > - Arina
> > > (on behalf of Drill PMC)
> > >
> >
>


Re: [ANNOUNCE] New Committer: Karthikeyan Manivannan

2018-12-07 Thread Timothy Farkas
Congrats Karthik!!!

On Fri, Dec 7, 2018 at 11:15 AM Khurram Faraaz  wrote:

> Congratulations Karthik!!
>
> On Fri, Dec 7, 2018 at 11:12 AM Abhishek Girish 
> wrote:
>
> > Congratulations Karthik!
> >
> > On Fri, Dec 7, 2018 at 11:11 AM Arina Ielchiieva 
> wrote:
> >
> > > The Project Management Committee (PMC) for Apache Drill has invited
> > > Karthikeyan
> > > Manivannan to become a committer, and we are pleased to announce that
> he
> > > has accepted.
> > >
> > > Karthik started contributing to the Drill project in 2016. He has
> > > implemented changes in various Drill areas, including batch sizing,
> > > security, code generation, and the C++ client. One of his latest
> > > improvements is ACL support for Drill ZK nodes.
> > >
> > > Welcome Karthik, and thank you for your contributions!
> > >
> > > - Arina
> > > (on behalf of Drill PMC)
> > >
> >
>


Re: Hangout Discussion Topics

2018-11-12 Thread Timothy Farkas
Works for me. Then let's skip the hangout tomorrow. If there are any
objections please feel free to respond to this thread.

Thanks,
Tim

On Mon, Nov 12, 2018 at 5:35 PM Aman Sinha  wrote:

> Since we are having the Drill Developer day on Wednesday, perhaps we can
> skip the hangout tomorrow ?
>
> Aman
>
> On Mon, Nov 12, 2018 at 10:13 AM Timothy Farkas  wrote:
>
> > Hi All,
> >
> > Does anyone have any topics to discuss during the hangout tomorrow?
> >
> > Thanks,
> > Tim
> >
>


Hangout Discussion Topics

2018-11-12 Thread Timothy Farkas
Hi All,

Does anyone have any topics to discuss during the hangout tomorrow?

Thanks,
Tim


Re: [ANNOUNCE] New Committer: Hanumath Rao Maduri

2018-11-01 Thread Timothy Farkas
Congrats Hanu!

On Thu, Nov 1, 2018 at 8:43 AM salim achouche  wrote:

> Congrats Hanu!
>
> On Thu, Nov 1, 2018 at 6:05 AM Arina Ielchiieva  wrote:
>
> > The Project Management Committee (PMC) for Apache Drill has invited
> > Hanumath
> > Rao Maduri to become a committer, and we are pleased to announce that he
> > has accepted.
> >
> > Hanumath became a contributor in 2017, making changes mostly on the
> > Drill planning side, including lateral/unnest support. He is also one of
> > the contributors of index-based planning and execution support.
> >
> > Welcome Hanumath, and thank you for your contributions!
> >
> > - Arina
> > (on behalf of Drill PMC)
> >
>
>
> --
> Regards,
> Salim
>


Re: [ANNOUNCE] New Committer: Gautam Parai

2018-10-22 Thread Timothy Farkas
Congrats Gautam!

On Mon, Oct 22, 2018 at 9:04 AM Hanumath Rao Maduri 
wrote:

> Congratulations Gautam!
>
> On Mon, Oct 22, 2018 at 8:46 AM salim achouche 
> wrote:
>
> > Congrats Gautam!
> >
> > On Mon, Oct 22, 2018 at 7:25 AM Arina Ielchiieva 
> wrote:
> >
> > > The Project Management Committee (PMC) for Apache Drill has invited
> > Gautam
> > > Parai to become a committer, and we are pleased to announce that he has
> > > accepted.
> > >
> > > Gautam has been a contributor since 2016, making changes in various
> > > Drill areas, including the planning side. He is also one of the
> > > contributors of the upcoming feature to support index-based planning
> > > and execution.
> > >
> > > Welcome Gautam, and thank you for your contributions!
> > >
> > > - Arina
> > > (on behalf of Drill PMC)
> > >
> >
> >
> > --
> > Regards,
> > Salim
> >
>


[jira] [Created] (DRILL-6808) Simplify logging statements in HashAgg

2018-10-19 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6808:
-

 Summary: Simplify logging statements in HashAgg
 Key: DRILL-6808
 URL: https://issues.apache.org/jira/browse/DRILL-6808
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Timothy Farkas
Assignee: Timothy Farkas






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6807) Move code for evicting a partition in HashAgg into a separate class

2018-10-19 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6807:
-

 Summary: Move code for evicting a partition in HashAgg into a 
separate class
 Key: DRILL-6807
 URL: https://issues.apache.org/jira/browse/DRILL-6807
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Timothy Farkas








[jira] [Created] (DRILL-6806) Move code for handling a partition in HashAgg into a separate class

2018-10-19 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6806:
-

 Summary: Move code for handling a partition in HashAgg into a 
separate class
 Key: DRILL-6806
 URL: https://issues.apache.org/jira/browse/DRILL-6806
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Timothy Farkas
Assignee: Timothy Farkas








[jira] [Created] (DRILL-6804) Simplify Usage of OperatorPhase in HashAgg Template

2018-10-18 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6804:
-

 Summary: Simplify Usage of OperatorPhase in HashAgg Template
 Key: DRILL-6804
 URL: https://issues.apache.org/jira/browse/DRILL-6804
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Timothy Farkas
Assignee: Timothy Farkas








Re: [ANNOUNCE] New Committer: Chunhui Shi

2018-09-28 Thread Timothy Farkas
Congrats!

On Fri, Sep 28, 2018 at 1:17 PM Sorabh Hamirwasia 
wrote:

> Congratulations Chunhui!!
>
> Thanks,
> Sorabh
>
> On Fri, Sep 28, 2018 at 12:56 PM Paul Rogers 
> wrote:
>
> > Congrats Chunhui!
> >
> > Thanks,
> > - Paul
> >
> >
> >
> > On Friday, September 28, 2018, 2:17:42 AM PDT, Arina Ielchiieva <
> > ar...@apache.org> wrote:
> >
> >  The Project Management Committee (PMC) for Apache Drill has invited
> > Chunhui
> > Shi to become a committer, and we are pleased to announce that he has
> > accepted.
> >
> > Chunhui Shi has been a contributor since 2016, making changes in various
> > Drill areas. He has shown profound knowledge of the Drill planning side
> > during his work to support lateral join. He is also one of the
> > contributors of the upcoming feature to support index-based planning and
> > execution.
> >
> > Welcome Chunhui, and thank you for your contributions!
> >
> > - Arina
> > (on behalf of Drill PMC)
> >
>


[jira] [Created] (DRILL-6757) Enforce Updated NONE Contract

2018-09-21 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6757:
-

 Summary: Enforce Updated NONE Contract
 Key: DRILL-6757
 URL: https://issues.apache.org/jira/browse/DRILL-6757
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: Timothy Farkas


DRILL-6747 updated the contract for RecordBatches after they return NONE. This 
new contract should be implemented in all the operators.





[jira] [Created] (DRILL-6756) HashTable Decouple updateIncoming for build and probe side.

2018-09-21 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6756:
-

 Summary: HashTable Decouple updateIncoming for build and probe 
side.
 Key: DRILL-6756
 URL: https://issues.apache.org/jira/browse/DRILL-6756
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas


We should decouple updateIncoming for the build and probe side batches.





[jira] [Created] (DRILL-6755) HashJoin don't build hash tables when probe side is empty.

2018-09-21 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6755:
-

 Summary: HashJoin don't build hash tables when probe side is empty.
 Key: DRILL-6755
 URL: https://issues.apache.org/jira/browse/DRILL-6755
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: Boaz Ben-Zvi


Currently, when doing an inner or a right join, we still build hash tables even
when the probe side is empty. A performance optimization would be to not build
them.
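As a sketch only (the class and method names below are hypothetical, not Drill's actual code), the proposed check could look like this: with an empty probe side, an inner join can produce no output rows, and a right join simply emits every build-side row as unmatched, so neither case needs a hash table.

```java
public class HashTableSkipSketch {
  enum JoinType { INNER, LEFT, RIGHT, FULL }

  /** Returns true if the build phase should construct hash tables. */
  static boolean needHashTables(JoinType type, long probeRowCount) {
    boolean probeEmpty = probeRowCount == 0;
    // The ticket proposes skipping the build only for INNER and RIGHT joins.
    return !(probeEmpty && (type == JoinType.INNER || type == JoinType.RIGHT));
  }

  public static void main(String[] args) {
    System.out.println(needHashTables(JoinType.INNER, 0));  // false: skip build
    System.out.println(needHashTables(JoinType.RIGHT, 0));  // false: skip build
    System.out.println(needHashTables(JoinType.INNER, 10)); // true: build as usual
  }
}
```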





[jira] [Created] (DRILL-6747) Empty Probe Side in Right Join Causes IOB.

2018-09-19 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6747:
-

 Summary: Empty Probe Side in Right Join Causes IOB.
 Key: DRILL-6747
 URL: https://issues.apache.org/jira/browse/DRILL-6747
 Project: Apache Drill
  Issue Type: Bug
Reporter: Timothy Farkas
Assignee: Timothy Farkas


When a right join is done with an empty probe side that first returns
OK_NEW_SCHEMA and then NONE, an IndexOutOfBoundsException (IOB) is triggered.

This happens in the following scenario:

 1. We finish reading the build side.
 2. The upstream probe side operator is an UnorderedReceiver.
 3. We sniff for probe data and find none, so the UnorderedReceiver returns
NONE.
 4. When the UnorderedReceiver returns NONE, it clears the VectorContainer in
its RecordBatchLoader, causing all the value vectors to be deleted.
 5. We then try to build hash tables from the build side data.
 6. The HashTable requires a probe side vector container from an
UnorderedReceiver.
 7. The UnorderedReceiver's vector container is in the RecordBatchLoader, which
was cleared when NONE was returned, so all the vector container's columns were
removed.
 8. Building the hash table attempts to access the vector container's columns,
and we get an IOB.

There are a couple of ways to fix the issue.

Currently, updating the value vectors in the hash table for the build and probe
sides is tightly coupled; you cannot update them independently. So we could
refactor the code to update the build and probe sides independently in the case
of an empty probe side. This would be an invasive change, though.

Another solution, which I am going to implement here, is to add a requirement
to the contract for IterOutcome.NONE that the VectorContainer columns are
preserved, and to fix UnorderedReceiver to obey that contract. It looks like
all the other operators already do this, but it is not explicitly tested. I
will create other JIRAs to explicitly test this behavior.
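The contract can be illustrated with a minimal, self-contained mock (not Drill's actual API; all names here are hypothetical): a batch that preserves its container's columns after returning NONE stays safe to read, while one that clears them breaks any downstream code that still dereferences a column.

```java
import java.util.ArrayList;
import java.util.List;

public class NoneContractSketch {
  enum IterOutcome { OK_NEW_SCHEMA, OK, NONE }

  /** Stand-in for a RecordBatch whose container holds column vectors. */
  static class MockBatch {
    final List<String> columns = new ArrayList<>();
    final boolean clearOnNone; // a buggy receiver clears the container on NONE

    MockBatch(boolean clearOnNone, String... cols) {
      this.clearOnNone = clearOnNone;
      for (String c : cols) columns.add(c);
    }

    IterOutcome next(boolean exhausted) {
      if (!exhausted) return IterOutcome.OK;
      if (clearOnNone) columns.clear(); // violates the proposed contract
      return IterOutcome.NONE;
    }
  }

  /** Downstream code (e.g. hash table setup) that reads column 0 after NONE. */
  static String firstColumnAfterNone(MockBatch batch) {
    batch.next(true); // returns NONE
    return batch.columns.isEmpty() ? null : batch.columns.get(0);
  }

  public static void main(String[] args) {
    // Contract-obeying batch: columns survive NONE.
    System.out.println(firstColumnAfterNone(new MockBatch(false, "a", "b"))); // a
    // Contract-violating batch: the same access finds nothing; in real Drill
    // this surfaced as an IndexOutOfBoundsException.
    System.out.println(firstColumnAfterNone(new MockBatch(true, "a", "b")));  // null
  }
}
```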

{code}
Fragment 3:3

[Error Id: 061e7d71-e3c1-43bd-a6d5-9d588ecf6551 on perf109-52.qa.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
IndexOutOfBoundsException: Index: 1, Size: 0

Fragment 3:3

[Error Id: 061e7d71-e3c1-43bd-a6d5-9d588ecf6551 on perf109-52.qa.lab:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
 ~[drill-common-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:360)
 [drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:215)
 [drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:326)
 [drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[na:1.8.0_171]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[na:1.8.0_171]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_171]
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[na:1.8.0_171]
at java.util.ArrayList.get(ArrayList.java:433) ~[na:1.8.0_171]
at 
org.apache.drill.exec.record.VectorContainer.getValueAccessorById(VectorContainer.java:317)
 ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
at 
org.apache.drill.exec.record.RecordBatchLoader.getValueAccessorById(RecordBatchLoader.java:251)
 ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.unorderedreceiver.UnorderedReceiverBatch.getValueAccessorById(UnorderedReceiverBatch.java:139)
 ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
at 
org.apache.drill.exec.test.generated.HashTableGen783.doSetup(HashTableGen783.java:49)
 ~[na:na]
at 
org.apache.drill.exec.physical.impl.common.HashTableTemplate.updateBatches(HashTableTemplate.java:513)
 ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.common.HashTableTemplate.updateIncoming(HashTableTemplate.java:870)
 ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.common.HashPartition.buildContainersHashTableAndHelper(HashPartition.java:510)
 ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase(HashJoinBatch.java:973)
 ~[drill-java-exec-1.15.0-SNAPSHOT.jar:1.15.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:436)
 ~[drill-java-exec-1.15.0

Re: [DISCUSSION] CI for Drill

2018-09-11 Thread Timothy Farkas
+1 for trying out CircleCI. I've used it in the past, and I think the UI
is much better than Travis's.

Tim

On Tue, Sep 11, 2018 at 8:21 AM Vitalii Diravka 
wrote:

> Recently we discussed Travis build failures, and more tests were excluded
> to make Travis happy [1]. But it looks like the issue has returned, and the
> Travis build fails intermittently.
>
> I tried to find another solution instead of excluding Drill unit tests and
> found another good CI: CircleCI [2]. It looks like this CI will allow us to
> run all unit tests successfully.
> It also offers good terms for open-source projects [3] (even an OS X
> environment is available).
> An example of an Apache project that uses this CI is Apache Cassandra [4].
>
> My quick set-up of CircleCI for Drill still fails, but it likely just needs
> to be configured properly [5].
>
> I think we can try CircleCI in parallel with Travis, and if it works well,
> we will move completely to CircleCI.
> Does this make sense? Has anybody used it before and run into any
> limitations or complexities?
>
> [1] https://issues.apache.org/jira/browse/DRILL-6559
> [2] https://github.com/DefinitelyTyped/DefinitelyTyped/issues/20308#issuecomment-342115544
> [3] https://circleci.com/pricing/
> [4] https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml
> [5] https://circleci.com/gh/vdiravka/drill/tree/circleCI
>
> Kind regards
> Vitalii
>
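For anyone curious what such a setup involves, a minimal CircleCI 2.0 config for a Maven build might look like the following sketch (illustrative only; the Docker image and Maven goals are assumptions, not the actual config from the branch in [5]):

```yaml
version: 2
jobs:
  build:
    docker:
      # assumed JDK image; Drill 1.x builds on Java 8
      - image: circleci/openjdk:8-jdk
    steps:
      - checkout
      # build and run the unit test suite via Maven
      - run: mvn install
```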


Re: Drill in the distributed compute jungle

2018-09-10 Thread Timothy Farkas
It's an interesting idea, and I think the main inhibitor that prevents this
from happening is that the popular big data projects are stuck on services.
Specifically if you need distributed coordination you run a separate
zookeeper cluster. If you need a batch compute engine you run a separate
spark cluster. If you need a streaming engine you deploy a separate Flink
or Apex pipeline. If you want to reuse and combine all these services to
make a new engine, you find yourself maintaining several different clusters
of machines, which just isn't practical.

IMO the paradigm needs to shift from services to libraries. If you need
distributed coordination import the zookeeper library and start the
zookeeper client, which will run zookeeper threads and turn your
application process into a member of the zookeeper quorum. If you need
compute import the compute engine library and start the compute engine
client and your application node will also turn into a worker node. When
you start a library it will discover the other nodes in your application to
form a cohesive cluster. I think this shift has already begun. Calcite is a
library, not a query planning service. Also etcd allows you to run an etcd
instance in your application's process using a simple function call. Arrow
is also a library, not a service. And Apache Ignite is a compute engine
that allows you to run the Ignite compute engine in your application's
process https://ignite.apache.org/ .

If we shift to thinking of libraries instead of services, then it becomes
trivial to build new engines, since new engines would just be a library
that depends on other libraries. Also you no longer manage several
services, you only manage the service that you built.

From the little I read about Ray, it seems like Ray is also moving in the
library direction.

Tim
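The "library, not service" pattern described above can be sketched in a few lines (a toy illustration only; all names are hypothetical, and real examples cited in the email include ZooKeeper's client library, etcd's embeddable instance, and Apache Ignite): an embeddable engine exposes a start() call that turns the host process into a cluster member, instead of requiring a separately managed service.

```java
public class EmbeddedEngineSketch {
  /** The engine is a library: starting it joins this process to the cluster. */
  interface EmbeddedEngine extends AutoCloseable {
    boolean isClusterMember();
    @Override void close(); // narrowed so callers need no checked exception
  }

  /** Toy in-process engine; a real one would do peer discovery in start(). */
  static EmbeddedEngine start() {
    return new EmbeddedEngine() {
      private boolean member = true;
      @Override public boolean isClusterMember() { return member; }
      @Override public void close() { member = false; }
    };
  }

  public static void main(String[] args) {
    try (EmbeddedEngine engine = start()) {
      // The host application process is now also a worker node.
      System.out.println(engine.isClusterMember()); // true
    }
  }
}
```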



On Sun, Sep 9, 2018 at 10:21 PM Paul Rogers 
wrote:

> Hi All,
>
> Been reading up on distributed DB papers of late, including those passed
> along by this group. Got me thinking about Arina's question about where
> Drill might go in the long term.
>
> One thing I've noticed is that there are now quite a few distributed
> compute frameworks, many of which support SQL in some form. A partial list
> would include Drill, Presto, Impala, Hive LLAP, Spark SQL (sort of),
> Dremio, Alibaba MaxCompute, Microsoft's Dryad, Scope and StreamS, Google's
> Dremel and BigQuery and F1, the batch version of Flink -- and those are
> just the ones off the top of my head. Seems every big Internet shop has
> created one (Google, Facebook, Alibaba, Microsoft, etc.)
>
> There is probably some lesson in here for Drill. Being a distributed
> compute engine seems to have become a commodity at this late stage of Big
> Data. But, it is still extremely hard to build a distributed compute engine
> that scales, especially for a small project like Drill.
>
> What unique value does Drill bring compared to the others? Certainly being
> open source. Being in Java helps. Supporting xDBC is handy. Being able to
> scan any type of data is great (but we tell people that when they get
> serious, they should use only Parquet).
>
> As the team thinks about Arina's question about where Drill goes next, one
> wonders if there is some way to share the load?  Rather than every project
> building its own DAG optimizer and execution engine, its own distribution
> framework, its own scanners, its own implementation of data types, and of
> SQL functions, etc., is there a way to combine efforts?
>
> Ray [1] out of UC Berkeley is early days, but it promises to be exactly
> the highly scalable, low-latency engine that Drill tries to be. Calcite is
> the universal SQL parser and optimizer. Arrow wants to be the database
> toolkit, including data format, network protocol, etc. YARN, Mesos,
> Kubernetes and others want to manage the cluster load. Ranger and Sentry
> want to do data security. There are now countless storage formats (HDFS
> (classic, erasure coding, Ozone), S3, ADLS, Ceph, MapR, Aluxio, Druid, Kudu
> and countless key-value stores. HMS is the metastore we all love to hate
> and cries out for a newer, more scalable design -- but one shared by all
> engines.
>
> Then, on the compute side, SQL is just one (important) model. Spark and
> old-school MapReduce can handle general data transform problems. Ray
> targets ML. Apex and Flink target streaming. Then there are graph-query
> engines and more. Might these all be seen as different compute models that
> run on top of a common scheduler, data, security, storage, and distribution
> framework? Each has unique needs around shuffle, planning, duration, etc.
> But the underlying mechanisms are really all quite similar.
>
> Might Drill, in some far future form, be the project that combines these
> tools to create a SQL compute engine? Rather than a little project like
> Drill trying to do it all (and Drill is now talking about taking on the
> metastore challenge), perhaps Drill might evolve to get out of the

Re: [ANNOUNCE] New Committer: Weijie Tong

2018-08-31 Thread Timothy Farkas
Congrats Weijie! You've done awesome work!

Tim

On Fri, Aug 31, 2018 at 11:02 AM Sorabh Hamirwasia 
wrote:

> Congratulations Weijie!
>
> Thanks,
> Sorabh
>
> On Fri, Aug 31, 2018 at 10:28 AM, Paul Rogers 
> wrote:
>
> > Congratulations Weijie, thanks for your contributions to Drill.
> > Thanks,
> > - Paul
> >
> >
> >
> > On Friday, August 31, 2018, 8:51:30 AM PDT, Arina Ielchiieva <
> > ar...@apache.org> wrote:
> >
> >  The Project Management Committee (PMC) for Apache Drill has invited
> Weijie
> > Tong to become a committer, and we are pleased to announce that he has
> > accepted.
> >
> > Weijie Tong has become a very active contributor to Drill in recent
> > months. He contributed the join predicate push-down feature, which will
> > be available in Apache Drill 1.15. The feature is non-trivial and covered
> > changes to all aspects of Drill: the RPC layer, planning, and execution.
> >
> > Welcome Weijie, and thank you for your contributions!
> >
> > - Arina
> > (on behalf of Drill PMC)
> >
> >
>


[jira] [Resolved] (DRILL-6462) Enhance OperatorTestBuilder to use RowSets instead of adhoc json strings.

2018-08-27 Thread Timothy Farkas (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas resolved DRILL-6462.
---
Resolution: Fixed

This issue was resolved as part of DRILL-6461

> Enhance OperatorTestBuilder to use RowSets instead of adhoc json strings.
> -
>
> Key: DRILL-6462
> URL: https://issues.apache.org/jira/browse/DRILL-6462
> Project: Apache Drill
>  Issue Type: Sub-task
>    Reporter: Timothy Farkas
>        Assignee: Timothy Farkas
>Priority: Major
>






Re: [Question] HiveStoragePlugin and NativeParquetRowGroupScan

2018-08-27 Thread Timothy Farkas
Hi Paul,

As you said, each reader uses a different file system and config. As far as
I know this happens correctly in all cases, except for one corner case
reported by a user a year ago. The corner case was that if you set
fs.defaultFS to the local file system in the HiveStoragePlugin, then
restarted a Drillbit and then ran a CTAS statement, the command would fail
because an operator was using the wrong FileSystem. This corner case is no
longer reproducible in-house, so I've been trying to narrow down possible
root causes by trying to understand the theory of how Drill handles
FileSystems. Since the problem is not reproducible and the candidate root
causes for the problem have been ruled out, I am going to abandon the issue
and mark it as not reproducible.

One bit of learning that came out of the exercise was that the
DrillFileSystem should be immutable after it is created. This was not
previously enforced or documented, so a programmer could accidentally
mutate a DrillFileSystem incorrectly. I have a PR open that documents and
enforces this contract now.

Thanks,
Tim
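The "immutable after creation" contract mentioned above can be sketched generically (a hypothetical class, not Drill's actual DrillFileSystem): mutators throw once initialization completes, so a caller cannot silently repoint the filesystem at a different fs.defaultFS.

```java
import java.util.HashMap;
import java.util.Map;

public class ImmutableConfSketch {
  private final Map<String, String> conf = new HashMap<>();
  private boolean initialized;

  public void set(String key, String value) {
    if (initialized) {
      throw new IllegalStateException("immutable after init: " + key);
    }
    conf.put(key, value);
  }

  /** Called once construction is complete; all later set() calls fail fast. */
  public void finishInit() { initialized = true; }

  public String get(String key) { return conf.get(key); }

  public static void main(String[] args) {
    ImmutableConfSketch fs = new ImmutableConfSketch();
    fs.set("fs.defaultFS", "hdfs://namenode:8020");
    fs.finishInit();
    try {
      fs.set("fs.defaultFS", "file:///"); // the accidental mutation being guarded against
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
    System.out.println(fs.get("fs.defaultFS")); // hdfs://namenode:8020
  }
}
```

Enforcing the contract at runtime turns a silent misconfiguration into an immediate, diagnosable failure.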

On Fri, Aug 24, 2018 at 5:11 PM Paul Rogers 
wrote:

> Hi Tim,
>
> Can't recall the details on this. The phrase "the filesystem
> configuration" might be misleading. When executing, Drill must support
> multiple filesystems. I can have two different DFS configs, pointing to two
> different HDFS clusters (say) in a single query:
>
> SELECT ... FROM dfs1.`aFile.csv`, dfs2.`anotherFile.csv`
>
> We'd create separate readers for each file. Each reader should have a
> different filesystem conf: the one appropriate for the storage plugin
> config used for that file.
>
> Using that as a reference, it would seem that Hive plugin queries use the
> hive fs, while any DFS tables in the same query use the DFS config.
>
> I wonder, based on your comment, is this not happening? Are the configs
> getting muddled somehow?
>
> Thanks,
> - Paul
>
>
>
> On Friday, August 24, 2018, 3:45:08 PM PDT, Timothy Farkas <
> tfar...@mapr.com> wrote:
>
>  Hi Paul / Vitalii
>
> Thanks for the info. I was asking about this because of
>
> https://issues.apache.org/jira/browse/DRILL-6609, in which some strange
> behavior was observed if the user defined fs.default.name in the
> HivePlugin
> config. I also saw that the filesystem specified in the HivePlugin config
> influences the FileSystem used for native scans. This happens because in
> HiveDrillNativeParquetRowGroupScan.getFsConf we use the HiveStoragePlugin
> to create the filesystem configuration, which is then used by
> DrillFileSystem.
>
> However, based on your feedback it looks like this is desirable behavior,
> since the user may want to define a different filesystem for the HivePlugin
> along with different format plugins. Which means the root cause of
>
> https://issues.apache.org/jira/browse/DRILL-6609
> is something else then.
> I'll probably abandon that issue at this point since it's not reproducible
> and I have no further leads as to what could cause it.
>
> Thanks,
> Tim
>
> On Thu, Aug 23, 2018 at 2:46 AM, Vitalii Diravka <
> vitalii.dira...@gmail.com>
> wrote:
>
> > Hi Tim,
> >
> > Some comments from me.
> >
> > *HiveStoragePlugin*
> > *fs.defaultFS* is a Hive-specific property. This is the URI used by the
> > Hive Metastore to point to where tables are placed. There is no need to
> > specify this property if the default value from *core-site.xml* is
> > acceptable; see more:
> > https://hadoop.apache.org/docs/r3.1.0/hadoop-project-dist/hadoop-common/core-default.xml
> >
> > *Hive Native readers.*
> > Currently Drill has two Hive native readers: Parquet and MapR JSON. Both
> > of them use the appropriate default file format plugins. It is a
> > limitation, and there is for now no way to change the FormatPlugin config
> > for them.
> > There is a JIRA ticket for it:
> > https://issues.apache.org/jira/browse/DRILL-6621

[jira] [Created] (DRILL-6714) Fix Handling of Missing Columns in DRILL-4264

2018-08-27 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6714:
-

 Summary: Fix Handling of Missing Columns in DRILL-4264
 Key: DRILL-6714
 URL: https://issues.apache.org/jira/browse/DRILL-6714
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: Volodymyr Vysotskyi


Implement the improvements Salim discussed on this PR 
https://github.com/apache/drill/pull/1445 to ensure column names are created 
without backticks.





Re: [ANNOUNCE] New PMC member: Volodymyr Vysotskyi

2018-08-25 Thread Timothy Farkas
Congratulations, Volodymyr!

On Sat, Aug 25, 2018 at 9:00 AM, Kunal Khatua  wrote:

> Congratulations, Volodymyr!
> On 8/25/2018 6:32:07 AM, weijie tong  wrote:
> Congratulations Volodymyr!
>
> On Sat, Aug 25, 2018 at 8:30 AM salim achouche wrote:
>
> > Congrats Volodymyr!
> >
> > On Fri, Aug 24, 2018 at 11:32 AM Gautam Parai wrote:
> >
> > > Congratulations Vova!
> > >
> > > Gautam
> > >
> > > On Fri, Aug 24, 2018 at 10:59 AM, Khurram Faraaz
> > wrote:
> > >
> > > > Congratulations Volodymyr!
> > > >
> > > > Regards,
> > > > Khurram
> > > >
> > > > On Fri, Aug 24, 2018 at 10:25 AM, Hanumath Rao Maduri
> > > hanu@gmail.com>
> > > > wrote:
> > > >
> > > > > Congratulations Volodymyr!
> > > > >
> > > > > Thanks,
> > > > > -Hanu
> > > > >
> > > > > On Fri, Aug 24, 2018 at 10:22 AM Paul Rogers
> >
> > > >
> > > > > wrote:
> > > > >
> > > > > > Congratulations Volodymyr!
> > > > > > Thanks,
> > > > > > - Paul
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Friday, August 24, 2018, 5:53:25 AM PDT, Arina Ielchiieva
> > > > > > ar...@apache.org> wrote:
> > > > > >
> > > > > > I am pleased to announce that Drill PMC invited Volodymyr
> > Vysotskyi
> > > to
> > > > > the
> > > > > > PMC and he has accepted the invitation.
> > > > > >
> > > > > > Congratulations Vova and thanks for your contributions!
> > > > > >
> > > > > > - Arina
> > > > > > (on behalf of Drill PMC)
> > > > > >
> > > > >
> > > >
> > >
> >
>


[Question] HiveStoragePlugin and NativeParquetRowGroupScan

2018-08-22 Thread Timothy Farkas
Hi All,

I'm a bit confused and I was hoping to get some clarification about how the
HiveStoragePlugin interacts with the FileSystem plugin. Currently the
HiveStoragePlugin allows the user to configure their own value for
fs.defaultFS in the plugin properties, which overrides the defaultFS used
when doing a native parquet scan for Hive. Is this intentional? Also what
is the high level theory about how Hive and the FileSystem plugins
interact? Specifically does Drill support querying Hive when Hive is using
a different FileSystem than the one specified in the file system plugin? Or
does Drill assume that the Hive is using the same FileSystem as the one
defined in the Drill FileSystem plugin?

Thanks,
Tim


[jira] [Created] (DRILL-6698) Add support for handling output batches with selection vectors to OperatorTestBuilder

2018-08-17 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6698:
-

 Summary: Add support for handling output batches with selection 
vectors to OperatorTestBuilder
 Key: DRILL-6698
 URL: https://issues.apache.org/jira/browse/DRILL-6698
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas


Currently only output batches without a selection vector are allowed in the 
OperatorTestBuilder.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] New PMC member: Boaz Ben-Zvi

2018-08-17 Thread Timothy Farkas
Congrats!

On Fri, Aug 17, 2018 at 11:27 AM, Gautam Parai  wrote:

> Congratulations Boaz!!
>
> Gautam
>
> On Fri, Aug 17, 2018 at 11:04 AM, Khurram Faraaz  wrote:
>
> > Congratulations Boaz.
> >
> > On Fri, Aug 17, 2018 at 10:47 AM, shi.chunhui <
> > shi.chun...@aliyun.com.invalid> wrote:
> >
> > > Congrats Boaz!
> > > --
> > > Sender:Arina Ielchiieva 
> > > Sent at:2018 Aug 17 (Fri) 17:51
> > > To:dev ; user 
> > > Subject:[ANNOUNCE] New PMC member: Boaz Ben-Zvi
> > >
> > > I am pleased to announce that Drill PMC invited Boaz Ben-Zvi to the PMC
> > and
> > > he has accepted the invitation.
> > >
> > > Congratulations Boaz and thanks for your contributions!
> > >
> > > - Arina
> > > (on behalf of Drill PMC)
> > >
> >
>


Re: [Question] ValueVector Contract and Usage

2018-08-15 Thread Timothy Farkas
Thanks for the explanation Paul. It makes sense that we want to preallocate
for performance. It looks like the setSafe bug you mentioned was fixed,
since there is an explicit check for it. But I've run into a couple more
issues with adding to an empty VariableLength Vector and an empty
RepeatedVector. I started going down the road of making that behavior work,
but since that behavior isn't useful maybe I'll just document that you have
to call allocateNew on a vector before using it and add an assert statement
to enforce that requirement on all the vectors.

Thanks,
Tim

On Wed, Aug 15, 2018 at 6:21 PM, Paul Rogers 
wrote:

> Hi Tim,
>
> IIRC, you have to do an initial allocation. There was a bug that, if you
> didn't, the setSafe would try to double your vector from 0 items to 0
> items. This would be too small, so it would double again, forever.
>
> In general, you don't want to start with an empty vector (or the default
> size you get on a plain alloc()). Your code will waste time doubling your
> vector up to the desired size.
>
> Instead, use the available information ("sizer" and accompanying
> allocation tool in an internal operator, or your best guess in a reader) to
> allocate the vector to the desired final size. (The result set loader
> handles this busy work for you, BTW.)
>
>
> If you turn on detailed logging in vectors, you'll see, and be alarmed by,
> the number of doublings that happen otherwise.
>
>
> Thanks,
> - Paul
>
>
>
> On Wednesday, August 15, 2018, 6:11:49 PM PDT, Timothy Farkas <
> tfar...@mapr.com> wrote:
>
>  I'm currently observing a bug and I believe the source of it may be a
> misunderstanding I have about the usage of value vectors. If I create any
> value vector using its constructor, is it safe to use the addSafe /
> setSafe methods on it out of the box? Or do I have to call allocateNew or
> allocateNewSafe on it before I do anything in order to safely start
> manipulating it?
>
> Thanks,
> Tim
>
>
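
The endless-doubling hazard Paul describes above is easy to model. The sketch below is a simplified stand-in in plain Java, not Drill's actual vector code: it shows why a capacity of zero can never be doubled to a useful size, and why an explicit initial allocation (the allocateNew call discussed above, modeled here as a minimal-capacity guard) avoids the loop.

```java
// Simplified model (NOT Drill's vector implementation) of the
// capacity-doubling bug: doubling from zero never grows the vector,
// so an initial allocation is required before setSafe-style writes.
public class VectorGrowthModel {

    /** Naive doubling: returns the capacity reached, or -1 if growth is stuck. */
    static int growNaive(int capacity, int needed) {
        for (int i = 0; i < 64; i++) {       // bail out instead of looping forever
            if (capacity >= needed) {
                return capacity;
            }
            capacity = capacity * 2;          // 0 * 2 == 0: never grows
        }
        return -1;
    }

    /** Guarded doubling: seed an empty vector with a minimal allocation first. */
    static int growGuarded(int capacity, int needed) {
        capacity = Math.max(capacity, 1);     // the "allocate before use" rule
        while (capacity < needed) {
            capacity *= 2;
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(growNaive(0, 1024));    // -1: stuck at zero capacity
        System.out.println(growGuarded(0, 1024));  // 1024
    }
}
```

As Paul notes, even the guarded version wastes time doubling step by step, which is why sizing the allocation up front (from a "sizer" or a reader's estimate) is preferred.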


[Question] ValueVector Contract and Usage

2018-08-15 Thread Timothy Farkas
I'm currently observing a bug and I believe the source of it may be a
misunderstanding I have about the usage of value vectors. If I create any
value vector using its constructor, is it safe to use the addSafe /
setSafe methods on it out of the box? Or do I have to call allocateNew or
allocateNewSafe on it before I do anything in order to safely start
manipulating it?

Thanks,
Tim


[jira] [Created] (DRILL-6683) move getSelectionVector2 and getSelectionVector4 from VectorAccessible interface to RecordBatch interface

2018-08-13 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6683:
-

 Summary: move getSelectionVector2 and getSelectionVector4 from 
VectorAccessible interface to RecordBatch interface
 Key: DRILL-6683
 URL: https://issues.apache.org/jira/browse/DRILL-6683
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6675) VectorContainer.setRecordCount should set record count for value vectors as well.

2018-08-08 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6675:
-

 Summary: VectorContainer.setRecordCount should set record count 
for value vectors as well.
 Key: DRILL-6675
 URL: https://issues.apache.org/jira/browse/DRILL-6675
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: Timothy Farkas






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-6409) Query Failed: An Error Occurred

2018-08-03 Thread Timothy Farkas (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas resolved DRILL-6409.
---
Resolution: Fixed

> Query Failed: An Error Occurred
> ---
>
> Key: DRILL-6409
> URL: https://issues.apache.org/jira/browse/DRILL-6409
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: JUDILSON JOSE DA COSTA JUNIOR
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
> Attachments: Test.rar, erro_drill.png, 
> erro_drill_det_mongo-java-driver-3.2.0.png, 
> erro_drill_det_mongo-java-driver-3.7.0.png
>
>
> h2. Query Failed: An Error Occurred
> org.apache.drill.common.exceptions.UserRemoteException: INTERNAL_ERROR ERROR: 
> UnSupported Bson type: DECIMAL128 Fragment 0:0 [Error Id: 
> 35c951e0-3ce7-4232-8a90-1c146ecc749f on DSK-0244.eicon.com.br:31010]
> Verison mongo-java-driver: 3.7.0
> Version mongo: 3.4



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Ownership Of VectorContainers In HashJoin After Calling Next

2018-08-03 Thread Timothy Farkas
Thanks for the explanation Sorabh, that clears things up for me.

Tim

On Thu, Aug 2, 2018 at 11:38 PM, Sorabh Hamirwasia 
wrote:

> When next is called on the upstream operator, two things can happen:
>
>1. The current operator works on the incoming batch and produces/copies the
>records to its outgoing batch. In this case there is no transfer of
>ownership of incoming batch buffer from upstream to current operator
>allocator. Current operator will allocate separate memory footprint for
> its
>outgoing batch buffer. Also current operator is supposed to release the
>incoming batch buffer once its done working on it.
>2. Current operator does a transfer of buffers from incoming batch value
>vectors to outgoing value vectors (like in Filter, limit (see [1]),
> etc).
>In this case ownership of buffers in incoming batch is transferred to
>current operator allocator.
>
> But I have seen different operators behaving differently. For Hash Join,
> since the join operator has to evaluate the join condition for each probe-side
> row, I don't think it will do any transfers. For the build side it will build a
> hash table on the columns involved in the join condition, but also has to store
> other columns that are in the projection list of the query. So probably it
> might do transfers for those columns only (I haven't looked into the code though).
>
> [1]:
> https://github.com/apache/drill/blob/006dc10a88c1708b793e3a38ac52a0266bb07deb/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/limit/LimitRecordBatch.java#L181
>
> Thanks,
> Sorabh
>
> On Thu, Aug 2, 2018 at 9:43 PM, Timothy Farkas  wrote:
>
> > Hi All,
> >
> > What is the expected behavior for HashJoin when it calls next for its
> left
> > or right upstream record batches. Is ownership of an upstream
> > VectorContainer supposed to pass from from the left or right upstream
> > record batches to HashJoin immediately after a call to next? Or is
> > ownership of a VectorContainer supposed to stay with an upstream record
> > batch immediately after a call to next?
> >
> > Thanks,
> > Tim
> >
>
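
Sorabh's two cases can be sketched with a toy model. This is illustrative plain Java, not Drill's allocator or TransferPair API; the Buffer class and method names are invented for the example. Case 1 copies records into a separately allocated outgoing buffer and releases the incoming one; case 2 transfers ownership without copying, as Filter and Limit do.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (NOT Drill's allocator) of the two batch-handling cases:
// (1) copy into the current operator's own outgoing buffer, then release
// the incoming one; (2) transfer buffer ownership to the current operator.
public class OwnershipModel {

    static class Buffer {
        String owner;                              // which operator's allocator owns this
        final List<Integer> data = new ArrayList<>();
        Buffer(String owner) { this.owner = owner; }
    }

    /** Case 1: copy -- the outgoing buffer is a separate memory footprint. */
    static Buffer copyIntoOutgoing(Buffer incoming, String currentOp) {
        Buffer outgoing = new Buffer(currentOp);
        outgoing.data.addAll(incoming.data);       // copy the records
        incoming.data.clear();                     // "release" the incoming buffer
        return outgoing;
    }

    /** Case 2: transfer -- ownership moves, no data is copied. */
    static Buffer transferToOutgoing(Buffer incoming, String currentOp) {
        incoming.owner = currentOp;
        return incoming;
    }

    public static void main(String[] args) {
        Buffer in = new Buffer("upstream");
        in.data.add(42);
        Buffer out = transferToOutgoing(in, "limit");
        System.out.println(out.owner + " " + (out == in));  // limit true
    }
}
```

In the real engine the transfer case is what moves accounting between operator allocators; the copy case leaves the incoming batch's accounting with the upstream operator until it is released.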


Ownership Of VectorContainers In HashJoin After Calling Next

2018-08-02 Thread Timothy Farkas
Hi All,

What is the expected behavior for HashJoin when it calls next for its left
or right upstream record batches. Is ownership of an upstream
VectorContainer supposed to pass from from the left or right upstream
record batches to HashJoin immediately after a call to next? Or is
ownership of a VectorContainer supposed to stay with an upstream record
batch immediately after a call to next?

Thanks,
Tim


[jira] [Created] (DRILL-6656) Add Regex To Disallow Extra Semicolons In Imports

2018-07-31 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6656:
-

 Summary: Add Regex To Disallow Extra Semicolons In Imports
 Key: DRILL-6656
 URL: https://issues.apache.org/jira/browse/DRILL-6656
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: Timothy Farkas






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6655) Require Package Declaration In Checkstyle

2018-07-31 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6655:
-

 Summary: Require Package Declaration In Checkstyle
 Key: DRILL-6655
 URL: https://issues.apache.org/jira/browse/DRILL-6655
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: Timothy Farkas






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-6618) Unnest changes for implicit column

2018-07-31 Thread Timothy Farkas (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas resolved DRILL-6618.
---
Resolution: Done

> Unnest changes for implicit column
> --
>
> Key: DRILL-6618
> URL: https://issues.apache.org/jira/browse/DRILL-6618
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Relational Operators
>Affects Versions: 1.14.0
>Reporter: Sorabh Hamirwasia
>Assignee: Parth Chandra
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.15.0
>
>
> 1) Update unnest to work on entire left incoming instead of row by row 
> processing.
> 2) Update unnest to generate an implicit field (name passed in PopConfig) 
> with rowId of each output row being generated. The type of implicit field 
> will be IntVector.
> 3) Fix all existing unit tests



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-6439) Sometimes Travis Times Out

2018-07-31 Thread Timothy Farkas (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas resolved DRILL-6439.
---
Resolution: Duplicate

> Sometimes Travis Times Out
> --
>
> Key: DRILL-6439
> URL: https://issues.apache.org/jira/browse/DRILL-6439
> Project: Apache Drill
>  Issue Type: Bug
>    Reporter: Timothy Farkas
>        Assignee: Timothy Farkas
>Priority: Major
>
> Occasionally Travis builds run a few minutes longer than usual and time out.
> {code}
> changes detected, packing new archive
> .
> .
> The job exceeded the maximum time limit for jobs, and has been terminated.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6650) Disallow Empty Statements With Stray Semicolons

2018-07-31 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6650:
-

 Summary: Disallow Empty Statements With Stray Semicolons
 Key: DRILL-6650
 URL: https://issues.apache.org/jira/browse/DRILL-6650
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: Timothy Farkas


Having empty statements with stray semicolons can cause compilation in Eclipse 
to fail. To fix this, this change removes the empty semicolon statements and 
forbids them in our checkstyle configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6646) HashJoin Memory Calculator Over Reserves Memory For HashTables

2018-07-27 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6646:
-

 Summary: HashJoin Memory Calculator Over Reserves Memory For 
HashTables
 Key: DRILL-6646
 URL: https://issues.apache.org/jira/browse/DRILL-6646
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: Timothy Farkas


When preparing for the probe phase it uses the worst case hashtable size even 
after we have constructed a hash table. In this case, when we already have a 
hash table, we should not use its predicted worst-case size; we should use its 
actual size in the memory calculator.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6644) In Some Cases The HashJoin Memory Calculator Over Reserves Memory

2018-07-27 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6644:
-

 Summary: In Some Cases The HashJoin Memory Calculator Over 
Reserves Memory
 Key: DRILL-6644
 URL: https://issues.apache.org/jira/browse/DRILL-6644
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: Timothy Farkas


There are two cases where the HashJoin Memory calculator over reserves memory:

 1. It reserves a maximum incoming probe batch size during the build phase. 
This is not really necessary because we will not fetch probe data until the 
probe phase. We only have to account for the space occupied by the data received 
during OK_NEW_SCHEMA.
 2. When preparing for the probe phase it uses the worst case hashtable size 
even after we have constructed a hash table. In this case, when we already have 
a hash table, we should not use its predicted worst-case size; we should use 
its actual size in the memory calculator.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] 1.14.0 release

2018-07-13 Thread Timothy Farkas
Hi Boaz,

I looked at DRILL-6606 and have updated the ticket; I should have a fix
Monday. It looks like a minor logical error.

I'm not clear on why you suspect DRILL-6453 is caused by batch sniffing;
perhaps we can discuss offline.

Thanks,
Tim

On Fri, Jul 13, 2018 at 2:01 PM, Arina Yelchiyeva <
arina.yelchiy...@gmail.com> wrote:

> Two more regressions:
> https://issues.apache.org/jira/browse/DRILL-6603
> https://issues.apache.org/jira/browse/DRILL-6605
>
> Kind regards,
> Arina
>
> On Fri, Jul 13, 2018 at 11:25 PM Sorabh Hamirwasia 
> wrote:
>
> > Hi Boaz,
> > Couple of updates.
> >
> > *Merged In:*
> > DRILL-6542: (May be Ready2Commit soon) IndexOutOfBounds exception for
> > multilevel lateral ((Sorabh / Parth))
> >
> > *In Review:*
> >
> >
> > *DRILL-6475: Query with UNNEST causes a Null Pointer .  (( Hanumath ))*
> > Thanks,
> > Sorabh
> >
> > On Fri, Jul 13, 2018 at 1:17 PM, Parth Chandra 
> wrote:
> >
> > > Our (unwritten) rule has been that a commit cannot even go in unless
> unit
> > > _and_ regression tests pass.
> > > Releases are stricter, all tests, longevity tests, UI, are required to
> > > pass. In addition, any performance regression needs to be discussed.
> > >
> > > So far we have not made any exceptions, but that is not to say we
> cannot.
> > >
> > > On Fri, Jul 13, 2018 at 1:03 PM, Vlad Rozov  wrote:
> > >
> > > > My 2 cents:
> > > >
> > > > From Apache point of view it is OK to do a release even if unit tests
> > do
> > > > not pass at all or there is a large number of regression introduced.
> > > Apache
> > > > release is a source release and as long as it compiles and does not
> > have
> > > > license issues, it is up to community (PMC) to decide on any other
> > > criteria
> > > > for a release.
> > > >
> > > > The issue in DRILL-6453 is not limited to a large number of hash
> joins.
> > > It
> > > > should be possible to reproduce it even with a single hash join as
> long
> > > as
> > > > left and right sides are getting batches from one(many) to many
> > exchanges
> > > > (broadcast or hash partitioner senders).
> > > >
> > > > Thank you,
> > > >
> > > > Vlad
> > > >
> > > >
> > > > On 7/13/18 08:41, Aman Sinha wrote:
> > > >
> > > >> I would say we have to take a measured approach to this and decide on
> > > >> a case-by-case basis which issue is a show stopper.
> > > >> While of course we have to make every effort to avoid regression, we
> > > >> cannot
> > > >> claim that a particular release will not cause any regression.
> > > >> I believe there are 1+ passing tests,  so that should provide a
> > > level
> > > >> of confidence.   The TPC-DS 72 is a 10 table join which in the
> hadoop
> > > >> world
> > > >> of
> > > >> denormalized schemas is not relatively common.  The main question is
> > > does
> > > >> the issue reproduce with fewer joins having the same type of
> > > distribution
> > > >> plan ?
> > > >>
> > > >>
> > > >> Aman
> > > >>
> > > >> On Fri, Jul 13, 2018 at 7:36 AM Arina Yelchiyeva <
> > > >> arina.yelchiy...@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> We cannot release with existing regressions, especially taking into
> > > >>> account that these are not minor issues.
> > > >>> As far as I understand reverting is not an option since hash join
> > spill
> > > >>> feature are extended into several commits + subsequent fixes.
> > > >>> I guess we need to consider postponing the release until issues are
> > > >>> resolved.
> > > >>>
> > > >>> Kind regards,
> > > >>> Arina
> > > >>>
> > > >>> On Fri, Jul 13, 2018 at 5:14 PM Boaz Ben-Zvi 
> > wrote:
> > > >>>
> > > >>> (Guessing ...) It is possible that the root cause for DRILL-6606 is
> > >  similar to that in  DRILL-6453 -- that is the new "early sniffing"
> > in
> > >  the
> > >  Hash-Join, which repeatedly invokes next() on the two "children"
> of
> > > the
> > >  join *during schema discovery* until non-empty data is returned
> (or
> > >  NONE,
> > >  STOP, etc).  Last night Salim, Vlad and I briefly discussed
> > >  alternatives,
> > >  like postponing the "sniffing" to a later time (beginning of the
> > build
> > > 
> > > >>> for
> > > >>>
> > >  the right child, and beginning of the probe for the left child).
> > > 
> > >  However this would require some work time. So what should we do
> > about
> > > 
> > > >>> 1.14
> > > >>>
> > >  ?
> > > 
> > > Thanks,
> > > 
> > > Boaz
> > > 
> > >  On Fri, Jul 13, 2018 at 3:46 AM, Arina Yelchiyeva <
> > >  arina.yelchiy...@gmail.com> wrote:
> > > 
> > >  

[jira] [Created] (DRILL-6609) Investigate Creation of FileSystem Configuration for Hive Parquet Files

2018-07-13 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6609:
-

 Summary: Investigate Creation of FileSystem Configuration for Hive 
Parquet Files
 Key: DRILL-6609
 URL: https://issues.apache.org/jira/browse/DRILL-6609
 Project: Apache Drill
  Issue Type: Task
Reporter: Timothy Farkas


Currently when reading a parquet file in Hive we try to speed things up by 
doing a native parquet scan with HiveDrillNativeParquetRowGroupScan. When 
retrieving the FileSystem Configuration to use in 
HiveDrillNativeParquetRowGroupScan.getFsConf, we use all the properties defined 
for the HiveStoragePlugin. This could allow a misconfiguration in the 
HiveStoragePlugin to influence the configuration of our FileSystem.

Currently it is unclear if this was desired behavior or not. If it is desired 
we need to document why it was done. If it is not desired we need to fix the 
issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6608) Properly Handle Creation and Closure of DrillFileSystems

2018-07-13 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6608:
-

 Summary: Properly Handle Creation and Closure of DrillFileSystems
 Key: DRILL-6608
 URL: https://issues.apache.org/jira/browse/DRILL-6608
 Project: Apache Drill
  Issue Type: Task
Reporter: Timothy Farkas


Currently the strategy Drill uses for creating file systems is to create a 
DrillFileSystem for readers and writers and then never close it. In order to 
prevent the proliferation of underlying file system objects used by 
DrillFileSystem, the underlying filesystems are cached.

This is not ideal; we should properly close our file system objects instead of 
caching them and keeping them in memory forever.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Travis CI Timed Out

2018-06-29 Thread Timothy Farkas
The timeouts are a sporadic issue. You can click on the failed build to
bring up Travis. For now, since you are a committer you will have a restart
build button to the right of the console (non-committers don't get this
button for some reason), this will try the build again and it will probably
pass. Vitalii is proposing some fixes for the timeout issue, so hopefully we
won't have to deal with it much longer.



On Thu, Jun 28, 2018 at 9:21 PM, Charles Givre  wrote:

> I submitted some changes and the following happened.  Any suggestions?
>
>
> Audit done.
>
> [INFO]
> [INFO] --- maven-enforcer-plugin:1.3.1:enforce (avoid_bad_dependencies) @ drill-sqlline ---
> [INFO]
> [INFO] --- maven-install-plugin:2.5.2:install (default-install) @ drill-sqlline ---
> [INFO] Installing /home/travis/build/apache/drill/contrib/sqlline/target/drill-sqlline-1.14.0-SNAPSHOT.jar to /home/travis/.m2/repository/org/apache/drill/contrib/drill-sqlline/1.14.0-SNAPSHOT/drill-sqlline-1.14.0-SNAPSHOT.jar
> [INFO] Installing /home/travis/build/apache/drill/contrib/sqlline/pom.xml to /home/travis/.m2/repository/org/apache/drill/contrib/drill-sqlline/1.14.0-SNAPSHOT/drill-sqlline-1.14.0-SNAPSHOT.pom
> [INFO] Installing /home/travis/build/apache/drill/contrib/sqlline/target/drill-sqlline-1.14.0-SNAPSHOT-tests.jar to /home/travis/.m2/repository/org/apache/drill/contrib/drill-sqlline/1.14.0-SNAPSHOT/drill-sqlline-1.14.0-SNAPSHOT-tests.jar
> [INFO] ------------------------------------------------------------------------
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Apache Drill Root POM .............................. SUCCESS [ 11.686 s]
> [INFO] tools/Parent Pom ................................... SUCCESS [  1.118 s]
> [INFO] tools/freemarker codegen tooling ................... SUCCESS [  7.546 s]
> [INFO] Drill Protocol ..................................... SUCCESS [  1.690 s]
> [INFO] Common (Logical Plan, Base expressions) ............ SUCCESS [  6.200 s]
> [INFO] Logical Plan, Base expressions ..................... SUCCESS [  6.076 s]
> [INFO] exec/Parent Pom .................................... SUCCESS [  2.769 s]
> [INFO] exec/memory/Parent Pom ............................. SUCCESS [  0.825 s]
> [INFO] exec/memory/base ................................... SUCCESS [  3.922 s]
> [INFO] exec/rpc ........................................... SUCCESS [  1.866 s]
> [INFO] exec/Vectors ....................................... SUCCESS [  7.492 s]
> [INFO] contrib/Parent Pom ................................. SUCCESS [  2.568 s]
> [INFO] contrib/data/Parent Pom ............................ SUCCESS [  0.693 s]
> [INFO] contrib/data/tpch-sample-data ...................... SUCCESS [  3.937 s]
> [INFO] exec/Java Execution Engine ......................... SUCCESS [28:30 min]
> [INFO] exec/JDBC Driver using dependencies ................ SUCCESS [02:47 min]
> [INFO] JDBC JAR with all dependencies ..................... SUCCESS [ 42.480 s]
> [INFO] Drill-on-YARN ...................................... SUCCESS [ 15.818 s]
> [INFO] contrib/kudu-storage-plugin ........................ SUCCESS [ 10.498 s]
> [INFO] contrib/opentsdb-storage-plugin .................... SUCCESS [ 19.961 s]
> [INFO] contrib/mongo-storage-plugin ....................... SUCCESS [ 52.961 s]
> [INFO] contrib/hbase-storage-plugin ....................... SUCCESS [01:18 min]
> [INFO] contrib/jdbc-storage-plugin ........................ SUCCESS [ 25.873 s]
> [INFO] contrib/hive-storage-plugin/Parent Pom ............. SUCCESS [  3.396 s]
> [INFO] contrib/hive-storage-plugin/hive-exec-shaded ....... SUCCESS [ 37.665 s]
> [INFO] contrib/mapr-format-plugin ......................... SUCCESS [  5.200 s]
> [INFO] contrib/hive-storage-plugin/core ................... SUCCESS [ 10.537 s]
> [INFO] contrib/drill-gis-plugin ........................... SUCCESS [ 21.492 s]
> [INFO] contrib/kafka-storage-plugin ....................... SUCCESS [  6.602 s]
> [INFO] Packaging and Distribution Assembly ................ SUCCESS [ 58.576 s]
> [INFO] contrib/sqlline .................................... SUCCESS [  1.570 s]
> [INFO] ---
> 

Re: Deprecation of BaseTestQuery FYI

2018-06-28 Thread Timothy Farkas
Hi Charles,

So it is actually supported. Drill's boolean vector is BitVector.
Internally bits are stored efficiently, but when you fetch a bit from the
vector it becomes an int, -1 for true and 0 for false. So currently you can
check this by using singletonInt and comparing against -1 and 0.

Here is a test snippet that does it. We should probably add some
convenience methods specifically for booleans so this is not so confusing.

public class SimpleQueryTest extends ClusterTest {
  @Rule
  public final BaseDirTestWatcher baseDirTestWatcher = new BaseDirTestWatcher();

  @Before
  public void start() throws Exception {
startCluster(new ClusterFixtureBuilder(baseDirTestWatcher));
  }

  @Test
  public void testBoolean() throws RpcException {
RowSet rowSet = this.queryBuilder().sql("select position_id,
cast(false as boolean) from cp.`employee.json`").rowSet();
RowSetReader reader = rowSet.reader();
reader.next();
System.out.println(reader.column(1).scalar().getInt());
rowSet.clear();

rowSet = this.queryBuilder().sql("select position_id, cast(true as
boolean) from cp.`employee.json`").rowSet();
reader = rowSet.reader();
reader.next();
System.out.println(reader.column(1).scalar().getInt());
rowSet.clear();
  }
}
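
The convenience method suggested above could look like the following. This is a hypothetical helper, not an existing Drill API; it simply wraps the int encoding described in this message (0 for false, a non-zero value such as -1 for true) as a Java boolean.

```java
// Hypothetical convenience helper (an illustration, not part of Drill's
// test framework) for reading a BitVector value through a scalar reader:
// the reader hands back an int, which this maps onto a Java boolean.
public class BooleanReadHelper {

    /** Interprets a scalar-reader int as the boolean it encodes. */
    static boolean asBoolean(int bitValue) {
        return bitValue != 0;   // 0 => false, -1 (or any non-zero) => true
    }

    public static void main(String[] args) {
        System.out.println(asBoolean(-1)); // true
        System.out.println(asBoolean(0));  // false
    }
}
```

In the test above, `reader.column(1).scalar().getInt()` could then be passed through such a helper to get a direct boolean assertion.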



On Thu, Jun 28, 2018 at 4:09 PM, Timothy Farkas  wrote:

> Hi Charles,
>
> Currently the RowSetReader doesn't support booleans, but it can be added.
> I'll try to add it now, and see if it could be done quickly. I'll update
> you on my progress.
>
> Tim
>
> On Thu, Jun 28, 2018 at 2:38 PM, Charles Givre  wrote:
>
>> Hi Tim,
>> Could post some sample code as to how to test a SQL query that returns a
>> Boolean?
>> —C
>>
>> > On Jun 28, 2018, at 17:30, Timothy Farkas  wrote:
>> >
>> > - We would have to add a boolean column reader to ColumnAccessors and
>> wire
>> > it in and add a getBoolean method to ScalarReader.
>> >
>> > - Your example should work as is, ClusterTest has a testBuilder method
>> > that allows you to use the traditional test builder. Is there something
>> not
>> > working with the test builder?
>> >
>> > Tim
>> >
>> >
>> > On Thu, Jun 28, 2018 at 12:39 PM, Arina Yelchiyeva <
>> > arina.yelchiy...@gmail.com> wrote:
>> >
>> >> Hi Tim,
>> >>
>> >> it looks like deprecating BaseTestQuery was a little bit pre-mature.
>> >> For example, in this PR - https://github.com/apache/drill/pull/1331 -
>> >> Charles is trying to re-work  BaseTestQuery usage to ClusterTest.
>> >> First, it did not contain a getSingletonDouble method, which Charles has
>> >> implemented. Now he has trouble implementing a getSingletonBoolean
>> >> method, which might be due to reader limitations.
>> >> Also I am not quite clear how we can verify columns names and multiple
>> >> columns in the result.
>> >> For example:
>> >>
>> >> testBuilder()
>> >>  .sqlQuery("select (mi || lname) as CONCATOperator, mi, lname,
>> >> concat(mi, lname) as CONCAT from concatNull")
>> >>  .ordered()
>> >>  .baselineColumns("CONCATOperator", "mi", "lname", "CONCAT")
>> >>  .baselineValues("A.Nowmer", "A.", "Nowmer", "A.Nowmer")
>> >>  .baselineValues("I.Whelply", "I.", "Whelply", "I.Whelply")
>> >>  .baselineValues(null, null, "Derry", "Derry")
>> >>  .baselineValues("J.Spence", "J.", "Spence", "J.Spence")
>> >>  .build().run();
>> >>
>> >> Can you please suggest how this example can be re-written?
>> >>
>> >> Kind regards,
>> >> Arina
>> >>
>> >> On Mon, Jun 25, 2018 at 11:10 PM Timothy Farkas 
>> wrote:
>> >>
>> >>> Hi All,
>> >>>
>> >>> BaseTestQuery was deprecated a while ago. Keeping it short and sweet
>> :),
>> >> if
>> >>> you want to use BaseTestQuery directly, don't. Use ClusterTest
>> instead.
>> >> If
>> >>> you are using PlanTestBase for planner tests, continue to do so.
>> >> Eventually
>> >>> PlanTestBase will be changed to extend ClusterTest instead. There is a
>> >> JIRA
>> >>> to track that issue https://issues.apache.org/jira/browse/DRILL-6536.
>> >>>
>> >>> Thanks,
>> >>> Tim
>> >>>
>> >>
>>
>>
>


Re: Deprecation of BaseTestQuery FYI

2018-06-28 Thread Timothy Farkas
Hi Charles,

Currently the RowSetReader doesn't support booleans, but it can be added.
I'll try to add it now, and see if it could be done quickly. I'll update
you on my progress.

Tim

On Thu, Jun 28, 2018 at 2:38 PM, Charles Givre  wrote:

> Hi Tim,
> Could post some sample code as to how to test a SQL query that returns a
> Boolean?
> —C
>
> > On Jun 28, 2018, at 17:30, Timothy Farkas  wrote:
> >
> > - We would have to add a boolean column reader to ColumnAccessors and
> wire
> > it in and add a getBoolean method to ScalarReader.
> >
> > - Your example should work as is, ClusterTest has a testBuilder method
> > that allows you to use the traditional test builder. Is there something
> not
> > working with the test builder?
> >
> > Tim
> >
> >
> > On Thu, Jun 28, 2018 at 12:39 PM, Arina Yelchiyeva <
> > arina.yelchiy...@gmail.com> wrote:
> >
> >> Hi Tim,
> >>
> >> it looks like deprecating BaseTestQuery was a little bit pre-mature.
> >> For example, in this PR - https://github.com/apache/drill/pull/1331 -
> >> Charles is trying to re-work  BaseTestQuery usage to ClusterTest.
> >> First, it did not contain getSigletonDouble method which Charles has
> >> implemented. Now he has troubles with implementing getSigletonBoolean
> >> method which might be due to reader limitations.
> >> Also I am not quite clear how we can verify columns names and multiple
> >> columns in the result.
> >> For example:
> >>
> >> testBuilder()
> >>  .sqlQuery("select (mi || lname) as CONCATOperator, mi, lname,
> >> concat(mi, lname) as CONCAT from concatNull")
> >>  .ordered()
> >>  .baselineColumns("CONCATOperator", "mi", "lname", "CONCAT")
> >>  .baselineValues("A.Nowmer", "A.", "Nowmer", "A.Nowmer")
> >>  .baselineValues("I.Whelply", "I.", "Whelply", "I.Whelply")
> >>  .baselineValues(null, null, "Derry", "Derry")
> >>  .baselineValues("J.Spence", "J.", "Spence", "J.Spence")
> >>  .build().run();
> >>
> >> Can you please suggest how this example can be re-written?
> >>
> >> Kind regards,
> >> Arina
> >>
> >> On Mon, Jun 25, 2018 at 11:10 PM Timothy Farkas 
> wrote:
> >>
> >>> Hi All,
> >>>
> >>> BaseTestQuery was deprecated a while ago. Keeping it short and sweet
> :),
> >> if
> >>> you want to use BaseTestQuery directly, don't. Use ClusterTest instead.
> >> If
> >>> you are using PlanTestBase for planner tests, continue to do so.
> >> Eventually
> >>> PlanTestBase will be changed to extend ClusterTest instead. There is a
> >> JIRA
> >>> to track that issue https://issues.apache.org/jira/browse/DRILL-6536.
> >>>
> >>> Thanks,
> >>> Tim
> >>>
> >>
>
>


Re: Deprecation of BaseTestQuery FYI

2018-06-28 Thread Timothy Farkas
 - We would have to add a boolean column reader to ColumnAccessors and wire
it in and add a getBoolean method to ScalarReader.

 - Your example should work as is, ClusterTest has a testBuilder method
that allows you to use the traditional test builder. Is there something not
working with the test builder?

Tim


On Thu, Jun 28, 2018 at 12:39 PM, Arina Yelchiyeva <
arina.yelchiy...@gmail.com> wrote:

> Hi Tim,
>
> it looks like deprecating BaseTestQuery was a little bit premature.
> For example, in this PR - https://github.com/apache/drill/pull/1331 -
> Charles is trying to re-work BaseTestQuery usage to ClusterTest.
> First, it did not contain a getSingletonDouble method, which Charles has
> implemented. Now he has trouble implementing a getSingletonBoolean
> method, which might be due to reader limitations.
> Also I am not quite clear how we can verify column names and multiple
> columns in the result.
> For example:
>
> testBuilder()
>   .sqlQuery("select (mi || lname) as CONCATOperator, mi, lname,
> concat(mi, lname) as CONCAT from concatNull")
>   .ordered()
>   .baselineColumns("CONCATOperator", "mi", "lname", "CONCAT")
>   .baselineValues("A.Nowmer", "A.", "Nowmer", "A.Nowmer")
>   .baselineValues("I.Whelply", "I.", "Whelply", "I.Whelply")
>   .baselineValues(null, null, "Derry", "Derry")
>   .baselineValues("J.Spence", "J.", "Spence", "J.Spence")
>   .build().run();
>
> Can you please suggest how this example can be re-written?
>
> Kind regards,
> Arina
>
> On Mon, Jun 25, 2018 at 11:10 PM Timothy Farkas  wrote:
>
> > Hi All,
> >
> > BaseTestQuery was deprecated a while ago. Keeping it short and sweet :),
> if
> > you want to use BaseTestQuery directly, don't. Use ClusterTest instead.
> If
> > you are using PlanTestBase for planner tests, continue to do so.
> Eventually
> > PlanTestBase will be changed to extend ClusterTest instead. There is a
> JIRA
> > to track that issue https://issues.apache.org/jira/browse/DRILL-6536.
> >
> > Thanks,
> > Tim
> >
>


Re: [DISCUSSION] Travis build failures

2018-06-27 Thread Timothy Farkas
+1

On Wed, Jun 27, 2018 at 10:00 AM, Vitalii Diravka  wrote:

> This is a topic from last Hangout meeting.
>
> Sometimes Drill Travis Build fails because of job run expires.
> Certainly, the right way is to accelerate Drill execution :)
>
> Nevertheless I believe we could consider excluding some more tests from
> Travis Build.
> We can add all TPCH tests (
> TestTpchLimit0, TestTpchExplain, TestTpchPlanning, TestTpchExplain) to the
> SlowTest category.
> Actually it isn't urgent, but we can consider it for future, if this
> happens more often.
>
> Kind regards
> Vitalii
>


Re: [ANNOUNCE] New PMC member: Vitalii Diravka

2018-06-26 Thread Timothy Farkas
Congrats Vitalii!

On Tue, Jun 26, 2018 at 11:12 AM, Aman Sinha  wrote:

> I am pleased to announce that Drill PMC invited Vitalii Diravka to the PMC
> and he has accepted the invitation.
>
> Congratulations Vitalii and thanks for your contributions !
>
> -Aman
> (on behalf of Drill PMC)
>


Deprecation of BaseTestQuery FYI

2018-06-25 Thread Timothy Farkas
Hi All,

BaseTestQuery was deprecated a while ago. Keeping it short and sweet :), if
you want to use BaseTestQuery directly, don't. Use ClusterTest instead. If
you are using PlanTestBase for planner tests, continue to do so. Eventually
PlanTestBase will be changed to extend ClusterTest instead. There is a JIRA
to track that issue https://issues.apache.org/jira/browse/DRILL-6536.

Thanks,
Tim


[jira] [Created] (DRILL-6536) Migrate PlanTestBase to extend ClusterTest instead of BaseTestQuery

2018-06-25 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6536:
-

 Summary: Migrate PlanTestBase to extend ClusterTest instead of 
BaseTestQuery
 Key: DRILL-6536
 URL: https://issues.apache.org/jira/browse/DRILL-6536
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas


BaseTestQuery has been deprecated in favor of ClusterTest



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6525) Consolidate value vector calculations that account for value vector doubling in 1 place

2018-06-21 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6525:
-

 Summary: Consolidate value vector calculations that account for 
value vector doubling in 1 place
 Key: DRILL-6525
 URL: https://issues.apache.org/jira/browse/DRILL-6525
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas


Currently HashJoinMemoryCalculatorImpl and the Parquet batch sizing work use 
separate logic for computing vector sizes taking into account value vector 
doubling. This logic should be consolidated into a single utility method.
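Illustrative only (this is not the actual Drill utility): value vectors grow by doubling their capacity, so a consolidated helper would round a requested data size up to the next power of two when estimating the memory actually allocated. A minimal sketch of that rounding:

```java
public final class VectorSizeUtil {

    private VectorSizeUtil() {
    }

    // Value vectors grow by doubling, so the memory allocated for
    // dataSize bytes is the next power of two at or above it.
    public static long allocatedSize(long dataSize) {
        if (dataSize <= 0) {
            return 0;
        }
        long size = Long.highestOneBit(dataSize);
        return size == dataSize ? size : size << 1;
    }

    public static void main(String[] args) {
        System.out.println(allocatedSize(1000)); // 1024
        System.out.println(allocatedSize(1024)); // 1024
        System.out.println(allocatedSize(1025)); // 2048
    }
}
```

Putting this kind of calculation in one place keeps the memory calculators and the batch-sizing code from drifting apart.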





Re: [ANNOUNCE] New Committer: Padma Penumarthy

2018-06-15 Thread Timothy Farkas
Congrats Padma!


From: Padma Penumarthy 
Sent: Friday, June 15, 2018 5:25:12 PM
To: dev@drill.apache.org
Subject: Re: [ANNOUNCE] New Committer: Padma Penumarthy

Thanks to all for your wishes.
To become a committer means a lot to me and I hope to continue making
significant contributions to drill.

Thanks
Padma


> On Jun 15, 2018, at 1:15 PM, Boaz Ben-Zvi  wrote:
>
> Congratulations Padma; welcome to our “club” 
>
> On 6/15/18, 12:47 PM, "Jyothsna Reddy"  wrote:
>
>Congratulations, Padma !!
>
>
>
>On Fri, Jun 15, 2018 at 12:39 PM, AnilKumar B  
> wrote:
>
>> Congratulations, Padma
>>
>> On Fri, Jun 15, 2018 at 12:36 PM Kunal Khatua  wrote:
>>
>>> Congratulations, Padma !
>>>
>>>
>>> On 6/15/2018 12:34:15 PM, Robert Wu  wrote:
>>> Congratulations, Padma!
>>>
>>> Best regards,
>>>
>>> Rob
>>>
>>> -Original Message-
>>> From: Hanumath Rao Maduri
>>> Sent: Friday, June 15, 2018 12:25 PM
>>> To: dev@drill.apache.org
>>> Subject: Re: [ANNOUNCE] New Committer: Padma Penumarthy
>>>
>>> Congratulations Padma!
>>>
>>> On Fri, Jun 15, 2018 at 12:04 PM, Gautam Parai wrote:
>>>
 Congratulations Padma!!


 Gautam

 
 From: Vlad Rozov
 Sent: Friday, June 15, 2018 11:56:37 AM
 To: dev@drill.apache.org
 Subject: Re: [ANNOUNCE] New Committer: Padma Penumarthy

 Congrats Padma!

 Thank you,

 Vlad

 On 6/15/18 11:38, Charles Givre wrote:
> Congrats Padma!!
>
>> On Jun 15, 2018, at 13:57, Bridget Bevens wrote:
>>
>> Congratulations, Padma!!! 
>>
>> 
>> From: Prasad Nagaraj Subramanya
>> Sent: Friday, June 15, 2018 10:32:04 AM
>> To: dev@drill.apache.org
>> Subject: Re: [ANNOUNCE] New Committer: Padma Penumarthy
>>
>> Congratulations Padma!
>>
>> Thanks,
>> Prasad
>>
>> On Fri, Jun 15, 2018 at 9:59 AM Vitalii Diravka <>
 vitalii.dira...@gmail.com>
>> wrote:
>>
>>> Congrats Padma!
>>>
>>> Kind regards
>>> Vitalii
>>>
>>>
>>> On Fri, Jun 15, 2018 at 7:40 PM Arina Ielchiieva
>>>
 wrote:
>>>
 Padma, congratulations and welcome!

 Kind regards,
 Arina

 On Fri, Jun 15, 2018 at 7:36 PM Aman Sinha
 wrote:

> The Project Management Committee (PMC) for Apache Drill has
> invited
>>> Padma
> Penumarthy to become a committer, and we are pleased to announce
> that
>>> she
> has
> accepted.
>
> Padma has been contributing to Drill for about 1 1/2 years. She
> has
>>> made
> improvements for work-unit assignment in the parallelizer,
 performance
>>> of
> filter operator for pattern matching and (more recently) on the
> batch sizing for several operators: Flatten, MergeJoin, HashJoin,
>>> UnionAll.
>
> Welcome Padma, and thank you for your contributions. Keep up
> the
 good
 work
> !
>
> -Aman
> (on behalf of Drill PMC)
>


>>>
>> --
>> Thanks & Regards,
>> B Anil Kumar.
>>
>
>



[jira] [Created] (DRILL-6497) Handle OOMs properly while graceful shutdown is in progress.

2018-06-14 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6497:
-

 Summary: Handle OOMs properly while graceful shutdown is in 
progress.
 Key: DRILL-6497
 URL: https://issues.apache.org/jira/browse/DRILL-6497
 Project: Apache Drill
  Issue Type: Bug
Reporter: Timothy Farkas








Re: Hangout tomorrow (get your tickets now)

2018-06-11 Thread Timothy Farkas
Hi,

I'd like to give the presentation for the resource management proposal.

Thanks,
Tim

From: Parth Chandra 
Sent: Monday, June 11, 2018 5:02:51 PM
To: dev; u...@drill.apache.org
Subject: Hangout tomorrow (get your tickets now)

We'll have the Drill hangout tomorrow Jun12th, 2018 at 10:00 PDT.

If you have any topics to discuss, send a reply to this post or just join
the hangout.

( Drill hangout link

 )

Thanks

Parth


[Question] What is RecordBatch.getOutgoingContainer for?

2018-06-08 Thread Timothy Farkas
Is there any case we would want to use RecordBatch.getOutgoingContainer instead 
of RecordBatch.getContainer? It doesn't seem to be used anywhere except one 
unit test.

Thanks,
Tim


[jira] [Created] (DRILL-6484) Threads in threadpools may not properly propagate their exceptions and errors to parent threads.

2018-06-08 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6484:
-

 Summary: Threads in threadpools may not properly propagate their 
exceptions and errors to parent threads.
 Key: DRILL-6484
 URL: https://issues.apache.org/jira/browse/DRILL-6484
 Project: Apache Drill
  Issue Type: Bug
Reporter: Timothy Farkas


Exceptions and errors should be propagated from threads in thread pools to 
parent threads and handled properly. Currently some threads like those in the 
PartitionDecorator set the exception of the ExecutorState but the exception is 
never handled.
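A generic Java illustration (not Drill's code) of the underlying pitfall: an exception thrown inside a pool thread is captured in the task's Future and only surfaces if the parent thread calls get() and handles the wrapped cause:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolExceptionDemo {

    // Runs a failing task in a pool and returns the exception the parent
    // thread recovers from the Future. Without the get()/catch below, the
    // failure would never be seen by the parent thread.
    static Throwable runAndCapture() throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable task = () -> {
                throw new IllegalStateException("boom");
            };
            Future<?> future = pool.submit(task);
            try {
                future.get();          // re-throws the failure, wrapped
                return null;
            } catch (ExecutionException e) {
                return e.getCause();   // the original exception from the pool thread
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Throwable cause = runAndCapture();
        System.out.println(cause.getClass().getSimpleName() + ": " + cause.getMessage());
    }
}
```

Setting a field like ExecutorState's exception is not enough; some code path on the parent side must actually inspect and act on it, as the catch block does here.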





Re: [Vote] Cleaning Up Old PRs

2018-06-07 Thread Timothy Farkas
> I'm not against cleaning up old PRs.  But I am not sure it is easy to
> automate without losing some good work.
>
>
> Thanks.
>
>
> --Robert
>
> 
> From: Dave Oshinsky 
> Sent: Thursday, June 7, 2018 11:34 AM
> To: dev@drill.apache.org
> Subject: Re: [Vote] Cleaning Up Old PRs
>
> Hi Tim,
> Everyone's time is constrained, so I doubt that it will always be possible
> to give "timely" reviews to PR's, especially complex ones, or ones
> regarding problems that are not regarded as high priority.  I suggest these
> changes to your scheme:
>
> 1) Once a PR reaches the 3 months point, send an email to the list and
> directly to the PR creator that the PR will automatically be closed in 1
> more month if specific actions are not taken.  The PR creator is less
> likely to miss an email that is sent directly to him/her.
> 2) Automatic removals should not be executed until an administrator has
> approved it.  In other words, it should not be completely automatic,
> without a human in the loop.
> 3) PR's that are closed (either automatically or not) should remain in the
> system for some time (with "reopen" possible), in case a mistake occurs.
> It seems that github already supports this behavior.
>
> As of this writing, I see 105 open PR's, 1201 closed PR's for Apache
> Drill.  Perhaps I'm missing something, but why the effort to make this
> automatic?  Are there way more PR's than I'm seeing?
>
> Thanks,
> Dave O
>
> 
> From: Timothy Farkas 
> Sent: Thursday, June 7, 2018 1:38 PM
> To: dev@drill.apache.org
> Subject: Re: [Vote] Cleaning Up Old PRs
>
> Hi Dave,
>
> I'm sorry you had a bad experience. We should do better giving timely
> reviews moving forward. I think there are some ways we can protect PRs from
> unresponsive committers while still closing PRs from unresponsive
> contributors. Here are some ideas.
>
>  1. Have an auto responder comment on each new PR after it is opened with
> all the information a contributor needs to be successful along with all the
> information about how PRs are autoclosed and what to do to keep the PR
> alive. Also encourage the contributor to spam us until we do a review in
> this message.
>
>  2. Auto labeling fresh PRs with a "needs-first-review" label (or
> something like that). PRs with this label are exempt from the auto closing
> process and the label will only be removed after a committer has looked at
> the PR and done a first round of review. This can protect a PR that had
> never been reviewed from being closed.
>  3. Allow the contributor to request a "pending" label to be placed on
> their PR. This label would make their PR permanently immune to auto closing
> even after a first round of review has been completed and the
> "needs-first-review" label has been removed.
>
> How do you feel about these protections? Do you think they would be
> sufficient? If not, do you have any alternative ideas to help improve the
> process?
>
> As a note, I think our motivations are the same. We both want quality PRs
> to make it into Drill. I want to do it by removing PRs where the
> contributor is unresponsive so committers can better focus on the PRs that
> need attention. And I think you are rightfully concerned about false
> positives when automating this process. Hopefully we can find a good middle
> ground that everyone can be happy with.
>
> Thanks,
> Tim
>
> 
> From: Dave Oshinsky 
> Sent: Wednesday, June 6, 2018 6:28:39 PM
> To: dev@drill.apache.org
> Subject: Re: [Vote] Cleaning Up Old PRs
>
> Tim,
> It's too restrictive, unless something can be done to educate (outsider)
> PR authors like myself to "go against the grain" and keep asking.  And
> asking.  And asking.  And asking.  You get the picture?  I did all that.
> And it was ignored.  I assumed that people outside MapR aren't welcome to
> contribute, and/or there was little interest in making decimal work
> properly, and/or there was simply nobody available to review it (what I was
> most comfortable believing), and/or my emails smelled really bad (kidding
> on the last one 8-).  I asked a few times, and asked again a few times a
> few months later, and nothing.  What can you do to educate outsiders as to
> what they need to do to make sure a useful PR doesn't get flushed down the
> toilet?  I spent days learning some amount of Drill internals and
> implementing VARDECIMAL (over 70 source files changed), and did it again
> months later to merge to then current master tip.  All ignored for quite
> some time.
>
> Thanks to Volodymyr Vysots

Re: [Vote] Cleaning Up Old PRs

2018-06-07 Thread Timothy Farkas
Hi Dave,

I'm sorry you had a bad experience. We should do better giving timely reviews 
moving forward. I think there are some ways we can protect PRs from 
unresponsive committers while still closing PRs from unresponsive contributors. 
Here are some ideas.

 1. Have an auto responder comment on each new PR after it is opened with all 
the information a contributor needs to be successful along with all the 
information about how PRs are autoclosed and what to do to keep the PR alive. 
Also encourage the contributor to spam us until we do a review in this message.

 2. Auto labeling fresh PRs with a "needs-first-review" label (or something 
like that). PRs with this label are exempt from the auto closing process and 
the label will only be removed after a committer has looked at the PR and done 
a first round of review. This can protect a PR that had never been reviewed 
from being closed.
 3. Allow the contributor to request a "pending" label to be placed on their 
PR. This label would make their PR permanently immune to auto closing even 
after a first round of review has been completed and the "needs-first-review" 
label has been removed.

How do you feel about these protections? Do you think they would be sufficient? 
If not, do you have any alternative ideas to help improve the process?

As a note, I think our motivations are the same. We both want quality PRs to 
make it into Drill. I want to do it by removing PRs where the contributor is 
unresponsive so committers can better focus on the PRs that need attention. And 
I think you are rightfully concerned about false positives when automating this 
process. Hopefully we can find a good middle ground that everyone can be happy 
with.

Thanks,
Tim


From: Dave Oshinsky 
Sent: Wednesday, June 6, 2018 6:28:39 PM
To: dev@drill.apache.org
Subject: Re: [Vote] Cleaning Up Old PRs

Tim,
It's too restrictive, unless something can be done to educate (outsider) PR 
authors like myself to "go against the grain" and keep asking.  And asking.  
And asking.  And asking.  You get the picture?  I did all that.  And it was 
ignored.  I assumed that people outside MapR aren't welcome to contribute, 
and/or there was little interest in making decimal work properly, and/or there 
was simply nobody available to review it (what I was most comfortable 
believing), and/or my emails smelled really bad (kidding on the last one 8-).  
I asked a few times, and asked again a few times a few months later, and 
nothing.  What can you do to educate outsiders as to what they need to do to 
make sure a useful PR doesn't get flushed down the toilet?  I spent days 
learning some amount of Drill internals and implementing VARDECIMAL (over 70 
source files changed), and did it again months later to merge to then current 
master tip.  All ignored for quite some time.

Thanks to Volodymyr Vysotskyi for ultimately grabbing the ball and running with 
it.  That complex a change required an "insider" to bring it fully to fruition. 
 But if the PR had been automatically flushed, I have my doubts as to whether 
the story would have ended the same way.

Thanks,
Dave Oshinsky

____
From: Timothy Farkas 
Sent: Wednesday, June 6, 2018 7:07 PM
To: dev@drill.apache.org
Subject: Re: [Vote] Cleaning Up Old PRs

Good point Dave. With this automation and a stale period of 3 months, PRs would 
be closed after 3 months of inactivity. However, if you just post one comment 
asking a reviewer to review once every three months, it will stay alive 
indefinitely. Also if you don't want to do this you could request your PR to be 
marked as pending, and it would be exempt from the rule and never be closed 
automatically.


The idea behind this automation is to distinguish PRs from contributors who are 
actively working on their PRs and contributors who open a PR but then never 
follow up. In open source, the latter happens often and it really overloads the 
system with PRs that will never be finished. Also having this automation with 
an explicit time limit incentivizes the contributor to make noise and comment 
on the PR to get a review.

In my opinion this is exactly what we want, if your PR doesn't get reviewed you 
should make noise and spam us with messages until we make it happen. As long as 
you keep making noise, your PR won't be closed, and it helps keep us honest by 
doing timely reviews.

What are your thoughts? Do you still feel this is too restrictive?

Thanks,
Tim




From: Dave Oshinsky 
Sent: Wednesday, June 6, 2018 3:50:15 PM
To: dev@drill.apache.org
Subject: Re: [Vote] Cleaning Up Old PRs

Tim,
It took well over one year before anyone started looking at my August 2016 PR 
to implement VARDECIMAL decimal types improvements:
https://github.com/apache/drill/pull/570

Re: [Vote] Cleaning Up Old PRs

2018-06-06 Thread Timothy Farkas
Good point Dave. With this automation and a stale period of 3 months, PRs would 
be closed after 3 months of inactivity. However, if you just post one comment 
asking a reviewer to review once every three months, it will stay alive 
indefinitely. Also if you don't want to do this you could request your PR to be 
marked as pending, and it would be exempt from the rule and never be closed 
automatically.


The idea behind this automation is to distinguish PRs from contributors who are 
actively working on their PRs and contributors who open a PR but then never 
follow up. In open source, the latter happens often and it really overloads the 
system with PRs that will never be finished. Also having this automation with 
an explicit time limit incentivizes the contributor to make noise and comment 
on the PR to get a review.

In my opinion this is exactly what we want, if your PR doesn't get reviewed you 
should make noise and spam us with messages until we make it happen. As long as 
you keep making noise, your PR won't be closed, and it helps keep us honest by 
doing timely reviews.

What are your thoughts? Do you still feel this is too restrictive?

Thanks,
Tim




From: Dave Oshinsky 
Sent: Wednesday, June 6, 2018 3:50:15 PM
To: dev@drill.apache.org
Subject: Re: [Vote] Cleaning Up Old PRs

Tim,
It took well over one year before anyone started looking at my August 2016 PR 
to implement VARDECIMAL decimal types improvements:
https://github.com/apache/drill/pull/570

Volodymyr Vysotskyi ultimately grabbed the decimal types ball and ran with it, 
but I am concerned that my PR and some others would have gotten flushed 
prematurely with this kind of automatic cleaning regimen.

Just my 2.5 cents.

Dave Oshinsky


From: Timothy Farkas 
Sent: Wednesday, June 6, 2018 6:12 PM
To: dev@drill.apache.org
Subject: [Vote] Cleaning Up Old PRs

The subject of this vote is whether / how to use probot stale.


https://github.com/probot/stale


Please fill out the survey below.


https://www.surveymonkey.com/r/NGDCX8R


If you feel this completely misses the mark of what should be done, please 
discuss on this thread. Also this is my first survey monkey poll, so if there 
are any issues please let me know. I'll follow up in two weeks to discuss the 
results.

Thanks,
Tim



[jira] [Created] (DRILL-6469) Resource Management

2018-06-05 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6469:
-

 Summary: Resource Management
 Key: DRILL-6469
 URL: https://issues.apache.org/jira/browse/DRILL-6469
 Project: Apache Drill
  Issue Type: Bug
Reporter: Timothy Farkas
Assignee: Timothy Farkas


This is a top level Jira for the currently known resource management tasks.





[jira] [Created] (DRILL-6468) CatastrophicFailure.exit Should Not Call System.exit

2018-06-05 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6468:
-

 Summary: CatastrophicFailure.exit Should Not Call System.exit
 Key: DRILL-6468
 URL: https://issues.apache.org/jira/browse/DRILL-6468
 Project: Apache Drill
  Issue Type: Bug
Reporter: Timothy Farkas
Assignee: Timothy Farkas


Drill may never terminate in the event of a Heap OOM. When this happens we see 
stack traces like the following:

{code}
"250387a7-363d-619c-d745-57ae50f19d15:frag:0:0" #104 daemon prio=10 os_prio=0 
tid=0x7fd9d1eec190 nid=0xd7d5 in Object.wait() [0x7fd953de2000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1252)
- locked <0x0005c06bee28> (a 
org.apache.drill.exec.server.Drillbit$ShutdownThread)
at java.lang.Thread.join(Thread.java:1326)
at 
java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
at 
java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
at java.lang.Shutdown.runHooks(Shutdown.java:123)
at java.lang.Shutdown.sequence(Shutdown.java:167)
at java.lang.Shutdown.exit(Shutdown.java:212)
- locked <0x0005c1d8bb28> (a java.lang.Class for java.lang.Shutdown)
at java.lang.Runtime.exit(Runtime.java:109)
at java.lang.System.exit(System.java:971)
at 
org.apache.drill.common.CatastrophicFailure.exit(CatastrophicFailure.java:49)
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:246)
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}

Here CatastrophicFailure.exit is being called when we encounter a Heap OOM.
Then we call System.exit to terminate the Java process. The only issue is that
System.exit runs Drill's normal shutdown hook and tries to do a graceful
shutdown. In the case of a Heap OOM we cannot do this reliably because there
physically isn't enough memory to continue executing our code. The JVM likely
gets stuck at various places waiting on garbage collection and object
allocations on the heap, and the Drillbit stops making progress.
*Improving Drill's Behavior*

*Solution To Hanging Shutdown*

There are two kinds of OutOfMemory exceptions in Drill. Direct Memory OOMs and 
Heap OOMs. Typically Direct Memory OOMs are recoverable because Drill uses 
Direct Memory to store data only, so we can fail a query and lose data and 
recover. Heap OOMs are unrecoverable because we actually need the Heap to 
execute our code, and if we can't use the heap then we basically can't run our 
code reliably.

When Drill experiences a catastrophic failure we should not call System.exit,
because then we will try to shut down gracefully. In the event of a catastrophic
failure like a Heap OOM we cannot recover, so we should forcefully terminate the
JVM with Runtime.getRuntime().halt.

This will make Drill shut down promptly in the event of a Heap OOM.
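A small standalone sketch (illustrative, not Drill's actual CatastrophicFailure code) of the halt-vs-exit decision: halt() stops the JVM immediately and skips shutdown hooks, while System.exit() runs them and can therefore hang when the heap is exhausted:

```java
public class HaltDemo {

    // True for failures where running more code (shutdown hooks, graceful
    // cleanup) is unsafe -- e.g. a Heap OutOfMemoryError.
    static boolean isCatastrophic(Throwable t) {
        return t instanceof OutOfMemoryError;
    }

    static void terminate(Throwable t, int status) {
        if (isCatastrophic(t)) {
            // Stop the JVM immediately; shutdown hooks are skipped.
            Runtime.getRuntime().halt(status);
        } else {
            // Normal path: shutdown hooks run and graceful shutdown is attempted.
            System.exit(status);
        }
    }

    public static void main(String[] args) {
        Runtime.getRuntime().addShutdownHook(
                new Thread(() -> System.out.println("shutdown hook ran")));
        System.out.println("simulating heap OOM");
        terminate(new OutOfMemoryError("Java heap space"), 0);
        // Never reached: halt(0) terminated the JVM without running the hook.
    }
}
```

Running main prints only "simulating heap OOM"; had terminate() taken the System.exit branch, the shutdown hook line would also have appeared.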





Re: [Discuss] Cleanup Old PRs

2018-06-04 Thread Timothy Farkas
Hi all again!

With the latest batch commit we are down from 148 open PRs to 107. To prune 
things down further, I'd like to propose using probot stale 
https://github.com/probot/stale. This is a handy github app which automatically 
marks old PRs as stale and closes them. This way we can automatically and 
politely close PRs that have been inactive for an extended period of time.
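For concreteness, a sketch of what the `.github/stale.yml` configuration could look like; the day counts, comment wording, and exempt label names below are assumptions drawn from the label ideas floated earlier in this thread, not a final proposal:

```yaml
# Apply only to pull requests, not issues
only: pulls
# Days of inactivity before a PR is marked stale (the "3 months" idea)
daysUntilStale: 90
# Days of further inactivity before the stale PR is closed
daysUntilClose: 14
# PRs carrying these labels are never marked stale
exemptLabels:
  - needs-first-review
  - pending
# Label applied when a PR goes stale
staleLabel: stale
markComment: >
  This PR has been automatically marked as stale because it has had no
  recent activity. It will be closed if no further activity occurs.
  Comment on the PR to keep it open.
closeComment: >
  This PR was closed due to inactivity. Feel free to reopen it when you
  are ready to continue the work.
```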

What are everyone's thoughts on this?

Thanks,
Tim


From: Timothy Farkas 
Sent: Friday, June 1, 2018 11:01:13 AM
To: dev@drill.apache.org
Subject: Re: [Discuss] Cleanup Old PRs

Hi All,

These are some PRs that were already +1'd by a committer but never merged. Most 
have conflicts, some don't even have conflicts. If there are any volunteers to 
take these across the finish line that would be great.


https://github.com/apache/drill/pull/292

https://github.com/apache/drill/pull/309

https://github.com/apache/drill/pull/437

https://github.com/apache/drill/pull/441

https://github.com/apache/drill/pull/455

https://github.com/apache/drill/pull/480

https://github.com/apache/drill/pull/652



There were also 7 really small documentation changes that I think we can merge 
and close. I will follow up with Bridget about those.

Thanks,
Tim


From: Timothy Farkas 
Sent: Thursday, May 31, 2018 6:05:15 PM
To: dev@drill.apache.org
Subject: Re: [Discuss] Cleanup Old PRs

Closed the first round of obsolete PRs. Went from 148 open to 125 open.


I observed some other low hanging fruit that could be closed. Specifically 
there were some small PRs against gh-pages, half of which were already +1'd but 
never merged and the other half of which looked pretty reasonable to me but 
never reviewed. So my question is what is the proper process for merging 
changes into gh-pages?


Paul to kickstart the process of pushing PRs over the line I'll compile a list 
of PRs that were +1'd but never merged. Perhaps we can get some committers to 
volunteer to update the old +1'd PRs and merge them.

Thanks,
Tim



From: Paul Rogers 
Sent: Thursday, May 31, 2018 4:53:24 PM
To: dev@drill.apache.org
Subject: Re: [Discuss] Cleanup Old PRs

+1

I just learned to ignore the ancient PRs; they were not adding much value.

If a PR looks like it could be resurrected, we might consider 1) assigning a 
committer to help push it over the line, and 2) check back with submitter to 
see if they can update it.

We tried the above a few times over the last couple of years and were able to 
finish a couple of otherwise-stale PRs.

Thanks,
- Paul



On Thursday, May 31, 2018, 2:35:25 PM PDT, Timothy Farkas 
 wrote:

 Hi All,

There are a lot of open PRs. I think it would be good to close some of them in 
order to identify the remaining PRs that require action to be taken. 
Specifically I was thinking of first closing obsolete PRs and then see how far 
that takes us. A PR could be considered obsolete if it is:


  *  Changing code or documentation that no longer exists.
  *  Adding documentation that is no longer correct.
  *  Has a note already on the PR that it needs to be closed because another PR 
was opened.


Any thoughts?

Thanks,
Tim


[jira] [Created] (DRILL-6464) Disallow System.out, System.err, and Exception.printStackTrace

2018-06-04 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6464:
-

 Summary: Disallow System.out, System.err, and 
Exception.printStackTrace
 Key: DRILL-6464
 URL: https://issues.apache.org/jira/browse/DRILL-6464
 Project: Apache Drill
  Issue Type: Bug
Reporter: Timothy Farkas


Add checkstyle rules to disallow using these print methods.
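In checkstyle such a ban is typically expressed as a RegexpSinglelineJava rule. As an illustrative sketch only (this regex is an assumption, not Drill's actual checkstyle configuration), the matching such a rule performs looks like:

```java
import java.util.regex.Pattern;

public class ForbiddenCallScan {
    // Illustrative pattern only -- the real rule would live in the project's
    // checkstyle config as a RegexpSinglelineJava check; this regex is an
    // assumption, not a quote from Drill's configuration.
    private static final Pattern FORBIDDEN =
        Pattern.compile("System\\.(out|err)\\.|\\.printStackTrace\\(");

    /** Returns true if the source line uses one of the banned print methods. */
    public static boolean violates(String line) {
        return FORBIDDEN.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(violates("System.out.println(\"debug\");")); // true: banned
        System.out.println(violates("e.printStackTrace();"));           // true: banned
        System.out.println(violates("logger.info(\"debug\");"));        // false: allowed
    }
}
```

A real checkstyle check would also need to allow the rare legitimate uses (for example, in the sqlline console entry point) via suppressions.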



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6462) Enhance OperatorTestBuilder to use RowSets instead of adhoc json strings.

2018-06-01 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6462:
-

 Summary: Enhance OperatorTestBuilder to use RowSets instead of 
adhoc json strings.
 Key: DRILL-6462
 URL: https://issues.apache.org/jira/browse/DRILL-6462
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Timothy Farkas
Assignee: Timothy Farkas






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6461) Add Basic Data Correctness Unit Tests

2018-06-01 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6461:
-

 Summary: Add Basic Data Correctness Unit Tests
 Key: DRILL-6461
 URL: https://issues.apache.org/jira/browse/DRILL-6461
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Timothy Farkas
Assignee: Timothy Farkas


There are no data correctness unit tests for HashAgg. We need to add some.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [Discuss] Cleanup Old PRs

2018-05-31 Thread Timothy Farkas
Closed the first round of obsolete PRs. Went from 148 open to 125 open.


I observed some other low hanging fruit that could be closed. Specifically 
there were some small PRs against gh-pages, half of which were already +1'd but 
never merged and the other half of which looked pretty reasonable to me but 
never reviewed. So my question is what is the proper process for merging 
changes into gh-pages?


Paul to kickstart the process of pushing PRs over the line I'll compile a list 
of PRs that were +1'd but never merged. Perhaps we can get some committers to 
volunteer to update the old +1'd PRs and merge them.

Thanks,
Tim



From: Paul Rogers 
Sent: Thursday, May 31, 2018 4:53:24 PM
To: dev@drill.apache.org
Subject: Re: [Discuss] Cleanup Old PRs

+1

I just learned to ignore the ancient PRs; they were not adding much value.

If a PR looks like it could be resurrected, we might consider 1) assigning a 
committer to help push it over the line, and 2) check back with submitter to 
see if they can update it.

We tried the above a few times over the last couple of years and were able to 
finish a couple of otherwise-stale PRs.

Thanks,
- Paul



On Thursday, May 31, 2018, 2:35:25 PM PDT, Timothy Farkas 
 wrote:

 Hi All,

There are a lot of open PRs. I think it would be good to close some of them in 
order to identify the remaining PRs that require action to be taken. 
Specifically I was thinking of first closing obsolete PRs and then see how far 
that takes us. A PR could be considered obsolete if it is:


  *  Changing code or documentation that no longer exists.
  *  Adding documentation that is no longer correct.
  *  Has a note already on the PR that it needs to be closed because another PR 
was opened.


Any thoughts?

Thanks,
Tim


[jira] [Resolved] (DRILL-4411) HashJoin should not only depend on number of records, but also on size

2018-05-31 Thread Timothy Farkas (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas resolved DRILL-4411.
---
Resolution: Duplicate

> HashJoin should not only depend on number of records, but also on size
> --
>
> Key: DRILL-4411
> URL: https://issues.apache.org/jira/browse/DRILL-4411
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Reporter: MinJi Kim
>Assignee: MinJi Kim
>Priority: Major
>
> In HashJoinProbeTemplate, each batch is limited to TARGET_RECORDS_PER_BATCH 
> (4000).  But we should not only depend on the number of records, but also 
> size (in case of extremely large records).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[Discuss] Cleanup Old PRs

2018-05-31 Thread Timothy Farkas
Hi All,

There are a lot of open PRs. I think it would be good to close some of them in 
order to identify the remaining PRs that require action to be taken. 
Specifically I was thinking of first closing obsolete PRs and then see how far 
that takes us. A PR could be considered obsolete if it is:


  *   Changing code or documentation that no longer exists.
  *   Adding documentation that is no longer correct.
  *   Has a note already on the PR that it needs to be closed because another 
PR was opened.


Any thoughts?

Thanks,
Tim


Re: [ANNOUNCE] New Committer: Timothy Farkas

2018-05-25 Thread Timothy Farkas
Thanks everyone! Looking forward to doing more work with this great team!

Tim


From: salim achouche <sachouc...@gmail.com>
Sent: Friday, May 25, 2018 4:01:14 PM
To: dev@drill.apache.org
Subject: Re: [ANNOUNCE] New Committer: Timothy Farkas

Congrats Tim!

On Fri, May 25, 2018 at 11:58 AM, Aman Sinha <amansi...@apache.org> wrote:

> The Project Management Committee (PMC) for Apache Drill has invited Timothy
> Farkas  to become a committer, and we are pleased to announce that he
> has accepted.
>
> Tim has become an active contributor to Drill in less than a year. During
> this time he has contributed to addressing flaky unit tests,  fixing memory
> leaks in certain operators,  enhancing the system options framework to be
> more extensible and setting up the Travis CI tests.  More recently, he
> worked on the memory sizing calculations for hash join.
>
> Welcome Tim, and thank you for your contributions.  Keep up the good work !
>
> -Aman
> (on behalf of Drill PMC)
>


Re: [Question] Using loggers in tests

2018-05-25 Thread Timothy Farkas
Thanks for the pointers Sorabh / Vova,

I was able to get lilith running and displaying test logs out of the box. 
Paul's LogFixture is also a good and easy to use alternative if we want to 
enable logs to print to the console. Since we have several good options for 
enabling the logs in the tests, I'll proceed with removing unnecessary 
System.out.prints and converting them into log statements where appropriate.

Thanks,
Tim


From: Sorabh Hamirwasia <shamirwa...@mapr.com>
Sent: Friday, May 25, 2018 11:35:30 AM
To: dev@drill.apache.org
Subject: Re: [Question] Using loggers in tests

I think by default logging is disabled except for error messages, which are 
directed to stdout. All other logging is directed to a socket appender consumed 
by Lilith (if it is set up) when the property drill.lilith.enable is set to 
true.


Thanks,
Sorabh


From: Timothy Farkas <tfar...@mapr.com>
Sent: Friday, May 25, 2018 10:55:06 AM
To: dev@drill.apache.org
Subject: Re: [Question] Using loggers in tests

Thanks Vova,

I saw the logback-test.xml in drill-common, but logging messages won't print in 
the unit tests in java-exec unless I copy logback-test.xml to 
exec/java-exec/src/test/resources. Was I missing something?

Thanks,
Tim


From: Vova Vysotskyi <vvo...@gmail.com>
Sent: Friday, May 25, 2018 2:03:01 AM
To: dev@drill.apache.org
Subject: Re: [Question] Using loggers in tests

Hi Tim,

Drill already contains a logback-test.xml file in the drill-common module.
Since this module is used in all other modules, there is no need to add
more logback-test.xml files.

Kind regards,
Volodymyr Vysotskyi


пт, 25 трав. 2018 о 06:05 Timothy Farkas <tfar...@mapr.com> пише:

> Thanks Paul, forgot about that. I'll migrate all the tests off of
> System.out.print and onto LogFixture; I don't think it's worth it to create
> a TestLogger.out since we should all be using loggers anyway. I'll also add
> a checkstyle check that will cause the build to fail if System.out.print is
> used anywhere.
>
> Thanks,
> Tim
>
> 
> From: Paul Rogers <par0...@yahoo.com.INVALID>
> Sent: Thursday, May 24, 2018 7:08:28 PM
> To: dev@drill.apache.org
> Subject: Re: [Question] Using loggers in tests
>
> LogFixture? As illustrated in ExampleTest?
>
> This fixture lets you turn on all or selected loggers for the duration of
> a single test. I used it all the time when debugging. Works great.
> It works when turning loggers on when the default is that they are off.
> For whatever reason, I could not get it to work to turn off logging that
> was enabled in the config file.
> At one point I did look into adding a custom debug logger that works just
> like System.out, but is disabled by default. That way, conversion was just
> a matter of replacing System.out with TestLogger.out which can be done via
> search/replace. Not sure if I ever checked that in, but it would be trivial
> to replicate.
> Thanks,
> - Paul
>
>
>
> On Thursday, May 24, 2018, 5:32:49 PM PDT, Timothy Farkas <
> tfar...@mapr.com> wrote:
>
>  Hi All,
>
> I was wondering if there was a magical way to enable the Slf4j loggers for
> unit tests without adding a logback-test.xml file into src/test/resources
> for a submodule in the project? If not, would there be any issues with
> adding a default logback-test.xml file that has logging disabled by default
> to each submodule's src/test/resources directory? I'd like to do this in
> order to discourage the use of System.out.println in tests (and eventually
> prohibit it completely) by providing an easy to use out of the box
> alternative. Currently our test logs are polluted by many System.out.print
> statements and switching to using logging will allow us to have our test
> messages when we want them, and to disable them when we don't want them.
>
> Thanks,
> Tim
>


Re: [Question] Using loggers in tests

2018-05-25 Thread Timothy Farkas
Thanks Vova,

I saw the logback-test.xml in drill-common, but logging messages won't print in 
the unit tests in java-exec unless I copy logback-test.xml to 
exec/java-exec/src/test/resources. Was I missing something?

Thanks,
Tim


From: Vova Vysotskyi <vvo...@gmail.com>
Sent: Friday, May 25, 2018 2:03:01 AM
To: dev@drill.apache.org
Subject: Re: [Question] Using loggers in tests

Hi Tim,

Drill already contains a logback-test.xml file in the drill-common module.
Since this module is used in all other modules, there is no need to add
more logback-test.xml files.

Kind regards,
Volodymyr Vysotskyi


пт, 25 трав. 2018 о 06:05 Timothy Farkas <tfar...@mapr.com> пише:

> Thanks Paul, forgot about that. I'll migrate all the tests off of
> System.out.print and onto LogFixture; I don't think it's worth it to create
> a TestLogger.out since we should all be using loggers anyway. I'll also add
> a checkstyle check that will cause the build to fail if System.out.print is
> used anywhere.
>
> Thanks,
> Tim
>
> 
> From: Paul Rogers <par0...@yahoo.com.INVALID>
> Sent: Thursday, May 24, 2018 7:08:28 PM
> To: dev@drill.apache.org
> Subject: Re: [Question] Using loggers in tests
>
> LogFixture? As illustrated in ExampleTest?
>
> This fixture lets you turn on all or selected loggers for the duration of
> a single test. I used it all the time when debugging. Works great.
> It works when turning loggers on when the default is that they are off.
> For whatever reason, I could not get it to work to turn off logging that
> was enabled in the config file.
> At one point I did look into adding a custom debug logger that works just
> like System.out, but is disabled by default. That way, conversion was just
> a matter of replacing System.out with TestLogger.out which can be done via
> search/replace. Not sure if I ever checked that in, but it would be trivial
> to replicate.
> Thanks,
> - Paul
>
>
>
> On Thursday, May 24, 2018, 5:32:49 PM PDT, Timothy Farkas <
> tfar...@mapr.com> wrote:
>
>  Hi All,
>
> I was wondering if there was a magical way to enable the Slf4j loggers for
> unit tests without adding a logback-test.xml file into src/test/resources
> for a submodule in the project? If not, would there be any issues with
> adding a default logback-test.xml file that has logging disabled by default
> to each submodule's src/test/resources directory? I'd like to do this in
> order to discourage the use of System.out.println in tests (and eventually
> prohibit it completely) by providing an easy to use out of the box
> alternative. Currently our test logs are polluted by many System.out.print
> statements and switching to using logging will allow us to have our test
> messages when we want them, and to disable them when we don't want them.
>
> Thanks,
> Tim
>


Re: [Question] Using loggers in tests

2018-05-24 Thread Timothy Farkas
Thanks Paul, forgot about that. I'll migrate all the tests off of 
System.out.print and onto LogFixture; I don't think it's worth it to create a 
TestLogger.out since we should all be using loggers anyway. I'll also add a 
checkstyle check that will cause the build to fail if System.out.print is used 
anywhere.

Thanks,
Tim


From: Paul Rogers <par0...@yahoo.com.INVALID>
Sent: Thursday, May 24, 2018 7:08:28 PM
To: dev@drill.apache.org
Subject: Re: [Question] Using loggers in tests

LogFixture? As illustrated in ExampleTest?

This fixture lets you turn on all or selected loggers for the duration of a 
single test. I used it all the time when debugging. Works great.
It works when turning loggers on when the default is that they are off. For 
whatever reason, I could not get it to work to turn off logging that was 
enabled in the config file.
At one point I did look into adding a custom debug logger that works just like 
System.out, but is disabled by default. That way, conversion was just a matter 
of replacing System.out with TestLogger.out which can be done via 
search/replace. Not sure if I ever checked that in, but it would be trivial to 
replicate.
Thanks,
- Paul



On Thursday, May 24, 2018, 5:32:49 PM PDT, Timothy Farkas 
<tfar...@mapr.com> wrote:

 Hi All,

I was wondering if there was a magical way to enable the Slf4j loggers for unit 
tests without adding a logback-test.xml file into src/test/resources for a 
submodule in the project? If not, would there be any issues with adding a 
default logback-test.xml file that has logging disabled by default to each 
submodule's src/test/resources directory? I'd like to do this in order to 
discourage the use of System.out.println in tests (and eventually prohibit it 
completely) by providing an easy to use out of the box alternative. Currently 
our test logs are polluted by many System.out.print statements and switching to 
using logging will allow us to have our test messages when we want them, and to 
disable them when we don't want them.

Thanks,
Tim


[Question] Using loggers in tests

2018-05-24 Thread Timothy Farkas
Hi All,

I was wondering if there was a magical way to enable the Slf4j loggers for unit 
tests without adding a logback-test.xml file into src/test/resources for a 
submodule in the project? If not, would there be any issues with adding a 
default logback-test.xml file that has logging disabled by default to each 
submodule's src/test/resources directory? I'd like to do this in order to 
discourage the use of System.out.println in tests (and eventually prohibit it 
completely) by providing an easy to use out of the box alternative. Currently 
our test logs are polluted by many System.out.print statements and switching to 
using logging will allow us to have our test messages when we want them, and to 
disable them when we don't want them.

Thanks,
Tim
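The point of the thread above is that, unlike System.out.println, a logger's output can be switched on and off by configuration without touching the test code. A minimal sketch of that behavior (Drill uses SLF4J backed by Logback; java.util.logging stands in here only so the sketch runs with no extra dependencies):

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class LoggerDemo {
    // Same pattern as Drill's SLF4J usage: one static logger per class.
    private static final Logger logger = Logger.getLogger(LoggerDemo.class.getName());

    /** Logs once with logging enabled and once disabled; returns what was captured. */
    public static String demo() {
        StringBuilder captured = new StringBuilder();
        Handler handler = new Handler() {
            @Override public void publish(LogRecord r) { captured.append(r.getMessage()); }
            @Override public void flush() {}
            @Override public void close() {}
        };
        logger.setUseParentHandlers(false); // keep the demo off the console
        logger.addHandler(handler);

        logger.setLevel(Level.INFO);
        logger.info("batch count = 42");    // recorded: the level allows it

        logger.setLevel(Level.OFF);
        logger.info("noise");               // silently dropped, unlike System.out
        return captured.toString();
    }

    public static void main(String[] args) {
        // The one print here is the demo harness itself.
        System.out.println(demo());
    }
}
```

With SLF4J/Logback the same toggle is a logger level in logback-test.xml (or a LogFixture setting) rather than a method call.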


[jira] [Created] (DRILL-6439) Sometimes Travis Times Out

2018-05-22 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6439:
-

 Summary: Sometimes Travis Times Out
 Key: DRILL-6439
 URL: https://issues.apache.org/jira/browse/DRILL-6439
 Project: Apache Drill
  Issue Type: Bug
Reporter: Timothy Farkas
Assignee: Timothy Farkas


Occasionally Travis builds run a few minutes longer than usual and time out.

{code}
changes detected, packing new archive
.
.


The job exceeded the maximum time limit for jobs, and has been terminated.
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6438) TestLocalExchange Generates a lot of unnecessary messages that pollute test logs.

2018-05-22 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6438:
-

 Summary: TestLocalExchange Generates a lot of unnecessary messages 
that pollute test logs.
 Key: DRILL-6438
 URL: https://issues.apache.org/jira/browse/DRILL-6438
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: Timothy Farkas


See example

{code}
Running 
org.apache.drill.exec.physical.impl.TestLocalExchange#testGroupByMultiFields
Plan: {
  "head" : {
"version" : 1,
"generator" : {
  "type" : "ExplainHandler",
  "info" : ""
},
"type" : "APACHE_DRILL_PHYSICAL",
"options" : [ {
  "kind" : "LONG",
  "accessibleScopes" : "ALL",
  "name" : "planner.width.max_per_node",
  "num_val" : 2,
  "scope" : "SESSION"
}, {
  "kind" : "BOOLEAN",
  "accessibleScopes" : "ALL",
  "name" : "planner.enable_mux_exchange",
  "bool_val" : true,
  "scope" : "SESSION"
}, {
  "kind" : "BOOLEAN",
  "accessibleScopes" : "ALL",
  "name" : "planner.enable_demux_exchange",
  "bool_val" : false,
  "scope" : "SESSION"
}, {
  "kind" : "LONG",
  "accessibleScopes" : "ALL",
  "name" : "planner.slice_target",
  "num_val" : 1,
  "scope" : "SESSION"
} ],
"queue" : 0,
"hasResourcePlan" : false,
"resultMode" : "EXEC"
  },
  "graph" : [ {
"pop" : "fs-scan",
"@id" : 196611,
"userName" : "travis",
"files" : [ 
"file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/6.json",
 
"file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/9.json",
 
"file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/3.json",
 
"file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/1.json",
 
"file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/2.json",
 
"file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/7.json",
 
"file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/0.json",
 
"file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/5.json",
 
"file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/4.json",
 
"file:/home/travis/build/apache/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.TestLocalExchange/root/empTable/8.json"
 ],
"storage" : {
  "type" : "file",
  "enabled" : true,
  "connection" : "file:///",
  "config" : null,
  "workspaces" : {
"root" : {
  "location" : 
"/home/travis/build/apache/drill/exec/java-exec/./target/org.apache.drill.exec.physical.impl.TestLocalExchange/root",
  "writable" : true,
  "defaultInputFormat" : null,
  "allowAccessOutsideWorkspace" : false
},
"tmp" : {
  "location" : 
"/home/travis/build/apache/drill/exec/java-exec/./target/org.apache.drill.exec.physical.impl.TestLocalExchange/dfsTestTmp/1527026062606-0",
  "writable" : true,
  "defaultInputFormat" : null,
  "allowAccessOutsideWorkspace" : false
},
"default" : {
  "location" : 
"/home/travis/build/apache/drill/exec/java-exec/./target/org.apache.drill.exec.physical.impl.TestLocalExchange/root",
  "writable" : true,
  "defaultInputFormat" : null,
  "allowAccessOutsideWorkspace" : false
}
  },
  "formats" : {
"psv" : {
  "type"

[jira] [Created] (DRILL-6437) Travis Fails Because Logs Are Flooded.

2018-05-22 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6437:
-

 Summary: Travis Fails Because Logs Are Flooded.
 Key: DRILL-6437
 URL: https://issues.apache.org/jira/browse/DRILL-6437
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: Timothy Farkas


The Travis logs are flooded when downloading mysql.

{code}
Downloading from central: 
http://repo.maven.apache.org/maven2/com/jcabi/mysql-dist/5.6.14/mysql-dist-5.6.14-linux-amd64.zip
Progress (1): 0/325 MB
Progress (1): 0/325 MB
Progress (1): 0/325 MB
Progress (1): 0/325 MB
Progress (1): 0/325 MB
Progress (1): 0/325 MB
Progress (1): 0/325 MB
Progress (1): 0/325 MB
Progress (1): 0/325 MB
Progress (1): 0/325 MB
Progress (1): 0/325 MB
Progress (1): 0/325 MB
Progress (1): 0/325 MB
Progress (1): 0/325 MB
Progress (1): 0/325 MB
Progress (1): 0/325 MB
Progress (1): 0/325 MB
Progress (1): 0/325 MB
Progress (1): 0/325 MB
Progress (1): 0/325 MB
Progress (1): 0.1/325 MB
Progress (1): 0.1/325 MB
Progress (1): 0.1/325 MB
Progress (1): 0.1/325 MB
Progress (1): 0.1/325 MB
Progress (1): 0.1/325 MB
Progress (1): 0.1/325 MB
Progress (1): 0.1/325 MB
Progress (1): 0.1/325 MB
Progress (1): 0.1/325 MB
Progress (1): 0.1/325 MB
Progress (1): 0.1/325 MB
Progress (1): 0.1/325 MB
Progress (1): 0.1/325 MB
Progress (1): 0.1/325 MB
Progress (1): 0.1/325 MB
Progress (1): 0.1/325 MB
Progress (1): 0.1/325 MB
Progress (1): 0.1/325 MB
Progress (1): 0.1/325 MB
Progress (1): 0.1/325 MB
Progress (1): 0.1/325 MB
Progress (1): 0.1/325 MB
{code}

And the Travis build fails with

{code}
The log length has exceeded the limit of 4 MB (this usually means that the test 
suite is raising the same exception over and over).

The job has been terminated
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6436) Store context and name in AbstractStoragePlugin instead of replicating fields in each StoragePlugin

2018-05-22 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6436:
-

 Summary: Store context and name in AbstractStoragePlugin instead 
of replicating fields in each StoragePlugin
 Key: DRILL-6436
 URL: https://issues.apache.org/jira/browse/DRILL-6436
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: Timothy Farkas






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6430) Drill Should Not Fail If It Sees Deprecated Options Stored In Zookeeper Or Locally

2018-05-18 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6430:
-

 Summary: Drill Should Not Fail If It Sees Deprecated Options 
Stored In Zookeeper Or Locally
 Key: DRILL-6430
 URL: https://issues.apache.org/jira/browse/DRILL-6430
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: Timothy Farkas


This is required for resource management since we will likely remove many 
options.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6429) SortImpl Should Not Use BufferAllocator.setLenient()

2018-05-18 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6429:
-

 Summary: SortImpl Should Not Use BufferAllocator.setLenient()
 Key: DRILL-6429
 URL: https://issues.apache.org/jira/browse/DRILL-6429
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: Timothy Farkas






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6428) HashAgg should not use BufferAllocator.setLenient()

2018-05-18 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6428:
-

 Summary: HashAgg should not use BufferAllocator.setLenient()
 Key: DRILL-6428
 URL: https://issues.apache.org/jira/browse/DRILL-6428
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: Timothy Farkas


Memory limits should not be lenient, they need to be explicit and 
deterministic. This is required for resource management.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (DRILL-6386) Disallow Unused Imports In Checkstyle

2018-05-14 Thread Timothy Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas resolved DRILL-6386.
---
Resolution: Fixed

> Disallow Unused Imports In Checkstyle
> -
>
> Key: DRILL-6386
> URL: https://issues.apache.org/jira/browse/DRILL-6386
> Project: Apache Drill
>  Issue Type: Improvement
>    Reporter: Timothy Farkas
>        Assignee: Timothy Farkas
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Drill commits with improper author name

2018-05-14 Thread Timothy Farkas
Thanks Parth,

I'll be sure to attribute any formatting changes I make to myself from now on.

Tim


From: Vlad Rozov <vro...@apache.org>
Sent: Friday, May 11, 2018 11:16:18 PM
To: dev@drill.apache.org
Subject: Re: Drill commits with improper author name

My 2 cents:

1. Most of the time massive format related changes are done using tools
such as IDE or maven plugins and they are not a result of manual editing
source code and there is no IP involved into those changes.
2. In many cases code refactoring (moving class from one package to
another for example) results in incorrect author being attributed by git
(it is possible to force git to track changes properly, but it requires
additional effort on a contributor side and I don't think that Drill
follows that practice).
3. Git is not the only way Apache accepts contributions. Contributors
are not required to have git (and/or svn) account and may contribute by
attaching a patch to JIRA, so SCM may not always have info on a
contributor. Even when SCM(git) does not have info on who contributed a
patch, that information is always available in JIRA.

Vlad

On 5/11/18 15:25, Parth Chandra wrote:
> Ted as usual puts it succinctly. It doesn't matter what the nature of the
> change, the author name should be accurate.
>
> Not blaming anyone, we frequently look to other projects to learn how
> things are done. Let's just not do it in Drill.
>
>
>
> On Fri, May 11, 2018 at 3:10 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
>> Tim,
>>
>> It is important to attribute *all* changes to a real person. If you
>> make a change, it should track back to you. If you, as a committer, accept a
>> change from someone else, the original author should be preserved and your
>> name should be recorded as well.
>>
>> A big claim that is made by all Apache projects is that provenance of all
>> changes is documented. We have to support that claim by crediting the right
>> person.
>>
>> On Fri, May 11, 2018, 12:51 Timothy Farkas <tfar...@mapr.com> wrote:
>>
>>> Hi Parth,
>>>
>>> I'm the culprit for that. It was suggested during the code review for my
>>> change that sweeping formatting only changes should be attributed to a
>> fake
>>> Drill Dev user. Having this separate commit was approved by the reviewer
>>> when my change was reviewed. This practice is also done in other open
>>> source projects such as Apache Apex, see an example of such a commit here
>>> https://github.com/apache/apex-core/commit/4a91c30c25c0c10562aec4350fb03e40a06d4a89 .
>>>
>>> If this is not the right way to go about things, how should formatting
>>> only changes be committed? I will likely be making more formatting
>> changes
>>> as I improve the checkstyle checks and want to make sure I follow the
>> right
>>> process.
>>>
>>> Thanks,
>>> Tim
>>>
>>> 
>>> From: Parth Chandra <par...@apache.org>
>>> Sent: Friday, May 11, 2018 11:59:09 AM
>>> To: dev
>>> Subject: Drill commits with improper author name
>>>
>>> Can we please not use Drill-Dev as the author email?
>>> Committers please watch out and ask contributors to provide a valid
>> email.
>>>
>>>
>>>
>>>
>>> This is an automated email from the ASF dual-hosted git repository.
>>>> in repository
>>>> https://gitbox.apache.org/repos/asf/drill.git
>>>> commit 8a1a7c53fb211dcba6e7b2f2ce90c28af4b9c518
>>>> Author: Drill Dev <dev@drill.apache.org>
>>>> AuthorDate: Tue May 8 13:57:37 2018 -0700
>>>>
>>>>  DRILL-6386: Remove unused imports and star imports.
>>>> ---
>>>>
>>>>



Re: Drill commits with improper author name

2018-05-11 Thread Timothy Farkas
Hi Parth,

I'm the culprit for that. It was suggested during the code review for my change 
that sweeping formatting only changes should be attributed to a fake Drill Dev 
user. Having this separate commit was approved by the reviewer when my change 
was reviewed. This practice is also done in other open source projects such as 
Apache Apex, see an example of such a commit here 
https://github.com/apache/apex-core/commit/4a91c30c25c0c10562aec4350fb03e40a06d4a89
 .

If this is not the right way to go about things, how should formatting only 
changes be committed? I will likely be making more formatting changes as I 
improve the checkstyle checks and want to make sure I follow the right process.

Thanks,
Tim


From: Parth Chandra 
Sent: Friday, May 11, 2018 11:59:09 AM
To: dev
Subject: Drill commits with improper author name

Can we please not use Drill-Dev as the author email?
Committers please watch out and ask contributors to provide a valid email.





This is an automated email from the ASF dual-hosted git repository.
>
> in repository 
> https://gitbox.apache.org/repos/asf/drill.git
>
> commit 8a1a7c53fb211dcba6e7b2f2ce90c28af4b9c518
> Author: Drill Dev 
> AuthorDate: Tue May 8 13:57:37 2018 -0700
>
> DRILL-6386: Remove unused imports and star imports.
> ---
>
>


[Notice] CheckStyle Changes :)

2018-05-10 Thread Timothy Farkas
Hi All,

Unused imports have been recently disallowed, so Travis and Jenkins will fail 
if your PR includes unused imports. To avoid breaking the build, please rebase 
your PRs onto the latest master and run "mvn checkstyle:check" and fix any 
errors before your PRs are merged.

Thanks,
Tim




[jira] [Created] (DRILL-6405) Boolean Cast Exceptions in Tests

2018-05-10 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6405:
-

 Summary: Boolean Cast Exceptions in Tests
 Key: DRILL-6405
 URL: https://issues.apache.org/jira/browse/DRILL-6405
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: Timothy Farkas


15:56:49.937 [250b31cd-e445-ddac-532d-d09cec14ff3c:foreman] ERROR 
o.a.drill.exec.work.foreman.Foreman - SYSTEM ERROR: IllegalArgumentException: 
Invalid value for boolean: A


[Error Id: 03cbfc62-c929-4897-80ae-eda08c9427d7 on drillu4.qa.lab:31022]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
IllegalArgumentException: Invalid value for boolean: A


[Error Id: 03cbfc62-c929-4897-80ae-eda08c9427d7 on drillu4.qa.lab:31022]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
 ~[drill-common-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:761)
 [classes/:na]
at 
org.apache.drill.exec.work.foreman.QueryStateProcessor.checkCommonStates(QueryStateProcessor.java:325)
 [classes/:na]
at 
org.apache.drill.exec.work.foreman.QueryStateProcessor.planning(QueryStateProcessor.java:221)
 [classes/:na]
at 
org.apache.drill.exec.work.foreman.QueryStateProcessor.moveToState(QueryStateProcessor.java:83)
 [classes/:na]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:281) 
[classes/:na]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
[na:1.8.0_161]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
[na:1.8.0_161]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected 
exception during fragment initialization: Error while applying rule 
ReduceExpressionsRule(Project), args 
[rel#177367:LogicalProject.NONE.ANY([]).[](input=rel#177366:Subset#0.NONE.ANY([]).[0],b_val=CAST('A'):BOOLEAN
 NOT NULL)]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:282) 
[classes/:na]
... 3 common frames omitted
Caused by: java.lang.RuntimeException: Error while applying rule 
ReduceExpressionsRule(Project), args 
[rel#177367:LogicalProject.NONE.ANY([]).[](input=rel#177366:Subset#0.NONE.ANY([]).[0],b_val=CAST('A'):BOOLEAN
 NOT NULL)]
at 
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:236)
 ~[calcite-core-1.16.0-drill-r0.jar:1.16.0-drill-r0]
at 
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:652)
 ~[calcite-core-1.16.0-drill-r0.jar:1.16.0-drill-r0]
at 
org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:368) 
~[calcite-core-1.16.0-drill-r0.jar:1.16.0-drill-r0]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:419)
 ~[classes/:na]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:359)
 ~[classes/:na]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRawDrel(DefaultSqlHandler.java:258)
 ~[classes/:na]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:308)
 ~[classes/:na]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:178)
 ~[classes/:na]
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:145)
 ~[classes/:na]
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:83)
 ~[classes/:na]
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:567) 
[classes/:na]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:264) 
[classes/:na]
... 3 common frames omitted
Caused by: java.lang.RuntimeException: Error in evaluating function of castBIT
at 
org.apache.drill.exec.expr.fn.interpreter.InterpreterEvaluator$EvalVisitor.visitFunctionHolderExpression(InterpreterEvaluator.java:347)
 ~[classes/:na]
at 
org.apache.drill.exec.expr.fn.interpreter.InterpreterEvaluator$EvalVisitor.visitFunctionHolderExpression(InterpreterEvaluator.java:194)
 ~[classes/:na]
at 
org.apache.drill.common.expression.FunctionHolderExpression.accept(FunctionHolderExpression.java:53)
 ~[drill-logical-1.14.0-SNAPSHOT.jar:1.14.0-SNAPSHOT]
at 
org.apache.drill.exec.expr.fn.interpreter.InterpreterEvaluator.evaluateConstantExpr(InterpreterEvaluator.java:69)
 ~[classes/:na]
at 
org.apache.drill.exec.planner.logical.DrillConstExecutor.reduce(DrillConstExecutor.java:150)
 ~[classes/:na]
at 
org.apache.calcite.rex.RexSimplify.simplifyCast(RexSimplify.java:949) 
~[calcite-core-1.16.0-drill-r0.jar:1.16.0-drill-r0
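The failure boils down to constant folding of CAST('A' AS BOOLEAN): castBIT accepts only recognized boolean literals and throws IllegalArgumentException for anything else. A minimal sketch of that kind of strict parsing (hypothetical helper written for illustration, not Drill's actual implementation):

```java
public class StrictBooleanParser {
    // Accepts common boolean literals and rejects everything else,
    // mirroring the "Invalid value for boolean: A" error in the report above.
    public static boolean parse(String s) {
        if (s.equalsIgnoreCase("true") || s.equalsIgnoreCase("t") || s.equals("1")) {
            return true;
        }
        if (s.equalsIgnoreCase("false") || s.equalsIgnoreCase("f") || s.equals("0")) {
            return false;
        }
        throw new IllegalArgumentException("Invalid value for boolean: " + s);
    }
}
```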

[jira] [Created] (DRILL-6404) Some Tests Timeout On Our Build Server

2018-05-10 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6404:
-

 Summary: Some Tests Timeout On Our Build Server
 Key: DRILL-6404
 URL: https://issues.apache.org/jira/browse/DRILL-6404
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: Timothy Farkas


{code}
  TestLargeFileCompilation.testTOP_N_SORT:157->BaseTestQuery.testNoResult:384 » 
TestTimedOut
  
TestLargeFileCompilation.testEXTERNAL_SORT:151->BaseTestQuery.testNoResult:384 
» TestTimedOut
  TestLargeFileCompilation>BaseTestQuery.closeClient:286 » Runtime Exception 
whi...
  TestGracefulShutdown.testRestApiShutdown:294 » TestTimedOut test timed out 
aft...
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6403) There are two SchemaBuilders that do the same thing. Consolidate them.

2018-05-10 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6403:
-

 Summary: There are two SchemaBuilders that do the same thing. 
Consolidate them.
 Key: DRILL-6403
 URL: https://issues.apache.org/jira/browse/DRILL-6403
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: Timothy Farkas


There is org.apache.drill.test.rowSet.schema.SchemaBuilder used for testing and 
org.apache.drill.exec.record.SchemaBuilder used in Drill. They basically do the 
same thing. We should combine them and have one schema builder.





[jira] [Created] (DRILL-6399) Use RowSets In MiniPlanUnitTestBase To Generate Test Data

2018-05-09 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6399:
-

 Summary: Use RowSets In MiniPlanUnitTestBase To Generate Test Data
 Key: DRILL-6399
 URL: https://issues.apache.org/jira/browse/DRILL-6399
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: Timothy Farkas








[jira] [Created] (DRILL-6398) Combine RowSetTestUtils with RowSetUtilities

2018-05-09 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6398:
-

 Summary: Combine RowSetTestUtils with RowSetUtilities
 Key: DRILL-6398
 URL: https://issues.apache.org/jira/browse/DRILL-6398
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: Timothy Farkas


There are two classes with RowSet utilities; there should be just one.





[jira] [Created] (DRILL-6397) OperatorTestBuilder should leverage RowSets for comparing baseline values.

2018-05-09 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6397:
-

 Summary: OperatorTestBuilder should leverage RowSets for 
comparing baseline values.
 Key: DRILL-6397
 URL: https://issues.apache.org/jira/browse/DRILL-6397
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas








[jira] [Created] (DRILL-6396) Remove unused getTempDir Method in BaseFixture

2018-05-09 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6396:
-

 Summary: Remove unused getTempDir Method in BaseFixture
 Key: DRILL-6396
 URL: https://issues.apache.org/jira/browse/DRILL-6396
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: Timothy Farkas


The getTempDir method is no longer used. The DirTestWatcher and 
BaseDirTestWatcher classes are used instead for testing.





[jira] [Created] (DRILL-6392) System option exec.max_hash_table_size is completely unused. We should remove it.

2018-05-08 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6392:
-

 Summary: System option exec.max_hash_table_size is completely 
unused. We should remove it.
 Key: DRILL-6392
 URL: https://issues.apache.org/jira/browse/DRILL-6392
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas








[jira] [Created] (DRILL-6387) TestTpchDistributedConcurrent tests are ignored, they should be enabled.

2018-05-07 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6387:
-

 Summary: TestTpchDistributedConcurrent tests are ignored, they 
should be enabled.
 Key: DRILL-6387
 URL: https://issues.apache.org/jira/browse/DRILL-6387
 Project: Apache Drill
  Issue Type: Bug
Reporter: Timothy Farkas
Assignee: Arina Ielchiieva


[~arina] I noticed that you disabled TestTpchDistributedConcurrent with your 
change for DRILL-5771.





[jira] [Created] (DRILL-6386) Disallow Unused Imports In Checkstyle

2018-05-07 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6386:
-

 Summary: Disallow Unused Imports In Checkstyle
 Key: DRILL-6386
 URL: https://issues.apache.org/jira/browse/DRILL-6386
 Project: Apache Drill
  Issue Type: Improvement
Reporter: Timothy Farkas
Assignee: Timothy Farkas








[jira] [Created] (DRILL-6380) Mongo db storage plugin tests can hang on jenkins.

2018-05-02 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6380:
-

 Summary: Mongo db storage plugin tests can hang on jenkins.
 Key: DRILL-6380
 URL: https://issues.apache.org/jira/browse/DRILL-6380
 Project: Apache Drill
  Issue Type: Bug
Reporter: Timothy Farkas
Assignee: Timothy Farkas


When running on our Jenkins server, the MongoDB tests hang because the config 
servers take up to 5 seconds to process each request (see *Error 2*). This 
causes the tests to never finish within a reasonable span of time. Online 
reports describe this issue when mixing MongoDB versions, but that is not 
happening in our tests. A possible cause is *Error 1*, which seems to 
indicate that the MongoDB config servers are not completely initialized, since 
the config servers should have a lockping document when starting up.

*Error 1*

{code}
[mongod output] 2018-05-01T23:38:47.468-0700 I COMMAND  [replSetDistLockPinger] 
command config.lockpings command: findAndModify { findAndModify: "lockpings", 
query: { _id: "ConfigServer" }, update: { $set: { ping: new Date(1525243123413) 
} }, upsert: true, writeConcern: { w: "majority", wtimeout: 15000 } } 
planSummary: IDHACK update: { $set: { ping: new Date(1525243123413) } } 
keysExamined:0 docsExamined:0 nMatched:0 nModified:0 upsert:1 keysInserted:2 
numYields:0 reslen:198 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, 
Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 1 } }, 
Metadata: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 1 } } } 
protocol:op_query 4055ms
[mongod output] 2018-05-01T23:38:47.469-0700 W SHARDING [replSetDistLockPinger] 
pinging failed for distributed lock pinger :: caused by :: 
LockStateChangeFailed: findAndModify query predicate didn't match any lock 
document
[mongod output] 2018-05-01T23:38:47.498-0700 I SHARDING [Balancer] lock 
'balancer' successfully forced
[mongod output] 2018-05-01T23:38:47.498-0700 I SHARDING [Balancer] distributed 
lock 'balancer' acquired, ts : 5ae95cd5d1023488104e6282
[mongod output] 2018-05-01T23:38:47.498-0700 I SHARDING [Balancer] CSRS 
balancer thread is recovering
[mongod output] 2018-05-01T23:38:47.498-0700 I SHARDING [Balancer] CSRS 
balancer thread is recovered
[mongod output] 2018-05-01T23:38:48.056-0700 I NETWORK  [thread2] connection 
accepted from 127.0.0.1:50244 #10 (7 connections now open)
{code}

*Error 2*

{code}
[mongod output] 2018-05-01T23:39:37.690-0700 I COMMAND  [conn7] command 
config.settings command: find { find: "settings", filter: { _id: "chunksize" }, 
readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1525243172000|1, 
t: 1 } }, limit: 1, maxTimeMS: 3 } planSummary: EOF keysExamined:0 
docsExamined:0 cursorExhausted:1 numYields:0 nreturned:0 reslen:354 locks:{ 
Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, 
Collection: { acquireCount: { r: 1 } } } protocol:op_command 4988ms
{code}





[jira] [Resolved] (DRILL-6352) Investigate IndexOutOfBoundsException in TestBsonRecordReader

2018-04-30 Thread Timothy Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas resolved DRILL-6352.
---
Resolution: Fixed

> Investigate IndexOutOfBoundsException in TestBsonRecordReader
> -
>
> Key: DRILL-6352
> URL: https://issues.apache.org/jira/browse/DRILL-6352
> Project: Apache Drill
>  Issue Type: Bug
>    Reporter: Timothy Farkas
>        Assignee: Timothy Farkas
>Priority: Major
>
> TestBsonRecordReader requires 400kb on the allocator in order to run all 
> tests successfully. Reducing the memory below that to 300kb will cause an IOB
> {code}
> objc[92518]: Class JavaLaunchHelper is implemented in both 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_144.jdk/Contents/Home/bin/java 
> (0x10e8fe4c0) and 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_144.jdk/Contents/Home/jre/lib/libinstrument.dylib
>  (0x10e9824e0). One of the two will be used. Which one is undefined.
> java.lang.IndexOutOfBoundsException: DrillBuf[7], udle: [1 0..0], index: 0, 
> length: 4 (expected: range(0, 0))
> DrillBuf[7], udle: [1 0..0]
>   at 
> org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80)
>   at 
> org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86)
>   at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114)
>   at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484)
>   at 
> org.apache.drill.exec.vector.VarCharVector$Mutator.setSafe(VarCharVector.java:696)
>   at 
> org.apache.drill.exec.vector.NullableVarCharVector$Mutator.setSafe(NullableVarCharVector.java:609)
>   at 
> org.apache.drill.exec.vector.complex.impl.NullableVarCharWriterImpl.write(NullableVarCharWriterImpl.java:110)
>   at 
> org.apache.drill.exec.store.bson.BsonRecordReader.writeString(BsonRecordReader.java:276)
>   at 
> org.apache.drill.exec.store.bson.BsonRecordReader.writeToListOrMap(BsonRecordReader.java:167)
>   at 
> org.apache.drill.exec.store.bson.BsonRecordReader.writeToListOrMap(BsonRecordReader.java:139)
>   at 
> org.apache.drill.exec.store.bson.BsonRecordReader.write(BsonRecordReader.java:75)
>   at 
> org.apache.drill.exec.store.bson.TestBsonRecordReader.testRecursiveDocuments(TestBsonRecordReader.java:193)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
>   at 
> com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}
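The IOB comes from DrillBuf's bounds checking once the vector's backing buffer has zero capacity (the "udle: [1 0..0]" in the trace): reading 4 bytes at index 0 cannot fit in range(0, 0). A minimal sketch of the kind of range check involved (hypothetical helper for illustration, not the actual BoundsChecking code):

```java
public class Bounds {
    // Throws when [index, index + length) does not fit inside the buffer's
    // capacity, e.g. reading 4 bytes at index 0 from a zero-capacity buffer.
    public static void checkIndex(int index, int length, int capacity) {
        if (index < 0 || length < 0 || index + length > capacity) {
            throw new IndexOutOfBoundsException(
                "index: " + index + ", length: " + length
                + " (expected: range(0, " + capacity + "))");
        }
    }
}
```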





Re: [ANNOUNCE] New Committer: Sorabh Hamirwasia

2018-04-30 Thread Timothy Farkas
Congratulations, Sorabh! Well deserved.

Tim


From: Bridget Bevens 
Sent: Monday, April 30, 2018 10:01:26 AM
To: dev@drill.apache.org
Subject: Re: [ANNOUNCE] New Committer: Sorabh Hamirwasia

Congratulations, Sorabh!


From: Vlad Rozov 
Sent: Monday, April 30, 2018 9:45:27 AM
To: dev@drill.apache.org
Subject: Re: [ANNOUNCE] New Committer: Sorabh Hamirwasia

Congrats!

Thank you,

Vlad

On 4/30/18 09:08, Arina Ielchiieva wrote:
> Congrats!
>
> Kind regards
> Arina
>
> On Mon, Apr 30, 2018 at 6:47 PM Vitalii Diravka 
> wrote:
>
>> Well deserved, Sorabh!
>>
>> Kind regards
>> Vitalii
>>
>>
>> On Mon, Apr 30, 2018 at 6:37 PM Abhishek Girish 
>> wrote:
>>
>>> Congrats Sorabh!!
>>> On Mon, Apr 30, 2018 at 8:35 AM Aman Sinha  wrote:
>>>
 The Project Management Committee (PMC) for Apache Drill has invited
>>> Sorabh
 Hamirwasia  to become a committer, and we are pleased to announce that
>> he
 has accepted.

 Over the last 1 1/2 years Sorabh's contributions have been in a few
 different areas. He took
 the lead in designing and implementing network encryption support for
 Drill. He has contributed
 to the web server and UI side.  More recently, he is involved in design
>>> and
 implementation of the lateral join operator.

 Welcome Sorabh, and thank you for your contributions.  Keep up the good
 work !

 -Aman
 (on behalf of Drill PMC)




[jira] [Created] (DRILL-6352) Investigate Why TestBsonRecordReader needs 900kb to run

2018-04-24 Thread Timothy Farkas (JIRA)
Timothy Farkas created DRILL-6352:
-

 Summary: Investigate Why TestBsonRecordReader needs 900kb to run
 Key: DRILL-6352
 URL: https://issues.apache.org/jira/browse/DRILL-6352
 Project: Apache Drill
  Issue Type: Bug
Reporter: Timothy Farkas


TestBsonRecordReader requires 400kb on the allocator in order to run all tests 
successfully. This seems like too much. Reducing the memory below that to 
300kb will cause an IndexOutOfBoundsException:

{code}
objc[92518]: Class JavaLaunchHelper is implemented in both 
/Library/Java/JavaVirtualMachines/jdk1.8.0_144.jdk/Contents/Home/bin/java 
(0x10e8fe4c0) and 
/Library/Java/JavaVirtualMachines/jdk1.8.0_144.jdk/Contents/Home/jre/lib/libinstrument.dylib
 (0x10e9824e0). One of the two will be used. Which one is undefined.

java.lang.IndexOutOfBoundsException: DrillBuf[7], udle: [1 0..0], index: 0, 
length: 4 (expected: range(0, 0))
DrillBuf[7], udle: [1 0..0]

at 
org.apache.drill.exec.memory.BoundsChecking.checkIndex(BoundsChecking.java:80)
at 
org.apache.drill.exec.memory.BoundsChecking.lengthCheck(BoundsChecking.java:86)
at io.netty.buffer.DrillBuf.chk(DrillBuf.java:114)
at io.netty.buffer.DrillBuf.getInt(DrillBuf.java:484)
at 
org.apache.drill.exec.vector.VarCharVector$Mutator.setSafe(VarCharVector.java:696)
at 
org.apache.drill.exec.vector.NullableVarCharVector$Mutator.setSafe(NullableVarCharVector.java:609)
at 
org.apache.drill.exec.vector.complex.impl.NullableVarCharWriterImpl.write(NullableVarCharWriterImpl.java:110)
at 
org.apache.drill.exec.store.bson.BsonRecordReader.writeString(BsonRecordReader.java:276)
at 
org.apache.drill.exec.store.bson.BsonRecordReader.writeToListOrMap(BsonRecordReader.java:167)
at 
org.apache.drill.exec.store.bson.BsonRecordReader.writeToListOrMap(BsonRecordReader.java:139)
at 
org.apache.drill.exec.store.bson.BsonRecordReader.write(BsonRecordReader.java:75)
at 
org.apache.drill.exec.store.bson.TestBsonRecordReader.testRecursiveDocuments(TestBsonRecordReader.java:193)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at 
com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
at 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at 
com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)

{code}




