Using Tableau to connect to DB engines using Calcite's JDBC driver

2017-09-19 Thread Muhammad Gelbana
Tableau supports the Apache Drill JDBC driver, so you can basically use Drill
as a data provider for Tableau.

I'm asking if anyone has implemented a Calcite adapter for some data engine and
tested whether Tableau could connect to it as if it were Apache Drill?

That is, you would connect to that adapter from Tableau by configuring an
Apache Drill connection that points to it.

Otherwise, that data engine will need its own ODBC driver, which is clearly a
pain in the neck if you Google around enough. That's actually what I'm trying
to do: I need to implement a Calcite adapter to support a data engine, but
supporting Tableau is essential to our customers, and I'd be very happy if I
could avoid going down the Calcite ODBC driver path.
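
To make the idea concrete, here is a minimal sketch of what I mean by going
through Calcite's JDBC driver (the model file path and the schema/table names
are placeholders, and this is a plain JDBC client rather than Tableau itself):

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CalciteJdbcSketch {
  public static void main(String[] args) throws Exception {
    // Calcite's JDBC driver reads a JSON model file that registers the
    // custom adapter (schema factory) for the underlying data engine.
    String url = "jdbc:calcite:model=/path/to/model.json";
    try (Connection conn = DriverManager.getConnection(url);
         Statement stmt = conn.createStatement();
         // Placeholder query against a schema/table declared in the model file.
         ResultSet rs = stmt.executeQuery("SELECT * FROM myschema.mytable")) {
      while (rs.next()) {
        System.out.println(rs.getString(1));
      }
    }
  }
}
{code}

The adapter itself would live behind a URL like that; the open question above
is whether Tableau can be pointed at such an endpoint as if it were Drill.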

I apologize if this sounds like a Calcite question, but I believe the Drill
developers who worked on the JDBC driver can offer good insight.

If you ask me, I believe Drill is essentially Calcite in distributed mode :D.
This may very well be a sketchy point of view, but I'm not experienced with
Drill or Calcite myself.

Hopefully I explained myself clearly.

Thanks,
Gelbana


Re: Drill 2.0 (design) hackathon

2017-09-06 Thread Muhammad Gelbana
Understood. But if it's possible to stream the event, maybe we can do the
streaming through YouTube too, which can archive the stream afterwards,
though only for up to 8 hours.

I'm not a YouTube expert though.

https://support.google.com/youtube/answer/6247592

I'm just afraid I may not be able to attend, and I'm very interested in what
you guys are going to discuss.

On Sep 7, 2017 1:07 AM, "Pritesh Maker" <pma...@mapr.com> wrote:

> Hi
>
> We don't plan on recording the event (it's a day long event!) but are
> looking at options to have a WebEx or Hangout link if folks want to join
> remotely.
>
> Pritesh
> _________
> From: Muhammad Gelbana <m.gelb...@gmail.com<mailto:m.gelb...@gmail.com>>
> Sent: Wednesday, September 6, 2017 1:08 AM
> Subject: Re: Drill 2.0 (design) hackathon
> To: <dev@drill.apache.org<mailto:dev@drill.apache.org>>
>
>
> Would anyone kindly own the recording of the event ?
>
> On Sep 6, 2017 7:47 AM, "Aman Sinha" <amansi...@apache.org<mailto:amansi...@apache.org>> wrote:
>
> > Here is the Eventbrite event for registration:
> >
> > https://www.eventbrite.com/e/drill-developer-day-sept-2017-
> > registration-7478463285
> >
> > Please register so we can plan for food and drinks appropriately.
> >
> > The link also contains a google doc link for the preliminary agenda and a
> > 'Topics' tab with volunteer sign-up column. Please add your name to the
> > area(s) of interest.
> >
> > Thanks and look forward to seeing you all !
> >
> > -Aman
> >
> > On Wed, Aug 30, 2017 at 9:44 AM, Paul Rogers <prog...@mapr.com<mailto:prog...@mapr.com>> wrote:
> >
> > > A partial list of Drill’s public APIs:
> > >
> > > IMHO, highest priority for Drill 2.0.
> > >
> > >
> > > * JDBC/ODBC drivers
> > > * Client (for JDBC/ODBC) + ODBC & JDBC
> > > * Client (for full Drill async, columnar)
> > > * Storage plugin
> > > * Format plugin
> > > * System/session options
> > > * Queueing (e.g. ZK-based queues)
> > > * Rest API
> > > * Resource Planning (e.g. max query memory per node)
> > > * Metadata access, storage (e.g. file system locations vs. a
> > metastore)
> > > * Metadata files formats (Parquet, views, etc.)
> > >
> > > Lower priority for future releases:
> > >
> > >
> > > * Query Planning (e.g. Calcite rules)
> > > * Config options
> > > * SQL syntax, especially Drill extensions
> > > * UDF
> > > * Management (e.g. JMX, Rest API calls, etc.)
> > > * Drill File System (HDFS)
> > > * Web UI
> > > * Shell scripts
> > >
> > > There are certainly more. Please suggest those that are missing. I’ve
> > > taken a rough cut at which APIs need forward/backward compatibility
> > first,
> > > in part based on those that are the “most public” and most likely to
> > > change. Others are important, but we can’t do them all at once.
> > >
> > > Thanks,
> > >
> > > - Paul
> > >
> > > On Aug 29, 2017, at 6:00 PM, Aman Sinha <amansi...@apache.org<mailto:amansi...@apache.org>> wrote:
> > >
> > > Hi Paul,
> > > certainly makes sense to have the API compatibility discussions during
> > this
> > > hackathon. The 2.0 release may be a good checkpoint to introduce
> > breaking
> > > changes necessitating changes to the ODBC/JDBC drivers and other
> external
> > > applications. As part of this exercise (not during the hackathon but
> as a
> > > follow-up action), we also should clearly identify the "public"
> > interfaces.
> > >
> > >
> > > I will add this to the agenda.
> > >
> > > thanks,
> > > -Aman
> > >
> > > On Tue, Aug 29, 2017 at 2:08 PM, Paul Rogers <prog...@mapr.com<mailto:prog...@mapr.com>> wrote:
> > >
> > > Thanks Aman for organizing the Hackathon!
> > >
> > > The list included many good ideas for Drill 2.0. Some of those require
> > > changes to Drill’s “public” interfaces (file format, client protocol,
> SQL
> > > behavior, etc.)
> > >
> > > At present, Drill has no good mechanism to handle backward/forward
> > > compatibility at the API level. Protobuf versioning certainly helps,
> but
> > > can’t completely solve semantic changes (where a f

Re: Drill 2.0 (design) hackathon

2017-09-06 Thread Muhammad Gelbana
Would anyone kindly own the recording of the event ?

On Sep 6, 2017 7:47 AM, "Aman Sinha"  wrote:

> Here is the Eventbrite event for registration:
>
> https://www.eventbrite.com/e/drill-developer-day-sept-2017-
> registration-7478463285
>
> Please register so we can plan for food and drinks appropriately.
>
> The link also contains a google doc link for the preliminary agenda and a
> 'Topics' tab with volunteer sign-up column.  Please add your name to the
> area(s) of interest.
>
> Thanks and look forward to seeing you all !
>
> -Aman
>
> On Wed, Aug 30, 2017 at 9:44 AM, Paul Rogers  wrote:
>
> > A partial list of Drill’s public APIs:
> >
> > IMHO, highest priority for Drill 2.0.
> >
> >
> >   *   JDBC/ODBC drivers
> >   *   Client (for JDBC/ODBC) + ODBC & JDBC
> >   *   Client (for full Drill async, columnar)
> >   *   Storage plugin
> >   *   Format plugin
> >   *   System/session options
> >   *   Queueing (e.g. ZK-based queues)
> >   *   Rest API
> >   *   Resource Planning (e.g. max query memory per node)
> >   *   Metadata access, storage (e.g. file system locations vs. a
> metastore)
> >   *   Metadata files formats (Parquet, views, etc.)
> >
> > Lower priority for future releases:
> >
> >
> >   *   Query Planning (e.g. Calcite rules)
> >   *   Config options
> >   *   SQL syntax, especially Drill extensions
> >   *   UDF
> >   *   Management (e.g. JMX, Rest API calls, etc.)
> >   *   Drill File System (HDFS)
> >   *   Web UI
> >   *   Shell scripts
> >
> > There are certainly more. Please suggest those that are missing. I’ve
> > taken a rough cut at which APIs need forward/backward compatibility
> first,
> > in part based on those that are the “most public” and most likely to
> > change. Others are important, but we can’t do them all at once.
> >
> > Thanks,
> >
> > - Paul
> >
> > On Aug 29, 2017, at 6:00 PM, Aman Sinha <amansi...@apache.org<mailto:amansi...@apache.org>> wrote:
> >
> > Hi Paul,
> > certainly makes sense to have the API compatibility discussions during
> this
> > hackathon.  The 2.0 release may be a good checkpoint to introduce
> breaking
> > changes necessitating changes to the ODBC/JDBC drivers and other external
> > applications. As part of this exercise (not during the hackathon but as a
> > follow-up action), we also should clearly identify the "public"
> interfaces.
> >
> >
> > I will add this to the agenda.
> >
> > thanks,
> > -Aman
> >
> > On Tue, Aug 29, 2017 at 2:08 PM, Paul Rogers <prog...@mapr.com<mailto:prog...@mapr.com>> wrote:
> >
> > Thanks Aman for organizing the Hackathon!
> >
> > The list included many good ideas for Drill 2.0. Some of those require
> > changes to Drill’s “public” interfaces (file format, client protocol, SQL
> > behavior, etc.)
> >
> > At present, Drill has no good mechanism to handle backward/forward
> > compatibility at the API level. Protobuf versioning certainly helps, but
> > can’t completely solve semantic changes (where a field changes meaning,
> or
> > a non-Protobuf data chunk changes format.) As just one concrete example,
> > changing to Arrow will break pre-Arrow ODBC/JDBC drivers because class
> > names and data formats will change.
> >
> > Perhaps we can prioritize, for the proposed 2.0 release, a one-time set
> of
> > breaking changes that introduce a versioning mechanism into our public
> > APIs. Once these are in place, we can evolve the APIs in the future by
> > following the newly-created versioning protocol.
> >
> > Without such a mechanism, we cannot support old & new clients in the same
> > cluster. Nor can we support rolling upgrades. Of course, another solution
> > is to get it right the second time, then freeze all APIs and agree to
> never
> > again change them. Not sure we have sufficient access to a crystal ball
> to
> > predict everything we’d ever need in our APIs, however...
> >
> > Thanks,
> >
> > - Paul
> >
> > On Aug 24, 2017, at 8:39 AM, Aman Sinha <amansi...@apache.org<mailto:amansi...@apache.org>> wrote:
> >
> > Drill Developers,
> >
> > In order to kick-start the Drill 2.0  release discussions, I would like
> > to
> > propose a Drill 2.0 (design) hackathon (a.k.a. Drill Developer Day ™ :) ).
> >
> > As I mentioned in the hangout on Tuesday,  MapR has offered to host it on
> > Sept 18th at their offices at 350 Holger Way, San Jose.   Hope that works
> > for most of you!
> >
> > The goal is to get the community together for a day-long technical
> > discussion on key topics in preparation for a Drill 2.0 release as well
> > as
> > potential improvements in upcoming 1.xx releases.  Depending on the
> > interest areas, we could form groups and have a volunteer lead each
> > group.
> >
> > Based on prior discussions on the dev list, hangouts and existing JIRAs,
> > there is already a substantial set of topics and I have summarized a few
> > of
> > them below.   What other topics do folks want to talk about?   Feel free
> > to
> > respond to this thread and I will create a google doc to 

Re: Custom extension storage plugin development

2017-08-29 Thread Muhammad Gelbana
I assume this page

can give you the overview you are looking for? Maybe you can start by asking
specific questions.
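
Very roughly, and only as a sketch (the class and property names below are
made up, and the exact base-class signatures differ between Drill versions),
a storage plugin starts from a Jackson-annotated config class that Drill
picks up by its JSON type name:

{code:java}
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.annotation.JsonTypeName;
import org.apache.drill.common.logical.StoragePluginConfig;

// Hypothetical config; the @JsonTypeName value is the "type" that appears in
// the storage plugin JSON on the web UI.
@JsonTypeName("myengine")
public class MyEngineConfig extends StoragePluginConfig {

  private final String endpoint; // illustrative connection property

  @JsonCreator
  public MyEngineConfig(@JsonProperty("endpoint") String endpoint) {
    this.endpoint = endpoint;
  }

  public String getEndpoint() {
    return endpoint;
  }

  @Override
  public boolean equals(Object o) {
    return this == o
        || (o instanceof MyEngineConfig
            && java.util.Objects.equals(endpoint, ((MyEngineConfig) o).endpoint));
  }

  @Override
  public int hashCode() {
    return java.util.Objects.hash(endpoint);
  }
}
{code}

The matching plugin class (extending Drill's AbstractStoragePlugin) is what
turns this config into schemas and scans; the wiki pages above walk through
that part.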

Thanks,
Gelbana

On Tue, Aug 29, 2017 at 12:26 PM, Charuta Rajopadhye 
wrote:

> Hi Team,
>
> I need to develop an extension storage plugin for my data source.
> I have been through this
>  write-custom-storage-plugin-for-apache-drill/37646421#37646421>
> and Apache Drill's documentation  but did
> not find helpful references for my cause.
> Thanks to the mailing list, i got hold of a few wiki pages:
> https://github.com/paul-rogers/drill/wiki/Storage-Plugin-Configuration
> *https://github.com/paul-rogers/drill/wiki/Storage-Plugin-Model
> *
> and other related pages on the same link.
> I have downloaded the drill source code, cross referenced the storage
> plugin implementations in there and the information on the wiki pages,
> learnt about the interfaces to implement, classes to use, configs to set up
> etc.
> This has made my understanding a smidgen better, but i am still unable to
> get a clear picture of the workflow.
> Can someone please guide me in this regard or provide a few more pointers?
>


Re: Test cases that require a UTC timezone.

2017-08-28 Thread Muhammad Gelbana
In the "Drill developer guide or code organization" thread
​ you asked me what am I struggling with, well let's discuss this in this
thread as it provides more context.


Frankly, I can't claim I fully understood what you said, so I tried to
understand the problem on my own, but I haven't succeeded yet. Let me break
down the parts of what you said that I don't understand:

"
which is then stored using an offset from the epoch UTC
​"
Stored where ? The query used by the test case is querying a literal.

What I'm struggling with is getting the test cases to complete 100%
successfully. I thought I needed to set the timezone for the server (the Drill
instance?) started by the test cases, but I couldn't find the code that
actually starts the server.
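
The only general approach I can think of (a plain-JVM sketch, not necessarily
how Drill's test framework is meant to be configured) is to force the default
timezone before the embedded server is started, e.g.:

{code:java}
import java.util.TimeZone;

import org.junit.BeforeClass;

public class ForceUtcForTests {

  @BeforeClass
  public static void forceUtc() {
    // Force the whole test JVM into UTC before any date/time value is
    // materialized; whatever the test harness starts afterwards inherits it.
    System.setProperty("user.timezone", "UTC");
    TimeZone.setDefault(TimeZone.getTimeZone("UTC"));
  }
}
{code}

Passing -Duser.timezone=UTC to the forked test JVM should have the same
effect, but I'd rather understand the intended way than hack around it.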

Also, since other test cases are ignored (i.e. marked with @Ignore) because
they depend on timezones, why aren't the test cases that are failing for me
ignored as well? They seem to depend on timezones too.


Thanks,
Gelbana

On Sun, Jul 9, 2017 at 6:39 PM, Paul Rogers <prog...@mapr.com> wrote:

> Hi Muhammad,
>
> While I can’t comment on the specific test cases, I can say that Drill
> always uses the server’s own timezone to hold dates and times. Not sure how
> this is affecting the tests, but the same date/time will have a different
> numeric value in each time zone. That is, “2 PM on July 9, 2017” is
> interpreted as “2 PM on July 9, 2017 in your server's time zone”, which is
> then stored using an offset from the epoch UTC, but with that value
> reinterpreted as an offset from the epoch in your time zone.
>
> The reinterpreted UTC value is then sent to the client where it is
> reinterpreted again as an offset from the epoch in the client’s own
> timezone. So, on your server, “2 PM on July 9, 2017” is interpreted as “2
> PM on July 9, 2017 GMT+2”, but when I connect to the server, do a query,
> and obtain time data, it is reinterpreted as “2 PM on July 9, 2017 GMT-8."
>
> This mostly works, but does lead to the well known issues that Joda time
> (and, later, the JDK 8 time library) were designed to resolve.
>
> So, to run time tests, you may have to understand our somewhat convoluted
> time mapping to make things work.
>
> Thanks,
>
> - Paul
>
> > On Jul 9, 2017, at 7:47 AM, Muhammad Gelbana <m.gelb...@gmail.com>
> wrote:
> >
> > While trying to run Drill's test cases
> > <https://issues.apache.org/jira/browse/DRILL-5606>, I found that one of
> the
> > failing tests would succeed
> > <https://issues.apache.org/jira/browse/DRILL-5606?
> focusedCommentId=16079131=com.atlassian.jira.
> plugin.system.issuetabpanels:comment-tabpanel#comment-16079131>
> > if the timezone was set to UTC (Mine is GMT+2).
> >
> > When I looked around for other test cases that may require timezones, I
> > found a couple of tests ignored (Marked with @Ignore) because they depend
> > on timezones !
> >
> > Would someone please tell me how can I set the timezone for a test case ?
> > Also sharing a guide about Drill's tests classes, packages,
> > architecture...etc, would be very helpful.
> >
> > -Gelbana
>
>


Re: Drill developer guide or code organization

2017-08-26 Thread Muhammad Gelbana
I agree with that. Having documentation that guides potential committers
through the code can help many achieve their tasks and grow the community.
I myself am struggling a bit with the test-case framework, though I'm not
giving it my full time.

Anyway, here is a list of all the GitHub wikis for Drill forks:

https://github.com/paul-rogers/drill/wiki
https://github.com/parthchandra/drill/wiki
https://github.com/kkhatua/drill/wiki
https://github.com/bitblender/drill/wiki
https://github.com/chunhui-shi/drill/wiki
https://github.com/xiaom/drill/wiki
https://github.com/jacques-n/drill/wiki
https://github.com/XingCloud/incubator-drill/wiki (Chinese)

Thanks,
Gelbana

On Sat, Aug 26, 2017 at 3:07 PM, Aditya Allamraju <
aditya.allamr...@gmail.com> wrote:

> Team,
>
> Is there a place where we have documented different Code components of
> Drill?
> What i am looking for is something similar to
> https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide (mainly
> the
> part with code organization)
> I looked at apache docs. But could not find the above info in "developer
> information".
>
> I request the active members of the group to share such info. If it is not
> yet there, can someone please put up a doc for a start briefly mentioning
> different components and problem they are solving.
> Such information will greatly help the newcomers to this community.
>
> Appreciate all the efforts going on in this group.
>
> Thanks
> Aditya
>


Re: Missing versions for "Affected Versions" and "Fix Version" fields ?

2017-08-25 Thread Muhammad Gelbana
They are showing up again, thanks Pritesh.

Thanks,
Gelbana

On Sat, Aug 26, 2017 at 1:26 AM, Pritesh Maker <pma...@mapr.com> wrote:

> I was trying to clean up some of the older versions and I didn’t realize
> that marking the older versions as archived will also affect the “Affected
> Versions” field.
> Please try it again – I have undone the change.
>
>
>
> On 8/25/17, 4:15 PM, "Kunal Khatua" <kkha...@mapr.com> wrote:
>
> That is very odd!
>
> I'd expect Fix Version to probably get locked down for already
> released versions, but it is odd that affected versions isn’t showing up.
>
> But I do see older JIRAs with older versions visible:
> For e.g.
> 1.10 https://issues.apache.org/jira/projects/DRILL/versions/12338769
>
> But in listing of all versions, it is missing:
> https://issues.apache.org/jira/projects/DRILL?
> selectedItem=com.atlassian.jira.jira-projects-plugin:
> release-page=all
>
>
> -Original Message-
> From: Muhammad Gelbana [mailto:m.gelb...@gmail.com]
> Sent: Friday, August 25, 2017 4:00 PM
> To: dev@drill.apache.org
> Subject: Missing versions for "Affected Versions" and "Fix Version"
> fields ?
>
> For the "Affected Versions" and "Fix Version" JIRA fields, I can only
> find the following versions:
>
>- 1.11.0
>- 1.12.0
>- 2.0
>- Future
>
> Aren't versions 1.10.0 and earlier supported anymore ?
>
> Thanks,
> Gelbana
>
>
>


Missing versions for "Affected Versions" and "Fix Version" fields ?

2017-08-25 Thread Muhammad Gelbana
For the "Affected Versions" and "Fix Version" JIRA fields, I can only find
the following versions:

   - 1.11.0
   - 1.12.0
   - 2.0
   - Future

Aren't versions 1.10.0 and earlier supported anymore ?

Thanks,
Gelbana


[jira] [Created] (DRILL-5735) UI options grouping and filtering & Metrics hints

2017-08-21 Thread Muhammad Gelbana (JIRA)
Muhammad Gelbana created DRILL-5735:
---

 Summary: UI options grouping and filtering & Metrics hints
 Key: DRILL-5735
 URL: https://issues.apache.org/jira/browse/DRILL-5735
 Project: Apache Drill
  Issue Type: Improvement
  Components: Web Server
Affects Versions: 1.11.0, 1.10.0, 1.9.0
Reporter: Muhammad Gelbana


I can think of some UI improvements that could make all the difference for 
users trying to optimize low-performing queries.

h2. Options
h3. Grouping
We can group the options by their scope of effect; this will help users easily
locate the options they may need to tune.
h3. Filtering
Since there are a lot of options, we can add a filtering mechanism (i.e. string
search or group/scope filtering) so the user can filter out the options he's
not interested in. To provide more benefit than the grouping idea mentioned
above, filtering may also match keywords and not just the option name, since
the user may not be aware of the name of the option he's looking for.
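
A rough sketch of the matching logic I have in mind (names are illustrative,
not actual Drill classes):

{code:java}
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Illustrative only; not an existing Drill class.
public class OptionFilterSketch {

  static class OptionRow {
    final String name;
    final String description;

    OptionRow(String name, String description) {
      this.name = name;
      this.description = description;
    }
  }

  // Match the search term against both the option name and its description,
  // so a user can find an option without knowing its exact name.
  static List<OptionRow> filter(List<OptionRow> options, String term) {
    String t = term.toLowerCase(Locale.ROOT);
    return options.stream()
        .filter(o -> o.name.toLowerCase(Locale.ROOT).contains(t)
            || o.description.toLowerCase(Locale.ROOT).contains(t))
        .collect(Collectors.toList());
  }
}
{code}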

h2. Metrics
I'm referring here to the metrics page and the query execution plan page that
displays the overview section and the major/minor fragment metrics. We can show
hints for each metric, such as:
# What it represents, in more detail.
# Which option or scope of options to tune (increase? decrease?) to improve the
performance reported by this metric.
# Maybe even provide a small dialog to quickly allow modifying the option(s)
related to that metric.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: direct.used on the Metrics page exceeded planner.memory.max_query_memory_per_node while running a single query

2017-08-15 Thread Muhammad Gelbana
I guess that's why someone recommended specifying a low limit, and then
specifying a higher one for each query before execution using the *ALTER
SESSION* command. I can't remember where I read that.
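
Something along these lines, I believe (a sketch over JDBC; the option name is
the real one, the connection string and the query are placeholders):

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PerQuerySessionLimit {
  public static void main(String[] args) throws Exception {
    // Placeholder direct drillbit connection.
    try (Connection conn = DriverManager.getConnection("jdbc:drill:drillbit=localhost:31010");
         Statement stmt = conn.createStatement()) {
      // Raise the per-node query memory budget for this session only (8 GB here),
      // so the system-wide default can stay low.
      stmt.execute("ALTER SESSION SET `planner.memory.max_query_memory_per_node` = 8589934592");
      // Stand-in for the actual heavy query.
      stmt.executeQuery("SELECT * FROM sys.version").close();
    }
  }
}
{code}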

Thanks,
Gelbana

On Tue, Aug 15, 2017 at 7:49 PM, Kunal Khatua <kkha...@mapr.com> wrote:

> The property *planner.memory.max_query_memory_per_node* is a cumulative
> limit of all the operators' minor fragments' memory consumption.
>
> However, like Boaz pointed out, the truly memory hungry operators like
> Hash, Sort or Scan operators will take the lion's share of a query's
> memory. Since, some operators which cannot spill to disk as yet (e.g. Hash
> Join) will continue to grab as much memory as they need, you can see the
> memory go up beyond the limit defined.
>
> So, with a 4GB limit, you should see the query constraint within that
> total for each node. It might be a bit higher because there are other
> overheads, like SCAN, but I doubt it would double!
>
> Also, the property 'direct.used' only shows the memory currently held by
> Netty. So, it could be a tad bit misleading (unless you know for sure that
> each subsequent query needs more memory, in which case you'll see the value
> rise).
>
> Boaz might be able to point you to the 10GB limit.
>
>
> -Original Message-
> From: Muhammad Gelbana [mailto:m.gelb...@gmail.com]
> Sent: Tuesday, August 15, 2017 7:43 AM
> To: dev@drill.apache.org
> Subject: Re: direct.used on the Metrics page exceeded
> planner.memory.max_query_memory_per_node while running a single query
>
> By "instance", you mean minor fragments, correct ? And does the
> *planner.memory.max_query_memory_per_node* limit apply to each *type* of
> minor fragments individually ? Assuming the memory limit is set to *4 GB*,
> and the running query involves external sort and hash aggregates, should I
> expect the query to consume at least *8 GB* ?
>
> Would you please point out to me, where in the code can I look into the
> implementation of this 10GB memory limit ?
>
> Thanks,
> Gelbana
>
> On Tue, Aug 15, 2017 at 2:40 AM, Boaz Ben-Zvi <bben-...@mapr.com> wrote:
>
> > There is this page: https://drill.apache.org/docs/
> > sort-based-and-hash-based-memory-constrained-operators/
> > But it seems out of date (correct for 1.10). It does not explain about
> > the hash operators, except that they run till they can not allocate
> > any more memory. This will happen (undocumented) at either 10GB per
> > instance, or when there is no more memory at the node.
> >
> >  Boaz
> >
> > On 8/14/17, 1:16 PM, "Muhammad Gelbana" <m.gelb...@gmail.com> wrote:
> >
> > I'm not sure which version I was using when that happened. But
> > that's some
> > precise details you've mentioned! Is this mentioned somewhere in
> > the docs ?
> >
> > Thanks a lot.
> >
> > On Aug 14, 2017 9:00 PM, "Boaz Ben-Zvi" <bben-...@mapr.com> wrote:
> >
> > > Did your query include a hash join ?
> > >
> > > As of 1.11, only the External Sort and Hash Aggregate operators
> > obey the
> > > memory limit (that is, the “max query memory per node” figure is
> > divided
> > > among all the instances of these operators).
> > > The Hash Join (as was before 1.11) still does not take part in
> > this memory
> > > allocation scheme, and each instance may use up to 10GB.
> > >
> > > Also in 1.11, the Hash Aggregate may “fall back” to the 1.10
> behavior
> > > (same as the Hash Join; i.e. up to 10GB) in case there is too
> > little memory
> > > per an instance (because it cannot perform memory spilling,
> > which requires
> > > some minimal memory to hold multiple batches).
> > >
> > >Thanks,
> > >
> > >   Boaz
> > >
> > > On 8/11/17, 4:25 PM, "Muhammad Gelbana" <m.gelb...@gmail.com>
> wrote:
> > >
> > > Sorry for the long subject !
> > >
> > > I'm running a single query on a single node Drill setup.
> > >
> > > I assumed that setting the *planner.memory.max_query_
> > memory_per_node*
> > > property
> > > controls the max amount of memory (in bytes) for each running
> on
> > a
> > > single
> > > node. Which means that in my setup, the *direct.used* metric in
> > the
> > > metrics
> > > page should never exceed that value in my case.
> > >
> > > But it did and drastically. I assigned *34359738368* (32 GB) to
> > the
> > > *planner.memory.max_query_memory_per_node* option but while
> > > monitoring the
> > > *direct.used* metric, I found that it reached *51640484458*
> (~48
> > GB).
> > >
> > > What did I mistakenly do\interpret ?
> > >
> > > Thanks,
> > > Gelbana
> > > ​​
> > >
> > >
> > >
> >
> >
> >
>


Re: direct.used on the Metrics page exceeded planner.memory.max_query_memory_per_node while running a single query

2017-08-15 Thread Muhammad Gelbana
By "instance", you mean minor fragments, correct ? And does the
*planner.memory.max_query_memory_per_node* limit apply to each *type* of
minor fragments individually ? Assuming the memory limit is set to *4 GB*,
and the running query involves external sort and hash aggregates, should I
expect the query to consume at least *8 GB* ?
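
To make the question concrete, here is the arithmetic behind the two
interpretations I'm unsure about (the instance counts are made up):

{code:java}
public class QueryMemoryMath {
  public static void main(String[] args) {
    long limit = 4L * 1024 * 1024 * 1024; // planner.memory.max_query_memory_per_node = 4 GB
    int sortInstances = 8;                // minor fragments running External Sort
    int hashAggInstances = 8;             // minor fragments running Hash Aggregate

    // Interpretation A: the 4 GB is divided among ALL instances of the
    // memory-managed operators together, so the query stays within ~4 GB.
    long perInstanceA = limit / (sortInstances + hashAggInstances);

    // Interpretation B: each operator TYPE gets the full 4 GB for its own
    // instances, so sort plus hash aggregate could together reach ~8 GB.
    long perInstanceB = limit / sortInstances; // plus limit / hashAggInstances for the aggregates

    System.out.println("A: " + perInstanceA + " bytes per instance, ~4 GB total");
    System.out.println("B: " + perInstanceB + " bytes per instance, ~8 GB total");
  }
}
{code}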

Would you also please point me to where in the code I can look at the
implementation of this 10GB memory limit?

Thanks,
Gelbana

On Tue, Aug 15, 2017 at 2:40 AM, Boaz Ben-Zvi <bben-...@mapr.com> wrote:

> There is this page: https://drill.apache.org/docs/
> sort-based-and-hash-based-memory-constrained-operators/
> But it seems out of date (correct for 1.10). It does not explain about the
> hash operators, except that they run till they can not allocate any more
> memory. This will happen (undocumented) at either 10GB per instance, or
> when there is no more memory at the node.
>
>  Boaz
>
> On 8/14/17, 1:16 PM, "Muhammad Gelbana" <m.gelb...@gmail.com> wrote:
>
> I'm not sure which version I was using when that happened. But that's
> some
> precise details you've mentioned! Is this mentioned somewhere in the
> docs ?
>
> Thanks a lot.
>
> On Aug 14, 2017 9:00 PM, "Boaz Ben-Zvi" <bben-...@mapr.com> wrote:
>
> > Did your query include a hash join ?
> >
> > As of 1.11, only the External Sort and Hash Aggregate operators obey
> the
> > memory limit (that is, the “max query memory per node” figure is
> divided
> > among all the instances of these operators).
> > The Hash Join (as was before 1.11) still does not take part in this
> memory
> > allocation scheme, and each instance may use up to 10GB.
> >
> > Also in 1.11, the Hash Aggregate may “fall back” to the 1.10 behavior
> > (same as the Hash Join; i.e. up to 10GB) in case there is too little
> memory
> > per an instance (because it cannot perform memory spilling, which
> requires
> > some minimal memory to hold multiple batches).
> >
> >Thanks,
> >
> >   Boaz
> >
> > On 8/11/17, 4:25 PM, "Muhammad Gelbana" <m.gelb...@gmail.com> wrote:
> >
> > Sorry for the long subject !
> >
> > I'm running a single query on a single node Drill setup.
> >
> > I assumed that setting the *planner.memory.max_query_
> memory_per_node*
> > property
> > controls the max amount of memory (in bytes) for each running on
> a
> > single
> > node. Which means that in my setup, the *direct.used* metric in
> the
> > metrics
> > page should never exceed that value in my case.
> >
> > But it did and drastically. I assigned *34359738368* (32 GB) to
> the
> > *planner.memory.max_query_memory_per_node* option but while
> > monitoring the
> > *direct.used* metric, I found that it reached *51640484458* (~48
> GB).
> >
> > What did I mistakenly do\interpret ?
> >
> > Thanks,
> > Gelbana
> > ​​
> >
> >
> >
>
>
>


Re: direct.used on the Metrics page exceeded planner.memory.max_query_memory_per_node while running a single query

2017-08-14 Thread Muhammad Gelbana
I'm not sure which version I was using when that happened, but those are some
precise details you've mentioned! Is this mentioned somewhere in the docs?

Thanks a lot.

On Aug 14, 2017 9:00 PM, "Boaz Ben-Zvi" <bben-...@mapr.com> wrote:

> Did your query include a hash join ?
>
> As of 1.11, only the External Sort and Hash Aggregate operators obey the
> memory limit (that is, the “max query memory per node” figure is divided
> among all the instances of these operators).
> The Hash Join (as was before 1.11) still does not take part in this memory
> allocation scheme, and each instance may use up to 10GB.
>
> Also in 1.11, the Hash Aggregate may “fall back” to the 1.10 behavior
> (same as the Hash Join; i.e. up to 10GB) in case there is too little memory
> per an instance (because it cannot perform memory spilling, which requires
> some minimal memory to hold multiple batches).
>
>Thanks,
>
>   Boaz
>
> On 8/11/17, 4:25 PM, "Muhammad Gelbana" <m.gelb...@gmail.com> wrote:
>
> Sorry for the long subject !
>
> I'm running a single query on a single node Drill setup.
>
> I assumed that setting the *planner.memory.max_query_memory_per_node*
> property
> controls the max amount of memory (in bytes) for each running on a
> single
> node. Which means that in my setup, the *direct.used* metric in the
> metrics
> page should never exceed that value in my case.
>
> But it did and drastically. I assigned *34359738368* (32 GB) to the
> *planner.memory.max_query_memory_per_node* option but while
> monitoring the
> *direct.used* metric, I found that it reached *51640484458* (~48 GB).
>
> What did I mistakenly do\interpret ?
>
> Thanks,
> Gelbana
> ​​
>
>
>


[jira] [Created] (DRILL-5718) java.lang.IllegalStateException: Memory was leaked by query

2017-08-12 Thread Muhammad Gelbana (JIRA)
Muhammad Gelbana created DRILL-5718:
---

 Summary: java.lang.IllegalStateException: Memory was leaked by 
query
 Key: DRILL-5718
 URL: https://issues.apache.org/jira/browse/DRILL-5718
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow, Execution - RPC
Affects Versions: 1.11.0, 1.9.0
 Environment: Linux iWebGelbanaDev 2.6.32-696.1.1.el6.x86_64 #1 SMP Tue 
Apr 11 17:13:24 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

48 Cores
25 GB Heap
200 GB Direct memory
Reporter: Muhammad Gelbana


Configurations
{noformat}
planner.memory.max_query_memory_per_node: 17179869184 (16 GB)
planner.width.max_per_node: 48
store.parquet.block-size: 134217728 (128 MB, this is the block size used to 
create the parquet files)
{noformat}

{noformat}
Fragment 0:0

[Error Id: 05c39a1e-c8a8-4147-870f-e0cdbb454e53 on iWebStitchFixDev:31010]
[BitServer-4] INFO org.apache.drill.exec.work.fragment.FragmentExecutor - 
267104f2-e48d-1d66-63f4-387848c1ccf2:1:10: State change requested RUNNING --> 
CANCELLATION_REQUESTED
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
ChannelClosedException: Channel closed /127.0.0.1:31010 <--> /127.0.0.1:40404.

Fragment 0:0

[Error Id: 05c39a1e-c8a8-4147-870f-e0cdbb454e53 on iWebStitchFixDev:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:550)
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:295)
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:264)
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.drill.exec.rpc.ChannelClosedException: Channel closed 
/127.0.0.1:31010 <--> /127.0.0.1:40404.
at 
org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:164)
at 
org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete(RpcBus.java:144)
at 
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
at 
io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:603)
at 
io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:563)
at 
io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:406)
at 
io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
at 
io.netty.channel.AbstractChannel$CloseFuture.setClosed(AbstractChannel.java:943)
at 
io.netty.channel.AbstractChannel$AbstractUnsafe.doClose0(AbstractChannel.java:592)
at 
io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:584)
at 
io.netty.channel.DefaultChannelPipeline$HeadContext.close(DefaultChannelPipeline.java:1099)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:615)
at 
io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:600)
at 
io.netty.channel.ChannelOutboundHandlerAdapter.close(ChannelOutboundHandlerAdapter.java:71)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:615)
at 
io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:600)
at 
io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:466)
at 
io.netty.handler.timeout.ReadTimeoutHandler.readTimedOut(ReadTimeoutHandler.java:187)
at 
org.apache.drill.exec.rpc.BasicServer$LoggingReadTimeoutHandler.readTimedOut(BasicServer.java:122)
at 
io.netty.handler.timeout.ReadTimeoutHandler$ReadTimeoutTask.run(ReadTimeoutHandler.java:212)
at 
io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
at 
io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)
at 
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more
Suppressed: org.apache.drill.exec.rpc.RpcException: Failure sending message.
at org.apache.drill.exec.rpc.RpcBus.send(RpcBus.java:124)
at 
org.apache.drill.exec.rpc.user.UserServer$BitToUserConnection.sendData(User

Re: Error message: Memory was leaked by query

2017-08-12 Thread Muhammad Gelbana
java:103)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at
> io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
> at
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
> at
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
> at
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
> at
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
> at
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> at
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> at
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> at
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Attempted to send a message
> when connection is no longer valid.
> at
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
> at
> org.apache.drill.exec.rpc.RequestIdMap.createNewRpcListener(RequestIdMap.java:88)
> at
> org.apache.drill.exec.rpc.AbstractRemoteConnection.createNewRpcListener(AbstractRemoteConnection.java:162)
> at org.apache.drill.exec.rpc.RpcBus.send(RpcBus.java:117)
> ... 46 more


These error messages were found in my application's logs

> INFO: [08:28:59] Channel closed /127.0.0.1:40242 <--> localhost/
> 127.0.0.1:31010.
> [org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete]
>  INFO: [08:28:59] Channel closed /127.0.0.1:40240 <--> localhost/
> 127.0.0.1:31010.
> [org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete]
>  INFO: [08:28:59] Channel closed /127.0.0.1:40238 <--> localhost/
> 127.0.0.1:31010.
> [org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete]
>  INFO: [08:28:59] Channel closed /127.0.0.1:40236 <--> localhost/
> 127.0.0.1:31010.
> [org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete]
>  INFO: [08:28:59] Channel closed /127.0.0.1:40234 <--> localhost/
> 127.0.0.1:31010.
> [org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete]
>  INFO: [08:28:59] Channel closed /127.0.0.1:40232 <--> localhost/
> 127.0.0.1:31010.
> [org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete]
>  INFO: [08:28:59] Channel closed /127.0.0.1:40230 <--> localhost/
> 127.0.0.1:31010.
> [org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete]
>  INFO: [08:28:59] Channel closed /127.0.0.1:40228 <--> localhost/
> 127.0.0.1:31010.
> [org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete]
>  INFO: [08:30:58] Channel closed /127.0.0.1:40226 <--> localhost/
> 127.0.0.1:31010.
> [org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete]
>  INFO: [08:30:58] Channel closed /127.0.0.1:40244 <--> localhost/
> 127.0.0.1:31010.
> [org.apache.drill.exec.rpc.RpcBus$ChannelClosedHandler.operationComplete]


I had to put *slf4j-simple-1.7.25.jar* in *jars/3rdparty/* to resolve the
*SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder"* warning.

Thanks,
Gelbana

On Sat, Aug 12, 2017 at 2:55 AM, Muhammad Gelbana <m.gelb...@gmail.com>
wrote:

> I'm trying to run the following query
>
> *SELECT op.platform, op.name, op.paymentType,
> ck.posDiscountName, sum(op.amount) amt FROM `dfs`.`/path_to_parquet` op,
> `dfs`.`path_to_parquet2` ck WHERE ck.id = op.check_id
> GROUP BY op.platform, op.name, op.paymentType,
> ck.posDiscountName LIMIT 2147483647*
>
> I also tried the same query without the LIMIT clause
> <https://issues.apache.org/jira/browse/DRILL-5435> but it still fails for
> the same reason.
> ​​
> I'

Error message: Memory was leaked by query

2017-08-11 Thread Muhammad Gelbana
I'm trying to run the following query

*SELECT op.platform, op.name, op.paymentType,
ck.posDiscountName, sum(op.amount) amt FROM `dfs`.`/path_to_parquet` op,
`dfs`.`path_to_parquet2` ck WHERE ck.id = op.check_id
GROUP BY op.platform, op.name, op.paymentType,
ck.posDiscountName LIMIT 2147483647*

I also tried the same query without the LIMIT clause
<https://issues.apache.org/jira/browse/DRILL-5435> but it still fails for
the same reason.

I'm facing the following exception in the logs and I'm not sure how to
resolve it.

Suppressed: java.lang.IllegalStateException: Memory was leaked by query.
> Memory leaked: (4194304)
> Allocator(op:0:0:0:Screen) 100/4194304/12582912/100
> (res/actual/peak/limit)
> at
> org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:492)
> at
> org.apache.drill.exec.ops.OperatorContextImpl.close(OperatorContextImpl.java:141)
> at
> org.apache.drill.exec.ops.FragmentContext.suppressingClose(FragmentContext.java:422)
> at
> org.apache.drill.exec.ops.FragmentContext.close(FragmentContext.java:411)
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:318)
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:155)
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:262)
> at
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> ... 1 more
> Suppressed: java.lang.IllegalStateException: Memory was leaked by
> query. Memory leaked: (4194304)
> Allocator(frag:0:0) 300/4194304/1511949440/300
> (res/actual/peak/limit)
> at
> org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:492)
> at
> org.apache.drill.exec.ops.FragmentContext.suppressingClose(FragmentContext.java:422)
> at
> org.apache.drill.exec.ops.FragmentContext.close(FragmentContext.java:416)
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:318)
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:155)
> at
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:262)
> at
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> ... 1 more


The UI is showing the following error
*org.apache.drill.common.exceptions.UserException: CONNECTION ERROR:
Connection /1.1.1.1:40834  <--> Gelbana/1.1.1.1:31010
 (user client) closed unexpectedly. Drillbit down?
[Error Id: 268bc3a7-114f-4681-984c-05d143f7ebd9 ]*

I understand that this bug has been fixed in 1.9, which is the version I'm
using. I did what the comments suggested, which is to tell Drill to use a
tmp directory that has enough space, so I set the JVM option *java.io.tmpdir*
to */home/mgelbana/server/temp/*, which has over 100GB of free space, and
modified the drill-override.conf file to have the following:

tmp: {
> directories: ["/home/mgelbana/server/temp/"],
> filesystem: "file:///"
>   },
>   sort: {
> external: {
>   spill: {
> batch.size : 4000,
> group.size : 100,
> threshold : 200,
> directories : [ "/home/mgelbana/server/temp/spill" ],
> fs : "file:///"
>   }
> }
>   }


I'm running a single Drillbit on a single machine with 25 GB of heap memory
and 100 GB of direct memory. The machine has 48 cores (i.e. the output of
*nproc* on Linux).

*planner.width.max_per_node = 40*
*planner.memory.max_query_memory_per_node = 8589934592 (8 GB)*

This is the plan of the query:

00-00Screen : rowType = RecordType(ANY platform, ANY name, ANY
> paymentType, ANY posDiscountName, ANY amt): rowcount = 2.147483647E9,
> cumulative cost = {7.00120818116E9 rows, 3.700395926736E10 cpu, 0.0 io,
> 8.6479703758848E12 network, 1.444996068119E10 memory}, id = 24229
> 00-01  Project(platform=[$0], name=[$1], paymentType=[$2],
> posDiscountName=[$3], amt=[$4]) : rowType = RecordType(ANY platform, ANY
> name, ANY paymentType, ANY posDiscountName, ANY amt): rowcount =
> 2.147483647E9, cumulative cost = {6.78645981646E9 rows,
> 3.678921090266E10 cpu, 0.0 io, 

direct.used on the Metrics page exceeded planner.memory.max_query_memory_per_node while running a single query

2017-08-11 Thread Muhammad Gelbana
Sorry for the long subject !

I'm running a single query on a single node Drill setup.

I assumed that setting the *planner.memory.max_query_memory_per_node* property
controls the max amount of memory (in bytes) for each query running on a
single node. That means that in my setup, the *direct.used* metric on the
metrics page should never exceed that value.

But it did, and drastically. I assigned *34359738368* (32 GB) to the
*planner.memory.max_query_memory_per_node* option, but while monitoring the
*direct.used* metric, I found that it reached *51640484458* (~48 GB).

What did I mistakenly do or misinterpret?

Thanks,
Gelbana


[jira] [Created] (DRILL-5707) Non-scalar subquery fails the whole query if its aggregate column has an alias

2017-08-07 Thread Muhammad Gelbana (JIRA)
Muhammad Gelbana created DRILL-5707:
---

 Summary: Non-scalar subquery fails the whole query if its
aggregate column has an alias
 Key: DRILL-5707
 URL: https://issues.apache.org/jira/browse/DRILL-5707
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization, SQL Parser
Affects Versions: 1.11.0, 1.9.0
Reporter: Muhammad Gelbana


The following query can be handled by Drill
{code:sql}
SELECT b.marital_status, (SELECT SUM(position_id) FROM cp.`employee.json` a 
WHERE a.marital_status = b.marital_status ) AS max_a FROM cp.`employee.json` b
{code}

But if I add an alias to the aggregate function
{code:sql}
SELECT b.marital_status, (SELECT SUM(position_id) MY_ALIAS FROM 
cp.`employee.json` a WHERE a.marital_status = b.marital_status ) AS max_a FROM 
cp.`employee.json` b
{code}

Drill starts complaining that it can't handle non-scalar subqueries
{noformat}
org.apache.drill.common.exceptions.UserRemoteException: UNSUPPORTED_OPERATION 
ERROR: Non-scalar sub-query used in an expression See Apache Drill JIRA: 
DRILL-1937
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Caching Indexes

2017-08-07 Thread Muhammad Gelbana
Some queries may require an index to be created before performing a join
operation, for instance. Is there a way to cache that index for future use?

Thanks,
Gelbana


Understanding the -Dbounds JVM option

2017-08-07 Thread Muhammad Gelbana
This page  refers
to a JVM option by the name *bounds*. I tried googling for it but I
couldn't find anything.

Would someone please elaborate on the effect of this option, or post a link
that does?

Thanks,
Gelbana


Spelling mistake in the "system options" page

2017-08-07 Thread Muhammad Gelbana
I felt it's not a big deal, so I thought I didn't have to create a JIRA task
for it.

On this page

, search for "with have", you should find only one instance, I believe it
should be "will have".


Thanks,
Gelbana


Re: master branch compilation errors shown in Eclipse

2017-08-04 Thread Muhammad Gelbana
An error shows that the class *ExprLexer* cannot be resolved. The .java
file for that class is generated and found in the *drill-logical* project
(module?), in its *target/generated-sources/antlr3/* folder, which I
configured, through Eclipse, to be a source folder, and then *ALL*
compilation errors were gone.

Shouldn't that be configured in Maven somewhere?



Thanks,
Gelbana

On Wed, Aug 2, 2017 at 8:22 PM, Muhammad Gelbana <m.gelb...@gmail.com>
wrote:

> The TSV file is ruined by the stacktraces in the maven exceptions, I
> deleted those and attached another one excluding them.
>
> Thanks,
> Gelbana
>
> On Wed, Aug 2, 2017 at 8:17 PM, Muhammad Gelbana <m.gelb...@gmail.com>
> wrote:
>
>> I attached a TSV file showing the errors I have in Eclipse. I believe its
>> something that has to do with code generation because many errors are:
>>
>>- Compilation errors due to mismatching signatures for overridden
>>methods while the *@Override* annotation is applied
>>- Missing fields\methods such as the missing *INSTANCE* field in
>>*org.apache.calcite.rel.metadata.DefaultRelMetadataProvider* (Shown
>>in the screenshot)
>>
>>
>> Thanks,
>> Gelbana
>>
>> On Wed, Aug 2, 2017 at 6:16 PM, Paul Rogers <prog...@mapr.com> wrote:
>>
>>> Please pass along the detailed error: here or in the JIRA you filed
>>> earlier. That way, we can try to figure out what’s happening.
>>>
>>> Thanks,
>>>
>>> - Paul
>>>
>>> On Aug 2, 2017, at 6:45 AM, Muhammad Gelbana <m.gelb...@gmail.com
>>> <mailto:m.gelb...@gmail.com>> wrote:
>>>
>>> I pulled the latest code from the master branch just today. Ran mvn
>>> clean install -DskipTests which finished successfully, updated all projects
>>> but the update failed due to a NullPointerException while updating the
>>> vector project. So I ran the update again but excluded the vector project
>>> this time.
>>>
>>> Now I'm left with the errors shown in the attached image.
>>>
>>> I'm used to developing a Drill plugin in Eclipse but using the 1.9
>>> branch with only errors only related to maven like lifecycles errors. And
>>> that didn't cause any issue for me.
>>>
>>> I was hoping to create a simple PR but this is blocking me. Would
>>> someone please help me get through this ?
>>>
>>> ​Thanks,
>>> Gelbana
>>>
>>>
>>
>


Re: Question on building drill & developing a new storage plugin

2017-08-03 Thread Muhammad Gelbana
This wiki  should be very
helpful for understanding the plugin development process and the unit-testing
architecture, among many other things.

I'm also facing difficulties running the test cases successfully with a fresh
clone: https://issues.apache.org/jira/browse/DRILL-5606
If you get past this, please share the knowledge.

Thanks,
Gelbana

On Thu, Aug 3, 2017 at 8:57 PM, Paul Rogers  wrote:

> Hi Bob,
>
> Thanks for tackling the new plugin!
>
> Drill is a huge piece of software build, in part, by combining a wide
> variety of libraries and packages. As it turns out, the package authors
> have used an even wider variety of loggers, Guava versions and so on.
>
> Maven provides “dependency management” to control the chaos. Basically,
> the pom.xml file first adds library X that, say, uses log4j or commons
> logging. Then, elsewhere in the pom file, dependency management removes
> these dependencies. Maven provides (very hard to read) documentation. The
> “dependency tree” is your friend here.
>
> From your e-mail, it is unclear if these errors appeared with a “stock”
> Drill build, or after you started adding the libraries for WARC. If in
> “stock” Drill, then somehow something slipped through the cracks & we need
> to fix it. If after adding new jars, then you have to tackle the dependency
> management aspect.
>
> Note that, depending on where you add the code, you may also find yourself
> fighting with the “JDBC All” project. That project imports all of Drill
> into the JDBC package, then removes a bunch of stuff to keep the file size
> under a defined maximum. If your code pushes JDBC over the limit, you’ll
> have to find things to throw overboard to get below the limit again. (This
> is why storage plugins should, ideally, be in the contrib directory, not in
> exec.)
>
> At present, Drill provides no “SDK”: no way to build plugins except as
> part of the Drill source tree. At one point I was able to get this working
> for some aspects. Storage plugins, however, do require changes in core
> Drill code (for the bootstrap file, for registering the reader, etc.) That
> is the bit that must be fixed to allow a true external plugin development.
> This is a shame, and should be fixed. If you have time to do so, we’re
> always looking for contributions!
>
> Thanks,
>
> - Paul
>
>
> > On Aug 3, 2017, at 10:29 AM, Bob Rudis  wrote:
> >
> > Hey folks,
> >
> > First:
> >
> > Inspired by the PCAP support in 1.11.0 I started down the path of
> > cloning drill and just doing a test build before I started looking at
> > 2 of the issues I posted for the PCAP storage format and also working
> > to incorporate a similar WARC plugin (via the jwat jars).
> >
> > `mvn package -DskipTests` works fine but `mvn install -DskipTests`
> generates:
> >
> > Found Banned Dependency: log4j:log4j:jar:1.2.17
> > Found Banned Dependency: commons-logging:commons-logging:jar:1.1.1
> >
> > warnings which causes:
> >
> > Apache Drill Root POM .. FAILURE
> >
> > and (hence) the overall build to fail.
> >
> > I'm hoping that I'm just missing something obvious after some failed
> googling.
> >
> >
> > Second:
> >
> > Is there a way to develop the WARC format plugin outside of
> > Drill-source-proper vs having to build into it. If so, is there a
> > small, example GH repo someone cld point me to (that also has the "how
> > to get it into Drill" part). From a scan of the source, it looks like
> > the formats are embedded into a few source files in the
> > Drill-source-proper. If there's no way to develop one outside of it,
> > that's not a problem, I just figured it'd be easier to do it outside
> > of modifying Drill source first.
> >
> > thx,
> >
> > -Bob
>
>


Re: master branch compilation errors shown in Eclipse

2017-08-02 Thread Muhammad Gelbana
The TSV file is ruined by the stack traces in the Maven exceptions, so I
deleted those and attached another one excluding them.

Thanks,
Gelbana

On Wed, Aug 2, 2017 at 8:17 PM, Muhammad Gelbana <m.gelb...@gmail.com>
wrote:

> I attached a TSV file showing the errors I have in Eclipse. I believe its
> something that has to do with code generation because many errors are:
>
>- Compilation errors due to mismatching signatures for overridden
>methods while the *@Override* annotation is applied
>- Missing fields\methods such as the missing *INSTANCE* field in
>*org.apache.calcite.rel.metadata.DefaultRelMetadataProvider* (Shown in
>the screenshot)
>
>
> Thanks,
> Gelbana
>
> On Wed, Aug 2, 2017 at 6:16 PM, Paul Rogers <prog...@mapr.com> wrote:
>
>> Please pass along the detailed error: here or in the JIRA you filed
>> earlier. That way, we can try to figure out what’s happening.
>>
>> Thanks,
>>
>> - Paul
>>
>> On Aug 2, 2017, at 6:45 AM, Muhammad Gelbana <m.gelb...@gmail.com<mailto:m.gelb...@gmail.com>> wrote:
>>
>> I pulled the latest code from the master branch just today. Ran mvn clean
>> install -DskipTests which finished successfully, updated all projects but
>> the update failed due to a NullPointerException while updating the vector
>> project. So I ran the update again but excluded the vector project this
>> time.
>>
>> Now I'm left with the errors shown in the attached image.
>>
>> I'm used to developing a Drill plugin in Eclipse but using the 1.9 branch
>> with only errors only related to maven like lifecycles errors. And that
>> didn't cause any issue for me.
>>
>> I was hoping to create a simple PR but this is blocking me. Would someone
>> please help me get through this ?
>>
>> ​Thanks,
>> Gelbana
>>
>>
>

Description | Resource | Path | Location | Type
INSTANCE cannot be resolved or is not a field | DrillDefaultRelMetadataProvider.java | /drill-java-exec/src/main/java/org/apache/drill/exec/planner/cost | line 32 | Java Problem
The method computeSelfCost(RelOptPlanner, RelMetadataQuery) of type DrillAggregateRel must override or implement a supertype method | DrillAggregateRel.java | /drill-java-exec/src/main/java/org/apache/drill/exec/planner/logical | line 88 | Java Problem
The method computeSelfCost(RelOptPlanner, RelMetadataQuery) of type DrillFilterRelBase must override or implement a supertype method | DrillFilterRelBase.java | /drill-java-exec/src/main/java/org/apache/drill/exec/planner/common | line 67 | Java Problem
The method computeSelfCost(RelOptPlanner, RelMetadataQuery) of type DrillJoinRelBase must override or implement a supertype method | DrillJoinRelBase.java | /drill-java-exec/src/main/java/org/apache/drill/exec/planner/common | line 64 | Java Problem
The method computeSelfCost(RelOptPlanner, RelMetadataQuery) of type DrillLimitRelBase must override or implement a supertype method | DrillLimitRelBase.java | /drill-java-exec/src/main/java/org/apache/drill/exec/planner/common | line 66 | Java Problem
The method computeSelfCost(RelOptPlanner, RelMetadataQuery) of type DrillProjectRelBase must override or implement a supertype method | DrillProjectRelBase.java | /drill-java-exec/src/main/java/org/apache/drill/exec/planner/common | line 73 | Java Problem
The method computeSelfCost(RelOptPlanner, RelMetadataQuery) of type DrillScanRel must override or implement a supertype method | DrillScanRel.java | /drill-java-exec/src/main/java/org/apache/drill/exec/planner/logical | line 159 | Java Problem
The method computeSelfCost(RelOptPlanner, RelMetadataQuery) of type DrillScreenRelBase must override or implement a supertype method | DrillScreenRelBase.java | /drill-java-exec/src/main/java/org/apache/drill/exec/planner/common | line 42 | Java Problem
The method computeSelfCost(RelOptPlanner, RelMetadataQuery) of type DrillUnionRel must override or implement a supertype method | DrillUnionRel.java | /drill-java-exec/src/main/java/org/apache/drill/exec/planner/logical | line 58 | Java Problem
The method computeSelfCost(RelOptPlanner, RelMetadataQuery) of type HashAggPrel must override or implement a supertype method | HashAggPrel.java | /drill-java-exec/src/main/java/org/apache/drill/exec/planner/physical | line 68 | Java Problem
The method computeSelfCost(RelOptPlanner, RelMetadataQuery) of type HashToMergeExchangePrel must override or implement a supertype method | HashToMergeExchangePrel.java | /drill-java-exec/src/main/java/org/apache/drill/exec/planner/physical | line 55 | Java Problem
The method computeSelfCost(RelOptPlanner, RelMetadataQuery) of type HashToRandomExchangePrel must override or implement a supertype method | HashToRandomExchangePrel.java | /drill-java-exec/src/main/java/org/apache/drill/exec/planner/physical | line 66 | Java Problem
The method computeSelfCost(RelOptPlanner, RelMetadataQuery) of type OrderedPartitionExchangePrel must override or implement a supertype met

Re: master branch compilation errors shown in Eclipse

2017-08-02 Thread Muhammad Gelbana
I attached a TSV file showing the errors I have in Eclipse. I believe its
something that has to do with code generation because many errors are:

   - Compilation errors due to mismatching signatures for overridden
   methods while the *@Override* annotation is applied
   - Missing fields/methods such as the missing *INSTANCE* field in
   *org.apache.calcite.rel.metadata.DefaultRelMetadataProvider* (Shown in
   the screenshot)


Thanks,
Gelbana

On Wed, Aug 2, 2017 at 6:16 PM, Paul Rogers <prog...@mapr.com> wrote:

> Please pass along the detailed error: here or in the JIRA you filed
> earlier. That way, we can try to figure out what’s happening.
>
> Thanks,
>
> - Paul
>
> On Aug 2, 2017, at 6:45 AM, Muhammad Gelbana <m.gelb...@gmail.com> wrote:
>
> I pulled the latest code from the master branch just today. Ran mvn clean
> install -DskipTests which finished successfully, updated all projects but
> the update failed due to a NullPointerException while updating the vector
> project. So I ran the update again but excluded the vector project this
> time.
>
> Now I'm left with the errors shown in the attached image.
>
> I'm used to developing a Drill plugin in Eclipse using the 1.9 branch,
> with only Maven lifecycle-related errors showing up, and those
> didn't cause any problems for me.
>
> I was hoping to create a simple PR but this is blocking me. Would someone
> please help me get through this ?
>
> ​Thanks,
> Gelbana
>
>

DescriptionResourcePathLocationType
Artifact has not been packaged yet. When used on reactor artifact, unpack should be executed after packaging: see MDEP-98. (org.apache.maven.plugins:maven-dependency-plugin:2.8:unpack:unpack-vector-types:initialize)

org.apache.maven.plugin.MojoExecutionException: Artifact has not been packaged yet. When used on reactor artifact, unpack should be executed after packaging: see MDEP-98.
at org.apache.maven.plugin.dependency.AbstractDependencyMojo.unpack(AbstractDependencyMojo.java:265)
at org.apache.maven.plugin.dependency.fromConfiguration.UnpackMojo.unpackArtifact(UnpackMojo.java:128)
at org.apache.maven.plugin.dependency.fromConfiguration.UnpackMojo.doExecute(UnpackMojo.java:106)
at org.apache.maven.plugin.dependency.AbstractDependencyMojo.execute(AbstractDependencyMojo.java:167)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
at org.eclipse.m2e.core.internal.embedder.MavenImpl.execute(MavenImpl.java:331)
at org.eclipse.m2e.core.internal.embedder.MavenImpl$11.call(MavenImpl.java:1362)
at org.eclipse.m2e.core.internal.embedder.MavenImpl$11.call(MavenImpl.java:1)
at org.eclipse.m2e.core.internal.embedder.MavenExecutionContext.executeBare(MavenExecutionContext.java:176)
at org.eclipse.m2e.core.internal.embedder.MavenExecutionContext.execute(MavenExecutionContext.java:112)
at org.eclipse.m2e.core.internal.embedder.MavenImpl.execute(MavenImpl.java:1360)
at org.eclipse.m2e.core.project.configurator.MojoExecutionBuildParticipant.build(MojoExecutionBuildParticipant.java:52)
at com.ianbrandt.tools.m2e.mdp.core.MdpBuildParticipant.executeMojo(MdpBuildParticipant.java:133)
at com.ianbrandt.tools.m2e.mdp.core.MdpBuildParticipant.build(MdpBuildParticipant.java:67)
at org.eclipse.m2e.core.internal.builder.MavenBuilderImpl.build(MavenBuilderImpl.java:137)
at org.eclipse.m2e.core.internal.builder.MavenBuilder$1.method(MavenBuilder.java:172)
at org.eclipse.m2e.core.internal.builder.MavenBuilder$1.method(MavenBuilder.java:1)
at org.eclipse.m2e.core.internal.builder.MavenBuilder$BuildMethod$1$1.call(MavenBuilder.java:115)
at org.eclipse.m2e.core.internal.embedder.MavenExecutionContext.executeBare(MavenExecutionContext.java:176)
at org.eclipse.m2e.core.internal.embedder.MavenExecutionContext.execute(MavenExecutionContext.java:112)
at org.eclipse.m2e.core.internal.builder.MavenBuilder$BuildMethod$1.call(MavenBuilder.java:105)
at org.eclipse.m2e.core.internal.embedder.MavenExecutionContext.executeBare(MavenExecutionContext.java:176)
at org.eclipse.m2e.core.internal.embedder.MavenExecutionContext.execute(MavenExecutionContext.java:151)
at org.eclipse.m2e.core.internal.embedder.MavenExecutionContext.execute(MavenExecutionContext.java:99)
at org.eclipse.m2e.core.internal.builder.MavenBuilder$BuildMethod.execute(MavenBuilder.java:86)
at org.eclipse.m2e.core.internal.builder.MavenBuilder.build(MavenBuilder.java:200)
at org.eclipse.core.internal.events.BuildManager$2.run(BuildManager.java:735)
at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:42)
at org.eclipse.core.internal.events.BuildManager.basicBuild(BuildManager.java:206)
at org.eclipse.core.internal.events.BuildManager.basicBuild(BuildManager.java:246)
at org.eclipse.core.internal.events.BuildManager$1.run(BuildManager.java:301)
at org.eclipse.core.runtime.SafeRunner.run(Sa

master branch compilation errors shown in Eclipse

2017-08-02 Thread Muhammad Gelbana
I pulled the latest code from the master branch just today. Ran *mvn clean
install -DskipTests* which finished successfully, updated all projects but
the update failed due to a *NullPointerException* while updating the *vector*
project. So I ran the update again but excluded the *vector* project this
time.

Now I'm left with the errors shown in the attached image.

I'm used to developing a Drill plugin in Eclipse using the *1.9* branch,
with only Maven lifecycle-related errors showing up, and those
didn't cause any problems for me.

I was hoping to create a simple PR but this is blocking me. Would someone
please help me get through this ?

​Thanks,
Gelbana


[jira] [Created] (DRILL-5695) INTERVAL DAY multiplication isn't supported

2017-07-30 Thread Muhammad Gelbana (JIRA)
Muhammad Gelbana created DRILL-5695:
---

 Summary: INTERVAL DAY multiplication isn't supported
 Key: DRILL-5695
 URL: https://issues.apache.org/jira/browse/DRILL-5695
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Affects Versions: 1.9.0
Reporter: Muhammad Gelbana


I'm not sure if this is intended or a missing feature.

The following query
{code:sql}
SELECT CUSTOM_DATE_TRUNC('day', CAST('1900-01-01' AS DATE) + CAST (NULL AS 
INTERVAL DAY) * INTERVAL '1' DAY) + 1 * INTERVAL '1' YEAR FROM 
`dfs`.`path_to_parquet` Calcs HAVING (COUNT(1) > 0) LIMIT 0
{code}

throws the following error:

{noformat}
2017-07-30 13:12:15,439 [268240ef-eeea-04e2-cca2-b95033061af5:foreman] INFO  
o.a.d.e.p.sql.TypeInferenceUtils - User Error Occurred
org.apache.drill.common.exceptions.UserException: FUNCTION ERROR: * does not 
support operand types (INTERVAL_DAY_TIME,INTERVAL_DAY_TIME)


[Error Id: 50c2bd86-332c-4569-a5a2-76193e7eca41 ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
 ~[drill-common-1.9.0.jar:1.9.0]
at 
org.apache.drill.exec.planner.sql.TypeInferenceUtils.resolveDrillFuncHolder(TypeInferenceUtils.java:644)
 [drill-java-exec-1.9.0.jar:1.9.0]
at 
org.apache.drill.exec.planner.sql.TypeInferenceUtils.access$1700(TypeInferenceUtils.java:57)
 [drill-java-exec-1.9.0.jar:1.9.0]
at 
org.apache.drill.exec.planner.sql.TypeInferenceUtils$DrillDefaultSqlReturnTypeInference.inferReturnType(TypeInferenceUtils.java:260)
 [drill-java-exec-1.9.0.jar:1.9.0]
at 
org.apache.calcite.sql.SqlOperator.inferReturnType(SqlOperator.java:468) 
[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.SqlOperator.validateOperands(SqlOperator.java:435) 
[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.SqlOperator.deriveType(SqlOperator.java:507) 
[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.SqlBinaryOperator.deriveType(SqlBinaryOperator.java:143) 
[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:4337)
 [calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:4324)
 [calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:130) 
[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.deriveTypeImpl(SqlValidatorImpl.java:1501)
 [calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.deriveType(SqlValidatorImpl.java:1484)
 [calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.SqlOperator.deriveType(SqlOperator.java:493) 
[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.SqlBinaryOperator.deriveType(SqlBinaryOperator.java:143) 
[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:4337)
 [calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:4324)
 [calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:130) 
[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.deriveTypeImpl(SqlValidatorImpl.java:1501)
 [calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.deriveType(SqlValidatorImpl.java:1484)
 [calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.SqlOperator.constructArgTypeList(SqlOperator.java:581) 
[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:240) 
[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:222) 
[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:4337)
 [calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:4324)
 [calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:130) 
[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.deriveTypeImpl(SqlValidatorImpl.java:1501)
 [calcite-core-1.4.0-drill-r19.jar:1.4.0-dr

Re: A storage plugin for a custom datasource

2017-07-18 Thread Muhammad Gelbana
Thanks a lot Aman, my mistake was that a logical node was left between
physical ones, so of course it couldn't be implemented.
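
For readers hitting the same "could not be implemented" error: the snippets
quoted further down in this thread rewrite the traits of the new physical
node's child back to Convention.NONE, which produces exactly this kind of
logical-under-physical sandwich. A purely hypothetical sketch of the shape
of the fix, reusing the class names from those snippets and assuming the
matched node has a single input already in the plugin's own convention:

private static class GelbanaPrule extends ConverterRule {
    private GelbanaPrule(IncortaLayoutConvention incortaConvention) {
        super(GelbanaRel.class, incortaConvention, Prel.DRILL_PHYSICAL,
            "PREL_Converter");
    }

    @Override
    public RelNode convert(RelNode in) {
        RelTraitSet physicalTraits = in.getTraitSet().replace(getOutTrait());
        // Hand the physical node a child that stays in an implementable
        // convention (the plugin's own one here) instead of converting it
        // back to Convention.NONE, for which the planner has no implementation.
        return new GelbanaIntermediatePrel(in.getCluster(), physicalTraits,
            in.getInput(0));
    }
}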



- Gelbana

On Tue, Jul 11, 2017 at 5:51 PM, Aman Sinha <amansi...@apache.org> wrote:

> For such 'could not be implemented' issues, it is useful to look at Calcite
> trace logs to determine where an appropriate implementation of the logical
> plan node to physical could not be found.   You would need 2 things:  a
> logging.properties file and adding couple of -D flags to your JVM
> properties.
>
> In drill-env.sh:
>
> export DRILL_JAVA_OPTS="$DRILL_JAVA_OPTS *-Dcalcite.debug=true
> -Djava.util.logging.config.file=* "
>
> // Create logging.properties file  (note the output of this will go to your
> $HOME/log directory, so you need to create that).
>
> $ cat logging.properties
>
> handlers=java.util.logging.FileHandler
>
> .level=ALL
>
> #.level= INFO
>
> org.eigenbase.relopt.RelOptPlanner.level=FINER
>
> java.util.logging.FileHandler.pattern=%h/log/java%u.log
>
> On Mon, Jul 10, 2017 at 9:20 AM, Muhammad Gelbana <m.gelb...@gmail.com>
> wrote:
>
> > Forgive me for accidentally sending the previous email before preparing
> it
> > well. Ignore the plans mentioned earlier for now, to continue...
> >
> > Getting back to the rules, I edited the old JDBC physical converter rule
> to
> > be
> >
> > private static class GelbanaPrule extends ConverterRule {
> > > private GelbanaPrule(IncortaLayoutConvention
> > incortaLayoutConvention)
> > > {
> > > super(GelbanaRel.class, incortaLayoutConvention,
> > > Prel.DRILL_PHYSICAL, "PREL_Converter");
> > > }
> > > @Override
> > > public boolean matches(RelOptRuleCall call) {
> > > return super.matches(call);
> > > }
> > > @Override
> > > public RelNode convert(RelNode in) {
> > > RelTraitSet physicalTraits =
> > > in.getTraitSet().replace(getOutTrait());
> > > RelTraitSet noneTraits = in.getTraitSet().replace(
> > Convention.NONE);
> > > return new GelbanaIntermediatePrel(in.getCluster(),
> > > physicalTraits, convert(in, noneTraits));
> > > }
> > > }
> >
> >
> > What happens is that the physical rule is executed successfully but then
> an
> > error (*Node [rel#50:Subset#3.LOGICAL.ANY([]).[]] could not be
> > implemented;
> > planner state*) is thrown from this method:
> > *org.apache.calcite.plan.volcano.RelSubset.CheapestPlanReplacer.visit(
> > RelNode,
> > int, RelNode) *
> >
> > That's because a *RelSubset* is visited but it doesn't have a best
> > performing node and it's cost is infinite.
> >
> > Getting back to the plan included in the previous email, I encapsulated
> the
> > *LogicalAggregate* as an *IncortaRel*, so that I can physically implement
> > the aggregation. I succeeded in the encapsulation but I can't figure out
> > how to fix the "*could not be implemented*" error so far. Would someone
> > please give a hint about how I can approach this error ?
> >
> >
> >
> > - Gelbana
> >
> > On Mon, Jul 10, 2017 at 6:09 PM, Muhammad Gelbana <m.gelb...@gmail.com>
> > wrote:
> >
> > > I'm planning to create a storage plugin for a custom datasource that
> > > accepts queries in the form of XML messages, but it's metadata can be
> > > discovered using the JDBC metadata API. I can work on discovering the
> > > metadata differently but that's not my priority for now.
> > >
> > > So I copied the JDBC storage plugin, Ignored all JDBC rules, and edited
> > > wrote the following JDBC storage rules:
> > >
> > > I renamed *JdbcDrelConverterRule* to *GelbanaRelConverterRule* and
> edited
> > > it's *constructor* and *convert* methods to be:
> > >
> > > public GelbanaRelConverterRule(IncortaLayoutConvention out) {
> > >> super(Aggregate.class, Convention.NONE, out,
> > >> "Incorta_Rel_Converter");
> > >> }
> > >> @Override
> > >> public RelNode convert(RelNode rel) {
> > >> RelTraitSet newTraits = rel.getTraitSet().replace(getOutTrait());
> > >> return new GelbanaRel(rel.getCluster(), newTraits, convert(rel,
> > >> newTraits));
> > >> }
> > >
> > >
> > > 17:57:19.931 [269c5c27-2f94-14ff-1f3e-0035b17b5965:foreman] DEBUG
> > > o.a.d.e.p.s.h.DefaultSqlHandler - HEP:Window Function rewrites
&g

Re: Why Drill required a special Calcite fork ?

2017-07-17 Thread Muhammad Gelbana
Could anyone at least just provide a link for that fork's repository please ?



- Gelbana

On Sat, Jul 1, 2017 at 12:02 AM, Muhammad Gelbana <m.gelb...@gmail.com>
wrote:

> Would someone please list the reasons for which Drill required having a
> custom Calcite build ? This will help integrating Calcite whenever a
> release is out.
>
> One reason I can assume is that Calcite didn't support unparsing all SQL
> clauses such as FETCH and OFFSET ?
>
> I for my self, very much need to use the latest Calcite version with Drill
> and I'm willing to spend time working on that, if it's possible for my poor
> Drill and Calcite knowledge.
>
> ​-
> Gelbana
>


Re: Debugging Drill in Eclipse

2017-07-14 Thread Muhammad Gelbana
When you said I'm rebuilding Drill, do you mean I'm running "*mvn clean
install*" after every change ? Actually I'm configuring eclipse to run the
same java command that "*drillbit.sh start*" command would run.

But I have to say that your method is much more efficient and practical. I
should've used test cases for development from the start.

I recently started exploring the test cases classes but I found it very
hard to comprehend so If there is some sort of a guide to the testing
classes architecture, packages structure\categorization..etc, it would be
very helpful if you share it. I looked into your github wiki
<https://github.com/paul-rogers/drill/wiki/Testing-Tips> but I didn't find
what I need.
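
For what it's worth, the kind of test Paul describes below, based on the
ClusterFixture framework demonstrated in ExampleTest.java, looks roughly
like this. The class and method names here are reproduced from memory, so
treat them as assumptions and check ExampleTest.java for the exact API:

@Test
public void runAdHocQuery() throws Exception {
    // Start an embedded Drillbit inside the test JVM so breakpoints in
    // planner and execution code are hit directly from the IDE.
    FixtureBuilder builder = ClusterFixture.builder();
    try (ClusterFixture cluster = builder.build();
         ClientFixture client = cluster.clientFixture()) {
        String sql = "SELECT * FROM cp.`employee.json` LIMIT 10";
        // Run the query and dump the rows to the console.
        client.queryBuilder().sql(sql).printCsv();
    }
}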



- Gelbana

On Thu, Jul 13, 2017 at 6:53 PM, Paul Rogers <prog...@mapr.com> wrote:

> Hi Muhammad,
>
> There are several issues here.
>
> First, the problem you describe is not related to Eclipse. Second, several
> of us do use Eclipse and it works fine. Third, there are far easier ways to
> debug Drill in Eclipse than building Drill and doing remote debugging.
>
> First the error. The problem is that some bit of code is referring to a
> config parameter that does not appear in the config files. This kind of key
> would normally appear in some drill-module.conf file. (The file
> drill-override.conf is only for the very few config settings that you must
> modify for your site.)
>
> My source is a bit different than yours, but the line in question seems to
> be this one:
>
> this.loop = TransportCheck.createEventLoopGroup(config.
> getInt(ExecConstants.BIT_SERVER_RPC_THREADS), "BitServer-");
>
> The config property here is defined in java-exec/src/main/resources
> drill-module.conf:
>
> drill.exec: {
>   ...
>   rpc: {
> user: {
>   ...
>   server: {
>...
> threads: 1,
>
> The only way you will get the error in your stack is if, somehow, your
> build omits the drill-module.conf file. You can inspect your jar files to
> see if this file is present.
>
> Second, Eclipse works fine. The only trick is if you try to run certain
> unit tests that use Mockito. “Modern” Eclipse Neon is based on JDK 8. But
> the Mockito tests only run on JDK 7. There is no way to get the test runner
> in Eclipse to use JDK 7. So, I end up building Drill for JDK 8 when using
> Eclipse (just change the Java version in the root pom.xml file from 1.7 to
> 1.8 — in two places.) Then run the Mockito-based tests outside of Eclipse,
> after rebuilding back on JDK 7. Yes, a hassle, but this is just the way
> that Drill works today.
>
> Further, what works for me is:
>
> 1. Check out Drill
> 2. Change the pom.xml Java version number as noted above
> 3. Build all of Drill without tests: “mvn clean install -DskipTests”
> 4. Open Drill or refresh Drill in Eclipse. Eclipse does its own build.
> 5. Run Drill directly from Eclipse.
>
> Item 5 above is the third item. For many purposes, it is far more
> convenient to run Drill directly from Eclipse. I use unit tests based on a
> newer test framework; see ExampleTest.java. For example, if a
> particular query fails, and I can copy the data locally, then I just use
> something like “fourthTest” to set up a storage plugin, set required
> session, system and config options, run the query, and either display the
> results or a summary. You can set breakpoints in the lines in question and
> debug. Entire edit/compile/debug cycle is maybe 30 seconds vs. the five
> minutes if you do a full external build.
>
> I hope some of this helps you resolve your issue.
>
> The most practical solution:
>
> 1. Rebuild Drill
> 2. Retry the run without the debugger.
>
> If that works, review your Eclipse settings that might affect class path.
>
> Thanks,
>
> - Paul
>
>
> > On Jul 13, 2017, at 8:20 AM, Muhammad Gelbana <m.gelb...@gmail.com>
> wrote:
> >
> > To debug Drill in Eclipse, I ran *./drillbit.sh debug* and copied the VM
> > args and environment variables into a launcher. This worked fine for
> months.
> >
> > Obviously now I messed things up in a way and I can't debug Drill in
> > Eclipse anymore. I'm facing the following error:
> >
> > Exception in thread "main"
> > org.apache.drill.exec.exception.DrillbitStartupException: Failure while
> > initializing values in Drillbit.
> > at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:288)
> > at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:272)
> > at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:268)
> > Caused by: com.typesafe.config.ConfigException$Missing: *No
> configuration
> > setting found for

Debugging Drill in Eclipse

2017-07-13 Thread Muhammad Gelbana
To debug Drill in Eclipse, I ran *./drillbit.sh debug* and copied the VM
args and environment variables into a launcher. This worked fine for months.

Obviously now I messed things up in a way and I can't debug Drill in
Eclipse anymore. I'm facing the following error:

Exception in thread "main"
org.apache.drill.exec.exception.DrillbitStartupException: Failure while
initializing values in Drillbit.
at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:288)
at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:272)
at org.apache.drill.exec.server.Drillbit.main(Drillbit.java:268)
Caused by: com.typesafe.config.ConfigException$Missing: *No configuration
setting found for key 'drill.exec.rpc'*
at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:115)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:138)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:142)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:142)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:150)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:155)
at
com.typesafe.config.impl.SimpleConfig.getConfigNumber(SimpleConfig.java:170)
at com.typesafe.config.impl.SimpleConfig.getInt(SimpleConfig.java:181)
at org.apache.drill.common.config.NestedConfig.getInt(NestedConfig.java:98)
at org.apache.drill.common.config.DrillConfig.getInt(DrillConfig.java:1)
at
org.apache.drill.exec.server.BootStrapContext.(BootStrapContext.java:55)
at org.apache.drill.exec.server.Drillbit.(Drillbit.java:94)
at org.apache.drill.exec.server.Drillbit.start(Drillbit.java:286)
... 2 more

​My *drill-override.conf*​ file has the following content (Aside from the
comments)

drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "localhost:2181"
}

I never needed to change this file to debug Drill in Eclipse !

Is there a standard way that Drill developers use to debug Drill ?
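
One quick way to check whether a drill-module.conf is missing from the
classpath the Eclipse launcher builds (the likely cause of this error, per
the reply archived above) is plain Java and not Drill-specific; a small
sketch:

import java.net.URL;
import java.util.Enumeration;

public class DrillModuleConfCheck {
    public static void main(String[] args) throws Exception {
        // List every drill-module.conf visible on this classpath; the missing
        // 'drill.exec.rpc' key suggests the java-exec module's copy is absent.
        Enumeration<URL> confs = DrillModuleConfCheck.class.getClassLoader()
                .getResources("drill-module.conf");
        while (confs.hasMoreElements()) {
            System.out.println(confs.nextElement());
        }
    }
}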

​I appreciate any help for this because it's totally blocking me !​


Re: A storage plugin for a custom datasource

2017-07-10 Thread Muhammad Gelbana
Forgive me for accidentally sending the previous email before preparing it
well. Ignore the plans mentioned earlier for now, to continue...

Getting back to the rules, I edited the old JDBC physical converter rule to
be

private static class GelbanaPrule extends ConverterRule {
> private GelbanaPrule(IncortaLayoutConvention incortaLayoutConvention)
> {
> super(GelbanaRel.class, incortaLayoutConvention,
> Prel.DRILL_PHYSICAL, "PREL_Converter");
> }
> @Override
> public boolean matches(RelOptRuleCall call) {
> return super.matches(call);
> }
> @Override
> public RelNode convert(RelNode in) {
> RelTraitSet physicalTraits =
> in.getTraitSet().replace(getOutTrait());
> RelTraitSet noneTraits = in.getTraitSet().replace(Convention.NONE);
> return new GelbanaIntermediatePrel(in.getCluster(),
> physicalTraits, convert(in, noneTraits));
> }
> }


What happens is that the physical rule is executed successfully but then an
error (*Node [rel#50:Subset#3.LOGICAL.ANY([]).[]] could not be implemented;
planner state*) is thrown from this method:
*org.apache.calcite.plan.volcano.RelSubset.CheapestPlanReplacer.visit(RelNode,
int, RelNode) *

That's because a *RelSubset* is visited but it doesn't have a best
performing node and its cost is infinite.

Getting back to the plan included in the previous email, I encapsulated the
*LogicalAggregate* as an *IncortaRel*, so that I can physically implement
the aggregation. I succeeded in the encapsulation but I can't figure out
how to fix the "*could not be implemented*" error so far. Would someone
please give a hint about how I can approach this error ?



- Gelbana

On Mon, Jul 10, 2017 at 6:09 PM, Muhammad Gelbana <m.gelb...@gmail.com>
wrote:

> I'm planning to create a storage plugin for a custom datasource that
> accepts queries in the form of XML messages, but it's metadata can be
> discovered using the JDBC metadata API. I can work on discovering the
> metadata differently but that's not my priority for now.
>
> So I copied the JDBC storage plugin, Ignored all JDBC rules, and edited
> wrote the following JDBC storage rules:
>
> I renamed *JdbcDrelConverterRule* to *GelbanaRelConverterRule* and edited
> it's *constructor* and *convert* methods to be:
>
> public GelbanaRelConverterRule(IncortaLayoutConvention out) {
>> super(Aggregate.class, Convention.NONE, out,
>> "Incorta_Rel_Converter");
>> }
>> @Override
>> public RelNode convert(RelNode rel) {
>> RelTraitSet newTraits = rel.getTraitSet().replace(getOutTrait());
>> return new GelbanaRel(rel.getCluster(), newTraits, convert(rel,
>> newTraits));
>> }
>
>
> 17:57:19.931 [269c5c27-2f94-14ff-1f3e-0035b17b5965:foreman] DEBUG
> o.a.d.e.p.s.h.DefaultSqlHandler - HEP:Window Function rewrites (152ms):
> LogicalProject(EXPR$0=[$1], PROD_CATEGORY=[$0]): rowcount = 150.0,
> cumulative cost = {3518.75 rows, 3502.0 cpu, 0.0 io, 0.0 network, 0.0
> memory}, id = 21
>   LogicalAggregate(group=[{0}], EXPR$0=[COUNT($1)]): rowcount = 150.0,
> cumulative cost = {3368.75 rows, 3202.0 cpu, 0.0 io, 0.0 network, 0.0
> memory}, id = 19
> LogicalProject(PROD_CATEGORY=[$13], Revenue=[$3]): rowcount = 1500.0,
> cumulative cost = {3200.0 rows, 3202.0 cpu, 0.0 io, 0.0 network, 0.0
> memory}, id = 17
>   LogicalJoin(condition=[=($0, $13)], joinType=[inner]): rowcount =
> 1500.0, cumulative cost = {1700.0 rows, 202.0 cpu, 0.0 io, 0.0 network, 0.0
> memory}, id = 15
> JdbcTableScan(table=[[incorta, SALES, Target]]): rowcount = 100.0,
> cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io, 0.0 network, 0.0 memory},
> id = 7
> JdbcTableScan(table=[[incorta, SALES, PRODUCTS]]): rowcount =
> 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io, 0.0 network, 0.0
> memory}, id = 8
>
> 17:57:20.094 [269c5c27-2f94-14ff-1f3e-0035b17b5965:foreman] DEBUG
> o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:Directory Prune Planning
> (150ms):
> LogicalProject(EXPR$0=[$1], PROD_CATEGORY=[$0]): rowcount = 150.0,
> cumulative cost = {150.0 rows, 300.0 cpu, 0.0 io, 0.0 network, 0.0 memory},
> id = 37
>   IncortaRel: rowcount = 150.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0
> io, 0.0 network, 0.0 memory}, id = 45
>
>
> - Gelbana
>


A storage plugin for a custom datasource

2017-07-10 Thread Muhammad Gelbana
I'm planning to create a storage plugin for a custom datasource that
accepts queries in the form of XML messages, but its metadata can be
discovered using the JDBC metadata API. I can work on discovering the
metadata differently but that's not my priority for now.

So I copied the JDBC storage plugin, ignored all the JDBC rules, and
wrote the following storage rules:

I renamed *JdbcDrelConverterRule* to *GelbanaRelConverterRule* and edited
its *constructor* and *convert* methods to be:

public GelbanaRelConverterRule(IncortaLayoutConvention out) {
> super(Aggregate.class, Convention.NONE, out,
> "Incorta_Rel_Converter");
> }
> @Override
> public RelNode convert(RelNode rel) {
> RelTraitSet newTraits = rel.getTraitSet().replace(getOutTrait());
> return new GelbanaRel(rel.getCluster(), newTraits, convert(rel,
> newTraits));
> }


17:57:19.931 [269c5c27-2f94-14ff-1f3e-0035b17b5965:foreman] DEBUG
o.a.d.e.p.s.h.DefaultSqlHandler - HEP:Window Function rewrites (152ms):
LogicalProject(EXPR$0=[$1], PROD_CATEGORY=[$0]): rowcount = 150.0,
cumulative cost = {3518.75 rows, 3502.0 cpu, 0.0 io, 0.0 network, 0.0
memory}, id = 21
  LogicalAggregate(group=[{0}], EXPR$0=[COUNT($1)]): rowcount = 150.0,
cumulative cost = {3368.75 rows, 3202.0 cpu, 0.0 io, 0.0 network, 0.0
memory}, id = 19
LogicalProject(PROD_CATEGORY=[$13], Revenue=[$3]): rowcount = 1500.0,
cumulative cost = {3200.0 rows, 3202.0 cpu, 0.0 io, 0.0 network, 0.0
memory}, id = 17
  LogicalJoin(condition=[=($0, $13)], joinType=[inner]): rowcount =
1500.0, cumulative cost = {1700.0 rows, 202.0 cpu, 0.0 io, 0.0 network, 0.0
memory}, id = 15
JdbcTableScan(table=[[incorta, SALES, Target]]): rowcount = 100.0,
cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io, 0.0 network, 0.0 memory},
id = 7
JdbcTableScan(table=[[incorta, SALES, PRODUCTS]]): rowcount =
100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io, 0.0 network, 0.0
memory}, id = 8

17:57:20.094 [269c5c27-2f94-14ff-1f3e-0035b17b5965:foreman] DEBUG
o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:Directory Prune Planning
(150ms):
LogicalProject(EXPR$0=[$1], PROD_CATEGORY=[$0]): rowcount = 150.0,
cumulative cost = {150.0 rows, 300.0 cpu, 0.0 io, 0.0 network, 0.0 memory},
id = 37
  IncortaRel: rowcount = 150.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0
io, 0.0 network, 0.0 memory}, id = 45


- Gelbana


Test cases that require a UTC timezone.

2017-07-09 Thread Muhammad Gelbana
While trying to run Drill's test cases, I found that one of the failing
tests would succeed if the timezone was set to UTC (Mine is GMT+2).

When I looked around for other test cases that may require timezones, I
found a couple of tests ignored (Marked with @Ignore) because they depend
on timezones !

Would someone please tell me how I can set the timezone for a test case ?
Also, sharing a guide about Drill's test classes, packages,
architecture, etc., would be very helpful.
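
One general-purpose way to do this, not specific to Drill's own test
harness, is to pin the JVM default timezone before the tests run, either by
passing -Duser.timezone=UTC to the forked test JVM through the Surefire
argLine, or with a @BeforeClass hook like this minimal JUnit sketch:

import java.util.TimeZone;
import org.junit.BeforeClass;

public class UtcPinnedTest {
    @BeforeClass
    public static void pinTimezone() {
        // Force the JVM default timezone so date/time assertions behave the
        // same on a GMT+2 machine as they do on a UTC build server.
        TimeZone.setDefault(TimeZone.getTimeZone("UTC"));
    }

    // ... timezone-sensitive test methods go here ...
}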

-Gelbana


Re: Problems building Drill

2017-07-03 Thread Muhammad Gelbana
I tried building from source using the same branch (i.e. *master*) after
creating a fresh clone and I faced some trouble, but it's none of what you
are mentioning here. Are you running the tests from the command line or
from Eclipse ? Try the command line (i.e. mvn clean install).

One thing that caused the test failures (none of the issues you are
mentioning) on my machine was that my system had other things running,
which caused some timeout-oriented tests to fail. There are two consistent
failures though, but I haven't had the chance to look into them yet.

-- Gelbana

On Sun, Jul 2, 2017 at 10:43 PM, Charles Givre  wrote:

> Hello all,
> I’m having a small problem building Drill from source.  I keep getting the
> errors below when I try to run tests.  It builds fine when I skip the
> tests.  I’ve googled the errors and haven’t really found anything helpful.
> I’m not an expert on Maven so any suggestions would be helpful. Full stack
> trace below.  I’m on a Mac using Sierra.
>
> I tried mvn dependencies::tree -U and am getting this error as well:
>
> [ERROR] Failed to execute goal on project drill-jdbc: Could not resolve
> dependencies for project org.apache.drill.exec:drill-jdbc:jar:1.11.0-SNAPSHOT:
> The following artifacts could not be resolved: 
> org.apache.drill.exec:drill-java-exec:jar:1.11.0-SNAPSHOT,
> org.apache.drill.exec:drill-java-exec:jar:tests:1.11.0-SNAPSHOT: Could
> not find artifact org.apache.drill.exec:drill-java-exec:jar:1.11.0-SNAPSHOT
> in mapr-drill-optiq-snapshots (http://repository.mapr.com/
> nexus/content/repositories/drill-optiq/) -> [Help 1]
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute
> goal on project drill-jdbc: Could not resolve dependencies for project
> org.apache.drill.exec:drill-jdbc:jar:1.11.0-SNAPSHOT: The following
> artifacts could not be resolved: 
> org.apache.drill.exec:drill-java-exec:jar:1.11.0-SNAPSHOT,
> org.apache.drill.exec:drill-java-exec:jar:tests:1.11.0-SNAPSHOT: Could
> not find artifact org.apache.drill.exec:drill-java-exec:jar:1.11.0-SNAPSHOT
> in mapr-drill-optiq-snapshots (http://repository.mapr.com/
> nexus/content/repositories/drill-optiq/)
> at org.apache.maven.lifecycle.internal.
> LifecycleDependencyResolver.getDependencies(LifecycleDependencyRe
>
>
> Thanks,
> — C
>
>
> [ERROR] Failed to execute goal org.apache.maven.plugins:
> maven-surefire-plugin:2.17:test (default-test) on project
> drill-java-exec: ExecutionException: java.lang.RuntimeException: There was
> an error in the forked process
> [ERROR] java.lang.NoClassDefFoundError: mockit/internal/state/TestRun
> [ERROR] at org.junit.runner.notification.RunNotifier.
> fireTestRunStarted(RunNotifier.java)
> [ERROR] at org.junit.runner.JUnitCore.run(JUnitCore.java:136)
> [ERROR] at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
> [ERROR] at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.
> createRequestAndRun(JUnitCoreWrapper.java:113)
> [ERROR] at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.
> executeLazy(JUnitCoreWrapper.java:94)
> [ERROR] at org.apache.maven.surefire.junitcore.JUnitCoreWrapper.
> execute(JUnitCoreWrapper.java:58)
> [ERROR] at org.apache.maven.surefire.junitcore.JUnitCoreProvider.
> invoke(JUnitCoreProvider.java:134)
> [ERROR] at org.apache.maven.surefire.booter.ForkedBooter.
> invokeProviderInSameClassLoader(ForkedBooter.java:200)
> [ERROR] at org.apache.maven.surefire.booter.ForkedBooter.
> runSuitesInProcess(ForkedBooter.java:153)
> [ERROR] at org.apache.maven.surefire.booter.ForkedBooter.main(
> ForkedBooter.java:103)
> [ERROR] Caused by: java.lang.ClassNotFoundException:
> mockit.internal.state.TestRun
> [ERROR] at java.net.URLClassLoader.findClass(URLClassLoader.java:
> 381)
> [ERROR] at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> [ERROR] at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> [ERROR] ... 10 more
> [ERROR] -> [Help 1]
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute
> goal org.apache.maven.plugins:maven-surefire-plugin:2.17:test
> (default-test) on project drill-java-exec: ExecutionException
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute(
> MojoExecutor.java:213)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute(
> MojoExecutor.java:154)
> at org.apache.maven.lifecycle.internal.MojoExecutor.execute(
> MojoExecutor.java:146)
> at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.
> buildProject(LifecycleModuleBuilder.java:117)
> at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.
> buildProject(LifecycleModuleBuilder.java:81)
> at org.apache.maven.lifecycle.internal.builder.singlethreaded.
> SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
> at 

Re: Why rules from all plugins contribute into optimizing any type of query ?

2017-07-03 Thread Muhammad Gelbana
​Thanks a lot everyone.

Aman, your answer is very convincing. You made it clear that since a single
query can involve multiple plugins, all rules provided by at least the
involved plugins must be considered by the planner.
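
To illustrate the matches() specialization Aman describes below, here is a
minimal sketch. DrillScanRel and getGroupScan() are real Drill classes, but
MyPluginGroupScan only stands in for whatever GroupScan your plugin
produces, so treat this as an illustration rather than working plugin code:

@Override
public boolean matches(RelOptRuleCall call) {
    DrillScanRel scan = call.rel(0);
    // Fire only for scans backed by this plugin's own GroupScan, so the rule
    // stays out of the way when the query reads Parquet or another source.
    return scan.getGroupScan() instanceof MyPluginGroupScan;
}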

-- Gelbana

On Mon, Jul 3, 2017 at 4:04 AM, Aman Sinha <amansi...@apache.org> wrote:

> Agree with Ted and Julian's comments and would add one more point: The
> Planner registers the Storage plugin optimizer rules from all the plugins
> [1].  The assumption is that a single query could be querying multiple data
> sources, joining them in Drill etc, so it is the rule's responsibility to
> have the proper constructor and matches() method to specialize it.   For
> example,  if you have a logical planning rule and have a DrillScanRel which
> is supposed to have a GroupScan of JDBCGroupScan,  you could check that in
> your matches() and return False if otherwise.
>
>
> [1]
> https://github.com/apache/drill/blob/master/exec/java-
> exec/src/main/java/org/apache/drill/exec/planner/PlannerPhase.java#L197
>
> Aman
>
> On Sun, Jul 2, 2017 at 11:47 AM, Julian Hyde <jh...@apache.org> wrote:
>
> > What Ted said.
> >
> > But also, conversely, you should know that in Calcite you can write a
> > general-purpose rule. Or better, re-use a general-purpose rule that
> someone
> > else has written. There are logical rules, for example constant reduction
> > and logic simplification, that work regardless of the data source. And
> > there are common patterns, for example pushing down projects and filters,
> > that apply to many different data sources. You would want to push down
> > projects and filters to Parquet just as would would want to push them
> into
> > a JDBC source, albeit that Parquet cannot handle as rich an expression
> > language.
> >
> > General-purpose rules often take classes as constructor parameters, so
> you
> > can instantiate the rule to look for a FooProject.class rather than a
> > JdbcProject.class or LogicalProject.class.
> >
> > Julian
> >
> >
> > > On Jul 2, 2017, at 11:08 AM, Ted Dunning <ted.dunn...@gmail.com>
> wrote:
> > >
> > > It all depends on how you write your rules. If you write them so that
> > they
> > > apply too generally, then the rules themselves are at fault.
> > >
> > > If you write rules that only optimize for your input format, then you
> > > should be fine.
> > >
> > >
> > > On Jul 2, 2017 9:41 AM, "Muhammad Gelbana" <m.gelb...@gmail.com>
> wrote:
> > >
> > >> I wrote a plugin for a custom JDBC datasource. This plugin registers a
> > >> couple of rules.
> > >>
> > >> When I ran an SQL query that uses parquet files, I found that a rule
> of
> > my
> > >> JDBC plugin was invoked to optimize the query !
> > >>
> > >> I believe this is a mistake. Please correct me if I'm wrong.
> > >>
> > >> I'm saying this is a mistake because a rule registered by a plugin
> that
> > >> utilizes a specific datasource should only be concerned about queries
> > >> executed by that plugin.
> > >>
> > >> Query optimizations for a JDBC datasource won't probably work for
> > queries
> > >> targeted for parquet files !
> > >>
> > >> What do you think ?
> > >>
> > >> Gelbana
> > >>
> >
> >
>


Why rules from all plugins contribute into optimizing any type of query ?

2017-07-02 Thread Muhammad Gelbana
I wrote a plugin for a custom JDBC datasource. This plugin registers a
couple of rules.

When I ran an SQL query that uses parquet files, I found that a rule of my
JDBC plugin was invoked to optimize the query !

I believe this is a mistake. Please correct me if I'm wrong.

I'm saying this is a mistake because a rule registered by a plugin that
utilizes a specific datasource should only be concerned about queries
executed by that plugin.

Query optimizations for a JDBC datasource probably won't work for queries
targeting parquet files !

What do you think ?

Gelbana


Why Drill required a special Calcite fork ?

2017-06-30 Thread Muhammad Gelbana
Would someone please list the reasons for which Drill required having a
custom Calcite build ? This will help with integrating Calcite whenever a
new release is out.

One reason I can assume is that Calcite didn't support unparsing all SQL
clauses, such as FETCH and OFFSET ?

I myself very much need to use the latest Calcite version with Drill, and
I'm willing to spend time working on that, if that's feasible given my
limited Drill and Calcite knowledge.

​-
Gelbana


Re: FindHardDistributionScans throws a NPE while visiting a TableScan

2017-06-24 Thread Muhammad Gelbana
With pleasure. I can't successfully run all test cases
<https://issues.apache.org/jira/browse/DRILL-5606> ATM. When I overcome
that, I'll push the fix.

*-----*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Wed, Jun 21, 2017 at 1:47 PM, Khurram Faraaz <kfar...@mapr.com> wrote:

> Muhammad, please create a pull request and someone will review your code,
> ensuring that existing unit tests don't fail due to your changes.
>
>
> Thanks,
>
> Khurram
>
> ________
> From: Muhammad Gelbana <m.gelb...@gmail.com>
> Sent: Wednesday, June 21, 2017 4:11:41 PM
> To: dev@drill.apache.org
> Subject: Re: FindHardDistributionScans throws a NPE while visiting a
> TableScan
>
> This has been bugging me for sometime, and I've only solved it after
> starting this thread !
>
> I solved this by overriding the
> *org.apache.calcite.rel.AbstractRelNode.accept(RelShuttle)* method for the
> relational node(s) containing *JdbcTableScan* to avoid this.
>
> @Override
> > public RelNode accept(RelShuttle shuttle) {
> >
> > if(shuttle.getClass().getName().equals("org.apache.drill.
> exec.planner.sql.handlers.FindHardDistributionScans")){
> > return this;
> > }
> > return super.accept(shuttle);
> > }
>
>
> If someone finds this introducing another bug, please tell me about it.
>
>
> *-*
> *Muhammad Gelbana*
> http://www.linkedin.com/in/mgelbana
>
> On Tue, Jun 20, 2017 at 2:13 AM, Jinfeng Ni <j...@apache.org> wrote:
>
> > unwrap() essentially is doing a cast.  If it returns null for
> > unwrap(DrillTranslatableTable.class) or unwrap(DrillTable.class), it
> means
> the table associated with this TableScan does not implement either
> interface. My suspicion is that the JDBC storage plugin returns JdbcTable [1],
> unlike other storage plugins, which return an instance implementing
> > DrillTable.
> >
> > This seems to indicate FindHardDistributionScans could not be used to
> > non-DrillTable. I'm not sure if that's the intention of that code,
> though.
> >
> > 1.
> > https://github.com/apache/calcite/blob/master/core/src/
> > main/java/org/apache/calcite/adapter/jdbc/JdbcSchema.java#L233-L234
> >
> > On Mon, Jun 19, 2017 at 2:19 PM, Muhammad Gelbana <m.gelb...@gmail.com>
> > wrote:
> >
> > > Everyone,
> > >
> > > I made a copy of the Jdbc plugin and made modifications to it by
> adding a
> > > few rules. None of the modification I made or the rules I wrote should
> > have
> > > anything extra to do with handling the following SQL query
> > >
> > > SELECT * FROM incorta.SALES.SALES SALES WHERE 1 = 2 LIMIT 1
> > >
> > >
> > > I know the query is useless, but I need to to know how to fix the
> > following
> > > error thrown while handling this query. This is the final query plan:
> > >
> > > DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:Convert SUM to
> > $SUM0
> > > > (0ms):
> > > > DrillLimitRel(*fetch=[1]*): rowcount = 1.0, cumulative cost = {201.0
> > > > rows, 205.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2653
> > > >   DrillLimitRel(*offset=[0], fetch=[0]*): rowcount = 1.0, cumulative
> > cost
> > > > = {200.0 rows, 201.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2651
> > > > GelbanaJdbcDrel: rowcount = 100.0, cumulative cost = {200.0 rows,
> > > > 201.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2649
> > > >   JdbcTableScan(table=[[gelbana, SALES, SALES]]): rowcount =
> 100.0,
> > > > cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io, 0.0 network, 0.0
> > > memory},
> > > > id = 2572
> > >
> > >
> > > This is the throw error stacktrace
> > >
> > > [Error Id: 83ea094a-db24-4d6d-bf0d-271db26db933 on 640fb7ebbd1a:31010]
> > > at
> > > org.apache.drill.common.exceptions.UserException$
> > > Builder.build(UserException.java:543)
> > > ~[drill-common-1.9.0.jar:1.9.0]
> > > at
> > > org.apache.drill.exec.work.foreman.Foreman$ForemanResult.
> > > close(Foreman.java:825)
> > > [drill-java-exec-1.9.0.jar:1.9.0]
> > > at org.apache.drill.exec.work.foreman.Foreman.moveToState(
> > > Foreman.java:935)
> > > [drill-java-exec-1.9.0.jar:1.9.0]
> > > at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:281)
> > > [drill-java-exec-1.9.0.jar:1.9.0]
> > > at
> > > java.util.concurrent.ThreadPoolExecut

Re: Drill test cases failures after a fresh clone

2017-06-24 Thread Muhammad Gelbana
Thank you all for your replies. Here is the jira issue as requested.
https://issues.apache.org/jira/browse/DRILL-5606

Please tell me right away if you need more information.

*-*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Fri, Jun 23, 2017 at 7:52 PM, Paul Rogers <prog...@mapr.com> wrote:

> Hi Muhammad,
>
> The tests have worked for me. That said, we do want them to work “out of
> the box” for everyone, so let’s see if we can track down the issues you are
> having.
>
> TestClassTransformation.testCompilationNoDebug is a timeout issue.
> Sometimes this just means that tests run slowly, perhaps due to other load
> on the system. When that happens, JUnit fails them due to a hard-coded,
> fixed timeout. Was there other load on your machine while running the tests?
>
> The others look like functional issues. We need more details since the
> summary messages are truncated. Did you happen to capture the entire test
> output? If so, you can find the full details for each test so we can see
> what’s happening.
>
> Otherwise, you can run the test individually:
>
> > cd exec/java-exec
> > mvn surefire:test -Dtest=TestName
>
> Try this with the four tests that failed (in one case, there were two
> failures in the same test class.)
>
> You can also run the four tests in your IDE. Now, sometimes I've found that
> tests work fine in one of the three cases, but fail in others. (Maven sets
> up a bunch of system properties, for example. Maven runs tests in a single
> JVM, which can occasionally cause problems, but your IDE will launch a new
> JVM for each run…)
>
> Once we see what you find, we can start to figure out what is going on.
>
> To make this easier to track, I’d suggest opening a JIRA ticket to report
> the issue, then posting your test results there. We’ll need that JIRA if we
> have to make any fixes to tests.
>
> Thanks,
>
> - Paul
>
> > On Jun 23, 2017, at 4:35 AM, Muhammad Gelbana <m.gelb...@gmail.com>
> wrote:
> >
> > I intend to create a pull request, so I forked Drill on github and cloned
> > the repository locally on my machine.
> >
> > I built the project by running
> > *mvn clean install -DskipTests*
> >
> > Then I tried running the tests by running
> > *mvn clean install*
> >
> > Then I started facing test failures. The first time, the following tests
> > failed:
> >
> > Tests in error:
> >>
> >> TestClassTransformation.testJDKClassCompiler:73->
> compilationInnerClass:118
> >> »  ...
> >>  TestClassTransformation.testCompilationNoDebug:88 »  test timed out
> >> after 5000...
> >> *  TestCastFunctions.testToDateForTimeStamp:79 »  at position 0 column
> >> '`col`' mi...*
> >>  TestNewMathFunctions.testTrigoMathFunc:111->runTest:85 »
> ExecutionSetup
> >> Failur...
> >> *  TestNewDateFunctions.testIsDate:61 »  After matching 0 records, did
> not
> >> find e...*
> >>
> >
> > I thought to run the tests again because I found a test case failing due
> to
> > a timeout, so after the second run, the following tests failed *again*:
> >
> > Tests in error:
> >> *  TestCastFunctions.testToDateForTimeStamp:79 »  at position 0 column
> >> '`col`' mi...*
> >> *  TestNewDateFunctions.testIsDate:61 »  After matching 0 records, did
> not
> >> find e...*
> >
> >
> > Running tests take around half an hour and I was wondering if there is
> > something I need to setup or configure before I can run the tests and get
> > 100% success rate since I haven't modified anything yet.
> >
> > *-*
> > *Muhammad Gelbana*
> > http://www.linkedin.com/in/mgelbana
>
>


[jira] [Created] (DRILL-5606) Some tests fail after creating a fresh clone

2017-06-24 Thread Muhammad Gelbana (JIRA)
Muhammad Gelbana created DRILL-5606:
---

 Summary: Some tests fail after creating a fresh clone
 Key: DRILL-5606
 URL: https://issues.apache.org/jira/browse/DRILL-5606
 Project: Apache Drill
  Issue Type: Bug
  Components: Tools, Build & Test
 Environment: {noformat}
$ uname -a
Linux mg-mate 4.4.0-81-generic #104-Ubuntu SMP Wed Jun 14 08:17:06 UTC 2017 
x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:Ubuntu 16.04.2 LTS
Release:16.04
Codename:   xenial

$ java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-0ubuntu1.16.04.2-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
{noformat}

Environment variables JAVA_HOME, JRE_HOME and JDK_HOME aren't configured. The Java
executable is found because the PATH environment variable points to it. I can
provide more details if needed.
Reporter: Muhammad Gelbana


I cloned Drill from Github using this url: 
[https://github.com/apache/drill.git] and I didn't change the branch 
afterwards, so I'm using *master*.

Afterwards, I ran the following command

{noformat}
mvn clean install
{noformat}

I attached the full log but here is a snippet indicating the failing tests:
{noformat}
Failed tests: 
  TestExtendedTypes.checkReadWriteExtended:60 expected:<...ateDay" : 
"1997-07-1[6"
  },
  "drill_timestamp" : {
"$date" : "2009-02-23T08:00:00.000Z"
  },
  "time" : {
"$time" : "19:20:30.450Z"
  },
  "interval" : {
"$interval" : "PT26.400S"
  },
  "integer" : {
"$numberLong" : 4
  },
  "inner" : {
"bin" : {
  "$binary" : "ZHJpbGw="
},
"drill_date" : {
  "$dateDay" : "1997-07-16]"
},
"drill_...> but was:<...ateDay" : "1997-07-1[5"
  },
  "drill_timestamp" : {
"$date" : "2009-02-23T08:00:00.000Z"
  },
  "time" : {
"$time" : "19:20:30.450Z"
  },
  "interval" : {
"$interval" : "PT26.400S"
  },
  "integer" : {
"$numberLong" : 4
  },
  "inner" : {
"bin" : {
  "$binary" : "ZHJpbGw="
},
"drill_date" : {
  "$dateDay" : "1997-07-15]"
},
"drill_...>

Tests in error: 
  TestCastFunctions.testToDateForTimeStamp:79 »  at position 0 column '`col`' 
mi...
  TestNewDateFunctions.testIsDate:61 »  After matching 0 records, did not find 
e...

Tests run: 2128, Failures: 1, Errors: 2, Skipped: 139

[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Drill Root POM .. SUCCESS [ 19.805 s]
[INFO] tools/Parent Pom ... SUCCESS [  0.605 s]
[INFO] tools/freemarker codegen tooling ... SUCCESS [  7.077 s]
[INFO] Drill Protocol . SUCCESS [  7.959 s]
[INFO] Common (Logical Plan, Base expressions)  SUCCESS [  7.734 s]
[INFO] Logical Plan, Base expressions . SUCCESS [  8.099 s]
[INFO] exec/Parent Pom  SUCCESS [  0.575 s]
[INFO] exec/memory/Parent Pom . SUCCESS [  0.513 s]
[INFO] exec/memory/base ... SUCCESS [  4.666 s]
[INFO] exec/rpc ... SUCCESS [  2.684 s]
[INFO] exec/Vectors ... SUCCESS [01:11 min]
[INFO] contrib/Parent Pom . SUCCESS [  0.547 s]
[INFO] contrib/data/Parent Pom  SUCCESS [  0.496 s]
[INFO] contrib/data/tpch-sample-data .. SUCCESS [  2.698 s]
[INFO] exec/Java Execution Engine . FAILURE [19:09 min]
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Drill test cases failures after a fresh clone

2017-06-23 Thread Muhammad Gelbana
I intend to create a pull request, so I forked Drill on github and cloned
the repository locally on my machine.

I built the project by running
*mvn clean install -DskipTests*

Then I tried running the tests by running
*mvn clean install*

Then I started facing test failures. The first time, the following tests
failed:

Tests in error:
>
> TestClassTransformation.testJDKClassCompiler:73->compilationInnerClass:118
> »  ...
>   TestClassTransformation.testCompilationNoDebug:88 »  test timed out
> after 5000...
> *  TestCastFunctions.testToDateForTimeStamp:79 »  at position 0 column
> '`col`' mi...*
>   TestNewMathFunctions.testTrigoMathFunc:111->runTest:85 » ExecutionSetup
> Failur...
> *  TestNewDateFunctions.testIsDate:61 »  After matching 0 records, did not
> find e...*
>

I thought to run the tests again because I found a test case failing due to
a timeout, so after the second run, the following tests failed *again*:

Tests in error:
> *  TestCastFunctions.testToDateForTimeStamp:79 »  at position 0 column
> '`col`' mi...*
> *  TestNewDateFunctions.testIsDate:61 »  After matching 0 records, did not
> find e...*


Running the tests takes around half an hour, and I was wondering if there is
something I need to set up or configure before I can run the tests and get a
100% success rate, since I haven't modified anything yet.

*-*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana


Re: FindHardDistributionScans throws a NPE while visiting a TableScan

2017-06-21 Thread Muhammad Gelbana
This has been bugging me for some time, and I've only solved it after
starting this thread !

I solved this by overriding the
*org.apache.calcite.rel.AbstractRelNode.accept(RelShuttle)* method for the
relational node(s) containing *JdbcTableScan* to avoid this.

@Override
> public RelNode accept(RelShuttle shuttle) {
>
> if(shuttle.getClass().getName().equals("org.apache.drill.exec.planner.sql.handlers.FindHardDistributionScans")){
> return this;
> }
> return super.accept(shuttle);
> }


If someone finds this introducing another bug, please tell me about it.
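
As a side note on Jinfeng's point quoted below: the NPE comes from chaining
getDrillTable() onto an unwrap() call that can return null for non-Drill
tables such as Calcite's plain JdbcTable. A null-guarded version of that
lookup, sketched here against the snippet in this thread rather than copied
from the Drill source, would look like:

@Override
public RelNode visit(TableScan scan) {
    DrillTable drillTable = scan.getTable().unwrap(DrillTable.class);
    if (drillTable == null) {
        DrillTranslatableTable translatable =
            scan.getTable().unwrap(DrillTranslatableTable.class);
        // Either unwrap() can return null when the table is not a Drill-native
        // table, so guard before dereferencing.
        drillTable = (translatable == null) ? null : translatable.getDrillTable();
    }
    if (drillTable == null) {
        return scan; // skip tables Drill does not manage instead of throwing
    }
    // ... the existing distribution check on drillTable goes here ...
    return scan;
}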


*-----*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Tue, Jun 20, 2017 at 2:13 AM, Jinfeng Ni <j...@apache.org> wrote:

> unwrap() essentially is doing a cast.  If it returns null for
> unwrap(DrillTranslatableTable.class) or unwrap(DrillTable.class), it means
> the table associate with this TableScan does not implement either
> interface. My suspicion is  JDBC storage plugin returns JdbcTable [1],
> unlikely other storage plugin which returns an instance implementing
> DrillTable.
>
> This seems to indicate FindHardDistributionScans could not be used to
> non-DrillTable. I'm not sure if that's the intention of that code, though.
>
> 1.
> https://github.com/apache/calcite/blob/master/core/src/
> main/java/org/apache/calcite/adapter/jdbc/JdbcSchema.java#L233-L234
>
> On Mon, Jun 19, 2017 at 2:19 PM, Muhammad Gelbana <m.gelb...@gmail.com>
> wrote:
>
> > Everyone,
> >
> > I made a copy of the Jdbc plugin and made modifications to it by adding a
> > few rules. None of the modification I made or the rules I wrote should
> have
> > anything extra to do with handling the following SQL query
> >
> > SELECT * FROM incorta.SALES.SALES SALES WHERE 1 = 2 LIMIT 1
> >
> >
> > I know the query is useless, but I need to to know how to fix the
> following
> > error thrown while handling this query. This is the final query plan:
> >
> > DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:Convert SUM to
> $SUM0
> > > (0ms):
> > > DrillLimitRel(*fetch=[1]*): rowcount = 1.0, cumulative cost = {201.0
> > > rows, 205.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2653
> > >   DrillLimitRel(*offset=[0], fetch=[0]*): rowcount = 1.0, cumulative
> cost
> > > = {200.0 rows, 201.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2651
> > > GelbanaJdbcDrel: rowcount = 100.0, cumulative cost = {200.0 rows,
> > > 201.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 2649
> > >   JdbcTableScan(table=[[gelbana, SALES, SALES]]): rowcount = 100.0,
> > > cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io, 0.0 network, 0.0
> > memory},
> > > id = 2572
> >
> >
> > This is the throw error stacktrace
> >
> > [Error Id: 83ea094a-db24-4d6d-bf0d-271db26db933 on 640fb7ebbd1a:31010]
> > at
> > org.apache.drill.common.exceptions.UserException$
> > Builder.build(UserException.java:543)
> > ~[drill-common-1.9.0.jar:1.9.0]
> > at
> > org.apache.drill.exec.work.foreman.Foreman$ForemanResult.
> > close(Foreman.java:825)
> > [drill-java-exec-1.9.0.jar:1.9.0]
> > at org.apache.drill.exec.work.foreman.Foreman.moveToState(
> > Foreman.java:935)
> > [drill-java-exec-1.9.0.jar:1.9.0]
> > at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:281)
> > [drill-java-exec-1.9.0.jar:1.9.0]
> > at
> > java.util.concurrent.ThreadPoolExecutor.runWorker(
> > ThreadPoolExecutor.java:1142)
> > [na:1.8.0_131]
> > at
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(
> > ThreadPoolExecutor.java:617)
> > [na:1.8.0_131]
> > at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
> > Caused by: org.apache.drill.exec.work.foreman.ForemanException:
> Unexpected
> > exception during fragment initialization: null
> > ... 4 common frames omitted
> > *Caused by: java.lang.NullPointerException: null*
> > at
> > org.apache.drill.exec.planner.sql.handlers.FindHardDistributionScans.
> > visit(FindHardDistributionScans.java:55)
> > ~[drill-java-exec-1.9.0.jar:1.9.0]
> > at org.apache.calcite.rel.core.TableScan.accept(TableScan.java:166)
> > ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
> > at org.apache.calcite.rel.RelShuttleImpl.visitChild(
> > RelShuttleImpl.java:53)
> > ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
> > at
> > org.apache.calcite.rel.RelShuttleImpl.visitChildren(
> > RelShuttleImpl.java:68)
> > ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
> > at org.apache.calcite.rel.RelShuttleImpl.visit(RelSh

FindHardDistributionScans throws a NPE while visiting a TableScan

2017-06-19 Thread Muhammad Gelbana
Which is because this statement

unwrap =
> scan.getTable().unwrap(DrillTranslatableTable.class).getDrillTable();


in *FindHardDistributionScans.java:55* throws, because
*scan.getTable().unwrap(DrillTranslatableTable.class)* evaluates to null.

Would someone please explain to me what Drill is trying to do and what I
did wrong ?

*-----*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana


[jira] [Created] (DRILL-5583) Literal expression not handled

2017-06-13 Thread Muhammad Gelbana (JIRA)
Muhammad Gelbana created DRILL-5583:
---

 Summary: Literal expression not handled
 Key: DRILL-5583
 URL: https://issues.apache.org/jira/browse/DRILL-5583
 Project: Apache Drill
  Issue Type: Bug
  Components: SQL Parser
Affects Versions: 1.9.0
Reporter: Muhammad Gelbana


The following query
{code:sql}
SELECT ((UNIX_TIMESTAMP(Calcs.`date0`, '-MM-dd') / (60 * 60 * 24)) + (365 * 
70 + 17)) `TEMP(Test)(64617177)(0)` FROM `dfs`.`path_to_parquet` Calcs GROUP BY 
((UNIX_TIMESTAMP(Calcs.`date0`, '-MM-dd') / (60 * 60 * 24)) + (365 * 70 + 
17))
{code}

Throws the following exception
{noformat}
[Error Id: 5ee33c0f-9edc-43a0-8125-3e6499e72410 on mgelbana:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: AssertionError: 
Internal error: invalid literal: 60 + 2


[Error Id: 5ee33c0f-9edc-43a0-8125-3e6499e72410 on mgelbana:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
 ~[drill-common-1.9.0.jar:1.9.0]
at 
org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:825)
 [drill-java-exec-1.9.0.jar:1.9.0]
at 
org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:935) 
[drill-java-exec-1.9.0.jar:1.9.0]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:281) 
[drill-java-exec-1.9.0.jar:1.9.0]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_131]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected 
exception during fragment initialization: Internal error: invalid literal: 60 + 
2
... 4 common frames omitted
Caused by: java.lang.AssertionError: Internal error: invalid literal: 60 + 2
at org.apache.calcite.util.Util.newInternal(Util.java:777) 
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.SqlLiteral.value(SqlLiteral.java:329) 
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.SqlCallBinding.getOperandLiteralValue(SqlCallBinding.java:219)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.SqlBinaryOperator.getMonotonicity(SqlBinaryOperator.java:188)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.drill.exec.planner.sql.DrillCalciteSqlOperatorWrapper.getMonotonicity(DrillCalciteSqlOperatorWrapper.java:107)
 ~[drill-java-exec-1.9.0.jar:1.9.0]
at org.apache.calcite.sql.SqlCall.getMonotonicity(SqlCall.java:175) 
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.SqlCallBinding.getOperandMonotonicity(SqlCallBinding.java:193)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.fun.SqlMonotonicBinaryOperator.getMonotonicity(SqlMonotonicBinaryOperator.java:59)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.drill.exec.planner.sql.DrillCalciteSqlOperatorWrapper.getMonotonicity(DrillCalciteSqlOperatorWrapper.java:107)
 ~[drill-java-exec-1.9.0.jar:1.9.0]
at org.apache.calcite.sql.SqlCall.getMonotonicity(SqlCall.java:175) 
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.SelectScope.getMonotonicity(SelectScope.java:154)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.createAggImpl(SqlToRelConverter.java:2476)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertAgg(SqlToRelConverter.java:2374)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:603)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:564)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:2769)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:518)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.drill.exec.planner.sql.SqlConverter.toRel(SqlConverter.java:263) 
~[drill-java-exec-1.9.0.jar:1.9.0]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRel(DefaultSqlHandler.java:626)
 ~[drill-java-exec-1.9.0.jar:1.9.0]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert(DefaultSqlHandler.java:195)
 ~[drill-java-exec-1.9.0.jar:1.9.0

Upgrading Calcite's version

2017-06-02 Thread Muhammad Gelbana
Was the currently used version of Calcite (based on v1.4 ?) modified in
any way before it was used in building Drill ?

I'm considering creating a new build of Drill with the latest version of
Calcite and I need to understand the amount of effort needed.

The reason I want to do that is that I need a feature that exists in a more
recent version of Calcite, which is pushing down aggregates without using
subqueries.

*-*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana


Why isn't Drill using a more recent version of Calcite ?

2017-05-28 Thread Muhammad Gelbana
Drill is using Calcite v1.4 while Calcite has already reached v1.12. For myself,
there is no reason to upgrade at the moment, but I'm wondering why Drill hasn't
upgraded its version of such a core component as Calcite for a bit more than a
year and a half now ?

*-*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana


Issues categorization suggestion

2017-05-25 Thread Muhammad Gelbana
Hi,

I suggest categorizing issues according to the level of expertise needed to
solve each one. This would encourage developers who want to help and learn,
but aren't yet experienced with Drill, to take a thorough look at issues that
don't require a high level of experience with Drill.

What do you think ?

*-*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana


[jira] [Created] (DRILL-5539) drillbit.sh script breaks if the working directory contains spaces

2017-05-25 Thread Muhammad Gelbana (JIRA)
Muhammad Gelbana created DRILL-5539:
---

 Summary: drillbit.sh script breaks if the working directory 
contains spaces
 Key: DRILL-5539
 URL: https://issues.apache.org/jira/browse/DRILL-5539
 Project: Apache Drill
  Issue Type: Bug
 Environment: Linux
Reporter: Muhammad Gelbana


The following output occurred when we tried running the drillbit.sh script in a 
path that contains spaces: */home/folder1/Folder Name/drill/bin*

{noformat}
[mgelbana@regression-sysops bin]$ ./drillbit.sh start
./drillbit.sh: line 114: [: /home/folder1/Folder: binary operator expected
Starting drillbit, logging to /home/folder1/Folder Name/drill/log/drillbit.out
./drillbit.sh: line 147: $pid: ambiguous redirect
[mgelbana@regression-sysops bin]$ pwd
/home/folder1/Folder Name/drill/bin
{noformat}





Re: Running cartesian joins on Drill

2017-05-16 Thread Muhammad Gelbana
You are correct Aman. Here is the JIRA issue
<https://issues.apache.org/jira/browse/DRILL-5515>

This thread has been very helpful. Thank you all.

*-----*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Fri, May 12, 2017 at 6:50 AM, Aman Sinha <asi...@mapr.com> wrote:

> Muhammad,
> The join condition  ‘a = b or (a is null && b is null)’ works.
> Internally, this is converted to  ‘a is not distinct from b’ which is
> processed by Drill.
> For some reason, if the second form is directly supplied in the user
> query, it is not working and ends up with the Cartesian join condition.
> Drill leverages Calcite for this (you can see CALCITE-1200 for some
> background).
> Can you file a JIRA for this ?
>
> -Aman
>
> From: "Aman Sinha (asi...@mapr.com)" <asi...@mapr.com>
> Date: Thursday, May 11, 2017 at 4:29 PM
> To: dev <dev@drill.apache.org>, user <u...@drill.apache.org>
> Cc: Shadi Khalifa <khal...@cs.queensu.ca>
> Subject: Re: Running cartesian joins on Drill
>
>
> I think Muhammad may be trying to run his original query with IS NOT
> DISTINCT FROM.   That discussion got side-tracked into Cartesian joins
> because his query was not getting planned and the error was about Cartesian
> join.
>
> Muhammad,  can you try with the equivalent version below ?  You mentioned
> the rewrite but did you try the rewritten version ?
>
>
>
> SELECT * FROM (SELECT 'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc
>
> LIMIT 2147483647) `t0` INNER JOIN (SELECT 'ABC' `UserID` FROM
>
> `dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t1` ON (
>
> ​​
>
> `t0`.`UserID` = `t1`.`UserID` OR (`t0`.`UserID` IS NULL && `t1`.`UserID`
> IS NULL) )
>
>
>
> On 5/11/17, 3:23 PM, "Zelaine Fong" <zf...@mapr.com> wrote:
>
>
>
> I’m not sure why it isn’t working for you.  Using Drill 1.10, here’s
> my output:
>
>
>
> 0: jdbc:drill:zk=local> alter session set 
> `planner.enable_nljoin_for_scalar_only`
> = false;
>
> +-------+--------------------------------------------------+
> |  ok   |                     summary                      |
> +-------+--------------------------------------------------+
> | true  | planner.enable_nljoin_for_scalar_only updated.   |
> +-------+--------------------------------------------------+
>
> 1 row selected (0.137 seconds)
>
> 0: jdbc:drill:zk=local> explain plan for select * from
> dfs.`/Users/zfong/foo.csv` t1, dfs.`/Users/zfong/foo.csv` t2;
>
> +--+--+
>
> | text | json |
>
> +--+--+
>
> | 00-00Screen
>
> 00-01  ProjectAllowDup(*=[$0], *0=[$1])
>
> 00-02NestedLoopJoin(condition=[true], joinType=[inner])
>
> 00-04  Project(T2¦¦*=[$0])
>
> 00-06Scan(groupscan=[EasyGroupScan
> [selectionRoot=file:/Users/zfong/foo.csv, numFiles=1, columns=[`*`],
> files=[file:/Users/zfong/foo.csv]]])
>
> 00-03  Project(T3¦¦*=[$0])
>
> 00-05Scan(groupscan=[EasyGroupScan
> [selectionRoot=file:/Users/zfong/foo.csv, numFiles=1, columns=[`*`],
> files=[file:/Users/zfong/foo.csv]]])
>
>
>
> -- Zelaine
>
>
>
> On 5/11/17, 3:17 PM, "Muhammad Gelbana" <m.gelb...@gmail.com> wrote:
>
>
>
> ​But the query I provided failed to be planned because it's a
> cartesian
>
> join, although I've set the option you mentioned to false. Is
> there a
>
> reason why wouldn't Drill rules physically implement the logical
> join in my
>
> query to a nested loop join ?
>
>
>
> *-*
>
> *Muhammad Gelbana*
>
> http://www.linkedin.com/in/mgelbana
>
>
>
> On Thu, May 11, 2017 at 5:05 PM, Zelaine Fong <zf...@mapr.com>
> wrote:
>
>
>
> > Provided `planner.enable_nljoin_for_scalar_only` is set to
> false, even
>
> > without an explicit join condition, the query should use the
> Cartesian
>
> > join/nested loop join.
>
> >
>
> > -- Zelaine
>
> >
>
> > On 5/11/17, 4:20 AM, "Anup Tiwari" <anup.tiw...@games24x7.com>
> wrote:
>
> >
>
> > Hi,
>
> >
>
> > I have one question here.. so if we have to use Cartesian
> join in Drill
>
> > then do we have to follow some workaround like Shadi mention
> : adding a
>
> > dummy column on the fly that has the value 1 in

[jira] [Created] (DRILL-5515) "IS NOT DISTINCT FROM" and its equivalent form aren't handled likewise

2017-05-16 Thread Muhammad Gelbana (JIRA)
Muhammad Gelbana created DRILL-5515:
---

 Summary: "IS NO DISTINCT FROM" and it's equivalent form aren't 
handled likewise
 Key: DRILL-5515
 URL: https://issues.apache.org/jira/browse/DRILL-5515
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.10.0, 1.9.0
Reporter: Muhammad Gelbana


The following query fails to execute
{code:sql}SELECT * FROM (SELECT `UserID` FROM `dfs`.`path_to_parquet` tc) `t0` 
INNER JOIN (SELECT `UserID` FROM `dfs`.`path_to_parquet` tc) `t1` ON 
(`t0`.`UserID` IS NOT DISTINCT FROM `t1`.`UserID`){code}
and produces the following error message
{noformat}org.apache.drill.common.exceptions.UserRemoteException: 
UNSUPPORTED_OPERATION ERROR: This query cannot be planned possibly due to 
either a cartesian join or an inequality join [Error Id: 
0bd41e06-ccd7-45d6-a038-3359bf5a4a7f on mgelbana-incorta:31010]{noformat}
While the query's equivalent form runs fine
{code:sql}SELECT * FROM (SELECT `UserID` FROM `dfs`.`path_to_parquet` tc) `t0` 
INNER JOIN (SELECT `UserID` FROM `dfs`.`path_to_parquet` tc) `t1` ON 
(`t0`.`UserID` = `t1`.`UserID` OR (`t0`.`UserID` IS NULL AND `t1`.`UserID` IS 
NULL)){code}





Re: Running cartesian joins on Drill

2017-05-11 Thread Muhammad Gelbana
​But the query I provided failed to be planned because it's a cartesian
join, although I've set the option you mentioned to false. Is there a
reason why Drill's rules wouldn't physically implement the logical join in my
query as a nested loop join ?

*-*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Thu, May 11, 2017 at 5:05 PM, Zelaine Fong <zf...@mapr.com> wrote:

> Provided `planner.enable_nljoin_for_scalar_only` is set to false, even
> without an explicit join condition, the query should use the Cartesian
> join/nested loop join.
>
> -- Zelaine
>
> On 5/11/17, 4:20 AM, "Anup Tiwari" <anup.tiw...@games24x7.com> wrote:
>
> Hi,
>
> I have one question here.. so if we have to use Cartesian join in Drill
> then do we have to follow some workaround like Shadi mention : adding a
> dummy column on the fly that has the value 1 in both tables and then
> join
> on that column leading to having a match of every row of the first
> table
> with every row of the second table, hence do a Cartesian product?
> OR
> If we just don't specify join condition like :
> select a.*, b.* from tt1 as a, tt2 b; then will it internally treat
> this
> query as Cartesian join.
>
> Regards,
> *Anup Tiwari*
>
> On Mon, May 8, 2017 at 10:00 PM, Zelaine Fong <zf...@mapr.com> wrote:
>
> > Cartesian joins in Drill are implemented as nested loop joins, and I
> think
> > you should see that reflected in the resultant query plan when you
> run
> > explain plan on the query.
> >
> > Yes, Cartesian joins/nested loop joins are expensive because you’re
> > effectively doing an MxN read of your tables.  There are more
> efficient
> > ways of processing a nested loop join, e.g., by creating an index on
> the
> > larger table in the join and then using that index to do lookups
> into that
> > table.  That way, the nested loop join cost is the cost of creating
> the
> > index + M, where M is the number of rows in the smaller table and
> assuming
> > the lookup cost into the index does minimize the amount of data read
> of the
> > second table.  Drill currently doesn’t do this.
> >
> > -- Zelaine
> >
> > On 5/8/17, 9:09 AM, "Muhammad Gelbana" <m.gelb...@gmail.com> wrote:
> >
> > ​I believe ​clhubert is referring to this discussion
> > <http://drill-user.incubator.apache.narkive.com/TIXWiTY4/
> > cartesian-product-in-apache-drill#post1>
> > .
> >
> > So why Drill doesn't transform this query into a nested join
> query ?
> > Simply
> > because there is no Calcite rule to transform it into a nested
> loop
> > join ?
> > Is it not technically possible to write such Rule or is it
> feasible so
> > I
> > may take on this challenge ?
> >
> > Also pardon me for repeating my question but I fail to find an
> answer
> > in
> > your replies, why doesn't Drill just run a cartesian join ?
> Because
> > it's
> > expensive regarding resources (i.e. CPU\Network\RAM) ?
> >
> > Thanks a lot Shadi for the query, it works for me.
> >
> > *-*
> > *Muhammad Gelbana*
> > http://www.linkedin.com/in/mgelbana
> >
> > On Mon, May 8, 2017 at 6:10 AM, Shadi Khalifa <
> khal...@cs.queensu.ca>
> > wrote:
> >
> > > Hi Muhammad,
> > >
> > > I did the following as a workaround to have Cartesian product.
> The
> > basic
> > > idea is to create a dummy column on the fly that has the value
> 1 in
> > both
> > > tables and then join on that column leading to having a match
> of
> > every row
> > > of the first table with every row of the second table, hence
> do a
> > Cartesian
> > > product. This might not be the most efficient way but it will
> do the
> > job.
> > >
> > > *Original Query:*
> > > SELECT * FROM
> > > ( SELECT 'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc
> LIMIT
> > > 2147483647) `t0`
> > > INNER JOIN
> > > ( SELECT 'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc
> LIMIT
> > > 2147483647) `t1`
> > > ON (`t0`.`UserID` IS NOT DISTINCT FROM `t1`.`U

Re: Running cartesian joins on Drill

2017-05-08 Thread Muhammad Gelbana
​I believe ​clhubert is referring to this discussion
<http://drill-user.incubator.apache.narkive.com/TIXWiTY4/cartesian-product-in-apache-drill#post1>
.

So why doesn't Drill transform this query into a nested loop join query ? Simply
because there is no Calcite rule to transform it into a nested loop join ?
Is writing such a Rule technically impossible, or is it feasible so that I
may take on this challenge ?

Also, pardon me for repeating my question, but I fail to find an answer in
your replies: why doesn't Drill just run a cartesian join ? Because it's
expensive in terms of resources (i.e. CPU\Network\RAM) ?

Thanks a lot Shadi for the query, it works for me.

*-----*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Mon, May 8, 2017 at 6:10 AM, Shadi Khalifa <khal...@cs.queensu.ca> wrote:

> Hi Muhammad,
>
> I did the following as a workaround to have Cartesian product. The basic
> idea is to create a dummy column on the fly that has the value 1 in both
> tables and then join on that column leading to having a match of every row
> of the first table with every row of the second table, hence do a Cartesian
> product. This might not be the most efficient way but it will do the job.
>
> *Original Query:*
> SELECT * FROM
> ( SELECT 'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc LIMIT
> 2147483647) `t0`
> INNER JOIN
> ( SELECT 'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc LIMIT
> 2147483647) `t1`
> ON (`t0`.`UserID` IS NOT DISTINCT FROM `t1`.`UserID`)
> LIMIT 2147483647
>
> *Workaround (add columns **d1a381f3g73 and **d1a381f3g74 to tables one
> and two, respectively. Names don't really matter, just need to be unique):*
> SELECT * FROM
> ( SELECT *1 as d1a381f3g73*, 'ABC' `UserID` FROM
> `dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t0`
> INNER JOIN
> ( SELECT *1 as d1a381f3g74*, 'ABC' `UserID` FROM
> `dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t1`
> ON (`t0`.*d1a381f3g73 = *`t1`.*d1a381f3g74*)
> WHERE `t0`.`UserID` IS NOT DISTINCT FROM `t1`.`UserID`
> LIMIT 2147483647
>
> Regards
>
>
> *Shadi Khalifa, PhD*
> Postdoctoral Fellow
> Cognitive Analytics Development Hub
> Centre for Advanced Computing
> Queen’s University
> (613) 533-6000 x78347
> http://cac.queensu.ca
>
> I'm just a neuron in the society collective brain
>
> *Join us for HPCS in June 2017! Register at:*  *http://2017.hpcs.ca/
> <http://2017.hpcs.ca/>*
>
> P Please consider your environmental responsibility before printing this
> e-mail
>
> *01001001 0010 01101100 0110 01110110 01100101 0010 01000101
> 01100111 0001 0111 01110100 *
>
> *The information transmitted is intended only for the person or entity to
> which it is addressed and may contain confidential material. Any review or
> dissemination of this information by persons other than the intended
> recipient is prohibited. If you received this in error, please contact the
> sender and delete the material from any computer. Thank you.*
>
>
>
> On Saturday, May 6, 2017 6:05 PM, Muhammad Gelbana <m.gelb...@gmail.com>
> wrote:
>
>
> ​​
> Here it is:
>
> SELECT * FROM (SELECT 'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc
> LIMIT 2147483647) `t0` INNER JOIN (SELECT 'ABC' `UserID` FROM
> `dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t1` ON (
> ​​
> `t0`.`UserID` IS NOT DISTINCT FROM
> ​​
> `t1`.`UserID`) LIMIT 2147483647
>
> I debugged Drill code and found it decomposes *IS NOT DISTINCT FROM* into
> ​
> *`t0`.`UserID` = ​`t1`.`UserID` OR (`t0`.`UserID` IS NULL && `t1`.`UserID`
> IS NULL**)* while checking if the query is a cartesian join, and when the
> check returns true, it throws an excetion saying: *This query cannot be
> planned possibly due to either a cartesian join or an inequality join*
>
>
> *-*
> *Muhammad Gelbana*
> http://www.linkedin.com/in/mgelbana
>
> On Sat, May 6, 2017 at 6:53 PM, Gautam Parai <gpa...@mapr.com> wrote:
>
> > Can you please specify the query you are trying to execute?
> >
> >
> > Gautam
> >
> > 
> > From: Muhammad Gelbana <m.gelb...@gmail.com>
> > Sent: Saturday, May 6, 2017 7:34:53 AM
> > To: u...@drill.apache.org; dev@drill.apache.org
> > Subject: Running cartesian joins on Drill
> >
> > Is there a reason why Drill would intentionally reject cartesian join
> > queries even if *planner.enable_nljoin_for_scalar_only* is disabled ?
> >
> > Any ideas how could a query be rewritten to overcome this restriction ?
> >
> > *-*
> > *Muhammad Gelbana*
> > http://www.linkedin.com/in/mgelbana
> >
>
>
>


Re: Running cartesian joins on Drill

2017-05-06 Thread Muhammad Gelbana
​​
Here it is:

SELECT * FROM (SELECT 'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc
LIMIT 2147483647) `t0` INNER JOIN (SELECT 'ABC' `UserID` FROM
`dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t1` ON (
​​
`t0`.`UserID` IS NOT DISTINCT FROM
​​
`t1`.`UserID`) LIMIT 2147483647

I debugged the Drill code and found it decomposes *IS NOT DISTINCT FROM* into
*`t0`.`UserID` = `t1`.`UserID` OR (`t0`.`UserID` IS NULL && `t1`.`UserID`
IS NULL)* while checking if the query is a cartesian join, and when the
check returns true, it throws an exception saying: *This query cannot be
planned possibly due to either a cartesian join or an inequality join*


*-*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Sat, May 6, 2017 at 6:53 PM, Gautam Parai <gpa...@mapr.com> wrote:

> Can you please specify the query you are trying to execute?
>
>
> Gautam
>
> ________
> From: Muhammad Gelbana <m.gelb...@gmail.com>
> Sent: Saturday, May 6, 2017 7:34:53 AM
> To: u...@drill.apache.org; dev@drill.apache.org
> Subject: Running cartesian joins on Drill
>
> Is there a reason why Drill would intentionally reject cartesian join
> queries even if *planner.enable_nljoin_for_scalar_only* is disabled ?
>
> Any ideas how could a query be rewritten to overcome this restriction ?
>
> *-*
> *Muhammad Gelbana*
> http://www.linkedin.com/in/mgelbana
>


Running cartesian joins on Drill

2017-05-06 Thread Muhammad Gelbana
Is there a reason why Drill would intentionally reject cartesian join
queries even if *planner.enable_nljoin_for_scalar_only* is disabled ?

Any ideas on how a query could be rewritten to overcome this restriction ?

*-*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana


Understanding the science and concepts behind Calcite

2017-04-29 Thread Muhammad Gelbana
I'm trying to understand the scientific concepts behind Calcite and I was
wondering if anyone would kindly recommend
articles\papers\books\topic-titles that would help me understand Calcite
from the ground up.

For instance, I don't fully understand what the following are:

   - Relational expressions
   - Row expressions
   - Calling conventions
   - Relational traits
   - Relational traits definitions

I'm currently looking for books about "Relational Algebra", but when I look
into one, I can't find anything about traits or calling conventions. Or am
I not searching for the correct keywords ?
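
(For context, here is the kind of code the first two terms describe: a minimal sketch built only against the public Calcite API, not Drill's fork of it. The "EMP" table and its "DEPTNO"/"ENAME" columns are hypothetical and assumed to already exist in the default schema.)

import org.apache.calcite.plan.RelOptUtil;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.sql.fun.SqlStdOperatorTable;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.RelBuilder;

public class RelBuilderSketch {
  public static void main(String[] args) {
    // Assumes the root schema already exposes an "EMP" table.
    FrameworkConfig config = Frameworks.newConfigBuilder()
        .defaultSchema(Frameworks.createRootSchema(true))
        .build();
    RelBuilder builder = RelBuilder.create(config);

    RelNode rel = builder
        .scan("EMP")                       // relational expression: TableScan
        .filter(                           // relational expression: Filter
            builder.call(SqlStdOperatorTable.EQUALS,   // row expression (RexNode): DEPTNO = 10
                builder.field("DEPTNO"),
                builder.literal(10)))
        .project(builder.field("ENAME"))   // relational expression: Project
        .build();

    // Prints the tree of relational expressions, e.g. Project <- Filter <- TableScan.
    System.out.println(RelOptUtil.toString(rel));
  }
}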

*-----*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana


[jira] [Created] (DRILL-5452) Join query cannot be planned although all joins are enabled and "planner.enable_nljoin_for_scalar_only" is disabled

2017-04-28 Thread Muhammad Gelbana (JIRA)
Muhammad Gelbana created DRILL-5452:
---

 Summary: Join query cannot be planned although all joins are 
enabled and "planner.enable_nljoin_for_scalar_only" is disabled
 Key: DRILL-5452
 URL: https://issues.apache.org/jira/browse/DRILL-5452
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.10.0, 1.9.0
Reporter: Muhammad Gelbana


The following query
{code:sql}
SELECT * FROM (SELECT 'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc LIMIT 
2147483647) `t0` INNER JOIN (SELECT 'ABC' `UserID` FROM 
`dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t1` ON (`t0`.`UserID` IS NOT 
DISTINCT FROM `t1`.`UserID`) LIMIT 2147483647{code}

Leads to the following exception

{preformatted}2017-04-28 16:59:11,722 
[26fca73f-92f0-4664-4dca-88bc48265c92:foreman] INFO  
o.a.d.e.planner.sql.DrillSqlWorker - User Error Occurred
org.apache.drill.common.exceptions.UserException: UNSUPPORTED_OPERATION ERROR: 
This query cannot be planned possibly due to either a cartesian join or an 
inequality join


[Error Id: 672b4f2c-02a3-4004-af4b-279759c36c96 ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
 ~[drill-common-1.9.0.jar:1.9.0]
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:107)
 [drill-java-exec-1.9.0.jar:1.9.0]
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1008) 
[drill-java-exec-1.9.0.jar:1.9.0]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:264) 
[drill-java-exec-1.9.0.jar:1.9.0]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_121]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_121]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
Caused by: org.apache.drill.exec.work.foreman.UnsupportedRelOperatorException: 
This query cannot be planned possibly due to either a cartesian join or an 
inequality join
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel(DefaultSqlHandler.java:432)
 ~[drill-java-exec-1.9.0.jar:1.9.0]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:169)
 ~[drill-java-exec-1.9.0.jar:1.9.0]
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(DrillSqlWorker.java:123)
 [drill-java-exec-1.9.0.jar:1.9.0]
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:97)
 [drill-java-exec-1.9.0.jar:1.9.0]
... 5 common frames omitted
2017-04-28 16:59:11,741 [USER-rpc-event-queue] ERROR 
o.a.d.exec.server.rest.QueryWrapper - Query Failed
org.apache.drill.common.exceptions.UserRemoteException: UNSUPPORTED_OPERATION 
ERROR: This query cannot be planned possibly due to either a cartesian join or 
an inequality join


[Error Id: 672b4f2c-02a3-4004-af4b-279759c36c96 on mgelbana-incorta:31010]
at 
org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
 [drill-java-exec-1.9.0.jar:1.9.0]
at 
org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:144) 
[drill-java-exec-1.9.0.jar:1.9.0]
at 
org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46)
 [drill-rpc-1.9.0.jar:1.9.0]
at 
org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31)
 [drill-rpc-1.9.0.jar:1.9.0]
at org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:65) 
[drill-rpc-1.9.0.jar:1.9.0]
at org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:363) 
[drill-rpc-1.9.0.jar:1.9.0]
at 
org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89)
 [drill-rpc-1.9.0.jar:1.9.0]
at 
org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:240) 
[drill-rpc-1.9.0.jar:1.9.0]
at 
org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:123) 
[drill-rpc-1.9.0.jar:1.9.0]
at 
org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:274) 
[drill-rpc-1.9.0.jar:1.9.0]
at 
org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:245) 
[drill-rpc-1.9.0.jar:1.9.0]
at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
 [netty-codec-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 [netty-transport-4.0.27.Final.jar:4.0.27.Final]
at 
io.netty.handler.timeout.IdleStateHandl

Re: Is it possible to delegate data joins and filtering to the datasource ?

2017-04-12 Thread Muhammad Gelbana
I have done it. Thanks a lot Weijie and all of you for your time.

*-*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Thu, Apr 6, 2017 at 3:15 PM, weijie tong <tongweijie...@gmail.com> wrote:

> some tips:
> 1. you need to know the RexInputRef index relationship between the
>  JoinRel's  and its inputs's  .
>
> join ( 1,2 ,3,4,5)
>
> left input(1,2,3) right input (1,2)
>
> 1,2,3,  ===> left input (1 ,2,3)
>
> 4,5 >right input (1,2)
>
> 2. you capture the index map relationship  when you iterate over your
> JoinRelNode of your defined Rule( CartesianProductJoinRule) , and store
> these index mapping data in your defined BGroupScan( name convention of my
> last example )
> this mapping struct may be:  destination index  ->( source
> ScanRel  :  source Index) .
> to 1 example data ,the struct will be:
> 1 ==>(left scan1   : 1)
> 2 ==>(left scan1  : 2)
> 3 ==>(left scan1  : 3)
> 4 ==>(right scan2  : 1)
> 5 ==>(right scan2  : 2)
>
> 3. you define another Rule (match Project RelNode)which depends on the
> index mapping data of your last step . At this rule you pick the final
> output project's index and pick its mapped index by the mapping struct,
> then you find the final output column name and related tables.
>
>
>
>
> On Tue, Apr 4, 2017 at 1:51 AM, Muhammad Gelbana <m.gelb...@gmail.com>
> wrote:
>
> > I've succeeded, theoretically, in what I wanted to do because I had to
> send
> > the selected columns manually to my datasource. Would someone please tell
> > me how can I identify the selected columns in the join ? I searched a lot
> > without success.
> >
> > *-*
> > *Muhammad Gelbana*
> > http://www.linkedin.com/in/mgelbana
> >
> > On Sat, Apr 1, 2017 at 1:43 AM, Muhammad Gelbana <m.gelb...@gmail.com>
> > wrote:
> >
> > > So I intend to use this constructor for the new *RelNode*:
> > *org.apache.drill.exec.planner.logical.DrillScanRel.
> > DrillScanRel(RelOptCluster,
> > > RelTraitSet, RelOptTable, GroupScan, RelDataType, List)*
> > >
> > > How can I provide it's parameters ?
> > >
> > >1. *RelOptCluster*: Can I pass *DrillJoinRel.getCluster()* ?
> > >
> > >2. *RelTraitSet*: Can I pass *DrillJoinRel.getTraitSet()* ?
> > >
> > >3. *RelOptTable*: I assume I can use this factory method
> > (*org.apache.calcite.prepare.RelOptTableImpl.create(RelOptSchema,
> > >RelDataType, Table, Path)*). Any hints of how I can provide these
> > >parameters too ? Should I just go ahead and manually create a new
> > instance
> > >of each parameter ?
> > >
> > >4. *GroupScan*: I understand I have to create a new implementation
> > >class for this one so now questions here so far.
> > >
> > >5. *RelDataType*: This one is confusing. Because I understand that
> for
> > >*DrillJoinRel.transformTo(newRel)* to work, I have to provide a
> > >*newRel* instance that has a *RelDataType* instance with the same
> > >amount of fields and compatible types (i.e. this is mandated by
> > *org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelNode,
> > >RelNode, Object)*). Why couldn't I provide a *RelDataType* with
> > >a different set of fields ? How can I resolve this ?
> > >
> > >6. *List*: I assume I can call this method and pass my
> > >columns names to it, one by one. (i.e.
> > >*org.apache.drill.common.expression.SchemaPath.
> > getCompoundPath(String...)*
> > >)
> > >
> > > Thanks.
> > >
> > > *-*
> > > *Muhammad Gelbana*
> > > http://www.linkedin.com/in/mgelbana
> > >
> > > On Fri, Mar 31, 2017 at 1:59 PM, weijie tong <tongweijie...@gmail.com>
> > > wrote:
> > >
> > >> your code seems right , just to implement the 'call.transformTo()'
> ,but
> > >> the
> > >> left detail , maybe I think I can't express the left things so
> > precisely,
> > >> just as @Paul Rogers mentioned the plugin detail is a little trivial.
> > >>
> > >> 1.  drillScanRel.getGroupScan  .
> > >> 2. you need to extend the AbstractGroupScan ,and let it holds some
> > >> information about your storage . This defined GroupScan just call it
> > >> AGroupScan corresponds to a joint scan RelNode. Then you can define
> > >> another
> > >> GroupScan called BGroupScan

Re: Is it possible to delegate data joins and filtering to the datasource ?

2017-04-03 Thread Muhammad Gelbana
I've succeeded, in theory, in what I wanted to do, but I had to send the
selected columns to my datasource manually. Would someone please tell me
how I can identify the selected columns in the join ? I searched a lot
without success.
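
A small fragment that would at least list the join's output columns (assuming *join* is the DrillJoinRel matched by the rule; a join's row type in Calcite is the left input's fields followed by the right input's fields):

// Fragment meant to live inside onMatch(); assumes imports of
// java.util.List and org.apache.calcite.rel.type.RelDataTypeField.
int leftCount = join.getLeft().getRowType().getFieldCount();
List<RelDataTypeField> fields = join.getRowType().getFieldList();
for (RelDataTypeField field : fields) {
  String side = field.getIndex() < leftCount ? "left" : "right";
  System.out.println(field.getIndex() + " -> " + field.getName() + " (" + side + ")");
}

Note that this only lists the columns the join exposes, not the columns the query actually selects; the selected columns are decided by the Project node above the join, which is what Weijie's suggestion of a second rule matching the Project addresses.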

*-*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Sat, Apr 1, 2017 at 1:43 AM, Muhammad Gelbana <m.gelb...@gmail.com>
wrote:

> So I intend to use this constructor for the new *RelNode*: 
> *org.apache.drill.exec.planner.logical.DrillScanRel.DrillScanRel(RelOptCluster,
> RelTraitSet, RelOptTable, GroupScan, RelDataType, List)*
>
> How can I provide it's parameters ?
>
>1. *RelOptCluster*: Can I pass *DrillJoinRel.getCluster()* ?
>
>2. *RelTraitSet*: Can I pass *DrillJoinRel.getTraitSet()* ?
>
>3. *RelOptTable*: I assume I can use this factory method 
> (*org.apache.calcite.prepare.RelOptTableImpl.create(RelOptSchema,
>RelDataType, Table, Path)*). Any hints of how I can provide these
>parameters too ? Should I just go ahead and manually create a new instance
>of each parameter ?
>
>4. *GroupScan*: I understand I have to create a new implementation
>class for this one so now questions here so far.
>
>5. *RelDataType*: This one is confusing. Because I understand that for
>*DrillJoinRel.transformTo(newRel)* to work, I have to provide a
>*newRel* instance that has a *RelDataType* instance with the same
>amount of fields and compatible types (i.e. this is mandated by 
> *org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelNode,
>RelNode, Object)*). Why couldn't I provide a *RelDataType* with
>a different set of fields ? How can I resolve this ?
>
>6. *List*: I assume I can call this method and pass my
>columns names to it, one by one. (i.e.
>*org.apache.drill.common.expression.SchemaPath.getCompoundPath(String...)*
>)
>
> Thanks.
>
> *-*
> *Muhammad Gelbana*
> http://www.linkedin.com/in/mgelbana
>
> On Fri, Mar 31, 2017 at 1:59 PM, weijie tong <tongweijie...@gmail.com>
> wrote:
>
>> your code seems right , just to implement the 'call.transformTo()' ,but
>> the
>> left detail , maybe I think I can't express the left things so precisely,
>> just as @Paul Rogers mentioned the plugin detail is a little trivial.
>>
>> 1.  drillScanRel.getGroupScan  .
>> 2. you need to extend the AbstractGroupScan ,and let it holds some
>> information about your storage . This defined GroupScan just call it
>> AGroupScan corresponds to a joint scan RelNode. Then you can define
>> another
>> GroupScan called BGroupScan which extends AGroupScan, The BGroupScan acts
>> as a aggregate container which holds the two joint AGroupScan.
>> 3 . The new DrillScanRel has the same RowType as the JoinRel. The
>> requirement and exmple of transforming between two different RelNodes can
>> be found from other codes. This DrillScanRel's GroupScan is the
>> BGroupScan.
>> This new DrillScanRel is the one applys to the code
>>  `call.transformTo()`.
>>
>> maybe the picture below may help you  understand my idea:
>>
>>
>>  ---Scan (AGroupScan)
>> suppose the initial RelNode tree is : Project Join --|
>>
>>   |   ---Scan (AGroupScan)
>>
>>   |
>>
>>  \|/
>> after applied this rule ,the final tree is: Project-Scan ( BGroupScan
>> (
>> List(AGroupScan ,AGroupScan) ) )
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Mar 30, 2017 at 10:01 PM, Muhammad Gelbana <m.gelb...@gmail.com>
>> wrote:
>>
>> > *This is my rule class*
>> >
>> > public class CartesianProductJoinRule extends RelOptRule {
>> >
>> > public static final CartesianProductJoinRule INSTANCE = new
>> > CartesianProductJoinRule(DrillJoinRel.class);
>> >
>> > public CartesianProductJoinRule(Class clazz) {
>> > super(operand(clazz, operand(RelNode.class, any()),
>> > operand(RelNode.class, any())),
>> > "CartesianProductJoin");
>> > }
>> >
>> > @Override
>> > public boolean matches(RelOptRuleCall call) {
>> > DrillJoinRel drillJoin = call.rel(0);
>> > return drillJoin.getJoinType() == JoinRelType.INNER &&
>> > drillJoin.getCondition().isAlwaysTrue();
>> > }
>> >
>> > @Override
>> > public void onMatch(RelOptRuleCall call) {
>> > DrillJoinRel join = call.rel(0);
>> > RelNode firstRel = call.rel(1);
&g

Re: Is it possible to delegate data joins and filtering to the datasource ?

2017-03-31 Thread Muhammad Gelbana
So I intend to use this constructor for the new *RelNode*:
*org.apache.drill.exec.planner.logical.DrillScanRel.DrillScanRel(RelOptCluster,
RelTraitSet, RelOptTable, GroupScan, RelDataType, List)*

How can I provide its parameters ? (A rough sketch of how I imagine the pieces fitting together follows the list below.)

   1. *RelOptCluster*: Can I pass *DrillJoinRel.getCluster()* ?

   2. *RelTraitSet*: Can I pass *DrillJoinRel.getTraitSet()* ?

   3. *RelOptTable*: I assume I can use this factory method
(*org.apache.calcite.prepare.RelOptTableImpl.create(RelOptSchema,
   RelDataType, Table, Path)*). Any hints of how I can provide these
   parameters too ? Should I just go ahead and manually create a new instance
   of each parameter ?

   4. *GroupScan*: I understand I have to create a new implementation
   class for this one, so no questions here so far.

   5. *RelDataType*: This one is confusing. Because I understand that for
   *DrillJoinRel.transformTo(newRel)* to work, I have to provide a *newRel*
   instance that has a *RelDataType* instance with the same number of
   fields and compatible types (i.e. this is mandated by
*org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelNode,
   RelNode, Object)*). Why couldn't I provide a *RelDataType* with
   a different set of fields ? How can I resolve this ?

   6. *List*: I assume I can call this method and pass my
   columns names to it, one by one. (i.e.
   *org.apache.drill.common.expression.SchemaPath.getCompoundPath(String...)*
   )
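
Here is that rough sketch, untested and written as a helper that would live inside the rule class. *CompositeGroupScan* is a hypothetical GroupScan implementation that wraps the two underlying group scans; everything else uses names already mentioned in this thread:

// Assumed imports: java.util.ArrayList, java.util.List,
// org.apache.calcite.rel.type.RelDataTypeField, plus the Drill/Calcite classes named below.
private void transformToCompositeScan(RelOptRuleCall call, DrillJoinRel join,
    DrillScanRel leftScan, DrillScanRel rightScan) {
  // (4) Hypothetical GroupScan wrapping the two joined group scans.
  GroupScan composite =
      new CompositeGroupScan(leftScan.getGroupScan(), rightScan.getGroupScan());

  // (6) Column paths built from the joined row type.
  List<SchemaPath> columns = new ArrayList<>();
  for (RelDataTypeField field : join.getRowType().getFieldList()) {
    columns.add(SchemaPath.getCompoundPath(field.getName()));
  }

  DrillScanRel newScan = new DrillScanRel(
      join.getCluster(),    // (1) reuse the join's cluster
      join.getTraitSet(),   // (2) reuse the join's trait set
      leftScan.getTable(),  // (3) simplest option: reuse an existing RelOptTable
      composite,            // (4) the wrapping group scan
      join.getRowType(),    // (5) same row type as the join, so type equivalence holds
      columns);             // (6)
  call.transformTo(newScan);
}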

Thanks.

*-*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Fri, Mar 31, 2017 at 1:59 PM, weijie tong <tongweijie...@gmail.com>
wrote:

> your code seems right , just to implement the 'call.transformTo()' ,but the
> left detail , maybe I think I can't express the left things so precisely,
> just as @Paul Rogers mentioned the plugin detail is a little trivial.
>
> 1.  drillScanRel.getGroupScan  .
> 2. you need to extend the AbstractGroupScan ,and let it holds some
> information about your storage . This defined GroupScan just call it
> AGroupScan corresponds to a joint scan RelNode. Then you can define another
> GroupScan called BGroupScan which extends AGroupScan, The BGroupScan acts
> as a aggregate container which holds the two joint AGroupScan.
> 3 . The new DrillScanRel has the same RowType as the JoinRel. The
> requirement and exmple of transforming between two different RelNodes can
> be found from other codes. This DrillScanRel's GroupScan is the BGroupScan.
> This new DrillScanRel is the one applys to the code
>  `call.transformTo()`.
>
> maybe the picture below may help you  understand my idea:
>
>
>  ---Scan (AGroupScan)
> suppose the initial RelNode tree is : Project Join --|
>
>   |   ---Scan (AGroupScan)
>
>   |
>
>  \|/
> after applied this rule ,the final tree is: Project-Scan ( BGroupScan (
> List(AGroupScan ,AGroupScan) ) )
>
>
>
>
>
>
>
> On Thu, Mar 30, 2017 at 10:01 PM, Muhammad Gelbana <m.gelb...@gmail.com>
> wrote:
>
> > *This is my rule class*
> >
> > public class CartesianProductJoinRule extends RelOptRule {
> >
> > public static final CartesianProductJoinRule INSTANCE = new
> > CartesianProductJoinRule(DrillJoinRel.class);
> >
> > public CartesianProductJoinRule(Class clazz) {
> > super(operand(clazz, operand(RelNode.class, any()),
> > operand(RelNode.class, any())),
> > "CartesianProductJoin");
> > }
> >
> > @Override
> > public boolean matches(RelOptRuleCall call) {
> > DrillJoinRel drillJoin = call.rel(0);
> > return drillJoin.getJoinType() == JoinRelType.INNER &&
> > drillJoin.getCondition().isAlwaysTrue();
> > }
> >
> > @Override
> > public void onMatch(RelOptRuleCall call) {
> > DrillJoinRel join = call.rel(0);
> > RelNode firstRel = call.rel(1);
> > RelNode secondRel = call.rel(2);
> > HepRelVertex right = (HepRelVertex) join.getRight();
> > HepRelVertex left = (HepRelVertex) join.getLeft();
> >
> > List firstFields = firstRel.getRowType().
> > getFieldList();
> > List secondFields = secondRel.getRowType().
> > getFieldList();
> >
> > RelNode firstTable = ((HepRelVertex)firstRel.
> > getInput(0)).getCurrentRel();
> > RelNode secondTable = ((HepRelVertex)secondRel.
> > getInput(0)).getCurrentRel();
> >
> > //call.transformTo(???);
> > }
> > }
> >
> > *To register the rule*, I overrode the *getOptimizerRules* method in my
> > storage plugin class
> >
> > public Set getOptimizerRules(OptimizerRulesContext
> > optimizerContext, Pla

Re: Is it possible to delegate data joins and filtering to the datasource ?

2017-03-30 Thread Muhammad Gelbana
*This is my rule class*

public class CartesianProductJoinRule extends RelOptRule {

    public static final CartesianProductJoinRule INSTANCE =
            new CartesianProductJoinRule(DrillJoinRel.class);

    public CartesianProductJoinRule(Class<? extends RelNode> clazz) {
        // Match a join whose two inputs may be any RelNode.
        super(operand(clazz, operand(RelNode.class, any()), operand(RelNode.class, any())),
                "CartesianProductJoin");
    }

    @Override
    public boolean matches(RelOptRuleCall call) {
        // Only fire for inner joins with a trivially true condition, i.e. a cartesian product.
        DrillJoinRel drillJoin = call.rel(0);
        return drillJoin.getJoinType() == JoinRelType.INNER
                && drillJoin.getCondition().isAlwaysTrue();
    }

    @Override
    public void onMatch(RelOptRuleCall call) {
        DrillJoinRel join = call.rel(0);
        RelNode firstRel = call.rel(1);
        RelNode secondRel = call.rel(2);
        HepRelVertex right = (HepRelVertex) join.getRight();
        HepRelVertex left = (HepRelVertex) join.getLeft();

        List<RelDataTypeField> firstFields = firstRel.getRowType().getFieldList();
        List<RelDataTypeField> secondFields = secondRel.getRowType().getFieldList();

        RelNode firstTable = ((HepRelVertex) firstRel.getInput(0)).getCurrentRel();
        RelNode secondTable = ((HepRelVertex) secondRel.getInput(0)).getCurrentRel();

        //call.transformTo(???);
    }
}

*To register the rule*, I overrode the *getOptimizerRules* method in my
storage plugin class

public Set<? extends RelOptRule> getOptimizerRules(OptimizerRulesContext optimizerContext,
        PlannerPhase phase) {
    switch (phase) {
    case LOGICAL_PRUNE_AND_JOIN:
    case LOGICAL_PRUNE:
    case LOGICAL:
        return getLogicalOptimizerRules(optimizerContext);
    case PHYSICAL:
        return getPhysicalOptimizerRules(optimizerContext);
    case PARTITION_PRUNING:
    case JOIN_PLANNING:
        // Register the new rule for the join-planning phase (run by the HepPlanner).
        *return ImmutableSet.of(CartesianProductJoinRule.INSTANCE);*
    default:
        return ImmutableSet.of();
    }
}

The rule is firing as expected but I'm lost when it comes to the
conversion. Earlier, you said "the new equivalent ScanRel is to have the joined
ScanRel nodes's GroupScans", so

   1. How can I obtain the left and right tables' group scans ? (A small sketch of one approach follows this list.)
   2. What exactly do you mean by joining them ? Is there a utility method
   to do so ? Or should I manually create a new single group scan and add the
   information I need there ? Looking into other *GroupScan*
   implementations, I found that they have references to some runtime objects
   such as the storage plugin and the storage plugin configuration. At this
   stage, I don't know how to obtain those !
   3. Precisely, what kind of object should I use to represent a *RelNode*
   that represents the whole join ? I understand that I need to use an object
   that implements the *RelNode* interface. Then I should add the
   created *GroupScan* to that *RelNode* instance and call
   *call.transformTo(newRelNode)*, correct ?
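
For question 1, this is the small helper I have in mind (untested; it assumes each side of the join eventually bottoms out in a single DrillScanRel, as in the plans discussed in this thread, and that the rule runs in the HepPlanner so inputs are wrapped in HepRelVertex):

// Assumed import: org.apache.calcite.plan.hep.HepRelVertex.
private GroupScan findGroupScan(RelNode node) {
  if (node instanceof HepRelVertex) {
    node = ((HepRelVertex) node).getCurrentRel();
  }
  if (node instanceof DrillScanRel) {
    return ((DrillScanRel) node).getGroupScan();
  }
  for (RelNode input : node.getInputs()) {
    GroupScan found = findGroupScan(input);
    if (found != null) {
      return found;
    }
  }
  return null;
}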


*-----*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Thu, Mar 30, 2017 at 2:46 AM, weijie tong <tongweijie...@gmail.com>
wrote:

> I mean the rule you write could be placed in the PlannerPhase.JOIN_PlANNING
> which uses the HepPlanner. This phase is to solve the logical relnode .
> Hope to help you.
> Muhammad Gelbana <m.gelb...@gmail.com> wrote on Thu, Mar 30, 2017 at 12:07 AM:
>
> > ​Thanks a lot Weijie, I believe I'm very close now. I hope you don't mind
> > few more questions please:
> >
> >
> >1. The new rule you are mentioning is a physical rule ? So I should
> >implement the Prel interface ?
> >2. By "traversing the join to find the ScanRel"
> >   - This sounds like I have to "search" for something. Shouldn't I
> just
> >   work on transforming the left (i.e. DrillJoinRel's getLeft()
> method)
> > and
> >   right (i.e. DrillJoinRel's getLeft() method) join objects ?
> >   - The "left" and "right" elements of the DrillJoinRel object are of
> >   type RelSubset, not *ScanRel* and I can't find a type called
> > *ScanRel*.
> >   I suppose you meant *ScanPrel*, specially because it implements the
> >   *Prel* interface that provides the *getPhysicalOperator* method.
> >3. What if multiple physical or logical rules match for a single node,
> >what decides which rule will be applied and which will be rejected ?
> Is
> > it
> >the *AbstractRelNode.computeSelfCost(RelOptPlanner)* method ? What if
> >more than one rule produces the same cost ?
> >
> > I'll go ahead and see what I can do for now before hopefully you may
> offer
> > more guidance. THANKS A LOT.
> >
> > *-*
> > *Muhammad Gelbana*
> > http://www.linkedin.com/in/mgelbana
> >
> > On Wed, Mar 29, 2017 at 4:23 AM, weijie tong <tongweijie...@gmail.com>
> > wrote

Re: Is it possible to delegate data joins and filtering to the datasource ?

2017-03-29 Thread Muhammad Gelbana
​Thanks a lot Weijie, I believe I'm very close now. I hope you don't mind
a few more questions please:


   1. Is the new rule you are mentioning a physical rule ? If so, should I
   implement the Prel interface ?
   2. By "traversing the join to find the ScanRel"
  - This sounds like I have to "search" for something. Shouldn't I just
  work on transforming the left (i.e. DrillJoinRel's getLeft() method) and
  right (i.e. DrillJoinRel's getRight() method) join objects ?
  - The "left" and "right" elements of the DrillJoinRel object are of
  type RelSubset, not *ScanRel* and I can't find a type called *ScanRel*.
  I suppose you meant *ScanPrel*, specially because it implements the
  *Prel* interface that provides the *getPhysicalOperator* method.
   3. What if multiple physical or logical rules match for a single node,
   what decides which rule will be applied and which will be rejected ? Is it
   the *AbstractRelNode.computeSelfCost(RelOptPlanner)* method ? What if
   more than one rule produces the same cost ? (A small cost sketch follows
   this list.)
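
Regarding the cost question, this is the kind of override I have in mind. It is a sketch only, with arbitrary placeholder numbers; the point is just that the planner compares whatever the competing RelNodes report through *computeSelfCost*, using the standard Calcite *RelOptCostFactory*:

@Override
public RelOptCost computeSelfCost(RelOptPlanner planner) {
  // Report a deliberately cheap cost so the planner prefers the plan that
  // contains this node over the alternatives it has generated.
  double rows = 1000;   // placeholder estimate; a real node would derive this
  return planner.getCostFactory().makeCost(rows, rows, 0);
}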

I'll go ahead and see what I can do for now before hopefully you may offer
more guidance. THANKS A LOT.

*-*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Wed, Mar 29, 2017 at 4:23 AM, weijie tong <tongweijie...@gmail.com>
wrote:

> to avoid misunderstanding , the new equivalent ScanRel is to have the
> joined ScanRel nodes's GroupScans, as the GroupScans indirectly hold the
> underlying storage information.
>
> On Wed, Mar 29, 2017 at 10:15 AM, weijie tong <tongweijie...@gmail.com>
> wrote:
>
> >
> > my suggestion is you define a rule which matches the DrillJoinRel RelNode
> > , then at the onMatch method ,you traverse the join children to find the
> > ScanRel nodes . You define a new ScanRel which include the ScanRel nodes
> > you find last step. Then transform the JoinRel to this equivalent new
> > ScanRel.
> > Finally , the plan tree will not have the JoinRel but the ScanRel.   You
> > can let your join plan rule  in the PlannerPhase.JOIN_PLANNING.
> >
>


[jira] [Created] (DRILL-5393) ALTER SESSION documentation page broken link

2017-03-28 Thread Muhammad Gelbana (JIRA)
Muhammad Gelbana created DRILL-5393:
---

 Summary: ALTER SESSION documentation page broken link
 Key: DRILL-5393
 URL: https://issues.apache.org/jira/browse/DRILL-5393
 Project: Apache Drill
  Issue Type: Bug
  Components: Documentation
Reporter: Muhammad Gelbana


On [this page|https://drill.apache.org/docs/modifying-query-planning-options/], 
there is a link to the ALTER SESSION documentation page which points to this 
broken link: https://drill.apache.org/docs/alter-session/

I believe the correct link should be: https://drill.apache.org/docs/set/





Re: Is it possible to delegate data joins and filtering to the datasource ?

2017-03-28 Thread Muhammad Gelbana
I'm focusing on JOINs now, especially a query such as this: *SELECT * FROM
TABLE1, TABLE2*. Drill plans to transform this into 2 separate full-scan
queries and then performs the cartesian product join on its own. I'm trying
to make Drill send the query as it is, in a single scan (group scan ?).

@weijie

I've found that if I opt out of the JDBC plugin's JdbcDrelConverterRule rule (i.e.
JdbcStoragePlugin.DrillJdbcConvention.DrillJdbcConvention), an exception is
thrown because Drill refuses to plan cartesian product joins. Are you
saying that I need to keep that rule, let Drill plan the query into 2 different
group scans, and then change the plan to merge these 2 group scans
into one ?

Is there a way to make Drill accept planning cartesian product joins ?

*-*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Sun, Mar 26, 2017 at 1:33 AM, Muhammad Gelbana <m.gelb...@gmail.com>
wrote:

> Priceless information ! Thank you all.
>
> I managed to debug Drill in Eclipse hoping to get a better understanding
> but I can't get my head around some stuff:
>
>- What is the purpose of these clases\interfaces:
>   - ConverterRule
>   - DrillRel
>   - Prel
>   - JdbcStoragePlugin.JdbcPrule
>   - JdbcIntermediatePrel
>- What does the words *Prel* and *Prule* stand for ? *Prel*iminary and
>*P*reliminary *Rule* ?
>- What is a calling convention ? (i.e. mentioned in *ConverterRule*'s
>documentation)
>
> Is there a way configure the costing model for the JDBC plugin without
> having to customize it through code ? After all, my ultimate goal is to
> push down filters and joins.
>
> I'll continue debugging\browsing the code and come back with more
> questions, or hopefully an achievement !
>
> Thanks again, your help is very much appreciated.
>
> *-*
> *Muhammad Gelbana*
> http://www.linkedin.com/in/mgelbana
>
> On Fri, Mar 24, 2017 at 1:29 AM, weijie tong <tongweijie...@gmail.com>
> wrote:
>
>> I am working on pushing down joins to Druid storage plugin. To my
>> experience, you need to write a rule to know whether the joins could be
>> pushed down by your storage plugin metadata first,then if ok ,you transfer
>> the join node to the scan node with the query relevant information in the
>> scan node. The key point is to do this rule in the HepPlanner.
>> Zelaine Fong <zf...@mapr.com> wrote on Fri, Mar 24, 2017 at 5:15 AM:
>>
>> > The JDBC storage plugin does attempt to do pushdowns of joins.  However,
>> > the Drill optimizer will evaluate different query plans.  In doing so,
>> it
>> > may choose an alternative plan that does not do a full pushdown if it
>> > believes that’s a less costly plan than a full pushdown.  There are a
>> > number of open bugs with the JDBC storage plugin, including DRILL-4696.
>> > For that particular issue, I believe that when it was investigated, it
>> was
>> > determined that the costing model for the JDBC storage plugin needed
>> more
>> > work.  Hence Drill wasn’t picking the more optimal full pushdown plan.
>> >
>> > -- Zelaine
>> >
>> > On 3/23/17, 1:53 PM, "Paul Rogers" <prog...@mapr.com> wrote:
>> >
>> > Hi Muhammad,
>> >
>> > It seems that the goal for filters should be possible; I’m not
>> > familiar enough with the code to know if joins are currently supported,
>> or
>> > if this is where you’d have to make some contributions to Drill.
>> >
>> > The storage plugin is called at various places in the planning
>> > process, and can insert planning rules. We have plugins that push down
>> > filters, so this seems possible. For example, check Parquet and JDBC for
>> > hints. See my answer to a previous question for hints on how to get
>> started
>> > with storage plugins.
>> >
>> > Joins may be a bit more complex. You’d have to insert planner rules;
>> > such code *may* be available, or may require extensions to Drill. Drill
>> > should certainly do this, so if the code is not there, we’d welcome your
>> > contribution.
>> >
>> > You’d have to create an rule that creates a new scan operator that
>> > includes the information you wish to push down. For example, if you
>> push a
>> > filter, the scan definition (AKA group scan and scan entry) would need
>> to
>> > hold the information needed to implement the push-down. Again, you can
>> > probably find examples of filters, you’d have to be creative to push
>> joins.
>> >
>> > Assembling the pieces: y

Re: Is it possible to delegate data joins and filtering to the datasource ?

2017-03-25 Thread Muhammad Gelbana
Priceless information ! Thank you all.

I managed to debug Drill in Eclipse hoping to get a better understanding
but I can't get my head around some stuff:

   - What is the purpose of these classes\interfaces:
  - ConverterRule
  - DrillRel
  - Prel
  - JdbcStoragePlugin.JdbcPrule
  - JdbcIntermediatePrel
   - What do the words *Prel* and *Prule* stand for ? *Prel*iminary and
   *P*reliminary *Rule* ?
   - What is a calling convention ? (i.e. mentioned in *ConverterRule*'s
   documentation)

Is there a way to configure the costing model for the JDBC plugin without
having to customize it through code ? After all, my ultimate goal is to
push down filters and joins.

I'll continue debugging\browsing the code and come back with more
questions, or hopefully an achievement !

Thanks again, your help is very much appreciated.

*-*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Fri, Mar 24, 2017 at 1:29 AM, weijie tong <tongweijie...@gmail.com>
wrote:

> I am working on pushing down joins to Druid storage plugin. To my
> experience, you need to write a rule to know whether the joins could be
> pushed down by your storage plugin metadata first,then if ok ,you transfer
> the join node to the scan node with the query relevant information in the
> scan node. The key point is to do this rule in the HepPlanner.
> Zelaine Fong <zf...@mapr.com> wrote on Fri, Mar 24, 2017 at 5:15 AM:
>
> > The JDBC storage plugin does attempt to do pushdowns of joins.  However,
> > the Drill optimizer will evaluate different query plans.  In doing so, it
> > may choose an alternative plan that does not do a full pushdown if it
> > believes that’s a less costly plan than a full pushdown.  There are a
> > number of open bugs with the JDBC storage plugin, including DRILL-4696.
> > For that particular issue, I believe that when it was investigated, it
> was
> > determined that the costing model for the JDBC storage plugin needed more
> > work.  Hence Drill wasn’t picking the more optimal full pushdown plan.
> >
> > -- Zelaine
> >
> > On 3/23/17, 1:53 PM, "Paul Rogers" <prog...@mapr.com> wrote:
> >
> > Hi Muhammad,
> >
> > It seems that the goal for filters should be possible; I’m not
> > familiar enough with the code to know if joins are currently supported,
> or
> > if this is where you’d have to make some contributions to Drill.
> >
> > The storage plugin is called at various places in the planning
> > process, and can insert planning rules. We have plugins that push down
> > filters, so this seems possible. For example, check Parquet and JDBC for
> > hints. See my answer to a previous question for hints on how to get
> started
> > with storage plugins.
> >
> > Joins may be a bit more complex. You’d have to insert planner rules;
> > such code *may* be available, or may require extensions to Drill. Drill
> > should certainly do this, so if the code is not there, we’d welcome your
> > contribution.
> >
> > You’d have to create an rule that creates a new scan operator that
> > includes the information you wish to push down. For example, if you push
> a
> > filter, the scan definition (AKA group scan and scan entry) would need to
> > hold the information needed to implement the push-down. Again, you can
> > probably find examples of filters, you’d have to be creative to push
> joins.
> >
> > Assembling the pieces: your plugin would add planner rules that
> > determine when joins can be pushed. Those rules would case your plugin to
> > create a semantic node (group scan) that holds the required information.
> > The planner then converts group scan nodes to specific plans passed to
> the
> > execution engine. On the execution side, your plugin provides a “Record
> > Reader” for your format, and that reader does the actual work to push the
> > filter or join down to your data source.
> >
> > Your best bet is to mine existing plugins for ideas, and then
> > experiment. Start simply and gradually add functionality. And, ask
> > questions back on this list.
> >
> >
> > Thanks,
> >
> > - Paul
> >
> > > On Mar 22, 2017, at 8:20 AM, Muhammad Gelbana <m.gelb...@gmail.com
> >
> > wrote:
> > >
> > > I'm trying to use Drill with a proprietary datasource that is very
> > fast in
> > > applying data joins (i.e. SQL joins) and query filters (i.e. SQL
> > where
> > > conditions).
> > >
> > > To connect to that datasource, I first have to write a storage
> > plugin, but
> > > I'm 

Is it possible to delegate data joins and filtering to the datasource ?

2017-03-22 Thread Muhammad Gelbana
I'm trying to use Drill with a proprietary datasource that is very fast in
applying data joins (i.e. SQL joins) and query filters (i.e. SQL where
conditions).

To connect to that datasource, I first have to write a storage plugin, but
I'm not sure whether my main goal is achievable.

My main goal is to configure Drill to let the datasource perform JOINs and
filters and only return the data. Then Drill can perform further processing
based on the original SQL query sent to it.

Is this possible by developing a storage plugin ? Where exactly should I be
looking ?

I've been going through this wiki
<https://github.com/paul-rogers/drill/wiki> and I don't think I understood
every concept. So if there is another source of information about storage
plugin development, please point it out.

*-----*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana
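
For the filter half of this, the planner rule Paul describes in his reply
above would look roughly like the sketch below. It is only a sketch built on
Calcite's generic RelOptRule API (which Drill's own pushdown rules are based
on); MyFilterPushDownRule and the MyGroupScan node mentioned in the comments
are hypothetical classes a plugin author would have to write, not existing
Drill classes.

{code:title=MyFilterPushDownRule.java|borderStyle=solid}
import org.apache.calcite.plan.RelOptRule;
import org.apache.calcite.plan.RelOptRuleCall;
import org.apache.calcite.rel.core.Filter;
import org.apache.calcite.rel.core.TableScan;
import org.apache.calcite.rex.RexNode;

/**
 * Illustrative sketch of a filter pushdown rule for a custom storage plugin.
 */
public class MyFilterPushDownRule extends RelOptRule {

    public MyFilterPushDownRule() {
        // Match a Filter sitting directly on top of a table scan.
        super(operand(Filter.class, operand(TableScan.class, none())), "MyFilterPushDownRule");
    }

    @Override
    public void onMatch(RelOptRuleCall call) {
        Filter filter = call.rel(0);
        TableScan scan = call.rel(1);

        // The filter condition is a Calcite RexNode expression tree. A real rule
        // would translate it into whatever predicate form the datasource accepts.
        RexNode condition = filter.getCondition();

        // If the datasource can evaluate the whole condition, replace the
        // Filter + Scan pair (the "scan" matched above) with a new scan node
        // (a hypothetical "MyGroupScan") that carries the translated predicate
        // to the execution side, e.g.:
        //
        //   call.transformTo(newScanThatEmbedsTheCondition);
        //
        // Otherwise, push only the supported part and keep a Filter for the rest.
    }
}
{code}

On the execution side, the group scan created by such a rule is what the
plugin's record reader would use to send the filter (or join) to the
datasource, as Paul outlines above.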


Re: A tutorial on how to write a custom storage plugin

2017-03-16 Thread Muhammad Gelbana
That's very helpful Paul ! But I can't access the wiki page you referenced.
Maybe it's set to private ?

*-*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Thu, Mar 16, 2017 at 1:50 AM, Paul Rogers <prog...@mapr.com> wrote:

> Hi Muhammad,
>
> I know of no tutorial. I recently updated the “mock” storage plugin and so
> have some experience with this interface. You can find my notes at [1].
>
> Unfortunately, creating a storage plugin seems to require a significant
> commitment of time and effort because you must understand:
>
> * The storage plugin structure. Some bits are a bit unusual (such as
> bindings between the various bits and pieces.)
> * Enough about Calcite to provide it with the required plan-time
> information.
> * The Jackson-serialization structure for various components.
> * The rather complex process by which you work with ScanBatch and value
> vectors to get data from your data source into value vectors.
> * Drill as a whole so you can build Drill and debug it. The only way to
> test a plugin is by running it inside Drill.
>
> Your best approach is to carefully study existing plugins. You can start
> simple, say with the mock plugin. Replace the bits of the mock
> implementation with your own. Try to get it to work for a single table.
> Then, add other functionality gradually.
>
> By the time you are done you will be well on your way to being an expert
> in some of Drill’s internals.
>
> Thanks,
>
> - Paul
>
> [1] https://github.com/paul-rogers/drill/wiki/Storage-Plugin-Model
>
> > On Mar 15, 2017, at 10:35 AM, Muhammad Gelbana <m.gelb...@gmail.com> wrote:
> >
> > Everyone,
> >
> > Is there a tutorial on how to write a custom storage plugin to support
> some
> > sort of a proprietary data source ?
> >
> > I understand I can configure a storage plugin based on pre-shipped
> storage
> > plugins such as the one for MongoDB, MySQL\JDBC, HBase, Hadoop HDFS..etc,
> > but that's not what I need.
> >
> > I need to write a new plugin to support a storage that is not publicly
> > available.
> >
> > The best that I've got so far is the information on these pages:
> >
> >   - https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePlugin.java
> >   - https://github.com/apache/drill/tree/master/contrib/storage-kudu/src/main/java/org/apache/drill/exec/store/kudu
> >
> >
> > But this isn't enough to understand what needs to be done or troubleshoot
> > errors while developing the plugin.
> >
> > *-*
> > *Muhammad Gelbana*
> > http://www.linkedin.com/in/mgelbana
>
>
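
To make the "Jackson-serialization structure" point above a bit more
concrete, a plugin starts with a small, Jackson-serializable config class
that Drill stores and passes back to the plugin. The sketch below is
illustrative only: the class name, the "myplugin" type string and the
endpoint field are invented, and it assumes the StoragePluginConfigBase base
class from drill-common that existing plugins such as Kudu extend.

{code:title=MyPluginConfig.java|borderStyle=solid}
import java.util.Objects;

import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.annotation.JsonTypeName;

import org.apache.drill.common.logical.StoragePluginConfigBase;

/**
 * Illustrative sketch: the configuration users edit in the web UI's Storage tab.
 * Drill serializes and deserializes it with Jackson, hence the annotations.
 */
@JsonTypeName(MyPluginConfig.NAME)
public class MyPluginConfig extends StoragePluginConfigBase {

    public static final String NAME = "myplugin";

    // Invented example property: where the proprietary datasource lives.
    private final String endpoint;

    @JsonCreator
    public MyPluginConfig(@JsonProperty("endpoint") String endpoint) {
        this.endpoint = endpoint;
    }

    public String getEndpoint() {
        return endpoint;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        return Objects.equals(endpoint, ((MyPluginConfig) o).endpoint);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(endpoint);
    }
}
{code}

The matching StoragePlugin implementation then typically takes this config,
the DrillbitContext and the plugin name in its constructor, which is the
"binding between the various bits and pieces" Paul refers to.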


A tutorial on how to write a custom storage plugin

2017-03-15 Thread Muhammad Gelbana
Everyone,

Is there a tutorial on how to write a custom storage plugin to support some
sort of a proprietary data source ?

I understand I can configure a storage plugin based on pre-shipped storage
plugins such as the one for MongoDB, MySQL\JDBC, HBase, Hadoop HDFS..etc,
but that's not what I need.

I need to write a new plugin to support a storage that is not publicly
available.

The best that I've got so far is the information on these pages:

   - https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/StoragePlugin.java
   - https://github.com/apache/drill/tree/master/contrib/storage-kudu/src/main/java/org/apache/drill/exec/store/kudu


But this isn't enough to understand what needs to be done or troubleshoot
errors while developing the plugin.

*-*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana


[jira] [Created] (DRILL-5300) SYSTEM ERROR: IllegalStateException: Memory was leaked by query while querying parquet files

2017-02-27 Thread Muhammad Gelbana (JIRA)
Muhammad Gelbana created DRILL-5300:
---

 Summary: SYSTEM ERROR: IllegalStateException: Memory was leaked by query while querying parquet files
 Key: DRILL-5300
 URL: https://issues.apache.org/jira/browse/DRILL-5300
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.9.0
 Environment: OS: Linux
Reporter: Muhammad Gelbana
 Attachments: both_queries_logs.zip

Running the following query against parquet files (I modified some values for 
privacy reasons)
{code:title=Query causing the long logs|borderStyle=solid}
SELECT AL4.NAME, AL5.SEGMENT2, SUM(AL1.AMOUNT), AL2.ATTRIBUTE4, 
AL2.XXX__CODE, AL8.D_BU, AL8.F_PL, AL18.COUNTRY, AL13.COUNTRY, 
AL11.NAME FROM 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_XX/RA__TRX_LINE_GL_DIST_ALL`
 AL1, 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_XX/RA_OMER_TRX_ALL`
 AL2, 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_XXX` 
AL3, 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_HR_COMMON/HR_ALL_ORGANIZATION_UNITS`
 AL4, 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_CODE_COMBINATIONS`
 AL5, 
dfs.`/disk2/XXX/XXX//data/../parquet//XXAT_AR_MU_TAB` 
AL8, 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_FIN_COMMON/GL_XXX` 
AL11, 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_X_S`
 AL12, 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_LOCATIONS`
 AL13, 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___S_ALL`
 AL14, 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___USES_ALL`
 AL15, 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___S_ALL`
 AL16, 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX___USES_ALL`
 AL17, 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_LOCATIONS`
 AL18, 
dfs.`/disk2/XXX/XXX//data/../parquet/XXX_X_COMMON/XX_X_S`
 AL19 WHERE (AL2.SHIP_TO__USE_ID = AL15._USE_ID AND 
AL15.___ID = AL14.___ID AND AL14.X__ID = 
AL12.X__ID AND AL12.LOCATION_ID = AL13.LOCATION_ID AND 
AL17.___ID = AL16.___ID AND AL16.X__ID = 
AL19.X__ID AND AL19.LOCATION_ID = AL18.LOCATION_ID AND 
AL2.BILL_TO__USE_ID = AL17._USE_ID AND AL2.SET_OF_X_ID = 
AL3.SET_OF_X_ID AND AL1.CODE_COMBINATION_ID = AL5.CODE_COMBINATION_ID AND 
AL5.SEGMENT4 = AL8.MU AND AL1.SET_OF_X_ID = AL11.SET_OF_X_ID AND 
AL2.ORG_ID = AL4.ORGANIZATION_ID AND AL2.OMER_TRX_ID = AL1.OMER_TRX_ID) 
AND ((AL5.SEGMENT2 = '41' AND AL1.AMOUNT <> 0 AND AL4.NAME IN 
('XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 
'XXX-XX-', 'XXX-XX-', 'XXX-XX-', 'XXX-XX-') AND AL3.NAME like 
'%-PR-%')) GROUP BY AL4.NAME, AL5.SEGMENT2, AL2.ATTRIBUTE4, 
AL2.XXX__CODE, AL8.D_BU, AL8.F_PL, AL18.COUNTRY, AL13.COUNTRY, 
AL11.NAME
{code}

{code:title=Query causing the short logs|borderStyle=solid}
SELECT AL11.NAME
FROM
dfs.`/XXX/XXX/XXX/data/../parquet/XXX_XXX_COMMON/GL_XXX` 
LIMIT 10
{code}
This issue may be a duplicate of [this 
one|https://issues.apache.org/jira/browse/DRILL-4398] but I created a new one 
based on [this 
suggestion|https://issues.apache.org/jira/browse/DRILL-4398?focusedCommentId=15884846=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15884846].



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Where can I find enough information to help me fix these issues ?

2017-01-16 Thread Muhammad Gelbana
Everyone,

I'm facing 2 issues with Apache Drill:

   - DRILL-5197 <https://issues.apache.org/jira/browse/DRILL-5197>
   - DRILL-5193 <https://issues.apache.org/jira/browse/DRILL-5193>


And it's urgent for me to have them fixed so I tried fixing them myself. I
cloned this repository <https://github.com/apache/drill.git> and
successfully built the project using maven (i.e. mvn clean package)

Now I can't decide where or how to start ! If I attempt to open a class I
found in a thrown exception, I find multiple copies of the same .java file !


   - So how can I decide which one I should edit ?
   - In some classes, I found this syntax "*<#if entry.hiveType ==
   "BOOLEAN">*" what is this syntax and what is it for ?!
   - Is there a document or a set of documents that would answer
   development questions so I can easily start contributing if I can ?


*-*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana


[jira] [Created] (DRILL-5197) CASE statement fails due to error: Unable to get value vector class for minor type [NULL] and mode [OPTIONAL]

2017-01-15 Thread Muhammad Gelbana (JIRA)
Muhammad Gelbana created DRILL-5197:
---

 Summary: CASE statement fails due to error: Unable to get value vector class for minor type [NULL] and mode [OPTIONAL]
 Key: DRILL-5197
 URL: https://issues.apache.org/jira/browse/DRILL-5197
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types
Affects Versions: 1.9.0
Reporter: Muhammad Gelbana


The following query fails for no obvious reason
{code:sql}
SELECT
   CASE
      WHEN `tname`.`full_name` = 'ABC'
      THEN
         (
            CASE
               WHEN `tname`.`full_name` = 'ABC'
               THEN
                  (
                     CASE
                        WHEN `tname`.`full_name` = ' '
                        THEN
                           (
                              CASE
                                 WHEN `tname`.`full_name` = 'ABC'
                                 THEN `tname`.`full_name`
                                 ELSE NULL
                              END
                           )
                        ELSE NULL
                     END
                  )
               ELSE NULL
            END
         )
      WHEN `tname`.`full_name` = 'ABC'
      THEN NULL
      ELSE NULL
   END
FROM
   cp.`employee.json` `tname`
{code}
If the `THEN `tname`.`full_name`` statement is changed to `THEN 'ABC'`, the 
error does not occur.

Thrown exception
{quote}
[Error Id: e75fd0fe-132b-4eb4-b2e8-7b34dc39657e on mgelbana-incorta:31010]
    at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543) ~[drill-common-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:293) [drill-java-exec-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160) [drill-java-exec-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:262) [drill-java-exec-1.9.0.jar:1.9.0]
    at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.9.0.jar:1.9.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_111]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
Caused by: java.lang.UnsupportedOperationException: Unable to get value vector class for minor type [NULL] and mode [OPTIONAL]
    at org.apache.drill.exec.expr.BasicTypeHelper.getValueVectorClass(BasicTypeHelper.java:441) ~[vector-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.record.VectorContainer.addOrGet(VectorContainer.java:123) ~[drill-java-exec-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema(ProjectRecordBatch.java:463) ~[drill-java-exec-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78) ~[drill-java-exec-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:135) ~[drill-java-exec-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162) ~[drill-java-exec-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) ~[drill-java-exec-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81) ~[drill-java-exec-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) ~[drill-java-exec-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:232) ~[drill-java-exec-1.9.0.jar:1.9.0]
    at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:226) ~[drill-java-exec-1.9.0.jar:1.9.0]
    at java.security.AccessController.doPrivileged(Native Method) ~[na:1.8.0_111]
    at javax.security.auth.Subject.doAs(Subject.java:422) ~[na:1.8.0_111]
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) ~[hadoop-common-2.7.1.jar:na]
    at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:226) [drill-java-exec-1.9.0.jar:1.9.0]
    ... 4 common frames omitted
{quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-5193) UDF returns NULL as expected only if the input is a literal

2017-01-12 Thread Muhammad Gelbana (JIRA)
Muhammad Gelbana created DRILL-5193:
---

 Summary: UDF returns NULL as expected only if the input is a literal
 Key: DRILL-5193
 URL: https://issues.apache.org/jira/browse/DRILL-5193
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 1.9.0
Reporter: Muhammad Gelbana


I defined the following UDF
{code:title=SplitPartFunc.java|borderStyle=solid}
import javax.inject.Inject;

import org.apache.drill.exec.expr.DrillSimpleFunc;
import org.apache.drill.exec.expr.annotations.FunctionTemplate;
import org.apache.drill.exec.expr.annotations.Output;
import org.apache.drill.exec.expr.annotations.Param;
import org.apache.drill.exec.expr.holders.IntHolder;
import org.apache.drill.exec.expr.holders.NullableVarCharHolder;
import org.apache.drill.exec.expr.holders.VarCharHolder;

import io.netty.buffer.DrillBuf;

@FunctionTemplate(name = "split_string", scope = FunctionTemplate.FunctionScope.SIMPLE,
    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
public class SplitPartFunc implements DrillSimpleFunc {

    @Param
    VarCharHolder input;

    @Param(constant = true)
    VarCharHolder delimiter;

    @Param(constant = true)
    IntHolder field;

    @Output
    NullableVarCharHolder out;

    @Inject
    DrillBuf buffer;

    public void setup() {
    }

    public void eval() {

        String stringValue = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers
                .toStringFromUTF8(input.start, input.end, input.buffer);

        out.buffer = buffer; // If I return before this statement, a NPE is thrown :(
        if (stringValue == null) {
            return;
        }

        int fieldValue = field.value;
        if (fieldValue <= 0) {
            return;
        }

        String delimiterValue = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers
                .toStringFromUTF8(delimiter.start, delimiter.end, delimiter.buffer);
        if (delimiterValue == null) {
            return;
        }

        String[] splittedInput = stringValue.split(delimiterValue);
        if (splittedInput.length < fieldValue) {
            return;
        }

        // put the output value in the out buffer
        String outputValue = splittedInput[fieldValue - 1];
        out.start = 0;
        out.end = outputValue.getBytes().length;
        buffer.setBytes(0, outputValue.getBytes());
        out.isSet = 1;
    }

}
{code}

If I run the following query on the sample employee.json file (or actually a 
Parquet file, after modifying the table and column names)

{code:title=SQL Query|borderStyle=solid}SELECT full_name, 
split_string(full_name, ' ', 4), split_string('Whatever', ' ', 4) FROM 
cp.employee.json LIMIT 1{code}

I get the following result
!https://i.stack.imgur.com/L8uQW.png!

Shouldn't I be getting the column value and null for the other 2 columns ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-5194) UDF returns NULL as expected only if the input is a literal

2017-01-12 Thread Muhammad Gelbana (JIRA)
Muhammad Gelbana created DRILL-5194:
---

 Summary: UDF returns NULL as expected only if the input is a literal
 Key: DRILL-5194
 URL: https://issues.apache.org/jira/browse/DRILL-5194
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 1.9.0
Reporter: Muhammad Gelbana


I defined the following UDF
{code:title=SplitPartFunc.java|borderStyle=solid}
import javax.inject.Inject;

import org.apache.drill.exec.expr.DrillSimpleFunc;
import org.apache.drill.exec.expr.annotations.FunctionTemplate;
import org.apache.drill.exec.expr.annotations.Output;
import org.apache.drill.exec.expr.annotations.Param;
import org.apache.drill.exec.expr.holders.IntHolder;
import org.apache.drill.exec.expr.holders.NullableVarCharHolder;
import org.apache.drill.exec.expr.holders.VarCharHolder;

import io.netty.buffer.DrillBuf;

@FunctionTemplate(name = "split_string", scope = FunctionTemplate.FunctionScope.SIMPLE,
    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
public class SplitPartFunc implements DrillSimpleFunc {

    @Param
    VarCharHolder input;

    @Param(constant = true)
    VarCharHolder delimiter;

    @Param(constant = true)
    IntHolder field;

    @Output
    NullableVarCharHolder out;

    @Inject
    DrillBuf buffer;

    public void setup() {
    }

    public void eval() {

        String stringValue = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers
                .toStringFromUTF8(input.start, input.end, input.buffer);

        out.buffer = buffer; // If I return before this statement, a NPE is thrown :(
        if (stringValue == null) {
            return;
        }

        int fieldValue = field.value;
        if (fieldValue <= 0) {
            return;
        }

        String delimiterValue = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers
                .toStringFromUTF8(delimiter.start, delimiter.end, delimiter.buffer);
        if (delimiterValue == null) {
            return;
        }

        String[] splittedInput = stringValue.split(delimiterValue);
        if (splittedInput.length < fieldValue) {
            return;
        }

        // put the output value in the out buffer
        String outputValue = splittedInput[fieldValue - 1];
        out.start = 0;
        out.end = outputValue.getBytes().length;
        buffer.setBytes(0, outputValue.getBytes());
        out.isSet = 1;
    }

}
{code}

If I run the following query on the sample employee.json file (or actually a 
Parquet file, after modifying the table and column names)

{code:title=SQL Query|borderStyle=solid}SELECT full_name, 
split_string(full_name, ' ', 4), split_string('Whatever', ' ', 4) FROM 
cp.employee.json LIMIT 1{code}

I get the following result
!https://i.stack.imgur.com/L8uQW.png!

Shouldn't I be getting the column value and null for the other 2 columns ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)