@Alexander If you want to test the speed of the ODBC driver, you can do that
without a new storage plugin.

If you get the entire dataset into memory, it will be returned from Drill as
quickly as we can possibly send it to the client. One way to do this is to
insert a sort; we cannot send along any of the data until the complete sort
is done. As long as you don't read so much data that we start spilling the
sort to disk, all of the records will be in memory. To take the read and
sort time out of your test, just make sure to record the time you first
receive data from Drill, not the query start time.
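
For example (the dfs.tmp workspace and file name below are hypothetical; any
dataset that fits in memory will do), an ORDER BY over the full result set
forces everything through the in-memory sort before the first row is sent:

    -- Nothing reaches the client until the entire sort completes, so the
    -- rows then stream back from memory as fast as Drill can send them.
    SELECT *
    FROM dfs.tmp.`sample_data.csv`
    ORDER BY columns[0];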

There is one gotcha here. To make the BI tools more responsive, we
implemented a feature that sends along one empty batch of records with the
schema information populated. This schema is generated by applying all of
the transformations that happen throughout the query. For example, the join
operator handles this schema population by sending along the schema merged
from the two sides of the join; project will similarly add or remove
columns based on the expressions and columns requested. You will want to
make sure you record your start time when you receive the first batch with
actual records. This gives you an accurate measurement of the ODBC
performance, removing the bottleneck of the disk.

On Thu, Jul 16, 2015 at 3:24 PM, Alexander Zarei <alexanderz.si...@gmail.com>
wrote:

> Thanks for the answers.
>
> @Ted my only goal is to pump a large amount of data without having to read
> from the hard disk. I am measuring the ODBC driver performance and I need a
> higher data transfer rate, so any method that helps pump data out of Drill
> faster would help. log-synth seems a good way to generate data for testing.
> However, I'd need a RAM-only option which hopefully provides a higher
> throughput.
>
> @Jacques How involved is it to write a dummy plugin that returns one
> hardcoded row repeatedly 12 million times?
>
> Thanks,
> Alex
>
> On Fri, Jul 10, 2015 at 12:56 PM, Ted Dunning <ted.dunn...@gmail.com>
> wrote:
>
> > It may be easy, but it is completely opaque about what really needs to
> > happen.
> >
> > For instance,
> >
> > 1) how is schema exposed?
> >
> > 2) which classes do I really need to implement?
> >
> > 3) how do I express partitioning of a format?
> >
> > 4) how do I test it?
> >
> > Just a bit of documentation and comments would go a very, very long way.
> >
> > Even answers on the mailing list that have more detail than "oh, that's
> > easy" would help.  I would be happy to transcribe answers into the code
> > if I could just get some.
> >
> >
> >
> > On Fri, Jul 10, 2015 at 11:04 AM, Jacques Nadeau <jacq...@apache.org>
> > wrote:
> >
> > > Creating an EasyFormatPlugin is pretty simple.  They were designed to
> > > get rid of much of the scaffolding required for a standard FormatPlugin.
> > >
> > > JSON
> > >
> > > https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json
> > >
> > > Text
> > >
> > > https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text
> > >
> > > AVRO
> > >
> > > https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/avro
> > >
> > > In all cases, the connection code is pretty light.  A fully schematized
> > > format like log-synth should be even simpler to implement.
> > >
> > > On Fri, Jul 10, 2015 at 10:58 AM, Ted Dunning <ted.dunn...@gmail.com>
> > > wrote:
> > >
> > > > I don't think we need a full-on storage plugin.  I think a data
> > > > format should be sufficient, basically CSV on steroids.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, Jul 10, 2015 at 10:47 AM, Abdel Hakim Deneche <
> > > > adene...@maprtech.com> wrote:
> > > >
> > > > > Yeah, we still lack documentation on how to write a storage plugin.
> > > > > One piece of advice I've seen a lot is to take a look at the mongo-db
> > > > > plugin; it was basically added in a single commit:
> > > > >
> > > > > https://github.com/apache/drill/commit/2ca9c907bff639e08a561eac32e0acab3a0b3304
> > > > >
> > > > > I think this will give some general ideas on what to expect when
> > > > > writing a storage plugin.
> > > > >
> > > > > On Fri, Jul 10, 2015 at 9:10 AM, Ted Dunning <ted.dunn...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hakim,
> > > > > >
> > > > > > Not yet.  Still very much in the stage of gathering feedback.
> > > > > >
> > > > > > I would think it very simple.  The biggest obstacles are
> > > > > >
> > > > > > 1) no documentation on how to write a data format
> > > > > >
> > > > > > 2) I need to release a jar for log-synth to Maven Central.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Jul 10, 2015 at 8:17 AM, Abdel Hakim Deneche <
> > > > > > adene...@maprtech.com> wrote:
> > > > > >
> > > > > > > @Ted, the log-synth storage format would be really useful. I'm
> > > > > > > already seeing many unit tests that could benefit from this. Do
> > > > > > > you have a github repo for your ongoing work?
> > > > > > >
> > > > > > > Thanks!
> > > > > > >
> > > > > > > On Thu, Jul 9, 2015 at 10:56 PM, Ted Dunning <ted.dunn...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Are you hard set on using common table expressions?
> > > > > > > >
> > > > > > > > I have discussed a bit off-list creating a data format that
> > > > > > > > would allow tables to be read from a log-synth [1] schema.  That
> > > > > > > > would let you read as much data as you might like with an
> > > > > > > > arbitrarily complex (or simple) query.
> > > > > > > >
> > > > > > > > Operationally, you would create a file containing a log-synth
> > > > > > > > schema that has the extension .synth.  Your data source would
> > > > > > > > have to be configured to connect that extension with the
> > > > > > > > log-synth format.  At that point, you could select as much or as
> > > > > > > > little data as you like from the file and you would see
> > > > > > > > generated data rather than the schema.
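> > > > > > > >
> > > > > > > > As a sketch, a query against such a file might look like this
> > > > > > > > (the dfs.tmp workspace and the file name are hypothetical, and
> > > > > > > > assume the extension mapping described above is in place):
> > > > > > > >
> > > > > > > >     -- customers.synth contains a log-synth schema; the query
> > > > > > > >     -- returns generated rows rather than the schema itself
> > > > > > > >     SELECT *
> > > > > > > >     FROM dfs.tmp.`customers.synth`
> > > > > > > >     LIMIT 1000000;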
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > [1] https://github.com/tdunning/log-synth
> > > > > > > >
> > > > > > > > On Thu, Jul 9, 2015 at 11:31 AM, Alexander Zarei <
> > > > > > > > alexanderz.si...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi All,
> > > > > > > > >
> > > > > > > > > I am trying to come up with a query which returns a given
> > > > > > > > > number of rows without having a real table in storage.
> > > > > > > > >
> > > > > > > > > I am hoping to achieve something like this:
> > > > > > > > >
> > > > > > > > > http://stackoverflow.com/questions/6533524/sql-select-n-records-without-a-table
> > > > > > > > >
> > > > > > > > > DECLARE @start INT = 1;
> > > > > > > > > DECLARE @end INT = 1000000;
> > > > > > > > > WITH numbers AS (
> > > > > > > > >     SELECT @start AS number
> > > > > > > > >     UNION ALL
> > > > > > > > >     SELECT number + 1
> > > > > > > > >     FROM numbers
> > > > > > > > >     WHERE number < @end)
> > > > > > > > > SELECT * FROM numbers
> > > > > > > > > OPTION (MAXRECURSION 0);
> > > > > > > > >
> > > > > > > > > I do not actually need to create different values; returning
> > > > > > > > > identical rows would work too. I just need to bypass the
> > > > > > > > > "from clause" in the query.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Alex
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>
