Re: Recursive CTE Support in Drill

Alexander Zarei Thu, 16 Jul 2015 15:27:15 -0700

Thanks for the answers.

@Ted my only goal is to pump a large amount of data without having to read
from the hard disk. I am measuring ODBC driver performance and I need a
higher data transfer rate, so any method that helps pump data out of
Drill faster would help. log-synth seems like a good way to generate data
for testing. However, I'd need a RAM-only option, which would hopefully
provide higher throughput.


@Jacques How involved is it to write a dummy plugin that returns one
hardcoded row repeatedly 12 million times?
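
To make the idea concrete, here is a minimal plain-Java sketch of the data
such a dummy plugin would hand back: one hardcoded row, returned a fixed
number of times, entirely from memory. It deliberately does not use any of
Drill's plugin interfaces; the class name RepeatedRowSource and its methods
are hypothetical and exist only for this illustration.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical in-memory source: returns the same row a fixed number of times. */
final class RepeatedRowSource implements Iterator<Object[]> {
    private final Object[] row;   // the single hardcoded row
    private final long total;     // how many times the row should be returned
    private long emitted = 0;     // how many rows have been returned so far

    RepeatedRowSource(Object[] row, long total) {
        if (total < 0) {
            throw new IllegalArgumentException("total must be >= 0");
        }
        this.row = row.clone();   // defensive copy of the caller's array
        this.total = total;
    }

    @Override
    public boolean hasNext() {
        return emitted < total;
    }

    @Override
    public Object[] next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        emitted++;
        return row;               // same array each time: no disk reads, no per-row allocation
    }

    long emitted() {
        return emitted;
    }
}
```

Constructing it with a total of 12_000_000 would give the 12 million
identical rows; wiring something like this into Drill would still need the
plugin and reader classes Drill expects, which this sketch does not attempt.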

Thanks,
Alex

On Fri, Jul 10, 2015 at 12:56 PM, Ted Dunning <[email protected]> wrote:

> It may be easy, but it is completely opaque about what really needs to
> happen.
>
> For instance,
>
> 1) how is schema exposed?
>
> 2) which classes do I really need to implement?
>
> 3) how do I express partitioning of a format?
>
> 4) how do I test it?
>
> Just a bit of documentation and comments would go a very, very long way.
>
> Even answers on the mailing list with more detail than "oh, that's
> easy" would help.  I would be happy to transcribe answers into the code
> if I could just get some.
>
>
>
> On Fri, Jul 10, 2015 at 11:04 AM, Jacques Nadeau <[email protected]>
> wrote:
>
> > Creating an EasyFormatPlugin is pretty simple.  They were designed to get
> > rid of much of the scaffolding required for a standard FormatPlugin.
> >
> > JSON
> >
> > https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json
> >
> > Text
> >
> > https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text
> >
> > AVRO
> >
> > https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/avro
> >
> > In all cases, the connection code is pretty light.  A fully schematized
> > format like log-synth should be even simpler to implement.
> >
> > On Fri, Jul 10, 2015 at 10:58 AM, Ted Dunning <[email protected]>
> > wrote:
> >
> > > I don't think we need a full-on storage plugin.  I think a data format
> > > should be sufficient, basically CSV on steroids.
> > >
> > >
> > >
> > >
> > >
> > > On Fri, Jul 10, 2015 at 10:47 AM, Abdel Hakim Deneche <[email protected]>
> > > wrote:
> > >
> > > > Yeah, we still lack documentation on how to write a storage plugin.
> > > > One piece of advice I've been seeing a lot is to take a look at the
> > > > mongo-db plugin; it was basically added in a single commit:
> > > >
> > > > https://github.com/apache/drill/commit/2ca9c907bff639e08a561eac32e0acab3a0b3304
> > > >
> > > > I think this will give some general ideas on what to expect when
> > > > writing a storage plugin.
> > > >
> > > > On Fri, Jul 10, 2015 at 9:10 AM, Ted Dunning <[email protected]>
> > > > wrote:
> > > >
> > > > > Hakim,
> > > > >
> > > > > Not yet.  Still very much in the stage of gathering feedback.
> > > > >
> > > > > I would think it very simple.  The biggest obstacles are
> > > > >
> > > > > 1) no documentation on how to write a data format
> > > > >
> > > > > 2) I need to release a jar for log-synth to Maven Central.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Jul 10, 2015 at 8:17 AM, Abdel Hakim Deneche <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > @Ted, the log-synth storage format would be really useful. I'm
> > > > > > already seeing many unit tests that could benefit from this. Do
> > > > > > you have a GitHub repo for your ongoing work?
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > On Thu, Jul 9, 2015 at 10:56 PM, Ted Dunning <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > Are you hard set on using common table expressions?
> > > > > > >
> > > > > > > I have discussed a bit off-list creating a data format that
> > > > > > > would allow tables to be read from a log-synth [1] schema.  That
> > > > > > > would let you read as much data as you might like with an
> > > > > > > arbitrarily complex (or simple) query.
> > > > > > >
> > > > > > > Operationally, you would create a file containing a log-synth
> > > > > > > schema that has the extension .synth.  Your data source would
> > > > > > > have to be configured to connect that extension with the
> > > > > > > log-synth format.  At that point, you could select as much or
> > > > > > > little data as you like from the file and you would see
> > > > > > > generated data rather than the schema.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > [1] https://github.com/tdunning/log-synth
> > > > > > >
> > > > > > > On Thu, Jul 9, 2015 at 11:31 AM, Alexander Zarei <[email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > I am trying to come up with a query that returns a given
> > > > > > > > number of rows without having a real table in storage.
> > > > > > > >
> > > > > > > > I am hoping to achieve something like this:
> > > > > > > >
> > > > > > > > http://stackoverflow.com/questions/6533524/sql-select-n-records-without-a-table
> > > > > > > >
> > > > > > > > DECLARE @start INT = 1;
> > > > > > > > DECLARE @end INT = 1000000;
> > > > > > > > WITH numbers AS (
> > > > > > > >     SELECT @start AS number
> > > > > > > >     UNION ALL
> > > > > > > >     SELECT number + 1
> > > > > > > >     FROM numbers
> > > > > > > >     WHERE number < @end
> > > > > > > > )
> > > > > > > > SELECT *
> > > > > > > > FROM numbers
> > > > > > > > OPTION (MAXRECURSION 0);
> > > > > > > >
> > > > > > > > I do not actually need to create different values; returning
> > > > > > > > identical rows would work too. I just need to bypass the
> > > > > > > > "from clause" in the query.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Alex
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Abdelhakim Deneche
> > > > > >
> > > > > > Software Engineer
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
