It may be easy, but what really needs to happen is completely opaque.

For instance,

1) how is schema exposed?

2) which classes do I really need to implement?

3) how do I express partitioning of a format?

4) how do I test it?

Just a bit of documentation and comments would go a very, very long way.

Even answers on the mailing list with more detail than "oh, that's easy"
would help.  I would be happy to transcribe answers into the code if I
could just get some.



On Fri, Jul 10, 2015 at 11:04 AM, Jacques Nadeau <jacq...@apache.org> wrote:

> Creating an EasyFormatPlugin is pretty simple.  It was designed to get
> rid of much of the scaffolding required for a standard FormatPlugin.
>
> JSON
>
> https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json
>
> Text
>
> https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text
>
> AVRO
>
> https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/avro
>
> In all cases, the connection code is pretty light.  A fully schematized
> format like log-synth should be even simpler to implement.
>
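For what it's worth, a rough skeleton of what such a plugin might look like is
sketched below, pieced together from the JSON/Text/Avro plugins linked above.
Treat every package path, constructor argument, and method signature as an
assumption to be verified against those sources, since they have shifted between
Drill versions; SynthFormatPlugin, SynthFormatConfig, and SynthRecordReader are
invented names for the log-synth case.

    // Unverified skeleton of an "easy" format plugin for a hypothetical .synth format.
    import java.util.List;

    import org.apache.drill.common.exceptions.ExecutionSetupException;
    import org.apache.drill.common.expression.SchemaPath;
    import org.apache.drill.common.logical.FormatPluginConfig;
    import org.apache.drill.common.logical.StoragePluginConfig;
    import org.apache.drill.exec.ops.FragmentContext;
    import org.apache.drill.exec.server.DrillbitContext;
    import org.apache.drill.exec.store.RecordReader;
    import org.apache.drill.exec.store.RecordWriter;
    import org.apache.drill.exec.store.dfs.DrillFileSystem;
    import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin;
    import org.apache.drill.exec.store.dfs.easy.EasyWriter;
    import org.apache.drill.exec.store.dfs.easy.FileWork;

    import org.apache.hadoop.conf.Configuration;

    import com.fasterxml.jackson.annotation.JsonTypeName;
    import com.google.common.collect.Lists;

    public class SynthFormatPlugin extends EasyFormatPlugin<SynthFormatPlugin.SynthFormatConfig> {

      // Format configs are mostly empty markers that Jackson maps from the
      // storage plugin JSON ("type": "synth").
      @JsonTypeName("synth")
      public static class SynthFormatConfig implements FormatPluginConfig {
      }

      public SynthFormatPlugin(String name, DrillbitContext context, Configuration fsConf,
          StoragePluginConfig storageConfig, SynthFormatConfig formatConfig) {
        // Booleans are (readable, writable, blockSplittable, compressible) in the
        // versions I have looked at; "synth" is both the extension and default name.
        super(name, context, fsConf, storageConfig, formatConfig,
            true, false, false, false, Lists.newArrayList("synth"), "synth");
      }

      @Override
      public boolean supportsPushDown() {
        return false;  // nothing useful to push into generated data
      }

      @Override
      public RecordReader getRecordReader(FragmentContext context, DrillFileSystem dfs,
          FileWork fileWork, List<SchemaPath> columns, String userName)
          throws ExecutionSetupException {
        // This is where a (hypothetical) SynthRecordReader would parse the
        // log-synth schema in the file and emit generated rows, e.g.
        //   return new SynthRecordReader(context, dfs, fileWork.getPath(), columns);
        throw new UnsupportedOperationException("SynthRecordReader not written yet");
      }

      @Override
      public RecordWriter getRecordWriter(FragmentContext context, EasyWriter writer) {
        throw new UnsupportedOperationException("Writing .synth files is not supported");
      }

      @Override
      public int getReaderOperatorType() {
        // A real plugin returns a value from UserBitShared.CoreOperatorType here.
        return -1;
      }

      @Override
      public int getWriterOperatorType() {
        throw new UnsupportedOperationException();
      }
    }
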
> On Fri, Jul 10, 2015 at 10:58 AM, Ted Dunning <ted.dunn...@gmail.com>
> wrote:
>
> > I don't think we need a full-on storage plugin.  I think a data format
> > should be sufficient, basically CSV on steroids.
> >
> >
> >
> >
> >
> > On Fri, Jul 10, 2015 at 10:47 AM, Abdel Hakim Deneche <adene...@maprtech.com>
> > wrote:
> >
> > > Yeah, we still lack documentation on how to write a storage plugin. One
> > > piece of advice I've seen a lot is to take a look at the mongo-db plugin;
> > > it was basically added in one single commit:
> > >
> > > https://github.com/apache/drill/commit/2ca9c907bff639e08a561eac32e0acab3a0b3304
> > >
> > > I think this will give some general idea of what to expect when writing
> > > a storage plugin.
> > >
> > > On Fri, Jul 10, 2015 at 9:10 AM, Ted Dunning <ted.dunn...@gmail.com>
> > > wrote:
> > >
> > > > Hakim,
> > > >
> > > > Not yet.  Still very much in the stage of gathering feedback.
> > > >
> > > > I would think it very simple.  The biggest obstacles are
> > > >
> > > > 1) no documentation on how to write a data format
> > > >
> > > > 2) I need to release a jar for log-synth to Maven Central.
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, Jul 10, 2015 at 8:17 AM, Abdel Hakim Deneche <adene...@maprtech.com>
> > > > wrote:
> > > >
> > > > > @Ted, the log-synth storage format would be really useful. I'm already
> > > > > seeing many unit tests that could benefit from this. Do you have a
> > > > > github repo for your ongoing work?
> > > > >
> > > > > Thanks!
> > > > >
> > > > > On Thu, Jul 9, 2015 at 10:56 PM, Ted Dunning <ted.dunn...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Are you hard set on using common table expressions?
> > > > > >
> > > > > > I have discussed a bit off-list creating a data format that would
> > > > > > allow tables to be read from a log-synth [1] schema.  That would let
> > > > > > you read as much data as you might like with an arbitrarily complex
> > > > > > (or simple) query.
> > > > > >
> > > > > > Operationally, you would create a file containing a log-synth schema
> > > > > > that has the extension .synth.  Your data source would have to be
> > > > > > configured to connect that extension with the log-synth format.  At
> > > > > > that point, you could select as much or as little data as you like
> > > > > > from the file and you would see generated data rather than the schema.
> > > > > >
> > > > > > [1] https://github.com/tdunning/log-synth
> > > > > >
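To make the "configured to connect that extension with the log-synth format"
step concrete: a log-synth schema is itself a small JSON document, and Drill
ties file extensions to formats in the dfs storage plugin configuration.  Both
snippets below are sketches only; a "synth" format type does not exist yet, so
its name and options are assumptions, and the sampler classes should be
double-checked against the log-synth README.

A possible numbers.synth file (ordinary log-synth schema syntax):

    [
      {"name": "id",   "class": "id"},
      {"name": "name", "class": "name", "type": "first_last"},
      {"name": "age",  "class": "int",  "min": 18, "max": 90}
    ]

A hypothetical "formats" entry in the dfs storage plugin config mapping the
.synth extension to the (not yet written) format plugin:

    "formats": {
      "synth": {
        "type": "synth",
        "extensions": ["synth"]
      }
    }

With that in place, something like

    SELECT id, name, age FROM dfs.`/tmp/numbers.synth` LIMIT 1000;

would return generated rows rather than the schema itself.
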
> > > > > > On Thu, Jul 9, 2015 at 11:31 AM, Alexander Zarei <alexanderz.si...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > I am trying to come up with a query which returns a given number
> > > > > > > of rows without having a real table in storage.
> > > > > > >
> > > > > > > I am hoping to achieve something like this:
> > > > > > >
> > > > > > > http://stackoverflow.com/questions/6533524/sql-select-n-records-without-a-table
> > > > > > >
> > > > > > > DECLARE @start INT = 1;
> > > > > > > DECLARE @end INT = 1000000;
> > > > > > >
> > > > > > > WITH numbers AS (
> > > > > > >     SELECT @start AS number
> > > > > > >     UNION ALL
> > > > > > >     SELECT number + 1
> > > > > > >     FROM numbers
> > > > > > >     WHERE number < @end)
> > > > > > > SELECT * FROM numbers
> > > > > > > OPTION (MAXRECURSION 0);
> > > > > > >
> > > > > > > I do not actually need to create different values; returning
> > > > > > > identical rows would work too. I just need to bypass the FROM
> > > > > > > clause in the query.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Alex
> > > > > > >
> > > > > >
> > > > >