Thanks for the answers. @Ted, my only goal is to pump a large amount of data out of Drill without having to read from the hard disk. I am measuring ODBC driver performance and need a higher data transfer rate, so any method that gets data out of Drill faster would help. log-synth seems like a good way to generate test data; however, I'd need a RAM-only option, which will hopefully provide higher throughput.
@Jacques How involved is it to write a dummy plugin that returns one hardcoded row repeatedly 12 million times?

Thanks,
Alex

On Fri, Jul 10, 2015 at 12:56 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> It may be easy, but it is completely opaque about what really needs to happen.
>
> For instance,
>
> 1) how is schema exposed?
> 2) which classes do I really need to implement?
> 3) how do I express partitioning of a format?
> 4) how do I test it?
>
> Just a bit of documentation and comments would go a very, very long way.
>
> Even answers on the mailing list that have more details than "oh, that's easy". I would be happy to transcribe answers into the code if I could just get some.

On Fri, Jul 10, 2015 at 11:04 AM, Jacques Nadeau <jacq...@apache.org> wrote:
> Creating an EasyFormatPlugin is pretty simple. They were designed to get rid of much of the scaffolding required for a standard FormatPlugin.
>
> JSON
> https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json
>
> Text
> https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text
>
> AVRO
> https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/avro
>
> In all cases, the connection code is pretty light. A fully schematized format like log-synth should be even simpler to implement.

On Fri, Jul 10, 2015 at 10:58 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> I don't think we need a full on storage plugin. I think a data format should be sufficient, basically CSV on steroids.

On Fri, Jul 10, 2015 at 10:47 AM, Abdel Hakim Deneche <adene...@maprtech.com> wrote:
> Yeah, we still lack documentation on how to write a storage plugin.
> One piece of advice I've been seeing a lot is to take a look at the mongo-db plugin; it was basically added in one single commit:
>
> https://github.com/apache/drill/commit/2ca9c907bff639e08a561eac32e0acab3a0b3304
>
> I think this will give some general ideas on what to expect when writing a storage plugin.

On Fri, Jul 10, 2015 at 9:10 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> Hakim,
>
> Not yet. Still very much in the stage of gathering feedback.
>
> I would think it very simple. The biggest obstacles are
>
> 1) no documentation on how to write a data format
> 2) I need to release a jar for log-synth to Maven Central.

On Fri, Jul 10, 2015 at 8:17 AM, Abdel Hakim Deneche <adene...@maprtech.com> wrote:
> @Ted, the log-synth storage format would be really useful. I'm already seeing many unit tests that could benefit from this. Do you have a GitHub repo for your ongoing work?
>
> Thanks!

On Thu, Jul 9, 2015 at 10:56 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> Are you hard set on using common table expressions?
>
> I have discussed a bit off-list creating a data format that would allow tables to be read from a log-synth [1] schema. That would let you read as much data as you might like with an arbitrarily complex (or simple) query.
>
> Operationally, you would create a file containing a log-synth schema that has the extension .synth.
> Your data source would have to be configured to connect that extension with the log-synth format. At that point, you could select as much or as little data as you like from the file, and you would see generated data rather than the schema.
>
> [1] https://github.com/tdunning/log-synth

On Thu, Jul 9, 2015 at 11:31 AM, Alexander Zarei <alexanderz.si...@gmail.com> wrote:
> Hi All,
>
> I am trying to come up with a query that returns a given number of rows without having a real table in storage.
>
> I am hoping to achieve something like this:
>
> http://stackoverflow.com/questions/6533524/sql-select-n-records-without-a-table
>
>     DECLARE @start INT = 1;
>     DECLARE @end INT = 1000000;
>     WITH numbers AS (
>         SELECT @start AS number
>         UNION ALL
>         SELECT number + 1
>         FROM numbers
>         WHERE number < @end
>     )
>     SELECT * FROM numbers
>     OPTION (MAXRECURSION 0);
>
> I do not actually need distinct values; returning identical rows would work too. I just need to bypass the "from clause" in the query.
>
> Thanks,
> Alex
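On Alex's question of how involved a "one hardcoded row, 12 million times" source would be: the row-generation loop itself is tiny. The sketch below is plain Java and is an illustration under stated assumptions, not Drill code. The class and method names (RepeatedRowSource, next, RepeatedRowDemo) are hypothetical, and it does not implement Drill's RecordReader or EasyFormatPlugin interfaces. It only loosely mimics the batch-at-a-time style such a reader would follow: each call hands back up to batchSize rows and reports how many it produced, with 0 signalling the end, and nothing is read from disk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical in-memory row source (NOT a Drill API): hands out the same
 * hardcoded row over and over, in fixed-size batches, until totalRows rows
 * have been produced. No disk I/O is involved.
 */
class RepeatedRowSource {
    private final List<Object> row;   // the single hardcoded row
    private final long totalRows;     // e.g. 12,000,000
    private final int batchSize;      // maximum rows per call to next()
    private long emitted = 0;         // rows handed out so far

    RepeatedRowSource(List<Object> row, long totalRows, int batchSize) {
        if (totalRows < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("totalRows must be >= 0 and batchSize > 0");
        }
        this.row = row;
        this.totalRows = totalRows;
        this.batchSize = batchSize;
    }

    /**
     * Clears `out`, fills it with the next batch and returns the number of
     * rows added; 0 means the source is exhausted.
     */
    int next(List<List<Object>> out) {
        out.clear();
        int n = (int) Math.min(batchSize, totalRows - emitted);
        for (int i = 0; i < n; i++) {
            out.add(row); // same instance every time: no per-row allocation
        }
        emitted += n;
        return n;
    }
}

/** Usage: drain 12 million copies of one row and time it. */
class RepeatedRowDemo {
    public static void main(String[] args) {
        RepeatedRowSource source = new RepeatedRowSource(
                Arrays.<Object>asList(1, "hello", 3.14), 12_000_000L, 4096);
        List<List<Object>> batch = new ArrayList<>();
        long total = 0;
        int batches = 0;
        long start = System.nanoTime();
        int n;
        while ((n = source.next(batch)) > 0) {
            total += n; // a real consumer would serialize `batch` here
            batches++;
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d rows in %d batches (%.2f s)%n", total, batches, seconds);
    }
}
```

With 4096-row batches the demo should report 12000000 rows in 2930 batches (the last batch holds 2816 rows). Inside Drill, the open questions Ted lists (how schema is exposed, which classes to implement, partitioning, testing) would still apply; this only shows that producing repeated rows from memory needs very little logic.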