Thanks for the elaboration, Ted, Jacques and Jason! @Ted that is a very cool idea. I tried the cross join but found that CROSS JOIN is not supported in Drill yet; we have DRILL-786 for it. The new method looks very promising. It seems to be an implicit cross join, doesn't it? I just tried it out and it worked like a charm. I will go on with this method.
@Jacques, yes, as Jason said, we discussed this before, and I have talked to my colleagues about helping me modify the ODBC driver so it sends a plan. Also thanks for the query. I tried it out for two tables and it worked fine, but extending it to three tables gives me a syntax error:

select *
from ((select column1, 1 as join_keyb
       from (SELECT column1, 1 as join_key FROM `hive43`.`default`.`double_table`) t1
       join (SELECT column1, 1 as join_key FROM `hive43`.`default`.`double_table`) t2
         on t1.join_key = t2.join_key) t12
      join (SELECT column1, 1 as join_key FROM `hive43`.`default`.`double_table`) t3
        on t12.join_keyb = t3.join_key)

The other syntax was easier for me to extend to three joins, so I could test it with:

select t1.column1
from (SELECT column1, 1 as join_key FROM `hive43`.`default`.`double_table`) t1,
     (SELECT column1, 1 as join_key FROM `hive43`.`default`.`double_table`) t2,
     (SELECT column1, 1 as join_key FROM `hive43`.`default`.`double_table`) t3
where t1.join_key = t2.join_key
  and t1.join_key = t3.join_key

Thank you very much for your time Ted, Jacques and Jason!

Thanks,
Alex

On Fri, Jul 17, 2015 at 2:09 PM, Jason Altekruse <altekruseja...@gmail.com> wrote:
> Jacques,
>
> Alexander has brought up this problem previously in one of the hangouts and
> said that submitting a physical plan was not possible through ODBC. If he
> is able to modify the driver code to make it possible to submit one, that
> would be an option, as I believe the C++ client is capable of submitting
> plans. The issue I seem to recall him mentioning is that the ODBC driver
> was running a little sanity checking on the sql query to try to prevent
> submitting complete garbage queries to a server. I think he had concerns
> that a JSON formatted physical plan would fail these checks and he would
> have to disable them along with trying to allow submitting two types of
> queries from ODBC.
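For anyone following along, the constant-join-key trick in the queries above can be sketched outside Drill. This is a minimal sketch using an in-memory SQLite database; the hypothetical 10-row `double_table` stands in for the Hive table from the thread.

```python
import sqlite3

# Minimal sketch of the constant-join-key trick: an in-memory SQLite
# stand-in (hypothetical 10-row "double_table") for the Hive table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE double_table (column1 REAL)")
conn.executemany("INSERT INTO double_table VALUES (?)",
                 [(float(i),) for i in range(10)])

# Tagging every row with the same constant join_key turns the equi-join
# into an implicit cross join: 10 * 10 * 10 = 1000 rows from 10 rows.
query = """
SELECT t1.column1
FROM (SELECT column1, 1 AS join_key FROM double_table) t1,
     (SELECT column1, 1 AS join_key FROM double_table) t2,
     (SELECT column1, 1 AS join_key FROM double_table) t3
WHERE t1.join_key = t2.join_key
  AND t1.join_key = t3.join_key
"""
rows = conn.execute(query).fetchall()
print(len(rows))  # 1000
```

Each extra self-join multiplies the row count, which is why three copies of 1000 records yield a billion rows with negligible I/O.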
> On Fri, Jul 17, 2015 at 8:52 AM, Jacques Nadeau <jacq...@dremio.com> wrote:
> > Removing cross posting
> >
> > Alexander,
> >
> > There is currently no way for Drill to generate a large amount of data
> > using SQL. However, you can generate large generic data by using the
> > MockStoragePlugin if you submit a plan. You can find an example plan
> > using this at [1].
> >
> > I heard someone might be working on extending the MockStoragePlugin to
> > support SQL, which would provide the outcome you requested.
> >
> > [1] https://github.com/apache/drill/blob/master/exec/java-exec/src/test/resources/mock-scan.json
> >
> > On Thu, Jul 16, 2015 at 10:16 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> > > Also, just doing a Cartesian join of three copies of 1000 records will
> > > give you a billion records with negligible I/O.
> > >
> > > Sent from my iPhone
> > >
> > > On Jul 16, 2015, at 15:43, Jason Altekruse <altekruseja...@gmail.com> wrote:
> > > > @Alexander If you want to test the speed of the ODBC driver you can
> > > > do that without a new storage plugin.
> > > >
> > > > If you get the entire dataset into memory, it will be returned from
> > > > Drill as quickly as we can possibly send it to the client. One way
> > > > to do this is to insert a sort; we cannot send along any of the data
> > > > until the complete sort is done. As long as you don't read so much
> > > > data that we will start spilling the sort to disk, all of the
> > > > records will be in memory. To take the read and sort time out of
> > > > your test, just make sure to record the time you first receive data
> > > > from Drill, not the query start time.
> > > >
> > > > There is one gotcha here. To make the BI tools more responsive, we
> > > > implemented a feature that will send along one empty batch of
> > > > records with the schema information populated.
> > > > This schema is generated by applying all of the transformations
> > > > that happen throughout the query. For example, the join operator
> > > > handles this schema population by sending along the schema merged
> > > > from the two sides of the join; project will similarly add or
> > > > remove columns based on the expressions and columns requested. You
> > > > will want to make sure you record your start time when you receive
> > > > the first batch with actual records. This can give you an accurate
> > > > measurement of the ODBC performance, removing the bottleneck of the
> > > > disk.
> > > >
> > > > On Thu, Jul 16, 2015 at 3:24 PM, Alexander Zarei <alexanderz.si...@gmail.com> wrote:
> > > > > Thanks for the answers.
> > > > >
> > > > > @Ted my only goal is to pump a large amount of data without
> > > > > having to read from Hard Disk. I am measuring the ODBC driver
> > > > > performance and I need a higher data transfer rate. So any method
> > > > > that helps pumping data out of Drill faster would help. The
> > > > > log-synth seems a good way to generate data for testing. However,
> > > > > I'd need a RAM-only option which hopefully provides a higher
> > > > > throughput.
> > > > >
> > > > > @Jacques How involved is it to write a dummy plugin that returns
> > > > > one hardcoded row repeatedly 12 million times?
> > > > >
> > > > > Thanks,
> > > > > Alex
> > > > >
> > > > > On Fri, Jul 10, 2015 at 12:56 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> > > > > > It may be easy, but it is completely opaque about what really
> > > > > > needs to happen.
> > > > > >
> > > > > > For instance,
> > > > > >
> > > > > > 1) how is schema exposed?
> > > > > >
> > > > > > 2) which classes do I really need to implement?
> > > > > >
> > > > > > 3) how do I express partitioning of a format?
> > > > > >
> > > > > > 4) how do I test it?
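Jason's batch-timing advice quoted above can be illustrated with a small simulation. This is only a sketch: `fetch_batches` is a made-up stand-in for a real ODBC fetch loop, not a Drill or driver API.

```python
import time

# Toy illustration of the advice above: skip the empty schema-only batch
# and start the clock at the first batch carrying real records.
# fetch_batches() is a hypothetical stand-in, NOT a Drill API.
def fetch_batches():
    yield []                      # empty schema-only batch from the server
    yield [("a", 1), ("b", 2)]    # first batch with actual records
    yield [("c", 3)]

start = None
record_count = 0
for batch in fetch_batches():
    if not batch:
        continue                  # don't start timing on the schema batch
    if start is None:
        start = time.monotonic()  # measure from the first real data batch
    record_count += len(batch)
elapsed = time.monotonic() - start
print(record_count)  # 3
```

Starting the timer inside the loop rather than at query submission removes the read-and-sort time from the measurement, leaving mostly client transfer cost.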
> > > > > > Just a bit of documentation and comments would go a very, very
> > > > > > long way.
> > > > > >
> > > > > > Even answers on the mailing list that have more details than
> > > > > > "oh, that's easy". I would be happy to transcribe answers into
> > > > > > the code if I could just get some.
> > > > > >
> > > > > > On Fri, Jul 10, 2015 at 11:04 AM, Jacques Nadeau <jacq...@apache.org> wrote:
> > > > > > > Creating an EasyFormatPlugin is pretty simple. They were
> > > > > > > designed to get rid of much of the scaffolding required for a
> > > > > > > standard FormatPlugin.
> > > > > > >
> > > > > > > JSON
> > > > > > > https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json
> > > > > > >
> > > > > > > Text
> > > > > > > https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/text
> > > > > > >
> > > > > > > AVRO
> > > > > > > https://github.com/apache/drill/tree/master/exec/java-exec/src/main/java/org/apache/drill/exec/store/avro
> > > > > > >
> > > > > > > In all cases, the connection code is pretty light. A fully
> > > > > > > schematized format like log-synth should be even simpler to
> > > > > > > implement.
> > > > > > >
> > > > > > > On Fri, Jul 10, 2015 at 10:58 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> > > > > > > > I don't think we need a full-on storage plugin. I think a
> > > > > > > > data format should be sufficient, basically CSV on steroids.
> > > > > > > >
> > > > > > > > On Fri, Jul 10, 2015 at 10:47 AM, Abdel Hakim Deneche <adene...@maprtech.com> wrote:
> > > > > > > > > Yeah, we still lack documentation on how to write a
> > > > > > > > > storage plugin.
> > > > > > > > > One piece of advice I've been seeing a lot is to take a
> > > > > > > > > look at the mongo-db plugin; it was basically added in one
> > > > > > > > > single commit:
> > > > > > > > > https://github.com/apache/drill/commit/2ca9c907bff639e08a561eac32e0acab3a0b3304
> > > > > > > > >
> > > > > > > > > I think this will give some general ideas on what to
> > > > > > > > > expect when writing a storage plugin.
> > > > > > > > >
> > > > > > > > > On Fri, Jul 10, 2015 at 9:10 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> > > > > > > > > > Hakim,
> > > > > > > > > >
> > > > > > > > > > Not yet. Still very much in the stage of gathering
> > > > > > > > > > feedback.
> > > > > > > > > >
> > > > > > > > > > I would think it very simple. The biggest obstacles are
> > > > > > > > > >
> > > > > > > > > > 1) no documentation on how to write a data format
> > > > > > > > > >
> > > > > > > > > > 2) I need to release a jar for log-synth to Maven
> > > > > > > > > > Central.
> > > > > > > > > >
> > > > > > > > > > On Fri, Jul 10, 2015 at 8:17 AM, Abdel Hakim Deneche <adene...@maprtech.com> wrote:
> > > > > > > > > > > @Ted, the log-synth storage format would be really
> > > > > > > > > > > useful. I'm already seeing many unit tests that could
> > > > > > > > > > > benefit from this. Do you have a github repo for your
> > > > > > > > > > > ongoing work?
> > > > > > > > > > >
> > > > > > > > > > > Thanks!
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Jul 9, 2015 at 10:56 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> > > > > > > > > > > > Are you hard set on using common table expressions?
> > > > > > > > > > > >
> > > > > > > > > > > > I have discussed a bit off-list creating a data
> > > > > > > > > > > > format that would allow tables to be read from a
> > > > > > > > > > > > log-synth [1] schema.
> > > > > > > > > > > > That would let you read as much data as you might
> > > > > > > > > > > > like with an arbitrarily complex (or simple) query.
> > > > > > > > > > > >
> > > > > > > > > > > > Operationally, you would create a file containing a
> > > > > > > > > > > > log-synth schema that has the extension .synth. Your
> > > > > > > > > > > > data source would have to be configured to connect
> > > > > > > > > > > > that extension with the log-synth format. At that
> > > > > > > > > > > > point, you could select as much or as little data as
> > > > > > > > > > > > you like from the file and you would see generated
> > > > > > > > > > > > data rather than the schema.
> > > > > > > > > > > >
> > > > > > > > > > > > [1] https://github.com/tdunning/log-synth
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Jul 9, 2015 at 11:31 AM, Alexander Zarei <alexanderz.si...@gmail.com> wrote:
> > > > > > > > > > > > > Hi All,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I am trying to come up with a query which returns
> > > > > > > > > > > > > a given number of rows without having a real table
> > > > > > > > > > > > > on Storage.
> > > >>>>>>>>>> > > > >>>>>>>>>> I am hoping to achieve something like this: > > > >> > > > > > > http://stackoverflow.com/questions/6533524/sql-select-n-records-without-a-table > > > >>>>>>>>>> > > > >>>>>>>>>> DECLARE @start INT = 1;DECLARE @end INT = 1000000; > > > >>>>>>>>>> WITH numbers AS ( > > > >>>>>>>>>> SELECT @start AS number > > > >>>>>>>>>> UNION ALL > > > >>>>>>>>>> SELECT number + 1 > > > >>>>>>>>>> FROM numbers > > > >>>>>>>>>> WHERE number < @end)SELECT *FROM numbersOPTION > > > >>>> (MAXRECURSION > > > >>>>>> 0); > > > >>>>>>>>>> > > > >>>>>>>>>> I do not actually need to create different values and > > > >>> returning > > > >>>>>>>> identical > > > >>>>>>>>>> rows would work too.I just need to bypass the "from clause" > > > >>> in > > > >>>>> the > > > >>>>>>>> query. > > > >>>>>>>>>> > > > >>>>>>>>>> Thanks, > > > >>>>>>>>>> Alex > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> -- > > > >>>>>>>> > > > >>>>>>>> Abdelhakim Deneche > > > >>>>>>>> > > > >>>>>>>> Software Engineer > > > >>>>>>>> > > > >>>>>>>> <http://www.mapr.com/> > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> Now Available - Free Hadoop On-Demand Training > > > >>>>>>>> < > > > >> > > > > > > http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available > > > >>>>>> > > > >>>>>> > > > >>>>>> > > > >>>>>> -- > > > >>>>>> > > > >>>>>> Abdelhakim Deneche > > > >>>>>> > > > >>>>>> Software Engineer > > > >>>>>> > > > >>>>>> <http://www.mapr.com/> > > > >>>>>> > > > >>>>>> > > > >>>>>> Now Available - Free Hadoop On-Demand Training > > > >>>>>> < > > > >> > > > > > > http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available > > > >> > > > > > >