Ted Dunning wrote:
The Cartesian join approach will produce an enormous stream of data with
only a very small amount of disk read.
You don't even need an external seed file (or to query a built-in table), and
using WITH can extend multiplication to exponentiation:
WITH q(key) AS (
WITH
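Ted's query is cut off above; here is a minimal sketch of the idea (my
reconstruction, not Ted's exact query, and it assumes the engine accepts a
VALUES list inside a CTE): a ten-row CTE carrying a constant join key,
self-joined k times, yields 10^k rows.

WITH q (n, join_key) AS (
  VALUES (1, 1), (2, 1), (3, 1), (4, 1), (5, 1),
         (6, 1), (7, 1), (8, 1), (9, 1), (10, 1)
)
-- Each self-join multiplies the row count: 10 * 10 * 10 = 1000 rows.
SELECT a.n, b.n, c.n
FROM q a
JOIN q b ON a.join_key = b.join_key
JOIN q c ON b.join_key = c.join_key;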
Thanks for the further elaboration, Ted, Jacques, and Jason!
@Ted that is a very cool idea. I tried a cross join first, but found that
cross joins are not supported in Drill yet; DRILL-786 tracks adding them.
The new method looks very promising. It is effectively an implicit cross
join, isn't it? I just tried it out.
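For reference (my sketch, generic SQL with hypothetical tables t1 and t2,
not from the thread): in standard SQL the comma-separated FROM list and the
explicit CROSS JOIN below are equivalent, and the constant join_key trick is
the same thing again semantically, just written as an equi-join the planner
will accept.

-- Implicit cross join: comma-separated FROM list, no predicate.
SELECT * FROM t1, t2;

-- Explicit cross join: the spelling tracked by DRILL-786.
SELECT * FROM t1 CROSS JOIN t2;

-- The workaround from this thread: a constant equi-join key.
SELECT *
FROM (SELECT *, 1 AS join_key FROM t1) a
JOIN (SELECT *, 1 AS join_key FROM t2) b
  ON a.join_key = b.join_key;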
Good point. In fact, you can just use a literal expression and some sample
data, such as the TPC-H lineitem table:
SELECT * FROM
  (SELECT l_orderkey, l_shipdate, l_commitdate, l_shipmode, 1 AS join_key
   FROM cp.`tpch/lineitem.parquet`) t1
JOIN
  (SELECT l_orderkey, l_shipdate, l_commitdate, l_shipmode, 1 AS join_key
   FROM cp.`tpch/lineitem.parquet`) t2
ON t1.join_key = t2.join_key;
Also, just doing a Cartesian join of three copies of 1,000 records will give
you a billion records with negligible I/O.
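A hedged sketch of that arithmetic against the TPC-H sample data on Drill's
classpath (this assumes the sample lineitem file holds at least 1,000 rows):

-- Three 1000-row inputs joined on a constant key behave like a
-- Cartesian product: 1000 * 1000 * 1000 = 1,000,000,000 output rows,
-- while only about 3,000 rows are ever read from storage.
SELECT COUNT(*)
FROM (SELECT 1 AS k FROM cp.`tpch/lineitem.parquet` LIMIT 1000) t1
JOIN (SELECT 1 AS k FROM cp.`tpch/lineitem.parquet` LIMIT 1000) t2
  ON t1.k = t2.k
JOIN (SELECT 1 AS k FROM cp.`tpch/lineitem.parquet` LIMIT 1000) t3
  ON t2.k = t3.k;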
On Jul 16, 2015, at 15:43, Jason Altekruse altekruseja...@gmail.com wrote:
@Alexander If you want to test the speed of the ODBC driver you can do that
Thanks for the answers.
@Ted my only goal is to pump a large amount of data without having to read
from disk. I am measuring ODBC driver performance and need a higher data
transfer rate, so any method that helps pump data out of Drill faster would
help. The log-synth approach seems like a good fit.
@Alexander If you want to test the speed of the ODBC driver you can do that
without a new storage plugin.
If you get the entire dataset into memory, it will be returned from Drill as
quickly as we can possibly send it to the client. One way to do this is to
insert a sort; we cannot send along any rows until the sort has consumed all
of its input.
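A minimal sketch of that suggestion (assuming the TPC-H sample file; any
ORDER BY that forces a full sort should do):

-- The sort is blocking: no rows reach the client until the sort has
-- consumed its whole input, so the result is materialized in memory
-- and the transfer then measures the ODBC path rather than the scan.
SELECT l_orderkey, l_shipdate, l_shipmode
FROM cp.`tpch/lineitem.parquet`
ORDER BY l_orderkey;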
Yeah, we still lack documentation on how to write a storage plugin. One
piece of advice I've been seeing a lot is to take a look at the mongo-db
plugin; it was basically added in one single commit:
https://github.com/apache/drill/commit/2ca9c907bff639e08a561eac32e0acab3a0b3304
I think this will give some idea of what is involved.
I don't think we need a full-on storage plugin. I think a data format
should be sufficient: basically, CSV on steroids.
On Fri, Jul 10, 2015 at 10:47 AM, Abdel Hakim Deneche adene...@maprtech.com
wrote:
Yeah, we still lack documentation on how to write a storage plugin. One
advice I've
Hakim,
Not yet. Still very much in the stage of gathering feedback.
I would think it very simple. The biggest obstacles are:
1) no documentation on how to write a data format
2) I need to release a jar for log-synth to Maven Central.
On Fri, Jul 10, 2015 at 8:17 AM, Abdel Hakim Deneche adene...@maprtech.com
wrote:
It may be easy, but it is completely opaque what really needs to happen.
For instance:
1) How is schema exposed?
2) Which classes do I really need to implement?
3) How do I express partitioning of a format?
4) How do I test it?
Just a bit of documentation and comments would go a very long way.
+ user@drill
Hi All,
I am trying to come up with a query that returns a given number of rows
without having a real table in storage.
I am hoping to achieve something like this:
http://stackoverflow.com/questions/6533524/sql-select-n-records-without-a-table
DECLARE @start INT = 1;DECLARE @end
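The snippet above is cut off; for reference, the pattern from that
StackOverflow answer is roughly the following T-SQL (a hedged
reconstruction, with illustrative values; Drill supports neither DECLARE nor
recursive CTEs, which is what prompts this question):

-- SQL Server: generate the numbers @start..@end without any table,
-- using a recursive CTE. MAXRECURSION 0 lifts the default 100-level cap.
DECLARE @start INT = 1;
DECLARE @end INT = 10;

WITH numbers AS (
  SELECT @start AS n
  UNION ALL
  SELECT n + 1 FROM numbers WHERE n < @end
)
SELECT n FROM numbers
OPTION (MAXRECURSION 0);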
Are you hard set on using common table expressions?
I have discussed a bit off-list creating a data format that would allow
tables to be read from a log-synth [1] schema. That would let you read as
much data as you might like with an arbitrarily complex (or simple) query.
Operationally, you