Re: Creating a single parquet or csv file using CTAS command?

2016-02-04 Thread Jason Altekruse
While both parquet and JavaScript are widely used, they kind of exist in different worlds. I cannot find a JavaScript reader for parquet files. That being said, I'm not so sure that one ought to exist, as parquet files are designed specifically for storing volumes of data for scan efficiency.

Creating a single parquet or csv file using CTAS command?

2016-02-04 Thread Peder Jakobsen | gmail
Hi, is there a way to force Drill to create a single file when performing a CTAS command (or by some other method)? Right now, I'm creating CSV files, and then have to perform an extra step to stitch 1_0_0.parquet, 1_1_0.parquet, 1_2_0.parquet, etc. together into a single file. Thank you. Peder
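For the CSV case, the extra stitching step described above can be scripted. This is only a minimal sketch, not something from the thread: the helper name `stitch_csv_parts`, the `*.csv` glob, and the assumption that every part file starts with the same header row are all illustrative. It is not suitable for parquet files, which cannot be joined by appending bytes.

```python
from pathlib import Path


def stitch_csv_parts(part_dir, out_path, has_header=True):
    """Concatenate the CSV part files in part_dir into one file at out_path.

    Assumption: if has_header is True, every part begins with the same
    single-line header row, and only the first copy is kept.
    """
    out_path = Path(out_path)
    # Sort for a stable order (lexicographic, not numeric) and never read
    # the output file itself if it already sits in the same directory.
    parts = [
        p for p in sorted(Path(part_dir).glob("*.csv"))
        if p.resolve() != out_path.resolve()
    ]
    with out_path.open("w", newline="") as out:
        for index, part in enumerate(parts):
            with part.open("r", newline="") as src:
                if has_header and index > 0:
                    src.readline()  # drop the repeated header row
                for line in src:
                    # Guard against a part that lacks a trailing newline.
                    out.write(line if line.endswith("\n") else line + "\n")
    return out_path
```

Something like `stitch_csv_parts("/tmp/my_table", "/tmp/my_table/combined.csv")` would then produce the single file; both paths here are placeholders.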

Re: Creating a single parquet or csv file using CTAS command?

2016-02-04 Thread Andries Engelbrecht
On a desktop you will likely be limited on memory. Perhaps set width to 1 to get single-threaded execution, and use 512MB or 1GB for parquet block size depending on how much memory the Drillbit has for direct memory. This will limit the number of parquet files being created, see how much smaller
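A rough sketch of what those settings might look like as Drill statements, built here as plain strings in Python. The helper name, the table name and the SELECT are hypothetical, and mapping "width" to `planner.width.max_per_node` is my reading of the advice rather than something stated in the thread. `store.format` and `store.parquet.block-size` are Drill session options; check the values against your Drill version. None of this guarantees exactly one output file, since output larger than the block size may still be split.

```python
def single_file_ctas_statements(table, select_sql, fmt="parquet", block_size_mb=512):
    """Return the Drill statements to run, in order, within one session."""
    statements = [
        # One minor fragment per node, i.e. single-threaded writing.
        "ALTER SESSION SET `planner.width.max_per_node` = 1",
        # Output format used by CTAS: 'parquet' or 'csv'.
        f"ALTER SESSION SET `store.format` = '{fmt}'",
    ]
    if fmt == "parquet":
        # Block size is given in bytes; a larger value means fewer files.
        block_size_bytes = block_size_mb * 1024 * 1024
        statements.append(
            f"ALTER SESSION SET `store.parquet.block-size` = {block_size_bytes}"
        )
    statements.append(f"CREATE TABLE {table} AS {select_sql}")
    return statements


if __name__ == "__main__":
    # Hypothetical table name and source path, for illustration only.
    for stmt in single_file_ctas_statements(
        "dfs.tmp.`subset`", "SELECT * FROM dfs.`/data/input.csv`", block_size_mb=1024
    ):
        print(stmt + ";")
```

The printed statements would need to be run in the same session (for example in sqlline), because ALTER SESSION settings do not carry over to a new connection.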

Re: Creating a single parquet or csv file using CTAS command?

2016-02-04 Thread Jason Altekruse
Are you even trying to write parquet files? In your original post you said you are writing CSV files, but then gave files with parquet extensions as what you are trying to concatenate. I'm a little confused though if you are not working with tools for big data, concatenating parquet files is not

Re: Creating a single parquet or csv file using CTAS command?

2016-02-04 Thread Peder Jakobsen | gmail
On Thu, Feb 4, 2016 at 11:15 AM, Andries Engelbrecht <aengelbre...@maprtech.com> wrote:
> Is there a reason to create a single file? Typically you may want more files to improve parallel operation on distributed systems like drill.
Good question. I'm not actually using Drill for "big

Re: Creating a single parquet or csv file using CTAS command?

2016-02-04 Thread Andries Engelbrecht
Is there a reason to create a single file? Typically you may want more files to improve parallel operation on distributed systems like Drill. That said, if you have a single-node Drill cluster (or embedded mode) you can reduce the threads to a single thread and increase the parquet file size

Re: Creating a single parquet or csv file using CTAS command?

2016-02-04 Thread Peder Jakobsen | gmail
Hi Andries, the trouble is that I run Drill on my desktop machine, but I have no server available to me that is capable of running Drill. Most $10/month hosting accounts do not permit you to run Java apps. For this reason I simply use Drill for "pre-processing" of the files that I eventually

Re: Creating a single parquet or csv file using CTAS command?

2016-02-04 Thread Peder Jakobsen | gmail
Sorry, bad typo: I have 50GB of data, NOT 500GB ;). And I usually only query a 1 GB subset of this data using Drill.
On Thu, Feb 4, 2016 at 1:04 PM, Peder Jakobsen | gmail wrote:
> On Thu, Feb 4, 2016 at 11:15 AM, Andries Engelbrecht <aengelbre...@maprtech.com>

Re: Creating a single parquet or csv file using CTAS command?

2016-02-04 Thread Andries Engelbrecht
You can create multiple parquet files and have the ability to query them all through the Drill SQL interface with minimal overhead. Creating a single 50GB parquet file is likely not the best option for performance; perhaps use Drill partitioning for the parquet files to speed up queries and

Re: Creating a single parquet or csv file using CTAS command?

2016-02-04 Thread Peder Jakobsen | gmail
Hi Jason, sorry for the confusion; I'm generating both CSV files and parquet. Parquet is just an experiment for me to see if I get better performance than with CSV or loading the CSV into something like TinyDB or MongoDB. I've found a way to read the parquet files with a Python library;