While both Parquet and JavaScript are widely used, they largely exist in
different worlds. I cannot find a JavaScript reader for Parquet files.
That said, I'm not sure one ought to exist, as Parquet files
are designed specifically for storing large volumes of data for scan efficiency.
Hi, is there a way to force Drill to create a single file when performing a
CTAS command (or some other method)?
Right now, I'm creating CSV files, and then have to perform an extra step
to stitch 1_0_0.parquet, 1_1_0.parquet, 1_2_0.parquet, etc. together into a
single file.
Thank you.
Peder
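For reference, a minimal sketch of the kind of CTAS statement in question
(the table and path names here are invented). Drill writes one parquet file
per writer fragment, which is where the numbered 1_0_0.parquet,
1_1_0.parquet, ... outputs come from:

    -- write CTAS output as parquet (Drill's default table format)
    ALTER SESSION SET `store.format` = 'parquet';

    -- hypothetical source path and table name
    CREATE TABLE dfs.tmp.`my_table` AS
    SELECT * FROM dfs.`/data/input.csv`;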
On a desktop you will likely be limited on memory.
Perhaps set the width to 1 to force single-threaded execution, and use a
512MB or 1GB parquet block size depending on how much direct memory the
Drillbit has. This will limit the number of parquet files being created;
see how much smaller you can get the file count.
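A hedged sketch of those settings as Drill session options (the values are
illustrative; the block size is given in bytes):

    -- run the query on a single thread so only one writer produces output
    ALTER SESSION SET `planner.width.max_per_node` = 1;

    -- target ~1GB parquet blocks; needs enough direct memory on the Drillbit
    ALTER SESSION SET `store.parquet.block-size` = 1073741824;

With width 1 a CTAS runs as a single fragment and should write a single
parquet file, rolling over to a second file only if the output exceeds the
block size.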
Are you even trying to write parquet files? In your original post you said
you are writing CSV files, but then gave files with parquet extensions as
what you are trying to concatenate.
I'm a little confused, though: if you are not working with big data tools,
concatenating parquet files is not straightforward.
On Thu, Feb 4, 2016 at 11:15 AM, Andries Engelbrecht <
aengelbre...@maprtech.com> wrote:
> Is there a reason to create a single file? Typically you may want more
> files to improve parallel operation on distributed systems like Drill.
>
Good question. I'm not actually using Drill for "big data".
Is there a reason to create a single file? Typically you may want more files to
improve parallel operation on distributed systems like Drill.
That said, if you have a single-node Drill cluster (or embedded mode) you can
reduce execution to a single thread and increase the parquet file size (see
the ALTER SESSION sketch above).
Hi Andries, the trouble is that I run Drill on my desktop machine, but I
have no server available to me that is capable of running Drill. Most
$10/month hosting accounts do not permit you to run Java apps. For this
reason I simply use Drill for "pre-processing" of the files that I
eventually use elsewhere.
Sorry, bad typo: I have 50GB of data, NOT 500GB ;). And I usually only
query a 1GB subset of this data using Drill.
On Thu, Feb 4, 2016 at 1:04 PM, Peder Jakobsen | gmail
wrote:
> On Thu, Feb 4, 2016 at 11:15 AM, Andries Engelbrecht <
> aengelbre...@maprtech.com>
You can create multiple parquet files and have the ability to query them all
through the Drill SQL interface with minimal overhead.
Creating a single 50GB parquet file is likely not the best option for
performance; perhaps use Drill partitioning for the parquet files to speed up
queries.
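Recent Drill versions (1.1+) support PARTITION BY on CTAS; a rough sketch
with invented names, where the partition column must appear in the select
list:

    -- one set of parquet files per distinct event_year value,
    -- letting Drill prune partitions when a query filters on event_year
    CREATE TABLE dfs.tmp.`events_by_year`
    PARTITION BY (event_year)
    AS SELECT t.*, EXTRACT(YEAR FROM t.event_ts) AS event_year
    FROM dfs.`/data/events` t;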
Hi Jason, sorry for the confusion; I'm generating both CSV files and
parquet. Parquet is just an experiment for me to see if I get better
performance than with CSV, or with loading the CSV into something like
TinyDB or MongoDB.
I've found a way to read the parquet files with a Python library;