Hi Dobes,
I like your idea! I think it would be a great addition to Drill in general and
will show the way for other storage plugins.
Technically, you are right, there should not be that much work. We have working
examples in other plugins of what would be needed. Perhaps the biggest cost is
Hi Paul,
After looking at the mongo stuff a bit more today I realized that probably the
simplest solution would be to put some kind of schema mapping into the storage
plugin configuration. Some subset of JSON Schema using the Drill config syntax.
What do you think of this idea? Might actually
I update parquet files as follows:
A. First save your data in row groups.
B. Modify any row groups by removing DELETED records. Delete the row group from
the parquet file and append the modified row group to the file.
C. Add any new INSERTS as a new row group appended to the file.
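
The steps A to C above can be sketched roughly as follows. This is only an in-memory illustration, not the Parquet API: a "file" is modelled as a plain list of row groups, each row group as a list of record dicts, and the names apply_changes, deleted_ids and inserts are hypothetical. It also assumes every record carries an "id" field.

```python
# Hypothetical in-memory model, NOT a real Parquet library:
# a "file" is a list of row groups, a row group is a list of record dicts.

def apply_changes(row_groups, deleted_ids, inserts):
    """Return a new list of row groups with deletes and inserts applied."""
    untouched = []
    rewritten = []
    for group in row_groups:
        kept = [rec for rec in group if rec["id"] not in deleted_ids]
        if len(kept) == len(group):
            # Step A: nothing was deleted, so the row group stays as saved.
            untouched.append(group)
        elif kept:
            # Step B: drop the old row group and append the modified copy.
            rewritten.append(kept)
        # A row group whose records were all deleted is simply dropped.
    result = untouched + rewritten
    if inserts:
        # Step C: new records become one new row group at the end.
        result.append(list(inserts))
    return result


file_groups = [
    [{"id": 1, "score": 10}, {"id": 2, "score": 20}],
    [{"id": 3, "score": 30}, {"id": 4, "score": 40}],
]
updated = apply_changes(
    file_groups,
    deleted_ids={3},
    inserts=[{"id": 5, "score": 50}],
)
```

Only the row group that contained a deleted record is rewritten; the first row group is carried over as-is, which is the point of working at row-group granularity.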
Alternative
On 2/27/2020 1:04:07 PM, Nicolas PARIS wrote:
> However, updating parquet files can be a bit troublesome.
You might be interested in Delta Lake, which provides an implementation
of the SQL MERGE statement on top of parquet files. Implementing a Drill
connector on this should be feasible. This
Paul, Ted:
Thanks for your excellent responses, I will mull this over.
We have 443 million answers currently, so I suppose we are approaching the
billions threshold.
I had originally thought to do as you suggest - load the most recent data from
mongodb and the history from parquet files. This
> However, updating parquet files can be a bit troublesome. The files
> cannot easily be appended to. So some process has to periodically
> re-write the parquet files. Also, we don't want to have hundreds or
> thousands of separate files, as this can slow down query execution.
> So we don't
Yes. I have seen things like this before.
Typically, if you have short time-to-visibility requirements, some kind of
database is required. If you have large data and long retention
requirements, it can be advantageous to roll out to a columnar compressed
form like parquet.
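
As a rough illustration of that split (recent data in a database, older data rolled out to a columnar archive), here is a minimal sketch. Plain Python lists stand in for both stores, and the names split_by_age, query_all and the "day" field are assumptions made up for the example, not anything from Drill or Parquet.

```python
from datetime import date

# Hypothetical stand-ins: "recent" plays the role of the database,
# "archive" plays the role of the parquet files.

def split_by_age(records, cutoff):
    """Keep records on or after the cutoff as recent; roll out the rest."""
    recent = [r for r in records if r["day"] >= cutoff]
    archive = [r for r in records if r["day"] < cutoff]
    return recent, archive


def query_all(recent, archive, predicate):
    """Answer a query by scanning the archive first, then the recent store."""
    return [r for r in archive + recent if predicate(r)]


records = [
    {"day": date(2020, 1, 5), "score": 3},
    {"day": date(2020, 2, 20), "score": 7},
    {"day": date(2020, 2, 27), "score": 9},
]
recent, archive = split_by_age(records, cutoff=date(2020, 2, 1))
high_scores = query_all(recent, archive, lambda r: r["score"] >= 7)
```

A report sees one combined result, while only the small recent store has to accept new writes.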
The design that I have
Hi Dobes,
Also, if Ted is still lurking on this list, he's an expert at this stuff. Here
are some patterns I've seen.
What you describe is a pretty standard pattern. Substitute anything for
"scores" (logs, sales, clicks, GPS tracking locations) and you find that many
folks have solved the
Hi,
I am trying to figure out a system that can offer low latency in
generating reports and low latency between data being collected and being
available in the reporting system, while avoiding glitches and errors.
In our system users are collecting responses from their students. We want to
Hi
Thanks so much for your answ
On the other hand, I wonder if that would cause undesired interference with the
SQL Boolean rules: maybe there is some place where the ability to treat a Bit
as both an INT and a BOOLEAN causes ambiguities in the planner or in run-time
type resolution code.
On