If I understand Kudu correctly, it manages its own cluster and does not rely
on ZooKeeper for coordination (?).
When using the Kudu storage configuration in Drill 1.7/8, is it necessary to
run multiple Drillbits, and is there a concept of data locality with Drill on
Kudu?
Or is a single standalone [...] to be corrected).
On Mon, Oct 24, 2016 at 1:49 PM, MattK <m...@hybriddba.com> wrote:
I have a cluster that receives log files in a csv format on a
per-minute
basis, and those files are immediately available to Drill users. For
performance I create Parquet files from them in batch using
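The batch conversion from CSV to Parquet described above is typically done with Drill's CTAS; a minimal sketch, where the workspace and paths are assumptions rather than the poster's actual setup:

~~~
-- Sketch only: workspace and paths are assumed.
-- Write subsequent CTAS output as Parquet.
ALTER SESSION SET `store.format` = 'parquet';
-- Convert a directory of CSV files into a Parquet-backed table.
CREATE TABLE dfs.tmp.`logs_parquet` AS
SELECT * FROM dfs.`/incoming/logs`;
~~~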
The Drill FAQ mentions that Swift can be queried as well as S3. I have
found an S3 plugin (https://drill.apache.org/docs/s3-storage-plugin/),
but nothing yet in the way of docs, examples, or plugins for Swift.
Is there any documentation available?
Sometimes text/CSV data comes in with formatting errors, and Drill
seems to have difficulty with this, throwing a raw Java error instead of
what I would describe as a DB-engine error that describes the problem.
I logged https://issues.apache.org/jira/browse/DRILL-4845 for this, but
wanted
R. Please note that this isn't certified for production purposes.
>
> You can get them from here: http://package.mapr.com/labs/drill/redhat/mapr-drill-1.7.0.201606301441-1.noarch.rpm
>
> Regards,
> Abhishek
>
>
> On Thu, Aug 11, 2016 at 2:57 PM, MattK <m...@hybriddba.com> wrote:
I would like to upgrade Drill on my new MapR Community cluster
(v.5.1.0.37549.GA) to apply
https://issues.apache.org/jira/browse/DRILL-4317 but the docs seem to
take a "yum install" approach, but Drill 1.7 is not in the MapR repos.
Is there a set of docs for performing a Drill upgrade from
With CSV data like:
~~~
id  date        array
1   2016-01-01  "1,2,3"
2   2016-01-02  "4,5,6"
~~~
I would like to "flatten" the data on the "array" column like so:
~~~
id  date        element
1   2016-01-01  1
1   2016-01-01  2
1   2016-01-01  3
~~~
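One way to produce that output is FLATTEN over a split of the string column. This is a sketch, not from the original thread: it assumes the file path, that a `split()` string function is available in your Drill version, and backtick-quoting for the reserved word `date`:

~~~
-- Assumed path and availability of split(); datearray holds "1,2,3".
SELECT id, `date`, FLATTEN(split(datearray, ',')) AS element
FROM dfs.`/data/input.csv`;
~~~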
Problem was trailing whitespace in column names:
https://issues.apache.org/jira/browse/DRILL-4843
On 11 Aug 2016, at 20:06, MattK wrote:
On MapR Community cluster with Drill v1.6, using simple comma delimited
data with a header line, gzip compressed, and storage as:
~~~
"csv": {
  "type": "text",
  "extensions": ["csv", "gz"],
  "extractHeader": true,
  "delimiter": ","
},
~~~
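With extractHeader enabled as in the config above, columns can be referenced by their header names rather than the columns[] array. A hedged example, where the file path and column names are assumptions:

~~~
-- Assumed path and column names; header names come from extractHeader.
SELECT `id`, `dttm`
FROM dfs.`/logs/2016-08-11.csv.gz`
LIMIT 10;
~~~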
One solution seems to be to pre-flatten the data in a CTE, resulting in
dramatically lower runtimes:
~~~
WITH flat AS (SELECT id, FLATTEN(data) AS data)
SELECT id, data[0] AS dttm, data[1] AS result
FROM flat
~~~
This was tested on a single node, and each JSON array to be flattened
has 1,440
UDFs scare me, in that the only Java I've conquered is evident from my
empty French press...
Same issue here. I have solved this in other platforms by pre-processing
the data with a set of regex replacements in Awk:
~~~
# "Repair" invalid dates as stored in MySQL (3 replacements for
~~~
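The Awk script above is truncated in the archive. As an alternative sketch (not the poster's approach), the same zero-date repair could be attempted at query time in Drill SQL instead of pre-processing; the column and path names here are assumptions:

~~~
-- Assumed column/path names; maps MySQL "zero" dates to NULL.
SELECT id,
       CASE WHEN dttm = '0000-00-00 00:00:00' THEN NULL
            ELSE TO_TIMESTAMP(dttm, 'yyyy-MM-dd HH:mm:ss')
       END AS dttm
FROM dfs.`/data/events.csv`;
~~~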
Would the PostgreSQL function jsonb_to_recordset(jsonb) help in this
case?
It would return to Drill a table instead of a set of JSON objects, but
you would have to declare the types in the call.
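For reference, a minimal PostgreSQL sketch of jsonb_to_recordset, with made-up field names; the column types are declared in the AS clause:

~~~
-- Each object in the jsonb array becomes one typed row.
SELECT *
FROM jsonb_to_recordset(
       '[{"id": 1, "result": "ok"}, {"id": 2, "result": "fail"}]'::jsonb
     ) AS t(id int, result text);
~~~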
On 25 May 2016, at 12:26, Andrew Evans wrote:
Drill Members,
I have an intriguing problem