Drill on Kudu

2016-11-17 Thread MattK
If I understand Kudu correctly, it manages its own cluster and does not rely on ZK for coordination (?). When using the Kudu storage configuration in Drill 1.7/8, is it necessary to run multiple Drillbits, and is there a concept of data locality with Drill on Kudu? Or is a single standalone

Re: "Transactional" conversion of CSV to Parquet?

2016-10-24 Thread MattK
to be corrected). On Mon, Oct 24, 2016 at 1:49 PM, MattK <m...@hybriddba.com> wrote: I have a cluster that receives log files in CSV format on a per-minute basis, and those files are immediately available to Drill users. For performance I create Parquet files from them in batch usin
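For reference, a minimal sketch of the batch CTAS conversion step described above; the source path and the writable dfs.tmp workspace are assumptions, not details from the thread:
~~~
-- Sketch only: source path and target workspace are hypothetical.
ALTER SESSION SET `store.format` = 'parquet';
CREATE TABLE dfs.tmp.`logs_parquet` AS
SELECT * FROM dfs.`/data/incoming/logs`;
~~~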

Queries over Swift?

2016-09-14 Thread MattK
The Drill FAQ mentions that Swift can be queried as well as S3. I have found an S3 plugin (https://drill.apache.org/docs/s3-storage-plugin/) but nothing yet for docs, examples, or plugins for Swift. Is there any documentation available?

Queries over Swift?

2016-09-08 Thread MattK
The Drill FAQ mentions that Swift can be queried as well as S3. I have found an S3 plugin (https://drill.apache.org/docs/s3-storage-plugin/) but nothing yet for docs, examples, or plugins for Swift. Is there any documentation available?

IllegalArgumentException with malformed CSV

2016-08-14 Thread MattK
Sometimes text / CSV data comes in with formatting errors, and Drill seems to have difficulty with this, throwing a Java error instead of what I would describe as a DB engine error that describes the problem. I logged https://issues.apache.org/jira/browse/DRILL-4845 for this, but wanted

Re: Upgrade Drill to v1.7 on MapR

2016-08-12 Thread MattK
R. Please note that this isn't certified for production purposes. > > You can get them from here: http://package.mapr.com/labs/drill/redhat/mapr-drill-1.7.0.201606301441-1.noarch.rpm > > Regards, > Abhishek > > > On Thu, Aug 11, 2016 at 2:57 PM, MattK <m...@hybriddba.c

Upgrade Drill to v1.7 on MapR

2016-08-12 Thread MattK
I would like to upgrade Drill on my new MapR Community cluster (v.5.1.0.37549.GA) to apply https://issues.apache.org/jira/browse/DRILL-4317 but the docs seem to take a "yum install" approach where Drill 1.7 is not in the MapR repos. Is there a set of docs for performing a Drill upgrade from

"Flatten" rows with a non-JSON array?

2016-08-11 Thread MattK
With CSV data like: ~~~ id date array 1 2016-01-01 "1,2,3" 2 2016-01-02 "4,5,6" ~~~ I would like to "flatten" the data on the "array" column like so: ~~~ id date element 1 2016-01-01 1 1 2016-01-01 2 1 2016-01-01
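A hedged sketch of one way to do this, assuming a Drill release that provides the split() string function; the file path is hypothetical and reserved words are backquoted:
~~~
-- Split the quoted "1,2,3" string into a list, then FLATTEN it to one row per element.
SELECT id, `date`, FLATTEN(SPLIT(`array`, ',')) AS element
FROM dfs.`/data/sample.csv`;
~~~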

Re: IndexOutOfBoundsException on selecting column from CSV

2016-08-11 Thread MattK
Problem was trailing whitespace in column names: https://issues.apache.org/jira/browse/DRILL-4843 On 11 Aug 2016, at 20:06, MattK wrote: On MapR Community cluster with Drill v1.6, using simple comma delimited data with a header line, gzip compressed, and storage as: ~~~

IndexOutOfBoundsException on selecting column from CSV

2016-08-11 Thread MattK
On MapR Community cluster with Drill v1.6, using simple comma delimited data with a header line, gzip compressed, and storage as: ~~~ "csv": { "type": "text", "extensions": [ "csv", "gz" ], "extractHeader": true, "delimiter": "," }, ~~~
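For context, a query of the shape that exercises this storage configuration; with "extractHeader": true the column is addressed by its header name. The path and column name below are hypothetical:
~~~
-- Sketch only: file path and column name are illustrative.
SELECT `event_time`
FROM dfs.`/data/log-2016-08-11.csv.gz`
LIMIT 10;
~~~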

Re: Performance with multiple FLATTENs

2016-07-19 Thread MattK
One solution seems to be to pre-flatten the data in a CTE, resulting in dramatically lower runtimes: ~~~ WITH flat AS (SELECT id, FLATTEN(data) AS data) SELECT id, data[0] AS dttm, data[1] AS result FROM flat ~~~ This was tested on a single node, and each JSON array to be flattened has 1,440
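For reference, the corrected pattern in full, with a hypothetical source path filled in for the FROM clause the truncated snippet above omits:
~~~
WITH flat AS (
  SELECT id, FLATTEN(data) AS data
  FROM dfs.`/data/events.json`   -- hypothetical source
)
SELECT id, data[0] AS dttm, data[1] AS result
FROM flat;
~~~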

Re: Is there a good way to handle bad date data?

2016-05-25 Thread MattK
UDFs scare me in that the only Java I've conquered is evident from my empty French press... Same issue here. I have solved this in other platforms by pre-processing the data with a set of regex replacements in Awk: ~~~ # "Repair" invalid dates as stored in MySQL (3 replacements for
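As an alternative to the Awk pre-processing described above, the repair can also be sketched at query time in Drill SQL; the column name, the path, and the assumption that MySQL zero dates ('0000-00-00') are among the values being fixed are all hypothetical:
~~~
-- Sketch only: treat MySQL-style zero dates as NULL, otherwise parse normally.
SELECT CASE WHEN dt = '0000-00-00' THEN NULL
            ELSE TO_DATE(dt, 'yyyy-MM-dd')
       END AS dt_clean
FROM dfs.`/data/export.csv`;
~~~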

Re: Apache Drill, Query PostgreSQL text or Jsonb as if it were from a json storage type?

2016-05-25 Thread MattK
Would the PostgreSQL function jsonb_to_recordset(jsonb) help in this case? It would return a table to Drill instead of a set of JSON objects, but you would have to declare the column types in the call. On 25 May 2016, at 12:26, Andrew Evans wrote: Drill Members, I have an intriguing problem
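A minimal sketch of jsonb_to_recordset on the PostgreSQL side; the table name, jsonb column, and record types are hypothetical:
~~~
-- Runs in PostgreSQL; each element of the jsonb array in "payload" becomes a typed row.
SELECT r.ts, r.result
FROM events,
     jsonb_to_recordset(events.payload) AS r(ts timestamp, result int);
~~~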