Re: query performance with unequal drillbits

2018-08-27 Thread Paul Rogers
Hi All, For those following along who have not tried Ted's idea (running multiple Drillbits per host), note that when running two or more Drillbits per node, the admin is responsible for choosing non-conflicting port numbers. The port numbers are configured in drill-override.conf. See

Re: Love Drill - Hate Key Has String Token

2018-08-27 Thread Charles Givre
Hi John, Have you tried enclosing your field names in back ticks? IE SELECT `$field1`, `$field2` FROM… — C > On Aug 27, 2018, at 15:47, John Folkers wrote: > > Hello, I downloaded Drill over the weekend, and I love it. > > > Problem: $ string token in a key. > > > Question: How can I

Re: query performance with unequal drillbits

2018-08-27 Thread Ted Dunning
Paul, Thanks for the reality side of this. Configuring a system to handle unusual setups can definitely be a challenge. Btw, the general term for running several sub-scale workers on each node to allow more flexibility is "micro-sharding". On Mon, Aug 27, 2018 at 3:24 PM Paul Rogers wrote:

Love Drill - Hate Key Has String Token

2018-08-27 Thread John Folkers
Hello, I downloaded Drill over the weekend, and I love it. Problem: $ string token in a key. Question: How can I get Drill to not trip on the $ string token when it sees it inside the keyname? Error Message Error: DATA_READ ERROR: Failure while reading ExtendedJSON typed value. Expected a

Re: [DISCUSS] Deprecation policy in Drill

2018-08-27 Thread salim achouche
Drill is a SQL engine, which means the SQL syntax and associated options (runtime configuration and session properties) constitute its user facing APIs (if I may say). When we talk about deprecating and then removing documented session / configuration properties within the same release, then what

Re: Apache Drill High Availability using HAproxy

2018-08-27 Thread John Omernik
This is a great topic, and one I have run into while running Drill on Apache Mesos, because each of my bits essentially sits behind a DNS load balancer (one DNS name, with multiple Drillbit IPs assigned to it). That said, I've run into a few issues and have a few workarounds. Note, I am talking about the REST

Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a record

2018-08-27 Thread scott
Hi All, I'm getting an error querying some of my JSON files. The error I'm getting is: Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a record. Current token was START_ARRAY. The JSON files are in array format, like [ { "var1": "foo", "var2": "bar"},{"var1": "fo",

Re: RE: Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a record

2018-08-27 Thread Paul Rogers
Hi David, JSON files are never splittable: there is no single-character way to find the start of a JSON record within a file. Drill is supposed to support two JSON formats: the array format from the earlier post, and the non-JSON (but very common) list of objects format in this example.

Re: RE: Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a record

2018-08-27 Thread scott
Paul, I'm using version 1.12. Can you tell me what version you think that was fixed in? The ticket I referenced is still open, with no comments. Scott On Mon, Aug 27, 2018 at 5:47 PM Paul Rogers wrote: > Hi David, > > JSON files are never splittable: there is no single-character way to find >

RE: Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a record

2018-08-27 Thread Lee, David
Get rid of the opening and closing brackets and see if you can turn the commas into newlines. I think the file needs to be splittable to reduce memory overhead versus parsing one giant string... {"var1": "foo", "var2":"bar"} {"var1": "fo", "var2": "baz"} {"var1": "f2o", "var2": "baz2"} {"var1":
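The conversion suggested in this message (top-level array to one object per line) could be sketched in Python roughly as follows. This is only a minimal sketch, assuming the whole array fits in memory; the function name array_to_json_lines is hypothetical and is not part of Drill. It parses the array rather than textually replacing commas, so commas inside string values are left alone.

```python
import json


def array_to_json_lines(text: str) -> str:
    """Convert a top-level JSON array into one JSON object per line.

    Hypothetical helper for illustration; not part of Drill.
    """
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # Serialize each element on its own line, with a trailing newline.
    return "\n".join(json.dumps(record) for record in records) + "\n"


if __name__ == "__main__":
    sample = '[ {"var1": "foo", "var2": "bar"}, {"var1": "fo", "var2": "baz"} ]'
    print(array_to_json_lines(sample), end="")
```

For files too large to load at once, a streaming parser would be needed instead; that is outside the scope of this sketch.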

Re: Error: DATA_READ ERROR: Error parsing JSON - Cannot read from the middle of a record

2018-08-27 Thread Paul Rogers
Hi Scott, The code to handle top-level arrays is supposed to be in Drill already. I tested it for a not-yet-committed version of the JSON parser. I thought it worked in the current version as well... Just checked the unit tests. We have one, TestJsonRecordReader.testContainingArray that reads

Re: Love Drill - Hate Key Has String Token

2018-08-27 Thread Ted Dunning
Can you post a sample file with, say, 5-10 lines? Is it the file names? Or the data values that are giving you fits? On Mon, Aug 27, 2018, 12:51 John Folkers wrote: > Hello, I downloaded Drill over the weekend, and I love it. > > > Problem: $ string token in a key. > > > Question: How can I

Re: [DISCUSSION] current project state

2018-08-27 Thread Carlos Derich
Hello guys, Thanks for bringing up this discussion. I may be a little late, but I would like to add a use case I ran into recently. I think Drill should be able to use ZK for storing session data. In a multiple Drillbit scenario, if a second Drillbit receives a request with a

Failure while reading messages from kafka

2018-08-27 Thread Matt
I have a Kafka topic with some non-JSON test messages in it, resulting in errors "Error: DATA_READ ERROR: Failure while reading messages from kafka. Recordreader was at record: 1" I don't seem to be able to bypass these topic messages with "store.json.reader.skip_invalid_records" or even an

Re: [DISCUSSION] current project state

2018-08-27 Thread Paul Rogers
Hi Derich, From the shameless self promotion dept., Charles and I are wrapping up the O’Reilly book “Learning Apache Drill” that gives an in-depth discussion of format plugins and UDFs. We still have a need for docs on storage plugins. - Paul Sent from my iPhone > On Aug 27, 2018, at 9:04