Got it. It looks like the problem is that the LIKE function's wildcards
don't match across newlines. I'm guessing we would see the same problem
with an embedded line break in a Parquet file or a JDBC source. Can you
file a bug?
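Here is a minimal sketch of the suspected failure. It assumes LIKE is evaluated by translating the pattern into a regular expression; that is an assumption, not a check of Drill's actual code. In Python's `re`, and likewise in Java's `java.util.regex.Pattern`, `.` does not match a line break unless a DOTALL flag is set. A `%` translated to `.*` therefore stops at the first `\n`. The helpers `like_to_regex` and `sql_like` are hypothetical names used only for illustration.

```python
import re


def like_to_regex(pattern: str, dotall: bool) -> re.Pattern:
    """Translate a SQL LIKE pattern into a compiled regex (hypothetical helper).

    '%' becomes '.*' and '_' becomes '.'; every other character is escaped so
    it matches literally. ESCAPE clauses are not handled in this sketch.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # Without DOTALL, '.' (and therefore '%' and '_') never matches '\n'.
    flags = re.DOTALL if dotall else 0
    return re.compile("".join(parts), flags)


def sql_like(value: str, pattern: str, dotall: bool) -> bool:
    # LIKE must match the entire value, so use fullmatch rather than search.
    return like_to_regex(pattern, dotall).fullmatch(value) is not None


rows = [{"a": 4, "b": "hello"}, {"a": 7, "b": "hello \ngoodbye"}]

naive = [r["a"] for r in rows if sql_like(r["b"], "hello%", dotall=False)]
fixed = [r["a"] for r in rows if sql_like(r["b"], "hello%", dotall=True)]
print(naive)  # [4]     (the a=7 row is missed, as in the report)
print(fixed)  # [4, 7]
```

Compiling with DOTALL (`Pattern.DOTALL` in Java) is one way a regex-based LIKE could let `%` and `_` match line breaks.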
On Jan 6, 2016 6:45 PM, <[email protected]> wrote:
> For CSV:
> I now understand why newlines are not supported.
>
>
> For JSON:
> I meant adding a WHERE clause on column b to your example.
>
> escaped_break.json is the same as yours.
>
>
> 0: jdbc:drill:zk=local> select * from `escaped_break.json` where b like
> 'hello%';
> +----+--------+
> | a  |   b    |
> +----+--------+
> | 4  | hello  |
> +----+--------+
> 1 row selected (0.228 seconds)
>
>
> It doesn't find the {a: 7, b: "hello \ngoodbye"} record, even though its
> b value also starts with 'hello'.
>
> Thank you.
>
> --
> Miura, Masahide
>
> -----Original Message-----
> From: Jacques Nadeau [mailto:[email protected]]
> Sent: Thursday, January 07, 2016 1:29 AM
> To: user
> Subject: Re: Does drill recognize new line correctly?
>
> For CSV:
>
> Drill doesn't currently support newlines within a CSV record. The reason
> has to do with supporting parallel reading of a CSV file. It seems
> reasonable to add an option to support this at the cost of
> parallelization. Can you open a JIRA requesting this feature
> and vote on it? We are more likely to focus on issues that have a number of
> votes.
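Here is a minimal sketch of the trade-off described above. It assumes each parallel reader starts at an arbitrary offset in the file and resynchronizes by skipping to the next newline. That is a common approach for splittable text formats, but it has not been verified against Drill's text reader. A reader that lands inside a quoted field cannot tell it is inside quotes without scanning from the start of the file, so it treats the embedded line break as a record boundary. The sample data, the split offset, and the function names are all made up.

```python
# Made-up CSV where record 1 has a quoted field containing a line break.
DATA = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'


def true_record_starts(data: str) -> list:
    """Offsets where records really begin, tracking double-quote state from offset 0."""
    starts = [0]
    in_quotes = False
    for i, ch in enumerate(data):
        if ch == '"':
            # An escaped quote ("") toggles twice, leaving the state unchanged.
            in_quotes = not in_quotes
        elif ch == "\n" and not in_quotes and i + 1 < len(data):
            starts.append(i + 1)
    return starts


def resync_offset(data: str, split_offset: int) -> int:
    """Where a reader for a split beginning at split_offset would start parsing.

    Hypothetical model: a split that does not begin at offset 0 skips to just
    past the next newline, assuming that newline ends a record.
    """
    if split_offset == 0:
        return 0
    nl = data.find("\n", split_offset)
    return len(data) if nl == -1 else nl + 1


starts = true_record_starts(DATA)
guess = resync_offset(DATA, 20)  # boundary that happens to fall inside the quoted field
print(starts)                    # [0, 11, 38]
print(guess, guess in starts)    # 25 False
print(repr(DATA[guess:DATA.find("\n", guess)]))  # 'second line"' is parsed as if it were a record
```

Only `true_record_starts` gets the boundaries right, and it works only because it scans from offset 0. That sequential scan is exactly what an option trading away parallelization would allow.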
>
> For JSON:
>
> I think this works. I'm guessing you are having a different problem. For
> example:
>
> $ cat /tmp/escaped_break.json
> {a: 4, b: "hello"}
> {a: 7, b: "hello \ngoodbye"}
>
> $ cat /tmp/break_in_object.json
> {a: 4, b: "hello"}
> {
> a: 7,
> b: "hello goodbye"
> }
>
>
> 0: jdbc:drill:zk=local> use dfs.tmp;
> +-------+--------------------------------------+
> |  ok   |               summary                |
> +-------+--------------------------------------+
> | true  | Default schema changed to [dfs.tmp]  |
> +-------+--------------------------------------+
> 1 row selected (0.101 seconds)
> 0: jdbc:drill:zk=local> select * from `escaped_break.json`;
> +----+-----------------+
> | a  |        b        |
> +----+-----------------+
> | 4  | hello           |
> | 7  | hello
> goodbye |
> +----+-----------------+
> 2 rows selected (0.114 seconds)
> 0: jdbc:drill:zk=local> select * from `break_in_object.json`;
> +----+----------------+
> | a  |       b        |
> +----+----------------+
> | 4  | hello          |
> | 7  | hello goodbye  |
> +----+----------------+
> 2 rows selected (0.111 seconds)
> 0: jdbc:drill:zk=local>
>
> Note that we don't support an actual embedded line break within a string
> value (apparently the JSON spec requires it to be escaped; I hadn't
> even realized that).
>
> $ cat /tmp/bad_break_in_string.json
> {a: 10, b: "hello
> goodbye"
> }
>
> 0: jdbc:drill:zk=local> select * from `bad_break_in_string.json`;
> Error: DATA_READ ERROR: Error parsing JSON - Illegal unquoted character
> ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in
> string value
>
> File /tmp/bad_break_in_string.json
> Record 1
> Column 19
> Fragment 0:0
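As a sketch, the same distinction can be reproduced outside Drill with Python's standard `json` module. The records mirror escaped_break.json and bad_break_in_string.json. The keys are quoted, though, because Python's parser rejects unquoted field names, unlike Drill's reader in the examples above. The `strict=False` option at the end is Python-specific; whether Drill exposes an equivalent setting is not covered here.

```python
import json

# Backslash followed by 'n' in the file text: a valid JSON escape.
escaped = '{"a": 7, "b": "hello \\ngoodbye"}'
# An actual line break inside the string: disallowed by the JSON spec.
raw = '{"a": 10, "b": "hello\ngoodbye"}'

record = json.loads(escaped)
print(repr(record["b"]))  # 'hello \ngoodbye'  (the escape decodes to a real newline)

try:
    json.loads(raw)
    raw_rejected = False
except json.JSONDecodeError as err:
    raw_rejected = True
    print("rejected:", err.msg)  # an "Invalid control character ..." message

# Python's parser can be told to tolerate raw control characters in strings.
lenient = json.loads(raw, strict=False)
print(repr(lenient["b"]))  # 'hello\ngoodbye'
```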
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Tue, Jan 5, 2016 at 8:10 PM, <[email protected]> wrote:
>
> > Happy New Year!
> > We Japanese like New Year's greetings ;-)
> >
> > There are two issues in this message.
> >
> > First, CSV files whose values include newlines.
> > Second, JSON files whose values include newlines.
> >
> > 1) CSV
> > Doesn't Drill recognize CSV files in which some columns include newlines?
> > For example, a CSV file exported from MS Excel.
> >
> > I tried several approaches: quoting the column, escaping with \ (like
> > \[LF]), replacing \r or \n...
> > But none of them worked for me.
> >
> > By the way, RFC 4180 allows line breaks in CSV fields enclosed in
> > double quotes, you know.
> > * https://tools.ietf.org/html/rfc4180
> > So I would like to parse such CSV files, though I know the RFC is only
> > informational.
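For comparison, a parser that reads the file sequentially from the beginning can track quote state and handle such records. The sketch below uses Python's standard `csv` module, not Drill. The input is made-up data shaped like an Excel export with CRLF line endings.

```python
import csv
import io

# Made-up Excel-style export: CRLF line endings, and a quoted field that
# contains a CRLF line break (allowed by RFC 4180 when the field is quoted).
data = 'id,comment\r\n1,"first line\r\nsecond line"\r\n2,plain\r\n'

# newline="" leaves line endings untranslated, mirroring the csv docs' advice
# to open files with newline="" so line breaks inside quoted fields survive.
rows = list(csv.reader(io.StringIO(data, newline="")))
print(rows)
# [['id', 'comment'], ['1', 'first line\r\nsecond line'], ['2', 'plain']]
```

This works because the parser sees every quote from the start of the input. A reader that begins in the middle of a file cannot do that.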
> >
> > 2) JSON
> > Queries against JSON records that include newlines don't return correct
> > results.
> >
> > JSON:
> > { "key": "test record with \n newline" }
> >
> > Query:
> > select * from dfs.`test.json` where key like 'test%'
> >
> > Result:
> > No result found
> >
> > I think it doesn't compare the value correctly if the value includes a
> > newline.
> >
> > Do you know how to get newlines in values to work as expected?
> >
> > Thank you.
> >
> > --
> > Miura, Masahide
> >
> >
>