Got it. It looks like the problem is that the LIKE function does not match
wildcards across newlines. I'm guessing we would see the same problem with
an embedded line break in a Parquet file or JDBC source. Can you file a
bug?
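
To illustrate what I suspect is happening, here is a minimal sketch in
Python. It is not Drill's actual code. It assumes LIKE is evaluated by
translating the pattern to a regular expression ('%' becomes '.*', '_'
becomes '.'); like_to_regex and like are hypothetical helper names. In most
regex engines '.' does not match a line feed unless a "dot matches all" flag
is set, which would explain why 'hello%' misses the second record.

```python
import re

def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern to a regex (hypothetical helper, no ESCAPE support)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any run of characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return "".join(parts)

def like(value: str, pattern: str, dotall: bool = False) -> bool:
    """Return True if the whole value matches the LIKE pattern."""
    flags = re.DOTALL if dotall else 0
    return re.fullmatch(like_to_regex(pattern), value, flags) is not None

rows = ["hello", "hello \ngoodbye"]

# Without DOTALL, '.*' stops at the line feed, so only the first row matches.
print([v for v in rows if like(v, "hello%")])

# With DOTALL, '.' also matches the line feed, so both rows match.
print([v for v in rows if like(v, "hello%", dotall=True)])
```

If the cause is the same in Drill, the fix would be on the pattern-matching
side rather than in the JSON reader, which matches the fact that a plain
select returns both rows.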
On Jan 6, 2016 6:45 PM, <[email protected]> wrote:

> For CSV:
> I now understand why newlines are not supported.
>
>
> For JSON:
> I was referring to a WHERE clause on b, as in your example.
>
> escaped_break.json is the same as yours.
>
>
> 0: jdbc:drill:zk=local> select * from `escaped_break.json` where b like
> 'hello%';
> +----+--------+
> | a  |   b    |
> +----+--------+
> | 4  | hello  |
> +----+--------+
> 1 row selected (0.228 seconds)
>
>
> It can't find the {a: 7, b: "hello \ngoodbye"} record.
>
> Thank you.
>
> --
> Miura, Masahide
>
> -----Original Message-----
> From: Jacques Nadeau [mailto:[email protected]]
> Sent: Thursday, January 07, 2016 1:29 AM
> To: user
> Subject: Re: Does drill recognize new line correctly?
>
> For CSV:
>
> Drill doesn't currently support newlines within a csv record. The reason
> has to do with supporting parallel reading of a csv file. It seems
> reasonable to add an option for support of this at the cost of
> parallelization capabilities. Can you open a JIRA requesting this feature
> and vote on it? We are more likely to focus on issues that have a number of
> votes.
>
> For JSON:
>
> I think this works. I'm guessing you are having a different problem. For
> example:
>
> $ cat /tmp/escaped_break.json
> {a: 4, b: "hello"}
> {a: 7, b: "hello \ngoodbye"}
>
> $  cat /tmp/break_in_object.json
> {a: 4, b: "hello"}
> {
>   a: 7,
>   b: "hello goodbye"
> }
>
>
> 0: jdbc:drill:zk=local> use dfs.tmp;
> +-------+--------------------------------------+
> |  ok   |               summary                |
> +-------+--------------------------------------+
> | true  | Default schema changed to [dfs.tmp]  |
> +-------+--------------------------------------+
> 1 row selected (0.101 seconds)
> 0: jdbc:drill:zk=local> select * from `escaped_break.json`;
> +----+-----------------+
> | a  |        b        |
> +----+-----------------+
> | 4  | hello           |
> | 7  | hello
> goodbye  |
> +----+-----------------+
> 2 rows selected (0.114 seconds)
> 0: jdbc:drill:zk=local> select * from `break_in_object.json`;
> +----+----------------+
> | a  |       b        |
> +----+----------------+
> | 4  | hello          |
> | 7  | hello goodbye  |
> +----+----------------+
> 2 rows selected (0.111 seconds)
> 0: jdbc:drill:zk=local>
>
> Note that we don't support an actual embedded line break within a string
> value (apparently json requires this to be escaped... I didn't even realize
> the spec requires that).
>
> $ cat /tmp/bad_break_in_string.json
> {a: 10, b: "hello
>   goodbye"
> }
>
> 0: jdbc:drill:zk=local> select * from `bad_break_in_string.json`;
> Error: DATA_READ ERROR: Error parsing JSON - Illegal unquoted character
> ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in
> string value
>
> File  /tmp/bad_break_in_string.json
> Record  1
> Column  19
> Fragment 0:0
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Tue, Jan 5, 2016 at 8:10 PM, <[email protected]> wrote:
>
> > Happy New Year!
> >   We Japanese like New Year's greetings ;-)
> >
> > There are two issues in this message.
> >
> > First, a CSV file with values that include newlines.
> > Second, a JSON file with values that include newlines.
> >
> > 1) CSV
> >   Doesn't Drill recognize CSV files in which some columns contain newlines?
> >   For example, a CSV file exported from MS Excel.
> >
> >   I tried several approaches: quoting the column, escaping with \ (like \[LF]),
> > replacing \r or \n...
> >   But none of them worked for me.
> >
> >   By the way, RFC 4180 allows newlines inside CSV fields:
> >         * https://tools.ietf.org/html/rfc4180
> >   So I would like to parse such CSV files, even though I know the RFC is
> > only informational.
> >
> > 2) JSON
> >   Drill can't correctly query JSON records whose values include a newline.
> >
> >   JSON:
> >     { "key": "test record with \n newline" }
> >
> >   Query:
> >     select * from dfs.`test.json` where key like 'test%'
> >
> >   Result:
> >     No result found
> >
> >   I think values are not compared correctly when they include a newline.
> >
> > Do you know how to use new lines in values as expected?
> >
> > Thank you.
> >
> > --
> > Miura, Masahide
> >
> >
>
