[jira] [Commented] (HIVE-14044) Newlines in Avro maps cause external table to return corrupt values
[ https://issues.apache.org/jira/browse/HIVE-14044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426938#comment-16426938 ]

Jelmer Kuperus commented on HIVE-14044:
---------------------------------------

It seems that the problem is with the result format. Setting either

set hive.query.result.fileformat=SequenceFile;

or

set hive.fetch.task.conversion=more;

worked as a workaround for me.

> Newlines in Avro maps cause external table to return corrupt values
> -------------------------------------------------------------------
>
>                 Key: HIVE-14044
>                 URL: https://issues.apache.org/jira/browse/HIVE-14044
>             Project: Hive
>          Issue Type: Bug
>         Environment: Hive version: 1.1.0-cdh5.5.1 (bundled with cloudera 5.5.1)
>            Reporter: David Nies
>            Assignee: Sahil Takiar
>            Priority: Critical
>         Attachments: test.json, test.schema
>
>
> When {{\n}} characters are contained in Avro files that are used as data
> bases for an external table, the result of {{SELECT}} queries may be corrupt.
> I encountered this error when querying hive both from {{beeline}} and from
> JDBC.
> h3. Steps to reproduce (used files are attached to ticket)
> # Create an {{.avro}} file that contains newline characters in a value of a
> map:
> {code}
> avro-tools fromjson --schema-file test.schema test.json > test.avro
> {code}
> # Copy {{.avro}} file to HDFS
> {code}
> hdfs dfs -copyFromLocal test.avro /some/location/
> {code}
> # Create an external table in beeline containing this {{.avro}}:
> {code}
> beeline> CREATE EXTERNAL TABLE broken_newline_map
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> STORED AS
> INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> LOCATION '/some/location/'
> TBLPROPERTIES ('avro.schema.literal'='
> {
>   "type" : "record",
>   "name" : "myEntry",
>   "namespace" : "myNamespace",
>   "fields" : [ {
>     "name" : "foo",
>     "type" : "long"
>   }, {
>     "name" : "bar",
>     "type" : {
>       "type" : "map",
>       "values" : "string"
>     }
>   } ]
> }
> ');
> {code}
> # Now, selecting may return corrupt results:
> {code}
> jdbc:hive2://my-server:1/> select * from broken_newline_map;
> +------------------------+--------------------------------------------------+--+
> | broken_newline_map.foo | broken_newline_map.bar                           |
> +------------------------+--------------------------------------------------+--+
> | 1                      | {"key2":"value2","key1":"value1\nafter newline"} |
> | 2                      | {"key2":"new value2","key1":"new value"}         |
> +------------------------+--------------------------------------------------+--+
> 2 rows selected (1.661 seconds)
> jdbc:hive2://my-server:1/> select foo, map_keys(bar), map_values(bar)
> from broken_newline_map;
> +------+-----------------+----------------------------+--+
> | foo  | _c1             | _c2                        |
> +------+-----------------+----------------------------+--+
> | 1    | ["key2","key1"] | ["value2","value1"]        |
> | NULL | NULL            | NULL                       |
> | 2    | ["key2","key1"] | ["new value2","new value"] |
> +------+-----------------+----------------------------+--+
> 3 rows selected (28.05 seconds)
> {code}
> Obviously, the last result set contains corrupt entries (line 2) and
> incorrect entries (line 1). I also encountered this when doing this query
> with JDBC.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
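
For readers following the thread, here is a minimal Python sketch of why the result format could matter. It is not Hive code. It assumes, purely for illustration, that an intermediate result is written as text with a newline between rows and a `\x01` field separator, and it compares that with a length-prefixed framing, loosely analogous to a binary container such as SequenceFile. All names in it (`FIELD_SEP`, `encode_row`, `decode_row`, `text_roundtrip`, `framed_roundtrip`) are hypothetical.

```python
import struct

FIELD_SEP = "\x01"  # assumed field separator, for illustration only
ROW_SEP = "\n"      # assumed row terminator of a plain-text result file

# The two rows from the ticket, with the map flattened to two string columns.
ROWS = [
    (1, "key2,key1", "value2,value1\nafter newline"),
    (2, "key2,key1", "new value2,new value"),
]


def encode_row(row):
    """Join the three columns into one text record, without any escaping."""
    return FIELD_SEP.join(str(col) for col in row)


def decode_row(record):
    """Split a record into (foo, keys, values); missing or bad columns become None."""
    fields = record.split(FIELD_SEP)
    fields += [None] * (3 - len(fields))
    try:
        foo = int(fields[0])
    except ValueError:
        foo = None  # a non-numeric 'foo' would be displayed as NULL
    return (foo, fields[1], fields[2])


def text_roundtrip(rows):
    """Write rows as newline-terminated text, then read them back line by line."""
    text = "".join(encode_row(r) + ROW_SEP for r in rows)
    return [decode_row(line) for line in text.split(ROW_SEP)[:-1]]


def framed_roundtrip(rows):
    """Write rows with a 4-byte length prefix, so embedded newlines are harmless."""
    data = b""
    for r in rows:
        payload = encode_row(r).encode("utf-8")
        data += struct.pack(">I", len(payload)) + payload
    decoded, pos = [], 0
    while pos < len(data):
        (size,) = struct.unpack_from(">I", data, pos)
        pos += 4
        decoded.append(decode_row(data[pos:pos + size].decode("utf-8")))
        pos += size
    return decoded


if __name__ == "__main__":
    for row in text_roundtrip(ROWS):
        print("text  :", row)
    for row in framed_roundtrip(ROWS):
        print("framed:", row)
```

Under these assumptions the text round trip yields three rows, one truncated and one made entirely of missing values, which resembles the output in the ticket, while the framed round trip returns the two original rows. Whether this is the actual mechanism inside Hive is not confirmed in this thread.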
[jira] [Commented] (HIVE-14044) Newlines in Avro maps cause external table to return corrupt values
[ https://issues.apache.org/jira/browse/HIVE-14044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425849#comment-16425849 ]

Jelmer Kuperus commented on HIVE-14044:
---------------------------------------

[~Sh4pe] I think that's only for LazySimpleSerDe.

If I look at this code, it declares the SERIALIZATION_ESCAPE_CRLF property:

[https://github.com/apache/hive/blob/6d890faf22fd1ede3658a5eed097476eab3c67e9/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazySimpleSerDe.java#L69]

But the Avro one doesn't:

[https://github.com/apache/hive/blob/6d890faf22fd1ede3658a5eed097476eab3c67e9/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java#L46]

Specifying it on the table does absolutely nothing for me on CDH-5.9.2.
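
As a rough illustration of what escaping CR/LF in a text serialization means, here is a minimal Python sketch. It is not the LazySimpleSerDe implementation and makes no claim about how Hive encodes the escapes; the backslash-based scheme and the function names `escape_crlf` and `unescape_crlf` are assumptions chosen for the example.

```python
def escape_crlf(value):
    """Escape backslashes first, then CR and LF, so no raw line break remains."""
    return (
        value.replace("\\", "\\\\")
        .replace("\r", "\\r")
        .replace("\n", "\\n")
    )


def unescape_crlf(value):
    """Reverse escape_crlf by scanning for backslash-prefixed pairs."""
    mapping = {"n": "\n", "r": "\r", "\\": "\\"}
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            nxt = value[i + 1]
            # Unknown escape pairs are passed through unchanged.
            out.append(mapping.get(nxt, "\\" + nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    original = "value1\nafter newline"
    escaped = escape_crlf(original)
    print(repr(escaped))
    print(unescape_crlf(escaped) == original)
```

With such a scheme a value containing a newline occupies a single line of a text file and can be restored on read. A SerDe that does not declare or honor the property, as described above for the Avro one, would be unaffected by setting it on the table.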
[jira] [Commented] (HIVE-14044) Newlines in Avro maps cause external table to return corrupt values
[ https://issues.apache.org/jira/browse/HIVE-14044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15938844#comment-15938844 ]

Sahil Takiar commented on HIVE-14044:
-------------------------------------

Thanks for the pointer, Anthony. [~Sh4pe], if you can check whether HIVE-11785 fixes your issue, that would be great.
[jira] [Commented] (HIVE-14044) Newlines in Avro maps cause external table to return corrupt values
[ https://issues.apache.org/jira/browse/HIVE-14044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15806417#comment-15806417 ]

Anthony Hsu commented on HIVE-14044:
------------------------------------

I believe this issue was fixed by HIVE-11785.
[jira] [Commented] (HIVE-14044) Newlines in Avro maps cause external table to return corrupt values
[ https://issues.apache.org/jira/browse/HIVE-14044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365069#comment-15365069 ]

Sahil Takiar commented on HIVE-14044:
-------------------------------------

Is this still an issue for you? Have you seen this bug come up again?
[jira] [Commented] (HIVE-14044) Newlines in Avro maps cause external table to return corrupt values
[ https://issues.apache.org/jira/browse/HIVE-14044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15364005#comment-15364005 ]

David Nies commented on HIVE-14044:
-----------------------------------

Sadly, no. I just checked the Hive version:
{code}
$ hiveserver2 --version
Hive 1.1.0-cdh5.5.1
{code}
[jira] [Commented] (HIVE-14044) Newlines in Avro maps cause external table to return corrupt values
[ https://issues.apache.org/jira/browse/HIVE-14044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363512#comment-15363512 ]

Sahil Takiar commented on HIVE-14044:
-------------------------------------

[~Sh4pe] is there any more environment information you can provide? I tried to reproduce this on CDH 5.5.1, but I can't reproduce it there either.
[jira] [Commented] (HIVE-14044) Newlines in Avro maps cause external table to return corrupt values
[ https://issues.apache.org/jira/browse/HIVE-14044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15363467#comment-15363467 ]

Sahil Takiar commented on HIVE-14044:
-------------------------------------

I checked and this issue is no longer present in the master branch.

The query:

{{select foo, map_keys(bar), map_values(bar) from broken_newline_map}}

Prints:
{code}
+------+-----------------+------------------------------------+--+
| foo  | _c1             | _c2                                |
+------+-----------------+------------------------------------+--+
| 1    | ["key1","key2"] | ["value1\nafter newline","value2"] |
| 2    | ["key1","key2"] | ["new value","new value2"]         |
+------+-----------------+------------------------------------+--+
{code}