Boaz Ben-Zvi created DRILL-4960:
-----------------------------------

             Summary: Wrong columns after scanning Json files where some files have missing columns
                 Key: DRILL-4960
                 URL: https://issues.apache.org/jira/browse/DRILL-4960
             Project: Apache Drill
          Issue Type: Bug
          Components: Server
    Affects Versions: 1.8.0
         Environment: Mac
            Reporter: Boaz Ben-Zvi

(This problem may be more general than just Json) To recreate: Scan two small Json files (e.g. copy twice contrib/storage-mongo/src/test/resources/emp.json ) where in one of the files a whole column was eliminated (e.g. "last_name"). A "normal" scan (the missing column shows up as nulls): 0: jdbc:drill:zk=local> select * from `drill/data/emp`; +--------------+---------------------+-------------+--------------+--------------+---------+------------+--------+ | employee_id | full_name | first_name | last_name | position_id | rating | position | isFTE | +--------------+---------------------+-------------+--------------+--------------+---------+------------+--------+ | 1101 | Steve Eurich | Steve | Eurich | 16 | 23.0 | Store T | true | | 1102 | Mary Pierson | Mary | Pierson | 16 | 45.6 | Store T | true | | 1103 | Leo Jones | Leo | Jones | 16 | 85.94 | Store Tem | true | | 1104 | Nancy Beatty | Nancy | Beatty | 16 | 97.16 | Store T | false | | 1105 | Clara McNight | Clara | McNight | 16 | 81.25 | Store | true | | 1106 | null | Marcella | Isaacs | 17 | 67.86 | Stor | false | | 1107 | Charlotte Yonce | Charlotte | Yonce | 17 | 52.17 | Stor | true | | 1108 | Benjamin Foster | Benjamin | Foster | 17 | 89.8 | Stor | false | | 1109 | John Reed | John | Reed | 17 | 12.9 | Store Per | false | | 1110 | Lynn Kwiatkowski | Lynn | Kwiatkowski | 17 | 25.76 | St | true | | 1111 | Donald Vann | Donald | Vann | 17 | 34.86 | Store Per | false | | 1112 | null | William | Smith | null | 79.06 | St | true | | 1113 | Amy Hensley | Amy | Hensley | 17 | 82.96 | Store Pe | false | | 1114 | Judy Owens | Judy | Owens | 17 | 24.6 | Store Per | true | | 1115 | Frederick Castillo | Frederick | Castillo | 17 | 82.36 | S | false | | 1116 | Phil Munoz | Phil | Munoz | 17 | 97.63 | Store Per | false | | 1117 | Lori Lightfoot | Lori | Lightfoot | 17 | 39.16 | Store | true | | 1 | Kumar | Anil | B | 19 | 45.45 | Store | true | | 2 | Kamesh | Bh | Venkata | null | 32.89 | Store | true | | 1101 | Steve 
Eurich | Steve | null | 16 | 23.0 | Store T | true | | 1102 | Mary Pierson | Mary | null | 16 | 45.6 | Store T | true | | 1103 | Leo Jones | Leo | null | 16 | 85.94 | Store Tem | true | | 1104 | Nancy Beatty | Nancy | null | 16 | 97.16 | Store T | false | | 1105 | Clara McNight | Clara | null | 16 | 81.25 | Store | true | | 1106 | null | Marcella | null | 17 | 67.86 | Stor | false | | 1107 | Charlotte Yonce | Charlotte | null | 17 | 52.17 | Stor | true | | 1108 | Benjamin Foster | Benjamin | null | 17 | 89.8 | Stor | false | | 1109 | John Reed | John | null | 17 | 12.9 | Store Per | false | | 1110 | Lynn Kwiatkowski | Lynn | null | 17 | 25.76 | St | true | | 1111 | Donald Vann | Donald | null | 17 | 34.86 | Store Per | false | | 1112 | null | William | null | null | 79.06 | St | true | | 1113 | Amy Hensley | Amy | null | 17 | 82.96 | Store Pe | false | | 1114 | Judy Owens | Judy | null | 17 | 24.6 | Store Per | true | | 1115 | Frederick Castillo | Frederick | null | 17 | 82.36 | S | false | | 1116 | Phil Munoz | Phil | null | 17 | 97.63 | Store Per | false | | 1117 | Lori Lightfoot | Lori | null | 17 | 39.16 | Store | true | | 1 | Kumar | Anil | null | 19 | 45.45 | Store | true | | 2 | Kamesh | Bh | null | null | 32.89 | Store | true | +--------------+---------------------+-------------+--------------+--------------+---------+------------+--------+ 38 rows selected (0.16 seconds) But when the first alphabetically ordered file name is renamed to become second, that column ("last_name") does not show: 0: jdbc:drill:zk=local> select * from foo; +--------------+---------------------+-------------+--------------+---------+------------+--------+ | employee_id | full_name | first_name | position_id | rating | position | isFTE | +--------------+---------------------+-------------+--------------+---------+------------+--------+ | 1101 | Steve Eurich | Steve | 16 | 23.0 | Store T | true | | 1102 | Mary Pierson | Mary | 16 | 45.6 | Store T | true | | 1103 | Leo Jones | Leo | 
16 | 85.94 | Store Tem | true | | 1104 | Nancy Beatty | Nancy | 16 | 97.16 | Store T | false | | 1105 | Clara McNight | Clara | 16 | 81.25 | Store | true | | 1106 | null | Marcella | 17 | 67.86 | Stor | false | | 1107 | Charlotte Yonce | Charlotte | 17 | 52.17 | Stor | true | | 1108 | Benjamin Foster | Benjamin | 17 | 89.8 | Stor | false | | 1109 | John Reed | John | 17 | 12.9 | Store Per | false | | 1110 | Lynn Kwiatkowski | Lynn | 17 | 25.76 | St | true | | 1111 | Donald Vann | Donald | 17 | 34.86 | Store Per | false | | 1112 | null | William | null | 79.06 | St | true | | 1113 | Amy Hensley | Amy | 17 | 82.96 | Store Pe | false | | 1114 | Judy Owens | Judy | 17 | 24.6 | Store Per | true | | 1115 | Frederick Castillo | Frederick | 17 | 82.36 | S | false | | 1116 | Phil Munoz | Phil | 17 | 97.63 | Store Per | false | | 1117 | Lori Lightfoot | Lori | 17 | 39.16 | Store | true | | 1 | Kumar | Anil | 19 | 45.45 | Store | true | | 2 | Kamesh | Bh | null | 32.89 | Store | true | | 1101 | Steve Eurich | Steve | 16 | 23.0 | Store T | true | | 1102 | Mary Pierson | Mary | 16 | 45.6 | Store T | true | | 1103 | Leo Jones | Leo | 16 | 85.94 | Store Tem | true | | 1104 | Nancy Beatty | Nancy | 16 | 97.16 | Store T | false | | 1105 | Clara McNight | Clara | 16 | 81.25 | Store | true | | 1106 | null | Marcella | 17 | 67.86 | Stor | false | | 1107 | Charlotte Yonce | Charlotte | 17 | 52.17 | Stor | true | | 1108 | Benjamin Foster | Benjamin | 17 | 89.8 | Stor | false | | 1109 | John Reed | John | 17 | 12.9 | Store Per | false | | 1110 | Lynn Kwiatkowski | Lynn | 17 | 25.76 | St | true | | 1111 | Donald Vann | Donald | 17 | 34.86 | Store Per | false | | 1112 | null | William | null | 79.06 | St | true | | 1113 | Amy Hensley | Amy | 17 | 82.96 | Store Pe | false | | 1114 | Judy Owens | Judy | 17 | 24.6 | Store Per | true | | 1115 | Frederick Castillo | Frederick | 17 | 82.36 | S | false | | 1116 | Phil Munoz | Phil | 17 | 97.63 | Store Per | false | | 1117 | Lori Lightfoot | Lori 
| 17 | 39.16 | Store | true | | 1 | Kumar | Anil | 19 | 45.45 | Store | true | | 2 | Kamesh | Bh | null | 32.89 | Store | true | +--------------+---------------------+-------------+--------------+---------+------------+--------+ 38 rows selected (0.261 seconds) But if requested explicitly, the column does show: 0: jdbc:drill:zk=local> select last_name from `drill/data/emp`; +--------------+ | last_name | +--------------+ | null | | null | | null | | null | | null | | null | | null | | null | | null | | null | | null | | null | | null | | null | | null | | null | | null | | null | | null | | Eurich | | Pierson | | Jones | | Beatty | | McNight | | Isaacs | | Yonce | | Foster | | Reed | | Kwiatkowski | | Vann | | Smith | | Hensley | | Owens | | Castillo | | Munoz | | Lightfoot | | B | | Venkata | +--------------+ 38 rows selected (0.159 seconds) Things get even WORSE when a parallel plan is chosen -- some column data shows up under the wrong columns: 0: jdbc:drill:zk=local> alter session set planner.slice_target = 1; +-------+--------------------------------+ | ok | summary | +-------+--------------------------------+ | true | planner.slice_target updated. 
| +-------+--------------------------------+ 1 row selected (0.084 seconds) 0: jdbc:drill:zk=local> select * from `drill/data/emp`; +--------------+---------------------+-------------+--------------+---------+------------+------------+ | employee_id | full_name | first_name | position_id | rating | position | isFTE | +--------------+---------------------+-------------+--------------+---------+------------+------------+ | 1101 | Steve Eurich | Steve | 16 | 23.0 | Store T | true | | 1102 | Mary Pierson | Mary | 16 | 45.6 | Store T | true | | 1103 | Leo Jones | Leo | 16 | 85.94 | Store Tem | true | | 1104 | Nancy Beatty | Nancy | 16 | 97.16 | Store T | false | | 1105 | Clara McNight | Clara | 16 | 81.25 | Store | true | | 1106 | null | Marcella | 17 | 67.86 | Stor | false | | 1107 | Charlotte Yonce | Charlotte | 17 | 52.17 | Stor | true | | 1108 | Benjamin Foster | Benjamin | 17 | 89.8 | Stor | false | | 1109 | John Reed | John | 17 | 12.9 | Store Per | false | | 1110 | Lynn Kwiatkowski | Lynn | 17 | 25.76 | St | true | | 1111 | Donald Vann | Donald | 17 | 34.86 | Store Per | false | | 1112 | null | William | null | 79.06 | St | true | | 1113 | Amy Hensley | Amy | 17 | 82.96 | Store Pe | false | | 1114 | Judy Owens | Judy | 17 | 24.6 | Store Per | true | | 1115 | Frederick Castillo | Frederick | 17 | 82.36 | S | false | | 1116 | Phil Munoz | Phil | 17 | 97.63 | Store Per | false | | 1117 | Lori Lightfoot | Lori | 17 | 39.16 | Store | true | | 1 | Kumar | Anil | 19 | 45.45 | Store | true | | 2 | Kamesh | Bh | null | 32.89 | Store | true | | 1101 | Steve Eurich | Steve | Eurich | 16 | 23.0 | Store T | | 1102 | Mary Pierson | Mary | Pierson | 16 | 45.6 | Store T | | 1103 | Leo Jones | Leo | Jones | 16 | 85.94 | Store Tem | | 1104 | Nancy Beatty | Nancy | Beatty | 16 | 97.16 | Store T | | 1105 | Clara McNight | Clara | McNight | 16 | 81.25 | Store | | 1106 | null | Marcella | Isaacs | 17 | 67.86 | Stor | | 1107 | Charlotte Yonce | Charlotte | Yonce | 17 | 52.17 | Stor | | 
1108 | Benjamin Foster | Benjamin | Foster | 17 | 89.8 | Stor | | 1109 | John Reed | John | Reed | 17 | 12.9 | Store Per | | 1110 | Lynn Kwiatkowski | Lynn | Kwiatkowski | 17 | 25.76 | St | | 1111 | Donald Vann | Donald | Vann | 17 | 34.86 | Store Per | | 1112 | null | William | Smith | null | 79.06 | St | | 1113 | Amy Hensley | Amy | Hensley | 17 | 82.96 | Store Pe | | 1114 | Judy Owens | Judy | Owens | 17 | 24.6 | Store Per | | 1115 | Frederick Castillo | Frederick | Castillo | 17 | 82.36 | S | | 1116 | Phil Munoz | Phil | Munoz | 17 | 97.63 | Store Per | | 1117 | Lori Lightfoot | Lori | Lightfoot | 17 | 39.16 | Store | | 1 | Kumar | Anil | B | 19 | 45.45 | Store | | 2 | Kamesh | Bh | Venkata | null | 32.89 | Store | +--------------+---------------------+-------------+--------------+---------+------------+------------+ 38 rows selected (0.253 seconds) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
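
The "To recreate" recipe at the top of this report can be scripted. What follows is only a minimal Python sketch and not part of the original report. The helper names (read_records, write_records, make_test_dir) and the output file names a.json and b.json are hypothetical. It assumes the source file is a sequence of concatenated JSON objects, and that the scan visits files in alphabetical order, as the renaming experiment above suggests.

```python
import json
from pathlib import Path


def read_records(text):
    """Parse a sequence of concatenated JSON objects (assumed input layout)."""
    decoder = json.JSONDecoder()
    records, pos = [], 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so step over it here.
        if text[pos].isspace():
            pos += 1
            continue
        record, pos = decoder.raw_decode(text, pos)
        records.append(record)
    return records


def write_records(path, records):
    """Write one JSON object per line."""
    with open(path, "w") as out:
        for record in records:
            out.write(json.dumps(record) + "\n")


def make_test_dir(source, target_dir, column="last_name", drop_from_first=True):
    """Create two copies of `source` in `target_dir`, one of them lacking `column`.

    The names a.json and b.json are hypothetical; they only pin down the
    alphabetical order of the two files.
    """
    records = read_records(Path(source).read_text())
    stripped = [{k: v for k, v in r.items() if k != column} for r in records]
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # drop_from_first=True puts the file without the column first, which is
    # the ordering under which the report sees the column vanish from "select *".
    first, second = (stripped, records) if drop_from_first else (records, stripped)
    write_records(target / "a.json", first)
    write_records(target / "b.json", second)
```

Calling make_test_dir on a copy of emp.json with the queried directory as target_dir, then rerunning the queries above, should exercise the failing case. Passing drop_from_first=False produces the ordering of the "normal" scan for comparison.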