I have used the queries below to create Parquet files from 2 CSV files:

    create table dfs.datatransfer.`ct_fremde/2015/07` as
    select to_timestamp(columns[0],'dd.MM.yyyy') as Datum,
           columns[1] as Airline_In,
           columns[2] as Trip_In,
           columns[3] as Ac_Typ,
           columns[4] as Ordertype,
           to_time(columns[5],'HH:mm') as Start_Time,
           columns[6] as End_Time,
           columns[7] as Reg_In
    from dfs.datatransfer.`CT_Fremde_Juli_2015.tsv`

    create table dfs.datatransfer.`ct_fremde/2015/08` as
    select to_timestamp(columns[0],'dd.MM.yyyy') as Datum,
           columns[1] as Airline_In,
           columns[2] as Trip_In,
           columns[3] as Ac_Typ,
           columns[4] as Ordertype,
           to_time(columns[5],'HH:mm') as Start_Time,
           columns[6] as End_Time,
           columns[7] as Reg_In
    from dfs.datatransfer.`CT_Fremde_August_2015.tsv`

When I query the data using the following SQL:

    select distinct dir0 from dfs.datatransfer.`ct_fremde/2015/*`

... I get 07 and 08 as the result.

When I run a group by query:

    select dir0,count(3) from dfs.datatransfer.`ct_fremde/2015/*` group by dir0

... I get back 2115 for 07 and 2128 for 08.

Now when I run the following query:

    select * from dfs.datatransfer.`ct_fremde/2015/*` where dir0=7

... I get records back from the query.

And when I run this query:

    select * from dfs.datatransfer.`ct_fremde/2015/*` where dir0=8

... I do NOT get a result back.

Am I doing something wrong here? Or what is going on?

Greetings,
Uwe
