Re: Hudi Query Latest Records
The table description looks ok. Are you seeing an exception or incorrect data. This might require some debugging. Please open a support github ticket and we will look at it . Please provide same query output in hive and spark along with file listings of your dataset and .hoodie folder. Thanks,Balaji.V On Friday, October 9, 2020, 01:25:58 AM PDT, Ranganath Tirumala wrote: Hi Balaji, Here is the desc formatted col_name data_type comment # col_name data_type comment NULL NULL _hoodie_commit_time string _hoodie_commit_seqno string _hoodie_record_key string _hoodie_partition_path string _hoodie_file_name string ee_id bigint er_id bigint evnt_src string evnt_typ string evnt_confidence string evnt_yr string evnt_src_id string evnt_amt string evnt_prtn string evnt_sys_dt string evnt_bus_dt string evnt_strt_dt string evnt_end_dt string evnt_id string NULL NULL # Detailed Table Information NULL NULL Database: default NULL OwnerType: USER NULL Owner: user999 NULL CreateTime: Wed Oct 07 22:17:42 AEDT 2020 NULL LastAccessTime: UNKNOWN NULL Retention: 0 NULL Location: hdfs://path-to-external-table NULL Table Type: EXTERNAL_TABLE NULL Table Parameters: NULL NULL EXTERNAL TRUE last_commit_time_sync 20201009072526 numFiles 2619 totalSize 51903292933 transient_lastDdlTime 1602069462 NULL NULL # Storage Information NULL NULL SerDe Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe NULL InputFormat: org.apache.hudi.hadoop.HoodieParquetInputFormat NULL OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat NULL Compressed: No NULL Num Buckets: -1 NULL Bucket Columns: [] NULL Sort Columns: [] NULL Storage Desc Params: NULL NULL serialization.format 1 On Fri, 9 Oct 2020 at 19:07, Balaji Varadarajan wrote: > Can you paste the detailed hive table description. (desc formatted .) > Balaji.V > On Friday, October 9, 2020, 12:37:19 AM PDT, Ranganath Tirumala < > ranganath.tirum...@gmail.com> wrote: > > Hi Balaji, > > I cannot get this to work on hive / hue. > It works as expected using spark shell. > > Any idea how I can get this to work in hive / hue? > > Regards, > > Ranganath > > On Thu, 1 Oct 2020 at 09:45, Balaji Varadarajan > > wrote: > > > Assuming commit1 happened before commit2, this is what you should expect > > when running a standard query through query engines. > > Balaji.V > > > > On Tuesday, September 29, 2020, 03:04:17 PM PDT, Ranganath Tirumala < > > ranganath.tirum...@gmail.com> wrote: > > > > Hi, > > > > Is there a way we can query to get the latest record across commits? > > > > e.g. > > commit-1 > > Record-1, Value A > > Record-2, Value A > > > > commit-2 > > Record-1, Value B > > Record-3, Value B > > > > desired output > > Record-1, Value B > > Record-2, Value A > > Record-3, Value B > > > > -- > > Regards, > > > > Ranganath Tirumala > > > > > > -- > Regards, > > Ranganath Tirumala > -- Regards, Ranganath Tirumala
Re: Hudi Query Latest Records
Hi Balaji, Here is the desc formatted col_namedata_type comment # col_name data_type comment NULLNULL _hoodie_commit_time string _hoodie_commit_seqnostring _hoodie_record_key string _hoodie_partition_path string _hoodie_file_name string ee_id bigint er_id bigint evnt_srcstring evnt_typstring evnt_confidence string evnt_yr string evnt_src_id string evnt_amtstring evnt_prtn string evnt_sys_dt string evnt_bus_dt string evnt_strt_dtstring evnt_end_dt string evnt_id string NULLNULL # Detailed Table InformationNULLNULL Database: default NULL OwnerType: USERNULL Owner: user999 NULL CreateTime: Wed Oct 07 22:17:42 AEDT 2020 NULL LastAccessTime: UNKNOWN NULL Retention: 0 NULL Location: hdfs://path-to-external-table NULL Table Type: EXTERNAL_TABLE NULL Table Parameters: NULLNULL EXTERNALTRUE last_commit_time_sync 20201009072526 numFiles2619 totalSize 51903292933 transient_lastDdlTime 1602069462 NULLNULL # Storage Information NULLNULL SerDe Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe NULL InputFormat:org.apache.hudi.hadoop.HoodieParquetInputFormat NULL OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat NULL Compressed: No NULL Num Buckets:-1 NULL Bucket Columns: [] NULL Sort Columns: [] NULL Storage Desc Params:NULLNULL serialization.format1 On Fri, 9 Oct 2020 at 19:07, Balaji Varadarajan wrote: > Can you paste the detailed hive table description. (desc formatted .) > Balaji.V > On Friday, October 9, 2020, 12:37:19 AM PDT, Ranganath Tirumala < > ranganath.tirum...@gmail.com> wrote: > > Hi Balaji, > > I cannot get this to work on hive / hue. > It works as expected using spark shell. > > Any idea how I can get this to work in hive / hue? > > Regards, > > Ranganath > > On Thu, 1 Oct 2020 at 09:45, Balaji Varadarajan > > wrote: > > > Assuming commit1 happened before commit2, this is what you should expect > > when running a standard query through query engines. > > Balaji.V > > > >On Tuesday, September 29, 2020, 03:04:17 PM PDT, Ranganath Tirumala < > > ranganath.tirum...@gmail.com> wrote: > > > > Hi, > > > > Is there a way we can query to get the latest record across commits? > > > > e.g. > > commit-1 > > Record-1, Value A > > Record-2, Value A > > > > commit-2 > > Record-1, Value B > > Record-3, Value B > > > > desired output > > Record-1, Value B > > Record-2, Value A > > Record-3, Value B > > > > -- > > Regards, > > > > Ranganath Tirumala > > > > > > -- > Regards, > > Ranganath Tirumala > -- Regards, Ranganath Tirumala
Re: Hudi Query Latest Records
Can you paste the detailed hive table description. (desc formatted .) Balaji.V On Friday, October 9, 2020, 12:37:19 AM PDT, Ranganath Tirumala wrote: Hi Balaji, I cannot get this to work on hive / hue. It works as expected using spark shell. Any idea how I can get this to work in hive / hue? Regards, Ranganath On Thu, 1 Oct 2020 at 09:45, Balaji Varadarajan wrote: > Assuming commit1 happened before commit2, this is what you should expect > when running a standard query through query engines. > Balaji.V > > On Tuesday, September 29, 2020, 03:04:17 PM PDT, Ranganath Tirumala < > ranganath.tirum...@gmail.com> wrote: > > Hi, > > Is there a way we can query to get the latest record across commits? > > e.g. > commit-1 > Record-1, Value A > Record-2, Value A > > commit-2 > Record-1, Value B > Record-3, Value B > > desired output > Record-1, Value B > Record-2, Value A > Record-3, Value B > > -- > Regards, > > Ranganath Tirumala > -- Regards, Ranganath Tirumala
Re: Hudi Query Latest Records
Hi Balaji, I cannot get this to work on hive / hue. It works as expected using spark shell. Any idea how I can get this to work in hive / hue? Regards, Ranganath On Thu, 1 Oct 2020 at 09:45, Balaji Varadarajan wrote: > Assuming commit1 happened before commit2, this is what you should expect > when running a standard query through query engines. > Balaji.V > > On Tuesday, September 29, 2020, 03:04:17 PM PDT, Ranganath Tirumala < > ranganath.tirum...@gmail.com> wrote: > > Hi, > > Is there a way we can query to get the latest record across commits? > > e.g. > commit-1 > Record-1, Value A > Record-2, Value A > > commit-2 > Record-1, Value B > Record-3, Value B > > desired output > Record-1, Value B > Record-2, Value A > Record-3, Value B > > -- > Regards, > > Ranganath Tirumala > -- Regards, Ranganath Tirumala