Re: Hudi Query Latest Records

2020-10-09 Thread Balaji Varadarajan
The table description looks ok. Are you seeing an exception or incorrect data?
This might require some debugging. Please open a support GitHub ticket and we
will look at it. Please provide the same query's output in Hive and Spark, along
with file listings of your dataset and the .hoodie folder.
Thanks,
Balaji.V
On Friday, October 9, 2020, 01:25:58 AM PDT, Ranganath Tirumala wrote:

Hi Balaji,

Here is the desc formatted output:

col_name    data_type    comment    
# col_name                data_type              comment                
    NULL    NULL    
_hoodie_commit_time    string        
_hoodie_commit_seqno    string        
_hoodie_record_key    string        
_hoodie_partition_path    string        
_hoodie_file_name    string        
ee_id    bigint        
er_id    bigint        
evnt_src    string        
evnt_typ    string        
evnt_confidence    string        
evnt_yr    string        
evnt_src_id    string        
evnt_amt    string        
evnt_prtn    string        
evnt_sys_dt    string        
evnt_bus_dt    string        
evnt_strt_dt    string        
evnt_end_dt    string        
evnt_id    string        
    NULL    NULL    
# Detailed Table Information    NULL    NULL    
Database:              default              NULL    
OwnerType:              USER                    NULL    
Owner:                  user999                  NULL    
CreateTime:            Wed Oct 07 22:17:42 AEDT 2020    NULL    
LastAccessTime:        UNKNOWN                NULL    
Retention:              0                      NULL    
Location:              hdfs://path-to-external-table    NULL    
Table Type:            EXTERNAL_TABLE          NULL    
Table Parameters:    NULL    NULL    
    EXTERNAL                TRUE                    
    last_commit_time_sync    20201009072526          
    numFiles                2619                    
    totalSize              51903292933            
    transient_lastDdlTime    1602069462              
    NULL    NULL    
# Storage Information    NULL    NULL    
SerDe Library:          org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe    NULL
InputFormat:            org.apache.hudi.hadoop.HoodieParquetInputFormat    NULL
OutputFormat:           org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat    NULL
Compressed:            No                      NULL    
Num Buckets:            -1                      NULL    
Bucket Columns:        []                      NULL    
Sort Columns:          []                      NULL    
Storage Desc Params:    NULL    NULL    
    serialization.format    1


On Fri, 9 Oct 2020 at 19:07, Balaji Varadarajan wrote:

> Can you paste the detailed Hive table description? (desc formatted .)
> Balaji.V
>    On Friday, October 9, 2020, 12:37:19 AM PDT, Ranganath Tirumala <
> ranganath.tirum...@gmail.com> wrote:
>
>  Hi Balaji,
>
> I cannot get this to work in Hive / Hue.
> It works as expected using the Spark shell.
>
> Any idea how I can get this to work in Hive / Hue?
>
> Regards,
>
> Ranganath
>
> On Thu, 1 Oct 2020 at 09:45, Balaji Varadarajan
> wrote:
>
> > Assuming commit-1 happened before commit-2, this is what you should expect
> > when running a standard query through the query engines.
> > Balaji.V
> >
> >    On Tuesday, September 29, 2020, 03:04:17 PM PDT, Ranganath Tirumala <
> > ranganath.tirum...@gmail.com> wrote:
> >
> >  Hi,
> >
> > Is there a way to query for the latest version of each record across commits?
> >
> > e.g.
> > commit-1
> > Record-1, Value A
> > Record-2, Value A
> >
> > commit-2
> > Record-1, Value B
> > Record-3, Value B
> >
> > desired output
> > Record-1, Value B
> > Record-2, Value A
> > Record-3, Value B
> >
> > --
> > Regards,
> >
> > Ranganath Tirumala
> >
>
>
>
> --
> Regards,
>
> Ranganath Tirumala
>



-- 
Regards,

Ranganath Tirumala
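A minimal sketch of the result described in the thread: for each record key, the row from the latest commit wins. This is a plain-Python simulation under stated assumptions, not Hudi's implementation or API. It keys on the _hoodie_record_key column and orders by the _hoodie_commit_time column shown in the table description. The rows and commit times are hypothetical, and it assumes commit times are fixed-width yyyyMMddHHmmss strings (like the 20201009072526 in last_commit_time_sync), so string order matches time order.

```python
from typing import Dict, Iterable, List, Tuple

# A row is (record_key, commit_time, value). The field names mirror the
# _hoodie_record_key and _hoodie_commit_time columns in the table
# description; the rows themselves are hypothetical.
Row = Tuple[str, str, str]


def latest_per_key(rows: Iterable[Row]) -> List[Row]:
    """Keep only the row from the latest commit for each record key."""
    latest: Dict[str, Row] = {}
    for row in rows:
        key, commit_time, _ = row
        # Assumption: commit times are fixed-width yyyyMMddHHmmss strings,
        # so plain string comparison matches chronological order.
        if key not in latest or commit_time > latest[key][1]:
            latest[key] = row
    # Sort by record key so the output order is deterministic.
    return sorted(latest.values())


# commit-1 and commit-2 from the example, with made-up commit times.
rows = [
    ("Record-1", "20200929100000", "Value A"),
    ("Record-2", "20200929100000", "Value A"),
    ("Record-1", "20200930100000", "Value B"),
    ("Record-3", "20200930100000", "Value B"),
]

for key, _, value in latest_per_key(rows):
    print(f"{key}, {value}")
```

Running it prints the desired output from the original question: Record-1 with Value B, Record-2 with Value A, and Record-3 with Value B.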
  