Re: Field not found in record HoodieException

2019-09-25 Thread Taher Koitawala
Hi Kabeer, thanks for the test. I really appreciate the effort you put into
this. I will check it and report back to you.

Regards,
Taher Koitawala

On Tue, Sep 24, 2019 at 5:54 PM Kabeer Ahmed  wrote:

> Taher,
>
> Sorry for the delay. I have now put everything you may need in a gist at:
> https://gist.github.com/smdahmed/3af0e3110e07cf76772bb73d5e9b65e2
>
> Note that I am still on 0.4.6, so you may need to swap com.uber.hoodie
> with the right org.apache.hudi equivalents. I am also still on the
> RDD-based implementation, but I can assure you that if you swap the code
> for a DataFrame-based implementation, it will work the same way. If you
> are looking for a DataFrame-based implementation, look at the code sample
> at:
> https://github.com/apache/incubator-hudi/issues/859#issuecomment-527316262
>
> In my gist at
> https://gist.github.com/smdahmed/3af0e3110e07cf76772bb73d5e9b65e2
> you will see the following:
> - Code sample to generate parquet
> - Hive table creation and addition of partitions
> - Spark shell based code that is in line with what you needed
>
> If you want any changes made, please do not hesitate to ask. I can modify
> the code and spin up tests for you. I can assure you that this will work
> and, to the best of my belief, this is what you had aimed to achieve.
>
> Thanks
> Kabeer.
>
> On Sep 18 2019, at 5:13 pm, Taher Koitawala  wrote:
> > Hi Kabeer,
> > Really appreciate the help. Take your time nothing urgent.
> >
> > Regards,
> > Taher Koitawala
> >
> > On Wed, Sep 18, 2019, 9:38 PM Kabeer Ahmed  wrote:
> > > Taher,
> > > I have a half baked code for test. I shall complete it and test it and
> > > revert back to you - latest by weekend. Please bear with me. If it is
> > > super urgent or you are really stuck, then let me know.
> > > Thanks,
> > > On Sep 18 2019, at 7:27 am, Gary Li wrote:
> > > > I think we can also try to find if there is any illegal character
> > > > that could mess up Avro scheme in the column. Like a stand alone
> > > > "/" or "."
> > > >
> > > > On Tue, Sep 17, 2019 at 8:35 PM Vinoth Chandar 
> > > wrote:
> > > > > [Orthogonal comment] It's so awesome to see us troubleshooting
> > > >
> > >
> > > together..
> > > > > Thanks everyone on this thread!
> > > > >
> > > > > On Tue, Sep 17, 2019 at 8:04 PM Taher Koitawala <
> taher...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > No there are no nulls in the data and I am getting the same
> error.
> > > > > > On Wed, Sep 18, 2019, 3:33 AM Kabeer Ahmed  >
> > > > >
> > > >
> > >
> > > wrote:
> > > > > > > Taher - did you find any NULLs in the data? If you are still
> not
> > > > > >
> > > > >
> > > >
> > >
> > > able
> > > > > to
> > > > > > > make progress, let us know.
> > > > > > >
> > > > > > > On Sep 17 2019, at 8:30 am, Taher Koitawala <
> taher...@gmail.com>
> > > > > wrote:
> > > > > > > > Sure Gary, Let me check if i can find any nulls in there
> > > > > > > >
> > > > > > > > On Tue, Sep 17, 2019 at 1:28 AM Gary Li <
> > > yanjia.gary...@gmail.com>
> > > > > > > wrote:
> > > > > > > > > Hello, I have seen this exception before. In my case, if
> the
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > precombine key
> > > > > > > > > of one entry is null, then I will have this error. I'd
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > > recommend
> > > > > > > >
> > > > > > >
> > > > > > > checking
> > > > > > > > > if there is any row has null in *last_update.*
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Gary
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Sep 16, 2019 at 12:32 PM Kabeer Ahmed <
> > > > > kab...@linuxmail.org>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Taher,
> > > > > > > > > > Let me spin a test for you to test similar scenario and
> let
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > > me
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > > revert
> > > > > > > > > back
> > > > > > > > > > to you.
> > > > > > > > > > On Sep 16 2019, at 2:09 pm, Taher Koitawala <
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > > taher...@gmail.com>
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > wrote:
> > > > > > > > > > > Hi Kabeer, hive table has 

Re: Field not found in record HoodieException

2019-09-24 Thread Kabeer Ahmed
Taher,

Sorry for the delay. I have now put everything you may need in a gist at:
https://gist.github.com/smdahmed/3af0e3110e07cf76772bb73d5e9b65e2

Note that I am still on 0.4.6, so you may need to swap com.uber.hoodie with the
right org.apache.hudi equivalents. I am also still on the RDD-based
implementation, but I can assure you that if you swap the code for a
DataFrame-based implementation, it will work the same way. If you are looking
for a DataFrame-based implementation, look at the code sample at:
https://github.com/apache/incubator-hudi/issues/859#issuecomment-527316262

In my gist at
https://gist.github.com/smdahmed/3af0e3110e07cf76772bb73d5e9b65e2
you will see the following:
- Code sample to generate parquet
- Hive table creation and addition of partitions
- Spark shell based code that is in line with what you needed

If you want any changes made, please do not hesitate to ask. I can modify the
code and spin up tests for you. I can assure you that this will work and, to
the best of my belief, this is what you had aimed to achieve.

Thanks
Kabeer.

On Sep 18 2019, at 5:13 pm, Taher Koitawala  wrote:
> Hi Kabeer,
> Really appreciate the help. Take your time nothing urgent.
>
> Regards,
> Taher Koitawala
>
> On Wed, Sep 18, 2019, 9:38 PM Kabeer Ahmed  wrote:
> > Taher,
> > I have a half baked code for test. I shall complete it and test it and
> > revert back to you - latest by weekend. Please bear with me. If it is super
> > urgent or you are really stuck, then let me know.
> > Thanks,
> > On Sep 18 2019, at 7:27 am, Gary Li  wrote:
> > > I think we can also try to find if there is any illegal character that
> > > could mess up Avro scheme in the column. Like a stand alone “/“ or “.”

Re: Field not found in record HoodieException

2019-09-18 Thread Gary Li
I think we can also check whether there is any illegal character in the
column that could mess up the Avro schema, like a standalone "/" or ".".
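
A quick way to try this, reading the suggestion as being about column names, is to test each name against the naming rule in the Avro specification: a name starts with a letter or underscore and continues with letters, digits, or underscores. The Python below is only a stand-alone sketch, not Hudi code; the helper name `invalid_avro_names` and the sample column list are assumptions made up for the illustration.

```python
import re

# Avro spec: names start with [A-Za-z_] and continue with [A-Za-z0-9_].
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def invalid_avro_names(columns):
    """Return the column names that are not legal Avro field names."""
    return [name for name in columns if not AVRO_NAME.fullmatch(name)]


if __name__ == "__main__":
    # Hypothetical column list; the last two entries are deliberately broken.
    sample = ["contact_id", "country", "last_update", "e.domain", "a/b"]
    print(invalid_avro_names(sample))
```

If the returned list is empty, the column names themselves are probably not the cause and the data values are the next thing to look at.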

On Tue, Sep 17, 2019 at 8:35 PM Vinoth Chandar  wrote:

> [Orthogonal comment] It's so awesome to see us troubleshooting together..
> Thanks everyone on this thread!
>
> On Tue, Sep 17, 2019 at 8:04 PM Taher Koitawala wrote:
>
> > No there are no nulls in the data and I am getting the same error.
> >
> > On Wed, Sep 18, 2019, 3:33 AM Kabeer Ahmed wrote:
> >
> > > Taher - did you find any NULLs in the data? If you are still not able
> > > to make progress, let us know.
> > >
> > > On Sep 17 2019, at 8:30 am, Taher Koitawala wrote:
> > > > Sure Gary, Let me check if i can find any nulls in there

Re: Field not found in record HoodieException

2019-09-17 Thread Vinoth Chandar
[Orthogonal comment] It's so awesome to see us troubleshooting together.
Thanks, everyone on this thread!

On Tue, Sep 17, 2019 at 8:04 PM Taher Koitawala  wrote:

> No there are no nulls in the data and I am getting the same error.
>
> On Wed, Sep 18, 2019, 3:33 AM Kabeer Ahmed  wrote:
>
> > Taher - did you find any NULLs in the data? If you are still not able to
> > make progress, let us know.
> >
> > On Sep 17 2019, at 8:30 am, Taher Koitawala  wrote:
> > > Sure Gary, Let me check if i can find any nulls in there

Re: Field not found in record HoodieException

2019-09-17 Thread Kabeer Ahmed
Taher - did you find any NULLs in the data? If you are still not able to make 
progress, let us know.

On Sep 17 2019, at 8:30 am, Taher Koitawala  wrote:
> Sure Gary, Let me check if i can find any nulls in there
>
> On Tue, Sep 17, 2019 at 1:28 AM Gary Li  wrote:
> > Hello, I have seen this exception before. In my case, if the precombine key
> > of one entry is null, then I will have this error. I'd recommend checking
> > if there is any row has null in *last_update.*
> >
> > Best,
> > Gary

Re: Field not found in record HoodieException

2019-09-17 Thread Taher Koitawala
Sure Gary, let me check if I can find any nulls in there.

On Tue, Sep 17, 2019 at 1:28 AM Gary Li  wrote:

> Hello, I have seen this exception before. In my case, if the precombine key
> of one entry is null, then I will have this error. I'd recommend checking
> if there is any row has null in *last_update.*
>
> Best,
> Gary
>
>
> On Mon, Sep 16, 2019 at 12:32 PM Kabeer Ahmed wrote:
>
> > Taher,
> >
> > Let me spin a test for you to test similar scenario and let me revert
> > back to you.
> > On Sep 16 2019, at 2:09 pm, Taher Koitawala  wrote:
> > > Hi Kabeer, hive table has everything as a string. However when fetching
> > > data, the spark query is
> > > .sql(String.format("select contact_id,country,cast(last_update as
> > > TIMESTAMP) as last_update from %s",hiveTable))
> > >
> > > On Mon, Sep 16, 2019 at 6:18 PM Kabeer Ahmed wrote:
> > > > Is last_update a timestamp? Can you please throw the hive schema that
> > > > you are using to create table. You could run show create table and
> > > > send us the output please?
> > > >
> > > > On Sep 16 2019, at 1:32 pm, Taher Koitawala wrote:
> > > > > Hi Kaber, Same issue when last_update is converted to long.
> > > > >
> > > > > HoodieSparkSQLWriter: Registered avro schema : {
> > > > > "type" : "record",
> > > > > "name" : "s3_master_contacts_list_hudi_record",
> > > > > "namespace" : "hoodie.s3_master_contacts_list_hudi",
> > > > > "fields" : [ {
> > > > > "name" : "contact_id",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "country",
> > > > > "type" : [ "string", "null" ]
> > > > > }, {
> > > > > "name" : "last_update",
> > > > > "type" : [ "long", "null" ]
> > > > > } ]
> > > > > }
> > > > >
> > > > > On Mon, Sep 16, 2019 at 4:17 PM Kabeer Ahmed wrote:
> > > > > > Taher,
> > > > > > This error of field not found exception with HUDI is mostly
> > > > > > because of 2 cases:
> > > > > > 1. The data types of the fields do not match with the types
> > > > > >    listed in hive tables.
> > > > > > 2. The field may really not be preset - which doesnt seem to be
> > > > > >    your case.
> > > > > > I looked into the schema in your log which is below. Basically
> > > > > > most of the items seem to be string but I am not sure what are
> > > > > > their types that you have defined in Hive. If you look into Hive
> > > > > > table definition, you may find the bug soon.
> > > > > >
> > > > > > On another note, if you are still struggling; then you should try
> > > > > > to start with a very small example and keep building it. A ready
> > > > > > made code copy is at:
> > > > > > https://github.com/apache/incubator-hudi/issues/859#issuecomment-527316262
> > > > > > written by Vinoth. You must take that small example build it up
> > > > > > and then relate to your own.
> > > > > > Let us know if this still doesnt work for you.
> > > > > > Thanks
> > > > > > Kabeer.
> > > > > >
> > > > > > > 19/09/16 10:09:26 INFO HoodieSparkSQLWriter: Registered avro
> > > > > > > schema : {
> > > > > > > "type" : "record",
> > > > > > > "name" : "s3_master_contacts_list_hudi_record",
> > > > > > > "namespace" : "hoodie.s3_master_contacts_list_hudi",
> > > > > > > "fields" : [ {
> > > > > > > "name" : "contact_id",
> > > > > > > "type" : [ "string", "null" ]
> > > > > > > }, {
> > > > > > > "name" : "phone_number",
> > > > > > > "type" : [ "string", "null" ]
> > > > > > > }, {
> > > > > > > "name" : "encrypted_phone_number",
> > > > > > > "type" : [ "string", "null" ]
> > > > > > > }, {
> > > > > > > "name" : "phone_number_hash",
> > > > > > > "type" : [ "string", "null" ]
> > > > > > > }, {
> > > > > > > "name" : "first_name",
> > > > > > > "type" : [ "string", "null" ]
> > > > > > > }, {
> > > > > > > "name" : "last_name",
> > > > > > > "type" : [ "string", "null" ]
> > > > > > > }, {
> > > > > > > "name" : "email_id",
> > > > > > > "type" : [ "string", "null" ]
> > > > > > > }, {
> > > > > > > "name" : "encrypted_email_id",
> > > > > > > "type" : [ "string", "null" ]
> > > > > > > }, {
> > > > > > > "name" : "email_id_hash",
> > > > > > > "type" : [ "string", "null" ]
> > > > > > > }, {
> > > > > > > "name" : "email_id_1",
> > > > > > > "type" : [ "string", "null" ]
> > > > > > > }, {
> > > > > > > "name" : "encrypted_email_id_1",
> > > > > > > "type" : [ "string", "null" ]
> > > > > > > }, {
> > > > > > > "name" : "email_id_1_hash",
> > > > > > > "type" : [ "string", "null" ]
> > > > > > > }, {
> > > > > > > "name" : "e_domain",
> > > > > > > "type" : [ "string", "null" ]
> > > 

Re: Field not found in record HoodieException

2019-09-16 Thread Gary Li
Hello, I have seen this exception before. In my case, I get this error when
the precombine key of any entry is null. I'd recommend checking whether any
row has a null in *last_update*.
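
The check itself can be sketched without Spark. The Python below is a minimal illustration that assumes the rows have already been pulled into plain dictionaries; `rows_missing_precombine` is a hypothetical helper and the sample rows are made up, with `last_update` being the precombine field from this thread. On the real table the equivalent would be a query that counts the rows where last_update is null.

```python
def rows_missing_precombine(rows, precombine_field="last_update"):
    """Return the rows whose precombine field is absent or None."""
    return [row for row in rows if row.get(precombine_field) is None]


if __name__ == "__main__":
    # Made-up sample rows; the second one has a null precombine key.
    sample = [
        {"contact_id": "1", "country": "IN", "last_update": 1568628566},
        {"contact_id": "2", "country": "US", "last_update": None},
    ]
    for row in rows_missing_precombine(sample):
        print("null precombine key:", row["contact_id"])
```

Treating a missing key the same as an explicit None is a choice made for the sketch, so that both kinds of bad rows are reported together.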

Best,
Gary


On Mon, Sep 16, 2019 at 12:32 PM Kabeer Ahmed  wrote:

> Taher,
>
> Let me spin a test for you to test similar scenario and let me revert back
> to you.
> On Sep 16 2019, at 2:09 pm, Taher Koitawala  wrote:
> > Hi Kabeer, hive table has everything as a string. However when fetching
> > data, the spark query is
> > .sql(String.format("select contact_id,country,cast(last_update as
> > TIMESTAMP) as last_update from %s",hiveTable))
> >
> > On Mon, Sep 16, 2019 at 6:18 PM Kabeer Ahmed wrote:
> > > Is last_update a timestamp? Can you please throw the hive schema that
> > > you are using to create table. You could run show create table and
> > > send us the output please?
> > >
> > > On Sep 16 2019, at 1:32 pm, Taher Koitawala wrote:
> > > > Hi Kaber, Same issue when last_update is converted to long.
> > > >
> > > > HoodieSparkSQLWriter: Registered avro schema : {
> > > > "type" : "record",
> > > > "name" : "s3_master_contacts_list_hudi_record",
> > > > "namespace" : "hoodie.s3_master_contacts_list_hudi",
> > > > "fields" : [ {
> > > > "name" : "contact_id",
> > > > "type" : [ "string", "null" ]
> > > > }, {
> > > > "name" : "country",
> > > > "type" : [ "string", "null" ]
> > > > }, {
> > > > "name" : "last_update",
> > > > "type" : [ "long", "null" ]
> > > > } ]
> > > > }
> > > >
> > > > On Mon, Sep 16, 2019 at 4:17 PM Kabeer Ahmed 
> > > wrote:
> > > > > Taher,
> > > > > This error of field not found exception with HUDI is mostly
> because of
> > > >
> > >
> > > 2
> > > > > cases:
> > > > > The data types of the fields do not match with the types listed in
> hive
> > > > > tables.
> > > > >
> > > > > The field may really not be preset - which doesnt seem to be your
> case.
> > > > > I looked into the schema in your log which is below. Basically
> most of
> > > >
> > >
> > > the
> > > > > items seem to be string but I am not sure what are their types
> that you
> > > > > have defined in Hive. If you look into Hive table definition, you
> may
> > > >
> > >
> > > find
> > > > > the bug soon.
> > > > >
> > > > > On another note, if you are still struggling; then you should try
> to
> > > start
> > > > > with a very small example and keep building it. A ready made code
> copy
> > > >
> > >
> > > is
> > > > > at:
> > > > >
> > >
> https://github.com/apache/incubator-hudi/issues/859#issuecomment-527316262
> > > > > (
> > > > >
> > >
> https://link.getmailspring.com/link/76e27aed-a21c-4d8d-abd6-92e7c2a0c...@getmailspring.com/0?redirect=https%3A%2F%2Fgithub.com%2Fapache%2Fincubator-hudi%2Fissues%2F859%23issuecomment-527316262=ZGV2QGh1ZGkuYXBhY2hlLm9yZw%3D%3D
> > > )
> > > > > written by Vinoth. You must take that small example build it up and
> > > >
> > >
> > > then
> > > > > relate to your own.
> > > > > Let us know if this still doesnt work for you.
> > > > > Thanks
> > > > > Kabeer.
> > > > >
> > > > > > 19/09/16 10:09:26 INFO HoodieSparkSQLWriter: Registered avro
> schema
> > > : {
> > > > > > "type" : "record",
> > > > > > "name" : "s3_master_contacts_list_hudi_record",
> > > > > > "namespace" : "hoodie.s3_master_contacts_list_hudi",
> > > > > > "fields" : [ {
> > > > > > "name" : "contact_id",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "phone_number",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "encrypted_phone_number",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "phone_number_hash",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "first_name",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "last_name",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "email_id",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "encrypted_email_id",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "email_id_hash",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "email_id_1",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "encrypted_email_id_1",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "email_id_1_hash",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "e_domain",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "account_id",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "company",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "company_1",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "flc",
> > > > > > "type" : [ "string", "null" ]
> > > > > > }, {
> > > > > > "name" : "flc_1",
> > > 

Re: Field not found in record HoodieException

2019-09-16 Thread Kabeer Ahmed
Taher,

Let me put together a test for a similar scenario and get back to you.

Re: Field not found in record HoodieException

2019-09-16 Thread Taher Koitawala
Hi Kabeer, the Hive table has every column as a string. However, when fetching
data, the Spark query is:
.sql(String.format("select contact_id,country,cast(last_update as TIMESTAMP) as last_update from %s",hiveTable))
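
One thing worth checking with a cast like this: with Spark's default (non-ANSI) behaviour, a string that cannot be parsed as a timestamp is cast to NULL rather than raising an error. The sketch below only mimics that behaviour in plain Python to show how nulls can appear in last_update after the cast. The accepted format, the sample values, and the helper name cast_to_timestamp are assumptions for illustration, not Spark's actual parser.

```python
from datetime import datetime
from typing import List, Optional


def cast_to_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Mimic a lenient cast: return None when the string cannot be parsed."""
    if value is None:
        return None
    try:
        # Assumed format for this sketch; Spark accepts more variants.
        return datetime.strptime(value.strip(), "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None


# Hypothetical raw string values as they might sit in an all-string Hive table.
raw_values: List[Optional[str]] = ["2019-09-16 10:09:26", "", "16/09/2019", None]

casted = [cast_to_timestamp(v) for v in raw_values]
null_count = sum(1 for v in casted if v is None)
print(null_count)  # 3
```

Counting the nulls after the cast, and comparing against the count before it, shows whether the cast itself is introducing them.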


Re: Field not found in record HoodieException

2019-09-16 Thread Kabeer Ahmed
Is last_update a timestamp? Could you please share the Hive schema that you are
using to create the table? You could run SHOW CREATE TABLE on that table and
send us the output.


Re: Field not found in record HoodieException

2019-09-16 Thread Taher Koitawala
Hi Kabeer, same issue when last_update is converted to long.

HoodieSparkSQLWriter: Registered avro schema : {
  "type" : "record",
  "name" : "s3_master_contacts_list_hudi_record",
  "namespace" : "hoodie.s3_master_contacts_list_hudi",
  "fields" : [ {
    "name" : "contact_id",
    "type" : [ "string", "null" ]
  }, {
    "name" : "country",
    "type" : [ "string", "null" ]
  }, {
    "name" : "last_update",
    "type" : [ "long", "null" ]
  } ]
}
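
A small sketch for ruling out the "field really is missing" case against a registered schema like the one above. It parses the schema JSON and reports which required fields are absent. Treating contact_id as the record key and last_update as the precombine key is an assumption here, and updated_at is a made-up name used only to show a failing lookup.

```python
import json
from typing import Any, Dict, List

# The registered Avro schema from the log, as a JSON string.
REGISTERED_SCHEMA = json.loads("""
{
  "type" : "record",
  "name" : "s3_master_contacts_list_hudi_record",
  "namespace" : "hoodie.s3_master_contacts_list_hudi",
  "fields" : [
    { "name" : "contact_id", "type" : [ "string", "null" ] },
    { "name" : "country", "type" : [ "string", "null" ] },
    { "name" : "last_update", "type" : [ "long", "null" ] }
  ]
}
""")


def missing_fields(schema: Dict[str, Any], required: List[str]) -> List[str]:
    """Return the required field names that the schema does not declare."""
    present = {field["name"] for field in schema["fields"]}
    return [name for name in required if name not in present]


# Assumed key fields for this table.
print(missing_fields(REGISTERED_SCHEMA, ["contact_id", "last_update"]))  # []
# A hypothetical misconfigured precombine field name.
print(missing_fields(REGISTERED_SCHEMA, ["contact_id", "updated_at"]))  # ['updated_at']
```

An empty result means the configured field names exist in the schema, so the problem is more likely in the data or the types than in the names.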


Re: Field not found in record HoodieException

2019-09-16 Thread Kabeer Ahmed
Taher,

The "field not found" exception in Hudi is mostly caused by one of two cases:
1. The data types of the fields do not match the types listed in the Hive
   tables.
2. The field may really not be present, which doesn't seem to be your case.

I looked into the schema in your log, which is below. Most of the items seem
to be strings, but I am not sure what types you have defined for them in Hive.
If you look into the Hive table definition, you may find the bug soon.
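
As a rough illustration of the type comparison described above, the sketch below maps Avro primitive types to the Hive types they normally correspond to and reports any column where the two disagree. The column names and types in the sample dictionaries are hypothetical, and type_mismatches is just a helper name for this sketch; the real inputs would come from the registered Avro schema and the Hive table definition.

```python
from typing import Dict, Optional, Tuple

# Usual Avro primitive type -> Hive type correspondence.
AVRO_TO_HIVE = {
    "string": "string",
    "int": "int",
    "long": "bigint",
    "float": "float",
    "double": "double",
    "boolean": "boolean",
    "bytes": "binary",
}


def type_mismatches(
    avro_fields: Dict[str, str], hive_columns: Dict[str, str]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {field: (avro_type, hive_type)} for fields whose types disagree."""
    mismatches = {}
    for name, avro_type in avro_fields.items():
        expected = AVRO_TO_HIVE.get(avro_type)
        actual = hive_columns.get(name)  # None if Hive lacks the column
        if actual is None or expected != actual.lower():
            mismatches[name] = (avro_type, actual)
    return mismatches


# Hypothetical inputs for illustration only.
avro_fields = {"contact_id": "string", "country": "string", "last_update": "long"}
hive_columns = {"contact_id": "string", "country": "string", "last_update": "string"}

print(type_mismatches(avro_fields, hive_columns))
# {'last_update': ('long', 'string')}
```

Any entry in the result is a column to look at first in the Hive table definition.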

On another note, if you are still struggling, you should try to start with a
very small example and keep building on it. A ready-made code sample written
by Vinoth is at:
https://github.com/apache/incubator-hudi/issues/859#issuecomment-527316262
Take that small example, build it up, and then relate it to your own.
Let us know if this still doesn't work for you.
Thanks
Kabeer.

> 19/09/16 10:09:26 INFO HoodieSparkSQLWriter: Registered avro schema : {
> "type" : "record",
> "name" : "s3_master_contacts_list_hudi_record",
> "namespace" : "hoodie.s3_master_contacts_list_hudi",
> "fields" : [ {
> "name" : "contact_id",
> "type" : [ "string", "null" ]
> }, {
> "name" : "phone_number",
> "type" : [ "string", "null" ]
> }, {
> "name" : "encrypted_phone_number",
> "type" : [ "string", "null" ]
> }, {
> "name" : "phone_number_hash",
> "type" : [ "string", "null" ]
> }, {
> "name" : "first_name",
> "type" : [ "string", "null" ]
> }, {
> "name" : "last_name",
> "type" : [ "string", "null" ]
> }, {
> "name" : "email_id",
> "type" : [ "string", "null" ]
> }, {
> "name" : "encrypted_email_id",
> "type" : [ "string", "null" ]
> }, {
> "name" : "email_id_hash",
> "type" : [ "string", "null" ]
> }, {
> "name" : "email_id_1",
> "type" : [ "string", "null" ]
> }, {
> "name" : "encrypted_email_id_1",
> "type" : [ "string", "null" ]
> }, {
> "name" : "email_id_1_hash",
> "type" : [ "string", "null" ]
> }, {
> "name" : "e_domain",
> "type" : [ "string", "null" ]
> }, {
> "name" : "account_id",
> "type" : [ "string", "null" ]
> }, {
> "name" : "company",
> "type" : [ "string", "null" ]
> }, {
> "name" : "company_1",
> "type" : [ "string", "null" ]
> }, {
> "name" : "flc",
> "type" : [ "string", "null" ]
> }, {
> "name" : "flc_1",
> "type" : [ "string", "null" ]
> }, {
> "name" : "flc_trim",
> "type" : [ "string", "null" ]
> }, {
> "name" : "fln",
> "type" : [ "string", "null" ]
> }, {
> "name" : "title",
> "type" : [ "string", "null" ]
> }, {
> "name" : "title_hash",
> "type" : [ "string", "null" ]
> }, {
> "name" : "address",
> "type" : [ "string", "null" ]
> }, {
> "name" : "zip_code",
> "type" : [ "string", "null" ]
> }, {
> "name" : "country",
> "type" : [ "string", "null" ]
> }, {
> "name" : "city",
> "type" : [ "string", "null" ]
> }, {
> "name" : "website",
> "type" : [ "string", "null" ]
> }, {
> "name" : "website_1",
> "type" : [ "string", "null" ]
> }, {
> "name" : "timezone",
> "type" : [ "string", "null" ]
> }, {
> "name" : "address_2",
> "type" : [ "string", "null" ]
> }, {
> "name" : "state_province",
> "type" : [ "string", "null" ]
> }, {
> "name" : "employees",
> "type" : [ "string", "null" ]
> }, {
> "name" : "employee_range",
> "type" : [ "string", "null" ]
> }, {
> "name" : "rev_range",
> "type" : [ "string", "null" ]
> }, {
> "name" : "std_rev_range",
> "type" : [ "string", "null" ]
> }, {
> "name" : "company_revenue",
> "type" : [ "string", "null" ]
> }, {
> "name" : "sic_code",
> "type" : [ "string", "null" ]
> }, {
> "name" : "nic_code",
> "type" : [ "string", "null" ]
> }, {
> "name" : "primary_industry",
> "type" : [ "string", "null" ]
> }, {
> "name" : "primary_industry_1",
> "type" : [ "string", "null" ]
> }, {
> "name" : "standard_primary_industry",
> "type" : [ "string", "null" ]
> }, {
> "name" : "primary_db_source",
> "type" : [ "string", "null" ]
> }, {
> "name" : "last_r8_email_open",
> "type" : [ "string", "null" ]
> }, {
> "name" : "last_r8_email_click",
> "type" : [ "string", "null" ]
> }, {
> "name" : "last_zd_email_open",
> "type" : [ "string", "null" ]
> }, {
> "name" : "last_zd_email_click",
> "type" : [ "string", "null" ]
> }, {
> "name" : "last_phone_verified",
> "type" : [ "string", "null" ]
> }, {
> "name" : "last_lead_verified",
> "type" : [ "string", "null" ]
> }, {
> "name" : "email_status",
> "type" : [ "string", "null" ]
> }, {
> "name" : "last_email_status_updated_at",
> "type" : [ "string", "null" ]
> }, {
> "name" : "is_firmographically_validated",
> "type" : [ "string", "null" ]
> }, {
> "name" : "last_firmographically_validated_at",
> "type" : [ "string", "null" ]
> }, {
> "name" : "is_demographically_validated",
> "type" : [ "string", "null" ]
> }, {
> "name" : "dq_reason",
> "type" : [ "string", "null" ]
> }, {
> "name" : "dq_subreason",
> "type" : [ "string", "null" ]