Re: Field not found in record HoodieException
Hi Kabeer,

Thanks for the test. I really appreciate the effort you put into this. I will check it and report back to you.

Regards,
Taher Koitawala

On Tue, Sep 24, 2019 at 5:54 PM Kabeer Ahmed wrote:
> Taher,
>
> Sorry I got a bit delayed. I have now put everything you may need in a
> gist at: https://gist.github.com/smdahmed/3af0e3110e07cf76772bb73d5e9b65e2
Re: Field not found in record HoodieException
Taher,

Sorry I got a bit delayed. I have now put everything you may need in a gist at: https://gist.github.com/smdahmed/3af0e3110e07cf76772bb73d5e9b65e2

Note that I am still on 0.4.6, so you may need to swap com.uber.hoodie for the right org.apache.hudi packages etc. I am also still on the RDD-based implementation, but I can assure you that if you swap the code for a DataFrame-based implementation, it will work the same. If you are looking for a DataFrame-based implementation, look at the code sample at: https://github.com/apache/incubator-hudi/issues/859#issuecomment-527316262

In the gist you will find the following:
Code sample to generate parquet
Hive table creation and addition of partitions
Spark shell based code that is in line with what you needed

If you want any changes to be made, please do not hesitate to ask. I can modify the code and spin up tests for you. But I can assure you that this will work and, to the best of my belief, this is what you were aiming to achieve.

Thanks
Kabeer.

On Sep 18 2019, at 5:13 pm, Taher Koitawala wrote:
> Hi Kabeer,
> Really appreciate the help. Take your time, nothing urgent.
>
> Regards,
> Taher Koitawala
>
> On Wed, Sep 18, 2019, 9:38 PM Kabeer Ahmed wrote:
> > Taher,
> > I have half-baked code for the test. I shall complete it, test it and
> > revert back to you, latest by the weekend. Please bear with me. If it is super
> > urgent or you are really stuck, then let me know.
> > Thanks,
> > On Sep 18 2019, at 7:27 am, Gary Li wrote:
> > > I think we can also try to find if there is any illegal character in the column
> > > that could mess up the Avro schema, like a standalone "/" or "."
Re: Field not found in record HoodieException
I think we can also try to find out whether any column name contains an illegal character that could mess up the Avro schema, like a standalone "/" or ".".

On Tue, Sep 17, 2019 at 8:35 PM Vinoth Chandar wrote:
> [Orthogonal comment] It's so awesome to see us troubleshooting together.
> Thanks everyone on this thread!
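Gary's suggestion can be checked mechanically before writing. Below is a minimal Java sketch, assuming the column names are available as a list of strings (with a Spark `Dataset<Row>`, `Arrays.asList(df.columns())` gives that list). It relies on the Avro specification's naming rule: a record field name must start with `[A-Za-z_]` and then contain only `[A-Za-z0-9_]`, so a name with `/` or `.` in it is not a legal field name. The class name `AvroFieldNameCheck` and the sample names `last.update` and `phone/number` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Sketch: report column names that are not legal Avro field names.
// Avro spec: a field name starts with [A-Za-z_] and then contains only [A-Za-z0-9_],
// so a name containing "/" or "." cannot be used as-is in an Avro record schema.
class AvroFieldNameCheck {

    private static final Pattern AVRO_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Returns, in input order, the column names that fail the Avro naming rule. */
    static List<String> invalidNames(List<String> columns) {
        List<String> bad = new ArrayList<>();
        for (String column : columns) {
            // matches() requires the whole name to fit the pattern, not just a prefix.
            if (!AVRO_NAME.matcher(column).matches()) {
                bad.add(column);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Hypothetical input: three names from the thread's schema log plus two made-up bad ones.
        List<String> columns = Arrays.asList(
                "contact_id", "country", "last_update", "last.update", "phone/number");
        System.out.println(invalidNames(columns)); // prints [last.update, phone/number]
    }
}
```

An empty result means the names themselves are fine and the cause lies elsewhere, for example in the data or in the types.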
Re: Field not found in record HoodieException
[Orthogonal comment] It's so awesome to see us troubleshooting together. Thanks everyone on this thread!

On Tue, Sep 17, 2019 at 8:04 PM Taher Koitawala wrote:
> No, there are no nulls in the data and I am getting the same error.
>
> On Wed, Sep 18, 2019, 3:33 AM Kabeer Ahmed wrote:
> > Taher - did you find any NULLs in the data? If you are still not able to
> > make progress, let us know.
Re: Field not found in record HoodieException
Taher - did you find any NULLs in the data? If you are still not able to make progress, let us know.

On Sep 17 2019, at 8:30 am, Taher Koitawala wrote:
> Sure Gary, let me check if I can find any nulls in there.
Re: Field not found in record HoodieException
Sure Gary, let me check if I can find any nulls in there.

On Tue, Sep 17, 2019 at 1:28 AM Gary Li wrote:
> Hello, I have seen this exception before. In my case, if the precombine key
> of one entry is null, then I get this error. I'd recommend checking
> whether any row has a null in *last_update*.
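Gary's hypothesis (a null precombine value in `last_update`) can be tested directly. The sketch below is plain Java over rows modelled as maps, a hypothetical stand-in for Spark rows so that it runs without Spark; on a Spark `Dataset<Row>` the equivalent count is roughly `df.filter(df.col("last_update").isNull()).count()`. It may be worth running that check on the DataFrame after the `cast(last_update as TIMESTAMP)` in Taher's query, not only on the raw Hive strings: in Spark's default (non-ANSI) mode a string that cannot be parsed as a timestamp is cast to null, so a column with no nulls in Hive can still yield some. The class name, helper and values are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: find rows whose precombine field is null or missing.
// Rows are plain maps here, a hypothetical stand-in for Spark rows.
class PrecombineNullCheck {

    /** Returns the indexes of rows where {@code field} is absent or null. */
    static List<Integer> rowsWithNullField(List<Map<String, Object>> rows, String field) {
        List<Integer> offending = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            // Map.get returns null both for a missing key and for an explicit null value.
            if (rows.get(i).get(field) == null) {
                offending.add(i);
            }
        }
        return offending;
    }

    // Hypothetical helper that builds one row with the two columns used here.
    private static Map<String, Object> row(String contactId, Object lastUpdate) {
        Map<String, Object> r = new HashMap<>();
        r.put("contact_id", contactId);
        r.put("last_update", lastUpdate);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = Arrays.asList(
                row("c1", 1L),
                row("c2", null), // a null precombine value, the case Gary describes
                row("c3", 3L));
        System.out.println(rowsWithNullField(rows, "last_update")); // prints [1]
    }
}
```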
Re: Field not found in record HoodieException
Hello, I have seen this exception before. In my case, if the precombine key of one entry is null, then I get this error. I'd recommend checking whether any row has a null in *last_update*.

Best,
Gary

On Mon, Sep 16, 2019 at 12:32 PM Kabeer Ahmed wrote:
> Taher,
>
> Let me spin up a test for a similar scenario and revert back
> to you.
> On Sep 16 2019, at 2:09 pm, Taher Koitawala wrote:
> > Hi Kabeer, the Hive table has everything as a string. However, when fetching
> > data, the Spark query is
> > .sql(String.format("select contact_id,country,cast(last_update as TIMESTAMP) as last_update from %s",hiveTable))
> >
> > On Mon, Sep 16, 2019 at 6:18 PM Kabeer Ahmed wrote:
> > > Is last_update a timestamp? Can you please share the Hive schema that you
> > > are using to create the table? You could run show create table
> > > and send us the output please?
> > >
> > > On Sep 16 2019, at 1:32 pm, Taher Koitawala wrote:
> > > > Hi Kabeer, same issue when last_update is converted to long.
> > > >
> > > > HoodieSparkSQLWriter: Registered avro schema : {
> > > >   "type" : "record",
> > > >   "name" : "s3_master_contacts_list_hudi_record",
> > > >   "namespace" : "hoodie.s3_master_contacts_list_hudi",
> > > >   "fields" : [ {
> > > >     "name" : "contact_id",
> > > >     "type" : [ "string", "null" ]
> > > >   }, {
> > > >     "name" : "country",
> > > >     "type" : [ "string", "null" ]
> > > >   }, {
> > > >     "name" : "last_update",
> > > >     "type" : [ "long", "null" ]
> > > >   } ]
> > > > }
> > > >
> > > > On Mon, Sep 16, 2019 at 4:17 PM Kabeer Ahmed wrote:
> > > > > Taher,
> > > > > This "field not found" exception with Hudi is mostly caused by one of
> > > > > 2 cases:
> > > > > 1. The data types of the fields do not match the types listed in the
> > > > > Hive table.
> > > > > 2. The field is really not present - which doesn't seem to be your case.
> > > > >
> > > > > I looked into the schema in your log, which is below. Basically most of
> > > > > the items seem to be strings, but I am not sure what types you have
> > > > > defined for them in Hive. If you look into the Hive table definition, you
> > > > > may find the bug soon.
> > > > >
> > > > > On another note, if you are still struggling, then you should try to
> > > > > start with a very small example and keep building it. A ready-made code
> > > > > copy written by Vinoth is at:
> > > > > https://github.com/apache/incubator-hudi/issues/859#issuecomment-527316262
> > > > > You must take that small example, build it up and then relate it to your own.
> > > > > Let us know if this still doesn't work for you.
> > > > > Thanks
> > > > > Kabeer.
> > > > > > > > > > > 19/09/16 10:09:26 INFO HoodieSparkSQLWriter: Registered avro > schema > > > : { > > > > > > "type" : "record", > > > > > > "name" : "s3_master_contacts_list_hudi_record", > > > > > > "namespace" : "hoodie.s3_master_contacts_list_hudi", > > > > > > "fields" : [ { > > > > > > "name" : "contact_id", > > > > > > "type" : [ "string", "null" ] > > > > > > }, { > > > > > > "name" : "phone_number", > > > > > > "type" : [ "string", "null" ] > > > > > > }, { > > > > > > "name" : "encrypted_phone_number", > > > > > > "type" : [ "string", "null" ] > > > > > > }, { > > > > > > "name" : "phone_number_hash", > > > > > > "type" : [ "string", "null" ] > > > > > > }, { > > > > > > "name" : "first_name", > > > > > > "type" : [ "string", "null" ] > > > > > > }, { > > > > > > "name" : "last_name", > > > > > > "type" : [ "string", "null" ] > > > > > > }, { > > > > > > "name" : "email_id", > > > > > > "type" : [ "string", "null" ] > > > > > > }, { > > > > > > "name" : "encrypted_email_id", > > > > > > "type" : [ "string", "null" ] > > > > > > }, { > > > > > > "name" : "email_id_hash", > > > > > > "type" : [ "string", "null" ] > > > > > > }, { > > > > > > "name" : "email_id_1", > > > > > > "type" : [ "string", "null" ] > > > > > > }, { > > > > > > "name" : "encrypted_email_id_1", > > > > > > "type" : [ "string", "null" ] > > > > > > }, { > > > > > > "name" : "email_id_1_hash", > > > > > > "type" : [ "string", "null" ] > > > > > > }, { > > > > > > "name" : "e_domain", > > > > > > "type" : [ "string", "null" ] > > > > > > }, { > > > > > > "name" : "account_id", > > > > > > "type" : [ "string", "null" ] > > > > > > }, { > > > > > > "name" : "company", > > > > > > "type" : [ "string", "null" ] > > > > > > }, { > > > > > > "name" : "company_1", > > > > > > "type" : [ "string", "null" ] > > > > > > }, { > > > > > > "name" : "flc", > > > > > > "type" : [ "string", "null" ] > > > > > > }, { > > > > > > "name" : "flc_1", > > >
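Gary's check can be sketched in a few lines. The Java below is a minimal illustration, assuming rows are modelled as plain `Map<String, Object>` values (a stand-in for the DataFrame rows), with contact_id assumed to be the record key and last_update the precombine field; the sample rows are made up. On the actual DataFrame, the equivalent Spark check would be along the lines of `df.filter(col("last_update").isNull()).count()`, using `org.apache.spark.sql.functions.col`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch: each row is a Map from column name to value, standing in for a
// DataFrame row. Field names follow the thread; the sample data is made up.
class NullPrecombineCheck {

    // Returns the record keys of all rows whose precombine field is null or absent.
    static List<Object> rowsWithNullPrecombine(List<Map<String, Object>> rows,
                                               String recordKeyField,
                                               String precombineField) {
        List<Object> offenders = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            // Map.get returns null both for an explicit null and for a missing key.
            if (row.get(precombineField) == null) {
                offenders.add(row.get(recordKeyField));
            }
        }
        return offenders;
    }

    // Small helper to build an example row (HashMap permits null values).
    static Map<String, Object> row(String contactId, String country, Long lastUpdate) {
        Map<String, Object> r = new HashMap<>();
        r.put("contact_id", contactId);
        r.put("country", country);
        r.put("last_update", lastUpdate);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(
                row("c1", "IN", 1568628566000L),
                row("c2", "US", null),
                row("c3", "UK", 1568628600000L));
        // Prints the keys of rows that would need fixing before the Hudi write.
        System.out.println(rowsWithNullPrecombine(rows, "contact_id", "last_update"));
    }
}
```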
Re: Field not found in record HoodieException
Taher,

Let me spin up a test for a similar scenario and get back to you.

On Sep 16 2019, at 2:09 pm, Taher Koitawala wrote:
> Hi Kabeer, the Hive table has every column as a string. However, when
> fetching data, the Spark query is
> .sql(String.format("select contact_id,country,cast(last_update as TIMESTAMP) as last_update from %s", hiveTable))
Re: Field not found in record HoodieException
Hi Kabeer,

The Hive table has every column as a string. However, when fetching data, the Spark query is:

.sql(String.format("select contact_id,country,cast(last_update as TIMESTAMP) as last_update from %s", hiveTable))

On Mon, Sep 16, 2019 at 6:18 PM Kabeer Ahmed wrote:
> Is last_update a timestamp? Can you please send the Hive schema that you
> are using to create the table? You could run show create table and
> send us the output.
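Since every Hive column is a string, the cast(last_update as TIMESTAMP) in this query is one place where nulls could creep in: with ANSI mode off (the default in Spark 2.x), Spark casts a string it cannot parse as a timestamp to NULL instead of failing, and a null precombine value is what Gary describes. The Java sketch below imitates that lenient cast with java.time so that suspicious raw values can be listed. The yyyy-MM-dd HH:mm:ss pattern is an assumption about the data, and Spark's own parser accepts more formats than this one, so treat it as an approximation rather than a replica of Spark's cast.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: mimics a lenient string -> timestamp cast that yields null on bad input.
// The pattern below is an assumed format for last_update, not taken from the thread.
class LenientTimestampCast {

    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Returns the parsed value, or null when the string cannot be parsed,
    // roughly what a non-ANSI CAST(... AS TIMESTAMP) does.
    static LocalDateTime castOrNull(String raw) {
        if (raw == null) {
            return null;
        }
        try {
            return LocalDateTime.parse(raw.trim(), FORMAT);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    // Collects the raw strings that would silently become null precombine values.
    static List<String> unparseable(List<String> rawValues) {
        List<String> bad = new ArrayList<>();
        for (String raw : rawValues) {
            if (castOrNull(raw) == null) {
                bad.add(raw);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Arrays.asList is used because it accepts null elements, unlike List.of.
        List<String> raw = Arrays.asList("2019-09-16 10:09:26", "16/09/2019 10:09", "", null);
        System.out.println(unparseable(raw));
    }
}
```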
Re: Field not found in record HoodieException
Is last_update a timestamp? Can you please send the Hive schema that you are using to create the table? You could run show create table and send us the output.

On Sep 16 2019, at 1:32 pm, Taher Koitawala wrote:
> Hi Kabeer, same issue when last_update is converted to long.
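Taher reports the same error after converting last_update to long. That would still be consistent with Gary's diagnosis if some rows are null: the schema registered for that run declares last_update as the Avro union [ "long", "null" ], which accepts nulls, and converting a null value to long still gives null. If null precombine values turn out to be the cause, the affected rows have to be either dropped or given a fallback value before the write. The Java sketch below shows both options on plain maps standing in for DataFrame rows. The fallback of 0L is a hypothetical choice; assuming the larger precombine value wins when two records share a key, a row filled this way would lose against any row that carries a real timestamp. On a Spark DataFrame the corresponding calls would be along the lines of `df.na().drop(new String[]{"last_update"})` or `functions.coalesce(col("last_update"), lit(0L))`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of two ways to handle null precombine values before a write.
// Rows are plain maps standing in for DataFrame rows; the fallback is hypothetical.
class PrecombineNullHandling {

    // Option 1: keep only rows that have a non-null precombine value.
    static List<Map<String, Object>> dropNulls(List<Map<String, Object>> rows, String field) {
        List<Map<String, Object>> kept = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            if (row.get(field) != null) {
                kept.add(row);
            }
        }
        return kept;
    }

    // Option 2: substitute a fallback for null (or missing) precombine values.
    // Each row is copied so the caller's data is left untouched.
    static List<Map<String, Object>> fillNulls(List<Map<String, Object>> rows,
                                               String field, long fallback) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            Map<String, Object> copy = new HashMap<>(row);
            if (copy.get(field) == null) {
                copy.put(field, fallback);
            }
            out.add(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> a = new HashMap<>();
        a.put("contact_id", "c1");
        a.put("last_update", 1568628566000L);
        Map<String, Object> b = new HashMap<>();
        b.put("contact_id", "c2");
        b.put("last_update", null);
        List<Map<String, Object>> rows = List.of(a, b);

        System.out.println(dropNulls(rows, "last_update").size());
        System.out.println(fillNulls(rows, "last_update", 0L).get(1).get("last_update"));
    }
}
```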
Re: Field not found in record HoodieException
Hi Kabeer,

Same issue when last_update is converted to long:

HoodieSparkSQLWriter: Registered avro schema : {
  "type" : "record",
  "name" : "s3_master_contacts_list_hudi_record",
  "namespace" : "hoodie.s3_master_contacts_list_hudi",
  "fields" : [ {
    "name" : "contact_id",
    "type" : [ "string", "null" ]
  }, {
    "name" : "country",
    "type" : [ "string", "null" ]
  }, {
    "name" : "last_update",
    "type" : [ "long", "null" ]
  } ]
}

On Mon, Sep 16, 2019 at 4:17 PM Kabeer Ahmed wrote:
> Taher,
>
> This "field not found" exception with Hudi is mostly caused by one of two
> cases:
> 1. The data types of the fields do not match the types listed in the Hive
>    table.
> 2. The field really is not present, which doesn't seem to be your case.
Re: Field not found in record HoodieException
Taher,

This "field not found" exception with Hudi is mostly caused by one of two cases:
1. The data types of the fields do not match the types listed in the Hive table.
2. The field really is not present, which doesn't seem to be your case.

I looked into the schema in your log, which is below. Most of the items seem to be strings, but I am not sure what types you have defined for them in Hive. If you look at the Hive table definition, you may find the bug soon.

On another note, if you are still struggling, you should start with a very small example and keep building it up. A ready-made code sample written by Vinoth is at:
https://github.com/apache/incubator-hudi/issues/859#issuecomment-527316262
Take that small example, build it up, and then relate it to your own.

Let us know if this still doesn't work for you.

Thanks
Kabeer.
> 19/09/16 10:09:26 INFO HoodieSparkSQLWriter: Registered avro schema : {
> "type" : "record",
> "name" : "s3_master_contacts_list_hudi_record",
> "namespace" : "hoodie.s3_master_contacts_list_hudi",
> "fields" : [
>   { "name" : "contact_id", "type" : [ "string", "null" ] },
>   { "name" : "phone_number", "type" : [ "string", "null" ] },
>   { "name" : "encrypted_phone_number", "type" : [ "string", "null" ] },
>   { "name" : "phone_number_hash", "type" : [ "string", "null" ] },
>   { "name" : "first_name", "type" : [ "string", "null" ] },
>   { "name" : "last_name", "type" : [ "string", "null" ] },
>   { "name" : "email_id", "type" : [ "string", "null" ] },
>   { "name" : "encrypted_email_id", "type" : [ "string", "null" ] },
>   { "name" : "email_id_hash", "type" : [ "string", "null" ] },
>   { "name" : "email_id_1", "type" : [ "string", "null" ] },
>   { "name" : "encrypted_email_id_1", "type" : [ "string", "null" ] },
>   { "name" : "email_id_1_hash", "type" : [ "string", "null" ] },
>   { "name" : "e_domain", "type" : [ "string", "null" ] },
>   { "name" : "account_id", "type" : [ "string", "null" ] },
>   { "name" : "company", "type" : [ "string", "null" ] },
>   { "name" : "company_1", "type" : [ "string", "null" ] },
>   { "name" : "flc", "type" : [ "string", "null" ] },
>   { "name" : "flc_1", "type" : [ "string", "null" ] },
>   { "name" : "flc_trim", "type" : [ "string", "null" ] },
>   { "name" : "fln", "type" : [ "string", "null" ] },
>   { "name" : "title", "type" : [ "string", "null" ] },
>   { "name" : "title_hash", "type" : [ "string", "null" ] },
>   { "name" : "address", "type" : [ "string", "null" ] },
>   { "name" : "zip_code", "type" : [ "string", "null" ] },
>   { "name" : "country", "type" : [ "string", "null" ] },
>   { "name" : "city", "type" : [ "string", "null" ] },
>   { "name" : "website", "type" : [ "string", "null" ] },
>   { "name" : "website_1", "type" : [ "string", "null" ] },
>   { "name" : "timezone", "type" : [ "string", "null" ] },
>   { "name" : "address_2", "type" : [ "string", "null" ] },
>   { "name" : "state_province", "type" : [ "string", "null" ] },
>   { "name" : "employees", "type" : [ "string", "null" ] },
>   { "name" : "employee_range", "type" : [ "string", "null" ] },
>   { "name" : "rev_range", "type" : [ "string", "null" ] },
>   { "name" : "std_rev_range", "type" : [ "string", "null" ] },
>   { "name" : "company_revenue", "type" : [ "string", "null" ] },
>   { "name" : "sic_code", "type" : [ "string", "null" ] },
>   { "name" : "nic_code", "type" : [ "string", "null" ] },
>   { "name" : "primary_industry", "type" : [ "string", "null" ] },
>   { "name" : "primary_industry_1", "type" : [ "string", "null" ] },
>   { "name" : "standard_primary_industry", "type" : [ "string", "null" ] },
>   { "name" : "primary_db_source", "type" : [ "string", "null" ] },
>   { "name" : "last_r8_email_open", "type" : [ "string", "null" ] },
>   { "name" : "last_r8_email_click", "type" : [ "string", "null" ] },
>   { "name" : "last_zd_email_open", "type" : [ "string", "null" ] },
>   { "name" : "last_zd_email_click", "type" : [ "string", "null" ] },
>   { "name" : "last_phone_verified", "type" : [ "string", "null" ] },
>   { "name" : "last_lead_verified", "type" : [ "string", "null" ] },
>   { "name" : "email_status", "type" : [ "string", "null" ] },
>   { "name" : "last_email_status_updated_at", "type" : [ "string", "null" ] },
>   { "name" : "is_firmographically_validated", "type" : [ "string", "null" ] },
>   { "name" : "last_firmographically_validated_at", "type" : [ "string", "null" ] },
>   { "name" : "is_demographically_validated", "type" : [ "string", "null" ] },
>   { "name" : "dq_reason", "type" : [ "string", "null" ] },
>   { "name" : "dq_subreason", "type" : [ "string", "null" ]
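Kabeer's two causes (a field whose type does not match the Hive table, or a field that is genuinely absent from the record) can be checked mechanically once both schemas are written down. The Java sketch below takes the fields the write depends on (for example the record key and precombine field), a map of writer fields to Avro types, and a map of Hive columns to Hive types. The Avro-to-Hive type mapping in it is a simplified assumption covering only a few primitive types; in practice the inputs would be read off the registered Avro schema in the log and the output of show create table. The example values mirror the thread: Hive has every column as a string, while the registered schema has last_update as long.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of Kabeer's two checks. The Avro -> Hive type mapping below is a
// simplified assumption covering a few primitive types only.
class SchemaVsHiveCheck {

    private static final Map<String, String> AVRO_TO_HIVE = Map.of(
            "string", "string",
            "long", "bigint",
            "int", "int",
            "double", "double",
            "boolean", "boolean");

    // requiredFields: e.g. the record key and precombine field configured for the write.
    // avroFields: field name -> non-null branch of its Avro union type.
    // hiveColumns: column name -> Hive type, e.g. as read off show create table.
    static List<String> problems(List<String> requiredFields,
                                 Map<String, String> avroFields,
                                 Map<String, String> hiveColumns) {
        List<String> found = new ArrayList<>();
        // Missing field: something the write depends on is not in the record at all.
        for (String field : requiredFields) {
            if (!avroFields.containsKey(field)) {
                found.add("not in record schema: " + field);
            }
        }
        // Type mismatch: the field exists on both sides but the types disagree.
        for (Map.Entry<String, String> e : avroFields.entrySet()) {
            String hiveType = hiveColumns.get(e.getKey());
            if (hiveType == null) {
                continue; // not a Hive column, nothing to compare
            }
            String expected = AVRO_TO_HIVE.get(e.getValue());
            if (expected == null || !expected.equalsIgnoreCase(hiveType)) {
                found.add("type mismatch: " + e.getKey()
                        + " (avro " + e.getValue() + ", hive " + hiveType + ")");
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, String> avro = new LinkedHashMap<>();
        avro.put("contact_id", "string");
        avro.put("country", "string");
        avro.put("last_update", "long");
        Map<String, String> hive = new LinkedHashMap<>();
        hive.put("contact_id", "string");
        hive.put("country", "string");
        hive.put("last_update", "string");
        problems(List.of("contact_id", "last_update"), avro, hive)
                .forEach(System.out::println);
    }
}
```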