Re: Upsert after Delete

2019-09-02 Thread Kabeer Ahmed

Re: Upsert after Delete

2019-08-31 Thread Vinoth Chandar


Re: Upsert after Delete

2019-08-31 Thread Jaimin Shah


Re: Upsert after Delete

2019-08-30 Thread Vinoth Chandar


Re: Upsert after Delete

2019-08-30 Thread Kabeer Ahmed



Re: Upsert after Delete

2019-08-30 Thread Vinoth Chandar
> > > > > > > /**
> > > > > > >  * Empty payload used for deletions
> > > > > > >  */
> > > > > > > public class EmptyHoodieRecordPayload
> > > > > > >     implements HoodieRecordPayload<EmptyHoodieRecordPayload> {
> > > > > > >
> > > > > > >   public EmptyHoodieRecordPayload(GenericRecord record,
> > > > > > >       Comparable orderingVal) { }
> > > > > > >
> > > > > > >   @Override
> > > > > > >   public EmptyHoodieRecordPayload preCombine(EmptyHoodieRecordPayload another) {
> > > > > > >     return another;
> > > > > > >   }
> > > > > > >
> > > > > > >   // An empty result means there is no merged record to write back.
> > > > > > >   @Override
> > > > > > >   public Optional<IndexedRecord> combineAndGetUpdateValue(
> > > > > > >       IndexedRecord currentValue, Schema schema) {
> > > > > > >     return Optional.empty();
> > > > > > >   }
> > > > > > >
> > > > > > >   // An empty result means there is nothing to insert for this key.
> > > > > > >   @Override
> > > > > > >   public Optional<IndexedRecord> getInsertValue(Schema schema) {
> > > > > > >     return Optional.empty();
> > > > > > >   }
> > > > > > > }
> > > > >
> > > > > -- Forwarded Message -
> > > > >
> > > > > From: Vinoth Chandar 
> > > > > Subject: Re: Upsert after Delete
> > > > > Date: Aug 22 2019, at 8:38 pm
> > > > > To: dev@hudi.apache.org
> > > > >
> > > > > That’s interesting. Can you also share details on storage type and
> > how
> > > > you
> > > > > are issuing the deletes and also the table/view (ro, rt) that you
> are
> > > > > querying?
> > > > >
> > > > > On Thu, Aug 22, 2019 at 9:49 AM Kabeer Ahmed  >
> > > > wrote:
> > > > > > Hudi experts and Users,
> > > > > > Has anyone attempted an upsert after a delete? Here is a weird
> > thing
> > > > >
> > > >
> > > > that
> > > > > > I have bumped into and it is a shame that this has come up when
> > > > >
> > > >
> > > > someone in
> > > > > > the team tested this whilst I failed to run this test.
> > > > > > Use case:
> > > > > > Insert data into a table. Say records (1, kabeer | 2, vinoth)
> > > > > >
> > > > > > Delete a record (1, kabeer). Data in the table is: (2, vinoth)
> and
> > it
> > > > is
> > > > > > visible via sql through Presto/Hive.
> > > > > >
> > > > > > Upsert a new record into the same table (3, balaji). Query the
> > table
> > > > and
> > > > > > only record that is visible is: (3, balaji). The record (2,
> > vinoth) is
> > > > >
> > > >
> > > > not
> > > > > > displayed in the results.
> > > > > >
> > > > > > Any ideas on what could be at play here? Has someone done upsert
> > after
> > > > > > delete?
> > > > > >
> > > > > > Thanks,
> > > > > > Kabeer
> > > > > >
> > > > > > PS: Please note that upsert functionality is well tested and if
> we
> > do
> > > > (1,
> > > > > > vinoth) insert followed by upsert of (2, balaji) both the records
> > are
> > > > > > visible. So something else is at play and would appreciate any
> help
> > > > >
> > > >
> > > > that
> > > > > > you experts can provide insight.
> > > > >
> > > >
> > >
> > >
> >
>
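
The outcome that the quoted use case expects can be pinned down with a small, self-contained model. This is only a sketch of the intended semantics, not Hudi code: Payload, ValuePayload, EmptyPayload and TinyTable are hypothetical stand-ins (the real HoodieRecordPayload works with Avro records and a Schema), and the table is a plain in-memory map. It assumes that a payload returning Optional.empty() means "this key must not survive the write", which is how the empty payload in the thread is meant to behave.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical stand-in for a record payload: an empty result means "delete". */
interface Payload {
    Optional<String> combineAndGetUpdateValue(String currentValue);
    Optional<String> getInsertValue();
}

/** Payload that carries a value; an upsert overwrites or inserts it. */
final class ValuePayload implements Payload {
    private final String value;
    ValuePayload(String value) { this.value = value; }
    public Optional<String> combineAndGetUpdateValue(String currentValue) { return Optional.of(value); }
    public Optional<String> getInsertValue() { return Optional.of(value); }
}

/** Mirrors the idea of the empty payload in the thread: always yields nothing. */
final class EmptyPayload implements Payload {
    public Optional<String> combineAndGetUpdateValue(String currentValue) { return Optional.empty(); }
    public Optional<String> getInsertValue() { return Optional.empty(); }
}

/** In-memory model of a keyed table; this is not how Hudi stores data. */
final class TinyTable {
    private final Map<Integer, String> rows = new TreeMap<>();

    /** Applies one write: merge with the existing row if present, else insert. */
    void upsert(int key, Payload payload) {
        Optional<String> merged = rows.containsKey(key)
                ? payload.combineAndGetUpdateValue(rows.get(key))
                : payload.getInsertValue();
        if (merged.isPresent()) {
            rows.put(key, merged.get());
        } else {
            rows.remove(key); // empty result: the key is dropped, other keys are untouched
        }
    }

    Map<Integer, String> snapshot() { return new TreeMap<>(rows); }
}

final class UpsertAfterDeleteSketch {
    /** Replays the three writes from the thread and returns the final rows. */
    static Map<Integer, String> replay() {
        TinyTable table = new TinyTable();
        table.upsert(1, new ValuePayload("kabeer")); // write 1: insert two records
        table.upsert(2, new ValuePayload("vinoth"));
        table.upsert(1, new EmptyPayload());         // write 2: delete key 1
        table.upsert(3, new ValuePayload("balaji")); // write 3: upsert a new key
        return table.snapshot();
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints {2=vinoth, 3=balaji}
    }
}
```

In this model key 2 survives the third write because only the key being written is touched. The behaviour reported in the thread corresponds to key 2 going missing at that step, which is why it looks as if an extra delete happened in between.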


Re: Upsert after Delete

2019-08-29 Thread Jaimin Shah
Hi,
I remember I was also facing some issues with deletes; maybe the two issues
are related? After deletes I was not able to query the data. At that time this
issue was filed:
https://jira.apache.org/jira/projects/HUDI/issues/HUDI-107?filter=allopenissues
Is this issue now resolved?

Thanks,
Jaimin


Re: Upsert after Delete

2019-08-28 Thread vbal...@apache.org
 
Hi Kabeer,
I have requested some information in the github ticket. 
Balaji.V

On Wednesday, August 28, 2019, 10:46:04 AM PDT, Kabeer Ahmed wrote:

Thanks for the quick response, Vinoth. That is what I would have thought:
there is nothing complex or different about an upsert after a delete. Yes, I can
reproduce the issue with the simple example that I wrote in the email.

I have dug into the issue in detail and it seems to be a bug. I have filed it
at: https://github.com/apache/incubator-hudi/issues/859
Let me know if more information is required.
Thank you,

On Aug 23 2019, at 1:37 am, Vinoth Chandar  wrote:
> Yes, I was asking about the HUDI storage type.
>
> There is nothing complex about upsert() after delete(). It is almost as if a
> delete() for (2, vinoth) happened in between.
>
> Are you able to repro this literally with this tiny example with 3 records?
> Some things to check:
>
> - This sequence would have created 3 commits. You can look at the commit
> files and see if the numbers of records updated, inserted and deleted match
> expectations.
> - If they do, then you can use spark.read.parquet(.). on the individual
> parquet files and see what records they actually contain.
>
> This should shed some light on the pattern of failure and when exactly (2,
> vinoth) disappeared.
>
> Alternatively, if you can give a small snippet that reproduces this, we can
> debug from there.
>
> On Thu, Aug 22, 2019 at 3:06 PM Kabeer Ahmed  wrote:
> > And if you meant HUDI storage type, I have left it to default COW - Copy
> > On Write.
> >
> > If anyone has tried this please let me know if you have hit a similar issue.
> > Any experience would be greatly helpful.
> > On Aug 22 2019, at 11:01 pm, Kabeer Ahmed wrote:
> > > Hi Vinoth - thanks for the quick response.
> > >
> > > I have followed the mail thread for deletes ->
> > > http://mail-archives.apache.org/mod_mbox/hudi-commits/201904.mbox/<16722511.2660.9583626796839453...@gitbox.apache.org>
> > >
> > > For your convenience, the code that I use is below at the end of the
> > > email. EmptyHoodieRecord is inserted for the relevant records that need to
> > > be deleted. After the delete, I can query from Hive and confirm that the
> > > rows intended to be deleted are no longer present and the records not
> > > deleted can be seen in the Hive table via Hive and Presto.
> > > The issue starts when the upsert is done after a delete.
> > > The storage type is S3 and I don't think there is any eventual
> > > consistency in play as the record upserted is visible but the old records
> > > that weren't deleted are not visible.
> > > And for the sake of completion, my insert and upsert logic is based on
> > > the code below:
> > > https://github.com/apache/incubator-hudi/blob/a4f9d7575f39bb79089714049ffea12ba5f25ec8/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L43
> > > Thanks
> > > Kabeer.
> > >
> > > > /**
> > > >  * Empty payload used for deletions
> > > >  */
> > > > public class EmptyHoodieRecordPayload
> > > >     implements HoodieRecordPayload<EmptyHoodieRecordPayload> {
> > > >
> > > >   public EmptyHoodieRecordPayload(GenericRecord record, Comparable orderingVal) { }
> > > >
> > > >   @Override
> > > >   public EmptyHoodieRecordPayload preCombine(EmptyHoodieRecordPayload another) {
> > > >     return another;
> > > >   }
> > > >
> > > >   // An empty result means there is no merged record to write back.
> > > >   @Override
> > > >   public Optional<IndexedRecord> combineAndGetUpdateValue(
> > > >       IndexedRecord currentValue, Schema schema) {
> > > >     return Optional.empty();
> > > >   }
> > > >
> > > >   // An empty result means there is nothing to insert for this key.
> > > >   @Override
> > > >   public Optional<IndexedRecord> getInsertValue(Schema schema) {
> > > >     return Optional.empty();
> > > >   }
> > > > }
> > >
> > > -- Forwarded Message -
> > >
> > > From: Vinoth Chandar 
> > > Subject: Re: Upsert after Delete
> > > Date: Aug 22 2019, at 8:38 pm
> > > To: dev@hudi.apache.org
> > >
> > > That’s interesting. Can you also share details on storage type and how
> > > you are issuing the deletes and also the table/view (ro, rt) that you are
> > > querying?
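
Vinoth's suggestion in the quoted reply (look at what each commit contains and work out when exactly (2, vinoth) disappeared) boils down to comparing the set of record keys visible after each commit. A minimal sketch of that comparison follows. It assumes the key sets have already been collected by some other means, for example by reading the individual parquet files as suggested; the class name, method names, commit labels and sample data are all hypothetical, and nothing here calls Hudi or Spark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class DisappearanceFinder {
    /**
     * Walks commits in order and reports keys that were visible after the previous
     * commit, are gone after the current one, and were not explicitly deleted in it.
     */
    static List<String> unexpectedLosses(Map<String, Set<Integer>> keysAfterCommit,
                                         Map<String, Set<Integer>> deletesInCommit) {
        List<String> findings = new ArrayList<>();
        Set<Integer> previous = new TreeSet<>();
        for (Map.Entry<String, Set<Integer>> commit : keysAfterCommit.entrySet()) {
            Set<Integer> lost = new TreeSet<>(previous);
            lost.removeAll(commit.getValue()); // still visible, so not lost
            Set<Integer> deleted = deletesInCommit.get(commit.getKey());
            lost.removeAll(deleted != null ? deleted : Collections.<Integer>emptySet());
            for (Integer key : lost) {
                findings.add(commit.getKey() + ": key " + key + " disappeared");
            }
            previous = commit.getValue();
        }
        return findings;
    }

    /** Illustrative data shaped like the symptom described in the thread. */
    static List<String> example() {
        Map<String, Set<Integer>> keys = new LinkedHashMap<>(); // keeps commit order
        keys.put("commit-1", new TreeSet<>(Arrays.asList(1, 2))); // after the insert
        keys.put("commit-2", new TreeSet<>(Arrays.asList(2)));    // after deleting key 1
        keys.put("commit-3", new TreeSet<>(Arrays.asList(3)));    // after the upsert
        Map<String, Set<Integer>> deletes = new LinkedHashMap<>();
        deletes.put("commit-2", new TreeSet<>(Arrays.asList(1)));
        return unexpectedLosses(keys, deletes);
    }

    public static void main(String[] args) {
        System.out.println(example()); // prints [commit-3: key 2 disappeared]
    }
}
```

Key 1 is not reported because its removal is accounted for by the delete in the second commit. Only key 2 is flagged, at the third commit, which is the kind of pattern the suggested inspection is meant to surface.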

Re: Upsert after Delete

2019-08-28 Thread Kabeer Ahmed
Thanks for the quick response, Vinoth. That is what I would have thought:
there is nothing complex or different about an upsert after a delete. Yes, I can
reproduce the issue with the simple example that I wrote in the email.

I have dug into the issue in detail and it seems to be a bug. I have filed it
at: https://github.com/apache/incubator-hudi/issues/859
Let me know if more information is required.
Thank you,

On Aug 23 2019, at 1:37 am, Vinoth Chandar  wrote:
> yes. I was asking about the HUDI storage type..
>
> There is nothing complex about upsert() after delete(). It is almost as if a
> delete() for (2, vinoth) happened in between.
>
> Are you able to repro this literally with this tiny example with 3 records?
> Some things to check
>
> - This sequence would have created 3 commits. You can look at the commit
> files and see if the number of records updated, inserted, deleted match
> expectations.
> - If they do, then you can use spark.read.parquet(.). on the individual
> parquet files and see what records they actually contain.
>
> This should shed some light on the pattern of failure and when exactly (2,
> vinoth) disappeared.
>
> Alternatively, if you can give a small snippet that reproduces this, we can
> debug from there.
>
>
>
>
>
>
> On Thu, Aug 22, 2019 at 3:06 PM Kabeer Ahmed  wrote:
> > And if you meant HUDI storage type, I have left it at the default, COW
> > (Copy On Write).
> >
> > If anyone has tried this, please let me know if you have hit a similar issue.
> > Any experience would be greatly helpful.
> > On Aug 22 2019, at 11:01 pm, Kabeer Ahmed  wrote:
> > > Hi Vinoth - thanks for the quick response.
> > >
> > > I have followed the mail thread for deletes ->
> > > http://mail-archives.apache.org/mod_mbox/hudi-commits/201904.mbox/<16722511.2660.9583626796839453...@gitbox.apache.org>
> > >
> > > For your convenience, the code that I use is below at the end of the
> > > email. EmptyHoodieRecord is inserted for the relevant records that need to
> > > be deleted. After the delete, I can query from Hive and confirm that the
> > > rows intended to be deleted are no longer present and the records not
> > > deleted can be seen in the Hive table via Hive and Presto.
> > > The issue starts when the upsert is done after a delete.
> > > The storage type is S3 and I don't think there is any eventual
> > > consistency in play, as the upserted record is visible but the old records
> > > that weren't deleted are not visible.
> > > And for the sake of completion, my insert and upsert logic is based on
> > > the code below:
> > > https://github.com/apache/incubator-hudi/blob/a4f9d7575f39bb79089714049ffea12ba5f25ec8/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L43
> > > Thanks
> > > Kabeer.
> > >
> > > > /**
> > > >  * Empty payload used for deletions
> > > >  */
> > > > public class EmptyHoodieRecordPayload
> > > >     implements HoodieRecordPayload<EmptyHoodieRecordPayload> {
> > > >
> > > >   // The record and ordering value are ignored: a delete carries no data.
> > > >   public EmptyHoodieRecordPayload(GenericRecord record, Comparable orderingVal) { }
> > > >
> > > >   @Override
> > > >   public EmptyHoodieRecordPayload preCombine(EmptyHoodieRecordPayload another) {
> > > >     return another;
> > > >   }
> > > >
> > > >   // An empty Optional means there is no value to write for this key.
> > > >   @Override
> > > >   public Optional<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue,
> > > >       Schema schema) {
> > > >     return Optional.empty();
> > > >   }
> > > >
> > > >   @Override
> > > >   public Optional<IndexedRecord> getInsertValue(Schema schema) {
> > > >     return Optional.empty();
> > > >   }
> > > > }
> > >
> > > -- Forwarded Message -
> > >
> > > From: Vinoth Chandar 
> > > Subject: Re: Upsert after Delete
> > > Date: Aug 22 2019, at 8:38 pm
> > > To: dev@hudi.apache.org
> > >
> > > That’s interesting. Can you also share details on storage type and how
> > > you are issuing the deletes and also the table/view (ro, rt) that you are
> > > querying?
> > >
> > > On Thu, Aug 22, 2019 at 9:49 AM Kabeer Ahmed 
> > wrote:
> > > > Hudi experts and Users,
> > > > Has anyone attempted an upse

Re: Upsert after Delete

2019-08-22 Thread Vinoth Chandar
yes. I was asking about the HUDI storage type..

There is nothing complex about upsert() after delete(). It is almost as if a
delete() for (2, vinoth) happened in between.

Are you able to repro this literally with this tiny example with 3 records?
Some things to check

 - This sequence would have created 3 commits. You can look at the commit
files and see if the number of records updated, inserted, deleted match
expectations.
 - If they do, then you can use spark.read.parquet(.). on the individual
parquet files and see what records they actually contain.

This should shed some light on the pattern of failure and when exactly (2,
vinoth) disappeared.

Alternatively, if you can give a small snippet that reproduces this, we can
debug from there.
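
As a rough aid for the first check above, the sketch below works out what counts one would expect from the three commits in this thread (insert keys 1 and 2, delete key 1, upsert key 3). It is a toy model only: the class and method names are made up for illustration, it does not read real Hudi commit files, and how Hudi itself attributes counts in its commit metadata may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Toy model of per-commit record counts; hypothetical names, not Hudi code. */
public class ExpectedCommitCounts {

    /** Applies one batch to the live key set and returns {inserted, updated, deleted}. */
    static int[] apply(Set<Integer> keys, int[] upsertKeys, int[] deleteKeys) {
        int inserted = 0, updated = 0, deleted = 0;
        for (int k : upsertKeys) {
            // add() returns true only when the key was not present yet
            if (keys.add(k)) inserted++; else updated++;
        }
        for (int k : deleteKeys) {
            // remove() returns true only when the key was actually present
            if (keys.remove(k)) deleted++;
        }
        return new int[] {inserted, updated, deleted};
    }

    /** Replays the three commits described in the thread. */
    static int[][] expectedForThread() {
        Set<Integer> keys = new HashSet<>();
        int[] commit1 = apply(keys, new int[] {1, 2}, new int[] {});  // insert kabeer, vinoth
        int[] commit2 = apply(keys, new int[] {}, new int[] {1});     // delete kabeer
        int[] commit3 = apply(keys, new int[] {3}, new int[] {});     // upsert balaji
        return new int[][] {commit1, commit2, commit3};
    }

    public static void main(String[] args) {
        for (int[] counts : expectedForThread()) {
            System.out.println(Arrays.toString(counts));
        }
    }
}
```

If the real commit files show something other than two inserts, one delete and one insert in that order, that would narrow down which commit lost (2, vinoth).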






On Thu, Aug 22, 2019 at 3:06 PM Kabeer Ahmed  wrote:

> And if you meant HUDI storage type, I have left it at the default, COW
> (Copy On Write).
>
> If anyone has tried this, please let me know if you have hit a similar issue.
> Any experience would be greatly helpful.
> On Aug 22 2019, at 11:01 pm, Kabeer Ahmed  wrote:
> > Hi Vinoth - thanks for the quick response.
> >
> > I have followed the mail thread for deletes ->
> > http://mail-archives.apache.org/mod_mbox/hudi-commits/201904.mbox/<16722511.2660.9583626796839453...@gitbox.apache.org>
> >
> > For your convenience, the code that I use is below at the end of the
> > email. EmptyHoodieRecord is inserted for the relevant records that need to
> > be deleted. After the delete, I can query from Hive and confirm that the
> > rows intended to be deleted are no longer present and the records not
> > deleted can be seen in the Hive table via Hive and Presto.
> > The issue starts when the upsert is done after a delete.
> > The storage type is S3 and I don't think there is any eventual
> > consistency in play, as the upserted record is visible but the old records
> > that weren't deleted are not visible.
> > And for the sake of completion, my insert and upsert logic is based on
> > the code below:
> > https://github.com/apache/incubator-hudi/blob/a4f9d7575f39bb79089714049ffea12ba5f25ec8/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L43
> > Thanks
> > Kabeer.
> >
> > > /**
> > >  * Empty payload used for deletions
> > >  */
> > > public class EmptyHoodieRecordPayload
> > >     implements HoodieRecordPayload<EmptyHoodieRecordPayload> {
> > >
> > >   // The record and ordering value are ignored: a delete carries no data.
> > >   public EmptyHoodieRecordPayload(GenericRecord record, Comparable orderingVal) { }
> > >
> > >   @Override
> > >   public EmptyHoodieRecordPayload preCombine(EmptyHoodieRecordPayload another) {
> > >     return another;
> > >   }
> > >
> > >   // An empty Optional means there is no value to write for this key.
> > >   @Override
> > >   public Optional<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue,
> > >       Schema schema) {
> > >     return Optional.empty();
> > >   }
> > >
> > >   @Override
> > >   public Optional<IndexedRecord> getInsertValue(Schema schema) {
> > >     return Optional.empty();
> > >   }
> > > }
> > -- Forwarded Message -
> >
> > From: Vinoth Chandar 
> > Subject: Re: Upsert after Delete
> > Date: Aug 22 2019, at 8:38 pm
> > To: dev@hudi.apache.org
> >
> > That’s interesting. Can you also share details on storage type and how
> > you are issuing the deletes and also the table/view (ro, rt) that you are
> > querying?
> >
> > On Thu, Aug 22, 2019 at 9:49 AM Kabeer Ahmed wrote:
> > > Hudi experts and Users,
> > > Has anyone attempted an upsert after a delete? Here is a weird thing that
> > > I have bumped into and it is a shame that this has come up when someone in
> > > the team tested this whilst I failed to run this test.
> > > Use case:
> > > Insert data into a table. Say records (1, kabeer | 2, vinoth)
> > >
> > > Delete a record (1, kabeer). Data in the table is: (2, vinoth) and it is
> > > visible via sql through Presto/Hive.
> > >
> > > Upsert a new record into the same table (3, balaji). Query the table and
> > > the only record that is visible is: (3, balaji). The record (2, vinoth) is
> > > not displayed in the results.
> > >
> > > Any ideas on what could be at play here? Has someone done upsert after
> > > delete?
> > >
> > > Thanks,
> > > Kabeer
> > >
> > > PS: Please note that upsert functionality is well tested and if we do (1,
> > > vinoth) insert followed by upsert of (2, balaji) both the records are
> > > visible. So something else is at play and I would appreciate any insight
> > > that you experts can provide.
> >
> >
> >
> >
>
>


Re: Upsert after Delete

2019-08-22 Thread Vinoth Chandar
That’s interesting. Can you also share details on storage type and how you
are issuing the deletes and also the table/view (ro, rt) that you are
querying?

On Thu, Aug 22, 2019 at 9:49 AM Kabeer Ahmed  wrote:

> Hudi experts and Users,
>
> Has anyone attempted an upsert after a delete? Here is a weird thing that
> I have bumped into and it is a shame that this has come up when someone in
> the team tested this whilst I failed to run this test.
> Use case:
> Insert data into a table. Say records (1, kabeer | 2, vinoth)
>
> Delete a record (1, kabeer). Data in the table is: (2, vinoth) and it is
> visible via sql through Presto/Hive.
>
> Upsert a new record into the same table (3, balaji). Query the table and
> the only record that is visible is: (3, balaji). The record (2, vinoth) is
> not displayed in the results.
>
> Any ideas on what could be at play here? Has someone done upsert after
> delete?
>
> Thanks,
> Kabeer
>
> PS: Please note that upsert functionality is well tested and if we do (1,
> vinoth) insert followed by upsert of (2, balaji) both the records are
> visible. So something else is at play and I would appreciate any insight
> that you experts can provide.
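
To pin down the expected result of the use case quoted above, here is a minimal, self-contained sketch of the three steps on a keyed table. It is a toy model and makes assumptions: the class and method names are hypothetical, the table is a plain in-memory map rather than a Hudi dataset, and an empty Optional stands in for the empty delete payload. It only states what a reader should expect to see, not how Hudi implements it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Toy model of insert, delete and upsert on a keyed table; not Hudi code. */
public class UpsertAfterDeleteModel {

    /** Applies one batch to a snapshot and returns a new snapshot (copy on write). */
    static Map<Integer, String> applyBatch(Map<Integer, String> snapshot,
                                           Map<Integer, Optional<String>> batch) {
        // Copy the previous snapshot so that untouched records carry over.
        Map<Integer, String> next = new LinkedHashMap<>(snapshot);
        for (Map.Entry<Integer, Optional<String>> e : batch.entrySet()) {
            if (e.getValue().isPresent()) {
                next.put(e.getKey(), e.getValue().get()); // insert or update
            } else {
                next.remove(e.getKey());                  // empty value means delete
            }
        }
        return next;
    }

    /** Builds a single-record batch; an empty Optional marks a delete. */
    static Map<Integer, Optional<String>> batchOf(int key, Optional<String> value) {
        Map<Integer, Optional<String>> batch = new LinkedHashMap<>();
        batch.put(key, value);
        return batch;
    }

    /** Replays insert, delete, upsert and returns the final snapshot. */
    static Map<Integer, String> runScenario() {
        Map<Integer, Optional<String>> insert = new LinkedHashMap<>();
        insert.put(1, Optional.of("kabeer"));
        insert.put(2, Optional.of("vinoth"));

        Map<Integer, String> afterInsert = applyBatch(new LinkedHashMap<>(), insert);
        Map<Integer, String> afterDelete = applyBatch(afterInsert, batchOf(1, Optional.empty()));
        return applyBatch(afterDelete, batchOf(3, Optional.of("balaji")));
    }

    public static void main(String[] args) {
        System.out.println(runScenario()); // prints {2=vinoth, 3=balaji}
    }
}
```

In this model the upsert keeps (2, vinoth) because every batch starts from a copy of the previous snapshot. The behaviour reported in the thread, where only (3, balaji) is visible, is what one would see if the upsert started from an empty snapshot instead.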