need help with Hudi Delete

2022-07-14 Thread aakash aakash
Hi,

We have a use case for soft-deleting certain record keys: we nullify the
non-key fields and ignore any later updates to the record. We thought of
using a Hudi meta field, "_hoodie_is_soft_deleted", analogous to the one used
for hard deletes (_hoodie_is_deleted), so that it is easy to tell whether the
platform has soft-deleted a record. However, I get an Avro "field not found"
exception when we perform another soft delete on the same index.

Please let me know if you have any advice on how to fix this, or whether this
is the wrong approach. We want to avoid adding an extra field to the customer
schema and instead filter out soft-deleted records behind the scenes, as is
done for hard deletes, while still keeping the records in the system.
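
To make the intended semantics concrete, here is a minimal sketch in plain
Python (no Hudi or Spark involved) of the behaviour described above: non-key
fields are nullified, the record stays in the system, and later updates to a
soft-deleted key are ignored. The helper names soft_delete and merge are
hypothetical, and "_hoodie_is_soft_deleted" is treated as an ordinary custom
column that is present on every record. It illustrates the goal only, not how
Hudi merges records internally.

```python
from typing import Any, Dict, Iterable, Optional

# Flag name taken from the message above; assumed to be a plain custom column.
SOFT_DELETE_FLAG = "_hoodie_is_soft_deleted"

Record = Dict[str, Any]


def soft_delete(record: Record, key_fields: Iterable[str]) -> Record:
    """Return a copy with every non-key field set to None and the flag set."""
    keys = set(key_fields)
    tombstone = {
        name: (value if name in keys else None)
        for name, value in record.items()
    }
    tombstone[SOFT_DELETE_FLAG] = True
    return tombstone


def merge(existing: Optional[Record], incoming: Record) -> Record:
    """Upsert rule: once a key is soft-deleted, later updates are ignored."""
    if existing is not None and existing.get(SOFT_DELETE_FLAG):
        return existing
    merged = dict(incoming)
    # Always carry the flag so stored and incoming records share one shape.
    merged.setdefault(SOFT_DELETE_FLAG, False)
    return merged
```

With these helpers, soft-deleting an already soft-deleted record returns an
equal record, which is the "second soft delete on the same key" case that
fails in our job.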


Hudi : 0.8.0
Exception stacktrace:

2/07/14 22:08:21 WARN TaskSetManager: Lost task 5.0 in stage 93.0 (TID 33283, 172.25.31.77, executor 3): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :5
  at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:288)
  at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$execute$ecf5068c$1(BaseSparkCommitActionExecutor.java:139)
  at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
  at org.apache.spark.api.java.JavaRDDLike$$anonfun$mapPartitionsWithIndex$1.apply(JavaRDDLike.scala:102)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
  at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:337)
  at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:335)
  at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
  at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
  at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
  at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
  at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
  at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
  at org.apache.spark.scheduler.Task.run(Task.scala:123)
  at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
  at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:102)
  at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:317)
  at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:308)
  at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:281)
  ... 30 more
Caused by: org.apache.hudi.exception.HoodieException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
  at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:143)
  at org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:100)
  ... 33 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: operation has failed
  at java.util.concurrent.FutureTask.report(FutureTask.java:122)
  at java.util.concurrent.FutureTask.get(FutureTask.java:192)
  at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
  ... 34 more
Caused by: org.apache.hudi.exception.HoodieException: operation has failed
  at org.apache.hudi.common.util.queue.BoundedInMemoryQueu

Re: 0.12.0 Release Timeline

2022-07-14 Thread Vinoth Chandar
+1 from me.

On Thu, Jul 14, 2022 at 9:43 AM sagar sumit wrote:

> Hi Folks,
>
> After some deliberation with the community and keeping the release blockers
> <https://github.com/apache/hudi/pulls?q=is%3Apr+is%3Aopen+label%3Apriority%3Ablocker>
> in mind, I am proposing a new date for code freeze *July 25, 11:59 PM PST*.
>
> Please highlight any concerns you may have.
>
> Regards,
> Sagar
>
> On Wed, Jul 13, 2022 at 5:59 PM Shawy Geng wrote:
>
> > July 30 would work better for me. There are some performance issues with
> > Spark row writing that need to be fixed, and they need more detailed
> > benchmark testing.
> >
> > On Mon, Jul 11, 2022 at 12:59, sagar sumit wrote:
> >
> > > Hi Folks,
> > >
> > > Some excellent features from the community are in review and
> > > near-landing (could take up to a week).
> > > The release blockers are tracked here
> > > <https://github.com/apache/hudi/pulls?q=is%3Apr+is%3Aopen+label%3Apriority%3Ablocker>.
> > > So, I'd like to propose an updated release timeline.
> > > - *July 20, 11:59 PM PST*: Code freeze - new features/functionalities
> > > won't be merged to master.
> > > - *July 22, 11:59 PM PST*: Cut release branch and start RC
> > > voting/testing.
> > >
> > > Regards,
> > > Sagar
> > >
> > > On Wed, Jun 29, 2022 at 2:54 PM sagar sumit wrote:
> > >
> > > > Hi Folks,
> > > >
> > > > I am the RM for the upcoming release 0.12.0.
> > > > In line with our roadmap, I'd like to propose the following timeline:
> > > >
> > > > - *July 15, 11:59 PM PST*: Code freeze - new features/functionalities
> > > > won't be merged to master.
> > > > - *July 18, 11:59 PM PST*: Cut release branch and start RC
> > > > voting/testing.
> > > >
> > > > Please highlight any concerns with the timeline.
> > > > If it works, please +1 on this thread.
> > > >
> > > > Also, if you have not done so already, please tag any JIRAs that you
> > > > have planned for the release by setting their "Fix Version/s" to
> > > > "0.12.0".
> > > >
> > > > Regards,
> > > > Sagar
> > > >
> > >
> >
>


Re: 0.12.0 Release Timeline

2022-07-14 Thread sagar sumit
Hi Folks,

After some deliberation with the community and keeping the release blockers
<https://github.com/apache/hudi/pulls?q=is%3Apr+is%3Aopen+label%3Apriority%3Ablocker>
in mind, I am proposing a new date for code freeze *July 25, 11:59 PM PST*.

Please highlight any concerns you may have.

Regards,
Sagar

On Wed, Jul 13, 2022 at 5:59 PM Shawy Geng wrote:

> July 30 would work better for me. There are some performance issues with
> Spark row writing that need to be fixed, and they need more detailed
> benchmark testing.
>
> On Mon, Jul 11, 2022 at 12:59, sagar sumit wrote:
>
> > Hi Folks,
> >
> > Some excellent features from the community are in review and near-landing
> > (could take up to a week).
> > The release blockers are tracked here
> > <https://github.com/apache/hudi/pulls?q=is%3Apr+is%3Aopen+label%3Apriority%3Ablocker>.
> > So, I'd like to propose an updated release timeline.
> > - *July 20, 11:59 PM PST*: Code freeze - new features/functionalities
> > won't be merged to master.
> > - *July 22, 11:59 PM PST*: Cut release branch and start RC
> > voting/testing.
> >
> > Regards,
> > Sagar
> >
> > On Wed, Jun 29, 2022 at 2:54 PM sagar sumit wrote:
> >
> > > Hi Folks,
> > >
> > > I am the RM for the upcoming release 0.12.0.
> > > In line with our roadmap, I'd like to propose the following timeline:
> > >
> > > - *July 15, 11:59 PM PST*: Code freeze - new features/functionalities
> > > won't be merged to master.
> > > - *July 18, 11:59 PM PST*: Cut release branch and start RC
> > > voting/testing.
> > >
> > > Please highlight any concerns with the timeline.
> > > If it works, please +1 on this thread.
> > >
> > > Also, if you have not done so already, please tag any JIRAs that you have
> > > planned for the release by setting their "Fix Version/s" to "0.12.0".
> > >
> > > Regards,
> > > Sagar
> > >
> >
>