Thanks. After reading the discussion in HUDI-561, I realized that the
previously mentioned built-in partition transformer is better suited to a
custom key generator. Hopefully other suitable ideas for built-in
transformers will come up later.
On Sun, Feb 23, 2020 at 6:34 PM vino yang wrote:
Hi,
I recently came across a strange issue for table T. For the same timestamp,
two clean instants were present in the .hoodie folder, one in the completed
state and the other in the inflight state. As a result, when I tried to run
the cleaner or DeltaStreamer for this table T, it failed with the below
Hi Shiyan,
Really sorry, I forgot to attach the reference. The relevant Jira ID is
HUDI-561: https://issues.apache.org/jira/browse/HUDI-561
It seems both of you faced the same issue, although the proposed solutions
differ. In any case, you can move the discussion to that issue.
Best,
Vino
Shiyan
Hi,
As discussed in the weekly sync two weeks ago, I want to raise this point
on our mailing list as well. Since the 0.5.1 release upgraded Spark to 2.4
in our master branch, we have been facing difficulties after rebasing our
codebase onto master. At our organisation we are using spark
Late to the party. :P
I really favor the idea of enriching the built-in support. It is a very
common case that we want to use datetime fields for the partition path. We
could have built-in support to normalize ISO format / unix timestamp. For
example `HourlyPartitionTransformer` will normalize
Hi,
While working on one of my PRs, I am stuck with the following test cases in
TestHoodieDeltaStreamer -
1. testUpsertsCOWContinuousMode
2. testUpsertsMORContinuousMode
For both of them, at line [1] and [2], we are adding 200 to totalRecords
while asserting record count and distance count
Hi Shiyan,
Thanks for raising this thread again and sharing your thoughts. They are
valuable.
Regarding the date-time specific transform, there is an issue[1] that
describes this business requirement.
Best,
Vino
On Mon, Feb 24, 2020 at 7:22 AM Shiyan Xu wrote:
> Late to the party. :P
>
> I really favor
Hi Sivabalan,
Thanks for your proposal.
Big +1 from my side. Record-level indexing is really good for performance.
It is also a step toward stream processing.
Best,
Vino
On Sun, Feb 23, 2020 at 12:52 AM Sivabalan wrote:
> As Apache Hudi is getting widely adopted, performance has become the need
Thanks Vino. Are you referring to HUDI-613? How about making it an umbrella
task, given its big scope? (By the way, it is currently filed as a "bug",
which should be corrected too.) I can create another specific task under it
for the idea of a datetime -> partition path transformer, if it makes sense.
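
To make the datetime -> partition path idea concrete, here is a rough
sketch of what an hourly normalization could look like. It is only an
illustration and not Hudi code: the class name HourlyPartitionPathSketch,
the method toHourlyPath, the yyyy/MM/dd/HH layout, the choice of UTC, and
the reading of all-digit input as unix epoch seconds are assumptions made
for this example.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Hudi: normalizes a datetime field value
// into an hourly partition path.
public class HourlyPartitionPathSketch {

    // Assumed output layout: yyyy/MM/dd/HH, always rendered in UTC.
    private static final DateTimeFormatter HOURLY =
        DateTimeFormatter.ofPattern("yyyy/MM/dd/HH").withZone(ZoneOffset.UTC);

    // Accepts either an ISO-8601 datetime with an offset (or "Z"),
    // or an all-digit string taken to be unix epoch seconds (assumption).
    public static String toHourlyPath(String value) {
        String trimmed = value.trim();
        Instant instant;
        if (trimmed.matches("-?\\d+")) {
            instant = Instant.ofEpochSecond(Long.parseLong(trimmed));
        } else {
            instant = OffsetDateTime.parse(trimmed).toInstant();
        }
        return HOURLY.format(instant);
    }
}
```

With this sketch, both "2020-02-23T18:34:00Z" and "1582482840" map to the
same partition path, so records written in either representation would
land in the same hourly partition.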
On Sun, Feb 23,