Re: Dropping support for Spark 2.2 and lower

2019-09-10 Thread Shiyan Xu
+1 On Tue, Sep 10, 2019 at 7:16 AM Vinoth Chandar wrote: > Hello all, > > I am trying to gauge what spark version everyone is on. We would like to > move the spark version to 2.4 and simplify a whole bunch of stuff. Any > objections? As a best effort, we can try to make 2.3 work reliably. Any >

Re: [DISCUSS] New RFC? Hudi dataset snapshotter

2019-11-12 Thread Shiyan Xu
> Hi Shiyan, > > > > > > > > +1 for this proposal, Also, it looks like an exporter tool. > > > > > > > > @Vinoth Chandar Any thoughts about where to > place > > > it? > > > > > > > > Best, > > > >

[DISCUSS] New RFC? Hudi dataset snapshotter

2019-11-11 Thread Shiyan Xu
Hi All, The existing SnapshotCopier under Hudi Utilities is a Hudi-to-Hudi copy and primarily for backup purpose. I would like to start a RFC for a more generic Hudi snapshotter, which - Supports existing SnapshotCopier features - Add option to export a Hudi dataset to plain parquet files

Re: [DISCUSS] New RFC? Hudi dataset snapshotter

2019-11-11 Thread Shiyan Xu
at 4:31 PM Vinoth Chandar wrote: > What you suggest sounds more like an `Exporter` tool? I imagine you will > support MOR as well? +1 on the idea itself. It could be useful if plain > parquet snapshot was generated as a backup. > > On Mon, Nov 11, 2019 at 4:21 PM Shiyan Xu > w

Re: [DISCUSS] New RFC? Hudi dataset snapshotter

2019-11-12 Thread Shiyan Xu
Came up with the first draft. Thank you. https://cwiki.apache.org/confluence/display/HUDI/RFC-9%3A+%28WIP%29+Hudi+Dataset+Snapshotter On Tue, Nov 12, 2019 at 12:44 PM Shiyan Xu wrote: > Thank you all for the +1s! I'll go ahead add a RFC page then. > > On Tue, Nov 12, 2019 at 8:41 A

RFC process step 1 votes

2019-11-12 Thread Shiyan Xu
Hi all, As per the RFC process https://cwiki.apache.org/confluence/display/HUDI/RFC+Process We usually start with an email thread to raise an idea before step 2: creating an RFC page. It'll be good to reach an agreement on how many votes (+1) do we need to proceed to step 2. The idea is not to

[QUESTION] Handle record partition change

2019-12-11 Thread Shiyan Xu
Hi Hudi devs, Upon upsert operations, does Hudi detect record's partition path change? As for the same record, the partition path field may get updated while the record key (the primary id) stays the same, then the insert would result in duplicate record (based on record key) in the dataset. Is

Re: [DISCUSS] Simplification of terminologies

2019-11-11 Thread Shiyan Xu
[1] +1; "query" indeed sounds better [2] +1 on the term "snapshot"; so basically we follow the convention that when we say "snapshot", it means "give me the most up-to-date facts (lowest data latency) even if it takes some query time" [3] Though I agree with the renaming, I have a different

Re: Re: Re:Re: Re: Re: [DISCUSS] Rework of new web site

2019-12-18 Thread Shiyan Xu
Thank you @lamber-ken for the work! It is definitely a greater browsing experience. On Tue, Dec 17, 2019 at 8:28 PM lamberken wrote: > > Hi, @Vinoth > > > > I'm glad to hear your thoughts on the new UI, thanks. So we keep its style > as it is now. > The development of new UI can be completed

Re: [QUESTION] Handle record partition change

2019-12-18 Thread Shiyan Xu
Thank you. Best, Raymond On Wed, Dec 11, 2019 at 11:16 AM Sivabalan wrote: > Depends on whether you are using regular BLOOM or GLOBAL_BLOOM. May I know > which one are you talking about? > > > On Wed, Dec 11, 2019 at 9:12 AM Shiyan Xu > wrote: > > > Hi Hudi devs, >

Re: [QUESTION] Handle record partition change

2019-12-18 Thread Shiyan Xu
on is ignored. > > Option2: > Insert a new record, record1 to Partition2. and Delete record1 from > Partition1. > > I have already put up a patch for Option1. but looks like Raymond is > looking for Option2. > > > > > > On Wed, Dec 18, 2019 at 8:48 AM Shiyan

Re: [QUESTION] Handle record partition change

2019-12-18 Thread Shiyan Xu
Sure. I can create a JIRA and note down the discussion points there. On Wed, Dec 18, 2019 at 7:14 PM Vinoth Chandar wrote: > Interesting discussion. We can file a JIRA for option 2? It seems to also > make the semantics simpler. > > On Wed, Dec 18, 2019 at 11:21 AM Shiyan

Re: HudiDeltaStreamer on EMR

2020-02-24 Thread Shiyan Xu
It's likely that the source parquet data has a column of Spark Timestamp type, which is not convertible to avro. By the way, ParquetDFSSource is not available in 0.5.0. Only added in 0.5.1. You'll probably need to add a custom class which follows its existing implementation, and get rid of it once

Re: Weekly sync notes 20201225

2020-02-25 Thread Shiyan Xu
link https://cwiki.apache.org/confluence/display/HUDI/20200225+Weekly+Sync+Minutes On Tue, Feb 25, 2020 at 9:39 PM vbal...@apache.org wrote: > Please find the weekly sync notes here > 20200225 Weekly Sync Minutes - HUDI - Apache Software Foundation > > Thanks,Balaji.V

Re: [DISCUSS] RFC - 08 : Record level indexing mechanisms for Hudi datasets

2020-02-24 Thread Shiyan Xu
+1 great reading and values! On Mon, 24 Feb 2020, 15:31 nishith agarwal, wrote: > +100 > - Reduces index lookup time hence improves job runtime > - Paves the way for streaming style ingestion > - Eliminates dependency on Hbase (alternate "global index" support at the > moment) > > -Nishith > >

Re: Refactor and enhance Hudi Transformer

2020-02-23 Thread Shiyan Xu
that issue. > > Best, > Vino > > > On Mon, Feb 24, 2020 at 10:21 AM Shiyan Xu wrote: > > > Thanks Vino. Are you referring to HUDI-613? How about making it an > umbrella > > task due to its big scope? (btw it is stated as "bug", which should be > > fixed too).

Re: Apache Hudi on AWS EMR

2020-02-27 Thread Shiyan Xu
6 PM Dubey, Raghu > > > > > > wrote: > > > > > > > > > Athena is indeed Presto inside, but there is lot of custom code > > which has > > > > > gone on top of Presto there. > > > > > Couple months back I tried running a

Re: Apache Hudi on AWS EMR

2020-02-17 Thread Shiyan Xu
For 2) I think running Presto on EMR lets you run read-optimized queries. I don't quite understand why exactly Athena does not support Hudi, as it is Presto underneath. Perhaps @Udit could give some insights from AWS? As @Raghvendra you mentioned, another option is to export Hudi dataset to

Re: Snapshot from cold storage store and continues with latest data from biglog

2020-02-17 Thread Shiyan Xu
Hi Syed, as Vinoth mentioned, the HoodieSnapshotCopier is meant for this purpose. You may also read more on RFC-9, which plans to introduce a backward-compatible tool to cover HoodieSnapshotCopier https://cwiki.apache.org/confluence/display/HUDI/RFC+-+09+%3A+Hudi+Dataset+Snapshot+Exporter

Re: Refactor and enhance Hudi Transformer

2020-02-23 Thread Shiyan Xu
Late to the party. :P I really favor the idea of built-in support enrichment. It is a very common case where we want to set datetime fields for partition path. We could have a built-in support to normalize ISO format / unix timestamp. For example `HourlyPartitionTransformer` will normalize

Re: Refactor and enhance Hudi Transformer

2020-02-23 Thread Shiyan Xu
n Sun, Feb 23, 2020 at 5:57 PM vino yang wrote: > Hi Shiyan, > > Thanks for raising this thread up again and sharing your thoughts. They are > valuable. > > Regarding the date-time specific transform, there is an issue[1] that > describes this business requirement. > > Best,

Re: Please welcome our new PPMCs and Committer

2020-02-14 Thread Shiyan Xu
Congrats! Very well deserved! On Fri, 14 Feb 2020, 13:11 vbal...@apache.org, wrote: > Congratulations to Leesf, Vino Yang and Siva. > +1 Very well deserved :) Looking forward to your continued contributions. > Balaji.V > On Friday, February 14, 2020, 12:11:18 PM PST, Bhavani Sudha < >

Re: [DISCUSS] Delay code freeze date for next release until Jan 19th (Sunday)

2020-01-15 Thread Shiyan Xu
+1 I assume you meant UTC-8  On Wed, 15 Jan 2020, 11:26 nishith agarwal, wrote: > +1, sunday sounds good. > > -Nishith > > On Wed, Jan 15, 2020 at 9:08 AM Balaji Varadarajan > wrote: > > > +1 Sunday should give breathing space to fix the blockers. > > Balaji.V > > On Wednesday, January

Re: [DISCUSS] Unify Hudi code cleanup and improvement

2020-01-21 Thread Shiyan Xu
The clean-up work can actually be split by modules. Though it is generally a good practice to follow, my concern is the clean-up is likely to cause conflicts with some on-going changes. If I may suggest, the dedicated clean-up tasks should avoid - modules that are undergoing multiple feature

Re: [DISCUSS] Code freeze date for next release(0.5.1)

2020-01-08 Thread Shiyan Xu
+1. Good idea for testing phase. On Wed, 8 Jan 2020, 08:26 Vinoth Chandar, wrote: > +1 one More week to land two weeks of testing is a good plan > > On Wed, Jan 8, 2020 at 2:41 AM leesf wrote: > > > Dear Community, > > > > As discussed before[1], the proposed release date of *end of Jan* for >

Re: running Hudi in AWS Glue Spark

2020-03-06 Thread Shiyan Xu
I can answer this as my team faces exactly the same problems. We recently sync'ed up with the AWS EMR team and got some directions. Hudi dataset <> Glue An interim approach is needed: configure S3 notification to detect new commit file after each compaction, upon the notification update a manifest

Re: New PPMC Member : Bhavani Sudha

2020-04-08 Thread Shiyan Xu
Congrats Sudha! Well deserved! On Tue, Apr 7, 2020 at 8:46 PM vino yang wrote: > Congrats sudha, well deserved! > > Best, > Vino > > On Wed, Apr 8, 2020 at 9:31 AM leesf wrote: > > > Congrats sudha, well deserved! > > > > On Wed, Apr 8, 2020 at 6:55 AM Balaji Varadarajan wrote: > > > > > Congratulations Sudha :) Well

Re: [DISCUSS] Upgrade unit test: Junit 5 & AssertJ

2020-04-08 Thread Shiyan Xu
s increases the scope to a overhaul of tests > across > > > the project.. Wonder if we can do a RFC for this? But overall +1 from > me. > > > > > > I would like to call upon the community to chime in more though :) . > > let's > > > give it a few days..

Re: New Committer: lamber-ken

2020-04-08 Thread Shiyan Xu
Congrats Lamber-ken! Well deserved! On Wed, Apr 8, 2020 at 4:52 AM Sivabalan wrote: > Congrats Lamber! Well deserved. > > On Wed, Apr 8, 2020 at 5:21 AM Pratyaksh Sharma > wrote: > > > Congratulations lamberken! > > > > On Wed, Apr 8, 2020 at 11:10 AM Jiayi Liao > > wrote: > > > > >

Re: [DISCUSS] Upgrade unit test: Junit 5 & AssertJ

2020-04-09 Thread Shiyan Xu
he initial work, we could just begin > with JIRA? > > On Wed, Apr 8, 2020 at 12:56 PM Shiyan Xu > wrote: > > > Thank you all for the feedback. > > > > > This increases the scope to a overhaul of tests across the project.. > > Wonder if we can do a RFC fo

Re: [HELP WANTED] Codecov report skips JUnit 5 test cases

2020-04-14 Thread Shiyan Xu
decov support as well? > > Does seem weird.. :/ > > > > On Mon, Apr 13, 2020 at 11:14 PM Shiyan Xu > > wrote: > > > > > Hi all, > > > > > > We're migrating all test cases to JUnit 5. > > > > > > This PR, as an initial step to ena

[HELP WANTED] Codecov report skips JUnit 5 test cases

2020-04-14 Thread Shiyan Xu
Hi all, We're migrating all test cases to JUnit 5. This PR, as an initial step to enable JUnit 5, has migrated quite a few test cases. The test cases pass in green with no issue. https://github.com/apache/incubator-hudi/pull/1504 However, as you can see in the Codecov comment, the coverage

[DISCUSS] Support popular metrics reporter

2020-04-20 Thread Shiyan Xu
Hi all, I'd like to raise the topic of supporting multiple metrics reporters. Currently Hudi supports Graphite and JMX. And there are 2 proposed reporter types: CSV and Prometheus https://jira.apache.org/jira/browse/HUDI-210 https://jira.apache.org/jira/browse/HUDI-361 I think supporting multiple

Re: [DISSCUSS] Troubleshooting flow

2020-04-06 Thread Shiyan Xu
> issues raised here). > > > >>> > > > >>> That said, we could definitely formalize this and look to move > slack > > > >>> threads into GH issue for triaging (then follow up with JIRA, if > real > > > bug) >

HoodieSnapshotExporter

2020-03-27 Thread Shiyan Xu
Hi all, We recently merged a utility class HoodieSnapshotExporter (RFC-9) into master with a goal to enhance exporting capabilities. Many thanks to @openopened (sorry I only know your GitHub handle) for

Re: [DISCUSS] Upgrade unit test: Junit 5 & AssertJ

2020-03-27 Thread Shiyan Xu
> > Sorry to expand scope, but when someone is going to take a look at every > test, I could not pass up an opportunity to sneak this in :) > > Love to hear others thoughts.. any one with experience working with > Junit5/Assertj-Hamcrest? > > On Tue, Mar 24, 2020 at 9:36 PM Shiyan X

Re: HoodieSnapshotExporter

2020-03-28 Thread Shiyan Xu
M vino yang wrote: > > > Hi Raymond, > > > > Thanks for driving this valuable feature! Having this tool, it would be > > easier for backup purposes! > > > > Best, > > Vino > > > > > > > > On Sat, Mar 28, 2020 at 8:21 AM Shiyan Xu wrote: > >

Re: [DISSCUSS] Troubleshooting flow

2020-03-31 Thread Shiyan Xu
Good idea to use GH issues as triage. Not sure if slack has some answerbot to auto reply and prompt users to create GH issues. If it can be configured that way, that'd be great for this purpose :) On Tue, 31 Mar 2020, 10:03 lamberken, wrote: > Hi team, > > > > > Many users use slack ask for

Re: [DISCUSS] Upgrade unit test: Junit 5 & AssertJ

2020-03-24 Thread Shiyan Xu
Some references https://junit.org/junit5/docs/current/user-guide/ https://joel-costigliola.github.io/assertj/ On Tue, Mar 24, 2020 at 9:27 PM Shiyan Xu wrote: > Hi all, > > I'd like to gather some feedback about > 1. upgrading Junit 4 to 5 > 2. adopt AssertJ as preferred asse

[DISCUSS] Upgrade unit test: Junit 5 & AssertJ

2020-03-24 Thread Shiyan Xu
Hi all, I'd like to gather some feedback about 1. upgrading Junit 4 to 5 2. adopt AssertJ as preferred assertion statement style IMO 1) will give many benefits on writing better unit tests. A google search of "junit 4 vs 5" could lead to many good points. And it is some migration can be done

Re: [ATTN] JUnit 5 adoption

2020-04-23 Thread Shiyan Xu
pr 21, 2020 at 10:39 PM Vinoth Chandar > > wrote: > > > > > > > +1 Appreciate the efforts, Raymond! > > > > > > > > [Wondering if there is a way to stick a checkstyle rule to this > effect. > > > > guess it won't check for new changes alone, rathe

Re: [DISCUSS] Bug bash?

2020-04-23 Thread Shiyan Xu
+1 would like to participate On Thu, Apr 23, 2020 at 5:51 PM Dongdong Hong wrote: > +1 sounds great! > > On Thu, Apr 23, 2020 at 9:30 PM Sivabalan wrote: > > > +1 > > > > On Wed, Apr 22, 2020 at 7:29 PM lamber-ken wrote: > > > > > > > > > > > > > > Wow, challenging job, +1 > > > > > > > > > Best, > > >

Re: [DISCUSS] Support popular metrics reporter

2020-04-23 Thread Shiyan Xu
Thank you all for the approval! Filed https://issues.apache.org/jira/browse/HUDI-836 On Thu, Apr 23, 2020 at 5:40 PM dongdong hong wrote: > +1 > >

[ATTN] JUnit 5 adoption

2020-04-21 Thread Shiyan Xu
Hi all, We're in progress with JUnit 5 migration for all test classes. So far the JUnit 5 dependencies (including Mockito) have been added to all modules. The APIs/modules migration status is shown here https://github.com/apache/incubator-hudi/pull/1530#issue-405575235 I would like to kindly ask

[DISCUSS] Return schema provider as optional?

2020-05-02 Thread Shiyan Xu
Hi all, In case of reading schema-inferable source like parquet, when no new data is found, then, if i understand correctly, no schema can be inferred, and need not to be. Seeing this method org.apache.hudi.utilities.sources.InputBatch#getSchemaProvider requiring non-null schemaProvider, and

Re: Unable to run hudi-cli integration tests

2020-05-17 Thread Shiyan Xu
Hi Pratyaksh, I have the same setup as yours. I would normally tend to clean up my local deps mvn dependency:purge-local-repository mvn clean install -DskipTests -DskipITs mvn -Dtest=ITTestRepairsCommand#testDeduplicateWithReal -DfailIfNoTests=false test Though I was able to run the test, it

Re: [DISCUSS] should we do a 0.5.3 patch set release ?

2020-05-06 Thread Shiyan Xu
+1 for 0.5.3 as well On Wed, May 6, 2020 at 1:55 PM Sivabalan wrote: > sounds good Sudha. Let's have a good list of projects/features to be done > for 0.6.0 and not end up in a similar situation. I am ok to go with 0.5.3. > > On Wed, May 6, 2020 at 4:31 PM Vinoth Chandar wrote: > > > Hi Sudha,

Re: [VOTE] Apache Hudi graduation to top level project

2020-05-06 Thread Shiyan Xu
+1 On Wed, May 6, 2020 at 2:49 PM Sivabalan wrote: > +1 :) > > On Wed, May 6, 2020 at 5:30 PM Gary Li wrote: > > > +1 > > > > On Wed, May 6, 2020 at 2:28 PM Suneel Marthi wrote: > > > > > +1 > > > > > > On Wed, May 6, 2020 at 5:01 PM Bhavani Sudha > > > wrote: > > > > > > > +1 > > > > > > >

Re: [DISCUSS] Why add unit tests for hudi-cli module

2020-05-12 Thread Shiyan Xu
Hi, the tests in hudi-cli are more of functional tests. They are conducive to verifying features in cli module are working. Though not covering all options, it is always better to have some assuring passing tests than none, isn't it? :) On Tue, May 12, 2020 at 8:31 AM hmantu wrote: > hi all, >

Re: Question on DeltaStreamer

2020-03-18 Thread Shiyan Xu
To answer your question regarding the properties file It is a way to manage a bunch of hoodie configuration; those confs will be merged with other confs passed from --hoodie-conf. See this line

Re: Sequence of Transformers

2020-03-23 Thread Shiyan Xu
Seems like an abstract class would be good enough for generic use? User can provide a list of `Transformer` then the abstract class just apply all the way through the list. The implementation can be minimal for this approach. On Mon, Mar 23, 2020 at 4:12 PM Vinoth Chandar wrote: > sg. Filed
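A minimal sketch of the idea in the message above: one wrapper that takes a list of transformers and applies them in order. This is an illustration only and not Hudi's actual implementation; `Transformer`, `ChainedTransformer`, `AddField`, and `DropField` are hypothetical names, and rows are modeled as plain dictionaries rather than Spark datasets.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

Row = Dict[str, object]


class Transformer(ABC):
    """Hypothetical stand-in for a single transformation step."""

    @abstractmethod
    def apply(self, rows: List[Row]) -> List[Row]:
        """Return the transformed rows."""


class ChainedTransformer(Transformer):
    """Applies each wrapped transformer in order, feeding each output to the next."""

    def __init__(self, transformers: List[Transformer]):
        self.transformers = list(transformers)

    def apply(self, rows: List[Row]) -> List[Row]:
        for transformer in self.transformers:
            rows = transformer.apply(rows)
        return rows


class AddField(Transformer):
    """Illustrative step: sets a constant field on every row."""

    def __init__(self, name: str, value: object):
        self.name = name
        self.value = value

    def apply(self, rows: List[Row]) -> List[Row]:
        return [{**row, self.name: self.value} for row in rows]


class DropField(Transformer):
    """Illustrative step: removes a field from every row if present."""

    def __init__(self, name: str):
        self.name = name

    def apply(self, rows: List[Row]) -> List[Row]:
        return [{k: v for k, v in row.items() if k != self.name} for row in rows]


if __name__ == "__main__":
    chain = ChainedTransformer([AddField("source", "kafka"), DropField("tmp")])
    print(chain.apply([{"id": 1, "tmp": "x"}]))
```

Because the wrapper is itself a `Transformer`, callers that accept a single transformer need no changes, which is what keeps the implementation minimal.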

Re: [Discussion] hudi support log append scenario with better write and asynchronous compaction

2020-05-19 Thread Shiyan Xu
Hi Wei, +1 on the proposal; append-only is a commonly seen use case. IIUC, the main concern is, Hudi by default generates small files internally in COW tables. And by setting `hoodie.parquet.small.file.limit` can reduce the number of small files but slow down the pipeline (by doing compaction).

Re: Apache Hudi Graduation vote on general@incubator

2020-05-22 Thread Shiyan Xu
Great news. Congratulations! On Fri, May 22, 2020 at 5:40 PM wangxianghu wrote: > congratulations, great job! > > Sent from my iPhone > > > On May 23, 2020, at 05:59, Sivabalan wrote: > > > > Congrats :) Kudos to Vinoth and the community :) > > > > > >> On Fri, May 22, 2020 at 5:57 PM Mehrotra, Udit > > >>

[DISCUSS] Write failed records

2020-05-22 Thread Shiyan Xu
Hi all, I'd like to bring up this discussion around handling errors in Hudi write paths. https://issues.apache.org/jira/browse/HUDI-648 Trying to gather some feedbacks about the implementation details 1. Error location I'm thinking of writing the failed records to `.hoodie/errors/` for a)

Re: hudi dependency conflicts for test

2020-05-21 Thread Shiyan Xu
Hi Lian, it appears that you need to have spark-avro_2.11:2.4.4 in your classpath. On Thu, May 21, 2020 at 10:04 AM Lian Jiang wrote: > Thanks Balaji. > > My unit test failed due to dependency incompatibility. Any idea will be > highly appreciated! > > > The test is copied from hudi quick

Re: hudi dependency conflicts for test

2020-05-21 Thread Shiyan Xu
That was a close one. :) On Thu, May 21, 2020 at 10:46 AM Vinoth Chandar wrote: > Wow.. Race condition :) .. > > Thanks for racing , Raymond! > > On Thu, May 21, 2020 at 10:08 AM Shiyan Xu > wrote: > > > Hi Lian, it appears that you need to have spark-avro_2.11:

Re: hudi dependency conflicts for test

2020-05-21 Thread Shiyan Xu
mpl.java:48) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at > org.gradle.internal.concurrent.ThreadFac

Re: [VOTE] Release 0.6.0, release candidate #1

2020-08-21 Thread Shiyan Xu
I should have documented this...(which I will soon) When run from terminal, could you please try running with maven profile like `mvn -Punit-tests test` `mvn -Pfunctional-tests test` which should work.. Best, Raymond On Fri, Aug 21, 2020 at 9:44 PM Gary Li wrote: > +1 (non binding) > -

Re: [DISCUSS] Codestyle: force multiline indentation

2020-08-22 Thread Shiyan Xu
> >>>> Yes, it's the key thing. But, IMO, we can ignore the IDE here, if > it > > > >>> breaks > > > >>>> the code style, checkstyle will stop building and spotless will > > work. > > > >>>> > &g

Re: Sequence of Transformers

2020-07-15 Thread Shiyan Xu
apache/incubator-hudi/pull/1440 > > > > > > On Mon, Mar 23, 2020 at 5:32 PM Shiyan Xu > > > > wrote: > > > > > > > Seems like an abstract class would be good enough for generic use? > > > > User can provide a list of `Transformer` the

Re: PSA: master integ-tests failing

2020-08-01 Thread Shiyan Xu
looks like this is caused by scalatest-maven-plugin, which is controlled by skipTests property. had a fix for changing to skipUTs https://github.com/apache/hudi/pull/1897/files On Fri, Jul 31, 2020 at 8:21 PM nishith agarwal wrote: > All, > > I've added new log4j properties to the docker setup

Re: Merge upserts across partitions

2020-08-10 Thread Shiyan Xu
Looks like you might need to use GLOBAL_BLOOM and set this to true https://hudi.apache.org/docs/configurations.html#bloomIndexUpdatePartitionPath Note that there is a fix related to this setting in the upcoming 0.6.0; recommend to use it instead of 0.5.2. On Sun, Aug 9, 2020 at 10:28 PM Taher

[DISCUSS] Codestyle: force multiline indentation

2020-08-10 Thread Shiyan Xu
Hi all, I noticed that throughout the codebase, when method arguments wrap to a new line, there are cases where indentation is 4 and other cases align the wrapped line to the previous line of argument. The latter is caused by intelliJ settings of "Align when multiline" enabled. This won't be

Re: [DISCUSS] Codestyle: force multiline indentation

2020-08-10 Thread Shiyan Xu
re we add more checking? > > On Mon, Aug 10, 2020 at 7:04 PM Shiyan Xu > wrote: > > > Hi all, > > > > I noticed that throughout the codebase, when method arguments wrap to a > new > > line, there are cases where indentation is 4 and other cases align the >

Re: DISCUSS code, config, design walk through sessions

2020-07-06 Thread Shiyan Xu
+1 On Mon, Jul 6, 2020 at 9:27 AM vbal...@apache.org wrote: > +1. > On Monday, July 6, 2020, 09:11:47 AM PDT, Bhavani Sudha < > bhavanisud...@gmail.com> wrote: > > +1 this is a great idea! > > On Mon, Jul 6, 2020 at 7:54 AM vino yang wrote: > > > +1 > > > > On Mon, Jul 6, 2020, Adam Feldman

GitHub release display issue

2020-07-14 Thread Shiyan Xu
The new GitHub UI displays 0.4.7 as the latest, which is misleading. Guess adding notes to the later releases could resolve it? [image: Screen Shot 2020-07-13 at 11.37.16 PM.png]

Re: DISCUSS code, config, design walk through sessions

2020-07-14 Thread Shiyan Xu
+1 On Tue, Jul 14, 2020, 11:34 AM Vinoth Chandar wrote: > Typo: date TBD (not data :)) > > On Tue, Jul 14, 2020 at 11:20 AM Adam Feldman wrote: > > > +1 > > > > On Tue, Jul 14, 2020, 14:09 Gary Li wrote: > > > > > +1. 8am works for me. > > > > > > On Tue, Jul 14, 2020 at 11:01 AM Vinoth

Re: DISCUSS code, config, design walk through sessions

2020-07-08 Thread Shiyan Xu
; > > > Thanks, everyone! There appears to be great interest. let's do it. > > > > > > In terms of timing, I was thinking if we can extend one of our existing > > > community weekly sync meetings for this purpose. > > > So, timing would be 930-1

Re: [DISCUSS] Introduce a write committed callback hook

2020-06-21 Thread Shiyan Xu
+1. It is a great complement to the pull model; helpful to fan-out scenarios On Sun, Jun 21, 2020 at 8:07 AM Bhavani Sudha wrote: > +1 . I think this is a valid use case and would be useful in general. > > On Sun, Jun 21, 2020 at 7:11 AM Vinoth Chandar wrote: > > > +1 as well > > > > > We

[DISCUSS] Make delete marker configurable?

2020-06-26 Thread Shiyan Xu
Hi all, A small suggestion: as delta streamer relies on `_hoodie_is_deleted` to do hard delete, can we make it configurable? as in users can specify any boolean field for delete marker and `_hoodie_is_deleted` remains as default. Regards, Raymond
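A minimal sketch of what the suggestion above amounts to: the delete-marker field name becomes a parameter, with `_hoodie_is_deleted` kept as the default. This is an illustration, not Hudi code; `split_deletes` and its `delete_marker_field` parameter are hypothetical names, and records are modeled as plain dictionaries.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, object]

# Default marker named in the message above; anything else is user-supplied.
DEFAULT_DELETE_MARKER = "_hoodie_is_deleted"


def split_deletes(
    records: Iterable[Record],
    delete_marker_field: str = DEFAULT_DELETE_MARKER,
) -> Tuple[List[Record], List[Record]]:
    """Separate records into (upserts, deletes) using a boolean marker field.

    A record counts as a delete only when the marker field is exactly True;
    a missing or non-boolean marker is treated as an upsert.
    """
    upserts: List[Record] = []
    deletes: List[Record] = []
    for record in records:
        if record.get(delete_marker_field) is True:
            deletes.append(record)
        else:
            upserts.append(record)
    return upserts, deletes


if __name__ == "__main__":
    batch = [
        {"id": 1, "_hoodie_is_deleted": True},
        {"id": 2, "_hoodie_is_deleted": False},
        {"id": 3, "is_removed": True},
    ]
    # Default marker: only record 1 is a delete.
    print(split_deletes(batch))
    # Custom marker: only record 3 is a delete.
    print(split_deletes(batch, delete_marker_field="is_removed"))
```

Keeping the existing field as the default means pipelines that already rely on `_hoodie_is_deleted` would behave the same without any new setting.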

Re: Re:Re: [DISCUSS] Regarding nightly builds

2020-06-21 Thread Shiyan Xu
+1 very helpful to accelerate the adoption. On Sun, Jun 21, 2020 at 4:51 PM Sivabalan wrote: > +1 > > On Sun, Jun 21, 2020 at 11:58 AM vbal...@apache.org > wrote: > > > +1. It is a good idea to run hudi-test-suite on a daily basis with > > expanded tests. > > Balaji.VOn Sunday, June 21,

Re: [DISCUSS] Publishing benchmarks for releases

2020-06-21 Thread Shiyan Xu
+1 definitely useful info. On Sun, Jun 21, 2020 at 4:56 PM Sivabalan wrote: > Hey folks, > Is it a common practise to publish benchmarks for releases? I have put > up an initial PR to add jmh > benchmark support to a couple of Hudi operations. If

Re: [DISCUSS] Make delete marker configurable?

2020-06-28 Thread Shiyan Xu
> > > > > > > > > Thanks, > > > Sudha > > > > > > On Fri, Jun 26, 2020 at 9:02 PM Shiyan Xu > > > > wrote: > > > > > > > Hi all, > > > > > > > > A small suggestion: as delta stream

Re: [DISCUSS] Make delete marker configurable?

2020-06-28 Thread Shiyan Xu
. Can you please provide more > context on what problem this addresses ? > > > Thanks, > Sudha > > On Fri, Jun 26, 2020 at 9:02 PM Shiyan Xu > wrote: > > > Hi all, > > > > A small suggestion: as delta streamer relies on `_hoodie_is_deleted` to > do

Re: [DISCUSS] querying commit metadata from spark DataSource

2020-06-10 Thread Shiyan Xu
are really external data, not related > to a given table’ core functioning.. we don’t necessarily want to keep one > error table per hudi table.. > > Thoughts? > > On Tue, Jun 2, 2020 at 5:34 PM Shiyan Xu > wrote: > > > I also encountered use cases where I'd like to p

Re: [DISCUSS] querying commit metadata from spark DataSource

2020-06-12 Thread Shiyan Xu
each hudi table.. > > For this effort, does it make sense to take a dependency on the > multi-writer jira HUDI-944, that liwei filed? > > On Wed, Jun 10, 2020 at 7:49 PM Shiyan Xu > wrote: > > > Yes, Vinoth, it does go a bit too far with first class support on these > > data. >

Re: [VOTE] Release 0.5.3, release candidate #2

2020-06-12 Thread Shiyan Xu
+1 (non-binding) Source compile ... ok Local UT ... ok Delta streamer run on EMR ... ok Release label:emr-5.29.0 Hadoop distribution:Amazon 2.8.5 Applications:Spark 2.4.4, Hive 2.3.6, Tez 0.9.2, Presto 0.227 Upsert to COW table ... ok Hive sync ... ok HiveQL select ... ok On Fri, Jun 12, 2020

Re: [DISCUSS] querying commit metadata from spark DataSource

2020-06-02 Thread Shiyan Xu
I also encountered use cases where I'd like to programmatically query metadata. +1 on the idea of format(“hudi-timeline”) I also feel that the metadata can be extended further to include more info like, errors, metrics/write statistics, etc. Like the newly proposed error handling, we could also

Re: GitHub release display issue

2020-07-17 Thread Shiyan Xu
Jul 13, 2020 at 11:42 PM Shiyan Xu > wrote: > > > The new GitHub UI displays 0.4.7 as the latest, which is misleading. > > Guess adding notes to the later releases could resolve it? > > [image: Screen Shot 2020-07-13 at 11.37.16 PM.png] > > >

Re: Unit tests in hudi-client module fail due to SparkContext

2020-07-28 Thread Shiyan Xu
iscrepancy between the intellij test runner and CLI mvn runner may > be > > > affected via these settings > > > > Somehow I can't see the screenshots... > > > > Thanks, > > - Ethan > > > > On Tue, Jul 28, 2020 at 9:32 AM Shiyan Xu >

Re: Unit tests in hudi-client module fail due to SparkContext

2020-07-28 Thread Shiyan Xu
The maven surefire/failsafe plugin is configured with -Xmx2g here, which should be plentiful for all tests so far. The OOM looks weird to me.. maybe try checking the maven log see if -Xmx2g is indeed

Re: [DISCUSS] Adding Metrics to Hudi Common

2020-07-28 Thread Shiyan Xu
es can just live in hudi-common for > now? > > On Tue, Jul 28, 2020 at 9:06 AM Shiyan Xu > wrote: > > > +1. It would be very helpful to have more internal > performance/cost-related > > metrics (perhaps optionally enabled). Also it does make sense to move >

Re: [DISCUSS] Adding Metrics to Hudi Common

2020-07-28 Thread Shiyan Xu
+1. It would be very helpful to have more internal performance/cost-related metrics (perhaps optionally enabled). Also it does make sense to move metrics classes to common, or even to a separate module (if the scope gets extended a lot further) On Tue, Jul 28, 2020 at 8:43 AM vbal...@apache.org

Re: 0.11.0 release timeline

2022-03-27 Thread Shiyan Xu
Hi All, just a reminder on the timeline, as discussed earlier: - Mar 31 00:00 PST : feature freeze - new features/functionalities won't be merged to master (3 days from now) - Apr 03 00:00 PST : cut release branch and start RC voting/testing (6 days from now) Thank you. On Wed, Mar 23, 2022 at

Re: Permission to contribute

2022-04-03 Thread Shiyan Xu
Done and welcome! On Sat, Apr 2, 2022 at 8:37 PM wulingqi wrote: > Hi Team , > > I want to contribute to Apache Hudi. Would you please give me the > contributor permission? My JIRA username is KnightChess > > Thanks > Lingqi > > -- Best, Shiyan

Re: [ANNOUNCE] New Apache Hudi Committer - Zhaojing Yu

2022-03-25 Thread Shiyan Xu
Congrats! On Fri, Mar 25, 2022 at 1:40 PM Danny Chan wrote: > Hi everyone, > > On behalf of the PMC, I'm very happy to announce Zhaojing Yu as a new > Hudi committer. > > Zhaojing is very active in Flink Hudi contributions, many cool > features such as the flink streaming bootstrap, compaction

Re: 0.11.0 release timeline

2022-04-05 Thread Shiyan Xu
to be ready the next day. In the meantime, we'll continue landing bug fixes. Thanks. On Mon, Mar 28, 2022 at 2:25 AM Shiyan Xu wrote: > Hi All, just a reminder on the timeline, as discussed earlier: > > - Mar 31 00:00 PST : feature freeze - new features/functionalities won't >

Re: contributor permission

2022-04-11 Thread Shiyan Xu
Done. welcome! On Mon, Apr 11, 2022 at 7:39 PM 金鱼缸底的秘密 <1715123...@qq.com.invalid> wrote: > Hi, > > > I want to contribute to Apache Hudi. > Would you please give me the contributor permission? > > > My JIRA ID is zyp. > > My email is1715123...@qq.com. > > 张一鹏 > Tel:18162317322 >

Re: 0.11.0 release timeline

2022-04-06 Thread Shiyan Xu
and then send a separate voting email for RC1. Thanks. On Wed, Apr 6, 2022 at 2:20 AM Shiyan Xu wrote: > Hi all, > > Apologies about the delays, with the last few blockers just landed, we are > now starting the RC process. > > I started by following the release guide here > >

Re: 0.11.0 release timeline

2022-04-06 Thread Shiyan Xu
.11.0-rc1>. Please report to the [VOTE] email by casting votes and giving feedback or test results. We would love to have as much feedback as possible to help stabilize the RC. Please don't hesitate to test even if there is a -1 vote. Thank you for your cooperation. On Wed, Apr 6, 2022 at 5:46

[VOTE] Release 0.11.0, release candidate #1

2022-04-06 Thread Shiyan Xu
Hi everyone, Please review and vote on the release candidate #1 for the version 0.11.0, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) The complete staging area is available for your review, which includes: * JIRA release notes

[VOTE] Release 0.11.0, release candidate #2

2022-04-15 Thread Shiyan Xu
Hi everyone, Please review and vote on the release candidate #2 for the version 0.11.0, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) The complete staging area is available for your review, which includes: * JIRA release notes

Re: [VOTE] Release 0.11.0, release candidate #1

2022-04-12 Thread Shiyan Xu
Sun, Apr 10, 2022 at 11:03 AM Shiyan Xu > wrote: > > > -1 > > > > Rat plugin in CI was not working for some time and resulted in some files > > missing Apache license header. This was fixed in master > > > > > https://github.com/apache/hudi/commit/5e65aef

Re: [VOTE] Release 0.14.0, release candidate #3

2023-09-25 Thread Shiyan Xu
+1 (binding) - Ran some sanity tests for spark 3.4 On Fri, Sep 22, 2023 at 3:42 PM Shawn Chang wrote: > +1 (non-binding) > > - Ran integration tests with Hudi 0.14.0-rc3 jars on the latest EMR cluster > > On Fri, Sep 22, 2023 at 12:13 PM Hussein Awala wrote: > > > +1 (non-binding) ran some

Re: [VOTE] Release 0.11.0, release candidate #2

2022-04-22 Thread Shiyan Xu
, 2022 at 4:31 PM Y Ethan Guo > > > wrote: > > > > > > > -1 > > > > The Kafka Connect Sink for Hudi cannot ingest data using > > > > hudi-kafka-connect-bundle from 0.11.0-rc2 due to > NoClassDefFoundError. > > > The > > >

Re: [VOTE] Release 0.11.0, release candidate #2

2022-04-19 Thread Shiyan Xu
2 due to NoClassDefFoundError. > The > > following fix is put up. > > https://github.com/apache/hudi/pull/5353 > > > > Best, > > - Ethan > > > > On Fri, Apr 15, 2022 at 5:20 AM Shiyan Xu > > wrote: > > > > > Hi everyone, > > > &

[VOTE] Release 0.11.0, release candidate #3

2022-04-24 Thread Shiyan Xu
Hi everyone, Please review and vote on the release candidate #3 for the version 0.11.0, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) The complete staging area is available for your review, which includes: * JIRA release notes

Re: Permission to contribute

2022-04-26 Thread Shiyan Xu
Done and welcome! On Tue, Apr 26, 2022 at 7:47 PM Александр Трушев wrote: > Hi, > I want to contribute to Apache Hudi. > Would you please give me the contributor permission? > My jira username is trushev > > Thanks > -- Best, Shiyan

Re: [VOTE] Release 0.11.0, release candidate #1

2022-04-10 Thread Shiyan Xu
ease. Please check below > > ./docker/push_to_docker_hub.png: image/png; charset=binary > > > > > > > > > On Wed, 6 Apr 2022 at 15:37, Shiyan Xu > wrote: > > > Hi everyone, > > > > Please review and vote on the release candidate #1 for the ve

[DISCUSS] Diagnostic reporter

2022-05-30 Thread Shiyan Xu
Hi all, When troubleshooting Hudi jobs in users' environments, we always ask users to share configs, environment info, check spark UI, etc. Here is an RFC idea: can we extend the Hudi metrics system and make a diagnostic reporter? It can be turned on like a normal metrics reporter. it should
