Re: TLP Announcement
Great news! Congratulations Vinoth and the community!

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org

On Fri, Jun 5, 2020 at 9:30 AM, hddong wrote:

> Great news!
>
> On Fri, Jun 5, 2020 at 9:04 AM, leesf wrote:
>
> > Amazing, thanks everyone in the community.
> >
> > On Fri, Jun 5, 2020 at 1:03 AM, Lamber Ken wrote:
> >
> > > Great news, and thank you all and congratulations.
> > >
> > > On 2020/06/04 14:28:33, Vinoth Chandar wrote:
> > > > Hello all,
> > > >
> > > > The ASF press release announcing Apache Hudi as a TLP is live! Thanks
> > > > for all your contributions! We could not have achieved that without
> > > > such a great community effort!
> > > >
> > > > Please help spread the word!
> > > >
> > > > - GlobeNewswire
> > > >   http://www.globenewswire.com/news-release/2020/06/04/2043732/0/en/The-Apache-Software-Foundation-Announces-Apache-Hudi-as-a-Top-Level-Project.html
> > > > - ASF "Foundation" blog https://s.apache.org/odtwv
> > > > - @TheASF twitter feed
> > > >   https://twitter.com/TheASF/status/1268528110959497217
> > > > - The ASF on LinkedIn
> > > >   https://www.linkedin.com/company/the-apache-software-foundation
> > > >
> > > > Thanks
> > > > Vinoth
Re: TLP Announcement
Great news!

On Fri, Jun 5, 2020 at 9:04 AM, leesf wrote:

> Amazing, thanks everyone in the community.
>
> On Fri, Jun 5, 2020 at 1:03 AM, Lamber Ken wrote:
>
> > Great news, and thank you all and congratulations.
> >
> > On 2020/06/04 14:28:33, Vinoth Chandar wrote:
> > > Hello all,
> > >
> > > The ASF press release announcing Apache Hudi as a TLP is live! Thanks
> > > for all your contributions! We could not have achieved that without
> > > such a great community effort!
> > >
> > > Please help spread the word!
> > >
> > > - GlobeNewswire
> > >   http://www.globenewswire.com/news-release/2020/06/04/2043732/0/en/The-Apache-Software-Foundation-Announces-Apache-Hudi-as-a-Top-Level-Project.html
> > > - ASF "Foundation" blog https://s.apache.org/odtwv
> > > - @TheASF twitter feed
> > >   https://twitter.com/TheASF/status/1268528110959497217
> > > - The ASF on LinkedIn
> > >   https://www.linkedin.com/company/the-apache-software-foundation
> > >
> > > Thanks
> > > Vinoth
Re: How to extend the timeline server schema to accommodate business metadata
Sorry, I did not understand the last part. :) Are you suggesting we create a JIRA?

On Thu, Jun 4, 2020 at 1:08 AM Mario de Sá Vera wrote:

> That sounds great! I will check that and keep an eye on the long-running
> server approach... once it gets a ticket I could watch, just let me know
> please.
>
> Thanks
>
> On Thu, 4 Jun 2020, 05:34 Vinoth Chandar, wrote:
>
> > Hi Mario,
> >
> > We actually started with the idea of making the timeline server a
> > long-running service. If you notice, we have a module that builds out a
> > bundle that you could deploy. Maybe you can play with it and see if that
> > sounds interesting to you. It will definitely have some rough edges given
> > it’s not been widely used.
> >
> > Thanks
> > Vinoth
> >
> > On Wed, Jun 3, 2020 at 2:33 AM Mario de Sá Vera wrote:
> >
> > > Hi Vinoth, thanks for your comments on this. I spent some time thinking
> > > over another possibility, which would be externalising the Hudi
> > > timeline service itself to an external server holding both operational
> > > (i.e. Hudi) and business metadata.
> > >
> > > Would you guys have any opinion on that? Would that be easy? I do not
> > > seem to see a way yet, except reading about RocksDB, but that is still
> > > not quite clear.
> > >
> > > best regards,
> > >
> > > Mario.
> > >
> > > On Mon, Jun 1, 2020 at 16:01, Vinoth Chandar <mail.vinoth.chan...@gmail.com> wrote:
> > >
> > > > Hi Mario,
> > > >
> > > > Thanks for the detailed explanation. Hudi already allows extra
> > > > metadata to be written atomically with each commit, i.e. write
> > > > operation. In fact, that is how we track checkpoints for our delta
> > > > streamer tool. It may not solve the need for querying the data
> > > > together with this information, but it gives you the ability to do
> > > > some basic tagging, if that's useful.
> > > >
> > > > >>If we enable the timeline service metadata model to be extended we
> > > > could use the service instance itself to support specialised queries
> > > > that involve business qualifiers in order to return a proper set of
> > > > metadata pointing to the related commits
> > > >
> > > > This is a good idea actually. There is another active discuss thread
> > > > on making the metadata queryable. There is also
> > > > https://issues.apache.org/jira/browse/HUDI-309, which we paused for
> > > > now, but that's more in line with what you are thinking, IIUC.
> > > >
> > > > Thanks
> > > > vinoth
> > > >
> > > > On Mon, Jun 1, 2020 at 4:41 AM Mario de Sá Vera wrote:
> > > >
> > > > > Hi Balaji,
> > > > >
> > > > > Business metadata are all types of info related to the business
> > > > > where the Hudi solution is being used... from a COB (i.e. close of
> > > > > business date) related to that commit to any qualifier that might
> > > > > be useful to associate with that commit id. If we enable the
> > > > > timeline service metadata model to be extended, we could use the
> > > > > service instance itself to support specialised queries that involve
> > > > > business qualifiers in order to return a proper set of metadata
> > > > > pointing to the related commits that answer a business query.
> > > > >
> > > > > If we do not have that flexibility, we might end up creating an
> > > > > external transaction log, and then comes the hard task of keeping
> > > > > that service in sync with the timeline service.
> > > > >
> > > > > Let me know if that makes sense to you,
> > > > >
> > > > > Mario.
> > > > >
> > > > > On Mon, Jun 1, 2020 at 06:55, Balaji Varadarajan wrote:
> > > > >
> > > > > > Hi Mario,
> > > > > > Timeline Server was designed to serve Hudi metadata for Hudi
> > > > > > writers and readers. It may not be suitable to serve arbitrary
> > > > > > data. But it is an interesting thought. Can you elaborate more on
> > > > > > what kind of business metadata you are looking for? Is this
> > > > > > something you are planning to store in commit files?
> > > > > > Balaji.V
> > > > > >
> > > > > > On Sunday, May 31, 2020, 04:22:27 PM PDT, Mario de Sá Vera <desav...@gmail.com> wrote:
> > > > > >
> > > > > > I see a need for extending the current timeline server schema so
> > > > > > that a flexible model could be achieved in order to accommodate
> > > > > > business metadata.
> > > > > >
> > > > > > Let me know if that makes sense to anyone here...
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > Mario.
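
The idea discussed in this thread (tagging each commit with extra business metadata and then looking up commits by a business qualifier) can be illustrated with a small, self-contained sketch. This is a toy in-memory model only, not the Hudi API: the `Commit` class, the `find_commits` helper, and the `cob_date` and `desk` keys are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Commit:
    """Toy stand-in for one timeline entry (hypothetical, not a Hudi class)."""
    commit_time: str
    # Free-form key/value tags stored together with the commit.
    extra_metadata: Dict[str, str] = field(default_factory=dict)


def find_commits(timeline: Iterable[Commit], **qualifiers: str) -> List[str]:
    """Return the commit times whose extra metadata matches every qualifier."""
    return [
        commit.commit_time
        for commit in timeline
        if all(commit.extra_metadata.get(k) == v for k, v in qualifiers.items())
    ]


# Hypothetical timeline: two commits for one close-of-business date, one for the next.
timeline = [
    Commit("20200529093000", {"cob_date": "2020-05-29", "desk": "rates"}),
    Commit("20200529181500", {"cob_date": "2020-05-29", "desk": "fx"}),
    Commit("20200601090000", {"cob_date": "2020-06-01", "desk": "rates"}),
]

matching = find_commits(timeline, cob_date="2020-05-29")
```

Here `matching` holds the two commit times tagged with that close-of-business date. A real deployment would have to read these tags from wherever the commits actually persist them, which is the open question in the thread.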
Re: TLP Announcement
Amazing, thanks everyone in the community.

On Fri, Jun 5, 2020 at 1:03 AM, Lamber Ken wrote:

> Great news, and thank you all and congratulations.
>
> On 2020/06/04 14:28:33, Vinoth Chandar wrote:
> > Hello all,
> >
> > The ASF press release announcing Apache Hudi as a TLP is live! Thanks
> > for all your contributions! We could not have achieved that without
> > such a great community effort!
> >
> > Please help spread the word!
> >
> > - GlobeNewswire
> >   http://www.globenewswire.com/news-release/2020/06/04/2043732/0/en/The-Apache-Software-Foundation-Announces-Apache-Hudi-as-a-Top-Level-Project.html
> > - ASF "Foundation" blog https://s.apache.org/odtwv
> > - @TheASF twitter feed
> >   https://twitter.com/TheASF/status/1268528110959497217
> > - The ASF on LinkedIn
> >   https://www.linkedin.com/company/the-apache-software-foundation
> >
> > Thanks
> > Vinoth
Re: Suggestion needed - Hudi performance wrt no. and depth of partitions
Thanks a lot Vinoth for your suggestion. I will look into it.

On Thu, 4 Jun 2020 at 10:15 AM, Vinoth Chandar wrote:

> This is a good conversation. The ask for support of bucketed tables has not
> actually come up much, since if you are looking up things at that
> granularity, it almost feels like you are doing OLTP/database-like queries?
>
> Assuming you hash the primary key into a value that denotes the partition,
> a simple workaround is to always add a where clause using a UDF in
> Presto, i.e. where key = 123 and partition = hash_udf(123)
>
> But of course the downside is that your ops team needs to remember to add
> the second partition clause (which is not very different from querying
> large time-partitioned tables today).
>
> Our mid-term plan is to build out column indexes (RFC-15 has the details,
> if you are interested).
>
> On Wed, Jun 3, 2020 at 2:54 AM tanu dua wrote:
>
> > If I need to plug in this hashing algorithm to resolve the partitions in
> > Presto and Hive, what is the code I should look into?
> >
> > On Wed, Jun 3, 2020, 12:04 PM tanu dua wrote:
> >
> > > Yes, that’s also on the cards, and for developers that’s ok, but we need
> > > to provide an interface for our ops people to execute queries from
> > > Presto. So I need to find out how I can calculate the hash when they
> > > fire a query on the primary key. They can fire a query that includes
> > > the primary key along with other fields. That is the only problem I see
> > > with hash partitions, and to get it to work I believe I need to go
> > > deeper into the Presto Hudi plugin.
> > >
> > > On Wed, 3 Jun 2020 at 11:48 AM, Jaimin Shah wrote:
> > >
> > > > Hi Tanu,
> > > >
> > > > If your primary key is an integer, you can add one more field holding
> > > > a hash of that integer and partition on the hash field. It will add
> > > > some complexity to reads and writes because the hash has to be
> > > > computed prior to each read or write. Not sure whether the overhead of
> > > > doing this exceeds the performance gains from fewer partitions. I
> > > > wonder why HUDI doesn't directly support hash-based partitions?
> > > >
> > > > Thanks
> > > > Jaimin
> > > >
> > > > On Wed, 3 Jun 2020 at 10:07, tanu dua wrote:
> > > >
> > > > > Thanks Vinoth for the detailed explanation. I was thinking along
> > > > > the same lines and I will take another look. We can reduce the 2nd
> > > > > and 3rd partitions, but it’s very difficult to reduce the 1st
> > > > > partition, as that is the basic primary key of our domain model on
> > > > > which analysts and developers need to query almost 90% of the time.
> > > > > It is an integer primary key and can’t be decomposed further.
> > > > >
> > > > > On Wed, 3 Jun 2020 at 9:23 AM, Vinoth Chandar wrote:
> > > > >
> > > > > > Hi tanu,
> > > > > >
> > > > > > For good query performance, it's recommended to write optimally
> > > > > > sized files. Hudi already ensures that.
> > > > > >
> > > > > > Generally speaking, if you have too many partitions, then it also
> > > > > > means too many files. Mostly people limit to 1000s of partitions
> > > > > > in their datasets, since queries typically crunch data based on
> > > > > > time or a business_domain (e.g. city for uber). Partitioning too
> > > > > > granularly - say based on user_id - is not very useful unless your
> > > > > > queries only crunch per user. If you are using Hive metastore,
> > > > > > then 25M partitions mean 25M rows in your backing mysql metastore
> > > > > > db as well - not very scalable.
> > > > > >
> > > > > > What I am trying to say is: even outside of Hudi, if analytics is
> > > > > > your use case, it might be worth partitioning at lower granularity
> > > > > > and increasing the rows per parquet file.
> > > > > >
> > > > > > Thanks
> > > > > > Vinoth
> > > > > >
> > > > > > On Tue, Jun 2, 2020 at 3:18 AM Tanuj wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > > We have a requirement to ingest 30M records into S3 backed by
> > > > > > > HUDI. I am figuring out the partition strategy and ending up
> > > > > > > with a lot of partitions, like 25M partitions (primary
> > > > > > > partition) --> 2.5 M (secondary partition) --> 2.5 M (third
> > > > > > > partition), and each parquet file will have fewer than 10 rows
> > > > > > > of data.
> > > > > > >
> > > > > > > Our dataset will be ingested at once in full and then it will be
> > > > > > > incremental daily with fewer than 1k updates. So it's more read
> > > > > > > heavy than write heavy.
> > > > > > >
> > > > > > > So what would be the suggestion in terms of HUDI performance -
> > > > > > > go ahead with the above partition strategy, or shall I reduce my
> > > > > > > partitions and increase the number of rows in each parquet file?
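
The hash-partition workaround suggested in this thread can be sketched as follows. This is only an illustration under stated assumptions: `NUM_BUCKETS`, `hash_partition`, `build_query`, and the table name `hudi_table` are hypothetical names, SHA-256 is an arbitrary choice of stable hash, and the `hash_udf` mentioned in the thread would have to implement exactly the same function on the Presto side for the predicate to match.

```python
import hashlib

# Hypothetical bucket count: bounds the number of partitions no matter how
# many distinct primary keys exist (e.g. 25M keys still map to 1024 buckets).
NUM_BUCKETS = 1024


def hash_partition(key: int, num_buckets: int = NUM_BUCKETS) -> str:
    """Map an integer primary key to a stable bucket-style partition value."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % num_buckets
    return f"bucket={bucket:04d}"


def build_query(key: int, table: str = "hudi_table") -> str:
    """Build a lookup that carries both the key and the partition predicate."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE key = {key} AND partition = '{hash_partition(key)}'"
    )


query = build_query(123)
```

The writer would compute `hash_partition(key)` once per record to fill the partition column, and every reader (or a thin query wrapper for the ops team) would add the same value as the second predicate so the engine can prune to a single partition.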
Re: TLP Announcement
Great news, and thank you all and congratulations.

On 2020/06/04 14:28:33, Vinoth Chandar wrote:
> Hello all,
>
> The ASF press release announcing Apache Hudi as a TLP is live! Thanks
> for all your contributions! We could not have achieved that without
> such a great community effort!
>
> Please help spread the word!
>
> - GlobeNewswire
>   http://www.globenewswire.com/news-release/2020/06/04/2043732/0/en/The-Apache-Software-Foundation-Announces-Apache-Hudi-as-a-Top-Level-Project.html
> - ASF "Foundation" blog https://s.apache.org/odtwv
> - @TheASF twitter feed
>   https://twitter.com/TheASF/status/1268528110959497217
> - The ASF on LinkedIn
>   https://www.linkedin.com/company/the-apache-software-foundation
>
> Thanks
> Vinoth
Re: TLP Announcement
Thank you all and congratulations. This is a big milestone!

-Sudha

On Thu, Jun 4, 2020 at 9:21 AM vino yang wrote:

> Great news!
>
> Thanks to the whole community!
>
> Best,
> Vino
>
> On Thu, Jun 4, 2020 at 11:23 PM, Pratyaksh Sharma wrote:
>
> > That is great news.
> >
> > On Thu, Jun 4, 2020 at 7:58 PM Vinoth Chandar wrote:
> >
> > > Hello all,
> > >
> > > The ASF press release announcing Apache Hudi as a TLP is live! Thanks
> > > for all your contributions! We could not have achieved that without
> > > such a great community effort!
> > >
> > > Please help spread the word!
> > >
> > > - GlobeNewswire
> > >   http://www.globenewswire.com/news-release/2020/06/04/2043732/0/en/The-Apache-Software-Foundation-Announces-Apache-Hudi-as-a-Top-Level-Project.html
> > > - ASF "Foundation" blog https://s.apache.org/odtwv
> > > - @TheASF twitter feed
> > >   https://twitter.com/TheASF/status/1268528110959497217
> > > - The ASF on LinkedIn
> > >   https://www.linkedin.com/company/the-apache-software-foundation
> > >
> > > Thanks
> > > Vinoth
Re: TLP Announcement
Great news!

Thanks to the whole community!

Best,
Vino

On Thu, Jun 4, 2020 at 11:23 PM, Pratyaksh Sharma wrote:

> That is great news.
>
> On Thu, Jun 4, 2020 at 7:58 PM Vinoth Chandar wrote:
>
> > Hello all,
> >
> > The ASF press release announcing Apache Hudi as a TLP is live! Thanks
> > for all your contributions! We could not have achieved that without
> > such a great community effort!
> >
> > Please help spread the word!
> >
> > - GlobeNewswire
> >   http://www.globenewswire.com/news-release/2020/06/04/2043732/0/en/The-Apache-Software-Foundation-Announces-Apache-Hudi-as-a-Top-Level-Project.html
> > - ASF "Foundation" blog https://s.apache.org/odtwv
> > - @TheASF twitter feed
> >   https://twitter.com/TheASF/status/1268528110959497217
> > - The ASF on LinkedIn
> >   https://www.linkedin.com/company/the-apache-software-foundation
> >
> > Thanks
> > Vinoth
Re: TLP Announcement
That is great news.

On Thu, Jun 4, 2020 at 7:58 PM Vinoth Chandar wrote:

> Hello all,
>
> The ASF press release announcing Apache Hudi as a TLP is live! Thanks
> for all your contributions! We could not have achieved that without
> such a great community effort!
>
> Please help spread the word!
>
> - GlobeNewswire
>   http://www.globenewswire.com/news-release/2020/06/04/2043732/0/en/The-Apache-Software-Foundation-Announces-Apache-Hudi-as-a-Top-Level-Project.html
> - ASF "Foundation" blog https://s.apache.org/odtwv
> - @TheASF twitter feed
>   https://twitter.com/TheASF/status/1268528110959497217
> - The ASF on LinkedIn
>   https://www.linkedin.com/company/the-apache-software-foundation
>
> Thanks
> Vinoth
TLP Announcement
Hello all,

The ASF press release announcing Apache Hudi as a TLP is live! Thanks for all your contributions! We could not have achieved that without such a great community effort!

Please help spread the word!

- GlobeNewswire
  http://www.globenewswire.com/news-release/2020/06/04/2043732/0/en/The-Apache-Software-Foundation-Announces-Apache-Hudi-as-a-Top-Level-Project.html
- ASF "Foundation" blog https://s.apache.org/odtwv
- @TheASF twitter feed
  https://twitter.com/TheASF/status/1268528110959497217
- The ASF on LinkedIn
  https://www.linkedin.com/company/the-apache-software-foundation

Thanks
Vinoth
Re: CI/Master tests failing
I fixed a bunch of issues around flakiness (PR-1697) and have landed the change. There is still flakiness with CI, possibly related to leaks (HUDI-997) in unit tests in hudi-client. At this time, I need someone to take up HUDI-997 to make the CI tests stable. Any volunteers?

Balaji.V

On Tuesday, June 2, 2020, 11:36:37 AM PDT, Vinoth Chandar wrote:

Hi all,

This is a PSA. We are observing some flakiness with master, and the last three PR merges have failed. Balaji is looking at the fix/issue. But in the meantime, I'd ask committers to temporarily not merge more PRs until this is resolved. It will help us fix this early.

The error looks something like this:

[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 1
[ERROR] Crashed tests:
[ERROR] org.apache.hudi.table.TestHoodieMergeOnReadTable
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /home/travis/build/apache/hudi/hudi-client && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -Xmx2g -jar /home/travis/build/apache/hudi/hudi-client/target/surefire/surefirebooter6746374343787497574.jar /home/travis/build/apache/hudi/hudi-client/target/surefire 2020-06-02T12-27-45_220-jvmRun1 surefire6644367455247671601tmp surefire_37399402556296347739tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 1
[ERROR] Crashed tests:
[ERROR] org.apache.hudi.table.TestHoodieMergeOnReadTable
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:690)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:285)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:248)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1217)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1063)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:889)
[ERROR] at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
[ERROR] at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
[ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
[ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
[ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
[ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:955)
[ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:290)
[ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:194)
[ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] at java.lang.reflect.Method.invoke(Method.java:498)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[ERROR] -> [Help 1]
[ERROR]
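
When triaging a failure like the one quoted above, it can help to pull the crashed test classes out of a long CI log automatically. The following is a minimal sketch that assumes the log keeps the `Crashed tests:` layout shown in this message; the `crashed_tests` function and the `SAMPLE_LOG` excerpt are illustrative names, not part of Maven or Hudi tooling.

```python
import re
from typing import List

# A log line carrying only a fully qualified class name, optionally preceded
# by a CI line number fused onto the [ERROR] tag (e.g. "14009[ERROR] ...").
CLASS_LINE = re.compile(
    r"^\s*\d*\[ERROR\]\s+([A-Za-z_][\w$]*(?:\.[A-Za-z_][\w$]*)+)\s*$"
)


def crashed_tests(log_text: str) -> List[str]:
    """Return unique class names listed after each 'Crashed tests:' marker."""
    found: List[str] = []
    lines = log_text.splitlines()
    for index, line in enumerate(lines):
        if "Crashed tests:" not in line:
            continue
        # Class names follow the marker until the first non-matching line.
        for follow in lines[index + 1:]:
            match = CLASS_LINE.match(follow)
            if match is None:
                break
            if match.group(1) not in found:
                found.append(match.group(1))
    return found


SAMPLE_LOG = """\
[ERROR] Process Exit Code: 1
[ERROR] Crashed tests:
[ERROR] org.apache.hudi.table.TestHoodieMergeOnReadTable
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Crashed tests:
[ERROR] org.apache.hudi.table.TestHoodieMergeOnReadTable
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:690)
"""

crashed = crashed_tests(SAMPLE_LOG)
```

The exception line and the stack frames are skipped because they carry extra text after the class name, so only the bare class names under each marker are collected.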
Re: [VOTE] Release 0.5.3, release candidate #1
-1.

Siva,

We found an issue whose fix needs to be ported to 0.5.3. Jira: https://jira.apache.org/jira/browse/HUDI-990

I will work with you to port the change to 0.5.3. We would need to create a new release candidate for this.

Balaji.V

On Tuesday, June 2, 2020, 08:51:54 PM PDT, yajunf...@163.com wrote:

+1

yajunf...@163.com

From: Sivabalan
Date: 2020-06-03 11:22
To: dev
Subject: [VOTE] Release 0.5.3, release candidate #1

Hi everyone,

Please review and vote on the release candidate #1 for the version 0.5.3, as follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:

* JIRA release notes [1],
* the official Apache source release and binary convenience releases to be deployed to dist.apache.org [2], which are signed with the key with fingerprint 001B66FA2B2543C151872CCC29A4FD82F1508833 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-0.5.3-rc1" [5].

The vote will be open for at least 72 hours. It is adopted by majority approval, with at least 3 PMC affirmative votes.

Thanks,
Release Manager

[1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822=12348256
[2] https://dist.apache.org/repos/dist/dev/hudi/hudi-0.5.3-rc1/
[3] https://dist.apache.org/repos/dist/release/hudi/KEYS
[4] https://repository.apache.org/content/repositories/orgapachehudi-1022/
[5] https://github.com/apache/hudi/tree/release-0.5.3-rc1
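
The adoption rule stated in the vote email (majority approval with at least 3 PMC affirmative votes) can be written down as a small check. This sketch assumes that only PMC (binding) votes count toward both the threshold and the majority, which is my reading of the rule rather than something the email spells out; `release_vote_passes` and the `(value, is_pmc)` tuple shape are hypothetical.

```python
from typing import Iterable, Tuple


def release_vote_passes(votes: Iterable[Tuple[int, bool]]) -> bool:
    """Decide a release vote from (value, is_pmc) pairs; value is +1, 0 or -1.

    Assumption: only PMC votes are binding, so non-PMC votes are ignored.
    """
    binding = [value for value, is_pmc in votes if is_pmc]
    plus = sum(1 for value in binding if value == 1)
    minus = sum(1 for value in binding if value == -1)
    # At least three binding +1 votes, and more binding +1 than -1.
    return plus >= 3 and plus > minus


# Hypothetical tally: one non-binding +1 and one binding -1 are not enough.
outcome = release_vote_passes([(1, False), (-1, True)])
```

In practice a -1 like the one above usually leads the release manager to cut a new candidate instead of waiting for the tally, which is what this reply proposes.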