Re: Regards to have Athena Metastore Sync

2019-12-31 Thread Vinoth Chandar
Can one of the aws folks please chime in here? IIRC I saw some tweets mentioning Hudi/Athena support is in the works. Not sure myself. On Sun, Dec 29, 2019 at 11:33 PM Syed Abdul Kather wrote: > Hi Team, > > We have built the "CDC pipeline with apache hudi and debezium" . It > works very

Re: Contributor Permission Request

2019-12-31 Thread Vinoth Chandar
Done! Welcome aboard! On Tue, Dec 31, 2019 at 10:03 AM Peter Huang wrote: > Hi, > > I want to contribute to Apache Hudi. Would you please give me the > contributor permission? My JIRA ID is ZhenqiuHuang. > > > > Best Regards > > Peter Huang >

Re: Apache project maturity model

2019-12-30 Thread Vinoth Chandar
https://cwiki.apache.org/confluence/display/HUDI/Apache+Hudi+Maturity+Matrix Our awesome mentor Suneel has gone ahead and taken a first pass.. Will file the remaining work in the next few days. On Tue, Nov 19, 2019 at 7:55 AM Vinoth Chandar wrote: > Thanks Thomas! Will read it over and f

Re: insert too slow

2019-12-30 Thread Vinoth Chandar
Hi, If you could share your Spark UI or the stage that is taking more time, we can further drill down. Overall, this FAQ entry also sets expectations and provides some data modelling tips

Re: [DISCUSS] Is it possible to use DynamoDB as index storage?

2019-12-30 Thread Vinoth Chandar
Hi, I would imagine we could write a DynamoDB index similar to HBase. No one has attempted it so far though :) You can look at issues tagged with the "Index" component and see if there are small ones you could pick up to familiarize yourself, and then maybe draw up a plan. Thanks Vinoth On Mon, Dec 30,

Re: [get assign permission]

2019-12-29 Thread Vinoth Chandar
Added you! Welcome ! On Fri, Dec 27, 2019 at 8:38 PM deng ziming wrote: > I can‘t assign a ticket to myself? thank you! > > wiki Jira id : dengziming >

Contribution guidelines

2019-12-27 Thread Vinoth Chandar
Hi all, In an effort to scale ourselves better as a community, I have been spending a lot of time cleaning up JIRAs and also explicitly writing up the processes around contributions that we have been adopting.. Please give this a quick read

Re: Commit time issue in DeltaStreamer (Real-Time)

2019-12-27 Thread Vinoth Chandar
his might sound a very dumb question, but still if you can help > me with this; do you have any idea from where or which module is > responsible for this issue ..??? > > > > On Fri, 27 Dec 2019 at 9:10 PM, Vinoth Chandar wrote: > > > Hi Shahida, > > > > It

Re: Commit time issue in DeltaStreamer (Real-Time)

2019-12-27 Thread Vinoth Chandar
Hi Shahida, It seems like a bug from the rename changes we did recently. Could you please raise a JIRA, tagged to 0.5.1 release? Should be easy to fix, but DeltaStreamer should not be reading the clean timeline.. (Also btw this is also related to the one issue that you filed). Balaji, could you

Re: Re:How to write a performance test program

2019-12-25 Thread Vinoth Chandar
Filed HUDI-468 - Not an avro data file - error while archiving post rename() change to track this On Mon, Dec 23, 2019 at 11:40 PM ma...@bonc.com.cn wrote: > Thank you, I have replaced it with hubi-spark-bundle-0.5.0-incubating.jar, > and the

Re: jira permission

2019-12-24 Thread Vinoth Chandar
Done, welcome! :) On Tue, Dec 24, 2019 at 7:18 PM sandyfog . wrote: > *Hi,* > > *I want to contribute to Apache Hudi. Would you please give me the > contributor permission? My JIRA username is *sandyfog > > Thank You ! >

Re: Apply for JIRA permission

2019-12-24 Thread Vinoth Chandar
Done :) Welcome! On Tue, Dec 24, 2019 at 6:50 PM Jingsong Li wrote: > Hi, > > I want to contribute to Apache Hudi. > Would you please give me the contributor permission? > My > JIRA ID is lzljs3620320 > JIRA name is Jingsong Lee > > -- > Best, Jingsong Lee >

Re: apply for contributor permission

2019-12-24 Thread Vinoth Chandar
Done. and Welcome! Please see https://hudi.apache.org/community.html#contributing to get started! On Mon, Dec 23, 2019 at 6:37 PM yuehan...@163.com wrote: > Hello, > I want to contribute to Apache Hudi. Would you please give me the > contributor permission? > Here is my accounts: > apache jira

Re: Apply for JIRA permission

2019-12-24 Thread Vinoth Chandar
Done. and Welcome! Please see https://hudi.apache.org/community.html#contributing to get started! On Tue, Dec 24, 2019 at 4:07 AM hejinbiao_521 wrote: > > > > Hi, > > I want to contribute to Apache Hudi. Would you please give me the > contributor permission? > > My JIRA ID is jinbiaoh.

Re: apply for contribution

2019-12-24 Thread Vinoth Chandar
Done. and Welcome! Please see https://hudi.apache.org/community.html#contributing to get started! On Tue, Dec 24, 2019 at 7:58 AM pu zhang wrote: > *Hi,* > > *I want to contribute to Apache Hudi. Would you please give me the > contributor permission? My JIRA ID is zhangpu-paul.* > > *thanks~* >

Re: apply for contributor permission

2019-12-24 Thread Vinoth Chandar
Done. and Welcome! Please see https://hudi.apache.org/community.html#contributing to get started! On Tue, Dec 24, 2019 at 5:29 PM 王祥虎 wrote: > Hi > > I want to contribute to Apache Hudi. Would you please give me the > contributor permission? My JIRA ID is wangxianghu > > thanks~

Re: Re: Facing issues when using HiveIncrementalPuller

2019-12-24 Thread Vinoth Chandar
r.loadClass(Launcher.java:349) > >at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > >... 1 more > > > >I was able to fix it by including the corresponding jar in the bundle. > > > >After fixing the above, still I am getting the NPE even thou

Re: How to write a performance test program

2019-12-23 Thread Vinoth Chandar
Could you give 0.5.0-incubating (last release) a shot in the meantime? Lamberken, do you have steps to reproduce this issue? Would love to get a JIRA filed so we could fix it before the next release. On Mon, Dec 23, 2019 at 7:25 PM lamberken wrote: > > Hi @mayu1, > > I guess you used the latest master

Fwd: Re: IDE setup for code formatting

2019-12-23 Thread Vinoth Chandar
. best, lamber-ken At 2019-12-24 11:00:12, "Vinoth Chandar" wrote: >Ironically, google style + checkstyle is what we had few months ago :) > >Can we have an owner to drive this to a point where, the code formatting is >well-documented for contributors? >leesf. and la

Re: IDE setup for code formatting

2019-12-23 Thread Vinoth Chandar
Ironically, google style + checkstyle is what we had a few months ago :) Can we have an owner to drive this to a point where the code formatting is well-documented for contributors? leesf and lamber, seems like you have the most context? On Mon, Dec 23, 2019 at 6:24 PM lamber...@163.com wrote:

This week's sync meeting

2019-12-23 Thread Vinoth Chandar
Folks, I will not be able to attend this week's meeting. Please go ahead without me and send notes to the group as usual Cheers Vinoth

Re: Facing issues when using HiveIncrementalPuller

2019-12-23 Thread Vinoth Chandar
Hi Pratyaksh, HiveIncrementalPuller is just a java program. It does not need Spark, since it just runs a HiveQL query remotely.. On the error you specified, seems like it can't find the template? Can you check whether the bundle has the template file? Maybe this got broken during the bundling

Re: Re: IDE setup for code formatting

2019-12-23 Thread Vinoth Chandar
to fix this > error manually. In Apache Flink/Calcite, we also fix it manually, and will > also look for other plugins to fix import order error if exist. > > Best, > Leesf > > Vinoth Chandar 于2019年12月23日周一 下午4:55写道: > > > I understand. I am saying - we should automate all of t

Re: Re: IDE setup for code formatting

2019-12-23 Thread Vinoth Chandar
import javax.management.remote.JMXServiceURL; > > import java.io.Closeable; > import java.lang.management.ManagementFactory; > import java.rmi.registry.LocateRegistry; > > public class JmxMetricsReporter extends MetricsReporter { > > > /-----------

Re: IDE setup for code formatting

2019-12-22 Thread Vinoth Chandar
> 4) http://www.scalastyle.org/rules-1.0.0.html > > > best, > lamber-ken > On 12/23/2019 13:03,Vinoth Chandar wrote: > Hello all, > > I know a bunch of work has happened to format the code base, closer to what > other project are doing.. > > While working through

IDE setup for code formatting

2019-12-22 Thread Vinoth Chandar
Hello all, I know a bunch of work has happened to format the code base, closer to what other projects are doing.. While working through some checkstyle violations today, I noticed that the IDE formatting is now out of date with what checkstyle enforces? Manually fixing these checkstyle issues are

Re: Any one can help to add me to the project contributor group?

2019-12-22 Thread Vinoth Chandar
Hello, You should have perms now to claim JIRAs or create a new RFC say. Welcome aboard! On Sun, Dec 22, 2019 at 7:21 AM zhenyuan wei wrote: > Hello, > I've read most src code, and very happy to make some contributions > to code Any one can help to add me to the project contributor group?

Re: [QUESTION] Encountering exceptions while upserting with Deltastreamer

2019-12-21 Thread Vinoth Chandar
+1 for trimming the schema down and iterating On Thu, Dec 19, 2019 at 10:07 PM Kabeer Ahmed wrote: > Hi Nishith, > > I do not want to diverge this thread. I looked into the jira link that you > have sent which is - ( >

Re: Re: Re: Re:Re: Re: Re:Re: Re: Re: [DISCUSS] Rework of new web site

2019-12-20 Thread Vinoth Chandar
> > >> > > > >> > >Thanks. :) > >> > >best, > >> > >lamber-ken > >> > > > >> > > > >> > >At 2019-12-19 00:53:51, "Shiyan Xu" > >> wrote: > >> > >>Thank you @la

Re: [QUESTION] Handle record partition change

2019-12-18 Thread Vinoth Chandar
Interesting discussion. We can file a JIRA for option 2? It seems to also make the semantics simpler. On Wed, Dec 18, 2019 at 11:21 AM Shiyan Xu wrote: > Thanks Sivabalan. Exactly, that's what I meant. > I can think of a usecase for option 2: a Hudi dataset manages people info > and

Re: Re: Re: Re: Re: Re: Re: Re: [DISCUSS] Refactor of the configuration framework of hudi project

2019-12-18 Thread Vinoth Chandar
ken > > > > > At 2019-12-18 11:39:30, "Vinoth Chandar" wrote: > >Expect most users to use inputDF.write() approach... Uber uses the lower > >level RDD apis, like the DeltaStreamer tool does.. > >If we don't rename configs and still support a buil

Re: Re:Re: Re: Re: [DISCUSS] Rework of new web site

2019-12-17 Thread Vinoth Chandar
> >if we need the navigation bar on the right in the new UI. > > > > > >[1] https://lamber-ken.github.io/docs/admin_guide > >[2] https://lamber-ken.github.io/docs/writing_data > >[3] https://lamber-ken.github.io/docs/quick-start-guide/ > > > > > >

Re: Re: Re: Re: Re: Re: Re: [DISCUSS] Refactor of the configuration framework of hudi project

2019-12-17 Thread Vinoth Chandar
Options.PRECOMBINE_FIELD_OPT_KEY(), "timestamp") > .option(HoodieWriteConfig.TABLE_NAME, tableName) > .mode(SaveMode.Append) > .save(basePath); > > / > > > > > best, > lamb

Re: [REMINDAR] Pick up jira tickets for next release

2019-12-17 Thread Vinoth Chandar
+1 On Tue, Dec 17, 2019 at 5:52 PM leesf wrote: > Hi all, > > I am here to remind you that there are more than 40 uncompleted jira > tickets[1] target against next release. > > Considering the comming holidays, the available time for next release is > not enought. In order to drive the release

Re: Re: Re: [DISCUSS] Rework of new web site

2019-12-17 Thread Vinoth Chandar
as much as I can to keep the theming blue and > white. > > > When the above work is completed, I will notify you all again. > best, > lamber-ken > > > At 2019-12-17 12:49:23, "Vinoth Chandar" wrote: > >Hi Lamber, > > > >+1 on the look and feel. D

Re: Re: [DISCUSS] Rework of new web site

2019-12-16 Thread Vinoth Chandar
er, > > > >Thanks for your work, have gone through the new web ui, looks good. > >Hence +1 from my side. > > > >Best, > >Leesf > > > >vino yang 于2019年12月16日周一 上午10:17写道: > > > >> Hi Lamber, > >> > >> I am not an expert

Re: Re: [DISCUSS] Scaling community support

2019-12-16 Thread Vinoth Chandar
https://cwiki.apache.org/confluence/display/HUDI/Community+Support Anyone interested in giving the rotation a shot, please add your name next to a slot. Let's see how this goes in Jan and we can learn On Mon, Dec 16, 2019 at 5:27 AM Vinoth Chandar wrote: > Thanks every

Re: Re: [DISCUSS] Scaling community support

2019-12-16 Thread Vinoth Chandar
> >> > > >> > > wrote: > >> > > > >> > > > > >> > > > Regarding (1), I support the "on-call" model for answering to dev@ > >> > > > emails, triaging GH and Jira. This would help reduce > context-switch

Re: [DISCUSS] Rework of new web site

2019-12-15 Thread Vinoth Chandar
Thanks for taking the time to improve the site. Will review closely and get back to you. On Sun, Dec 15, 2019 at 11:02 AM lamberken wrote: > > > Hello, everyone. > > > Compare to the web site of Delta Lake[1] and Apache Iceberg[2], they may > looks better than hudi project[3]. > > > I delved

Re: Re: Checkstyle changes?

2019-12-14 Thread Vinoth Chandar
Is this worth documenting for everyone? On Fri, Dec 13, 2019 at 4:19 PM Sivabalan wrote: > Thanks. I had some issue w/ local intellij check style configuration. After > fixing it, everything is good. > > > On Thu, Dec 12, 2019 at 12:08 PM lamberken wrote: > > > > > You are welcome. For detail,

Re: Re: Re: Re: Re: Re: [DISCUSS] Refactor of the configuration framework of hudi project

2019-12-13 Thread Vinoth Chandar
need to change due to this, because only HoodieWriteConfig and > *Options will be kept. > > > best, > lamber-ken > > > At 2019-12-14 01:23:35, "Vinoth Chandar" wrote: > >Hi, > > > >We are trying to understand if existing jobs (datasource, deltastreamer

Re: [DISCUSS] Scaling community support

2019-12-13 Thread Vinoth Chandar
> new and most active code-reviewers. > > > > Regarding (3), How about we assume that if a ticket owner is not > > > > responding for more than 1 or 2 weeks, then they are not working on > > this > > > > and we can re-assign if it is a critical fea

Re: Re: Re: Re: Re: [DISCUSS] Refactor of the configuration framework of hudi project

2019-12-13 Thread Vinoth Chandar
> Best, > lamber-ken > > > At 2019-12-12 11:01:36, "Vinoth Chandar" wrote: > >I actually prefer the builder pattern for making the configs, because I > can > >do `builder.` in the IDE and actually see all the options... That said, > >most developers prog

Re: [DISCUSS] RFC-12 : Efficient migration of large parquet tables to Apache Hudi

2019-12-13 Thread Vinoth Chandar
+1 (per asf policy) +100 per my own excitement :) .. Happy to review this! On Fri, Dec 13, 2019 at 3:07 AM Balaji Varadarajan wrote: > With Apache Hudi growing in popularity, one of the fundamental challenges > for users has been about efficiently migrating their historical datasets to >

Re: [DISCUSS] Next Apache Release

2019-12-12 Thread Vinoth Chandar
ase manager for 0.5.1. > > Best, > Vino > > Balaji Varadarajan 于2019年12月12日周四 下午2:38写道: > > > + 1 from me as well for having @leesf be the release manager for 0.5.1. > > @leesf - Appreciate your spirit in helping Hudi community. > > Balaji.VOn Wednesday, Dec

Re: Re: Re: Re: [DISCUSS] Refactor of the configuration framework of hudi project

2019-12-11 Thread Vinoth Chandar
"10"). > >> option("hoodie.upsert.shuffle.parallelism", "10"). > >> option("hoodie.delete.shuffle.parallelism", "10"). > >> option("hoodie.bulkinsert.shuffle.parallelism", "10"). > >>

Re: [DISCUSS] Next Apache Release

2019-12-11 Thread Vinoth Chandar
+1 for leesf, driving the release.. From http://www.apache.org/dev/release-publishing.html#release_manager, it does explicitly confirm that any committer can be RM. I am happy to volunteer my services to assist leesf in the process. @all : Please speak up if you have concerns with the

Re: Re: [DISCUSS] Refactor of the configuration framework of hudi project

2019-12-11 Thread Vinoth Chandar
model. > Would you mind adding some more details on the RFC. It would save time to > read it in one place as opposed to checking out github repo :) > >Thanks,Balaji.V > >On Tuesday, December 10, 2019, 07:55:01 AM PST, Vinoth Chandar < > vin...@apache.org> wrote:

[DISCUSS] Scaling community support

2019-12-07 Thread Vinoth Chandar
Hello all, As we grow, we need a scalable way for new users/contributors to either easily use Hudi or ramp up on the project. Last month alone, we had close to 1600 notifications on commits@ and a few hundred emails on this list. In addition to authoring RFCs and implementing JIRAs, we need to

Re: Re: Error when running TestHoodieDeltaStreamer

2019-12-06 Thread Vinoth Chandar
xplore if you can reproduce this easily On Wed, Dec 4, 2019 at 6:13 AM Pratyaksh Sharma wrote: > https://jira.apache.org/jira/browse/HUDI-380 tracks this. > > On Tue, Dec 3, 2019 at 12:11 PM Pratyaksh Sharma > wrote: > > > @Vinoth Chandar > > > > Just to en

2020 ASF Community Survey

2019-12-05 Thread Vinoth Chandar
Hello everyone, If you have an apache.org email, you should have received an email with an invitation to take the 2020 ASF Community Survey. Please take 15 minutes to complete it. If you do not have an apache.org email address or you didn’t receive a link, please follow this link to the survey:

Re: [DISCUSS] Introduce stricter comment and code style validation rules

2019-12-02 Thread Vinoth Chandar
Hi Nicholas, Sorry for the late reply. Thanksgiving :) >>Now Hudi's design, in order to highlight its core components, is a patchwork of the Spark RDD API mixed with business logic scattered in multiple modules and various types of methods. Agree that the hudi-client module needs to be

Re: Re: Error when running TestHoodieDeltaStreamer

2019-12-02 Thread Vinoth Chandar
https://github.com/apache/incubator-hudi/issues/894 is also an IDE-related issue.. Maybe worth opening a usability JIRA? On Mon, Dec 2, 2019 at 10:53 AM Vinoth Chandar wrote: > FWIW I add spark jars as dependency to the hudi module, I am running and > move it to the top of the li

Re: Re: Error when running TestHoodieDeltaStreamer

2019-12-02 Thread Vinoth Chandar
FWIW I add the spark jars as a dependency to the hudi module I am running, and move them to the top of the list (so they resolve first). Then I don't see the netty issue anymore.. This is something worth documenting here? https://hudi.apache.org/contributing.html#ide-setup On Sat, Nov 30, 2019 at 4:05 AM

Re: Hudi vs AresDB?

2019-11-25 Thread Vinoth Chandar
Hi Marina, Thanks for reaching out. Hudi is now part of the Apache Software Foundation actually. :) Cutting to the chase, although you can build real-time dashboards using both, both systems are pretty different and provide different tradeoffs. For e.g: - AresDB primarily keeps data in memory

Re: [Discuss] Convenient time for weekly sync meeting

2019-11-25 Thread Vinoth Chandar
> > @Sivabalan Yes. Tuesday 9 - 10 pm PST still continues to be the 1st slot. > > > > On Tue, Nov 12, 2019 at 10:33 AM Sivabalan wrote: > > > As we work out details for 2nd slot, did we narrow down the slot for > 1st > > > one? Do we have a meeting later to

Re: [Discuss] Migrate from log4j to slf4j

2019-11-25 Thread Vinoth Chandar
Hi, It's log4j actually across the board. (I think there are a couple of files that have non-log4j loggers? Might be good to fix them to log4j as well for now, to be consistent) Nonetheless, there is a JIRA for this already https://issues.apache.org/jira/browse/HUDI-233 Main thing we need to be mindful

Re: [DISCUSS] Hide Github issues tab and Unified management of issues in JIRA

2019-11-20 Thread Vinoth Chandar
currently. > > > > +1 to introduce issue template and management bot. > > > > Best, > > Vino > > > > Vinoth Chandar 于2019年11月19日周二 上午3:23写道: > > > > > If we decide to keep GitHub Issues, both great suggestions. We should > > still >

20191119 Weekly Meeting

2019-11-19 Thread Vinoth Chandar
Hangout link here

Re: Apache project maturity model

2019-11-19 Thread Vinoth Chandar
Thanks Thomas! Will read it over and file tickets against the "release" component. On Sat, Nov 16, 2019 at 1:57 PM Thomas Weise wrote: > Hi, > > The maturity model is an (optional) framework for evaluating the project. I > would recommend to take a look and check if there are focus areas for

Re: [DISCUSS] Introduce stricter comment and code style validation rules

2019-11-18 Thread Vinoth Chandar
+1 on all three. Would there be an overhaul of existing code to add comments to all classes? We are pretty reasonable already, but good to get this in shape.
17:54:37 [incubator-hudi]$ grep -R -B 1 "public class" hudi-*/src/main/java | grep "public class" | wc -l
274
17:54:50
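
A rough sketch of the kind of check the grep above points at: count `public class` declarations and, separately, how many have a comment closing on the line immediately before them (mirroring `-B 1`). The function name and the "previous line ends with `*/`" heuristic are assumptions for illustration, not what the thread actually ran; an annotation sitting between the Javadoc and the class would be miscounted.

```python
from typing import Iterable, Tuple


def count_documented_classes(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (total public classes, classes preceded by a closing comment line).

    Hypothetical helper: treats a class as documented only when the line
    directly above it ends a block comment, like `grep -B 1` would show.
    """
    total = 0
    documented = 0
    previous = ""
    for line in lines:
        if "public class" in line:
            total += 1
            if previous.strip().endswith("*/"):
                documented += 1
        previous = line
    return total, documented


sample = [
    "/**",
    " * Reports metrics over JMX.",
    " */",
    "public class A {",
    "}",
    "",
    "public class B {",
    "}",
]
print(count_documented_classes(sample))
```

Feeding it the lines of each file under `hudi-*/src/main/java` would give a per-module view of how much comment backfill an overhaul would involve.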

Re: Reporting 0.5.0-incubating release to reporter.apache.org

2019-11-18 Thread Vinoth Chandar
https://jira.apache.org/jira/browse/HUDI-343 tracks this On Sat, Nov 16, 2019 at 1:46 PM Thomas Weise wrote: > Sorry for the late reply. > > The reporter is applicable to top level projects. > > But please create a DOAP file for Hudi, where you can also list the > release:

Re: [DISCUSS] Hide Github issues tab and Unified management of issues in JIRA

2019-11-18 Thread Vinoth Chandar
> they are indeed issues after discussion, which need extra work. > > > > So keep the issues tab open may be more convenient for common users > while a > > little more expensive to maintain two entries. > > > > Open to hearing other thoughts. > > > > Best,

Re: Spark v2.3.2 : Duplicate entries found for each primary Key

2019-11-16 Thread Vinoth Chandar
hadoop.HoodieROTablePathFilter], > > classOf[org.apache.hadoop.fs.PathFilter]); > > On Nov 15 2019, at 1:37 pm, Vinoth Chandar wrote: > > > Hi, > > > > > > are you setting the path filters when you query the Hudi Hive table via > > > Spark > > > http

EMR + HUDI

2019-11-15 Thread Vinoth Chandar
Hello all, In case you did not notice, AWS EMR now has Hudi support, which should make life easier for folks on AWS. https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hudi.html Thanks to our wonderful contributors from AWS (Udit & team) for making it happen Thanks Vinoth

Re: Default Max Events to Read from Kafka

2019-11-15 Thread Vinoth Chandar
Concurrent Writes :D The magic number 1M is from me actually :) . and there is no magic, it was picked to keep jobs from batch scanning Kafka since source-limit default was Long.MAX_VALUE (for dfs source).. I acknowledge you could go much larger. Happy to take a PR, to make this limit higher (say
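
A minimal sketch of the fallback described above, assuming the rule is "an unset source limit (Long.MAX_VALUE) is replaced by the 1M default, an explicit limit is used as given". The names `effective_event_limit` and `DEFAULT_MAX_EVENTS` are made up for illustration and are not Hudi identifiers.

```python
LONG_MAX = 2**63 - 1            # value of Java's Long.MAX_VALUE
DEFAULT_MAX_EVENTS = 1_000_000  # the "1M" default discussed in the thread


def effective_event_limit(source_limit: int,
                          default_max: int = DEFAULT_MAX_EVENTS) -> int:
    """Hypothetical helper: pick how many Kafka events one batch may read.

    An unset limit (Long.MAX_VALUE) would otherwise mean scanning the whole
    topic in one batch, so it is replaced by the default cap.
    """
    if source_limit == LONG_MAX:
        return default_max
    return source_limit


print(effective_event_limit(LONG_MAX))    # unset limit falls back to the cap
print(effective_event_limit(5_000_000))   # explicit limit is kept
```

Raising the default, as suggested in the message, would then only change `DEFAULT_MAX_EVENTS`; jobs passing an explicit limit would be unaffected.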

Re: Spark v2.3.2 : Duplicate entries found for each primary Key

2019-11-15 Thread Vinoth Chandar
Hi, are you setting the path filters when you query the Hudi Hive table via Spark http://hudi.apache.org/querying_data.html#spark-ro-view (or http://hudi.apache.org/querying_data.html#spark-rt-view alternatively)? - Vinoth On Fri, Nov 15, 2019 at 5:03 AM Purushotham Pushpavanthar <
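
Why a missing path filter can show up as duplicate keys, as a simplified sketch: a table directory can hold several versions of the same file group, and a reader that lists every file sees the same records more than once. The tuple layout `(file_id, commit_time, path)` and the function below are assumptions for illustration only; this is not the actual filter class, just the "keep the latest version per file group" idea.

```python
from typing import Dict, Iterable, List, Tuple


def latest_file_per_group(files: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Keep only the newest file of each file group.

    files: (file_id, commit_time, path) tuples; commit times are assumed to
    be fixed-width timestamp strings, so string comparison orders them.
    """
    latest: Dict[str, Tuple[str, str]] = {}
    for file_id, commit_time, path in files:
        if file_id not in latest or commit_time > latest[file_id][0]:
            latest[file_id] = (commit_time, path)
    return sorted(path for _, path in latest.values())


listing = [
    ("f1", "20191114100000", "part/f1_v1.parquet"),
    ("f1", "20191115100000", "part/f1_v2.parquet"),  # newer version of f1
    ("f2", "20191114100000", "part/f2_v1.parquet"),
]
print(latest_file_per_group(listing))
```

Reading all three files would return the records in `f1` twice; reading only the filtered list returns each key once.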

Re: [DISCUSS] RFC-10: Restructuring and auto-generation of docs

2019-11-15 Thread Vinoth Chandar
Sorry. bit late here to this party .. +1 on having the .md files alone on master along with code. Will comment on the RFC itself. Right now, we have a separate branch which may be sufficient IMO. A separate repo also means separate access control/management? We should do a better job. But we

Re: [DISCUSS] Hide Github issues tab and Unified management of issues in JIRA

2019-11-15 Thread Vinoth Chandar
Hi Vino, To echo what Nishith was saying, issues are currently only being used for support, i.e. looking at stack traces for failures, user errors. Any real work resulting from that always gets a JIRA. I mulled the same thing - disabling issues - a while back. The value I see it adding is - if you

Re: DISCUSS RFC 6 - Add indexing support to the log file

2019-11-14 Thread Vinoth Chandar
> Balaji.VOn Tuesday, October 29, 2019, 07:52:00 PM PDT, Bhavani >> Sudha wrote: >> > >> > I vote for the second option. Also it can give time to analyze on how to >> > deal with backwards compatibility. I ll take a look at the RFC later >> > tonigh

Re: aws dependencies not working for writing for S3 Write access

2019-11-14 Thread Vinoth Chandar
Hi, You might want to subscribe to the mailing list, so that the replies actually make it to the list automatically. This seems like a class version mismatch between jars, since you are getting NoSuchMethodError (and not NoClassDefFound..) We don't bundle either hadoop or aws or spark jars. There

Re: [DISCUSS] Simplification of terminologies

2019-11-13 Thread Vinoth Chandar
n wiki and decide how to > proceed. > > On November 12, 2019 at 10:04 PM Vinoth Chandar < vin...@apache.org> > wrote: > > > Thanks everyone for the feedback. Looks like we are in general agreement. > > I am inclined to just do 1 & 2 and leave COPY_ON_WRITE as is based

Re: RFC process step 1 votes

2019-11-13 Thread Vinoth Chandar
Thanks for initiating this, Raymond! I am wondering if Lazy consensus + may be just +1 from PMC/Committers is sufficient for this? Like you mentioned, given RFC review itself has approvals. We don't want to filter ideas too early without even giving a chance for proposer to fully express it in a

Re: [DISCUSS] Intent to RFC: Restructuring and auto-generation of docs

2019-11-13 Thread Vinoth Chandar
Thanks for initiating this, Ethan. Will send detailed comments in a while. @raymond, I actually think this deserves an RFC for two reasons. (1) docs is as important as code and something developers have to deal with all the time. so good to get broad feedback on this. (2) We actually expanded

Weekly sync meeting - 20191112

2019-11-12 Thread Vinoth Chandar
Starting now https://hangouts.google.com/call/9e14PAcxRb2_TXPlOcCnAEEE

Re: [DISCUSS] Simplification of terminologies

2019-11-12 Thread Vinoth Chandar
> > > file systems. > > > > > > > > > > Best, > > > > > Vino > > > > > > > > > > > > > > > Bhavani Sudha 于2019年11月12日周二 上午9:05写道: > > > > > > > > > > > +1 on all three rename proposals. I think this would make the &

Re: DISCUSS RFC 7 - Point in time queries on Hudi table (Time-Travel)

2019-11-12 Thread Vinoth Chandar
+1 Will review the RFC On Mon, Nov 11, 2019 at 11:36 PM Balaji Varadarajan wrote: > +1. This would be a powerful feature which would open up use-cases > requiring repeatable query results. > > Balaji.V > > > On Mon, Nov 11, 2019 at 8:12 AM nishith agarwal > wrote: > > > Folks, > > > >

Roadmap viz

2019-11-12 Thread Vinoth Chandar
Hello all, Since there are different aspects to the project and we have lots of new contributors, I thought it would help to look at the big picture visually and see how things connect. So I put together https://cwiki.apache.org/confluence/display/HUDI/Apache+Hudi#ApacheHudi-Roadmap and plan to

Re: [DISCUSS] New RFC? Hudi dataset snapshotter

2019-11-11 Thread Vinoth Chandar
hotCopier logic to find the latest file slices. > > So is it good to create a RFC for further discussion? > > > On Mon, Nov 11, 2019 at 4:31 PM Vinoth Chandar wrote: > > > What you suggest sounds more like an `Exporter` tool? I imagine you will > > support MOR as wel

Re: [DISCUSS] New RFC? Hudi dataset snapshotter

2019-11-11 Thread Vinoth Chandar
What you suggest sounds more like an `Exporter` tool? I imagine you will support MOR as well? +1 on the idea itself. It could be useful if plain parquet snapshot was generated as a backup. On Mon, Nov 11, 2019 at 4:21 PM Shiyan Xu wrote: > Hi All, > > The existing SnapshotCopier under Hudi

Re: [Discuss] Convenient time for weekly sync meeting

2019-11-11 Thread Vinoth Chandar
we can have an email conversation 3-4 days before the meeting to see > if there are any open items to discuss. If not, we don't necessarily need > the meeting. What do folks think ? > > Thanks, > Nishith > > On Sun, Nov 10, 2019 at 5:41 PM Vinoth Chandar wrote: > > > @kabe

[DISCUSS] Simplification of terminologies

2019-11-11 Thread Vinoth Chandar
Hello all, I wanted to raise an important topic with the community around whether we should rename some of our terminologies in code/docs to be more user-friendly and understandable.. Let me also provide some context for each, since I am probably guilty of introducing most of them in the first

Re: Migrate Existing DataFrame to Hudi DataSet

2019-11-11 Thread Vinoth Chandar
Hi, On 1. I am wondering if it's related to https://issues.apache.org/jira/browse/HUDI-83 , i.e. support for timestamps. If you can give us a small snippet to reproduce the problem, that would be great. On 2, not sure what's going on. There are no size limitations. Please check if you precombine

Re: New Committer : bhavanisudha

2019-11-07 Thread Vinoth Chandar
Congrats sudha! On Thu, Nov 7, 2019 at 5:46 PM vino yang wrote: > Congratulations Bhavani Sudha! Well deserved! > > Best, > Vino > > leesf 于2019年11月8日周五 上午9:24写道: > > > Congrats Sudha. > > > > Best, > > Leesf > > > > Balaji Varadarajan 于2019年11月8日周五 上午7:10写道: > > > > > Hello Apache Hudi

Re: [Discuss] Convenient time for weekly sync meeting

2019-11-06 Thread Vinoth Chandar
Interested. Mon-Thu 5AM-6:30AM PST Mon-Thu 9PM-10:30PM PST On Wed, Nov 6, 2019 at 12:28 PM Bhavani Sudha wrote: > Hello all, > > Currently the weekly sync meeting is scheduled to run on Tuesdays from 9pm > PST to 10 pm PST. Given our users are from multiple time zones, we can try > to see

Re: Unable to run Integration tests

2019-11-04 Thread Vinoth Chandar
fs data transient across integration test >> runs. I have removed the volumes in the compose file and updated the PR >> https://github.com/apache/incubator-hudi/pull/989 >> Hopefully, this should fix the flakiness. >> Balaji.V >> >> On Friday, November 1, 2019, 0

new committer: vinoyang/Hua Yang

2019-11-02 Thread Vinoth Chandar
Hello all, The Podling Project Management Committee (PPMC) for Apache Hudi (Incubating) has invited Hua Yang to become a committer and we are pleased to announce that he has accepted. Over the last few months, he has been a real champion for Hudi and brought a lot of great discussions to our

new committer: leesf/Shaofeng Li

2019-11-02 Thread Vinoth Chandar
Hello all, The Podling Project Management Committee (PPMC) for Apache Hudi (Incubating) has invited Shaofeng Li to become a committer and we are pleased to announce that he has accepted. Over the past few months, he has fixed a lot of issues around our test infrastructure and kickstarting

Re: [Question] Handling Avro Kafka Records with Epoch time in milliseconds

2019-11-02 Thread Vinoth Chandar
added to contributors and you have the jira now ;) On Fri, Nov 1, 2019 at 10:41 AM Gurudatt Kulkarni wrote: > Hi Vinoth, > Sure, I can work on that issue, my jira username is gurudatt. > > On Fri, Nov 1, 2019 at 10:05 PM Vinoth Chandar wrote: > > > Hi Gurudatt,

Re: [Question] Handling Avro Kafka Records with Epoch time in milliseconds

2019-11-01 Thread Vinoth Chandar
Hi Gurudatt, Right now, we are assuming longs to be unix_timestamp. It should be very easy to add support for milliseconds. https://issues.apache.org/jira/browse/HUDI-324 Do you want to give it a shot? If not, I can give you a patch. You can also create your own key extractor class .. Thanks
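
A small sketch of the seconds-versus-milliseconds distinction above, assuming the epoch field ends up as a date-based partition path. The helper name, the `unit` parameter and the `%Y/%m/%d` layout are illustrative assumptions, not the key extractor's real interface.

```python
from datetime import datetime, timezone

# Hypothetical mapping from the unit of the source field to a divisor that
# brings the value down to whole seconds.
UNIT_DIVISORS = {"seconds": 1, "milliseconds": 1000}


def partition_path(epoch_value: int, unit: str = "seconds",
                   fmt: str = "%Y/%m/%d") -> str:
    """Turn an epoch value into a date partition path (UTC)."""
    if unit not in UNIT_DIVISORS:
        raise ValueError(f"unsupported unit: {unit}")
    seconds = epoch_value // UNIT_DIVISORS[unit]
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(fmt)


print(partition_path(1572566400, "seconds"))
print(partition_path(1572566400000, "milliseconds"))
```

Both calls describe the same instant; reading the millisecond value as seconds would instead point at a date tens of thousands of years out, which is the mismatch the thread is about.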

Re: Unable to run Integration tests

2019-11-01 Thread Vinoth Chandar
a few follow ups : HUDI-322, HUDI-323 On Sat, Oct 26, 2019 at 9:36 AM Vinoth Chandar wrote: > Disabling UI is not doing the trick. I think it gets stuck while starting > up (and not while exiting like I assumed incorrectly before). > > On Fri, Oct 25, 2019 at 9:00 AM Vinoth Chandar

Re: DISCUSS RFC 6 - Add indexing support to the log file

2019-10-27 Thread Vinoth Chandar
complex. Alternatively we can keep it simple for now, disable by default and only advise to enable for new tables or when hudi version is stable On Sun, Oct 27, 2019 at 12:13 AM Vinoth Chandar wrote: > > https://cwiki.apache.org/confluence/display/HUDI/RFC-6+Add+indexing+support+to+the+lo

DISCUSS RFC 6 - Add indexing support to the log file

2019-10-27 Thread Vinoth Chandar
https://cwiki.apache.org/confluence/display/HUDI/RFC-6+Add+indexing+support+to+the+log+file Feedback welcome, on this RFC tackling HUDI-86

Re: Unable to run Integration tests

2019-10-26 Thread Vinoth Chandar
Disabling UI is not doing the trick. I think it gets stuck while starting up (and not while exiting like I assumed incorrectly before). On Fri, Oct 25, 2019 at 9:00 AM Vinoth Chandar wrote: > Could we disable the UI and try again? Its either the jetty threads or the > two HDFS threads

Re: Error while running Hive Sync (hoodie-0.4.7)

2019-10-25 Thread Vinoth Chandar
n > making Hudi work for Hadoop 3 and Hive 2.1.X and Hbase 2, Spark 2.4. Phew! > > >>> Would really love to support your use case on an official release :) > >Let me know how can I contribute. I never contributed majorly to any > OSS. > > > > On Fri, Oct 25,

Re: Unable to run Integration tests

2019-10-25 Thread Vinoth Chandar
urred the > > same errors I listed in previous mail. > > > > On Fri, Oct 25, 2019 at 8:26 AM Vinoth Chandar < > > mail.vinoth.chan...@gmail.com> wrote: > > > > > Got the integ test to hang once, at the same spot as Pratyaksh > > mentioned.. >

Re: [ANNOUNCE] Apache Hudi (incubating) 0.5.0 released!

2019-10-25 Thread Vinoth Chandar
+1 Thanks Balaji! @all Please spread the word :) https://twitter.com/apachehudi/status/1187681260405063680 https://www.linkedin.com/posts/vinothchandar_releases-activity-6593457461380898816-OoWs On Fri, Oct 25, 2019 at 4:01 AM vino yang wrote: > Thanks for your great job Balaji! Very happy

Re: Unable to run Integration tests

2019-10-24 Thread Vinoth Chandar
I’m going to look into the flaky tests on Travis sometime today. > > -Nishith > > Sent from my iPhone > > > On Oct 23, 2019, at 10:23 PM, Vinoth Chandar wrote: > > > > Just to make sure we are on the same page, > > > > can you try > > - Do : docker

Re: Error while running Hive Sync (hoodie-0.4.7)

2019-10-24 Thread Vinoth Chandar
11 > [2] https://github.com/apache/incubator-hudi/pull/625 > > > Regards, > Gurudatt > > On Mon, Oct 21, 2019 at 9:29 PM Gurudatt Kulkarni > wrote: > > > Hi Vinoth, > > > > Thank you for going back in time to figure out a way :) > > Will try out your

Re: Inline storage of parquet data in logs

2019-10-24 Thread Vinoth Chandar
hanks, > Jaimin > > On Thu, 24 Oct 2019 at 02:37, Kabeer Ahmed wrote: > > > Vinoth, > > > > Thanks for clarification. :-). > > I looked at the email from a periphery without getting into details. I > > will review it thoroughly in few days and catch up. >
