Re: [QUESTION] May I ask if the Hudi contributor JIRA group can receive the notification email.

2019-08-07 Thread Vinoth Chandar
M PDT, Bhavani Sudha > > Saktheeswaran wrote: > > > > +1 I think it would be useful > > > > On Tue, Aug 6, 2019 at 9:45 AM Vinoth Chandar wrote: > > > > > This is what I see on the Notification settings . This sort of explains > > > it..

Re: Committership guidelines

2019-08-07 Thread Vinoth Chandar
Got a few reviews on the PR. So going ahead and merging. Please feel free to still leave comments on the PR, if you still have em. On Mon, Aug 5, 2019 at 9:10 AM Vinoth Chandar wrote: > Gentle reminder to review this PR! :) Have a great week! > > On Fri, Aug 2, 2019 at 5:29 AM Vinot

Re: [DISCUSS] Decouple Hudi and Spark

2019-08-07 Thread Vinoth Chandar
pport the hoodie-spark module > as a first-class. > > Thank, > Nishith > > > On Tue, Aug 6, 2019 at 9:48 AM taher koitawala wrote: > > > Hi Vinoth, > > Are there some tasks I can take up to ramp up the code? Want to > get > > more used to the code

Re: [DISCUSS] Decouple Hudi and Spark

2019-08-06 Thread Vinoth Chandar
compatible matrix -> > > https://beam.apache.org/documentation/runners/capability-matrix/ > > > > Hence my vote on Approach 1 let's decouple and build the abstract for each > > framework. That is a much better option. We will also have more control > >

Re: [QUESTION] May I ask if the Hudi contributor JIRA group can receive the notification email.

2019-08-06 Thread Vinoth Chandar
I am an administrator. Even I don't get any emails for watches or assigned tickets. :) I would imagine such a thing would be configurable at the user level? Have you checked it out? In parallel, let me poke around the settings and see if I find something. On Tue, Aug 6, 2019 at 4:54 AM vino yang

Re: Committership guidelines

2019-08-05 Thread Vinoth Chandar
Gentle reminder to review this PR! :) Have a great week! On Fri, Aug 2, 2019 at 5:29 AM Vinoth Chandar wrote: > hello all, > > Put up a draft here https://github.com/apache/incubator-hudi/pull/823 > Please review > > /thanks/vinoth > > On Tue, Jul 30, 2019 at 9:53 PM v

Re: [DISCUSS] Decouple Hudi and Spark (HudiLink / approach)

2019-08-05 Thread Vinoth Chandar
Great discussions! Responded on the original thread on decoupling. Let's continue there? On Mon, Aug 5, 2019 at 1:39 AM Semantic Beeng wrote: > "design is more important. When we have a clear idea, it is not too late > to create an issue" > > 100% with Vino > > > On August 5, 2019 at 2:50 AM

Re: [DISCUSS] Decouple Hudi and Spark

2019-08-05 Thread Vinoth Chandar
Would like to highlight that there are two distinct approaches here with different tradeoffs. Think of this as my braindump, as I have been thinking about this quite a bit in the past. *Approach 1 : Point integration with each framework * >>We may need a pure client module named for example

Re: [DISCUSS] Decouple Hudi and Spark

2019-08-03 Thread Vinoth Chandar
Flink or any runtime stuff. On Sat, Aug 3, 2019 at 12:50 PM Vinoth Chandar < mail.vinoth.chan...@gmail.com> wrote: > Decoupling Spark and Hudi is the first step to bring in a Flink runtime, > and its also the hardest part. > > On the decoupling itself, the IOHandle classes are (al

Re: [DISCUSS] Decouple Hudi and Spark

2019-08-03 Thread Vinoth Chandar
Decoupling Spark and Hudi is the first step to bring in a Flink runtime, and it's also the hardest part. On the decoupling itself, the IOHandle classes are (almost) unaware of Spark itself, whereas the Write/ReadClient and the Table classes are very aware. First step here is to probably draw out

Re: Committership guidelines

2019-08-02 Thread Vinoth Chandar
e > > can do to encourage and recognize contributions. > > > > Thomas > > > > > > On Tue, Jul 30, 2019 at 11:30 AM Suneel Marthi > wrote: > > > > > Please go ahead - we can review the PR. > > > > > > On Tue, Jul 30, 2019 at

Re: [DISCUSS] Integrate Hudi with Apache Flink

2019-08-01 Thread Vinoth Chandar
> > And surely quite a few library version conflicts too. > > Instead we need to seek some abstractions in between them to decouple. > > Hence, the more use cases and design examples you provide the better. :-) > > @vc - thoughts? > > Kind regards > > Nick > >

Re: [DISCUSS] Integrate Hudi with Apache Flink

2019-07-31 Thread Vinoth Chandar
> > > > > Thanks for your feedback. > > > > > > > > Since there is no objection, I have created an issue on JIRA for more > > > > discussion and let us move to HUDI-184.[1] > > > > > > > > Best, > > > > Vino

Re: [DISCUSS] Refactor the package name of Hudi

2019-07-31 Thread Vinoth Chandar
s propose, if i want to introduce a new module, > shall I > > put it under package *com.uber.hoodie*? Or simply org.apache.hudi? > > > > Thanks > > Jing > > > > On Tue, Jul 30, 2019 at 10:44 AM Vinoth Chandar > wrote: > > > > > Hi, >

Re: [DISCUSS] Integrate Hudi with Apache Flink

2019-07-30 Thread Vinoth Chandar
Awesome! On Tue, Jul 30, 2019 at 11:05 AM taher koitawala wrote: > Sure would like to start one and see how it goes. > > Regards, > Taher Koitawala > > On Tue, Jul 30, 2019, 11:17 PM Vinoth Chandar wrote: > > > Glad to see this being revived again. Love to get this g

Re: Committership guidelines

2019-07-30 Thread Vinoth Chandar
n, Jul 29, 2019 at 1:54 PM Vinoth Chandar wrote: > > > Hello mentors, > > > > I realized that we have not replaced our committership criteria with new > > ones, post the move to ASF. Would like to get that ball rolling. > > > > This seems like something we vote on? what do you think? > > > > /thanks/vinoth > > >

Re: [DISCUSS] Integrate Hudi with Apache Flink

2019-07-30 Thread Vinoth Chandar
Glad to see this being revived again. Love to get this going. For reference, the old thread on this topic https://lists.apache.org/thread.html/49ba202ce7947eecbf3a800def58afabcaf1f3b30481d8c4e0f88ec7@%3Cdev.hudi.apache.org%3E If we can have some volunteers to scope and plan the work, happy to

Re: [DISCUSS] Refactor the package name of Hudi

2019-07-30 Thread Vinoth Chandar
Hi, Could not agree more. It's captured under the work for the first release already https://issues.apache.org/jira/browse/HUDI-121?filter=-1 Balaji is the RM. Plan to do this in August. One issue we realized was that we need a solid migration path, since the tables are all registered with

Committership guidelines

2019-07-29 Thread Vinoth Chandar
Hello mentors, I realized that we have not replaced our committership criteria with new ones, post the move to ASF. Would like to get that ball rolling. This seems like something we vote on? what do you think? /thanks/vinoth

Re: Request contributor permission

2019-07-29 Thread Vinoth Chandar
Done . Welcome aboard! :) On Sun, Jul 28, 2019 at 9:49 PM vino yang wrote: > Hi, > > I want to contribute to Apache Hudi. > Would you please give me the contributor permission? > My JIRA ID is yanghua. > > Best, > Vino >

Re: [VOTE] Proposal to clone default JIRA workflow for Hudi project

2019-07-25 Thread Vinoth Chandar
Hello all, The vote closes after 72 hours have passed with 4 yes (binding), 2 yes, 1 zero On Thu, Jul 25, 2019 at 12:51 PM Vinoth Chandar wrote: > +1 binding > > On Wed, Jul 24, 2019 at 10:06 AM Vinoth Chandar wrote: > >> Gentle reminder: 24 hrs to go before vote closes >

Re: [VOTE] Proposal to clone default JIRA workflow for Hudi project

2019-07-25 Thread Vinoth Chandar
+1 binding On Wed, Jul 24, 2019 at 10:06 AM Vinoth Chandar wrote: > Gentle reminder: 24 hrs to go before vote closes > > On 2019/07/23 23:38:02, Kabeer Ahmed wrote: > > +1. > > > > On Jul 23 2019, at 11:36 pm, Vinoth Chandar wrote: > > > Thanks for

Re: [VOTE] Proposal to clone default JIRA workflow for Hudi project

2019-07-24 Thread Vinoth Chandar
Gentle reminder: 24 hrs to go before vote closes On 2019/07/23 23:38:02, Kabeer Ahmed wrote: > +1. > > On Jul 23 2019, at 11:36 pm, Vinoth Chandar wrote: > > Thanks for the feedback! > > > > In case my description was misleading, only change we want to do at this

Re: Spark 2.4 and Timestamp Type

2019-07-24 Thread Vinoth Chandar
Hi Kabeer, We definitely want to get to that. But right now, the focus is on first cleaning up the poms/bundles and paving the way for upgrading Spark (which seems to be needed around this). There is one immediate step we could do though around timestamp types. If someone can make a consolidated jira

Re: [VOTE] Proposal to clone default JIRA workflow for Hudi project

2019-07-23 Thread Vinoth Chandar
, Jul 22, 2019 at 11:12 AM Vinoth Chandar wrote: > > > > Hello all, > > > > Pursuant to https://issues.apache.org/jira/browse/INFRA-18765, I would > like > > to initiate a vote to clone the default workflow for Hudi. Specifically, > > this will allow

[VOTE] Proposal to clone default JIRA workflow for Hudi project

2019-07-22 Thread Vinoth Chandar
Hello all, Pursuant to https://issues.apache.org/jira/browse/INFRA-18765, I would like to initiate a vote to clone the default workflow for Hudi. Specifically, this will allow us to make changes like introducing new statuses, enabling anyone to change the ticket status etc and truly customize

Re: Spark Configuration

2019-07-19 Thread Vinoth Chandar
9 AM Amarnath Venkataswamy < > amarnath.venkatasw...@gmail.com> wrote: > > > yes.I am looking for the same thing only. > > > > On Thu, Jul 18, 2019 at 9:20 PM Vinoth Chandar > wrote: > > > >> No real reason. If you notice a sample configuration is presen

Re: Setting up a JIRA board

2019-07-18 Thread Vinoth Chandar
It's up now, here: http://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=335 It should be very easy to follow work in flight using that. On Wed, Jul 17, 2019 at 5:14 AM Vinoth Chandar wrote: > Hi all, > > Since we have a lot going on, took a stab at setting up a JIRA board for

Re: Spark Configuration

2019-07-18 Thread Vinoth Chandar
https://cwiki.apache.org/confluence/display/HUDI/Tuning+Guide https://hudi.apache.org/performance.html are good resources for what you need. On Thu, Jul 18, 2019 at 7:37 AM Amarnath Venkataswamy < amarnath.venkatasw...@gmail.com> wrote: > Hi > > Can you anyone of you share the Spark

Setting up a JIRA board

2019-07-17 Thread Vinoth Chandar
Hi all, Since we have a lot going on, took a stab at setting up a JIRA board for us to see the work in flight, assignees for easier collaboration. Ran into a bunch of snags reported here https://issues.apache.org/jira/browse/INFRA-18765 if you have any suggestions, please chime in here or on the

Re: Block bloom filters vs conventional bloom filters

2019-07-16 Thread Vinoth Chandar
Thanks for sending it along. Looks interesting. Knee deep in cleaning up poms and other stuff. Will read more closely and get back to you. :) On Mon, Jul 15, 2019 at 10:52 AM Prasanna wrote: > Hello Folks, > > http://algo2.iti.kit.edu/documents/cacheefficientbloomfilters-jea.pdf > > Looks like

Re: Request help testing new pom/bundles

2019-07-15 Thread Vinoth Chandar
Manu Zhang wrote: > Linking the jira issue id in the PR/commit will help to avoid duplicating > work. I think it's good to enforce this rule as a GitHub PR template. > > Thanks, > Manu > > On Tue, Jul 16, 2019 at 3:13 AM Vinoth Chandar wrote: > > > Thanks for co

Re: Request help testing new pom/bundles

2019-07-15 Thread Vinoth Chandar
Thanks for confirming. https://github.com/apache/incubator-hudi/pull/751 passed the demo and few others tests. Late Friday, also had a chat with the author of https://github.com/apache/incubator-hudi/pull/780 to rework that based off this PR. On Thu, Jul 11, 2019 at 11:32 PM Bhavani Sudha

Request help testing new pom/bundles

2019-07-11 Thread Vinoth Chandar
Hello all, https://issues.apache.org/jira/browse/HUDI-159 is an effort around looking at all the past bug reports around jar mismatch issues different users have hit. To this end, we have cleaned up the pom and bundles substantially in https://github.com/apache/incubator-hudi/pull/751 . I

Re: Add checkpoint metadata while using HoodieSparkSQLWriter

2019-07-11 Thread Vinoth Chandar
This took about 38 minutes. You can see the details from the UI provided > below and the schema have 20 columns. > > Thanks for your consideration. > > kind regards, > > > > On Thu, Jul 11, 2019 at 12:28 AM Vinoth Chandar wrote: > >> Hi, >> >> >>A

Re: Add checkpoint metadata while using HoodieSparkSQLWriter

2019-07-10 Thread Vinoth Chandar
to set up and run deltastreamer in > >> continuous mode and ingest fake data in the following gist > >> https://gist.github.com/bvaradar/c5feec486fd4b2a3dac40c93649962c7 > >> > >> We will eventually get this to project wiki. > >> Balaji.V > >> >

kanban board

2019-07-09 Thread Vinoth Chandar
Hello all, To increase visibility into what's being worked on, I created a simple board here https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=334 what do you all think? (works as long as assignee hits "start progress" on issues being worked on. I have not spent time customizing it

Re: Out of Heap Error when inserting into Hudi dataset

2019-07-08 Thread Vinoth Chandar
I can help you out. :) Created https://issues.apache.org/jira/browse/HUDI-164 to track this. Please share your jira ID and I will assign it to you. :) We can write a simple loop that looks for the first non-zero size commit or fallback to default configs. On Wed, Jul 3, 2019 at 8:35 AM Kabeer
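
The "simple loop" mentioned above could look roughly like the following. This is only a minimal sketch in Python, not the eventual HUDI-164 change: `CommitStats` and `first_nonzero_commit` are hypothetical names, and the shape of the commit history is assumed for illustration.

```python
from typing import Iterable, NamedTuple, Optional


class CommitStats(NamedTuple):
    """Hypothetical summary of one commit; not a Hudi class."""
    commit_time: str
    bytes_written: int


def first_nonzero_commit(commits: Iterable[CommitStats]) -> Optional[CommitStats]:
    """Return the first commit that wrote any bytes, or None if there is none."""
    for commit in commits:
        if commit.bytes_written > 0:
            return commit
    return None


# Assumed ordering: newest first; the two most recent commits are empty.
history = [
    CommitStats("20190703", 0),
    CommitStats("20190702", 0),
    CommitStats("20190701", 4096),
]
chosen = first_nonzero_commit(history)
# When nothing qualifies, the caller would fall back to its default configs.
source = chosen.commit_time if chosen is not None else "default configs"
```

Returning `None` rather than a fabricated value keeps the fallback decision with the caller, which is where the default configs live.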

Re: Podling Report Reminder - July 2019

2019-07-01 Thread Vinoth Chandar
Thanks! Shall I mark your name in the “signed-off by” section myself, or shall I leave it to you? On Sat, Jun 29, 2019 at 9:53 PM Suneel Marthi wrote: > lgtm - ship it > > On Sat, Jun 29, 2019 at 9:52 PM Vinoth Chandar wrote: > > > Just a gentle reminder :) > > > > On Wed,

Re: Schema compatibility

2019-06-28 Thread Vinoth Chandar
to ingest > > records with old schema. > > Thanks, Balaji.V On Wednesday, June 26, 2019, 10:52:16 AM PDT, Vinoth > > Chandar wrote: > > > > are you using DeltaStreamer with the Confluent Schema Registry? I think > > you > > read the stack trace right

Re: Schema compatibility

2019-06-26 Thread Vinoth Chandar
are you using DeltaStreamer with the Confluent Schema Registry? I think you read the stack trace right.. Schema Registry may be using the latest schema instead of reading it using the schema the record was written in. I remember Balaji alluded to a (related?) issue around this.. balaji? On Wed,

Re: Hudi Documentation feedback

2019-06-26 Thread Vinoth Chandar
Very impressive! thanks for sharing! I am sure there are others in the list who are struggling with CDC as well. Do you mind if I add this to the http://hudi.apache.org/powered_by.html#articles page? (also have few other articles to add as well) Metorikku sounds interesting as well. On Wed, Jun

Re: Hoodie dataset write without partition

2019-06-24 Thread Vinoth Chandar
Amarnath, Mind sending a PR with updated docs once you get it working? :) might be useful for others too. Non partitioned tables have come up few times now On Mon, Jun 24, 2019 at 2:57 PM vbal...@apache.org wrote: > > Hi Amarnath, > Apart from changing the partition extractor class, you

Re: Add checkpoint metadata while using HoodieSparkSQLWriter

2019-06-20 Thread Vinoth Chandar
> spark UI which can be found at the following link > > > https://github.com/apache/incubator-hudi/issues/714. > > > In the UI, it seems that the ingestion with the data source API is > > > spending much time in the count by key of HoodieBloomIndex and

Re: KEEP_LATEST_COMMIT vs KEEP_LATEST_VERSION

2019-06-15 Thread Vinoth Chandar
am tools so KEEP_LATEST_FILE_VERSIONS > and CLEANER_FILE_VERSIONS_RETAINED_PROP = "1" works. > > Also I can control when to start consuming data from downstream jobs so I > don’t face issue with files deleted while running query etc. > > > On Thursday, 13 June 2019, Vinoth Chan

Re: KEEP_LATEST_COMMIT vs KEEP_LATEST_VERSION

2019-06-12 Thread Vinoth Chandar
ire KEEP_LATEST_FILE_VERSIONS and some > people might find it useful as I do. > > Thanks! > Gary > > > On Tue, Jun 11, 2019 at 9:20 AM Vinoth Chandar wrote: > > > Cool. So, cleaning policy determines how we clean up older versions of > file > > groups (simpl

Re: Possible ambiguity in HoodieKey

2019-06-11 Thread Vinoth Chandar
sg. Look forward to the PR :) On Tue, Jun 11, 2019 at 9:19 AM Jaimin Shah wrote: > Sure I will update once I make those changes thanks. > > On Tuesday, 11 June 2019, Vinoth Chandar wrote: > > > Thanks for the link. I was grabbing in parallel as well :) > > > > So

Re: KEEP_LATEST_COMMIT vs KEEP_LATEST_VERSION

2019-06-11 Thread Vinoth Chandar
Hi Gary, Do you mean cleaning policy? KEEP_LATEST_FILE_VERSIONS vs KEEP_LATEST_COMMITS ? Thanks Vinoth On Mon, Jun 10, 2019 at 9:47 PM Gary Li wrote: > Hello, > > I am a little confused when I was looking at the compaction policy. What is > the difference between KEEP_LATEST_COMMIT vs

Re: Possible ambiguity in HoodieKey

2019-06-11 Thread Vinoth Chandar
Hi Jaimin, True. Is this a custom class you have? if we separate the concatenation by a standard special character, it should be fine? for e.g CA#US, C#AUS ? Thanks Vinoth On Mon, Jun 10, 2019 at 4:53 AM Jaimin Shah wrote: > Hi > I was going through the ComplexKeyGenerator class. I found
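
The ambiguity and the separator fix discussed above can be shown with plain strings. This is a minimal Python sketch, not the ComplexKeyGenerator code; the helper names are made up for illustration, and `#` is simply the separator used in the example in the message.

```python
from typing import Sequence


def join_plain(parts: Sequence[str]) -> str:
    """Concatenate key fields with nothing in between (the ambiguous form)."""
    return "".join(parts)


def join_with_separator(parts: Sequence[str], sep: str = "#") -> str:
    """Concatenate key fields with a separator between them."""
    return sep.join(parts)


# Two different field combinations collapse to the same plain key.
collision = join_plain(["CA", "US"]) == join_plain(["C", "AUS"])

# With a separator the two keys stay distinct: CA#US vs C#AUS.
key_a = join_with_separator(["CA", "US"])
key_b = join_with_separator(["C", "AUS"])
```

One caveat: the separator only helps as long as the field values themselves never contain it. `["A#B", "C"]` and `["A", "B#C"]` would still both produce `A#B#C`, so the character has to be reserved or escaped.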

Re: Add checkpoint metadata while using HoodieSparkSQLWriter

2019-06-03 Thread Vinoth Chandar
ear real time analytics, can > we consider Hudi as a batch job? > > Kind regards, > > > On Thu, May 30, 2019 at 5:52 PM Vinoth Chandar wrote: > > > Hi, > > > > Short answer, by default any parameter you pass in using option(k,v) or > > options()

Re: Strange exception after upgrade to 0.4.7

2019-06-01 Thread Vinoth Chandar
ards > > Yuanbin Cheng > CR/PJ-AI-S1 > > > > -Original Message- > From: Vinoth Chandar > Sent: Friday, May 31, 2019 1:19 AM > To: dev@hudi.apache.org > Subject: Re: Strange exception after upgrade to 0.4.7 > > Hi, > > This does sound like a jar mism

Re: 0.4.6 release and upcoming 0.4.7

2019-05-31 Thread Vinoth Chandar
for a bug fix (although it should be simple to build it yourself as well) On Tue, May 28, 2019 at 8:07 PM Vinoth Chandar wrote: > Following up on this. 0.4.7 is out > https://github.com/apache/incubator-hudi/releases/tag/hoodie-0.4.7 > > All future releases will be apache releases, with B

Re: Strange exception after upgrade to 0.4.7

2019-05-31 Thread Vinoth Chandar
n > 1. GitHub repository compiled on my laptop > 2. Source Code of the 0.4.7 compiled on my laptop > All worked very well. > > Maybe, it because of the Maven release. > > Best regards > > Yuanbin Cheng > CR/PJ-AI-S1 > > > > -Original Message- > From:

Re: Add checkpoint metadata while using HoodieSparkSQLWriter

2019-05-30 Thread Vinoth Chandar
Hi, Short answer, by default any parameter you pass in using option(k,v) or options() beginning with "_" would be saved to the commit metadata. You can change "_" prefix to something else by using the DataSourceWriteOptions.COMMIT_METADATA_KEYPREFIX_OPT_KEY(). Reason you are not seeing the
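
The prefix rule described above amounts to a filter over the writer options. The sketch below only illustrates that described behaviour in Python and is not Hudi's implementation: `select_commit_metadata` is a hypothetical helper, and both option keys are invented for the example.

```python
from typing import Dict


def select_commit_metadata(options: Dict[str, str], key_prefix: str = "_") -> Dict[str, str]:
    """Keep only the options whose key starts with the metadata prefix."""
    return {key: value for key, value in options.items() if key.startswith(key_prefix)}


write_options = {
    "_checkpoint": "offset-42",      # hypothetical user key: would be saved
    "some.writer.setting": "value",  # hypothetical key: not saved as metadata
}

saved = select_commit_metadata(write_options)
# A different prefix changes which keys qualify.
saved_custom = select_commit_metadata({"meta.batch": "7", "_x": "1"}, key_prefix="meta.")
```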

Re: Strange exception after upgrade to 0.4.7

2019-05-29 Thread Vinoth Chandar
Also curious if this error does not happen with 0.4.6? Can you please confirm that? It would be helpful to narrow it down On Wed, May 29, 2019 at 6:25 PM vbal...@apache.org wrote: > Hi Yuanbin, > > Not sure if I completely understood the problem. Are you using > "com.uber.hoodie" format for

Re: 0.4.6 release and upcoming 0.4.7

2019-05-28 Thread Vinoth Chandar
Following up on this. 0.4.7 is out https://github.com/apache/incubator-hudi/releases/tag/hoodie-0.4.7 All future releases will be apache releases, with Balaji as our release manager for the first release. On Tue, May 14, 2019 at 8:28 AM Vinoth Chandar wrote: > Hello all, > > 0.4.

Re: convert existing Parquet table using Hive/MR

2019-05-22 Thread Vinoth Chandar
Hi, Unfortunately not. Hudi writing is a spark job. With just MR/Hive, we were unable to implement features like file sizing/index lookup which need shuffling of data. Hive OutputFormat for e.g, does not allow to be extended in this fashion. Thanks Vinoth On Wed, May 22, 2019 at 12:55 AM

Re: [DISCUSS] Faster Hive incremental pull queries

2019-05-20 Thread Vinoth Chandar
I just gave you wiki access. can you try again ? On Mon, May 20, 2019 at 10:53 AM Bhavani Sudha Saktheeswaran wrote: > Hi, > > I am trying to create a HIP in cwiki ( username: bhasudha) . Seems like I > need some access to create a HIP. Can you grant me permission ? > > Thanks, > Sudha > > On

Re: Read RO table in Spark as hive table | No records returned

2019-05-20 Thread Vinoth Chandar
veTable and manage it programmatically. Any > example would help. > Or others who can chip in here and say that they have used APIs to drive > this would help me to definitively spend time on this. > Thanks > Kabeer. > > On May 17 2019, at 4:03 pm, Vinoth Chandar wrote: > > Glad you got

Re: Questime abou the Payload in Hudi

2019-05-19 Thread Vinoth Chandar
ave some discussion about it. I am glad to make a > patch about it. > > Thanks so much for the reply and help. > > Mit freundlichen Grüßen / Best regards > > Yuanbin Cheng > CR/PJ-AI-S1 > > > > -Original Message- > From: Vinoth Chandar > Sent: Friday, May 17, 2019

Re: Upgrade HUDI to Hive 2.x

2019-05-17 Thread Vinoth Chandar
I am in favor of deprecating Hive 1.x unless someone has a strong objection. Most cloud offerings like EMR/Data Proc all support Hive 2.x and Hive 3.x is going to grow. This seems like a move in the right direction /thanks/vinoth On Fri, May 17, 2019 at 11:55 AM nishith agarwal wrote: > All, >

Re: Read RO table in Spark as hive table | No records returned

2019-05-17 Thread Vinoth Chandar
Glad you got it working.. Any reason why you are not using the Hive sync tool to manage the table creation/registration to Hive? On Fri, May 17, 2019 at 7:04 AM satish.sidnakoppa...@gmail.com < satish.sidnakoppa...@gmail.com> wrote: > > > On 2019/05/17 12:45:26, satish.sidnakoppa...@gmail.com <

Re: Questime abou the Payload in Hudi

2019-05-17 Thread Vinoth Chandar
could change this behavior to match pre-combining. Are you interested in sending a patch? Thanks Vinoth On Fri, May 17, 2019 at 7:18 AM Vinoth Chandar wrote: > Thanks for the clear example. Let me check this out and get back shortly. > > On Thu, May 16, 2019 at 5:29 PM Yanjia Li > wrote

Re: Questime abou the Payload in Hudi

2019-05-17 Thread Vinoth Chandar
d similar issue > before. > > Thanks so much! > Gary > > On Thu, May 16, 2019 at 2:49 PM Vinoth Chandar wrote: > > > Hi, > > > > (Please subscribe to the mailing list, so the message actually comes over > > directly to the list.) > >

Re: Request to join slack group

2019-05-16 Thread Vinoth Chandar
Glad to have you on board as well! Added you to slack On Wed, May 15, 2019 at 11:09 PM Yanjia Li wrote: > Hello All, > > My name is Gary from Bosch. I am currently working on migrating Hudi to our > current data platform. Glad to join Hudi community. > > May I get an invitation to the Slack

Re: Hudi on Mapr

2019-05-15 Thread Vinoth Chandar
t format). > > If we are not able to solve this problem we have to rebuild the target > every time. > > I am able to complete all the way to quick start steps successfully. > > I would appreciate it if you could provide some time to discuss further over the phone > or Webex. > > >

Re: Travis CI integration

2019-05-14 Thread Vinoth Chandar
yes. but the travis account is from the old location of the repo i.e Uber's. We are trying to see if we can avail the ci infra in apache.. Balaji probably can answer the Jenkins part. On Sun, May 12, 2019 at 5:36 PM Thomas Weise wrote: > Hi, > > Isn't the Travis integration already working? >

Re: Last commit id/ts checkpoint for incremental pull

2019-05-14 Thread Vinoth Chandar
ting the read. > > However, if we were running a MOR table (and provided the compaction job > has not run in between step 1 and step 2), we would receive the value of > row 1 at state c3. > > Is this correct? > > Roshan > > > On Thu, May 9, 2019 at 3:39 AM Vinoth Cha

Re: Data change events table in Hudi

2019-05-14 Thread Vinoth Chandar
les rather than coordinate > an off-peak slot to run compaction on the read-optimized view? > > Rishan > > On Wed, May 15, 2019 at 12:05 AM Vinoth Chandar wrote: > > > Hi Roshan, > > > > Good point. Actually the incremental view + either > read-optimiz

Re: Hudi on Mapr

2019-05-14 Thread Vinoth Chandar
Hi Amarnath, Do you mean mapr fs? If so, I don't think this has come up before. I think it should be doable though. Can you provide more details maybe? Thanks Vinoth On Tue, May 14, 2019 at 8:18 AM Amarnath Venkataswamy < amarnath.venkatasw...@gmail.com> wrote: > Hi > > Is there anyone

Re: [DISCUSS] Steps to making the first Apache release

2019-05-09 Thread Vinoth Chandar
://www.apache.org/dev/release-publishing.html > > Thanks, > Thomas > > > > > On Fri, Apr 26, 2019 at 11:29 AM Vinoth Chandar wrote: > > > Hello all, > > > > Starting this thread to discuss the prep work needed to make code > suitable >

Re: Supporting Collapse type operation for better data layout

2019-05-08 Thread Vinoth Chandar
This would be an exciting project! On Wed, May 8, 2019 at 5:19 PM nishith agarwal wrote: > High level requirements : > > 1. Write larger files while keeping the ingestion & query latencies low > 2. Better data layout, for eg.,when rewriting smaller files to larger ones, > piggyback on the I/O

Re: Last commit id/ts checkpoint for incremental pull

2019-05-07 Thread Vinoth Chandar
Hi Roshan, Thanks for writing. Yes. the user needs to manage the _commit_time watermark on the HiveIncrementalPuller path. Also you need to set the table in incremental mode, providing a start commit_time and max_commits to pull as documented. The DeltaStreamer tool will manage it for you
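
Managing the watermark by hand, as described above, boils down to remembering the last commit time pulled and asking for at most `max_commits` newer ones on each run. This is a rough Python sketch under stated assumptions, not the HiveIncrementalPuller code: `next_pull_window` is a hypothetical helper, commit times are assumed to be equal-length timestamp strings so that string order matches time order, and the start commit is assumed to be exclusive.

```python
from typing import List, Sequence, Tuple


def next_pull_window(
    commit_times: Sequence[str], last_pulled: str, max_commits: int
) -> Tuple[List[str], str]:
    """Pick the next batch of commits to pull and the watermark to store afterwards."""
    newer = sorted(t for t in commit_times if t > last_pulled)
    batch = newer[:max_commits]
    # If nothing new arrived, the watermark stays where it was.
    new_watermark = batch[-1] if batch else last_pulled
    return batch, new_watermark


timeline = ["20190423163429", "20190423164428", "20190423165000"]
batch, watermark = next_pull_window(timeline, last_pulled="20190423163429", max_commits=1)
```

The caller would persist `watermark` after a successful pull and pass it back in as `last_pulled` on the next run.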

Re: Data change events table in Hudi

2019-05-07 Thread Vinoth Chandar
Thanks for starting the thread, Minh! We do the same thing at Uber actually. It's handy to join these two at times and it's a common pattern, so I am curious to know what others think. DeltaStreamer option seems like a good idea. Some implementation considerations on how we configure this second table

Re: How to capture the actual query in hive-plugin

2019-05-06 Thread Vinoth Chandar
Hi Anshuman, Not sure about the presto internals needed to accomplish your goal. But I can talk about the annotation I added back then. As long as your CustomInputFormat adds it, Presto will fallback to obtaining splits using that, instead of listing filesystem on its own. Hope that helps Thanks

Re: [IMP] Understanding present state and planning ahead

2019-05-01 Thread Vinoth Chandar
knowing the 5 non-uber production cases, from the survey, would encourage a lot of users to join our community as well!! :) On Tue, Apr 23, 2019 at 11:18 AM Vinoth Chandar wrote: > Bumping this thread again. We got 9 responses so far, with 5 production > use-cases. > > One more hu

Re: About github issue 639

2019-05-01 Thread Vinoth Chandar
nting out the top 100 errors") ... > ''' > Balaji.V > > On Tuesday, April 30, 2019, 8:17:57 AM PDT, Vinoth Chandar < > vin...@apache.org> wrote: > > Hi Jun, > > Basically you are saying streaming path leaves some inflights behind.. let >

Re: Starting point for contribution

2019-04-30 Thread Vinoth Chandar
+1 if you can find a JIRA that interests you, we'd be happy to discuss it over the mailing list before you begin working and offer some guidance if needed as well. Welcome to the team! On Tue, Apr 30, 2019 at 4:54 PM vbal...@apache.org wrote: > > Hi Abhishek, > Great to see that you are

Re: About github issue 639

2019-04-30 Thread Vinoth Chandar
am I think. > Regards, > Jun > > > On Tue, Apr 30, 2019 at 2:46 AM Vinoth Chandar wrote: > > > Another option to try would be setting the > > spark.sql.hive.convertMetastoreParquet=false, if you are querying via the > > Hive table registered by Hudi. > >

Re: About github issue 639

2019-04-29 Thread Vinoth Chandar
reduce.input.pathFilter.class", > classOf[com.uber.hoodie.hadoop.HoodieROTablePathFilter], > classOf[org.apache.hadoop.fs.PathFilter]);` from the phenomenon, the > config did not take effects maybe. > > On Sat, Apr 27, 2019 at 12:09 AM Vinoth Chandar wrote: > > >

Re: [DISCUSS] Creating HIPs on cWIKI instead of Google Docs

2019-04-26 Thread Vinoth Chandar
my iPhone > > > > > On Apr 26, 2019, at 11:39 AM, Balaji Varadarajan > > wrote: > > > > > > > > > +1 on using cwiki for HIPsi. Avoids unnecessary copying work and keeps > > all information (in-progress/complete) in one place. > > > Bal

[DISCUSS] Steps to making the first Apache release

2019-04-26 Thread Vinoth Chandar
Hello all, Starting this thread to discuss the prep work needed to make code suitable and easy for our first Apache release. First good step would be to rename the packages to org.apache.hudi? Here's a list of items that may need to happen before that. - Cut a final release on com.uber.hoodie

[DISCUSS] Creating HIPs on cWIKI instead of Google Docs

2019-04-26 Thread Vinoth Chandar
Hello all, Like to propose modifying the process to create HIPs right on cWiki, since it already supports commenting/resolving flows and it eliminates need to format content from gdocs to cWiki again. I also noticed a wealth of tools on cWiki for diagrams etc.. we can leverage all that in one

Re: Reading Merge_on_read table| Unable to read updated records after multiple updates

2019-04-26 Thread Vinoth Chandar
updated record are fetched from log1 file. > > Only after third update both the updates are placed in log files. > > > > > On Fri 26 Apr, 2019, 6:30 PM Vinoth Chandar > > Looks like you are querying the RO table? If so, the query only hits > > parquet file; which was prob

Re: About github issue 639

2019-04-26 Thread Vinoth Chandar
> 2019-04-24 00:35:18 881925 20190423163429.deltacommit > > 2019-04-24 00:46:14 2991 20190423164428.clean > > 2019-04-24 00:45:44 888025 20190423164428.deltacommit > > Thanks, > Jun > > On 2019/04/18 14:29:23, Vinoth Chandar wrote: > > Hi

Re: Reading Merge_on_read table| Unable to read updated records after multiple updates

2019-04-26 Thread Vinoth Chandar
https://github.com/apache/incubator-hudi/issues/652#issuecomment-487016906 Looks like Nishith and you were chatting about this here. On Fri, Apr 26, 2019 at 6:00 AM Vinoth Chandar wrote: > Looks like you are querying the RO table? If so, the query only hits > parquet file; which was pr

Re: How to use HoodieDeltaStreamer for upsert on JsonDFSSource

2019-04-24 Thread Vinoth Chandar
The demo here https://hudi.apache.org/docker_demo.html actually invokes this path.. Is that helpful? Balaji, please correct me if I am wrong. Thanks Vinoth On Wed, Apr 24, 2019 at 4:07 AM Jack Wang wrote: > Hi forks, > > Doesn't anyone know how to use HoodieDeltaStreamer for upsert on >

Re: Hudi CLI doesn't work for dedup

2019-04-24 Thread Vinoth Chandar
Hi Jack, Trying to understand what your goal is. DeDupeSparkJob is used for repairing datasets with repairs.. Is this your intention? Mailing list does not show images. :( Can you please post the code here or in a gist? I can take a look. Thanks Vinoth On Wed, Apr 24, 2019 at 3:57 AM Jack Wang

Re: Hudi support for records deduplication

2019-04-24 Thread Vinoth Chandar
Hi Li, Welcome. Both the delta streamer and data source support an option to de-duplicate data before inserting. How are you planning on writing the Hudi dataset? I can point you in the right direction accordingly Thanks Vinoth On Tue, Apr 23, 2019 at 4:12 PM Li Gao wrote: > Hi Hudi

Re: About hive table column

2019-04-24 Thread Vinoth Chandar
Hi Jun, We assign a seq_no to each record upserted in each commit. Use cases we had/have around this are to be able to build windowing/incremental consumption at record level and not commit level as we do now. Hope that helps. Thanks Vinoth On Tue, Apr 23, 2019 at 8:24 PM Jun Zhu wrote: >

Re: [IMP] Understanding present state and planning ahead

2019-04-23 Thread Vinoth Chandar
, folks please take 2 mins to fill out the survey. > > > > Thanks, > > Nishith > > > > On Sat, Apr 6, 2019 at 11:47 PM Kabeer Ahmed > wrote: > > > > > +1. Just submitted my input to this form. > > > On Apr 5 2019, at 5:46 pm, Vinot

Re: Not able to find HoodieJavaApp

2019-04-23 Thread Vinoth Chandar
Hoodie spark datasource to upsert delete as of now. > > Regards, > Umesh > > On Mon, Apr 22, 2019 at 8:24 PM Vinoth Chandar wrote: > > > Hi Umesh, > > > > This is on top of my list of the week. But If you already have input data > > somewhere on s3/hdfs,

Re: Please add me to Hudi slack community

2019-04-22 Thread Vinoth Chandar
Done! On Mon, Apr 22, 2019 at 4:20 PM Abhishek Sharma wrote: > Hi, > > Please add me to the Apache Hudi slack community. Email id is > *abhioncbr.apa...@gmail.com > * > > Thanks >

Re: Request for Hudi Jira contibutor

2019-04-22 Thread Vinoth Chandar
Welcome! added! On Mon, Apr 22, 2019 at 1:14 PM Abhishek Sharma wrote: > Hi, > > I want to contribute to Hudi project. Please provide me the contributor > permission. > My Jira ID is *abhioncbr* > > Thanks > Abhishek >

Re: Not able to find HoodieJavaApp

2019-04-22 Thread Vinoth Chandar
sh Kacha wrote: > > > Thanks Vinoth yes please that would be great HoodieJavaApp moved out of > > test and working. > > > > On Sat, Apr 20, 2019, 6:09 AM Vinoth Chandar < > > mail.vinoth.chan...@gmail.com> wrote: > > > >> Sorry. Not following. If you

Re: Error updating partition

2019-04-21 Thread Vinoth Chandar
Hi Liran, Can you please provide us with the full stack trace? Are you running off 0.4.6-SNAPSHOT or 0.4.5 release? Thanks Vinoth On Sun, Apr 21, 2019 at 4:58 PM Liran Yogev wrote: > Hi! > Just tried hudi in production and I'm getting a lot of these errors: > > ... > *19/04/21 11:42:39 ERROR

Re: Not able to find HoodieJavaApp

2019-04-19 Thread Vinoth Chandar
.5.jar would do that since it's an uber > jar but it's not recently I found I had to add spark maven coordinates > separately in pom file. Anyways if you can give me list of jars I can put > in a classpath and run. > > On Fri, Apr 19, 2019, 11:40 PM Vinoth Chandar wrote: > >

Re: Not able to find HoodieJavaApp

2019-04-18 Thread Vinoth Chandar
Hi Umesh, IIUC, your suggestion is without the need to checkout/build source code, one should be able to run the sample app? That does seem fair to me. We had to move test data generator out of tests to place this under source code. I am hoping something like hoodie-bench could be a more

Hudi reddit thread

2019-04-17 Thread Vinoth Chandar
Ran into this interesting thread today https://www.reddit.com/r/bigdata/comments/baxx9q/uber_hudi/ Just sharing

Re: Can we bulk insert without using physical partition column/folder in hdfs?

2019-04-16 Thread Vinoth Chandar
Hi Umesh, Could you share the exact error? Also have you tried using com.uber.hoodie.NonpartitionedKeyGenerator.class as the key generator class? >> Is string allowed as time key column? Sorry not fully following. Please clarify. Thanks Vinoth On Tue, Apr 16, 2019 at 11:02 AM Umesh Kacha
