[Question] How to use Hudi for migrating a historical mysql table?

2020-08-21 Thread Gurudatt Kulkarni
Hi All, I have a use case where historical data is available in a MySQL table, which is being populated from a Kafka topic. My plan is to create a Spark job that will migrate the data from MySQL using the Hudi datasource. Once the migration of historical data from MySQL is done, use Deltastreamer to

Re: DISCUSS code, config, design walk through sessions

2020-07-10 Thread Gurudatt Kulkarni
If possible, recordings of these sessions would be great, to bridge the timezone gap. On Friday, July 10, 2020, Pratyaksh Sharma wrote: > @Vinoth Chandar Time zones are indeed tricky. Maybe we > can do a poll again to decide on the time for these sessions given the > community size has increased

Re: DISCUSS code, config, design walk through sessions

2020-07-06 Thread Gurudatt Kulkarni
+1 Really a great idea. Will help in understanding the project better. On Mon, Jul 6, 2020 at 1:35 PM Pratyaksh Sharma wrote: > This is a great idea and really helpful one. > > On Mon, Jul 6, 2020 at 1:09 PM wrote: > > > +1 > > It can also attract more partners to join us. > > > > > > > > On

[DISCUSS] Can Hudi DAG be written in Apache Beam?

2020-02-07 Thread Gurudatt Kulkarni
Hi All, I am just putting out my thoughts based on my primitive knowledge about Beam/Hudi. I saw some discussions around making Hudi compatible with Flink. Is it possible to make Hudi compatible with Beam, in turn making Hudi compatible with Spark/Flink/(others)? Is it worth exploring in this

Re: [QUESTION] Why is TimestampBasedKeyGenerator part of hudi-utilities?

2019-12-05 Thread Gurudatt Kulkarni
; DataSourceWriteOptions (hoodie.datasource.xxx) for configuring > TimestampBasedKeyGenerator as part of datasource write. > Balaji.V > On Wednesday, December 4, 2019, 10:30:40 PM PST, Gurudatt Kulkarni < > guruak...@gmail.com> wrote: > > Hi All, > > All other key generators are

[QUESTION] Why is TimestampBasedKeyGenerator part of hudi-utilities?

2019-12-04 Thread Gurudatt Kulkarni
Hi All, All other key generators are part of hudi-spark except TimestampBasedKeyGenerator, which causes issues when using just hudi-spark directly in a Spark job. Is there any specific reason for this? Can we move it to the hudi-spark module? Regards, Gurudatt

Re: [DISCUSS] Hide Github issues tab and Unified management of issues in JIRA

2019-11-22 Thread Gurudatt Kulkarni
/github.com/actions/starter-workflows/blob/master/automation/stale.yml > > > > On Mon, Nov 18, 2019 at 10:33 PM Gurudatt Kulkarni > > wrote: > > > > > > With templates, we can collect good information while people file the > > > > issues..Not sur

Re: Issue while querying Hive table after updates

2019-11-21 Thread Gurudatt Kulkarni
://github.com/apache/incubator-hudi/blob/master/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/hive/HoodieCombineHiveInputFormat.java > > Thanks, > Balaji.V > > > On Mon, Nov 18, 2019 at 11:15 PM Gurudatt Kulkarni > wrote: > > > Hi Bhavani Sudha, > > > >

Re: [DISCUSS] Hide Github issues tab and Unified management of issues in JIRA

2019-11-18 Thread Gurudatt Kulkarni
y opinion. :) > > > > With templates, we can collect good information while people file the > > issues..Not sure about permissions we have on JIRA to enable bots, but > may > > have more luck on github workflows doing these already? > > Can we do templates/r

Re: [DISCUSS] Hide Github issues tab and Unified management of issues in JIRA

2019-11-18 Thread Gurudatt Kulkarni
Hi Vinoth / Vino, Just adding my 2 cents to the discussion. Yes, I agree that GitHub issues are low friction and can be the first line of support. It will help in keeping the JIRA clean. Potential solutions that I have come across in the community, 1. Introduce an issue template. 2. Add a bot

Issue while querying Hive table after updates

2019-11-17 Thread Gurudatt Kulkarni
Hi All, I am facing an issue where an aggregate query fails on partitions that have more than one parquet file. But if I run a select * query, it displays all results properly. Here's the stack trace of the error that I am getting. I checked the hdfs directory for the particular file and it

Re: [DISCUSS] RFC-10: Restructuring and auto-generation of docs

2019-11-13 Thread Gurudatt Kulkarni
Hi Ethan, Thanks for the RFC. I have a few observations about the docs. I saw the diagram for the docs; asf-site and master are currently kept as separate branches. (1) I suggest two approaches for maintaining the docs, based on how other popular Apache projects do it, - Apache Flink

Unable to query Hudi Tables via Spark Shell

2019-11-13 Thread Gurudatt Kulkarni
Hi All, I am running into a strange issue where I am unable to query Hudi tables via spark-shell. I followed the procedure stated in the Hudi docs and used this command: spark-shell --jars hdfs:///jars/hudi-spark-bundle-0.5.1-SNAPSHOT.jar --master

Re: RFC process step 1 votes

2019-11-13 Thread Gurudatt Kulkarni
The voting procedure for Apache projects is described here https://www.apache.org/foundation/voting.html. I am not sure if this is open to change. On Wed, Nov 13, 2019 at 2:20 PM vino yang wrote: > Hi Shiyan, > > Thanks for starting this discussion. > > +1 from my side > > Some additional

Re: [Discuss] Convenient time for weekly sync meeting

2019-11-06 Thread Gurudatt Kulkarni
Interested. Mon-Thu 5AM-6:30AM PST Mon-Thu 9PM-10:30PM PST These timings work for me. On Thu, Nov 7, 2019 at 10:20 AM Gary Li wrote: > Interested. > Mon-Thu 8 PM-11 PM PST. > It's very difficult to cover America, Europe, and Asia in the same meeting. > Maybe we can have US and US two

Re: [Question] Handling Avro Kafka Records with Epoch time in milliseconds

2019-11-01 Thread Gurudatt Kulkarni
apache.org/jira/browse/HUDI-324 Do you want to give it a > shot? > If not, I can give you a patch. For you, can also create your own key > extractor class .. > > > Thanks > Vinoth > > On Fri, Nov 1, 2019 at 5:12 AM Gurudatt Kulkarni > wrote: > > > Hi All, >

[Question] Handling Avro Kafka Records with Epoch time in milliseconds

2019-11-01 Thread Gurudatt Kulkarni
Hi All, I have a use-case where the data from the database is being pulled by Kafka Connect using JDBC Connector, and the data is in Avro format. I am trying to partition the table based on date, but the column is in long type. Below is the avro schema for the column { "type":
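A minimal sketch of the conversion the question is about, assuming the long column holds epoch time in milliseconds in UTC and that a yyyy/MM/dd partition layout is wanted. The function name and default format are hypothetical and are not part of Hudi or Kafka Connect; this only illustrates the arithmetic a custom key generator would need to perform.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helper, not a Hudi API: maps an epoch-milliseconds value
# to a date-based partition path string.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def partition_path_from_epoch_millis(epoch_ms: int, fmt: str = "%Y/%m/%d") -> str:
    """Convert epoch milliseconds (assumed UTC) to a partition path."""
    # timedelta keeps the arithmetic exact and avoids float rounding.
    moment = EPOCH + timedelta(milliseconds=epoch_ms)
    return moment.strftime(fmt)


if __name__ == "__main__":
    print(partition_path_from_epoch_millis(1572566400000))
```

If the source column were in seconds rather than milliseconds, the same helper would apply after multiplying the value by 1000.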

Re: Error while running Hive Sync (hoodie-0.4.7)

2019-10-25 Thread Gurudatt Kulkarni
this :) > > Do you have plans to eventually upgrade Hive? Would really love to support > your use case on an official release :) > > On Thu, Oct 24, 2019 at 4:58 AM Gurudatt Kulkarni > wrote: > > > Hi Vinoth, > > > > I tried your second suggestion and it

Re: Error while running Hive Sync (hoodie-0.4.7)

2019-10-24 Thread Gurudatt Kulkarni
/pull/625 Regards, Gurudatt On Mon, Oct 21, 2019 at 9:29 PM Gurudatt Kulkarni wrote: > Hi Vinoth, > > Thank you for going back in time to figure out a way :) > Will try out your suggestions. > > Regards, > Gurudatt > > On Monday, October 21, 2019, Vinoth Chandar wrot

Re: Hive sync parameters

2019-10-22 Thread Gurudatt Kulkarni
Hi Qian, I have faced this error before. It generally happens if your Hive server and client (in this case Hudi) are on different versions. Which version of Hive server are you using? Hudi is compiled against Hive 2.3.X. Regards, Gurudatt On Wednesday, October 23, 2019, Qian Wang wrote:

Re: Compile failing on master

2019-10-21 Thread Gurudatt Kulkarni
Ya sure will do it. On Monday, October 21, 2019, Bhavani Sudha wrote: > Great. Thanks for testing that. Can you send a PR with that change ? > > > On Mon, Oct 21, 2019 at 8:51 AM Gurudatt Kulkarni > wrote: > >> Hi Bhavani, >> >> Yes it worked for me aft

Re: Compile failing on master

2019-10-18 Thread Gurudatt Kulkarni
error. > > On Fri, Oct 18, 2019 at 4:24 PM vbal...@apache.org > wrote: > > > > > I have seen this happen sometimes if you have a custom > ~/.m2/settings.xml > > file. Try removing it and check. > > Balaji.V On Friday, October 18, 2019, 03:31:09 AM PDT

Compile failing on master

2019-10-18 Thread Gurudatt Kulkarni
Hi All, I ran `mvn compile` on master, but the build fails because Maven cannot find the Confluent dependencies. I added https://packages.confluent.io/maven2 to the repositories in the main pom.xml and it built successfully. Just curious, how are you all compiling? Can we add confluent

Re: Error while running Hive Sync (hoodie-0.4.7)

2019-10-16 Thread Gurudatt Kulkarni
s there any chance to make master work with Hive 1.x with some > > custom changes? > > > > On Mon, Oct 14, 2019 at 10:11 PM Gurudatt Kulkarni > > wrote: > > > > > Hi Vinoth, > > > > > > Thank you for the quick response, but using the master

Error while running Hive Sync (hoodie-0.4.7)

2019-10-14 Thread Gurudatt Kulkarni
Hi All, I am using HoodieDeltaStreamer (hoodie-0.4.7) to migrate a small table. The data is being written successfully in parquet format but the hive sync fails. Here's the Stacktrace. 19/10/14 17:02:12 INFO metastore.ObjectStore: Setting MetaStore object pin classes with

Re: [VOTE] Release 0.5.0-incubating, release candidate #5

2019-10-05 Thread Gurudatt Kulkarni
+1 (non-binding) Ran the script ./release/validate_staged_release.sh --release=0.5.0 --rc_num=5 Checksum Check of Source Release - [OK] Signature Check - [OK] No Binary Files in Source Release? - [OK] DISCLAIMER file exists ? [OK] License file exists ? [OK] Notice file exists ? [OK]

Difference between Apache Hudi and Delta Lake

2019-09-25 Thread Gurudatt Kulkarni
Hi All, What is the difference between Delta Lake and Apache Hudi? The two look quite similar. What are the pros and cons of one over the other? Both are new tools, and I wasn't able to find any comparisons or a description of the kinds of problems each one solves more efficiently. Regards, Gurudatt