Re: Need clarity on these test cases in TestHoodieDeltaStreamer

2020-02-27 Thread Pratyaksh Sharma
Hi Balaji, Right now I am facing a different issue in the same test case. The number of records does not match and the assertion is failing. Once I am able to fix that as well, I will open the PR for sure. :) On Thu, Feb 27, 2020 at 11:17 PM Balaji Varadarajan wrote: > > Awesome Pratyaksh,

Subscribing to commits@

2020-02-27 Thread Vinoth Chandar
Folks, Realized some folks may not have noticed this, but https://lists.apache.org/list.html?comm...@hudi.apache.org has all the GitHub/JIRA activity in a single place. If you are interested in helping others out in the community, please join that list for email notifications. That's how I

Re: Apache Hudi on AWS EMR

2020-02-27 Thread Mehrotra, Udit
Raghvendra, Can you enable TRACE level logging for Hudi on EMR and provide the error logs? For this, go to /etc/spark/conf/log4j.properties and change the logging level of log4j.logger.org.apache.hudi to TRACE. This would help provide the failed record/keys based off
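
As a minimal sketch of the change described above, assuming the file on the cluster uses log4j 1.x properties syntax (the logger name and file path are the ones given in the message; the comments are illustrative):

```properties
# /etc/spark/conf/log4j.properties
# Raise only Hudi's logger to TRACE; all other loggers keep their existing levels.
log4j.logger.org.apache.hudi=TRACE
```

The new level would apply to Spark applications started after the edit. TRACE output can be very large, so it is probably worth reverting the line once the error logs have been collected.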

Re: Apache Hudi on AWS EMR

2020-02-27 Thread Shiyan Xu
+1 on the idea. Providing a config like `--error-path`, where all failed conversions are saved, gives flexibility for later processing. SQS/SNS can pick that up later. On Thu, Feb 27, 2020 at 8:10 AM Vinoth Chandar wrote: > On the second part, it seems like a question for EMR folks ? > > Hudi's

Re: Need clarity on these test cases in TestHoodieDeltaStreamer

2020-02-27 Thread Balaji Varadarajan
Awesome Pratyaksh, would you mind opening a PR to document it? Balaji.V On Wednesday, February 26, 2020, 11:14 PM, Pratyaksh Sharma wrote: Hi, I figured out the issue yesterday. Thank you for helping me out. On Thu, Feb 27, 2020 at 12:01 AM

Re: [DISCUSS] Support for complex record keys with TimestampBasedKeyGenerator

2020-02-27 Thread Vinoth Chandar
+1 for adding a new composite KeyGenerator, which can combine both... Workaround: for DeltaStreamer, you can also use the Transformer API to do more flexible key generation as you wish. On Tue, Feb 25, 2020 at 9:37 AM Balaji Varadarajan wrote: > > See if you can have a generic

Re: Apache Hudi on AWS EMR

2020-02-27 Thread Vinoth Chandar
On the second part, it seems like a question for the EMR folks? Hudi's RDD-level APIs do hand the failure records back and .. Maybe we should consider writing out the error records somewhere for the datasource as well? Others, any thoughts? On Mon, Feb 24, 2020 at 10:59 PM Raghvendra Dhar Dubey