Re: Migrate Existing DataFrame to Hudi DataSet

2019-11-14 Thread Bhavani Sudha
Answered inline. On Thu, Nov 14, 2019 at 10:30 AM Zhengxiang Pan wrote: > Great, this solves my 2nd issue. Follow-up question: the Hudi internal columns ( > _hoodie_record_key, _hoodie_partition_path) now duplicate my existing > columns (keys); my concern is that this will increase the data

Re: [DISCUSS] RFC-10: Restructuring and auto-generation of docs

2019-11-14 Thread Y Ethan Guo
Hey Gurudatt, Thanks for the great feedback! Comments inline... On Wed, Nov 13, 2019 at 10:57 PM Gurudatt Kulkarni wrote: > Hi Ethan, > > Thanks for the RFC. I have a few observations about the docs. I saw the > diagram for the docs; asf-site and master are kept as separate branches, >

Re: Migrate Existing DataFrame to Hudi DataSet

2019-11-14 Thread Zhengxiang Pan
Great, this solves my 2nd issue. Follow-up question: the Hudi internal columns (_hoodie_record_key, _hoodie_partition_path) now duplicate my existing columns (keys); my concern is that this will increase the data size, or should I not worry about it? Secondly, how do you control the partitions? I
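The record key and partition questions above come down to the options passed on a Hudi datasource write. As a sketch (the option keys are standard Hudi write configs; the table and column names "my_table", "id", "dt", "ts" are made-up placeholders for your own schema):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HudiWriteOptions {
    // Sketch of the datasource options for a Hudi write.
    public static Map<String, String> hudiOptions() {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("hoodie.table.name", "my_table");
        // Hudi copies this column's value into _hoodie_record_key, so the
        // meta column duplicates your key column by design (a small,
        // bounded storage overhead per record).
        opts.put("hoodie.datasource.write.recordkey.field", "id");
        // This column's value becomes _hoodie_partition_path, i.e. it is
        // what controls how the dataset is partitioned on storage.
        opts.put("hoodie.datasource.write.partitionpath.field", "dt");
        // Used to pick the latest record when keys collide within a batch.
        opts.put("hoodie.datasource.write.precombine.field", "ts");
        return opts;
    }

    public static void main(String[] args) {
        hudiOptions().forEach((k, v) -> System.out.println(k + " = " + v));
        // In spark-shell the same map would be passed as, roughly:
        //   df.write.format("hudi").options(opts).mode("append").save(basePath)
    }
}
```

So the meta columns do duplicate the key and partition values, but as fixed extra string columns rather than a copy of the whole row.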

Re: DISCUSS RFC 6 - Add indexing support to the log file

2019-11-14 Thread Vinoth Chandar
Since attachments don't really work on the mailing list, can you maybe attach them as comments on the RFC itself? In this scenario, we will get a larger range than is probably in the newly compacted base file, correct? Current thinking is: yes, it will lead to less efficient pruning by ranges,

Re: aws dependencies not working for writing for S3 Write access

2019-11-14 Thread Vinoth Chandar
Hi, You might want to subscribe to the mailing list, so that the replies actually make it to the list automatically. This seems like a class version mismatch between jars, since you are getting NoSuchMethodError (and not NoClassDefFoundError). We don't bundle hadoop, aws, or spark jars. There
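One way to chase down a NoSuchMethodError like this is to print which jar the conflicting class is actually loaded from at runtime; a minimal, Hudi-independent sketch:

```java
import java.security.CodeSource;

public class WhereFrom {
    // Print the jar/classpath location a class was loaded from. Useful for
    // NoSuchMethodError: the class resolving at runtime may come from an
    // older jar than the one the code was compiled against.
    static String locationOf(Class<?> c) {
        CodeSource cs = c.getProtectionDomain().getCodeSource();
        return cs == null ? "bootstrap classloader (JDK core)"
                          : cs.getLocation().toString();
    }

    public static void main(String[] args) {
        // JDK core classes have no code source:
        System.out.println(locationOf(String.class)); // prints "bootstrap classloader (JDK core)"
        // Any application class prints its jar or classes directory:
        System.out.println(locationOf(WhereFrom.class));
    }
}
```

Running `locationOf` (inside spark-shell, for example) on the AWS or Hadoop class named in the stack trace shows which jar won; if it points at an older aws-java-sdk or hadoop jar than expected, aligning those versions usually resolves the error.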

aws dependencies not working for writing for S3 Write access

2019-11-14 Thread Sudharshan Rajendhiran
Hello, can anyone point me to the right dependencies for configuring Hudi to write to S3? I start the Spark shell with the AWS SDK and hadoop-aws libs as per the S3 guide, with a hudi.conf consisting of the Spark Kryo serializer and S3 keys. spark-shell --jars

Re: DISCUSS RFC 6 - Add indexing support to the log file

2019-11-14 Thread Sivabalan
I have a doubt about the design. I guess this is the right place to discuss it. I want to understand how compaction interplays with this new scheme. Let's assume all log blocks are of the new format only. Once compaction completes, those log blocks/files not compacted will have range info pertaining to