Answered inline.
On Thu, Nov 14, 2019 at 10:30 AM Zhengxiang Pan wrote:
> Great, this solves my 2nd issue. Follow-up question: the Hudi internal
> columns (_hoodie_record_key, _hoodie_partition_path) now duplicate my
> existing key columns; my concern is that this will increase the data size.
Hey Gurudatt,
Thanks for the great feedback! Comments inlined...
On Wed, Nov 13, 2019 at 10:57 PM Gurudatt Kulkarni wrote:
> Hi Ethan,
>
> Thanks for the RFC. I have a few observations about the docs. I saw in the
> diagram that, for the docs, the asf-site and master branches are kept
> separate,
>
Great, this solves my 2nd issue. Follow-up question: the Hudi internal
columns (_hoodie_record_key, _hoodie_partition_path) now duplicate my
existing key columns; my concern is whether this will increase the data
size, or whether I should not worry about it. Secondly, how do you control
the partitions?
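On controlling partitions and keys: a minimal sketch (Scala, spark-shell)
of the standard Hudi datasource write options that feed _hoodie_record_key
and _hoodie_partition_path; the table name, target path, and the
key/dt/ts column names are placeholders, not from this thread:

  // _hoodie_record_key is materialized from the record key field and
  // _hoodie_partition_path from the partition path field below.
  df.write
    .format("org.apache.hudi")
    .option("hoodie.table.name", "my_table")
    .option("hoodie.datasource.write.recordkey.field", "key")
    .option("hoodie.datasource.write.partitionpath.field", "dt")
    .option("hoodie.datasource.write.precombine.field", "ts")
    .mode("append")
    .save("s3a://bucket/path/my_table")

The internal columns are stored in the base files like any other column,
so they are subject to the same columnar encoding and compression as the
rest of the data.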
Since attachments don't really work on the mailing list, can you maybe
attach them to comments on the RFC itself?
In this scenario, we will get a larger range than is probably in the newly
compacted base file, correct? Current thinking is: yes, it will lead to
less efficient pruning by ranges.
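To make the trade-off concrete, a hedged sketch of why merged ranges prune
less effectively; Range, merge, and canSkip are illustrative names rather
than Hudi APIs, and column stats are simplified to a single Long min/max:

  case class Range(min: Long, max: Long)

  // Union of two ranges: safe (never excludes a key that is present),
  // but looser than either input.
  def merge(a: Range, b: Range): Range =
    Range(math.min(a.min, b.min), math.max(a.max, b.max))

  // Pruning can skip a file only when the key falls outside its range,
  // so a wider merged range means fewer skips, never wrong results.
  def canSkip(key: Long, r: Range): Boolean = key < r.min || key > r.max

In other words, stale range info from uncompacted log blocks costs pruning
efficiency, not correctness.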
Hi,
You might want to subscribe to the mailing list, so that the replies
actually make it to the list automatically.
This seems like a class version mismatch between jars, since you are
getting NoSuchMethodError (and not NoClassDefFoundError). We don't bundle
hadoop, aws, or spark jars.
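One quick way to chase such a mismatch from inside spark-shell is to ask
where the suspect class was actually loaded from; the class name below is
only an example, substitute the one from your stack trace:

  // Prints the jar that supplied the class (None for bootstrap classes);
  // a surprising path usually means an older duplicate is shadowing it.
  val cls = Class.forName("org.apache.hadoop.fs.s3a.S3AFileSystem")
  println(Option(cls.getProtectionDomain.getCodeSource).map(_.getLocation))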
Hello, can anyone point me to the right dependencies to configure Hudi to
write to S3?
I start the Spark shell with the aws-sdk and hadoop-aws libs as per the S3
guide, with hudi.conf consisting of the Spark Kryo serializer and S3 keys.
spark-shell --jars
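For reference, the S3 keys from hudi.conf can also be set on the Hadoop
configuration inside a running spark-shell; the property names below are
the standard s3a ones and the values are placeholders (spark.serializer is
a static conf and still has to be passed at launch time):

  // Placeholders only; avoid pasting real credentials into shell history.
  sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
  sc.hadoopConfiguration.set("fs.s3a.access.key", "<AWS_ACCESS_KEY_ID>")
  sc.hadoopConfiguration.set("fs.s3a.secret.key", "<AWS_SECRET_ACCESS_KEY>")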
I have a doubt on the design. I guess this is the right place to discuss it.
I want to understand how compaction interplays with this new scheme.
Let's assume all log blocks are of the new format only. Once compaction
completes, those log blocks/files not compacted will have range info
pertaining to the pre-compaction data.