Re: Apache Hudi + Apache Ignite

Tecno Brain Mon, 26 Sep 2022 12:33:09 -0700

Yes.

My understanding is that a data lakehouse is a "format/engine" that improves
processing over Hadoop or some object storage by replacing plain data files
(Avro/Parquet) with a new table format.
The lakehouse formats I am looking at are Delta, Apache Hudi and Apache
Iceberg.
In my understanding, they provide a combination of data files + transaction
log + indexes.
I believe compaction and indexing are provided by a set of background
processes.
The idea is to have a "data warehouse" at the price of a "data lake"
(therefore the "lakehouse" term).
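
To make the "data files + transaction log" idea concrete, here is a toy Java model. It is a sketch under loose assumptions, not the on-disk layout of Delta, Hudi or Iceberg: the class `ToyLakehouseTable`, the `Commit` record and the file names are all hypothetical. What it illustrates is that immutable files only become visible when a commit entry is appended to the log, so readers replaying the log get consistent snapshots, and compaction can be expressed as one more commit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy model (hypothetical, NOT a real lakehouse format): a table is a set of
 * immutable data files plus an append-only transaction log. Readers derive a
 * snapshot by replaying the log, so a commit's files appear atomically.
 */
public class ToyLakehouseTable {

    /** One log entry: files added and files removed by a single commit. */
    record Commit(long version, List<String> added, List<String> removed) {}

    private final List<Commit> log = new ArrayList<>();

    /** Appends a commit; its version is its position in the log. */
    public synchronized long commit(List<String> added, List<String> removed) {
        long version = log.size();
        log.add(new Commit(version, List.copyOf(added), List.copyOf(removed)));
        return version;
    }

    /** Replays the log up to and including `version` to get the live files. */
    public synchronized Set<String> snapshot(long version) {
        Set<String> live = new LinkedHashSet<>();
        for (Commit c : log) {
            if (c.version() > version) break;
            live.removeAll(c.removed());
            live.addAll(c.added());
        }
        return Collections.unmodifiableSet(live);
    }

    /** Latest committed version, or -1 for an empty table. */
    public synchronized long latestVersion() {
        return log.size() - 1;
    }

    /**
     * Compaction modeled as just another commit: several small files are
     * replaced by one merged file. Older snapshots still see the old files.
     */
    public long compact(List<String> smallFiles, String mergedFile) {
        return commit(List.of(mergedFile), smallFiles);
    }

    public static void main(String[] args) {
        ToyLakehouseTable t = new ToyLakehouseTable();
        t.commit(List.of("part-0.parquet"), List.of());
        t.commit(List.of("part-1.parquet"), List.of());
        long v = t.compact(List.of("part-0.parquet", "part-1.parquet"), "merged-0.parquet");
        System.out.println(t.snapshot(1)); // [part-0.parquet, part-1.parquet]
        System.out.println(t.snapshot(v)); // [merged-0.parquet]
    }
}
```

In this model, indexes and background compaction would be extra processes that read the log and append their own commits; readers never need to coordinate with them directly.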


I was wondering if putting Apache Ignite on top of the data lakehouse could
further improve its performance, and whether someone has already tried it and
is running such a configuration successfully.
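
The read-through/write-through arrangement discussed further down this thread can be sketched without any Ignite or Hudi dependencies. In Ignite itself the hook would be a `CacheStore` implementation; here `BackingStore` is a hypothetical stand-in for whatever actually writes to Hudi (per the thread, a Spark or Flink job), and `CountingStore` is an in-memory fake used only to show when the store is touched. None of these names are Ignite or Hudi APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Minimal read-through / write-through cache sketch. BackingStore is a
 * hypothetical stand-in for a Hudi writer; it is NOT an Ignite or Hudi API.
 */
public class WriteThroughCacheSketch {

    /** Hypothetical interface for the slow, durable store behind the cache. */
    interface BackingStore<K, V> {
        Optional<V> load(K key);
        void write(K key, V value);
    }

    /** In-memory fake store that counts calls, for demonstration only. */
    static final class CountingStore implements BackingStore<String, String> {
        final Map<String, String> data = new HashMap<>();
        int loads = 0;
        int writes = 0;

        public Optional<String> load(String key) {
            loads++;
            return Optional.ofNullable(data.get(key));
        }

        public void write(String key, String value) {
            writes++;
            data.put(key, value);
        }
    }

    static final class Cache<K, V> {
        private final Map<K, V> memory = new ConcurrentHashMap<>();
        private final BackingStore<K, V> store;

        Cache(BackingStore<K, V> store) {
            this.store = store;
        }

        /** Write-through: persist to the store first, then update memory. */
        void put(K key, V value) {
            store.write(key, value);
            memory.put(key, value);
        }

        /** Read-through: serve from memory, loading from the store on a miss. */
        Optional<V> get(K key) {
            V cached = memory.get(key);
            if (cached != null) return Optional.of(cached);
            Optional<V> loaded = store.load(key);
            loaded.ifPresent(v -> memory.put(key, v));
            return loaded;
        }
    }

    public static void main(String[] args) {
        CountingStore store = new CountingStore();
        store.data.put("a", "from-store");
        Cache<String, String> cache = new Cache<>(store);

        cache.get("a");        // miss: loads from the store
        cache.get("a");        // hit: served from memory
        cache.put("b", "new"); // written to the store synchronously
        System.out.println(store.loads + " " + store.writes); // 1 1
    }
}
```

Every `put` here waits on a store write. Hudi writes normally go through Spark or Flink jobs that commit many records at once, so a real setup would likely need batching (Ignite's write-behind mode) rather than per-key write-through, which bears on the question in the thread about what such an integration would actually gain.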




On Mon, Sep 26, 2022 at 4:40 AM Stephen Darlington <
[email protected]> wrote:

> Similar. The original question was about using the Cache Store (with
> read-through). The architecture described in the Hadoop Acceleration page
> is probably better for most purposes.
>
> On 25 Sep 2022, at 23:25, John Smith <[email protected]> wrote:
>
> Something like this?
>
> https://ignite.apache.org/use-cases/hadoop-acceleration.html
>
> On Thu., Sep. 22, 2022, 3:44 a.m. Stephen Darlington, <
> [email protected]> wrote:
>
>> I don’t know of anyone doing this, however it looks like it should be
>> possible.
>>
>> According to a quick skim of the docs, to read from or write to Hudi you need
>> Flink or Spark. To use the Cache Store (read/write-through) you’d need to
>> embed one of those inside Ignite, so plenty of opportunity for “dependency
>> hell.” I do know of one project where they embedded Spark.
>>
>> On 22 Sep 2022, at 03:58, Tecno Brain <[email protected]>
>> wrote:
>>
>> I have heard of a tool called Alluxio used between Hudi and Spark/Presto
>> (https://www.alluxio.io/blog/building-high-performance-data-lake-using-apache-hudi-and-alluxio-at-t3go/).
>> I was wondering if Apache Ignite could serve the same purpose, allowing
>> queries to be processed faster.
>>
>> On Thu, Sep 15, 2022 at 10:29 AM Jeremy McMillan <
>> [email protected]> wrote:
>>
>>> I just read this about Hudi, and I can't see a use case for putting
>>> Hudi behind an Ignite write-through cache.
>>>
>>> https://www.xenonstack.com/insights/what-is-hudi
>>>
>>> Hudi seems to be primarily a write accelerator for Spark on HDFS.
>>>
>>> What would the expected outcome be if we assume the magic integration
>>> was present and working as you intend? What's the difference between that
>>> and not using Ignite with Hudi?
>>>
>>> On Wed, Sep 14, 2022, 22:50 Tecno Brain <[email protected]>
>>> wrote:
>>>
>>>> In particular, I am looking to see whether anyone has used Apache Ignite
>>>> as a write-through cache to Hudi.
>>>> Does that make sense?
>>>>
>>>> On Wed, Sep 14, 2022 at 10:50 PM Tecno Brain <
>>>> [email protected]> wrote:
>>>>
>>>>> I was wondering whether anybody has used Hudi + Ignite.
>>>>> Any references to articles or conference talks would be greatly appreciated.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>
>>>>>
>>
>
