Hello Sir,
Thank you for the email. I have understood.

Thanks,
Siddharth Salian

From: Danny McCormick via user <user@beam.apache.org>
Date: Thursday, 20 February 2025 at 9:51 PM
To: user@beam.apache.org <user@beam.apache.org>
Cc: Danny McCormick <dannymccorm...@google.com>
Subject: Re: Regarding the GSOC 2025 Project
> Sir, as you have mentioned in the mail, Python is must for this project, I 
> just wanted to ask, what about Java and Golang SDK applications, I mean I 
> know it’s an AI/ML pipeline based project, but if you could tell me it would 
> add to my clarity.

I would expect this project to pretty much exclusively be in Python. The only 
exception is if some vector DB or feature store only offers a Go or Java client 
(but this seems unlikely).

> Sir, I wanted to also ask, as Retrieval Augmented Generation(RAG) has a close 
> relation with this project, don’t you think RAG is still limited to capturing 
> historical data, or it has capability of capturing latest/modern data’s too?

I'm not sure I understand the question, but I can try to give an overview of 
how I think Beam and RAG work together. Basically, I think Beam can be used to:


  1.  Ingest data -> generate embeddings -> write to a vector DB. This can
      include very recent data; it just depends on how you configure your
      source (e.g. you could ingest data continuously with Pub/Sub or Kafka).
  2.  Ingest incoming query -> enrich with embedding data from a vector DB ->
      perform inference with the additional relevant context -> write the
      result somewhere.

So I think this can handle reasonably tight data freshness requirements.
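
As a rough illustration of the two flows above, here is a minimal plain-Python
sketch. It deliberately does not use the Beam API: the bag-of-words embed
function, the InMemoryVectorStore class, and the ingest/answer helpers are all
hypothetical stand-ins for an embedding model, a real vector DB, and Beam
transforms, and the inference step is reduced to returning the retrieved
context.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding (assumption): bag-of-words counts, not a real model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector DB sink and lookup."""

    def __init__(self):
        self.rows = []  # list of (doc_id, text, embedding)

    def write(self, doc_id, text):
        self.rows.append((doc_id, text, embed(text)))

    def nearest(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.rows, key=lambda r: cosine(q, r[2]), reverse=True)
        return [(doc_id, text) for doc_id, text, _ in ranked[:k]]


def ingest(store, records):
    """Flow 1: ingest data -> generate embeddings -> write to the store."""
    for doc_id, text in records:
        store.write(doc_id, text)


def answer(store, query, k=1):
    """Flow 2: take a query -> enrich with stored context -> return it.

    A real pipeline would pass the context to a model for inference here.
    """
    context = store.nearest(query, k)
    return {"query": query, "context": [text for _, text in context]}


store = InMemoryVectorStore()
# Initial batch of older documents.
ingest(store, [
    ("a", "Beam supports streaming sources such as Kafka"),
    ("b", "Vector databases store embeddings"),
])
# A record that arrives later is queryable as soon as it is written.
ingest(store, [("c", "The release shipped this morning")])

result = answer(store, "what shipped this morning")
print(result["context"])
```

The second ingest call is the point of the sketch: freshness is bounded by how
quickly new records reach the store, not by the retrieval step itself.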

On Tue, Feb 18, 2025 at 11:01 AM SIDDHARTH SALIAN
<siddharthsalia...@gmail.com> wrote:
Respected Sir,


  1.  Thank you for the email. With reference to the previous mail, I have
understood all the points, and I shall also go through the I/O page in the
documentation as well as vector DBs and feature stores.

  2.  Sir, as you have mentioned in the mail, Python is must for this project, 
I just wanted to ask, what about Java and Golang SDK applications, I mean I 
know it’s an AI/ML pipeline based project, but if you could tell me it would 
add to my clarity.

  3.  Sir, I wanted to also ask, as Retrieval Augmented Generation(RAG) has a 
close relation with this project, don’t you think RAG is still limited to 
capturing historical data, or it has capability of capturing latest/modern 
data’s too?


Best regards,
Thanking you,
Siddharth Salian

From: Danny McCormick via user <user@beam.apache.org>
Date: Tuesday, 18 February 2025 at 8:36 PM
To: user@beam.apache.org <user@beam.apache.org>
Cc: Danny McCormick <dannymccorm...@google.com>
Subject: Re: Regarding the GSOC 2025 Project
Hey Siddharth, thanks for reaching out. I'm glad you're interested in the 
project. In general, I would expect there to be more details about projects 
once we know which ones have been accepted.

> Sir, if you could tell me the pre-required knowledge (such as major 
> programming languages used, etc., ) for this project, it would bring more 
> clarity to me sir.

I would expect it to be primarily done in Python, though it depends on what
connectors are available for each vector DB/feature store. Other than that, the
main things you'd want to learn about are Beam itself, especially how to write
a sink (the IO standards at
https://beam.apache.org/documentation/io/io-standards can help here), and also,
at a high level, how vector DBs and feature stores work.

Thanks,
Danny



On Thu, Feb 13, 2025 at 10:55 PM SIDDHARTH SALIAN
<siddharthsalia...@gmail.com> wrote:
Hello Sir,


  1.  I am writing with reference to the GSOC 2025 mail:
https://lists.apache.org/thread/o3mwncq0k4c58c630n49l7bvhq74o2wj

  2.  I’m Siddharth Salian, an undergraduate student, and I have just joined
the Apache Beam community. After going through the GSOC 2025 idea list and the
project description, I found the project
https://issues.apache.org/jira/browse/GSOC-279 interesting. So sir, I would
like to contribute to this project in GSOC 2025, since AI/ML is an area of my
interest. Since you are the mentor, I’m letting you know, sir.

  3.  Sir, if you could tell me the pre-required knowledge (such as major 
programming languages used, etc., ) for this project, it would bring more 
clarity to me sir.

  4.  Sir, I also wanted to ask whether there is any other project that you are
thinking about for GSOC 2025; I would like to contribute to it, sir.


Best Regards,

Thanking You
Siddharth Salian

