No, you will not need a strong understanding of deep learning. I think the
main thing you'll need is an understanding of how embeddings work
(https://stackoverflow.blog/2023/11/09/an-intuitive-introduction-to-text-embeddings/).
You won't be training any models as part of this project, so the specific
things you'll need to understand are:

1) What an embedding is and how it is used
2) How to use a preexisting model to generate embeddings (there should be
lots of examples of this on the internet)
3) How to write an embedding to a vector DB and how to retrieve embeddings
from a vector DB
4) Similarly, what a feature store is and how to read from and write to it.
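
To make points 1-3 a bit more concrete, here is a rough sketch in plain
Python. It is only an illustration under stated assumptions, not Beam code
and not the API of any real vector DB: `toy_embed` is a made-up stand-in for
a preexisting embedding model (it just hashes words into buckets), and
`InMemoryVectorStore` is a hypothetical stand-in for a vector DB client. The
shape of the flow (text -> embedding -> write -> similarity lookup) is the
part that carries over.

```python
import hashlib
import math

DIM = 16  # toy size (assumption); real embedding models use far more dimensions


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a pretrained model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        # Hash each word into one of DIM buckets and count it there.
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero on empty text
    return [x / norm for x in vec]


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector DB client (not a real library API)."""

    def __init__(self) -> None:
        self._rows: dict[str, tuple[list[float], str]] = {}

    def upsert(self, key: str, vector: list[float], payload: str) -> None:
        """Write (or overwrite) one embedding together with its source text."""
        self._rows[key] = (vector, payload)

    def query(self, vector: list[float], top_k: int = 1) -> list[tuple[float, str, str]]:
        """Return the top_k (score, key, payload) rows by cosine similarity."""
        scored = [
            (sum(a * b for a, b in zip(vector, stored)), key, payload)
            for key, (stored, payload) in self._rows.items()
        ]
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:top_k]


# Write side: embed each document and store it.
store = InMemoryVectorStore()
docs = {
    "doc1": "apache beam pipelines process streaming data",
    "doc2": "feature stores serve precomputed model features",
}
for key, text in docs.items():
    store.upsert(key, toy_embed(text), text)

# Read side: embed the query the same way and look up the nearest stored vector.
results = store.query(toy_embed("apache beam pipelines process streaming data"), top_k=1)
print(results[0][1])
```

The query text here is identical to doc1, so doc1 comes back with a
similarity of about 1.0. With a real model, texts that are similar in
meaning (not just in wording) would also score highly, which is the whole
point of embeddings.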

Thanks,
Danny

On Mon, Mar 3, 2025 at 3:38 PM SIDDHARTH SALIAN <siddharthsalia...@gmail.com>
wrote:

> Respected Sir,
>
> Sir, is a strong grounding in deep learning fundamentals essential as a
> project prerequisite? Also, what about machine learning fundamentals?
> Basically, out of the three in the AI, ML, DL hierarchy, which one would
> you suggest I have a strong grip on, especially for this project?
>
>
>
> Thanking you,
>
> Siddharth
>
>
>
> *From: *SIDDHARTH SALIAN <siddharthsalia...@gmail.com>
> *Date: *Tuesday, 4 March 2025 at 1:53 AM
> *To: *Danny McCormick <dannymccorm...@google.com>, Danny McCormick via
> user <user@beam.apache.org>
> *Subject: *Re: Regarding the GSOC 2025 Project
>
> Respected Sir,
>
> Thank you for the email. I have understood. I’ll continue the conversation
> about this project on the user mailing list. Anything regarding ideas
> and opinions, I shall move to the dev list.
>
>
>
> Thanking you,
>
> Siddharth
>
>
>
> *From: *Danny McCormick <dannymccorm...@google.com>
> *Date: *Tuesday, 4 March 2025 at 1:50 AM
> *To: *SIDDHARTH SALIAN <siddharthsalia...@gmail.com>
> *Cc: *Danny McCormick via user <user@beam.apache.org>
> *Subject: *Re: Regarding the GSOC 2025 Project
>
> I'd recommend using the dev@ list; both are fine, but dev@ is more
> likely to reach folks with ideas/opinions.
>
>
>
> Thanks,
>
> Danny
>
>
>
> On Mon, Mar 3, 2025 at 3:17 PM SIDDHARTH SALIAN <
> siddharthsalia...@gmail.com> wrote:
>
> Respected Sir,
>
> Thank you for the email. I have understood. Also, sir, should I move to the
> dev mailing list for further conversation on this project, or shall I
> continue future conversations here on the user mailing list?
>
>
>
> Thanking you,
>
> Siddharth
>
>
>
> *From: *Danny McCormick <dannymccorm...@google.com>
> *Date: *Tuesday, 4 March 2025 at 1:43 AM
> *To: *SIDDHARTH SALIAN <siddharthsalia...@gmail.com>
> *Cc: *Danny McCormick via user <user@beam.apache.org>
> *Subject: *Re: Regarding the GSOC 2025 Project
>
> I think that should be plenty for now, thanks!
>
>
>
> On Mon, Mar 3, 2025 at 3:11 PM SIDDHARTH SALIAN <
> siddharthsalia...@gmail.com> wrote:
>
> Respected Sir,
>
> Thank you for the email. I shall go through the mentioned link. Sir, is
> there anything more to read currently as part of the project prerequisites,
> or is that enough for now?
>
>
>
> Thanking you,
>
> Siddharth
>
>
>
> *From: *Danny McCormick <dannymccorm...@google.com>
> *Date: *Tuesday, 4 March 2025 at 1:37 AM
> *To: *SIDDHARTH SALIAN <siddharthsalia...@gmail.com>
> *Cc: *user@beam.apache.org <user@beam.apache.org>, damcc...@apache.org <
> damcc...@apache.org>
> *Subject: *Re: Regarding the GSOC 2025 Project
>
> > Sir, with reference to the point about Python, I meant to ask: apart
> > from learning the Python language itself, is there any other important
> > topic to learn (such as Python with ML pipelines, etc.) as part of the
> > project prerequisites?
>
>
>
> I think that knowing how to write good Python code is the most important
> thing. It might be useful, but not required, to understand how to generate
> embeddings using Python and more generally to understand how embeddings
> work [1].
>
>
>
> Thanks,
>
> Danny
>
> [1]
> https://stackoverflow.blog/2023/11/09/an-intuitive-introduction-to-text-embeddings/
>
>
>
> On Mon, Mar 3, 2025 at 3:02 PM SIDDHARTH SALIAN <
> siddharthsalia...@gmail.com> wrote:
>
> Respected Sir,
>
>
>
>    1. With reference to the previous email.
>
>
>
>    2. Thank you for the email, I shall follow the mode of communication
>    through mailing lists.
>
>
>
>    3. Sir, with reference to the point about Python, I meant to ask: apart
>    from learning the Python language itself, is there any other important
>    topic to learn (such as Python with ML pipelines, etc.) as part of the
>    project prerequisites?
>
>
>
> Best Regards,
>
> Thanking you,
>
> Siddharth Salian
>
>
>
> *From: *Danny McCormick via user <user@beam.apache.org>
> *Date: *Tuesday, 4 March 2025 at 1:25 AM
> *To: *user@beam.apache.org <user@beam.apache.org>
> *Cc: *Danny McCormick <dannymccorm...@google.com>
> *Subject: *Re: Regarding the GSOC 2025 Project
>
> > Sir, apart from strong fundamentals of vector DBs, Python fundamentals,
> > the Beam docs, and writing a sink, is there any other important topic to
> > cover as part of the project prerequisites?
>
>
>
> I think those are the main pieces to consider here.
>
>
>
> > Sir, I also wanted to ask, what other topics have to be covered in
> > Python, other than the language itself, as part of the project prerequisites?
>
>
>
> I don't understand what you're asking - could you try rephrasing?
>
>
>
> > Sir, the GSOC 2025 organization list and the project list have been
> > released. As I’m interested in this project and you are the potential
> > mentor for it, could you please tell me which mode of communication
> > would be better: Slack or the mailing lists? I’m asking because I would
> > want to seek help multiple times while I’m understanding the
> > project/codebase, as it’s a new concept and environment for me. Also,
> > continuous mails won’t be appealing. Whatever you decide, sir, we can
> > follow.
>
>
>
> Let's keep the conversation on the mailing list - that way anyone who is
> interested in the project can benefit.
>
>
>
> Thanks,
>
> Danny
>
>
>
> On Sun, Mar 2, 2025 at 11:48 AM SIDDHARTH SALIAN <
> siddharthsalia...@gmail.com> wrote:
>
> Respected Sir,
>
>
>
>    1. With reference to the previous mails.
>
>
>
>    2. Sir, apart from strong fundamentals of vector DBs, Python
>    fundamentals, the Beam docs, and writing a sink, is there any other
>    important topic to cover as part of the project prerequisites?
>
>
>
>    3. Sir, I also wanted to ask, what other topics have to be covered
>    in Python, other than the language itself, as part of the project
>    prerequisites?
>
>
>
>    4. Sir, the GSOC 2025 organization list and the project list have been
>    released. As I’m interested in this project and you are the potential
>    mentor for it, could you please tell me which mode of communication
>    would be better: Slack or the mailing lists? I’m asking because I would
>    want to seek help multiple times while I’m understanding the
>    project/codebase, as it’s a new concept and environment for me. Also,
>    continuous mails won’t be appealing. Whatever you decide, sir, we can
>    follow.
>
>
>
> Best Regards,
>
> Thanking you,
>
> Siddharth Salian
>
>
>
> *From: *SIDDHARTH SALIAN <siddharthsalia...@gmail.com>
> *Date: *Friday, 21 February 2025 at 12:59 AM
> *To: *user@beam.apache.org <user@beam.apache.org>
> *Subject: *Re: Regarding the GSOC 2025 Project
>
> Hello Sir,
>
> Thank you for the email. I have understood.
>
>
>
> Thanks,
>
> Siddharth Salian
>
>
>
> *From: *Danny McCormick via user <user@beam.apache.org>
> *Date: *Thursday, 20 February 2025 at 9:51 PM
> *To: *user@beam.apache.org <user@beam.apache.org>
> *Cc: *Danny McCormick <dannymccorm...@google.com>
> *Subject: *Re: Regarding the GSOC 2025 Project
>
> > Sir, as you mentioned in the mail, Python is a must for this project.
> > I just wanted to ask: what about the Java and Go SDKs? I know it’s an
> > AI/ML pipeline based project, but if you could tell me, it would add to
> > my clarity.
>
>
>
> I would expect this project to pretty much exclusively be in Python. The
> only exception is if some vector DB or feature store only offers a Go or
> Java client (but this seems unlikely).
>
>
>
> > Sir, I wanted to also ask, as Retrieval Augmented Generation (RAG) has a
> > close relation with this project, don’t you think RAG is still limited to
> > capturing historical data, or does it have the capability of capturing
> > the latest data too?
>
>
>
> I'm not sure I understand the question, but I can try to give an overview
> of how I think Beam and RAG work together. Basically, I think Beam can be
> used to:
>
>
>
>    1. Ingest data -> generate embeddings -> write to a vector DB. This
>    can include very recent data; it just depends on how you configure your
>    source (e.g. you could ingest data continuously with PubSub or Kafka).
>    2. Ingest incoming query -> enrich with embedding data from a vector
>    DB -> perform inference with the additional relevant context -> write
>    result somewhere
>
> So I think this can handle reasonably tight data freshness requirements.
>
>
>
> On Tue, Feb 18, 2025 at 11:01 AM SIDDHARTH SALIAN <
> siddharthsalia...@gmail.com> wrote:
>
> Respected Sir,
>
>
>
>    1. Thank you for the email. With reference to the previous mail, I have
>    understood all the points, and I shall also go through the I/O page in
>    the documentation as well as vector DBs and feature stores.
>
>
>
>    2. Sir, as you mentioned in the mail, Python is a must for this
>    project. I just wanted to ask: what about the Java and Go SDKs? I know
>    it’s an AI/ML pipeline based project, but if you could tell me, it would
>    add to my clarity.
>
>
>
>    3. Sir, I wanted to also ask, as Retrieval Augmented Generation (RAG)
>    has a close relation with this project, don’t you think RAG is still
>    limited to capturing historical data, or does it have the capability of
>    capturing the latest data too?
>
>
>
> Best regards,
>
> Thanking you,
>
> Siddharth Salian
>
>
>
> *From: *Danny McCormick via user <user@beam.apache.org>
> *Date: *Tuesday, 18 February 2025 at 8:36 PM
> *To: *user@beam.apache.org <user@beam.apache.org>
> *Cc: *Danny McCormick <dannymccorm...@google.com>
> *Subject: *Re: Regarding the GSOC 2025 Project
>
> Hey Siddharth, thanks for reaching out. I'm glad you're interested in the
> project. In general, I would expect there to be more details about projects
> once we know which ones have been accepted.
>
>
>
> > Sir, if you could tell me the prerequisite knowledge (such as major
> > programming languages used, etc.) for this project, it would bring more
> > clarity to me, sir.
>
>
>
> I would expect it to be primarily done in Python, though it depends on what
> connectors are available for each vector DB/feature store. Other than that,
> the main things you'd want to learn about are Beam itself, especially
> how to write a sink (IO standards
> <https://beam.apache.org/documentation/io/io-standards> can help here),
> and also, at a high level, how vector DBs and feature stores work.
>
>
>
> Thanks,
>
> Danny
>
>
>
>
>
>
>
> On Thu, Feb 13, 2025 at 10:55 PM SIDDHARTH SALIAN <
> siddharthsalia...@gmail.com> wrote:
>
> Hello Sir,
>
>
>
>    1. My intention of writing this email is with reference to the GSOC
>    2025 mail -
>    https://lists.apache.org/thread/o3mwncq0k4c58c630n49l7bvhq74o2wj
>
>
>
>    2. I’m Siddharth Salian, an undergraduate student, and I have just
>    joined the Apache Beam community. After going through the GSOC 2025
>    idea list and the project descriptions, I found this project,
>    https://issues.apache.org/jira/browse/GSOC-279, to be interesting, sir.
>    I would like to contribute to this project in GSOC 2025, since AI/ML is
>    an area of my interest. Since you are the mentor, I’m letting you know,
>    sir.
>
>
>
>    3. Sir, if you could tell me the prerequisite knowledge (such as major
>    programming languages used, etc.) for this project, it would bring more
>    clarity to me, sir.
>
>
>
>    4. Sir, I also wanted to ask whether there is any other project that
>    you are thinking about for GSOC 2025; I would like to contribute to it, sir.
>
>
>
> Best Regards,
>
>
>
> Thanking You
>
> Siddharth Salian
>
>
>
>
>
>
