No, you will not need a strong understanding of deep learning. I think the main thing you'll need is an understanding of how embeddings work (https://stackoverflow.blog/2023/11/09/an-intuitive-introduction-to-text-embeddings/). You won't be training any models as part of this project, so the main things you'll need to understand are:
1) What an embedding is and how it is used
2) How to use a preexisting model to generate embeddings (there should be lots of examples of this on the internet)
3) How to write an embedding to a vector DB and how to retrieve embeddings from a vector DB
4) Similarly, what a feature store is and how to read/write to/from it.

Thanks,
Danny

On Mon, Mar 3, 2025 at 3:38 PM SIDDHARTH SALIAN <siddharthsalia...@gmail.com> wrote:

> Respected Sir,
>
> Sir, is understanding and strong fundamentals of Deep learning essential as a part of project prerequisites? Also sir what about machine learning fundamentals. So basically I meant to ask, that sir out of the 3 in the AI,ML,DL(the hierarchy circle), which one would you suggest me to have strong grip over it especially for this project?
>
> Thanking you,
> Siddharth
>
> *From: *SIDDHARTH SALIAN <siddharthsalia...@gmail.com>
> *Date: *Tuesday, 4 March 2025 at 1:53 AM
> *To: *Danny McCormick <dannymccorm...@google.com>, Danny McCormick via user <user@beam.apache.org>
> *Subject: *Re: Regarding the GSOC 2025 Project
>
> Respected Sir,
>
> Thank you for the email. I have understood. I’ll continue the conversation upon this project in this user mailing lists. Anything with regard to ideas and opinions, I shall move to dev list.
>
> Thanking you,
> Siddharth
>
> *From: *Danny McCormick <dannymccorm...@google.com>
> *Date: *Tuesday, 4 March 2025 at 1:50 AM
> *To: *SIDDHARTH SALIAN <siddharthsalia...@gmail.com>
> *Cc: *Danny McCormick via user <user@beam.apache.org>
> *Subject: *Re: Regarding the GSOC 2025 Project
>
> I'd probably recommend using the dev@ list; both are fine, but dev@ is probably more likely to have more folks with ideas/opinions.
>
> Thanks,
> Danny
>
> On Mon, Mar 3, 2025 at 3:17 PM SIDDHARTH SALIAN <siddharthsalia...@gmail.com> wrote:
>
> Respected Sir,
>
> Thank you for the email. I have understood.
> Also sir should I move to dev – mailing lists for further conversation on this project? Or I shall continue future conversations here at user mailing lists?
>
> Thanking you,
> Siddharth
>
> *From: *Danny McCormick <dannymccorm...@google.com>
> *Date: *Tuesday, 4 March 2025 at 1:43 AM
> *To: *SIDDHARTH SALIAN <siddharthsalia...@gmail.com>
> *Cc: *Danny McCormick via user <user@beam.apache.org>
> *Subject: *Re: Regarding the GSOC 2025 Project
>
> I think that should be plenty for now, thanks!
>
> On Mon, Mar 3, 2025 at 3:11 PM SIDDHARTH SALIAN <siddharthsalia...@gmail.com> wrote:
>
> Respected Sir,
>
> Thank you for the email. I shall go through the mentioned link. Sir anything more to be read currently as a part of project requisites? Or is it good for now sir?
>
> Thanking you,
> Siddharth
>
> *From: *Danny McCormick <dannymccorm...@google.com>
> *Date: *Tuesday, 4 March 2025 at 1:37 AM
> *To: *SIDDHARTH SALIAN <siddharthsalia...@gmail.com>
> *Cc: *user@beam.apache.org <user@beam.apache.org>, damcc...@apache.org <damcc...@apache.org>
> *Subject: *Re: Regarding the GSOC 2025 Project
>
> > Sir, with reference to the point about python, I meant to ask that sir, like apart from learning the main coding language of python, anything more important topic has to be learnt (such as python with ML pipelines, etc.) as a part of project prerequisites?
>
> I think that knowing how to write good python code is the most important thing. It might be useful, but not required, to understand how to generate embeddings using python and more generally to understand how embeddings work [1].
>
> Thanks,
> Danny
>
> [1] https://stackoverflow.blog/2023/11/09/an-intuitive-introduction-to-text-embeddings/
>
> On Mon, Mar 3, 2025 at 3:02 PM SIDDHARTH SALIAN <siddharthsalia...@gmail.com> wrote:
>
> Respected Sir,
>
> 1. With reference to the previous email.
>
> 2. Thank you for the email, I shall follow the mode of communication through mailing lists.
>
> 3. Sir, with reference to the point about python, I meant to ask that sir, like apart from learning the main coding language of python, anything more important topic has to be learnt (such as python with ML pipelines, etc.) as a part of project prerequisites?
>
> Best Regards,
> Thanking you,
> Siddharth Salian
>
> *From: *Danny McCormick via user <user@beam.apache.org>
> *Date: *Tuesday, 4 March 2025 at 1:25 AM
> *To: *user@beam.apache.org <user@beam.apache.org>
> *Cc: *Danny McCormick <dannymccorm...@google.com>
> *Subject: *Re: Regarding the GSOC 2025 Project
>
> > Sir, apart from strong fundamentals of vector DB’s, python fundamentals, Beam docs, writing sink, is there anything much important topic to be covered/learnt other than these as part of project prerequisites?
>
> I think those are the main pieces to consider here.
>
> > Sir, I also wanted to ask, what all other topics have to be covered in python other than the main code language as a part of project prerequisites.
>
> I don't understand what you're asking - could you try rephrasing?
>
> > Sir, as the GSOC – 2025 organization list have been released, as well as the project list (for GSOC 2025) has been released. As I’ am interested in this project and you are the potential mentor for it, if you could please tell me which mode of communication would be better - either slack or through mailing lists? I’ am asking this because I would want to seek multiple helps when needed, when I’ am understanding the project/codebases, as It’s a new concept and environment for me. Also continuous mails won’t be appealing. Whatever you agree upon sir, we can follow it upon sir.
>
> Lets keep conversation on the mailing list - that way anyone who is interested in the project can benefit.
>
> Thanks,
> Danny
>
> On Sun, Mar 2, 2025 at 11:48 AM SIDDHARTH SALIAN <siddharthsalia...@gmail.com> wrote:
>
> Respected Sir,
>
> 1. With reference to the previous mails.
>
> 2. Sir, apart from strong fundamentals of vector DB’s, python fundamentals, Beam docs, writing sink, is there anything much important topic to be covered/learnt other than these as part of project prerequisites?
>
> 3. Sir, I also wanted to ask, what all other topics have to be covered in python other than the main code language as a part of project prerequisites.
>
> 4. Sir, as the GSOC – 2025 organization list have been released, as well as the project list (for GSOC 2025) has been released. As I’ am interested in this project and you are the potential mentor for it, if you could please tell me which mode of communication would be better - either slack or through mailing lists? I’ am asking this because I would want to seek multiple helps when needed, when I’ am understanding the project/codebases, as It’s a new concept and environment for me. Also continuous mails won’t be appealing. Whatever you agree upon sir, we can follow it upon sir.
>
> Best Regards,
> Thanking you,
> Siddharth Salian
>
> *From: *SIDDHARTH SALIAN <siddharthsalia...@gmail.com>
> *Date: *Friday, 21 February 2025 at 12:59 AM
> *To: *user@beam.apache.org <user@beam.apache.org>
> *Subject: *Re: Regarding the GSOC 2025 Project
>
> Hello Sir,
>
> Thank you for the email. I have understood.
>
> Thanks,
> Siddharth Salian
>
> *From: *Danny McCormick via user <user@beam.apache.org>
> *Date: *Thursday, 20 February 2025 at 9:51 PM
> *To: *user@beam.apache.org <user@beam.apache.org>
> *Cc: *Danny McCormick <dannymccorm...@google.com>
> *Subject: *Re: Regarding the GSOC 2025 Project
>
> > Sir, as you have mentioned in the mail, Python is must for this project, I just wanted to ask, what about Java and Golang SDK applications, I mean I know it’s an AI/ML pipeline based project, but if you could tell me it would add to my clarity.
>
> I would expect this project to pretty much exclusively be in Python. The only exception is if some vector DB or feature store only offers a Go or Java client (but this seems unlikely)
>
> > Sir, I wanted to also ask, as Retrieval Augmented Generation(RAG) has a close relation with this project, don’t you think RAG is still limited to capturing historical data, or it has capability of capturing latest/modern data’s too?
>
> I'm not sure I understand the question, but I can try to give an overview of how I think Beam and RAG work together. Basically, I think Beam can be used to:
>
> 1. Ingest data -> generate embeddings -> write to a vector DB. This can include very recent data, it just depends on how you configure your source (e.g. you could ingest Data continuously with PubSub or Kafka)
> 2. Ingest incoming query -> enrich with embedding data from a vector DB -> perform inference with the additional relevant context -> write result somewhere
>
> So I think this can handle reasonably tight data freshness requirements.
>
> On Tue, Feb 18, 2025 at 11:01 AM SIDDHARTH SALIAN <siddharthsalia...@gmail.com> wrote:
>
> Respected Sir,
>
> 1. Thank you for the email. With the reference to the previous mail, I have understood all the points and I shall also go through the I/O page in the documentation page as well as vector DB’s, features.
>
> 2. Sir, as you have mentioned in the mail, Python is must for this project, I just wanted to ask, what about Java and Golang SDK applications, I mean I know it’s an AI/ML pipeline based project, but if you could tell me it would add to my clarity.
>
> 3. Sir, I wanted to also ask, as Retrieval Augmented Generation(RAG) has a close relation with this project, don’t you think RAG is still limited to capturing historical data, or it has capability of capturing latest/modern data’s too?
>
> Best regards,
> Thanking you,
> Siddharth Salian
>
> *From: *Danny McCormick via user <user@beam.apache.org>
> *Date: *Tuesday, 18 February 2025 at 8:36 PM
> *To: *user@beam.apache.org <user@beam.apache.org>
> *Cc: *Danny McCormick <dannymccorm...@google.com>
> *Subject: *Re: Regarding the GSOC 2025 Project
>
> Hey Siddharth, thanks for reaching out. I'm glad you're interested in the project. In general, I would expect there to be more details about projects once we know which ones have been accepted.
>
> > Sir, if you could tell me the pre-required knowledge (such as major programming languages used, etc., ) for this project, it would bring more clarity to me sir.
>
> I would expect it to be primarily done in Python, though it depends what connectors are available for each vector DB/feature store. Other than that, the main things you'd want to learn about are Beam itself, especially about how to write a sink (IO standards <https://beam.apache.org/documentation/io/io-standards> can help here), and also high level how vector DBs and feature stores work.
>
> Thanks,
> Danny
>
> On Thu, Feb 13, 2025 at 10:55 PM SIDDHARTH SALIAN <siddharthsalia...@gmail.com> wrote:
>
> Hello Sir,
>
> 1. My intention of writing this email is with reference to the GSOC 2025 mail - https://lists.apache.org/thread/o3mwncq0k4c58c630n49l7bvhq74o2wj
>
> 2. I’m Siddharth Salian and I’m an undergraduate student and I’m part of Apache Beam and I have just joined the community. After going through the GSOC 2025 idea list and going through the project description, I founded https://issues.apache.org/jira/browse/GSOC-279 this project to be interesting for me sir. So sir, I would like to contribute to this project in GSOC 2025, since AI/ML is area of my interest. Since you are the mentor, I’m letting you know sir.
>
> 3. Sir, if you could tell me the pre-required knowledge (such as major programming languages used, etc., ) for this project, it would bring more clarity to me sir.
>
> 4. Sir also wanted to ask is there any other project that you are thinking about for GSOC 2025, I would like to contribute in it sir.
>
> Best Regards,
> Thanking You
> Siddharth Salian
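
For anyone following the thread, here is a rough sketch of points 1 to 3 from the top of this message (what an embedding is, generating one, and writing it to and retrieving it from a vector store). It is only a toy, not how the project would actually do it: `VOCAB`, `toy_embed`, and `InMemoryVectorStore` are made-up names for illustration, not Beam or vector DB APIs. I'm assuming a word-count vector as a stand-in for a real pretrained model and a plain dict as a stand-in for a real vector DB client.

```python
import math

# Assumption: a tiny fixed vocabulary stands in for a real embedding model.
VOCAB = ["beam", "pipeline", "streaming", "batch",
         "retrieval", "context", "prompt", "embedding"]


def toy_embed(text):
    """Hypothetical embedding: unit-length word counts over VOCAB."""
    tokens = text.lower().split()
    vec = [float(tokens.count(word)) for word in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec))
    # Normalize so that a dot product equals cosine similarity.
    return [v / norm for v in vec] if norm else vec


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector DB client."""

    def __init__(self):
        self._rows = {}

    def write(self, key, embedding):
        # Store (or overwrite) the embedding under its key.
        self._rows[key] = embedding

    def query(self, embedding, top_k=1):
        # Score every stored vector by dot product, highest first.
        scored = [
            (sum(a * b for a, b in zip(embedding, stored)), key)
            for key, stored in self._rows.items()
        ]
        scored.sort(reverse=True)
        return [key for _, key in scored[:top_k]]


docs = {
    "beam": "beam runs batch and streaming pipeline jobs",
    "rag": "retrieval adds context to a prompt",
}

store = InMemoryVectorStore()
for key, text in docs.items():
    store.write(key, toy_embed(text))  # generate embedding, then write

result = store.query(toy_embed("streaming pipeline"), top_k=1)
print(result)  # prints ['beam']
```

In the real project the embedding step would call a preexisting model and the store would be an actual vector DB, but the write-then-query shape should look similar.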