I think that should be plenty for now, thanks!

On Mon, Mar 3, 2025 at 3:11 PM SIDDHARTH SALIAN <siddharthsalia...@gmail.com> wrote:
> Respected Sir,
>
> Thank you for the email. I shall go through the mentioned link. Sir, is
> there anything more to be read currently as part of the project
> prerequisites? Or is it good for now, sir?
>
> Thanking you,
>
> Siddharth
>
> *From: *Danny McCormick <dannymccorm...@google.com>
> *Date: *Tuesday, 4 March 2025 at 1:37 AM
> *To: *SIDDHARTH SALIAN <siddharthsalia...@gmail.com>
> *Cc: *user@beam.apache.org <user@beam.apache.org>, damcc...@apache.org <damcc...@apache.org>
> *Subject: *Re: Regarding the GSOC 2025 Project
>
> > Sir, with reference to the point about Python, I meant to ask: apart
> > from learning the Python language itself, is there any other important
> > topic to be learnt (such as Python with ML pipelines, etc.) as part of
> > the project prerequisites?
>
> I think that knowing how to write good Python code is the most important
> thing. It might be useful, but not required, to understand how to generate
> embeddings using Python and more generally to understand how embeddings
> work [1].
>
> Thanks,
>
> Danny
>
> [1] https://stackoverflow.blog/2023/11/09/an-intuitive-introduction-to-text-embeddings/
>
> On Mon, Mar 3, 2025 at 3:02 PM SIDDHARTH SALIAN <siddharthsalia...@gmail.com> wrote:
>
> Respected Sir,
>
> 1. With reference to the previous email.
>
> 2. Thank you for the email. I shall follow the mode of communication
> through mailing lists.
>
> 3. Sir, with reference to the point about Python, I meant to ask: apart
> from learning the Python language itself, is there any other important
> topic to be learnt (such as Python with ML pipelines, etc.) as part of
> the project prerequisites?
>
> Best Regards,
>
> Thanking you,
>
> Siddharth Salian
>
> *From: *Danny McCormick via user <user@beam.apache.org>
> *Date: *Tuesday, 4 March 2025 at 1:25 AM
> *To: *user@beam.apache.org <user@beam.apache.org>
> *Cc: *Danny McCormick <dannymccorm...@google.com>
> *Subject: *Re: Regarding the GSOC 2025 Project
>
> > Sir, apart from strong fundamentals of vector DBs, Python fundamentals,
> > Beam docs, and writing a sink, is there any other important topic to be
> > covered/learnt as part of the project prerequisites?
>
> I think those are the main pieces to consider here.
>
> > Sir, I also wanted to ask what other topics have to be covered in
> > Python, other than the main code language, as part of the project
> > prerequisites.
>
> I don't understand what you're asking - could you try rephrasing?
>
> > Sir, the GSOC 2025 organization list has been released, as well as the
> > project list (for GSOC 2025). As I am interested in this project and you
> > are the potential mentor for it, could you please tell me which mode of
> > communication would be better - either Slack or the mailing lists? I am
> > asking this because I would want to seek help multiple times when needed
> > while I am understanding the project/codebases, as it's a new concept
> > and environment for me. Also, continuous mails won't be appealing.
> > Whatever you agree upon, sir, we can follow.
>
> Let's keep conversation on the mailing list - that way anyone who is
> interested in the project can benefit.
>
> Thanks,
>
> Danny
>
> On Sun, Mar 2, 2025 at 11:48 AM SIDDHARTH SALIAN <siddharthsalia...@gmail.com> wrote:
>
> Respected Sir,
>
> 1. With reference to the previous mails.
>
> 2. Sir, apart from strong fundamentals of vector DBs, Python
> fundamentals, Beam docs, and writing a sink, is there any other important
> topic to be covered/learnt as part of the project prerequisites?
>
> 3. Sir, I also wanted to ask what other topics have to be covered in
> Python, other than the main code language, as part of the project
> prerequisites.
>
> 4. Sir, the GSOC 2025 organization list has been released, as well as
> the project list (for GSOC 2025). As I am interested in this project and
> you are the potential mentor for it, could you please tell me which mode
> of communication would be better - either Slack or the mailing lists? I am
> asking this because I would want to seek help multiple times when needed
> while I am understanding the project/codebases, as it's a new concept and
> environment for me. Also, continuous mails won't be appealing. Whatever
> you agree upon, sir, we can follow.
>
> Best Regards,
>
> Thanking you,
>
> Siddharth Salian
>
> *From: *SIDDHARTH SALIAN <siddharthsalia...@gmail.com>
> *Date: *Friday, 21 February 2025 at 12:59 AM
> *To: *user@beam.apache.org <user@beam.apache.org>
> *Subject: *Re: Regarding the GSOC 2025 Project
>
> Hello Sir,
>
> Thank you for the email. I have understood.
>
> Thanks,
>
> Siddharth Salian
>
> *From: *Danny McCormick via user <user@beam.apache.org>
> *Date: *Thursday, 20 February 2025 at 9:51 PM
> *To: *user@beam.apache.org <user@beam.apache.org>
> *Cc: *Danny McCormick <dannymccorm...@google.com>
> *Subject: *Re: Regarding the GSOC 2025 Project
>
> > Sir, as you have mentioned in the mail, Python is a must for this
> > project. I just wanted to ask, what about Java and Golang SDK
> > applications? I know it's an AI/ML pipeline based project, but if you
> > could tell me, it would add to my clarity.
>
> I would expect this project to pretty much exclusively be in Python. The
> only exception is if some vector DB or feature store only offers a Go or
> Java client (but this seems unlikely).
>
> > Sir, I also wanted to ask: as Retrieval Augmented Generation (RAG) has a
> > close relation with this project, don't you think RAG is still limited
> > to capturing historical data, or does it have the capability of
> > capturing the latest data too?
>
> I'm not sure I understand the question, but I can try to give an overview
> of how I think Beam and RAG work together. Basically, I think Beam can be
> used to:
>
> 1. Ingest data -> generate embeddings -> write to a vector DB. This
> can include very recent data, it just depends on how you configure your
> source (e.g. you could ingest data continuously with PubSub or Kafka)
> 2. Ingest incoming query -> enrich with embedding data from a vector
> DB -> perform inference with the additional relevant context -> write
> result somewhere
>
> So I think this can handle reasonably tight data freshness requirements.
>
> On Tue, Feb 18, 2025 at 11:01 AM SIDDHARTH SALIAN <siddharthsalia...@gmail.com> wrote:
>
> Respected Sir,
>
> 1. Thank you for the email. With reference to the previous mail, I have
> understood all the points, and I shall also go through the I/O page in
> the documentation as well as vector DBs and feature stores.
>
> 2. Sir, as you have mentioned in the mail, Python is a must for this
> project. I just wanted to ask, what about Java and Golang SDK
> applications? I know it's an AI/ML pipeline based project, but if you
> could tell me, it would add to my clarity.
>
> 3. Sir, I also wanted to ask: as Retrieval Augmented Generation (RAG)
> has a close relation with this project, don't you think RAG is still
> limited to capturing historical data, or does it have the capability of
> capturing the latest data too?
>
> Best regards,
>
> Thanking you,
>
> Siddharth Salian
>
> *From: *Danny McCormick via user <user@beam.apache.org>
> *Date: *Tuesday, 18 February 2025 at 8:36 PM
> *To: *user@beam.apache.org <user@beam.apache.org>
> *Cc: *Danny McCormick <dannymccorm...@google.com>
> *Subject: *Re: Regarding the GSOC 2025 Project
>
> Hey Siddharth, thanks for reaching out. I'm glad you're interested in the
> project. In general, I would expect there to be more details about projects
> once we know which ones have been accepted.
>
> > Sir, if you could tell me the prerequisite knowledge (such as major
> > programming languages used, etc.) for this project, it would bring more
> > clarity to me, sir.
>
> I would expect it to be primarily done in Python, though it depends what
> connectors are available for each vector DB/feature store. Other than that,
> the main things you'd want to learn about are Beam itself, especially about
> how to write a sink (IO standards
> <https://beam.apache.org/documentation/io/io-standards> can help here),
> and also, at a high level, how vector DBs and feature stores work.
>
> Thanks,
>
> Danny
>
> On Thu, Feb 13, 2025 at 10:55 PM SIDDHARTH SALIAN <siddharthsalia...@gmail.com> wrote:
>
> Hello Sir,
>
> 1. I am writing this email with reference to the GSOC 2025 mail -
> https://lists.apache.org/thread/o3mwncq0k4c58c630n49l7bvhq74o2wj
>
> 2. I'm Siddharth Salian, an undergraduate student, and I have just
> joined the Apache Beam community. After going through the GSOC 2025 idea
> list and the project descriptions, I found
> https://issues.apache.org/jira/browse/GSOC-279 to be an interesting
> project for me, sir. So sir, I would like to contribute to this project
> in GSOC 2025, since AI/ML is my area of interest. Since you are the
> mentor, I'm letting you know, sir.
>
> 3. Sir, if you could tell me the prerequisite knowledge (such as major
> programming languages used, etc.) for this project, it would bring more
> clarity to me, sir.
>
> 4. Sir, I also wanted to ask: is there any other project that you are
> thinking about for GSOC 2025? I would like to contribute to it, sir.
>
> Best Regards,
>
> Thanking You
>
> Siddharth Salian
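
For anyone following the thread, the two pipeline shapes described in the Feb 20 reply (ingest data -> generate embeddings -> write to a vector DB, and ingest query -> enrich with data from the vector DB) can be sketched in plain Python. This is only an illustration under loud assumptions: `toy_embed` and `InMemoryVectorStore` are hypothetical stand-ins for a real embedding model and a real vector DB, and nothing here is Beam API. In an actual Beam pipeline each function would presumably become a transform, and the store would be a proper sink. The inference and result-writing steps of the second shape are left out.

```python
import hashlib
import math


def toy_embed(text, dim=16):
    """Hypothetical stand-in for an embedding model: hash words into buckets."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    # Normalize to unit length; fall back to 1.0 so empty text does not divide by zero.
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector DB (not a real client library)."""

    def __init__(self):
        self.rows = []  # list of (doc_id, text, vector)

    def write(self, doc_id, text, vector):
        self.rows.append((doc_id, text, vector))

    def nearest(self, vector, k=1):
        # Rank stored rows by dot product with the query vector, highest first.
        def score(row):
            return sum(a * b for a, b in zip(row[2], vector))

        return sorted(self.rows, key=score, reverse=True)[:k]


def ingest(docs, store):
    """Shape 1: ingest data -> generate embeddings -> write to the store."""
    for doc_id, text in docs:
        store.write(doc_id, text, toy_embed(text))


def enrich_query(query, store, k=1):
    """Shape 2 (first half): embed the query and attach the nearest stored texts."""
    hits = store.nearest(toy_embed(query), k)
    return {"query": query, "context": [text for _, text, _ in hits]}


if __name__ == "__main__":
    store = InMemoryVectorStore()
    ingest([("doc-1", "beam can write embeddings to a vector db")], store)
    print(enrich_query("how does beam write to a vector db", store))
```

Data freshness in this sketch depends only on how often `ingest` is called, which mirrors the point in the thread that freshness is a matter of how the source is configured.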