Re: [NEW member] Hi
To the community, active committers, etc.

> On Jun 1, 2016, at 11:01 AM, Suneel Marthi <smar...@apache.org> wrote:
>
> Was that question directed to the community, or were you asking yourself
> out loud?
Re: [NEW member] Hi
Was that question directed to the community, or were you asking yourself out loud?

On Wed, Jun 1, 2016 at 10:48 AM, Khurrum Nasim <khurrum.na...@useitc.com> wrote:

> How are you folks getting over the learning curves associated with things
> like NiFi and Airflow?
Re: [NEW member] Hi
How are you folks getting over the learning curves associated with things like NiFi and Airflow?

> On May 28, 2016, at 9:50 AM, Suneel Marthi <smar...@apache.org> wrote:
>
> For the data processing pipeline, the requirements are:
> (a) it should be agnostic to any distributed processing engine like
> Spark, Flink, etc.
> (b) it should be able to scale data pipelines and support back pressure.
> (c) it should be able to ingest both batch and streaming data from Spark,
> Flink, Beam, etc.
>
> So far Apache NiFi seems to fit the bill for all of the above criteria
> (it doesn't have a Beam interface yet, but one is being worked on), and it
> also has an excellent GUI along with features to define common workflow
> templates that could be imported into custom workflows.
>
> The other alternatives being considered are Airbnb's Airflow (proposed for
> the Apache Incubator; it defines workflows as a DAG in Python) and
> Apache Beam.
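Requirement (b) in the quoted message, back pressure, can be illustrated without NiFi or Airflow. The following is only a minimal sketch using the Python standard library, not how either tool implements it: a bounded queue makes the producer block whenever the consumer falls behind. The function name `run_pipeline`, the `capacity` parameter, and the squaring step are all hypothetical stand-ins.

```python
import queue
import threading


def run_pipeline(items, capacity=2):
    """Push items through a bounded buffer; return (results, max_depth)."""
    buf = queue.Queue(maxsize=capacity)  # bounded: put() blocks when full
    results = []
    max_depth = 0
    sentinel = object()                  # marks the end of the stream

    def producer():
        nonlocal max_depth
        for item in items:
            buf.put(item)                # back pressure: waits for free space
            max_depth = max(max_depth, buf.qsize())
        buf.put(sentinel)

    def consumer():
        while True:
            item = buf.get()
            if item is sentinel:
                break
            results.append(item * item)  # hypothetical processing step

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, max_depth


if __name__ == "__main__":
    squared, depth = run_pipeline(range(10), capacity=2)
    print(squared, depth)
```

The point of the bounded buffer is that the upstream stage can never run more than `capacity` items ahead of the downstream stage, so memory use stays flat even if the consumer is slow.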
Re: [NEW member] Hi
Debo,

On Tue, May 17, 2016 at 9:18 PM, Andrew Palumbo <ap@outlook.com> wrote:

> We are certainly interested in online clustering algorithms, and
> clustering of time series seems like a great fit. (Our text vectorization
> pipeline has not yet been reworked for the new Mahout "Samsara", but that is
> an interest too.) What type of compute platform would you require for this?

For the data processing pipeline, the requirements are:
(a) it should be agnostic to any distributed processing engine like Spark, Flink, etc.
(b) it should be able to scale data pipelines and support back pressure.
(c) it should be able to ingest both batch and streaming data from Spark, Flink, Beam, etc.

So far Apache NiFi seems to fit the bill for all of the above criteria (it doesn't have a Beam interface yet, but one is being worked on), and it also has an excellent GUI along with features to define common workflow templates that could be imported into custom workflows.

The other alternatives being considered are Airbnb's Airflow (proposed for the Apache Incubator; it defines workflows as a DAG in Python) and Apache Beam.

> Currently we are not looking at FPGAs.

If any of the math packages handle FPGAs natively out of the box, let's go for it. But we need not optimize the heck out of it to get the last bit of performance from FPGAs.

> The most recent, and only real, documentation for Mahout Samsara is in
> Apache Mahout: Beyond MapReduce:
>
> http://www.weatheringthroughtechdays.com/2016/02/mahout-samsara-book-is-out.html
>
> You may want to check that out as a reference.
>
> (I'm sorry for the shameless plug, but it is the only thing that covers most
> all Mahout "Samsara" features and architecture up to our previous release.)

I don't see this as a shameless plug; it's definitely much better than the dozen low-grade books that have been churned out by Packt publishers and went nowhere, other than bringing disrepute to the project and community.

> Please do let us know if you have any questions about the Samsara platform.
Re: [NEW member] Hi
We are certainly interested in online clustering algorithms, and clustering of time series seems like a great fit. (Our text vectorization pipeline has not yet been reworked for the new Mahout "Samsara", but that is an interest too.) What type of compute platform would you require for this?

Currently we are not looking at FPGAs.

The most recent, and only real, documentation for Mahout Samsara is in Apache Mahout: Beyond MapReduce:

http://www.weatheringthroughtechdays.com/2016/02/mahout-samsara-book-is-out.html

You may want to check that out as a reference.

(I'm sorry for the shameless plug, but it is the only thing that covers most all Mahout "Samsara" features and architecture up to our previous release.)

Please do let us know if you have any questions about the Samsara platform.

From: Debojyoti Dutta <ddu...@gmail.com>
Sent: Tuesday, May 17, 2016 8:35:04 PM
To: dev@mahout.apache.org
Subject: Re: [NEW member] Hi

Thanks Andy! Would like to see if there is interest for algorithms such as 1) clustering text in an online fashion (maybe using LSH or sim/min hash) or 2) online clustering of time series. Basically my focus is "online" or real time.

LSH on GPU sounds very interesting and would love to look at the patches. Personally have helped accelerate LSH on TCAMs long ago (e.g. http://arxiv.org/abs/1006.3514). Is GPU the only hw accel you are looking at, or are you considering PCIe FPGA cards too?

debo
Re: [NEW member] Hi
Thanks Andy! Would like to see if there is interest for algorithms such as 1) clustering text in an online fashion (maybe using LSH or sim/min hash) or 2) online clustering of time series. Basically my focus is "online" or real time.

LSH on GPU sounds very interesting and would love to look at the patches. Personally have helped accelerate LSH on TCAMs long ago (e.g. http://arxiv.org/abs/1006.3514). Is GPU the only hw accel you are looking at, or are you considering PCIe FPGA cards too?

debo

On Tue, May 17, 2016 at 5:27 PM, Andrew Palumbo <ap@outlook.com> wrote:

> Welcome, Debojyoti.
> We look forward to your contributions. We are currently working towards
> integrating GPU acceleration for our 0.13 release and LSH sounds like a
> great addition. Could you tell us some more about what you would like to do?
>
> Let us know if we can help you get familiar with the Mahout code base. We
> try to implement algorithms in the math-scala module.
>
> Thanks,
>
> Andy
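The first idea in the message above, clustering text online with min-hash, can be sketched roughly as follows. This is an illustration under stated assumptions, not Mahout code and not the method from the linked paper: it uses word shingles, a salted BLAKE2b hash from the Python standard library as the hash family, and a simple "leader" scheme in which each new document joins the first cluster whose representative signature looks similar enough. All names (`shingles`, `minhash_signature`, `estimated_jaccard`, `OnlineLeaderClusterer`) and the default parameters are hypothetical. LSH banding, which would avoid comparing against every leader, is left out to keep the sketch short.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles of a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k])
            for i in range(max(1, len(words) - k + 1))}


def minhash_signature(items, num_hashes=64):
    """One minimum per salted hash function over all items."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # a distinct salt per hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"),
                                digest_size=8, salt=salt).digest(),
                "big")
            for s in items))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


class OnlineLeaderClusterer:
    """Assign each incoming document to a cluster as it arrives."""

    def __init__(self, threshold=0.5, num_hashes=64):
        self.threshold = threshold
        self.num_hashes = num_hashes
        self.leaders = []  # one representative signature per cluster

    def add(self, text):
        sig = minhash_signature(shingles(text), self.num_hashes)
        for cluster_id, leader in enumerate(self.leaders):
            if estimated_jaccard(sig, leader) >= self.threshold:
                return cluster_id
        self.leaders.append(sig)  # no close leader: start a new cluster
        return len(self.leaders) - 1
```

Each call to `add` touches only the stored leader signatures, never the earlier documents themselves, which is what makes the scheme usable on a stream. The threshold and the number of hash functions trade accuracy against cost and would need tuning on real data.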
RE: [NEW member] Hi
Welcome, Debojyoti.

We look forward to your contributions. We are currently working towards integrating GPU acceleration for our 0.13 release and LSH sounds like a great addition. Could you tell us some more about what you would like to do?

Let us know if we can help you get familiar with the Mahout code base. We try to implement algorithms in the math-scala module.

Thanks,

Andy

Original message
From: Debojyoti Dutta <ddu...@gmail.com>
Date: 05/17/2016 8:11 PM (GMT-05:00)
To: dev@mahout.apache.org
Subject: [NEW member] Hi

Hi there,

Am very interested in contributing to Mahout, especially towards fast ML kernels that can be used for streaming. Have some experience with LSH-based techniques (including hw accel) for clustering and near-neighbors-based stuff in general.

Was chatting with Sunil and he suggested I join the merry band.

regards
-Debo~
[NEW member] Hi
Hi there,

Am very interested in contributing to Mahout, especially towards fast ML kernels that can be used for streaming. Have some experience with LSH-based techniques (including hw accel) for clustering and near-neighbors-based stuff in general.

Was chatting with Sunil and he suggested I join the merry band.

regards
-Debo~