Re: [NEW member] Hi

2016-06-01 Thread Khurrum Nasim
To the community, active committers, etc. 



> On Jun 1, 2016, at 11:01 AM, Suneel Marthi <smar...@apache.org> wrote:
> 
> Was that question directed to the community, or were you thinking out loud?
> 
> On Wed, Jun 1, 2016 at 10:48 AM, Khurrum Nasim <khurrum.na...@useitc.com>
> wrote:
> 
>> How are you folks getting over the learning curves associated with tools
>> like NiFi and Airflow?



Re: [NEW member] Hi

2016-06-01 Thread Suneel Marthi
Was that question directed to the community, or were you thinking out loud?

On Wed, Jun 1, 2016 at 10:48 AM, Khurrum Nasim <khurrum.na...@useitc.com>
wrote:

> How are you folks getting over the learning curves associated with tools
> like NiFi and Airflow?


Re: [NEW member] Hi

2016-06-01 Thread Khurrum Nasim
How are you folks getting over the learning curves associated with tools like
NiFi and Airflow?

> On May 28, 2016, at 9:50 AM, Suneel Marthi <smar...@apache.org> wrote:



Re: [NEW member] Hi

2016-05-28 Thread Suneel Marthi
Debo,

On Tue, May 17, 2016 at 9:18 PM, Andrew Palumbo <ap@outlook.com> wrote:

> We are certainly interested in online clustering algorithms, and
> clustering of time series seems like a great fit. (Our text vectorization
> pipeline has not yet been reworked for the new Mahout "Samsara", but that
> is an interest too.) What type of compute platform would you require for
> this?
>

For the data processing pipeline, the requirements are:
(a) it should be agnostic to the distributed processing engine (Spark, Flink,
etc.);
(b) it should be able to scale data pipelines and support back pressure;
(c) it should be able to ingest both batch and streaming data from Spark,
Flink, Beam, etc.
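Requirement (b) asks for back pressure. As a rough, generic illustration only (plain Python threads and a bounded queue; this is not how NiFi, Spark, or Flink implement it), a bounded buffer between two stages makes a fast producer wait for a slow consumer instead of letting memory grow without limit. The function name `run_pipeline` and its parameter values are hypothetical:

```python
import queue
import threading
import time


def run_pipeline(n_items=20, capacity=4, consumer_delay=0.001):
    """Producer -> bounded queue -> slow consumer (hypothetical example).

    Because the queue is bounded, q.put() blocks whenever `capacity` items are
    already waiting, so a slow consumer throttles the producer (back pressure)
    instead of letting the buffer grow without limit.
    """
    q = queue.Queue(maxsize=capacity)
    consumed = []
    max_depth = 0  # largest queue depth the producer ever observed

    def producer():
        nonlocal max_depth
        for i in range(n_items):
            q.put(i)  # blocks while the queue is full
            max_depth = max(max_depth, q.qsize())
        q.put(None)  # sentinel: tells the consumer no more items are coming

    def consumer():
        while True:
            item = q.get()
            if item is None:
                break
            time.sleep(consumer_delay)  # simulate a slow downstream stage
            consumed.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed, max_depth


if __name__ == "__main__":
    items, depth = run_pipeline()
    print(f"consumed {len(items)} items; max queue depth {depth}")
```

The capacity is arbitrary here; the point of the design is that the buffer size, not the producer's speed, caps how much data is in flight between stages.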

So far Apache NiFi seems to fit the bill for all of the above criteria
(it doesn't have a Beam interface yet, but one is being worked on), and it
also has an excellent GUI, along with features for defining common workflow
templates that can be imported into custom workflows.

The other alternatives being considered are Airbnb's Airflow, which has been
proposed for the Apache Incubator and defines workflows as DAGs in Python,
and Apache Beam.



>
> Currently we are not looking at FPGAs.
>

If any of the math packages handle FPGAs natively out of the box, let's go
for it. But we needn't optimize the heck out of things to get the last bit of
performance from FPGAs.


>
> The most recent, and only real, documentation for Mahout Samsara is in
> Apache Mahout: Beyond MapReduce:
>
> http://www.weatheringthroughtechdays.com/2016/02/mahout-samsara-book-is-out.html
>
> You may want to check that out as a reference.
>
> (I'm sorry for the shameless plug, but it is the only thing that covers
> almost all Mahout "Samsara" features and architecture up to our previous
> release.)
>

I don't see this as a shameless plug; it's definitely much better than the
dozen low-grade books churned out by Packt Publishing, which went nowhere
other than bringing disrepute to the project and community.




Re: [NEW member] Hi

2016-05-17 Thread Andrew Palumbo
We are certainly interested in online clustering algorithms, and clustering of
time series seems like a great fit. (Our text vectorization pipeline has not
yet been reworked for the new Mahout "Samsara", but that is an interest too.)
What type of compute platform would you require for this?

Currently we are not looking at FPGAs.

The most recent, and only real, documentation for Mahout Samsara is in Apache
Mahout: Beyond MapReduce:

http://www.weatheringthroughtechdays.com/2016/02/mahout-samsara-book-is-out.html

You may want to check that out as a reference.

(I'm sorry for the shameless plug, but it is the only thing that covers almost
all Mahout "Samsara" features and architecture up to our previous release.)

Please do let us know if you have any questions about the Samsara platform.

From: Debojyoti Dutta <ddu...@gmail.com>
Sent: Tuesday, May 17, 2016 8:35:04 PM
To: dev@mahout.apache.org
Subject: Re: [NEW member] Hi



Re: [NEW member] Hi

2016-05-17 Thread Debojyoti Dutta
Thanks Andy! I'd like to see if there is interest in algorithms such as
1) clustering text in an online fashion (maybe using LSH or SimHash/MinHash)
or 2) online clustering of time series. Basically, my focus is "online" or
real-time.

LSH on GPU sounds very interesting, and I would love to look at the patches.
I personally helped accelerate LSH on TCAMs long ago (e.g.
http://arxiv.org/abs/1006.3514). Is GPU the only hardware acceleration you
are looking at, or are you also considering PCIe FPGA cards?

debo

On Tue, May 17, 2016 at 5:27 PM, Andrew Palumbo <ap@outlook.com> wrote:




-- 
-Debo~
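Debo's first idea, clustering text online with LSH/MinHash, can be made concrete with a small single-pass sketch. This is an illustrative Python sketch under stated assumptions, not Mahout code (which would live in the Scala math-scala module) and not the method from the linked paper: the character 3-shingles, the 64-hash signature split into 16 bands, the blake2b-based seeded hashing, and the class name `OnlineLSHClusterer` are all hypothetical choices.

```python
import hashlib
import re

NUM_HASHES = 64             # MinHash signature length (arbitrary choice)
BANDS = 16                  # LSH bands (arbitrary choice)
ROWS = NUM_HASHES // BANDS  # signature rows per band (4 here)


def shingles(text, k=3):
    """Set of character k-shingles of a lower-cased, whitespace-normalized text."""
    t = re.sub(r"\s+", " ", text.lower()).strip()
    return {t[i:i + k] for i in range(max(1, len(t) - k + 1))}


def minhash(shingle_set):
    """MinHash signature: for each of NUM_HASHES seeded hash functions,
    keep the minimum hash value over all shingles in the set."""
    return [
        min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(),
                "little",
            )
            for s in shingle_set
        )
        for seed in range(NUM_HASHES)
    ]


class OnlineLSHClusterer:
    """Hypothetical single-pass (online) clusterer: each arriving document
    joins the first existing cluster it collides with in any LSH band,
    otherwise it starts a new cluster. Each document is hashed once and
    looked up by band keys, never compared against every earlier document."""

    def __init__(self):
        self.buckets = {}  # (band index, band values) -> cluster id
        self.next_id = 0

    def add(self, text):
        sig = minhash(shingles(text))
        keys = [(b, tuple(sig[b * ROWS:(b + 1) * ROWS])) for b in range(BANDS)]
        # Reuse the cluster of the first band key that has been seen before.
        cluster_id = next((self.buckets[k] for k in keys if k in self.buckets), None)
        if cluster_id is None:
            cluster_id = self.next_id
            self.next_id += 1
        # Register this document's band keys so later near-duplicates find it.
        for k in keys:
            self.buckets.setdefault(k, cluster_id)
        return cluster_id


if __name__ == "__main__":
    clusterer = OnlineLSHClusterer()
    for doc in [
        "the quick brown fox jumps over the lazy dog",
        "the quick brown fox jumped over the lazy dog",
        "mahout samsara distributed linear algebra",
    ]:
        print(clusterer.add(doc), doc)
```

Identical documents always land in the same cluster; a near-duplicate such as the "jumped" variant is only likely, not guaranteed, to collide, because LSH trades exactness for speed. More rows per band makes collisions stricter; more bands makes them more forgiving.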


RE: [NEW member] Hi

2016-05-17 Thread Andrew Palumbo
Welcome, Debojyoti.
We look forward to your contributions. We are currently working towards
integrating GPU acceleration for our 0.13 release, and LSH sounds like a great
addition. Could you tell us more about what you would like to do?

Let us know if we can help you get familiar with the Mahout code base. We try
to implement algorithms in the math-scala module.

Thanks,

Andy





 Original message 
From: Debojyoti Dutta <ddu...@gmail.com>
Date: 05/17/2016 8:11 PM (GMT-05:00)
To: dev@mahout.apache.org
Subject: [NEW member] Hi



[NEW member] Hi

2016-05-17 Thread Debojyoti Dutta
Hi there,

I am very interested in contributing to Mahout, especially fast ML kernels
that can be used for streaming. I have some experience with LSH-based
techniques (including hardware acceleration) for clustering and
nearest-neighbor search in general.

I was chatting with Suneel, and he suggested I join the merry band.

regards
-Debo~