GSoC 2017

2017-01-05 Thread Krishna Kalyan
Hello Developers,
I am Krishna, a 2nd-year Master's student (MSc in Data Mining) studying at
Université Polytechnique de Catalogne in Barcelona.
I am interested in contributing to SystemML this year under the GSoC program.
Could anyone please guide me on how to go about it? (I understand that I need
to write a proposal.)

Related Experience:
My Master's is mostly focused on data mining techniques. Before my Master's,
I was a data engineer with IBM (India). I was responsible for managing a
50-node Hadoop cluster for more than a year. Most of my time was spent
optimising and writing ETL (Apache Pig) jobs.

I am most comfortable with Python, followed by R and Scala.

My Webpage
kkalyan.in

My Spark Pull Requests
https://github.com/apache/spark/pulls?utf8=%E2%9C%93&q=is%3Apr%20author%3Akrishnakalyan3%20

Thank you so much,
Krishna


Re: Release cadence

2017-01-05 Thread Deron Eriksson
+1 for trying out a 1 month release cycle.

However, I strongly agree with Matthias that there is a lot of overhead with
releases, so it would be good if we can work to streamline/automate the
process as much as possible. Also, it would be good to distribute the tasks
around as much as possible. This can result in cross-training and help
avoid overburdening the same contributors each month.

If the overhead slows us down too much, then we can go to a slower release
cycle.

Deron




On Thu, Jan 5, 2017 at 1:50 PM, Mike Dusenberry wrote:

> +1 for adopting a 1 month release cycle.
>
> --
>
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> Sent from my iPhone.
>
>
> > On Jan 5, 2017, at 1:35 PM, Luciano Resende wrote:
> >
> > On Thu, Jan 5, 2017 at 6:05 AM, Matthias Boehm wrote:
> >
> >> In general, I like the idea of aiming for consistent release cycles.
> >> However, every month is just too much, at least for me. There is a
> >> considerable overhead associated with each release for end-to-end
> >> performance tests, tests on different environments, code freeze for new
> >> features, etc. Hence, a too short release cycle would not be "agile" but
> >> would actually slow us down. From my perspective, a realistic release
> >> cadence would be 2-3 months, maybe a bit more for major releases.
> >>
> >>
> > A release cadence of 2-3 months is probably a long stretch for an open
> > source project, particularly one that does not have a very large set of
> > 3rd-party dependencies.
> >
> > As for some of the overhead issues you mentioned, they are probably easy
> > to work around:
> >
> > - the code-freeze timeframe can be resolved with branches
> > - end-to-end performance regressions can be avoided by better code
> > review, and if you were willing to go 2-3 months without performing these
> > tests, we could perform them only for major releases and quickly build a
> > minor release with the patch when a user reports any performance
> > regression.
> >
> >
> > Anyway, I would really like to see SystemML more agile with regard to
> > its release process because, as I mentioned before, the "release early,
> > release often" mantra is good for increasing community interest,
> > generating more traffic on the list as developers discuss the roadmap and
> > release blockers, and enabling users to provide feedback sooner on the
> > areas we are developing.
> >
> >
> >
> > --
> > Luciano Resende
> > http://twitter.com/lresende1975
> > http://lresende.blogspot.com/
>



-- 
Deron Eriksson
Spark Technology Center
http://www.spark.tc/


Re: Release cadence

2017-01-05 Thread Luciano Resende
On Thu, Jan 5, 2017 at 6:05 AM, Matthias Boehm wrote:

> In general, I like the idea of aiming for consistent release cycles.
> However, every month is just too much, at least for me. There is a
> considerable overhead associated with each release for end-to-end
> performance tests, tests on different environments, code freeze for new
> features, etc. Hence, a too short release cycle would not be "agile" but
> would actually slow us down. From my perspective, a realistic release
> cadence would be 2-3 months, maybe a bit more for major releases.
>
>
A release cadence of 2-3 months is probably a long stretch for an open
source project, particularly one that does not have a very large set of
3rd-party dependencies.

As for some of the overhead issues you mentioned, they are probably easy to
work around:

- the code-freeze timeframe can be resolved with branches
- end-to-end performance regressions can be avoided by better code review,
and if you were willing to go 2-3 months without performing these
tests, we could perform them only for major releases and quickly build a
minor release with the patch when a user reports any performance
regression.


Anyway, I would really like to see SystemML more agile with regard to its
release process because, as I mentioned before, the "release early, release
often" mantra is good for increasing community interest, generating more
traffic on the list as developers discuss the roadmap and release blockers,
and enabling users to provide feedback sooner on the areas we are developing.



-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/