Re: Apache MADlib v1.12 status

2017-08-17 Thread Frank McQuillan
Here is the PR
https://github.com/apache/incubator-madlib/pull/169
Should be merged today

On Wed, Aug 16, 2017 at 10:20 AM, Ed Espino <esp...@apache.org> wrote:

> Frankie,
>
> Are there Jiras for the remaining work? This work (minor changes to neural
> nets) is currently not visible on the release dashboard (
> https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12331450
> )
>
> -=e
>
> On Wed, Aug 16, 2017 at 9:52 AM, Frank McQuillan <fmcquil...@pivotal.io>
> wrote:
>
> > Some doc changes coming for multiple modules, and minor changes to neural
> > nets in the next day or so.
> >
> > Frank
> >
> > On Wed, Aug 16, 2017 at 9:49 AM, Cooper Sloan <csl...@pivotal.io> wrote:
> >
> > > We shouldn't hold up the release.  This is a no-op for now.
> > > If we get more information from the customer we can reopen it, but for
> > now
> > > we will do nothing.
> > >
> > > CS
> > >
> > > On Wed, Aug 16, 2017 at 9:41 AM Ed Espino <esp...@apache.org> wrote:
> > >
> > > > We have one outstanding Apache MADlib v1.12 Jira holding up the
> release
> > > > (MADLIB-1091). It appears Cooper has been working on it and is
> seeking
> > > > additional information. If it is not resolved soon, we need to decide
> > if
> > > we
> > > > will push this to a future release.
> > > >
> > > > FYI: Today I will be performing preliminary convenience binary builds
> > > > following the information provided in the Release Process section
> > titled
> > > > "Prepare rpm and dmg binaries" (
> > > >
> > > > https://cwiki.apache.org/confluence/display/MADLIB/Release+Process#
> > > ReleaseProcess-Preparerpmanddmgbinaries
> > > > ).
> > > > I will undoubtedly be contributing additional information in the
> > section
> > > > and looking for guidance and confirmation of my understanding of the
> > > > convenience binary build environments.
> > > >
> > > > We're almost there!
> > > >
> > > > Cheerios,
> > > > -=e
> > > >
> > > > --
> > > > *Ed Espino*
> > > >
> > >
> >
>
>
>
> --
> *Ed Espino*
>


Re: Apache MADlib v1.12 status

2017-08-16 Thread Frank McQuillan
Some doc changes coming for multiple modules, and minor changes to neural
nets in the next day or so.

Frank

On Wed, Aug 16, 2017 at 9:49 AM, Cooper Sloan  wrote:

> We shouldn't hold up the release.  This is a no-op for now.
> If we get more information from the customer we can reopen it, but for now
> we will do nothing.
>
> CS
>
> On Wed, Aug 16, 2017 at 9:41 AM Ed Espino  wrote:
>
> > We have one outstanding Apache MADlib v1.12 Jira holding up the release
> > (MADLIB-1091). It appears Cooper has been working on it and is seeking
> > additional information. If it is not resolved soon, we need to decide if
> we
> > will push this to a future release.
> >
> > FYI: Today I will be performing preliminary convenience binary builds
> > following the information provided in the Release Process section titled
> > "Prepare rpm and dmg binaries" (
> >
> > https://cwiki.apache.org/confluence/display/MADLIB/Release+Process#
> ReleaseProcess-Preparerpmanddmgbinaries
> > ).
> > I will undoubtedly be contributing additional information in the section
> > and looking for guidance and confirmation of my understanding of the
> > convenience binary build environments.
> >
> > We're almost there!
> >
> > Cheerios,
> > -=e
> >
> > --
> > *Ed Espino*
> >
>


Re: Apache MADlib v1.12 status

2017-08-14 Thread Frank McQuillan
Hi Ed,

We have not been able to reproduce
https://issues.apache.org/jira/browse/MADLIB-1091
so it may move out.

I still have some docs updates to do so that will be a coming PR probably
Tues or Wed.

Frank

On Mon, Aug 14, 2017 at 3:30 PM, Ed Espino  wrote:

> MADlib dev,
>
> We are winding down the number of outstanding issues for the Apache MADlib
> v1.12 release. The one outstanding issue is
> https://issues.apache.org/jira/browse/MADLIB-1091. Once this is resolved,
> I'm hoping to start the release process.
>
> Regards,
> -=e
>
> --
> *Ed Espino*
>


Re: Jira post v1.12 version?

2017-08-14 Thread Frank McQuillan
Ed,

I would suggest v2.0 for the next version, so you can add those 2 JIRAs to
v2.0

Once we get v1.12 out the door I was going to solicit comments from the
community on v2.0 features so we can get that backlog going.

Frank

On Mon, Aug 14, 2017 at 11:30 AM, Ed Espino  wrote:

> Dev,
>
> What are we setting the Jira Fix Version/s for issues to be addressed in
> the next release (post v1.12)? I noticed a v2.0 version (06/Oct/17)
> available in Jira.
>
> The two issues I'd like to set to the next release are the following:
>
> https://issues.apache.org/jira/browse/MADLIB-1025 - MADlib does not
> compile
> with gcc 6.2
> https://issues.apache.org/jira/browse/MADLIB-1145 - Ubuntu 16.04 - Using
> GCC 5 (default gcc) causes Postgres 9.6 crash
>
> Any guidance is greatly appreciated.
>
> Regards
> -=e
>
> --
> *Ed Espino*
>


Re: [VOTE]: MADlib repo(s) migration

2017-08-14 Thread Frank McQuillan
1

On Fri, Aug 11, 2017 at 10:16 AM, Nandish Jayaram 
wrote:

> Hi All,
>
> A gentle reminder to vote if you'd like. I was thinking of opening the
> Apache Infra
> ticket for the move sometime today if there are no more votes to come.
>
> NJ
>
> On Thu, Aug 10, 2017 at 3:39 AM, ChenLiang Wang 
> wrote:
>
> > 1
> >
> > On 08/10/2017 05:47 AM, Orhan Kislal wrote:
> > > 1
> > >
> > > Orhan Kislal
> > >
> > > On Wed, Aug 9, 2017 at 2:32 PM, Nandish Jayaram 
> > wrote:
> > >
> > >> Hi All,
> > >>
> > >> With MADlib's graduation to TLP, it's time to migrate its github
> > >> repos from `*incubator-madlib*` to `*madlib*`. We will have to open
> > >> an Apache Infrastructure ticket to request this move for the following
> > >> repos (along with other stuff like wiki, jenkins etc):
> > >> https://git1-us-west.apache.org/repos/asf?p=incubator-madlib.git
> > >>  (Read/Write)
> > >> https://github.com/apache/incubator-madlib (Github mirror- read only)
> > >> https://git1-us-west.apache.org/repos/asf?p=incubator-madlib-site.git
> > >> https://github.com/apache/incubator-madlib-site (GitHub mirror)
> > >>
> > >> There are two ways to go about this, and the Infra ticket has to be
> > >> raised accordingly.
> > >> 1) Just maintain the current set-up, but have the repos renamed from
> > >> incubator-madlib to madlib.
> > >> 2) Use Gitbox to enable github repo as a R/W repo and not just
> > read-only.
> > >> Check this email (
> > >> https://mail-archives.apache.org/mod_mbox/incubator-madlib-
> > >> dev/201708.mbox/%3cCA+ULb+vP0ViWH4Nc=4eaXvbT0KOmeFtQzp4eAa3p0fKPP7c
> > >> 8...@mail.gmail.com%3e)
> > >> for further information.
> > >>
> > >> Please vote your preference and we can decide to move accordingly.
> > >>
> > >> NJ
> > >>
> > >
> >
>


Re: MADlib Top Level Project Graduation

2017-07-28 Thread Frank McQuillan
There will be a press release put out by the ASF, it is being written now
but there have been some delays with people out on summer vacation.

I will update the web site this afternoon with the news.  I was planning to
wait for the press release, but think I will update it now and add a link
to the press release later when it comes out.

Frank

On Fri, Jul 28, 2017 at 2:24 PM, Ivan Novick  wrote:

> I just tweeted it and referenced @joe_hellerstein
> 
>
> Let's make some noise.
>
> I am sure there will be more, but we can start on our own.
>
> On Fri, Jul 28, 2017 at 2:18 PM, Joseph Hellerstein <
> hellerst...@berkeley.edu> wrote:
>
>> Is this public? Is anybody planning on putting a news item on a web page
>> or
>> something?
>>
>> Would be good to brag on social media, once that's in place.
>>
>> J
>>
>> On Wed, Jul 26, 2017 at 12:39 AM, Kazmi,Auon H  wrote:
>>
>> > Congrats developers!
>> >
>> > 
>> > From: Woo Jung 
>> > Sent: Monday, July 24, 2017 5:56:35 PM
>> > To: u...@madlib.incubator.apache.org
>> > Cc: dev@madlib.incubator.apache.org
>> > Subject: Re: MADlib Top Level Project Graduation
>> >
>> > Congratulations MADlib -- Oh, the places you'll go! :)
>> >
>> > On Mon, Jul 24, 2017 at 2:29 PM, Greg Chase  wrote:
>> >
>> > > Congrats MADlib team! Very proud of you!
>> > >
>> > > On Mon, Jul 24, 2017 at 2:09 PM, Jarrod Vawdrey 
>> > > wrote:
>> > >
>> > >> Awesome!!! Congrats team.
>> > >>
>> > >> Jarrod Vawdrey
>> > >> (678) 651-0795
>> > >>
>> > >> > On Jul 24, 2017, at 3:57 PM, Joseph Hellerstein <
>> > >> hellerst...@berkeley.edu> wrote:
>> > >> >
>> > >> > Very cool!
>> > >> >
>> > >> > On Mon, Jul 24, 2017 at 2:11 PM, Anirudh Kondaveeti <
>> > >> akondave...@pivotal.io>
>> > >> > wrote:
>> > >> >
>> > >> >> Congrats team!
>> > >> >>
>> > >> >>> On Mon, Jul 24, 2017 at 11:06 AM, Ivan Novick <
>> inov...@pivotal.io>
>> > >> wrote:
>> > >> >>>
>> > >> >>> nice work all!
>> > >> >>>
>> > >> >>> On Mon, Jul 24, 2017 at 11:04 AM, Marshall Presser <
>> > >> mpres...@pivotal.io>
>> > >> >>> wrote:
>> > >> >>>
>> > >>  Woof, woof, woof!  Congrats to the team.
>> > >>  MEP
>> > >> 
>> > >>  On Mon, Jul 24, 2017 at 1:46 PM, FENG, Xixuan (Aaron) <
>> > >>  xixuan.f...@gmail.com> wrote:
>> > >> 
>> > >> > Dear MADlib community,
>> > >> >
>> > >> > I am pleased to report that on July 19, the ASF board
>> established
>> > >> > Apache MADlib as a Top Level Project, which was approved by
>> > >> unanimous
>> > >> > vote of the directors present.
>> > >> >
>> > >> > MADlib entered incubation in the fall of 2015 and made 5
>> releases
>> > >> as an
>> > >> > incubating project.  Along the way, we have worked hard to
>> ensure
>> > >> that
>> > >> >> the
>> > >> > project is being developed according to the principles of the
>> > Apache
>> > >> >> Way.
>> > >> > We will continue to do so in the future as a TLP,  to the best
>> of
>> > >> our
>> > >> > ability.
>> > >> >
>> > >> > Thank you to all of you for your contributions to the project,
>> > and I
>> > >> > look forward to working with you as part of this community!
>> > >> >
>> > >> > Aaron Feng
>> > >> > Vice President, Apache MADlib
>> > >> >
>> > >> 
>> > >> 
>> > >> 
>> > >>  --
>> > >>  Marshall Presser
>> > >>  Pivotal Data Engineering
>> > >>  mpresser@pivotal .io
>> > >>  240.401.1750
>> > >> 
>> > >> 
>> > >> >>>
>> > >> >>>
>> > >> >>> --
>> > >> >>> Ivan Novick, Product Manager Pivotal Greenplum
>> > >> >>> inov...@pivotal.io --  (Mobile) 408-230-6491
>> > >> >>> https://www.youtube.com/GreenplumDatabase
>> > >> >>>
>> > >> >>>
>> > >> >>
>> > >> >>
>> > >> >> --
>> > >> >> Anirudh Kondaveeti, Ph.D. | Principal Data Scientist | Pivotal
>> Data
>> > >> Science
>> > >> >> Team akondave...@pivotal.io | c - 650 483 3985
>> > >> >>
>> > >>
>> > >
>> > >
>> >
>>
>
>
>
> --
> Ivan Novick, Product Manager Pivotal Greenplum
> inov...@pivotal.io --  (Mobile) 408-230-6491 --
> (Skype) 512-782-9555
> https://www.youtube.com/GreenplumDatabase
>
>


Re: External references to MADlib incubator project content

2017-07-13 Thread Frank McQuillan
Yes it will re-direct to the TLP location.

On Thu, Jul 13, 2017 at 3:42 PM, Ed Espino  wrote:

> When MADlib graduates, will the previous incubator links redirect to the
> TLP location?  I noticed the following MADlib incubator references in the
> Pivotal Greenplum DB docs::
>
> source page:
> Greenplum MADlib Extension for Analytics
> https://gpdb.docs.pivotal.io/4390/ref_guide/extensions/madlib.html#topic9
>
> link references:
>   MADlib web site is at http://madlib.incubator.apache.org/
>   MADlib documentation is at
> http://madlib.incubator.apache.org/documentation.html
>
> -=e
> --
> *Ed Espino*
>


Re: Apache Jira: MADLIB v1.12-incubating and Metrics dashboard

2017-07-11 Thread Frank McQuillan
tall.sh
> 198:echo "Release notes and additional documentation can be found at
> http://madlib.incubator.apache.org/;
>
> deploy/PackageMaker/Welcome.html
> 6:Welcome to Apache MADlib (incubating)
> 8:Welcome to Apache MADlib (incubating)
> 14:Apache MADlib is an effort undergoing incubation at the Apache
> Software
> 15:Foundation (ASF), sponsored by the Apache Incubator PMC.
> 17:Incubation is required of all newly accepted projects until a further
> 22:While incubation status is not necessarily a reflection of the
>
> deploy/PGXN/META.json.in
> 6:"maintainer": "MADlib contributors <dev@madlib.incubator.apache.org
> >",
> 16:"homepage": "http://madlib.incubator.apache.org/;,
> 21:"url":  "https://github.com/apache/incubator-madlib.git;,
> 22:"web":  "https://github.com/apache/incubator-madlib;,
>
> deploy/PGXN/ReadMe.txt
> 1:Apache MADlib (incubating) Read Me
> 8:See the project web site located at http://madlib.incubator.apache.org/
> for
> 14:The latest documentation of MADlib modules can be found at
> http://madlib.incubator.apache.org/docs
> 27:
> https://github.com/apache/incubator-madlib/blob/master/
> licenses/third_party/_M_widen_init.txt
> 65:Apache MADlib is an effort undergoing incubation at the Apache Software
> 66:Foundation (ASF), sponsored by the Apache Incubator PMC.
> 68:Incubation is required of all newly accepted projects until a further
> 73:While incubation status is not necessarily a reflection of the
>
> doc/etc/developer.doxyfile.in
> 843:USE_MDFILE_AS_MAINPAGE = "
> https://github.com/apache/incubator-madlib/blob/master/README.md;
>
> doc/etc/header.html
> 30:  ga('create', 'UA-45382226-1', 'madlib.incubator.apache.org');
> 44:  http://madlib.incubator.apache.org
> "> alt="Logo" src="$relpath^$projectlogo" height="50"
> style="padding-left:0.5em;" border="0"/ >
>
> doc/mainpage.dox.in
> 3:Apache MADlib (incubating) is an open-source library for scalable
> 14:http://madlib.incubator.apache.org;>MADlib web
> site
> 17:https://mail-archives.apache.org/mod_mbox/incubator-madlib-user/;>User
> mailing list
> 18:https://mail-archives.apache.org/mod_mbox/incubator-madlib-dev/;>Dev
> mailing list
> 35:https://github.com/apache/incubator-madlib/blob/
> master/README.md
> ">ReadMe
> 38:https://github.com/apache/incubator-madlib/blob/master/LICENSE
> ">
>
> src/madpack/madpack.py
> 698:<
> http://madlib.incubator.apache.org/docs/latest/group__
> grp__linreg.html#warning
> >
>
> tool/docker/base/Dockerfile_gpdb_4_3_10
> 34:#ADD ./ /incubator-madlib
> 35:##RUN cd incubator-madlib && \
> 50:## 1) docker run -d -it --name gpdb -v
> (path-to-incubator-madlib)/src:/incubator-madlib/src gpdb bash
> 53:## 2) docker exec -it gpdb /incubator-madlib/build/src/bin/madpack -p
> greenplum -c gpadmin@127.0.0.1:5432/gpadmin install
> 59:## - cd /incubator-madlib/build
> 60:## - make (This can be run after changing code in the incubator-madlib
> source code)
>
> tool/docker/base/Dockerfile_postgres_9_6
> 56:## To build an image from this docker file, from incubator-madlib
> folder, run:
>
> tool/docker/base/Dockerfile_postgres_9_6_Jenkins
> 41:## To build an image from this docker file, from incubator-madlib
> folder, run:
>
> tool/jenkins/jenkins_build.sh
> 48:docker run -d --name madlib -v
> "${workdir}/incubator-madlib":/incubator-madlib
> madlib/postgres_9.6:jenkins
> | tee logs/docker_setup.log
> 50:docker run -d --name madlib -v
> "${workdir}/incubator-madlib":/incubator-madlib
> madlib/postgres_9.6:jenkins
> | tee logs/docker_setup.log
> 60:docker exec madlib bash -c 'rm -rf /build; mkdir /build; cd /build;
> cmake ../incubator-madlib; make clean; make; make install; make package' |
> tee $workdir/logs/madlib_compile.log
> 62:docker exec madlib bash -c 'rm -rf /build; mkdir /build; cd /build;
> cmake ../incubator-madlib; make clean; make; make install; make package' |
> tee $workdir/logs/madlib_compile.log
> 95:python incubator-madlib/tool/jenkins/junit_export.py
> $workdir/logs/madlib_install_check.log
> $workdir/logs/madlib_install_check.xml
> 97:python incubator-madlib/tool/jenkins/junit_export.py $workdir
> $workdir/logs/madlib_install_check.log
> $workdir/logs/madlib_install_check.xml
>
> tool/jenkins/rat_check.sh
> 27:grep "Copyright 2016-$(date +"%Y") The Apache Software Foundation"
> "${workdir}/incubator-madlib/NOTICE"
> 32:grep "$(cat
> "${workdir}/incubator-madlib/src/con
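The listing above has the shape of a recursive, line-numbered search for incubator references across the source tree. The exact command used is not shown in the thread, so the following is only a minimal sketch of how a similar listing could be produced. The function name `find_references`, the case-insensitive substring "incubat", and the choice to skip `.git` are all assumptions made for illustration.

```python
import os


def find_references(root, needle="incubat"):
    """Yield (relative_path, line_number, line) for lines containing needle.

    The match is a case-insensitive substring test. The default needle is an
    assumption chosen so that "incubator", "incubating" and "Incubation" all
    match; it is not taken from the thread.
    """
    lowered = needle.lower()
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip version-control metadata and walk directories in a stable order.
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if lowered in line.lower():
                            yield (os.path.relpath(path, root),
                                   number,
                                   line.rstrip("\n"))
            except OSError:
                # Unreadable file (permissions, broken symlink): skip it.
                continue


if __name__ == "__main__":
    # Print hits in a "path / line:text" layout similar to the listing above.
    for rel_path, number, text in find_references("."):
        print(f"{rel_path}\n{number}:{text}")
```

Run from the top of a source checkout, this prints every file and line that still mentions the incubator, which could serve as a checklist when renaming references after graduation.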

Re: Apache Jira: MADLIB v1.12-incubating and Metrics dashboard

2017-07-10 Thread Frank McQuillan
Thanks Ed, those dashboards are useful and give a good view of things.

Regarding the 1.12 release timing, I suggest we move the release date until
after the next ASF board meeting, which is scheduled for July 19, 2017. The
reason is that MADlib graduation is on the agenda for the ASF meeting and
hopefully it will pass fine.  So I suggest the new release date for 1.12 is
Aug 4, a couple weeks or so later.  I updated the release date in JIRA.

And yes, there is quite a lot of history on this project as it has been
around since 2011 or so, well before the move to the ASF in the fall of 2015.

Frank




On Mon, Jul 10, 2017 at 1:58 PM, Ed Espino  wrote:

> The automated Jira report for MADLIB Version v1.12 (UNRELEASED) is also
> useful for getting a very quick view of the release status. It also
> respects the tentative release date (14/Jul/17).
>
> https://issues.apache.org/jira/projects/MADLIB/versions/12340360
>
> -=e
>
> On Mon, Jul 10, 2017 at 1:04 PM, Ed Espino  wrote:
>
> > MADlibers,
> >
> > FYI: In order to get my head wrapped around the current Apache Jira state
> > for the MADlib v1.12 release, I have thrown together a quick dashboard.
> I
> > have made the dashboard and corresponding filters publicly available.
> This
> > will help me/us monitor the release convergence.
> >
> > Apache MADlib v1.12-incubating Release Dashboard:
> > https://issues.apache.org/jira/secure/Dashboard.jspa?
> selectPageId=12331450
> >
> > Additionally, to get a status of the overall Jira state, I also threw
> > together a quick MADlib metrics dashboard. It appears there is a bit of
> > Jira legacy history with the project. :
> > https://issues.apache.org/jira/secure/Dashboard.jspa?
> selectPageId=12331451
> >
> > Please take a quick look and let me know what you think. I can easily
> > adjust the dashboards if needed.
> >
> > Regards,
> > -=e
> >
> >
> > --
> > *Ed Espino*
> >
>
>
>
> --
> *Ed Espino*
>


Re: MADlib Q2 report to ASF

2017-07-10 Thread Frank McQuillan
Thanks for the suggestion Roman.  I updated the report with this additional
information.

Frank

On Mon, Jul 10, 2017 at 2:27 PM, Roman Shaposhnik <ro...@shaposhnik.org>
wrote:

> Looks good, but I'd also add (if not too late) that the resolution for
> graduation was tabled by the board last month and is now being
> re-submitted
>
> On Thu, Jul 6, 2017 at 1:39 AM, Frank McQuillan <fmcquil...@pivotal.io>
> wrote:
> > Here is the draft ASF report for July 2017, covering Q2 2017 activity.
> >
> > It is posted at http://wiki.apache.org/incubator/July2017
> >
> > Please let me know if you have any comments or suggestions and I will
> > update the report.
> >
> > ---
> >
> > MADlib
> >
> > Big Data Machine Learning in SQL for Data Scientists.
> >
> > MADlib has been incubating since 2015-09-15.
> >
> > Three most important issues to address in the move towards graduation:
> >
> >   1. Finalize trademark transfer from Pivotal to ASF.
> >   2. Continue to produce regular Apache (incubating) releases.
> >   3. Continue to execute and manage the project according to governance
> > model of the "Apache Way".
> >
> > Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
> aware
> > of?
> >
> >   1. The Apache MADlib Project is ready for graduation out of the
> > incubator.
> > Discussion by Project:
> > https://lists.apache.org/thread.html/070c6764fcd0448b2db8975936b52f
> 7a28bd0e231c0e690288a6968e@%3Cdev.madlib.apache.org%3E
> > Vote by IPMC and community:
> > https://lists.apache.org/thread.html/733920464e8f8170d9cc831b701f27
> 5d757ee9448a7bfd05a1bf8dfd@%3Cgeneral.incubator.apache.org%3E
> > Trademark transfer from Pivotal to ASF is being tracked in:
> > https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-125
> >
> > How has the community developed since the last report?
> >
> >   1. Some related events in Q2 2017:
> >  * May 25, 2017 - MADlib community call.  Topic:  New Features in
> > Apache MADlib 1.11 (Frank McQuillan)
> >  * Jun 21, 2017 - Greenplum meetup in San Francisco.  Topic:  Apache
> > Solr & MADlib (incubating): Enabling Massive Text Analytics In-Database
> > (Bharath Sitaraman)
> >  * Jul 5-7, 2017 - PG Day Russia.  Topic: Various on “Greenplum Day”
> > Jul 5 including in-database analytics (Roman Shaposhnik and others)
> >  * Jul 25, 2017 (upcoming) - SF Bay ACM Chapter meetup.  Topic:
> >  Advanced Analytics for Security: Lateral Movement Detection (Anirudh
> > Kondaveeti)
> >
> >   2. See material technical conversations on user/dev mailing lists and
> in
> > the appropriate JIRAs and pull requests.
> >
> > How has the project developed since the last report?
> >
> >   1. TLP readiness - maturity evaluation matrix
> > https://cwiki.apache.org/confluence/display/MADLIB/ASF+
> Maturity+Evaluation
> >   2. TLP readiness - graduation resolution
> > https://cwiki.apache.org/confluence/display/MADLIB/Graduation+Resolution
> >   3. TLP readiness - documented release process
> > https://cwiki.apache.org/confluence/display/MADLIB/Release+Process
> >   4. Active work in progress for 6th ASF release MADlib v1.12 scheduled
> for
> > Jul/Aug 2017.  Features include: more graph analytics (weakly connected
> > components, breadth first search, all pairs shortest path, multiple graph
> > measures), neural nets, stratified sampling, train-test split,
> improvements
> > to decision tree & random forest, improvements to summary function
> >   5. Mailing list activity in Q2:  295 postings to dev, 77 postings to
> user.
> >
> > How would you assess the podling's maturity?
> > Please feel free to add your own commentary.
> >
> >   [ ] Initial setup
> >   [ ] Working towards first release
> >   [ ] Community building
> >   [X] Nearing graduation
> >   [ ] Other:
> >
> > Date of last release:
> >
> >   MADlib v1.11 on 5/16/17.
> >
> > When were the last committers or PMC members elected:
> >
> >   Orhan Kislal on 9/7/16 and Nandish Jayaram on 9/7/16.
>


Re: Volunteer: Apache MADlib 1.12 (incubating) release manager

2017-07-07 Thread Frank McQuillan
Hi Ed,

Thank you kindly for your offer to be release manager for 1.12!
We heartily accept your offer!

And it is great that you have experience
on HAWQ - I think the MADlib release process will be very similar to
what you are used to.

We have put together a wiki page on the MADlib release process
https://cwiki.apache.org/confluence/display/MADLIB/Release+Process
so you can have a look there and see the steps. Hopefully no surprises.

We are looking at releasing 1.12 within the next month, depending on
community wishes, and we will be happy to work thru the steps with you.

Thanks again for the offer, and we'll talk soon!

Frank



On Fri, Jul 7, 2017 at 12:49 PM, Trevor Grant 
wrote:

> ... that's a very nice gesture-
>
> I'm only a lurker on this mailing list but I'm a PMC on a couple of other
> projects- would be happy to take you up if these folks don't :D
>
>
>
> On Fri, Jul 7, 2017 at 2:42 PM, Ed Espino  wrote:
>
> > MADlib dev,
> >
> > I'm not sure if one has been identified and even though I am not a
> > committer on the project, I would like to volunteer my services to be the
> > release manager for the upcoming Apache MADlib 1.12 (incubating). I have
> > served in this capacity for the Apache HAWQ 2.1.0.0-incubating release
> > (references below). I have had the chance to review several of the
> > previous MADlib releases. I am looking forward to honing my ASF skill set
> and
> > this looks like a very good opportunity.
> >
> > Regards,
> > -=e
> >
> > My release manager participation references:
> > Apache HAWQ 2.1.0.0-incubating dev voting thread:
> > https://lists.apache.org/thread.html/9d3025c12dc032437d1317d662f0e4
> > 434754c00258ca1abdd5c0ab9f@%3Cdev.hawq.apache.org%3E
> >
> > Apache HAWQ 2.1.0.0-incubating IPMC voting thread:
> > https://lists.apache.org/thread.html/1636e892b95475fe0af130d83fa457
> > c3e8bfa0d26f695f6faac0@%3Cgeneral.incubator.apache.org%3E
> >
> > --
> > *Ed Espino*
> >
>


MADlib Q2 report to ASF

2017-07-05 Thread Frank McQuillan
Here is the draft ASF report for July 2017, covering Q2 2017 activity.

It is posted at http://wiki.apache.org/incubator/July2017

Please let me know if you have any comments or suggestions and I will
update the report.

---

MADlib

Big Data Machine Learning in SQL for Data Scientists.

MADlib has been incubating since 2015-09-15.

Three most important issues to address in the move towards graduation:

  1. Finalize trademark transfer from Pivotal to ASF.
  2. Continue to produce regular Apache (incubating) releases.
  3. Continue to execute and manage the project according to governance
model of the "Apache Way".

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware
of?

  1. The Apache MADlib Project is ready for graduation out of the
incubator.
Discussion by Project:
https://lists.apache.org/thread.html/070c6764fcd0448b2db8975936b52f7a28bd0e231c0e690288a6968e@%3Cdev.madlib.apache.org%3E
Vote by IPMC and community:
https://lists.apache.org/thread.html/733920464e8f8170d9cc831b701f275d757ee9448a7bfd05a1bf8dfd@%3Cgeneral.incubator.apache.org%3E
Trademark transfer from Pivotal to ASF is being tracked in:
https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-125

How has the community developed since the last report?

  1. Some related events in Q2 2017:
 * May 25, 2017 - MADlib community call.  Topic:  New Features in
Apache MADlib 1.11 (Frank McQuillan)
 * Jun 21, 2017 - Greenplum meetup in San Francisco.  Topic:  Apache
Solr & MADlib (incubating): Enabling Massive Text Analytics In-Database
(Bharath Sitaraman)
 * Jul 5-7, 2017 - PG Day Russia.  Topic: Various on “Greenplum Day”
Jul 5 including in-database analytics (Roman Shaposhnik and others)
 * Jul 25, 2017 (upcoming) - SF Bay ACM Chapter meetup.  Topic:
 Advanced Analytics for Security: Lateral Movement Detection (Anirudh
Kondaveeti)

  2. See material technical conversations on user/dev mailing lists and in
the appropriate JIRAs and pull requests.

How has the project developed since the last report?

  1. TLP readiness - maturity evaluation matrix
https://cwiki.apache.org/confluence/display/MADLIB/ASF+Maturity+Evaluation
  2. TLP readiness - graduation resolution
https://cwiki.apache.org/confluence/display/MADLIB/Graduation+Resolution
  3. TLP readiness - documented release process
https://cwiki.apache.org/confluence/display/MADLIB/Release+Process
  4. Active work in progress for 6th ASF release MADlib v1.12 scheduled for
Jul/Aug 2017.  Features include: more graph analytics (weakly connected
components, breadth first search, all pairs shortest path, multiple graph
measures), neural nets, stratified sampling, train-test split, improvements
to decision tree & random forest, improvements to summary function
  5. Mailing list activity in Q2:  295 postings to dev, 77 postings to user.

How would you assess the podling's maturity?
Please feel free to add your own commentary.

  [ ] Initial setup
  [ ] Working towards first release
  [ ] Community building
  [X] Nearing graduation
  [ ] Other:

Date of last release:

  MADlib v1.11 on 5/16/17.

When were the last committers or PMC members elected:

  Orhan Kislal on 9/7/16 and Nandish Jayaram on 9/7/16.


Update on graduation to TLP

2017-06-26 Thread Frank McQuillan
Hello MADlib community,

As you may know, the MADlib project is proceeding towards TLP status
and was on the agenda at the 6/21/17 ASF board meeting. At that meeting
the board tabled (postponed) voting on the MADlib graduation resolution.

Mark and John (copied) from the ASF sent the MADlib PMC
some information regarding the postponement, and I would like to
briefly summarize the current status with you:

If I may quote Mark directly:

“The MADlib graduation resolution triggered some discussion on board@
that started when it was pointed out that the registered MADlib marks
had not yet been transferred to the ASF. During that discussion, IPMC
members expressed differing views on whether MADlib should graduate or
not because of the missing trademark assignment.”

I know that the current owner of the MADlib trademark (Pivotal)
is currently working with the ASF to transfer the trademarks,
and we are hoping that this can be achieved with minimum delay.
Here is the related JIRA
https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-125

Mark also took the time to review the MADlib archives,
to see if there were any indications that MADlib would benefit from
remaining longer in the incubator. This includes the licensing
described here
https://cwiki.apache.org/confluence/display/MADLIB/ASF+Licensing+Guidance

He reported:

“I’ve looked through the archives and nothing jumped out at me as
problematic.”
&
“I don't see anything in the licensing situation that should prevent
graduation.”

Hopefully I have represented the current status correctly, but I
have copied Mark and John on this thread in the case they would like
to add/correct anything.

We will keep you posted as we learn more.  In the interim, we will
continue to work hard on the upcoming 1.12 release and look forward to
more great tech coming out of this project.

Frank


Candidate 1.12 JIRAs

2017-06-05 Thread Frank McQuillan
Hello,

There is a pretty healthy list of candidate 1.12 JIRAs:

https://issues.apache.org/jira/issues/?jql=project%20%3D%20MADLIB%20AND%20fixVersion%20%3D%20v1.12%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC

If anyone in the community would like to contribute to any of these JIRAs,
or has any suggestions, comments or additions/subtractions, it would be
great to hear your thoughts.

Frank
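The JIRA link above carries its filter as a percent-encoded `jql` query parameter, which is hard to read in the raw URL. As a small sketch using only the Python standard library, the query can be decoded as follows. The helper name `extract_jql` is hypothetical; the URL itself is copied unchanged from the message.

```python
from urllib.parse import parse_qs, urlsplit

# URL copied verbatim from the message above, split across lines for width.
JIRA_URL = (
    "https://issues.apache.org/jira/issues/?jql=project%20%3D%20MADLIB"
    "%20AND%20fixVersion%20%3D%20v1.12%20ORDER%20BY%20due%20ASC%2C%20"
    "priority%20DESC%2C%20created%20ASC"
)


def extract_jql(url):
    """Return the percent-decoded value of the `jql` query parameter."""
    query = urlsplit(url).query
    return parse_qs(query)["jql"][0]


if __name__ == "__main__":
    print(extract_jql(JIRA_URL))
```

Decoded, the filter selects MADLIB issues whose fix version is v1.12, ordered by due date, then priority, then creation date.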


1.11 release presentation

2017-05-25 Thread Frank McQuillan
https://drive.google.com/open?id=0B62dTQMossK9STNETUR0aGZkRFE

From the recent MADlib community call. Video will be posted shortly.

Frank


Re: Announcing MADlib v1.11 GA

2017-05-17 Thread Frank McQuillan
Thank you Rashmi for being release manager for 1.11

Just a reminder to the MADlib community that examples of most of the new
features in 1.11 are included in the Jupyter notebooks posted at
https://github.com/apache/incubator-madlib-site/tree/asf-site/community-artifacts
Look for the most recent notebooks.

Looking forward to hearing what the community is interested in for 1.12
release.  Please share your ideas here or in JIRA.

Frank






On Tue, May 16, 2017 at 3:49 PM, Rashmi Raghu  wrote:

> MADlib v1.11 is now generally available.
>
> The vote was PASSED by Incubator PMC members:
> http://mail-archives.apache.org/mod_mbox/incubator-general/201705.mbox/%
> 3CCAMtNjok4BJaSzG=yfkqcdfnqrrvedeomuf2jvxz6giatg85...@mail.gmail.com%3E
>
> The source and binaries are posted at:
> https://dist.apache.org/repos/dist/release/incubator/madlib/
> 1.11-incubating/
>
> Release notes:
> https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.11
>
> User documentation:
> http://madlib.incubator.apache.org/docs/latest/
>
> We look forward to community participation for the next release v1.12 and
> moving towards TLP graduation!
>
> Regards,
> Rashmi Raghu
>
> --
> Rashmi Raghu, Ph.D.
> Pivotal Data Science
>


Re: Progress towards graduation

2017-05-16 Thread Frank McQuillan
Roman,

I had another read thru the graduation resolution
https://cwiki.apache.org/confluence/display/MADLIB/Graduation+Resolution
and maturity evaluation
https://cwiki.apache.org/confluence/display/MADLIB/ASF+Maturity+Evaluation

Both look pretty complete to me at this point. Other parts of the wiki
have a lot of info about the project and getting started and such, so it
looks comprehensive enough for a project at this level of maturity.

Frank



On Mon, May 15, 2017 at 11:26 AM, Roman Shaposhnik  wrote:

> Hi!
>
> based on the discussion that happened on private@madlib
> I've updated the PMC roster and a PMC Chair position:
> https://cwiki.apache.org/confluence/display/MADLIB/
> Graduation+Resolution
>
> I'd like to ask everyone to read through this doc once more
> and let me know if something is missing.
>
> Also, I need all those who are currently committers on
> MADlib but are NOT on the PMC to reply to this thread
> if they wish to keep being committers on the project.
>
> Thanks,
> Roman.
>


Re: [DISCUSS] Graduation

2017-05-04 Thread Frank McQuillan
Thanks Roman.

I agree that this project is in the correct state to qualify as a TLP,
and would like to help move that forward.

In addition to the
https://cwiki.apache.org/confluence/display/MADLIB/Graduation+Resolution
that you mention, we also created a check list
https://cwiki.apache.org/confluence/display/MADLIB/ASF+Maturity+Evaluation
which aims to describe where the project stands according to the Apache
project maturity model.

I would encourage members of the Apache MADlib community to take a look
at the check list and comment on any of the items there.

The project mgmt part of the wiki
https://cwiki.apache.org/confluence/display/MADLIB/Project+Management
also gives a pretty good snapshot of the project as it stands today.

Frank


On Thu, May 4, 2017 at 10:44 AM, Rahul Iyer  wrote:

> Hi Roman,
>
> Many thanks for your excellent mentorship!
>
> Your #2 and #3 proposals sound good to me and I look forward to the
> discussion on private@.
>
> - Rahul
>
>
> On Fri, Apr 28, 2017 at 10:47 AM, Roman Shaposhnik  wrote:
> > Hi!
> >
> > with the fifth (v1.11) release in the final stages of being cut,
> > I think now would be a good time to officially start our graduation
> > discussion. With my mentor hat on, I feel that the project is
> > mature and self-reliant enough to qualify as a TLP.
> >
> > Process-wise graduation consists of drafting a board resolution,
> > getting it approved by the IPMC and finally submitting it to the ASF
> > board's consideration. At the very minimum your resolution will contain:
> > 1. A name of the project (I assume that'll be MADlib)
> > 2. A list of proposed PMC members
> > 3. A proposed PMC chair
> > A good example of a resolution can be found here:
> > https://cwiki.apache.org/confluence/display/FINERACT/
> Graduation+Resolution
> >
> > In fact, Frank and I took the liberty to use that as the basis for our
> own:
> >  https://cwiki.apache.org/confluence/display/MADLIB/
> Graduation+Resolution
> > Please read it carefully and let us know what you think.
> >
> > On #2 my suggestion would be to have an opt-in system. Basically
> > we will kick off the thread on private@madlib asking current PPMC
> > members if they are willing to continue on the PMC.
> >
> > On #3 I typically recommend that podlings I mentor set up a rotating
> > chair policy. This is in no way an ASF requirement, so feel free to
> > ignore it, but it has worked well before. The chair will be up for
> > rotation every year. It will be more than OK for the same person to
> > self-nominate once the year is up, but at the same time it will be up
> > to that person to actually kick off a thread asking if anybody else is
> > interested in serving as chair for the next year. Of course, if there
> > are multiple candidates there will have to be a vote.
> >
> > Speaking of self-nomination -- the same thread that we're going to kick
> > off as part of solving for #2 will ask for folks to self-nominate as an
> initial
> > chair to be listed on the resolution.
> >
> > Unless somebody objects strongly to my #2 and #3 proposals I'm going
> > to kick off this thread on private@.
> >
> > With that in mind, let's make the rest of the discussion on dev@ about
> > collecting the data points to present to the IPMC as part of asking them
> > to vote YES on our graduation. Let's collect all these data points in
> > the same wiki page:
> > https://cwiki.apache.org/confluence/display/MADLIB/
> Graduation+Resolution
> > Or if you feel that a discussion may be needed -- just reply to this
> thread.
> >
> > Thanks,
> > Roman.
>


Re: [VOTE] MADlib v1.11-rc2

2017-05-02 Thread Frank McQuillan
Thanks for updating to RC-2, Rashmi.

I just tried the dmg on OSX on PG9.6 on my local machine and the soft link
seems to be set correctly now, since it upgraded 1.11 over 1.10 OK.  When I
uninstalled MADlib and did a fresh install, that worked fine too for 1.11.
So...

+1

Frank

On Tue, May 2, 2017 at 5:01 PM, Rashmi Raghu  wrote:

> Hello MADlib community,
>
> We have created a MADlib 1.11 RC-2, with the artifacts below (source and
> convenience binaries) up for a vote.
>
> Note that voting for the RC-1 release has been cancelled due to the need
> for minor corrections based on community feedback. Sorry for the
> inconvenience.
>
> RC-2 replaces RC-1 with the following minor changes:
> * Ensure source tarball unpacks into a folder
> * Ensure soft links are correct for OS X installations
>
> This will be the 5th release for Apache MADlib (incubating).
>
> The main goals of this release are:
> * new module (PageRank for graph analytics with grouping support included)
> * improvements to existing modules (add grouping support to Single Source
> Shortest Path, reduce memory footprint of DT and RF, include NULL features
> in training DT, add support for array and svec output for Pivot module,
> utility to unnest 2-D arrays into rows of 1-D arrays)
> * platform updates (GPDB 5)
> * updates for Apache Top Level Project readiness and build process on
> Apache infrastructure
> * bug fixes
> * doc improvements
>
> For more information including release notes, please see:
> https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.11
>
> *** Please download, review and vote by Fri May 05, 2017 @ 6pm PDT ***
>
> We're voting upon the source and convenience binaries below:
>
> Source Repository (tag):  rc/1.11-rc2
> https://github.com/apache/incubator-madlib/tree/rc/1.11-rc2
>
> Source Files and convenience Binaries:
> https://dist.apache.org/repos/dist/dev/incubator/madlib/1.
> 11-incubating-rc2/
>
> Commit:
> https://github.com/apache/incubator-madlib/commit/
> d54be2b8574c5bf0ace96b94ba81f3e5cbf70a35
>
> KEYS file containing PGP Keys we use to sign the release:
> https://dist.apache.org/repos/dist/dev/incubator/madlib/KEYS
>
> To help in tallying the vote, PMC members please be sure to indicate
> "(binding)" with the vote.
>
> [ ] +1  approve
> [ ] +0  no opinion
> [ ] -1  disapprove (and reason why)
>
>
> Regards,
> Rashmi Raghu
>
> --
> Rashmi Raghu, Ph.D.
> Pivotal Data Science
>
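
One of the RC-2 changes listed above is that the source tarball should
unpack into a folder. A reviewer could check that mechanically. The
following is only a minimal sketch: the helper names and the demo archive
members are made up for illustration and are not part of the MADlib
release tooling.

```python
import io
import tarfile
from pathlib import PurePosixPath


def unpacks_into_single_folder(tar):
    """Return True if every member of the archive lives under one top-level directory."""
    roots = set()
    for member in tar.getmembers():
        parts = PurePosixPath(member.name).parts
        if not parts:
            continue  # a bare "." entry carries no information
        roots.add(parts[0])
        if len(parts) == 1 and not member.isdir():
            return False  # a regular file sitting at the archive root
    return len(roots) == 1


def make_demo_tar(names):
    """Build an in-memory tar archive with small dummy files (demo helper only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            data = b"demo"
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return tarfile.open(fileobj=buf, mode="r")


if __name__ == "__main__":
    good = make_demo_tar(["pkg-1.0/README", "pkg-1.0/src/a.c"])
    bad = make_demo_tar(["README", "src/a.c"])
    print(unpacks_into_single_folder(good), unpacks_into_single_folder(bad))
```

For a real release candidate, the same function could be applied to
tarfile.open(path_to_downloaded_tarball) instead of the demo archives.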


Re: [VOTE] MADlib v1.11-rc1

2017-05-02 Thread Frank McQuillan
Hi Ed,

The binaries are up for a vote for 1.11.  Although we have hosted
convenience binaries on the Apache dist site in the past, we have not
explicitly included them in the RC's that we have posted for community vote
on a release.  Guidance from one of our mentors suggested that we start to
do this, so we have included binaries this time around.

Frank



On Tue, May 2, 2017 at 9:57 AM, Xiaocheng Tang <xiaochen...@gmail.com>
wrote:

> +1
>
> Xiaocheng
> --
> *From:* Daisy Zhe Wang <dai...@ufl.edu>
> *Sent:* Tuesday, May 2, 2017 6:48:43 AM
> *To:* dev@madlib.incubator.apache.org
> *Cc:* u...@madlib.incubator.apache.org
> *Subject:* Re: [VOTE] MADlib v1.11-rc1
>
>
> +1
>
> On Mon, 1 May 2017 18:18:32 -0700
> Joe Hellerstein <hellerst...@berkeley.edu> wrote:
>
> > +1
> >
> > Sent from a telephone.
> >
> > > On May 1, 2017, at 6:12 PM, Ivan Novick <inov...@pivotal.io> wrote:
> > >
> > > +1
> > >
> > >> On Mon, May 1, 2017 at 5:56 PM, ChenLiang Wang
> > >> <hi181904...@msn.com> wrote:
> > >>
> > >> +1
> > >>
> > >>> On 05/02/2017 08:23 AM, Woo Jae Jung wrote:
> > >>> +1
> > >>>
> > >>> On Mon, May 1, 2017 at 4:45 PM, Frank McQuillan
> > >>> <fmcquil...@pivotal.io> wrote:
> > >>>
> > >>>> +1
> > >>>>
> > >>>> On Mon, May 1, 2017 at 4:31 PM, Jarrod Vawdrey
> > >>>> <jvawd...@pivotal.io> wrote:
> > >>>>
> > >>>>> +1
> > >>>>>
> > >>>>>
> > >>>>> Jarrod Vawdrey
> > >>>>> Sr. Data Scientist
> > >>>>> Data Science & Engineering | Pivotal Atlanta
> > >>>>> (678) 651-0795
> > >>>>> https://pivotal.io/
> > >>>>>
> > >>>>> On Mon, May 1, 2017 at 7:30 PM, Orhan Kislal
> > >>>>> <okis...@pivotal.io>
> > >> wrote:
> > >>>>>
> > >>>>>> +1
> > >>>>>>
> > >>>>>> On Mon, May 1, 2017 at 4:25 PM, Rahul Iyer
> > >>>>>> <rahulri...@gmail.com>
> > >>>>> wrote:
> > >>>>>>
> > >>>>>>> +1
> > >>>>>>>
> > >>>>>>>> On May 1, 2017 3:55 PM, "Rashmi Raghu" <rra...@pivotal.io>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>> Hello MADlib community,
> > >>>>>>>>
> > >>>>>>>> We have created a MADlib 1.11 RC-1, with the artifacts below
> > >>>>>>>> up for
> > >> a
> > >>>>>>> vote.
> > >>>>>>>>
> > >>>>>>>> This will be the 5th release for Apache MADlib (incubating).
> > >>>>>>>>
> > >>>>>>>> The main goals of this release are:
> > >>>>>>>> * new module (PageRank for graph analytics with grouping
> > >>>>>>>> support
> > >>>>>>> included)
> > >>>>>>>> * improvements to existing modules (add grouping support to
> > >>>>>>>> Single
> > >>>>>>> Source
> > >>>>>>>> Shortest Path, reduce memory footprint of DT and RF, include
> > >>>>>>>> NULL
> > >>>>>>> features
> > >>>>>>>> in training DT, add support for array and svec output for
> > >>>>>>>> Pivot
> > >>>>> module,
> > >>>>>>>> utility to unnest 2-D arrays into rows of 1-D arrays)
> > >>>>>>>> * platform updates (GPDB 5)
> > >>>>>>>> * updates for Apache Top Level Project readiness and build
> > >>>>>>>> process
> > >> on
> > >>>>>>>> Apache infrastructure
> > >>>>>>>> * bug fixes
> > >>>>>>>> * doc improvements
> > >>>>>>>>
> > >>>>>>>> For more information including release notes, please see:
> > >>>>>>>> https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.11
> > >>>>>>>>
> > >>>>>>>> *** Please download, review and vote by Thu May 04, 2017 @
> > >>>>>>>> 6pm PDT
> > >>>>> ***
> > >>>>>>>>
> > >>>>>>>> We're voting upon the source (tag):  rc/1.11-rc1
> > >>>>>>>> https://github.com/apache/incubator-madlib/tree/rc/1.11-rc1
> > >>>>>>>>
> > >>>>>>>> Source Files:
> > >>>>>>>> https://dist.apache.org/repos/dist/dev/incubator/madlib/1.
> > >>>>>>>> 11-incubating-rc1/
> > >>>>>>>>
> > >>>>>>>> Commit to be voted upon:
> > >>>>>>>> https://github.com/apache/incubator-madlib/commit/
> > >>>>>>>> 0ff829a7060d08f284e8468ebf35c31b6e231d58
> > >>>>>>>>
> > >>>>>>>> KEYS file containing PGP Keys we use to sign the release:
> > >>>>>>>> https://dist.apache.org/repos/dist/dev/incubator/madlib/KEYS
> > >>>>>>>>
> > >>>>>>>> To help in tallying the vote, PMC members please be sure to
> > >>>>>>>> indicate "(binding)" with the vote.
> > >>>>>>>>
> > >>>>>>>> [ ] +1  approve
> > >>>>>>>> [ ] +0  no opinion
> > >>>>>>>> [ ] -1  disapprove (and reason why)
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Regards,
> > >>>>>>>> Rashmi Raghu
> > >>>>>>>>
> > >>>>>>>> --
> > >>>>>>>> Rashmi Raghu, Ph.D.
> > >>>>>>>> Pivotal Data Science
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>>
> > >>>
> > >>
> > >
> > >
> > >
> > > --
> > > Ivan Novick
> > > Product Manager Pivotal Greenplum
> > > https://www.youtube.com/GreenplumDatabase
>
>
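
The vote emails above ask PMC members to mark their votes "(binding)" to
help with tallying. As a rough illustration of what that tally involves,
here is a small sketch that separates binding from non-binding votes. The
Vote record and the voter names are hypothetical; which votes in this
thread were binding is not stated in the archive, so none of the real
voters are encoded here.

```python
from collections import Counter
from dataclasses import dataclass

VALID_VALUES = ("+1", "+0", "-1")


@dataclass(frozen=True)
class Vote:
    voter: str             # hypothetical voter label
    value: str             # one of "+1", "+0", "-1"
    binding: bool = False  # True when the voter marked the vote "(binding)"


def tally(votes):
    """Count votes per value, keeping binding and non-binding votes apart."""
    binding = Counter()
    non_binding = Counter()
    for vote in votes:
        if vote.value not in VALID_VALUES:
            raise ValueError(f"unexpected vote value: {vote.value!r}")
        bucket = binding if vote.binding else non_binding
        bucket[vote.value] += 1
    return binding, non_binding


if __name__ == "__main__":
    demo = [
        Vote("voter-a", "+1", binding=True),
        Vote("voter-b", "+1"),
        Vote("voter-c", "+0"),
    ]
    binding, non_binding = tally(demo)
    print("binding:", dict(binding))
    print("non-binding:", dict(non_binding))
```

Whether a given tally passes is decided by the applicable ASF voting
rules, which this sketch deliberately does not encode.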


Re: [VOTE] MADlib v1.11-rc1

2017-05-01 Thread Frank McQuillan
+1

On Mon, May 1, 2017 at 4:31 PM, Jarrod Vawdrey  wrote:

> +1
>
>
> Jarrod Vawdrey
> Sr. Data Scientist
> Data Science & Engineering | Pivotal Atlanta
> (678) 651-0795
> https://pivotal.io/
>
> On Mon, May 1, 2017 at 7:30 PM, Orhan Kislal  wrote:
>
> > +1
> >
> > On Mon, May 1, 2017 at 4:25 PM, Rahul Iyer  wrote:
> >
> >> +1
> >>
> >> On May 1, 2017 3:55 PM, "Rashmi Raghu"  wrote:
> >>
> >> > Hello MADlib community,
> >> >
> >> > We have created a MADlib 1.11 RC-1, with the artifacts below up for a
> >> vote.
> >> >
> >> > This will be the 5th release for Apache MADlib (incubating).
> >> >
> >> > The main goals of this release are:
> >> > * new module (PageRank for graph analytics with grouping support
> >> included)
> >> > * improvements to existing modules (add grouping support to Single
> >> Source
> >> > Shortest Path, reduce memory footprint of DT and RF, include NULL
> >> features
> >> > in training DT, add support for array and svec output for Pivot
> module,
> >> > utility to unnest 2-D arrays into rows of 1-D arrays)
> >> > * platform updates (GPDB 5)
> >> > * updates for Apache Top Level Project readiness and build process on
> >> > Apache infrastructure
> >> > * bug fixes
> >> > * doc improvements
> >> >
> >> > For more information including release notes, please see:
> >> > https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.11
> >> >
> >> > *** Please download, review and vote by Thu May 04, 2017 @ 6pm PDT ***
> >> >
> >> > We're voting upon the source (tag):  rc/1.11-rc1
> >> > https://github.com/apache/incubator-madlib/tree/rc/1.11-rc1
> >> >
> >> > Source Files:
> >> > https://dist.apache.org/repos/dist/dev/incubator/madlib/1.
> >> > 11-incubating-rc1/
> >> >
> >> > Commit to be voted upon:
> >> > https://github.com/apache/incubator-madlib/commit/
> >> > 0ff829a7060d08f284e8468ebf35c31b6e231d58
> >> >
> >> > KEYS file containing PGP Keys we use to sign the release:
> >> > https://dist.apache.org/repos/dist/dev/incubator/madlib/KEYS
> >> >
> >> > To help in tallying the vote, PMC members please be sure to indicate
> >> > "(binding)" with the vote.
> >> >
> >> > [ ] +1  approve
> >> > [ ] +0  no opinion
> >> > [ ] -1  disapprove (and reason why)
> >> >
> >> >
> >> > Regards,
> >> > Rashmi Raghu
> >> >
> >> > --
> >> > Rashmi Raghu, Ph.D.
> >> > Pivotal Data Science
> >> >
> >>
> >
> >
>


Fwd: Github's disappearing mirrors

2017-04-28 Thread Frank McQuillan
fyi

-- Forwarded message --
From: Chris Lambertus 
Date: Fri, Apr 28, 2017 at 12:22 PM
Subject: Github's disappearing mirrors
To: committers 


Hello committers,

We have received quite a few reports of github mirrors gone missing. We’ve
tracked this down to an errant process at Github which appears to be
deleting not only ours but also other orgs’ mirrors. We contacted Github but
have yet to receive a reply. Another organization also contacted github and
received the following reply:

"Hi there, Sorry for the trouble! We've now had a couple of reports of this
problem, and we've opened an issue internally to investigate.  I don't have
an ETA on a fix, but we'll be in touch if we need more information from you
or if we have any information to share.  Regards, Laura GitHub Support"


We have no further information at this time. We have been restoring the
mirrors wherever possible, but until the root cause is resolved on Github’s
side, we expect mirrors to continue to be erroneously removed.

Access to the repos via the usual https://git-wip-us.apache.org/ channel
remains functional.

-Chris
ASF Infra




Re: 1.11 release planning

2017-04-17 Thread Frank McQuillan
Thank you Rashmi.   Other folks in the community who have done it before
can help you out with the details.

Frank

On Mon, Apr 17, 2017 at 2:28 PM, Rashmi Raghu <rra...@pivotal.io> wrote:

> I volunteer to be release manager.
>
> Thanks,
> Rashmi
>
> On Mon, Apr 17, 2017 at 2:26 PM, Frank McQuillan <fmcquil...@pivotal.io>
> wrote:
>
> > We are getting closer to having a RC for the 1.11 release - perhaps
> within
> > a week or so.
> >
> > After this release, we will be applying for graduation to TLP status in
> the
> > ASF, so hopefully this will be the last incubating release.
> >
> > The JIRAs for 1.11 are:
> >
> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20MADLIB%20AND%
> > 20fixVersion%20%3D%20v1.11%20ORDER%20BY%20due%20ASC%2C%
> > 20priority%20DESC%2C%20created%20ASC
> >
> > Any volunteers out there to be release manager?
> >
> > Thanks,
> > Frank
> >
>
>
>
> --
> Rashmi Raghu, Ph.D.
> Pivotal Data Science
>


1.11 release planning

2017-04-17 Thread Frank McQuillan
We are getting closer to having a RC for the 1.11 release - perhaps within
a week or so.

After this release, we will be applying for graduation to TLP status in the
ASF, so hopefully this will be the last incubating release.

The JIRAs for 1.11 are:

https://issues.apache.org/jira/issues/?jql=project%20%3D%20MADLIB%20AND%20fixVersion%20%3D%20v1.11%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC

Any volunteers out there to be release manager?

Thanks,
Frank
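
The JIRA link above is hard to read because the query is URL-encoded. As
a small aid, the sketch below decodes the jql parameter with the Python
standard library so the filter can be read as plain JQL. The function
name is just an illustrative choice.

```python
from urllib.parse import parse_qs, urlsplit

# The JIRA search URL from the message above, split only for line length.
JIRA_URL = (
    "https://issues.apache.org/jira/issues/?jql=project%20%3D%20MADLIB%20AND%20"
    "fixVersion%20%3D%20v1.11%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20"
    "created%20ASC"
)


def extract_jql(url):
    """Return the percent-decoded value of the `jql` query parameter."""
    query = urlsplit(url).query
    return parse_qs(query)["jql"][0]


if __name__ == "__main__":
    print(extract_jql(JIRA_URL))
```

Decoded, the filter selects MADLIB issues with fixVersion v1.11, ordered
by due date, then priority, then creation date.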


Re: Graph SSSP Scale Tests

2017-04-05 Thread Frank McQuillan
Not yet for HAWQ.  For PostgreSQL, the larger data sets would be too big...

On Wed, Apr 5, 2017 at 2:11 PM, Greg Chase  wrote:

> Very nice!
>
> Do you have similar benchmarks for HAWQ and PostgreSQL?
>
> On Wed, Apr 5, 2017 at 1:26 PM, Ivan Novick  wrote:
>
> > looks good!
> >
> > On Thu, Apr 6, 2017 at 2:49 AM, Orhan Kislal  wrote:
> >
> > > Hello MADlib community,
> > >
> > >
> > >
> > > We have been doing some additional scale testing on SSSP introduced in
> > the
> > > 1.10 release
> > >
> > > http://madlib.incubator.apache.org/docs/latest/group__grp__sssp.html
> > >
> > >
> > >
> > > A sample of results, going up to 100M vertices and 5B edges can be
> found
> > in
> > > the following links:
> > >
> > >
> > > https://drive.google.com/file/d/0B62dTQMossK9eml5LV9EZ09LcmM/
> > > view?usp=sharing
> > >
> > > https://drive.google.com/file/d/0B62dTQMossK9dU1rSEs1TTBZN1U/
> > > view?usp=sharing
> > >
> > >
> > > So scaling looks pretty good.
> > >
> > >
> > >
> > > Please let me know if you have any comments.
> > >
> > >
> > > Orhan Kislal
> > >
> > > ­­
> > >
> >
> >
> >
> > --
> > Ivan Novick
> > Product Manager Pivotal Greenplum
> > https://www.youtube.com/GreenplumDatabase
> >
>


Re: DRAFT ASF report for MADlib for Q1 2017

2017-03-31 Thread Frank McQuillan
RM:
Ed, so far we don't have a release manager.  Any volunteers out there?

PR #75
 https://github.com/apache/incubator-madlib/pull/75
introduces mini-batching for SVM but also has potential application to
other algorithms.  It is not yet merged because it is part of an epic
https://issues.apache.org/jira/browse/MADLIB-1047
and those stories are still in flight or awaiting contributors

New usage or contributions
Certainly some good features planned for this release.  Lots of time still
for community members to contribute.  For example:
rashmi.ra...@gmail.com expressed an interest in working on
https://issues.apache.org/jira/browse/MADLIB-1086 and possible stratified
sampling

Frank








On Fri, Mar 31, 2017 at 11:11 AM, Ed Espino <esp...@apache.org> wrote:

> My naivete on the proper content of the Apache quarterly reports is
> obvious to me and most likely to you ... but here are a couple of
> observations/questions on the draft.
>
>    - As the next release is scheduled for April 2017, has someone
>      volunteered to be the release manager?
>    - I noticed there is one PR #75
>      <https://github.com/apache/incubator-madlib/pull/75> outstanding since
>      Nov 2016. I think it might be worthy of noting that the dev community is
>      reviewing PRs actively. This helps promote future contributions.
>    - Have there been any significant new usage stories or contributions
>      from the community to highlight?
>
> -=e
>
> On Thu, Mar 30, 2017 at 11:32 AM, Frank McQuillan <fmcquil...@pivotal.io>
> wrote:
>
> > Here is the draft ASF report for Apr 2017, covering Q1 2017 activity.
> >
> > It is posted at http://wiki.apache.org/incubator/April2017
> >
> > Please let me know if you have any comments or suggestions and I will
> > update the report.
> >
> > ---
> >
> > MADlib
> >
> > Big Data Machine Learning in SQL for Data Scientists.
> >
> > MADlib has been incubating since 2015-09-15.
> >
> > Three most important issues to address in the move towards graduation:
> >
> >   1. Continue to produce regular Apache (incubating) releases.
> >   2. Continue to execute and manage the project according to governance
> > model of the "Apache Way".
> >   3. Continue to build community.
> >
> > Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
> aware
> > of?
> >
> >  1. The next release v1.11 will be the 5th as an incubating project.  We
> > believe this release will meet all requirements for a clean ASF release,
> > based on listening to guidance from the IPMC over the previous releases.
> > After that, the community would ideally like to move towards top level
> > status.
> >  2.  The licensing issues have been resolved.  Should anyone want to
> > review, we have summarized the issue and resolution with relevant links
> on
> > the MADlib wiki at
> > https://cwiki.apache.org/confluence/display/MADLIB/ASF+
> Licensing+Guidance
> >
> > How has the community developed since the last report?
> >
> >   1. Some related events in Q1 2017:
> >      * Feb 4, 2017 - Presentation at FOSDEM’17 Graph devroom.  Topic:
> >        Graph Analytics on Massively Parallel Processing Databases (Frank
> >        McQuillan)
> >      * Feb 2, 2017 - Greenplum meetup in SF.  Topic:  Machine Learning and
> >        Cyber Security with Greenplum and Apache MADlib (Anirudh Kondaveeti,
> >        Frank McQuillan)
> >      * Mar 23, 2017 - MADlib community call.  Topic:  New Features in Apache
> >        MADlib 1.10 (Frank McQuillan)
> >   2. See material technical conversations on user/dev mailing lists and
> >      in the appropriate JIRAs and pull requests.
> >
> > How has the project developed since the last report?
> >
> >   1. Build infra set up on Apache infra
> > https://builds.apache.org/job/madlib-master-build/
> >   2. Docker image with necessary dependencies required to compile and
> test
> > MADlib on PostgreSQL 9.6
> > https://cwiki.apache.org/confluence/display/MADLIB/
> Quick+Start+Guide+for+
> > Developers#QuickStartGuideforDevelopers-Dock
> >   3. Active work in progress for 5th ASF release MADlib v1.11 scheduled
> for
> > Apr 2017.  Features include: PageRank, connected components, stratified
> > sampling, improvements to decision tree & random forest, array & sparse
> > vector output for pivot
> >   4. Mailing list activity in Q1 to date:  274 postings to dev, 111
> > postings to user.
> >
> > How would you assess the podling's maturity?
> > Please feel free to add your own commentary.
> >
> >   [ ] Initial setup
> >   [ ] Working towards first release
> >   [ ] Community building
> >   [X] Nearing graduation
> >   [ ] Other:
> >
> > Date of last release:
> >
> >   MADlib v1.10 on 3/10/17.
> >
> > When were the last committers or PMC members elected:
> >
> >   Orhan Kislal on 9/7/16 and Nandish Jayaram on 9/7/16.
> >
>
>
>
> --
> *Ed Espino*
>


DRAFT ASF report for MADlib for Q1 2017

2017-03-30 Thread Frank McQuillan
Here is the draft ASF report for Apr 2017, covering Q1 2017 activity.

It is posted at http://wiki.apache.org/incubator/April2017

Please let me know if you have any comments or suggestions and I will
update the report.

---

MADlib

Big Data Machine Learning in SQL for Data Scientists.

MADlib has been incubating since 2015-09-15.

Three most important issues to address in the move towards graduation:

  1. Continue to produce regular Apache (incubating) releases.
  2. Continue to execute and manage the project according to governance
model of the "Apache Way".
  3. Continue to build community.

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware
of?

 1. The next release v1.11 will be the 5th as an incubating project.  We
believe this release will meet all requirements for a clean ASF release,
based on listening to guidance from the IPMC over the previous releases.
After that, the community would ideally like to move towards top level
status.
 2.  The licensing issues have been resolved.  Should anyone want to
review, we have summarized the issue and resolution with relevant links on
the MADlib wiki at
https://cwiki.apache.org/confluence/display/MADLIB/ASF+Licensing+Guidance

How has the community developed since the last report?

  1. Some related events in Q1 2017:
     * Feb 4, 2017 - Presentation at FOSDEM’17 Graph devroom.  Topic:
       Graph Analytics on Massively Parallel Processing Databases (Frank
       McQuillan)
     * Feb 2, 2017 - Greenplum meetup in SF.  Topic:  Machine Learning and
       Cyber Security with Greenplum and Apache MADlib (Anirudh Kondaveeti,
       Frank McQuillan)
     * Mar 23, 2017 - MADlib community call.  Topic:  New Features in Apache
       MADlib 1.10 (Frank McQuillan)
  2. See material technical conversations on user/dev mailing lists and in
     the appropriate JIRAs and pull requests.

How has the project developed since the last report?

  1. Build infra set up on Apache infra
https://builds.apache.org/job/madlib-master-build/
  2. Docker image with necessary dependencies required to compile and test
MADlib on PostgreSQL 9.6
https://cwiki.apache.org/confluence/display/MADLIB/Quick+Start+Guide+for+Developers#QuickStartGuideforDevelopers-Dock
  3. Active work in progress for 5th ASF release MADlib v1.11 scheduled for
Apr 2017.  Features include: PageRank, connected components, stratified
sampling, improvements to decision tree & random forest, array & sparse
vector output for pivot
  4. Mailing list activity in Q1 to date:  274 postings to dev, 111
postings to user.

How would you assess the podling's maturity?
Please feel free to add your own commentary.

  [ ] Initial setup
  [ ] Working towards first release
  [ ] Community building
  [X] Nearing graduation
  [ ] Other:

Date of last release:

  MADlib v1.10 on 3/10/17.

When were the last committers or PMC members elected:

  Orhan Kislal on 9/7/16 and Nandish Jayaram on 9/7/16.


Re: Apache Jenkins MADlib projects

2017-03-14 Thread Frank McQuillan
Here are the relevant JIRAs:

Docker image
https://issues.apache.org/jira/browse/MADLIB-920

PR integration
https://issues.apache.org/jira/browse/MADLIB-1080
* this is the new JIRA - Ed please provide the HAWQ links in this thread or
add them to this JIRA

Thanks,
Frank


On Tue, Mar 14, 2017 at 9:38 AM, Rahul Iyer  wrote:

> Thanks, Ed.
>
> The master and PR integration would be quite useful for MADlib and are on
> the cards. We're in the process of wrapping our docker work; once that goes
> in, we can finalize these other projects.
> It would be easier for us to start with the HAWQ projects as references -
> could you please post their links?
>
> Best,
> iR
>
> On Tue, Mar 14, 2017 at 8:15 AM, Ed Espino  wrote:
>
> > I see Apache Jenkins build service testing in madlib-test-build
> > is being worked on. This is pretty cool for the dev community. Is there
> > a set of projects and GitHub *master* branch and *Pull Request* (PR)
> > integration points being worked on?
> >
> > For what it is worth, here are some integration points we have for the
> > HAWQ project that may be of use to MADlib:
> >
> >    - For each Pull Request (PR), perform the following checks (these go
> >      along with the default conflict check performed automatically by
> >      github):
> >       - Perform build (compilation) and Apache Release Audit Tool (RAT)
> >         check
> >    - For each master branch submission:
> >       - Perform build (compilation)
> >       - Perform Apache Release Audit Tool (RAT) check
> >       - Add "Embeddable Build Status Icon" to the project's README.md:
> >         https://builds.apache.org/job/madlib-test-build/badge/
> >
> > Cheers,
> > -=e
> >
> > --
> > *Ed Espino*
> >
>


Announcing MADlib v1.10 GA

2017-03-10 Thread Frank McQuillan
MADlib v1.10 is now generally available.

The vote was PASSED by Incubator PMC members:
http://mail-archives.apache.org/mod_mbox/incubator-general/201703.mbox/%3CCAKBQfzTSxD1e53iTnNbci89HYXoyah9kg-8zLts83_8kMRtWGw%40mail.gmail.com%3E

Special thanks to mentor Roman Shaposhnik for his help in resolving some
thorny legal issues leading up to this release.

The source and binaries are posted at:
https://dist.apache.org/repos/dist/release/incubator/madlib/1.10.0-incubating/

Release notes:
https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.10

User documentation:
http://madlib.incubator.apache.org/docs/latest/

We look forward to community participation for the next release v1.11 and
moving towards TLP graduation!

Regards,
Frank McQuillan


Re: [VOTE] MADlib v1.10-rc2

2017-03-09 Thread Frank McQuillan
I see.  In that case I will remove those links from the download page when
I update the web site tomorrow announcing the 1.10 release (assuming that
it goes thru IPMC voting OK).

Frank

On Thu, Mar 9, 2017 at 2:30 PM, Roman Shaposhnik <ro...@shaposhnik.org>
wrote:

> CCing dev@madlib
>
> On Thu, Mar 9, 2017 at 9:26 AM, Frank McQuillan <fmcquil...@pivotal.io>
> wrote:
> > @john
> > Pivotal Network
> > https://network.pivotal.io/
> > is a commercial download site maintained by Pivotal.  MADlib binaries are
> > also hosted there after Apache releases are completed
> > e.g.,
> > https://network.pivotal.io/products/pivotal-gpdb#/
> releases/4540/file_groups/491
>
> I reviewed the link that John mentioned and I must say I agree with him.
> That link just doesn't belong to a Download page of an ASF project.
>
> It would be fine on a "powered by" kind of a page or on the wiki, but not
> on a main download page.
>
> Thanks,
> Roman.
>


Re: [VOTE] MADlib v1.10-rc2

2017-03-06 Thread Frank McQuillan
>
> --
> Attached to this email: For reference: here is the entire build log
> (including PostgreSQL 9.6.2) and test run attempts. Several of the
> issues above can be seen in the log.
> --
>
>
> On Fri, Mar 3, 2017 at 4:20 PM, Orhan Kislal <okis...@pivotal.io> wrote:
>
>> +1
>>
>> On Fri, Mar 3, 2017 at 4:14 PM, Rahul Iyer <rahulri...@gmail.com> wrote:
>>
>> > +1
>> >
>> > On Fri, Mar 3, 2017 at 11:17 AM, Frank McQuillan <fmcquil...@pivotal.io
>> >
>> > wrote:
>> >
>> > > Hello MADlib community,
>> > >
>> > > I am sending this email on behalf of the release manager Satoshi
>> > Nagayasu <
>> > > sn...@uptime.jp> .
>> > >
>> > > We have created a MADlib 1.10 RC-2, with the artifacts below up for a
>> > vote.
>> > >
>> > > From project mentor Roman Shaposhnik we heard the ultimate resolution
>> on
>> > > the IP issue:
>> > >* we don't do anything with existing (BSD) files even if we edit
>> them
>> > >* every new file we create gets an ASF license header
>> > >* more details:
>> > >
>> > > https://issues.apache.org/jira/browse/LEGAL-293?
>> > focusedCommentId=15881595&
>> > > page=com.atlassian.jira.plugin.system.issuetabpanels:
>> > > comment-tabpanel#comment-15881595
>> > >
>> > > RC-2 replaces RC-1 with the following changes:
>> > >
>> > > * Multiple: Update license headers per Apache guidance
>> > > https://github.com/apache/incubator-madlib/commit/
>> > > a3863b6c2407eb28ba007f6288d167bf88674e6d
>> > >
>> > > * Build: Fix module sort order for PGXN installation
>> > > https://github.com/apache/incubator-madlib/commit/
>> > > fa80240f72a6551c2ee567d471afa499fd1d1efe
>> > >
>> > > * Update the copyright year.
>> > > https://github.com/apache/incubator-madlib/commit/
>> > > 0b8415e7eec5c9ebb83fbf22923c69a99b0056ef
>> > >
>> > > * Build: Add error for missing server includedir
>> > > https://github.com/apache/incubator-madlib/commit/
>> > > b3495c50bf491139ac245a21d97963e81892c610
>> > >
>> > > * Encode categorical: Add distributed_by in Postgresql w/ no-op
>> > > https://github.com/apache/incubator-madlib/commit/
>> > > 7055dceb3fbde35bae602ac80d4b70486f015748
>> > >
>> > > * Renamed the top level source directory as suggested:
>> > > apache-madlib-src-1.10-incubating
>> > >
>> > > This will be the 4th release for Apache MADlib (incubating).
>> > >
>> > > The main goals of this release are:
>> > > * new modules (single source shortest path for graph analytics, encode
>> > > categorical variables, K-nearest neighbors)
>> > > * improvements to existing modules (add grouping support to elastic
>> > > net and PCA, add cross validation to elastic net, array input for
>> > > K-means, verbose output option for DT and RF, limit itemset size in
>> > > association rules, various madpack installer improvements)
>> > > * platform updates (PostgreSQL 9.6)
>> > > * bug fixes
>> > > * doc improvements
>> > >
>> > > For more information including release notes, please see:
>> > > https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.10
>> > >
>> > > *** Please download, review and vote by Mon Mar 6, 2017 @ 6pm Pacific
>> > Time
>> > > USA ***
>> > >
>> > > We're voting upon the source (tag):  rc/1.10.0-rc2
>> > > https://github.com/apache/incubator-madlib/tree/rc/1.10.0-rc2
>> > >
>> > > Source Files:
>> > > https://dist.apache.org/repos/dist/dev/incubator/madlib/1.
>> > > 10.0-incubating-rc2/
>> > >
>> > > Commit to be voted upon:
>> > > https://github.com/apache/incubator-madlib/commit/
>> > > a3863b6c2407eb28ba007f6288d167bf88674e6d
>> > >
>> > > KEYS file containing PGP Keys we use to sign the release:
>> > > https://dist.apache.org/repos/dist/dev/incubator/madlib/KEYS
>> > >
>> > > To help in tallying the vote, can PMC members please be sure to
>> > > indicate "(binding)" with their vote.
>> > >
>> > > [ ] +1  approve
>> > > [ ] +0  no opinion
>> > > [ ] -1  disapprove (and reason why)
>> > >
>> > > Regards,
>> > > Frank McQuillan
>> > >
>> >
>>
>
>


Re: [VOTE] MADlib v1.10-rc1

2017-03-03 Thread Frank McQuillan
To finish this thread, I captured all of these licensing issues on the
MADlib wiki at
https://cwiki.apache.org/confluence/display/MADLIB/ASF+Licensing+Guidance
should anyone need to refer to it.


On Tue, Feb 28, 2017 at 11:43 AM, Frank McQuillan <fmcquil...@pivotal.io>
wrote:

> Thanks Rahul.  I see your commit has addressed the remaining issues:
> https://git1-us-west.apache.org/repos/asf?p=incubator-
> madlib.git;a=commit;h=a3863b6c
>
> We are declaring create_indicators.* as new files so they will have
> Apache header.
>
> For the record, I attached an Excel spreadsheet with some more notes so
> that we remember how we went from the two lists Rahul posted above to the
> above commit.
>
> Frank
>
> On Mon, Feb 27, 2017 at 5:44 PM, Rahul Iyer <rahulri...@gmail.com> wrote:
>
>> I have attached two files:
>>
>> new_files_after_apache.txt: New files added since September 15, 2015
>> (grant date) till date
>> files_w_apache_header.txt: Files that contain the Apache header right
>> now.
>>
>> Comparing the two lists, there are open questions regarding below files.
>>
>> Extra headers:
>> - sort-module.py has Apache header but was created before grant (recently
>> edited and header added). *I'll fix this*.
>> - create_indicators.* have headers but were renamed from
>> data_preparation.*. *What is the legal guidance with this*?
>>
>> No header:
>> - class_diagram.mp looks like a text file with no header, even though it
>> was added just after the grant. I'm not aware of the purpose of this file.
>>
>>
>>
>> On Mon, Feb 27, 2017 at 4:42 PM, Frank McQuillan <fmcquil...@pivotal.io>
>> wrote:
>>
>>> OK, so we need to go back and do the comparison from the original code
>>> grant in the fall of 2015 to the  current 1.10 release candidate.
>>>
>>> On Mon, Feb 27, 2017 at 4:19 PM, Roman Shaposhnik <ro...@shaposhnik.org>
>>> wrote:
>>>
>>> > Frank, I'm not sure I understand the question. The criteria needs to
>>> hold
>>> > for anything that came in via the initial code ingest compared to how
>>> the
>>> > master of your project looks now.
>>> >
>>> > Thanks,
>>> > Roman.
>>> >
>>> > On Mon, Feb 27, 2017 at 4:10 PM, Frank McQuillan <
>>> fmcquil...@pivotal.io>
>>> > wrote:
>>> > > Roman,
>>> > >
>>> > > Does this apply retro-actively back to initial grant of the code to
>>> > ASF?  Or
>>> > > just from the last release 1.9.1?
>>> > >
>>> > > Frank
>>> > >
>>> > > On Sun, Feb 26, 2017 at 11:23 PM, Roman Shaposhnik <
>>> ro...@shaposhnik.org
>>> > >
>>> > > wrote:
>>> > >>
>>> > >> Here's the ultimate resolution on the IP issue:
>>> > >>* we don't do anything with existing (BSD) files even if we edit
>>> them
>>> > >>* every new file we create gets an ASF license header
>>> > >>
>>> > >> More details:
>>> > >>
>>> > >> https://issues.apache.org/jira/browse/LEGAL-293?
>>> > focusedCommentId=15881595&page=com.atlassian.jira.
>>> > plugin.system.issuetabpanels:comment-tabpanel#comment-15881595
>>> > >>
>>> > >> Thanks,
>>> > >> Roman.
>>> > >>
>>> > >> On Tue, Feb 21, 2017 at 5:54 PM, Frank McQuillan <
>>> fmcquil...@pivotal.io
>>> > >
>>> > >> wrote:
>>> > >> > Thanks Roman for working on this.
>>> > >> >
>>> > >> > If you feel a final answer will be ready next week, then yes by
>>> all
>>> > >> > means l
>>> > >> > would suggest to the community that we wait and re-spin an RC2
>>> with
>>> > the
>>> > >> > license headers issue resolved.  Seems less overhead and effort
>>> than a
>>> > >> > quick follow on release right after 1.10.  Also, there some
>>> momentum
>>> > >> > going
>>> > >> > with the legal discussion, so let's take advantage of that.
>>> > >> >
>>> > >> > Satoshi (release manager), are you OK pausing the RC2 until we
>>> hear
>>> > back
>>> > >> > from Roman next week?
>>> > >> >

[VOTE] MADlib v1.10-rc2

2017-03-03 Thread Frank McQuillan
Hello MADlib community,

I am sending this email on behalf of the release manager Satoshi Nagayasu <
sn...@uptime.jp> .

We have created a MADlib 1.10 RC-2, with the artifacts below up for a vote.

From project mentor Roman Shaposhnik we heard the ultimate resolution on
the IP issue:
   * we don't do anything with existing (BSD) files even if we edit them
   * every new file we create gets an ASF license header
   * more details:

https://issues.apache.org/jira/browse/LEGAL-293?focusedCommentId=15881595&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15881595

RC-2 replaces RC-1 with the following changes:

* Multiple: Update license headers per Apache guidance
https://github.com/apache/incubator-madlib/commit/a3863b6c2407eb28ba007f6288d167bf88674e6d

* Build: Fix module sort order for PGXN installation
https://github.com/apache/incubator-madlib/commit/fa80240f72a6551c2ee567d471afa499fd1d1efe

* Update the copyright year.
https://github.com/apache/incubator-madlib/commit/0b8415e7eec5c9ebb83fbf22923c69a99b0056ef

* Build: Add error for missing server includedir
https://github.com/apache/incubator-madlib/commit/b3495c50bf491139ac245a21d97963e81892c610

* Encode categorical: Add distributed_by in Postgresql w/ no-op
https://github.com/apache/incubator-madlib/commit/7055dceb3fbde35bae602ac80d4b70486f015748

* Renamed the top level source directory as suggested:
apache-madlib-src-1.10-incubating

This will be the 4th release for Apache MADlib (incubating).

The main goals of this release are:
* new modules (single source shortest path for graph analytics, encode
categorical variables, K-nearest neighbors)
* improvements to existing modules (add grouping support to elastic
net and PCA, add cross validation to elastic net, array input for
K-means, verbose output option for DT and RF, limit itemset size in
association rules, various madpack installer improvements)
* platform updates (PostgreSQL 9.6)
* bug fixes
* doc improvements

For more information including release notes, please see:
https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.10

*** Please download, review and vote by Mon Mar 6, 2017 @ 6pm Pacific Time
USA ***

We're voting upon the source (tag):  rc/1.10.0-rc2
https://github.com/apache/incubator-madlib/tree/rc/1.10.0-rc2

Source Files:
https://dist.apache.org/repos/dist/dev/incubator/madlib/1.10.0-incubating-rc2/

Commit to be voted upon:
https://github.com/apache/incubator-madlib/commit/a3863b6c2407eb28ba007f6288d167bf88674e6d

KEYS file containing PGP Keys we use to sign the release:
https://dist.apache.org/repos/dist/dev/incubator/madlib/KEYS

To help in tallying the vote, can PMC members please be sure to
indicate "(binding)" with their vote.

[ ] +1  approve
[ ] +0  no opinion
[ ] -1  disapprove (and reason why)
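
For anyone tallying the replies, here is a minimal Python sketch of how the binding and non-binding votes could be counted. The pass rule used here (at least three binding +1 votes, and more binding +1 than binding -1) is an assumption based on the usual ASF release-vote convention and is not stated in this thread; the Vote type and voter names are purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    voter: str     # hypothetical voter name
    value: int     # +1, 0, or -1
    binding: bool  # True when the voter marked the vote "(binding)"


def tally(votes):
    """Count votes per value, split into binding and non-binding."""
    counts = {
        "binding": {1: 0, 0: 0, -1: 0},
        "non_binding": {1: 0, 0: 0, -1: 0},
    }
    for vote in votes:
        key = "binding" if vote.binding else "non_binding"
        counts[key][vote.value] += 1
    return counts


def passes(votes, min_binding_plus_ones=3):
    """Assumed rule: enough binding +1 votes, and more binding +1 than -1."""
    binding = tally(votes)["binding"]
    return binding[1] >= min_binding_plus_ones and binding[1] > binding[-1]
```

Non-binding votes are still counted and reported, they just do not affect the assumed pass rule.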

Regards,
Frank McQuillan


Re: [VOTE] MADlib v1.10-rc1

2017-02-28 Thread Frank McQuillan
Thanks Rahul.  I see your commit has addressed the remaining issues:
https://git1-us-west.apache.org/repos/asf?p=incubator-madlib.git;a=commit;h=a3863b6c

We are declaring create_indicators.* as new files so they will have Apache
header.

For the record, I attached an Excel spreadsheet with some more notes so
that we remember how we went from the two lists Rahul posted above to the
above commit.
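
The comparison of the two lists can be sketched roughly as below. This is only an illustration under stated assumptions, not the script that was actually used: the function name and the declared_new parameter are hypothetical, and the inputs are assumed to be plain sets of file names taken from new_files_after_apache.txt and files_w_apache_header.txt.

```python
def audit_headers(new_since_grant, with_apache_header, declared_new=frozenset()):
    """Compare the two file lists under the LEGAL-293 guidance.

    new_since_grant:    files added after the code grant
    with_apache_header: files that currently carry the Apache header
    declared_new:       renamed files we choose to treat as new (assumption)
    """
    new_files = set(new_since_grant) | set(declared_new)
    headers = set(with_apache_header)
    return {
        # Pre-grant (BSD) files that picked up an Apache header anyway.
        "extra_header": sorted(headers - new_files),
        # Files created after the grant that still lack the header.
        "missing_header": sorted(new_files - headers),
    }
```

With this shape, a file like sort-module.py would show up under "extra_header", class_diagram.mp under "missing_header", and the renamed create_indicators.* files would drop out of both lists once passed in as declared_new.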

Frank

On Mon, Feb 27, 2017 at 5:44 PM, Rahul Iyer <rahulri...@gmail.com> wrote:

> I have attached two files:
>
> new_files_after_apache.txt: New files added since September 15, 2015
> (grant date) till date
> files_w_apache_header.txt: Files that contain the Apache header right now.
>
> Comparing the two lists, there are open questions regarding below files.
>
> Extra headers:
> - sort-module.py has Apache header but was created before grant (recently
> edited and header added). *I'll fix this*.
> - create_indicators.* have headers but were renamed from
> data_preparation.*. *What is the legal guidance with this*?
>
> No header:
> - class_diagram.mp looks like a text file with no header, even though it
> was added just after the grant. I'm not aware of the purpose of this file.
>
>
>
> On Mon, Feb 27, 2017 at 4:42 PM, Frank McQuillan <fmcquil...@pivotal.io>
> wrote:
>
>> OK, so we need to go back and do the comparison from the original code
>> grant in the fall of 2015 to the  current 1.10 release candidate.
>>
>> On Mon, Feb 27, 2017 at 4:19 PM, Roman Shaposhnik <ro...@shaposhnik.org>
>> wrote:
>>
>> > Frank, I'm not sure I understand the question. The criteria needs to
>> hold
>> > for anything that came in via the initial code ingest compared to how
>> the
>> > master of your project looks now.
>> >
>> > Thanks,
>> > Roman.
>> >
>> > On Mon, Feb 27, 2017 at 4:10 PM, Frank McQuillan <fmcquil...@pivotal.io
>> >
>> > wrote:
>> > > Roman,
>> > >
>> > > Does this apply retro-actively back to initial grant of the code to
>> > ASF?  Or
>> > > just from the last release 1.9.1?
>> > >
>> > > Frank
>> > >
>> > > On Sun, Feb 26, 2017 at 11:23 PM, Roman Shaposhnik <
>> ro...@shaposhnik.org
>> > >
>> > > wrote:
>> > >>
>> > >> Here's the ultimate resolution on the IP issue:
>> > >>* we don't do anything with existing (BSD) files even if we edit
>> them
>> > >>* every new file we create gets an ASF license header
>> > >>
>> > >> More details:
>> > >>
>> > >> https://issues.apache.org/jira/browse/LEGAL-293?
>> > focusedCommentId=15881595&page=com.atlassian.jira.
>> > plugin.system.issuetabpanels:comment-tabpanel#comment-15881595
>> > >>
>> > >> Thanks,
>> > >> Roman.
>> > >>
>> > >> On Tue, Feb 21, 2017 at 5:54 PM, Frank McQuillan <
>> fmcquil...@pivotal.io
>> > >
>> > >> wrote:
>> > >> > Thanks Roman for working on this.
>> > >> >
>> > >> > If you feel a final answer will be ready next week, then yes by all
>> > >> > means l
>> > >> > would suggest to the community that we wait and re-spin an RC2 with
>> > the
>> > >> > license headers issue resolved.  Seems less overhead and effort
>> than a
>> > >> > quick follow on release right after 1.10.  Also, there some
>> momentum
>> > >> > going
>> > >> > with the legal discussion, so let's take advantage of that.
>> > >> >
>> > >> > Satoshi (release manager), are you OK pausing the RC2 until we hear
>> > back
>> > >> > from Roman next week?
>> > >> >
>> > >> > Thank you,
>> > >> > Frank
>> > >> >
>> > >> >
>> > >> > On Tue, Feb 21, 2017 at 4:45 PM, Roman Shaposhnik <
>> > ro...@shaposhnik.org>
>> > >> > wrote:
>> > >> >
>> > >> >> On Tue, Feb 21, 2017 at 2:55 PM, Frank McQuillan
>> > >> >> <fmcquil...@pivotal.io>
>> > >> >> wrote:
>> > >> >> > Agree with Rahul re putting up an RC2 with the suggested changes
>> > from
>> > >> >> Roman,
>> > >> >> > including incorporating Ed's comments on copyright year and top
>> > level
>> > >> >> folder
>> > >> >> > naming.  These are really items but let's respond to the RC1
>> > >> >> > reviewers
>> > >> >> the
>> > >> >> > best way we can.
>> > >> >>
>> > >> >> +1 to a respin.
>> > >> >>
>> > >> >> > Regarding the ASF legal issue being discussed, MADLib community
>> is
>> > >> >> > more
>> > >> >> than
>> > >> >> > happy to respond to any guidance from the fine folks at the ASF
>> > >> >> > around
>> > >> >> > headers with appropriate licensing verbage.  We just need to
>> know
>> > >> >> > what
>> > >> >> that
>> > >> >> > guidance is.
>> > >> >>
>> > >> >> Well, if you're ok respinning next week I hope to get you a final
>> > >> >> answer by then.
>> > >> >> Might as well kill two birds with the same RC. Or we can quickly
>> do a
>> > >> >> follow up
>> > >> >> release once the licensing headers dust settles. Up to you guys.
>> > >> >>
>> > >> >> Thanks,
>> > >> >> Roman.
>> > >> >>
>> > >
>> > >
>> >
>>
>
>


file headers work1.xlsx
Description: MS-Excel 2007 spreadsheet


Re: [VOTE] MADlib v1.10-rc1

2017-02-27 Thread Frank McQuillan
Roman,

Does this apply retro-actively back to initial grant of the code to ASF?
Or just from the last release 1.9.1?

Frank

On Sun, Feb 26, 2017 at 11:23 PM, Roman Shaposhnik <ro...@shaposhnik.org>
wrote:

> Here's the ultimate resolution on the IP issue:
>    * we don't do anything with existing (BSD) files even if we edit them
>    * every new file we create gets an ASF license header
>
> More details:
> https://issues.apache.org/jira/browse/LEGAL-293?
> focusedCommentId=15881595&page=com.atlassian.jira.
> plugin.system.issuetabpanels:comment-tabpanel#comment-15881595
>
> Thanks,
> Roman.
>
> On Tue, Feb 21, 2017 at 5:54 PM, Frank McQuillan <fmcquil...@pivotal.io>
> wrote:
> > Thanks Roman for working on this.
> >
> > If you feel a final answer will be ready next week, then yes by all
> means l
> > would suggest to the community that we wait and re-spin an RC2 with the
> > license headers issue resolved.  Seems less overhead and effort than a
> > quick follow on release right after 1.10.  Also, there some momentum
> going
> > with the legal discussion, so let's take advantage of that.
> >
> > Satoshi (release manager), are you OK pausing the RC2 until we hear back
> > from Roman next week?
> >
> > Thank you,
> > Frank
> >
> >
> > On Tue, Feb 21, 2017 at 4:45 PM, Roman Shaposhnik <ro...@shaposhnik.org>
> > wrote:
> >
> >> On Tue, Feb 21, 2017 at 2:55 PM, Frank McQuillan <fmcquil...@pivotal.io
> >
> >> wrote:
> >> > Agree with Rahul re putting up an RC2 with the suggested changes from
> >> Roman,
> >> > including incorporating Ed's comments on copyright year and top level
> >> folder
> >> > naming.  These are really items but let's respond to the RC1 reviewers
> >> the
> >> > best way we can.
> >>
> >> +1 to a respin.
> >>
> >> > Regarding the ASF legal issue being discussed, MADLib community is
> more
> >> than
> >> > happy to respond to any guidance from the fine folks at the ASF around
> >> > headers with appropriate licensing verbage.  We just need to know what
> >> that
> >> > guidance is.
> >>
> >> Well, if you're ok respinning next week I hope to get you a final
> >> answer by then.
> >> Might as well kill two birds with the same RC. Or we can quickly do a
> >> follow up
> >> release once the licensing headers dust settles. Up to you guys.
> >>
> >> Thanks,
> >> Roman.
> >>
>


Reminder to vote on MADlib 1.10 release candidate

2017-02-17 Thread Frank McQuillan
Hello,

Gentle reminder that release manager Satoshi-san put up a MADlib 1.10
release candidate and is asking for a vote before Sat 6 pm Pacific Time.

So please vote.

Here are the user and dev threads:

https://mail-archives.apache.org/mod_mbox/incubator-madlib-user/201702.mbox/%3CCAA8sozdFbpqigNMdKbsZQtHft3VvP7%2BOO1dcx9X_qBRZiFVzZA%40mail.gmail.com%3E

https://mail-archives.apache.org/mod_mbox/incubator-madlib-dev/201702.mbox/%3CCAA8sozdFbpqigNMdKbsZQtHft3VvP7%2BOO1dcx9X_qBRZiFVzZA%40mail.gmail.com%3E

Thanks,
Frank


Re: [VOTE] MADlib v1.10-rc1

2017-02-16 Thread Frank McQuillan
+1

Frank McQuillan

On Wed, Feb 15, 2017 at 7:27 PM, Satoshi Nagayasu <sn...@uptime.jp> wrote:

> Hello MADlib community,
>
> We have created a MADlib 1.10 RC-1, with the artifacts below up for a vote.
>
> This will be the 4th release for Apache MADlib (incubating).
>
> The main goals of this release are:
> * new modules (single source shortest path for graph analytics, encode
> categorical variables, K-nearest neighbors)
> * improvements to existing modules (add grouping support to elastic
> net and PCA, add cross validation to elastic net, array input for
> K-means, verbose output option for DT and RF, limit itemset size in
> association rules, various madpack installer improvements)
> * platform updates (PostgreSQL 9.6)
> * bug fixes
> * doc improvements
>
> For more information including release notes, please see:
> https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.10
>
> *** Please download, review and vote by Sat Feb 18, 2017 @ 6pm PST ***
>
> We're voting upon the source (tag):  rc/1.10.0-rc1
> https://github.com/apache/incubator-madlib/tree/rc/1.10.0-rc1
>
> Source Files:
> https://dist.apache.org/repos/dist/dev/incubator/madlib/1.
> 10.0-incubating-rc1/
>
> Commit to be voted upon:
> https://github.com/apache/incubator-madlib/commit/
> ea17530bfe22a1fde173d7fa83508cbcd9924c20
>
> KEYS file containing PGP Keys we use to sign the release:
> https://dist.apache.org/repos/dist/dev/incubator/madlib/KEYS
>
> To help in tallying the vote, can PMC members please be sure to
> indicate "(binding)" with their vote.
>
> [ ] +1  approve
> [ ] +0  no opinion
> [ ] -1  disapprove (and reason why)
>
> --
> Satoshi Nagayasu <sn...@uptime.jp>
>


1.11 feature suggestions

2017-02-15 Thread Frank McQuillan
Release Manager Satoshi-san is putting the final touches on the 1.10 RC,
and he should be sending out an announcement on that shortly for voting by
the community.

While that is happening, I wanted to suggest some ideas for 1.11.

Based on the recent survey
http://madlib.incubator.apache.org/community-artifacts/Apache-MADlib-user-survey-results-Oct-2016.pdf
graph analytics was identified as a desired area of development for Apache
MADlib.

You can also have a look at my recent talk at FOSDEM'17 on this topic
https://fosdem.org/2017/schedule/event/graph_analytics_massively_parallel_processing_databases/

So I have created a bunch of 1.11 JIRAs on graph that I am interested in
pursuing.
https://issues.apache.org/jira/issues/?jql=project%20%3D%20MADLIB%20AND%20fixVersion%20%3D%20v1.11%20ORDER%20BY%20priority%20DESC

If you have other things that you are interested in for 1.11, please by all
means open a JIRA or let the community know or start working on the
software.

I would also suggest we look at a shorter release cycle for 1.11, in the
next couple months or so.

As always, open to suggestions and comments.

Regards,
Frank


Re: Status of on-going PRs

2017-02-02 Thread Frank McQuillan
Looks like all of the PRs for 1.10 have been merged, except
https://github.com/apache/incubator-madlib/pull/75
which will spill over to 2.0.  Thank you to all who contributed to this.

We are still doing some final checking on the E2E functional test suite across
Postgres, Greenplum and Apache HAWQ, so not officially at code freeze yet.
But getting dangerously close.

Frank

On Tue, Jan 31, 2017 at 12:22 PM, Rahul Iyer  wrote:

> Hi Satoshi,
>
> Thanks for compiling this list. Please find my comments inline.
>
> On Tue, Jan 31, 2017 at 3:04 AM, Satoshi Nagayasu  wrote:
>
> > Hi all,
> >
> > As release manager for 1.10, I just did a quick review and created a
> status
> > list of the on-going PRs.
> >
> > https://github.com/apache/incubator-madlib/pulls
> >
> > If you have comments, please let me know. I will update the status.
> >
> > Status of the PRs
> > -
> > Use relative path for installation in GPDB/HAWQ #94
> >   -> Need to be tested with GPDB/HAWQ.
> >
> > Build: Use only major version for GPDB 5, HAWQ 2 #91
> >   -> Need review?
> >
> Testing is complete for both PRs. Requires a review.
>
>
> > Allow encode_categorical_variables() to use the svec type. #93
> >   -> Need more work by the developer (me).
> >
> This would be better merged within the 1.10 release.
> Adding it to the next version would require special handling by upgrade
> since there is a change in argument type (hence requiring drop/replace
> during upgrade).
>
>
> >  K-means: support for array input #89
> >   -> Need more review, or ready for committer?
> >
> This looks ready to merge.
>
> >
> > JIRA: MADLIB-927 Changes made in KNN-help message-test cases-etc #81
> >   -> Need more work by the developer.
> >
> > HAWQ2.1: Changes the cmake to assume any HAWQ 2.X system is 2.0 and #79
> >   -> Need review, or ready for committer?
> >
> This is superseded by #91 and will be closed with it.
>
>
> > Include boost::format in MathToolkit_impl.hpp. #76
> >   -> Already merged. The PR can be closed.
> >
> I forgot to close this with the commit message and can only be manually
> closed by the contributor. If not closed soon, I'll close it with a future
> commit.
>
>
> > SVM: Implement c++ functions for training multi-class svm in mini-batch
> #75
> >   -> The doc needs to be updated?
> >
> This requires substantially more work and discussion as the scope of the
> work is not defined. We will have to release without it.
>
>
>
> >
> > Regards,
> > --
> > Satoshi Nagayasu 
> >
>


Re: schema "madlib" does not exist error on Mac OS Sierra

2017-02-02 Thread Frank McQuillan
I don't think the mailing list supports attachments. At least I cannot see
them.  Maybe cut and paste in-line.

Frank

On Tue, Jan 31, 2017 at 7:33 PM, Sankara Subramanian,Karthik Maharajan <
skarthikmahar...@ufl.edu> wrote:

> I guess pasting the images did not work. Please find the screenshots
> attached,
>
> MADlib.png —> Shows successful installation of MADlib.
> postgresql.png —> Shows the error I had mentioned. I execute this query
> after creating the table and inserting 20 rows as in this tutorial page -->
> https://cwiki.apache.org/confluence/display/MADLIB/
> Quick+Start+Guide+for+Users
>
>
> Thanks and Regards,
>
> Karthik Maharajan Sankara Subramanian
> Computer and Information Science and Engineering Department
> Herbert Wertheim College of Engineering
> University of Florida
> Gainesville, FL-32611
>
>
> On Jan 31, 2017, at 10:25 PM, Sankara Subramanian,Karthik Maharajan <
> skarthikmahar...@ufl.edu> wrote:
>
> Hi Frank,
> Please find the screenshots below,
>
> This is the error I had mentioned. I execute this query after creating the
> table and inserting 20 rows as in this tutorial page
> -> https://cwiki.apache.org/confluence/display/MADLIB/
> Quick+Start+Guide+for+Users
>
>
>
>
> The next screenshot shows that MADlib has been successfully installed.
>
> Please let me know if you need more details.
>
>
>
> Thanks and Regards,
>
> Karthik Maharajan Sankara Subramanian
> Computer and Information Science and Engineering Department
> Herbert Wertheim College of Engineering
> University of Florida
> Gainesville, FL-32611
>
>
> On Jan 31, 2017, at 2:24 PM, Frank McQuillan <fmcquil...@pivotal.io>
> wrote:
>
> Karthik,
>
> Please attach the output from the installation so we can have a look.
>
> Thanks,
> Frank
>
> On Mon, Jan 30, 2017 at 5:48 PM, Sankara Subramanian,Karthik Maharajan <
> skarthikmahar...@ufl.edu> wrote:
>
> MADlib community,
> I am using Mac OS Sierra 10.12.2. I have installed both postgresql and
> Madlib as per the “Super Quick Start” instruction in this page ->
> https://cwiki.apache.org/confluence/display/MADLIB/Installation+Guide.
> The installations were successful. However, when I try to run the sample
> logistic regression query from the tutorial page,
>
> SELECT madlib.logregr_train(
>     'patients',                            -- source table
>     'patients_logregr',                    -- output table
>     'second_attack',                       -- labels
>     'ARRAY[1, treatment, trait_anxiety]',  -- features
>     NULL,                                  -- grouping columns
>     20,                                    -- max number of iteration
>     'irls'                                 -- optimizer
> );
>
> I am getting the error,
>
> ERROR:  schema "madlib" does not exist
> LINE 1: SELECT madlib.logregr_train(
>
>
> Should I manually create the schema for madlib? Please let me know what I
> am missing here.
>
>
>
> Thanks and Regards,
>
> Karthik Maharajan Sankara Subramanian
> Computer and Information Science and Engineering Department
> Herbert Wertheim College of Engineering
> University of Florida
> Gainesville, FL-32611
>
>
>
>
>
>


Re: Upgrade support

2017-02-01 Thread Frank McQuillan
Orhan, I think this is a reasonable approach.  Supporting upgrades for
older versions is time consuming and probably not worth the effort at this
point. Plus you have offered a workaround.
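
To make the proposed chain concrete, here is a small Python sketch of how an upgrade path could be chosen. It is only an illustration of the proposal in Orhan's message, not madpack's actual logic: the function names are hypothetical, 1.8 is taken as the oldest version with direct upgrade support, and 1.9.1 as the intermediate step for anything older.

```python
def parse_version(text):
    """Turn a dotted version string such as '1.9.1' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def upgrade_path(installed, target="1.10.0", oldest_direct="1.8", bridge="1.9.1"):
    """Return the list of versions to step through, starting at `installed`."""
    current = parse_version(installed)
    if current >= parse_version(target):
        # Already at (or past) the target; nothing to do.
        return [installed]
    if current >= parse_version(oldest_direct):
        # Direct upgrade is supported under the proposal.
        return [installed, target]
    # Older installs go through the bridge release first.
    return [installed, bridge, target]
```

For example, upgrade_path("1.7.1") would step through 1.9.1 before reaching 1.10.0, while upgrade_path("1.9.1") goes straight to 1.10.0.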

Frank



On Wed, Feb 1, 2017 at 3:14 PM, Orhan Kislal  wrote:

> Dear MADlib community,
>
> I started working on the upgrade support for our upcoming release (MADlib
> 1.10.0) and made some progress. Historically, MADlib supported upgrades
> from any 1.x version. However, with every version, this task becomes more
> and more time consuming. Note that all upgrades have to be tested for 6
> platforms (last 2 versions Postgres, Greenplum and HAWQ). I believe we can
> drop support for upgrades for versions prior to 1.8 but I wanted to consult
> with you before taking this action. This change will not disable upgrade
> for older versions entirely. The upgrade might not give proper error
> messages but it should still work if there are no dependencies. In
> addition, it is possible to follow an upgrade chain 1.x -> 1.9.1 -> 1.10.0.
>
> Please let us know if this change is not reasonable.
>
> Thanks
>
> Orhan Kislal
>


Re: schema "madlib" does not exist error on Mac OS Sierra

2017-01-31 Thread Frank McQuillan
Karthik,

Please attach the output from the installation so we can have a look.

Thanks,
Frank

On Mon, Jan 30, 2017 at 5:48 PM, Sankara Subramanian,Karthik Maharajan <
skarthikmahar...@ufl.edu> wrote:

> MADlib community,
> I am using Mac OS Sierra 10.12.2. I have installed both postgresql and
> Madlib as per the “Super Quick Start” instruction in this page ->
> https://cwiki.apache.org/confluence/display/MADLIB/Installation+Guide.
> The installations were successful. However, when I try to run the sample
> logistic regression query from the tutorial page,
>
> SELECT madlib.logregr_train(
>     'patients',                            -- source table
>     'patients_logregr',                    -- output table
>     'second_attack',                       -- labels
>     'ARRAY[1, treatment, trait_anxiety]',  -- features
>     NULL,                                  -- grouping columns
>     20,                                    -- max number of iteration
>     'irls'                                 -- optimizer
> );
>
> I am getting the error,
>
> ERROR:  schema "madlib" does not exist
> LINE 1: SELECT madlib.logregr_train(
>
>
> Should I manually create the schema for madlib? Please let me know what I
> am missing here.
>
>
>
> Thanks and Regards,
>
> Karthik Maharajan Sankara Subramanian
> Computer and Information Science and Engineering Department
> Herbert Wertheim College of Engineering
> University of Florida
> Gainesville, FL-32611
>
>
>


1.10 release status and release manager

2017-01-27 Thread Frank McQuillan
MADlib community,

We are getting fairly close to completing the software for the 1.10 release
and putting up an RC.

The PR list is getting smaller as we review and complete testing
https://github.com/apache/incubator-madlib/pulls

Satoshi Nagayasu
satoshi.nagay...@gmail.com
https://github.com/snaga
has graciously offered to be the release manager for 1.10.  Thank you very
much Satoshi for your help!

Regards,
Frank


DRAFT Apache MADlib (incubating) podling report for Q416

2017-01-04 Thread Frank McQuillan
Here is the draft report for Jan 2017, covering Q4 activity.

It is posted at http://wiki.apache.org/incubator/January2017

Please let me know if you have any comments or suggestions and I will
update the report.

---

MADlib

Big Data Machine Learning in SQL for Data Scientists.

MADlib has been incubating since 2015-09-15.

Three most important issues to address in the move towards graduation:

  1. Need guidance from Incubator PMC on how to resolve the BSD licensing
switch over to Apache License.  What should be the content of the license
headers for files that were previously BSD licensed and then granted to
ASF?  Related legal-discuss threads:
http://mail-archives.apache.org/mod_mbox/www-legal-discuss/201609.mbox/%3ccalgg8z03zhhbfegxoi4fh+vxtf+9m7x6hak9rjkqjapuzi6...@mail.gmail.com%3E
http://mail-archives.apache.org/mod_mbox/www-legal-discuss/201603.mbox/%3C9D1AF43C-370B-4E58-B0EF-2E29D242F50B%40jaguNET.com%3E
  2. Continue to produce regular Apache (incubating) releases.
  3. Continue to execute and manage the project according to governance
model of the "Apache Way".

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware
of?

 1. Yes, please see #1 above and provide guidance.
 2. The next release v1.10 will be the 4th as an incubating project.  After
that, the community would ideally like to move towards top level status.

How has the community developed since the last report?

  1. Some related events in Q4 2016 and upcoming:
* Feb 4, 2017 - Presentation accepted at FOSDEM’17 Graph devroom.
Topic:  Graph Analytics on Massively Parallel Processing Databases (Frank
McQuillan)
* Dec 1, 2016 - MADlib community call.  Topic:  New features in R interface
and MADlib user survey results (hosted by Greg Chase, Orhan Kislal, Frank
McQuillan)
* Nov 16, 2016 - Presentation at PGConf Silicon Valley.  Topic:
 Distributed In-Database Machine Learning with Apache MADlib (incubating)
(Frank McQuillan)
* Nov 14, 2016 - Presentation at Apache Big Data Europe.  Topic:
 Distributed In-Database Machine Learning with Apache MADlib (incubating)
(Roman Shaposhnik)
  2. Material technical conversations on user/dev mailing lists and in the
appropriate JIRAs and pull requests.
  3. New contributors to the project have been working on KNN module and
Python interface.

How has the project developed since the last report?

  1. Active work in progress for 4th ASF release MADlib v1.10 scheduled for
Jan 2017.  Features include: single source shortest path graph algorithm,
completely new module for encoding categorical variables, R interface
update, grouping support in elastic net and PCA, cross validation in
elastic net, verbose output option for decision tree visualization.
  2. Mailing list activity in Q4:  227 postings to dev, 66 postings to user.

Date of last release:

  MADlib v1.9.1 on 9/19/16.

When were the last committers or PMC members elected:

  Orhan Kislal on 9/7/16 and Nandish Jayaram on 9/7/16.


Podcast on machine learning in enterprise

2016-12-14 Thread Frank McQuillan
Here's a podcast I did recently with Jeff Kelly of Pivotal, talking about
machine learning in enterprise, which is where Apache MADlib can play a
meaningful role.

https://blog.pivotal.io/pivotal-insights/features/12-no-machine-learning-wont-lead-to-killer-robots

Frank


Re: New tweeter for MADlib

2016-12-13 Thread Frank McQuillan
https://twitter.com/ApacheMADlib


On Tue, Dec 13, 2016 at 4:22 AM, Luis Macedo <lmac...@pivotal.io> wrote:

> Hey guys!
>
> What is the twitter handle for MADlib? The bet the one I found is not about
> this MADlib!
>
> https://twitter.com/madlib
>
>
> Thanks!
>
>
> *Luis Macedo | Sr Platform Architect | **Pivotal Inc *
>
> *Mobile:* +55 11 97616-6438
> *Pivotal.io <http://pivotal.io>*
> *Take care of the customers and the rest takes care of itself*
>
> 2016-12-12 22:15 GMT-02:00 Frank McQuillan <fmcquil...@pivotal.io>:
>
> > Thanks Bob.  Yes please, tweet away!
> >
> > Frank
> >
> > On Mon, Dec 12, 2016 at 8:56 AM, Greg Chase <gch...@gmail.com> wrote:
> >
> > > +1
> > >
> > > This email encrypted by tiny buttons & fat thumbs, beta voice
> > recognition,
> > > and autocorrect on my iPhone.
> > >
> > > > On Dec 11, 2016, at 9:16 PM, Bob Glithero <rglith...@pivotal.io>
> > wrote:
> > > >
> > > > Hello MADlib community,
> > > >
> > > > I'm a newish member of Pivotal responsible for product marketing for
> > > > HDB/HAWQ, and will be spending more time on awareness for MADlib.  If
> > > > there's no objection, I'd like to add myself as a tweeter on behalf
> of
> > > > MADlib.
> > > >
> > > > Thanks!
> > > > Bob Glithero
> > > > Pivotal, Inc.
> > >
> >
>


Re: New tweeter for MADlib

2016-12-12 Thread Frank McQuillan
Thanks Bob.  Yes please, tweet away!

Frank

On Mon, Dec 12, 2016 at 8:56 AM, Greg Chase  wrote:

> +1
>
> This email encrypted by tiny buttons & fat thumbs, beta voice recognition,
> and autocorrect on my iPhone.
>
> > On Dec 11, 2016, at 9:16 PM, Bob Glithero  wrote:
> >
> > Hello MADlib community,
> >
> > I'm a newish member of Pivotal responsible for product marketing for
> > HDB/HAWQ, and will be spending more time on awareness for MADlib.  If
> > there's no objection, I'd like to add myself as a tweeter on behalf of
> > MADlib.
> >
> > Thanks!
> > Bob Glithero
> > Pivotal, Inc.
>


New PCA video posted

2016-12-09 Thread Frank McQuillan
Hi,

The latest MADlib video on Principal Component Analysis (PCA) has been
published on Youtube.com under the Pivotal Open Source Hub channel.  The
link to the video is:

https://www.youtube.com/watch?v=2R-76gimBX4=43s

Thank you to Charles Killam for putting this video together.

Frank


Re: Reminder: [VIRTUAL] MADlib Community Call: Pivotal R for MADlib & MADlib user survey results - Thurs, Dec 1, 2016

2016-12-01 Thread Frank McQuillan
Oh, and here is the PivotalR demo that Orhan showed:
https://github.com/apache/incubator-madlib-site/blob/asf-site/community-artifacts/PivotaR-demo-nov-2016.R

Frank

On Thu, Dec 1, 2016 at 10:12 AM, Frank McQuillan <fmcquil...@pivotal.io>
wrote:

> The MADlib user survey results that I went over are posted here
> http://madlib.incubator.apache.org/community-artifacts/Apache-MADlib-user-survey-results-Oct-2016.pdf
>
> Thank you for attending.
>
> Frank
>
> On Thu, Dec 1, 2016 at 8:08 AM, Gregory Chase <gch...@pivotal.io> wrote:
>
>> The MADlib Community call discussing Pivotal R for Greenplum, HAWQ, and
>> PostgreSQL starts in less than 1 hour.
>>
>> See you at 9AM, Pacific.
>>
>> Join the call <https://pivotal.zoom.us/j/248236262>
>>
>> On Wed, Nov 30, 2016 at 2:40 PM, Gregory Chase <gch...@pivotal.io> wrote:
>>
>> > Greetings,
>> > This is a reminder about tomorrow's MADlib community call at 9AM Pacific.
>> >
>> > Add to calendar
>> > <https://www.google.com/calendar/event?eid=dXJnbXI3YjBnaTRmO
>> DdkZ21zNzcyc3JvZXMgcGl2b3RhbC5pb191OGtndnVhaGprYm9oMWduZmh2N
>> XRzMnY5Y0Bn=America/Los_Angeles>
>> >  | Join the call <https://pivotal.zoom.us/j/248236262>
>> >
>> > See you tomorrow!
>> >
>> > -Greg
>> >
>> > On Mon, Nov 28, 2016 at 2:39 PM, Gregory Chase <gch...@pivotal.io> wrote:
>> >
>> >> Dear MADlib, HAWQ, and Greenplum communities,
>> >>
>> >> Here's a chance for us to get to know our end users better with this
>> >> double header call this Thursday, Dec 1, 2016.
>> >>
>> >> Add to calendar
>> >> <https://www.google.com/calendar/event?eid=dXJnbXI3YjBnaTRmO
>> DdkZ21zNzcyc3JvZXMgcGl2b3RhbC5pb191OGtndnVhaGprYm9oMWduZmh2N
>> XRzMnY5Y0Bn=America/Los_Angeles>
>> >> | Join the call <https://pivotal.zoom.us/j/248236262>
>> >>
>> >> In the first half of this call, we'll be discussing new features in the open
>> >> source Pivotal R project that complements MADlib, HAWQ, and Greenplum.  In
>> >> the second half, we'll talk about a recent survey of MADlib users.
>> >>
>> >> *Talk #1: What's New in Pivotal R*
>> >> Pivotal R is a popular interface for running data science investigations
>> >> using Apache MADlib with Greenplum Database, Apache HAWQ, and PostgreSQL.
>> >>
>> >> It allows R developers to work with relational database structures such
>> >> as tables and views and operate on data in the database without having to
>> >> switch to SQL. Pivotal R also provides a wrapper for Apache MADlib so that
>> >> data scientists can directly call the parallel processing functions of
>> >> MADlib. This gives them the full power of in-database processing in their
>> >> familiar R environment.
>> >>
>> >> You can find Pivotal R here:
>> >> https://cran.r-project.org/web/packages/PivotalR/index.html
>> >>
>> >> *Talk #2: Apache MADlib User Survey Results*
>> >> In the second half of this call, we'll be discussing the results of a
>> >> recent user survey of Apache MADlib users. Hear which platform they like
>> >> best: Greenplum, HAWQ, or PostgreSQL. Hear which use cases are the most
>> >> popular.
>> >>
>> >> This survey was recently lauded at ApacheCon EU as a great example of
>> >> user experience research in open source.
>> >>
>> >> You can find the results posted here:
>> >> http://madlib.incubator.apache.org/community-artifacts/Apache-MADlib-user-survey-results-Oct-2016.pdf
>> >>
>> >> See you Thursday!
>> >>
>> >> -Greg
>> >>
>> >> --
>> >> Greg Chase
>> >>
>> >> Global Head, Big Data Communities
>> >> http://www.pivotal.io/big-data
>> >>
>> >> Pivotal Software
>> >> http://www.pivotal.io/
>> >>
>> >> 650-215-0477
>> >> @GregChase
>> >> Blog: http://geekmarketing.biz/
>> >>
>> >>
>> >
>> >
>> > --
>> > Greg Chase
>> >
>> > Global Head, Big Data Communities
>> > http://www.pivotal.io/big-data
>> >
>> > Pivotal Software
>> > http://www.pivotal.io/
>> >
>> > 650-215-0477
>> > @GregChase
>> > Blog: http://geekmarketing.biz/
>> >
>> >
>>
>>
>> --
>> Greg Chase
>>
>> Global Head, Big Data Communities
>> http://www.pivotal.io/big-data
>>
>> Pivotal Software
>> http://www.pivotal.io/
>>
>> 650-215-0477
>> @GregChase
>> Blog: http://geekmarketing.biz/
>>
>
>


Re: FOSDEM 2017 HPC, Bigdata and Data Science DevRoom CFP is closing soon

2016-11-23 Thread Frank McQuillan
I attended FOSDEM last year and can attest to this being a really great
conference for open source developers.

Frank

On Wed, Nov 23, 2016 at 1:11 PM, Roman Shaposhnik 
wrote:

> Hi!
>
> apologies for the extra wide distribution (this exhausts my once
> a year ASF mail-to-all-bigdata-projects quota ;-)) but I wanted
> to suggest that all of you should consider submitting talks
> to FOSDEM 2017 HPC, Bigdata and Data Science DevRoom:
> https://hpc-bigdata-fosdem17.github.io/
>
> It was a great success this year and we hope to make it an even
> bigger success in 2017.
>
> Besides -- FOSDEM is the biggest gathering of open source
> developers on the face of the earth -- don't miss it!
>
> Thanks,
> Roman.
>
> P.S. If you have any questions -- please email me directly and
> see you all in Brussels!
>


Re: Adding KNN to madlib

2016-11-15 Thread Frank McQuillan
Auon,

Thanks for working on kNN for MADlib.   Can you expand a little bit on your
note, and post the interface that you are thinking about and description of
the arguments?  Then people can comment on that.

Thanks,
Frank

On Tue, Nov 15, 2016 at 9:30 AM, Nandish Jayaram 
wrote:

> Hi Auon,
>
> Great going with your first version of k-NN implementation.
> Some useful links for coding guidelines are at (see Developer
> Documentation):
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61319606
> MADlib has something called install-checks for basic testing. You can
> look at any existing module for an example. For instance, check
> out the install-check code for k-means at:
> https://github.com/apache/incubator-madlib/tree/master/src/ports/postgres/modules/kmeans/test
>
> I am sure others will pitch in to help you more with your other questions,
> but these are some starters you can consider! Good luck!
>
> NJ
>
> On Mon, Nov 14, 2016 at 10:41 PM, Kazmi,Auon H  wrote:
>
> > Hi,
> >
> > I am a first year Computer Science graduate student at University of
> > Florida working on implementing KNN in Madlib. I am ready with a first
> > version of it but I don't know how to proceed with testing and adding it to
> > the Madlib platform. Also, I am not clear on what standards I have to follow
> > in the final implementation. My current version asks for the table name and
> > column name having the vectors in which I have to find the neighbours. The
> > other table given as input holds the vector whose K-NN needs to be found.
> > It assumes the Euclidean distance metric for distance calculation. It would
> > really help if somebody can share ideas on what can be added to this
> > functionality.
> >
> >
> >
> >
> >
> > Regards,
> >
> > Auon Haidar Kazmi
> >
>
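
For readers following this thread, the brute-force search Auon describes (a table of vectors, one query vector, Euclidean distance) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation discussed above and not a MADlib interface: the names `euclidean` and `k_nearest`, and the (id, vector) row layout, are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(
    points: List[Tuple[int, Sequence[float]]],
    query: Sequence[float],
    k: int,
) -> List[Tuple[int, float]]:
    """Return the k (id, distance) pairs closest to `query`.

    `points` stands in for the rows of the training table: (id, vector).
    This is a brute-force scan; every row is compared with the query.
    """
    scored = [(pid, euclidean(vec, query)) for pid, vec in points]
    # Sort by distance, breaking ties by id so the result is deterministic.
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:k]


rows = [(1, [0.0, 0.0]), (2, [3.0, 4.0]), (3, [1.0, 1.0])]
neighbours = k_nearest(rows, [0.0, 0.0], k=2)
print(neighbours)
```

An interface proposal could build on this by making the distance function and k explicit arguments, which is the kind of detail the replies above ask for.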


Re: Encoding categorical variables

2016-10-28 Thread Frank McQuillan
Yes thanks Vatsan we have been looking at that.

On Fri, Oct 28, 2016 at 2:39 PM, Srivatsan R <vatsan...@gmail.com> wrote:

> You guys may have already seen this, but linking just in case:
> http://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_dummies.html
>
> On Fri, Oct 28, 2016 at 1:32 PM, Woo Jae Jung <wj...@pivotal.io> wrote:
>
> > +Vatsan for his thoughts as well!
> >
> > On Fri, Oct 28, 2016 at 1:29 PM, Woo Jae Jung <wj...@pivotal.io> wrote:
> >
> >> Also agree that double-quoted column names are not ideal.  In addition to
> >> the net-new features described in this thread, it'd be nice to see
> >> non-double-quoted output as default behavior in the
> >> existing create_indicator_variables() function.
> >>
> >> Thanks,
> >> Woo
> >>
> >> On Fri, Oct 28, 2016 at 1:05 PM, Woo Jae Jung <wj...@pivotal.io> wrote:
> >>
> >>> I like the one-hot encoded feature.  Another variant of this idea would
> >>> be an "all other" variable (distinct from the reference class) that
> >>> contains occurrences of the less frequent category types.  In both of these
> >>> scenarios, the threshold for 'less frequent' could be user-supplied.
> >>>
> >>> Thanks,
> >>> Woo
> >>>
> >>> On Fri, Oct 28, 2016 at 11:29 AM, Rahul Iyer <rahulri...@gmail.com>
> >>> wrote:
> >>>
> >>>> An alternative to dropping is to assign the less frequent values to the
> >>>> reference i.e. all one-hot encoded features will be 0.
> >>>> Also important to note: total runtime will increase with this option since
> >>>> we'll have to compute the exact frequency distribution.
> >>>>
> >>>> Another suggested change is to call this function 'one_hot_encoding' since
> >>>> that is the output here (similar to sklearn's OneHotEncoder
> >>>> <http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html>).
> >>>> We can keep the current name as a deprecated alias till 2.0 is released.
> >>>>
> >>>> On Fri, Oct 28, 2016 at 11:17 AM, Frank McQuillan <
> >>>> fmcquil...@pivotal.io>
> >>>> wrote:
> >>>>
> >>>> > Jarrod,
> >>>> >
> >>>> > Just trying to write up detailed requirements.  How would you see this one
> >>>> > working?
> >>>> >
> >>>> > "2) Option to dummy code only the top n most frequently occurring values in
> >>>> > any column"
> >>>> >
> >>>> > With 1 column I can picture it, you would drop the rows with the less
> >>>> > frequently occurring values and end up with a smaller table.  But what if
> >>>> > you are encoding multiple columns?  Would you want a per-column specification
> >>>> > of n? i.e., top 3 values for column x, top 10 values for column y?  If you
> >>>> > did this then your result set might include low frequency values for column
> >>>> > x (not in top 3) because they are in the top 10 for column y - this might
> >>>> > be confusing.
> >>>> >
> >>>> > Frank
> >>>> >
> >>>> > On Wed, Oct 19, 2016 at 2:44 PM, Frank McQuillan <fmcquil...@pivotal.io>
> >>>> > wrote:
> >>>> >
> >>>> >> great, thanks for the additional information
> >>>> >>
> >>>> >> Frank
> >>>> >>
> >>>> >> On Wed, Oct 19, 2016 at 1:57 PM, Jarrod Vawdrey <jvawd...@pivotal.io>
> >>>> >> wrote:
> >>>> >>
> >>>> >>> IMO
> >>>> >>>
> >>>> >>> 1) Option to define resulting column names. Please see pdltools
> >>>> >>> implementation - the ability to pass in a function is especially useful (
> >>>> >>> http://pivotalsoftware.github.io/PDLTools/group__grp__pivot01.html)
> >>>> >>> 2) Option to dummy code only

Re: Encoding categorical variables

2016-10-28 Thread Frank McQuillan
Jarrod,

Just trying to write up detailed requirements.  How would you see this one
working?

"2) Option to dummy code only the top n most frequently occurring values in
any column"

With 1 column I can picture it, you would drop the rows with the less
frequently occurring values and end up with a smaller table.  But what if
you are encoding multiple columns?  Would you want a per-column specification
of n? i.e., top 3 values for column x, top 10 values for column y?  If you
did this then your result set might include low frequency values for column
x (not in top 3) because they are in the top 10 for column y - this might
be confusing.

Frank

On Wed, Oct 19, 2016 at 2:44 PM, Frank McQuillan <fmcquil...@pivotal.io>
wrote:

> great, thanks for the additional information
>
> Frank
>
> On Wed, Oct 19, 2016 at 1:57 PM, Jarrod Vawdrey <jvawd...@pivotal.io>
> wrote:
>
>> IMO
>>
>> 1) Option to define resulting column names. Please see pdltools
>> implementation - the ability to pass in a function is especially useful (
>> http://pivotalsoftware.github.io/PDLTools/group__grp__pivot01.html)
>> 2) Option to dummy code only the top n most frequently occurring values in
>> any column
>> 3) Option to create numeric column names (E.g. pivotcol_val1, pivotcol_val2
>> ...) instead of values in column names + secondary mapping table
>> 4) Option to exclude original column from results table
>>
>> (1) & (2) are much higher priority than (3) & (4).
>>
>> Agreed that these could also be applied to Pivoting (especially 1).
>>
>>
>>
>> Jarrod Vawdrey
>> Sr. Data Scientist
>> Data Science & Engineering | Pivotal
>> (650) 315-8905
>> https://pivotal.io/
>>
>> On Wed, Oct 19, 2016 at 4:47 PM, Frank McQuillan <fmcquil...@pivotal.io>
>> wrote:
>>
>> > Thanks for those suggestions, Jarrod.  They all sound pretty useful -
>> > would you mind taking a crack at numbering them 1,2,3... etc, in the order
>> > of priority as you see it?
>> >
>> > Also it seems like some of these could be applied to the Pivot function as
>> > well, e.g., UDF for column naming.
>> >
>> > Frank
>> >
>> >
>> >
>> > On Fri, Oct 14, 2016 at 1:02 PM, Jarrod Vawdrey <jvawd...@pivotal.io>
>> > wrote:
>> >
>> >> Hey Frank,
>> >>
>> >> How are special character values handled today? It is often not ideal to
>> >> end up with column names that require double quotes to call due to
>> >> downstream scripts.
>> >>
>> >> A couple of features that would be useful
>> >>
>> >> * Option to define resulting column names. Please see pdltools
>> >> implementation - the ability to pass in a function is especially useful (
>> >> http://pivotalsoftware.github.io/PDLTools/group__grp__pivot01.html)
>> >> * Option to dummy code only the top n most frequently occurring values in
>> >> any column
>> >> * Option to exclude original column from results table
>> >> * Option to create numeric column names (E.g. pivotcol_val1,
>> >> pivotcol_val2 ...) instead of values in column names + secondary mapping
>> >> table
>> >>
>> >> Thank you
>> >>
>> >> Jarrod Vawdrey
>> >> Sr. Data Scientist
>> >> Data Science & Engineering | Pivotal
>> >> (650) 315-8905
>> >> https://pivotal.io/
>> >>
>> >> On Fri, Oct 14, 2016 at 3:35 PM, Frank McQuillan <fmcquil...@pivotal.io>
>> >> wrote:
>> >>
>> >>> For the module encoding categorical variables
>> >>> http://madlib.incubator.apache.org/docs/latest/group__grp__data__prep.html
>> >>> does anyone have any suggestions on improvements that we could make?
>> >>>
>> >>> Here is a video on how encoding categorical variables works for those not
>> >>> familiar with it
>> >>> https://www.youtube.com/watch?v=zxGgGMGJZRo=7=PL6
>> >>> 2pIycqXx-Qf6EXu5FDxUgXW23BHOtcQ
>> >>>
>> >>
>> >>
>> >
>>
>
>
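
One possible reading of the "top n per column" question, sketched in Python for discussion. It is an assumption-laden illustration, not the behaviour of any MADlib function: it takes a per-column n, keeps every row, and maps less frequent values to all zeros (the "assign to the reference" alternative raised elsewhere in this thread) instead of dropping rows. The names `top_n_values` and `one_hot_top_n` are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Mapping


def top_n_values(rows: List[Mapping[str, Hashable]], column: str, n: int) -> list:
    """Return the n most frequent values of `column`, most frequent first."""
    counts = Counter(row[column] for row in rows)
    # Order by descending count, then by string form so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
    return [value for value, _ in ordered[:n]]


def one_hot_top_n(
    rows: List[Mapping[str, Hashable]],
    top_n: Mapping[str, int],
) -> List[Dict[str, int]]:
    """One-hot encode each column in `top_n`, keeping only its top-n values.

    Values outside the top n get 0 in every indicator column for that
    column, so no rows are dropped and each column is handled independently.
    """
    keep = {col: top_n_values(rows, col, n) for col, n in top_n.items()}
    encoded = []
    for row in rows:
        out: Dict[str, int] = {}
        for col, values in keep.items():
            for value in values:
                out[f"{col}_{value}"] = 1 if row[col] == value else 0
        encoded.append(out)
    return encoded


data = [
    {"x": "a", "y": "p"},
    {"x": "a", "y": "q"},
    {"x": "b", "y": "q"},
    {"x": "c", "y": "r"},
]
encoded = one_hot_top_n(data, {"x": 1, "y": 2})
print(encoded)
```

Because each column is counted separately and rows are never removed, the confusing interaction described above (a row surviving because of column y while holding a rare value of column x) does not arise. The cost is the extra pass to compute exact frequencies.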


Proposed improvement to association rules (Apriori) algorithm

2016-10-27 Thread Frank McQuillan
Here is a comment from a MADlib user that I recently heard:

“No apparent way to set an upper bound for itemset size in assoc_rules
function. This results in it running forever with larger data sets. In the
R "arules" package, you can set a max itemset size so that it doesn't look
for unnecessarily large associations.”
https://cran.r-project.org/web/packages/arules/arules.pdf

Does a single optional parameter make sense to add to
http://madlib.incubator.apache.org/docs/latest/group__grp__assoc__rules.html
similar to the maxlen parameter in “arules” ?

Any other considerations here or improvements to make to this algorithm at
the same time? minlen?

Thanks,
Frank
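
To make the proposal concrete, here is a small Python sketch of level-wise (Apriori-style) frequent itemset search with an optional upper bound on itemset size. It only illustrates why such a bound limits the work; it is not MADlib's assoc_rules code, and the function name `frequent_itemsets` and its `max_len` argument are hypothetical, chosen to mirror the maxlen idea from "arules".

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Optional


def frequent_itemsets(
    transactions: Iterable[Iterable[str]],
    min_support: float,
    max_len: Optional[int] = None,
) -> Dict[FrozenSet[str], float]:
    """Return {itemset: support} for itemsets meeting `min_support`.

    If `max_len` is given, the search stops once itemsets of that size
    have been counted, so larger candidates are never generated.
    """
    baskets: List[FrozenSet[str]] = [frozenset(t) for t in transactions]
    total = len(baskets)
    result: Dict[FrozenSet[str], float] = {}
    # Level 1 candidates: every distinct item on its own.
    current = {frozenset([item]) for basket in baskets for item in basket}
    size = 1
    while current and (max_len is None or size <= max_len):
        frequent = {}
        for candidate in current:
            count = sum(1 for basket in baskets if candidate <= basket)
            support = count / total
            if support >= min_support:
                frequent[candidate] = support
        result.update(frequent)
        # Build candidates one item larger; keep only those whose every
        # subset of the current size is frequent (the Apriori property).
        size += 1
        current = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in frequent for sub in combinations(union, size - 1)
            ):
                current.add(union)
    return result


tx = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
bounded = frequent_itemsets(tx, min_support=0.3, max_len=2)
unbounded = frequent_itemsets(tx, min_support=0.3)
print(len(bounded), len(unbounded))
```

A minlen option would be a filter on the returned itemsets and would not reduce the search, since smaller itemsets are still needed to generate larger ones.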


Re: Encoding categorical variables

2016-10-19 Thread Frank McQuillan
great, thanks for the additional information

Frank

On Wed, Oct 19, 2016 at 1:57 PM, Jarrod Vawdrey <jvawd...@pivotal.io> wrote:

> IMO
>
> 1) Option to define resulting column names. Please see pdltools
> implementation - the ability to pass in a function is especially useful (
> http://pivotalsoftware.github.io/PDLTools/group__grp__pivot01.html)
> 2) Option to dummy code only the top n most frequently occurring values in
> any column
> 3) Option to create numeric column names (E.g. pivotcol_val1, pivotcol_val2
> ...) instead of values in column names + secondary mapping table
> 4) Option to exclude original column from results table
>
> (1) & (2) are much higher priority than (3) & (4).
>
> Agreed that these could also be applied to Pivoting (especially 1).
>
>
>
> Jarrod Vawdrey
> Sr. Data Scientist
> Data Science & Engineering | Pivotal
> (650) 315-8905
> https://pivotal.io/
>
> On Wed, Oct 19, 2016 at 4:47 PM, Frank McQuillan <fmcquil...@pivotal.io>
> wrote:
>
> > Thanks for those suggestions, Jarrod.  They all sound pretty useful -
> > would you mind taking a crack at numbering them 1,2,3... etc, in the order
> > of priority as you see it?
> >
> > Also it seems like some of these could be applied to the Pivot function as
> > well, e.g., UDF for column naming.
> >
> > Frank
> >
> >
> >
> > On Fri, Oct 14, 2016 at 1:02 PM, Jarrod Vawdrey <jvawd...@pivotal.io>
> > wrote:
> >
> >> Hey Frank,
> >>
> >> How are special character values handled today? It is often not ideal to
> >> end up with column names that require double quotes to call due to
> >> downstream scripts.
> >>
> >> A couple of features that would be useful
> >>
> >> * Option to define resulting column names. Please see pdltools
> >> implementation - the ability to pass in a function is especially useful (
> >> http://pivotalsoftware.github.io/PDLTools/group__grp__pivot01.html)
> >> * Option to dummy code only the top n most frequently occurring values in
> >> any column
> >> * Option to exclude original column from results table
> >> * Option to create numeric column names (E.g. pivotcol_val1,
> >> pivotcol_val2 ...) instead of values in column names + secondary mapping
> >> table
> >>
> >> Thank you
> >>
> >> Jarrod Vawdrey
> >> Sr. Data Scientist
> >> Data Science & Engineering | Pivotal
> >> (650) 315-8905
> >> https://pivotal.io/
> >>
> >> On Fri, Oct 14, 2016 at 3:35 PM, Frank McQuillan <fmcquil...@pivotal.io>
> >> wrote:
> >>
> >>> For the module encoding categorical variables
> >>> http://madlib.incubator.apache.org/docs/latest/group__grp__data__prep.html
> >>> does anyone have any suggestions on improvements that we could make?
> >>>
> >>> Here is a video on how encoding categorical variables works for those not
> >>> familiar with it
> >>> https://www.youtube.com/watch?v=zxGgGMGJZRo=7=PL6
> >>> 2pIycqXx-Qf6EXu5FDxUgXW23BHOtcQ
> >>>
> >>
> >>
> >
>
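
On the question of column names that need double quotes: one way a naming function could avoid them is to normalise each category value into a plain lowercase identifier. The Python sketch below is a guess at what such a helper might look like, not how MADlib or PDLTools names columns; `safe_column_name` is a hypothetical name. The 63-character cap reflects PostgreSQL's default identifier length limit.

```python
import re


def safe_column_name(column: str, value: object, max_len: int = 63) -> str:
    """Build an indicator column name that needs no double quotes.

    Lowercases the text, replaces each run of characters outside
    [a-z0-9_] with a single underscore, and prefixes an underscore if the
    result would start with a digit. Reserved SQL keywords are NOT
    handled, and distinct values can collide after normalisation.
    """
    raw = f"{column}_{value}".lower()
    name = re.sub(r"[^a-z0-9_]+", "_", raw).strip("_")
    if not name or name[0].isdigit():
        name = "_" + name
    return name[:max_len]


print(safe_column_name("color", "Dark Blue/Green"))
print(safe_column_name("9lives", "A+"))
```

Since values such as "a b" and "a-b" would map to the same name, a real implementation would probably also need the secondary mapping table mentioned in the list above, or a collision check.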


New features in MADlib

2016-10-19 Thread Frank McQuillan
Which features would you like to see in a future version of Apache MADlib?
Could be big or small stuff.

Please let the community know what you think would be valuable to work on.

(If you prefer to complete a short survey form about Apache MADlib, please
let me know & I will send a Survey Monkey link.)

I will collect input from all sources and post survey results (aggregate
and anonymous) to the Apache MADlib website


Thanks,
Frank


New blog published on last release

2016-09-27 Thread Frank McQuillan
Hi,

I just published a new blog on Pivotal.io called "New Tools To Shape Data
In Apache MADlib"
https://blog.pivotal.io/big-data-pivotal/products/new-tools-to-shape-data-in-apache-madlib
based on the last release.

Please have a look and let me know if you have any comments.

Frank


Apache MADlib (incubating) v1.9.1 Release Announcement - GA

2016-09-20 Thread Frank McQuillan
This is the 3rd Apache release for MADlib.

Features of this release:
* new modules (1-class SVM for novelty detection, prediction metrics,
sessionization, pivoting)
* improvements to existing modules (class weights in SVM, overlapping
patterns in path)
* performance improvements (path)
* platform updates (support for PostgreSQL 9.5 and 9.6)
* bug fixes
* doc improvements

For more information please read the release notes:
https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.9.1

Download the release:
http://madlib.incubator.apache.org/download.html

Thank you to the MADlib community for a very fine release.

Here’s a look at some future features being considered:
https://cwiki.apache.org/confluence/display/MADLIB/Roadmap
Happy to get your input on what you would like to see.  New contributors
are always welcome.

Frank


Re: Contributing GMM and Perceptron to MADLib

2016-09-19 Thread Frank McQuillan
Hi Aditya,

I noticed the KNN poster
http://dsr.cise.ufl.edu/wp-content/uploads/2016/05/MADlib_Combined.pptx.pdf
and was wondering if you have plans to make a pull request?

Frank


On Mon, Mar 28, 2016 at 9:37 PM, Roman Shaposhnik <r...@apache.org> wrote:

> Awesome!
>
> On Mon, Mar 28, 2016 at 9:18 PM, Frank McQuillan <fmcquil...@pivotal.io>
> wrote:
> > Thanks Roman.  I was able to do it just now.
> >
> > Frank
> >
> > On Mon, Mar 28, 2016 at 9:12 PM, Roman Shaposhnik <r...@apache.org>
> wrote:
> >>
> >> I can help with that -- stay tuned.
> >>
> >> On Mon, Mar 28, 2016 at 8:29 PM, Frank McQuillan <fmcquil...@pivotal.io>
> >> wrote:
> >> > Let me figure out how to do this and add Aditya as the owner of that
> >> > JIRA.
> >> > My initial attempts in ASF infra-land were not quite successful.
> >> >
> >> > Frank
> >> >
> >> > On Mon, Mar 28, 2016 at 4:54 PM, Rahul Iyer <ri...@pivotal.io> wrote:
> >> >>
> >> >> @Frank, Roman: I believe Aditya needs to be added as a developer to the
> >> >> MADlib project to assign a JIRA to him? Is this only available to the
> >> >> lead/owner?
> >> >>
> >> >> On Mon, Mar 28, 2016 at 3:49 PM, Aditya Nain <adityana...@gmail.com>
> >> >> wrote:
> >> >>>
> >> >>> Hi Rahul,
> >> >>>
> >> >>> I didn't have an id, so I created one now.
> >> >>> My id is : Aditya Nain
> >> >>>
> >> >>> Thanks,
> >> >>> Aditya
> >> >>>
> >> >>> On Mon, Mar 28, 2016 at 6:40 PM, Rahul Iyer <ri...@pivotal.io>
> wrote:
> >> >>>
> >> >>> > I can assign this to you, but you need to have an account in
> >> >>> > https://issues.apache.org.
> >> >>> > If you already have an account, then please send your id - I wasn't
> >> >>> > able to find you just using your name.
> >> >>> >
> >> >>> > On Mon, Mar 28, 2016 at 3:31 PM, Aditya Nain <
> adityana...@gmail.com>
> >> >>> > wrote:
> >> >>> >
> >> >>> > > Hi Rahul,
> >> >>> > >
> >> >>> > > Thanks for the reply!
> >> >>> > >
> >> >>> > > I am working on implementing a Gaussian Mixture Model assuming that the
> >> >>> > > covariance matrix is the same for all the Gaussians.
> >> >>> > > The JIRA which deals with GMM is MADLIB-410:
> >> >>> > >
> >> >>> > > https://issues.apache.org/jira/browse/MADLIB-410?jql=project%20%3D%20MADLIB
> >> >>> > >
> >> >>> > > Can this be assigned to me, or how do I get it assigned to me?
> >> >>> > >
> >> >>> > > Thanks,
> >> >>> > > Aditya
> >> >>> > >
> >> >>> > > On Mon, Mar 21, 2016 at 3:41 PM, Rahul Iyer <ri...@pivotal.io>
> >> >>> > > wrote:
> >> >>> > >
> >> >>> > > > Hi Aditya,
> >> >>> > > >
> >> >>> > > > Welcome to the MADlib community!
> >> >>> > > >
> >> >>> > > > Gaussian Mixture models are extremely useful and we would heartily welcome
> >> >>> > > > a contribution for it. The SQLEM paper might be oversimplifying the
> >> >>> > > > capabilities of the database (e.g. assuming there is no array type is
> >> >>> > > > unnecessary for PostgreSQL). You could speed things up (both dev time and
> >> >>> > > > execution time) by writing some of the functions in C++. K-means is an
> >> >>> > > >

New Apache MADlib contributor: Nandish Jayaram

2016-09-15 Thread Frank McQuillan
Dear MADlib dev community,

The Project Management Committee (PMC) for Apache MADlib has asked
Nandish Jayaram to become a committer and we are pleased to announce that he
has accepted.

Here are some of his contributions:

- New features (Sessionization, initial stages of one-class SVM)
- Expansion of existing modules (Path)
- Bug fixes (path, elastic net, decision tree)
- Infrastructure projects

Being a committer enables easier contribution to the project since there is
no need to go via the patch submission process.  This should enable better
productivity.  Being a PMC member enables assistance with the management
and to guide the direction of the project.

Welcome Nandish!

Regards,
Frank


New Apache MADlib contributor: Orhan Kislal

2016-09-15 Thread Frank McQuillan
Dear MADlib dev community,

The Project Management Committee (PMC) for Apache MADlib has asked
Orhan Kislal to become a committer and we are pleased to announce that he
has accepted.

Here are some of Orhan’s recent contributions:

- Release manager for 1.9alpha, 1.9 and 1.9.1
- Worked on:
  - New features (prediction metrics, pivoting)
  - Expansion of existing modules (PCA)
  - Upgrade support
  - Release related tasks
  - Bug fixes (kmeans, random forest, elastic net etc.)

Being a committer enables easier contribution to the project since there is
no need to go via the patch submission process.  This should enable better
productivity.  Being a PMC member enables assistance with the management
and to guide the direction of the project.

Welcome Orhan!

Regards,
Frank


Re: [VIRTUAL] MADlib Community Meeting TODAY, September 13, 9AM Pacific: Deep Dive into MADlib 1.9.1

2016-09-13 Thread Frank McQuillan
Thanks for attending

Links mentioned in today’s MADlib community meeting

Release notes 1.9.1
https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.9.1

Jupyter notebooks for 1.9.1 demos
https://github.com/madlib/madlib-examples

Previous community call describing path functions in more detail (just
skimmed over this today)
https://www.youtube.com/watch?v=vFJSeSvQT94=4=PL62pIycqXx-Qf6EXu5FDxUgXW23BHOtcQ

Frank

On Tue, Sep 13, 2016 at 8:07 AM, Gregory Chase  wrote:

> Reminder the MADlib meeting starts in about an hour, and we have a new
> meeting platform, with either 100% streaming or dial-in option
>
> Join from PC, Mac, Linux, iOS or Android:
> https://pivotal.zoom.us/j/923158161
>
> To join phone conference (or you can stream from above):
>
> Or iPhone one-tap (US Toll):  +16465588656,923158161# or
> +14086380968,923158161#
>
> Or Telephone:
> Dial: +1 646 558 8656 (US Toll) or +1 408 638 0968 (US Toll)
> Meeting ID: 923 158 161
> International numbers available (scroll to bottom):
> https://pivotal.zoom.us/zoomconference?m=w6tQZeVQnm1XJz5xZ8gX94j8OEH_ENCn
>
> I know we have at least one person from China possibly joining today. I'm
> sorry we don't have a Chinese dial-in number. I recommend using either
> closest or US toll.
>
>
> On Mon, Sep 12, 2016 at 4:50 PM, Gregory Chase  wrote:
>
> > Dear MADlib, HAWQ, and Greenplum communities,
> > This is a reminder that tomorrow at 9AM, we'll be talking about the new
> > release of MADlib 1.9.1.
> >
> > Also, we're changing our meeting platform to Zoom.  Directions on how to
> > join are below.  Please be aware that you'll need to download the Zoom
> > client if it's your first time meeting with Zoom.
> >
> > We also have a dial-in for the first time ever, or you can also join and
> > hear streaming audio via your computer:
> >
> > Join from PC, Mac, Linux, iOS or Android:
> > https://pivotal.zoom.us/j/923158161
> >
> > Or iPhone one-tap (US Toll):  +16465588656,923158161# or
> > +14086380968,923158161#
> >
> > Or Telephone:
> > Dial: +1 646 558 8656 (US Toll) or +1 408 638 0968 (US Toll)
> > Meeting ID: 923 158 161
> > International numbers available:
> > https://pivotal.zoom.us/zoomconference?m=w6tQZeVQnm1XJz5xZ8gX94j8OEH_ENCn
> >
> > On Tue, Sep 6, 2016 at 3:12 PM, Gregory Chase  wrote:
> >
> >> Dear MADlib, HAWQ, and Greenplum Communities,
> >> The Apache MADlib (incubating) project is about to release MADlib 1.9.1.
> >>
> >> To celebrate, we are organizing the next MADlib Virtual Community Meeting
> >> next Tuesday, September 13 at 9AM Pacific.
> >> Join here  | Add to
> >> your calendar
> >>
> >> We'll be taking a deep dive into the new capabilities of 1.9.1 including:
> >>
> >> New functions for:
> >> One class SVM
> >> Prediction Metrics
> >> Sessionization
> >> Pivot
> >>
> >> We've also enhanced existing SVM use cases to assign weights to multiple
> >> classes, and greatly improved the path function.  Finally we've updated
> >> support for PostgreSQL 9.5 and 9.6
> >>
> >> After discussing the new capabilities, we'll demo novelty detection, path
> >> functions, prediction metrics, and sessionization.
> >>
> >> Release notes can be found here:
> >> https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.9.1
> >>
> >> Examples of new capabilities can be found here:
> >> https://github.com/madlib/madlib-examples
> >>
> >> See you next Tuesday!
> >>
> >> Join here 

Re: [VIDEO] Origin of Apache MADlib (incubating)

2016-09-06 Thread Frank McQuillan
Thanks for posting, Greg.

On Tue, Sep 6, 2016 at 3:49 PM, Gregory Chase  wrote:

> Dear MADlib Community,
> The very first committer to MADlib, Joe Hellerstein, was kind enough to
> tell us the story about the origin of Apache MADlib in this video:
>
> https://www.youtube.com/watch?v=DGPZwpB92Aw=10=PL62pIycqXx-
> Qf6EXu5FDxUgXW23BHOtcQ
>
> Please enjoy.
>
> -Greg
>
> --
> Greg Chase
>
> Global Head, Big Data Communities
> http://www.pivotal.io/big-data
>
> Pivotal Software
> http://www.pivotal.io/
>
> 650-215-0477
> @GregChase
> Blog: http://geekmarketing.biz/
>


[VOTE] MADlib v1.9.1-rc2

2016-09-02 Thread Frank McQuillan
Hello MADlib community,

We have created a MADlib 1.9.1 RC-2, with the artifacts below up for a vote.

This release candidate replaces RC-1.  The only difference between RC-1 and
RC-2 is that some '._' files that were sneaked in by OS X during packaging
have been removed.

This will be the 3rd release for Apache MADlib (incubating).

The main goals of this release are:
* new modules (1-class SVM for novelty detection, prediction metrics,
sessionization, pivoting)
* improvements to existing modules (class weights in SVM, overlapping
patterns in path)
* performance improvements (path)
* platform updates (PostgreSQL 9.5 and 9.6)
* bug fixes
* doc improvements

For more information including release notes, please see:
https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.9.1

*** Please download, review and vote by Tues Sep 6, 2016 @ 6pm PST ***

We're voting upon the source (tag):  rc/1.9.1-rc2

Source Files:
https://dist.apache.org/repos/dist/dev/incubator/madlib/1.9.1-incubating-rc2

Commit to be voted upon:
https://git-wip-us.apache.org/repos/asf?p=incubator-madlib.git;a=commit;h=e1c99c1538dc124c9b323ba76382ba2af05c6892

KEYS file containing PGP Keys we use to sign the release:
https://dist.apache.org/repos/dist/dev/incubator/madlib/KEYS

To help in tallying the vote, can PMC members please be sure to indicate
"(binding)" with their vote.

[ ] +1  approve
[ ] +0  no opinion
[ ] -1  disapprove (and reason why)

Thank you,
Frank McQuillan


Re: [VOTE] MADlib v1.9.1-rc1

2016-09-02 Thread Frank McQuillan
> > > > make[2]: *** [src/ports/postgres/9.6/CMakeFiles/madlib_postgresql_
> > > 9_6.dir/__/__/__/modules/tsa/._arima.cpp.o]
> > > > Error 1
> > > > make[1]: *** [src/ports/postgres/9.6/CMakeFiles/madlib_postgresql_
> > > 9_6.dir/all]
> > > > Error 2
> > > > make: *** [all] Error 2
> > > > [snaga@localhost build]$
> > > >
> > > > And I found that the tarball contains some binary files (?) which
> > > > seems built on Mac OS X.
> > > > I guess this is the reason of the build failure.
> > > >
> > > > [snaga@localhost madlib]$ tar ztvf
> > > > apache-madlib-1.9.1-incubating-source.tar.gz | grep _arima
> > > > -rw-r--r-- riyer/staff 226 2016-08-31 07:31
> > > > apache-madlib-1.9.1-incubating-source/src/ports/
> > > postgres/modules/tsa/._arima.py_in
> > > > -rw-r--r-- riyer/staff 226 2016-08-31 07:31
> > > > apache-madlib-1.9.1-incubating-source/src/ports/
> > > postgres/modules/tsa/._arima.sql_in
> > > > -rw-r--r-- riyer/staff 226 2016-08-31 07:31
> > > > apache-madlib-1.9.1-incubating-source/src/ports/
> > > postgres/modules/tsa/._arima_forecast.py_in
> > > > -rw-r--r-- riyer/staff 226 2016-08-31 07:31
> > > > apache-madlib-1.9.1-incubating-source/src/ports/
> > > postgres/modules/tsa/test/._arima.sql_in
> > > > -rw-r--r-- riyer/staff 226 2016-08-31 07:31
> > > > apache-madlib-1.9.1-incubating-source/src/ports/
> > > postgres/modules/tsa/test/._arima_train.sql_in
> > > > -rw-r--r-- riyer/staff 226 2016-08-31 07:31
> > > > apache-madlib-1.9.1-incubating-source/src/modules/tsa/._arima.cpp
> > > > -rw-r--r-- riyer/staff 226 2016-08-31 07:31
> > > > apache-madlib-1.9.1-incubating-source/src/modules/tsa/._arima.hpp
> > > > [snaga@localhost madlib]$ file
> > > > apache-madlib-1.9.1-incubating-source/src/modules/tsa/._arima.cpp
> > > > apache-madlib-1.9.1-incubating-source/src/modules/tsa/._arima.cpp:
> > > > AppleDouble encoded Macintosh file
> > > > [snaga@localhost madlib]$
> > > >
> > > > Is this intended? Or should it be fixed?
> > > >
> > > > Regards,
> > > >
> > > >
> > > > 2016-09-02 4:17 GMT+09:00 Frank McQuillan <fmcquil...@pivotal.io>:
> > > >> Hello MADlib community,
> > > >>
> > > >> We have created a MADlib 1.9.1 release candidate, with the artifacts
> > > below
> > > >> up for a vote.
> > > >>
> > > >> This will be the 3rd release for Apache MADlib (incubating).
> > > >>
> > > >> The main goals of this release are:
> > > >> * new modules (1-class SVM for novelty detection, prediction
> metrics,
> > > >> sessionization, pivoting)
> > > >> * improvements to existing modules (class weights in SVM,
> overlapping
> > > >> patterns in path)
> > > >> * performance improvements (path)
> > > >> * platform updates (PostgreSQL 9.5 and 9.6)
> > > >> * bug fixes
> > > >> * doc improvements
> > > >>
> > > >> For more information including release notes, please see:
> > > >> https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.9.1
> > > >>
> > > >> *** Please download, review and vote by Tues Sep 6, 2016 @ 6pm PST
> ***
> > > >>
> > > >> We're voting upon the source (tag):  rc/1.9.1-rc1
> > > >>
> > > >> Source Files:
> > > > >> https://dist.apache.org/repos/dist/dev/incubator/madlib/1.9.1-incubating-rc1/
> > > >>
> > > >> Commit to be voted upon:
> > > > >> https://git-wip-us.apache.org/repos/asf?p=incubator-madlib.git;a=commit;h=e1c99c1538dc124c9b323ba76382ba2af05c6892
> > > >>
> > > >> KEYS file containing PGP Keys we use to sign the release:
> > > >> https://dist.apache.org/repos/dist/dev/incubator/madlib/KEYS
> > > >>
> > > >> To help in tallying the vote, can PMC members please be sure to
> > indicate
> > > >> "(binding)" with their vote.
> > > >>
> > > >> [ ] +1  approve
> > > >> [ ] +0  no opinion
> > > >> [ ] -1  disapprove (and reason why)
> > > >>
> > > >> Thank you,
> > > >> Frank McQuillan
> > > >
> > > >
> > > >
> > > > --
> > > > Satoshi Nagayasu <sn...@uptime.jp>
> > >
> > >
> > >
> > > --
> > > Satoshi Nagayasu <sn...@uptime.jp>
> > >
> >
> >
> >
> > --
> >
> > -
> > Rahul Iyer
> > Principal software engineer | Predictive Analytics
> >
> > *Pivotal**A new platform for a new era*
> >
>


Jupyter notebooks for v1.9.1 demos

2016-09-01 Thread Frank McQuillan
I posted some Jupyter notebooks with small data sets at
https://github.com/madlib/madlib-examples
to try out v1.9.1 features.

Many of these examples are used in the user docs for v1.9.1.

As you can see in the other thread from today, v1.9.1 RC is up for [VOTE]
so please vote.

Thanks,
Frank


Re: [VIDEO REPLAY] MADlib Call - Ask Us Anything (About MADlib)

2016-07-28 Thread Frank McQuillan
Thanks for posting, Greg.  It was an interesting discussion.

Frank

On Wed, Jul 27, 2016 at 6:50 PM, Gregory Chase  wrote:

> Today's open office hours call turned into a delightful discussion with a
> new user to MADlib.  Other new users to MADlib may appreciate his
> questions.
>
> Here's the full replay:
> https://youtu.be/EPQk5Uzs8OY?t=35s
>
> Here are the topics we talked about:
>
> 0:35 Our new user’s use case
> 1:45 Why does the system behave differently with MADlib vs. what I see with R?
> 3:23 Tableau interactive queries vs. MADlib batch-oriented processing
> 7:27 What first algorithms have you run in MADlib?
> 8:13 Balancing MADlib query processing with general database access demands
> 10:46 MADlib support for decision trees
> 12:14 What’s the best way to get more experience using MADlib?
> 15:30 SQL vs. R for data science
> 16:54 What about Python?
> 19:03 Brazilian tourism industry advertisement
>
> On Wed, Jul 27, 2016 at 8:02 AM, Greg Chase  wrote:
>
> > Greetings MADlib community,
> > This is a reminder that our next community virtual meeting starts in
> about
> > an hour.
> >
> > Topic: "Ask Us Anything (About MADlib)".  Bring your questions.
> >
> > You can join live here <
> https://pivotalcommunity.adobeconnect.com/madlib/>
> >
> > -Greg
> >
> > On Wed, Jul 20, 2016 at 11:18 AM, Gregory Chase 
> wrote:
> >
> >> Dear MADlib community,
> >> We're holding another Apache MADlib (incubating) community call next
> week,
> >> Wednesday, July 27, at 9AM.
> >>
> >> Topic:  Ask Us Anything (about MADlib)
> >>
> >> Agenda: Whatever we get asked
> >>
> >> Join us for an open Q&A about using MADlib. We'll have the best expert
> >> contributors and data science users of Apache MADlib on hand to answer
> >> whatever questions you might have.
> >>
> >> You can join live here <
> https://pivotalcommunity.adobeconnect.com/madlib/
> >> >,
> >> and add a reminder appointment to your calendar here
> >> <
> >>
> https://calendar.google.com/calendar/event?eid=Zm8xMjUzOTEyajk3ZDNlZXQ0ZGpnOHBtZnMgcGl2b3RhbC5pb191OGtndnVhaGprYm9oMWduZmh2NXRzMnY5Y0Bn=America/Los_Angeles
> >> >
> >> .
> >>
> >> See you next Wednesday, July 27 at 9AM Pacific!
> >>
> >> --
> >> Greg Chase
> >>
> >> Global Head, Big Data Communities
> >> http://www.pivotal.io/big-data
> >>
> >> Pivotal Software
> >> http://www.pivotal.io/
> >>
> >> 650-215-0477
> >> @GregChase
> >> Blog: http://geekmarketing.biz/
> >>
> >
> >
>
>
> --
> Greg Chase
>
> Global Head, Big Data Communities
> http://www.pivotal.io/big-data
>
> Pivotal Software
> http://www.pivotal.io/
>
> 650-215-0477
> @GregChase
> Blog: http://geekmarketing.biz/
>


Re: [VIRTUAL] MADlib Call Wednesday 7/27 9AM PT: Ask Us Anything (About MADlib)

2016-07-20 Thread Frank McQuillan
Thanks for suggesting this topic, Greg.  Looking forward to it.

Frank

On Wed, Jul 20, 2016 at 11:18 AM, Gregory Chase  wrote:

> Dear MADlib community,
> We're holding another Apache MADlib (incubating) community call next week,
> Wednesday, July 27, at 9AM.
>
> Topic:  Ask Us Anything (about MADlib)
>
> Agenda: Whatever we get asked
>
> Join us for an open Q&A about using MADlib. We'll have the best expert
> contributors and data science users of Apache MADlib on hand to answer
> whatever questions you might have.
>
> You can join live here <https://pivotalcommunity.adobeconnect.com/madlib/>,
> and add a reminder appointment to your calendar here
> <
> https://calendar.google.com/calendar/event?eid=Zm8xMjUzOTEyajk3ZDNlZXQ0ZGpnOHBtZnMgcGl2b3RhbC5pb191OGtndnVhaGprYm9oMWduZmh2NXRzMnY5Y0Bn=America/Los_Angeles
> >
> .
>
> See you next Wednesday, July 27 at 9AM Pacific!
>
> --
> Greg Chase
>
> Global Head, Big Data Communities
> http://www.pivotal.io/big-data
>
> Pivotal Software
> http://www.pivotal.io/
>
> 650-215-0477
> @GregChase
> Blog: http://geekmarketing.biz/
>


Re: Sessionize function - create table or view by default?

2016-07-20 Thread Frank McQuillan
Hey Jim,

Thank you for the thoughtful response.

Given your comments, I think we ought to stick with a view as the default
for sessionize.  Looking ahead to MADlib 2.0, one thing we want to better
support is workflows since chaining operations together is such a common
data science thing to do.  That means looking across all existing MADlib
functions to determine what changes to return sets we need to make, such
as standardizing on views.

Frank

On Wed, Jul 20, 2016 at 6:10 AM, Jim Nasby <jim.na...@bluetreble.com> wrote:

> On 7/19/16 7:36 PM, Frank McQuillan wrote:
>
>> "create_view (optional)
>> BOOLEAN default: TRUE. Determines whether to create a view or materialize
>> a
>> table as output. If you only needed session info once, creating a view
>> could be significantly faster than materializing as a table."
>>
>> Question is:  should it really default to TRUE (view), or is it better to
>> default to FALSE (table)?  Or does it really not matter?
>>
>
> tl;dr: it depends, but more importantly I think MADlib should promote
> views as at least an option (if not the default) across the board.
>
> It's going to depend heavily on what you're doing.
>
> If you're building a "pipeline" of operations where sessionization is just
> the first step of several, and the session data is only referred to once,
> creating a view gives the planner a lot more flexibility on how to produce
> output.
>
> When you use a temp table, the planner has no choice: it must not only
> materialize the complete result set immediately, but it also has to modify
> the catalog to record the temp table.
>
> A pythonic analogy would be that a view is like using a generator (where
> data only needs to be brought forth as it's consumed) while a temp table is
> like using a list. Except there's an even larger difference in SQL: a view
> means the optimizer has a chance to change the execution plan of the final
> query that's using the sessionize output, taking into account everything
> you're doing with the data.
>
> If there are multiple steps in a pipeline, this difference can become very
> large. I've worked on query chains where switching temp tables to views has
> had a 5-10x impact.
>
> The two cases where a temp table would be better are if you need to refer
> to the sessionized data many times (sometimes you'd need to hit it more
> than twice for the temp table to be a win), or if the optimizer ends up
> picking a bad plan when everything is combined into a single query. In the
> case of a bad plan, it would still be better to either try and tweak the
> optimizer settings, or to insert an "optimization fence", typically done by
> sticking an OFFSET 0 clause in.
>
> To me, the bigger picture is promoting the option of views across all
> MADlib set returning operations, because when multiple operations are
> chained together you can see a very large benefit. I suspect that it's more
> common to chain things together, making views a better default... but
> that's just a guess. So if there was a standard default for this across the
> board, perhaps views would be best.
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 855-TREBLE2 (855-873-2532)   mobile: 512-569-9461
>
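
The generator-versus-list analogy in the quoted message can be made concrete
with a few lines of Python.  This is only a sketch of the lazy-evaluation half
of the argument: the function and variable names are invented for
illustration, the "rows" are plain integers, and nothing here models the
planner's ability to rewrite the combined query.

```python
from itertools import islice


def sessionized_rows(events, log):
    """Generator: each row is produced only when the consumer asks for it."""
    for event in events:
        log.append(event)  # record that this row was actually computed
        yield event


events = range(10_000)

# "Temp table" style: materialize every row, then read the first three.
table_log = []
table = list(sessionized_rows(events, table_log))
first_from_table = table[:3]

# "View" style: pull only the three rows the downstream step needs.
view_log = []
first_from_view = list(islice(sessionized_rows(events, view_log), 3))

print(len(table_log), len(view_log))
```

Both approaches hand the same three rows to the consumer, but the list version
computes all 10,000 rows first while the generator version computes only the
three that were requested.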


Session - comment on partition

2016-07-19 Thread Frank McQuillan
Oh, another suggestion:

For the partition param in path
http://madlib.incubator.apache.org/docs/latest/group__grp__path.html
we say:
"This can be NULL or '' to indicate the matching is to be applied to the
whole table."

I think we should do the same for the sessionize function, for the case where
you want to sessionize the whole table.

Frank


Sessionize function - create table or view by default?

2016-07-19 Thread Frank McQuillan
Hi,

I have been testing the sessionize function lately

https://issues.apache.org/jira/browse/MADLIB-909
https://issues.apache.org/jira/browse/MADLIB-1001

and am wondering about this param:

"create_view (optional)
BOOLEAN default: TRUE. Determines whether to create a view or materialize a
table as output. If you only needed session info once, creating a view
could be significantly faster than materializing as a table."

Question is:  should it really default to TRUE (view), or is it better to
default to FALSE (table)?  Or does it really not matter?

Thx,
Frank


Re: MADlib 1.9.1 release manager

2016-07-13 Thread Frank McQuillan
Thank you, Orhan.

On Wed, Jul 13, 2016 at 1:36 PM, Orhan Kislal <okis...@pivotal.io> wrote:

> I volunteer for the release manager duties. I will still need some help
> from a committer to complete the procedures but I believe I can take care
> of the rest.
>
> Thanks
>
> Orhan
>
> On Mon, Jul 11, 2016 at 4:55 PM, Frank McQuillan <fmcquil...@pivotal.io>
> wrote:
>
> > As we start moving towards the 1.9.1 release, I am wondering if someone
> > would like to offer to be release manager this time around.
> >
> > A general description of the process is at
> > http://incubator.apache.org/guides/releasemanagement.html
> >
> > I will of course help walk thru the steps with anyone who would like to
> > participate.
> >
> > Frank
> >
>


DRAFT report to ASF for Q2 2016

2016-07-06 Thread Frank McQuillan
Here is the draft report for July 2016, covering Q2 activity.

It is posted at http://wiki.apache.org/incubator/July2016

---

MADlib

Big Data Machine Learning in SQL for Data Scientists.

MADlib has been incubating since 2015-09-15.

Three most important issues to address in the move towards graduation:

  1. Continue to produce regular Apache (incubating) releases.
  2. Expand the community, increase dev list activity and add new
contributors.
  3. Execute and manage the project according to governance model required
by the "Apache Way".

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware
of?

 None

How has the community developed since the last report?

  1. MADlib related events in Q2 2016:
    * April 19 - Joint community call MADlib - Greenplum Database.
Topic:  MADlib v1.9 new features (Nandish Jayaram, Ivan Novick, Cesar
Rojas, Frank McQuillan)
    * May 5 - MADlib community call.  Topic:  Detailed review of the MADlib
v1.9 release (Xiaocheng Tang, Frank McQuillan)
    * June 21 - MADlib community call.  Topic:  Apache Zeppelin meets Apache
MADlib (incubating) and Apache HAWQ (incubating) (Moon soo Lee, Rahul Iyer,
Frank McQuillan)
    * June 21 - Data Engineers Guild meetup in Palo Alto.  Topic: The
Analytics and Science Behind Connected Transportation (Srivatsan Ramanujam,
Esther Vasiete, Ralph Rabbat, Frank McQuillan)
  2. Material technical conversations on dev mailing lists and in the
appropriate JIRAs and pull requests.
  3. We are seeing some PostgreSQL experts chipping in on SQL coding and
making good suggestions in the pull requests.

How has the project developed since the last report?

  1. 2nd ASF release MADlib v1.9 released on April 6, 2016.  The goal of
this 2nd release was general availability of MADlib v1.9 for community use.
  2. 3rd ASF release MADlib v1.9.1 anticipated this summer depending on
community input.  Features include:  path functions (phase 2), 1-class
support vector machines for novelty detection, prediction metrics,
sessionization, pivoting.
  3. 2 JIRAs created and 14 resolved in last 30 days.

Date of last release:

  MADlib v1.9 on 4/6/16.

When were the last committers or PMC members elected?

  Xiaocheng Tang on 1/14/16.


Sessionization min time param

2016-06-01 Thread Frank McQuillan
There has been some good discussion on the sessionization feature lately,
e.g.,
https://github.com/apache/incubator-madlib/pull/44
Thanks to Jim Nasby for his comments.

I broke sessionization into 3 phases, of which Phases 1 and 2 are being worked
on now by Nandish Jayaram:

Sessionization - Phase 1
https://issues.apache.org/jira/browse/MADLIB-909

Sessionization - Phase 2 (output controls)
https://issues.apache.org/jira/browse/MADLIB-1001


Phase 3 has no assignee yet.  Is anyone interested in thinking about how
to implement a min time parameter, either with window functions or another
approach?

Sessionization - Phase 3 (minimum time)
https://issues.apache.org/jira/browse/MADLIB-1002

Frank
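
For readers new to the feature, here is a rough Python sketch of what
sessionization is commonly taken to mean: within each partition (a user id
here), events are ordered by time, and a new session starts whenever the gap
since the previous event exceeds a threshold.  That definition, the function
name and the max_gap parameter are assumptions made for this illustration; the
sketch is not MADlib's implementation and does not attempt the Phase 3 min
time parameter.

```python
from itertools import groupby
from operator import itemgetter


def sessionize(events, max_gap):
    """Return (user, timestamp, session_id) rows.

    events:  iterable of (user, timestamp) pairs in any order
    max_gap: largest gap, in the timestamp's units, that keeps two
             consecutive events of one user in the same session
    """
    result = []
    ordered = sorted(events, key=itemgetter(0, 1))  # by user, then time
    for user, rows in groupby(ordered, key=itemgetter(0)):
        session_id = 0
        previous_time = None
        for _, timestamp in rows:
            # The first event, or a gap larger than max_gap, opens a session.
            if previous_time is None or timestamp - previous_time > max_gap:
                session_id += 1
            result.append((user, timestamp, session_id))
            previous_time = timestamp
    return result


events = [("a", 0), ("a", 5), ("a", 100), ("b", 3), ("a", 104), ("b", 50)]
sessions = sessionize(events, max_gap=30)
for row in sessions:
    print(row)
```

In SQL the same per-partition comparison with the previous row is the kind of
thing a window function over the partition, ordered by time, can express.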


Re: Open office hours for Apache MADlib at Apache:Big Data 5:30 PM PST today

2016-05-10 Thread Frank McQuillan
Thanks for setting this up, Greg.  Happy to attend and speak with the fine
folks from the Apache community.

Frank

On Tue, May 10, 2016 at 9:11 AM, Gregory Chase  wrote:

> Live from the floor of Apache Con, the Apache MADlib project is holding
> open office hours.  Bring any question you want, or drop by for a quick
> introduction.
>
> Find us in the ODPi Lounge next to the ODPi booth.
>
> You can also join virtually at:
> pivotalcommunity.adobeconnect.com/hawqnest/ 
>
> --
> Greg Chase
>
> Global Head, Big Data Communities
> http://www.pivotal.io/big-data
>
> Pivotal Software
> http://www.pivotal.io/
>
> 650-215-0477
> @GregChase
> Blog: http://geekmarketing.biz/
>
>


Re: Suggestions for creative commons / royalty free music for MADlib theme song

2016-05-05 Thread Frank McQuillan
+1

On Thu, May 5, 2016 at 4:28 PM, Gregory Chase  wrote:

> Here's one that meets the criteria, and is fun.
>
> http://dig.ccmixter.org/files/JeffSpeed68/53617
>
> Nice Blues & Rock mix instrumental.
>
> If I get a +1 I'll use it.
>
> -Greg
>
> On Thu, May 5, 2016 at 4:05 PM, Gregory Chase  wrote:
>
>> Hi MADlib devs & users,
> >> I've been adding theme music for the virtual meetings of a number of the
> >> other communities I help grow.
>>
>> It makes our talks a little more fun and approachable.
>>
>> Apache MADlib needs a good one.
>>
> >> Any suggestions are welcome.  The only caveat is that the music needs
>> to be free for distribution through a creative commons license and be
>> royalty free.
>>
>> Apache Geode, for example, had a committer who is also a composer who
>> contributed one of his tracks.
>>
>> Hear the result yourself: https://www.youtube.com/watch?v=tOAe1n7Qyd8
>>
>> --
>> Greg Chase
>>
>> Global Head, Big Data Communities
>> http://www.pivotal.io/big-data
>>
>> Pivotal Software
>> http://www.pivotal.io/
>>
>> 650-215-0477
>> @GregChase
>> Blog: http://geekmarketing.biz/
>>
>>
>
>
> --
> Greg Chase
>
> Global Head, Big Data Communities
> http://www.pivotal.io/big-data
>
> Pivotal Software
> http://www.pivotal.io/
>
> 650-215-0477
> @GregChase
> Blog: http://geekmarketing.biz/
>
>


Re: Prediction Metrics

2016-04-11 Thread Frank McQuillan
Orhan,

I think this is a good addition to MADlib.  Regarding your questions:

1) Seems like a good set of prediction metrics to start with.  If other
members of the community would like to add more, they are welcome to create
a JIRA for those and work on them.

2) Suggest we do include grouping as an optional param, since it could be
very useful.  It means an output table is the way to go.  Without grouping,
an output table with a single value is not ideal but OK, since consistency
of output format is useful.

Frank



On Fri, Apr 8, 2016 at 3:54 PM, Orhan Kislal  wrote:

> Hello MADlib community,
>
> I think it might make sense to add a module to MADlib for prediction
> metrics.
> Since there are quite a few options, I decided to start with the list of
> metrics from PDLTools [1]. You can see my proposed interface in the
> attachment to the associated JIRA [2,3]. I'll paste a snippet just as an
> example. I would like the feedback of the community on a number of
> questions that came up.
>
> 1) Are there any other metrics that should take precedence over these ones?
> Please note that binary_classifier reports multiple metrics (tpr, fpr, acc,
> f1
> etc.)
>
> 2) How should we handle grouping? As you can see in the example, the
> function
> returns a double value for regular execution but an output table is used if
> grouping parameter is passed. This dual interface doesn't seem clean and
> returning a table with a single value for the regular execution feels
> wrong.
>
> Thanks
>
> Orhan Kislal
>
>
> [1]
>
> http://pivotalsoftware.github.io/PDLTools/group__grp__prediction__metrics.html
>
> [2] https://issues.apache.org/jira/browse/MADLIB-907
>
> [3]
> https://issues.apache.org/jira/secure/attachment/12797816/interface_v1.sql
>
> ---
>
> CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.area_under_roc(
>     table_in        TEXT,
>     prediction_col  TEXT,
>     observed_col    TEXT,
>     table_out       TEXT,
>     grouping_col    TEXT
> ) RETURNS VOID
> AS $$
>     PythonFunctionBodyOnly(`pred_metrics', `pred_metrics')
>     return pred_metrics.area_under_roc(schema_madlib,
>         table_in, prediction_col, observed_col, table_out, grouping_col)
> $$ LANGUAGE plpythonu
> m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `');
>
> CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.area_under_roc(
>     table_in        TEXT,
>     prediction_col  TEXT,
>     observed_col    TEXT
> ) RETURNS DOUBLE PRECISION
> AS $$
>     PythonFunctionBodyOnly(`pred_metrics', `pred_metrics')
>     return pred_metrics.area_under_roc(schema_madlib,
>         table_in, prediction_col, observed_col)
> $$ LANGUAGE plpythonu
> m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `');
>
> ---
>
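
To make the grouping question concrete, here is a small Python sketch of one
way an "always return a table" interface could behave.  It is an assumption
for illustration, not the proposed MADlib code: the AUC is computed with the
simple pairwise (rank) definition, ties count as half, and the function names
and the (group, prediction, observed) row layout are invented.

```python
from collections import defaultdict


def area_under_roc(predictions, observed):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [p for p, y in zip(predictions, observed) if y == 1]
    negatives = [p for p, y in zip(predictions, observed) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative row")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))


def auc_table(rows, by_group=False):
    """Return a list of (group, auc) rows.

    Without grouping the result is a single row whose group is None, so the
    output has the same shape in both cases.
    """
    buckets = defaultdict(lambda: ([], []))
    for group, prediction, label in rows:
        key = group if by_group else None
        buckets[key][0].append(prediction)
        buckets[key][1].append(label)
    return [(key, area_under_roc(p, y)) for key, (p, y) in buckets.items()]


rows = [
    ("x", 0.9, 1), ("x", 0.2, 0), ("y", 0.3, 1),
    ("y", 0.6, 0), ("x", 0.8, 1), ("y", 0.7, 1),
]
overall = auc_table(rows)
grouped = auc_table(rows, by_group=True)
print(overall)
print(grouped)
```

The single-row result for the ungrouped case is the trade-off discussed in the
reply: slightly awkward for one value, but callers handle one output format.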


[RESULT][VOTE] MADlib v1.9-rc1

2016-04-06 Thread Frank McQuillan
(re-posting with proper subject line)

The vote has PASSED with 3 +1 binding votes from the Incubator PMC members,
and no 0 or -1 votes:

+1 Justin Mclean
+1 Roman Shaposhnik
+1 Konstantin Boudnik

Thread:
http://mail-archives.apache.org/mod_mbox/incubator-general/201604.mbox/%3CCAKBQfzQ2abean0rBXjNb8wvfi7bsVr2HP0dVDWTK%3D3%2BAuRtNog%40mail.gmail.com%3E

On behalf of the MADlib community, thank you to those who reviewed and
voted on this release candidate.

We will proceed with promoting this release candidate.

Regards,
Frank McQuillan


[RESULT][VOTE] MADlib v1.9-rc1

2016-04-06 Thread Frank McQuillan
The vote has PASSED with 3 +1 binding votes from the Incubator PMC members,
and no 0 or -1 votes:

+1 Justin Mclean
+1 Roman Shaposhnik
+1 Konstantin Boudnik

Thread:
http://mail-archives.apache.org/mod_mbox/incubator-general/201604.mbox/%3CCAKBQfzQ2abean0rBXjNb8wvfi7bsVr2HP0dVDWTK%3D3%2BAuRtNog%40mail.gmail.com%3E

On behalf of the MADlib community, thank you to those who reviewed and
voted on this release candidate.

We will proceed with promoting this release candidate.

Regards,
Frank McQuillan


Re: pca_train error

2016-04-06 Thread Frank McQuillan
Thanks for the update Esther.

Frank

On Wed, Apr 6, 2016 at 3:53 PM, Esther Vasiete <evasi...@pivotal.io> wrote:

> Upgrading to MADlib 1.8 solved the problem!
>
> Thanks,
> Esther
>
> On Tue, Apr 5, 2016 at 10:27 AM, Esther Vasiete <evasi...@pivotal.io>
> wrote:
>
>> Oh sorry, it is HAWQ 1.3.1.
>>
>> And the data engineer will upgrade to MADlib 1.8 tonight.
>>
>> Thanks,
>> Esther
>>
>> On Tue, Apr 5, 2016 at 9:26 AM, Frank McQuillan <fmcquil...@pivotal.io>
>> wrote:
>>
>>> Please clarify the platform - do you mean GPDB 4.2.0?
>>>
>>> Would you be able to upgrade to MADlib 1.8?  Then you are using the
>>> latest software and we can see if you still have a problem.
>>>
>>> Frank
>>>
>>> On Tue, Apr 5, 2016 at 9:20 AM, Esther Vasiete <evasi...@pivotal.io>
>>> wrote:
>>>
>>>> I am using MADlib 1.7.1 on HAWQ 4.2.0.
>>>>
>>>> Thanks.
>>>>
>>>> On Mon, Apr 4, 2016 at 8:04 PM, Frank McQuillan <fmcquil...@pivotal.io>
>>>> wrote:
>>>>
>>>>> Thanks for the question, Esther.  What version of MADlib are you using
>>>>> and what database platform and version are you running on?
>>>>>
>>>>> It seems to be a MADlib version lower than 1.8 since the error message
>>>>> you report is different in the 1.8 release.  (There was a bug fix in 1.8 
>>>>> to
>>>>> allow user-specified column names in PCA.)
>>>>>
>>>>> Frank
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Apr 4, 2016 at 4:27 PM, Esther Vasiete <evasi...@pivotal.io>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am trying to use pca_train but I am running into this error:
>>>>>>
>>>>>> ERROR: plpy.SPIError: plpy.SPIError: plpy.SPIError: plpy.SPIError:
>>>>>> Function "madlib.__matrix_densify_sfunc(double
>>>>>> precision[],integer,integer,double precision)": invalid argument - col
>>>>>> should be in the range of [0, col_dim)  (seg35 awsaiuirl1178:40003
>>>>>> pid=104068) (plpython.c:4648)
>>>>>> SQL state: XX000
>>>>>> Context: Traceback (most recent call last):
>>>>>>   PL/Python function "pca_train", line 23, in <module>
>>>>>> return pca.pca(**globals())
>>>>>>   PL/Python function "pca_train", line 404, in pca
>>>>>> PL/Python function "pca_train"
>>>>>>
>>>>>> My input table has 15472 rows and two columns; a row_id and an array
>>>>>> with 853 features. I am calling pca_train like this:
>>>>>>
>>>>>> DROP TABLE if exists ev.hci_subset_pca_output;
>>>>>> SELECT madlib.pca_train( 'ev.hci_subset_pca_input',
>>>>>>'ev.hci_subset_pca_output',
>>>>>>'row_id',
>>>>>> 3);
>>>>>>
>>>>>> I unfortunately cannot share the data but this is how it looks in
>>>>>> pgAdmin3. Note that pgAdmin3 won't show a feature_vector that is too
>>>>>> large, which is why it appears to be empty, but it isn't, as you can
>>>>>> see in the second screenshot.
>>>>>>
>>>>>> [image: Inline image 1]
>>>>>>
>>>>>> [image: Inline image 3]
>>>>>>
>>>>>> I am not sure why I am running into this error. Please advise.
>>>>>>
>>>>>> Update: I have renamed feature_vector to "row_vec" and "row_id"
>>>>>> starts with 1. Still getting the same error.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> --
>>>>>> *Esther Vasiete *
>>>>>> *Data Scientist | Pivotal*
>>>>>> evasi...@pivotal.io
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> *Esther Vasiete *
>>>> *Data Scientist | Pivotal*
>>>> evasi...@pivotal.io
>>>>
>>>
>>>
>>
>>
>> --
>> *Esther Vasiete *
>> *Data Scientist | Pivotal*
>> evasi...@pivotal.io
>>
>
>
>
> --
> *Esther Vasiete *
> *Data Scientist | Pivotal*
> evasi...@pivotal.io
>


RE: Board Report

2016-04-05 Thread Frank McQuillan
Yes, thanks for the reminder.

Frank


Sent from my mobile.  Please excuse brevity.

 Original message 
From: "John D. Ament"
Date:04/05/2016 16:45 (GMT-08:00)
To: dev@madlib.incubator.apache.org
Subject: Board Report

Dear Madlib Podling,

Just a reminder about your board report, which is due tomorrow.

John


Re: pca_train error

2016-04-05 Thread Frank McQuillan
Please clarify the platform - do you mean GPDB 4.2.0?

Would you be able to upgrade to MADlib 1.8?  Then you are using the latest
software and we can see if you still have a problem.

Frank

On Tue, Apr 5, 2016 at 9:20 AM, Esther Vasiete <evasi...@pivotal.io> wrote:

> I am using MADlib 1.7.1 on HAWQ 4.2.0.
>
> Thanks.
>
> On Mon, Apr 4, 2016 at 8:04 PM, Frank McQuillan <fmcquil...@pivotal.io>
> wrote:
>
>> Thanks for the question, Esther.  What version of MADlib are you using
>> and what database platform and version are you running on?
>>
>> It seems to be a MADlib version lower than 1.8 since the error message
>> you report is different in the 1.8 release.  (There was a bug fix in 1.8 to
>> allow user-specified column names in PCA.)
>>
>> Frank
>>
>>
>>
>>
>>
>> On Mon, Apr 4, 2016 at 4:27 PM, Esther Vasiete <evasi...@pivotal.io>
>> wrote:
>>
>>> Hi,
>>>
>>> I am trying to use pca_train but I am running into this error:
>>>
>>> ERROR: plpy.SPIError: plpy.SPIError: plpy.SPIError: plpy.SPIError:
>>> Function "madlib.__matrix_densify_sfunc(double
>>> precision[],integer,integer,double precision)": invalid argument - col
>>> should be in the range of [0, col_dim)  (seg35 awsaiuirl1178:40003
>>> pid=104068) (plpython.c:4648)
>>> SQL state: XX000
>>> Context: Traceback (most recent call last):
>>>   PL/Python function "pca_train", line 23, in <module>
>>> return pca.pca(**globals())
>>>   PL/Python function "pca_train", line 404, in pca
>>> PL/Python function "pca_train"
>>>
>>> My input table has 15472 rows and two columns; a row_id and an array
>>> with 853 features. I am calling pca_train like this:
>>>
>>> DROP TABLE if exists ev.hci_subset_pca_output;
>>> SELECT madlib.pca_train( 'ev.hci_subset_pca_input',
>>>'ev.hci_subset_pca_output',
>>>'row_id',
>>> 3);
>>>
>>> I unfortunately cannot share the data but this is how it looks in
>>> pgAdmin3. Note that pgAdmin3 won't show a feature_vector that is too
>>> large, which is why it appears to be empty, but it isn't, as you can see in
>>> the second screenshot.
>>>
>>> [image: Inline image 1]
>>>
>>> [image: Inline image 3]
>>>
>>> I am not sure why I am running into this error. Please advise.
>>>
>>> Update: I have renamed feature_vector to "row_vec" and "row_id" starts
>>> with 1. Still getting the same error.
>>>
>>> Thanks,
>>>
>>> --
>>> *Esther Vasiete *
>>> *Data Scientist | Pivotal*
>>> evasi...@pivotal.io
>>>
>>>
>>>
>>
>
>
> --
> *Esther Vasiete *
> *Data Scientist | Pivotal*
> evasi...@pivotal.io
>


Update on MADlib 1.9 release

2016-04-03 Thread Frank McQuillan
Hello MADlib community,

A vote has been proposed to the Incubator PMC regarding MADlib 1.9:
http://mail-archives.apache.org/mod_mbox/incubator-general/201604.mbox/%3CCAKBQfzQ2abean0rBXjNb8wvfi7bsVr2HP0dVDWTK%3D3%2BAuRtNog%40mail.gmail.com%3E

This will be the 2nd ASF release for Apache MADlib (incubating).  The goal
of this 2nd release is:  general availability of MADlib v1.9 for community
use.

The software in this release is very similar to the 1st ASF release MADlib
v1.9alpha on 3/11/16.  The main differences are bug fixes, license and
notice clarifications, and minor updates.  Feature set is the same.

Reminder of the Apache MADlib (incubating) community vote:
http://mail-archives.apache.org/mod_mbox/incubator-madlib-dev/201602.mbox/%3CCAKBQfzSkXyGVQSKrY99zc9UmTE_NfXcYrxDGB%3DCMBmuCKLxbAg%40mail.gmail.com%3E

For more information including release notes, please see:
https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.9

Thank you to the MADlib community for your contributions, and we will keep
you apprised of the result of this vote.

Regards,
Frank McQuillan


Re: Contributing GMM and Perceptron to MADLib

2016-03-28 Thread Frank McQuillan
Thanks Roman.  I was able to do it just now.

Frank

On Mon, Mar 28, 2016 at 9:12 PM, Roman Shaposhnik <r...@apache.org> wrote:

> I can help with that -- stay tuned.
>
> On Mon, Mar 28, 2016 at 8:29 PM, Frank McQuillan <fmcquil...@pivotal.io>
> wrote:
> > Let me figure out how to do this and add Aditya as the owner of that
> JIRA.
> > My initial attempts in ASF infra-land were not quite successful.
> >
> > Frank
> >
> > On Mon, Mar 28, 2016 at 4:54 PM, Rahul Iyer <ri...@pivotal.io> wrote:
> >>
> >> @Frank, Roman: I believe Aditya needs to be added as a developer to the
> >> MADlib project to assign a JIRA to him? Is this only available to the
> >> lead/owner?
> >>
> >> On Mon, Mar 28, 2016 at 3:49 PM, Aditya Nain <adityana...@gmail.com>
> >> wrote:
> >>>
> >>> Hi Rahul,
> >>>
> >>> I didn't have an id, so I created one now.
> >>> My id is : Aditya Nain
> >>>
> >>> Thanks,
> >>> Aditya
> >>>
> >>> On Mon, Mar 28, 2016 at 6:40 PM, Rahul Iyer <ri...@pivotal.io> wrote:
> >>>
> >>> > I can assign this to you, but you need to have an account in
> >>> > https://issues.apache.org.
> >>> > If you already have an account, then please send your id - I wasn't
> >>> > able to
> >>> > find you just using your name.
> >>> >
> >>> > On Mon, Mar 28, 2016 at 3:31 PM, Aditya Nain <adityana...@gmail.com>
> >>> > wrote:
> >>> >
> >>> > > Hi Rahul,
> >>> > >
> >>> > > Thanks for the reply!
> >>> > >
> >>> > > I am working on implementing Gaussian Mixture Model assuming that
> the
> >>> > > co-variance matrix is same for all the Gaussians.
> >>> > > The JIRA which deals with GMM is MADLIB-410:
> >>> > >
> >>> >
> >>> >
> https://issues.apache.org/jira/browse/MADLIB-410?jql=project%20%3D%20MADLIB
> >>> > >
> >>> > > Can this be assigned to me, or how do I get it assigned to me?
> >>> > >
> >>> > > Thanks,
> >>> > > Aditya
> >>> > >
> >>> > > On Mon, Mar 21, 2016 at 3:41 PM, Rahul Iyer <ri...@pivotal.io>
> wrote:
> >>> > >
> >>> > > > Hi Aditya,
> >>> > > >
> >>> > > > Welcome to the MADlib community!
> >>> > > >
> >>> > > > Gaussian mixture models are extremely useful and we would
> heartily
> >>> > > welcome
> >>> > > > a contribution for it. The SQLEM paper might be oversimplifying
> the
> >>> > > > capabilities of the database (e.g. assuming there is no array
> type
> >>> > > > is
> >>> > > > unnecessary for Postgresql). You could speed things (both dev
> time
> >>> > > > and
> >>> > > > execution time) by writing some of the functions in C++. K-means
> is
> >>> > > > an
> >>> > > > example of how clustering is implemented.
> >>> > > > IMO, assuming the same covariance matrix is reasonable. We could
> >>> > > > extend
> >>> > > the
> >>> > > > capabilities after the initial implementation is complete.
> >>> > > >
> >>> > > > There was some work started a long time ago that built
> perceptrons
> >>> > using
> >>> > > > the convex framework (link
> >>> > > > <https://github.com/iyerr3/madlib/tree/mlp
> >>> > >).
> >>> > > > There are still some bugs in that code since the trained network
> >>> > > > isn't
> >>> > > > converging. You could start there or build a new module - either
> >>> > > > way
> >>> > an
> >>> > > > MLP module is frequently demanded by the data science community.
> >>> > > >
> >>> > > > I would suggest starting with Gaussian mixtures and then moving
> to
> >>> > > > perceptrons if GMM work is completed.
> >>> > > >
> >>> > > > Feel free to ask questions on this forum. Looking forward to
> >>> > > collaborating
> >>> > 
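For readers following the GMM discussion above, here is a minimal sketch of what "the same covariance for all the Gaussians" means in the EM updates. It is plain Python for one-dimensional data, so the shared covariance matrix reduces to a single shared variance. The function name em_step and the toy data are illustrative assumptions and are not taken from MADlib or from the JIRA.

```python
import math


def em_step(xs, weights, means, var):
    """One EM iteration for a 1-D Gaussian mixture with one shared variance.

    Assumes every mixing weight is strictly positive and var > 0.
    """
    k = len(means)
    n = len(xs)

    # E-step: resp[i][j] is the posterior probability of component j given xs[i].
    # Because the variance is shared, the 1/sqrt(2*pi*var) factor is identical
    # for all components and cancels when the responsibilities are normalised.
    resp = []
    for x in xs:
        logs = [math.log(weights[j]) - (x - means[j]) ** 2 / (2.0 * var)
                for j in range(k)]
        top = max(logs)  # subtract the maximum for numerical stability
        exps = [math.exp(v - top) for v in logs]
        total = sum(exps)
        resp.append([e / total for e in exps])

    # M-step: weights and means are updated per component, but the variance
    # is pooled over all components and all points.
    nk = [sum(r[j] for r in resp) for j in range(k)]
    new_weights = [nk[j] / n for j in range(k)]
    new_means = [sum(r[j] * x for r, x in zip(resp, xs)) / nk[j]
                 for j in range(k)]
    new_var = sum(resp[i][j] * (xs[i] - new_means[j]) ** 2
                  for i in range(n) for j in range(k)) / n
    return new_weights, new_means, new_var


# Toy data with two well-separated groups (made up for illustration).
xs = [0.0, 0.2, -0.1, 5.0, 5.1, 4.9]
weights, means, var = [0.5, 0.5], [-1.0, 6.0], 1.0
for _ in range(20):
    weights, means, var = em_step(xs, weights, means, var)
```

The only difference from the general case is the last M-step line: instead of one variance per component, the squared deviations are summed over every component and divided by the total number of points.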

Re: [gpdb-users] Re: [VIRTUAL] Reminder: Apache MADlib on PostgreSQL meeting tomorrow, Weds, 9AM PST

2016-03-21 Thread Frank McQuillan
The MADlib community calls happen 1x per month.  I think this last one was
the first time we advertised the event on the Greenplum and HAWQ
mailing lists (though I could be wrong about this, Greg Chase can correct
me).

At any rate we had good attendance last week and a fun discussion, so if
there is no objection, we will continue to give notice of these events on
the Greenplum and HAWQ dev/user mailing lists.

One topic for the next meeting is the new path algorithm in upcoming MADlib
1.9 (kinda like Aster nPath), which may be of interest to SQL folks too,
not just machine learning folks.

Frank

On Sun, Mar 20, 2016 at 6:54 PM, Jim Nasby  wrote:

> On 3/18/16 7:52 PM, Jon Roberts wrote:
>
>> Some of the functions from pg_similarity are available with FuzzyStrMatch
>> which is included with Greenplum database.
>> http://gpdb.docs.pivotal.io/4370/ref_guide/extensions/fuzzystrmatch.html
>>
>> It includes Soundex, Levenshtein, Metaphone, and Double Metaphone.  Soundex
>> and Levenshtein are both in pg_similarity.
>>
>
> It's also possible to add extensions even if Greenplum doesn't formally
> support them. Extensions are ultimately just SQL scripts.
>
> BTW, it would be nice if there was more heads-up on these calls. Do they
> happen on a regular basis?
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>


k-NN

2016-03-18 Thread Frank McQuillan
Moving discussion from JIRA to dev mailing list...

@anishsingh and @tianwei37
Thank you for your interest in k-NN
https://issues.apache.org/jira/browse/MADLIB-927

I think this is a great algorithm to work on.

May I suggest that you review some of the material on the MADlib wiki
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61319606

In particular the architecture
https://cwiki.apache.org/confluence/display/MADLIB/Architecture

and the quick start guide for developers
https://cwiki.apache.org/confluence/display/MADLIB/Quick+Start+Guide+for+Developers

That will give you a sense of how in-database machine learning works in
MADlib.

Please post any questions that you have to this mailing list and we will be
pleased to help you.

Frank


Subject: [RESULT][VOTE] MADlib v1.9alpha-rc2

2016-03-11 Thread Frank McQuillan
The vote has PASSED with 3 +1 binding votes from the Incubator PMC members,
and no 0 or -1 votes:

+1 Justin Mclean
+1 Roman Shaposhnik
+1 Konstantin Boudnik

Thread:
http://mail-archives.apache.org/mod_mbox/incubator-general/201603.mbox/%3CCAKBQfzQh%3DJ3DrFSgFEY8teRDpEf5Yz3r7eBffTZVVN_9evpBJg%40mail.gmail.com%3E

On behalf of the MADlib community, thank you to all who reviewed and voted
on this release candidate.

We will proceed with promoting this release candidate.

Regards,
Frank McQuillan


Spatial model in MADlib (GWR)

2016-03-09 Thread Frank McQuillan
Hi ChenLiang Wang,

I am checking to see how things are going regarding the GWR model for
MADlib that you proposed.  Not sure which phase you are at, but a suggested
next step might be to describe how you plan to implement the GWR algorithm
in a distributed manner.  That is, how will it run in parallel?

(Starting as a new thread since the previous thread fragmented.)

Regards,
Frank


Re: Adding a script generate test data table for CRF

2016-02-23 Thread Frank McQuillan
Jim's approach seems like a reasonable way to go.

Giang, can you create a JIRA for this request?  You are welcome to start
working on it if you would like to contribute this to improve CRF usability.

Frank

On Tue, Feb 23, 2016 at 3:24 PM, Jim Nasby  wrote:

> On 2/23/16 11:07 AM, Nguyen,Giang H wrote:
>
>> I think it could be very helpful if we write a Python script in MADlib to
>> tokenize words, assign the doc_id and start_pos correspondingly, and
>> store the result in the database. Users could then save a lot of time when
>> using CRF and conveniently run the CRF model on large test
>> data.
>>
>
> Perhaps the Postgres text search stuff could be used for this (maybe
> to_tsvector())?
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
>
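
As a rough illustration of the tokenizing script proposed in this thread, the sketch below splits each document into tokens and emits one (doc_id, start_pos, token) row per token. It assumes that start_pos is the zero-based index of the token within its document and that punctuation marks count as separate tokens. The function name tokenize_documents and the regular expression are hypothetical choices, not part of MADlib.

```python
import re


def tokenize_documents(docs):
    """Yield (doc_id, start_pos, token) rows for each (doc_id, text) pair.

    start_pos is assumed to be the zero-based token index within the
    document; this is an illustrative convention, not a confirmed one.
    """
    # A run of word characters, or a single non-space punctuation mark.
    token_re = re.compile(r"\w+|[^\w\s]")
    for doc_id, text in docs:
        for start_pos, match in enumerate(token_re.finditer(text)):
            yield doc_id, start_pos, match.group(0)


rows = list(tokenize_documents([
    (1, "MADlib runs in-database."),
    (2, "Hello, world"),
]))
```

The resulting rows could then be bulk-loaded into a table with the database's usual loading mechanism. Jim's to_tsvector() suggestion would instead keep the tokenizing step inside Postgres.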


[VOTE] MADlib v1.9alpha-rc1

2016-02-19 Thread Frank McQuillan
Hello MADlib community,

We have created a MADlib 1.9 alpha release, with the artifacts below up for
a vote.

This is the 1st release for Apache MADlib (incubating).

First of all, a big thanks to Orhan Kislal for being the release manager
for this release.

There are two main goals of this release:
* Clear all potential IP issues in the code base and make it legally ready
to be adopted by the community.
* Share the new features that have been developed so far, in order to give
the community a good sense of the upcoming 1.9 release.

For more information including release notes, please see:
https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.9+alpha

This is a source code tarball only release.

*** Please download, review and vote by Wed Feb 24, 2016 @ 6pm PST ***

We're voting on the source (tag):
rc/v1.9alpha-rc1

Source Files:
https://dist.apache.org/repos/dist/dev/incubator/madlib/1.9alpha-incubating-rc1/

Commit to be voted on:
https://git-wip-us.apache.org/repos/asf?p=incubator-madlib.git;a=commit;h=581d07b03ba6c7f81fd791548f1b0f7c4909c710

KEYS file containing PGP Keys we use to sign the release:
https://dist.apache.org/repos/dist/dev/incubator/madlib/KEYS

To help in tallying the vote, PMC members please be sure to indicate
"(binding)" with your vote.

[ ] +1  approve
[ ] +0  no opinion
[ ] -1  disapprove (and reason why)


Thank you,
Frank McQuillan


Follow up questions from community call

2016-02-16 Thread Frank McQuillan
Thank you to Chenliang Wang for presenting Geographically Weighted
Regression (GWR) analysis of spatial data at the MADlib community meeting.

Here are some follow up questions that we did not get to in the meeting.
Chenliang, could you briefly respond?

1) what type of matrix operations are required?

2) how will the algorithm be parallelized?

3) is raster support for PostGIS a requirement?  (GPDB currently does not
support it, as per
http://gpdb.docs.pivotal.io/4340/ref_guide/postGIS.html#topic_wy2_rkb_3p)

4) what does the 160 refer to in your slides?

Thanks again,
Frank


Re: MADlib 1.9 alpha Release

2016-02-12 Thread Frank McQuillan
LGTM

On Fri, Feb 12, 2016 at 4:55 PM, Orhan Kislal  wrote:

> Dear MADlib Community,
>
> We have been working on the 1.9 alpha release tracked by this JIRA
> https://issues.apache.org/jira/browse/MADLIB-957. It seems most of the
> tasks are completed and the remaining ones like code testing will be
> completed soon. Roman will be publishing the keys. I would like to ask if
> there are any comments or suggestions before we finalize the release.
>
> Thank you for using and contributing to MADlib.
>
> Orhan Kislal
>


New MADlib committer: Xiaocheng Tang

2016-01-13 Thread Frank McQuillan
Dear MADlib dev community,

The Project Management Committee (PMC) for Apache MADlib has asked
Xiaocheng Tang to become a committer and we are pleased to announce that he
has accepted.

Recently Xiaocheng has been working on a completely new version of Support
Vector Machines in addition to making various bug fixes and refinements to
existing algorithms.

Being a committer enables easier contribution to the project since there is
no need to go via the patch submission process.  This should enable better
productivity.  Being a PMC member enables assisting with the management
and guiding the direction of the project.

Welcome Xiaocheng!

Regards,
Frank


Path epic

2015-12-23 Thread Frank McQuillan
Thanks to those who joined the MADlib community call last Friday.

One question came up during the meeting regarding path function work that
is currently in flight.  The epic is:
https://issues.apache.org/jira/browse/MADLIB-903

Happy to hear from anyone who has comments or input on this epic.

Frank


Re: How to contribute a spatial module to MADlib manipulating objects from PostGIS

2015-12-18 Thread Frank McQuillan
Thanks ChenLiang Wang for your interest.

I would repeat Ivan's welcome to you, and I look forward to your
contributions in the area of GIS.

To answer your questions:

1.  Yes, it is possible to call PostGIS functions from MADlib.

2.  Yes, spatial statistics are suitable for MADlib.

For documentation, please refer to the Apache MADlib wiki
http://madlib.incubator.apache.org/

which includes:
Quick Start Guides

Get going with a minimum of fuss.

   - Installation Guide
   - Quick Start Guide for Users
   - Quick Start Guide for Developers

As Ivan mentioned, writing down the functions you would like to build and
the interface is a good place to begin.  Then we can discuss on the open
mailing list.

Regards,
Frank

On Thu, Dec 17, 2015 at 8:11 PM, 王晨 亮  wrote:

> Thanks for your quick reply. Your suggestion is great. I will give
> definitions and descriptions for the spatial statistics functions and a
> comparison with ordinary statistical models.
>
>
> > Date: Thu, 17 Dec 2015 21:56:06 -0500
> > Subject: Re: How to contribute a spatial module to MADlib manipulating
> objects from PostGIS
> > From: inov...@pivotal.io
> > To: dev@madlib.incubator.apache.org
> >
> > Hi ChenLiang,
> >
> > I think your proposal is good and worth trying to do it!
> >
> > Can I suggest, as a first step, that you send a proposal of the function
> > definitions, the parameters and return values, as well as a description of
> > the functions and what they do.
> >
> > Based on that we can discuss the design of the interface and once it
> looks
> > good you can start working on the actual implementation of the coding.
> > When you get to implementation we can help you on technical challenges.
> >
> > Cheers,
> > Ivan
> >
> >
> >
> >
> >
> > On Thu, Dec 17, 2015 at 9:50 PM, 王晨 亮  wrote:
> >
> > > Hi MADlib Developers,
> > >
> > >
> > >
> > >
> > > I am a GIS researcher and have some knowledge of PostGIS, Python,
> > > C/C++, Java and R.
> > >
> > >
> > >
> > > I have learned some spatial statistical models during my PhD research in
> > > GIS. Recently, I translated GWR (Geographically Weighted
> > > Regression) from R into Java for my company, and I would like to
> > > contribute to MADlib if possible.  I believe PostGIS and MADlib are the
> > > most powerful extensions of PostgreSQL. Therefore, a spatial statistical
> > > module connecting the two libraries could be significant. If I can start
> > > the task, the first goal to implement will be the GWR model.
> > >
> > >
> > >
> > > Now I am reading the developer guide of MADlib. I am not quite sure how to
> > > contribute a geospatial module to MADlib. Is it possible to manipulate
> > > spatial objects or attributes from PostGIS in MADlib?
> > >
> > >
> > >
> > > So could anyone suggest a few pointers & links that I can follow to get
> > > to know:
> > >
> > >
> > >
> > > 1. how to deal with these dependencies in MADlib?
> > >
> > >
> > >
> > > 2. whether the spatial statistics module is suitable for MADlib?
> > >
> > >
> > >
> > > Thank you in advance.
> > >
> > >
> > > ChenLiang Wang
> > >
> > >
>
>


Re: DRAFT Apache MADlib (incubating) report for Dec 2015

2015-12-03 Thread Frank McQuillan
Roman added user and dev mailing list numbers.

Report has been posted @
https://wiki.apache.org/incubator/December2015

Frank

On Tue, Dec 1, 2015 at 10:10 PM, Frank McQuillan <fmcquil...@pivotal.io>
wrote:

> Here is the draft report for Dec.  Please let me know if you have any
> additions/comments/suggestions.  It is due by close of business Wed 12/2.
> Sorry for not providing more time for review on this mailing list.
>
> Thx,
> Frank
>
> ---
>
> MADlib
>
> Big Data Machine Learning in SQL for Data Scientists.
>
> MADlib has been incubating since 2015-09-15.
>
> Three most important issues to address in the move towards graduation:
>
>   1. Produce a first Apache (incubating) release.
>   2. Expand the community, increase dev list activity and add new
> committers/pmc members.
>   3. Execute and manage the project according to governance model required
> by the "Apache Way".
>
> Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
> aware of?
>
>  None
>
> How has the community developed since the last report?
>
>   1. First community call held 11/20/15.  There were approximately 10
> attendees, about half were from outside of the current group of MADlib
> contributors.  This will be a monthly call, possibly moving to 2x per month
> in the future.
>   2. Meetup 12/3/15 @ Pivotal Labs, San Francisco:  “MADlib and HAWQ for
> Advanced SQL Machine Learning on Hadoop”.  One goal of this meetup is to
> invite new community participation in MADlib.
>   3. Material technical conversations are now happening on the dev mailing
> lists and in the appropriate JIRAs.  E.g., 53 emails on dev in Nov compared
> with 7 in Oct.
>   4. Number of user and dev mailing list subscribers:   the numbers>
>
> How has the project developed since the last report?
>
>   1. 31 JIRAs created and 7 resolved in last 30 days.
>   2. Proposed scope for first Apache MADlib release has been described to
> the community for comment.  This release includes IP cleanliness and new
> features.
>   3. The MADlib wiki
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61319606
> has been updated with new content, including a new contributors guideline,
> an FAQ and a page listing suggestions for first time contributors (these
> have also been labeled “starter” in the JIRAs).
>
> Date of last release:
>
>   No release yet.
>
> When were the last committers or PMC members elected?
>
>   No new members added on top of the initial committer list.
>
> Signed-off-by:
>
>   [ ](madlib) Konstantin Boudnik
>   [ ](madlib) Ted Dunning
>   [ ](madlib) Roman Shaposhnik
>
> Shepherd/Mentor notes:
>
>