Re: Jenkins PR build is failing

2017-08-18 Thread Nandish Jayaram
Thank you for investigating it Ed. :)

NJ

On Fri, Aug 18, 2017 at 11:42 AM, Ed Espino <esp...@apache.org> wrote:

> NJ,
>
> I took a quick look at the build console output and it appears "qnode3
> (ubuntu)" has exhausted its available disk space.  Additionally, it looks
> as if this system has been indulging a bit too much lately (too many Stan's
> donuts?), impacting other Jenkins jobs:
> https://issues.apache.org/jira/browse/INFRA-14838?jql=text%20~%20%22qnode3%22
>
> I also triggered additional PR builds and they all failed as they went to
> the same Jenkins build resource.
>
> I have added a comment to INFRA-14838 for the MADlib PR build issue. You
> may need to open a new INFRA ticket to address the current issue.
>
> Top of build console output identifying the Jenkins build slave resource:
>
> Building remotely on qnode3
> <https://builds.apache.org/computer/qnode3> (ubuntu) in workspace
> /home/jenkins/jenkins-slave/workspace/madlib-pr-build
>
>
> -=e
>
> On Fri, Aug 18, 2017 at 10:53 AM, Nandish Jayaram <njaya...@pivotal.io>
> wrote:
>
> > Hi,
> >
> > The latest PR build on Jenkins (
> > https://builds.apache.org/user/riyer/my-views/view/MADlib-Monitor/job/madlib-pr-build/170/)
> > is failing with a
> > `java.io.IOException: No space left on device` error.
> > Can somebody with a Jenkins account have a look at
> > it please?
> >
> > NJ
> >
>
>
>
> --
> *Ed Espino*
>
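
For anyone who wants to catch a full disk before a build fails on it, a pre-build check along these lines is one option. It is only a sketch: the helper names `used_percent` and `has_headroom` and the 90% threshold are made up for illustration and are not part of the MADlib Jenkins job. `WORKSPACE` is the variable Jenkins exports for the job's workspace directory.

```shell
#!/bin/sh
# Print the percentage of space used on the filesystem that holds a path.
# "df -P" produces POSIX-format output; its fifth column is the capacity (e.g. "42%").
used_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed only if usage is strictly below the given threshold (in percent).
has_headroom() {
    [ "$(used_percent "$1")" -lt "$2" ]
}

# Jenkins exports WORKSPACE; fall back to the current directory elsewhere.
workspace=${WORKSPACE:-.}

# The 90% threshold is an arbitrary illustrative value.
if has_headroom "$workspace" 90; then
    echo "disk ok: $(used_percent "$workspace")% used"
else
    echo "disk nearly full: $(used_percent "$workspace")% used" >&2
fi
```

A check like this only reports the condition. Freeing space on a shared build node such as qnode3 would still be a matter for an INFRA ticket.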


Jenkins PR build is failing

2017-08-18 Thread Nandish Jayaram
Hi,

The latest PR build on Jenkins (
https://builds.apache.org/user/riyer/my-views/view/MADlib-Monitor/job/madlib-pr-build/170/)
is failing with a
`java.io.IOException: No space left on device` error.
Can somebody with a Jenkins account have a look at
it please?

NJ


JIRA for migrating repos following MADlib's TLP graduation

2017-08-14 Thread Nandish Jayaram
Hi All,

I have opened an Apache Infrastructure ticket to migrate MADlib's
git repos, distribution server, and other common tasks associated
with the move from incubator to TLP. The ticket is:
https://issues.apache.org/jira/browse/INFRA-14872

Please do have a look at it and let me know if I have missed something,
or if something is to be changed. I followed the instructions at
http://www.apache.org/dev/infra-contact#requesting-graduation to open
the ticket, and used the template used by Apache Flex's TLP ticket
https://issues.apache.org/jira/browse/INFRA-5688.

I will keep you posted on the status of the ticket. We might still need to
change some settings in MADlib's Jenkins build, once the git repo move
is finished. I thought that was something we could control and might not
need Infra's help for that (please correct me if I am wrong).

NJ
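
On the contributor side, once a repo has been renamed, existing clones usually only need their remote URL updated. The sketch below shows this in a throwaway repository. The new URL is an assumption about what the post-migration name would be, not something the Infra ticket confirms, and the temporary directory stands in for a real clone.

```shell
#!/bin/sh
# Sandbox demo: a throwaway repository whose "origin" still points at the
# pre-graduation name. No network access is needed; only the stored URL changes.
set -eu

old_url="https://github.com/apache/incubator-madlib.git"
new_url="https://github.com/apache/madlib.git"   # assumed post-migration name

demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"
git remote add origin "$old_url"

# Rewrite the URL in place; branches and tracking configuration are untouched.
git remote set-url origin "$new_url"

# Show the URL that is now stored for origin.
git config --get remote.origin.url
```

The same `git remote set-url` call would apply to whatever remote name a contributor uses for the Apache repo.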


Re: Jenkins madlib-master-build failed

2017-08-11 Thread Nandish Jayaram
Thank you for the info Ed. :)

NJ

On Fri, Aug 11, 2017 at 9:58 AM, Ed Espino  wrote:

> An observant badminton birdie whispered in the wind "I couldn't find a way
> to re-trigger Jenkins master, is it because I don't have a Jenkins
> account?"
>
> It just so happens that I assist with Apache Jenkins support for the Apache
> HAWQ (incubating) project. I requested access from the mentor (The great,
> powerful and kind Roman). It is he who granted me access to the Apache
> Jenkins service. It is through that privilege that I was able to trigger a
> MADlib master build to get the project back to a green state. I'm not sure
> how many team members on the Apache MADlib project have access to this
> service, but I suggest there are at least a few to assist with its
> maintenance.
>
> Who on the team currently has access to the Apache Jenkins service?
>
> -=e
>
> On Thu, Aug 10, 2017 at 4:15 PM, Ed Espino  wrote:
>
> > FYI: The manually triggered Jenkins master build passed:
> > https://builds.apache.org/view/M-R/view/MADlib/job/madlib-master-build/80/
> >
> > -=e
> >
> > On Thu, Aug 10, 2017 at 4:14 PM, Ed Espino  wrote:
> >
> >> Not sure what caused the MADlib master build to fail (git clone issue?).
> >> I have re-triggered it and it is beyond the previous failure point.
> >>
> >> -=e
> >>
> >> Here is the failure for future reference
> >> (https://builds.apache.org/view/M-R/view/MADlib/job/madlib-master-build/79/console):
> >>
> >> Checking out Revision 67b69eb8a5eec1ff5d4b947eabb90970d66b2ac5 (refs/remotes/origin/master)
> >> Commit message: "MADLIB-1133. TLP graduation - remove references to "incubating"."
> >>  > git config core.sparsecheckout # timeout=10
> >>  > git checkout -f 67b69eb8a5eec1ff5d4b947eabb90970d66b2ac5
> >>  > git rev-list 0dc2df94358bb2ec3fd85865a6d53ae7cbde0226 # timeout=10
> >> Extended Email Publisher is currently disabled in project settings
> >> FATAL: Unable to produce a script file
> >> java.io.IOException: Permission denied
> >> at java.io.UnixFileSystem.createFileExclusively(Native Method)
> >> at java.io.File.createTempFile(File.java:2024)
> >> at hudson.FilePath$17.invoke(FilePath.java:1373)
> >> at hudson.FilePath$17.invoke(FilePath.java:1363)
> >> at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2739)
> >> at hudson.remoting.UserRequest.perform(UserRequest.java:153)
> >> at hudson.remoting.UserRequest.perform(UserRequest.java:50)
> >> at hudson.remoting.Request$2.run(Request.java:336)
> >> at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
> >> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >> at java.lang.Thread.run(Thread.java:748)
> >> Caused: java.io.IOException: Failed to create a temporary directory in /tmp
> >> at hudson.FilePath$17.invoke(FilePath.java:1375)
> >> at hudson.FilePath$17.invoke(FilePath.java:1363)
> >> at hudson.FilePath$FileCallableWrapper.call(FilePath.java:2739)
> >> at hudson.remoting.UserRequest.perform(UserRequest.java:153)
> >> at hudson.remoting.UserRequest.perform(UserRequest.java:50)
> >> at hudson.remoting.Request$2.run(Request.java:336)
> >> at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
> >> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >> at java.lang.Thread.run(Thread.java:748)
> >> at ..remote call to H21(Native Method)
> >> at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1545)
> >> at hudson.remoting.UserResponse.retrieve(UserRequest.java:253)
> >> at hudson.remoting.Channel.call(Channel.java:830)
> >> at hudson.FilePath.act(FilePath.java:986)
> >> Caused: java.io.IOException: remote file operation failed: /home/jenkins/jenkins-slave/workspace/madlib-master-build at hudson.remoting.Channel@4b715ff3:H21
> >> at hudson.FilePath.act(FilePath.java:993)
> >> at hudson.FilePath.act(FilePath.java:975)
> >> at hudson.FilePath.createTextTempFile(FilePath.java:1363)
> >> Caused: java.io.IOException: Failed to create a temp file on /home/jenkins/jenkins-slave/workspace/madlib-master-build
> >> at hudson.FilePath.createTextTempFile(FilePath.java:1386)
> >> at hudson.tasks.CommandInterpreter.createScriptFile(CommandInterpreter.java:162)
> >> at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:94)
> >> at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
> >> at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
> >> at hudson.model.AbstractBuild$AbstractBuildExecution.perform(
> >> 
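
The trace above bottoms out in a "Permission denied" while creating a temporary file. A quick way to confirm that kind of failure on a build node is to try creating a temp file in the directories involved. The following is a rough sketch only: the `can_create_temp` helper is hypothetical, and which directories to probe is an assumption based on the paths named in the trace.

```shell
#!/bin/sh
# Return success if a temporary file can be created (and then removed) in the
# given directory, which is the operation the failing build step could not do.
can_create_temp() {
    probe=$(mktemp "$1/probe.XXXXXX" 2>/dev/null) || return 1
    rm -f "$probe"
}

# Probe the system temp directory and the job workspace; WORKSPACE is set by
# Jenkins, so fall back to the current directory when running by hand.
for dir in "${TMPDIR:-/tmp}" "${WORKSPACE:-.}"; do
    if can_create_temp "$dir"; then
        echo "ok: temp files can be created in $dir"
    else
        echo "cannot create temp files in $dir" >&2
    fi
done
```

If the probe fails for a directory, the build step that writes its script file there would be expected to fail in the same way.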

Re: [VOTE]: MADlib repo(s) migration

2017-08-09 Thread Nandish Jayaram
1

NJ

On Wed, Aug 9, 2017 at 2:40 PM, Ed Espino <esp...@apache.org> wrote:

> 2
>
> Thanks NJ!
>
> On Wed, Aug 9, 2017 at 2:35 PM, Cooper Sloan <csl...@pivotal.io> wrote:
>
> > 2
> >
> > On Wed, Aug 9, 2017 at 2:32 PM Nandish Jayaram <njaya...@pivotal.io>
> > wrote:
> >
> > > Hi All,
> > >
> > > With MADlib's graduation to TLP, it's time to migrate its github
> > > repos from `*incubator-madlib*` to `*madlib*`. We will have to open
> > > an Apache Infrastructure ticket to request this move for the following
> > > repos (along with other stuff like wiki, jenkins etc):
> > > https://git1-us-west.apache.org/repos/asf?p=incubator-madlib.git
> > >  (Read/Write)
> > > > https://github.com/apache/incubator-madlib (GitHub mirror, read-only)
> > > https://git1-us-west.apache.org/repos/asf?p=incubator-madlib-site.git
> > > https://github.com/apache/incubator-madlib-site (GitHub mirror)
> > >
> > > There are two ways to go about this, and the Infra ticket has to be
> > > raised accordingly.
> > > 1) Just maintain the current set-up, but have the repos renamed from
> > > incubator-madlib to madlib.
> > > > 2) Use Gitbox to enable github repo as a R/W repo and not just read-only.
> > > > Check this email (
> > > > https://mail-archives.apache.org/mod_mbox/incubator-madlib-dev/201708.mbox/%3cCA+ULb+vP0ViWH4Nc=4eaXvbT0KOmeFtQzp4eAa3p0fKPP7c8...@mail.gmail.com%3e
> > > > )
> > > for further information.
> > >
> > > > Please vote your preference and we can decide to move accordingly.
> > >
> > > NJ
> > >
> >
>
>
>
> --
> *Ed Espino*
>


[VOTE]: MADlib repo(s) migration

2017-08-09 Thread Nandish Jayaram
Hi All,

With MADlib's graduation to TLP, it's time to migrate its github
repos from `*incubator-madlib*` to `*madlib*`. We will have to open
an Apache Infrastructure ticket to request this move for the following
repos (along with other stuff like wiki, jenkins etc):
https://git1-us-west.apache.org/repos/asf?p=incubator-madlib.git
 (Read/Write)
https://github.com/apache/incubator-madlib (GitHub mirror, read-only)
https://git1-us-west.apache.org/repos/asf?p=incubator-madlib-site.git
https://github.com/apache/incubator-madlib-site (GitHub mirror)

There are two ways to go about this, and the Infra ticket has to be
raised accordingly.
1) Just maintain the current set-up, but have the repos renamed from
incubator-madlib to madlib.
2) Use Gitbox to enable github repo as a R/W repo and not just read-only.
Check this email (
https://mail-archives.apache.org/mod_mbox/incubator-madlib-dev/201708.mbox/%3cCA+ULb+vP0ViWH4Nc=4eaxvbt0komeftqzp4eaa3p0fkpp7c...@mail.gmail.com%3e)
for further information.

Please vote your preference and we can decide to move accordingly.

NJ


Re: Migrating MADlib code base out of incubator.

2017-08-09 Thread Nandish Jayaram
> Shall we put the main MADlib repo(s) migration to GitBox to a vote?
Sure. Do we vote on this thread or on a different one?

NJ

On Wed, Aug 9, 2017 at 12:28 PM, Ed Espino <esp...@apache.org> wrote:

> I believe there is another git repo "incubator-madlib-site" (pair) that
> needs to be migrated as well:
>
> https://git1-us-west.apache.org/repos/asf?p=incubator-madlib-site.git
> https://github.com/apache/incubator-madlib-site (GitHub mirror)
>
> Personally, I am in favor of Gitbox use. I believe there is a security
> requirement which I also support: "You are required to enable 2FA on GitHub
> before you can gain write-access to repositories.".
>
> Shall we put the main MADlib repo(s) migration to GitBox to a vote?
>
> -=e
>
> On Wed, Aug 9, 2017 at 11:47 AM, Nandish Jayaram <njaya...@pivotal.io>
> wrote:
>
> > Hi All,
> >
> > I was planning on opening an Apache Infra JIRA to migrate MADlib's current
> > code repo out of "incubator", following the directions in:
> > https://incubator.apache.org/guides/transferring.html
> > http://www.apache.org/dev/infra-contact#requesting-graduation
> > An example Infra JIRA we can follow:
> > https://issues.apache.org/jira/browse/INFRA-5688
> >
> > The following MADlib repos have to be migrated:
> > https://git1-us-west.apache.org/repos/asf?p=incubator-madlib.git
> > https://github.com/apache/incubator-madlib
> >
> > A recent email from Roman (
> > https://mail-archives.apache.org/mod_mbox/incubator-madlib-dev/201708.mbox/%3cCA+ULb+vP0ViWH4Nc=4eaXvbT0KOmeFtQzp4eAa3p0fKPP7c8...@mail.gmail.com%3e
> > )
> > gave us another option in Gitbox for our source code repo.
> >
> > Does anybody in the community have any preferences/suggestions on
> > moving to Gitbox?
> >
> > NJ
> >
>
>
>
> --
> *Ed Espino*
>


Migrating MADlib code base out of incubator.

2017-08-09 Thread Nandish Jayaram
Hi All,

I was planning on opening an Apache Infra JIRA to migrate MADlib's current
code repo out of "incubator", following the directions in:
https://incubator.apache.org/guides/transferring.html
http://www.apache.org/dev/infra-contact#requesting-graduation
An example Infra JIRA we can follow:
https://issues.apache.org/jira/browse/INFRA-5688

The following MADlib repos have to be migrated:
https://git1-us-west.apache.org/repos/asf?p=incubator-madlib.git
https://github.com/apache/incubator-madlib

A recent email from Roman (
https://mail-archives.apache.org/mod_mbox/incubator-madlib-dev/201708.mbox/%3cCA+ULb+vP0ViWH4Nc=4eaxvbt0komeftqzp4eaa3p0fkpp7c...@mail.gmail.com%3e
)
gave us another option in Gitbox for our source code repo.

Does anybody in the community have any preferences/suggestions on
moving to Gitbox?

NJ


Confusion regarding the order of JIRAs to address

2017-08-07 Thread Nandish Jayaram
Hi All,

I was reviewing PR #158 (https://github.com/apache/incubator-madlib/pull/158),
and was also looking at the post graduation tasks. This PR seems to be very
relevant to one of the post graduation JIRAs (
https://issues.apache.org/jira/browse/MADLIB-1132), and I was a little
confused about the order of doing things here.
Should we merge PR #158 before or after MADLIB-1132 is addressed?

NJ


Re: MADlib Debugging (elastic_net) - MADLIB-1068

2017-08-07 Thread Nandish Jayaram
Nice catch, thanks Ed!

NJ

On Mon, Aug 7, 2017 at 2:07 PM, Ed Espino  wrote:

> I have resolved MADLIB-1068 with a workaround (use GCC 4 and not the
> default GCC 5) for Ubuntu 16.04. I have created MADLIB-1145 to track the
> GCC 5 issue.
>
> Happy Monday,
> -=e
>
> On Fri, Aug 4, 2017 at 11:11 AM, Ed Espino  wrote:
>
> > FYI: I have managed to get the issue to reproduce in gdb! It was
> > relatively painless. I will be adding my debugging environment and notes to
> > the dev list for future reference.
> >
> > I have come to understand Ubuntu isn't a supported platform. In the spirit
> > of growing and fostering MADlib adoption, I'll spend a bit more time on
> > this to see if I can identify what the subtleties are between the supported
> > platforms and Ubuntu.
> >
> > Cheers,
> > -=e
> >
> > On Thu, Aug 3, 2017 at 5:15 PM, Ed Espino  wrote:
> >
> >> For MADLIB-1068, I have reproduced the elastic_net issue which causes a
> >> crash (core dump) in PostgreSQL 9.6 on Ubuntu 16.04.03 using MADlib master.
> >> Are there any general debugging techniques used to help track down these
> >> types of issues? I will be fiddling around with the elastic_net trying to
> >> track down this issue.  Any helpful tips and tricks are greatly appreciated.
> >>
> >> -=e
> >>
> >> --
> >> *Ed Espino*
> >>
> >
> >
> >
> > --
> > *Ed Espino*
> >
>
>
>
> --
> *Ed Espino*
>


Re: Regarding moving source repos from incubator-madlib to madlib

2017-08-07 Thread Nandish Jayaram
> NJ (aka: Badminton man),
lol. :D

Thank you Ed, will check out the resources.

NJ

On Mon, Aug 7, 2017 at 12:33 PM, Ed Espino <esp...@apache.org> wrote:

> NJ (aka: Badminton man),
>
> Roman pointed me at the following "Guide :: Transferring Resources out of
> the Incubator" (https://incubator.apache.org/guides/transferring.html).
> There is mention of the git project renaming task.
>
> There are also several TLP graduation JIRAs filed under the "TLP graduation
> tasks" epic (https://issues.apache.org/jira/browse/MADLIB-1112).
>
> -=e
>
> On Mon, Aug 7, 2017 at 12:27 PM, Nandish Jayaram <njaya...@pivotal.io>
> wrote:
>
> > Hi All,
> >
> > Now that we have graduated to TLP, it's time to move all "incubator-madlib"
> > to "madlib", and I was looking at a PR (
> > https://github.com/apache/incubator-madlib/pull/158) which does quite a bit
> > towards it.
> >
> > I noticed that a few URLs such as our homepage and user docs (
> > http://madlib.apache.org/docs/latest/index.html) have been changed,
> > although we might still want to redirect
> > http://madlib.incubator.apache.org/docs/latest/index.html to
> > http://madlib.apache.org/docs/latest/index.html.
> >
> > The question I had was how and when to make URL changes to other resources
> > such as our source code repo (https://github.com/apache/incubator-madlib/
> > and https://git1-us-west.apache.org/repos/asf?p=incubator-madlib.git). Any
> > idea how to go about this process?
> >
> > NJ
> >
>
>
>
> --
> *Ed Espino*
>


Regarding moving source repos from incubator-madlib to madlib

2017-08-07 Thread Nandish Jayaram
Hi All,

Now that we have graduated to TLP, it's time to move all "incubator-madlib"
to "madlib", and I was looking at a PR (
https://github.com/apache/incubator-madlib/pull/158) which does quite a bit
towards it.

I noticed that a few URLs such as our homepage and user docs (
http://madlib.apache.org/docs/latest/index.html) have been changed,
although we might still want to redirect
http://madlib.incubator.apache.org/docs/latest/index.html to
http://madlib.apache.org/docs/latest/index.html.

The question I had was how and when to make URL changes to other resources
such as our source code repo (https://github.com/apache/incubator-madlib/
and https://git1-us-west.apache.org/repos/asf?p=incubator-madlib.git). Any
idea how to go about this process?

NJ


Re: MADlib Python module code coverage

2017-07-18 Thread Nandish Jayaram
Hey Ed,

I have not tried it. It'll be great if you could share your experiences
with it. Will go well with
https://github.com/apache/incubator-madlib/pull/151.

NJ

On Tue, Jul 18, 2017 at 3:34 PM, Ed Espino  wrote:

> MADlib dev,
>
> Has anyone tried to use the Python Coverage.py (
> https://coverage.readthedocs.io/en/coverage-4.4.1/) module to
> generate MADlib Python module code coverage metrics? I want to run it
> against the python only code in the graph PR:
> https://github.com/apache/incubator-madlib/pull/152
>
> Thanks,
> -=e
>
> --
> *Ed Espino*
>


Re: MADlib Code coverage

2017-07-14 Thread Nandish Jayaram
Thank you Ed, this is great. We will create a JIRA to incorporate this
(including documenting the steps to use it). Or, you can also create a PR.
:)

NJ

On Fri, Jul 14, 2017 at 12:31 PM, Ed Espino <esp...@apache.org> wrote:

> NJ,
>
> I spent about an hour getting it set up on my Mac.  This supported the
> MLP PR review. I threw some info together below.
>
> Hope it helps,
> -=e
>
> --
>
> Add lcov through brew (brings in lcov and genhtml utilities):
>   brew install lcov
>
> "Total HACK" to add code coverage support into the build for C/C++
> code. This is quick and dirty so I could get to the data I needed for
> the PR review.
>
>   diff --git a/CMakeLists.txt b/CMakeLists.txt
>   index b2172ef3..db80986c 100644
>   --- a/CMakeLists.txt
>   +++ b/CMakeLists.txt
>   @@ -104,7 +104,7 @@ if(CMAKE_COMPILER_IS_GNUCXX)
>    set(CMAKE_INCLUDE_SYSTEM_FLAG_CXX "-isystem ")
>    endif(APPLE)
>    elseif(CMAKE_C_COMPILER_ID MATCHES "Clang")
>   -set(CMAKE_CXX_FLAGS "-stdlib=libstdc++")
>   +set(CMAKE_CXX_FLAGS "-stdlib=libstdc++ -fprofile-arcs -ftest-coverage")
>    endif(CMAKE_COMPILER_IS_GNUCXX)
>
>    # force a `m4_' prefix to all builtins
>   @@ -114,6 +114,8 @@ else()
>    set(M4_ARGUMENTS "--prefix-builtins")
>    endif()
>
>   +set(CMAKE_C_FLAGS "-fprofile-arcs -ftest-coverage")
>   +
>    # Read and parse Version.yml file
>    file(READ "${MADLIB_VERSION_YML}" _MADLIB_VERSION_CONTENTS)
>    string(REGEX REPLACE "^.*version:[ \t]*([^\n]*)\n.*" "\\1" MADLIB_VERSION_STRING "${_MADLIB_VERSION_CONTENTS}")
>
> Once MADlib is built and installed, run your tests:
>
>   ## Build and install MADlib
>   mkdir build
>   cd build
>   cmake ..
>   make -j8 install
>   /usr/local/madlib/bin/madpack -s madlib -p postgres install
>
>   ## At this point you can run your tests (I'm focusing on MLP)
>
>   /usr/local/madlib/bin/madpack -s madlib -p postgres install-check -t convex/mlp
>
>   ## Time to capture the results and generate html report.
>
>   lcov --capture --directory . --output-file coverage.info
>   genhtml coverage.info --output-directory gcov
>
>   # You will notice there is some uninteresting system coverage info.
>   # The following will filter them out.
>
>   lcov --remove coverage.info '/usr/include/*' '/usr/local/include/*' '/usr/local/postgres/*' '*build/third_party/*' -o coverage_filtered.info
>   rm -rf gcov
>   genhtml coverage_filtered.info --output-directory gcov
>
> It is possible to zero out the counters with the following:
>   lcov --zerocounters --directory .
>
>
> On Fri, Jul 14, 2017 at 12:23 PM, Nandish Jayaram <njaya...@pivotal.io>
> wrote:
>
> > Hi Ed,
> >
> > We haven't set that up for MADlib yet, but we will be looking into it soon.
> > Any ideas?
> >
> > NJ
> >
> > On Fri, Jul 14, 2017 at 10:00 AM, Ed Espino <esp...@apache.org> wrote:
> >
> > > Out of curiosity, do MADlib developers regularly use code coverage
> > > utilities to measure the coverage quality of their tests?
> > >
> > > -=e
> > >
> > > --
> > > *Ed Espino*
> > >
> >
>
>
>
> --
> *Ed Espino*
>


Re: MADlib Code coverage

2017-07-14 Thread Nandish Jayaram
Hi Ed,

We haven't set that up for MADlib yet, but we will be looking into it soon.
Any ideas?

NJ

On Fri, Jul 14, 2017 at 10:00 AM, Ed Espino  wrote:

> Out of curiosity, do MADlib developers regularly use code coverage
> utilities to measure the coverage quality of their tests?
>
> -=e
>
> --
> *Ed Espino*
>


Re: [GitHub] incubator-madlib pull request #81: JIRA: MADLIB-927 Changes made in KNN-help...

2016-12-28 Thread Nandish Jayaram
Just pushing the code should be good, you don't have to create a new pull
request. The existing one will get updated.

NJ

On Tue, Dec 27, 2016 at 5:56 PM, Kazmi,Auon H  wrote:

> Hi,
>
> I have been making some changes in existing KNN code after going through
> the comments of Orhan.
>
> How should I push these changes? Should I create a new pull request and
> delete the current one?
>
>
>
> Regards,
>
> Auon
>
> 
> From: auonhaidar 
> Sent: Tuesday, December 27, 2016 3:46:09 PM
> To: dev@madlib.incubator.apache.org
> Subject: [GitHub] incubator-madlib pull request #81: JIRA: MADLIB-927
> Changes made in KNN-help...
>
> Github user auonhaidar commented on a diff in the pull request:
>
> https://github.com/apache/incubator-madlib/pull/81#
> discussion_r93969163
>
> --- Diff: src/ports/postgres/modules/knn/test/knn.sql_in ---
> @@ -0,0 +1,41 @@
> +m4_include(`SQLCommon.m4')
> +/* 
> -
> + * Test knn.
> + *
> + * FIXME: Verify results
> --- End diff --
>
> You mean to say that I should include assert statements in this
> test/knn.sql_in file in order to validate results, right?
>
>
> ---
> If your project is set up for it, you can reply to this email and have your
> reply appear on GitHub as well. If your project does not have this feature
> enabled and wishes so, or if the feature is enabled but not working, please
> contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
> with INFRA.
> ---
>


Re: [GitHub] incubator-madlib pull request #80: KNN Added

2016-12-19 Thread Nandish Jayaram
Hi Auon,

You don't have to add the JIRA, it is already there:
https://issues.apache.org/jira/browse/MADLIB-927

It's just good practice to mention the JIRA ID (JIRA: MADLIB-927) in your
commit message.

NJ

On Sun, Dec 18, 2016 at 8:13 PM, Kazmi,Auon H  wrote:

> Hi NJ,
>
> I am ready to add changes made in KNN to my repo. Please tell me where to add
> the JIRA.
>
>
>
>
> Regards,
>
> Auon
>
> 
> From: auonhaidar 
> Sent: Saturday, December 17, 2016 1:07:48 AM
> To: dev@madlib.incubator.apache.org
> Subject: [GitHub] incubator-madlib pull request #80: KNN Added
>
> Github user auonhaidar closed the pull request at:
>
> https://github.com/apache/incubator-madlib/pull/80
>
>
>


Re: [GitHub] incubator-madlib issue #80: KNN Added

2016-12-19 Thread Nandish Jayaram
Thank you Chenliang, I think it's a great idea to have such a writeup. It
will be great if the community can suggest some topics to include in the
writeup.
Auon, you might be able to suggest some good ideas for the topics since you
just went through the process and you would know where you struggled the
most!

NJ

On Mon, Dec 19, 2016 at 9:51 AM, Nandish Jayaram <njaya...@pivotal.io>
wrote:

> Great! Let us know if you have any other questions.
>
> NJ
>
> On Fri, Dec 16, 2016 at 10:52 PM, Kazmi,Auon H <aka...@ufl.edu> wrote:
>
>> Hi NJ,
>>
>> I guess I was able to play around with branching and other stuff but my
>> PR got deleted from madlib's repo. But that's okay as I have got the
>> comments you made, in e-mails. I will work on them from tomorrow.
>>
>>
>> Thanks for your help!
>>
>>
>> Thanks,
>>
>> Auon
>>
>> 
>> From: Kazmi,Auon H <aka...@ufl.edu>
>> Sent: Friday, December 16, 2016 11:09:11 PM
>> To: dev@madlib.incubator.apache.org
>> Subject: Re: [GitHub] incubator-madlib issue #80: KNN Added
>>
>> Hi NJ,
>>
>> Thanks for your detailed reply!
>>
>> I will try to do the said things.
>>
>>
>>
>> Thanks,
>>
>> Auon
>>
>> 
>> From: Nandish Jayaram <njaya...@pivotal.io>
>> Sent: Friday, December 16, 2016 8:32:52 PM
>> To: dev@madlib.incubator.apache.org
>> Subject: Re: [GitHub] incubator-madlib issue #80: KNN Added
>>
>> Hi Auon,
>>
>> Hope your exams went well.
>>
>> You can do whatever ends up being a better git-learning experience for
>> you.
>> Since you just started contributing to MADlib, the easier way to get going
>> might be to do what you mentioned. But a better, though a longer way,
>> would
>> be to just mess around with branches as a learning experience. For
>> instance
>> (be warned, this might not be the best approach and it might sound
>> daunting), you can do the following:
>> - Create a new local branch (say the branch name is temp-features/knn)
>> while on your current master branch (which already has the knn code
>> changes
>> in it).
>> useful commands: git checkout -b temp-features/knn
>> - Go back to your master branch and reset it to the commit SHA before you
>> made changes for knn (look at git log command to find the appropriate
>> commit SHA).
>> useful commands: git log, git reset --hard  (be careful while
>> using the --hard flag in general).
>> - You essentially want to reach a state where the new branch features/knn
>> has the code changes you have made so far for the knn feature, and your
>> master branch must be in sync with apache/incubator-madlib's master
>> branch.
>> You ideally want your local master to be in sync with your repo master,
>> which in turn must be in sync with origin master
>> (apache/incubator-madlib).
>> - You might also want to push your master (with --force option) to your
>> remote repo, to undo the changes you have made to your repo master branch
>> with the previous PR.
>> useful commands: git push --force 
>> - Now create a new branch off your master (say branch name features/knn).
>> Rebase this new branch with the temp-features/knn branch. You will get the
>> knn related changes back on this branch now.
>> useful commands: git checkout -b features/knn, git rebase
>> temp-features/knn
>> - Address the comments on this PR, and then push the features/knn branch
>> to
>> your repo and open a new PR on the branch. Read about git rebase (and try
>> using it) before pushing the branch.
>> useful commands: (on master branch), git pull --ff-only, (on features/knn
>> branch) git rebase -i master
>>
>> The useful commands I have mentioned might not do the needful for each
>> step. They are just pointers for you. There might be a much simpler
>> way to accomplish all this, and I have no idea if this way would actually
>> work correctly. :) But you can (almost) always recover from any mistake
>> you
>> make on git.
>>
>> NJ
>>
>> On Fri, Dec 16, 2016 at 2:57 PM, Kazmi,Auon H <aka...@ufl.edu> wrote:
>>
>> > Hi NJ,
>> >
>> > Thanks for your input!
>> >
>> > Sorry, I was busy with my semester-end exams.
>> >
>> > I am reading on Git. Should I repeat the process of checking out madlib
>> > repo and then again making changes in a local branch?
>> >
>> >
>> >
>> > Regards,
>>
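
The branch shuffle described in the quoted advice can be rehearsed safely in a throwaway repository before trying it on a real fork. The script below is a sketch under several assumptions: the file names, commit messages and placeholder identity are invented, the sandbox has no remote (so the `git push --force` step is left out), and the last commit on the sandbox master stands in for the accidental knn commits.

```shell
#!/bin/sh
# Sandbox rehearsal: a throwaway repo standing in for a fork whose master
# carries feature commits by mistake. Nothing here touches a real remote.
set -eu

demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master       # name the initial branch "master"
git config user.email "dev@example.invalid"   # placeholder identity for the sandbox
git config user.name "Sandbox User"

# One commit standing in for the upstream state, one for the knn work.
echo base > README && git add README && git commit -q -m "upstream state"
base_sha=$(git rev-parse HEAD)                # the commit master should return to
echo knn > knn.sql && git add knn.sql && git commit -q -m "JIRA: MADLIB-927 knn work"

# Step 1: park the work on a temporary branch while still on master.
git checkout -q -b temp-features/knn

# Step 2: move master back to the commit before the knn changes.
git checkout -q master
git reset -q --hard "$base_sha"

# Step 3: start a clean feature branch from master and bring the work back.
# features/knn has no commits of its own yet, so the rebase simply moves it
# forward to the tip of temp-features/knn.
git checkout -q -b features/knn
git rebase -q temp-features/knn

# Inspect the result: master holds only the base commit, features/knn has both.
git log --oneline master
git log --oneline features/knn
```

On a real fork, the reset target would be the commit that matches the upstream master, and the force push of master to the fork would follow step 2.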

Re: [GitHub] incubator-madlib issue #80: KNN Added

2016-12-19 Thread Nandish Jayaram
Great! Let us know if you have any other questions.

NJ

On Fri, Dec 16, 2016 at 10:52 PM, Kazmi,Auon H <aka...@ufl.edu> wrote:

> Hi NJ,
>
> I guess I was able to play around with branching and other stuff but my PR
> got deleted from madlib's repo. But that's okay as I have got the comments
> you made, in e-mails. I will work on them from tomorrow.
>
>
> Thanks for your help!
>
>
> Thanks,
>
> Auon
>
> 
> From: Kazmi,Auon H <aka...@ufl.edu>
> Sent: Friday, December 16, 2016 11:09:11 PM
> To: dev@madlib.incubator.apache.org
> Subject: Re: [GitHub] incubator-madlib issue #80: KNN Added
>
> Hi NJ,
>
> Thanks for your detailed reply!
>
> I will try to do the said things.
>
>
>
> Thanks,
>
> Auon
>
> 
> From: Nandish Jayaram <njaya...@pivotal.io>
> Sent: Friday, December 16, 2016 8:32:52 PM
> To: dev@madlib.incubator.apache.org
> Subject: Re: [GitHub] incubator-madlib issue #80: KNN Added
>
> Hi Auon,
>
> Hope your exams went well.
>
> You can do whatever ends up being a better git-learning experience for you.
> Since you just started contributing to MADlib, the easier way to get going
> might be to do what you mentioned. But a better, though a longer way, would
> be to just mess around with branches as a learning experience. For instance
> (be warned, this might not be the best approach and it might sound
> daunting), you can do the following:
> - Create a new local branch (say the branch name is temp-features/knn)
> while on your current master branch (which already has the knn code changes
> in it).
> useful commands: git checkout -b temp-features/knn
> - Go back to your master branch and reset it to the commit SHA before you
> made changes for knn (look at git log command to find the appropriate
> commit SHA).
> useful commands: git log, git reset --hard  (be careful while
> using the --hard flag in general).
> - You essentially want to reach a state where the new branch features/knn
> has the code changes you have made so far for the knn feature, and your
> master branch must be in sync with apache/incubator-madlib's master branch.
> You ideally want your local master to be in sync with your repo master,
> which in turn must be in sync with origin master (apache/incubator-madlib).
> - You might also want to push your master (with --force option) to your
> remote repo, to undo the changes you have made to your repo master branch
> with the previous PR.
> useful commands: git push --force 
> - Now create a new branch off your master (say branch name features/knn).
> Rebase this new branch with the temp-features/knn branch. You will get the
> knn related changes back on this branch now.
> useful commands: git checkout -b features/knn, git rebase temp-features/knn
> - Address the comments on this PR, and then push the features/knn branch to
> your repo and open a new PR on the branch. Read about git rebase (and try
> using it) before pushing the branch.
> useful commands: (on master branch), git pull --ff-only, (on features/knn
> branch) git rebase -i master
>
> The useful commands I have mentioned might not do the needful for each
> step. They are just pointers for you. There might be a much simpler
> way to accomplish all this, and I have no idea if this way would actually
> work correctly. :) But you can (almost) always recover from any mistake you
> make on git.
>
> NJ
>
> On Fri, Dec 16, 2016 at 2:57 PM, Kazmi,Auon H <aka...@ufl.edu> wrote:
>
> > HI NJ,
> >
> > Thanks for your input!
> >
> > Sorry, I was busy with my semester-end exams.
> >
> > I am reading on Git. Should I repeat the process of checking out madlib
> > repo and then again making changes in a local branch?
> >
> >
> >
> > Regards,
> >
> > Auon
> >
> > 
> > From: njayaram2 <g...@git.apache.org>
> > Sent: Thursday, December 15, 2016 6:24:08 PM
> > To: dev@madlib.incubator.apache.org
> > Subject: [GitHub] incubator-madlib issue #80: KNN Added
> >
> > Github user njayaram2 commented on the issue:
> >
> > https://github.com/apache/incubator-madlib/pull/80
> >
> > This is a great start!
> > I will provide some github-specific feedback here, and more knn-specific
> > comments in the code.
> > Git can be daunting to use at first, but it's great once you get the hang
> > of it.
> > I would recommend you go through the following wonderful book if you
> > have not already done so:
> > https://git-scm.com/book/en/v2
> >
>

Re: [GitHub] incubator-madlib issue #80: KNN Added

2016-12-16 Thread Nandish Jayaram
Hi Auon,

Hope your exams went well.

You can do whatever ends up being the better git-learning experience for you.
Since you have just started contributing to MADlib, the easier way to get
going might be to do what you mentioned. A better, though longer, way would
be to experiment with branches as a learning experience. For instance
(be warned, this might not be the best approach and it might sound
daunting), you can do the following:
- Create a new local branch (say the branch name is temp-features/knn)
while on your current master branch (which already has the knn code changes
in it).
useful commands: git checkout -b temp-features/knn
- Go back to your master branch and reset it to the commit SHA before you
made changes for knn (look at git log command to find the appropriate
commit SHA).
useful commands: git log, git reset --hard <commit-SHA> (be careful while
using the --hard flag in general).
- You essentially want to reach a state where the new branch features/knn
has the code changes you have made so far for the knn feature, and your
master branch must be in sync with apache/incubator-madlib's master branch.
You ideally want your local master to be in sync with your repo master,
which in turn must be in sync with origin master (apache/incubator-madlib).
- You might also want to push your master (with --force option) to your
remote repo, to undo the changes you have made to your repo master branch
with the previous PR.
useful commands: git push --force <your-remote> master
- Now create a new branch off your master (say branch name features/knn).
Rebase this new branch onto the temp-features/knn branch. You will get the
knn related changes back on this branch now.
useful commands: git checkout -b features/knn, git rebase temp-features/knn
- Address the comments on this PR, and then push the features/knn branch to
your repo and open a new PR on the branch. Read about git rebase (and try
using it) before pushing the branch.
useful commands: (on master branch), git pull --ff-only, (on features/knn
branch) git rebase -i master

The useful commands I have mentioned might not do everything needed for each
step; they are just pointers for you. There might be a much simpler
way to accomplish all this, and I have no idea if this way would actually
work correctly. :) But you can (almost) always recover from any mistake you
make in git.

NJ

On Fri, Dec 16, 2016 at 2:57 PM, Kazmi,Auon H <aka...@ufl.edu> wrote:

> HI NJ,
>
> Thanks for your input!
>
> Sorry, I was busy with my semester-end exams.
>
> I am reading on Git. Should I repeat the process of checking out madlib
> repo and then again making changes in a local branch?
>
>
>
> Regards,
>
> Auon
>
> 
> From: njayaram2 <g...@git.apache.org>
> Sent: Thursday, December 15, 2016 6:24:08 PM
> To: dev@madlib.incubator.apache.org
> Subject: [GitHub] incubator-madlib issue #80: KNN Added
>
> Github user njayaram2 commented on the issue:
>
> https://github.com/apache/incubator-madlib/pull/80
>
> This is a great start!
> I will provide some github-specific feedback here, and more knn-specific
> comments in the code.
> Git can be daunting to use at first, but it's great once you get the hang
> of it.
> I would recommend you go through the following wonderful book if you
> have not already done so:
> https://git-scm.com/book/en/v2
>
> When you work on a feature/bug, it is best if you create a branch locally
> and make all changes for that feature there. You can then push that branch
> into your github repo and open a pull request. This way you won't mess with
> your local master branch, which should ideally be in sync with the origin's
> (apache/incubator-madlib in this case) master branch. More information on
> how to work with branches can be found in the following chapter:
> https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell
> (especially section 3.5)
>
> One other minor feedback is to try including the corresponding JIRA id
> with the commit message. The JIRA associated with this feature is:
> https://issues.apache.org/jira/browse/MADLIB-927
>
>
>


Re: Adding KNN to madlib

2016-12-12 Thread Nandish Jayaram
Hi Auon,

Please push all the changes you have made in your branch for KNN to your
incubator-madlib repo, and open a PR on that push.

NJ

On Mon, Dec 12, 2016 at 1:58 PM, Kazmi,Auon H <aka...@ufl.edu> wrote:

> Hi NJ,
>
> Where should I git push my code? I am doing that in my github id. Also,
> should I push just KNN folder or the whole src/ folder of madlib?
>
>
>
> Regards,
>
> Auon
>
> 
> From: Kazmi,Auon H <aka...@ufl.edu>
> Sent: Monday, December 5, 2016 8:32:38 PM
> To: dev@madlib.incubator.apache.org
> Subject: Re: Adding KNN to madlib
>
> Hi NJ,
>
> Thanks!
>
> I will do that.
>
>
>
>
> Regards,
>
> Auon
>
> 
> From: Nandish Jayaram <njaya...@pivotal.io>
> Sent: Sunday, December 4, 2016 1:39:53 PM
> To: dev@madlib.incubator.apache.org
> Subject: Re: Adding KNN to madlib
>
> Hi Auon,
>
> That's great!
> I think the best way to share your code with the community is by opening a
> pull request on github. Please do that and a lot of folks will be able to
> comment and give suggestions to you.
>
> NJ
>
> On Sat, Dec 3, 2016 at 2:13 PM, Kazmi,Auon H <aka...@ufl.edu> wrote:
>
> > Hi NJ,
> >
> > I got the solution to my problem.
> >
> > So, I might be done with my first version of interface of KNN for
> > classification as suggested by you, by Monday or so. I will generalise it
> > for regression and then please let me know how to share it with you guys.
> > After that, I can start making required changes as and when needed.
> >
> >
> >
> > regards,
> >
> > Auon Haidar
> >
> > 
> > From: Kazmi,Auon H <aka...@ufl.edu>
> > Sent: Thursday, December 1, 2016 2:59:21 PM
> > To: dev@madlib.incubator.apache.org
> > Subject: Re: Adding KNN to madlib
> >
> > Hi NJ,
> >
> > No, this is just an example I gave. So, I want in a postgres function to
> > iterate over the rows of a table given as a VARCHAR argument.
> >
> > FOR r IN EXECUTE format('SELECT * FROM %I', point_source)
> >
> > will do that. Now, r is a record, i.e. a row of table 'point_source'. I
> > want to store a particular column of that row r in a variable. Now, this
> > column name is also passed as VARCHAR argument to function. I am not able
> > to figure out the way to access this particular column from the current
> > row 'r'.
> >
> >
> > Basically, I am trying to iterate over my testing data one by one and pass
> > its vector column to a function that finds its label.
> >
> >
> >
> > Regards,
> >
> > Auon
> >
> >
> > 
> > From: Nandish Jayaram <njaya...@pivotal.io>
> > Sent: Thursday, December 1, 2016 2:51:47 PM
> > To: dev@madlib.incubator.apache.org
> > Subject: Re: Adding KNN to madlib
> >
> > Hi Auon,
> >
> > My apologies for the late reply.
> > Can you please give me more information regarding the design approach you
> > have taken. Information like
> > what files you have created so far would be helpful. I am not sure I
> > understand your approach correctly
> > yet. Is the above snippet of code the only code you have, or do you have
> > some other files too?
> >
> > NJ
> >
> > On Tue, Nov 29, 2016 at 10:06 PM, Kazmi,Auon H <aka...@ufl.edu> wrote:
> >
> > > Hi NJ,
> > >
> > > I got stuck at a place. Need a little help.
> > >
> > > Suppose I have a function that receives table_name and column_name as
> > > varchar.
> > >
> > > Now I would like to iterate through each rows of this table, while
> > > accessing the value of this column. I am doing something like this:
> > >
> > >
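> > > CREATE OR REPLACE FUNCTION Foo(
> > >     table_name VARCHAR,
> > >     column_name VARCHAR
> > > ) RETURNS VOID AS
> > > $BODY$
> > > DECLARE
> > >     r record;
> > >     b integer;
> > > BEGIN
> > >     -- Iterate over every row of the table whose name is passed in.
> > >     FOR r IN EXECUTE format('SELECT * FROM %I', table_name)
> > >     LOOP
> > >         -- Problem line: this looks for a column literally named
> > >         -- "column_name", not the column named by the argument.
> > >         b := r.column_name;
> > >     END LOOP;
> > > END;
> > > $BODY$ LANGUAGE plpgsql;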
> > >
> > > So, everything works except column_name is a varchar. So, r.column_name
> > > won't give me the corresponding column's value in extracted row r. So,
> > > suppose it is 'pid' in the given table, then b:= r.pid will give the

Re: Adding KNN to madlib

2016-11-23 Thread Nandish Jayaram
Hey Auon,

Starting with only classification for now sounds like a good idea!
Yes, the output should be just the predicted label for each row.
If the table you want to run the classification task on is like the
following:
*id |  x |    y*
 1  | 10 | 10.5
 2  | 30 | 31.5
 3  | 20 | 22.5

then the output table could be something like the following:
*id |  x |    y | predicted_label*
 1  | 10 | 10.5 | true
 2  | 30 | 31.5 | false
 3  | 20 | 22.5 | true

You are basically adding a new column called "predicted_label" to the input
table, and assigning the label for each row based on its k nearest neighbors.

We can certainly make it better, by modifying the kNN function interface.
But let's just keep it simple for now and work on that later.
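
To make the output shape concrete, here is a small sketch in plain Python
(not MADlib code). The helper name add_predicted_label and the stand-in
classifier toy_predict are made up for this illustration; toy_predict is
chosen only so that it reproduces the labels in the example table above, and
a real implementation would call the k-NN classifier instead.

```python
def add_predicted_label(rows, predict):
    """Return copies of the rows with an extra 'predicted_label' column."""
    return [dict(row, predicted_label=predict(row)) for row in rows]


# The input table from the example, one dict per row.
rows = [
    {"id": 1, "x": 10, "y": 10.5},
    {"id": 2, "x": 30, "y": 31.5},
    {"id": 3, "x": 20, "y": 22.5},
]


def toy_predict(row):
    # Hypothetical stand-in for the k-NN classifier (an assumption made
    # purely so the sketch runs on its own).
    return row["x"] <= 20


output = add_predicted_label(rows, toy_predict)
for row in output:
    print(row)
```

The input rows are left untouched; the output is a new table with the same
columns plus predicted_label.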

NJ

On Tue, Nov 22, 2016 at 2:52 PM, Kazmi,Auon H <aka...@ufl.edu> wrote:

>
> Hi NJ,
>
> I have implemented a first version of interface as suggested by you. Right
> now, I am just looking at classification task. I will generalize it to work
> for regression task as well. I have a question regarding output of the
> function. Should it just be the predicted label (or prediction value in
> case of regression)? Can you give an example of output?
>
>
>
>
>
> Regards,
>
> Auon Haidar
>
> 
> From: Kazmi,Auon H <aka...@ufl.edu>
> Sent: Friday, November 18, 2016 3:16:00 AM
> To: dev@madlib.incubator.apache.org
> Subject: Re: Adding KNN to madlib
>
> Hi NJ,
>
> Thanks for your inputs!
>
> I will go through everyone of them and try to incorporate them.
>
>
>
> Regards,
>
> Auon Haidar
>
> 
> From: Nandish Jayaram <njaya...@pivotal.io>
> Sent: Wednesday, November 16, 2016 2:29:05 PM
> To: dev@madlib.incubator.apache.org
> Subject: Re: Adding KNN to madlib
>
> Hi Auon,
>
> Defining the interface is a good start for k-NN. I have slightly modified
> your interface to help it conform with other MADlib algorithms' interfaces.
> Note that the output for each new data point is not the 'k' nearest
> neighbors, but either a classification or regression task on the data point
> based on its 'k' nearest neighbors. Every data point in the training data
> will have an associated class label (regression value) in a different
> column. Normally, the column containing the data point itself is called the
> independent variable, and the column containing the class label is called
> the dependent variable. If it is classification, you take a majority vote
> of the class labels of the 'k' nearest neighbors, and if it is regression,
> you average the dependent variable values of the 'k' nearest neighbors.
> Here is a preliminary interface we could start with:
>
> *knn*(
> source_table, -- *TEXT, name of table containing training data.*
> new_data_table, -- *TEXT, name of table containing new data on which
> classification or regression has to be performed. Classification or
> regression can be performed based on the type of "dependent_varname".*
> output_table, -- *TEXT, name of the table where output predictors are
> written. If this table is already present, an error is returned.*
> dependent_varname, -- *TEXT, name of the dependent variable column. If
> this column is of type boolean/integer, we could probably perform k-NN
> classification, and perform k-NN regression if this is of type double.*
> independent_varname, -- *TEXT, column defining data points. Data points can
> be of type SVEC or any type convertible to SVEC such as float[] or
> integer[].*
> k, -- *INTEGER, (optional, default value could be some odd number, say 5)
> number of neighbors to consider*
> metric, -- *TEXT, (optional, default value could be what you are using now
> for distance) the distance metric to use.*
> );
>
> For now you can just use the distance metric you had mentioned in an
> earlier email. Note that the source_table and new_data_table are tables in
> the database and not files.
>
> Some pointers to help you start off with the implementation:
> -
> https://cwiki.apache.org/confluence/display/MADLIB/Quick+Start+Guide+for+Developers
> is a very useful resource with a great hello-world example. It gives you
> details about how to add a new module (k-NN would be a new module) to
> MADlib.
> - k-NN is a great candidate for parallelizing. Do try to use UDA (User
> Defined Aggregates) in your implementation. This will require you to add a
> C++ layer too, along with the SQL and python layers. Feel free to ask
> specific questions about this after you have tried out the hello world
> example.
> - Chapter 1 in http://madlib.incubator.apache.org/design.pdf gives you more
> information regarding the C++ abstraction layer in MADlib.

Re: Adding KNN to madlib

2016-11-16 Thread Nandish Jayaram
Hi Auon,

Defining the interface is a good start for k-NN. I have slightly modified
your interface to help it conform with other MADlib algorithms' interfaces.
Note that the output for each new data point is not the 'k' nearest
neighbors, but either a classification or regression task on the data point
based on its 'k' nearest neighbors. Every data point in the training data
will have an associated class label (regression value) in a different
column. Normally, the column containing the data point itself is called the
independent variable, and the column containing the class label is called
the dependent variable. If it is classification, you take a majority vote
of the class labels of the 'k' nearest neighbors, and if it is regression,
you average the dependent variable values of the 'k' nearest neighbors.
Here is a preliminary interface we could start with:

*knn*(
source_table, -- *TEXT, name of table containing training data.*
new_data_table, -- *TEXT, name of table containing new data on which
classification or regression has to be performed. Classification or
regression can be performed based on the type of "dependent_varname".*
output_table, -- *TEXT, name of the table where output predictors are
written. If this table is already present, an error is returned.*
dependent_varname, -- *TEXT, name of the dependent variable column. If
this column is of type boolean/integer, we could probably perform k-NN
classification, and perform k-NN regression if this is of type double.*
independent_varname, -- *TEXT, column defining data points. Data points can
be of type SVEC or any type convertible to SVEC such as float[] or
integer[].*
k, -- *INTEGER, (optional, default value could be some odd number, say 5)
number of neighbors to consider*
metric, -- *TEXT, (optional, default value could be what you are using now
for distance) the distance metric to use.*
);

For now you can just use the distance metric you had mentioned in an
earlier email. Note that the source_table and new_data_table are tables in
the database and not files.
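
As a rough illustration of the behaviour described above (majority vote for
classification, averaging for regression), here is a sketch in plain Python.
It is not MADlib code and does not run in the database: the function names
squared_distance and knn_predict are made up for this example, the data lives
in Python lists rather than tables, and the rule "boolean/integer labels mean
classification, anything else means regression" is an assumption taken from
the dependent_varname description.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train, point, k=5):
    """Predict a value for `point` from its k nearest training rows.

    `train` is a list of (independent_vector, dependent_value) pairs.
    Boolean/integer dependent values are treated as class labels and
    anything else as regression targets (an assumption of this sketch).
    """
    nearest = sorted(train, key=lambda row: squared_distance(row[0], point))[:k]
    values = [value for _, value in nearest]
    if all(isinstance(v, (bool, int)) for v in values):
        # Classification: majority vote among the k nearest neighbours.
        return Counter(values).most_common(1)[0][0]
    # Regression: average of the k nearest dependent values.
    return sum(values) / len(values)


labelled = [([0.0, 0.0], True), ([0.1, 0.2], True), ([0.2, 0.1], True),
            ([5.0, 5.0], False), ([5.1, 4.9], False)]
print(knn_predict(labelled, [0.0, 0.1], k=3))  # vote of the 3 nearest labels

numeric = [([1.0], 10.0), ([2.0], 20.0), ([3.0], 30.0), ([10.0], 100.0)]
print(knn_predict(numeric, [2.0], k=3))        # mean of the 3 nearest values
```

With two classes, an odd k cannot produce a tied vote, which is one reason
for suggesting an odd default such as 5.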

Some pointers to help you start off with the implementation:
-
https://cwiki.apache.org/confluence/display/MADLIB/Quick+Start+Guide+for+Developers
is a very useful resource with a great hello-world example. It gives you
details about how to add a new module (k-NN would be a new module) to
MADlib.
- k-NN is a great candidate for parallelizing. Do try to use UDA (User
Defined Aggregates) in your implementation. This will require you to add a
C++ layer too, along with the SQL and python layers. Feel free to ask
specific questions about this after you have tried out the hello world
example.
- Chapter 1 in http://madlib.incubator.apache.org/design.pdf gives you more
information regarding the C++ abstraction layer in MADlib.

Feel free to shout out for help if you are stuck! Cheers. :)

NJ

On Tue, Nov 15, 2016 at 2:56 PM, Kazmi,Auon H <aka...@ufl.edu> wrote:

> Hi Frank and NJ,
>
> Thanks for your comments. I will go through the suggestions provided by NJ.
>
> Current interface of KNN is as follows:
>
> 1) Input:
>
>- Name of table having all the data points in n-dimensional vector
> form (Double  Precision[ ])
>
>- Column-name of these data points
>
>- Name of file having that n-dim vector (v, say) whose k-nearest
> neighbours need to be   found from first table (Double
> Precision[ ])
>
>- Column name having this vector
>
>- value of 'k'
>
>
> It returns 'k' nearest neighbours of vector v from first table having data
> points.
>
>
>
> For now, I am using madlib's squared norm function to calculate distance
> between any two vectors. I will try to generalise that.
>
>
> Please suggest any other improvements.
>
>
>
> Thanks,
>
> Auon Haidar
>
> 
> From: Frank McQuillan <fmcquil...@pivotal.io>
> Sent: Tuesday, November 15, 2016 1:30:53 PM
> To: dev@madlib.incubator.apache.org
> Subject: Re: Adding KNN to madlib
>
> Auon,
>
> Thanks for working on kNN for MADlib.   Can you expand a little bit on your
> note, and post the interface that you are thinking about and description of
> the arguments?  Then people can comment on that.
>
> Thanks,
> Frank
>
> On Tue, Nov 15, 2016 at 9:30 AM, Nandish Jayaram <njaya...@pivotal.io>
> wrote:
>
> > Hi Auon,
> >
> > Great going with your first version of k-NN implementation.
> > Some useful links for coding guidelines are at (see Developer
> > Documentation):
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61319606
> > MADlib has something called install checks for basic testing. You can
> > look at any existing module for an example of the same. For i

Re: Adding KNN to madlib

2016-11-15 Thread Nandish Jayaram
Hi Auon,

Great going with your first version of k-NN implementation.
Some useful links for coding guidelines are at (see Developer
Documentation):
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61319606
MADlib has something called install checks for basic testing. You can
look at any existing module for an example of the same. For instance, check
out the install check code for k-means at:
https://github.com/apache/incubator-madlib/tree/master/src/ports/postgres/modules/kmeans/test

I am sure others will pitch in to help you more with your other questions,
but these are some starters you can consider! Good luck!

NJ

On Mon, Nov 14, 2016 at 10:41 PM, Kazmi,Auon H <aka...@ufl.edu> wrote:

> Hi,
>
> I am a first year Computer Science graduate student at University of
> Florida working on implementing KNN in Madlib. I am ready with a first
> version of it but I don't know how to proceed with testing and adding it to
> Madlib platform. Also, I am not clear on what standards do I have to choose
> in the final implementation. My current version asks for the table name and
> column name having vectors in which I have to find the neighbours. The
> other table given as input holds the vector whose K-NN needs to be found.
> It is assuming euclidean distance metric for distance calculation. It would
> really help if somebody can share ideas on what can be added to this
> functionality.
>
>
>
>
>
> Regards,
>
> Auon Haidar Kazmi
>


Re: [VOTE] MADlib v1.9.1-rc2

2016-09-02 Thread Nandish Jayaram
+1

On Fri, Sep 2, 2016 at 10:26 AM, Frank McQuillan <fmcquil...@pivotal.io>
wrote:

> Hello MADlib community,
>
> We have created a MADlib 1.9.1 RC-2, with the artifacts below up for a
> vote.
>
> This release candidate replaces RC-1.  The only difference between RC-1 and
> RC-2 is
> that some ‘._’ files were sneaked in by OSX during the packaging.
> These have been removed.
>
> This will be the 3rd release for Apache MADlib (incubating).
>
> The main goals of this release are:
> * new modules (1-class SVM for novelty detection, prediction metrics,
> sessionization, pivoting)
> * improvements to existing modules (class weights in SVM, overlapping
> patterns in path)
> * performance improvements (path)
> * platform updates (PostgreSQL 9.5 and 9.6)
> * bug fixes
> * doc improvements
>
> For more information including release notes, please see:
> https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.9.1
>
> *** Please download, review and vote by Tues Sep 6, 2016 @ 6pm PST ***
>
> We're voting upon the source (tag):  rc/1.9.1-rc2
>
> Source Files:
> https://dist.apache.org/repos/dist/dev/incubator/madlib/1.9.1-incubating-rc2
>
> Commit to be voted upon:
> https://git-wip-us.apache.org/repos/asf?p=incubator-madlib.git;a=commit;h=e1c99c1538dc124c9b323ba76382ba2af05c6892
>
> KEYS file containing PGP Keys we use to sign the release:
> https://dist.apache.org/repos/dist/dev/incubator/madlib/KEYS
>
> To help in tallying the vote, can PMC members please be sure to indicate
> "(binding)" with their vote.
>
> [ ] +1  approve
> [ ] +0  no opinion
> [ ] -1  disapprove (and reason why)
>
> Thank you,
> Frank McQuillan
>