Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

2016-09-19 Thread Darin Johnson
Great - will write up the doc and link it here; finish PIRK-63, then start this.

On Sep 19, 2016 5:34 PM, "Suneel Marthi"  wrote:

> +100

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

2016-09-19 Thread Suneel Marthi
+100

On Mon, Sep 19, 2016 at 11:24 PM, Ellison Anne Williams <
eawilli...@apache.org> wrote:

> Yes, ES is just an inputformat (like HDFS, Kafka, etc) - we don't need a
> separate submodule.

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

2016-09-19 Thread Ellison Anne Williams
Yes, ES is just an inputformat (like HDFS, Kafka, etc) - we don't need a
separate submodule.

Aside from pirk-core, it seems that we would want to break the responder
implementations out into submodules. This would leave us with something
along the lines of the following (at this point):

pirk-core (encryption, core responder incl. standalone, core querier,
query, inputformat, serialization, utils)
pirk-storm
pirk-mapreduce
pirk-spark
pirk-benchmark
pirk-distributed-test

Once we add other responder implementations, we can add them as submodules
- e.g. for Flink we would have pirk-flink; for Beam, pirk-beam; etc.

We could break 'pirk-core' down further...
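[Editorial note: the submodule layout above would map onto a Maven aggregator POM roughly as follows. This is a sketch only - the coordinates (groupId, version) and exact module names were still under discussion at this point in the thread:]

```xml
<!-- Hypothetical pirk-parent pom.xml: aggregator for the proposed submodules -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.pirk</groupId>
  <artifactId>pirk-parent</artifactId>
  <version>0.2.0-SNAPSHOT</version> <!-- assumed version -->
  <packaging>pom</packaging>

  <modules>
    <module>pirk-core</module>
    <module>pirk-storm</module>
    <module>pirk-mapreduce</module>
    <module>pirk-spark</module>
    <module>pirk-benchmark</module>
    <module>pirk-distributed-test</module>
  </modules>
</project>
```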

On Mon, Sep 19, 2016 at 5:10 PM, Suneel Marthi 
wrote:

> Here's an example from the Flink project of how they go about new features
> or system-breaking API changes; we could start a similar process. The Flink
> guys call these FLIPs (Flink Improvement Proposals), and the Kafka community
> similarly has KIPs.
>
> We could start a PLIP (??? :-) )
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65870673

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

2016-09-19 Thread Suneel Marthi
Here's an example from the Flink project of how they go about new features
or system-breaking API changes; we could start a similar process. The Flink
guys call these FLIPs (Flink Improvement Proposals), and the Kafka community
similarly has KIPs (Kafka Improvement Proposals).

We could start a PLIP (??? :-) )

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65870673


On Mon, Sep 19, 2016 at 11:07 PM, Suneel Marthi 
wrote:

> A shared Google doc would be more convenient than a bunch of Jiras. It's
> easier to comment and add notes that way.

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

2016-09-19 Thread Ellison Anne Williams
Sounds good.

On Mon, Sep 19, 2016 at 4:22 PM, Darin Johnson 
wrote:

> Alright, that was in the spirit of what I was thinking when I did this.
>
> Why don't we take Tim's suggested improvements to my PR (I'll do the
> necessary cleanup) and at the same time just remove the platform argument
> altogether since backwards compatibility isn't upsetting anyone.
>
> We'll still need a command-line option for the launcher for now, as we don't
> have submodules; we can decide between the two choices after we break out
> submodules and improve the config.
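[Editorial note: the command-line option for the launcher that Darin mentions - and the "dumb" ResponderDriver discussed later in the thread - amounts to plain reflective instantiation of a named class. A minimal sketch; `ResponderLauncher`, `StandaloneLauncher`, and `load` are hypothetical names for illustration, not Pirk's actual API:]

```java
// Sketch: the driver instantiates whichever launcher class the CLI names.
// All names below are illustrative assumptions, not Pirk's real API.
public class LauncherSketch {

  /** Contract each backend (storm, spark, mapreduce, ...) would implement. */
  public interface ResponderLauncher {
    void run() throws Exception;
  }

  /** Example backend implementation that a JAR on the classpath could supply. */
  public static class StandaloneLauncher implements ResponderLauncher {
    public void run() {
      System.out.println("standalone responder running");
    }
  }

  /** "Dumb" driver: no platform switch, just reflective instantiation. */
  public static ResponderLauncher load(String className) throws Exception {
    return (ResponderLauncher) Class.forName(className)
        .getDeclaredConstructor().newInstance();
  }

  public static void main(String[] args) throws Exception {
    // e.g. java LauncherSketch com.example.StormLauncher (hypothetical class)
    String cls = args.length > 0 ? args[0] : StandaloneLauncher.class.getName();
    load(cls).run();
  }
}
```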
>
>
> On Sep 19, 2016 12:19 PM, "Tim Ellison"  wrote:
>
> > On 19/09/16 15:46, Darin Johnson wrote:
> > > Hey guys,
> > >
> > > Thanks for looking at the PR, I apologize if it offended anyone's
> > > eyes :).
> > >
> > > I'm glad it generated some discussion about the configuration.  I didn't
> > > really like where things were heading with the config.  However, I didn't
> > > want to create too much scope creep.
> > >
> > > I think any hierarchical config (TypeSafe or YAML) would make things much
> > > more maintainable; the plugin could simply grab the appropriate part of
> > > the config and handle it accordingly.  I'd also cut down the number of
> > > command-line options to only those that change often between runs (like
> > > input/output).
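[Editorial note: for concreteness, a hierarchical layout of the sort Darin describes might look like the following in Typesafe Config's HOCON syntax. Every key here is an illustrative assumption, not an agreed Pirk schema; the point is that each backend plugin reads only its own subtree, and only the frequently-changing values (input/output) stay on the command line:]

```hocon
# Hypothetical pirk.conf - each responder plugin reads only its own subtree
pirk {
  responder {
    platform = storm                  # selects which plugin subtree applies
    input  = "hdfs:///queries/in"     # typically overridden per run via CLI
    output = "hdfs:///results/out"

    storm {
      workers = 4
      topology-name = "pirk-responder"
    }
    spark {
      executor-memory = "4g"
    }
  }
}
```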
> > >
> > >> One option is to make Pirk pluggable, so that a Pirk installation could
> > >> use one or more of these in an extensible fashion by adding JAR files.
> > >> That would still require selecting one by command-line argument.
> > >
> > > An argument for this approach is lambda-architecture setups (say
> > > spark/spark-streaming) where the contents of the jars would be so similar
> > > it seems like too much trouble to create separate jars.
> > >
> > > Happy to continue working on this given some direction on where you'd
> > > like it to go.  Also, it's a bit of a blocker to refactoring the build
> > > into submodules.
> >
> > FWIW my 2c is to not try and fix all the problems in one go, and rather
> > take a compromise on the configurations while you tease apart the
> > submodules into separate source code trees, poms, etc.; then come back
> > and fix the runtime configs.
> >
> > Once the submodules are in place it will open up more work for release
> > engineering and tinkering that can be done in parallel with the config
> > polishing.
> >
> > Just a thought.
> > Tim
> >
> >
> > > On Mon, Sep 19, 2016 at 9:33 AM, Tim Ellison 
> > wrote:
> > >
> > >> On 19/09/16 13:40, Ellison Anne Williams wrote:
> > >>> It seems that it's the same idea as the ResponderLauncher with the
> > >>> service component added to maintain something akin to the 'platform'. I
> > >>> would prefer that we just did away with the platform notion altogether
> > >>> and make the ResponderDriver 'dumb'. We get around needing a
> > >>> platform-aware service by requiring the ResponderLauncher implementation
> > >>> to be passed as a CLI argument to the ResponderDriver.
> > >>
> > >> Let me check I understand what you are saying here.
> > >>
> > >> At the moment, there is a monolithic Pirk that hard-codes how to respond
> > >> using lots of different backends (mapreduce, spark, sparkstreaming,
> > >> storm, standalone), and that is selected by command-line argument.
> > >>
> > >> One option is to make Pirk pluggable, so that a Pirk installation could
> > >> use one or more of these in an extensible fashion by adding JAR files.
> > >> That would still require selecting one by command-line argument.
> > >>
> > >> A second option is to simply pass in the required backend JAR to select
> > >> the particular implementation you choose, as a specific Pirk
> > >> installation doesn't need to use multiple backends simultaneously.
> > >>
> > >> ...and you are leaning towards the second option.  Do I have that
> > >> correct?
> > >>
> > >> Regards,
> > >> Tim
> > >>
> > >>> Am I missing something? Is there a good reason to provide a service by
> > >>> which platforms are registered? I'm open...
> > >>>
> > >>> On Mon, Sep 19, 2016 at 8:28 AM, Tim Ellison 
> > >> wrote:
> > >>>
> >  How about an approach like this?
> > https://github.com/tellison/incubator-pirk/tree/pirk-63
> > 
> >  The "on-ramp" is the driver [1], which calls upon the service to
> find
> > a
> >  plug-in [2] that claims to implement the required platform
> responder,
> >  e.g. [3].
> > 
> >  The list of plug-ins is given in the provider's JAR file, so the
> ones
> > we
> >  provide in Pirk are listed together [4], but if you split these into
> >  modules, or somebody brings their own JAR alongside, these would be
> >  listed in each JAR's services/ directory.
> > 
> >  [1]
> >  https://github.com/tellison/incubator-pirk/blob/pirk-63/
> >  src/main/java/org/apache/pirk/responder/wideskies/
> > 
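[Editorial note: the lookup Tim describes is Java's standard ServiceLoader mechanism - each plug-in JAR ships a `META-INF/services/<interface-name>` file naming its implementations, and the driver scans the registered providers for one claiming the requested platform. A minimal sketch; the interface and method names are assumptions, not the actual pirk-63 branch code. In this single-file form no provider file is registered, so the search legitimately comes back empty:]

```java
import java.util.Optional;
import java.util.ServiceLoader;

// Sketch of the service-lookup approach; names are illustrative, not Pirk's API.
public class PluginSketch {

  /** Each plug-in advertises which platform it handles. */
  public interface ResponderPlugin {
    String platformName();          // e.g. "storm", "spark", "mapreduce"
    void launch() throws Exception;
  }

  /**
   * Scan every ResponderPlugin registered on the classpath (via a
   * META-INF/services/ provider file in each plug-in JAR) and pick the one
   * claiming the requested platform.
   */
  public static Optional<ResponderPlugin> find(String platform) {
    for (ResponderPlugin p : ServiceLoader.load(ResponderPlugin.class)) {
      if (p.platformName().equalsIgnoreCase(platform)) {
        return Optional.of(p);
      }
    }
    return Optional.empty();        // unknown platform, or plug-in JAR missing
  }

  public static void main(String[] args) {
    // With no provider files on the classpath, nothing is found:
    System.out.println(find("storm").isPresent());
  }
}
```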

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

2016-09-19 Thread Darin Johnson
Sure, will do tonight.

On Sep 19, 2016 5:07 PM, "Suneel Marthi"  wrote:

> A shared Google doc would be more convenient than a bunch of Jiras. It's
> easier to comment and add notes that way.

[GitHub] incubator-pirk issue #95: [WIP] [PIRK-69] Improve clarity of group theory po...

2016-09-19 Thread wraydulany
Github user wraydulany commented on the issue:

https://github.com/apache/incubator-pirk/pull/95
  
Please don't close this until we have some feedback from the community
indicating that the changes provide sufficient background to make my
mathematical notation clear (it was previously quite obscure unless you
already knew group-theory terminology off the top of your head).




Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

2016-09-19 Thread Darin Johnson
Suneel, I'll try to put a couple of jiras on it tonight with my thoughts.
Based on my pirk-63 work I was able to pull Spark and Storm out with no
issues.  I was planning to pull them out, then tackle Elasticsearch, then
Hadoop, as it's a little entrenched.  This should keep most PRs to
manageable chunks. I think once that's done, addressing the configs will
make more sense.

I'm open to suggestions. But the hope would be:
Pirk-parent
Pirk-core
Pirk-hadoop
Pirk-storm
Pirk-spark

Pirk-es is a little weird as it's really just an inputformat; it seems like
there's a more general solution here than creating submodules for every
inputformat.

Darin

On Sep 19, 2016 1:00 PM, "Suneel Marthi"  wrote:

>

> Refactor is definitely a first priority.  Is there a design/proposal draft
> that we could comment on about how to go about refactoring the code?  I
> have been trying to keep up with the emails but definitely would have
> missed some.
>
>
>
> On Mon, Sep 19, 2016 at 6:57 PM, Ellison Anne Williams <
> eawilli...@apache.org > wrote:
>
> > Agree - let's leave the config/CLI the way it is for now and tackle that
> > as a subsequent design discussion and PR.
> >
> > Also, I think that we should leave the ResponderDriver and the
> > ResponderProps alone for this PR and push to a subsequent PR (once we
> > decide if and how we would like to delegate each).
> >
> > I vote to remove the 'platform' option and the backwards compatibility
in
> > this PR and proceed with having a ResponderLauncher interface and
forcing
> > its implementation by the ResponderDriver.
> >
> > And, I am not so concerned with having one fat jar vs. multiple jars
right
> > now - to me, at this point, it's a 'nice to have' and not a 'must have'
for
> > Pirk functionality. We do need to break out Pirk into more clearly
defined
> > submodules (which is in progress) - via this re-factor, I think that we
> > will gain some ability to generate multiple jars which is nice.
> >
> >
> >
> > On Mon, Sep 19, 2016 at 12:19 PM, Tim Ellison 
> > wrote:
> >
> > > On 19/09/16 15:46, Darin Johnson wrote:
> > > > Hey guys,
> > > >
> > > > Thanks for looking at the PR, I apologize if it offended anyone's
> > eyes:).
> > > >
> > > > I'm glad it generated some discussion about the configuration.  I
> > didn't
> > > > really like where things were heading with the config.  However,
didn't
> > > > want to create too much scope creep.
> > > >
> > > > I think any hierarchical config (TypeSafe or yaml) would make things
> > much
> > > > more maintainable, the plugin could simply grab the appropriate
part of
> > > the
> > > > config and handle accordingly.  I'd also cut down the number of
command
> > > > line options to only those that change between runs often (like
> > > > input/output)
> > > >
> > > >> One option is to make Pirk pluggable, so that a Pirk installation
> > could
> > > >> use one or more of these in an extensible fashion by adding JAR
files.
> > > >> That would still require selecting one by command-line argument.
> > > >
> > > > An argument for this approach is for lambda architecture approaches
> > (say
> > > > spark/spark-streaming) where the contents of the jars would be so
> > similar
> > > it
> > > > seems like too much trouble to create separate jars.
> > > >
> > > > Happy to continue working on this given some direction on where
you'd
> > > like
> > > > it to go.  Also, it's a bit of a blocker to refactoring the build
into
> > > > submodules.
> > >
> > > FWIW my 2c is to not try and fix all the problems in one go, and
rather
> > > take a compromise on the configurations while you tease apart the
> > > submodules into separate source code trees, poms, etc.; then come back
> > > and fix the runtime configs.
> > >
> > > Once the submodules are in place it will open up more work for release
> > > engineering and tinkering that can be done in parallel with the config
> > > polishing.
> > >
> > > Just a thought.
> > > Tim
> > >
> > >
> > > > On Mon, Sep 19, 2016 at 9:33 AM, Tim Ellison 
> > > wrote:
> > > >
> > > >> On 19/09/16 13:40, Ellison Anne Williams wrote:
> > > >>> It seems that it's the same idea as the ResponderLauncher with the
> > > >> service
> > > >>> component added to maintain something akin to the 'platform'. I
would
> > > >>> prefer that we just did away with the platform notion altogether
and
> > > make
> > > >>> the ResponderDriver 'dumb'. We get around needing a platform-aware
> > > >> service
> > > >>> by requiring the ResponderLauncher implementation to be passed as
a
> > CLI
> > > >> to
> > > >>> the ResponderDriver.
> > > >>
> > > >> Let me check I understand what you are saying here.
> > > >>
> > > >> At the moment, there is a monolithic Pirk that hard codes how to
> > respond
> > > >> using lots of different backends (mapreduce, spark, sparkstreaming,
> > > >> storm , standalone), and that is selected by command-line argument.
> > > >>
> > > >> One 

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

2016-09-19 Thread Darin Johnson
Alright, that was in the spirit of what I was thinking when I did this.

Why don't we take Tim's suggested improvements to my PR (I'll do the
necessary cleanup) and at the same time just remove the platform argument
altogether, since backwards compatibility isn't upsetting anyone.

We'll still need a command-line option for the launcher for now, as we don't
have submodules; we can decide between the two choices after we break out
submodules and improve the config.


On Sep 19, 2016 12:19 PM, "Tim Ellison"  wrote:

> On 19/09/16 15:46, Darin Johnson wrote:
> > Hey guys,
> >
> > Thanks for looking at the PR, I apologize if it offended anyone's eyes:).
> >
> > I'm glad it generated some discussion about the configuration.  I didn't
> > really like where things were heading with the config.  However, didn't
> > want to create too much scope creep.
> >
> > I think any hierarchical config (TypeSafe or yaml) would make things much
> > more maintainable, the plugin could simply grab the appropriate part of
> the
> > config and handle accordingly.  I'd also cut down the number of command
> > line options to only those that change between runs often (like
> > input/output)
> >
> >> One option is to make Pirk pluggable, so that a Pirk installation could
> >> use one or more of these in an extensible fashion by adding JAR files.
> >> That would still require selecting one by command-line argument.
> >
> > An argument for this approach is for lambda architecture approaches (say
> > spark/spark-streaming) where the contents of the jars would be so similar
> it
> > seems like too much trouble to create separate jars.
> >
> > Happy to continue working on this given some direction on where you'd
> like
> > it to go.  Also, it's a bit of a blocker to refactoring the build into
> > submodules.
>
> FWIW my 2c is to not try and fix all the problems in one go, and rather
> take a compromise on the configurations while you tease apart the
> submodules into separate source code trees, poms, etc.; then come back
> and fix the runtime configs.
>
> Once the submodules are in place it will open up more work for release
> engineering and tinkering that can be done in parallel with the config
> polishing.
>
> Just a thought.
> Tim
>
>
> > On Mon, Sep 19, 2016 at 9:33 AM, Tim Ellison 
> wrote:
> >
> >> On 19/09/16 13:40, Ellison Anne Williams wrote:
> >>> It seems that it's the same idea as the ResponderLauncher with the
> >> service
> >>> component added to maintain something akin to the 'platform'. I would
> >>> prefer that we just did away with the platform notion altogether and
> make
> >>> the ResponderDriver 'dumb'. We get around needing a platform-aware
> >> service
> >>> by requiring the ResponderLauncher implementation to be passed as a CLI
> >> to
> >>> the ResponderDriver.
> >>
> >> Let me check I understand what you are saying here.
> >>
> >> At the moment, there is a monolithic Pirk that hard codes how to respond
> >> using lots of different backends (mapreduce, spark, sparkstreaming,
> >> storm , standalone), and that is selected by command-line argument.
> >>
> >> One option is to make Pirk pluggable, so that a Pirk installation could
> >> use one or more of these in an extensible fashion by adding JAR files.
> >> That would still require selecting one by command-line argument.
> >>
> >> A second option is to simply pass in the required backend JAR to select
> >> the particular implementation you choose, as a specific Pirk
> >> installation doesn't need to use multiple backends simultaneously.
> >>
> >> ...and you are leaning towards the second option.  Do I have that
> correct?
> >>
> >> Regards,
> >> Tim
> >>
> >>> Am I missing something? Is there a good reason to provide a service by
> >>> which platforms are registered? I'm open...
> >>>
> >>> On Mon, Sep 19, 2016 at 8:28 AM, Tim Ellison 
> >> wrote:
> >>>
>  How about an approach like this?
> https://github.com/tellison/incubator-pirk/tree/pirk-63
> 
>  The "on-ramp" is the driver [1], which calls upon the service to find
> a
>  plug-in [2] that claims to implement the required platform responder,
>  e.g. [3].
> 
>  The list of plug-ins is given in the provider's JAR file, so the ones
> we
>  provide in Pirk are listed together [4], but if you split these into
>  modules, or somebody brings their own JAR alongside, these would be
>  listed in each JAR's services/ directory.
> 
>  [1]
>  https://github.com/tellison/incubator-pirk/blob/pirk-63/
>  src/main/java/org/apache/pirk/responder/wideskies/
> ResponderDriver.java
>  [2]
>  https://github.com/tellison/incubator-pirk/blob/pirk-63/
>  src/main/java/org/apache/pirk/responder/spi/ResponderPlugin.java
>  [3]
>  https://github.com/tellison/incubator-pirk/blob/pirk-63/
>  src/main/java/org/apache/pirk/responder/wideskies/storm/
>  StormResponder.java
>  [4]
> 

Re: Math deck (was: Re: [GitHub] incubator-pirk pull request #92: [Pirk 67] - Add Slide Deck to the Website D...)

2016-09-19 Thread Walter Ray-Dulany
One explicit vote, one implicit vote for updating/clarifying the slides.

I've created PIRK-69 to improve slide clarity.

Unless this doesn't make sense (tell me), I'll mark PRs on this as WIPs
until I've got some agreement from the community that the slides are clear
enough.

On Mon, Sep 19, 2016 at 3:31 PM, Ryan Carr  wrote:

> Hey Walter / Tim,
>
>   I just wanted to add I had some trouble similar to Tim's when trying to
> understand the Wideskies paper. As a person without a background in group
> theory/theoretical math trying to get my head around this stuff, it was
> very difficult for me to even find with Google what the notations (Z/NZ)
> and (Z/N^2 Z)^x (also called (Z/N^2 Z)* in the Wideskies/Paillier papers)
> meant. Since these concepts are so central to how the algorithm works, I
> think it would be really helpful if we had a footnote the first time those
> are introduced defining that notation with links to a more in-depth
> explanation, or at least a phrase that can be Googled to reliably find it.
>
> Thanks,
> -Ryan Carr
>
> On Mon, Sep 19, 2016 at 1:50 PM Walter Ray-Dulany 
> wrote:
>
> > Correction:
> >
> > ...by the binomial theorem, (1+N)**N = 1 + N*N + other terms
> divisible...
> >
> > I multiplied by N on the left when I ought to have exponentiated
> >
> > Walter
> >
> > On Mon, Sep 19, 2016 at 1:36 PM, Walter Ray-Dulany  >
> > wrote:
> >
> > > Hi Tim,
> > >
> > > Apologies! It's disorienting at first, and most of all when one
> actually
> > > tries to sit down and do a real example. The version on the slides was
> > not
> > > written in one go, I assure you.
> > >
> > > Let's go through, and see what's not working.
> > >
> > > **
> > >
> > > > I'm trying a very simple example.  I'm going to choose, p = 3, q = 5
> > and
> > > a message m = 42
> > >
> > > Already we're in trouble. p and q are fine; but remember that the
> > > plaintext space (let's call it P(N)) is the set of all integers in
> Z/NZ;
> > > that is, it is all numbers m
> > >
> > > 0 <=  m < N
> > >
> > > You can see already that the m you chose is not in the plaintext space.
> > >
> > > Let's pick a new m to continue with; in this case, let's choose your m,
> > > but mod 15 so that it lies in P(N). Thus, our new m going forward shall
> > be
> > >
> > > m = 12
> > >
> > > **
> > >
> > > > I'm going to pick g = 240.  I think it needs to be a multiple of N
> that
> > > is greater than N*N, correct?
> > >
> > > No, and this is important. g has to be an element of (Z/(N squared )Z)*
> > of
> > > order a nonzero multiple of N. That sentence is meaningless unless
> you're
> > > already embedded in the mathematics, so let's go through what it means,
> > bit
> > > by bit.
> > >
> > > g must be:
> > > 1. *an element of (Z/(N squared)Z)**: everything but the outer * on the
> > > right just means that 0 <= g < N*N; in this case that means 0 <= g <
> 225.
> > > The outer * on the right indicates that we only want to take a certain
> > > special kind of g: one that is what we call a *unit* mod N*N; that is,
> it
> > > means that we require that there exist another element 0<= h < N*N such
> > > that g*h = 1 mod N*N. In our current situation, N = p*q is a product of
> > > primes, and so N*N = p**2 * q**2, and we can easily characterize G =
> > (Z/(N
> > > squared)Z)*: G = { 0<= g < N*N such that neither p nor q divide g}. So
> as
> > > long as we pick a g that does not have p or q as a factor, we're good
> for
> > > this condition (this also includes 0, so really all of my "0 <=" in
> this
> > > paragraph could have been "0 < "). Another way to characterize G is to
> > say
> > > that it is the set of integers less than N*N that are relatively prime
> to
> > > N*N.
> > >
> > > 2. *of order a nonzero multiple of N*: this is a little trickier.  The
> > > *order* of an element g of a finite group (which G is) is the least
> > > integer k such that g^k = 1 in G. I'm not going to prove it here, but
> it
> > > turns out that every element of G has finite order (that is, if g is in
> > G,
> > > then there exists a finite non-zero k such that g^k = 1), and that it
> is
> > > less than or equal to the Carmichael number lambda(N*N). That takes
> care
> > of
> > > what 'order' means, and, like I said, order is defined for all g in G.
> > But!
> > > We require a special order. Specifically, we only want g in G such that
> > the
> > > order of g is a non-zero multiple of N. We might ask whether we know
> that
> > > such always exists (a good question, since we require it), and we do!
> > > Here's a quick proof of existence, one tied closely to Wideskies:
> > >
> > > * Take g = 1 + N (I'm going to prove, all at once, that 1+N is in G and
> > > that it has an order that fits the bill).
> > > * Consider g**N: by the binomial theorem, (1+N)**N = 1 + N*N + other
> terms
> > > divisible by N*N. This number is equivalent to 1 mod N*N. QED
> > 

Re: Math deck (was: Re: [GitHub] incubator-pirk pull request #92: [Pirk 67] - Add Slide Deck to the Website D...)

2016-09-19 Thread Ryan Carr
Hey Walter / Tim,

  I just wanted to add I had some trouble similar to Tim's when trying to
understand the Wideskies paper. As a person without a background in group
theory/theoretical math trying to get my head around this stuff, it was
very difficult for me to even find with Google what the notations (Z/NZ)
and (Z/N^2 Z)^x (also called (Z/N^2 Z)* in the Wideskies/Paillier papers)
meant. Since these concepts are so central to how the algorithm works, I
think it would be really helpful if we had a footnote the first time those
are introduced defining that notation with links to a more in-depth
explanation, or at least a phrase that can be Googled to reliably find it.

Thanks,
-Ryan Carr

On Mon, Sep 19, 2016 at 1:50 PM Walter Ray-Dulany 
wrote:

> Correction:
>
> ...by the binomial theorem, (1+N)**N = 1 + N*N + other terms divisible...
>
> I multiplied by N on the left when I ought to have exponentiated
>
> Walter
>
> On Mon, Sep 19, 2016 at 1:36 PM, Walter Ray-Dulany 
> wrote:
>
> > Hi Tim,
> >
> > Apologies! It's disorienting at first, and most of all when one actually
> > tries to sit down and do a real example. The version on the slides was
> not
> > written in one go, I assure you.
> >
> > Let's go through, and see what's not working.
> >
> > **
> >
> > > I'm trying a very simple example.  I'm going to choose, p = 3, q = 5
> and
> > a message m = 42
> >
> > Already we're in trouble. p and q are fine; but remember that the
> > plaintext space (let's call it P(N)) is the set of all integers in Z/NZ;
> > that is, it is all numbers m
> >
> > 0 <=  m < N
> >
> > You can see already that the m you chose is not in the plaintext space.
> >
> > Let's pick a new m to continue with; in this case, let's choose your m,
> > but mod 15 so that it lies in P(N). Thus, our new m going forward shall
> be
> >
> > m = 12
> >
> > **
> >
> > > I'm going to pick g = 240.  I think it needs to be a multiple of N that
> > is greater than N*N, correct?
> >
> > No, and this is important. g has to be an element of (Z/(N squared )Z)*
> of
> > order a nonzero multiple of N. That sentence is meaningless unless you're
> > already embedded in the mathematics, so let's go through what it means,
> bit
> > by bit.
> >
> > g must be:
> > 1. *an element of (Z/(N squared)Z)**: everything but the outer * on the
> > right just means that 0 <= g < N*N; in this case that means 0 <= g < 225.
> > The outer * on the right indicates that we only want to take a certain
> > special kind of g: one that is what we call a *unit* mod N*N; that is, it
> > means that we require that there exist another element 0<= h < N*N such
> > that g*h = 1 mod N*N. In our current situation, N = p*q is a product of
> > primes, and so N*N = p**2 * q**2, and we can easily characterize G =
> (Z/(N
> > squared)Z)*: G = { 0<= g < N*N such that neither p nor q divide g}. So as
> > long as we pick a g that does not have p or q as a factor, we're good for
> > this condition (this also includes 0, so really all of my "0 <=" in this
> > paragraph could have been "0 < "). Another way to characterize G is to
> say
> > that it is the set of integers less than N*N that are relatively prime to
> > N*N.
> >
> > 2. *of order a nonzero multiple of N*: this is a little trickier.  The
> > *order* of an element g of a finite group (which G is) is the least
> > integer k such that g^k = 1 in G. I'm not going to prove it here, but it
> > turns out that every element of G has finite order (that is, if g is in
> G,
> > then there exists a finite non-zero k such that g^k = 1), and that it is
> > less than or equal to the Carmichael number lambda(N*N). That takes care
> of
> > what 'order' means, and, like I said, order is defined for all g in G.
> But!
> > We require a special order. Specifically, we only want g in G such that
> the
> > order of g is a non-zero multiple of N. We might ask whether we know that
> > such always exists (a good question, since we require it), and we do!
> > Here's a quick proof of existence, one tied closely to Wideskies:
> >
> > * Take g = 1 + N (I'm going to prove, all at once, that 1+N is in G and
> > that it has an order that fits the bill).
> > * Consider g**N: by the binomial theorem, (1+N)**N = 1 + N*N + other terms
> > divisible by N*N. This number is equivalent to 1 mod N*N. QED
> >
> > Ok, great, such g exist, and so we can require that we use one of them.
> > But you must be careful: you can't just choose any g in G off the street
> > and expect it will satisfy our requirements. You chose g = 240, which (1)
> > is bigger than N*N, which isn't what we want, and (2) is divisible by N, and
> > so even if we take 240 mod N*N, we still aren't in G, much less of the
> > 'right order' (turns out 240, being not relatively prime to N, can never
> be
> > exponentiated to 1 mod N*N). For now, let's just take the standard
> > Wideskies g, g = 1 + N = 16. If you want to go through 

Re: Math deck (was: Re: [GitHub] incubator-pirk pull request #92: [Pirk 67] - Add Slide Deck to the Website D...)

2016-09-19 Thread Walter Ray-Dulany
Correction:

...by the binomial theorem, (1+N)**N = 1 + N*N + other terms divisible...

I multiplied by N on the left when I ought to have exponentiated

Walter

On Mon, Sep 19, 2016 at 1:36 PM, Walter Ray-Dulany 
wrote:

> Hi Tim,
>
> Apologies! It's disorienting at first, and most of all when one actually
> tries to sit down and do a real example. The version on the slides was not
> written in one go, I assure you.
>
> Let's go through, and see what's not working.
>
> **
>
> > I'm trying a very simple example.  I'm going to choose, p = 3, q = 5 and
> a message m = 42
>
> Already we're in trouble. p and q are fine; but remember that the
> plaintext space (let's call it P(N)) is the set of all integers in Z/NZ;
> that is, it is all numbers m
>
> 0 <=  m < N
>
> You can see already that the m you chose is not in the plaintext space.
>
> Let's pick a new m to continue with; in this case, let's choose your m,
> but mod 15 so that it lies in P(N). Thus, our new m going forward shall be
>
> m = 12
>
> **
>
> > I'm going to pick g = 240.  I think it needs to be a multiple of N that
> is greater than N*N, correct?
>
> No, and this is important. g has to be an element of (Z/(N squared )Z)* of
> order a nonzero multiple of N. That sentence is meaningless unless you're
> already embedded in the mathematics, so let's go through what it means, bit
> by bit.
>
> g must be:
> 1. *an element of (Z/(N squared)Z)**: everything but the outer * on the
> right just means that 0 <= g < N*N; in this case that means 0 <= g < 225.
> The outer * on the right indicates that we only want to take a certain
> special kind of g: one that is what we call a *unit* mod N*N; that is, it
> means that we require that there exist another element 0<= h < N*N such
> that g*h = 1 mod N*N. In our current situation, N = p*q is a product of
> primes, and so N*N = p**2 * q**2, and we can easily characterize G = (Z/(N
> squared)Z)*: G = { 0<= g < N*N such that neither p nor q divide g}. So as
> long as we pick a g that does not have p or q as a factor, we're good for
> this condition (this also includes 0, so really all of my "0 <=" in this
> paragraph could have been "0 < "). Another way to characterize G is to say
> that it is the set of integers less than N*N that are relatively prime to
> N*N.
>
> 2. *of order a nonzero multiple of N*: this is a little trickier.  The
> *order* of an element g of a finite group (which G is) is the least
> integer k such that g^k = 1 in G. I'm not going to prove it here, but it
> turns out that every element of G has finite order (that is, if g is in G,
> then there exists a finite non-zero k such that g^k = 1), and that it is
> less than or equal to the Carmichael number lambda(N*N). That takes care of
> what 'order' means, and, like I said, order is defined for all g in G. But!
> We require a special order. Specifically, we only want g in G such that the
> order of g is a non-zero multiple of N. We might ask whether we know that
> such always exists (a good question, since we require it), and we do!
> Here's a quick proof of existence, one tied closely to Wideskies:
>
> * Take g = 1 + N (I'm going to prove, all at once, that 1+N is in G and
> that it has an order that fits the bill).
> * Consider g**N: by the binomial theorem, (1+N)**N = 1 + N*N + other terms
> divisible by N*N. This number is equivalent to 1 mod N*N. QED
>
> Ok, great, such g exist, and so we can require that we use one of them.
> But you must be careful: you can't just choose any g in G off the street
> and expect it will satisfy our requirements. You chose g = 240, which (1)
> is bigger than N*N, which isn't what we want, and (2) is divisible by N, and
> so even if we take 240 mod N*N, we still aren't in G, much less of the
> 'right order' (turns out 240, being not relatively prime to N, can never be
> exponentiated to 1 mod N*N). For now, let's just take the standard
> Wideskies g, g = 1 + N = 16. If you want to go through this with a
> different g, give it a shot, but make sure it's got the right kind of order.
>
> **
>
> > I'll pick zeta = 21.  I think it needs to be greater than N.
>
> As in point 2, no. We require zeta to be in (Z/NZ)*, which, similar to the
> above, means a number
>
> 0 < zeta < N such that zeta is a unit.
>
> You picked 21; if we take 21 mod N we get zeta = 6, which is not a unit
> (in particular it is not relatively prime to p=3). Let's pick the next
> number greater than 6 which is in (Z/NZ)*, which is
>
> zeta = 7.
>
> **
>
> Let's see what we've got.
>
> ( (16**12)*(7**15) ) mod 225 = 208.
>
> I will leave it as an exercise to check that the decryption of 208 is in
> fact 12.
>
> **
>
> Ok, that's all so far. If the above is still not computing (literally or
> metaphorically), I am available to converse one-on-one either over the
> phone or some other medium (face time or what 
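[Editor's note] Walter's worked example above can be checked mechanically. The sketch below (an illustrative toy in plain Java, not Pirk code) redoes the round trip with his parameters — p = 3, q = 5, N = 15, g = 1 + N = 16, m = 12, zeta = 7 — and then decrypts using the standard Paillier recipe with lambda = lcm(p-1, q-1) = 4 and L(x) = (x-1)/N:

```java
import java.math.BigInteger;

public class PaillierToy {
    public static void main(String[] args) {
        BigInteger N  = BigInteger.valueOf(15);   // p*q = 3*5
        BigInteger N2 = N.multiply(N);            // N^2 = 225
        BigInteger g  = BigInteger.valueOf(16);   // 1 + N, order exactly N in (Z/N^2 Z)*
        BigInteger m  = BigInteger.valueOf(12);   // plaintext, must lie in [0, N)
        BigInteger z  = BigInteger.valueOf(7);    // zeta, a unit mod N (gcd(7, 15) = 1)

        // Encrypt: c = g^m * zeta^N mod N^2
        BigInteger c = g.modPow(m, N2).multiply(z.modPow(N, N2)).mod(N2);
        System.out.println("c = " + c);           // prints c = 208, matching the email

        // Decrypt: m = L(c^lambda mod N^2) * L(g^lambda mod N^2)^-1 mod N,
        // with lambda = lcm(p-1, q-1) = 4 and L(x) = (x-1)/N
        BigInteger lambda = BigInteger.valueOf(4);
        BigInteger u = c.modPow(lambda, N2).subtract(BigInteger.ONE).divide(N);
        BigInteger v = g.modPow(lambda, N2).subtract(BigInteger.ONE).divide(N);
        BigInteger mRec = u.multiply(v.modInverse(N)).mod(N);
        System.out.println("m = " + mRec);        // prints m = 12, recovering the plaintext
    }
}
```

Running this confirms both claims in the thread: the ciphertext is 208, and 208 decrypts back to 12.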

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

2016-09-19 Thread Suneel Marthi
Refactor is definitely a first priority.  Is there a design/proposal draft
that we could comment on about how to go about refactoring the code?  I
have been trying to keep up with the emails but have definitely missed
some.



On Mon, Sep 19, 2016 at 6:57 PM, Ellison Anne Williams <
eawilli...@apache.org> wrote:

> Agree - let's leave the config/CLI the way it is for now and tackle that as
> a subsequent design discussion and PR.
>
> Also, I think that we should leave the ResponderDriver and the
> ResponderProps alone for this PR and push to a subsequent PR (once we
> decide if and how we would like to delegate each).
>
> I vote to remove the 'platform' option and the backwards compatibility in
> this PR and proceed with having a ResponderLauncher interface and forcing
> its implementation by the ResponderDriver.
>
> And, I am not so concerned with having one fat jar vs. multiple jars right
> now - to me, at this point, it's a 'nice to have' and not a 'must have' for
> Pirk functionality. We do need to break out Pirk into more clearly defined
> submodules (which is in progress) - via this re-factor, I think that we
> will gain some ability to generate multiple jars which is nice.
>
>
>
> On Mon, Sep 19, 2016 at 12:19 PM, Tim Ellison 
> wrote:
>
> > On 19/09/16 15:46, Darin Johnson wrote:
> > > Hey guys,
> > >
> > > Thanks for looking at the PR, I apologize if it offended anyone's
> eyes:).
> > >
> > > I'm glad it generated some discussion about the configuration.  I
> didn't
> > > really like where things were heading with the config.  However, didn't
> > > want to create too much scope creep.
> > >
> > > I think any hierarchical config (TypeSafe or yaml) would make things
> much
> > > more maintainable, the plugin could simply grab the appropriate part of
> > the
> > > config and handle accordingly.  I'd also cut down the number of command
> > > line options to only those that change between runs often (like
> > > input/output)
> > >
> > >> One option is to make Pirk pluggable, so that a Pirk installation
> could
> > >> use one or more of these in an extensible fashion by adding JAR files.
> > >> That would still require selecting one by command-line argument.
> > >
> > > An argument for this approach is for lambda architecture approaches
> (say
> > > spark/spark-streaming) where the contents of the jars would be so
> similar
> > it
> > > seems like too much trouble to create separate jars.
> > >
> > > Happy to continue working on this given some direction on where you'd
> > like
> > > it to go.  Also, it's a bit of a blocker to refactoring the build into
> > > submodules.
> >
> > FWIW my 2c is to not try and fix all the problems in one go, and rather
> > take a compromise on the configurations while you tease apart the
> > submodules into separate source code trees, poms, etc.; then come back
> > and fix the runtime configs.
> >
> > Once the submodules are in place it will open up more work for release
> > engineering and tinkering that can be done in parallel with the config
> > polishing.
> >
> > Just a thought.
> > Tim
> >
> >
> > > On Mon, Sep 19, 2016 at 9:33 AM, Tim Ellison 
> > wrote:
> > >
> > >> On 19/09/16 13:40, Ellison Anne Williams wrote:
> > >>> It seems that it's the same idea as the ResponderLauncher with the
> > >> service
> > >>> component added to maintain something akin to the 'platform'. I would
> > >>> prefer that we just did away with the platform notion altogether and
> > make
> > >>> the ResponderDriver 'dumb'. We get around needing a platform-aware
> > >> service
> > >>> by requiring the ResponderLauncher implementation to be passed as a
> CLI
> > >> to
> > >>> the ResponderDriver.
> > >>
> > >> Let me check I understand what you are saying here.
> > >>
> > >> At the moment, there is a monolithic Pirk that hard codes how to
> respond
> > >> using lots of different backends (mapreduce, spark, sparkstreaming,
> > >> storm , standalone), and that is selected by command-line argument.
> > >>
> > >> One option is to make Pirk pluggable, so that a Pirk installation
> could
> > >> use one or more of these in an extensible fashion by adding JAR files.
> > >> That would still require selecting one by command-line argument.
> > >>
> > >> A second option is to simply pass in the required backend JAR to
> select
> > >> the particular implementation you choose, as a specific Pirk
> > >> installation doesn't need to use multiple backends simultaneously.
> > >>
> > >> ...and you are leaning towards the second option.  Do I have that
> > correct?
> > >>
> > >> Regards,
> > >> Tim
> > >>
> > >>> Am I missing something? Is there a good reason to provide a service
> by
> > >>> which platforms are registered? I'm open...
> > >>>
> > >>> On Mon, Sep 19, 2016 at 8:28 AM, Tim Ellison 
> > >> wrote:
> > >>>
> >  How about an approach like this?
> > https://github.com/tellison/incubator-pirk/tree/pirk-63
> > 
> > 
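[Editor's note] The "dumb driver" option quoted above — pass the ResponderLauncher implementation class name on the CLI and have the ResponderDriver instantiate it reflectively — reduces to a few lines of Java. This is only a sketch under assumed names; the interface and class here are illustrative, not Pirk's actual API:

```java
// Hypothetical SPI: each backend module would ship one implementation.
interface ResponderLauncher {
    void run() throws Exception;
}

public class ResponderDriver {
    public static void main(String[] args) throws Exception {
        // e.g. args[0] = "com.example.SparkResponderLauncher" (illustrative name)
        String className = args[0];
        // Reflectively load, cast, and instantiate the requested launcher.
        ResponderLauncher launcher = Class.forName(className)
                .asSubclass(ResponderLauncher.class)
                .getDeclaredConstructor()
                .newInstance();
        launcher.run();
    }
}
```

The driver itself knows nothing about platforms; the user's classpath plus one class name fully determines the backend.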

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

2016-09-19 Thread Darin Johnson
Hey guys,

Thanks for looking at the PR, I apologize if it offended anyone's eyes:).

I'm glad it generated some discussion about the configuration.  I didn't
really like where things were heading with the config.  However, I didn't
want to create too much scope creep.

I think any hierarchical config (Typesafe or YAML) would make things much
more maintainable: the plugin could simply grab the appropriate part of the
config and handle it accordingly.  I'd also cut down the number of
command-line options to only those that change often between runs (like
input/output)
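[Editor's note] Whatever concrete format wins (Typesafe/HOCON, YAML), the operative idea above is that each plugin sees only its own subtree of the configuration. A flat java.util.Properties approximation of that idea, with entirely made-up key names, looks like this:

```java
import java.util.Properties;

public class ConfigSlice {
    // Hypothetical helper: hand each plugin only the properties under its prefix,
    // with the prefix stripped off, so the plugin stays ignorant of the rest.
    static Properties subset(Properties all, String prefix) {
        Properties out = new Properties();
        all.stringPropertyNames().stream()
           .filter(k -> k.startsWith(prefix + "."))
           .forEach(k -> out.setProperty(k.substring(prefix.length() + 1),
                                         all.getProperty(k)));
        return out;
    }

    public static void main(String[] args) {
        Properties all = new Properties();
        all.setProperty("pirk.responder.spark.batch.size", "100");  // invented keys
        all.setProperty("pirk.responder.storm.workers", "4");
        Properties spark = subset(all, "pirk.responder.spark");
        System.out.println(spark.getProperty("batch.size"));        // prints 100
    }
}
```

A hierarchical library (Typesafe Config's getConfig, or a YAML tree) gives the same slicing for free, which is the maintainability win being argued for here.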

>One option is to make Pirk pluggable, so that a Pirk installation could
>use one or more of these in an extensible fashion by adding JAR files.
>That would still require selecting one by command-line argument.

An argument for this approach is lambda-architecture deployments (say
Spark/Spark Streaming), where the contents of the jars would be so similar it
seems like too much trouble to create separate jars.

Happy to continue working on this given some direction on where you'd like
it to go.  Also, it's a bit of a blocker to refactoring the build into
submodules.

Darin




On Mon, Sep 19, 2016 at 9:33 AM, Tim Ellison  wrote:

> On 19/09/16 13:40, Ellison Anne Williams wrote:
> > It seems that it's the same idea as the ResponderLauncher with the
> service
> > component added to maintain something akin to the 'platform'. I would
> > prefer that we just did away with the platform notion altogether and make
> > the ResponderDriver 'dumb'. We get around needing a platform-aware
> service
> > by requiring the ResponderLauncher implementation to be passed as a CLI
> to
> > the ResponderDriver.
>
> Let me check I understand what you are saying here.
>
> At the moment, there is a monolithic Pirk that hard codes how to respond
> using lots of different backends (mapreduce, spark, sparkstreaming,
> storm , standalone), and that is selected by command-line argument.
>
> One option is to make Pirk pluggable, so that a Pirk installation could
> use one or more of these in an extensible fashion by adding JAR files.
> That would still require selecting one by command-line argument.
>
> A second option is to simply pass in the required backend JAR to select
> the particular implementation you choose, as a specific Pirk
> installation doesn't need to use multiple backends simultaneously.
>
> ...and you are leaning towards the second option.  Do I have that correct?
>
> Regards,
> Tim
>
> > Am I missing something? Is there a good reason to provide a service by
> > which platforms are registered? I'm open...
> >
> > On Mon, Sep 19, 2016 at 8:28 AM, Tim Ellison 
> wrote:
> >
> >> How about an approach like this?
> >>https://github.com/tellison/incubator-pirk/tree/pirk-63
> >>
> >> The "on-ramp" is the driver [1], which calls upon the service to find a
> >> plug-in [2] that claims to implement the required platform responder,
> >> e.g. [3].
> >>
> >> The list of plug-ins is given in the provider's JAR file, so the ones we
> >> provide in Pirk are listed together [4], but if you split these into
> >> modules, or somebody brings their own JAR alongside, these would be
> >> listed in each JAR's services/ directory.
> >>
> >> [1]
> >> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> >> src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java
> >> [2]
> >> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> >> src/main/java/org/apache/pirk/responder/spi/ResponderPlugin.java
> >> [3]
> >> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> >> src/main/java/org/apache/pirk/responder/wideskies/storm/
> >> StormResponder.java
> >> [4]
> >> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> >> src/main/services/org.apache.responder.spi.Responder
> >>
> >> I'm not even going to dignify this with a WIP PR, it is far from ready,
> >> so proceed with caution.  There is hopefully enough there to show the
> >> approach, and if it is worth continuing I'm happy to do so.
> >>
> >> Regards,
> >> Tim
> >>
> >>
> >
>


Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

2016-09-19 Thread Tim Ellison
On 19/09/16 13:39, Suneel Marthi wrote:
> The way this PR is now is so similar to IBM SystemML, which is a hack job
> of hurriedly put-together code and something I have often pointed out
> to others as a clear example of "how not to design software".  See
> this gist of an example code snippet from IBM SystemML -
> https://gist.github.com/smarthi/eb848e46621b7444924f

Not sure if you are looking at PR93, or the URL I sent you.

I agree that a large, explicit enumeration via a switch/if statement is
not conducive to extensibility, and that is what PIRK-63 is trying to
address.

> First things for the project:
> 
> 1. Move away from using the java properties (this is so 2002 way of doing
> things) to using TypeSafe style configurations which allow for structured
> properties.
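
For readers unfamiliar with the suggestion, a structured Typesafe-style (HOCON) configuration might look something like the sketch below; the key names are invented for illustration and are not Pirk's actual properties:

```hocon
# Hypothetical structured responder configuration (Typesafe Config / HOCON);
# key names are illustrative, not Pirk's actual property names.
pirk {
  responder {
    launcher = "org.apache.pirk.responder.wideskies.storm.StormResponderLauncher"

    # Responder-specific settings live under their own namespace,
    # rather than in one flat java.util.Properties list.
    storm {
      workers = 4
      topology-name = "pirk-topology"
    }
  }
}
```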

From a quick look, that covers a different level, namely how the
configurations are represented.  First we need to look at the responder
architecture to allow for different responder types to be plugged in to
the Pirk framework.

Each plug-in responder type can figure out how to represent its own configuration.

> 2. From a Responder design, there would be a Responder-impl-class property
> which would be read from TypeSafe config and the appropriate driver class
> invoked.

I've not used TypeSafe-style configurations before.  I think they overlap with
the SystemConfiguration a bit.  It would be interesting to see what changes.

> As an example for the above ^^^ two, please look at at the Oryx 2.0 project
> for reference
> 
> https://github.com/oryxproject/oryx

I'd rather look at a proposed change to Pirk ;-)

Regards,
Tim

> On Mon, Sep 19, 2016 at 2:28 PM, Tim Ellison  wrote:
> 
>> How about an approach like this?
>>https://github.com/tellison/incubator-pirk/tree/pirk-63
>>
>> The "on-ramp" is the driver [1], which calls upon the service to find a
>> plug-in [2] that claims to implement the required platform responder,
>> e.g. [3].
>>
>> The list of plug-ins is given in the provider's JAR file, so the ones we
>> provide in Pirk are listed together [4], but if you split these into
>> modules, or somebody brings their own JAR alongside, these would be
>> listed in each JAR's services/ directory.
>>
>> [1]
>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
>> src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java
>> [2]
>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
>> src/main/java/org/apache/pirk/responder/spi/ResponderPlugin.java
>> [3]
>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
>> src/main/java/org/apache/pirk/responder/wideskies/storm/
>> StormResponder.java
>> [4]
>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
>> src/main/services/org.apache.responder.spi.Responder
>>
>> I'm not even going to dignify this with a WIP PR, it is far from ready,
>> so proceed with caution.  There is hopefully enough there to show the
>> approach, and if it is worth continuing I'm happy to do so.
>>
>> Regards,
>> Tim
>>
>>
> 


[GitHub] incubator-pirk issue #94: Update a number of Pirk's pom dependencies.

2016-09-19 Thread ellisonanne
Github user ellisonanne commented on the issue:

https://github.com/apache/incubator-pirk/pull/94
  
+1 will merge now


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

2016-09-19 Thread Tim Ellison
How about an approach like this?
   https://github.com/tellison/incubator-pirk/tree/pirk-63

The "on-ramp" is the driver [1], which calls upon the service to find a
plug-in [2] that claims to implement the required platform responder,
e.g. [3].

The list of plug-ins is given in the provider's JAR file, so the ones we
provide in Pirk are listed together [4], but if you split these into
modules, or somebody brings their own JAR alongside, these would be
listed in each JAR's services/ directory.

[1]
https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java
[2]
https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/java/org/apache/pirk/responder/spi/ResponderPlugin.java
[3]
https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/java/org/apache/pirk/responder/wideskies/storm/StormResponder.java
[4]
https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/services/org.apache.responder.spi.Responder
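
As a rough illustration of the discovery step this approach relies on, here is a minimal sketch using the JDK's `ServiceLoader`; the interface and class names are hypothetical stand-ins, not the actual SPI in the branch:

```java
import java.util.ServiceLoader;

public class ResponderLoaderSketch
{
  // Hypothetical SPI: each plug-in names the platform it implements.
  public interface ResponderPlugin
  {
    String getPlatformName();

    void run() throws Exception;
  }

  // Scan provider JARs on the classpath for a plug-in claiming this platform.
  static ResponderPlugin findResponder(String platform)
  {
    for (ResponderPlugin plugin : ServiceLoader.load(ResponderPlugin.class))
    {
      if (plugin.getPlatformName().equalsIgnoreCase(platform))
      {
        return plugin;
      }
    }
    return null;
  }

  public static void main(String[] args)
  {
    // With no provider entries under META-INF/services, discovery finds nothing.
    ResponderPlugin plugin = findResponder("storm");
    System.out.println(plugin == null ? "no responder registered for storm" : "launching storm responder");
  }
}
```

Dropping a provider JAR with a matching `services/` entry on the classpath is then all that is needed to make a new platform selectable.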

I'm not even going to dignify this with a WIP PR, it is far from ready,
so proceed with caution.  There is hopefully enough there to show the
approach, and if it is worth continuing I'm happy to do so.

Regards,
Tim



[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

2016-09-19 Thread DarinJ
Github user DarinJ commented on a diff in the pull request:

https://github.com/apache/incubator-pirk/pull/93#discussion_r79377189
  
--- Diff: 
src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java ---
@@ -49,83 +41,111 @@
 public class ResponderDriver
 {
   private static final Logger logger = 
LoggerFactory.getLogger(ResponderDriver.class);
+  // ClassNames to instantiate Platforms using the platform CLI
+  private final static String MAPREDUCE_LAUNCHER = 
"org.apache.pirk.responder.wideskies.mapreduce.MapReduceResponderLauncher";
+  private final static String SPARK_LAUNCHER = 
"org.apache.pirk.responder.wideskies.spark.SparkResponderLauncher";
+  private final static String SPARKSTREAMING_LAUNCHER = 
"org.apache.pirk.responder.wideskies.spark.streaming.SparkStreamingResponderLauncher";
+  private final static String STANDALONE_LAUNCHER = 
"org.apache.pirk.responder.wideskies.standalone.StandaloneResponderLauncher";
+  private final static String STORM_LAUNCHER = 
"org.apache.pirk.responder.wideskies.storm.StormResponderLauncher";
 
   private enum Platform
   {
 MAPREDUCE, SPARK, SPARKSTREAMING, STORM, STANDALONE, NONE
   }
 
-  public static void main(String[] args) throws Exception
+  private static void launch(String launcherClassName)
+  {
+logger.info("Launching Responder with {}", launcherClassName);
+try
+{
+  Class<?> clazz = Class.forName(launcherClassName);
+  if (ResponderLauncher.class.isAssignableFrom(clazz))
+  {
+Object launcherInstance = clazz.newInstance();
+Method m = launcherInstance.getClass().getDeclaredMethod("run");
+m.invoke(launcherInstance);
+  }
+  else
+  {
+logger.error("Class {} does not implement ResponderLauncher", 
launcherClassName);
+  }
+}
+catch (ClassNotFoundException e)
+{
+  logger.error("Class {} not found, check launcher property", launcherClassName);
+}
+catch (NoSuchMethodException e)
+{
+  logger.error("Run method not found in {}", launcherClassName);
+}
+catch (InvocationTargetException e)
+{
+  logger.error("Run method of {} could not be invoked: {}", launcherClassName, e);
+}
+catch (InstantiationException e)
+{
+  logger.error("Instantiation exception within {}: {}", launcherClassName, e);
+}
+catch (IllegalAccessException e)
+{
+  logger.error("IllegalAccess Exception {}", e);
+}
+  }
+
+  public static void main(String[] args)
   {
 ResponderCLI responderCLI = new ResponderCLI(args);
 
 // For handling System.exit calls from Spark Streaming
 System.setSecurityManager(new SystemExitManager());
 
-Platform platform = Platform.NONE;
-String platformString = 
SystemConfiguration.getProperty(ResponderProps.PLATFORM);
-try
-{
-  platform = Platform.valueOf(platformString.toUpperCase());
-} catch (IllegalArgumentException e)
+String launcherClassName = 
SystemConfiguration.getProperty(ResponderProps.LAUNCHER);
+if (launcherClassName != null)
 {
-  logger.error("platform " + platformString + " not found.");
+  launch(launcherClassName);
 }
-
-logger.info("platform = " + platform);
-switch (platform)
+else
 {
-  case MAPREDUCE:
-logger.info("Launching MapReduce ResponderTool:");
-
-ComputeResponseTool pirWLTool = new ComputeResponseTool();
-ToolRunner.run(pirWLTool, new String[] {});
-break;
-
-  case SPARK:
-logger.info("Launching Spark ComputeResponse:");
-
-ComputeResponse computeResponse = new 
ComputeResponse(FileSystem.get(new Configuration()));
-computeResponse.performQuery();
-break;
-
-  case SPARKSTREAMING:
-logger.info("Launching Spark ComputeStreamingResponse:");
-
-ComputeStreamingResponse computeSR = new 
ComputeStreamingResponse(FileSystem.get(new Configuration()));
-try
-{
-  computeSR.performQuery();
-} catch (SystemExitException e)
-{
-  // If System.exit(0) is not caught from Spark Streaming,
-  // the application will complete with a 'failed' status
-  logger.info("Exited with System.exit(0) from Spark Streaming");
-}
-
-// Teardown the context
-computeSR.teardown();
-break;
-
-  case STORM:
-logger.info("Launching Storm PirkTopology:");
-PirkTopology.runPirkTopology();
-break;
-
-  case 

[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

2016-09-19 Thread DarinJ
Github user DarinJ commented on a diff in the pull request:

https://github.com/apache/incubator-pirk/pull/93#discussion_r79377024
  
--- Diff: 
src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java ---
@@ -49,83 +41,111 @@
 public class ResponderDriver
 {
   private static final Logger logger = 
LoggerFactory.getLogger(ResponderDriver.class);
+  // ClassNames to instantiate Platforms using the platform CLI
+  private final static String MAPREDUCE_LAUNCHER = 
"org.apache.pirk.responder.wideskies.mapreduce.MapReduceResponderLauncher";
+  private final static String SPARK_LAUNCHER = 
"org.apache.pirk.responder.wideskies.spark.SparkResponderLauncher";
+  private final static String SPARKSTREAMING_LAUNCHER = 
"org.apache.pirk.responder.wideskies.spark.streaming.SparkStreamingResponderLauncher";
+  private final static String STANDALONE_LAUNCHER = 
"org.apache.pirk.responder.wideskies.standalone.StandaloneResponderLauncher";
+  private final static String STORM_LAUNCHER = 
"org.apache.pirk.responder.wideskies.storm.StormResponderLauncher";
 
--- End diff --

Yes, I added this for backwards compatibility. Maybe overkill this early in 
the game, but didn't want to break anyone's scripts/bash history too quickly.




[GitHub] incubator-pirk issue #93: WIP-Pirk 63-DO NOT MERGE

2016-09-19 Thread ellisonanne
Github user ellisonanne commented on the issue:

https://github.com/apache/incubator-pirk/pull/93
  
A few other comments for discussion:

First, I am not opposed to having separate ResponderDrivers for each 
responder, but let's think it through and see if we really need to go down that 
path. 

I think that the main concern with having a single ResponderDriver vs. 
delegating the ResponderDrivers to each responder is the bloating of the main 
CLI and ResponderProps. Other than keeping the CLI/Props under control, I can't 
see a particularly good, material (i.e. not stylistic) reason to delegate now 
that we are rolling in a ResponderLauncher.  

The ResponderProps can go ahead and be delegated down into the specific 
responders independently of whether or not the ResponderDrivers get delegated. 
The ResponderLauncher for each responder can be responsible for implementing 
the 'validateResponderProperties' method that is currently in the central 
ResponderProps - since the CLI loads the properties from the properties files 
into SystemConfiguration, it will not require passing anything extra to the 
launchers.
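
To sketch what that delegation might look like (all names here are illustrative stand-ins for Pirk's actual API, and the property key is assumed for the example):

```java
import java.util.Properties;

public class LauncherPropsSketch
{
  // Hypothetical shape of a launcher that owns its own property validation.
  interface ResponderLauncher
  {
    void validateResponderProperties(Properties props);

    void run() throws Exception;
  }

  static class StandaloneLauncher implements ResponderLauncher
  {
    @Override
    public void validateResponderProperties(Properties props)
    {
      // Check only the properties this responder cares about.
      if (!props.containsKey("pir.outputFile"))
      {
        throw new IllegalArgumentException("standalone responder requires pir.outputFile");
      }
    }

    @Override
    public void run()
    {
      System.out.println("standalone responder running");
    }
  }

  public static void main(String[] args) throws Exception
  {
    Properties props = new Properties();
    props.setProperty("pir.outputFile", "/tmp/out");

    ResponderLauncher launcher = new StandaloneLauncher();
    launcher.validateResponderProperties(props); // fails fast if misconfigured
    launcher.run();
  }
}
```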

One design alternative to breaking out into specific ResponderDrivers 
(which I am not opposed to BTW) would be to only allow the core properties in 
the main CLI and force everything else to be specified via properties files. 
This is somewhat limiting in some (contrived) cases that I can think of, but it 
would allow for a main CLI and prevent the bloat since responder-specific CLI 
options would not need to be added to the main CLI. 

Thoughts?




[GitHub] incubator-pirk issue #93: WIP-Pirk 63-DO NOT MERGE

2016-09-19 Thread ellisonanne
Github user ellisonanne commented on the issue:

https://github.com/apache/incubator-pirk/pull/93
  
+1 - looks good so far. 

One item for consideration: I am in favor of *not* providing backwards 
compatibility with the 'platform' option at this point, i.e. removing it 
altogether in favor of just the launcher. Since we just completed our first 
release, I think that we can go ahead and change the API - this would only 
require an argument change in current command lines and a deployment of the new 
jar - completely doable. 




Re: Intermittent problems with PIRK-35

2016-09-19 Thread Ellison Anne Williams
I have not seen/experienced similar issues, but I am fine with rolling it
back...

On Mon, Sep 19, 2016 at 6:05 AM, Tim Ellison  wrote:

> I have intermittent failures caused by
> "PIRK-35 Execute Tests in Parallel"
>
> such as
>
> 
>  ---
>   T E S T S
>  ---
>  Error occurred during initialization of VM
>  java.lang.OutOfMemoryError: unable to create new native thread
>  Error occurred during initialization of VM
>  java.lang.OutOfMemoryError: unable to create new native thread
>  Running org.apache.pirk.schema.data.LoadDataSchemaTest
>  Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.434
> sec - in org.apache.pirk.schema.data.LoadDataSchemaTest
>  Running org.apache.pirk.schema.query.LoadQuerySchemaTest
> 
>
> and
>
> 
>  Error occurred during initialization of VM
>  Cannot create VM thread. Out of system resources.
>  Error occurred during initialization of VM
>  java.lang.OutOfMemoryError: unable to create new native thread
> at java.lang.Thread.start0(Native Method)
> at java.lang.Thread.start(Thread.java:714)
> at java.lang.ref.Finalizer.(Finalizer.java:226)
>  Running org.apache.pirk.schema.data.LoadDataSchemaTest
>
> 
>
> My laptop is an 8-way machine with 24GB RAM, without ulimits.
> I've been running with Oracle Java 8 b102, which defaults to
> -XX:InitialHeapSize=387619456 -XX:MaxHeapSize=6201911296, and IBM Java 8
> SR3fp10.
>
> Spinning up all tests simultaneously, especially with the
> new KafkaStorm tests, is too much.
>
> I'm working around it by deleting the PIRK-35 changes, and I get a full
> test run in 2mins.
>
> Do others see similar problems?  Any objection to me reverting PIRK-35 now
> that the tests are running faster anyway?
>
> Regards,
> Tim
>


Intermittent problems with PIRK-35

2016-09-19 Thread Tim Ellison
I have intermittent failures caused by
"PIRK-35 Execute Tests in Parallel"

such as


 ---
  T E S T S
 ---
 Error occurred during initialization of VM
 java.lang.OutOfMemoryError: unable to create new native thread
 Error occurred during initialization of VM
 java.lang.OutOfMemoryError: unable to create new native thread
 Running org.apache.pirk.schema.data.LoadDataSchemaTest
 Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.434
sec - in org.apache.pirk.schema.data.LoadDataSchemaTest
 Running org.apache.pirk.schema.query.LoadQuerySchemaTest


and


 Error occurred during initialization of VM
 Cannot create VM thread. Out of system resources.
 Error occurred during initialization of VM
 java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at java.lang.ref.Finalizer.(Finalizer.java:226)
 Running org.apache.pirk.schema.data.LoadDataSchemaTest



My laptop is an 8-way machine with 24GB RAM, without ulimits.
I've been running with Oracle Java 8 b102, which defaults to
-XX:InitialHeapSize=387619456 -XX:MaxHeapSize=6201911296, and IBM Java 8
SR3fp10.

Spinning up all tests simultaneously, especially with the
new KafkaStorm tests, is too much.

I'm working around it by deleting the PIRK-35 changes, and I get a full
test run in 2mins.
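
If reverting feels too blunt, another option (untested against Pirk's pom; the fork count and heap values below are illustrative) would be to cap Surefire's parallelism rather than remove it:

```xml
<!-- Sketch: limit forked test JVMs and per-fork heap so parallel test runs
     don't exhaust native threads; values here are illustrative. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <forkCount>2</forkCount>
    <reuseForks>true</reuseForks>
    <argLine>-Xmx1g</argLine>
  </configuration>
</plugin>
```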

Do others see similar problems?  Any objection to me reverting PIRK-35 now
that the tests are running faster anyway?

Regards,
Tim


[GitHub] incubator-pirk issue #94: Update a number of Pirk's pom dependencies.

2016-09-19 Thread smarthi
Github user smarthi commented on the issue:

https://github.com/apache/incubator-pirk/pull/94
  
+1 to merge




[GitHub] incubator-pirk pull request #94: Update a number of Pirk's pom dependencies.

2016-09-19 Thread tellison
GitHub user tellison opened a pull request:

https://github.com/apache/incubator-pirk/pull/94

Update a number of Pirk's pom dependencies.

 - move Pirk to later versions of JMH, Hadoop, commons-math3, commons-net, 
json-simple, jacoco-maven-plugin, coveralls-maven-plugin, Surefire, 
maven-jar-plugin, and maven-release-plugin.

 - Note that Storm version 1.0.1 passes Pirk tests, but Storm version 1.0.2 
fails with NoClassDefFoundError: 
org/apache/kafka/common/protocol/SecurityProtocol

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tellison/incubator-pirk versions

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-pirk/pull/94.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #94


commit 43b682aac24ae2e3998907ccc2a3eb695e7c2cb3
Author: Tim Ellison 
Date:   2016-09-15T13:58:32Z

Update a number of pom dependencies.

 - move Pirk to later versions of JMH, Hadoop, Storm, commons-math3,
commons-net, json-simple, jacoco-maven-plugin, coveralls-maven-plugin,
Surefire, maven-jar-plugin, and maven-release-plugin.

commit 6fe4241de34879f5b3420cb287947ad42aa481aa
Author: Tim Ellison 
Date:   2016-09-15T14:21:06Z

Revert Storm version change

 - Storm version 1.0.1 passes Pirk tests, but Storm version 1.0.2 fails
with NoClassDefFoundError:
org/apache/kafka/common/protocol/SecurityProtocol






Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

2016-09-19 Thread Tim Ellison
Darin,

Unless I'm reading this wrong, the patch still has many references from
the ResponderDriver to the set of currently supported responders.  This
code will have to change when somebody wants to add a new responder type.

I thought the plan was to have the responder driver agnostic of the
responders available?  So, for example, having the driver maintain a
list of responders by name, and letting people specify the name on the
command line.

Each responder would then be responsible for implementing a standardised
interface, and registering themselves with the driver by name.

In that model the responders would each know about (a) the driver, and
how to register themselves by name, and (b) implement a standard
life-cycle for building a response.

The driver would be responsible for (a) collecting and maintaining the
registrations of any responder being loaded, and (b) invoking the
correct responder based on user selection.

Make sense?

I can hack something together to show what I mean.
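
As a rough illustration of that registration model (all names hypothetical, not code from the branch):

```java
import java.util.HashMap;
import java.util.Map;

public class ResponderRegistrySketch
{
  // Hypothetical standardised life-cycle each responder would implement.
  interface Responder
  {
    void buildResponse() throws Exception;
  }

  // Driver-side registry, keyed by the name users pass on the command line.
  private static final Map<String, Responder> REGISTRY = new HashMap<>();

  static void register(String name, Responder responder)
  {
    REGISTRY.put(name.toLowerCase(), responder);
  }

  static void launch(String name) throws Exception
  {
    Responder responder = REGISTRY.get(name.toLowerCase());
    if (responder == null)
    {
      throw new IllegalArgumentException("no responder registered as '" + name + "'");
    }
    responder.buildResponse();
  }

  public static void main(String[] args) throws Exception
  {
    // A responder registers itself by name (inline here for brevity)...
    register("standalone", () -> System.out.println("standalone responder building response"));
    // ...and the driver invokes it based on the user's selection.
    launch("standalone");
  }
}
```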

Regards,
Tim



On 19/09/16 07:05, DarinJ wrote:
> GitHub user DarinJ opened a pull request:
> 
> https://github.com/apache/incubator-pirk/pull/93
> 
> WIP-Pirk 63-DO NOT MERGE
> 
> This is a WIP for 
> [PIRK-63](https://issues.apache.org/jira/browse/PIRK-63) to open the door to 
> other responders without having to modify the actual code of Pirk.  It's 
> submitted for feedback only, please DO NOT MERGE.  I've only tested 
> standalone mode.
> 
> It deprecates the "platform" CLI option in favor of the "launcher" option 
> which is the name of a class implementing the `ResponderLauncher` interface 
> whose run method will be invoked via reflection.  This allows a developer of 
> a different responder to merely place a jar on the classpath and specify the 
> appropriate `ResponderLauncher` via the CLI.
> 
> The "platform" CLI option is still made available.  However, I removed 
> the explicit dependencies in favor of using reflection.  This was done in 
> anticipation of refactoring the build into submodules, though this does 
> admittedly make the code more fragile.
> 
> ResponderDriver had no unit tests, and unfortunately I saw no good way to 
> create good ones for this particular change, especially as it required 
> multiple frameworks to run.
> 
> I should say that another possible route here is to have each framework 
> responder implement their own ResponderDriver.  We could provide some 
> utilities to check the minimum Pirk required options are set, but leave the 
> rest to the implementation of the responder.  It would clean up the 
> ResponderCLI and ResponderProps which are rather bloated and might continue 
> to grow if left unchecked.
> 
> You can merge this pull request into a Git repository by running:
> 
> $ git pull https://github.com/DarinJ/incubator-pirk Pirk-63
> 
> Alternatively you can review and apply these changes as the patch at:
> 
> https://github.com/apache/incubator-pirk/pull/93.patch
> 
> To close this pull request, make a commit to your master/trunk branch
> with (at least) the following in the commit message:
> 
> This closes #93
> 
> 
> commit dda458bb2ae77fd9e3dc686d17dd8b49095b3395
> Author: Darin Johnson 
> Date:   2016-09-13T03:19:12Z
> 
> This is a WIP for 
> [PIRK-63](https://issues.apache.org/jira/browse/PIRK-63) to open the door to 
> other responders without having to modify the actual code of Pirk.  It's 
> submitted for feedback only, please DO NOT MERGE.
> 
> It deprecates the "platform" CLI option in favor of the "launcher" option 
> which is the name of a class implementing the `ResponderLauncher` interface 
> which will invoke the run method via reflection.  This allows a developer of 
> a different responder to merely place a jar on the classpath and specify the 
> appropriate `ResponderLauncher` on the classpath.
> 
> The "platform" CLI option is still made available.  However, I removed 
> the explicit dependencies in favor of using reflection.  This was done in 
> anticipation of refactoring the build into submodules, though this does 
> admittedly make the code more fragile.
> 
> 
> 
> 
> 


[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

2016-09-19 Thread tellison
Github user tellison commented on a diff in the pull request:

https://github.com/apache/incubator-pirk/pull/93#discussion_r79352002
  
--- Diff: 
src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java ---
@@ -49,83 +41,111 @@
 public class ResponderDriver
 {
   private static final Logger logger = 
LoggerFactory.getLogger(ResponderDriver.class);
+  // ClassNames to instantiate Platforms using the platform CLI
+  private final static String MAPREDUCE_LAUNCHER = 
"org.apache.pirk.responder.wideskies.mapreduce.MapReduceResponderLauncher";
+  private final static String SPARK_LAUNCHER = 
"org.apache.pirk.responder.wideskies.spark.SparkResponderLauncher";
+  private final static String SPARKSTREAMING_LAUNCHER = 
"org.apache.pirk.responder.wideskies.spark.streaming.SparkStreamingResponderLauncher";
+  private final static String STANDALONE_LAUNCHER = 
"org.apache.pirk.responder.wideskies.standalone.StandaloneResponderLauncher";
+  private final static String STORM_LAUNCHER = 
"org.apache.pirk.responder.wideskies.storm.StormResponderLauncher";
 
--- End diff --

I'm confused by this; I thought the goal of PIRK-63 was to avoid having to 
change the ResponderDriver each time a new responder type is introduced?




[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

2016-09-19 Thread tellison
Github user tellison commented on a diff in the pull request:

https://github.com/apache/incubator-pirk/pull/93#discussion_r79351660
  
--- Diff: 
src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java ---
@@ -49,83 +41,111 @@
 public class ResponderDriver
 {
   private static final Logger logger = 
LoggerFactory.getLogger(ResponderDriver.class);
+  // ClassNames to instantiate Platforms using the platform CLI
+  private final static String MAPREDUCE_LAUNCHER = 
"org.apache.pirk.responder.wideskies.mapreduce.MapReduceResponderLauncher";
+  private final static String SPARK_LAUNCHER = 
"org.apache.pirk.responder.wideskies.spark.SparkResponderLauncher";
+  private final static String SPARKSTREAMING_LAUNCHER = 
"org.apache.pirk.responder.wideskies.spark.streaming.SparkStreamingResponderLauncher";
+  private final static String STANDALONE_LAUNCHER = 
"org.apache.pirk.responder.wideskies.standalone.StandaloneResponderLauncher";
+  private final static String STORM_LAUNCHER = 
"org.apache.pirk.responder.wideskies.storm.StormResponderLauncher";
 
   private enum Platform
   {
 MAPREDUCE, SPARK, SPARKSTREAMING, STORM, STANDALONE, NONE
   }
 
-  public static void main(String[] args) throws Exception
+  private static void launch(String launcherClassName)
+  {
+logger.info("Launching Responder with {}", launcherClassName);
+try
+{
+  Class<?> clazz = Class.forName(launcherClassName);
+  if (ResponderLauncher.class.isAssignableFrom(clazz))
+  {
+Object launcherInstance = clazz.newInstance();
+Method m = launcherInstance.getClass().getDeclaredMethod("run");
+m.invoke(launcherInstance);
+  }
+  else
+  {
+logger.error("Class {} does not implement ResponderLauncher", 
launcherClassName);
+  }
+}
+catch (ClassNotFoundException e)
+{
+  logger.error("Class {} not found, check launcher property", launcherClassName);
+}
+catch (NoSuchMethodException e)
+{
+  logger.error("Run method not found in {}", launcherClassName);
+}
+catch (InvocationTargetException e)
+{
+  logger.error("Run method of {} could not be invoked: {}", launcherClassName, e);
+}
+catch (InstantiationException e)
+{
+  logger.error("Instantiation exception within {}: {}", launcherClassName, e);
+}
+catch (IllegalAccessException e)
+{
+  logger.error("IllegalAccess Exception {}", e);
+}
+  }
+
+  public static void main(String[] args)
   {
 ResponderCLI responderCLI = new ResponderCLI(args);
 
 // For handling System.exit calls from Spark Streaming
 System.setSecurityManager(new SystemExitManager());
 
-Platform platform = Platform.NONE;
-String platformString = 
SystemConfiguration.getProperty(ResponderProps.PLATFORM);
-try
-{
-  platform = Platform.valueOf(platformString.toUpperCase());
-} catch (IllegalArgumentException e)
+String launcherClassName = 
SystemConfiguration.getProperty(ResponderProps.LAUNCHER);
+if (launcherClassName != null)
 {
-  logger.error("platform " + platformString + " not found.");
+  launch(launcherClassName);
 }
-
-logger.info("platform = " + platform);
-switch (platform)
+else
 {
-  case MAPREDUCE:
-logger.info("Launching MapReduce ResponderTool:");
-
-ComputeResponseTool pirWLTool = new ComputeResponseTool();
-ToolRunner.run(pirWLTool, new String[] {});
-break;
-
-  case SPARK:
-logger.info("Launching Spark ComputeResponse:");
-
-ComputeResponse computeResponse = new 
ComputeResponse(FileSystem.get(new Configuration()));
-computeResponse.performQuery();
-break;
-
-  case SPARKSTREAMING:
-logger.info("Launching Spark ComputeStreamingResponse:");
-
-ComputeStreamingResponse computeSR = new 
ComputeStreamingResponse(FileSystem.get(new Configuration()));
-try
-{
-  computeSR.performQuery();
-} catch (SystemExitException e)
-{
-  // If System.exit(0) is not caught from Spark Streaming,
-  // the application will complete with a 'failed' status
-  logger.info("Exited with System.exit(0) from Spark Streaming");
-}
-
-// Teardown the context
-computeSR.teardown();
-break;
-
-  case STORM:
-logger.info("Launching Storm PirkTopology:");
-PirkTopology.runPirkTopology();
-break;
-
-  case 

[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

2016-09-19 Thread DarinJ
GitHub user DarinJ opened a pull request:

https://github.com/apache/incubator-pirk/pull/93

WIP-Pirk 63-DO NOT MERGE

This is a WIP for [PIRK-63](https://issues.apache.org/jira/browse/PIRK-63) 
to open the door to other responders without having to modify the actual code 
of Pirk.  It's submitted for feedback only, please DO NOT MERGE.  I've only 
tested standalone mode.

It deprecates the "platform" CLI option in favor of the "launcher" option 
which is the name of a class implementing the `ResponderLauncher` interface 
whose run method will be invoked via reflection.  This allows a developer of a 
different responder to merely place a jar on the classpath and specify the 
appropriate `ResponderLauncher` via the CLI.

The "platform" CLI option is still made available.  However, I removed the 
explicit dependencies in favor of using reflection.  This was done in 
anticipation of refactoring the build into submodules, though this does 
admittedly make the code more fragile.

ResponderDriver had no unit tests, and unfortunately I saw no good way to 
create meaningful ones for this change, especially as it would require multiple 
frameworks to run.

I should say that another possible route here is to have each framework 
responder implement its own ResponderDriver.  We could provide utilities to 
check that the minimum required Pirk options are set, but leave the rest to 
each responder's implementation.  This would clean up ResponderCLI and 
ResponderProps, which are rather bloated and might continue to grow if left 
unchecked.
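
One way such a shared utility could look, as a hedged sketch: each framework's driver validates the common options up front and handles its own beyond that.  The option names below are illustrative placeholders, not Pirk's actual property keys:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Sketch of the suggested shared utility: verify the minimum Pirk options
// before a framework-specific ResponderDriver handles the rest.  The
// REQUIRED key names are illustrative, not Pirk's real property keys.
public class ResponderPropsCheck
{
  private static final List<String> REQUIRED =
      Arrays.asList("dataInputFormat", "inputData", "outputFile", "queryInput");

  // Returns the required options missing from the given properties;
  // an empty list means the minimum configuration is present.
  public static List<String> missingOptions(Properties props)
  {
    List<String> missing = new ArrayList<>();
    for (String key : REQUIRED)
    {
      if (props.getProperty(key) == null)
      {
        missing.add(key);
      }
    }
    return missing;
  }
}
```

Each responder's driver would fail fast if the list is non-empty, keeping the shared surface small while framework-specific options live with the framework module.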

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/DarinJ/incubator-pirk Pirk-63

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-pirk/pull/93.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #93


commit dda458bb2ae77fd9e3dc686d17dd8b49095b3395
Author: Darin Johnson 
Date:   2016-09-13T03:19:12Z

This is a WIP for [PIRK-63](https://issues.apache.org/jira/browse/PIRK-63) 
to open the door to other responders without having to modify the actual code 
of Pirk.  It's submitted for feedback only; please DO NOT MERGE.

It deprecates the "platform" CLI option in favor of a "launcher" option, 
which names a class implementing the `ResponderLauncher` interface; the driver 
instantiates that class via reflection and invokes its run method.  This allows 
the developer of a new responder to simply place a jar on the classpath and 
specify the appropriate `ResponderLauncher` implementation.

The "platform" CLI option is still available.  However, I removed the 
explicit dependencies in favor of reflection.  This was done in anticipation 
of refactoring the build into submodules, though it admittedly makes the code 
more fragile.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---