Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE
Great, will write up the doc link here, finish PIRK-63, then start this.

On Sep 19, 2016 5:34 PM, "Suneel Marthi" wrote:
> +100
Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE
+100

On Mon, Sep 19, 2016 at 11:24 PM, Ellison Anne Williams <eawilli...@apache.org> wrote:
> Yes, ES is just an inputformat (like HDFS, Kafka, etc) - we don't need a
> separate submodule.
Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE
Yes, ES is just an inputformat (like HDFS, Kafka, etc) - we don't need a separate submodule.

Aside from pirk-core, it seems that we would want to break the responder implementations out into submodules. This would leave us with something along the lines of the following (at this point):

pirk-core (encryption, core responder incl. standalone, core querier, query, inputformat, serialization, utils)
pirk-storm
pirk-mapreduce
pirk-spark
pirk-benchmark
pirk-distributed-test

Once we add other responder implementations, we can add them as submodules - i.e. for Flink, we would have pirk-flink; for Beam, pirk-beam; etc.

We could break 'pirk-core' down further...

On Mon, Sep 19, 2016 at 5:10 PM, Suneel Marthi wrote:
> Here's an example from the Flink project for how they go about new features
> or system-breaking API changes; we could start a similar process.
Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE
Here's an example from the Flink project for how they go about new features or system-breaking API changes; we could start a similar process. The Flink guys call these FLIPs (Flink Improvement Proposals), and the Kafka community similarly has KIPs.

We could start a PLIP (??? :-) )

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65870673

On Mon, Sep 19, 2016 at 11:07 PM, Suneel Marthi wrote:
> A shared Google doc would be more convenient than a bunch of Jiras. It's
> easier to comment and add notes that way.
Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE
Sounds good.

On Mon, Sep 19, 2016 at 4:22 PM, Darin Johnson wrote:
> Alright, that was in the spirit of what I was thinking when I did this.
>
> Why don't we take Tim's suggested improvements to my PR (I'll do the
> necessary cleanup) and at the same time just remove the platform argument
> altogether, since backwards compatibility isn't upsetting anyone.
>
> We'll still need a command-line option for the launcher for now; as we
> don't have submodules, we can decide between the two choices after we
> break out submodules and improve the config.
>
> On Sep 19, 2016 12:19 PM, "Tim Ellison" wrote:
> > On 19/09/16 15:46, Darin Johnson wrote:
> > > Hey guys,
> > >
> > > Thanks for looking at the PR; I apologize if it offended anyone's
> > > eyes :).
> > >
> > > I'm glad it generated some discussion about the configuration. I
> > > didn't really like where things were heading with the config; however,
> > > I didn't want to create too much scope creep.
> > >
> > > I think any hierarchical config (Typesafe or YAML) would make things
> > > much more maintainable; the plugin could simply grab the appropriate
> > > part of the config and handle it accordingly. I'd also cut down the
> > > number of command-line options to only those that change between runs
> > > often (like input/output).
> > >
> > >> One option is to make Pirk pluggable, so that a Pirk installation
> > >> could use one or more of these in an extensible fashion by adding JAR
> > >> files. That would still require selecting one by command-line
> > >> argument.
> > >
> > > An argument for this approach is for lambda-architecture approaches
> > > (say, spark/spark-streaming) where the contents of the jars would be
> > > so similar it seems like too much trouble to create separate jars.
> > >
> > > Happy to continue working on this given some direction on where you'd
> > > like it to go. Also, it's a bit of a blocker to refactoring the build
> > > into submodules.
> >
> > FWIW my 2c is to not try and fix all the problems in one go, and rather
> > take a compromise on the configurations while you tease apart the
> > submodules into separate source code trees, poms, etc; then come back
> > and fix the runtime configs.
> >
> > Once the submodules are in place it will open up more work for release
> > engineering and tinkering that can be done in parallel with the config
> > polishing.
> >
> > Just a thought.
> > Tim
> >
> > > On Mon, Sep 19, 2016 at 9:33 AM, Tim Ellison wrote:
> > >> On 19/09/16 13:40, Ellison Anne Williams wrote:
> > >>> It seems that it's the same idea as the ResponderLauncher with the
> > >>> service component added to maintain something akin to the
> > >>> 'platform'. I would prefer that we just did away with the platform
> > >>> notion altogether and make the ResponderDriver 'dumb'. We get around
> > >>> needing a platform-aware service by requiring the ResponderLauncher
> > >>> implementation to be passed as a CLI option to the ResponderDriver.
> > >>
> > >> Let me check I understand what you are saying here.
> > >>
> > >> At the moment, there is a monolithic Pirk that hard-codes how to
> > >> respond using lots of different backends (mapreduce, spark,
> > >> sparkstreaming, storm, standalone), and that is selected by
> > >> command-line argument.
> > >>
> > >> One option is to make Pirk pluggable, so that a Pirk installation
> > >> could use one or more of these in an extensible fashion by adding JAR
> > >> files. That would still require selecting one by command-line
> > >> argument.
> > >>
> > >> A second option is to simply pass in the required backend JAR to
> > >> select the particular implementation you choose, as a specific Pirk
> > >> installation doesn't need to use multiple backends simultaneously.
> > >>
> > >> ...and you are leaning towards the second option. Do I have that
> > >> correct?
> > >>
> > >> Regards,
> > >> Tim
> > >>
> > >>> Am I missing something? Is there a good reason to provide a service
> > >>> by which platforms are registered? I'm open...
> > >>>
> > >>> On Mon, Sep 19, 2016 at 8:28 AM, Tim Ellison wrote:
> > >>>> How about an approach like this?
> > >>>> https://github.com/tellison/incubator-pirk/tree/pirk-63
> > >>>>
> > >>>> The "on-ramp" is the driver [1], which calls upon the service to
> > >>>> find a plug-in [2] that claims to implement the required platform
> > >>>> responder, e.g. [3].
> > >>>>
> > >>>> The list of plug-ins is given in the provider's JAR file, so the
> > >>>> ones we provide in Pirk are listed together [4], but if you split
> > >>>> these into modules, or somebody brings their own JAR alongside,
> > >>>> these would be listed in each JAR's services/ directory.
> > >>>>
> > >>>> [1] https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/java/org/apache/pirk/responder/wideskies/
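Tim's services/ approach quoted above maps onto the standard JDK ServiceLoader mechanism. The sketch below is illustrative only - ResponderPlugin and platformName are hypothetical names, not Pirk's actual classes - but it shows the shape of a registry in which each backend JAR advertises its responder through a META-INF/services entry:

```java
import java.util.Optional;
import java.util.ServiceLoader;

public class PluginDemo {
    /** A plug-in claims to handle one platform name, e.g. "storm" or "mapreduce". */
    public interface ResponderPlugin {
        String platformName();
        void run();
    }

    /**
     * Scans the classpath for ResponderPlugin providers and returns the one
     * matching the requested platform, if any. Providers advertise
     * themselves in META-INF/services/... files inside each backend JAR.
     */
    public static Optional<ResponderPlugin> findPlugin(String platform) {
        for (ResponderPlugin plugin : ServiceLoader.load(ResponderPlugin.class)) {
            if (plugin.platformName().equalsIgnoreCase(platform)) {
                return Optional.of(plugin);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // With no provider JARs on the classpath, the lookup finds nothing.
        System.out.println(findPlugin("storm").isPresent()); // prints false
    }
}
```

With no backend JARs on the classpath the lookup simply comes back empty, which is why the driver would still need a CLI argument (or a default) to pick among whatever platforms happen to be registered - the point Ellison Anne raises against the registry.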
Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE
Sure, will do tonight.

On Sep 19, 2016 5:07 PM, "Suneel Marthi" wrote:
> A shared Google doc would be more convenient than a bunch of Jiras. It's
> easier to comment and add notes that way.
>
> On Mon, Sep 19, 2016 at 10:38 PM, Darin Johnson <dbjohnson1...@gmail.com> wrote:
> > Suneel, I'll try to put a couple jiras on it tonight with my thoughts.
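The hierarchical-config idea raised in this thread (Typesafe Config or YAML, with each plugin reading only its own section) can be approximated even with stdlib java.util.Properties and dotted keys. This is a sketch under assumed key names - pirk.storm.* and pirk.spark.* are made up for illustration, not Pirk's real properties:

```java
import java.util.Properties;

public class ConfigDemo {
    /**
     * Returns the subset of keys under the given prefix, with the prefix
     * stripped, so a backend only ever sees its own section of the config.
     */
    public static Properties subConfig(Properties all, String prefix) {
        Properties section = new Properties();
        String dotted = prefix + ".";
        for (String key : all.stringPropertyNames()) {
            if (key.startsWith(dotted)) {
                section.setProperty(key.substring(dotted.length()),
                                    all.getProperty(key));
            }
        }
        return section;
    }

    public static void main(String[] args) {
        Properties all = new Properties();
        all.setProperty("pirk.storm.workers", "4");
        all.setProperty("pirk.spark.executor.memory", "2g");

        // The storm backend sees only its own subtree, with the prefix gone.
        Properties storm = subConfig(all, "pirk.storm");
        System.out.println(storm.getProperty("workers")); // prints 4
    }
}
```

A real implementation would more likely use Typesafe Config's nested objects or a YAML tree, but the principle is the same: the driver hands each backend only the subtree it owns, so adding a backend never widens the shared CLI surface.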
[GitHub] incubator-pirk issue #95: [WIP] [PIRK-69] Improve clarity of group theory po...
GitHub user wraydulany commented on the issue:

https://github.com/apache/incubator-pirk/pull/95

Please don't close this until we have some feedback from the community indicating that the changes provide sufficient background to make my mathematical notation clear (it was previously quite obscure unless you already knew group theory terminology off the top of your head).
Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE
Suneel, I'll try to put a couple jiras on it tonight with my thoughts. Based on my PIRK-63 work I was able to pull spark and storm out with no issues. I was planning to pull them out, then tackle elastic search, then hadoop, as it's a little entrenched. This should keep most PRs to manageable chunks. I think once that's done, addressing the configs will make more sense.

I'm open to suggestions, but the hope would be:
Pirk-parent
Pirk-core
Pirk-hadoop
Pirk-storm
Pirk-spark

Pirk-es is a little weird, as it's really just an inputformat; it seems like there's a more general solution here than creating submodules for every inputformat.

Darin

On Sep 19, 2016 1:00 PM, "Suneel Marthi" wrote:
> Refactor is definitely a first priority. Is there a design/proposal draft
> that we could comment on about how to go about refactoring the code? I
> have been trying to keep up with the emails but definitely would have
> missed some.
>
> On Mon, Sep 19, 2016 at 6:57 PM, Ellison Anne Williams <eawilli...@apache.org> wrote:
> > Agree - let's leave the config/CLI the way it is for now and tackle
> > that as a subsequent design discussion and PR.
> >
> > Also, I think that we should leave the ResponderDriver and the
> > ResponderProps alone for this PR and push to a subsequent PR (once we
> > decide if and how we would like to delegate each).
> >
> > I vote to remove the 'platform' option and the backwards compatibility
> > in this PR and proceed with having a ResponderLauncher interface and
> > forcing its implementation by the ResponderDriver.
> >
> > And, I am not so concerned with having one fat jar vs. multiple jars
> > right now - to me, at this point, it's a 'nice to have' and not a
> > 'must have' for Pirk functionality. We do need to break out Pirk into
> > more clearly defined submodules (which is in progress) - via this
> > re-factor, I think that we will gain some ability to generate multiple
> > jars, which is nice.
Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE
Alright, that was in the spirit of what I was thinking when I did this. Why don't we take Tim's suggested improvements to my PR (I'll do the necessary cleanup) and at the same time just remove the platform argument altogether, since backwards compatibility isn't upsetting anyone? We'll still need a command line option for the launcher for now, as we don't have submodules; we can decide between the two choices after we break out submodules and improve the config. On Sep 19, 2016 12:19 PM, "Tim Ellison" wrote: > On 19/09/16 15:46, Darin Johnson wrote: > > Hey guys, > > > > Thanks for looking at the PR, I apologize if it offended anyone's eyes :). > > > > I'm glad it generated some discussion about the configuration. I didn't > > really like where things were heading with the config. However, I didn't > > want to create too much scope creep. > > > > I think any hierarchical config (TypeSafe or YAML) would make things much > > more maintainable; the plugin could simply grab the appropriate part of > the > > config and handle accordingly. I'd also cut down the number of command > > line options to only those that change between runs often (like > > input/output) > > > >> One option is to make Pirk pluggable, so that a Pirk installation could > >> use one or more of these in an extensible fashion by adding JAR files. > >> That would still require selecting one by command-line argument. > > > > An argument for this approach is for lambda architecture approaches (say > > spark/spark-streaming) where the contents of the jars would be so similar > it > > seems like too much trouble to create separate jars. > > > > Happy to continue working on this given some direction on where you'd > like > > it to go. Also, it's a bit of a blocker to refactoring the build into > > submodules. 
> > FWIW my 2c is to not try and fix all the problems in one go, and rather > take a compromise on the configurations while you tease apart the > submodules in to separate source code trees, poms, etc; then come back > and fix the runtime configs. > > Once the submodules are in place it will open up more work for release > engineering and tinkering that can be done in parallel with the config > polishing. > > Just a thought. > Tim > > > > On Mon, Sep 19, 2016 at 9:33 AM, Tim Ellison > wrote: > > > >> On 19/09/16 13:40, Ellison Anne Williams wrote: > >>> It seems that it's the same idea as the ResponderLauncher with the > >> service > >>> component added to maintain something akin to the 'platform'. I would > >>> prefer that we just did away with the platform notion altogether and > make > >>> the ResponderDriver 'dumb'. We get around needing a platform-aware > >> service > >>> by requiring the ResponderLauncher implementation to be passed as a CLI > >> to > >>> the ResponderDriver. > >> > >> Let me check I understand what you are saying here. > >> > >> At the moment, there is a monolithic Pirk that hard codes how to respond > >> using lots of different backends (mapreduce, spark, sparkstreaming, > >> storm , standalone), and that is selected by command-line argument. > >> > >> One option is to make Pirk pluggable, so that a Pirk installation could > >> use one or more of these in an extensible fashion by adding JAR files. > >> That would still require selecting one by command-line argument. > >> > >> A second option is to simply pass in the required backend JAR to select > >> the particular implementation you choose, as a specific Pirk > >> installation doesn't need to use multiple backends simultaneously. > >> > >> ...and you are leaning towards the second option. Do I have that > correct? > >> > >> Regards, > >> Tim > >> > >>> Am I missing something? Is there a good reason to provide a service by > >>> which platforms are registered? I'm open... 
> >>> > >>> On Mon, Sep 19, 2016 at 8:28 AM, Tim Ellison > >> wrote: > >>> > How about an approach like this? > https://github.com/tellison/incubator-pirk/tree/pirk-63 > > The "on-ramp" is the driver [1], which calls upon the service to find > a > plug-in [2] that claims to implement the required platform responder, > e.g. [3]. > > The list of plug-ins is given in the provider's JAR file, so the ones > we > provide in Pirk are listed together [4], but if you split these into > modules, or somebody brings their own JAR alongside, these would be > listed in each JAR's services/ directory. > > [1] > https://github.com/tellison/incubator-pirk/blob/pirk-63/ > src/main/java/org/apache/pirk/responder/wideskies/ > ResponderDriver.java > [2] > https://github.com/tellison/incubator-pirk/blob/pirk-63/ > src/main/java/org/apache/pirk/responder/spi/ResponderPlugin.java > [3] > https://github.com/tellison/incubator-pirk/blob/pirk-63/ > src/main/java/org/apache/pirk/responder/wideskies/storm/ > StormResponder.java > [4] >
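The "pass the ResponderLauncher implementation on the command line" idea discussed above amounts to loading a launcher class by name at startup (in Pirk's Java code this would be `Class.forName` on the configured implementation). A minimal sketch of that pattern in Python; the function name and the dotted-path convention are hypothetical, purely for illustration:

```python
import importlib

def load_launcher(dotted_path):
    """Instantiate a launcher class named on the command line.

    `dotted_path` is a module path plus class name, e.g. as given by a
    hypothetical --launcher CLI option; the Java equivalent would be
    Class.forName(...).newInstance() on a ResponderLauncher implementation.
    """
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)()

# Demonstrate with a standard-library class, since no Pirk classes exist here.
launcher = load_launcher("collections.OrderedDict")
print(type(launcher).__name__)  # → OrderedDict
```

With this approach the driver stays "dumb": it never enumerates platforms, it only instantiates whatever class the user names.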
Re: Math deck (was: Re: [GitHub] incubator-pirk pull request #92: [Pirk 67] - Add Slide Deck to the Website D...)
One explicit vote, one implicit vote for updating/clarifying the slides. I've created PIRK-69 to improve slide clarity. Unless this doesn't make sense (tell me) I'll mark PRs on this as WIPs until I've got some agreement from the community that the slides are clear enough. On Mon, Sep 19, 2016 at 3:31 PM, Ryan Carrwrote: > Hey Walter / Tim, > > I just wanted to add I had some trouble similar to Tim's when trying to > understand the Wideskies paper. As a person without a background in group > theory/theoretical math trying to get my head around this stuff, it was > very difficult for me to even find with Google what the notations (Z/NZ) > and (Z/N^2 Z)^x (also called (Z/N^2 Z)* in the Wideskies/Paillier papers) > meant. Since these concepts are so central to how the algorithm works, I > think it would be really helpful if we had a footnote the first time those > are introduced defining that notation with links to a more in-depth > explanation, or at least a phrase that can be Googled to reliably find it. > > Thanks, > -Ryan Carr > > On Mon, Sep 19, 2016 at 1:50 PM Walter Ray-Dulany > wrote: > > > Correction: > > > > ...bby the binomial theorem, (1+N)**N = 1 + N*N + other terms > divisible... > > > > I multiplied by N on the left when I ought to have exponentiated > > > > Walter > > > > On Mon, Sep 19, 2016 at 1:36 PM, Walter Ray-Dulany > > > wrote: > > > > > Hi Tim, > > > > > > Apologies! It's disorienting at first, and most of all when one > actually > > > tries to sit down and do a real example. The version on the slides was > > not > > > written in one go, I assure you. > > > > > > Let's go through, and see what's not working. > > > > > > ** > > > > > > > I'm trying a very simple example. I'm going to choose, p = 3, q = 5 > > and > > > a message m = 42 > > > > > > Already we're in trouble. 
p and q are fine; but remember that the > > > plaintext space (let's call it P(N)) is the set of all integers in > Z/NZ; > > > that is, it is all numbers m > > > > > > 0 <= m < N > > > > > > You can see already that the m you chose is not in the plaintext space. > > > > > > Let's pick a new m to continue with; in this case, let's choose your m, > > > but mod 15 so that it lies in P(N). Thus, our new m going forward shall > > be > > > > > > m = 12 > > > > > > ** > > > > > > > I'm going to pick g = 240. I think it needs to be a multiple of N > that > > > is greater than N*N, correct? > > > > > > No, and this is important. g has to be an element of (Z/(N squared )Z)* > > of > > > order a nonzero multiple of N. That sentence is meaningless unless > you're > > > already embedded in the mathematics, so let's go through what it means, > > bit > > > by bit. > > > > > > g must be: > > > 1. *an element of (Z/(N squared)Z)**: everything but the outer * on the > > > right just means that 0 <= g < N*N; in this case that means 0 <= g < > 225. > > > The outer * on the right indicates that we only want to take a certain > > > special kind of g: one that is what we call a *unit* mod N*N; that is, > it > > > means that we require that there exist another element 0<= h < N*N such > > > that g*h = 1 mod N*N. In our current situation, N = p*q is a product of > > > primes, and so N*N = p**2 * q**2, and we can easily characterize G = > > (Z/(N > > > squared)Z)*: G = { 0<= g < N*N such that neither p nor q divide g}. So > as > > > long as we pick a g that does not have p or q as a factor, we're good > for > > > this condition (this also includes 0, so really all of my "0 <=" in > this > > > paragraph could have been "0 < "). Another way to characterize G is to > > say > > > that it is the set of integers less than N*N that are relatively prime > to > > > N*N. > > > > > > 2. *of order a nonzero multiple of N*: this is a little trickier. 
The > > > *order* of an element g of a finite group (which G is) is the least > > > integer k such that g^k = 1 in G. I'm not going to prove it here, but > it > > > turns out that every element of G has finite order (that is, if g is in > > G, > > > then there exists a finite non-zero k such that g^k = 1), and that it > is > > > less than or equal to the Carmichael number lambda(N*N). That takes > care > > of > > > what 'order' means, and, like I said, order is defined for all g in G. > > But! > > > We require a special order. Specifically, we only want g in G such that > > the > > > order of g is a non-zero multiple of N. We might ask whether we know > that > > > such always exists (a good question, since we require it), and we do! > > > Here's a quick proof of existence, one tied closely to Wideskies: > > > > > > * Take g = 1 + N (I'm going to prove, all at once, that 1+N is in G and > > > that it has an order that fits the bill). > > > * Consider g**N: by the binomial theorem, (1+N)*N = 1 + N*N + other > terms > > > divisible by N*N. This number is equivalent to 1 mod N*N. QED > >
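The unit and order conditions above are easy to check numerically for the running example (p = 3, q = 5, N = 15). A small sketch; the `order` helper is ad hoc for illustration, not Pirk code:

```python
from math import gcd

N = 3 * 5          # N = p*q = 15, as in the example
M = N * N          # arithmetic is mod N**2 = 225

def order(g, modulus):
    """Multiplicative order of a unit g modulo `modulus`."""
    k, x = 1, g % modulus
    while x != 1:
        x = (x * g) % modulus
        k += 1
    return k

g = 1 + N                      # the standard choice g = 1 + N = 16
assert gcd(g, M) == 1          # g is a unit mod N**2 (in G)
assert order(g, M) == N        # its order is exactly N, a nonzero multiple of N

# The rejected choice g = 240: reduced mod N**2 it is 15, which shares
# the factors p and q with N**2, so it is not a unit and has no order.
assert gcd(240 % M, M) != 1
```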
Re: Math deck (was: Re: [GitHub] incubator-pirk pull request #92: [Pirk 67] - Add Slide Deck to the Website D...)
Hey Walter / Tim, I just wanted to add I had some trouble similar to Tim's when trying to understand the Wideskies paper. As a person without a background in group theory/theoretical math trying to get my head around this stuff, it was very difficult for me to even find with Google what the notations (Z/NZ) and (Z/N^2 Z)^x (also called (Z/N^2 Z)* in the Wideskies/Paillier papers) meant. Since these concepts are so central to how the algorithm works, I think it would be really helpful if we had a footnote the first time those are introduced defining that notation with links to a more in-depth explanation, or at least a phrase that can be Googled to reliably find it. Thanks, -Ryan Carr On Mon, Sep 19, 2016 at 1:50 PM Walter Ray-Dulanywrote: > Correction: > > ...bby the binomial theorem, (1+N)**N = 1 + N*N + other terms divisible... > > I multiplied by N on the left when I ought to have exponentiated > > Walter > > On Mon, Sep 19, 2016 at 1:36 PM, Walter Ray-Dulany > wrote: > > > Hi Tim, > > > > Apologies! It's disorienting at first, and most of all when one actually > > tries to sit down and do a real example. The version on the slides was > not > > written in one go, I assure you. > > > > Let's go through, and see what's not working. > > > > ** > > > > > I'm trying a very simple example. I'm going to choose, p = 3, q = 5 > and > > a message m = 42 > > > > Already we're in trouble. p and q are fine; but remember that the > > plaintext space (let's call it P(N)) is the set of all integers in Z/NZ; > > that is, it is all numbers m > > > > 0 <= m < N > > > > You can see already that the m you chose is not in the plaintext space. > > > > Let's pick a new m to continue with; in this case, let's choose your m, > > but mod 15 so that it lies in P(N). Thus, our new m going forward shall > be > > > > m = 12 > > > > ** > > > > > I'm going to pick g = 240. I think it needs to be a multiple of N that > > is greater than N*N, correct? > > > > No, and this is important. 
g has to be an element of (Z/(N squared )Z)* > of > > order a nonzero multiple of N. That sentence is meaningless unless you're > > already embedded in the mathematics, so let's go through what it means, > bit > > by bit. > > > > g must be: > > 1. *an element of (Z/(N squared)Z)**: everything but the outer * on the > > right just means that 0 <= g < N*N; in this case that means 0 <= g < 225. > > The outer * on the right indicates that we only want to take a certain > > special kind of g: one that is what we call a *unit* mod N*N; that is, it > > means that we require that there exist another element 0<= h < N*N such > > that g*h = 1 mod N*N. In our current situation, N = p*q is a product of > > primes, and so N*N = p**2 * q**2, and we can easily characterize G = > (Z/(N > > squared)Z)*: G = { 0<= g < N*N such that neither p nor q divide g}. So as > > long as we pick a g that does not have p or q as a factor, we're good for > > this condition (this also includes 0, so really all of my "0 <=" in this > > paragraph could have been "0 < "). Another way to characterize G is to > say > > that it is the set of integers less than N*N that are relatively prime to > > N*N. > > > > 2. *of order a nonzero multiple of N*: this is a little trickier. The > > *order* of an element g of a finite group (which G is) is the least > > integer k such that g^k = 1 in G. I'm not going to prove it here, but it > > turns out that every element of G has finite order (that is, if g is in > G, > > then there exists a finite non-zero k such that g^k = 1), and that it is > > less than or equal to the Carmichael number lambda(N*N). That takes care > of > > what 'order' means, and, like I said, order is defined for all g in G. > But! > > We require a special order. Specifically, we only want g in G such that > the > > order of g is a non-zero multiple of N. We might ask whether we know that > > such always exists (a good question, since we require it), and we do! 
> > Here's a quick proof of existence, one tied closely to Wideskies: > > > > * Take g = 1 + N (I'm going to prove, all at once, that 1+N is in G and > > that it has an order that fits the bill). > > * Consider g**N: by the binomial theorem, (1+N)*N = 1 + N*N + other terms > > divisible by N*N. This number is equivalent to 1 mod N*N. QED > > > > Ok, great, such g exist, and so we can require that we use one of them. > > But you must be careful: you can't just choose any g in G off the street > > and expect it will satisfy our requirements. You chose g = 240, which (1) > > bigger than N*N, which isn't what we want, and (2) is divisible by N, and > > so even if we take 240 mod N*N, we still aren't in G, much less of the > > 'right order' (turns out 240, being not relatively prime to N, can never > be > > exponentiated to 1 mod N*N). For now, let's just take the standard > > Wideskies g, g = 1 + N = 16. If you want to go through
Re: Math deck (was: Re: [GitHub] incubator-pirk pull request #92: [Pirk 67] - Add Slide Deck to the Website D...)
Correction: ...by the binomial theorem, (1+N)**N = 1 + N*N + other terms divisible... I multiplied by N on the left when I ought to have exponentiated Walter On Mon, Sep 19, 2016 at 1:36 PM, Walter Ray-Dulany wrote: > Hi Tim, > > Apologies! It's disorienting at first, and most of all when one actually > tries to sit down and do a real example. The version on the slides was not > written in one go, I assure you. > > Let's go through, and see what's not working. > > ** > > > I'm trying a very simple example. I'm going to choose p = 3, q = 5 and > a message m = 42 > > Already we're in trouble. p and q are fine; but remember that the > plaintext space (let's call it P(N)) is the set of all integers in Z/NZ; > that is, it is all numbers m > > 0 <= m < N > > You can see already that the m you chose is not in the plaintext space. > > Let's pick a new m to continue with; in this case, let's choose your m, > but mod 15 so that it lies in P(N). Thus, our new m going forward shall be > > m = 12 > > ** > > > I'm going to pick g = 240. I think it needs to be a multiple of N that > is greater than N*N, correct? > > No, and this is important. g has to be an element of (Z/(N squared)Z)* of > order a nonzero multiple of N. That sentence is meaningless unless you're > already embedded in the mathematics, so let's go through what it means, bit > by bit. > > g must be: > 1. *an element of (Z/(N squared)Z)**: everything but the outer * on the > right just means that 0 <= g < N*N; in this case that means 0 <= g < 225. > The outer * on the right indicates that we only want to take a certain > special kind of g: one that is what we call a *unit* mod N*N; that is, it > means that we require that there exist another element 0<= h < N*N such > that g*h = 1 mod N*N. In our current situation, N = p*q is a product of > primes, and so N*N = p**2 * q**2, and we can easily characterize G = (Z/(N > squared)Z)*: G = { 0<= g < N*N such that neither p nor q divide g}. 
So as > long as we pick a g that does not have p or q as a factor, we're good for > this condition (this also includes 0, so really all of my "0 <=" in this > paragraph could have been "0 < "). Another way to characterize G is to say > that it is the set of integers less than N*N that are relatively prime to > N*N. > > 2. *of order a nonzero multiple of N*: this is a little trickier. The > *order* of an element g of a finite group (which G is) is the least > integer k such that g^k = 1 in G. I'm not going to prove it here, but it > turns out that every element of G has finite order (that is, if g is in G, > then there exists a finite non-zero k such that g^k = 1), and that it is > less than or equal to the Carmichael number lambda(N*N). That takes care of > what 'order' means, and, like I said, order is defined for all g in G. But! > We require a special order. Specifically, we only want g in G such that the > order of g is a non-zero multiple of N. We might ask whether we know that > such always exists (a good question, since we require it), and we do! > Here's a quick proof of existence, one tied closely to Wideskies: > > * Take g = 1 + N (I'm going to prove, all at once, that 1+N is in G and > that it has an order that fits the bill). > * Consider g**N: by the binomial theorem, (1+N)*N = 1 + N*N + other terms > divisible by N*N. This number is equivalent to 1 mod N*N. QED > > Ok, great, such g exist, and so we can require that we use one of them. > But you must be careful: you can't just choose any g in G off the street > and expect it will satisfy our requirements. You chose g = 240, which (1) > bigger than N*N, which isn't what we want, and (2) is divisible by N, and > so even if we take 240 mod N*N, we still aren't in G, much less of the > 'right order' (turns out 240, being not relatively prime to N, can never be > exponentiated to 1 mod N*N). For now, let's just take the standard > Wideskies g, g = 1 + N = 16. 
If you want to go through this with a > different g, give it a shot, but make sure it's got the right kind of order. > > ** > > > I'll pick zeta = 21. I think it needs to be greater than N. > > As in point 2, no. We require zeta to be in (Z/NZ)*, which, similar to the > above, means a number > > 0 < zeta < N such that zeta is a unit. > > You picked 21; if we take 21 mod N we get zeta = 6, which is not a unit > (in particular it is not relatively prime to p=3). Let's pick the next > number greater than 6 which is in (Z/NZ)*, which is > > zeta = 7. > > ** > > Let's see what we've got. > > ( (16**12)*(7**15) ) mod 225 = 208. > > I will leave it as an exercise to check that the decryption of 208 is in > fact 12. > > ** > > Ok, that's all so far. If the above is still not computing (literally or > metaphorically), I am available to converse one-on-one either over the > phone or some other medium (face time or what
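Walter's worked example can be verified end to end, including the decryption left as an exercise. A sketch in plain Python (assuming 3.8+ for `pow(x, -1, n)`); this follows the standard Paillier formulas with the thread's numbers and is not Pirk code:

```python
from math import gcd

p, q = 3, 5
N = p * q              # plaintext space Z/NZ, so 0 <= m < 15
N2 = N * N             # ciphertext arithmetic is mod N**2 = 225

m = 12                 # the corrected message (42 mod 15)
g = 1 + N              # 16: a unit mod N**2 of order a multiple of N
zeta = 7               # a unit mod N (gcd(7, 15) == 1)

# Encryption: c = g**m * zeta**N (mod N**2), as in the thread.
c = (pow(g, m, N2) * pow(zeta, N, N2)) % N2
assert c == 208        # the value computed above

# Decryption with lambda = lcm(p-1, q-1) and L(u) = (u - 1) // N.
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(2, 4) = 4
L = lambda u: (u - 1) // N
mu = pow(L(pow(g, lam, N2)), -1, N)            # inverse of L(g**lam) mod N
m_recovered = (L(pow(c, lam, N2)) * mu) % N
assert m_recovered == 12                       # decryption of 208 is indeed 12
```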
Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE
Refactor is definitely a first priority. Is there a design/proposal draft that we could comment on about how to go about refactoring the code? I have been trying to keep up with the emails but definitely would have missed some. On Mon, Sep 19, 2016 at 6:57 PM, Ellison Anne Williams < eawilli...@apache.org> wrote: > Agree - let's leave the config/CLI the way it is for now and tackle that as > a subsequent design discussion and PR. > > Also, I think that we should leave the ResponderDriver and the > ResponderProps alone for this PR and push to a subsequent PR (once we > decide if and how we would like to delegate each). > > I vote to remove the 'platform' option and the backwards compatibility in > this PR and proceed with having a ResponderLauncher interface and forcing > its implementation by the ResponderDriver. > > And, I am not so concerned with having one fat jar vs. multiple jars right > now - to me, at this point, it's a 'nice to have' and not a 'must have' for > Pirk functionality. We do need to break out Pirk into more clearly defined > submodules (which is in progress) - via this re-factor, I think that we > will gain some ability to generate multiple jars which is nice. > > > > On Mon, Sep 19, 2016 at 12:19 PM, Tim Ellison wrote: > > > On 19/09/16 15:46, Darin Johnson wrote: > > > Hey guys, > > > > > > Thanks for looking at the PR, I apologize if it offended anyone's > eyes :). > > > > > > I'm glad it generated some discussion about the configuration. I > didn't > > > really like where things were heading with the config. However, I didn't > > > want to create too much scope creep. > > > > > > I think any hierarchical config (TypeSafe or YAML) would make things > much > > > more maintainable; the plugin could simply grab the appropriate part of > > the > > > config and handle accordingly. 
I'd also cut down the number of command > > > line options to only those that change between runs often (like > > > input/output) > > > > > >> One option is to make Pirk pluggable, so that a Pirk installation > could > > >> use one or more of these in an extensible fashion by adding JAR files. > > >> That would still require selecting one by command-line argument. > > > > > > An argument for this approach is for lambda architecture approaches > (say > > > spark/spark-streaming) were the contents of the jars would be so > similar > > it > > > seems like to much trouble to create separate jars. > > > > > > Happy to continue working on this given some direction on where you'd > > like > > > it to go. Also, it's a bit of a blocker to refactoring the build into > > > submodules. > > > > FWIW my 2c is to not try and fix all the problems in one go, and rather > > take a compromise on the configurations while you tease apart the > > submodules in to separate source code trees, poms, etc; then come back > > and fix the runtime configs. > > > > Once the submodules are in place it will open up more work for release > > engineering and tinkering that can be done in parallel with the config > > polishing. > > > > Just a thought. > > Tim > > > > > > > On Mon, Sep 19, 2016 at 9:33 AM, Tim Ellison > > wrote: > > > > > >> On 19/09/16 13:40, Ellison Anne Williams wrote: > > >>> It seems that it's the same idea as the ResponderLauncher with the > > >> service > > >>> component added to maintain something akin to the 'platform'. I would > > >>> prefer that we just did away with the platform notion altogether and > > make > > >>> the ResponderDriver 'dumb'. We get around needing a platform-aware > > >> service > > >>> by requiring the ResponderLauncher implementation to be passed as a > CLI > > >> to > > >>> the ResponderDriver. > > >> > > >> Let me check I understand what you are saying here. 
> > >> > > >> At the moment, there is a monolithic Pirk that hard codes how to > respond > > >> using lots of different backends (mapreduce, spark, sparkstreaming, > > >> storm , standalone), and that is selected by command-line argument. > > >> > > >> One option is to make Pirk pluggable, so that a Pirk installation > could > > >> use one or more of these in an extensible fashion by adding JAR files. > > >> That would still require selecting one by command-line argument. > > >> > > >> A second option is to simply pass in the required backend JAR to > select > > >> the particular implementation you choose, as a specific Pirk > > >> installation doesn't need to use multiple backends simultaneously. > > >> > > >> ...and you are leaning towards the second option. Do I have that > > correct? > > >> > > >> Regards, > > >> Tim > > >> > > >>> Am I missing something? Is there a good reason to provide a service > by > > >>> which platforms are registered? I'm open... > > >>> > > >>> On Mon, Sep 19, 2016 at 8:28 AM, Tim Ellison > > >> wrote: > > >>> > > How about an approach like this? > > https://github.com/tellison/incubator-pirk/tree/pirk-63 > > > >
Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE
Hey guys, Thanks for looking at the PR, I apologize if it offended anyone's eyes :). I'm glad it generated some discussion about the configuration. I didn't really like where things were heading with the config. However, I didn't want to create too much scope creep. I think any hierarchical config (TypeSafe or YAML) would make things much more maintainable; the plugin could simply grab the appropriate part of the config and handle accordingly. I'd also cut down the number of command line options to only those that change between runs often (like input/output). >One option is to make Pirk pluggable, so that a Pirk installation could >use one or more of these in an extensible fashion by adding JAR files. >That would still require selecting one by command-line argument. An argument for this approach is for lambda architecture approaches (say spark/spark-streaming) where the contents of the jars would be so similar it seems like too much trouble to create separate jars. Happy to continue working on this given some direction on where you'd like it to go. Also, it's a bit of a blocker to refactoring the build into submodules. Darin On Mon, Sep 19, 2016 at 9:33 AM, Tim Ellison wrote: > On 19/09/16 13:40, Ellison Anne Williams wrote: > > It seems that it's the same idea as the ResponderLauncher with the > service > > component added to maintain something akin to the 'platform'. I would > > prefer that we just did away with the platform notion altogether and make > > the ResponderDriver 'dumb'. We get around needing a platform-aware > service > > by requiring the ResponderLauncher implementation to be passed as a CLI > to > > the ResponderDriver. > > Let me check I understand what you are saying here. > > At the moment, there is a monolithic Pirk that hard codes how to respond > using lots of different backends (mapreduce, spark, sparkstreaming, > storm, standalone), and that is selected by command-line argument. 
> > One option is to make Pirk pluggable, so that a Pirk installation could > use one or more of these in an extensible fashion by adding JAR files. > That would still require selecting one by command-line argument. > > A second option is to simply pass in the required backend JAR to select > the particular implementation you choose, as a specific Pirk > installation doesn't need to use multiple backends simultaneously. > > ...and you are leaning towards the second option. Do I have that correct? > > Regards, > Tim > > > Am I missing something? Is there a good reason to provide a service by > > which platforms are registered? I'm open... > > > > On Mon, Sep 19, 2016 at 8:28 AM, Tim Ellison > wrote: > > > >> How about an approach like this? > >>https://github.com/tellison/incubator-pirk/tree/pirk-63 > >> > >> The "on-ramp" is the driver [1], which calls upon the service to find a > >> plug-in [2] that claims to implement the required platform responder, > >> e.g. [3]. > >> > >> The list of plug-ins is given in the provider's JAR file, so the ones we > >> provide in Pirk are listed together [4], but if you split these into > >> modules, or somebody brings their own JAR alongside, these would be > >> listed in each JAR's services/ directory. > >> > >> [1] > >> https://github.com/tellison/incubator-pirk/blob/pirk-63/ > >> src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java > >> [2] > >> https://github.com/tellison/incubator-pirk/blob/pirk-63/ > >> src/main/java/org/apache/pirk/responder/spi/ResponderPlugin.java > >> [3] > >> https://github.com/tellison/incubator-pirk/blob/pirk-63/ > >> src/main/java/org/apache/pirk/responder/wideskies/storm/ > >> StormResponder.java > >> [4] > >> https://github.com/tellison/incubator-pirk/blob/pirk-63/ > >> src/main/services/org.apache.responder.spi.Responder > >> > >> I'm not even going to dignify this with a WIP PR, it is far from ready, > >> so proceed with caution. 
There is hopefully enough there to show the > >> approach, and if it is worth continuing I'm happy to do so. > >> > >> Regards, > >> Tim > >> > >> > > >
Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE
On 19/09/16 13:39, Suneel Marthi wrote: > The way this PR is now is so similar to the badly designed IBM SystemML, which is a > hack job hurriedly put together and something I have often pointed out > to others as a clear example of how not to design software. See > this gist of an example code snippet from IBM SystemML - > https://gist.github.com/smarthi/eb848e46621b7444924f Not sure if you are looking at PR93, or the URL I sent you. I agree that a large, explicit enumeration via a switch/if statement is not conducive to extensibility, and that is what PIRK-63 is trying to address. > First things for the project: > > 1. Move away from using the java properties (this is so 2002 way of doing > things) to using TypeSafe style configurations which allow for structured > properties. From a quick look, that covers a different level, namely how the configurations are represented. First we need to look at the responder architecture to allow for different responder types to be plugged into the Pirk framework. Each plug-in responder type can figure out how to depict its configuration. > 2. From a Responder design, there would be a Responder-impl-class property > which would be read from TypeSafe config and the appropriate driver class > invoked. I've not used TypeSafe-style configurations before. I think they overlap with the SystemConfiguration a bit. It would be interesting to see what changes. > As an example for the above ^^^ two, please look at the Oryx 2.0 project > for reference > > https://github.com/oryxproject/oryx I'd rather look at a proposed change to Pirk ;-) Regards, Tim > On Mon, Sep 19, 2016 at 2:28 PM, Tim Ellison wrote: > >> How about an approach like this? >> https://github.com/tellison/incubator-pirk/tree/pirk-63 >> >> The "on-ramp" is the driver [1], which calls upon the service to find a >> plug-in [2] that claims to implement the required platform responder, >> e.g. [3]. 
>> The list of plug-ins is given in the provider's JAR file, so the ones we
>> provide in Pirk are listed together [4], but if you split these into
>> modules, or somebody brings their own JAR alongside, these would be
>> listed in each JAR's services/ directory.
>>
>> [1] https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java
>> [2] https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/java/org/apache/pirk/responder/spi/ResponderPlugin.java
>> [3] https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/java/org/apache/pirk/responder/wideskies/storm/StormResponder.java
>> [4] https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/services/org.apache.responder.spi.Responder
>>
>> I'm not even going to dignify this with a WIP PR, it is far from ready,
>> so proceed with caution. There is hopefully enough there to show the
>> approach, and if it is worth continuing I'm happy to do so.
>>
>> Regards,
>> Tim
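For readers unfamiliar with the TypeSafe-style configuration Suneel mentions, here is a hypothetical sketch of what a structured responder configuration could look like in HOCON. All keys below are invented for illustration; this is not Pirk's actual configuration.

```hocon
# Hypothetical structured configuration; key names are illustrative only.
pirk {
  responder {
    # Fully-qualified launcher class, analogous to the proposed
    # Responder-impl-class property.
    launcher = "org.apache.pirk.responder.wideskies.storm.StormResponderLauncher"

    # Responder-specific settings live under their own scope, so the
    # core driver never needs to know about them.
    storm {
      workers = 2
      topology-name = "pirk-topology"
    }
  }
}
```

The structured nesting is the point Suneel is making: each plug-in responder reads only its own subtree, instead of every option living in one flat java.util.Properties namespace.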
[GitHub] incubator-pirk issue #94: Update a number of Pirk's pom dependencies.
Github user ellisonanne commented on the issue: https://github.com/apache/incubator-pirk/pull/94 +1 will merge now --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE
How about an approach like this?

https://github.com/tellison/incubator-pirk/tree/pirk-63

The "on-ramp" is the driver [1], which calls upon the service to find a plug-in [2] that claims to implement the required platform responder, e.g. [3]. The list of plug-ins is given in the provider's JAR file, so the ones we provide in Pirk are listed together [4], but if you split these into modules, or somebody brings their own JAR alongside, these would be listed in each JAR's services/ directory.

[1] https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java
[2] https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/java/org/apache/pirk/responder/spi/ResponderPlugin.java
[3] https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/java/org/apache/pirk/responder/wideskies/storm/StormResponder.java
[4] https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/services/org.apache.responder.spi.Responder

I'm not even going to dignify this with a WIP PR; it is far from ready, so proceed with caution. There is hopefully enough there to show the approach, and if it is worth continuing I'm happy to do so.

Regards,
Tim
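The services/ mechanism described above is java.util.ServiceLoader: each JAR lists its implementations, one fully-qualified class name per line, in a resource named after the interface (conventionally META-INF/services/&lt;interface-name&gt;; the branch keeps them under src/main/services). A minimal sketch of the lookup side follows; the interface and method names are invented for illustration, not the actual pirk-63 API. With no provider entries on the classpath, the not-found path runs.

```java
import java.util.ServiceLoader;

// Hypothetical plug-in interface, loosely modelled on the ResponderPlugin
// idea from the pirk-63 branch; names are illustrative only.
interface ResponderPlugin {
  String getPlatformName();
  void run();
}

public class PluginLookupSketch {
  // Scan all registered providers for one claiming the requested platform.
  static ResponderPlugin findPlugin(String platform) {
    for (ResponderPlugin p : ServiceLoader.load(ResponderPlugin.class)) {
      if (p.getPlatformName().equalsIgnoreCase(platform)) {
        return p;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // No provider-configuration file is present in this sketch, so the
    // loader finds nothing and we take the fallback branch.
    ResponderPlugin plugin = findPlugin("storm");
    if (plugin == null) {
      System.out.println("No responder plug-in found for platform: storm");
    } else {
      plugin.run();
    }
  }
}
```

A provider JAR would register itself with a one-line resource file containing, e.g., `org.apache.pirk.responder.wideskies.storm.StormResponder`; the driver then needs no compile-time reference to any responder.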
[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE
Github user DarinJ commented on a diff in the pull request: https://github.com/apache/incubator-pirk/pull/93#discussion_r79377189 --- Diff: src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java --- @@ -49,83 +41,111 @@ public class ResponderDriver { private static final Logger logger = LoggerFactory.getLogger(ResponderDriver.class); + // ClassNames to instantiate Platforms using the platform CLI + private final static String MAPREDUCE_LAUNCHER = "org.apache.pirk.responder.wideskies.mapreduce.MapReduceResponderLauncher"; + private final static String SPARK_LAUNCHER = "org.apache.pirk.responder.wideskies.spark.SparkResponderLauncher"; + private final static String SPARKSTREAMING_LAUNCHER = "org.apache.pirk.responder.wideskies.spark.streaming.SparkStreamingResponderLauncher"; + private final static String STANDALONE_LAUNCHER = "org.apache.pirk.responder.wideskies.standalone.StandaloneResponderLauncher"; + private final static String STORM_LAUNCHER = "org.apache.pirk.responder.wideskies.storm.StormResponderLauncher"; private enum Platform { MAPREDUCE, SPARK, SPARKSTREAMING, STORM, STANDALONE, NONE } - public static void main(String[] args) throws Exception + private static void launch(String launcherClassName) + { +logger.info("Launching Responder with {}", launcherClassName); +try +{ + Class clazz = Class.forName(launcherClassName); + if (ResponderLauncher.class.isAssignableFrom(clazz)) + { +Object launcherInstance = clazz.newInstance(); +Method m = launcherInstance.getClass().getDeclaredMethod("run"); +m.invoke(launcherInstance); + } + else + { +logger.error("Class {} does not implement ResponderLauncher", launcherClassName); + } +} +catch (ClassNotFoundException e) +{ + logger.error("Class {} not found, check launcher property: {}", launcherClassName); +} +catch (NoSuchMethodException e) +{ + logger.error("In {} run method not found: {}", launcherClassName); +} +catch (InvocationTargetException e) +{ + logger.error("In {} run method could not be 
invoked: {}: {}", launcherClassName, e); +} +catch (InstantiationException e) +{ + logger.error("Instantiation exception within {}: {}", launcherClassName, e); +} +catch (IllegalAccessException e) +{ + logger.error("IllegalAccess Exception {}", e); +} + } + + public static void main(String[] args) { ResponderCLI responderCLI = new ResponderCLI(args); // For handling System.exit calls from Spark Streaming System.setSecurityManager(new SystemExitManager()); -Platform platform = Platform.NONE; -String platformString = SystemConfiguration.getProperty(ResponderProps.PLATFORM); -try -{ - platform = Platform.valueOf(platformString.toUpperCase()); -} catch (IllegalArgumentException e) +String launcherClassName = SystemConfiguration.getProperty(ResponderProps.LAUNCHER); +if (launcherClassName != null) { - logger.error("platform " + platformString + " not found."); + launch(launcherClassName); } - -logger.info("platform = " + platform); -switch (platform) +else { - case MAPREDUCE: -logger.info("Launching MapReduce ResponderTool:"); - -ComputeResponseTool pirWLTool = new ComputeResponseTool(); -ToolRunner.run(pirWLTool, new String[] {}); -break; - - case SPARK: -logger.info("Launching Spark ComputeResponse:"); - -ComputeResponse computeResponse = new ComputeResponse(FileSystem.get(new Configuration())); -computeResponse.performQuery(); -break; - - case SPARKSTREAMING: -logger.info("Launching Spark ComputeStreamingResponse:"); - -ComputeStreamingResponse computeSR = new ComputeStreamingResponse(FileSystem.get(new Configuration())); -try -{ - computeSR.performQuery(); -} catch (SystemExitException e) -{ - // If System.exit(0) is not caught from Spark Streaming, - // the application will complete with a 'failed' status - logger.info("Exited with System.exit(0) from Spark Streaming"); -} - -// Teardown the context -computeSR.teardown(); -break; - - case STORM: -logger.info("Launching Storm PirkTopology:"); -PirkTopology.runPirkTopology(); -break; - - case
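One nit visible in the diff above: several logger.error calls pass fewer arguments than they have {} placeholders (e.g. "Class {} not found, check launcher property: {}" with only launcherClassName), so the exception detail is dropped. A self-contained sketch of the reflective launch with the error reported explicitly is below; the class names are invented for this sketch, and once the loaded class is known to implement the interface, a cast avoids the extra Method reflection.

```java
// Hypothetical interface mirroring the PR's ResponderLauncher idea.
interface ResponderLauncher {
  void run();
}

// Example launcher implementation, illustrative only.
class StandaloneLauncher implements ResponderLauncher {
  public void run() {
    System.out.println("standalone responder running");
  }
}

public class LaunchSketch {
  static void launch(String launcherClassName) {
    try {
      Class<?> clazz = Class.forName(launcherClassName);
      if (!ResponderLauncher.class.isAssignableFrom(clazz)) {
        System.err.println("Class " + launcherClassName + " does not implement ResponderLauncher");
        return;
      }
      // Instantiate via the no-arg constructor, then call through the
      // interface rather than looking up the run() Method reflectively.
      ResponderLauncher launcher = (ResponderLauncher) clazz.getDeclaredConstructor().newInstance();
      launcher.run();
    } catch (ReflectiveOperationException e) {
      // One catch covers ClassNotFound, NoSuchMethod, Instantiation,
      // IllegalAccess and InvocationTarget exceptions.
      System.err.println("Could not launch " + launcherClassName + ": " + e);
    }
  }

  public static void main(String[] args) {
    launch("StandaloneLauncher"); // default-package name, for this sketch only
  }
}
```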
[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE
Github user DarinJ commented on a diff in the pull request: https://github.com/apache/incubator-pirk/pull/93#discussion_r79377024

--- Diff: src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java ---
@@ -49,83 +41,111 @@ public class ResponderDriver
 {
   private static final Logger logger = LoggerFactory.getLogger(ResponderDriver.class);

+  // ClassNames to instantiate Platforms using the platform CLI
+  private final static String MAPREDUCE_LAUNCHER = "org.apache.pirk.responder.wideskies.mapreduce.MapReduceResponderLauncher";
+  private final static String SPARK_LAUNCHER = "org.apache.pirk.responder.wideskies.spark.SparkResponderLauncher";
+  private final static String SPARKSTREAMING_LAUNCHER = "org.apache.pirk.responder.wideskies.spark.streaming.SparkStreamingResponderLauncher";
+  private final static String STANDALONE_LAUNCHER = "org.apache.pirk.responder.wideskies.standalone.StandaloneResponderLauncher";
+  private final static String STORM_LAUNCHER = "org.apache.pirk.responder.wideskies.storm.StormResponderLauncher";

--- End diff --

Yes, I added this for backwards compatibility. Maybe overkill this early in the game, but didn't want to break anyone's scripts/bash history too quickly.
[GitHub] incubator-pirk issue #93: WIP-Pirk 63-DO NOT MERGE
Github user ellisonanne commented on the issue: https://github.com/apache/incubator-pirk/pull/93

A few other comments for discussion:

First, I am not opposed to having separate ResponderDrivers for each responder, but let's think it through and see if we really need to go down that path. I think that the main concern with having a single ResponderDriver vs. delegating the ResponderDrivers to each responder is the bloating of the main CLI and ResponderProps. Other than keeping the CLI/Props under control, I can't see a particularly good, material (i.e. not stylistic) reason to delegate now that we are rolling in a ResponderLauncher.

The ResponderProps can go ahead and be delegated down into the specific responders independently of whether or not the ResponderDrivers get delegated. The ResponderLauncher for each responder can be responsible for implementing the 'validateResponderProperties' method that is currently in the central ResponderProps - since the CLI loads the properties from the properties files into SystemConfiguration, it will not require passing anything extra to the launchers.

One design alternative to breaking out into specific ResponderDrivers (which I am not opposed to BTW) would be to only allow the core properties in the main CLI and force everything else to be specified via properties files. This is somewhat limiting in some (contrived) cases that I can think of, but it would allow for a main CLI and prevent the bloat since responder-specific CLI options would not need to be added to the main CLI.

Thoughts?
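The delegated validation idea above can be sketched in a few lines. This is a hypothetical shape only; the interface, method, and property names are invented, and in Pirk the properties would come from SystemConfiguration rather than a Map.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: the launcher interface grows a validation hook so each
// responder checks only the properties it cares about.
interface ResponderLauncher {
  boolean validateResponderProperties(Map<String, String> props);
  void run();
}

// Illustrative Spark launcher; "spark.master" is a hypothetical key.
class SparkLauncherSketch implements ResponderLauncher {
  public boolean validateResponderProperties(Map<String, String> props) {
    return props.containsKey("spark.master");
  }
  public void run() {
    System.out.println("spark responder running");
  }
}

public class ValidationSketch {
  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put("spark.master", "local[2]");

    ResponderLauncher launcher = new SparkLauncherSketch();
    // The driver stays generic: validate, then launch.
    if (launcher.validateResponderProperties(props)) {
      launcher.run();
    } else {
      System.err.println("missing responder-specific properties");
    }
  }
}
```

Because validation lives behind the launcher interface, the central ResponderProps never needs to enumerate responder-specific keys.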
[GitHub] incubator-pirk issue #93: WIP-Pirk 63-DO NOT MERGE
Github user ellisonanne commented on the issue: https://github.com/apache/incubator-pirk/pull/93 +1 - looks good so far. One item for consideration: I am in favor of *not* providing backwards compatibility with the 'platform' option at this point, i.e. removing it altogether in favor of just the launcher. Since we just completed our first release, I think that we can go ahead and change the API - this would only require an argument change in current command lines and a deployment of the new jar - completely doable.
Re: Intermittent problems with PIRK-35
I have not seen/experienced similar issues, but I am fine with rolling it back...

On Mon, Sep 19, 2016 at 6:05 AM, Tim Ellison wrote:
> I have intermittent failures caused by "PIRK-35 Execute Tests in Parallel",
> such as
>
> ---
> T E S T S
> ---
> Error occurred during initialization of VM
> java.lang.OutOfMemoryError: unable to create new native thread
> Error occurred during initialization of VM
> java.lang.OutOfMemoryError: unable to create new native thread
> Running org.apache.pirk.schema.data.LoadDataSchemaTest
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.434 sec - in org.apache.pirk.schema.data.LoadDataSchemaTest
> Running org.apache.pirk.schema.query.LoadQuerySchemaTest
>
> and
>
> Error occurred during initialization of VM
> Cannot create VM thread. Out of system resources.
> Error occurred during initialization of VM
> java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:714)
>         at java.lang.ref.Finalizer.<init>(Finalizer.java:226)
> Running org.apache.pirk.schema.data.LoadDataSchemaTest
>
> My laptop is an 8-way machine with 24GB RAM, without ulimits.
> I've been running with Oracle Java 8 b102, which defaults to
> -XX:InitialHeapSize=387619456 -XX:MaxHeapSize=6201911296, and IBM Java 8 SR3fp10.
>
> Spinning up all tests simultaneously, especially with the new KafkaStorm tests, is too much.
>
> I'm working around it by deleting the PIRK-35 changes, and I get a full test run in 2mins.
>
> Do others see similar problems? Any objection to me reverting PIRK-35 now that the tests are running faster anyway?
>
> Regards,
> Tim
Intermittent problems with PIRK-35
I have intermittent failures caused by "PIRK-35 Execute Tests in Parallel", such as

---
 T E S T S
---
Error occurred during initialization of VM
java.lang.OutOfMemoryError: unable to create new native thread
Error occurred during initialization of VM
java.lang.OutOfMemoryError: unable to create new native thread
Running org.apache.pirk.schema.data.LoadDataSchemaTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.434 sec - in org.apache.pirk.schema.data.LoadDataSchemaTest
Running org.apache.pirk.schema.query.LoadQuerySchemaTest

and

Error occurred during initialization of VM
Cannot create VM thread. Out of system resources.
Error occurred during initialization of VM
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:714)
        at java.lang.ref.Finalizer.<init>(Finalizer.java:226)
Running org.apache.pirk.schema.data.LoadDataSchemaTest

My laptop is an 8-way machine with 24GB RAM, without ulimits. I've been running with Oracle Java 8 b102, which defaults to -XX:InitialHeapSize=387619456 -XX:MaxHeapSize=6201911296, and IBM Java 8 SR3fp10.

Spinning up all tests simultaneously, especially with the new KafkaStorm tests, is too much.

I'm working around it by deleting the PIRK-35 changes, and I get a full test run in 2mins.

Do others see similar problems? Any objection to me reverting PIRK-35 now that the tests are running faster anyway?

Regards,
Tim
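An alternative to reverting PIRK-35 outright would be to cap Surefire's parallelism so that the number of forked JVMs scales with the machine rather than with the test count. A sketch of such a pom fragment follows; the exact values would need tuning for Pirk's build, and this assumes the stock maven-surefire-plugin configuration keys.

```xml
<!-- Hypothetical pom fragment: bound Surefire's parallelism instead of
     spinning up every test JVM at once. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- one fork per CPU core, and reuse forks rather than one JVM per class -->
    <forkCount>1C</forkCount>
    <reuseForks>true</reuseForks>
    <!-- run test classes in parallel inside each fork, with a bounded pool -->
    <parallel>classes</parallel>
    <threadCount>4</threadCount>
  </configuration>
</plugin>
```

With forkCount bounded this way, an 8-way machine would start at most eight test JVMs at a time, which should avoid exhausting native threads.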
[GitHub] incubator-pirk issue #94: Update a number of Pirk's pom dependencies.
Github user smarthi commented on the issue: https://github.com/apache/incubator-pirk/pull/94 +1 to merge
[GitHub] incubator-pirk pull request #94: Update a number of Pirk's pom dependencies.
GitHub user tellison opened a pull request:

https://github.com/apache/incubator-pirk/pull/94

Update a number of Pirk's pom dependencies.

- move Pirk to later versions of JMH, Hadoop, commons-math3, commons-net, json-simple, jacoco-maven-plugin, coveralls-maven-plugin, Surefire, maven-jar-plugin, and maven-release-plugin.
- Note that Storm version 1.0.1 passes Pirk tests, but Storm version 1.0.2 fails with NoClassDefFoundError: org/apache/kafka/common/protocol/SecurityProtocol

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tellison/incubator-pirk versions

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-pirk/pull/94.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #94

commit 43b682aac24ae2e3998907ccc2a3eb695e7c2cb3
Author: Tim Ellison
Date: 2016-09-15T13:58:32Z

    Update a number of pom dependencies.
    - move Pirk to later versions of JMH, Hadoop, Storm, commons-math3, commons-net, json-simple, jacoco-maven-plugin, coveralls-maven-plugin, Surefire, maven-jar-plugin, and maven-release-plugin.

commit 6fe4241de34879f5b3420cb287947ad42aa481aa
Author: Tim Ellison
Date: 2016-09-15T14:21:06Z

    Revert Storm version change
    - Storm version 1.0.1 passes Pirk tests, but Storm version 1.0.2 fails with NoClassDefFoundError: org/apache/kafka/common/protocol/SecurityProtocol
Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE
Darin,

Unless I'm reading this wrong, the patch still has many references from the ResponderDriver to the set of currently supported responders. This code will have to change when somebody wants to add a new responder type. I thought the plan was to have the responder driver agnostic of the responders available?

So, for example, having the driver maintain a list of responders by name, and letting people specify the name on the command line. Each responder would then be responsible for implementing a standardised interface, and registering themselves with the driver by name.

In that model the responders would each know about (a) the driver, and how to register themselves by name, and (b) implement a standard life-cycle for building a response. The driver would be responsible for (a) collecting and maintaining the registrations of any responder being loaded, and (b) invoking the correct responder based on user selection.

Make sense? I can hack something together to show what I mean.

Regards,
Tim

On 19/09/16 07:05, DarinJ wrote:
> GitHub user DarinJ opened a pull request:
>
> https://github.com/apache/incubator-pirk/pull/93
>
> WIP-Pirk 63-DO NOT MERGE
>
> This is a WIP for [PIRK-63](https://issues.apache.org/jira/browse/PIRK-63) to open the door to other responders without having to modify the actual code of Pirk. It's submitted for feedback only, please DO NOT MERGE. I've only tested standalone mode.
>
> It deprecates the "platform" CLI option in favor of the "launcher" option which is the name of a class implementing the `ResponderLauncher` interface which will invoke the run method via reflection. This allows a developer of a different responder to merely place a jar on the classpath and specify the appropriate `ResponderLauncher` on the classpath.
>
> The "platform" CLI option is still made available. However, I removed the explicit dependencies in favor of using reflection.
This was done in > anticipation other refactoring the build into submodules, though this does > admittedly make the code more fragile. > > ResponderDriver had no unit tests, and unfortunately I saw no good way to > create good ones for this particular change, especially as it required > multiple frameworks to run. > > I should say that another possible route here is to have each framework > responder implement their own ResponderDriver. We could provide some > utilities to check the minimum Pirk required options are set, but leave the > rest to the implementation of the responder. It would clean up the > ResponderCLI and ResponderProps which are rather bloated and might continue > to grow if left unchecked. > > You can merge this pull request into a Git repository by running: > > $ git pull https://github.com/DarinJ/incubator-pirk Pirk-63 > > Alternatively you can review and apply these changes as the patch at: > > https://github.com/apache/incubator-pirk/pull/93.patch > > To close this pull request, make a commit to your master/trunk branch > with (at least) the following in the commit message: > > This closes #93 > > > commit dda458bb2ae77fd9e3dc686d17dd8b49095b3395 > Author: Darin Johnson> Date: 2016-09-13T03:19:12Z > > This is a WIP for > [PIRK-63](https://issues.apache.org/jira/browse/PIRK-63) to open the door to > other responders without having to modify the actual code of Pirk. It's > submitted for feedback only, please DO NOT MERGE. > > It deprecates the "platform" CLI option in favor of the "launcher" option > which is the name of a class implementing the `ResponderLauncher` interface > which will invoke the run method via reflection. This allows a developer of > a different responder to merely place a jar on the classpath and specify the > appropriate `ResponderLauncher` on the classpath. > > The "platform" CLI option is still made available. However, I removed > the explicit dependencies in favor of using reflection. 
This was done in > anticipation other refactoring the build into submodules, though this does > admittedly make the code more fragile.
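The registry model Tim describes can be sketched in a few lines of Java. All names here are invented for illustration, not the actual Pirk API: responders register themselves by name, and the driver launches purely by user-supplied name, never naming a responder class itself.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Standardised life-cycle each responder implements (illustrative).
interface Responder {
  void buildResponse();
}

public class RegistrySketch {
  private static final Map<String, Supplier<Responder>> REGISTRY = new HashMap<>();

  // (a) responders register themselves with the driver by name
  static void register(String name, Supplier<Responder> factory) {
    REGISTRY.put(name.toLowerCase(), factory);
  }

  // (b) the driver invokes the correct responder based on user selection
  static void launch(String name) {
    Supplier<Responder> factory = REGISTRY.get(name.toLowerCase());
    if (factory == null) {
      System.out.println("Unknown responder: " + name);
      return;
    }
    factory.get().buildResponse();
  }

  public static void main(String[] args) {
    // A responder JAR on the classpath would perform this registration
    // itself (e.g. via ServiceLoader); done inline here for the sketch.
    register("standalone", () -> () -> System.out.println("standalone response built"));

    launch("standalone");
    launch("storm"); // not registered in this sketch
  }
}
```

The driver code above never changes when a new responder type appears; only a new registration does.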
[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE
Github user tellison commented on a diff in the pull request: https://github.com/apache/incubator-pirk/pull/93#discussion_r79352002

--- Diff: src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java ---
@@ -49,83 +41,111 @@ public class ResponderDriver
 {
   private static final Logger logger = LoggerFactory.getLogger(ResponderDriver.class);

+  // ClassNames to instantiate Platforms using the platform CLI
+  private final static String MAPREDUCE_LAUNCHER = "org.apache.pirk.responder.wideskies.mapreduce.MapReduceResponderLauncher";
+  private final static String SPARK_LAUNCHER = "org.apache.pirk.responder.wideskies.spark.SparkResponderLauncher";
+  private final static String SPARKSTREAMING_LAUNCHER = "org.apache.pirk.responder.wideskies.spark.streaming.SparkStreamingResponderLauncher";
+  private final static String STANDALONE_LAUNCHER = "org.apache.pirk.responder.wideskies.standalone.StandaloneResponderLauncher";
+  private final static String STORM_LAUNCHER = "org.apache.pirk.responder.wideskies.storm.StormResponderLauncher";

--- End diff --

I'm confused by this; I thought the goal of PIRK-63 was to avoid having to change the ResponderDriver each time a new responder type is introduced?
[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE
Github user tellison commented on a diff in the pull request: https://github.com/apache/incubator-pirk/pull/93#discussion_r79351660 --- Diff: src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java --- @@ -49,83 +41,111 @@ public class ResponderDriver { private static final Logger logger = LoggerFactory.getLogger(ResponderDriver.class); + // ClassNames to instantiate Platforms using the platform CLI + private final static String MAPREDUCE_LAUNCHER = "org.apache.pirk.responder.wideskies.mapreduce.MapReduceResponderLauncher"; + private final static String SPARK_LAUNCHER = "org.apache.pirk.responder.wideskies.spark.SparkResponderLauncher"; + private final static String SPARKSTREAMING_LAUNCHER = "org.apache.pirk.responder.wideskies.spark.streaming.SparkStreamingResponderLauncher"; + private final static String STANDALONE_LAUNCHER = "org.apache.pirk.responder.wideskies.standalone.StandaloneResponderLauncher"; + private final static String STORM_LAUNCHER = "org.apache.pirk.responder.wideskies.storm.StormResponderLauncher"; private enum Platform { MAPREDUCE, SPARK, SPARKSTREAMING, STORM, STANDALONE, NONE } - public static void main(String[] args) throws Exception + private static void launch(String launcherClassName) + { +logger.info("Launching Responder with {}", launcherClassName); +try +{ + Class clazz = Class.forName(launcherClassName); + if (ResponderLauncher.class.isAssignableFrom(clazz)) + { +Object launcherInstance = clazz.newInstance(); +Method m = launcherInstance.getClass().getDeclaredMethod("run"); +m.invoke(launcherInstance); + } + else + { +logger.error("Class {} does not implement ResponderLauncher", launcherClassName); + } +} +catch (ClassNotFoundException e) +{ + logger.error("Class {} not found, check launcher property: {}", launcherClassName); +} +catch (NoSuchMethodException e) +{ + logger.error("In {} run method not found: {}", launcherClassName); +} +catch (InvocationTargetException e) +{ + logger.error("In {} run method could not be 
invoked: {}: {}", launcherClassName, e); +} +catch (InstantiationException e) +{ + logger.error("Instantiation exception within {}: {}", launcherClassName, e); +} +catch (IllegalAccessException e) +{ + logger.error("IllegalAccess Exception {}", e); +} + } + + public static void main(String[] args) { ResponderCLI responderCLI = new ResponderCLI(args); // For handling System.exit calls from Spark Streaming System.setSecurityManager(new SystemExitManager()); -Platform platform = Platform.NONE; -String platformString = SystemConfiguration.getProperty(ResponderProps.PLATFORM); -try -{ - platform = Platform.valueOf(platformString.toUpperCase()); -} catch (IllegalArgumentException e) +String launcherClassName = SystemConfiguration.getProperty(ResponderProps.LAUNCHER); +if (launcherClassName != null) { - logger.error("platform " + platformString + " not found."); + launch(launcherClassName); } - -logger.info("platform = " + platform); -switch (platform) +else { - case MAPREDUCE: -logger.info("Launching MapReduce ResponderTool:"); - -ComputeResponseTool pirWLTool = new ComputeResponseTool(); -ToolRunner.run(pirWLTool, new String[] {}); -break; - - case SPARK: -logger.info("Launching Spark ComputeResponse:"); - -ComputeResponse computeResponse = new ComputeResponse(FileSystem.get(new Configuration())); -computeResponse.performQuery(); -break; - - case SPARKSTREAMING: -logger.info("Launching Spark ComputeStreamingResponse:"); - -ComputeStreamingResponse computeSR = new ComputeStreamingResponse(FileSystem.get(new Configuration())); -try -{ - computeSR.performQuery(); -} catch (SystemExitException e) -{ - // If System.exit(0) is not caught from Spark Streaming, - // the application will complete with a 'failed' status - logger.info("Exited with System.exit(0) from Spark Streaming"); -} - -// Teardown the context -computeSR.teardown(); -break; - - case STORM: -logger.info("Launching Storm PirkTopology:"); -PirkTopology.runPirkTopology(); -break; - - case
[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE
GitHub user DarinJ opened a pull request:

https://github.com/apache/incubator-pirk/pull/93

WIP-Pirk 63-DO NOT MERGE

This is a WIP for [PIRK-63](https://issues.apache.org/jira/browse/PIRK-63) to open the door to other responders without having to modify the actual code of Pirk. It's submitted for feedback only, please DO NOT MERGE. I've only tested standalone mode.

It deprecates the "platform" CLI option in favor of the "launcher" option, which is the name of a class implementing the `ResponderLauncher` interface whose run method will be invoked via reflection. This allows a developer of a different responder to merely place a jar on the classpath and specify the appropriate `ResponderLauncher`.

The "platform" CLI option is still made available. However, I removed the explicit dependencies in favor of using reflection. This was done in anticipation of refactoring the build into submodules, though this does admittedly make the code more fragile.

ResponderDriver had no unit tests, and unfortunately I saw no good way to create good ones for this particular change, especially as it required multiple frameworks to run.

I should say that another possible route here is to have each framework responder implement their own ResponderDriver. We could provide some utilities to check that the minimum Pirk required options are set, but leave the rest to the implementation of the responder. It would clean up the ResponderCLI and ResponderProps, which are rather bloated and might continue to grow if left unchecked.
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/DarinJ/incubator-pirk Pirk-63

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-pirk/pull/93.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #93

commit dda458bb2ae77fd9e3dc686d17dd8b49095b3395
Author: Darin Johnson
Date: 2016-09-13T03:19:12Z

    This is a WIP for [PIRK-63](https://issues.apache.org/jira/browse/PIRK-63) to open the door to other responders without having to modify the actual code of Pirk. It's submitted for feedback only, please DO NOT MERGE.

    It deprecates the "platform" CLI option in favor of the "launcher" option which is the name of a class implementing the `ResponderLauncher` interface which will invoke the run method via reflection. This allows a developer of a different responder to merely place a jar on the classpath and specify the appropriate `ResponderLauncher` on the classpath.

    The "platform" CLI option is still made available. However, I removed the explicit dependencies in favor of using reflection. This was done in anticipation of refactoring the build into submodules, though this does admittedly make the code more fragile.