Re: [DISCUSS] GeoLite database licensing change, master build broken…..
So, what I think we need is for someone who has some idea of how to proceed to fill out the JIRA with either a list or actual tasks that someone else could follow to accomplish this.

On January 13, 2020 at 19:08:20, Michael Miklavcic (michael.miklav...@gmail.com) wrote:
> Yes, understood. We do not provide a mechanism for cleaning individual records from the DB file.
Re: [DISCUSS] GeoLite database licensing change, master build broken…..
Yes, understood. We do not provide a mechanism for cleaning individual records from the DB file. Just a thought, but a workaround could be to download the latest file from MaxMind and simply delete the old one. That's not ideal, but it might be something that could still be automated, depending on MaxMind's implementation. Definitely open to other options, but I wanted to at least come to the table with a simple, workable option.

On Mon, Jan 13, 2020 at 4:36 PM Justin Leet wrote:
> The minimum is probably to call out that they need to clean it up, but that may cause a lot of friction for our users.
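The workaround Mike describes (download the latest file from MaxMind, then delete the old one) is straightforward to automate once the new file is on disk. A minimal sketch, assuming a local filesystem and a download already staged next to the live file; an HDFS deployment would need the equivalent Hadoop `FileSystem` calls instead, and `GeoDbSwap` is a hypothetical name, not part of Metron.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: swaps a freshly downloaded GeoLite2 file in place of the old one.
final class GeoDbSwap {
    private GeoDbSwap() {}

    /**
     * Moves {@code downloaded} over {@code current}, so the old database file is
     * removed as part of the replacement. An atomic rename is tried first, so a
     * reader sees either the old file or the new one, never a partial copy; if the
     * filesystem cannot do atomic moves, a plain replacing move is used instead.
     */
    static void replace(Path downloaded, Path current) throws IOException {
        // Guard against swapping in a failed or truncated download.
        if (!Files.isRegularFile(downloaded) || Files.size(downloaded) == 0) {
            throw new IOException("Refusing to replace with missing or empty file: " + downloaded);
        }
        try {
            Files.move(downloaded, current,
                    StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
        } catch (AtomicMoveNotSupportedException e) {
            Files.move(downloaded, current, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Staging the download in the same directory as the live file matters here: an atomic rename generally only works within a single filesystem.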
Re: [DISCUSS] GeoLite database licensing change, master build broken…..
To clarify a bit, on Otto's comment:

> I have not read the new license, but the idea, I believe, is yes: whoever downloads and accepts the new license is then responsible for adherence to the applicable law,
> which is why the Apache Foundation cannot be that entity, I would think.

My concern is less about anything legal regarding Metron or the ASF; I 100% agree the ASF isn't responsible for users fulfilling their legal obligations. My concern is about the practical ability of users to even do the sort of recalculation necessary to properly act on such a requirement. The minimum is probably to call out that they need to clean it up, but that may cause a lot of friction for our users.

On Mon, Jan 13, 2020 at 5:52 PM Michael Miklavcic <michael.miklav...@gmail.com> wrote:
> +1 to Apache Legal advice
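To make the recalculation concern concrete: before anything can be purged, a user first has to find every stored record that references a removed IP. A minimal sketch, assuming records are available as flat field maps and that the relevant fields are `ip_src_addr` and `ip_dst_addr` (an assumption about the message schema); `DerivedDataScan` is hypothetical, and actually deleting matches from ES/Solr or HDFS is outside its scope.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: identifies stored records derived from IPs that must be purged.
final class DerivedDataScan {
    private DerivedDataScan() {}

    // Assumed IP field names; adjust to whatever the indexed messages actually carry.
    static final List<String> IP_FIELDS = List.of("ip_src_addr", "ip_dst_addr");

    /** Returns the records whose source or destination IP appears in {@code removedIps}. */
    static List<Map<String, Object>> affected(List<Map<String, Object>> records, Set<String> removedIps) {
        return records.stream()
                .filter(r -> IP_FIELDS.stream()
                        .anyMatch(f -> removedIps.contains(String.valueOf(r.get(f)))))
                .collect(Collectors.toList());
    }
}
```

Profiles are likely the harder case: a profile measurement aggregates many messages, so it cannot be filtered this way and would probably have to be recomputed from the remaining raw data.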
Re: [DISCUSS] GeoLite database licensing change, master build broken…..
+1 to Apache Legal advice

On Mon, Jan 13, 2020 at 3:30 PM Otto Fowler wrote:
> We may want to send this past Apache Legal?
Re: [DISCUSS] GeoLite database licensing change, master build broken…..
Justin,

I have not read the new license, but the idea, I believe, is yes: whoever downloads and accepts the new license is then responsible for adherence to the applicable law, which is why the Apache Foundation cannot be that entity, I would think. We may want to send this past Apache Legal?

On January 13, 2020 at 17:22:21, Justin Leet (justinjl...@gmail.com) wrote:
> Given that I am not even slightly a lawyer, my understanding here is likely wrong, but are there further implications here?
Re: [DISCUSS] GeoLite database licensing change, master build broken…..
Maybe we should have a Stellar command for this.

On January 13, 2020 at 17:22:21, Justin Leet (justinjl...@gmail.com) wrote:
> On the whole, I agree.
Re: [DISCUSS] GeoLite database licensing change, master build broken…..
On the whole, I agree. I think the immediate focus should be to rip out MaxMind from default usage in master, even if it's done a bit roughly. Getting master at least building for people would probably be a good first step.

A couple of further thoughts:

- JUnit 5 supports a variety of conditional test execution mechanisms. It might be possible to do "Test is enabled if property maxmind.geo.database.location is set", and just provide instructions to users on actually doing it, instead of having to manually enable/disable the test via code.
- Remove DB download/install from Ambari. I agree with removing the actual download/load, but still keeping the default location setup.
- I'm inclined to think it should be removed from the default demo enrichments. We might be able to replace the GEO_GET with some dummy Stellar and just produce similar outputs, since our demo data has a limited set of IPs, iirc. Obviously, we'd want to document that fairly well and also let users manually set it up for non-demo data.
- The CLI tool should be fine for letting users load the database; it's more or less why it existed in the first place. Maybe we also add a check if someone tries using the old URL and let them know.
- We need to document that these changes occurred for CCPA reasons, so that users can evaluate the consequences.
- Given that I am not even slightly a lawyer, my understanding here is likely wrong, but are there further implications here? Say a user runs a bunch of data through Metron (both enrichments and profiler) using these IPs. A subset of those IPs is removed as it shouldn't be used. Would a user then have to remove all data derived from these IPs (e.g. anything in ES/Solr, anything in HDFS, any profiles using the data)? If they do have to remove it, I assume that's not directly our problem (since they have to do the cleanup), but we're probably not making it easy on them in terms of providing the ability to clean up that sort of information and rerun profiles and such.
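The first bullet corresponds to JUnit 5's `@EnabledIfSystemProperty` condition (package `org.junit.jupiter.api.condition`), e.g. `@EnabledIfSystemProperty(named = "maxmind.geo.database.location", matches = ".+")` on the test method, which disables the test unless the property is set. The stdlib-only sketch below reproduces that gating decision without a JUnit dependency and additionally checks that the file exists; `GeoDbTestGate` is a hypothetical name, and the property name is the one proposed in the thread.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.Properties;

// Hypothetical gate mirroring what @EnabledIfSystemProperty(named = ..., matches = ".+") checks,
// plus an existence check so a mistyped path skips instead of failing deep inside the test.
final class GeoDbTestGate {
    static final String PROPERTY = "maxmind.geo.database.location";

    private GeoDbTestGate() {}

    /** Returns the configured database path, or empty if the integration test should be skipped. */
    static Optional<Path> configuredDatabase(Properties props) {
        String value = props.getProperty(PROPERTY);
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        Path path = Paths.get(value.trim());
        return Files.isRegularFile(path) ? Optional.of(path) : Optional.empty();
    }

    public static void main(String[] args) {
        Optional<Path> db = configuredDatabase(System.getProperties());
        System.out.println(db.map(p -> "Running GeoLite2 integration test against " + p)
                .orElse("Skipping: set -D" + PROPERTY + "=/path/to/GeoLite2-City.mmdb"));
    }
}
```

In an actual test class the annotation alone would likely be enough; the explicit check mainly shows the decision being made and gives a clearer message when the property is missing.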
On Mon, Jan 13, 2020 at 5:20 PM Otto Fowler wrote:
> I agree with all of that, my only thinking on contrib is, if that component is not tested and has no coverage, and needs manual steps, then we may want to separate it.
> We would have to have some handle on full dev as well right?
Re: [DISCUSS] GeoLite database licensing change, master build broken…..
I agree with all of that, my only thinking on contrib is, if that component is not tested and has no coverage, and needs manual steps, then we may want to separate it. We would have to have some handle on full dev as well, right?

On January 13, 2020 at 16:57:13, Michael Miklavcic (michael.miklav...@gmail.com) wrote:
> Hey Otto,
>
> As I mentioned above, we have had this issue with other components before, e.g. mysql. I don't see a compelling reason to discontinue this component or push it to contrib just yet - it's a type of enrichment that happens to require an additional manual step. Per the article (https://blog.maxmind.com/2019/12/18/significant-changes-to-accessing-and-using-geolite2-databases/), Maxmind is not requiring purchase, merely requiring users to register due to GDPR and CCPA requirements. That being said, it does cause issues for the integration tests. I think we should minimally add an @Ignore annotation to the testLoadGeoIpDatabase test in MaxmindDbEnrichmentLoaderTest and provide manual instructions for running the integration test if/when someone submits a PR affecting this code.
>
> Users can still use the geolite DB by uploading it to HDFS. (Incidentally, this even provides a crude mechanism for versioning, should a user choose to use the config path that way.) We separated the deployment side of the geolite DB from the consumption side: https://github.com/apache/metron/blob/master/metron-platform/metron-enrichment/metron-enrichment-common/src/main/java/org/apache/metron/enrichment/adapters/maxmind/geo/GeoLiteCityDatabase.java#L135
>
> All that needs to happen is for the user to drop the file in HDFS. This previously happened via Ambari; see https://github.com/apache/metron/blob/master/metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/package/scripts/enrichment_commands.py#L95
>
> We should probably do a couple things:
>
> 1. Minimally, remove the DB download and install from Ambari - we might choose to keep the HDFS path creation so that we have a reasonable default prepped and ready.
> 2. Add documentation for manually installing the geolite DB (register with maxmind) OR remove it from our default demo enrichments shipped with the platform.
>
> Manual DB file loading would be managed using the CLI tool. You should be able to load it via a local file URL (https://en.wikipedia.org/wiki/File_URI_scheme) or via a custom hosted solution through the --geo_url option. See more details here: https://github.com/apache/metron/tree/master/metron-platform/metron-data-management#geolite2-loader
>
> Anything else I'm missing? Probably worth some feedback from Justin Leet on this as well.
>
> Thanks,
> Mike
Re: [DISCUSS] GeoLite database licensing change, master build broken…..
Hey Otto,

As I mentioned above, we have had this issue with other components before, e.g. mysql. I don't see a compelling reason to discontinue this component or push it to contrib just yet - it's a type of enrichment that happens to require an additional manual step. Per the article (https://blog.maxmind.com/2019/12/18/significant-changes-to-accessing-and-using-geolite2-databases/), Maxmind is not requiring purchase, merely requiring users to register due to GDPR and CCPA requirements. That being said, it does cause issues for the integration tests. I think we should minimally add an @Ignore annotation to the testLoadGeoIpDatabase test in MaxmindDbEnrichmentLoaderTest and provide manual instructions for running the integration test if/when someone submits a PR affecting this code.

Users can still use the geolite DB by uploading it to HDFS. (Incidentally, this even provides a crude mechanism for versioning, should a user choose to use the config path that way.) We separated the deployment side of the geolite DB from the consumption side: https://github.com/apache/metron/blob/master/metron-platform/metron-enrichment/metron-enrichment-common/src/main/java/org/apache/metron/enrichment/adapters/maxmind/geo/GeoLiteCityDatabase.java#L135

All that needs to happen is for the user to drop the file in HDFS. This previously happened via Ambari; see https://github.com/apache/metron/blob/master/metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/package/scripts/enrichment_commands.py#L95

We should probably do a couple things:

1. Minimally, remove the DB download and install from Ambari - we might choose to keep the HDFS path creation so that we have a reasonable default prepped and ready.
2. Add documentation for manually installing the geolite DB (register with maxmind) OR remove it from our default demo enrichments shipped with the platform.

Manual DB file loading would be managed using the CLI tool. You should be able to load it via a local file URL (https://en.wikipedia.org/wiki/File_URI_scheme) or via a custom hosted solution through the --geo_url option. See more details here: https://github.com/apache/metron/tree/master/metron-platform/metron-data-management#geolite2-loader

Anything else I'm missing? Probably worth some feedback from Justin Leet on this as well.

Thanks,
Mike

On Mon, Jan 13, 2020 at 1:59 PM Otto Fowler wrote:
> Hi Tom, that is true, and I think that is the only viable approach. We do, however, use the database for testing during the build, and we do set up the components that use the database in the ’sample’ flow with the simulated sensors for our vagrant deploy… and our contrib/docker deploy, etc.
>
> So, having the user download their own properly licensed version (and having the user be responsible for the privacy law issues) is fine, but I think we need to talk through all the ways we are going to change the build, what it means for testing that component (does it move to contrib?), and the default deployment to vagrant/topology.
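For the local file URL route Mike describes, a minimal Java sketch of turning a downloaded archive into a `file://` URL for `--geo_url`, combined with the check Justin suggested for users still pointing at the old download location. It assumes, as Mike suggests, that `--geo_url` accepts a file URL; `GeoUrlCheck` and its methods are hypothetical, and the legacy host `geolite.maxmind.com` is an assumption about where the anonymous downloads used to be served, to be verified before relying on it.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper for preparing a --geo_url value; not part of Metron.
public class GeoUrlCheck {
    // Assumption: the pre-change anonymous GeoLite2 downloads came from this host.
    static final String LEGACY_HOST = "geolite.maxmind.com";

    // Converts a local path (relative or absolute) into a percent-encoded file:// URL.
    static String toFileUrl(String localPath) {
        Path absolute = Paths.get(localPath).toAbsolutePath().normalize();
        return absolute.toUri().toString();
    }

    // True if the URL's host is the assumed legacy download host (case-insensitive).
    static boolean isLegacyDownloadUrl(String url) {
        try {
            String host = new URI(url).getHost();
            return host != null && host.toLowerCase(Locale.ROOT).equals(LEGACY_HOST);
        } catch (URISyntaxException e) {
            return false; // not a parseable URI, so it cannot be the legacy URL
        }
    }

    public static void main(String[] args) {
        String input = args.length > 0 ? args[0] : "GeoLite2-City.tar.gz";
        // Anything with a scheme is treated as a URL; anything else as a local path.
        String geoUrl = input.contains("://") ? input : toFileUrl(input);
        if (isLegacyDownloadUrl(geoUrl)) {
            System.err.println("Warning: " + geoUrl + " points at the old anonymous download host; "
                + "register with MaxMind, download the database, and pass a local or self-hosted URL instead.");
        } else {
            System.out.println("--geo_url " + geoUrl);
        }
    }
}
```

For example, `java GeoUrlCheck /tmp/GeoLite2-City.tar.gz` would print a `--geo_url file:///tmp/...` value to hand to the loader, while an `http://geolite.maxmind.com/...` argument would print the warning instead. Using `Path.toUri()` rather than string concatenation means spaces and other special characters in the path are percent-encoded.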
Re: [DISCUSS] GeoLite database licensing change, master build broken…..
Hi Tom, that is true, and I think that is the only viable approach. We do, however, use the database for testing during the build, and we do set up the components that use the database in the ’sample’ flow with the simulated sensors for our vagrant deploy… and our contrib/docker deploy, etc.

So, having the user download their own properly licensed version (and having the user be responsible for the privacy law issues) is fine, but I think we need to talk through all the ways we are going to change the build, what it means for testing that component (does it move to contrib?), and the default deployment to vagrant/topology.

On January 13, 2020 at 13:41:01, Yerex, Tom (tom.ye...@ubc.ca) wrote:
> Hi Otto,
>
> Thank you for raising this in the discussion.
>
> It seems to me that Maxmind is proactive about providing instructions and code to deliver updates to the local system. I can recall being surprised that the current Metron solution seemed to do more than I expected, i.e., I thought I would need to get Maxmind files into the local file system, where Metron would pick them up and load them into HDFS, and instead Metron did it all.
>
> Perhaps the approach could be to simplify Metron and have it load files from the local file system into HDFS; how you get the files to the local file system is up to you?
Re: [DISCUSS] GeoLite database licensing change, master build broken…..
Hi Otto,

Thank you for raising this in the discussion.

It seems to me that Maxmind is proactive about providing instructions and code to deliver updates to the local system. I can recall being surprised that the current Metron solution seemed to do more than I expected, i.e., I thought I would need to get Maxmind files into the local file system, where Metron would pick them up and load them into HDFS, and instead Metron did it all.

Perhaps the approach could be to simplify Metron and have it load files from the local file system into HDFS; how you get the files to the local file system is up to you?

On 2020-01-13, 3:52 AM, "Otto Fowler" wrote:
> https://issues.apache.org/jira/browse/METRON-2340
> https://blog.maxmind.com/2019/12/18/significant-changes-to-accessing-and-using-geolite2-databases/
>
> Maxmind has changed the way they distribute and license the geolite2 database that we use in our builds and distribution.
>
> Master build is broken, and users are having issues setting up Metron (https://the-asf.slack.com/archives/CB7Q6AN3T/p1578556024012200).
>
> We need to fix the build and figure out how we are going to move on from this.
Re: [DISCUSS] GeoLite database licensing change, master build broken…..
I haven't had a chance to read the details yet, but we may need to make this portion of the dev environment manual. Other libraries, e.g. mysql, had issues like this as well.

On Mon, Jan 13, 2020 at 4:52 AM Otto Fowler wrote:
> https://issues.apache.org/jira/browse/METRON-2340
> https://blog.maxmind.com/2019/12/18/significant-changes-to-accessing-and-using-geolite2-databases/
>
> Maxmind has changed the way they distribute and license the geolite2 database that we use in our builds and distribution.
>
> Master build is broken, and users are having issues setting up Metron (https://the-asf.slack.com/archives/CB7Q6AN3T/p1578556024012200).
>
> We need to fix the build and figure out how we are going to move on from this.
[DISCUSS] GeoLite database licensing change, master build broken…..
https://issues.apache.org/jira/browse/METRON-2340
https://blog.maxmind.com/2019/12/18/significant-changes-to-accessing-and-using-geolite2-databases/

Maxmind has changed the way they distribute and license the geolite2 database that we use in our builds and distribution.

Master build is broken, and users are having issues setting up Metron (https://the-asf.slack.com/archives/CB7Q6AN3T/p1578556024012200).

We need to fix the build and figure out how we are going to move on from this.