[GitHub] nifi issue #2671: NiFi-5102 - Adding Processors for MarkLogic DB
Github user joewitt commented on the issue: https://github.com/apache/nifi/pull/2671 If you take a look here https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#core-properties-br and check out the 'nifi.nar.library.directory' section, you can see how someone could be guided to create a directory for custom nars, add that in there, and start up. Once we have support for extensions in the NiFi Registry this will be beautiful/easy. ---
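For readers following along, here is a minimal sketch of the custom nar directory setup described above. It runs against a throwaway directory standing in for a NiFi install; the `custom-nars` directory name and the `.custom` property suffix are illustrative assumptions, not names from this thread. The administration guide describes extra library directories as `nifi.nar.library.directory.` plus a unique suffix.

```shell
#!/bin/sh
set -eu

# Throwaway stand-in for a NiFi home directory (assumption: a real install
# keeps its settings in conf/nifi.properties under the NiFi home).
nifi_home=$(mktemp -d)
mkdir -p "$nifi_home/conf"
printf 'nifi.nar.library.directory=./lib\n' > "$nifi_home/conf/nifi.properties"

# 1. Create a directory to hold custom nars (name is hypothetical).
mkdir -p "$nifi_home/custom-nars"

# 2. Register it as an additional nar library directory.
#    The ".custom" suffix is arbitrary; it only needs to be unique.
printf 'nifi.nar.library.directory.custom=./custom-nars\n' \
  >> "$nifi_home/conf/nifi.properties"

# 3. Show the resulting entries; a real install would then copy the nar
#    into the new directory and restart NiFi.
grep '^nifi.nar.library.directory' "$nifi_home/conf/nifi.properties"
```

Keeping vendor nars in a separate directory means they are not mixed in with the nars that ship in the stock `lib` directory.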
[GitHub] nifi issue #2671: NiFi-5102 - Adding Processors for MarkLogic DB
Github user ryanjdew commented on the issue: https://github.com/apache/nifi/pull/2671 @joewitt I understand the concern. We would like to simplify the user experience of using NiFi with MarkLogic processors. We have a current model for creating and releasing NARs via a GitHub releases page, but would you happen to have a good example of a process using Maven publishing with NARs in NiFi? ---
[GitHub] nifi issue #2671: NiFi-5102 - Adding Processors for MarkLogic DB
Github user joewitt commented on the issue: https://github.com/apache/nifi/pull/2671 As a good example of where we should have thought it through more as well, we have some cool work done by the InfluxDB folks. They now want to improve it in https://github.com/apache/nifi/pull/2743 but the reality is we just don't have people knowledgeable enough to do reliable code reviews/testing of this. ---
[GitHub] nifi issue #2671: NiFi-5102 - Adding Processors for MarkLogic DB
Github user joewitt commented on the issue: https://github.com/apache/nifi/pull/2671 @ryanjdew We have addressed the Travis-CI related test issues and the build has been stable since. It could break again as people add timing-dependent tests that behave wildly differently on slow environments, but we'll see. This PR, assuming the L&N and such are sorted now, is just tricky because we need a committer with time to devote to learning MarkLogic well enough to set up an environment and verify function, or to leverage a provided instance (not a sustainable model). It is a good example of where the limits of what the community can reasonably support come into play. That said, this is probably a really cool and useful thing to offer folks and beneficial to both the NiFi user base and the MarkLogic user base. This is why I suggested MarkLogic folks just hang onto this code in some public github repo and have it be ALv2. They can publish their nars into maven central or wherever they do and provide instructions for it. I'd be supportive, and I'd assume the community at large would be too, of having links to such extensions on the apache website. This feels to me like the best tradeoff right now for all parties. ---
[GitHub] nifi issue #2671: NiFi-5102 - Adding Processors for MarkLogic DB
Github user ryanjdew commented on the issue: https://github.com/apache/nifi/pull/2671 Following up, are there any other concerns with this PR? If needed, I can provide credentials to a MarkLogic instance for testing. ---
[GitHub] nifi issue #2671: NiFi-5102 - Adding Processors for MarkLogic DB
Github user joewitt commented on the issue: https://github.com/apache/nifi/pull/2671 team; given https://github.com/marklogic/nifi/releases could we consider closing this PR and keeping the MarkLogic artifact creation/maintenance something MarkLogic takes care of at this time? It is a perfectly fine model. We could even create a nifi web page to point at vendor/other community managed/supported extensions possibly. ---
[GitHub] nifi issue #2671: NiFi-5102 - Adding Processors for MarkLogic DB
Github user MikeThomsen commented on the issue: https://github.com/apache/nifi/pull/2671 @vivekmuniyandi you have a merge conflict now. ---
[GitHub] nifi issue #2671: NiFi-5102 - Adding Processors for MarkLogic DB
Github user vivekmuniyandi commented on the issue: https://github.com/apache/nifi/pull/2671 @joewitt I have addressed all your comments except for the License and Notice comments. Can you please let us know what more we should add apart from the LICENSE and NOTICE files prepared by our legal team, which we have included in the root directory of the nar? They account for all the dependencies added. What more should be added? Please help. Thanks. ---
[GitHub] nifi issue #2671: NiFi-5102 - Adding Processors for MarkLogic DB
Github user vivekmuniyandi commented on the issue: https://github.com/apache/nifi/pull/2671 @MikeThomsen Sure, will add that to our backlog. Thanks! ---
[GitHub] nifi issue #2671: NiFi-5102 - Adding Processors for MarkLogic DB
Github user MikeThomsen commented on the issue: https://github.com/apache/nifi/pull/2671 @vivekmuniyandi Unrelated, but I don't have your email: consider adding a MarkLogicLookupService in a future sprint. You can look at HBaseLookupService and MongoDBLookupService as examples. Might be highly useful to users to be able to enrich a record set using MarkLogic. I have PRs open for some LookupService-related tasks that add some additional schema-related capabilities and those might be useful to your team on that issue. ---
[GitHub] nifi issue #2671: NiFi-5102 - Adding Processors for MarkLogic DB
Github user vivekmuniyandi commented on the issue: https://github.com/apache/nifi/pull/2671 Thanks @joewitt for the comment. I am addressing all the changes you have mentioned. I will address the SSLContext and remove the Kerberos and Certificate auth for now. ``` The other thing that needs to happen is the nar bundles need their LICENSE/NOTICE file(s) added if necessary. I looked at one of the nars and there would definitely need to be entries.``` With regard to this, I have a [LICENSE](https://github.com/apache/nifi/pull/2671/files#diff-53deed39bf31085fbecf77ea6a2382dc) and [NOTICE](https://github.com/apache/nifi/pull/2671/files#diff-a2f6b487a7a70d5f43fa320730b2c87a) file prepared by our legal team in the root directory of the nifi marklogic bundle (to account for the contents of the root directory and the sub directories). They account for all the dependencies added in our MarkLogic bundle. Should we do something more? Can you explain a bit more as to what is required? Thanks for all the help. ---
[GitHub] nifi issue #2671: NiFi-5102 - Adding Processors for MarkLogic DB
Github user joewitt commented on the issue: https://github.com/apache/nifi/pull/2671 OK, I've attached a patch which helps with some aspects of POM construction, flags things like resource utilization since it appears to be loading full content into memory, and renames the service to indicate it is a MarkLogic service rather than just a database service. There is an outstanding need to sort out the security configuration. The SSLContext handling should use the standard mechanism for obtaining it, which you can follow from a number of other processors. Also, there is a kerberos context for the security setting but there do not appear to be any associated settings for the user. The security configurations should either be removed in favor of simple/digest for now OR be completed, with some consistency with other items. For security-relevant things CVEs become a concern, so we take these more seriously. The performance/logic of the processor's interaction with MarkLogic we can improve over time if needed, but security we want to get right up front. The other thing that needs to happen is the nar bundles need their LICENSE/NOTICE file(s) added if necessary. I looked at one of the nars and there would definitely need to be entries. Please try adding these in like other nars and I'm happy to help tweak it to get it to the finish line. If you have questions on how to achieve any of the above please ask. Point to a similar nar you looked at as an example so that we can best help close the remaining gaps, working from good examples that you've looked at. Thanks ---
[GitHub] nifi issue #2671: NiFi-5102 - Adding Processors for MarkLogic DB
Github user vivekmuniyandi commented on the issue: https://github.com/apache/nifi/pull/2671 @MikeThomsen We don't have a public MarkLogic Docker image but we do have [this](https://hub.docker.com/r/patrickmcelwee/marklogic-dependencies/) on Docker Hub which would give you a head start on having a MarkLogic instance up and running. I will drop them an email and I will work on getting secure access to the cluster. Thanks for all the help. ---
[GitHub] nifi issue #2671: NiFi-5102 - Adding Processors for MarkLogic DB
Github user MikeThomsen commented on the issue: https://github.com/apache/nifi/pull/2671 @vivekmuniyandi I'll try to find time to set up a MarkLogic node for testing some time in the next few days (day job and such are getting in the way). In the meantime, I would suggest reaching out to @joewitt directly to see if he or any of his folks have any time they can spare to jump in and help you out. Also, just a suggestion, you might want to think about setting up a secure cluster that you can privately allow reviewers to access so we/they can work with you to confirm everything works the way you expect. ---
[GitHub] nifi issue #2671: NiFi-5102 - Adding Processors for MarkLogic DB
Github user vivekmuniyandi commented on the issue: https://github.com/apache/nifi/pull/2671 Thanks @MikeThomsen for the comment. `PutMarkLogicRecord` is definitely on our roadmap but I am not sure when we will be able to get to it. We have an internal sprint for NiFi. We will add this to our backlog, check with PM and address this with priority. We don't want to keep this PR waiting for that processor. I would raise a separate PR in the future for that. Thanks! ---
[GitHub] nifi issue #2671: NiFi-5102 - Adding Processors for MarkLogic DB
Github user MikeThomsen commented on the issue: https://github.com/apache/nifi/pull/2671 BTW, I would strongly recommend your team discuss adding a `PutMarkLogicRecord` processor so you can do a bulk ingestion event from a single flowfile. We have quite a few good implementations, such as the ones for HBase, ElasticSearch and MongoDB, that you can use/steal from to make it happen. Would **strongly** recommend you do that because it'll make bulk ingestion of very large data sets go much faster for MarkLogic. If you want to do that, feel free to just start work on it and push changes into this PR and we'll just keep going. ---
[GitHub] nifi issue #2671: NiFi-5102 - Adding Processors for MarkLogic DB
Github user MikeThomsen commented on the issue: https://github.com/apache/nifi/pull/2671 @vivekmuniyandi I'll try to build a MarkLogic Docker image and share it on Docker Hub so others can use that if they want. ---
[GitHub] nifi issue #2671: NiFi-5102 - Adding Processors for MarkLogic DB
Github user vivekmuniyandi commented on the issue: https://github.com/apache/nifi/pull/2671 Thanks @MikeThomsen ! Have made the changes. ---
[GitHub] nifi issue #2671: NiFi-5102 - Adding Processors for MarkLogic DB
Github user vivekmuniyandi commented on the issue: https://github.com/apache/nifi/pull/2671 I followed this link - https://cwiki.apache.org/confluence/display/NIFI/Contributor+Guide#ContributorGuide-Keepingyourfeaturebranchcurrent and it looks like this pulled in others' commits as well. ---
[GitHub] nifi issue #2671: NiFi-5102 - Adding Processors for MarkLogic DB
Github user MikeThomsen commented on the issue: https://github.com/apache/nifi/pull/2671 @vivekmuniyandi looks like you pulled a few other folks' commits in with your last push. Do this to clear that up: 1. git checkout master 2. git pull origin master (substitute whichever remote tracks apache/nifi) 3. git checkout nifi-5102 4. git rebase master 5. git push marklogic --force nifi-5102 You probably did a pull on master into nifi-5102. You want to avoid that for your own sanity's sake and use a rebase instead. ---
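The rebase steps above can be tried safely in a sandbox before touching a real branch. The sketch below is only an illustration under stated assumptions: it builds throwaway repositories in a temp directory, where `upstream` stands in for apache/nifi, `fork.git` stands in for the contributor's fork (added as a remote named `marklogic`, as in the steps), and `origin` is assumed to be the remote that tracks the upstream project. The file names and commit messages are made up.

```shell
#!/bin/sh
set -eu

sandbox=$(mktemp -d)

# Stand-in for the upstream project, with one commit on master.
git init -q "$sandbox/upstream"
cd "$sandbox/upstream"
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com"
git config user.name "Dev"
echo base > base.txt
git add base.txt
git commit -q -m "base"

# Stand-in for the contributor's fork and working clone.
git init -q --bare "$sandbox/fork.git"
git clone -q "$sandbox/upstream" "$sandbox/work"
cd "$sandbox/work"
git config user.email "dev@example.com"
git config user.name "Dev"
git remote add marklogic "$sandbox/fork.git"
git checkout -q -b nifi-5102
echo feature > feature.txt
git add feature.txt
git commit -q -m "NIFI-5102 feature work"
git push -q marklogic nifi-5102

# Meanwhile, an unrelated commit lands upstream.
cd "$sandbox/upstream"
echo other > other.txt
git add other.txt
git commit -q -m "unrelated upstream commit"

# The workflow from the comment: update master, then rebase the feature branch.
cd "$sandbox/work"
git checkout -q master
git pull -q --ff-only origin master
git checkout -q nifi-5102
git rebase -q master
git push -q --force marklogic nifi-5102

# The branch should now hold only its own commit on top of master, with no merges.
merge_count=$(git rev-list --merges --count master..nifi-5102)
ahead_count=$(git rev-list --count master..nifi-5102)
echo "merge commits: $merge_count, commits ahead of master: $ahead_count"
```

Because the feature branch is replayed on top of the updated master, the pull request shows only the contributor's own commits. Merging master into the branch instead would add a merge commit and can drag other people's commits into the diff.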