Cluster Warnings

2018-10-14 Thread Karthik Kothareddy (karthikk) [CONT - Type 2]
Hello,

We're running a 4-node cluster on NiFi 1.7.1. The fourth node was added 
recently, and as soon as we added it, we started seeing the warnings below:

Response time from NODE2 was slow for each of the last 3 requests made. To see 
more information about timing, enable DEBUG logging for 
org.apache.nifi.cluster.coordination.http.replication.ThreadPoolRequestReplicator

Initially we thought the problem was with the newly added node, so we cross-
checked all the configs on the box, and everything seemed to be fine. After 
enabling DEBUG logging for cluster replication, we noticed that the warning is 
not specific to any node: every time we see a warning like the one above, there 
is one slow node that takes far longer than the others to respond, as shown 
below (in this case the slow node is NIFI04). Sometimes this leads to node 
disconnects that need manual intervention.

DEBUG [Replicate Request Thread-50] o.a.n.c.c.h.r.ThreadPoolRequestReplicator 
Node Responses for GET /nifi-api/site-to-site (Request ID 
b2c6e983-5233-4007-bd54-13d21b7068d5):
NIFI04:8443: 1386 millis
NIFI02:8443: 3 millis
NIFI01:8443: 5 millis
NIFI03:8443: 3 millis
DEBUG [Replicate Request Thread-41] o.a.n.c.c.h.r.ThreadPoolRequestReplicator 
Node Responses for GET /nifi-api/site-to-site (Request ID 
d182fdab-f1d4-4ac9-97fd-e24c41dc4622):
NIFI04:8443: 1143 millis
NIFI02:8443: 22 millis
NIFI01:8443: 3 millis
NIFI03:8443: 2 millis
DEBUG [Replicate Request Thread-31] o.a.n.c.c.h.r.ThreadPoolRequestReplicator 
Node Responses for GET /nifi-api/site-to-site (Request ID 
e4726027-27c7-4bbb-8ab6-d02bb41f1920):
NIFI04:8443: 1053 millis
NIFI02:8443: 3 millis
NIFI01:8443: 3 millis
NIFI03:8443: 2 millis
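The DEBUG output above lends itself to a quick scripted check. The Python sketch below assumes the log layout shown here (a "Node Responses for" header followed by one "host:port: N millis" line per node); the helper names and the 10x-median / 500 ms thresholds are illustrative choices, not NiFi settings. It groups timings per replicated request and flags outliers, which makes it easy to see whether the same node is always the slow one or whether it rotates between nodes.

```python
import re
from statistics import median

# Per-node timing lines as printed by ThreadPoolRequestReplicator at DEBUG,
# e.g. "NIFI04:8443: 1386 millis".
NODE_LINE = re.compile(r"^(?P<node>[\w.-]+):(?P<port>\d+): (?P<ms>\d+) millis$")


def parse_responses(log_text):
    """Return one {node: millis} dict per 'Node Responses for' block, in log order."""
    requests = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if "Node Responses for" in line:
            requests.append({})  # a new replicated request starts here
            continue
        match = NODE_LINE.match(line)
        if match and requests:
            requests[-1][match.group("node")] = int(match.group("ms"))
    return requests


def slow_nodes(requests, factor=10.0, min_ms=500):
    """Flag nodes slower than both min_ms and factor x the request's median time.

    factor and min_ms are arbitrary illustrative thresholds.
    """
    flagged = []
    for timings in requests:
        if not timings:
            flagged.append([])
            continue
        typical = max(median(timings.values()), 1)
        flagged.append(sorted(node for node, ms in timings.items()
                              if ms >= min_ms and ms > factor * typical))
    return flagged


SAMPLE = """\
DEBUG [Replicate Request Thread-50] o.a.n.c.c.h.r.ThreadPoolRequestReplicator Node Responses for GET /nifi-api/site-to-site (Request ID b2c6e983-5233-4007-bd54-13d21b7068d5):
NIFI04:8443: 1386 millis
NIFI02:8443: 3 millis
NIFI01:8443: 5 millis
NIFI03:8443: 3 millis
DEBUG [Replicate Request Thread-41] o.a.n.c.c.h.r.ThreadPoolRequestReplicator Node Responses for GET /nifi-api/site-to-site (Request ID d182fdab-f1d4-4ac9-97fd-e24c41dc4622):
NIFI04:8443: 1143 millis
NIFI02:8443: 22 millis
NIFI01:8443: 3 millis
NIFI03:8443: 2 millis
"""

if __name__ == "__main__":
    print(slow_nodes(parse_responses(SAMPLE)))  # [['NIFI04'], ['NIFI04']]
```

Using the median rather than the mean keeps a single very slow node from inflating the baseline it is compared against.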

We tried changing configurations in nifi.properties, such as bumping up 
"nifi.cluster.node.protocol.max.threads", but none of the changes seem to help, 
and we're still stuck with slow communication between the nodes. We use an 
external ZooKeeper, as this is our production server.
Below are some of our configs:

# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=fslhdppnifi01.imfs.micron.com
nifi.cluster.node.protocol.port=11443
nifi.cluster.node.protocol.threads=100
nifi.cluster.node.protocol.max.threads=120
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=90 sec
nifi.cluster.node.read.timeout=90 sec
nifi.cluster.node.max.concurrent.requests=1000
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=30 sec
nifi.cluster.flow.election.max.candidates=
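The two nifi.cluster.flow.election.* properties above decide how long a starting cluster waits before electing the flow. As documented in the NiFi Administration Guide, the election is decided after max.wait.time (default 5 mins), or earlier once max.candidates nodes have voted; leaving max.candidates empty means the full wait always applies. The sketch below models only that timing rule. parse_duration and election_ends_at are hypothetical helpers, the duration parser covers only a small subset of NiFi's time formats, and NiFi's real election also compares the candidate flows themselves.

```python
# Simplified model of NiFi's flow-election timing (not NiFi's implementation).
UNIT_SECONDS = {
    "sec": 1, "secs": 1, "second": 1, "seconds": 1,
    "min": 60, "mins": 60, "minute": 60, "minutes": 60,
}


def parse_duration(value):
    """Parse a small subset of NiFi-style durations, e.g. '30 sec' or '5 mins'."""
    amount, unit = value.split()
    return float(amount) * UNIT_SECONDS[unit.lower()]


def election_ends_at(vote_times, max_wait, max_candidates=None):
    """Seconds after startup at which the flow election is decided.

    vote_times: when each node's flow vote arrived (seconds after startup).
    max_wait: nifi.cluster.flow.election.max.wait.time, in seconds.
    max_candidates: nifi.cluster.flow.election.max.candidates; None/empty
    means there is no early exit, so the full max_wait always elapses.
    """
    if max_candidates:
        arrived = sorted(vote_times)
        if len(arrived) >= max_candidates:
            # Early exit once enough votes are in, but never later than max_wait.
            return min(max_wait, arrived[max_candidates - 1])
    return max_wait


print(election_ends_at([0.0], parse_duration("5 mins")))                    # 300.0
print(election_ends_at([0.0], parse_duration("5 mins"), max_candidates=1))  # 0.0
```

With max.candidates left empty, as in the config above, even a healthy cluster waits the full max.wait.time (30 sec here) on every startup.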

Any thoughts on why this is happening?


-Karthik


Re: [DISCUSS] Closing in on a release of NiFi 1.8.0?

2018-10-14 Thread Koji Kawamura
Jeff, Sivaprasanna,

NIFI-5698 (PR 3073), fixing the DeleteAzureBlob bug, is merged.

Thanks,
Koji

Re: [DISCUSS] Closing in on a release of NiFi 1.8.0?

2018-10-14 Thread Koji Kawamura
Thank you for the fix, Sivaprasanna.
I have an Azure account. Reviewing it now.

Koji

Re: NIFI single node in cluster mode

2018-10-14 Thread Milan Das
Thanks all for the advice.
Found the problem: I was passing two -D Java parameters in a single
java.arg.N entry. When I put them on two separate lines, it worked.
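The java.arg.N fix above is consistent with how NiFi's bootstrap appears to work: as far as we can tell, each java.arg.N value in conf/bootstrap.conf is handed to the JVM as one argument, so two -D options sharing one line are not split apart. The helper below is a hypothetical sketch that renders one entry per JVM option. The starting index and the two example -D options are illustrative assumptions; the actual parameters used in this thread are not given.

```python
def java_arg_lines(jvm_args, start_index):
    """Render one 'java.arg.N=' bootstrap.conf line per JVM argument.

    start_index is illustrative: pick N values not already used by other
    java.arg entries in your bootstrap.conf.
    """
    lines = []
    for offset, arg in enumerate(jvm_args):
        if any(ch.isspace() for ch in arg):
            # A value such as '-Da=1 -Db=2' would likely reach the JVM as ONE
            # argument, i.e. system property 'a' with the value '1 -Db=2'.
            raise ValueError(f"split into separate java.arg entries: {arg!r}")
        lines.append(f"java.arg.{start_index + offset}={arg}")
    return lines


print("\n".join(java_arg_lines(
    ["-Djava.security.auth.login.config=./conf/zookeeper-jaas.conf",
     "-Dsun.security.krb5.debug=true"],
    start_index=20)))
```

The whitespace check is deliberately strict: it would also reject a single option whose value legitimately contains spaces, which would then need manual handling.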

Now I see a problem when I list the queue on the success connection. My flow 
is simple: GenerateFlowFile (success) --> Funnel.
Yes, I added all policies at the root level for user nifiadmin1. This works 
when I set the cluster property to false.

NiFi version: 1.6.0



Error:

2018-10-14 15:03:21,620 INFO [NiFi Web Server-38] 
o.a.n.w.s.NiFiAuthenticationFilter Authentication success for 
nifiadm...@interset.com
2018-10-14 15:03:21,621 INFO [NiFi Web Server-38] 
o.a.n.w.a.c.AccessDeniedExceptionMapper identity[nifiadm...@interset.com], 
groups[] does not have permission to access the requested resource. Unable to 
view the data for Processor with ID 7312084e-0166-1000--6ef08dd3. 
Returning Forbidden response.
2018-10-14 15:03:21,623 INFO [NiFi Web Server-40] 
o.a.n.w.a.c.AccessDeniedExceptionMapper identity[nifiadm...@interset.com], 
groups[] does not have permission to access the requested resource. Node 
ip-172-30-1-235.ec2.internal:8443 is unable to fulfill this request due to: 
Unable to view the data for Processor with ID 
7312084e-0166-1000--6ef08dd3. Contact the system administrator. 
Returning Forbidden response.
2018-10-14 15:03:21,633 INFO [NiFi Web Server-138] 
o.a.n.w.s.NiFiAuthenticationFilter Attempting request for 
() POST 
https://ip-172-30-1-235.ec2.internal:8443/nifi-api/flowfile-queues/73121f31-0166-1000--24972726/listing-requests
 (source ip: 172.30.1.235)
2018-10-14 15:03:21,633 INFO [NiFi Web Server-138] 
o.a.n.w.s.NiFiAuthenticationFilter Authentication success for nifiadmin1@

Regards,
Milan Das


Milan Das
Sr. System Architect
email: m...@interset.com

www.interset.com 
 


On 10/13/18, 2:39 PM, "Jeff"  wrote:

Milan,

If you haven't already done so, please take a look at the NiFi Admin
Guide's sections "Securing Zookeeper" [1] and "Kerberizing NiFi’s ZooKeeper
Client" [2], which should help you configure NiFi to use a kerberized
ZooKeeper.

[1] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#securing_zookeeper
[2] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#zk_kerberos_client

On Sat, Oct 13, 2018 at 9:38 AM Milan Das  wrote:

> The problem is that I am using a Kerberized ZooKeeper, and NiFi is failing to
> create its base path. Even though the TGT is being created, authentication
> is failing.
>
>
> 2018-10-13 13:33:53,573 INFO [Thread-12] org.apache.zookeeper.Login TGT
> refresh thread started.
> 2018-10-13 13:33:53,576 INFO [Thread-12] org.apache.zookeeper.Login TGT
> valid starting at:Sat Oct 13 13:33:53 UTC 2018
> 2018-10-13 13:33:53,576 INFO [Thread-12] org.apache.zookeeper.Login TGT
> expires:  Sun Oct 14 13:33:53 UTC 2018
> 2018-10-13 13:33:53,577 INFO [Thread-12] org.apache.zookeeper.Login TGT
> refresh sleeping until: Sun Oct 14 09:38:53 UTC 2018
> 2018-10-13 13:33:53,577 INFO
> [main-SendThread(ip-172-30-1-132.ec2.internal:2181)]
> o.a.zookeeper.client.ZooKeeperSaslClient Client will use GSSAPI as SASL
> mechanism.
> 2018-10-13 13:33:53,606 INFO [main-EventThread]
> o.a.c.f.state.ConnectionStateManager State change: CONNECTED
> 2018-10-13 13:33:53,616 ERROR
> [main-SendThread(ip-172-30-1-132.ec2.internal:2181)]
> o.a.zookeeper.client.ZooKeeperSaslClient SASL authentication failed using
> login context 'Client'.
> 2018-10-13 13:33:53,723 WARN [main]
> o.a.n.c.l.e.CuratorLeaderElectionManager Unable to determine the Elected
> Leader for role 'Cluster Coordinator' due to
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode =
> AuthFailed for /nifi/leaders/Cluster Coordinator; assuming no leader has
> been elected
> 2018-10-13 13:33:53,724 INFO [Curator-Framework-0]
> o.a.c.f.imps.CuratorFrameworkImpl backgroundOperationsLoop exiting
> 2018-10-13 13:33:53,726 INFO [main]
> o.apache.nifi.controller.FlowController It appears that no Cluster
> Coordinator has been Elected yet. Registering for Cluster Coordinator
> Role.
>
>
> Thanks,
> Milan Das
>
>
> On 10/12/18, 6:26 PM, "Bryan Bende"  wrote:
>
> There is also another property for the # of candidates to wait for when
> voting; if it sees that # of candidates first, it will short-circuit the
> time period. So setting the candidates to 1 for a single-node cluster
> should start immediately.
>
> On Fri, Oct 12, 2018 at 5:59 PM Jon Logan  wrote:
>
> > It waits for election for a specific period of time, which, if I recall,
> > is fairly high (I think 5 minutes?). If you lower this, it'll still wait
> > for an election but will complete faster (we do 

Re: [DISCUSS] Closing in on a release of NiFi 1.8.0?

2018-10-14 Thread Jeff
Sivaprasanna,

Thanks for submitting a pull request for that issue!  Later today or
tomorrow I'll have to check to see if I've already used up my free-tier
access to Azure.  If I still have access, I can review your PR and we'll
get it into 1.8.0.

On Sun, Oct 14, 2018 at 4:30 AM Sivaprasanna 
wrote:

> All - Just found a bug in the DeleteAzureBlobStorage processor. It was
> reported by a user on StackOverflow [1] and I later confirmed it. It looks
> to have been introduced by NIFI-4199. I have created a Jira [2], made the
> necessary changes (not huge, just a few lines) and raised a PR [3]. I think
> if we can spend a little time getting it reviewed, we can mark it for
> 1.8.0. Thoughts?
>
> [1] - https://stackoverflow.com/questions/52766991/apache-nifi-deleteazureblobstorage-processor-is-throwing-an-error
> [2] - https://issues.apache.org/jira/browse/NIFI-5698
> [3] - https://github.com/apache/nifi/pull/3073
>
> -
> Sivaprasanna
>
> On Fri, Oct 12, 2018 at 9:05 PM Mike Thomsen 
> wrote:
>
> > 4811 should be ready for review now. Rebased and cleaned it up with a
> > full listing of the Spring dependencies.
> >
> > On Fri, Oct 12, 2018 at 11:23 AM Joe Witt  wrote:
> >
> > > Jeff,
> > >
> > > I think for anything not tagged to 1.8.0 we just keep rolling.  For
> > > anything tagged 1.8.0 that should not be, we should remove it until
> > > ready.  For things tagged to 1.8.0 that cannot be moved, we should
> > > resolve them.  For the tagged 1.8.0 section you had:
> > >
> > >- NIFI-4811 - Use a newer version of spring-data-redis
> > >   - PR 2856
> > > *This needs to be resolved by either reverting the commit or ensuring
> > > L&N accurately reflects all.  We have to do this always and for every
> > > nar.  The process isn't easy or fun, but it is necessary to produce
> > > valid ASF releases.  Landing commits which change dependencies
> > > requires this due diligence.  Now, we've put a lot of energy into
> > > updating Spring dependencies because some older Spring libs had
> > > vulnerabilities which, while we likely aren't exposed to them, we want
> > > to fix in due course.  So reverting may require more analysis than if
> > > we just get L&N fixed with this new change.  I commented on the
> > > JIRA.  But this needs to be resolved.
> > >
> > >
> > >- NIFI-5426 - Use NIO.2 API for ListFile to avoid multiple disk reads
> > >   - PR 2889
> > > *This just needed to be marked resolved.  The commit went in the day
> > > after we cut 1.7.1.  So this one is sorted.
> > >
> > >- NIFI-5448 - Failed EL date parsing live-locks processors without a failure relationship
> > > * The commit needs to be reverted.  I'm working on that now.  Once the
> > > discussion/concerns are addressed, this can get dealt with.
> > >
> > >- NIFI-5665 - Upgrade io.netty dependencies
> > > * This looks important to get resolved if possible as old netty libs
> > > are on the list of things with vulnerabilities.
> > >
> > >- NIFI-5686 - Test failure in TestStandardProcessScheduler
> > >   - PR 3062
> > > * This has a PR, but a test, possibly two, failed in one of the Travis
> > > runs and it is clearly related.  I ignored one of those tests in a
> > > previous run.  We must deal with brittle tests.  But the underlying
> > > problem is important to solve here, so either the tests need improving
> > > or we still have an issue.  Not clear, but worth some focus.
> > >
> > > note: I intend to reference updates to libraries that have known
> > > vulnerabilities and do so in a far less subtle manner than we had.  We
> > > aren't acknowledging that NiFi is or exposes vulnerabilities but we
> > > are and should be clear when we're updating dependencies that do have
> > > them (even if we're not exposed to them) so that some of these commits
> > > aren't so mysterious.  It creates far more confusion than it is worth.
> > > We still will follow the ASF/NiFi security handling policy but I no
> > > longer intend to treat due course dependency updates as if they need
> > > to be a secret.
> > >
> > > Thanks
> > > Joe
> > >
> > >
> > > On Fri, Oct 12, 2018 at 3:32 AM Jeff  wrote:
> > > >
> > > > Hello everyone!  Next week is probably a good timeframe to aim for a
> > > > release candidate, with two major feature PRs recently merged to
> > > > master:
> > > >
> > > >- NIFI-5516 - Allow data in a Connection to be Load-Balanced across cluster
> > > >- NIFI-5585 - Prepare Nodes to be 

Re: Graph database support w/ NiFi

2018-10-14 Thread Mike Thomsen
We have a Neo4J processor in a PR, but it is very much tied to Neo4J and
Cypher. I was raising the issue that we might want to take that PR and
extend it into an "ExecuteCypherQuery" processor with controller services
that use either Cypher for Gremlin or the Neo4J driver.


Re: Graph database support w/ NiFi

2018-10-14 Thread u...@moosheimer.com
Mike,

Cypher for Gremlin is a good idea. We can start with it and then later
allow an alternative so that users can use either Cypher or Gremlin
directly.

Focusing on Neo4J or Janusgraph or xyz is, in my opinion, not the right
approach.
We should have a NiFi graph processor that supports TinkerPop. Via the
Gremlin Server we can support all TinkerPop-capable graph databases
(https://github.com/apache/tinkerpop/blob/master/gremlin-server/conf/gremlin-server-neo4j.yaml).

Via a controller service we can then connect to Neo4J, Janusgraph,
or any other graph DB.
Otherwise we would have to build a processor for each graph DB.
We don't do that in NiFi for RDBMSs either. There we have ExecuteSQL
or PutSQL, and the controller service specifies what we want to connect to.
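The ExecuteSQL/PutSQL analogy above (a generic processor, with a controller service supplying the connection) can be sketched as follows. This is a Python illustration of the dependency-injection pattern only: GraphClientService, ExecuteGraphQuery and the two stub clients are hypothetical names, not NiFi's Java API, and the stubs do not talk to any database. A real Gremlin-backed service could use Cypher for Gremlin to translate Cypher queries, as Mike suggests.

```python
from abc import ABC, abstractmethod


class GraphClientService(ABC):
    """Hypothetical service contract: the processor only sees this interface."""

    @abstractmethod
    def execute_query(self, query, parameters):
        """Run a query and return a list of result rows (dicts)."""


class StubNeo4jClient(GraphClientService):
    """Stand-in for a service that would wrap the Neo4J driver (Cypher)."""

    def execute_query(self, query, parameters):
        return [{"backend": "neo4j", "query": query, **parameters}]


class StubGremlinClient(GraphClientService):
    """Stand-in for a service that would talk to a Gremlin Server."""

    def execute_query(self, query, parameters):
        return [{"backend": "gremlin", "query": query, **parameters}]


class ExecuteGraphQuery:
    """Hypothetical processor: knows the query, not the database behind it."""

    def __init__(self, client: GraphClientService):
        self.client = client

    def on_trigger(self, query, attributes):
        # In NiFi the query and parameters might come from processor properties
        # or FlowFile attributes; here they are plain arguments.
        return self.client.execute_query(query, attributes)


for client in (StubNeo4jClient(), StubGremlinClient()):
    rows = ExecuteGraphQuery(client).on_trigger("MATCH (n) RETURN n LIMIT $limit", {"limit": 1})
    print(rows[0]["backend"])  # neo4j, then gremlin
```

Swapping backends changes only which service instance is injected; the processor code stays the same, which is the property the RDBMS processors already rely on.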

What do you think, Mike?

Best Regards,
Uwe

Am 06.10.2018 um 00:15 schrieb Mike Thomsen:
> Uwe and Matt,
>
> Now that we're dipping our toes into Neo4J and Cypher, any thoughts on this?
>
> https://github.com/opencypher/cypher-for-gremlin
>
> I'm wondering if we shouldn't work with mans2singh to take the Neo4J work
> and push it further into having a client API that can let us inject a
> service that uses that or one that uses Neo4J's drivers.
>
> Mike
>
> On Mon, May 14, 2018 at 7:13 AM Otto Fowler  wrote:
>
>> The wiki discussion should list these and other points of concern and
>> should document the extent to which
>> they are to be addressed.
>>
>>
>> On May 12, 2018 at 12:37:59, u...@moosheimer.com (u...@moosheimer.com)
>> wrote:
>>
>> Matt,
>>
>> You have some interesting ideas that I really like.
>> GraphReaders and GraphWriters would be interesting. When I started
>> writing a graph processor with my idea, the concept was not yet
>> implemented in NiFi.
>> I don't find GraphML and GraphSON that appealing, because they contain
>> e.g. the Vertex/Edge IDs and, to my knowledge, serve as import and export
>> formats (correct me if I'm wrong).
>>
>> A ConvertRecordToGraph processor is a good approach, the only question
>> is from which format we can convert?
>>
>> I also think that, to make a graph processor somewhat general, we would
>> have to provide a query as input that returns the correct vertex from
>> which the graph should be extended.
>> Maybe like your suggestion, with a Gremlin query or a small Gremlin script.
>>
>> If a vertex is found, a new edge and a new vertex are added.
>> The question is how we pass the individual attributes of the edge and
>> vertex, as well as their labels. Possibly with NiFi attributes?
>>
>> I have some headaches about the complexity.
>> A small example:
>> Imagine we have a record from a CSV file.
>> The columns are a record ID, Token1, Token2, Token3...
>> ID, Token1,Token2,Token3,Token4,Token5
>> 123, Mary, had, a, little, lamp
>>
>> I want to create a vertex with ID 123 (if it does not exist). Then I want
>> to check for each token whether a vertex exists in the graph database
>> (search for a vertex with label "Token" and attribute "name"="Mary"). If
>> the vertex does not exist, it has to be created.
>> Since I want to store e.g. Wikipedia in my graph, I want to avoid the
>> supernode problem for the token vertices. I create a few distribution
>> vertices for each token vertex. If there is a vertex for Token1 (Mary),
>> then I don't want to make the edge from this vertex to my vertex with
>> the ID 123, but from one of the distribution vertices.
>> If the vertex for the token does not exist, the distribution vertices
>> also have to be created ... and so on...
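The token example above can be made concrete against an in-memory graph. This is not Gremlin or any graph database API: TinyGraph, ingest_row, the fixed bucket count of 4, and the hash-based choice of distribution vertex are all illustrative assumptions, sketched only to show the get-or-create and distribution-vertex steps in order.

```python
import hashlib
from collections import defaultdict


class TinyGraph:
    """In-memory stand-in for a graph database (illustration only)."""

    def __init__(self):
        self.vertices = {}             # vertex id -> (label, properties)
        self.edges = defaultdict(set)  # source vertex id -> target vertex ids

    def get_or_create(self, vid, label, **props):
        # "Create if not exists": existing vertices are left untouched.
        if vid not in self.vertices:
            self.vertices[vid] = (label, props)
        return vid

    def add_edge(self, src, dst):
        self.edges[src].add(dst)


def bucket_for(record_id, buckets):
    """Deterministically map a record to one of a token's distribution vertices."""
    digest = hashlib.sha1(str(record_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % buckets


def ingest_row(graph, record_id, tokens, buckets=4):
    record = graph.get_or_create(f"record:{record_id}", "Record", id=record_id)
    for token in tokens:
        token_v = graph.get_or_create(f"token:{token}", "Token", name=token)
        # Ensure the token's distribution vertices exist and hang off the token.
        for b in range(buckets):
            dist = graph.get_or_create(f"{token_v}#dist{b}", "TokenDistribution",
                                       token=token, bucket=b)
            graph.add_edge(token_v, dist)
        # Link the record via one distribution vertex, never the token vertex
        # itself, so no single vertex collects an edge for every record.
        graph.add_edge(f"{token_v}#dist{bucket_for(record_id, buckets)}", record)
    return record


g = TinyGraph()
ingest_row(g, 123, ["Mary", "had", "a", "little", "lamp"])
print(len(g.vertices))  # 26: 1 record + 5 tokens + 5 * 4 distribution vertices
```

With a fixed bucket count each distribution vertex still carries roughly 1/buckets of a token's records, so this divides the fan-out rather than bounding it; a production design might add buckets as a token grows.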
>>
>> Even with this very simple example it seems to become difficult with a
>> universal processor.
>>
>> In any case, I think the idea of implementing a graph processor in NiFi
>> is a good one.
>> The more we work on it, the more good ideas we get, and maybe I'm the only
>> one who can't see the forest for the trees.
>>
>> One question about Titan. To my knowledge, Titan has been dead for a
>> year and a half, and Janusgraph is the successor?
>> Unofficially, Titan has become DataStax Enterprise Graph?!
>> Supporting Titan could become difficult because, to my knowledge, Titan
>> does not support anything after TinkerPop 3 and is no longer maintained.
>>
>> I like your idea of a wiki page for more ideas. Otherwise one gets lost
>> in all the emails.
>>
>> Regards,
>> Kay-Uwe
>>
>> Am 12.05.2018 um 16:52 schrieb Matt Burgess:
>>> All,
>>>
>>> As Joe implied, I'm very happy that we are discussing graph tech in
>>> relation to NiFi! NiFi and Graph theory/tech/analytics are passions of
>>> mine. Mike, the examples you list are great, I would add Titan (and
>>> its fork Janusgraph as Kay-Uwe mentioned) and Azure CosmosDB (these
>>> and others are at [1]). I think there are at least four aspects to
>>> this:
>>>
>>> 1) Graph query/traversal: This deals with getting data out of a graph
>>> database and into flow file(s) for further processing. Here I agree
>>> with Kay-Uwe that we should consider Apache Tinkerpop as the main
>>> library for graph query/traversal, for a