Re: Kafka Schema registry
https://issues.apache.org/jira/browse/NIFI-1763

Please feel free to add your thoughts to that JIRA.
Re: 'On primary node' ListSFTP not working for new cluster
Tom,

Ok that is pretty interesting and we'd want to get to the bottom of it. If you happen to see that state again, could you please run ./bin/nifi.sh dump and send the logs/nifi-bootstrap.log that results.

Thanks
Joe
Re: 'On primary node' ListSFTP not working for new cluster
Yes I have three nodes on the NiFi cluster screen - two are CONNECTED and one is CONNECTED,PRIMARY. All have up to date heartbeats. When I run GenerateFlowFile with Timer Driven it executes on all three of my nodes.

However, I just switched the PRIMARY to another node via election and now it is working. I moved it back to the one it was on previously and it is working there too.

So it appears re-electing corrected whatever state I was in that was preventing "On Primary Node" functionality from working. Thanks for the help!

From: Mark Payne
To: users@nifi.apache.org; Tom Stewart
Sent: Wednesday, April 13, 2016 1:16 PM
Subject: Re: 'On primary node' ListSFTP not working for new cluster

Tom,

It sounds like you do not have any node elected primary at all. If you click the cluster icon in the top-right corner, it should show all of the nodes in your cluster. Next to the nodes should be a ribbon that you can click to elect a new primary node... Though it should also show which node is currently the primary. Can you check if it shows a primary node? And if so, can you verify that the primary node is actually doing anything? I.e., if you start GenerateFlowFile on all nodes, can you see that it is indeed running on the primary node, in addition to the others?

Thanks
-Mark
Re: Kafka Schema registry
Ok will look into it a bit and put in a JIRA for this idea. Will send that on this thread to ensure it captures your thoughts and of course please do add/augment it as you like. Are you interested in helping contribute to this from a coding perspective as well?

Thanks
Joe
Re: Kafka Schema registry
Hi Joe,

We are using the Confluent version of Kafka and using its schema registry to store Avro schemas. We would like to continue the same with NiFi, writing Avro files to the Confluent Kafka Schema Registry.

http://docs.confluent.io/2.0.0/schema-registry/docs/index.html

-Madhu
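
For readers unfamiliar with the registry's interface, here is a minimal Python sketch of the two REST calls that storing and retrieving an Avro schema would involve. It only builds the requests and sends nothing. The endpoint paths and content type follow the Confluent Schema Registry REST documentation linked above; the registry address, subject name, and record schema are hypothetical, and this is not how any NiFi processor is implemented.

```python
import json

# Content type used by the Confluent Schema Registry REST API (v1).
CONTENT_TYPE = "application/vnd.schemaregistry.v1+json"

# Hypothetical Avro record schema, for illustration only.
EXAMPLE_SCHEMA = {
    "type": "record",
    "name": "Event",
    "fields": [{"name": "id", "type": "string"}],
}


def build_register_request(base_url, subject, avro_schema):
    """Return (url, headers, body) for POSTing a schema under a subject."""
    url = f"{base_url.rstrip('/')}/subjects/{subject}/versions"
    headers = {"Content-Type": CONTENT_TYPE}
    # The registry expects the Avro schema as a JSON-encoded string inside
    # a wrapper object, hence the nested json.dumps.
    body = json.dumps({"schema": json.dumps(avro_schema)})
    return url, headers, body


def build_lookup_url(base_url, schema_id):
    """Return the URL for GETting a schema by its registry-assigned id."""
    return f"{base_url.rstrip('/')}/schemas/ids/{schema_id}"
```

The returned tuple can be handed to any HTTP client. A successful registration responds with the schema's id, which is what a consumer would later pass to the lookup URL.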
Re: 'On primary node' ListSFTP not working for new cluster
On Primary Node - the Tasks/Time stays at zero for the 5 minute interval that is displaying for me. When I flip it to Timer Driven it does increment as expected. I have my Run Schedule at "60 sec".

The View State shows two keys (listing.timestamp/processed.timestamp), but the Value is not changing. Both show Scope=Cluster. I think this is residual data from when I had it set to "Timer Driven" and it actually processed some files.

I neglected to mention the version - this is 0.6.0.

I tried an even simpler flow with just GenerateFlowFile and LogAttribute and it does the same for me.
Re: Kafka Schema registry
Madhu,

Do you have any information you can point to for the registry? I know of the Confluent one but I am not sure of its interfaces. If there are open source friendly ones available it certainly would be a fine thing to support. Can you point us to what you are looking at specifically?

Thanks
Joe
Kafka Schema registry
Friends,

Is it possible to use Schema registry with Kafka Processors to store and retrieve Avro schema?

-Madhu
Re: 'On primary node' ListSFTP not working for new cluster
Tom,

When you are running on Primary Node, do you see the "Tasks/Time" on the processor showing that tasks are running?

If you right-click on the Processor and choose "View State", does it show anything in the table?

Thanks
-Mark
'On primary node' ListSFTP not working for new cluster
I built a NiFi cluster and some test flows and things seem to be working fine. My three nodes show in the cluster view and are all connected with one marked PRIMARY. I cannot get 'On primary node' working with several processors I have tried. My current one is a simple flow consisting of ListSFTP and LogAttribute. If I set my ListSFTP to Timer Driven with Run schedule of 60 sec, it works fine. However then it runs on all of my nodes. I changed it to On Primary Node, and I see this in the log on my primary node, where it claims to start:

nifi-app.log:2016-04-13 10:53:55,598 DEBUG [StandardProcessScheduler Thread-3-SendThread(los90hdf4.novalocal:2181)] org.apache.zookeeper.ClientCnxn Reading reply sessionid:0x354103c2b86, packet:: clientPath:null serverPath:null finished:false header:: 2,4 replyHeader:: 2,55834574850,0 request:: '/nifi/components/4603bfe7-6d98-4ad4-99f2-2a740034ae03,F response:: #1,s{51539607588,51539607597,1460559569213,1460560385643,8,0,0,0,5,0,51539607588}

nifi-app.log:2016-04-13 10:53:55,598 DEBUG [StandardProcessScheduler Thread-1] o.a.nifi.processors.standard.ListSFTP ListSFTP[id=4603bfe7-6d98-4ad4-99f2-2a740034ae03] Returning CLUSTER State: StandardStateMap[version=8, values={}]

nifi-app.log:2016-04-13 10:53:55,599 INFO [StandardProcessScheduler Thread-4] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled ListSFTP[id=4603bfe7-6d98-4ad4-99f2-2a740034ae03] to run with 1 threads

However, I do not see any errors or accesses on my SFTP server showing that it ever actually attempts to connect. I can flip the processor back to Timer Driven and my SFTP server starts seeing requests. But when I toggle back to On Primary Node it doesn't appear to be executing. I think the other processor I tried this with was GetHTTP, with a similar experience.

Curious if there are any debug steps or setting recommendations that are useful to check if it appears that "On Primary Node" doesn't work for a cluster.
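
As one possible debug aid for the question above, the following Python sketch pulls the "Scheduled ... to run with N threads" entries for a single processor out of nifi-app.log lines, so they can be compared across nodes. The regular expressions are derived only from the log lines quoted in this message; the function name and the assumption that other NiFi versions log the same wording are mine.

```python
import re

# Timestamp format seen at the start of each nifi-app.log entry above.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")
# Wording of the scheduling entry quoted above.
SCHEDULED = re.compile(
    r"Scheduled (\w+)\[id=([0-9a-f-]+)\] to run with (\d+) threads"
)

# The last two log lines from the message, verbatim.
SAMPLE_LINES = [
    "nifi-app.log:2016-04-13 10:53:55,598 DEBUG [StandardProcessScheduler "
    "Thread-1] o.a.nifi.processors.standard.ListSFTP "
    "ListSFTP[id=4603bfe7-6d98-4ad4-99f2-2a740034ae03] Returning CLUSTER "
    "State: StandardStateMap[version=8, values={}]",
    "nifi-app.log:2016-04-13 10:53:55,599 INFO [StandardProcessScheduler "
    "Thread-4] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled "
    "ListSFTP[id=4603bfe7-6d98-4ad4-99f2-2a740034ae03] to run with 1 threads",
]


def scheduled_events(lines, processor_id):
    """Return (timestamp, processor type, thread count) per scheduling entry."""
    events = []
    for line in lines:
        match = SCHEDULED.search(line)
        if match is None or match.group(2) != processor_id:
            continue  # not a scheduling entry for this processor
        stamp = TIMESTAMP.search(line)
        events.append(
            (stamp.group(0) if stamp else None, match.group(1), int(match.group(3)))
        )
    return events
```

Running it over each node's log with the processor id from the UI would show on which nodes, and when, the processor was scheduled.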
Re: Large dataset on hbase
Hi,

1. Is the output of your Pig script a single file that contains all the JSON documents corresponding to your CSV?

Yes, the output of my Pig script contains all the JSON documents corresponding to the CSV.

2. Also, are there any errors in logs/nifi-app.log (or on the processor in the UI) when this happens?

There are no errors in either the web interface (UI) or the logs/nifi-app.log file.

Thanks,
Prabhu Mahendran

On 12-Apr-2016 8:20 pm, "Bryan Bende" wrote:

> Is the output of your Pig script a single file that contains all the JSON
> documents corresponding to your CSV?
> or does it create a single JSON document for each row of the CSV?
>
> Also, are there any errors in logs/nifi-app.log (or on the processor in
> the UI) when this happens?
>
> -Bryan
>
> On Tue, Apr 12, 2016 at 12:38 PM, prabhu Mahendran <prabhuu161...@gmail.com> wrote:
>
>> Hi,
>>
>> I just use a Pig script to convert the CSV into JSON with the help of
>> ExecuteProcess.
>>
>> In my case I use n1 from the JSON document, which is stored as the row
>> key in the HBase table, so n2-n22 are stored as columns in HBase.
>>
>> Some of the rows (n1's) are stored inside the table, but the remaining
>> ones are read well but not stored.
>>
>> Thanks,
>> Prabhu Mahendran
>>
>> On Tue, Apr 12, 2016 at 1:58 PM, Bryan Bende wrote:
>>
>>> Hi Prabhu,
>>>
>>> How did you end up converting your CSV into JSON?
>>>
>>> PutHBaseJSON creates a single row from a JSON document. In your example
>>> above, using n1 as the rowId, it would create a row with columns n2 - n22.
>>> Are you seeing columns missing, or are you missing whole rows from your
>>> original CSV?
>>>
>>> Thanks,
>>>
>>> Bryan
>>>
>>> On Mon, Apr 11, 2016 at 11:43 AM, prabhu Mahendran <prabhuu161...@gmail.com> wrote:
>>>
>>>> Hi Simon/Joe,
>>>>
>>>> Thanks for this support.
>>>>
>>>> I have successfully converted the CSV data into JSON and also inserted
>>>> that JSON data into an HBase table using PutHBaseJSON.
>>>>
>>>> Part of the JSON sample data is below:
>>>>
>>>> { "n1":"", "n2":"", "n3":"", "n4":"","n5":"","n6":"", "n7":"", "n8":"",
>>>> "n9":"", "n10":"","n11":"","n12":"","n13":"","n14":"","n15":"","n16":"",
>>>> "n17":"","n18":"","n19":"","n20":"","n21":"-", "n22":"" }
>>>>
>>>> PutHBaseJSON: Table Name is 'Hike', Column Family: 'Sweet', Row
>>>> Identifier Field Name: n1 (element in the JSON file).
>>>>
>>>> My record contains 15 lakh rows but the HBase table contains only 10
>>>> rows. It can read the 15 lakh rows but stores only a few of them.
>>>>
>>>> Anyone please help me to solve this?
>>>>
>>>>> Prabhu,
>>>>>
>>>>> If the dataset being processed can be split up and still retain the
>>>>> necessary meaning when input to HBase I'd recommend doing that. NiFi
>>>>> itself, as a framework, can handle very large objects because its API
>>>>> doesn't force loading of entire objects into memory. However, various
>>>>> processors may do that and I believe ReplaceText may be one that does.
>>>>> You can use SplitText or ExecuteScript or other processors to do that
>>>>> splitting if that will help your case.
>>>>>
>>>>> Thanks
>>>>> Joe
>>>>>
>>>>> On Sat, Apr 9, 2016 at 6:35 PM, Simon Ball wrote:
>>>>>> Hi Prabhu,
>>>>>>
>>>>>> Did you try increasing the heap size in conf/bootstrap.conf? By
>>>>>> default nifi uses a very small RAM allocation (512MB). You can
>>>>>> increase this by tweaking java.arg.2 and .3 in the bootstrap.conf
>>>>>> file. Note that this is the java heap, so you will need more than
>>>>>> your data size to account for java object overhead. The other thing
>>>>>> to check is the buffer sizes you are using for your replace text
>>>>>> processors. If you’re also using Split processors, you can sometimes
>>>>>> run up against RAM and open file limits. If this is the case, make
>>>>>> sure you increase the ulimit -n settings.
>>>>>>
>>>>>> Simon
>>>>>>
>>>>>> On 9 Apr 2016, at 16:51, prabhu Mahendran wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am new to nifi and do not know how to process large data, such as
>>>>>> a one GB CSV file, into hbase. Trying a combination of getFile and
>>>>>> putHbase shell leads to a Java out of memory error. I also tried a
>>>>>> combination of replace text, extract text and puthbasejson, which
>>>>>> doesn't work on the large dataset but works correctly on a smaller
>>>>>> dataset.
>>>>>> Can anyone please help me to solve this?
>>>>>> Thanks in advance.
>>>>>>
>>>>>> Thanks & Regards,
>>>>>> Prabhu Mahendran
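
Two points from this thread can be illustrated outside NiFi: Bryan's note that PutHBaseJSON creates a single row from a JSON document, and Joe's advice to split the data rather than load it whole. The Python sketch below reads a CSV lazily and yields one JSON document per row, and separately counts distinct values of the row key field, since HBase keeps one row per row key and repeated keys land on the same row. It is an illustration under assumed names (the functions and the sample data are hypothetical), not a diagnosis of the problem reported here and not NiFi processor code.

```python
import csv
import io
import json

# Hypothetical three-row CSV in which the key "k1" appears twice.
SAMPLE = "n1,n2,n3\nk1,a,b\nk2,c,d\nk1,e,f\n"


def rows_as_json_documents(csv_file):
    """Yield one JSON document (a string) per CSV data row.

    Rows are read lazily, so the whole file is never held in memory.
    """
    for row in csv.DictReader(csv_file):
        yield json.dumps(row)


def distinct_keys(csv_file, key_field):
    """Count distinct values of key_field, i.e. the most rows HBase could hold."""
    return len({row[key_field] for row in csv.DictReader(csv_file)})
```

Comparing the number of documents produced with the number of distinct n1 values gives a quick check of how many rows the table should be expected to contain.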