Re: Kafka Schema registry

2016-04-13 Thread Joe Witt
https://issues.apache.org/jira/browse/NIFI-1763
Please feel free to add your thoughts to that JIRA.

On Wed, Apr 13, 2016 at 2:17 PM, Joe Witt  wrote:
> OK, will look into it a bit and put in a JIRA for this idea.  Will send
> that on this thread to ensure it captures your thoughts and of course
> please do add/augment it as you like.  Are you interested in helping
> contribute to this from a coding perspective as well?
>
> Thanks
> Joe
>
> On Wed, Apr 13, 2016 at 2:09 PM, Madhukar Thota
>  wrote:
>> Hi Joe,
>>
>> We are using the Confluent distribution of Kafka and its Schema Registry to
>> store Avro schemas. We would like to continue doing the same with NiFi,
>> writing Avro files to the Confluent Kafka Schema Registry.
>>
>> http://docs.confluent.io/2.0.0/schema-registry/docs/index.html
>>
>> -Madhu
>>
>> On Wed, Apr 13, 2016 at 1:48 PM, Joe Witt  wrote:
>>>
>>> Madhu,
>>>
>>> Do you have any information you can point to for the registry?  I know
>>> of the Confluent one but I am not sure of its interfaces.  If there
>>> are open source friendly ones available it certainly would be a fine
>>> thing to support.  Can you point us to what you are looking at
>>> specifically?
>>>
>>> Thanks
>>> Joe
>>>
>>> On Wed, Apr 13, 2016 at 1:34 PM, Madhukar Thota
>>>  wrote:
>>> > Friends,
>>> >
>>> > Is it possible to use Schema registry with Kafka Processors to store and
>>> > retrieve Avro schema?
>>> >
>>> > -Madhu
>>
>>


Re: 'On primary node' ListSFTP not working for new cluster

2016-04-13 Thread Joe Witt
Tom,

OK, that is pretty interesting and we'd want to get to the bottom of
it.  If you happen to see that state again, could you please run
./bin/nifi.sh dump and send the logs/nifi-bootstrap.log that results?

Thanks
Joe

On Wed, Apr 13, 2016 at 2:36 PM, Tom Stewart  wrote:
> Yes I have three nodes on the NiFi cluster screen - two are CONNECTED and
> one is CONNECTED,PRIMARY. All have up to date heartbeats. When I run
> GenerateFlowFile with Timer Driven it executes on all three of my nodes.
>
> However, I just switched the PRIMARY to another node via election and now it
> is working. I moved it back to the one it was on previously and it is
> working there too.
>
> So it appears re-electing seemed to correct whatever state I was in that was
> preventing "On Primary Node" functionality from working. Thanks for the
> help!
>
>
> 
> From: Mark Payne 
> To: users@nifi.apache.org; Tom Stewart 
> Sent: Wednesday, April 13, 2016 1:16 PM
>
> Subject: Re: 'On primary node' ListSFTP not working for new cluster
>
> Tom,
>
> It sounds like you do not have any node elected primary at all. If you click
> the cluster icon in the top-right corner,
> it should show all of the nodes in your cluster. Next to the nodes should be
> a ribbon that you can click to elect
> a new primary node... Though it should also show which node is currently the
> primary. Can you check if it shows
> a primary node? And if so, can you verify that the primary node is actually
> doing anything? I.e., if you start GenerateFlowFile
> on all nodes, can you see that it is indeed running on the primary node, in
> addition to the others?
>
> Thanks
> -Mark
>
>
> On Apr 13, 2016, at 2:01 PM, Tom Stewart  wrote:
>
> On Primary Node - the Tasks/Time stays at zero for the 5 minute interval
> that is displaying for me. When I flip it to Timer Driven it does increment
> as expected. I have my Run Schedule at "60 sec".
>
> The View State shows two keys (listing.timestamp/processed.timestamp), but
> the Value is not changing. Both show Scope=Cluster. I think this is residual
> data from when I had it set to "Timer Driven" and it actually processed some
> files.
>
> I neglected to mention the version - this is 0.6.0.
> I tried an even simpler flow with just GenerateFlowFile and LogAttribute and
> it does the same for me.
>
>
> 
> From: Mark Payne 
> To: users@nifi.apache.org; Tom Stewart 
> Sent: Wednesday, April 13, 2016 11:42 AM
> Subject: Re: 'On primary node' ListSFTP not working for new cluster
>
> Tom,
>
> When you are running on Primary Node, do you see the "Tasks/Time" on the
> processor showing that tasks are running?
>
> If you right-click on the Processor and choose "View State", does it show
> anything in the table?
>
> Thanks
> -Mark
>
> On Apr 13, 2016, at 11:59 AM, Tom Stewart  wrote:
>
> I built a NiFi cluster and some test flows and things seem to be working
> fine. My three nodes show in the cluster view and are all connected with one
> marked PRIMARY. I cannot get 'On primary node' working with several
> processors I have tried. My current one is a simple flow consisting of
> ListSFTP and LogAttribute. If I set my ListSFTP to Timer Driven with Run
> schedule of 60 sec, it works fine. However, then it runs on all of my nodes.
> I changed it to On Primary Node, and I see the log on my primary node
> where it claims to start:
>
> nifi-app.log:2016-04-13 10:53:55,598 DEBUG [StandardProcessScheduler
> Thread-3-SendThread(los90hdf4.novalocal:2181)]
> org.apache.zookeeper.ClientCnxn Reading reply sessionid:0x354103c2b86,
> packet:: clientPath:null serverPath:null finished:false header:: 2,4
> replyHeader:: 2,55834574850,0  request::
> '/nifi/components/4603bfe7-6d98-4ad4-99f2-2a740034ae03,F  response::
> #1,s{51539607588,51539607597,1460559569213,1460560385643,8,0,0,0,5,0,51539607588}
> nifi-app.log:2016-04-13 10:53:55,598 DEBUG [StandardProcessScheduler
> Thread-1] o.a.nifi.processors.standard.ListSFTP
> ListSFTP[id=4603bfe7-6d98-4ad4-99f2-2a740034ae03] Returning CLUSTER State:
> StandardStateMap[version=8, values={}]
> nifi-app.log:2016-04-13 10:53:55,599 INFO [StandardProcessScheduler
> Thread-4] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled
> ListSFTP[id=4603bfe7-6d98-4ad4-99f2-2a740034ae03] to run with 1 threads
>
> However, I do not see any errors or accesses on my SFTP server where it is
> actually ever attempting to connect. I can flip the processor back to Timer
> Driven and my SFTP server starts seeing requests. But when I toggle back to
> On Primary Node it doesn't appear to be executing. I think the other
> processor I tried this with was GetHTTP with similar experience.
>
> Curious if there are any debug steps or setting recommendations that are
> useful to check if it appears that "On Primary Node" doesn't work for a
> cluster.

Re: 'On primary node' ListSFTP not working for new cluster

2016-04-13 Thread Tom Stewart
Yes I have three nodes on the NiFi cluster screen - two are CONNECTED and one 
is CONNECTED,PRIMARY. All have up to date heartbeats. When I run 
GenerateFlowFile with Timer Driven it executes on all three of my nodes. 

However, I just switched the PRIMARY to another node via election and now it is 
working. I moved it back to the one it was on previously and it is working 
there too. 

So it appears re-electing seemed to correct whatever state I was in that was 
preventing "On Primary Node" functionality from working. Thanks for the help!

  From: Mark Payne 
 To: users@nifi.apache.org; Tom Stewart  
 Sent: Wednesday, April 13, 2016 1:16 PM
 Subject: Re: 'On primary node' ListSFTP not working for new cluster
   
Tom,

It sounds like you do not have any node elected primary at all. If you click 
the cluster icon in the top-right corner, it should show all of the nodes in 
your cluster. Next to the nodes should be a ribbon that you can click to elect 
a new primary node... Though it should also show which node is currently the 
primary. Can you check if it shows a primary node? And if so, can you verify 
that the primary node is actually doing anything? I.e., if you start 
GenerateFlowFile on all nodes, can you see that it is indeed running on the 
primary node, in addition to the others?

Thanks
-Mark


On Apr 13, 2016, at 2:01 PM, Tom Stewart  wrote:
On Primary Node - the Tasks/Time stays at zero for the 5 minute interval that 
is displaying for me. When I flip it to Timer Driven it does increment as 
expected. I have my Run Schedule at "60 sec". 

The View State shows two keys (listing.timestamp/processed.timestamp), but the 
Value is not changing. Both show Scope=Cluster. I think this is residual data 
from when I had it set to "Timer Driven" and it actually processed some files. 

I neglected to mention the version - this is 0.6.0. 
I tried an even simpler flow with just GenerateFlowFile and LogAttribute and it 
does the same for me. 


  From: Mark Payne 
 To: users@nifi.apache.org; Tom Stewart  
 Sent: Wednesday, April 13, 2016 11:42 AM
 Subject: Re: 'On primary node' ListSFTP not working for new cluster
  
Tom,

When you are running on Primary Node, do you see the "Tasks/Time" on the 
processor showing that tasks are running?

If you right-click on the Processor and choose "View State", does it show 
anything in the table?

Thanks
-Mark

On Apr 13, 2016, at 11:59 AM, Tom Stewart  wrote:
I built a NiFi cluster and some test flows and things seem to be working fine. 
My three nodes show in the cluster view and are all connected with one marked 
PRIMARY. I cannot get 'On primary node' working with several processors I have 
tried. My current one is a simple flow consisting of ListSFTP and LogAttribute. 
If I set my ListSFTP to Timer Driven with Run schedule of 60 sec, it works 
fine. However, then it runs on all of my nodes. I changed it to On Primary Node, 
and I see the log on my primary node where it claims to start:
nifi-app.log:2016-04-13 10:53:55,598 DEBUG [StandardProcessScheduler 
Thread-3-SendThread(los90hdf4.novalocal:2181)] org.apache.zookeeper.ClientCnxn 
Reading reply sessionid:0x354103c2b86, packet:: clientPath:null 
serverPath:null finished:false header:: 2,4  replyHeader:: 2,55834574850,0  
request:: '/nifi/components/4603bfe7-6d98-4ad4-99f2-2a740034ae03,F  response:: 
#1,s{51539607588,51539607597,1460559569213,1460560385643,8,0,0,0,5,0,51539607588}
nifi-app.log:2016-04-13 10:53:55,598 DEBUG [StandardProcessScheduler Thread-1] 
o.a.nifi.processors.standard.ListSFTP 
ListSFTP[id=4603bfe7-6d98-4ad4-99f2-2a740034ae03] Returning CLUSTER State: 
StandardStateMap[version=8, values={}]
nifi-app.log:2016-04-13 10:53:55,599 INFO [StandardProcessScheduler Thread-4] 
o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled 
ListSFTP[id=4603bfe7-6d98-4ad4-99f2-2a740034ae03] to run with 1 threads

However, I do not see any errors or accesses on my SFTP server where it is 
actually ever attempting to connect. I can flip the processor back to Timer 
Driven and my SFTP server starts seeing requests. But when I toggle back to On 
Primary Node it doesn't appear to be executing. I think the other processor I 
tried this with was GetHTTP with similar experience. 

Curious if there are any debug steps or setting recommendations that are useful 
to check if it appears that "On Primary Node" doesn't work for a cluster. 

Re: Kafka Schema registry

2016-04-13 Thread Joe Witt
OK, will look into it a bit and put in a JIRA for this idea.  Will send
that on this thread to ensure it captures your thoughts and of course
please do add/augment it as you like.  Are you interested in helping
contribute to this from a coding perspective as well?

Thanks
Joe

On Wed, Apr 13, 2016 at 2:09 PM, Madhukar Thota
 wrote:
> Hi Joe,
>
> We are using the Confluent distribution of Kafka and its Schema Registry to
> store Avro schemas. We would like to continue doing the same with NiFi,
> writing Avro files to the Confluent Kafka Schema Registry.
>
> http://docs.confluent.io/2.0.0/schema-registry/docs/index.html
>
> -Madhu
>
> On Wed, Apr 13, 2016 at 1:48 PM, Joe Witt  wrote:
>>
>> Madhu,
>>
>> Do you have any information you can point to for the registry?  I know
>> of the Confluent one but I am not sure of its interfaces.  If there
>> are open source friendly ones available it certainly would be a fine
>> thing to support.  Can you point us to what you are looking at
>> specifically?
>>
>> Thanks
>> Joe
>>
>> On Wed, Apr 13, 2016 at 1:34 PM, Madhukar Thota
>>  wrote:
>> > Friends,
>> >
>> > Is it possible to use Schema registry with Kafka Processors to store and
>> > retrieve Avro schema?
>> >
>> > -Madhu
>
>


Re: Kafka Schema registry

2016-04-13 Thread Madhukar Thota
Hi Joe,

We are using the Confluent distribution of Kafka and its Schema Registry to
store Avro schemas. We would like to continue doing the same with NiFi,
writing Avro files to the Confluent Kafka Schema Registry.

http://docs.confluent.io/2.0.0/schema-registry/docs/index.html

-Madhu
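
For reference, the registry described at the link above is driven over a REST API, so storing and retrieving an Avro schema comes down to two HTTP calls. Below is a minimal Python sketch of what a client would send and parse, assuming the documented POST /subjects/{subject}/versions and GET /subjects/{subject}/versions/latest endpoints. The base URL, the subject name, the record schema, and the helper names are placeholders, and nothing here is NiFi code.

```python
import json
import urllib.request

# Media type used by the Schema Registry REST API.
CONTENT_TYPE = "application/vnd.schemaregistry.v1+json"


def build_register_request(base_url, subject, avro_schema):
    """Build (but do not send) the request that registers a schema."""
    # The registry expects the Avro schema as an escaped JSON *string*
    # inside the "schema" field, not as a nested object.
    body = json.dumps({"schema": json.dumps(avro_schema)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/subjects/{subject}/versions",
        data=body,
        headers={"Content-Type": CONTENT_TYPE},
        method="POST",
    )


def latest_schema_url(base_url, subject):
    """URL for fetching the latest registered version of a subject."""
    return f"{base_url.rstrip('/')}/subjects/{subject}/versions/latest"


def parse_schema_response(response_text):
    """Extract the Avro schema (as a dict) from a registry response body."""
    return json.loads(json.loads(response_text)["schema"])


if __name__ == "__main__":
    # Hypothetical registry address, subject, and record schema.
    schema = {
        "type": "record",
        "name": "Event",
        "fields": [{"name": "id", "type": "string"}],
    }
    request = build_register_request("http://localhost:8081", "events-value", schema)
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(request)
```

The request is only built here, not sent, so the sketch can be run without a registry. A processor that talks to the registry would make the same two calls from Java.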

On Wed, Apr 13, 2016 at 1:48 PM, Joe Witt  wrote:

> Madhu,
>
> Do you have any information you can point to for the registry?  I know
> of the Confluent one but I am not sure of its interfaces.  If there
> are open source friendly ones available it certainly would be a fine
> thing to support.  Can you point us to what you are looking at
> specifically?
>
> Thanks
> Joe
>
> On Wed, Apr 13, 2016 at 1:34 PM, Madhukar Thota
>  wrote:
> > Friends,
> >
> > Is it possible to use Schema registry with Kafka Processors to store and
> > retrieve Avro schema?
> >
> > -Madhu
>


Re: 'On primary node' ListSFTP not working for new cluster

2016-04-13 Thread Tom Stewart
On Primary Node - the Tasks/Time stays at zero for the 5 minute interval that 
is displaying for me. When I flip it to Timer Driven it does increment as 
expected. I have my Run Schedule at "60 sec". 

The View State shows two keys (listing.timestamp/processed.timestamp), but the 
Value is not changing. Both show Scope=Cluster. I think this is residual data 
from when I had it set to "Timer Driven" and it actually processed some files. 

I neglected to mention the version - this is 0.6.0. 
I tried an even simpler flow with just GenerateFlowFile and LogAttribute and it 
does the same for me. 


  From: Mark Payne 
 To: users@nifi.apache.org; Tom Stewart  
 Sent: Wednesday, April 13, 2016 11:42 AM
 Subject: Re: 'On primary node' ListSFTP not working for new cluster
   
Tom,

When you are running on Primary Node, do you see the "Tasks/Time" on the 
processor showing that tasks are running?

If you right-click on the Processor and choose "View State", does it show 
anything in the table?

Thanks
-Mark

On Apr 13, 2016, at 11:59 AM, Tom Stewart  wrote:
I built a NiFi cluster and some test flows and things seem to be working fine. 
My three nodes show in the cluster view and are all connected with one marked 
PRIMARY. I cannot get 'On primary node' working with several processors I have 
tried. My current one is a simple flow consisting of ListSFTP and LogAttribute. 
If I set my ListSFTP to Timer Driven with Run schedule of 60 sec, it works 
fine. However, then it runs on all of my nodes. I changed it to On Primary Node, 
and I see the log on my primary node where it claims to start:
nifi-app.log:2016-04-13 10:53:55,598 DEBUG [StandardProcessScheduler 
Thread-3-SendThread(los90hdf4.novalocal:2181)] org.apache.zookeeper.ClientCnxn 
Reading reply sessionid:0x354103c2b86, packet:: clientPath:null 
serverPath:null finished:false header:: 2,4  replyHeader:: 2,55834574850,0  
request:: '/nifi/components/4603bfe7-6d98-4ad4-99f2-2a740034ae03,F  response:: 
#1,s{51539607588,51539607597,1460559569213,1460560385643,8,0,0,0,5,0,51539607588}
nifi-app.log:2016-04-13 10:53:55,598 DEBUG [StandardProcessScheduler Thread-1] 
o.a.nifi.processors.standard.ListSFTP 
ListSFTP[id=4603bfe7-6d98-4ad4-99f2-2a740034ae03] Returning CLUSTER State: 
StandardStateMap[version=8, values={}]
nifi-app.log:2016-04-13 10:53:55,599 INFO [StandardProcessScheduler Thread-4] 
o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled 
ListSFTP[id=4603bfe7-6d98-4ad4-99f2-2a740034ae03] to run with 1 threads

However, I do not see any errors or accesses on my SFTP server where it is 
actually ever attempting to connect. I can flip the processor back to Timer 
Driven and my SFTP server starts seeing requests. But when I toggle back to On 
Primary Node it doesn't appear to be executing. I think the other processor I 
tried this with was GetHTTP with similar experience. 

Curious if there are any debug steps or setting recommendations that are useful 
to check if it appears that "On Primary Node" doesn't work for a cluster. 

Re: Kafka Schema registry

2016-04-13 Thread Joe Witt
Madhu,

Do you have any information you can point to for the registry?  I know
of the Confluent one but I am not sure of its interfaces.  If there
are open source friendly ones available it certainly would be a fine
thing to support.  Can you point us to what you are looking at
specifically?

Thanks
Joe

On Wed, Apr 13, 2016 at 1:34 PM, Madhukar Thota
 wrote:
> Friends,
>
> Is it possible to use Schema registry with Kafka Processors to store and
> retrieve Avro schema?
>
> -Madhu


Kafka Schema registry

2016-04-13 Thread Madhukar Thota
Friends,

Is it possible to use Schema registry with Kafka Processors to store and
retrieve Avro schema?

-Madhu


Re: 'On primary node' ListSFTP not working for new cluster

2016-04-13 Thread Mark Payne
Tom,

When you are running on Primary Node, do you see the "Tasks/Time" on the 
processor showing that tasks are running?

If you right-click on the Processor and choose "View State", does it show 
anything in the table?

Thanks
-Mark

> On Apr 13, 2016, at 11:59 AM, Tom Stewart  wrote:
> 
> I built a NiFi cluster and some test flows and things seem to be working 
> fine. My three nodes show in the cluster view and are all connected with one 
> marked PRIMARY. I cannot get 'On primary node' working with several 
> processors I have tried. My current one is a simple flow consisting of 
> ListSFTP and LogAttribute. If I set my ListSFTP to Timer Driven with Run 
> schedule of 60 sec, it works fine. However, then it runs on all of my nodes. I 
> changed it to On Primary Node, and I see the log on my primary node 
> where it claims to start:
> 
> nifi-app.log:2016-04-13 10:53:55,598 DEBUG [StandardProcessScheduler 
> Thread-3-SendThread(los90hdf4.novalocal:2181)] 
> org.apache.zookeeper.ClientCnxn Reading reply sessionid:0x354103c2b86, 
> packet:: clientPath:null serverPath:null finished:false header:: 2,4  
> replyHeader:: 2,55834574850,0  request:: 
> '/nifi/components/4603bfe7-6d98-4ad4-99f2-2a740034ae03,F  response:: 
> #1,s{51539607588,51539607597,1460559569213,1460560385643,8,0,0,0,5,0,51539607588}
> nifi-app.log:2016-04-13 10:53:55,598 DEBUG [StandardProcessScheduler 
> Thread-1] o.a.nifi.processors.standard.ListSFTP 
> ListSFTP[id=4603bfe7-6d98-4ad4-99f2-2a740034ae03] Returning CLUSTER State: 
> StandardStateMap[version=8, values={}]
> nifi-app.log:2016-04-13 10:53:55,599 INFO [StandardProcessScheduler Thread-4] 
> o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled 
> ListSFTP[id=4603bfe7-6d98-4ad4-99f2-2a740034ae03] to run with 1 threads
> 
> However, I do not see any errors or accesses on my SFTP server where it is 
> actually ever attempting to connect. I can flip the processor back to Timer 
> Driven and my SFTP server starts seeing requests. But when I toggle back to 
> On Primary Node it doesn't appear to be executing. I think the other 
> processor I tried this with was GetHTTP with similar experience. 
> 
> Curious if there are any debug steps or setting recommendations that are 
> useful to check if it appears that "On Primary Node" doesn't work for a 
> cluster. 
> 
> 
> 



'On primary node' ListSFTP not working for new cluster

2016-04-13 Thread Tom Stewart
I built a NiFi cluster and some test flows and things seem to be working fine. 
My three nodes show in the cluster view and are all connected with one marked 
PRIMARY. I cannot get 'On primary node' working with several processors I have 
tried. My current one is a simple flow consisting of ListSFTP and LogAttribute. 
If I set my ListSFTP to Timer Driven with Run schedule of 60 sec, it works 
fine. However, then it runs on all of my nodes. I changed it to On Primary Node, 
and I see the log on my primary node where it claims to start:
nifi-app.log:2016-04-13 10:53:55,598 DEBUG [StandardProcessScheduler 
Thread-3-SendThread(los90hdf4.novalocal:2181)] org.apache.zookeeper.ClientCnxn 
Reading reply sessionid:0x354103c2b86, packet:: clientPath:null 
serverPath:null finished:false header:: 2,4  replyHeader:: 2,55834574850,0  
request:: '/nifi/components/4603bfe7-6d98-4ad4-99f2-2a740034ae03,F  response:: 
#1,s{51539607588,51539607597,1460559569213,1460560385643,8,0,0,0,5,0,51539607588}
nifi-app.log:2016-04-13 10:53:55,598 DEBUG [StandardProcessScheduler Thread-1] 
o.a.nifi.processors.standard.ListSFTP 
ListSFTP[id=4603bfe7-6d98-4ad4-99f2-2a740034ae03] Returning CLUSTER State: 
StandardStateMap[version=8, values={}]
nifi-app.log:2016-04-13 10:53:55,599 INFO [StandardProcessScheduler Thread-4] 
o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled 
ListSFTP[id=4603bfe7-6d98-4ad4-99f2-2a740034ae03] to run with 1 threads

However, I do not see any errors or accesses on my SFTP server where it is 
actually ever attempting to connect. I can flip the processor back to Timer 
Driven and my SFTP server starts seeing requests. But when I toggle back to On 
Primary Node it doesn't appear to be executing. I think the other processor I 
tried this with was GetHTTP with similar experience. 

Curious if there are any debug steps or setting recommendations that are useful 
to check if it appears that "On Primary Node" doesn't work for a cluster. 
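
One low-effort debug step is to filter nifi-app.log for the processor's id and see what the scheduler reports for it on each node. Below is a minimal Python sketch, assuming the log line layout shown in the excerpt above (timestamp, level, bracketed thread name, logger, message). The function names are made up for illustration and are not part of NiFi.

```python
import re

# Layout assumed from the excerpt: "<date> <time> <LEVEL> [<thread>] <logger> <message>"
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) +"
    r"\[(?P<thread>[^\]]+)\] "
    r"(?P<logger>\S+) "
    r"(?P<message>.*)$"
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def events_for_component(lines, component_id):
    """Yield parsed entries whose message mentions the given component id."""
    for line in lines:
        entry = parse_log_line(line)
        if entry and component_id in entry["message"]:
            yield entry


if __name__ == "__main__":
    # Sample line taken from the excerpt above, rejoined onto one line.
    sample = (
        "2016-04-13 10:53:55,599 INFO [StandardProcessScheduler Thread-4] "
        "o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled "
        "ListSFTP[id=4603bfe7-6d98-4ad4-99f2-2a740034ae03] to run with 1 threads"
    )
    for entry in events_for_component([sample], "4603bfe7-6d98-4ad4-99f2-2a740034ae03"):
        print(entry["timestamp"], entry["level"], entry["message"])
```

Passing an open log file instead of the sample list lets the same function be run against each node's log, which makes it easy to compare what the primary node reports with what the others report.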





Re: Large dataset on hbase

2016-04-13 Thread prabhu Mahendran
Hi,

1. Is the output of your Pig script a single file that contains all the JSON
documents corresponding to your CSV?

Yes, the output of my Pig script contains all the JSON documents
corresponding to the CSV.

2. Also, are there any errors in logs/nifi-app.log (or on the processor in
the UI) when this happens?

No, there are no errors in either the web interface (UI) or the
logs/nifi-app.log file.


Thanks,

Prabhu Mahendran


On 12-Apr-2016 8:20 pm, "Bryan Bende"  wrote:

>
> Is the output of your Pig script a single file that contains all the JSON
> documents corresponding to your CSV?
> or does it create a single JSON document for each row of the CSV?
>
> Also, are there any errors in logs/nifi-app.log (or on the processor in
> the UI) when this happens?
>
> -Bryan
>
> On Tue, Apr 12, 2016 at 12:38 PM, prabhu Mahendran <
> prabhuu161...@gmail.com> wrote:
>
>> Hi,
>>
>> I just use a Pig script to convert the CSV into JSON with the help of
>> ExecuteProcess.
>>
>> In my case I use n1 from the JSON document as the row key in the HBase
>> table, so n2-n22 are stored as columns in HBase.
>>
>> Some of the rows (n1's) are stored in the table, but the remaining rows are
>> read correctly but not stored.
>>
>> Thanks,
>> Prabhu Mahendran
>>
>> On Tue, Apr 12, 2016 at 1:58 PM, Bryan Bende  wrote:
>>
>>> Hi Prabhu,
>>>
>>> How did you end up converting your CSV into JSON?
>>>
>>> PutHBaseJSON creates a single row from a JSON document. In your example
>>> above, using n1 as the rowId, it would create a row with columns n2 - n22.
>>> Are you seeing columns missing, or are you missing whole rows from your
>>> original CSV?
>>>
>>> Thanks,
>>>
>>> Bryan
>>>
>>>
>>>
>>> On Mon, Apr 11, 2016 at 11:43 AM, prabhu Mahendran <
>>> prabhuu161...@gmail.com> wrote:
>>>
 Hi Simon/Joe,

 Thanks for this support.
 I have successfully converted the CSV data into JSON and also inserted
 that JSON data into an HBase table using PutHBaseJSON.
 Part of the JSON sample data is shown below:

 {
 "n1":"", "n2":"", "n3":"", "n4":"", "n5":"", "n6":"",
 "n7":"", "n8":"", "n9":"", "n10":"", "n11":"", "n12":"",
 "n13":"", "n14":"", "n15":"", "n16":"", "n17":"", "n18":"",
 "n19":"", "n20":"", "n21":"-", "n22":""
 }

 PutHBaseJSON settings: Table Name: 'Hike', Column Family: 'Sweet',
 Row Identifier Field Name: n1 (an element in the JSON file).

 My data contains 15 lakh (1.5 million) rows, but the HBase table contains
 only 10 rows. NiFi can read all 15 lakh rows but stores only a few of them.

 Can anyone please help me solve this?




 Prabhu,

 If the dataset being processed can be split up and still retain the
 necessary meaning when input to HBase, I'd recommend doing that.  NiFi
 itself, as a framework, can handle very large objects because its API
 doesn't force loading of entire objects into memory.  However, various
 processors may do that and I believe ReplaceText may be one that does.
 You can use SplitText or ExecuteScript or other processors to do that
 splitting if that will help your case.

 Thanks
 Joe
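
 The splitting idea can be illustrated outside NiFi. Below is a minimal Python sketch, assuming a CSV file with a single header row. It streams the input line by line and yields fixed-size chunks, so the whole file is never held in memory. This is not how SplitText is implemented, only the same idea in miniature; split_lines and the chunk size are hypothetical.

```python
import io
from itertools import islice


def split_lines(stream, lines_per_chunk, header_lines=1):
    """Yield lists of lines: the header followed by up to lines_per_chunk rows.

    Only one chunk is held in memory at a time.
    """
    header = list(islice(stream, header_lines))
    while True:
        rows = list(islice(stream, lines_per_chunk))
        if not rows:
            return
        yield header + rows


if __name__ == "__main__":
    # In-memory stand-in for a large CSV file opened with open(path).
    data = io.StringIO("n1,n2\na,1\nb,2\nc,3\n")
    for chunk in split_lines(data, lines_per_chunk=2):
        print(chunk)
```

 Repeating the header in every chunk keeps each piece independently parseable, which matters if each chunk is later converted to JSON on its own.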

 On Sat, Apr 9, 2016 at 6:35 PM, Simon Ball wrote:
 > Hi Prabhu,
 >
 > Did you try increasing the heap size in conf/bootstrap.conf? By default
 > NiFi uses a very small RAM allocation (512MB). You can increase this by
 > tweaking java.arg.2 and .3 in the bootstrap.conf file. Note that this is
 > the Java heap, so you will need more than your data size to account for
 > Java object overhead. The other thing to check is the buffer sizes you
 > are using for your replace text processors. If you’re also using Split
 > processors, you can sometimes run up against RAM and open file limits;
 > if this is the case, make sure you increase the ulimit -n settings.
 >
 > Simon
 >
 > On 9 Apr 2016, at 16:51, prabhu Mahendran wrote:
 >
 > Hi,
 >
 > I am new to NiFi and do not know how to process large data, such as a
 > 1 GB CSV file, into HBase. Trying a combination of getFile and putHbase
 > shell leads to a Java out-of-memory error, and a combination of replace
 > text, extract text and puthbasejson does not work on the large dataset,
 > although it works correctly on a smaller dataset.
 > Can anyone please help me to solve this?
 > Thanks in advance.
 >
 > Thanks & Regards,
 > Prabhu Mahendran
 >
 >

>>>
>>>
>>
>
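
Bryan's description of PutHBaseJSON earlier in this thread (one JSON document becomes one row, with the Row Identifier Field Name value as the row key and the remaining fields as columns) can be modelled in a few lines. Below is a minimal Python sketch, with a plain dict standing in for the HBase table; the helper names and sample values are made up. Note that documents sharing the same n1 land in the same row, so the row count can be lower than the number of documents read. The thread does not establish whether that is what happened here.

```python
import json


def to_row(json_text, row_id_field):
    """Split one flat JSON document into (row key, column dict)."""
    doc = json.loads(json_text)
    row_key = doc[row_id_field]
    columns = {name: value for name, value in doc.items() if name != row_id_field}
    return row_key, columns


def put_all(table, documents, row_id_field="n1"):
    """Apply each document to a dict that stands in for the HBase table."""
    for text in documents:
        row_key, columns = to_row(text, row_id_field)
        # A later put to the same row key updates that row rather than adding one.
        table.setdefault(row_key, {}).update(columns)
    return table


if __name__ == "__main__":
    docs = [
        '{"n1": "r1", "n2": "a", "n3": "b"}',
        '{"n1": "r2", "n2": "c", "n3": "d"}',
        '{"n1": "r1", "n2": "e", "n3": "f"}',  # same row key as the first document
    ]
    table = put_all({}, docs)
    print(len(docs), "documents ->", len(table), "rows")
```

Comparing the number of distinct n1 values in the Pig output with the row count in the HBase table is a quick way to rule this explanation in or out.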