Re: FetchSFTP: Rename file on move

2020-08-11 Thread Jairo Henao
Thanks for the suggestions.

The combination of UpdateAttribute (setting the new filename) with PutSFTP
works fine!
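For future readers, a minimal sketch of that UpdateAttribute step: add one dynamic property named `filename` (the standard attribute PutSFTP writes under). The timestamp format below is only an illustrative choice, not from the thread:

```
# UpdateAttribute processor — add one property:
filename = ${filename}-${now():format('yyyyMMddHHmmss')}
```

PutSFTP then writes the flowfile under the updated `filename`, so repeated fetches of the same source file no longer overwrite each other.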




On Tue, Aug 11, 2020 at 5:59 PM Andy LoPresto  wrote:

> You can also use an UpdateAttribute processor to change the “filename”
> attribute, which is what any “file persistence” processor (PutSFTP,
> PutFile, etc.) will use when writing the file out.
>
>
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> He/Him
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Aug 11, 2020, at 3:47 PM, Joe Witt  wrote:
>
> Jairo
>
> You can use a PutSFTP after Fetch to place it where you want.
>
> Thanks
>
> On Tue, Aug 11, 2020 at 3:16 PM Jairo Henao 
> wrote:
>
>> Hi community,
>>
>> Is there a way to rename a file before moving it with FetchSFTP?
>>
>> After processing a file, I need to move it to a folder and add a
>> timestamp suffix to it. The file at the source always has the same name, but
>> I need to make sure files are not overwritten when they are moved.
>>
>> Any ideas or is it necessary to request a modification to the processor?
>>
>> Thanks
>>
>> --
>> Jairo Henao
>> @jairohenaorojas
>>
>>
>

-- 
Jairo Henao
@jairohenaorojas


Re: FetchSFTP: Rename file on move

2020-08-11 Thread Andy LoPresto
You can also use an UpdateAttribute processor to change the “filename” 
attribute, which is what any “file persistence” processor (PutSFTP, PutFile, 
etc.) will use when writing the file out. 


Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
He/Him
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Aug 11, 2020, at 3:47 PM, Joe Witt  wrote:
> 
> Jairo
> 
> You can use a PutSFTP after Fetch to place it where you want.
> 
> Thanks
> 
> On Tue, Aug 11, 2020 at 3:16 PM Jairo Henao wrote:
> Hi community,
> 
> Is there a way to rename a file before moving it with FetchSFTP?
> 
> After processing a file, I need to move it to a folder and add a timestamp
> suffix to it. The file at the source always has the same name, but I need
> to make sure files are not overwritten when they are moved.
> 
> Any ideas or is it necessary to request a modification to the processor?
> 
> Thanks
> 
> -- 
> Jairo Henao
> @jairohenaorojas
> 



Re: FetchSFTP: Rename file on move

2020-08-11 Thread Joe Witt
Jairo

You can use a PutSFTP after Fetch to place it where you want.

Thanks

On Tue, Aug 11, 2020 at 3:16 PM Jairo Henao 
wrote:

> Hi community,
>
> Is there a way to rename a file before moving it with FetchSFTP?
>
> After processing a file, I need to move it to a folder and add a timestamp
> suffix to it. The file at the source always has the same name, but I
> need to make sure files are not overwritten when they are moved.
>
> Any ideas or is it necessary to request a modification to the processor?
>
> Thanks
>
> --
> Jairo Henao
> @jairohenaorojas
>
>


FetchSFTP: Rename file on move

2020-08-11 Thread Jairo Henao
Hi community,

Is there a way to rename a file before moving it with FetchSFTP?

After processing a file, I need to move it to a folder and add a timestamp
suffix to it. The file at the source always has the same name, but I need
to make sure files are not overwritten when they are moved.

Any ideas or is it necessary to request a modification to the processor?

Thanks

-- 
Jairo Henao
@jairohenaorojas


Re: ReplaceText - Out of memory - Requested array size exceeds VM limit

2020-08-11 Thread Joe Witt
Asmath

ReplaceText either loads full lines at a time or loads the entire file into
memory.  So keep that in mind.

If you need something that at worst loads only 1-2x the length of the
replacement string you're interested in, then I'd recommend just using a
scripted processor that does precisely what you need for now. You can
stream from the input and stream to the output, resulting in extremely
efficient memory usage for arbitrarily large inputs.
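As a standalone illustration of the streaming idea Joe describes (not NiFi API — the function name and line-at-a-time approach are my own choices), a replace that never holds more than one line in memory keeps usage bounded by the longest line rather than the file size:

```python
def stream_replace(src_path, dst_path, old, new):
    """Replace `old` with `new`, holding at most one line in memory.

    Memory use is bounded by the longest line, not the file size, so an
    800 MB input streams through without large allocations.
    """
    with open(src_path, "r", encoding="utf-8") as fin, \
         open(dst_path, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(line.replace(old, new))
```

Inside NiFi the same pattern would live in a scripted processor body, reading from the flowfile's input stream and writing the replaced content to its output stream.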

Thanks

On Tue, Aug 11, 2020 at 1:01 PM KhajaAsmath Mohammed <
mdkhajaasm...@gmail.com> wrote:

> Hi,
>
> I have a file that throws an error when I look for a particular string
> and replace it with another string.
>
> Requested array size exceeds VM limit
>
> Any suggestions for this? File is around 800 MB.
>
> Thanks,
> Asmath
>


ReplaceText - Out of memory - Requested array size exceeds VM limit

2020-08-11 Thread KhajaAsmath Mohammed
Hi,

I have a file that throws an error when I look for a particular string
and replace it with another string.

Requested array size exceeds VM limit

Any suggestions for this? File is around 800 MB.

Thanks,
Asmath


Re: Get all available variables in the InvokeScriptedProcessor

2020-08-11 Thread Matt Burgess
Although this is an "unnatural" use of Groovy (and a conversation much
better suited for the dev list :), it is possible to get at a map of
defined variables (key and value). This relies on particular
implementations of the API and on there being no SecurityManager
installed in the JVM, so Groovy can ignore boundaries like private
members.  In InvokeScriptedProcessor or ExecuteScript it would look
something like:

def varRegistry = context.procNode.variableRegistry
def varMap = [:] as Map
storeVariables(varMap, varRegistry)

The storeVariables method is just a parent-first recursive call to
fill your map with variables; this allows child registries to override
variables that were declared "above":

def storeVariables(map, registry) {
    if (!registry) return map
    def parent
    try {
        parent = registry.parent
    } catch (t) {
        // No accessible parent; just record this registry's variables
        map.putAll(registry.variableMap)
        return map
    }
    if (parent) {
        storeVariables(map, parent)
    }
    // Add this registry's variables last so children override parents
    map.putAll(registry.variableMap)
    return map
}

It works because "context" happens to be a StandardProcessContext
instance, which has a private "procNode" member of type ProcessorNode,
which is an extension of AbstractComponentNode which has a
getVariableRegistry() method.

It's definitely a hack so please use at your own risk :)

Regards,
Matt

On Tue, Aug 11, 2020 at 1:18 AM Saloni Udani  wrote:
>
> Thanks Andy, but with Expression Language I can only get the values of
> variables, not both key and value. In our case, the variable key also
> carries some useful information.
>
> Thanks
>
> On Mon, Aug 10, 2020 at 10:32 PM Andy LoPresto  wrote:
>>
>> Those variables are available to be referenced via Expression Language in 
>> the flowfile attributes. They are not intended for direct programmatic 
>> access via code, so you don’t need to address them directly in your Groovy 
>> code.
>>
>> If you need to populate specific values at configuration time, you can 
>> define dynamic properties on the processor config and reference those 
>> directly in code (see any existing processor source for examples).
>>
>> Andy LoPresto
>> alopre...@apache.org
>> alopresto.apa...@gmail.com
>> He/Him
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>
>> On Aug 9, 2020, at 10:40 PM, Saloni Udani  wrote:
>>
>> Thanks Andy.
>>
>> By variables I meant NiFi process group variables.
>> 
>>
>> On Sat, Aug 8, 2020 at 12:39 AM Andy LoPresto  wrote:
>>>
>>> I think we need additional clarification on what you mean by “variables”. 
>>> If you are referring to actual Groovy variables, you can enumerate them 
>>> using the binding available in the context (see below). If you mean the 
>>> attributes available on a flowfile, you can access them similarly.
>>>
>>> Find all variables starting with prefix:
>>>
>>> def varsStartingWithABC = this.binding.variables.findAll { k, v ->
>>> k.startsWith("a.b.c") }
>>>
>>> Find all attributes starting with prefix:
>>>
>>> def attrsStartingWithABC = flowfile.getAttributes().findAll { k, v ->
>>> k.startsWith("a.b.c") }
>>>
>>>
>>>
>>> Andy LoPresto
>>> alopre...@apache.org
>>> alopresto.apa...@gmail.com
>>> He/Him
>>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>>
>>> On Aug 7, 2020, at 2:11 AM, Saloni Udani  wrote:
>>>
>>> Hi,
>>> We use NiFi 1.5.0.
>>> Our use case is to get variables matching a particular key pattern (key and
>>> value) in the Groovy InvokeScriptedProcessor. E.g., I want all variables
>>> whose key starts with "a.b.c". With this I can write generic logic over
>>> certain categories of variables for further use.
>>>
>>> Is there a way to programmatically get all variables with a certain key
>>> pattern? Or, for that matter, is there a way to programmatically get the
>>> Map of all available variables? In NiFi 1.5.0 or later versions.
>>>
>>>
>>> Thanks
>>> Saloni Udani
>>>
>>>
>>


Re: NIFI /Kafka - Data loss possibility with node failures

2020-08-11 Thread KhajaAsmath Mohammed
Thanks Joe. This is really helpful.

On Tue, Aug 11, 2020 at 9:33 AM Joe Witt  wrote:

> Asmath
>
> In a traditional installation, regardless of how a NiFi cluster obtains
> data (Kafka, FTP, HTTP calls, TCP listening, etc.), once it is
> responsible for the data it has ack'd its receipt to the source(s).
>
> If that NiFi node were to become offline the data it owns is delayed. If
> that node becomes unrecoverably offline the data is likely to be lost.
>
> If you're going to run in environments where there are more powerful
> storage alignment options like in many Kubernetes based deployments then
> there are definitely options to solve the possibility of loss case to a
> very high degree and to ensure there is only minimal data delay in the
> worst case.
>
> In a Hadoop style environment though the traditional model I describe
> works very well, leverages appropriate RAID, and is proven highly reliable
> and durable.
>
> Thanks
>
> On Tue, Aug 11, 2020 at 7:26 AM KhajaAsmath Mohammed <
> mdkhajaasm...@gmail.com> wrote:
>
>> Hi,
>>
>> [image: image.png]
>>
>> We have a 3-node NiFi cluster, and due to some issues Node 2 and Node 3
>> were disconnected while the flow was running. ConsumeKafka was reading
>> data on all nodes and loading it into the database.
>>
>> In the above scenario, is there a possibility of data loss?
>> Distributed processing in Hadoop handles this automatically and reassigns
>> the task to other active nodes. Will it be the same case with the
>> NiFi cluster?
>>
>> Thanks,
>> Asmath
>>
>


Re: NIFI /Kafka - Data loss possibility with node failures

2020-08-11 Thread Joe Witt
Asmath

In a traditional installation, regardless of how a NiFi cluster obtains
data (Kafka, FTP, HTTP calls, TCP listening, etc.), once it is
responsible for the data it has ack'd its receipt to the source(s).

If that NiFi node were to become offline the data it owns is delayed. If
that node becomes unrecoverably offline the data is likely to be lost.

If you're going to run in environments where there are more powerful
storage alignment options like in many Kubernetes based deployments then
there are definitely options to solve the possibility of loss case to a
very high degree and to ensure there is only minimal data delay in the
worst case.

In a Hadoop style environment though the traditional model I describe works
very well, leverages appropriate RAID, and is proven highly reliable and
durable.

Thanks

On Tue, Aug 11, 2020 at 7:26 AM KhajaAsmath Mohammed <
mdkhajaasm...@gmail.com> wrote:

> Hi,
>
> [image: image.png]
>
> We have a 3-node NiFi cluster, and due to some issues Node 2 and Node 3
> were disconnected while the flow was running. ConsumeKafka was reading
> data on all nodes and loading it into the database.
>
> In the above scenario, is there a possibility of data loss? Distributed
> processing in Hadoop handles this automatically and reassigns the task to
> other active nodes. Will it be the same case with the NiFi cluster?
>
> Thanks,
> Asmath
>


NIFI /Kafka - Data loss possibility with node failures

2020-08-11 Thread KhajaAsmath Mohammed
Hi,

[image: image.png]

We have a 3-node NiFi cluster, and due to some issues Node 2 and Node 3 were
disconnected while the flow was running. ConsumeKafka was reading data
from all nodes and loading it into the database.

In the above scenario, is there a possibility of data loss?  Distributed
processing in Hadoop handles this automatically and reassigns the task to
other active nodes. Will it be the same case with the NiFi cluster?

Thanks,
Asmath


InvokeGRPC with External Proto

2020-08-11 Thread Chris Nicholas

Hi,

I need to wire an external gRPC service into Nifi - could anyone provide some 
pointers on how to change the service IDL / proto within the InvokeGRPC 
processor to support the existing external gRPC service proto definitions?

Ideally, I'm trying to avoid having to write a middleware processor to invoke
the external service.

Thanks

Chris

Re: Data Provenance Stops Working

2020-08-11 Thread Pierre Villard
Darren - I've been using NiFi 1.11.4 for a while now (with OOTB configuration)
and I do have provenance data for the flows I'm running.
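(For reference, the provenance retention and rollover settings discussed in this thread live in nifi.properties. The keys below are the standard provenance repository properties; the values are only illustrative, not recommendations:)

```
nifi.provenance.repository.max.storage.time=30 days
nifi.provenance.repository.max.storage.size=10 GB
nifi.provenance.repository.rollover.time=10 mins
nifi.provenance.repository.rollover.size=100 MB
```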

On Mon, Aug 10, 2020 at 11:24 PM Wyll Ingersoll <
wyllys.ingers...@keepertech.com> wrote:

> Ah! That fixed my problem!
>
> I am running a secure/authenticated configuration.  My user had the
> correct policy permissions when viewed from the hamburger menu, but not on
> the access-policies for the individual processors/canvas.
>
> Very confusing!  One would think if the admin user had the permissions,
> they would apply everywhere, but apparently not.
>
> Anyway, now I can see provenance data.  Thanks for the tip!
> --
> *From:* Shawn Weeks 
> *Sent:* Monday, August 10, 2020 5:19 PM
> *To:* users@nifi.apache.org 
> *Subject:* Re: Data Provenance Stops Working
>
>
> That sounds like a permission issue. Are you running in secure mode? If so
> right click on the main canvas and go to access policies and make sure your
> current user is allowed provenance access, you’ll also have to go to the
> policies section in the system menu in the upper right hand corner and do
> the same.
>
>
>
> Thanks
>
> Shawn
>
>
>
> *From: *Darren Govoni 
> *Reply-To: *"users@nifi.apache.org" 
> *Date: *Monday, August 10, 2020 at 3:52 PM
> *To: *"users@nifi.apache.org" 
> *Subject: *Re: Data Provenance Stops Working
>
>
>
> I also use 1.11.4 and out of the box there IS NO provenance data
> whatsoever. It just doesn't work if you install and run nifi as is.
>
>
>
> --
>
> *From:* Shawn Weeks 
> *Sent:* Monday, August 10, 2020 2:23:19 PM
> *To:* users@nifi.apache.org 
> *Subject:* Re: Data Provenance Stops Working
>
>
>
> It sounds like if I expand the retention time a lot, say 30 days the issue
> should be less bad?
>
>
>
> Thanks
>
> Shawn
>
>
>
> *From: *Mark Payne 
> *Reply-To: *"users@nifi.apache.org" 
> *Date: *Monday, August 10, 2020 at 12:37 PM
> *To: *"users@nifi.apache.org" 
> *Subject: *Re: Data Provenance Stops Working
>
>
>
> Shawn / Wyll,
>
>
>
> I think you’re probably running into NIFI-7346 [1], which basically says
> there’s a case in which NiFi may “age off” old data even when it’s still
> the file that’s being actively written to. In Linux/OSX this results in
> simply deleting the file, and then anything else written to it disappears
> into the ether. Of course, now the file never exceeds the max size, since
> it’ll be 0 bytes forever, os it never rolls over again. So when this
> happens, no more provenance data gets created until NiFi is restarted.
>
>
>
> It’s also possible that you’re hitting NIFI-7375 [2]. This Jira only
> affects you if you get to provenance by right-clicking on a Processor and
> clicking View Provenance (i.e., not if you go to the Hamburger Menu in the
> top-right corner and go to Provenance from there and search that way). If
> this is the problem, once you right-click and go to View Provenance, you
> can actually click the Search icon (magnifying glass) in that empty
> Provenance Results panel and click Search and then it will actually bring
> back the results. So that’s obnoxious but it’s a workaround that may help.
>
>
>
> The good news is that both of these have been addressed for 1.12.0, which
> sounds like it should be coming out very soon!
>
>
>
> Thanks
>
> -Mark
>
>
>
> [1] https://issues.apache.org/jira/browse/NIFI-7346
>
> [2] https://issues.apache.org/jira/browse/NIFI-7375
>
>
>
>
>
> On Aug 10, 2020, at 1:26 PM, Joe Witt  wrote:
>
>
>
> Shawn - I believe it is related to our default settings and have phoned a
> friend to jump in here when able. But default retention and default
> sharding I *think* can result in this.  You can generate a thread dump
> before and after the locked state to see what it is stuck/sitting on.  That
> will help here.
>
>
>
> Thanks
>
>
>
> On Mon, Aug 10, 2020 at 10:24 AM Shawn Weeks 
> wrote:
>
> Out of the box even the initial admin user has to be granted permission, I
> think. Mine worked fine for several months since 1.11.1 was released and
> just started having issues a couple of weeks ago. I’ve increased the
> retention time a bit to see if that improves the situation.
>
>
>
> Thanks
>
> Shawn Weeks
>
>
>
> *From: *Wyll Ingersoll 
> *Reply-To: *"users@nifi.apache.org" 
> *Date: *Monday, August 10, 2020 at 12:22 PM
> *To: *"users@nifi.apache.org" 
> *Subject: *Re: Data Provenance Stops Working
>
>
>
> I run 1.11.4 in a cluster on AWS also and have a similar issue with the
> provenance data, I can't ever view it.  It's probably somehow misconfigured
> but I haven't been able to figure it out.
> --
>
> *From:* Andy LoPresto 
> *Sent:* Monday, August 10, 2020 1:11 PM
> *To:* users@nifi.apache.org 
> *Subject:* Re: Data Provenance Stops Working
>
>
>
> Shawn,
>
>
>
> I don’t know if this is specifically related, but there were a