Flow file failover using a cluster.

2017-11-08 Thread David Marrow
Dev,

I know you are working on a solution to this, but in the meantime I wanted to
ask about how best to implement HA for flow files across a cluster. One
option I saw was to use an NFS mount: if a node should fail, start up another
node and point it to the failed node's repository. I also saw some comments
about designing a workflow to auto-recover. My guess is to do something like
place the flow file in a directory and, as the last step of the flow, delete
it. If it ages past some threshold, you know it failed to process, so
reprocess it. Just wanted to get your input.
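
In processor terms, I picture that guess as something like the following
(processor choices are only illustrative, not a tested design):

    ingest     -> PutFile writes a copy of each flow file into a "claims" dir
    last step  -> a cleanup step (e.g. a script) deletes the copy on success
    recovery   -> ListFile/FetchFile on the claims dir with a Minimum File
                  Age, so only entries older than the expected processing
                  time are picked up and replayed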

Also, any updates on a solution/timeframe if you have them.

Dave


Re: How to count the number of occurrences of a certain string in file

2017-11-08 Thread tzhu
Hi Mark,

I am confused about the whole process. I have the following questions:

1. From what I read, I can use TailFile to read the log file. However, it
would only read the file once (as the input file does not change). Is there
a way to re-read the file every time the processor is started?

2. As you suggested, I am writing my own Python script to handle the
count. Most of the examples online are in Jython. (My NiFi version is 1.3.0.
I suppose it's similar to Python, correct?)
I found
https://community.hortonworks.com/articles/75032/executescript-cookbook-part-1.html
to be a useful guide, but I don't understand which approach to choose. What
are the input and output for the script? I want to read the file line by
line and count the string occurrences. I'm currently using key,value =
flowFile.getAttributes().iteritems() to get the file content, but it shows
"too many values to unpack". (The original file is 41.14MB.)
For the output, the common way seems to be using a callback. Is it necessary
in my case? Or can I just add these attributes to the output file and
extract the attributes later? (My current attempt is pasted below my
questions.)

3. To write the columns into the SQL table, the common way seems to be to use
"ReplaceText" and "PutSQL". I also noticed there's a processor called
PutDatabaseRecord that might combine the functions of ExecuteScript and
PutSQL. Since my Python script doesn't work so far, I can't really
test the result. But is this an easier approach?
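
Here is my current attempt, pieced together from the cookbook mentioned in
question 2 (the search string 'ERROR' and the attribute name are just
placeholders). As I understand it, attributes only carry metadata, so the
content has to be read through the session with a callback rather than via
getAttributes(). Does this look like the right shape?

    from org.apache.nifi.processor.io import InputStreamCallback
    from java.io import BufferedReader, InputStreamReader

    # Reads the flow file content line by line and counts occurrences
    class CountCallback(InputStreamCallback):
        def __init__(self, needle):
            self.needle = needle
            self.count = 0
        def process(self, inputStream):
            reader = BufferedReader(InputStreamReader(inputStream))
            line = reader.readLine()
            while line is not None:
                self.count += line.count(self.needle)
                line = reader.readLine()

    flowFile = session.get()
    if flowFile is not None:
        callback = CountCallback('ERROR')  # placeholder search string
        session.read(flowFile, callback)
        # store the result as an attribute instead of rewriting the content
        flowFile = session.putAttribute(flowFile, 'match.count',
                                        str(callback.count))
        session.transfer(flowFile, REL_SUCCESS)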

Any help is appreciated...

Thanks,
Tina



--
Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Re: Custom properties file for environmental promotion

2017-11-08 Thread wildo
Excellent info, as always, Bryan. Very much appreciated!



--
Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Re: Custom properties file for environmental promotion

2017-11-08 Thread Bryan Bende
Currently, there is the variable properties file, which requires a service
restart and also needs to be present on all nodes in a cluster.
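
For context, that file-based approach looks roughly like this (the file name
and keys are only illustrative):

    # in nifi.properties
    nifi.variable.registry.properties=./conf/custom.properties

    # in conf/custom.properties - plain key=value pairs,
    # referenced from components as ${db.host}, ${db.port}, etc.
    db.host=qa-db.example.com
    db.port=5432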

The last release (1.4.0) added a more user-friendly variable registry
in the UI which you can access from the context palette for a given
process group, near where the controller services for a PG are
located.

When you edit variables in the UI, NiFi detects the components that
reference them and automatically restarts those components. This variable
registry will be tightly integrated with the experience of using the flow
registry.

Given all of the above, there still isn't anything you can currently
do for RPGs, though... you will unfortunately have to recreate them in
the target environment until their URLs become editable.

As far as when the registry will be released: there are no set timelines
for Apache projects, so it will be based on when the community believes it
is mature enough to be released, and when someone volunteers to be the
release manager.

That being said, a lot of good work has been done already and it is
maturing quickly.

Thanks,

Bryan


On Wed, Nov 8, 2017 at 10:21 AM, wildo  wrote:
> Great info Bryan- thanks!
>
> Regarding my first question, I talked to our admins and we only have one NIC
> anyway. So there is no need for me to limit it, and thus I don't have a need
> to use EL to discover the NIC. So that's good.
>
> Regarding the registry stuff, I found this [1] document which looks
> FANTASTIC. But I'm not able to find when/if this stuff will be released. My
> understanding is that it is not yet released, and therefore I'm assuming
> that specifying a custom.properties file via the
> nifi.variable.registry.properties is still the preferred method.
> Additionally, this will mean that:
>  1) Changes to this file require a service restart, correct?
>  2) Is it true that this needs to be specified for each node of a clustered
> environment?
>
> Thanks again!
>
> [1] https://nifi.apache.org/registry.html
>
>
>
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Re: Custom properties file for environmental promotion

2017-11-08 Thread wildo
Great info Bryan- thanks!

Regarding my first question, I talked to our admins and we only have one NIC
anyway. So there is no need for me to limit it, and thus I don't have a need
to use EL to discover the NIC. So that's good.

Regarding the registry stuff, I found this [1] document, which looks
FANTASTIC. But I'm not able to find when/if this stuff will be released. My
understanding is that it is not yet released, and therefore I'm assuming
that specifying a custom.properties file via the
nifi.variable.registry.properties property is still the preferred method.
Additionally, this will mean that:
 1) Changes to this file require a service restart, correct?
 2) Is it true that this needs to be specified for each node of a clustered
environment?

Thanks again!

[1] https://nifi.apache.org/registry.html



--
Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Re: Use putdatabaserecord to update or insert data from oracle source database to vertica target db

2017-11-08 Thread Matt Burgess
Ashwin,

I recommend PutDatabaseRecord for batch updates/inserts inside NiFi
(i.e. if you don't have a bulk loader program available). If you use
something like QueryDatabaseTable, GenerateTableFetch, and/or
ExecuteSQL to get your data from the source table, it will be in Avro
format (with embedded schema). If you are looking to replicate the
source table onto the target DB, you can use PutDatabaseRecord with a
Statement Type of INSERT. If you are trying to do incremental
replication (i.e. only "new" rows every so often) and you have a
column that is always increasing (like ID or a timestamp for something
like "creation date"), you can use the aforementioned technique as
well, since you won't get duplicate rows from the source, thereby
avoiding an INSERT when the row already exists on the target.
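
Under the hood, QueryDatabaseTable with a Maximum-value Column of, say, "id"
effectively issues something like the following (a sketch; the actual SQL is
dialect-specific) and stores the new maximum in processor state between runs:

    SELECT * FROM source_table WHERE id > [last maximum seen for id]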

If you are trying to do periodic replication (i.e. full table copies
every so often), you might be better off truncating the target table
first and then using the aforementioned technique. Basically I am saying
you want to avoid an UPSERT situation, since there is no UPSERT statement
type in PutDatabaseRecord. UPSERTs are sometimes performed by setting the
Statement Type property in PutDatabaseRecord to "Use statement.type
attribute", setting the statement.type attribute to "insert", and then
routing the failure relationship to an UpdateAttribute processor that
changes statement.type to "update".
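
As a rough sketch of that workaround in flow form (routing only, not a
complete flow):

    UpdateAttribute (set statement.type = insert)
      -> PutDatabaseRecord (Statement Type = Use statement.type attribute)
           |-- success --> downstream
           |-- failure --> UpdateAttribute (set statement.type = update)
                             -> back into PutDatabaseRecord (retried as
                                an update)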

For completeness, if none of these work and you want to issue SQL
statements to the target database, you can use any number of
conversion processors. ReplaceText -> PutSQL is a non-record-aware
option; ConvertRecord -> PutDatabaseRecord is a record-aware option. I
am working on a blog post that describes how to take raw data, create
SQL from it, and execute that SQL, all using record-aware processors.
Before the record-aware stuff, you had to split the data into
individual rows/lines, convert to JSON, convert to SQL, and then PutSQL
a batch at a time. With the record-aware processors, you can handle your
dataset as a single entity instead of splitting it up and applying the
exact same transformation(s) to each piece.
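
For the non-record path, the ReplaceText replacement value typically builds
the statement with Expression Language against attributes extracted earlier
in the flow (table and attribute names here are hypothetical):

    INSERT INTO target_table (id, name, created_at)
    VALUES ('${id}', '${name}', '${created_at}')

PutSQL then executes each flow file's content as a statement; the
parameterized variant with sql.args.N.* attributes is safer against quoting
issues, but this shows the idea.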

Regards,
Matt

P.S. Since the question is about using NiFi processors (versus
developing them), you might consider asking the users list instead
(us...@nifi.apache.org). If you are not subscribed, you can send an
email to users-subscr...@nifi.apache.org. Once you are subscribed, your
emails will go directly to the list (instead of needing to be manually
moderated).

On Wed, Nov 8, 2017 at 1:48 AM, ashwinb  wrote:
> Hi ,
> Can someone let me know which processor to use  to do update or insert data
> from oracle source table  to vertica target table in batch mode?
>
> Thanks
> Ashwin
>
>
>
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Re: Custom properties file for environmental promotion

2017-11-08 Thread Bryan Bende
Hello,

Regarding Remote Process Groups, this is definitely something that
needs to be improved. There is a JIRA to make the URL editable [1].

A significant amount of work has been done on the flow registry [2],
and this will become the primary way to deploy flows across
environments.

The typical scenario would be to save your dev flow to the registry,
and when importing it to QA or prod, you would then edit the RPG's URL
(once the JIRA to make it editable is implemented) to point at that
environment.

After that the URL would be set for that environment and would not be
changed when upgrading to newer versions.

Hope this helps.

Thanks,

Bryan

[1] https://issues.apache.org/jira/browse/NIFI-4526
[2] https://github.com/apache/nifi-registry


On Tue, Nov 7, 2017 at 11:58 PM, wildo  wrote:
> We have nearly wrapped up our testing with our NiFi scripts in dev, and are
> now looking to push to QA. I found an article about creating a custom
> properties file in order to specify each of your environmental specific
> variables, and then specifying that file in nifi.properties at
> nifi.variable.registry.properties.
>
> This will work fine omitting two cases I can think of.
>
> 1) We have a number of ListenTCP processors which require the "local network
> interface" to be specified. I have read that Expression Language can access
> system properties, but I haven't seen any example about how to use this. Can
> anyone share how EL might be used to grab the local network interface for
> each environment automatically?
>
> 2) We use Remote Process Groups with Site-to-Site for load balancing. In the
> RPG, you have to specify an absolute url to the nodes in the remote site.
> The RPG doesn't indicate that EL is acceptable in this field. Can anyone
> chime in on the possibility of using EL to grab a property for the RPG url?
>
> Thanks!
>
>
>
> --
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/


Use putdatabaserecord to update or insert data from oracle source database to vertica target db

2017-11-08 Thread ashwinb
Hi,
Can someone let me know which processor to use to update or insert data
from an Oracle source table to a Vertica target table in batch mode?

Thanks
Ashwin



--
Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/