Re: ExecuteProcess processor with TLS1.2 error: "failed setting cipher list"

2020-05-24 Thread Eric Chaves
Hi Andy, sorry for not answering before. I just figured this one out (after
a lot of trial and error). This one was tricky. ;)

The curl being used was the same one that I ran in bash. The error was
related to how I was passing the arguments to curl. In bash I passed the
argument --ciphers 'DEFAULT:!DH' with single quotes to prevent shell
expansion, and when I declared the arguments on the processor I did the
same; however, it seems the processor does some quoting of its own and
curl got confused by the cipher name.

Once I removed the quotes the command worked just fine.
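
For the record, the Command Arguments value that ended up working is the
same argument list, just without the quotes around the cipher list:

-v -slk --tlsv1.2 --ciphers DEFAULT:!DH --user ${FTP_USER}:${FTP_PASS} --ftp-ssl ftp://${FTP_HOST}:${FTP_PORT}/${FTP_DIR}/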

Thanks for the help anyway.


On Fri, May 22, 2020 at 15:11, Andy LoPresto
wrote:

> Hi Eric,
>
> Can you verify a couple things?
>
> 1. The specific curl instance you’re using in the terminal and in NiFi is
> the same? (i.e. run this command in the terminal and in an ExecuteProcess
> processor: $ which curl)
> 2. Run curl -V to see which version of openssl curl is using in both
> scenarios.
> 3. Run curl -vvv to see increased verbosity output.
>
>
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> He/Him
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On May 22, 2020, at 8:08 AM, Eric Chaves  wrote:
>
> Hi folks,
>
>  I have a flow that downloads files from an FTP server over SSL using
> TLS 1.2. To achieve this I use the curl command line in an ExecuteProcess
> processor. This routine had been working fine until recently, when we
> tried it on an upgraded NiFi server.
>
> After tracking down the error we noticed that it was due to the updated
> version of OpenSSL recommending against the use of old ciphers. The FTP
> server in question uses TLS 1.2 with a weak certificate, but since it is
> not managed by me, updating the server is not an option.
>
> After some troubleshooting I managed to adjust my curl command and it
> works when I execute it manually in a bash session on my NiFi server (to
> be precise, I ran it inside the docker container that runs NiFi), but when
> I execute the same command line with the ExecuteProcess processor I get
> the following error: "failed setting cipher list"
>
> The curl command and argument line I'm executing is:
>
> curl -v -slk --tlsv1.2 --ciphers 'DEFAULT:!DH' --user
> ${FTP_USER}:${FTP_PASS} --ftp-ssl ftp://${FTP_HOST}:${FTP_PORT}/${FTP_DIR}/
>
> The actual verbose error from inside the ExecuteProcess processor is:
>
> *   Trying 200.230.161.229...
> * TCP_NODELAY set
> * Expire in 200 ms for 4 (transfer 0x55f98e691f50)
> * Connected to  () port  (#0)
> < 220 ProFTPD 1.3.4d Server (...) []
> > AUTH SSL
> < 234 AUTH SSL successful
> * failed setting cipher list: 'DEFAULT:!DH'
> * Closing connection 0
>
> So it seems that some configuration, either in NiFi or in the
> ExecuteProcess processor, is not allowing me to force my curl command to
> use insecure ciphers with OpenSSL.
>
> How can I circumvent this?
>
> Best regards,
>
> Eric
>
>
>


Re: How to write multiple prepends/appends in a single EL expression?

2020-03-19 Thread Eric Chaves
Thanks!

On Thu, Mar 19, 2020 at 17:29, Paul Parker
wrote:

> You can go without append and prepend.
>
> sftp://${sftp.remote.host}/${path}/${filename}
>
> Give it a try.
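>
> (For reference, the chained form should also work once the stray colon in
> append:("/") is removed; untested:
> ${sftp.remote.host:prepend("sftp://"):append("/"):append(${path}):append("/"):append(${filename})} )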
>
> Eric Chaves wrote on Thu, Mar 19, 2020, 21:05:
>
>> Hi folks,
>>
>> Sorry for another newbie question. =) I'm trying to write a single EL
>> expression to perform multiple string manipulations but I keep hitting an
>> invalid EL expression.
>>
>> What would be the correct way to write this expression:
>> ${sftp.remote.host:prepend("sftp://"):append:("/"):append(${path}):append("/"):append(${filename})}
>> ?
>>
>> Thanks in advance,
>>
>> Eric
>>
>


How to write multiple prepends/appends in a single EL expression?

2020-03-19 Thread Eric Chaves
Hi folks,

Sorry for another newbie question. =) I'm trying to write a single EL
expression to perform multiple string manipulations but I keep hitting an
invalid EL expression.

What would be the correct way to write this expression:
${sftp.remote.host:prepend("sftp://"):append:("/"):append(${path}):append("/"):append(${filename})}
?

Thanks in advance,

Eric


Re: How to iterate over all dynamic properties in a Python InvokeScriptedProcessor?

2020-03-18 Thread Eric Chaves
Thanks!

On Wed, Mar 18, 2020 at 19:42, Andy LoPresto
wrote:

> My Python is not great but I would look at NiPyAPI as an example and the
> NiFi REST API [1] to see how objects are nested. The ProcessorConfigDTO [2]
> contains a dict of str: PropertyDescriptorDTO at descriptors, so you could
> iterate over that as you’re asking. For example, the
> list_sensitive_processors function [3] operates very similarly.
>
> def list_sensitive_processors(pg_id='root', summary=False):
>     """
>     Returns a flattened list of all Processors on the canvas which have
>     sensitive properties that would need to be managed during deployment
>     Args: pg_id (str): The UUID of the Process Group to start from,
>         defaults to the Canvas root
>       summary (bool): True to return just the list of relevant properties
>         per Processor, False for the full listing
>     Returns: list[ProcessorEntity] or list(dict)
>     """
>     assert isinstance(pg_id, six.string_types), "pg_id should be a string"
>     assert isinstance(summary, bool)
>     cache = nipyapi.config.cache.get('list_sensitive_processors')
>     if not cache:
>         cache = []
>     matches = []
>     proc_list = list_all_processors(pg_id)
>     for proc in proc_list:
>         if proc.component.type in cache:
>             matches.append(proc)
>         else:
>             sensitive_test = False
>             for _, detail in proc.component.config.descriptors.items():
>                 if detail.sensitive is True:
>                     sensitive_test = True
>                     break
>             if sensitive_test:
>                 matches.append(proc)
>                 cache.append(str(proc.component.type))
>     if cache:
>         nipyapi.config.cache['list_sensitive_processors'] = cache
>     if summary:
>         return [{x.id: [p for p, q in x.component.config.descriptors.items()
>                         if q.sensitive is True]} for x in matches]
>     return matches
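>
> (Called as, e.g., nipyapi.canvas.list_sensitive_processors(summary=True)
> once nipyapi is pointed at your instance.)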
>
> [1] https://nifi.apache.org/docs/nifi-docs/rest-api/index.html
> [2]
> https://github.com/Chaffelson/nipyapi/blob/master/nipyapi/nifi/models/processor_config_dto.py
> [3]
> https://github.com/Chaffelson/nipyapi/blob/master/nipyapi/canvas.py#L225
>
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> He/Him
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Mar 18, 2020, at 12:12 PM, Eric Chaves  wrote:
>
> Hi folks,
>
> I'm trying to write a quick Python InvokeScriptedProcessor script where I
> need to iterate over all dynamic properties from the processor to select
> just a few, and I'm having some difficulties with the class types between
> Jython and Java.
>
> Can someone show me how to iterate over "context.properties"  to get each
> PropertyDescriptor?
>
> I'd like to do something like this:
>
> for prop in context.properties:
>     name = prop.name
>     value = context.getProperty(prop).evaluateAttributeExpressions(flowFile).getValue()
>     self.log.info("attr {name}: {value}".format(name=name, value=value))
>     if prop.dynamic:
>         if name in lista and re.search(value, filename):
>             attrMap['TipoArquivo'] = name
>         else:
>             attrMap[name] = value
>
> Cheers
>
>
>


How to iterate over all dynamic properties in a Python InvokeScriptedProcessor?

2020-03-18 Thread Eric Chaves
Hi folks,

I'm trying to write a quick Python InvokeScriptedProcessor script where I
need to iterate over all dynamic properties from the processor to select
just a few, and I'm having some difficulties with the class types between
Jython and Java.

Can someone show me how to iterate over "context.properties" to get each
PropertyDescriptor?

I'd like to do something like this:

for prop in context.properties:
    name = prop.name
    value = context.getProperty(prop).evaluateAttributeExpressions(flowFile).getValue()
    self.log.info("attr {name}: {value}".format(name=name, value=value))
    if prop.dynamic:
        if name in lista and re.search(value, filename):
            attrMap['TipoArquivo'] = name
        else:
            attrMap[name] = value

Cheers


Re: Can someone confirm if bug NIFI-5830 is fixed on docker hub images

2020-03-16 Thread Eric Chaves
I think it was more about how docker-compose persists volumes and the lack
of proper setup on my side. It was a pretty standard compose file for
development. Please find it attached.

On the good side, this incident drew my attention to some steps I should
plan ahead for upgrades in production, since we use the images in our ECS
clusters.

Cheers,

On Mon, Mar 16, 2020 at 11:51, Aldrin Piri
wrote:

> Thanks for following up.  The issue you ran into, was it with the default
> volumes as provided by our image or with additional volumes you specified?
> Curious if there is anything we should rethink on this front to help
> alleviate such issues and ease transitions.
>
> aldrin
>
> On Mon, Mar 16, 2020 at 9:46 AM Eric Chaves  wrote:
>
>> Hi Aldrin,
>>
>> Ok, I think I got my error. I'm using docker-compose to run a local
>> development cluster and it was first created using the 1.8 image. To
>> upgrade to 1.11.3 I first removed the containers (using docker-compose
>> down) but without passing the "-v" flag to remove volumes, and then
>> launched the containers again (docker-compose up); since the volume was
>> persisted, it seems the nifi components were still at 1.8 (at least that
>> was the version showing in the UI). Once I fully removed the containers
>> and volumes (docker-compose down -v) and restarted, it came up with the
>> 1.11 version and the error was indeed gone.
>>
>> Sorry for the inconvenience.
>>
>> Cheers,
>>
>>
>>
>> On Sun, Mar 15, 2020 at 21:26, Aldrin Piri
>> wrote:
>>
>>> Hi Eric,
>>>
>>> I can only speak to the image, but as it is tagged in 1.9.0, the
>>> associated code will also be in what the community offers through Docker
>>> Hub.  We make use of the released convenience binaries for constructing the
>>> image.
>>>
>>> Some things that would help us investigate further:
>>> Could you please share the error you are encountering?
>>> Have you verified the correct functionality for your setup with a
>>> standalone NiFi instance?
>>> How are you running your Docker containers?  Do you have these on the
>>> same Docker network?  If you are using compose, this should be taken care
>>> of by default, and if so, could you please share your compose file?
>>>
>>>
>>>
>>>
>>>
>>> On Sun, Mar 15, 2020 at 7:42 PM Eric Chaves  wrote:
>>>
>>>> Hi folks,
>>>>
>>>> I'm facing an error using Redis DMC in standalone mode with an external
>>>> Redis server, where the RedisConnectionPoolService is not able to
>>>> connect to a non-localhost Redis in this mode.
>>>>
>>>> This error was already reported (
>>>> https://issues.apache.org/jira/browse/NIFI-5830) and marked fixed in
>>>> nifi 1.9; however, I'm trying the nifi docker images from Docker Hub and
>>>> all the images that I've tried keep throwing the same error (I've tried
>>>> 1.9.0, 1.9.1 and the latest 1.11.3).
>>>>
>>>> In order to confirm whether the bug I'm facing is the same, I did a
>>>> port forward using socat between localhost (the nifi container) and my
>>>> redis container, and the error was gone.
>>>>
>>>> Am I missing something? Why am I getting this error if it's fixed?
>>>>
>>>> Best regards,
>>>>
>>>> Eric
>>>>
>>>


docker-compose.yaml
Description: application/yaml


Re: Can someone confirm if bug NIFI-5830 is fixed on docker hub images

2020-03-16 Thread Eric Chaves
Hi Aldrin,

Ok, I think I got my error. I'm using docker-compose to run a local
development cluster and it was first created using the 1.8 image. To
upgrade to 1.11.3 I first removed the containers (using docker-compose
down) but without passing the "-v" flag to remove volumes, and then
launched the containers again (docker-compose up); since the volume was
persisted, it seems the nifi components were still at 1.8 (at least that
was the version showing in the UI). Once I fully removed the containers and
volumes (docker-compose down -v) and restarted, it came up with the 1.11
version and the error was indeed gone.
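
For reference, the sequence that bit me (standard docker-compose CLI):

docker-compose down      # removes containers but keeps named volumes (still 1.8)
docker-compose down -v   # removes the volumes too
docker-compose up -d     # recreates everything fresh (now 1.11.3)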

Sorry for the inconvenience.

Cheers,



On Sun, Mar 15, 2020 at 21:26, Aldrin Piri
wrote:

> Hi Eric,
>
> I can only speak to the image, but as it is tagged in 1.9.0, the
> associated code will also be in what the community offers through Docker
> Hub.  We make use of the released convenience binaries for constructing the
> image.
>
> Some things that would help us investigate further:
> Could you please share the error you are encountering?
> Have you verified the correct functionality for your setup with a
> standalone NiFi instance?
> How are you running your Docker containers?  Do you have these on the same
> Docker network?  If you are using compose, this should be taken care of by
> default, and if so, could you please share your compose file?
>
>
>
>
>
> On Sun, Mar 15, 2020 at 7:42 PM Eric Chaves  wrote:
>
>> Hi folks,
>>
>> I'm facing an error using Redis DMC in standalone mode with an external
>> Redis server, where the RedisConnectionPoolService is not able to connect
>> to a non-localhost Redis in this mode.
>>
>> This error was already reported (
>> https://issues.apache.org/jira/browse/NIFI-5830) and marked fixed in
>> nifi 1.9; however, I'm trying the nifi docker images from Docker Hub and
>> all the images that I've tried keep throwing the same error (I've tried
>> 1.9.0, 1.9.1 and the latest 1.11.3).
>>
>> In order to confirm whether the bug I'm facing is the same, I did a
>> port forward using socat between localhost (the nifi container) and my
>> redis container, and the error was gone.
>>
>> Am I missing something? Why am I getting this error if it's fixed?
>>
>> Best regards,
>>
>> Eric
>>
>


Can someone confirm if bug NIFI-5830 is fixed on docker hub images

2020-03-15 Thread Eric Chaves
Hi folks,

I'm facing an error using Redis DMC in standalone mode with an external
Redis server, where the RedisConnectionPoolService is not able to connect
to a non-localhost Redis in this mode.

This error was already reported (
https://issues.apache.org/jira/browse/NIFI-5830) and marked fixed in nifi
1.9; however, I'm trying the nifi docker images from Docker Hub and all the
images that I've tried keep throwing the same error (I've tried 1.9.0,
1.9.1 and the latest 1.11.3).

In order to confirm whether the bug I'm facing is the same, I did a port
forward using socat between localhost (the nifi container) and my redis
container, and the error was gone.

Am I missing something? Why am I getting this error if it's fixed?

Best regards,

Eric


Re: What UDFs are supported by QueryRecord?

2019-10-17 Thread Eric Chaves
Hi Mark,

Thanks for pointing that out, but from the docs I only got how to use RPATH
to get a RecordPath value. What should I do, for example, if I wanted to
apply a RecordPath function to a record field?

On Thu, Oct 17, 2019 at 14:57, Mark Payne
wrote:

> Eric,
>
> You can use RecordPath with QueryRecord, via the RPATH, RPATH_STRING,
> RPATH_INT, etc. These are explained in the Processor's documentation. For
> example, see [1].
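>
> A quick sketch of the RPATH form (assuming a record-typed field named
> 'person'):
>
> SELECT RPATH_STRING(person, '/name') AS name FROM FLOWFILE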
>
> You can also use the Expression Language with QueryRecord. The Expression
> Language is evaluated before the SQL is parsed. So, for example, if you had
> an attribute named 'Field of Interest' you could actually use SQL like:
>
> SELECT ${'Field of Interest'}
> FROM FLOWFILE
>
> Thanks
> -Mark
>
> [1]
> http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.9.2/org.apache.nifi.processors.standard.QueryRecord/index.html
>
> On Oct 16, 2019, at 2:10 PM, Eric Chaves  wrote:
>
> Ok, I figured out a way to do it. I noticed that QueryRecord uses Apache
> Calcite, so I tried some of Calcite's functions until I got to this
> statement that seems to work:
>
> SELECT index FROM FLOWFILE WHERE CAST( SUBSTRING(index FROM
> CHAR_LENGTH(index)-9) AS DATE) <= CURRENT_DATE
>
> Anyhow, I would still like to know if (and how) I could use either NiFi
> expression or RecordPath functions in a QueryRecord statement.
>
> Thanks in advance,
>
> On Wed, Oct 16, 2019 at 14:30, Eric Chaves wrote:
>
>> Hi Folks,
>>
>> I'd like to use a SQL statement in QueryRecord that extracts the last 10
>> chars of a field and compares it to today, similar to the line below:
>>
>> SELECT substring('/index', 10, -1) as expired FROM FLOWFILE WHERE
>> expired <= '${now():format("yyyy-MM-dd")}'
>>
>> This statement does not work and I can't find a list of QueryRecord
>> supported UDFs or whether (and how) I can use RecordPath functions.
>>
>> Is there any documentation where I can see the available UDFs?
>>
>> Regards,
>>
>>
>


Re: What UDFs are supported by QueryRecord?

2019-10-16 Thread Eric Chaves
Ok, I figured out a way to do it. I noticed that QueryRecord uses Apache
Calcite, so I tried some of Calcite's functions until I got to this
statement that seems to work:

SELECT index FROM FLOWFILE WHERE CAST( SUBSTRING(index FROM
CHAR_LENGTH(index)-9) AS DATE) <= CURRENT_DATE

Anyhow, I would still like to know if (and how) I could use either NiFi
expression or RecordPath functions in a QueryRecord statement.

Thanks in advance,

On Wed, Oct 16, 2019 at 14:30, Eric Chaves wrote:

> Hi Folks,
>
> I'd like to use a SQL statement in QueryRecord that extracts the last 10
> chars of a field and compares it to today, similar to the line below:
>
> SELECT substring('/index', 10, -1) as expired FROM FLOWFILE WHERE expired
> <= '${now():format("yyyy-MM-dd")}'
>
> This statement does not work and I can't find a list of QueryRecord
> supported UDFs or whether (and how) I can use RecordPath functions.
>
> Is there any documentation where I can see the available UDFs?
>
> Regards,
>
>


What UDFs are supported by QueryRecord?

2019-10-16 Thread Eric Chaves
Hi Folks,

I'd like to use a SQL statement in QueryRecord that extracts the last 10
chars of a field and compares it to today, similar to the line below:

SELECT substring('/index', 10, -1) as expired FROM FLOWFILE WHERE expired
<= '${now():format("yyyy-MM-dd")}'

This statement does not work and I can't find a list of QueryRecord
supported UDFs or whether (and how) I can use RecordPath functions.

Is there any documentation where I can see the available UDFs?

Regards,


Re: In which JAR can I find org.apache.nifi.record.path.*?

2019-10-04 Thread Eric Chaves
Hi Matt, thanks for letting me know.

Have you ever written a ScriptedRecordSetWriter? I couldn't find any
example of one. Do you have any to share?

Best regards,

On Fri, Oct 4, 2019 at 18:00, Matt Burgess
wrote:

> Eric,
>
> The RecordPath classes are in the nifi-record-path JAR, which is not
> currently included in the scripting bundle. I have written up an
> improvement Jira [1] to add this.
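>
> In the meantime, pointing the scripting processor's Module Directory
> property at a copy of the nifi-record-path JAR should let the script
> resolve those classes.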
>
> Regards,
> Matt
>
> [1] https://issues.apache.org/jira/browse/NIFI-6741
>
> On Fri, Oct 4, 2019 at 3:30 PM Eric Chaves  wrote:
> >
> > Hi folks,
> >
> > I'd like to use the RecordPath classes in a Groovy script for an
> > InvokeScriptedProcessor but when I import those classes they aren't
> > found. Can anyone point me to the JAR where they reside?
> >
> > The classes I'm importing are
> >
> > import org.apache.nifi.record.path.FieldValue
> > import org.apache.nifi.record.path.RecordPath
> > import org.apache.nifi.record.path.RecordPathResult
> > import org.apache.nifi.record.path.util.compiledRecordPaths
> >
> > Thanks in advance,
> >
> > Eric
>


In which JAR can I find org.apache.nifi.record.path.*?

2019-10-04 Thread Eric Chaves
Hi folks,

I'd like to use the RecordPath classes in a Groovy script for an
InvokeScriptedProcessor but when I import those classes they aren't found.
Can anyone point me to the JAR where they reside?

The classes I'm importing are

import org.apache.nifi.record.path.FieldValue
import org.apache.nifi.record.path.RecordPath
import org.apache.nifi.record.path.RecordPathResult
import org.apache.nifi.record.path.util.compiledRecordPaths

Thanks in advance,

Eric


Re: Validate CSV/Records by name instead of position

2019-09-18 Thread Eric Chaves
Awesome, thanks for the detailed steps!

On Wed, Sep 18, 2019 at 11:41, Jerry Vinokurov
wrote:

> This certainly works. You can create a schema registry and define an Avro
> schema listing your fields. Then make sure that when you set up the reader,
> it's configured to read the header so that it knows which fields go where
> in the record, set up the mode of the schema access to read from the
> registry you created, and then set the name of the actual schema, which is
> a property on the schema registry. This will correctly validate your CSV
> for you.
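>
> A minimal schema along those lines might look like this (field names made
> up):
>
> {
>   "type": "record",
>   "name": "ingest",
>   "fields": [
>     {"name": "mandatory_a", "type": "string"},
>     {"name": "mandatory_b", "type": "string"},
>     {"name": "optional_c", "type": ["null", "string"], "default": null}
>   ]
> }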
>
> On Wed, Sep 18, 2019 at 9:59 AM Eric Chaves  wrote:
>
>> Hi folks,
>>
>> Is it possible to validate fields/columns in a Record or CSV by name
>> instead of by position? For example, I have a record with two mandatory
>> fields and some optional fields, but they may be in different positions
>> in each ingested file. Should I use a script, or is there already a
>> processor that could help me out with this?
>>
>> Regards,
>>
>
>
> --
> http://www.google.com/profiles/grapesmoker
>


Validate CSV/Records by name instead of position

2019-09-18 Thread Eric Chaves
Hi folks,

Is it possible to validate fields/columns in a Record or CSV by name
instead of by position? For example, I have a record with two mandatory
fields and some optional fields, but they may be in different positions in
each ingested file. Should I use a script, or is there already a processor
that could help me out with this?

Regards,


Re: Redis - Sorted Sets

2019-09-13 Thread Eric Chaves
Hi John,

Here at work we also needed to use Redis in a flow to store and retrieve
real-time events temporarily. We haven't found any "Put/Fetch Redis"
processors, and the RedisDistributedMapCache didn't fit our needs,
precisely because our flow required more control over the key namespace and
also the native Redis data types (lists and hashmaps).

We ended up writing a custom Groovy scripted processor using the Jedis
library, and it was actually pretty easy. The NiFi ExecuteScript cookbook
is your best friend to get started.

I can't share the actual processor we wrote due to some internal logic it
has, but if you need some guidance send me a private message and I may be
able to help you out.
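
Just to give a flavor, a bare-bones ExecuteScript body in Groovy using
Jedis to push into a sorted set might look like this (host/port and key
names are placeholders; assumes the Jedis jar is on the processor's Module
Directory path):

import redis.clients.jedis.Jedis

def flowFile = session.get()
if (!flowFile) return
def jedis = new Jedis('redis-host', 6379)  // placeholder host/port
try {
    double score = System.currentTimeMillis()  // rank events by arrival time
    jedis.zadd('events', score, flowFile.getAttribute('uuid'))
    session.transfer(flowFile, REL_SUCCESS)
} finally {
    jedis.close()  // always return the connection
}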

Cheers,


On Fri, Sep 13, 2019 at 16:41, John Fak wrote:

> Has anyone used nifi to move data to Redis ?
> What processors etc ?
> Ideally want to use a sorted set in redis.
>


Re: How to lock a mysql table for concurrent reads

2019-09-13 Thread Eric Chaves
Hi folks, found my error. I fixed my query to use an inner join with a
select-for-update instead of a subselect, and configured PutSQL to use a
batch count of 1.
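
Roughly what the reworked statement looks like (untested sketch from
memory; table and columns as in my earlier message):

update file_records fr
inner join (select max(last_id) as max_last from file_records for update) m
set fr.first_id = m.max_last + 1,
    fr.last_id  = m.max_last + ${record_count}
where fr.file_id = ${sql.generated.key};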

Thanks!

On Thu, Sep 12, 2019 at 19:10, Eric Chaves wrote:

> Hi folks,
>
> I'm having a hard time figuring out how to do an update inside an atomic
> transaction with NiFi.
>
> In MySQL I use a table that acts as a kind of "record counter"
> distributed among files. The table looks like this:
>
> create table file_records (
>   file_id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
>   file_name VARCHAR(255) NOT NULL,
>   first_id BIGINT NOT NULL DEFAULT 0,
>   last_id BIGINT NOT NULL DEFAULT 0
> )
>
> The idea is that when a file arrives, given that I know how many lines it
> has, I'll allocate a range of ids (first_id, last_id) equal to the number
> of lines in advance, allowing me to process multiple files in parallel
> without the risk of record id collisions (the records/lines will be
> inserted in another database that does not have auto-increment fields).
>
> For this to work I planned to use an update query like this in an ACID
> transaction with a full table lock:
>
> update tbl_name
> set first_id = (select max(last_id) + 1),
>     last_id = (select max(last_id) + ${record_count})
> where campaign_id = ${sql.generated.key};
>
> To test this concept I've tried multiple variations of a PutSQL
> processor configuration with 4 concurrent tasks, none with success (I'll
> expect to have records ).
>
> My last attempt was to set up a sequence of PutSQL doing a LOCK TABLE
> tbl_name; / UPDATE / UNLOCK TABLE, but I still haven't got it working.
> When a processor tries to access the locked table it generates an error
> instead of waiting for the lock to be released.
>
> Has anyone ever done something similar in NiFi? What would be the proper
> way to perform each update inside an atomic transaction using a full
> table lock?
>
> Thanks in advance!
>


How to lock a mysql table for concurrent reads

2019-09-12 Thread Eric Chaves
Hi folks,

I'm having a hard time figuring out how to do an update inside an atomic
transaction with NiFi.

In MySQL I use a table that acts as a kind of "record counter"
distributed among files. The table looks like this:

create table file_records (
  file_id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  file_name VARCHAR(255) NOT NULL,
  first_id BIGINT NOT NULL DEFAULT 0,
  last_id BIGINT NOT NULL DEFAULT 0
)

The idea is that when a file arrives, given that I know how many lines it
has, I'll allocate a range of ids (first_id, last_id) equal to the number
of lines in advance, allowing me to process multiple files in parallel
without the risk of record id collisions (the records/lines will be
inserted in another database that does not have auto-increment fields).

For this to work I planned to use an update query like this in an ACID
transaction with a full table lock:

update tbl_name
set first_id = (select max(last_id) + 1),
    last_id = (select max(last_id) + ${record_count})
where campaign_id = ${sql.generated.key};

To test this concept I've tried multiple variations of a PutSQL processor
configuration with 4 concurrent tasks, none with success (I'll expect to
have records ).

My last attempt was to set up a sequence of PutSQL doing a LOCK TABLE
tbl_name; / UPDATE / UNLOCK TABLE, but I still haven't got it working.
When a processor tries to access the locked table it generates an error
instead of waiting for the lock to be released.

Has anyone ever done something similar in NiFi? What would be the proper
way to perform each update inside an atomic transaction using a full table
lock?

Thanks in advance!


Can FTP processors ignore server internal ftp address in passive mode?

2019-05-22 Thread Eric Chaves
Hi folks,

My nifi flows are facing trouble with some FTPS servers that send their
internal IP address in passive mode, and therefore the FTP processor cannot
open the second connection for data transfer, e.g.:

Command:  PASV
Response: 227 Entering Passive Mode (172,17,4,21,120,207).

Is it possible to make the FTP processor ignore this server address from
PASV and use the server's public address instead?

Has anyone faced this trouble?

Best regards,

Eric


Re: Advice on orchestrating Nifi with dockerized external services

2019-04-11 Thread Eric Chaves
Hi Koji, that seems like a pretty good idea, thanks for bringing it up! I
wasn't aware of nanofi but will definitely give it a shot. =)

Thanks

On Wed, Apr 10, 2019 at 22:38, Koji Kawamura
wrote:

> Hi Eric,
>
> Although my knowledge on MiNiFi, Python and Go is limited, I wonder if
> "nanofi" library can be used from the proprietary application so that
> they can fetch FlowFiles directly using Site-to-Site protocol. That
> can be an interesting approach and will be able to eliminate the need
> of storing data to a local volume (mentioned in the possible approach
> A).
> https://github.com/apache/nifi-minifi-cpp/tree/master/nanofi
>
> The latest MiNiFi (C++) version 0.6.0 was released recently.
> https://cwiki.apache.org/confluence/display/MINIFI/Release+Notes
>
> Thanks,
> Koji
>
> On Thu, Apr 11, 2019 at 2:28 AM Eric Chaves  wrote:
> >
> > Hi Folks,
> >
> > My company is using nifi to perform several data-flow processes and now
> > we received a requirement to do some fairly complex ETL over large
> > files. To process those files we have some proprietary applications
> > (mostly written in Python or Go) that run as docker containers.
> >
> > I don't think that porting those apps as nifi processors would produce
> > a good result, due to each app's complexity.
> >
> > Also, we would like to keep using the nifi queues so we can monitor
> > overall progress as we already do (we run several other nifi flows), so
> > we are discarding for now solutions that, for example, submit files to
> > an external queue like SQS or Rabbit for consumption.
> >
> > So far we have come up with two solutions that would:
> >
> > 1. have a Kubernetes cluster of jobs periodically querying the nifi
> > queue for new flowfiles, pulling one when a file arrives.
> > 2. download the file content (which is already stored outside of nifi)
> > and process it.
> > 3. submit the result back to nifi (using an HTTP listener processor) to
> > trigger subsequent nifi processing.
> >
> >
> > For steps 1 and 2, so far we are considering two possible approaches:
> >
> > A) use a minifi container together with the app container in a sidecar
> > design. minifi would connect to our nifi cluster and handle the file
> > download to a local volume for the app container to process.
> >
> > B) use nifi rest API to query and consume flowfiles on queue
> >
> > One requirement is that, if needed, we would manually scale up the app
> > cluster to have multiple containers consume more queued files in parallel.
> >
> > Do you guys recommend one over another (or a third approach)? Any
> pitfalls you can foresee?
> >
> > Would be really glad to hear your thoughts on this matter.
> >
> > Best regards,
> >
> > Eric
>


Advice on orchestrating Nifi with dockerized external services

2019-04-10 Thread Eric Chaves
Hi Folks,

My company is using nifi to perform several data-flow processes and now we
received a requirement to do some fairly complex ETL over large files. To
process those files we have some proprietary applications (mostly written
in Python or Go) that run as docker containers.

I don't think that porting those apps as nifi processors would produce a
good result, due to each app's complexity.

Also, we would like to keep using the nifi queues so we can monitor overall
progress as we already do (we run several other nifi flows), so we are
discarding for now solutions that, for example, submit files to an external
queue like SQS or Rabbit for consumption.

So far we have come up with two solutions that would:

   1. have a Kubernetes cluster of jobs periodically querying the nifi
   queue for new flowfiles, pulling one when a file arrives.
   2. download the file content (which is already stored outside of nifi)
   and process it.
   3. submit the result back to nifi (using an HTTP listener processor) to
   trigger subsequent nifi processing.


For steps 1 and 2, so far we are considering two possible approaches:

A) use a minifi container together with the app container in a sidecar
design. minifi would connect to our nifi cluster and handle the file
download to a local volume for the app container to process.

B) use nifi rest API to query and consume flowfiles on queue

One requirement is that, if needed, we would manually scale up the app
cluster to have multiple containers consume more queued files in parallel.

Do you guys recommend one over another (or a third approach)? Any pitfalls
you can foresee?

Would be really glad to hear your thoughts on this matter.

Best regards,

Eric


Re: Clarifications on getting flowfiles using FlowFileFilters

2019-03-28 Thread Eric Chaves
Hi Matt, thanks for your explanations. I was coding with some
misconceptions but after your answer I rewrote my script and now it's
working like a charm.

Thanks!!

On Thu, Mar 28, 2019 at 14:22, Matt Burgess
wrote:

> Eric,
>
> Here are some answers to your questions:
>
> 1) The same flowfile will not be present to both threads.
>
> 2) Any flow files retrieved from a session have to be handled with
> that session. You'd either have to keep a reference to the session and
> the flow file(s) -- see MergeContent and MergeRecord for examples --
> or you can keep a map from FlowFile UUID to an attribute map, then
> update the flow file with the attributes when it is time to transfer
> it. In the meantime you'd either rollback the session or transfer the
> flow file to SELF.
>
> 3) No, any flow files retrieved from a session have to be handled with
> that session, either by transfer/commit or remove.
>
> 4) Not sure what you mean by mark/lock/hide, but each flow file
> retrieved from a session is associated only with that session/thread.
> It's kind of "locked" in the sense that while the session is active,
> no other thread/session knows about it.
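>
> A minimal Groovy sketch of the filter usage (the attribute name is just
> an example):
>
> import org.apache.nifi.processor.FlowFileFilter
> import static org.apache.nifi.processor.FlowFileFilter.FlowFileFilterResult.*
>
> def batch = session.get({ ff ->
>     ff.getAttribute('batch.id') == '42' ? ACCEPT_AND_CONTINUE : REJECT_AND_CONTINUE
> } as FlowFileFilter)
> // every flowfile in 'batch' now belongs to this session and must be
> // transferred, removed, or rolled back by this same session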
>
> 5) Not sure what's going on there, but it seems related to the content
> repository so I assume you're updating the flow file contents at the
> time? Maybe put some logging statements around some of the
> session/flowfile operations to see if you can narrow down what's going
> on there. Also, what content repository implementation are you using?
> I think there's a new one that's not yet the default but works better.
>
> Regards,
> Matt
>
> On Wed, Mar 27, 2019 at 3:12 PM Eric Chaves  wrote:
> >
> > Hi folks,
> >
> > I'd like to write an InvokeScriptedProcessor that inspects all files in
> > the incoming queue without actually processing/transferring them until
> > a given business condition is met.
> >
> > So I looked for some examples of how I could get all flowfiles in a
> > queue using session.get(FlowFileFilter) and I would like to confirm
> > that I'm getting it right.
> >
> > 1) If I have more than one thread executing my processor calling
> > session.get(FlowFileFilter), can the same flowfile be presented to both
> > threads?
> >
> > 2) If I retrieve some flowfiles using session.get(FlowFileFilter), can
> > I add/modify their attributes but keep them in the current queue (i.e.
> > set attributes but don't transfer them to any relationship)?
> >
> > 3) Am I correct that session.get(FlowFileFilter) requires neither
> > session.commit nor session.rollback? Are those required only if I
> > modify the flowfile?
> >
> > 4) When I obtain a list of flowfiles using session.get(FlowFileFilter),
> > does it mark/lock/hide the selected files in any way?
> >
> > 5) I'm playing with this concept in a groovy script using an
> > InvokeScriptedProcessor and very often, after the second or third
> > onTrigger execution, I'm receiving an error: java.io.IOException:
> > All Partitions have been blacklisted due to failures when attempting to
> > update. If the Write-Ahead Log is able to perform a checkpoint, this
> > issue may resolve itself. Otherwise, manual intervention will be
> > required. What could be the source of this error? Could it be related
> > to the use of session.get(FlowFileFilter)? I'm having a hard time
> > detecting what I'm doing that raises this error.
> >
> > Thanks in advance!
> >
> >
> >
>


Clarifications on getting flowfiles using FlowFileFilters

2019-03-27 Thread Eric Chaves
Hi folks,

I'd like to write an InvokeScriptedProcessor that inspects all files in the
incoming queue without actually processing/transferring them until a given
business condition is met.

So I looked for some examples of how I could get all flowfiles in a queue
using session.get(FlowFileFilter) and I would like to confirm that I'm
getting it right.

1) If I have more than one thread executing my processor calling
session.get(FlowFileFilter), can the same flowfile be presented to both
threads?

2) If I retrieve some flowfiles using session.get(FlowFileFilter), can I
add/modify their attributes but keep them in the current queue (i.e. set
attributes but don't transfer them to any relationship)?

3) Am I correct that session.get(FlowFileFilter) requires neither
session.commit nor session.rollback? Are those required only if I modify
the flowfile?

4) When I obtain a list of flowfiles using session.get(FlowFileFilter),
does it mark/lock/hide the selected files in any way?

5) I'm playing with this concept in a groovy script using an
InvokeScriptedProcessor and very often, after the second or third
onTrigger execution, I'm receiving an error: java.io.IOException:
All Partitions have been blacklisted due to failures when attempting to
update. If the Write-Ahead Log is able to perform a checkpoint, this issue
may resolve itself. Otherwise, manual intervention will be required. What
could be the source of this error? Could it be related to the use of
session.get(FlowFileFilter)? I'm having a hard time detecting what I'm
doing that raises this error.

Thanks in advance!


Re: How can I ExtractGrok from end-of-file?

2019-03-20 Thread Eric Chaves
Hi Koji, thanks for the tip. I ended up writing a processor script for this
task. Since I need the footer to validate the entire file, it was easier
than splitting the files for the sole purpose of finding the last line.
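
In case it helps someone else, the core of it is just a streaming read
that remembers the last non-blank line (rough Groovy sketch using the
ExecuteScript bindings):

import org.apache.nifi.processor.io.InputStreamCallback

def flowFile = session.get()
if (!flowFile) return
def lastLine = null
session.read(flowFile, { inputStream ->
    // stream the whole file, keeping only the most recent non-blank line
    inputStream.eachLine('UTF-8') { line -> if (line?.trim()) lastLine = line }
} as InputStreamCallback)
// lastLine now holds the trailer record; validate and route from here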

Regards,

On Wed, Mar 20, 2019 at 00:59, Koji Kawamura
wrote:

> Hello Eric,
>
> Have you found any solution for this?
> If your trailers (footer?) start with a certain byte sequence, then
> SplitContent may be helpful to split the content into Header+Payload,
> and the Trailers.
> If that works, then the subsequent flow can do something creative
> probably using RouteOnAttribute, GrokExtract, MergeContent (with
> defragment merge strategy) ... etc.
>
> Thanks,
> Koji
>
> On Fri, Mar 15, 2019 at 11:34 PM Eric Chaves  wrote:
> >
> > Hi folks,
> >
> > I'm learning how to use grok with nifi, starting with the ExtractGrok
> > processor, and I'd like to use it to extract data from file headers and
> > trailers. However, since the ExtractGrok processor only applies the
> > grok expression to the defined buffer size (and each of my files
> > differs in size), I can't evaluate trailers on every file.
> >
> > Any ideas on how I could apply the grok expression from the end of the
> > file instead of from the beginning, or any alternative processor?
> >
> > Cheers,
> >
>


Can't trim leading whitespaces with UpdateRecord

2019-03-17 Thread Eric Chaves
Hi Folks,

Forgive me for this dummy question, but I've been cracking my head for
hours.

Is there anything special about RecordPath regular expressions? I'm trying
to trim whitespace with UpdateRecord (with a replacement strategy of Record
Path Value) using replaceRegex over the current field value, but I can't
find any expression that works.

All my attempts produce output with the original value; however, simple
replacements like replaceRegex(/FULL_NAME, 'JON', 'ERIC') work. I've
already tried expressions like:

replaceRegex(/FULL_NAME,'\s', '' )
replaceRegex(/FULL_NAME,'\s+', '-' )
replaceRegex(/FULL_NAME,'\s{2,}', '-' )
replaceRegex(/SSN,'(\d+)', '$1' )
replaceRegex(/SSN,'\s+$', '' )

If I try the regex in a Java regex tester (like
https://www.freeformatter.com/java-regex-tester.html) all the expressions
match/replace as expected, but in the UpdateRecord processor they have no
effect.

What am I missing?

Thanks


Is it possible to declare an Avro schema for multi-record files?

2019-03-17 Thread Eric Chaves
Hi folks,

Is it possible to declare an Avro schema for a ConvertRecord processor to
handle a multi-record file, i.e. a file where each line may be a different
Avro record?

Something  like this:

{
  "type" : "record",
  "namespace" : "com.acme",
  "name" : "OrderFile",
  "fields" : [
  {
"type" : "record",
"namespace" : "com.acme",
"name" : "HeaderRecord",
"fields" : [
  {"name":"PNSTORE","type": "string"},
  {"name":"STORENAME",  "type": "string"},
  {"name":"EXTRACTIONDATE",   "type": "string"}
]
  },

  {
"type" : "record",
"namespace" : "com.acme",
"name" : "OrderRecord",
"fields" : [
{ "name": "SALESMAN", "type": "string" },
{ "name": "ORDER_NUMBER", "type": "string" },
{ "name": "DUE_DATE", "type": "string" },
{ "name": "ORDER_AMOUNT", "type": "long" }
]
  },

  {
"type" : "record",
"namespace" : "com.acme",
"name" : "TrailerRecord",
"fields" : [
  {"name":"TOTAL_RECORDS", "type": "long"},
  {"name":"TOTAL_AMOUNT", "type": "long"}
]
  }
  ]
}

Thanks in advance,

Eric


Export/Import flow files

2019-03-15 Thread Eric Chaves
Hi fellows,

I'd like to export a flow file from a queue running in production, in order
to import it into my local machine for development/troubleshooting.

Also, is it possible to pick a single flow file in a given queue and
process only that one?

Is there any way to do it?

Best regards,

Eric


How can I ExtractGrok from end-of-file?

2019-03-15 Thread Eric Chaves
Hi folks,

I'm learning how to use grok with nifi, starting with the ExtractGrok
processor, and I'd like to use it to extract data from file headers and
trailers. However, since the ExtractGrok processor only applies the grok
expression to the defined buffer size (and each of my files differs in
size), I can't evaluate trailers on every file.

Any ideas on how I could apply the grok expression from the end of the
file instead of from the beginning, or any alternative processor?

Cheers,


Re: Help with SSL Context Service for https post requests

2017-12-08 Thread Eric Chaves
Sure! I'm listing the log folder from inside the docker container;
logback.xml is attached.

nifi@5d3b7bd36ffd:/opt/nifi/nifi-1.4.0$ ls -al logs
total 162720
drwxr-xr-x  2 nifi nifi  4096 Dec  8 01:03 .
drwxr-xr-x 15 nifi nifi  4096 Dec  8 22:02 ..
-rw-r--r--  1 nifi nifi 0 Nov 30 15:25 .gitkeep
-rw-r--r--  1 nifi nifi602301 Dec  8 22:02 nifi-bootstrap.log
-rw-r--r--  1 nifi nifi 63400 Dec  7 16:35 nifi-bootstrap_2017-12-07.log
-rw-r--r--  1 nifi nifi 164964352 Dec  8 23:45 nifi-user.log
-rw-r--r--  1 nifi nifi970005 Dec  7 23:59 nifi-user_2017-12-07.log
nifi@5d3b7bd36ffd:/opt/nifi/nifi-1.4.0$

Regarding the error, the truststore is in a folder where the nifi user has
read access:

nifi@5d3b7bd36ffd:/opt/nifi/nifi-1.4.0$ ls -al ../assets
total 176
drwxr-xr-x 2 nifi nifi   4096 Dec  8 17:49 .
drwxr-xr-x 9 nifi nifi   4096 Dec  8 22:02 ..
-rw-r--r-- 1 nifi nifi  0 Dec  1 01:37 .gitkeep
-rw-r--r-- 1 nifi nifi   2255 Dec  8 02:04 mandril-send.json
-rw-r--r-- 1 nifi nifi302 Dec  7 02:41 sample.html
-rw-r--r-- 1 nifi nifi 163706 Dec  8 17:50 truststore.p12
nifi@5d3b7bd36ffd:/opt/nifi/nifi-1.4.0$


Am I required to set the keystore as well, or can I use just the truststore?

regards,


2017-12-08 20:53 GMT-02:00 Andy LoPresto <alopre...@apache.org>:

> That error could be thrown if the file does not have OS level permissions
> that allow the user running NiFi to read it. I’m a little surprised there
> is no nifi-app.log file, as that gets written to as soon as the application
> starts up. If you are able to configure a processor or controller service
> through the API / UI, that file should exist.
>
> Can you provide the contents of your $NIFI_HOME/conf/logback.xml file and
> a directory listing of $NIFI_HOME/logs?
>
>
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Dec 8, 2017, at 2:11 PM, Eric Chaves <e...@uolet.com> wrote:
>
> Hi Andy,
>
> The log from bulletin board is:
>
> PostHTTP[id=3253a78a-0160-1000-b7cf-6d7878f13efa] Unable to communicate
> with destination https://mandrillapp.com/api/1.0/messages/send.json to
> determine whether or not it can accept flowfiles/gzip; routing
> StandardFlowFileRecord[uuid=cffc2f1d-97cb-423f-9296-5e796fd49a99,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1512770613805-1,
> container=default, section=1], offset=15244, length=2260],offset=0,name=emails
> sample.csv,size=2260] to failure due to javax.net.ssl.SSLException:
> java.lang.RuntimeException: Unexpected error:
> java.security.InvalidAlgorithmParameterException: the trustAnchors
> parameter must be non-empty: java.lang.RuntimeException: Unexpected error:
> java.security.InvalidAlgorithmParameterException: the trustAnchors
> parameter must be non-empty
>
> For some reason that I couldn't investigate yet, my current nifi setup is
> not generating the nifi-app.log.
>
> Googling the error message suggests the cause is a missing truststore
> file, but I have the exported file in place, so I really don't know where
> else to look.
>
> Do you have any idea?
>
> Regards,
>
> Eric
>
> 2017-12-08 19:31 GMT-02:00 Andy LoPresto <alopre...@apache.org>:
>
>> Hi Eric,
>>
>> The truststore is a collection of trusted public key certificates. As you
>> noted, the /etc/ssl/ directory contains pre-loaded CA certificates to be
>> used for this. You can also use the JVM cacerts file, which is already in
>> JKS format.
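>>
>> A truststore-only StandardSSLContextService configuration along these
>> lines is usually enough (the path shown is a typical Linux JVM location;
>> the default cacerts password is changeit):
>>
>> Truststore Filename: /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/security/cacerts
>> Truststore Password: changeit
>> Truststore Type: JKS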
>>
>> If this isn’t sufficient, can you provide an error from the log or a
>> further description of the issue you’re encountering? Thanks.
>>
>> Andy LoPresto
>> alopre...@apache.org
>> alopresto.apa...@gmail.com
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>
>> On Dec 8, 2017, at 10:21 AM, Eric Chaves <e...@uolet.com> wrote:
>>
>> Hi,
>>
>> I'd like to make an HTTPS request to a public internet service but I'm
>> failing to set up the SSL Context Service. I tried to export my system
>> certs to be used as the truststore.
>>
>> openssl pkcs12 -export -nokeys -in /etc/ssl/certs/ca-certificates.crt
>> -out ./assets/truststore.p12
>>
>> Can someone help me out with a step-by-step?
>>
>> Thanks
>>
>>
>>
>
>





[attachment: logback.xml (stripped by the archive); it contained NiFi's
stock appender configuration rolling nifi-app.log to nifi-app_%d.log with
a 10-file history and a 100MB size cap]
Re: Help with SSL Context Service for https post requests

2017-12-08 Thread Eric Chaves
Hi Andy,

The log from bulletin board is:

PostHTTP[id=3253a78a-0160-1000-b7cf-6d7878f13efa] Unable to communicate
with destination https://mandrillapp.com/api/1.0/messages/send.json to
determine whether or not it can accept flowfiles/gzip; routing
StandardFlowFileRecord[uuid=cffc2f1d-97cb-423f-9296-5e796fd49a99,claim=StandardContentClaim
[resourceClaim=StandardResourceClaim[id=1512770613805-1, container=default,
section=1], offset=15244, length=2260],offset=0,name=emails
sample.csv,size=2260] to failure due to javax.net.ssl.SSLException:
java.lang.RuntimeException: Unexpected error:
java.security.InvalidAlgorithmParameterException: the trustAnchors
parameter must be non-empty: java.lang.RuntimeException: Unexpected error:
java.security.InvalidAlgorithmParameterException: the trustAnchors
parameter must be non-empty

For some reason that I couldn't investigate yet, my current nifi setup is
not generating the nifi-app.log.

Googling the error message suggests the cause is a missing truststore
file, but I have the exported file in place, so I really don't know where
else to look.

Do you have any idea?

Regards,

Eric

2017-12-08 19:31 GMT-02:00 Andy LoPresto <alopre...@apache.org>:

> Hi Eric,
>
> The truststore is a collection of trusted public key certificates. As you
> noted, the /etc/ssl/ directory contains pre-loaded CA certificates to be
> used for this. You can also use the JVM cacerts file, which is already in
> JKS format.
>
> If this isn’t sufficient, can you provide an error from the log or a
> further description of the issue you’re encountering? Thanks.
>
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Dec 8, 2017, at 10:21 AM, Eric Chaves <e...@uolet.com> wrote:
>
> Hi,
>
> I'd like to make an HTTPS request to a public internet service but I'm
> failing to set up the SSL Context Service. I tried to export my system
> certs to be used as the truststore.
>
> openssl pkcs12 -export -nokeys -in /etc/ssl/certs/ca-certificates.crt
> -out ./assets/truststore.p12
>
> Can someone help me out with a step-by-step?
>
> Thanks
>
>
>


Help with SSL Context Service for https post requests

2017-12-08 Thread Eric Chaves
Hi,

I'd like to make an HTTPS request to a public internet service but I'm
failing to set up the SSL Context Service. I tried to export my system
certs to be used as the truststore.

openssl pkcs12 -export -nokeys -in /etc/ssl/certs/ca-certificates.crt -out
./assets/truststore.p12

Can someone help me out with a step-by-step?

Thanks


Re: How can I change the content-type of a flow file in a script processor?

2017-12-08 Thread Eric Chaves
Thanks guys.

2017-12-08 13:27 GMT-02:00 Russell Bateman <r...@windofkeltia.com>:

> Eric,
>
> Just a reminder not to forget that this modifies the flowfile and you must
> gather/carry that modification in a practical sense. Therefore, your code
> will be
>
> flowfile = session.putAttribute( flowfile, "content-type",
> "application/json" );
>
> and thence you'll use the updated flowfile the next time you need it (such
> as when you call session.transfer( flowfile, SUCCESS ) ).
>
> Hope this reminder is useful,
>
> Russ
>
> On 12/07/2017 07:12 PM, Chris Herrera wrote:
>
> Hi Eric,
>
> If it's for the content viewer, you might want to use mime.type.
> Additionally, you can use session.putAttribute(flowFile, "content-type",
> "application/json");
>
> Regards,
> Chris
>
>
> On Dec 7, 2017, 8:10 PM -0600, Eric Chaves <e...@uolet.com>, wrote:
>
> Hi guys,
>
> I'm trying to programmatically set the content-type of a flow file by
> calling flowFile.putAttribute('content-type', 'application/json') but
> it's not working.
>
> How can I change my flow-file content-type?
>
> Regards,
>
> Eric
>
>
>


How can I change the content-type of a flow file in a script processor?

2017-12-07 Thread Eric Chaves
Hi guys,

I'm trying to programmatically set the content-type of a flow file by
calling flowFile.putAttribute('content-type', 'application/json') but it's
not working.

How can I change my flow-file content-type?

Regards,

Eric


Re: Can someone recommend a good Avro study guide for a newbie?

2017-11-23 Thread Eric Chaves
Hi Matt, thanks! As you said, the AbstractRecordProcessor was the missing
bit to get me started. There is one more thing I would like to learn,
though: how to create/manipulate schemas on the fly. Scouting some tests I
saw how to create a new basic schema, but I would like to understand how
to create a new schema from a given POJO, and how to add struct attributes
to an existing schema.

By the way, I'm testing and cleaning up my InvokeScriptedProcessor used to
enhance data with multiple SQL lookups, and I intend to share it here with
everyone. If you want, you can share it on your blog too.
Cheers,

Eric

2017-11-22 19:28 GMT-02:00 Matt Burgess <mattyb...@apache.org>:

> Eric,
>
> If you're looking for examples on implementing a scripted record
> reader/writer, you can see the unit test examples [1] or Drew Lim's
> blog [2].  However I suspect you are looking to leverage an existing
> RecordReader/Writer from a scripted processor such as ExecuteScript or
> InvokeScriptedProcessor, to prototype a "record-aware" processor.  A
> general approach is implemented in AbstractRecordProcessor[3], I don't
> believe the NiFi Scripting NAR has access to that class, but you can
> certainly copy-and-paste the relevant bits of code.  I don't think you
> can (easily) get access to an existing RecordReader/Writer controller
> service, but Mark Payne outlined the alternative in your mailing list
> question about getting a DBCPService property. The recommended
> approach is to use InvokeScriptedProcessor and specify your own
> RecordReader/Writer properties which will be presented to the user in
> the InvokeScriptedProcessor dialog (once the script has been applied).
> Then you can get the RecordReader/Writer objects (see [3] lines
> 100-101) and use them in a similar way as [3].
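>
> In Groovy that boils down to something like this (RECORD_READER and
> RECORD_WRITER being the descriptors your script declares):
>
> import org.apache.nifi.serialization.RecordReaderFactory
> import org.apache.nifi.serialization.RecordSetWriterFactory
>
> def readerFactory = context.getProperty(RECORD_READER).asControllerService(RecordReaderFactory)
> def writerFactory = context.getProperty(RECORD_WRITER).asControllerService(RecordSetWriterFactory)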
>
> This (and the answer(s) to your questions in the other mailing list
> post) would make excellent blog posts/articles, I hope to do some
> relatively soon, will keep the list posted :)
>
> Regards,
> Matt
>
> [1] https://github.com/apache/nifi/tree/master/nifi-nar-bundles/nifi-scripting-bundle/nifi-scripting-processors/src/test/resources/groovy
> [2] https://community.hortonworks.com/articles/115311/convert-csv-to-json-avro-xml-using-convertrecord-p.html
> [3] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractRecordProcessor.java#L94
>
>
> On Wed, Nov 22, 2017 at 3:00 PM, Eric Chaves <e...@uolet.com> wrote:
> > Hi folks,
> >
> > Due to my lack of knowledge of Avro, in my scripted processors I'm
> > required to convert records to JSON in order to be able to manipulate
> > them.
> >
> > I would like to avoid this and write record-based processors, but I
> > have no clue how I should actually read/write/modify record-based
> > content (including modifying the underlying schemas).
> >
> > Do you have any reading material to recommend?
> >
> > Thanks in advance,
> >
> > Eric
>


Can someone recommend a good Avro study guide for a newbie?

2017-11-22 Thread Eric Chaves
Hi folks,

Due to my lack of knowledge of Avro, in my scripted processors I'm
required to convert records to JSON in order to be able to manipulate them.

I would like to avoid this and write record-based processors, but I have
no clue how I should actually read/write/modify record-based content
(including modifying the underlying schemas).

Do you have any reading material to recommend?

Thanks in advance,

Eric


Is it possible to import a class from a NAR bundle in a scripted processor?

2017-11-20 Thread Eric Chaves
Hi folks,

I'm writing some groovy scripts that will use the AWS SDK. My first shot
was to write an InvokeScriptedProcessor that extends
AbstractAWSCredentialsProviderProcessor, but the script is unable to
resolve this class.

Is it possible to reference a class in a NAR bundle?

Regards,

Eric


Re: DBCP Error Cannot get a connection, pool error Timeout waiting for idle object

2017-11-16 Thread Eric Chaves
Ok, so after a bit more research I think I got it working. The trick was
that I wasn't closing the connection, hence exhausting the pool very
quickly. I didn't notice this before because most of the code I saw didn't
explicitly close the connection, but once I did, my scripted processor
worked as expected. I'm just finishing some rough points and once I'm done
I'll share it with everyone in a gist.
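
For anyone else who hits this, the essential change was just (sketch):

def con = dbcpService.getConnection()
try {
    // ... statements and result sets ...
} finally {
    con?.close()  // with DBCP, close() returns the connection to the pool
}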

Thanks everyone so far! =)

2017-11-16 13:49 GMT-02:00 Eric Chaves <e...@uolet.com>:

> Hi guys,
>
> I've made a lot of changes to my script processor and now I'm properly
> getting an instance of DBCPService; however, when I try to use the
> connection I get the error org.apache.commons.dbcp.SQLNestedException:
> Cannot get a connection, pool error Timeout waiting for idle object.
>
> I know the connection is ok because the upstream processor uses the same
> DBCP instance to access the same database.
>
> the code I'm using on my InvokeScriptedProcessor is:
>
>   void onTrigger(ProcessContext context, ProcessSessionFactory sessionFactory) throws ProcessException {
>     def session = sessionFactory.createSession()
>     def flowFile = session.get()
>     if (!flowFile) return
>     def properties = context.getProperties()
>     log.info("Properties {}", properties)
>     def String lookupKey = context.getProperty(LOOKUP_FIELD)?.evaluateAttributeExpressions()?.getValue()
>     log.info('Lookup key {}', lookupKey)
>     def dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService)
>     def Integer queryTimeout = context.getProperty(QUERY_TIMEOUT).asTimePeriod(TimeUnit.SECONDS).intValue()
>     try {
>       def con = dbcpService.getConnection()
>       def st = con.createStatement()
>       st.setQueryTimeout(queryTimeout)
>       def selectQuery = "select * from PERSONS where id=${lookupKey}"
>       log.info("Executing query {}", selectQuery)
>       boolean results = st.execute(selectQuery)
>       log.info("Results {}-{}", lookupKey, results)
>       session.transfer(flowFile, REL_SUCCESS)
>       session.commit()
>     } catch (final Throwable t) {
>       log.error('{} failed to process due to {}', [this, t] as Object[])
>       session.rollback(true)
>       throw t
>     }
>   }
>
> Any hints?
>


DBCP Error Cannot get a connection, pool error Timeout waiting for idle object

2017-11-16 Thread Eric Chaves
Hi guys,

I've made a lot of changes on my script processor and now I'm properly
getting an instance of DBCPService however when I try to use the connection
I got the error org.apache.commons.dbcp.SQLNestedException: Cannot get a
connection, pool error Timeout waiting for idle object.

I know the connection is ok because the upstream processor  uses the same
DBCP instance to access the same database.

the code I'm using on my InvokeScriptedProcessor is:

  void onTrigger(ProcessContext context, ProcessSessionFactory sessionFactory) throws ProcessException {
    def session = sessionFactory.createSession()
    def flowFile = session.get()
    if (!flowFile) return
    def properties = context.getProperties()
    log.info("Properties {}", properties)
    def String lookupKey = context.getProperty(LOOKUP_FIELD)?.evaluateAttributeExpressions()?.getValue()
    log.info('Lookup key {}', lookupKey)
    def dbcpService = context.getProperty(DBCP_SERVICE).asControllerService(DBCPService)
    def Integer queryTimeout = context.getProperty(QUERY_TIMEOUT).asTimePeriod(TimeUnit.SECONDS).intValue()
    try {
      def con = dbcpService.getConnection()
      def st = con.createStatement()
      st.setQueryTimeout(queryTimeout)
      def selectQuery = "select * from PERSONS where id=${lookupKey}"
      log.info("Executing query {}", selectQuery)
      boolean results = st.execute(selectQuery)
      log.info("Results {}-{}", lookupKey, results)
      session.transfer(flowFile, REL_SUCCESS)
      session.commit()
    } catch (final Throwable t) {
      log.error('{} failed to process due to {}', [this, t] as Object[])
      session.rollback(true)
      throw t
    }
  }

Any hints?


Re: How to get DBCP service inside ScriptedLookupService

2017-11-15 Thread Eric Chaves
Hi Folks, after thinking about my scripted components I decided to ditch
the ScriptedLookup in favor of writing an InvokeScriptedProcessor, which seems
more aligned with the proper use explained so far.

I've implemented the base script as outlined and added some properties to
my InvokeScriptedProcessor, but one of them is keeping the processor in an
invalid state, claiming that the property is invalid because it's not a
supported property. I've declared the property using the following code:

class GroovyProcessor implements Processor {

final static PropertyDescriptor LOOKUP_FIELD = new
PropertyDescriptor.Builder()
.name("lookup-field")
.displayName("Lookup field")
.description("Field used in lookup queries")
.dynamic(true) // have tried with both dynamic true/false with same
outcome.
.defaultValue("id")
.build()

...}

What is my mistake here?

Cheers,

Eric
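
PS: one guess after re-reading the InvokeScriptedProcessor examples: the
descriptor may also have to be returned from the Processor's
getPropertyDescriptors() method to count as "supported" — something like
this (untested):

class GroovyProcessor implements Processor {

    // ... LOOKUP_FIELD declared as above ...

    // Only the properties returned here are treated as supported properties;
    // declaring the field alone is not enough.
    @Override
    List<PropertyDescriptor> getPropertyDescriptors() {
        return [LOOKUP_FIELD]
    }
}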

2017-11-15 12:24 GMT-02:00 Eric Chaves <e...@uolet.com>:

> Matt, Mark, thanks for the great explanations! I'm learning a lot! :)
>
> So I went down the road described but I'm getting another error:
>  groovy.lang.MissingMethodException: No signature of method:
> org.apache.nifi.lookup.script.ScriptedLookupService$1.getProperty() is
> applicable for argument types: (org.apache.nifi.components.PropertyDescriptor)
> values: [PropertyDescriptor[Database Connection Pool Services]]
>
> Basically I declared a final static PropertyDescriptor DBCP_SERVICE and
> inside the initialize method I tried to get the DBCPService as outlined.
> Comparing my code to the QueryDatabaseTable processor I noticed that when QDT
> grabs the DBCPService instance its context is a ProcessContext, while in my
> GroovyLookupClass's initialize method the context is a
> ControllerServiceInitializationContext, so it seems I'm using the wrong
> object, right? Where should I perform the call
> context.getProperty(DBCP_SERVICE).asControllerService(DBCPService)?
>
> (https://github.com/apache/nifi/blob/c10ff574c4602fe05f5d1dae5eb0b1bd24026c02/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/QueryDatabaseTable.java#L191)
>
> A minor note for future reference by other users: it took me a while to
> get the PropertyDescriptor working because declaring it was not enough to
> make it show in the Properties dialog. I had to enable and then disable the
> ScriptedProcessor at least once to have it shown (guessing the code was
> not executed until then).
>
> Thanks again for all the support.
>
>
> 2017-11-14 15:52 GMT-02:00 Matt Burgess <mattyb...@apache.org>:
>
>> Mark,
>>
>> Good point, I forgot the ScriptedLookupService is itself a
>> ConfigurableComponent and can add its own properties. The original
>> script from my blog comes from ExecuteScript, where you can't define
>> your own properties. I was just trying to make that work instead of
>> thinking about the actual problem, d'oh!
>>
>> Eric, rather than trying to get at a DBCPConnectionPool defined
>> elsewhere, you can add a property from your ScriptedLookupService that
>> is itself a reference to DBCPService. Then the user will see in the
>> properties a dropdown list of DBCPConnectionPool instances, just like
>> the other processors that use them (ExecuteSQL, e.g.). Mark outlined
>> that approach, and it is definitely way better. Sorry for the wild
>> goose chase, although I guess it was only me that wasted my time :P
>> Guess it's time to add a new post using this technique instead!
>>
>> Thanks,
>> Matt
>>
>>
>> On Tue, Nov 14, 2017 at 12:43 PM, Mark Payne <marka...@hotmail.com>
>> wrote:
>> > Matt, Eric,
>> >
>> > The typical pattern that you would follow for obtaining a Controller
>> Service would be to
>> > return a property that uses the identifiesControllerService() method.
>> For example:
>> >
>> > static final PropertyDescriptor MYSQL_CONNECTION_POOL = new
>> PropertyDescriptor.Builder()
>> > .name("Connection Pool")
>> > .identifiesControllerService(DBCPService.class)
>> > .required(true)
>> > .build();
>> >
>> > Then, to obtain that controller service, you would access it as:
>> >
>> > final DBCPService connectionPoolService = context.getProperty(MYSQL_CONNECTION_POOL).asControllerService(DBCPService.class)
>> >
>> > This allows the user to explicitly choose which controller service that
>> they want to use.
>> >
>> > Attempting to obtain a Controller Service by name will certainly cause
>> some problems,
>> > as you have already learned :

Re: How to get DBCP service inside ScriptedLookupService

2017-11-15 Thread Eric Chaves
 >> On Nov 14, 2017, at 12:11 PM, Matt Burgess <mattyb...@apache.org>
> wrote:
> >>
> >> Eric,
> >>
> >> So I just learned ALOT about the bowels of the context and
> >> initialization framework while digging into this issue, and needless
> >> to say we will need a better way of making this available to scripts.
> >> Here's some info:
> >>
> >> 1) The ControllerServiceInitializationContext object passed into
> >> initialize() is an anonymous object that passes along the
> >> ScriptedLookupService's context objects, such as the
> >> ControllerServiceLookup.
> >> 2) The ControllerServiceLookup interface does not have a method
> >> signature for getControllerServiceIdentifiers(Class, String) to pass
> >> in the process group id.
> >> 3) The ControllerServiceLookup object returned by the
> >> ControllerServiceInitializationContext.getControllerServiceLookup()
> >> method is a StandardControllerServiceInitializationContext
> >> 4) Note that the context object passed into the initialize() method
> >> and the one returned by context.getControllerServiceLookup() are
> >> different (but both are ControllerServiceInitializationContext impls)
> >> 5) The StandardControllerServiceInitializationContext object contains
> >> a private ControllerServiceProvider called serviceProvider of type
> >> StandardControllerServiceProvider, the anonymous context object does
> >> not
> >> 6) The StandardControllerServiceInitializationContext object delegates
> >> the getControllerServiceIdentifiers(Class) method to the
> >> serviceProvider
> >> 7) serviceProvider (a StandardControllerServiceProvider) does not
> >> allow the call to the getControllerServiceIdentifiers(Class)
> >> signature, and instead throws the error you're seeing
> >> 8) None of these objects can get at the process group ID. This is
> >> because they are not associated with a ConfigurableComponent
> >> 9) ScriptedLookupService, after calling the script's initialize()
> >> method, will then call the script's onEnabled(ConfigurationContext)
> >> method if it exists. This is currently undocumented [1]
> >> 10) The script's onEnabled(ConfigurationContext) method will get a
> >> StandardConfigurationContext object
> >> 11) The StandardConfigurationContext object has a private
> >> ConfiguredComponent named component, it is actually a
> >> StandardControllerServiceNode object
> >> 12) You can get the process group ID by calling the component's
> >> getProcessGroupIdentifier() method
> >> 13) The StandardConfigurationContext object also has a private
> >> ControllerServiceLookup named serviceLookup, it is actually a
> >> StandardControllerServiceProvider object
> >> 14) Since we can get a process group ID from #11-12, we can now call
> >> the supported method on the ControllerServiceProvider interface,
> >> namely getControllerServiceIdentifiers(Class, String)
> >> 15) Getting at private members (#11 & #13) is allowed in Groovy, but
> >> IIRC only works if you don't have a security manager/policies on the
> >> JVM.
> >>
> >> TL;DR You can't currently get controller services by name in the
> >> initialize() method, you have to implement onEnabled instead. If you
> >> want to use logging, however, you'll need to save off the logger in
> >> the initialize() method. Here's a working version of onEnabled:
> >>
> >> void onEnabled(ConfigurationContext context) {
> >>lookup = context.serviceLookup
> >>processGroupId = context.component?.processGroupIdentifier
> >>/* Get sql-connection */
> >>def dbcpServiceId =
> >> lookup.getControllerServiceIdentifiers(ControllerService,
> >> processGroupId).find {
> >>  cs -> lookup.getControllerServiceName(cs) == 'MySQLConnectionPool'
> >>}
> >>def conn = lookup.getControllerService(dbcpServiceId)?.getConnection()
> >>  }
> >>
> >> Hope this helps. I will think some more on how to make everything
> >> fluid and legit -- Mark Payne, could use your help here :)
> >>
> >> Regards,
> >> Matt
> >>
> >> On Tue, Nov 14, 2017 at 6:13 AM, Eric Chaves <e...@uolet.com> wrote:
> >>> Hi Folks,
> >>>
> >>> I need to get an instance of DBCPService inside my
> ScriptedLookupService and
> >>> for that I'm following Matt's post
> >>> http://funnifi.blogspot.com.br/2016/04/sql-in-nifi-with-executescript.html
> >>>
> >>> In my Groovy class I've overridden the initialize method and perform the
> >>> lookup there, but I'm getting the following error:
> >>>
> >>> java.lang.UnsupportedOperationException: Cannot obtain Controller
> Service
> >>> Identifiers for service type interface
> >>> org.apache.nifi.controller.ControllerService without providing a
> Process
> >>> Group Identifier
> >>>
> >>>
> >>> @Override
> >>>  void initialize(ControllerServiceInitializationContext context)
> throws
> >>> InitializationException {
> >>>log = context.logger
> >>>/* Get sql-connection */
> >>>def lookup = context.controllerServiceLookup
> >>>def dbcpServiceId =
> >>> lookup.getControllerServiceIdentifiers(ControllerService).find {
> >>>  cs -> lookup.getControllerServiceName(cs) == 'MySQLConnectionPool'
> >>>}
> >>>conn = lookup.getControllerService(dbcpServiceId)?.getConnection()
> >>>log.info("sql conn {}", conn)
> >>>  }
> >>>
> >>> Is there another way to find service identifiers?
> >>>
> >>> Regards,
> >
>


How to get DBCP service inside ScriptedLookupService

2017-11-14 Thread Eric Chaves
Hi Folks,

I need to get an instance of DBCPService inside my ScriptedLookupService
and for that I'm following Matt's post
http://funnifi.blogspot.com.br/2016/04/sql-in-nifi-with-executescript.html

In my Groovy class I've overridden the initialize method and perform the
lookup there, but I'm getting the following error:

*java.lang.UnsupportedOperationException: Cannot obtain Controller Service
Identifiers for service type interface
org.apache.nifi.controller.ControllerService without providing a Process
Group Identifier*


@Override
  void initialize(ControllerServiceInitializationContext context) throws
InitializationException {
log = context.logger
/* Get sql-connection */
def lookup = context.controllerServiceLookup
def dbcpServiceId =
lookup.getControllerServiceIdentifiers(ControllerService).find {
  cs -> lookup.getControllerServiceName(cs) == 'MySQLConnectionPool'
}
conn = lookup.getControllerService(dbcpServiceId)?.getConnection()
log.info("sql conn {}", conn)
  }

Is there another way to find service identifiers?

Regards,


Re: How to make a lookup service return a Record

2017-11-12 Thread Eric Chaves
Thanks Mike! I've followed your suggestion and also took some code from
Test_record_reader_inline.groovy (
https://github.com/jjhbeloved/nifi/blob/c70643ab5ea822c8648b77b2216f8e23b9d0ae0b/nifi-nar-bundles/nifi-scripting-bundle/nifi-scripting-processors/src/test/resources/groovy/test_record_reader_inline.groovy
).
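
For future readers, the result looks roughly like this (a sketch from memory;
the key-based lookup signature matches the StringLookupService examples
elsewhere on this list, while newer NiFi versions pass a Map of coordinates
instead):

import org.apache.nifi.lookup.LookupService
import org.apache.nifi.serialization.SimpleRecordSchema
import org.apache.nifi.serialization.record.MapRecord
import org.apache.nifi.serialization.record.Record
import org.apache.nifi.serialization.record.RecordField
import org.apache.nifi.serialization.record.RecordFieldType

class GroovyRecordLookupService implements LookupService<Record> {

  // Schema of the returned record; a real service would build this from
  // the underlying data source instead of hard-coding it.
  static final def SCHEMA = new SimpleRecordSchema([
    new RecordField('city', RecordFieldType.STRING.dataType),
    new RecordField('country', RecordFieldType.STRING.dataType)
  ])

  Optional<Record> lookup(String key) {
    // Hard-coded values for illustration only
    return Optional.ofNullable(new MapRecord(SCHEMA, [city: 'Paris', country: 'FR']))
  }
}

lookupService = new GroovyRecordLookupService()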



2017-11-12 21:43 GMT-02:00 Mike Thomsen <mikerthom...@gmail.com>:

> Take a look at the MongoDBLookupService. It should give you a template to
> work from:
>
> https://github.com/apache/nifi/blob/9a8e6b2eb150865361dda241d71405
> c5a969f5e8/nifi-nar-bundles/nifi-standard-services/nifi-
> mongodb-services-bundle/nifi-mongodb-services/src/main/
> java/org/apache/nifi/mongodb/MongoDBLookupService.java
>
> On Sun, Nov 12, 2017 at 4:45 PM, Eric Chaves <e...@uolet.com> wrote:
>
>> Hi Folks,
>>
>> I've been cracking my head before asking but couldn't figure out how to
>> make a ScriptedLookupService (Groovy) return either a Record or a Map. So
>> far I have only been able to return single values.
>>
>> Can anyone point me in the right direction?
>>
>> Cheers,
>>
>
>


How to make a lookup service return a Record

2017-11-12 Thread Eric Chaves
Hi Folks,

I've been cracking my head before asking but couldn't figure out how to
make a ScriptedLookupService (Groovy) return either a Record or a Map. So
far I have only been able to return single values.

Can anyone point me in the right direction?

Cheers,


Re: Enrichment

2017-11-05 Thread Eric Chaves
Hi Rahul, I'm working on something similar and my first shot was writing a
ScriptedLookup service to perform database lookups. Not sure if this is the
recommended approach, but it is working.

In case you want to give it a try, you can take a look at Matt's
"Scripting Cookbook" articles to get the hang of NiFi's scripting
capabilities and then use Andy LoPresto's Groovy lookup as a starting point:

-
https://community.hortonworks.com/articles/75032/executescript-cookbook-part-1.html
- https://gist.github.com/alopresto/78eb1a2c2b878f75f61481269af38a9f
- https://gist.github.com/alopresto/beb62a15c82c6d68528474085b1a9610

Best regards,

Eric

2017-11-05 0:08 GMT-02:00 Malhotra, Rahul :

> Hi,
> I am trying to enrich my data by looking up in a dimension table and
> moving from one table to another. My warehouse is Vertica, is there a
> recommended approach we can take? Or is it something we have to write as a
> custom process?
>
>
> Thanks,
>
> Rahul
>


Re: Enrichment flow using ScriptedLookup

2017-11-05 Thread Eric Chaves
Hi folks, thanks for the explanations! I've opened a JIRA ticket describing
these issues plus a few more that I have faced since I started learning NiFi (
https://issues.apache.org/jira/browse/NIFI-4569). It got a little verbose,
so forgive me and feel free to ignore it. ;)

Most of my setbacks I attribute to my lack of experience with Java/Groovy
and not really to flaws in NiFi.

@Matt any chance of a future post series "service scripting cookbook"? ;)
Jokes aside, it would be a great asset to have a post or document showcasing
the most common service interfaces.

Thanks again for the help!

2017-11-05 14:14 GMT-02:00 Matt Burgess <mattyb...@apache.org>:

> Eric,
>
> Because LookupService implements ControllerService, you must implement
> initialize(ControllerServiceInitializationContext context), which
> Andy's script provides an empty body for. However that context object
> has a method called getLogger() on it, so you can override the
> initialize() method and save off the logger for later use:
>
> class GroovyLookupService implements StringLookupService {
>
> def log
>
> @Override
> void initialize(ControllerServiceInitializationContext context) throws
> InitializationException {
> log = context.logger
> }
>
> // Other stuff
> }
>
> I admit it's a bit awkward to get at, but the ScriptedLookupService is
> more like InvokeScriptedProcessor (ISP) than ExecuteScript; the latter
> has a "log" binding so you can make immediate use of it; the former is
> more like a full implementation of the interface, so you usually get
> the logger a different way (i.e. an init method of some kind). Having
> said that, the ISP-based scripting components also attempt to call
> setLogger(ComponentLog logger) on your object if that method exists
> and the script engine is Invocable (Groovy is).
>
> I can look at adding the "log" object as a binding in all scripting
> components, I chose not to for the ISP-based processors in order to
> allow more flexibility with how/when logging is done (i.e. "log" is a
> global logger but ISP-based components have class definitions possibly
> with their own logger). But if it is easier in these cases to just
> have a "log" object to use, then that's fine with me, please feel free
> to mention that in the Jira.
>
> Regarding the awkwardness of setting log values, it might be nice to
> have a script (either in NiFi or available for download somewhere) to
> be able to update those without having to scour the logback.xml.  Next
> time I have a few minutes I'll whip up a Groovy script to try it out.
> BTW if you didn't want to mark all processors as INFO (just your
> ScriptedLookupService), you can add a separate entry in logback.xml
> for the fully-qualified class name
> (org.apache.nifi.lookup.script.ScriptedLookupService in this case) and
> just set that level to INFO. Then the other processors would remain at
> WARN and not "clog the log" while you're looking for something
> specific.  I'll make sure my script handles that (I envision a Groovy
> script that loads in the whole logback DOM, finds the node if it
> exists and update the level, or add a node if it does not, then writes
> it back out to logback.xml).
>
> Regards,
> Matt
>
>
>
> On Sun, Nov 5, 2017 at 9:45 AM, Joe Witt <joe.w...@gmail.com> wrote:
> > eric
> >
> > can you please file a JIRA to reflect the awkward process you had to
> > go through so we can improve it.
> >
> > Thanks
> >
> > On Sun, Nov 5, 2017 at 9:37 AM, Eric Chaves <e...@uolet.com> wrote:
> >> Ok, I managed to make it work. I had to add "log.dir=./logs" to the
> >> bootstrap.conf file in order to have the logs generated. I got a little
> >> lost because this property is not mentioned in the docs, but I could
> >> figure it out by reading the logback.xml.
> >>
> >> Once the logs began to be generated I could see that an exception was being
> >> raised in my scripts because I was trying to access "log", which does not
> >> seem to be available in service controllers. Once I removed the offending
> >> log line, the code worked. Bottom line: my attempt to debug the code was
> >> actually breaking it. ;)
> >>
> >> Thanks!
> >>
> >> 2017-11-05 12:14 GMT-02:00 Mike Thomsen <mikerthom...@gmail.com>:
> >>>
> >>> They go to logs/nifi-app.log. I think <logger
> >>> name="org.apache.nifi.processors" level="INFO"/> is actually WARN by
> >>> default. You'll need to tinker with the log levels (it's not straight
> >>> enable/disable) to get that working.

Re: Enrichment flow using ScriptedLookup

2017-11-05 Thread Eric Chaves
Ok, I managed to make it work. I had to add "log.dir=./logs" to the
bootstrap.conf file in order to have the logs generated. I got a little
lost because this property is not mentioned in the docs, but I could figure
it out by reading the logback.xml.

Once the logs began to be generated I could see that an exception was being
raised in my scripts because I was trying to access "log", which does not
seem to be available in service controllers. Once I removed the offending
log line, the code worked. Bottom line: my attempt to debug the code was
actually breaking it. ;)
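
For reference, the two config changes boil down to this (assuming the stock
conf files shipped in the image):

# conf/bootstrap.conf -- without this no log files are written at all
log.dir=./logs

<!-- conf/logback.xml, per Matt's tip in this thread: raise the level for
     just the scripted service class instead of all processors -->
<logger name="org.apache.nifi.lookup.script.ScriptedLookupService" level="INFO"/>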

Thanks!

2017-11-05 12:14 GMT-02:00 Mike Thomsen <mikerthom...@gmail.com>:

> They go to logs/nifi-app.log. I think <logger name="org.apache.nifi.processors"
> level="INFO"/> is actually WARN by default. You'll need to tinker with the
> log levels (it's not straight enable/disable) to get that working.
>
> On Sun, Nov 5, 2017 at 8:54 AM, Eric Chaves <e...@uolet.com> wrote:
>
>> Hi Mike, I'm running NiFi using the official docker image (version 1.4.0)
>> and my logs folder is empty. I was looking at bootstrap.conf, logback.xml
>> and nifi.properties but couldn't find any config value that might
>> enable/disable logging. Where should those logs be going?
>>
>> 2017-11-04 12:55 GMT-02:00 Mike Thomsen <mikerthom...@gmail.com>:
>>
>>> You may need to update the logback xml file in the conf folder. There is
>>> a line in there for the processor package. Might be too high for info.
>>>
>>> On Sat, Nov 4, 2017 at 10:50 AM Eric Chaves <e...@uolet.com> wrote:
>>>
>>>> Hi folks,
>>>>
>>>> I'm trying to adapt the flow described at
>>>> https://community.hortonworks.com/articles/138632/data-flow-enrichment-with-nifi-lookuprecord-proces.html
>>>> using ScriptedLookupService as a replacement for SimpleKeyValueLookupService
>>>> to look up city names and enrich an incoming record.
>>>>
>>>> When I ran the flow with KeyValueLookupService the field gets enriched
>>>> properly, but when I use my scripted lookup the value always comes back as
>>>> null. The script was pretty simple and I can't figure out where my error
>>>> is. I also tried the ScriptedLookup (just the script, not the flow) by
>>>> Andy LoPresto at https://gist.github.com/alopresto/78eb1a2c2b878f75f61481269af38a9f
>>>> with the same results.
>>>>
>>>> I'm trying to log.info the execution to figure out my mistakes but the
>>>> logs are going nowhere. How can I enable logging for services?
>>>>
>>>> Does anyone spot an error?
>>>>
>>>> ---[service-lookup.groovy]---
>>>> import org.apache.nifi.lookup.StringLookupService
>>>>
>>>> class GroovyLookupService implements StringLookupService {
>>>>
>>>>   def lookupTable = [
>>>> '1': 'Paris',
>>>> '2': 'Lyon',
>>>> '3': 'Marseille',
>>>> '4': 'Toulouse',
>>>> '5': 'Nice'
>>>>   ]
>>>>
>>>> @Override
>>>> Optional lookup(final String key) {
>>>>   log.warn('key value: ', key)
>>>>   return Optional.ofNullable(lookupTable[key])
>>>> }
>>>> }
>>>>
>>>> lookupService = new GroovyLookupService()
>>>> ---
>>>>
>>>>
>>
>


Re: Enrichment flow using ScriptedLookup

2017-11-05 Thread Eric Chaves
Hi Mike, I'm running NiFi using the official docker image (version 1.4.0)
and my logs folder is empty. I was looking at bootstrap.conf, logback.xml
and nifi.properties but couldn't find any config value that might
enable/disable logging. Where should those logs be going?

2017-11-04 12:55 GMT-02:00 Mike Thomsen <mikerthom...@gmail.com>:

> You may need to update the logback xml file in the conf folder. There is a
> line in there for the processor package. Might be too high for info.
>
> On Sat, Nov 4, 2017 at 10:50 AM Eric Chaves <e...@uolet.com> wrote:
>
>> Hi folks,
>>
>> I'm trying to adapt the flow described at
>> https://community.hortonworks.com/articles/138632/data-flow-enrichment-with-nifi-lookuprecord-proces.html
>> using ScriptedLookupService as a replacement for SimpleKeyValueLookupService
>> to look up city names and enrich an incoming record.
>>
>> When I ran the flow with KeyValueLookupService the field gets enriched
>> properly, but when I use my scripted lookup the value always comes back as
>> null. The script was pretty simple and I can't figure out where my error
>> is. I also tried the ScriptedLookup (just the script, not the flow) by
>> Andy LoPresto at https://gist.github.com/alopresto/78eb1a2c2b878f75f61481269af38a9f
>> with the same results.
>>
>> I'm trying to log.info the execution to figure out my mistakes but the
>> logs are going nowhere. How can I enable logging for services?
>>
>> Does anyone spot an error?
>>
>> ---[service-lookup.groovy]---
>> import org.apache.nifi.lookup.StringLookupService
>>
>> class GroovyLookupService implements StringLookupService {
>>
>>   def lookupTable = [
>> '1': 'Paris',
>> '2': 'Lyon',
>> '3': 'Marseille',
>> '4': 'Toulouse',
>> '5': 'Nice'
>>   ]
>>
>> @Override
>> Optional lookup(final String key) {
>>   log.warn('key value: ', key)
>>   return Optional.ofNullable(lookupTable[key])
>> }
>> }
>>
>> lookupService = new GroovyLookupService()
>> ---
>>
>>


What would be a good way to build NoSQL entity from SQL data?

2017-11-02 Thread Eric Chaves
Hi fellows,

I'm planning a flow to feed data from a relational database into
Elasticsearch and would like to ask for some advice.

The database has an entity Person with a 1:N relation to Email and Phone
(email and phone are objects, not just scalar values).

The flow needs to feed ES with a JSON object representing the Person
and its Emails and Phones (like below), and when any table record gets
updated the flow needs to "refresh" the JSON entity to update the
corresponding ES document.

-- Person.json
{
  "name": "Eric",
  "age": 43,
  "phones": [{"number": "11--", "qualified": false, "updatedAt": "2017-11-29 10:01:13"}],
  "emails": [{"address": "e...@domain.org", "qualified": true, "updatedAt": "2017-11-10 09:12:35"}],
  "updatedAt": "2017-11-30 17:29:53"
}

My first thought was to write a flow that monitors changes on the Person,
Email and Phone tables using the QueryDatabaseTable processor and, once a
change is detected, to route the record (which always contains a PersonId)
to a processor that performs multiple queries to assemble the Person JSON.
So far the only way I could find to do it is through script processors.
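
Roughly what I have in mind for the scripted part (connection details plus
table and column names are invented for illustration):

import groovy.json.JsonOutput
import groovy.sql.Sql

// Hypothetical JDBC settings and schema, for illustration only
def sql = Sql.newInstance('jdbc:mysql://localhost/crm', 'user', 'pass', 'com.mysql.jdbc.Driver')
def personId = 42

def person = sql.firstRow('SELECT name, age, updatedAt FROM Person WHERE id = ?', [personId])
def phones = sql.rows('SELECT number, qualified, updatedAt FROM Phone WHERE personId = ?', [personId])
def emails = sql.rows('SELECT address, qualified, updatedAt FROM Email WHERE personId = ?', [personId])
sql.close()

// GroovyRowResults serialize naturally as JSON objects
def json = JsonOutput.toJson([
  name     : person.name,
  age      : person.age,
  phones   : phones,
  emails   : emails,
  updatedAt: person.updatedAt?.toString()
])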

Given that there are a lot of new record-based processors with schema
support, I was wondering if there is a better approach or an idiomatic way
of performing this kind of "SQL lookup" in NiFi.

What do you think?

Cheers,

Eric


Re: How to move queue boxes

2017-10-23 Thread Eric Chaves
Thanks Pierre, that's what I was looking for. By the way, congrats on your
blog. I've been an avid reader of yours and Matt's.
Regards!

2017-10-23 10:14 GMT-02:00 Pierre Villard <pierre.villard...@gmail.com>:

> Hi Eric,
>
> You can add a bending point by double clicking on the black line of the
> relationship (and remove a bending point by double clicking on it).
>
> Pierre
>
> 2017-10-23 11:38 GMT+02:00 Eric Chaves <e...@uolet.com>:
>
>> Hi, my flow has a lot of processors coming out of a single router
>> processor, which makes the diagram a mess, with queue boxes overlapping
>> each other.
>>
>> Sometimes when I connect two processors, the queue box connecting them has
>> a small yellow square that allows moving the box around, producing angled
>> lines, but most of the time the connection is a straight line.
>>
>> Is it possible to move only the queue box? How can I force this yellow
>> square to show up?
>>
>> Thanks
>>
>
>


Re: How to move queue boxes

2017-10-23 Thread Eric Chaves
Hi Mark, thanks for the clarification!

Cheers!

2017-10-23 11:04 GMT-02:00 Mark Payne <marka...@hotmail.com>:

> Hi Eric,
>
> Just to elaborate - once you double-click on the connection, it will
> create a new bend point, as Pierre noted. This allows you to
> then bend the line however you'd like. In addition to this, though, once a
> bend point has been created, you can click on a
> connection's label and drag it to any one of the bend points. So if they
> get a little cluttered, you can create those bend points,
> if needed, as a mechanism for moving just the label itself, even if you
> don't want to bend the line.
>
> Thanks
> -Mark
>
>
> On Oct 23, 2017, at 8:14 AM, Pierre Villard <pierre.villard...@gmail.com>
> wrote:
>
> Hi Eric,
>
> You can add a bending point by double clicking on the black line of the
> relationship (and remove a bending point by double clicking on it).
>
> Pierre
>
> 2017-10-23 11:38 GMT+02:00 Eric Chaves <e...@uolet.com>:
>
>> Hi, my flow has a lot of processors coming out of a single router
>> processor, which makes the diagram a mess, with queue boxes overlapping
>> each other.
>>
>> Sometimes when I connect two processors, the queue box connecting them has
>> a small yellow square that allows moving the box around, producing angled
>> lines, but most of the time the connection is a straight line.
>>
>> Is it possible to move only the queue box? How can I force this yellow
>> square to show up?
>>
>> Thanks
>>
>
>
>


How to move queue boxes

2017-10-23 Thread Eric Chaves
Hi, my flow has a lot of processors coming out of a single router processor,
which makes the diagram a mess, with queue boxes overlapping each other.

Sometimes when I connect two processors, the queue box connecting them has a
small yellow square that allows moving the box around, producing angled
lines, but most of the time the connection is a straight line.

Is it possible to move only the queue box? How can I force this yellow
square to show up?

Thanks


Re: Advice on sending high volumes of email with putEmail

2017-10-22 Thread Eric Chaves
Hi Joe, thanks for the input. I'm now "fighting" to build a multipart MIME
message (just sent another message to the list), where javax.mail gave me an
exception about not being able to handle multipart/related messages
(UnsupportedDataTypeException: no object DCH for MIME type).

I assumed that since NiFi already has NAR files using some Java packages,
importing those packages would be easy inside a script processor, but this
is not turning out to be the case (at least for a novice like me).

Any ideas on how to fix this?

Regards.

2017-10-22 15:58 GMT-02:00 Joe Witt <joe.w...@gmail.com>:

> that rate should be perfectly fine on a modest host system.
>
> On Oct 21, 2017 12:40 PM, "Eric Chaves" <e...@uolet.com> wrote:
>
>> Hi Folks,
>>
>> I'm planning to use nifi to send transactional e-mails with PDF attached.
>> I'll be dispatching around 10K messages per day with peaks up to 1k per
>> hour. Do you think nifi can handle this task?
>>
>> Also, any advices on sizing and dimension of the nifi node for this task?
>>
>> Thanks in advance,
>>
>> Eric
>>
>


Help with UnsupportedDataTypeException: no object DCH for MIME type

2017-10-22 Thread Eric Chaves
Hi folks, after some trial and error (I'm not a Java developer) I think I
managed to write a Groovy script to build a multipart MIME message, but now
I'm stuck on an exception that I have no clue what to do about:

"javax.activation.UnsupportedDataTypeException: no object DCH for MIME type
multipart/related;
boundary="=_Part_0_80548387.1508693688989""

From what I could find googling around, this relates to the fact that
javax.mail requires some setup to find out how to handle MIME types (
https://stackoverflow.com/questions/21856211/javax-activation-unsupporteddatatypeexception-no-object-dch-for-mime-type-multi/21898970)
and adding code like this would fix it (except that it didn't):

MailcapCommandMap mc = (MailcapCommandMap) CommandMap.getDefaultCommandMap();
mc.addMailcap("text/html;; x-java-content-handler=com.sun.mail.handlers.text_html");
mc.addMailcap("text/xml;; x-java-content-handler=com.sun.mail.handlers.text_xml");
mc.addMailcap("text/plain;; x-java-content-handler=com.sun.mail.handlers.text_plain");
mc.addMailcap("multipart/*;; x-java-content-handler=com.sun.mail.handlers.multipart_mixed");
mc.addMailcap("message/rfc822;; x-java-content-handler=com.sun.mail.handlers.message_rfc822");

I downloaded both the javax.mail and javax.activation jar files and added
them to my ExecuteScript processor's module folder.
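
One more thing I plan to try, since this smells like a classloader issue
under NiFi's script engine: javax.activation resolves its DataContentHandlers
via the thread context classloader, which may not see the javax.mail handlers
here. An untested sketch:

def previous = Thread.currentThread().contextClassLoader
try {
  // point the context classloader at the script's own classloader for the send
  Thread.currentThread().contextClassLoader = this.class.classLoader
  Transport.send(message)
} finally {
  Thread.currentThread().contextClassLoader = previous
}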

Could any Groovy/Java ninja here help me out?

Thanks in advance,

Eric


Can I import javax in a script processor?

2017-10-21 Thread Eric Chaves
Hi folks,

I'm writing a script processor to send email messages with an attachment.
After reading the PutEmail processor and googling a little bit I came up
with the attached script; however, my script throws exceptions when importing
any javax.mail classes (e.g. unable to resolve class
javax.mail.util.ByteArrayDataSource).

Can I import javax in a scripted processor?

Thanks.

ps: below is my current script code.

import org.apache.nifi.controller.ControllerService
import org.apache.nifi.lookup.StringLookupService
import org.apache.nifi.processor.io.InputStreamCallback // needed for session.read below

import javax.activation.DataHandler
import javax.mail.Authenticator
import javax.mail.Message
import javax.mail.Message.RecipientType
import javax.mail.MessagingException
import javax.mail.PasswordAuthentication
import javax.mail.Session
import javax.mail.Transport
import javax.mail.internet.AddressException
import javax.mail.internet.InternetAddress
import javax.mail.internet.MimeBodyPart
import javax.mail.internet.MimeMessage
import javax.mail.internet.MimeMultipart
import javax.mail.internet.PreencodedMimeBodyPart
import javax.mail.util.ByteArrayDataSource

def lookup = context.controllerServiceLookup
// SmtpServiceName is a dynamic property defined on the ExecuteScript processor
def smtpServiceId = lookup.getControllerServiceIdentifiers(ControllerService).find { cs ->
    lookup.getControllerServiceName(cs) == SmtpServiceName.value
}

def flowFile = session.get()
if (flowFile == null) return

def hostname = lookup.getControllerService(smtpServiceId)?.lookup(['key': 'hostname']).get()
def username = lookup.getControllerService(smtpServiceId)?.lookup(['key': 'username']).get()
def password = lookup.getControllerService(smtpServiceId)?.lookup(['key': 'password']).get()
def port = lookup.getControllerService(smtpServiceId)?.lookup(['key': 'port']).get() // assumes the same lookup service also stores the port
def subject = Subject.evaluateAttributeExpressions(flowFile).value
def from = From.evaluateAttributeExpressions(flowFile).value
def to = To.evaluateAttributeExpressions(flowFile).value
def html = HtmlMessage.evaluateAttributeExpressions(flowFile).value
def bcc = Bcc.evaluateAttributeExpressions(flowFile).value
//def text = TextMessage ? TextMessage.evaluateAttributeExpressions(flowFile).value : null

log.info("Mailer info: hostname=${hostname}, username=${username}, subject=${subject}, from=${from}, to=${to}, bcc=${bcc}")

try {
    Properties props = new Properties()
    props.setProperty("mail.transport.protocol", "smtp")
    props.setProperty("mail.smtp.host", hostname)
    props.setProperty("mail.smtp.port", port)
    // named mailSession so it does not shadow NiFi's ProcessSession binding ("session")
    Session mailSession = Session.getDefaultInstance(props, null)
    MimeMessage message = new MimeMessage(mailSession)
    message.setFrom(new InternetAddress(from))
    message.setRecipients(RecipientType.TO, to) // the String overload parses the address list
    message.setRecipients(RecipientType.BCC, bcc)
    message.setHeader('X-Mailer', 'nifi mailer 1.0')
    message.setSubject(subject)

    MimeMultipart multipart = new MimeMultipart()
    MimeBodyPart htmlPart = new MimeBodyPart()
    htmlPart.setContent(html, "text/html; charset=utf-8")
    multipart.addBodyPart(htmlPart)

    MimeBodyPart pdfPart = new MimeBodyPart()
    session.read(flowFile, { inputStream ->
        pdfPart.setDataHandler(new DataHandler(new ByteArrayDataSource(inputStream, "application/pdf")))
        pdfPart.setFileName("invoice.pdf")
    } as InputStreamCallback)

    if (pdfPart.getSize()) { // note: getSize() can be -1 when the size is unknown
        multipart.addBodyPart(pdfPart)
    }

    message.setContent(multipart)
    Transport.send(message)

    session.transfer(flowFile, REL_SUCCESS)
} catch (Exception ex) {
    log.error("Failed to send message: ${ex}", ex)
    flowFile = session.putAttribute(flowFile, 'Error', ex.toString())
    session.transfer(flowFile, REL_FAILURE)
}


Formatting string numbers and date

2017-10-21 Thread Eric Chaves
Hi, is it possible to format numbers and dates with NiFi's Expression
Language?

Best regards,
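
PS: for dates I got as far as chaining toDate/format, e.g.

${created_at:toDate('yyyy-MM-dd HH:mm:ss'):format('dd/MM/yyyy')}

(created_at is just an example attribute name), but I couldn't find an
equivalent for formatting decimals.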


Advice on sending high volumes of email with putEmail

2017-10-21 Thread Eric Chaves
Hi Folks,

I'm planning to use NiFi to send transactional e-mails with PDFs attached.
I'll be dispatching around 10K messages per day with peaks of up to 1k per
hour. Do you think NiFi can handle this task?

Also, any advice on sizing and dimensioning the NiFi node for this task?

Thanks in advance,

Eric


Re: Parsing stringified json content

2017-10-09 Thread Eric Chaves
Hi James, thanks for the help!
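
For anyone finding this later, the setup James describes below boils down to
this EvaluateJsonPath configuration ("message" is just my dynamic property
name):

  Destination   flowfile-content
  Return Type   json
  message       $.Message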

2017-10-09 0:31 GMT-03:00 James Wing <jvw...@gmail.com>:

> Eric,
>
> You can use the EvaluateJsonPath processor to extract the SNS message body
> using a JSON path of "$.Message".  As part of evaluating the path, it will
> un-escape the stringified content of Message and return the enclosed JSON
> content.
>
> If you want this to be in a new, separate flowfile from the original SNS
> body, I recommend duplicating the flowfile upstream of EvaluateJsonPath.
> It is easy to update the flowfile content to the "Message" content with
> EvaluateJsonPath by setting the Destination property to flowfile-content.
>
> Thanks,
>
> James
>
>
> On Sun, Oct 8, 2017 at 3:38 PM, Eric Chaves <e...@uolet.com> wrote:
>
>> Hi, I'm writing a flow to handle SNS notification messages received
>> through HTTP. Those messages are JSON content whose Message attribute is a
>> stringified JSON text.
>>
>> How can I parse this content into a new flow file?
>>
>> Thanks in advance,
>>
>> Eric
>>
>
>


Parsing stringified json content

2017-10-08 Thread Eric Chaves
Hi, I'm writing a flow to handle SNS notification messages received through
HTTP. Those messages are JSON content whose Message attribute is a
stringified JSON text.

How can I parse this content into a new flow file?

Thanks in advance,

Eric


Re: tips for flow development and version control

2017-10-08 Thread Eric Chaves
Hi Joe, thanks for the promptly reply.

I believe that what I'm looking for is really much simpler than the
versioned flows.

What I need is to be able to commit after making some changes to my flow.
A template could work, but it would require me to always perform an export
before committing, which is kind of a nag. Can I somehow script/automate
this export?
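
Something along these lines is what I had in mind, assuming the REST API
exposes template listing and download the way I read the 1.x docs:

# list templates to find the id, then download one as XML
curl -s http://localhost:8080/nifi-api/flow/templates
curl -s http://localhost:8080/nifi-api/templates/<template-id>/download -o my-flow.xml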

Another possibility would be to keep the flow uncompressed instead of
storing it as a tar.gz. Is this possible?

Once again thanks for the help.

Regards,

Eric


2017-10-06 9:07 GMT-03:00 Joe Witt <joe.w...@gmail.com>:

> Eric,
>
> The current mechanism we support for this sort of thing is the
> template mechanism [1].
>
> However, there is a lot of effort now going into building a registry
> model for versioned flows and extensions. [2]  There has also been a
> JIRA instance launched for tracking progress of the registry itself
> [3].  In recent releases the foundations for this were put in place
> with process group hosted/scoped variables (for expression language)
> and versioned extension support.
>
> The template approach doesn't integrate with Git directly but you can
> build a process around that.  They are useful but there are important
> shortcomings that the registry approach will solve.  I don't know if
> there is a JIRA to have direct git integration or not yet but you
> could certainly create one if interested.
>
> [1] https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#templates
> [2] https://cwiki.apache.org/confluence/display/NIFI/Configuration+Management+of+Flows
> [3] https://issues.apache.org/jira/projects/NIFIREG
>
> Thanks
> Joe
>
> On Fri, Oct 6, 2017 at 7:55 AM, Eric Chaves <e...@uolet.com> wrote:
> > Hi folks, I'm starting with NiFi and would like to ask for some tips
> > regarding how to best set up/manage a project folder with multiple flows
> > using git.
> >
> > Do you have any tips to share?
> >
> > Cheers,
> >
> > Eric
>


tips for flow development and version control

2017-10-06 Thread Eric Chaves
Hi folks, I'm starting with NiFi and would like to ask for some tips
regarding how to best set up/manage a project folder with multiple flows
using git.

Do you have any tips to share?

Cheers,

Eric