Re: [ANNOUNCE] New Apache NiFi Committer Sivaprasanna Sethuraman

2018-06-06 Thread Jorge Machado
Congrats ! 

Jorge





> On 6 Jun 2018, at 18:44, Otto Fowler  wrote:
> 
> Congratulations!
> 
> 
> On June 5, 2018 at 10:09:28, Tony Kurc (tk...@apache.org) wrote:
> 
> On behalf of the Apache NiFi PMC, I am very pleased to announce that
> Sivaprasanna has accepted the PMC's invitation to become a committer on the
> Apache NiFi project. We greatly appreciate all of Sivaprasanna's hard work
> and generous contributions to the project. We look forward to continued
> involvement in the project.
> 
> Sivaprasanna has been working with the community on the mailing lists, and
> has contributed a big mix of code and features, including improvements
> to cloud service integrations like Azure, AWS, and Google Cloud.
> 
> Welcome and congratulations!



Re: KerberosProperties.validatePrincipalAndKeytab Error ?

2018-06-04 Thread Jorge Machado
Ok, that makes sense. It was really confusing because with all other processors, 
when they say they support expression language, it works on the incoming flow files. 
Thanks for the heads up.

Jorge Machado


> On 4 Jun 2018, at 14:34, Mark Payne  wrote:
> 
> Jorge,
> 
> These properties do support Expression Language. However, they do not support 
> evaluating FlowFile Attributes,
> only values from the Variable Registry. So this is going to be invalid unless 
> you define a variable in the variable
> registry for both the Principal and the Keytab.
> 
> Unfortunately, in past versions of NiFi, this was not made very clear, when a 
> property supported Expression Language
> but only against the Variable Registry. Fortunately, in the next version, 
> 1.7.0, the UI will clearly indicate whether Expression
> Language can be evaluated against FlowFile Attributes or only the Variable 
> Registry, so it should help to clear up
> some of this confusion.
> 
> Thanks
> -Mark
> 
> 
>> On Jun 4, 2018, at 6:43 AM, Jorge Machado  wrote:
>> 
>> Sivaprasanna, 
>> Yes, I have that property set. The unit test is failing; IMHO it should not.
>> 
>> 
>> Jorge Machado
>> Best Regards
>> 
>> 
>>> On 4 Jun 2018, at 12:09, Sivaprasanna  wrote:
>>> 
>>> Jorge,
>>> 
>>> Both 'Kerberos Principal' and 'Kerberos Keytab' support NiFi expression 
>>> language, so ${principal} and ${keytab} are valid here. Can you check if the 
>>> property "nifi.kerberos.krb5.file" is set in the nifi.properties file? It looks 
>>> like this has to be set, according to the description of those properties.
>>> 
>>> -
>>> Sivaprasanna
>>> 
>>> On Mon, Jun 4, 2018 at 1:27 PM, Jorge Machado <jom...@me.com> wrote:
>>> Hi Guys, 
>>> 
>>> I’m facing the issue that I cannot start DeleteHDFS, with the error: 
>>> "Kerberos Principal must be provided when using a secure configuration"
>>> 
>>> 
>>> I’m able to reproduce this with this test: 
>>> @Test
>>> public void testKerberosOptionsWithCredentialServices() throws Exception {
>>>     SimpleHadoopProcessor processor = new SimpleHadoopProcessor(kerberosProperties);
>>>     TestRunner runner = TestRunners.newTestRunner(processor);
>>> 
>>>     // initialize the runner with EL for the kerberos properties
>>>     runner.setProperty(AbstractHadoopProcessor.HADOOP_CONFIGURATION_RESOURCES, "${variableHadoopConfigResources}");
>>>     runner.setProperty(kerberosProperties.getKerberosPrincipal(), "${variablePrincipal}");
>>>     runner.setProperty(kerberosProperties.getKerberosKeytab(), "${variableKeytab}");
>>> 
>>>     // add variables for all the kerberos properties except for the keytab
>>>     runner.setVariable("variableHadoopConfigResources", "src/test/resources/core-site-security.xml");
>>>     runner.assertValid();
>>> }
>>> In our case the ${principal} and ${keytab} are coming as attributes on the 
>>> incoming FlowFile; the problem is that this validation happens before the 
>>> attributes are available. 
>>> Should this work like this? Everywhere else, if we are using a variable, 
>>> it can be evaluated at run time... 
>>> 
>>> Jorge Machado
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
> 



Re: KerberosProperties.validatePrincipalAndKeytab Error ?

2018-06-04 Thread Jorge Machado
Sivaprasanna, 
Yes, I have that property set. The unit test is failing; IMHO it should not.


Jorge Machado
Best Regards


> On 4 Jun 2018, at 12:09, Sivaprasanna  wrote:
> 
> Jorge,
> 
> Both 'Kerberos Principal' and 'Kerberos Keytab' support NiFi expression 
> language, so ${principal} and ${keytab} are valid here. Can you check if the 
> property "nifi.kerberos.krb5.file" is set in the nifi.properties file? It looks like 
> this has to be set, according to the description of those properties.
> 
> -
> Sivaprasanna
> 
> On Mon, Jun 4, 2018 at 1:27 PM, Jorge Machado <jom...@me.com> wrote:
> Hi Guys, 
> 
> I’m facing the issue that I cannot start DeleteHDFS, with the error: 
> "Kerberos Principal must be provided when using a secure configuration"
> 
> 
> I’m able to reproduce this with this test: 
> @Test
> public void testKerberosOptionsWithCredentialServices() throws Exception {
>     SimpleHadoopProcessor processor = new SimpleHadoopProcessor(kerberosProperties);
>     TestRunner runner = TestRunners.newTestRunner(processor);
> 
>     // initialize the runner with EL for the kerberos properties
>     runner.setProperty(AbstractHadoopProcessor.HADOOP_CONFIGURATION_RESOURCES, "${variableHadoopConfigResources}");
>     runner.setProperty(kerberosProperties.getKerberosPrincipal(), "${variablePrincipal}");
>     runner.setProperty(kerberosProperties.getKerberosKeytab(), "${variableKeytab}");
> 
>     // add variables for all the kerberos properties except for the keytab
>     runner.setVariable("variableHadoopConfigResources", "src/test/resources/core-site-security.xml");
>     runner.assertValid();
> }
> In our case the ${principal} and ${keytab} are coming as attributes on the 
> incoming FlowFile; the problem is that this validation happens before the 
> attributes are available. 
> Should this work like this? Everywhere else, if we are using a variable, it 
> can be evaluated at run time... 
> 
> Jorge Machado
> 
> 
> 
> 
> 
> 



Re: Disable all Remote Processors

2018-06-04 Thread Jorge Machado
Hi Pierre, 

Sorry for the late response. Yes, that is the idea. In our case we have a lot 
of RPGs, and it is tedious to have to go through every single one and click 
disable.

Thanks for the response anyway.




> On 15 May 2018, at 11:50, Pierre Villard  wrote:
> 
> Hi Jorge,
> 
> I'm not sure to understand. You'd like something to disable communication
> on all the RPG included in a PG (recursively I assume)?
> Not sure that's worth a specific API endpoint as I believe this can be
> scripted and done with the currently exposed APIs, no?
> 
> Am I missing something?
> 
> Pierre
> 
> 2018-05-15 9:23 GMT+02:00 Jorge Machado :
> 
>> hi all,
>> 
>> it should be possible to disable all remote process groups inside a
>> process group, right?
>> Should we start with a PR for the API?
>> 
>> Jorge Machado
>> 
>> 
>> 
>> 
>> 
>> 
>> 
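Pierre's point that this can be scripted with the currently exposed APIs could look roughly like the sketch below. It assumes the NiFi 1.x REST endpoints `GET /process-groups/{id}/remote-process-groups` and `PUT /remote-process-groups/{id}`; the HTTP wiring is an untested illustration (no auth handling), not a drop-in tool.

```python
import json
import urllib.request


def build_disable_payload(rpg):
    # Build the PUT body that turns off transmission for one RPG entity,
    # echoing back the entity's current revision as the API requires.
    return {
        "revision": rpg["revision"],
        "component": {"id": rpg["id"], "transmitting": False},
    }


def disable_all_rpgs(base_url, pg_id):
    # List every remote process group directly inside the process group,
    # then PUT each one back with transmitting=false.
    url = f"{base_url}/process-groups/{pg_id}/remote-process-groups"
    with urllib.request.urlopen(url) as resp:
        rpgs = json.load(resp)["remoteProcessGroups"]
    for rpg in rpgs:
        body = json.dumps(build_disable_payload(rpg)).encode("utf-8")
        req = urllib.request.Request(
            f"{base_url}/remote-process-groups/{rpg['id']}",
            data=body,
            headers={"Content-Type": "application/json"},
            method="PUT",
        )
        urllib.request.urlopen(req)
```

Recursing into child process groups would follow the same pattern via `/process-groups/{id}/process-groups`.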



Disable all Remote Processors

2018-05-15 Thread Jorge Machado
Hi all, 

it should be possible to disable all remote process groups inside a 
process group, right?  
Should we start with a PR for the API? 

Jorge Machado








Pushing flows to Registry with Sensitive Information

2018-04-25 Thread Jorge Machado
Hi Guys, 

So I was playing with the Registry, and if I push a processor that has 
sensitive information, like a password, it is discarded when pulling it from 
the Registry, which is fine.

Now comes the but: if I put a variable there instead, IMHO it should be saved 
in the Registry.

What do you think? 

Jorge 







Nifi UI improvement

2018-04-12 Thread Jorge Machado
Hi guys, 

Is there any effort to improve the UI? One nice feature to have would be the 
possibility to limit searches on the canvas. Let’s say we are inside a 
process group and we only want to search from there on.

Regards

Jorge Machado








Re: FlattenJson

2018-03-23 Thread Jorge Machado
So I’m pretty lost now; all the suggestions from Matt will not solve my problem 
that I need to have all contents of a flow file as key-value attribute pairs… 

A good place to have this would be on ConvertAvroToJSON, so that it has an option 
to say whether the result goes to attributes or to the FlowFile content, defaulting 
to content.

Would the change be accepted? I would create a PR for it. 


Jorge Machado





> On 20 Mar 2018, at 22:35, Otto Fowler <ottobackwa...@gmail.com> wrote:
> 
> We could start with routeOnJsonPath and do the record path as the need
> arises?
> 
> 
> On March 20, 2018 at 16:06:34, Matt Burgess (mattyb...@apache.org) wrote:
> 
> Rather than restricting it to JSONPath, perhaps we should have a
> RouteOnRecordPath or RouteRecord using the RecordPath API? Even better
> would be the ability to use RecordPath functions in QueryRecord, but
> that involves digging into Calcite as well. I realize JSONPath might
> have more capabilities than RecordPath at the moment, but it seems a
> shame to force the user to convert to JSON to use a "RouteOnJSONPath"
> processor, the record-aware processors are meant to replace that kind
> of format-specific functionality.
> 
> Regards,
> Matt
> 
> On Tue, Mar 20, 2018 at 12:19 PM, Sivaprasanna
> <sivaprasanna...@gmail.com> wrote:
>> Like the idea that Otto suggested. RouteOnJSONPath makes more sense since
>> making the flattened JSON write to attributes is restricted to that
>> processor alone.
>> 
>> On Tue, Mar 20, 2018 at 8:37 PM, Otto Fowler <ottobackwa...@gmail.com>
>> wrote:
>> 
>>> Why not create a new processor that does routeOnJSONPath and works on
> the
>>> flow file?
>>> 
>>> 
>>> On March 20, 2018 at 10:39:37, Jorge Machado (jom...@me.com) wrote:
>>> 
>>> So that is what we actually are doing with EvaluateJsonPath; the problem with
>>> that is that it is hard to build something generic if we need to specify each
>>> property by its name. That’s why this idea.
>>> 
>>> Should I make a PR for this, or is this too business-specific?
>>> 
>>> 
>>> Jorge Machado
>>> 
>>>> On 20 Mar 2018, at 15:30, Bryan Bende <bbe...@gmail.com> wrote:
>>>> 
>>>> Ok so I guess it depends whether you end up needing all 30 fields as
>>>> attributes to achieve the logic in your flow, or if you only need a
>>>> couple.
>>>> 
>>>> If you only need a couple you could probably use EvaluateJsonPath
>>>> after FlattenJson to extract just the couple of fields you need into
>>>> attributes.
>>>> 
>>>> If you need them all then I guess it makes sense to want the option to
>>>> flatten into attributes.
>>>> 
>>>> On Tue, Mar 20, 2018 at 10:14 AM, Jorge Machado <jom...@me.com> wrote:
>>>>> From there on we use a lot of RouteOnAttribute and use those values in
>>>>> SQL queries to other tables, like select * from someTable where
>>>>> id=${myExtractedAttribute}
>>>>> To be honest I tried JoltTransformJSON but I could not get it working :)
>>>>> 
>>>>> Jorge Machado
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 20 Mar 2018, at 15:12, Matt Burgess <mattyb...@apache.org> wrote:
>>>>>> 
>>>>>> I think Bryan is asking about what happens AFTER this part of the
>>>>>> flow. For example, if you are doing routing you can use QueryRecord
>>>>>> (and you won't need the SplitJson), if you are doing transformations
>>>>>> you can use JoltTransformJSON (often without SplitJson as well),
> etc.
>>>>>> 
>>>>>> Regards,
>>>>>> Matt
>>>>>> 
>>>>>> On Tue, Mar 20, 2018 at 10:08 AM, Jorge Machado <jom...@me.com>
> wrote:
>>>>>>> Hi Bryan,
>>>>>>> 
>>>>>>> thanks for the help.
>>>>>>> Our Flow: ExecuteSql -> convertToJSON -> SplitJson -> ExecuteScript
>>>>>>> with attached code 1.
>>>>>>> 
>>>>>>> We are now writing a custom processor that does this, which is a copy
>>>>>>> of FlattenJson, but instead of putting the result into a flowfile we put it
>>>>>>> into the attributes.
>>>>>>> That’s why I asked if it makes sense to contribute this back
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>

Re: Stackoverflow question: Moving data from one RDB to another through NiFi

2018-03-22 Thread Jorge Machado
That would probably be the fastest way. 

Jorge Machado
www.jmachado.me





> On 22 Mar 2018, at 08:38, Brett Ryan <brett.r...@gmail.com> wrote:
> 
> Hmmm, now I’m doubting myself. It’s possible we sqoop to hdfs then sqoop
> out, will have to look, sorry if I am wrong.
> 
> On Thu, 22 Mar 2018 at 18:27, Jorge Machado <jom...@me.com> wrote:
> 
>> Hi Brett, sure? Database to database, or with a step in between? Can
>> you paste the command that you use? That would be new to me and I would be
>> interested.
>> 
>> Jorge Machado
>> 
>> 
>> 
>> 
>> 
>>> On 22 Mar 2018, at 08:22, Brett Ryan <brett.r...@gmail.com> wrote:
>>> 
>>> Sure it does, I’m using it for postgres and MariaDB (which is
>> essentially MySQL).
>>> 
>>>> On 22 Mar 2018, at 18:18, Jorge Machado <jom...@me.com> wrote:
>>>> 
>>>> Sqoop does not import into a MySQL database, just into Hive if you tell it to do so.
>>>> You could use NiFi, but if you have a lot of data maybe you should try Spark, which reads and writes in parallel.
>>>> Using NiFi would work too, but you have the overhead of pumping the data over “insert”, unless you copy the files onto the server and then use some bulk import…
>>>> 
>>>> Jorge Machado
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On 22 Mar 2018, at 08:13, Brett Ryan <brett.r...@gmail.com> wrote:
>>>>> 
>>>>> Could Sqoop [1] be an option?
>>>>> 
>>>>> [1]: http://sqoop.apache.org/
>>>>> 
>>>>>> On 22 Mar 2018, at 16:33, Sivaprasanna <sivaprasanna...@gmail.com>
>> wrote:
>>>>>> 
>>>>>> I had a chance to attempt a question raised on Stack Overflow regarding
>>>>>> moving data from SQL Server to MySQL using NiFi. The user is using
>>>>>> GenerateTableFetch to read data from SQL Server and then tries to use the
>>>>>> LOAD DATA command in ExecuteSQL, but this involves writing the read SQL
>>>>>> Server data to the filesystem and then loading it, which is a performance
>>>>>> hit. I suggested the user try PutDatabaseRecord, but I have never tried the
>>>>>> approach myself, and going by the docs, I think it won't show any
>>>>>> performance benefit over LOAD DATA, because LOAD DATA reads from a file and
>>>>>> inserts at high speed while PutDatabaseRecord reads the content, parses it
>>>>>> according to the configured Record Reader, and inserts the rows as a single
>>>>>> batch. Confused, I wanted to get the community's opinion/thoughts on this.
>>>>>> Please attempt the questions, if you have better suggestions.
>>>>>> 
>>>>>> Links:
>>>>>> 
>>>>>> -
>>>>>> 
>> https://stackoverflow.com/questions/49400447/bulk-load-sql-server-data-into-mysql-apache-nifi?noredirect=1#comment85843021_49400447
>>>>>> -
>>>>>> 
>> https://stackoverflow.com/questions/49380307/flowfile-absolute-path-nifi/49398500?noredirect=1#comment85805848_49398500
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Sivaprasanna
>>>> 
>> 
>> 



Re: Stackoverflow question: Moving data from one RDB to another through NiFi

2018-03-22 Thread Jorge Machado
Sqoop does not import into a MySQL database, just into Hive if you tell it to 
do so. 
You could use NiFi, but if you have a lot of data maybe you should try Spark, 
which reads and writes in parallel. 
Using NiFi would work too, but you have the overhead of pumping the data over 
“insert”, unless you copy the files onto the server and then use some bulk 
import…

Jorge Machado
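The Spark suggestion above boils down to partitioned JDBC reads and writes. A minimal sketch of how the parallel slices could be derived; the `spark.read.jdbc` usage in the comment is a hedged illustration with placeholder URLs, table names, and credentials:

```python
def partition_predicates(column, lower, upper, num_partitions):
    # Split [lower, upper) into disjoint WHERE clauses so that each Spark
    # task reads one slice of the source table in parallel.
    step = (upper - lower + num_partitions - 1) // num_partitions
    preds = []
    start = lower
    while start < upper:
        end = min(start + step, upper)
        preds.append(f"{column} >= {start} AND {column} < {end}")
        start = end
    return preds


# Hypothetical PySpark usage (placeholders throughout):
#
# df = spark.read.jdbc(src_url, "some_table",
#                      predicates=partition_predicates("id", 0, 1_000_000, 8),
#                      properties={"user": "...", "password": "..."})
# df.write.jdbc(dst_url, "some_table", mode="append",
#               properties={"user": "...", "password": "..."})
```

Each predicate becomes one partition, so reads and writes run with that degree of parallelism instead of a single insert stream.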





> On 22 Mar 2018, at 08:13, Brett Ryan <brett.r...@gmail.com> wrote:
> 
> Could Sqoop [1] be an option?
> 
>  [1]: http://sqoop.apache.org/
> 
>> On 22 Mar 2018, at 16:33, Sivaprasanna <sivaprasanna...@gmail.com> wrote:
>> 
>> I had a chance to attempt a question raised on Stack Overflow regarding
>> moving data from SQL Server to MySQL using NiFi. The user is using
>> GenerateTableFetch to read data from SQL Server and then tries to use the LOAD
>> DATA command in ExecuteSQL, but this involves writing the read SQL Server
>> data to the filesystem and then loading it, which is a performance hit. I
>> suggested the user try PutDatabaseRecord, but I have never tried the
>> approach myself, and going by the docs, I think it won't show any
>> performance benefit over LOAD DATA, because LOAD DATA reads from a file and
>> inserts at high speed while PutDatabaseRecord reads the content, parses it
>> according to the configured Record Reader, and inserts the rows as a single
>> batch. Confused, I wanted to get the community's opinion/thoughts on this.
>> Please attempt the questions, if you have better suggestions.
>> 
>> Links:
>> 
>>  -
>>  
>> https://stackoverflow.com/questions/49400447/bulk-load-sql-server-data-into-mysql-apache-nifi?noredirect=1#comment85843021_49400447
>>  -
>>  
>> https://stackoverflow.com/questions/49380307/flowfile-absolute-path-nifi/49398500?noredirect=1#comment85805848_49398500
>> 
>> Thanks,
>> 
>> Sivaprasanna



Re: FlattenJson

2018-03-20 Thread Jorge Machado
So that is what we actually are doing with EvaluateJsonPath; the problem with that 
is that it is hard to build something generic if we need to specify each property 
by its name. That’s why this idea. 

Should I make a PR for this, or is this too business-specific? 


Jorge Machado

> On 20 Mar 2018, at 15:30, Bryan Bende <bbe...@gmail.com> wrote:
> 
> Ok so I guess it depends whether you end up needing all 30 fields as
> attributes to achieve the logic in your flow, or if you only need a
> couple.
> 
> If you only need a couple you could probably use EvaluateJsonPath
> after FlattenJson to extract just the couple of fields you need into
> attributes.
> 
> If you need them all then I guess it makes sense to want the option to
> flatten into attributes.
> 
> On Tue, Mar 20, 2018 at 10:14 AM, Jorge Machado <jom...@me.com> wrote:
>> From there on we use a lot of RouteOnAttribute and use those values in SQL 
>> queries to other tables, like select * from someTable where 
>> id=${myExtractedAttribute}
>> To be honest I tried JoltTransformJSON but I could not get it working :)
>> 
>> Jorge Machado
>> 
>> 
>> 
>> 
>> 
>>> On 20 Mar 2018, at 15:12, Matt Burgess <mattyb...@apache.org> wrote:
>>> 
>>> I think Bryan is asking about what happens AFTER this part of the
>>> flow. For example, if you are doing routing you can use QueryRecord
>>> (and you won't need the SplitJson), if you are doing transformations
>>> you can use JoltTransformJSON (often without SplitJson as well), etc.
>>> 
>>> Regards,
>>> Matt
>>> 
>>> On Tue, Mar 20, 2018 at 10:08 AM, Jorge Machado <jom...@me.com> wrote:
>>>> Hi Bryan,
>>>> 
>>>> thanks for the help.
>>>> Our Flow: ExecuteSql -> convertToJSON -> SplitJson -> ExecuteScript with 
>>>> attached code 1.
>>>> 
>>>> We are now writing a custom processor that does this, which is a copy of 
>>>> FlattenJson, but instead of putting the result into a flowfile we put it 
>>>> into the attributes.
>>>> That’s why I asked if it makes sense to contribute this back
>>>> 
>>>> 
>>>> 
>>>> Attached code 1:
>>>> 
>>>> import org.apache.commons.io.IOUtils
>>>> import java.nio.charset.*
>>>> def flowFile = session.get();
>>>> if (flowFile == null) {
>>>>   return;
>>>> }
>>>> def slurper = new groovy.json.JsonSlurper()
>>>> def attrs = [:] as Map<String,String>
>>>> session.read(flowFile,
>>>>   { inputStream ->
>>>>   def text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
>>>>   def obj = slurper.parseText(text)
>>>>   obj.each {k,v ->
>>>>   if(v!=null && v.toString()!=""){
>>>> attrs[k] = v.toString()
>>>> }
>>>>   }
>>>>   } as InputStreamCallback)
>>>> flowFile = session.putAllAttributes(flowFile, attrs)
>>>> session.transfer(flowFile, REL_SUCCESS)
>>>> 
>>>> some code removed
>>>> 
>>>> 
>>>> Jorge Machado
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On 20 Mar 2018, at 15:03, Bryan Bende <bbe...@gmail.com> wrote:
>>>>> 
>>>>> Ok it is still not clear what the reason for needing it in attributes
>>>>> is though... Is there another processor you are using after this that
>>>>> only works off attributes?
>>>>> 
>>>>> Just trying to understand if there is another way to accomplish what
>>>>> you want to do.
>>>>> 
>>>>> On Tue, Mar 20, 2018 at 9:50 AM, Jorge Machado <jom...@me.com> wrote:
>>>>>> We are using NiFi for workflow, and we get columns from a database, like
>>>>>> job_status and job_name, plus some nested JSON columns (30 columns).
>>>>>> We need to put them as attributes on the FlowFile, not in the content.
>>>>>> The first part (columns without JSON) is done by a Groovy script, but
>>>>>> it would be nice to use this standard processor and, instead of writing
>>>>>> this to the flow content, write it to attributes.
>>>>>> 
>>>>>> 
>>>>>> Jorge Machado
>>>>>> 
>>>>>> 
>>>>>> 

Re: FlattenJson

2018-03-20 Thread Jorge Machado
From there on we use a lot of RouteOnAttribute and use those values in SQL 
queries to other tables, like select * from someTable where 
id=${myExtractedAttribute}
To be honest I tried JoltTransformJSON but I could not get it working :) 

Jorge Machado





> On 20 Mar 2018, at 15:12, Matt Burgess <mattyb...@apache.org> wrote:
> 
> I think Bryan is asking about what happens AFTER this part of the
> flow. For example, if you are doing routing you can use QueryRecord
> (and you won't need the SplitJson), if you are doing transformations
> you can use JoltTransformJSON (often without SplitJson as well), etc.
> 
> Regards,
> Matt
> 
> On Tue, Mar 20, 2018 at 10:08 AM, Jorge Machado <jom...@me.com> wrote:
>> Hi Bryan,
>> 
>> thanks for the help.
>> Our Flow: ExecuteSql -> convertToJSON -> SplitJson -> ExecuteScript with 
>> attached code 1.
>> 
>> We are now writing a custom processor that does this, which is a copy of 
>> FlattenJson, but instead of putting the result into a flowfile we put it into 
>> the attributes.
>> That’s why I asked if it makes sense to contribute this back
>> 
>> 
>> 
>> Attached code 1:
>> 
>> import org.apache.commons.io.IOUtils
>> import java.nio.charset.*
>> 
>> def flowFile = session.get();
>> if (flowFile == null) {
>>     return;
>> }
>> def slurper = new groovy.json.JsonSlurper()
>> def attrs = [:] as Map<String, String>
>> session.read(flowFile, { inputStream ->
>>     def text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
>>     def obj = slurper.parseText(text)
>>     obj.each { k, v ->
>>         if (v != null && v.toString() != "") {
>>             attrs[k] = v.toString()
>>         }
>>     }
>> } as InputStreamCallback)
>> flowFile = session.putAllAttributes(flowFile, attrs)
>> session.transfer(flowFile, REL_SUCCESS)
>> 
>> some code removed
>> 
>> 
>> Jorge Machado
>> 
>> 
>> 
>> 
>> 
>>> On 20 Mar 2018, at 15:03, Bryan Bende <bbe...@gmail.com> wrote:
>>> 
>>> Ok it is still not clear what the reason for needing it in attributes
>>> is though... Is there another processor you are using after this that
>>> only works off attributes?
>>> 
>>> Just trying to understand if there is another way to accomplish what
>>> you want to do.
>>> 
>>> On Tue, Mar 20, 2018 at 9:50 AM, Jorge Machado <jom...@me.com> wrote:
>>>> We are using NiFi for workflow, and we get columns from a database, like job_status 
>>>> and job_name, plus some nested JSON columns (30 columns).
>>>> We need to put them as attributes on the FlowFile, not in the content. 
>>>> The first part (columns without JSON) is done by a Groovy script, but 
>>>> it would be nice to use this standard processor and, instead of writing 
>>>> this to the flow content, write it to attributes.
>>>> 
>>>> 
>>>> Jorge Machado
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On 20 Mar 2018, at 14:47, Bryan Bende <bbe...@gmail.com> wrote:
>>>>> 
>>>>> What would be the main use case for wanting all the flattened values
>>>>> in attributes?
>>>>> 
>>>>> If the reason was to keep the original content, we could probably just
>>>>> add an 'original' relationship.
>>>>> 
>>>>> Also, I think FlattenJson supports flattening a flow file where the
>>>>> root is an array of JSON documents (although I'm not totally sure), so
>>>>> you'd have to consider what to do in that case.
>>>>> 
>>>>> On Tue, Mar 20, 2018 at 5:26 AM, Pierre Villard
>>>>> <pierre.villard...@gmail.com> wrote:
>>>>>> No, I do see how this could be convenient in some cases. My comment was
>>>>>> more: you can certainly submit a PR for that feature, but it'll need to 
>>>>>> be
>>>>>> clearly documented using the appropriate annotations, documentation, and
>>>>>> property descriptions.
>>>>>> 
>>>>>> 2018-03-20 10:20 GMT+01:00 Jorge Machado <jom...@me.com>:
>>>>>> 
>>>>>>> Hi Pierre, I’m aware of that. So This means the change would not be
>>>>>>> accepted correct ?
>>>>>>> 
>>>>>>> Regards
>>>>>>> 
>>>>>>> Jorge Machado
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On 20 Mar 2018, at 09:54, Pierre Villard <pierre.villard...@gmail.com>
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> Hi Jorge,
>>>>>>>> 
>>>>>>>> I think this should be carefully documented to remind users that the
>>>>>>>> attributes are in memory. Doing what you propose would mean having in
>>>>>>>> memory the full content of the flow file as long as the flow file is
>>>>>>>> processed in the workflow (unless you remove attributes using
>>>>>>>> UpdateAttributes).
>>>>>>>> 
>>>>>>>> Pierre
>>>>>>>> 
>>>>>>>> 2018-03-20 7:55 GMT+01:00 Jorge Machado <jom...@me.com>:
>>>>>>>> 
>>>>>>>>> Hey guys,
>>>>>>>>> 
>>>>>>>>> I would like to change the FlattenJson processor to make it possible to
>>>>>>>>> flatten to the attributes instead of only to the content. Is this a good
>>>>>>>>> idea? Would the PR be accepted?
>>>>>>>>> 
>>>>>>>>> Cheers
>>>>>>>>> 
>>>>>>>>> Jorge Machado
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>> 
>> 



Re: FlattenJson

2018-03-20 Thread Jorge Machado
Hi Bryan, 

thanks for the help. 
Our Flow: ExecuteSql -> convertToJSON -> SplitJson -> ExecuteScript with 
attached code 1. 

We are now writing a custom processor that does this, which is a copy of 
FlattenJson, but instead of putting the result into a flowfile we put it into 
the attributes. 
That’s why I asked if it makes sense to contribute this back



Attached code 1: 

import org.apache.commons.io.IOUtils
import java.nio.charset.*

def flowFile = session.get();
if (flowFile == null) {
    return;
}
def slurper = new groovy.json.JsonSlurper()
def attrs = [:] as Map<String, String>
session.read(flowFile, { inputStream ->
    def text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
    def obj = slurper.parseText(text)
    obj.each { k, v ->
        if (v != null && v.toString() != "") {
            attrs[k] = v.toString()
        }
    }
} as InputStreamCallback)
flowFile = session.putAllAttributes(flowFile, attrs)
session.transfer(flowFile, REL_SUCCESS)

some code removed


Jorge Machado





> On 20 Mar 2018, at 15:03, Bryan Bende <bbe...@gmail.com> wrote:
> 
> Ok it is still not clear what the reason for needing it in attributes
> is though... Is there another processor you are using after this that
> only works off attributes?
> 
> Just trying to understand if there is another way to accomplish what
> you want to do.
> 
> On Tue, Mar 20, 2018 at 9:50 AM, Jorge Machado <jom...@me.com> wrote:
>> We are using NiFi for workflow, and we get columns from a database, like job_status 
>> and job_name, plus some nested JSON columns (30 columns).
>> We need to put them as attributes on the FlowFile, not in the content. The 
>> first part (columns without JSON) is done by a Groovy script, but it 
>> would be nice to use this standard processor and, instead of writing this to 
>> the flow content, write it to attributes.
>> 
>> 
>> Jorge Machado
>> 
>> 
>> 
>> 
>> 
>>> On 20 Mar 2018, at 14:47, Bryan Bende <bbe...@gmail.com> wrote:
>>> 
>>> What would be the main use case for wanting all the flattened values
>>> in attributes?
>>> 
>>> If the reason was to keep the original content, we could probably just
>>> add an 'original' relationship.
>>> 
>>> Also, I think FlattenJson supports flattening a flow file where the
>>> root is an array of JSON documents (although I'm not totally sure), so
>>> you'd have to consider what to do in that case.
>>> 
>>> On Tue, Mar 20, 2018 at 5:26 AM, Pierre Villard
>>> <pierre.villard...@gmail.com> wrote:
>>>> No, I do see how this could be convenient in some cases. My comment was
>>>> more: you can certainly submit a PR for that feature, but it'll need to be
>>>> clearly documented using the appropriate annotations, documentation, and
>>>> property descriptions.
>>>> 
>>>> 2018-03-20 10:20 GMT+01:00 Jorge Machado <jom...@me.com>:
>>>> 
>>>>> Hi Pierre, I’m aware of that. So This means the change would not be
>>>>> accepted correct ?
>>>>> 
>>>>> Regards
>>>>> 
>>>>> Jorge Machado
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 20 Mar 2018, at 09:54, Pierre Villard <pierre.villard...@gmail.com>
>>>>> wrote:
>>>>>> 
>>>>>> Hi Jorge,
>>>>>> 
>>>>>> I think this should be carefully documented to remind users that the
>>>>>> attributes are in memory. Doing what you propose would mean having in
>>>>>> memory the full content of the flow file as long as the flow file is
>>>>>> processed in the workflow (unless you remove attributes using
>>>>>> UpdateAttributes).
>>>>>> 
>>>>>> Pierre
>>>>>> 
>>>>>> 2018-03-20 7:55 GMT+01:00 Jorge Machado <jom...@me.com>:
>>>>>> 
>>>>>>> Hey guys,
>>>>>>> 
>>>>>>> I would like to change the FlattenJson processor to make it possible to
>>>>>>> flatten to the attributes instead of only to the content. Is this a good
>>>>>>> idea? Would the PR be accepted?
>>>>>>> 
>>>>>>> Cheers
>>>>>>> 
>>>>>>> Jorge Machado
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>> 



Re: FlattenJson

2018-03-20 Thread Jorge Machado
We are using NiFi for workflow, and we get columns from a database, like job_status 
and job_name, plus some nested JSON columns (30 columns).
We need to put them as attributes on the FlowFile, not in the content. The 
first part (columns without JSON) is done by a Groovy script, but it would be 
nice to use this standard processor and, instead of writing this to the flow 
content, write it to attributes. 


Jorge Machado





> On 20 Mar 2018, at 14:47, Bryan Bende <bbe...@gmail.com> wrote:
> 
> What would be the main use case for wanting all the flattened values
> in attributes?
> 
> If the reason was to keep the original content, we could probably just
> add an 'original' relationship.
> 
> Also, I think FlattenJson supports flattening a flow file where the
> root is an array of JSON documents (although I'm not totally sure), so
> you'd have to consider what to do in that case.
> 
> On Tue, Mar 20, 2018 at 5:26 AM, Pierre Villard
> <pierre.villard...@gmail.com> wrote:
>> No, I do see how this could be convenient in some cases. My comment was
>> more: you can certainly submit a PR for that feature, but it'll need to be
>> clearly documented using the appropriate annotations, documentation, and
>> property descriptions.
>> 
>> 2018-03-20 10:20 GMT+01:00 Jorge Machado <jom...@me.com>:
>> 
>>> Hi Pierre, I’m aware of that. So This means the change would not be
>>> accepted correct ?
>>> 
>>> Regards
>>> 
>>> Jorge Machado
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> On 20 Mar 2018, at 09:54, Pierre Villard <pierre.villard...@gmail.com>
>>> wrote:
>>>> 
>>>> Hi Jorge,
>>>> 
>>>> I think this should be carefully documented to remind users that the
>>>> attributes are in memory. Doing what you propose would mean having in
>>>> memory the full content of the flow file as long as the flow file is
>>>> processed in the workflow (unless you remove attributes using
>>>> UpdateAttributes).
>>>> 
>>>> Pierre
>>>> 
>>>> 2018-03-20 7:55 GMT+01:00 Jorge Machado <jom...@me.com>:
>>>> 
>>>>> Hey guys,
>>>>> 
>>>>> I would like to change the FlattenJson processor to make it possible to
>>>>> flatten to the attributes instead of only to the content. Is this a good
>>>>> idea? Would the PR be accepted?
>>>>> 
>>>>> Cheers
>>>>> 
>>>>> Jorge Machado
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 



Re: FlattenJson

2018-03-20 Thread Jorge Machado
Hi Pierre, I’m aware of that. So this means the change would not be accepted, 
correct? 

Regards

Jorge Machado





> On 20 Mar 2018, at 09:54, Pierre Villard <pierre.villard...@gmail.com> wrote:
> 
> Hi Jorge,
> 
> I think this should be carefully documented to remind users that the
> attributes are in memory. Doing what you propose would mean having in
> memory the full content of the flow file as long as the flow file is
> processed in the workflow (unless you remove attributes using
> UpdateAttributes).
> 
> Pierre
> 
> 2018-03-20 7:55 GMT+01:00 Jorge Machado <jom...@me.com>:
> 
>> Hey guys,
>> 
>> I would like to change the FlattenJson processor to make it possible to
>> flatten to the attributes instead of only to the content. Is this a good idea?
>> Would the PR be accepted?
>> 
>> Cheers
>> 
>> Jorge Machado
>> 
>> 
>> 
>> 
>> 
>> 
>> 



FlattenJson

2018-03-20 Thread Jorge Machado
Hey guys,

 I would like to change the FlattenJson processor to make it possible to flatten to 
the attributes instead of only to the content. Is this a good idea? Would the PR 
be accepted?

Cheers

Jorge Machado
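For reference, the behavior being proposed (what FlattenJson already does to content, but targeting attributes) amounts to collapsing nested keys into dotted names and stringifying the leaf values. A minimal sketch, not tied to any NiFi API:

```python
def flatten(obj, prefix=""):
    # Recursively flatten nested dicts/lists into a flat {dotted.key: string}
    # map, mimicking what a "flatten to attributes" option would put on a
    # FlowFile. Empty and null leaves are skipped, like the Groovy script
    # earlier in the thread.
    if not isinstance(obj, (dict, list)):
        return {prefix: str(obj)}
    items = (
        obj.items()
        if isinstance(obj, dict)
        else ((str(i), v) for i, v in enumerate(obj))
    )
    attrs = {}
    for k, v in items:
        key = f"{prefix}.{k}" if prefix else k
        if isinstance(v, (dict, list)):
            attrs.update(flatten(v, key))
        elif v is not None and str(v) != "":
            attrs[key] = str(v)
    return attrs
```

Since attributes live in memory for the life of the FlowFile (Pierre's caveat), a real processor would likely also want a configurable limit or key filter.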