Nifi - Merge multiple columns to row

2018-03-27 Thread mausam
hi all,

I am trying to import the below structure.


[inline image of the input table did not survive in the archive]
While importing, I need to merge multiple records from the same file (a kind
of self-join) to produce something like this.


Here, all records having the same ID are merged together into a single row.
To separate the records, I have used a separator (%%). To separate ColType
and ColValue, I have used another separator (||).
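Since the screenshots may not come through, here is a small sketch of the transformation I am after (the column names and values are just made-up examples):

```python
from itertools import groupby

# Input rows of (ID, ColType, ColValue), already grouped by ID.
rows = [
    ("1", "name", "alice"),
    ("1", "city", "paris"),
    ("2", "name", "bob"),
]

# One output row per ID: records joined by "%%", type/value separated by "||".
merged = {
    key: "%%".join(f"{t}||{v}" for _, t, v in group)
    for key, group in groupby(rows, key=lambda r: r[0])
}
print(merged)  # {'1': 'name||alice%%city||paris', '2': 'name||bob'}
```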

I can see that there is a MergeContent processor that merges records across
multiple flow files, but it seems it doesn't support joining records within
the same flow file.

If any of you have come across a similar problem, could you let me know the
best way to handle this scenario?

Thanks,
Mausam




Re: PutHDFS with mapr

2018-03-27 Thread Andre
Ravi,

I assume the MapR client package is working and operational, and that you can
log in as the uid running NiFi and issue the following successfully:

$ maprlogin authtest

$ maprlogin print

$ hdfs dfs -ls /


So if those fail, fix them before you proceed.

If those work, I would suspect the issue is caused by the additional
classpath not being complete.

From the documentation:

A comma-separated list of paths to files and/or directories that will be
added to the classpath. When specifying a directory, all files within the
directory will be added to the classpath, but further sub-directories will
not be included.

I don't have a mapr-client instance handy, but my next step would be ensuring
the list of directories and subdirectories is complete and, if not, adding
individual JAR files.
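For example, something along these lines (paths assumed from a typical MapR client install; adjust to yours) would print a candidate value for the "Additional Classpath Resources" property:

```shell
# Hypothetical sketch: list the JARs under the MapR client directories and
# join them into the comma-separated form the property expects.
find /opt/mapr/lib \
     /opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common \
     /opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib \
     -maxdepth 1 -name '*.jar' 2>/dev/null | paste -sd, -
```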


It should work.


On Tue, Mar 27, 2018 at 1:56 AM, Ravi Papisetti (rpapiset) <
rpapi...@cisco.com> wrote:

> Hi Andre,
>
>
>
> I have tried with pointing puthdfs to lib class path with:
> /opt/mapr/lib,/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common,/opt/mapr/
> hadoop/hadoop-2.7.0/share/hadoop/common/lib
>
>
>
> I have given this value for “Additional Classpath Resources” parameter of
> PutHDFS processor.
>
>
>
> Getting the below exception. Please note that I have tried this in NiFi
> version 1.5.
>
>
>
>
>
> 2018-03-26 14:47:51,305 ERROR [StandardProcessScheduler Thread-6] o.a.n.controller.StandardProcessorNode Failed to invoke @OnScheduled method due to java.lang.RuntimeException: Failed while executing one of processor's OnScheduled task.
>
> java.lang.RuntimeException: Failed while executing one of processor's OnScheduled task.
> at org.apache.nifi.controller.StandardProcessorNode.invokeTaskAsCancelableFuture(StandardProcessorNode.java:1504)
> at org.apache.nifi.controller.StandardProcessorNode.initiateStart(StandardProcessorNode.java:1330)
> at org.apache.nifi.controller.StandardProcessorNode.lambda$initiateStart$1(StandardProcessorNode.java:1358)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:206)
> at org.apache.nifi.controller.StandardProcessorNode.invokeTaskAsCancelableFuture(StandardProcessorNode.java:1487)
> ... 9 common frames omitted
> Caused by: java.lang.reflect.InvocationTargetException: null
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:137)
> at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:125)
> at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:70)
> at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotation(ReflectionUtils.java:47)
> at org.apache.nifi.controller.StandardProcessorNode$1.call(StandardProcessorNode.java:1334)
> at org.apache.nifi.controller.StandardProcessorNode$1.call(StandardProcessorNode.java:1330)
> ... 6 common frames omitted
> Caused by: java.io.IOException: No FileSystem for scheme: maprfs
> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:172)
> at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor$1.run(AbstractHadoopProcessor.java:322)
> at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor$1.run(AbstractHadoopProcessor.java:319)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 

Re: NiFi Variables

2018-03-27 Thread Daniel Chaffelson
Thank you for taking the time to try it!
Please let me know if there are other features you would find useful.
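For anyone curious, the variable update boils down to a small REST payload. A rough sketch of its shape (field names assumed from the NiFi 1.5 REST API; verify against your own instance, e.g. via the browser's developer tools):

```python
import json

# Hypothetical helper: build the entity sent to
# POST /nifi-api/process-groups/{id}/variable-registry/update-requests.
def variable_update_payload(pg_id, revision_version, variables):
    return {
        "processGroupRevision": {"version": revision_version},
        "variableRegistry": {
            "processGroupId": pg_id,
            "variables": [
                {"variable": {"name": k, "value": v}}
                for k, v in variables.items()
            ],
        },
    }

payload = variable_update_payload("abcd-1234", 0, {"sftp.remote.host": "new.host"})
print(json.dumps(payload, indent=2))
```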

On Tue, 27 Mar 2018, 04:10 scott wrote:

> Daniel,
>
> That worked perfectly. Thank you for the help, and for creating this great
> tool.
>
> Scott
>
> On 03/25/2018 02:50 AM, Daniel Chaffelson wrote:
>
> Hi Scott,
> NiPyApi provides a python client for this purpose. There are calls to
> get/set variables in the canvas module.
> https://github.com/Chaffelson/nipyapi
>
> Let me know if you have any troubles with it.
>
>
>
> On Sun, 25 Mar 2018, 02:28 Charlie Meyer, <
> charlie.me...@civitaslearning.com> wrote:
>
>> Take a look at your browsers developer tools when you set variables and
>> mimic the calls in code. We do this using a swagger generated client and it
>> works well.
>>
>> On Sat, Mar 24, 2018, 20:26 scott  wrote:
>>
>>> Hello community,
>>>
>>> I'm looking for a way to edit or add to the new "variables" feature
>>> programmatically, such as through the API or other. For instance, I'd
>>> like to use a variable to configure the remote host for my SFTP collection,
>>> and then be able to change the value through an automated job when the
>>> remote host changes. This would be especially useful for processors that
>>> don't allow an input relationship.
>>>
>>> Any suggestions or comments would be welcome.
>>>
>>> Thanks,
>>>
>>> Scott
>>>
>>
>


Setting NIFI_HTTP_WEB_HOST in Docker doesn't work

2018-03-27 Thread Paulo & Claudio
Hello. I have installed the NiFi Docker image on a Google Cloud Debian 9 server.

The server has a public IP of PUBLIC_IP (placeholder for the real IP).

I ran docker pull apache/nifi and it installed the image successfully.

Then, I ran:
sudo docker run --name nifi -p 8080:8080 -d -e NIFI_HTTP_WEB_HOST=PUBLIC_IP
-e NIFI_HTTP_WEB_PORT=8080 apache/nifi:latest

Since I set NIFI_HTTP_WEB_HOST to my PUBLIC_IP, and I have allowed port 8080
in the Google Cloud console, I would assume that http://PUBLIC_IP:8080/nifi/
would work... instead, I get this:

"System Error
The request contained an invalid host header [PUBLIC_IP:8080] in the
request [/nifi/]. Check for request manipulation or third-party intercept."
When I run sudo docker logs nifi, I see the following warning:

o.a.nifi.web.server.HostHeaderHandler Request host header [PUBLIC_IP:8080]
different from web hostname [f3d104266d9a(:8080)]

Any clue on why that might be happening?

I found a few sites that suggested setting nifi.web.http.host in
nifi.properties, but I don't know how to do that in Docker. Plus, I thought
that the NIFI_HTTP_WEB_HOST setting served exactly that purpose. Am I
missing something?
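The only workaround I can think of (untested; the conf path below is assumed from the apache/nifi image layout, so please double-check it) would be copying the default configuration out of the container, editing nifi.properties, and mounting it back in:

```shell
# Copy the default conf directory out of the running container (path assumed).
docker cp nifi:/opt/nifi/nifi-current/conf ./nifi-conf
# Edit ./nifi-conf/nifi.properties, e.g. set nifi.web.http.host=PUBLIC_IP
# Then start a fresh container with the edited conf mounted over the default.
docker run --name nifi2 -p 8080:8080 -d \
  -v "$(pwd)/nifi-conf:/opt/nifi/nifi-current/conf" apache/nifi:latest
```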

Thank you in advance.


Re: NiFi as a Web Services server

2018-03-27 Thread Boris Tyukin
Thanks Mike, you've echoed my concerns as well.

On Mon, Mar 26, 2018 at 5:57 PM, Mike Thomsen wrote:

> I think you would hit two big barriers in design:
>
> 1. NiFi just isn't designed to be an app server for additional service
> layer components a la Tomcat.
> 2. Synchronizing between the REST services and NiFi's highly asynchronous
> processing would be a logistical nightmare if your goal is to confine NiFi
> processing to the request/response cycle of HTTP.
>
> If you want to really integrate NiFi into other apps, what you should do
> is focus on the inputs and outputs and use something very real-time like
> Kafka + WebSockets at the end so that you can stream output to consumers.
>
> Thanks,
>
> Mike
>
> On Mon, Mar 26, 2018 at 3:07 PM, Boris Tyukin wrote:
>
>> I wonder how practical it is to use NiFi as a server to serve web service
>> calls and return data over REST to consumers. It seems like a natural fit,
>> given NiFi's scalability and redundancy and a host of data integration and
>> transformation processors.
>>
>> I am thinking to create a quick pilot using Kafka as a message broker
>> (and to decouple NiFi flows) and NiFi as transformation/integration engine
>> and also as a web server to expose flows via NiFi Httphandler/listener
>> processors.
>>
>> Is anyone using NiFi for that?
>>
>> Thanks,
>> Boris
>>
>>
>>
>>
>


Re: how to edit queue content

2018-03-27 Thread Mike Thomsen
If you know one of the supported scripting languages, you can probably do
some of that with ExecuteScript. For example, if you wanted to drop every
other flowfile in a block of 100, it'd be like this:

def flowfiles = session.get(100) // pull up to 100 flowfiles from the incoming queue
int index = 1
flowfiles?.each { flowFile ->
    if (index % 2 == 0) {
        session.remove(flowFile)                // drop every other flowfile
    } else {
        session.transfer(flowFile, REL_SUCCESS) // keep the rest
    }
    index++
}

On Tue, Mar 27, 2018 at 12:06 AM, scott  wrote:

> Hi community,
>
> I've got a question about a feature I would find useful. I've been setting
> up a lot of new flows and testing various configurations, and I thought it
> would be really useful if I could edit the content of queues. For example,
> I can examine each file in the queue, then decide I want to keep the second
> one and the third one, then remove the rest before resuming my flow
> testing. I know I can delete all files, but is there a way to have more
> control over the queue content? Could I delete a specific file, or change
> the order of the queue?
>
> Thanks for your time,
>
> Scott
>
>


Re: how to edit queue content

2018-03-27 Thread Andrew Grande
How is it going to work with e.g. 20GB of events in the queue? I'd be
careful, as requirements blow up into a full db with indexes, search, and a
UI on top. If one wanted to filter events, wouldn't a standard processor do
the job better?

Andrew

On Tue, Mar 27, 2018, 12:11 AM Joe Witt  wrote:

> Scott
> Yep, definitely something we've talked about [1].  We've not pursued it
> directly as of yet since it is indeed a queue and we're just letting
> you peek into it.  We don't have facilities built in to really alter
> the queue in a particular position.  Also, the complexity comes in
> when folks want to have paging/selection of various items down the
> list, etc. (but it isn't a list - it's a queue).
>
> If you could bound the range of what you'd expect to be able to do
> that would probably help constrain into something reasonably
> implemented.
>
> Thanks
>
> [1]
> https://cwiki.apache.org/confluence/display/NIFI/Interactive+Queue+Management
>
> On Tue, Mar 27, 2018 at 12:06 AM, scott  wrote:
> > Hi community,
> >
> > I've got a question about a feature I would find useful. I've been
> setting
> > up a lot of new flows and testing various configurations, and I thought
> it
> > would be really useful if I could edit the content of queues. For
> example, I
> > can examine each file in the queue, then decide I want to keep the second
> > one and the third one, then remove the rest before resuming my flow
> testing.
> > I know I can delete all files, but is there a way to have more control
> over
> > the queue content? Could I delete a specific file, or change the order of
> > the queue?
> >
> > Thanks for your time,
> >
> > Scott
> >
>


RE: Unable to create HiveConnectionPool with kerberos.

2018-03-27 Thread mohit.jain
Thanks Pierre, 

It is working now.

 

Mohit

 

From: Pierre Villard  
Sent: 27 March 2018 13:15
To: users@nifi.apache.org
Subject: Re: Unable to create HiveConnectionPool with kerberos.

 

Hi,

It needs to be the principal of the Hive server, not yours. Can you give it a
try by replacing _HOST with the FQDN of your Hive server (just to check if
that's the issue here)?

If you still have an error, I'd recommend checking the nifi-app.log file to
get more detail (the complete stack trace) about the "GSS initiate failed" error.

Pierre

 

2018-03-27 8:57 GMT+02:00 mohit.j...@open-insights.co.in:

When I try using my user principal instead of hive's, it gives the following error:

 

SelectHiveQL[id=633d54ed-0162-1000--6fa47d56]
org.apache.nifi.processors.hive.SelectHiveQL$$Lambda$523/1312347477@1c34a7fa
failed to process due to java.lang.IllegalArgumentException: Kerberos principal
should have 3 parts: mo...@olympus.oi.co.in; rolling back session: Kerberos
principal should have 3 parts: mo...@olympus.oi.co.in

 

FYI…I am able to write to HDFS using Kerberos. It is only when I try to create
a table in Hive using PutHiveQL that it throws the error.

 

 

From: mohit.j...@open-insights.co.in
Sent: 27 March 2018 11:22
To: users@nifi.apache.org  
Subject: RE: Unable to create HiveConnectionPool with kerberos.

 

Hi,

 

I have tried that URL but it gives me following error:-

 

HiveConnectionPool[id=6e60258b-9e00-3bac-9590-543aec882280] Error getting Hive
connection: org.apache.commons.dbcp.SQLNestedException: Cannot create
PoolableConnectionFactory (Could not open client transport with JDBC Uri:
jdbc:hive2://**.co.in:1/nifi_test1;principal=hive/_HOST@**.co.in: GSS initiate
failed)

 

 

Mohit

 

 

From: Pierre Villard
Sent: 26 March 2018 23:31
To: users@nifi.apache.org  
Subject: Re: Unable to create HiveConnectionPool with kerberos.

 

Mohit,

I believe you need to change the JDBC url (even though you have the 
configuration files correctly set) to something like:
jdbc:hive2://<host>:<port>/<database>;principal=<hive server principal>

So it'd be something like: 

jdbc:hive2://localhost:1/default;principal=hive/my.fqdn.hive.ser...@example.com


Pierre

 

2018-03-26 18:13 GMT+02:00 Juan Pablo Gardella:

Sorry, the issue happens when a HA configuration is used.

 

On Mon, 26 Mar 2018 at 13:03 Juan Pablo Gardella wrote:

See https://issues.apache.org/jira/browse/NIFI-2575; the driver does not support
that. I've put some workarounds in the ticket.

 

On Mon, 26 Mar 2018 at 13:03 mohit.j...@open-insights.co.in wrote:

Hi,

 

I am getting the following warning when I use HiveConnection pool with Kerberos 
:

 

HiveConnectionPool[id=6e60258b-9e00-3bac-85ba-0dac8e22142f] Configuration does 
not have security enabled, Keytab and Principal will be ignored

 

It also throws the following bulletin in my PutHiveQL processor:

PutHiveQL[id=55f4ac1b-ecf9-3db3-b898-7a9d145a5382]
org.apache.nifi.processors.hive.PutHiveQL$$Lambda$663/2042832677@40267000
failed to process due to org.apache.nifi.processor.exception.ProcessException:
org.apache.commons.dbcp.SQLNestedException: Cannot create
PoolableConnectionFactory (Could not open client transport with JDBC Uri:
jdbc:hive2://**:1/nifi_test1: Peer indicated failure: Unsupported mechanism
type PLAIN); rolling back session:
org.apache.commons.dbcp.SQLNestedException: Cannot create
PoolableConnectionFactory (Could not open client transport with JDBC Uri:
jdbc:hive2://**:1/nifi_test1: Peer indicated failure: Unsupported mechanism
type PLAIN)
 

Hive Configuration Resources:- 
/etc/hive/conf/hive-site.xml,/etc/hadoop/conf/core-site.xml

I have set hive.security.authentication and hadoop.security.authentication to 
Kerberos.

 

Please let me know if I’m doing anything wrong.

 

Regards,

Mohit

 

 



RE: Unable to create HiveConnectionPool with kerberos.

2018-03-27 Thread mohit.jain
When I try using my user principal instead of hive's, it gives the following error:

 

SelectHiveQL[id=633d54ed-0162-1000--6fa47d56]
org.apache.nifi.processors.hive.SelectHiveQL$$Lambda$523/1312347477@1c34a7fa
failed to process due to java.lang.IllegalArgumentException: Kerberos principal
should have 3 parts: mo...@olympus.oi.co.in; rolling back session: Kerberos
principal should have 3 parts: mo...@olympus.oi.co.in

 

FYI…I am able to write to HDFS using Kerberos. It is only when I try to create
a table in Hive using PutHiveQL that it throws the error.
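If I understand the error, a server principal has three parts (primary/instance@REALM) while my user principal has only two, which is what the check rejects. A small illustration (not NiFi code; the principals are made up):

```python
# Split a Kerberos principal into its parts: primary, optional instance (host),
# and realm. Server principals like hive/host@REALM have three parts; plain
# user principals like user@REALM have only two.
def principal_parts(principal):
    primary, _, realm = principal.partition("@")
    name, _, host = primary.partition("/")
    return [part for part in (name, host, realm) if part]

print(principal_parts("hive/hive.example.com@EXAMPLE.COM"))  # three parts
print(principal_parts("mohit@EXAMPLE.COM"))                  # only two parts
```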

 

 
