Re: PutHDFS with mapr

2018-03-23 Thread Andre
Ravi,

There are two ways of solving this.

One of them (suggested to me by MapR representatives) is to deploy MapR's FUSE
client to your NiFi nodes, use the PutFile processor instead of PutHDFS, and
let the MapR client handle the data movement and API interaction with MapR-FS.
This is a very clean and robust approach; however, it may have licensing
implications, as the FUSE client is licensed (per node, if I recall
correctly).
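
A minimal sketch of that option, assuming the FUSE client's default /mapr mount
point and a made-up cluster name and target directory:

    PutFile
      Directory: /mapr/my.cluster.com/data/incoming   (MapR-FS path exposed through the FUSE mount)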

The other one is to use the out-of-the-box PutHDFS processor with a bit of
configuration (it works on both secure and insecure clusters).

Try this out:

Instead of recompiling PutHDFS, simply point it to the mapr-client jars and
use a core-site.xml with the following content:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>maprfs:///</value>
  </property>
</configuration>
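
As a rough illustration of the PutHDFS side of this (the property names are the
stock PutHDFS ones; the /opt/mapr path is an assumption -- point it at wherever
the mapr-client jars actually live on your nodes):

    Hadoop Configuration Resources:  /path/to/the/core-site.xml above
    Additional Classpath Resources:  /opt/mapr/lib   (directory containing the mapr-client jars)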

Please note that the MapR clients don't play nicely with Kerberos, and you will
be required to use a MapR ticket to access the system. This can easily be done
by:

sudo -u  "kinit -kt /path/to/your/keytab &&
maprlogin kerberos"

Cheers

[1]
https://lists.apache.org/thread.html/af9244266e89990618152bb59b5bf95c9a49dc2428ea3fa0e6aaa682@%3Cusers.nifi.apache.org%3E
[2] https://cwiki.apache.org/confluence/x/zI5zAw



On Fri, Mar 23, 2018 at 5:05 AM, Ravi Papisetti (rpapiset) <
rpapi...@cisco.com> wrote:

> Hi,
>
>
>
> I have re-compiled nifi with mapr dependencies as per instructions at
> http://hariology.com/integrating-mapr-fs-and-apache-nifi/
>
>
>
> Created process flow with ListFile > FetchFile > PutHDFS. As soon as I
> start this process group nifi-bootstrap.log is getting filled with
>
> 2018-03-21 22:56:26,806 ERROR [NiFi logging handler]
> org.apache.nifi.StdErr 2018-03-21 22:56:26,8003 select failed(-1) error
> Invalid argument
>
> 2018-03-21 22:56:26,806 ERROR [NiFi logging handler]
> org.apache.nifi.StdErr 2018-03-21 22:56:26,8003 select failed(-1) error
> Invalid argument
>
>
>
> This log grows into GBs in minutes. I had to stop nifi to stop the
> flooding.
>
>
>
> I found a similar issue in the Pentaho forum:
> https://jira.pentaho.com/browse/PDI-16270
>
>
>
> Does anyone have any thoughts on why this error might be occurring?
>
>
>
> Appreciate any help.
>
>
>
> Thanks,
>
> Ravi Papisetti
>


Re: AWS CloudWatch

2018-03-23 Thread Kevin Doran
Hi Laurens,

I've never done this but here are some ideas you could experiment with.

Assuming the logs are coming from something like an application running on an 
EC2 instance, there are a number of ways you could probably expose them to NiFi 
without going through CloudWatch Logs. There are a number of articles and blog 
posts [1] that describe how to do this. For instance, your app's logging 
framework might support an appender that can go direct to NiFi, or you could log 
locally and run a local MiNiFi agent with a simple flow that tails the log file 
and sends the contents to NiFi using the site-to-site protocol. This would have 
the advantage of attaching provenance metadata to your logs right at the 
source, in case that is valuable for your use case.
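
As a rough sketch of that MiNiFi option (the log path and input port name are 
placeholders):

    TailFile (tailing /var/log/myapp/app.log)
      -> Remote Process Group (Site-to-Site URL of your NiFi instance)
        -> Input Port "logs" on the NiFi side, feeding the rest of your flow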

I'm assuming you want to use CloudWatch for some other reason/integration as 
part of your overall architecture, or that the source is not something you 
could run a MiNiFi agent on (e.g., another AWS service). There is a backlogged 
NiFi JIRA for reading from a Kinesis Stream [2], but in the absence of that 
feature being implemented, you would need something between the Kinesis 
Stream/Firehose carrying the log data and NiFi. Some ideas include:

- Kinesis > S3 > NiFi (as you suggested) could work
- Kinesis > Redshift > NiFi using one of NiFi's SQL processors (e.g., 
QueryDatabaseTable) to get data out of Redshift. 
- Kinesis > AWS Lambda > NiFi using one of NiFi's listener processors (e.g., 
ListenUDP or ListenHTTP) on the NiFi side to create an endpoint that an AWS 
Lambda function could post to (a rough sketch follows below).
- Kinesis > [any streaming/messaging framework that NiFi can integrate with, 
such as Kafka or JMS] > NiFi

The last one is kind of a shot in the dark, but given the number of libraries 
and protocols out there that are interoperable, it might be possible to 
leverage one if you can find a good third-party tool for getting a data stream 
out of Kinesis into a system that natively interoperates with another messaging 
protocol.
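
For the Lambda option above, here is a minimal sketch of a handler that forwards 
Kinesis records to a NiFi ListenHTTP endpoint. The endpoint URL (host, port and 
the "contentListener" base path, which I believe is ListenHTTP's default) is an 
assumption -- point it at whatever your ListenHTTP processor is actually 
configured with:

    import base64
    import os
    import urllib.request

    # Assumed endpoint of the NiFi ListenHTTP processor; adjust host/port/path.
    NIFI_URL = os.environ.get("NIFI_LISTEN_URL",
                              "http://your-nifi-host:9999/contentListener")

    def handler(event, context):
        # Kinesis record payloads arrive base64-encoded inside the Lambda event.
        records = event.get("Records", [])
        for record in records:
            payload = base64.b64decode(record["kinesis"]["data"])
            req = urllib.request.Request(NIFI_URL, data=payload, method="POST")
            req.add_header("Content-Type", "application/octet-stream")
            urllib.request.urlopen(req, timeout=10)
        return {"forwarded": len(records)}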

I'd be interested to hear if you come up with an approach that works well for 
your use case. Hope this helps!

Kevin

[1] 
https://bryanbende.com/development/2015/05/17/collecting-logs-with-apache-nifi
[2] https://issues.apache.org/jira/browse/NIFI-2892

On 3/23/18, 14:15, "Laurens Vets"  wrote:

Hi list,

Has anyone tried to set up NiFi to get real-time CloudWatch logs somehow?

I can export CloudWatch logs to S3, but it might take up to 12 hours for 
them to become available. I suspect the only other option is to go 
through AWS Kinesis Firehose to stream to S3 and have NiFi pick up the 
logs there?

Any ideas/comments/suggestions are highly appreciated :)





AWS CloudWatch

2018-03-23 Thread Laurens Vets

Hi list,

Has anyone tried to set up NiFi to get real-time CloudWatch logs somehow?

I can export CloudWatch logs to S3, but it might take up to 12 hours for 
them to become available. I suspect the only other option is to go 
through AWS Kinesis Firehose to stream to S3 and have NiFi pick up the 
logs there?


Any ideas/comments/suggestions are highly appreciated :)


Re: Re: put pictures from remote server into hdfs

2018-03-23 Thread 李 磊
Thanks all! I succeeded.

From: Andrew Grande [mailto:apere...@gmail.com]
Sent: 2018-03-23 21:13
To: users@nifi.apache.org
Subject: Re: Re: put pictures from remote server into hdfs

I think the MOB expectation for HBase was around 10MB.

I agree it will require some thought put into organizing the space and region 
server splits with column families, once this volume becomes significant.

Andrew
On Fri, Mar 23, 2018, 9:08 AM Mike Thomsen 
> wrote:
Off the top of my head, try PutHBaseCell for that. If you run into problems, 
let us know.

As a side note, you should be careful about storing large binary blobs in 
HBase. I don't know to what extent our processors support HBase MOBs either. In 
general, you'll probably be alright if the pictures are on the small side (< 
1MB), but be very careful beyond that.

If you have to store a lot of images and aren't able to commit to a small file 
size, I would recommend looking at a BLOB store like S3 or OpenStack Swift. Maybe 
Ceph as well.

On Thu, Mar 22, 2018 at 8:59 PM, 李 磊 
> wrote:
Hi Bryan:

Thanks for your response.

Using GetSFTP and PutHDFS is helpful.

Now I have another problem. Besides HDFS, the pictures from the remote server 
also need to be put into HBase. The filename is the rowkey and the file is a column value.

This is the reason why I stored the pictures locally and then used 
ExecuteFlumeSource with spooldir, which can read the picture as a whole, but I 
lose the filename.

-----Original Message-----
From: Bryan Bende [mailto:bbe...@gmail.com]
Sent: 2018-03-23 0:42
To: users@nifi.apache.org
Subject: Re: put pictures from remote server into hdfs

Hello,

It would probably be best to use GetSFTP -> PutHDFS.

No need to write the files out to local disk somewhere else with PutFile, they 
can go straight to HDFS.

The filename in HDFS will be the "filename" attribute of the flow file, which 
GetSFTP should be setting to the filename it picked up.

If you need a different filename, you can stick an UpdateAttribute before 
PutHDFS and change the filename attribute to whatever makes sense.

-Bryan


On Thu, Mar 22, 2018 at 12:18 PM, 李 磊 
> wrote:
> Hi all,
>
>
>
> My requirement is to put pictures from a remote server (not in the nifi
> cluster) into hdfs.
>
> First I use the GetSFTP and PutFile to get pictures to local, and then
> use ExecuteFlumeSource and ExecuteFlumeSink to put pictures into hdfs
> from local.
>
>
>
> However, there is a problem that the name of pictures that put into
> hdfs cannot keep the same with local.
>
>
>
> Could you tell me the way to keep the name same or a better way to put
> pictures into hdfs from remote server with nifi?
>
>
>
> Thanks!



Re: Re: put pictures from remote server into hdfs

2018-03-23 Thread Andrew Grande
I think the MOB expectation for HBase was around 10MB.

I agree it will require some thought put into organizing the space and region
server splits with column families, once this volume becomes significant.

Andrew

On Fri, Mar 23, 2018, 9:08 AM Mike Thomsen  wrote:

> Off the top of my head, try PutHBaseCell for that. If you run into
> problems, let us know.
>
> As a side note, you should be careful about storing large binary blobs in
> HBase. I don't know to what extent our processors support HBase MOBs
> either. In general, you'll probably be alright if the pictures are on the
> small side (< 1MB), but be very careful beyond that.
>
> If you have to store a lot of images and aren't able to commit to a small
> file size, I would recommend looking at a BLOB store like S3 or OpenStack
> Swift. Maybe Ceph as well.
>
> On Thu, Mar 22, 2018 at 8:59 PM, 李 磊  wrote:
>
>> Hi Bryan:
>>
>> Thanks for your response.
>>
>> Using GetSFTP and PutHDFS is helpful.
>>
>> Now I have another problem. Besides HDFS, the pictures from the remote
>> server also need to be put into HBase. The filename is the rowkey and the
>> file is a column value.
>>
>> This is the reason why I stored the pictures locally and then used
>> ExecuteFlumeSource with spooldir, which can read the picture as a whole, but
>> I lose the filename.
>>
>> -----Original Message-----
>> From: Bryan Bende [mailto:bbe...@gmail.com]
>> Sent: 2018-03-23 0:42
>> To: users@nifi.apache.org
>> Subject: Re: put pictures from remote server into hdfs
>>
>> Hello,
>>
>> It would probably be best to use GetSFTP -> PutHDFS.
>>
>> No need to write the files out to local disk somewhere else with PutFile,
>> they can go straight to HDFS.
>>
>> The filename in HDFS will be the "filename" attribute of the flow file,
>> which GetSFTP should be setting to the filename it picked up.
>>
>> If you need a different filename, you can stick an UpdateAttribute before
>> PutHDFS and change the filename attribute to whatever makes sense.
>>
>> -Bryan
>>
>>
>> On Thu, Mar 22, 2018 at 12:18 PM, 李 磊  wrote:
>> > Hi all,
>> >
>> >
>> >
>> > My requirement is to put pictures from a remote server (not in the nifi
>> > cluster) into hdfs.
>> >
>> > First I use the GetSFTP and PutFile to get pictures to local, and then
>> > use ExecuteFlumeSource and ExecuteFlumeSink to put pictures into hdfs
>> > from local.
>> >
>> >
>> >
>> > However, there is a problem that the name of pictures that put into
>> > hdfs cannot keep the same with local.
>> >
>> >
>> >
>> > Could you tell me the way to keep the name same or a better way to put
>> > pictures into hdfs from remote server with nifi?
>> >
>> >
>> >
>> > Thanks!
>>
>
>


Re: Re: put pictures from remote server into hdfs

2018-03-23 Thread Mike Thomsen
Off the top of my head, try PutHBaseCell for that. If you run into
problems, let us know.

As a side note, you should be careful about storing large binary blobs in
HBase. I don't know to what extent our processors support HBase MOBs
either. In general, you'll probably be alright if the pictures are on the
small side (< 1MB), but be very careful beyond that.

If you have to store a lot of images and aren't able to commit to a small
file size, I would recommend looking at a BLOB store like S3 or OpenStack
Swift. Maybe Ceph as well.
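
If it helps, a rough sketch of how that could be wired for the filename-as-rowkey 
requirement, using PutHBaseCell's standard properties (the table, column family 
and qualifier names below are placeholders):

    GetSFTP -> PutHDFS
           \-> PutHBaseCell
                 Table Name:           pictures      (placeholder)
                 Row Identifier:       ${filename}
                 Column Family:        d             (placeholder)
                 Column Qualifier:     image         (placeholder)
                 HBase Client Service: an HBase_1_1_2_ClientService controller service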

On Thu, Mar 22, 2018 at 8:59 PM, 李 磊  wrote:

> Hi Bryan:
>
> Thanks for your response.
>
> Using GetSFTP and PutHDFS is helpful.
>
> Now I have another problem. Besides HDFS, the pictures from the remote
> server also need to be put into HBase. The filename is the rowkey and the
> file is a column value.
>
> This is the reason why I stored the pictures locally and then used
> ExecuteFlumeSource with spooldir, which can read the picture as a whole, but
> I lose the filename.
>
> -----Original Message-----
> From: Bryan Bende [mailto:bbe...@gmail.com]
> Sent: 2018-03-23 0:42
> To: users@nifi.apache.org
> Subject: Re: put pictures from remote server into hdfs
>
> Hello,
>
> It would probably be best to use GetSFTP -> PutHDFS.
>
> No need to write the files out to local disk somewhere else with PutFile,
> they can go straight to HDFS.
>
> The filename in HDFS will be the "filename" attribute of the flow file,
> which GetSFTP should be setting to the filename it picked up.
>
> If you need a different filename, you can stick an UpdateAttribute before
> PutHDFS and change the filename attribute to whatever makes sense.
>
> -Bryan
>
>
> On Thu, Mar 22, 2018 at 12:18 PM, 李 磊  wrote:
> > Hi all,
> >
> >
> >
> > My requirement is to put pictures from a remote server (not in the nifi
> > cluster) into hdfs.
> >
> > First I use the GetSFTP and PutFile to get pictures to local, and then
> > use ExecuteFlumeSource and ExecuteFlumeSink to put pictures into hdfs
> > from local.
> >
> >
> >
> > However, there is a problem that the name of pictures that put into
> > hdfs cannot keep the same with local.
> >
> >
> >
> > Could you tell me the way to keep the name same or a better way to put
> > pictures into hdfs from remote server with nifi?
> >
> >
> >
> > Thanks!
>


Re: Store Hash data type in redis

2018-03-23 Thread Mike Thomsen
I don't think there are any processors yet for this sort of thing. I've
been thinking about working on some for a while now. How would you expect
that hypothetical PutRedisHash processor to work? Here are some example use
cases that I've been mulling for building one:

1. Read from attributes with a configured prefix like
"redis.hash.{NAME}.{KEY}".
2. A mass HSET from a flat JSON document where the hash name would come off
a configured attribute.
3. A set of work instructions like this:

{
    "hincr": [
        "user_logins",
        "bad_logins"
    ],
    "hset": {
        "key1": "x",
        "key2": "y"
    }
}
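
To make option 3 concrete, here is a minimal sketch (using the redis-py client, 
not NiFi processor code) of how those work instructions could map onto Redis 
commands -- the hash name and connection details are made up for illustration:

    import json
    import redis  # redis-py client

    r = redis.Redis(host="localhost", port=6379)

    instructions = json.loads("""
    {
        "hincr": ["user_logins", "bad_logins"],
        "hset": {"key1": "x", "key2": "y"}
    }
    """)

    hash_name = "user:123"  # hypothetical hash name, e.g. taken from an attribute
    for field in instructions.get("hincr", []):
        r.hincrby(hash_name, field, 1)   # HINCRBY <hash> <field> 1
    for field, value in instructions.get("hset", {}).items():
        r.hset(hash_name, field, value)  # HSET <hash> <field> <value>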

What are your thoughts on that for a Jira ticket?

On Fri, Mar 23, 2018 at 6:04 AM, Sangavi Eswaramoorthi 
wrote:

> Hi,
>
> I would like to store hash data in Redis using NiFi. Is it achievable?
>
> Thanks,
> sangavi
>


Store Hash data type in redis

2018-03-23 Thread Sangavi Eswaramoorthi
Hi,

I would like to store hash data in Redis using NiFi. Is it achievable?

Thanks,
sangavi