Re: Apache NiFi - WebHDFS

2016-04-21 Thread Simon Ball
For WebHDFS it could be difficult to build out with InvokeHttp, because of the slightly unusual way Hadoop uses HTTP redirects. The principle is that you send a request without payload, get the redirect, then send the request dictated by the redirect, but with the payload (redirect should be to
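
As a rough illustration of that two-step exchange (the hostnames, ports, and path below are placeholders, not taken from the thread):

    Step 1: PUT http://namenode:50070/webhdfs/v1/tmp/example.txt?op=CREATE   (no body)
            <- 307 Temporary Redirect
               Location: http://datanode:50075/webhdfs/v1/tmp/example.txt?op=CREATE&...

    Step 2: PUT <the Location URL>   (file content as the body)
            <- 201 Created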

Re: Apache NiFi - WebHDFS

2016-04-21 Thread Jeremy Dyer
Yep, all of those reasons make perfect sense to me. Now the question becomes: is this something where we create new processors, or do we just build out templates using existing processors like InvokeHTTP that we make publicly available? My vote would probably be for just making the processors, but I would

Re: Apache NiFi - WebHDFS

2016-04-21 Thread Tom Stewart
I will share what would interest me. The HDFS processor today runs with authority matching the userid that NiFi is running as. Interactions with HDFS are via that userid, which limits what it can access. Now granted there are two options with the current PutHDFS processor (I believe). If you

RE: Apache NiFi - WebHDFS

2016-04-21 Thread Kumiko Yada
We would need the WebHDFS processor for Microsoft Azure Data Lake Store. Thanks Kumiko -Original Message- From: Jeremy Dyer [mailto:jdy...@gmail.com] Sent: Thursday, April 21, 2016 4:28 PM To: users@nifi.apache.org Subject: Re: Apache NiFi - WebHDFS Kumiko, Just curious what makes you

Re: Apache NiFi - WebHDFS

2016-04-21 Thread larry mccay
Any WebHDFS processor should make the URL and credentials configurable so that it could go direct to WebHDFS or through the Knox Gateway. On Thu, Apr 21, 2016 at 6:11 PM, Tom Stewart wrote: > What about Knox Gateway? > > > On Apr 21, 2016, at 3:21 PM, Kumiko Yada
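
For reference, the two access paths differ only in the base URL; a sketch using the usual default ports and the default Knox topology name (neither confirmed in the thread):

    Direct:   http://namenode:50070/webhdfs/v1/tmp/example.txt?op=CREATE
    Via Knox: https://knoxhost:8443/gateway/default/webhdfs/v1/tmp/example.txt?op=CREATE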

Re: Apache NiFi - WebHDFS

2016-04-21 Thread Tom Stewart
What about Knox Gateway? > On Apr 21, 2016, at 3:21 PM, Kumiko Yada wrote: > > Will do. > > Thanks > Kumiko > > -Original Message- > From: Joe Witt [mailto:joe.w...@gmail.com] > Sent: Thursday, April 21, 2016 12:45 PM > To: users@nifi.apache.org > Subject: Re:

Re: Apache NiFi/Hive - store merged tweets in HDFS, create table in hive

2016-04-21 Thread Igor Kravzov
That worked. Thank you. On Thu, Apr 21, 2016 at 5:26 PM, Joe Witt wrote: > Run the output through UpdateAttribute and put a property on that > processor with a name of 'filename' and a value of > '${filename}.yourextension' > > Thanks > Joe > > On Thu, Apr 21, 2016 at 5:24

Re: Apache NiFi/Hive - store merged tweets in HDFS, create table in hive

2016-04-21 Thread Joe Witt
Run the output through UpdateAttribute and put a property on that processor with a name of 'filename' and a value of '${filename}.yourextension' Thanks Joe On Thu, Apr 21, 2016 at 5:24 PM, Igor Kravzov wrote: > Thanks guys. I think it will work. > One thing: merged file
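
Concretely, that UpdateAttribute configuration is a single dynamic property; a sketch with 'json' standing in for whatever extension you want:

    Property name: filename
    Value:         ${filename}.json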

Re: Apache NiFi/Hive - store merged tweets in HDFS, create table in hive

2016-04-21 Thread Igor Kravzov
Thanks guys. I think it will work. One thing: the merged file comes out without an extension. How do I add an extension to a merged file? On Thu, Apr 21, 2016 at 4:42 PM, Simon Ball wrote: > For most Hive JSON SerDes you are going to want what some people call JSON > record format.

Re: Apache NiFi/Hive - store merged tweets in HDFS, create table in hive

2016-04-21 Thread Simon Ball
For most Hive JSON SerDes you are going to want what some people call JSON record format. This is essentially a text file with one JSON document per line, each representing a record, with reasonably consistent structure. You can achieve this by ensuring your JSON is not pretty-formatted (one doc
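
In other words, the target layout is one complete JSON document per line, for example (fields abbreviated for illustration):

    {"id": 1, "text": "first tweet", "lang": "en"}
    {"id": 2, "text": "second tweet", "lang": "en"}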

Re: Apache NiFi/Hive - store merged tweets in HDFS, create table in hive

2016-04-21 Thread Bryan Bende
Also, this blog has a picture of what I described with MergeContent: https://blogs.apache.org/nifi/entry/indexing_tweets_with_nifi_and -Bryan On Thu, Apr 21, 2016 at 4:37 PM, Bryan Bende wrote: > Hi Igor, > > I don't know that much about Hive so I can't really say what
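
The MergeContent setup in that post amounts to concatenating records with a newline between them; a sketch of the relevant properties (values are the commonly used ones, not quoted verbatim from the post):

    Merge Strategy:     Bin-Packing Algorithm
    Merge Format:       Binary Concatenation
    Delimiter Strategy: Text
    Demarcator:         a single newline (entered with Shift+Enter)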

RE: Apache NiFi - WebHDFS

2016-04-21 Thread Kumiko Yada
Will do. Thanks Kumiko -Original Message- From: Joe Witt [mailto:joe.w...@gmail.com] Sent: Thursday, April 21, 2016 12:45 PM To: users@nifi.apache.org Subject: Re: Apache NiFi - WebHDFS Kumiko, Not that I am aware of. If you do end up doing so and are interested in contributing

Re: Apache NiFi/Hive - store merged tweets in HDFS, create table in hive

2016-04-21 Thread Igor Kravzov
Hi Bryan, I am aware of this example. But I want to store the JSON as it is and create an external table, like in this example: http://hortonworks.com/blog/howto-use-hive-to-sqlize-your-own-tweets-part-two-loading-hive-sql-queries/ What I don't know is how to properly merge multiple JSON documents into one file in
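
A minimal sketch of the kind of external table that post builds, assuming a JSON SerDe such as the Hive HCatalog one is on the classpath (table name, columns, and location are illustrative):

    CREATE EXTERNAL TABLE tweets (
      id BIGINT,
      created_at STRING,
      text STRING
    )
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    LOCATION '/user/nifi/tweets';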

Re: Where can I find documentation of implementing custom processor?

2016-04-21 Thread Ashraf Hasson
Hi Andy, Can you complement your list with guides or how-tos on building a custom controller service and tying it into a processor that leverages the service? The only blog I found on this was Phillip Grenier's article

Re: Apache NiFi - WebHDFS

2016-04-21 Thread Joe Witt
Kumiko, Not that I am aware of. If you do end up doing so and are interested in contributing, please let us know. Thanks Joe On Thu, Apr 21, 2016 at 3:43 PM, Kumiko Yada wrote: > Hello, > > > > Has anyone written a custom processor for WebHDFS? > > > > Thanks > > Kumiko

Apache NiFi - WebHDFS

2016-04-21 Thread Kumiko Yada
Hello, Has anyone written a custom processor for WebHDFS? Thanks Kumiko

Re: Apache NiFi/Hive - store merged tweets in HDFS, create table in hive

2016-04-21 Thread Bryan Bende
Hello, I believe this example shows an approach to do it (it includes Hive even though the title is Solr/banana): https://community.hortonworks.com/articles/1282/sample-hdfnifi-flow-to-push-tweets-into-solrbanana.html The short version is that it extracts several attributes from each tweet using
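
As a sketch of that extraction step, EvaluateJsonPath can pull tweet fields into flowfile attributes via dynamic properties (the attribute names and JsonPath expressions below are illustrative, not the article's exact set):

    Destination:  flowfile-attribute
    twitter.msg:  $.text
    twitter.user: $.user.screen_name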

Re: datetime argument to the GetSFTP processor

2016-04-21 Thread Ashraf Hasson
Thanks Joe and Pierre, Yep, that was the way to go indeed, and thanks for your confirmation, time and replies. I've moved matching the static part to the ListSFTP regex with File Filter Regex = test.file.\d{8}-\d{6}.csv and matched with Pierre's expression to simplify the route condition.

Re: datetime argument to the GetSFTP processor

2016-04-21 Thread Pierre Villard
Hi Ashraf, I believe you are right, RouteOnAttribute is certainly what you should use. Regarding the expression, it sounds good to me. Depending on the filename format characteristics, maybe you can directly use ${filename:contains( ${now():toNumber():minus(86400000):format('yyyyMMdd')} )}
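
Applied as a dynamic property on RouteOnAttribute, that check might look like this (the property name 'yesterday' is arbitrary):

    yesterday: ${filename:contains( ${now():toNumber():minus(86400000):format('yyyyMMdd')} )}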

Re: datetime argument to the GetSFTP processor

2016-04-21 Thread Joe Witt
Perfect! You got it. On Thu, Apr 21, 2016 at 1:11 PM, Ashraf Hasson wrote: > Hi there, > > Okay, so I should use RouteOnAttribute I think. > > Here's the source filename: test.file.20160420-015931.csv > > I've configured the RouteOnAttribute to route the `filename` to

Re: datetime argument to the GetSFTP processor

2016-04-21 Thread Joe Witt
Ashraf, The flow would be ListSFTP -> RouteOnAttribute -> FetchSFTP. In RouteOnAttribute you'd put a filename filter in place to detect filenames of interest to you. Route things that match to FetchSFTP; things that do not match you can terminate or do whatever you need. RouteOnAttribute is
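
A sketch of such a filter, reusing the filename pattern from elsewhere in this thread (the property name 'matched' is arbitrary, and regex escaping may need adjusting inside the expression language):

    Routing Strategy: Route to Property name
    matched:          ${filename:matches('test\.file\.\d{8}-\d{6}\.csv')}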

Re: datetime argument to the GetSFTP processor

2016-04-21 Thread Ashraf Hasson
Hi there, Okay, so I should use RouteOnAttribute I think. Here's the source filename: test.file.20160420-015931.csv I've configured the RouteOnAttribute to route the `filename` to success when the following is matched: property: filename == value:

Re: datetime argument to the GetSFTP processor

2016-04-21 Thread Ashraf Hasson
Hi James, Thanks for your reply. The source has multiple files generated per day and I'm interested in one of those, so filtering is required I guess. I don't know how to filter based on the filename; I was trying to pipe things like this: ListSFTP -> UpdateAttribute -> FetchSFTP but I'm sure

Re: AvroRuntimeException : Duplicate field name

2016-04-21 Thread Toivo Adams
It seems an ExecuteSQL JUnit test case must be created; then we can investigate the problem. Unfortunately I don't have other ideas (at least at the moment). thanks toivo 2016-04-21 19:26 GMT+03:00 Panos Geo : > No worries, I appreciate your help anyhow. > > I am using

RE: AvroRuntimeException : Duplicate field name

2016-04-21 Thread Panos Geo
No worries, I appreciate your help anyhow. I am using MariaDB, but I get the warning below when I start NiFi, even before triggering the processor to execute the SQL statement. When I do trigger the processor to execute the SQL query, then I see the AvroRuntimeException as the full error...

RE: AvroRuntimeException : Duplicate field name

2016-04-21 Thread Panos Geo
Hello Toivo, Many thanks for your reply! As I indicated in my initial email, using aliases doesn't make any difference. It appears as if they are ignored and I am getting the same error. Any other thoughts? Many thanks, Panos Date: Thu, 21 Apr 2016 18:59:33 +0300 Subject: Re:

Re: AvroRuntimeException : Duplicate field name

2016-04-21 Thread Toivo Adams
Hi, Field names should be unique. Currently, after executing the query, both 'plant.name' and 'area.name' will end up as just the same 'name'. You can use an alias to get unique names, like: SELECT plant.name AS pname, area.area_id, area.name AS aname thanks toivo 2016-04-21 18:51 GMT+03:00 Panos Geo

Re: Lua usage in ExecuteScript Processor

2016-04-21 Thread Madhukar Thota
Made some progress on loading the Lua files from the modules directory. In my case all my Lua files and .so files are in the modules directory. I placed the directory in the NiFi installation folder, e.g. lua_modules/common_log_format.lua. In my script I am calling the Lua script as follows: local clf =
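
A minimal sketch of extending the module search path before the require call, assuming the lua_modules directory sits in NiFi's working directory; note that if ExecuteScript runs Lua on the JVM-based luaj engine, native .so modules may not be loadable at all:

    -- make modules under lua_modules/ visible to require()
    package.path = package.path .. ';lua_modules/?.lua'
    local clf = require('common_log_format')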