Re: A "proxy" question from the irc channel

2015-11-08 Thread Matt Burgess
Would we have more participation with something like a Slack team? Apache Drill has one with #user and #dev channels; it seems to work pretty well and has various integrations with other tools (email, GitHub, Jira, etc.). > On Nov 8, 2015, at 12:41 PM, Tony Kurc

Re: Expression language

2015-11-12 Thread Matt Burgess
Not sure if it would prove useful, but I've started messing around with the Aho-Corasick algorithm in the hope that users could paste in some sample data and get a regex out. If the data is "regular", the user wouldn't need to know an expression language; they would just need a

Re: Nifi service fail to start - Removed custom processor

2015-11-10 Thread Matt Burgess
cussion of this ticket [1]? > > Thanks > Joe > > [1] https://issues.apache.org/jira/browse/NIFI-1052 > >> On Tue, Nov 10, 2015 at 9:18 PM, Matt Burgess <mattyb...@gmail.com> wrote: >> Perhaps the UI could have a placeholder for a missing processor, and

Re: PutElasticsearch Identifier attribute question

2016-06-07 Thread Matt Burgess
stead of {"id":"160889137" causes the issue. > > >> On Tue, Jun 7, 2016 at 6:53 PM, Matt Burgess <mattyb...@gmail.com> wrote: >> Igor, >> >> The "id" field you have is in your content, but PutElasticsearch is >> looking f

Re: PutElasticsearch Identifier attribute question

2016-06-07 Thread Matt Burgess
Igor, The "id" field you have is in your content, but PutElasticsearch is looking for a flow file attribute. This can be fixed by putting an EvaluateJsonPath processor before the PutElasticsearch processor, setting its Destination property to "flowfile-attribute", and adding a dynamic property
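To make the content-versus-attribute distinction concrete, here is a minimal plain-Python sketch (not NiFi's API) of what the suggested EvaluateJsonPath step does: it reads a top-level field from the JSON content and copies it into the attribute map, which is where PutElasticsearch looks up the document ID. The helper names and the attribute name `es.id` are hypothetical.

```python
import json

def evaluate_json_path_to_attribute(content: bytes, attributes: dict,
                                    attr_name: str, field: str) -> dict:
    """Copy a top-level JSON field from the content into an attribute.

    Rough stand-in for EvaluateJsonPath with Destination = flowfile-attribute
    and a dynamic property attr_name -> $.<field>.
    """
    doc = json.loads(content.decode("utf-8"))
    updated = dict(attributes)          # attributes are copied; content is untouched
    updated[attr_name] = str(doc[field])
    return updated

def document_id(attributes: dict, identifier_attribute: str) -> str:
    """The Identifier Attribute property names an attribute, not a JSON field."""
    return attributes[identifier_attribute]

content = b'{"id": "160889137", "text": "hello"}'
attrs = evaluate_json_path_to_attribute(content, {"filename": "tweet.json"}, "es.id", "id")
print(document_id(attrs, "es.id"))  # 160889137
```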

Re: Dependency for SSL

2016-05-25 Thread Matt Burgess
Kumiko, I'm guessing that entry is in your processor's POM. I believe you need the following in your NAR's POM as well: <dependency><groupId>org.apache.nifi</groupId><artifactId>nifi-standard-services-api-nar</artifactId><type>nar</type></dependency> Regards, Matt On Wed, May 25, 2016 at 3:44 PM, Kumiko Yada

Re: Dependency for SSL

2016-05-25 Thread Matt Burgess
If that library has dependencies, you may need to remove the jar so that it brings in the POM, which should get the JAR and its dependencies. Regards, Matt > On May 25, 2016, at 5:38 PM, Kumiko Yada wrote: > > Thank you Bryan and Matt. > > I added the dependency in

Re: Escape * and new line character

2016-06-21 Thread Matt Burgess
Huagen, I agree with Bryan that other processors may be better here. You could use ListFile -> FetchFile, or as Bryan said, you could use GetFile. Regards, Matt On Tue, Jun 21, 2016 at 9:38 AM, Bryan Rosander wrote: > Hi Huagen, > > 1. The ExecuteStreamCommand uses a

Re: IDE-specific setup

2016-06-21 Thread Matt Burgess
Like Bryan, I choose New (Project or Module) from Existing Sources and point at the POM for that directory/project/module; IntelliJ does a good job of getting everything set up. On Tue, Jun 21, 2016 at 5:29 PM, Bryan Bende wrote: > I personally use IntelliJ and it generally

Re: GetHTTP->ExtractText (Regex/User problem?)

2016-06-20 Thread Matt Burgess
Looks like "content" is in smart quotes; try plain quotes instead. On Mon, Jun 20, 2016 at 1:43 PM, Sven Davison wrote: > I had tried that but got a NULL value result. Is there a setting w/in the > extractor that I need to change too? > > -Sven > > Sent from
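Smart quotes break a configuration value because the curly characters U+201C and U+201D are different code points from the ASCII double quote, so text containing them will not match plain-quoted input. The normalizing helper below is a hypothetical Python sketch, not part of NiFi.

```python
SMART_TO_PLAIN = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})

def normalize_quotes(text: str) -> str:
    """Replace typographic (smart) quotes with plain ASCII quotes."""
    return text.translate(SMART_TO_PLAIN)

pasted = "\u201ccontent\u201d"                   # what an editor or mail client may produce
print(pasted == '"content"')                    # False
print(normalize_quotes(pasted) == '"content"')  # True
```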

Re: Problem with EvaluationJsonPath

2016-06-24 Thread Matt Burgess
Anuj, It seems like the value at that path is an array; perhaps add [0] to your JSONPath. Is "0" the expected value of that field? If not, then perhaps the JSONPath itself is incorrect. You could test it with some sample data at http://jsonpath.com/ Regards, Matt On Fri, Jun 24, 2016 at 3:34
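In plain Python terms, a path that selects an array differs from one that selects its first element as follows. The sample document and field names are invented for illustration; jsonpath.com remains the better place to test the real expression.

```python
import json

doc = json.loads('{"result": {"values": [0, 5, 7]}}')

selected = doc["result"]["values"]   # like $.result.values    -> the whole array
first = doc["result"]["values"][0]   # like $.result.values[0] -> one element

print(type(selected).__name__)  # list
print(first)                    # 0
```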

Re: Auto installation of template

2016-02-26 Thread Matt Burgess
Michael, I don't think you can put a template into conf/templates and have it be picked up; I tried a couple of things, and it looks like the system manages what is put in there. For the REST API, the documentation for uploading a template is missing because there are two ways to use POST and

Re: Using Apache Nifi and Tika to extract content from pdf

2016-02-20 Thread Matt Burgess
I have a blog post (http://funnifi.blogspot.com/2016/02/executescript-extract-text-metadata.html?m=1) on how to do this with NiFi using a Groovy script in the ExecuteScript processor (new in 0.5.0), with PDFBox instead of Tika. Jython is also supported but can't yet use Java libraries (it uses

Re: Using Apache Nifi and Tika to extract content from pdf

2016-02-20 Thread Matt Burgess
can write up an improvement Jira with the initial findings. Regards, Matt > On Feb 20, 2016, at 2:18 PM, Russell Whitaker <russell.whita...@gmail.com> > wrote: > > Don't forget Clojure as well. > > Russell Whitaker > Sent from my iPhone > >> On Feb 20, 20

Re: Using Apache Nifi and Tika to extract content from pdf

2016-02-20 Thread Matt Burgess
ession.write(flowFile, {inputStream, outputStream -> > doc = PDDocument.load(inputStream) > info = doc.getDocumentInformation() > s.writeText(doc, new OutputStreamWriter(outputStream)) > } as StreamCallback > ) > > Thanks for your help. > > BR >

Re: Using Apache Nifi and Tika to extract content from pdf

2016-02-21 Thread Matt Burgess
s there a documentation? Or where did I find some infos? > > Sorry for all my questions. > > BR and thanks. > > Ralf > > > Am 20.02.2016 um 22:27 schrieb Matt Burgess <mattyb...@gmail.com>: > > I will update the blog to make these more clear. I used PDFBox 1.8.10 so >

Re: Generate URL based on different conditions

2016-02-16 Thread Matt Burgess
Here's a template (as a Gist) that uses Joe's approach of RouteOnAttribute followed by UpdateAttribute to generate URLs for the use case you described: https://gist.github.com/mattyb149/8fd87efa1338a70c On Tue, Feb 16, 2016 at 9:51 PM, Joe Witt wrote: > Jeff, > > For each of the

Re: How to add python modules ?

2016-03-30 Thread Matt Burgess
You are returning self.d from process(), which is a void method; it needs to return None. > On Mar 30, 2016, at 5:00 PM, Madhukar Thota wrote: > > Matt, > > I tired the following code but i am getting the following error. Can you help > me where i
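The point about return values can be shown without NiFi: a callback whose Java interface declares a `void process(...)` should keep any result on the object and return `None`. This is a plain-Python sketch; in a real Jython ExecuteScript the class would also implement NiFi's StreamCallback and receive Java streams, which are replaced here by `io.BytesIO`. The class name and its uppercasing behavior are hypothetical; `self.d` mirrors the question.

```python
import io
import json

class UppercaseCallback:
    """Mimics a StreamCallback: read input, write output, return nothing."""

    def __init__(self):
        self.d = None  # anything the caller needs later is stored on the object

    def process(self, inputStream, outputStream):
        self.d = json.loads(inputStream.read().decode("utf-8"))
        outputStream.write(json.dumps(self.d).upper().encode("utf-8"))
        # No "return self.d": the Java method is void, so Python returns None.

cb = UppercaseCallback()
out = io.BytesIO()
result = cb.process(io.BytesIO(b'{"k": "v"}'), out)
print(result)          # None
print(out.getvalue())  # b'{"K": "V"}'
```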

Re: Common Attributes (FileSize)

2016-03-28 Thread Matt Burgess
Radhakrishna, The "fileSize" attribute should be available for every flow file. Can you describe how you are finding the default set of attributes and which ones you are finding? To test, I generated a flow file with the text "Hello" in it, then sent that to a LogAttribute processor, and got the

Re: ExecuteSQL to elasticsearch

2016-04-07 Thread Matt Burgess
tName": "lab1", "CounterName": > "AvgDiskSecTransfer", "InstanceName": "D:", "MetricValue": > 2.3995189985726E-4}, > {"DateTime": "2016-04-07 17:22:00.0", "HostName": "lab1",

Re: ExecuteSQL to elasticsearch

2016-04-07 Thread Matt Burgess
Can you provide sample JSON output from your ConvertAvroToJSON processor? It could help identify the location of any mapping/parser exceptions. Thanks, Matt On Thu, Apr 7, 2016 at 1:31 PM, Madhukar Thota wrote: > I am able to construct the dataflow with the

Re: DetectDuplicate : java.net.ConnectException

2016-03-19 Thread Matt Burgess
Arathi, You'll need to add another Controller Service, one of type DistributedMapCacheServer, set up on port 4557 (to match your DistributedMapCacheClientService), and enable/start it. Then you should be able to connect successfully. Regards, Matt On Thu, Mar 17, 2016 at 4:15 PM, Arathi Maddula

Re: CSV/delimited to Parquet conversion via Nifi

2016-03-21 Thread Matt Burgess
Edmon, NIFI-1663 [1] was created to add ORC support to NiFi. If you have a target dataset that has been created in Parquet format, I think you can use ConvertCSVToAvro followed by StoreInKiteDataset to get flow files in Parquet format into Hive, HDFS, etc. Others in the community know a lot more about

Re: what is the PutElasticsearch Identifier Attribute for?

2016-03-23 Thread Matt Burgess
The Identifier Attribute property should contain the name of a flow file attribute, which in turn contains the ID of the document to be put into Elasticsearch. Unfortunately, it is a required property, so having ES auto-generate the ID is not yet supported [1]. If you don't care what the ID is but need

Re: How to add python modules ?

2016-03-24 Thread Matt Burgess
:34 AM, Madhukar Thota <madhukar.th...@gmail.com > > wrote: > >> Hi Matt, >> >> Thank you for the input. I updated my config as you suggested and it >> worked like charm and also big thankyou for nice article. i used your >> article as reference when i am st

Re: Help on creating that flow that requires processing attributes in a flow content but need to preserve the original flow content

2016-03-21 Thread Matt Burgess
One way (in NiFi 0.5.0+) is to use the ExecuteScript processor, which gives you full control over the session and flow file(s). For example, if you had JSON such as {"data": {"myKey": "myValue"}} in your "kafka.key" attribute, you could use the following Groovy script to parse out the value of
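The Groovy script itself is not included in this excerpt. As a rough stand-in, here is the same parsing step in Python, treating the flow file's attributes as a plain dictionary. The `kafka.key` name and sample JSON come from the message; the function name and the target attribute `myKey` are assumptions.

```python
import json

def extract_key_value(attributes: dict, source_attr: str = "kafka.key") -> dict:
    """Parse JSON held in an attribute and copy data.myKey into a new attribute.

    The flow file content is never read or modified; only attributes change.
    """
    parsed = json.loads(attributes[source_attr])
    updated = dict(attributes)
    updated["myKey"] = parsed["data"]["myKey"]
    return updated

attrs = {"kafka.key": '{"data": {"myKey": "myValue"}}'}
print(extract_key_value(attrs)["myKey"])  # myValue
```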

Re: NiFi: command-line interface ?

2016-03-20 Thread Matt Burgess
a020-8bd3c82d1692] is not in a valid state. > Returning Conflict response. > > Not sure why the state is "not valid". The GetFile processor seems fine to > me. All the processors in the flow are currently stopped. GetFile has input > files. I would assume this should

Re: NiFi: command-line interface ?

2016-03-20 Thread Matt Burgess
Dmitry, With regard to nifi-client (I am the author), that exception occurs when the flow has been changed externally and the shell has not recognized it. What is the result of the command nifi.currentVersion? If it is -1, then I recommend restarting the shell. It should be a

Re: NiFi: command-line interface ?

2016-03-20 Thread Matt Burgess
. I've got Gradle 2.3 whose version > option's output states Groovy at 2.3.9. > - Dmitry > > > >> On Sun, Mar 20, 2016 at 3:23 PM, Matt Burgess <mattyb...@gmail.com> wrote: >> Dmitry, >> >> With regards to nifi-client (I am the author), that e

Re: How to add python modules ?

2016-03-23 Thread Matt Burgess
Madhukar, Glad to hear you found a solution; I was just replying when your email came in. Although you have chosen "python" as the script engine in ExecuteScript, it is actually Jython that interprets the scripts, not your installed version of Python. The first line (shebang) is

Re: REST Interface

2016-03-26 Thread Matt Burgess
The REST API is at /nifi-api, not /nifi. It is documented somewhere, but I am guessing we can do more to announce that in the relevant docs, thanks! Where do you think it would be helpful to add such references? Thanks, Matt > On Mar 26, 2016, at 2:47 PM, Uwe Geercken wrote: >

Re: ExecuteSQL and NiFi 0.5.1 - Error org.apache.avro.SchemaParseException: Empty name

2016-03-05 Thread Matt Burgess
That's on me: that commit went into 0.5.0 and looks like an inverted-logic error. I thought I had unit tested it, but I guess not :( > On Mar 5, 2016, at 6:57 PM, Bryan Bende wrote: > > I think this a legitimate bug that was introduced in 0.5.0. > > I

Re: ExecuteSQL Extract database tables multiple times.

2016-03-04 Thread Matt Burgess
Currently ExecuteSQL puts all available rows into a single flow file. There is a Jira case (https://issues.apache.org/jira/browse/NIFI-1251) to allow the user to break up the result set into flow files containing a specified number of records. I'm not sure why you get 26 flow files, although
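Breaking a result set into flow files of a fixed record count, as NIFI-1251 proposes, is essentially batching. A minimal Python sketch, with rows as tuples and each yielded list standing in for one outgoing flow file; the function name and batch size are illustrative, not the eventual processor property.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

def batches(rows: Iterable[Tuple], max_rows: int) -> Iterator[List[Tuple]]:
    """Yield lists of at most max_rows rows; each list would become one flow file."""
    if max_rows < 1:
        raise ValueError("max_rows must be positive")
    it = iter(rows)
    while True:
        batch = list(islice(it, max_rows))
        if not batch:
            return
        yield batch

rows = [(i, f"name{i}") for i in range(7)]
print([len(b) for b in batches(rows, 3)])  # [3, 3, 1]
```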

Re: javascript executescript processor

2016-03-02 Thread Matt Burgess
>> Hi Matt, >> >> That's exactly what I'm looking for - much appreciated ! >> >> Thanks, >> Mike >> >> On Tue, 1 Mar 2016 at 18:13, Matt Burgess <mattyb...@gmail.com> wrote: >> >>> Mike, >>> >>> I have a blog

Re: javascript executescript processor

2016-03-02 Thread Matt Burgess
a/com/crossbusiness/nifi/processors/NiFiUtils.java > > Sent from my iPhone > >> On Mar 2, 2016, at 1:40 PM, Matt Burgess <mattyb...@gmail.com> wrote: >> >> Ask and ye shall receive ;) I realize most of my examples are in Groovy so >> it was a goo

Re: ExecuteSQL Extract database tables multiple times.

2016-03-07 Thread Matt Burgess
b3b-a8b9-a77d0be27273] >>> ExecuteSQL[id=d32x32d7-c477-4b3b-a8b9-a77d0be27273] failed to process >>> due to org.apache.avro.SchemaParseException: Empty name; rolling back >>> session: org.apache.avro.SchemaParseException: Empty name >>> >>> 10:30:02 CET E

Re: javascript executescript processor

2016-03-07 Thread Matt Burgess
> > On 7 March 2016 at 17:03, Mike Harding <mikeyhard...@gmail.com> wrote: > >> aaa ok cool. Given that org.apache.nifi.processor.io.StreamCallback is >> an interface do I need to include the underlying classes? >> >> On 7 March 2016 at 16:29, Matt Burge

Re: EvaluateJsonPath and Json Field Name Starting with @ as the First Character

2016-03-29 Thread Matt Burgess
Hong, I was able to use EvaluateJsonPath with a dynamic property eventClass set to $.event.@class, and the attribute had the correct value (see output from LogAttribute below): -- Standard FlowFile Attributes Key: 'entryDate' Value: 'Tue Mar 29 08:41:36 EDT 2016' Key:
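A leading `@` in a key needs no special handling when a path is walked segment by segment. The dotted-path walker below is a hypothetical Python helper, not the Jayway JsonPath library that EvaluateJsonPath uses, and the sample document is invented.

```python
import json

def walk(doc, path: str):
    """Resolve a simple '$.a.b.c' path (dot segments only, no filters or wildcards)."""
    if not path.startswith("$."):
        raise ValueError("path must start with '$.'")
    node = doc
    for segment in path[2:].split("."):
        node = node[segment]  # '@class' is an ordinary dictionary key here
    return node

doc = json.loads('{"event": {"@class": "com.example.LoginEvent", "user": "hong"}}')
print(walk(doc, "$.event.@class"))  # com.example.LoginEvent
```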

Re: String conversion to Int, float double

2016-03-28 Thread Matt Burgess
Sounds good to me. I presume the processor would still put all attributes in the JSON content, but would use any dynamic properties solely for type coercion? Anything not listed would be treated like a String as it is now (to preserve current behavior). We'd need to document the possible
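The behavior presumed here (dynamic properties used only to coerce the listed attributes, everything else left as a String) could look roughly like this Python sketch. The type names and function signature are assumptions for illustration, not a design agreed on in the thread.

```python
import json

# Hypothetical mapping from a dynamic-property value to a converter.
CONVERTERS = {
    "int": int,
    "float": float,
    "double": float,
    "boolean": lambda s: s.strip().lower() == "true",
}

def attributes_to_json(attributes: dict, coercions: dict) -> str:
    """Serialize all attributes; coerce only those named in `coercions`."""
    out = {}
    for name, value in attributes.items():
        type_name = coercions.get(name)
        out[name] = CONVERTERS[type_name](value) if type_name else value  # default: String
    return json.dumps(out, sort_keys=True)

attrs = {"count": "42", "ratio": "0.5", "host": "lab1"}
print(attributes_to_json(attrs, {"count": "int", "ratio": "double"}))
# {"count": 42, "host": "lab1", "ratio": 0.5}
```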

Re: Lua usage in ExecuteScript Processor

2016-04-20 Thread Matt Burgess
Madhu, I know very little about Lua, so I haven't tried making a Lua version of my JSON-to-JSON scripts/blogs (funnifi.blogspot.com), but here's something that works to get you started. The following Luaj script creates a flow file, writes to it, adds an attribute, then transfers it to success.

Re: Nifi parsing examples

2016-04-27 Thread Matt Burgess
If you can represent the expected string format as a regular expression, you can use the replaceAll() function [1] with back-references: ${url:replaceAll('(http://[a-zA-Z0-9]+:)[a-zA-Z0-9]+(@.*)','$1x$2')} original: http://username:p...@host.com after: http://username:xx...@host.com Note I
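The same back-reference idea can be tried outside NiFi. This Python sketch reuses the regular expression from the message unchanged, with `\g<1>` and `\g<2>` playing the role of the expression language's `$1` and `$2`; the sample URL and function name are made up.

```python
import re

# Group 1 keeps "http://user:", group 2 keeps "@host..."; the password in between is dropped.
PATTERN = re.compile(r"(http://[a-zA-Z0-9]+:)[a-zA-Z0-9]+(@.*)")

def mask_password(url: str) -> str:
    """Replace the password portion of http://user:password@host with a single 'x'."""
    return PATTERN.sub(r"\g<1>x\g<2>", url)

print(mask_password("http://username:secret@host.com"))  # http://username:x@host.com
```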

Re: Nifi parsing examples

2016-04-27 Thread Matt Burgess
Sorry that was just for example #2 :) > On Apr 27, 2016, at 3:59 PM, Matt Burgess <mattyb...@gmail.com> wrote: > > If you can represent the expected string format as a regular > expression, you can use the replaceAll() function [1] with > back-references: > > ${url:r

Re: Is it possible to call a HIVE table from a ExecuteScript Processor?

2016-04-26 Thread Matt Burgess
Hive doesn't work with ExecuteSQL, as its JDBC driver does not support all the JDBC API calls made by ExecuteSQL / PutSQL. However, I am working on a Hive NAR to include ExecuteHiveQL and PutHiveQL processors (https://issues.apache.org/jira/browse/NIFI-981); there is a prototype pull request on

Re: ReplaceText processor configuration help

2016-04-26 Thread Matt Burgess
ecame broken. > > Also looking for some alterbatives like using Groovy for JSON-to-JSON > conversion. But not sure how StandardCharsets.UTF_8 will work with > multi-byte languages. > > > On Tue, Apr 26, 2016 at 12:11 PM, Matt Burgess <mattyb...@gmail.com> wrote: >> >

Re: Doing development on nifi

2016-04-28 Thread Matt Burgess
Stéphane, Welcome to NiFi, glad to have you aboard! May I ask what version you are using? I believe that since 0.6.0 (if not earlier), you can view the items in a queued connection. So for your example, you can have a GetHTTP into a SplitJson, but start only the GetHTTP, not the SplitJson. You will see any

Re: ReplaceText processor configuration help

2016-04-26 Thread Matt Burgess
Yes, I think you'll be better off with Aldrin's suggestion of ReplaceText. Then you can put the value of the attribute(s) directly into the content. For example, if you have two attributes "entities" and "users", and you want a JSON doc with those two objects inside, you can use ReplaceText with
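Python's `string.Template` happens to use the same `${name}` placeholder syntax as NiFi's expression language, so it is a convenient stand-in for what such a ReplaceText replacement value would produce. The attribute values are invented; in NiFi the substitution is done by the expression language, not by this code.

```python
import json
from string import Template

# Replacement value as it might be typed into ReplaceText (placeholders are attribute names).
replacement = Template('{"entities": ${entities}, "users": ${users}}')

attributes = {
    "entities": '{"hashtags": ["nifi"]}',
    "users": '{"name": "sven"}',
}

content = replacement.substitute(attributes)
print(json.loads(content)["users"]["name"])  # sven
```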

Re: Nifi into Titan graph

2016-05-22 Thread Matt Burgess
Pat, I did a very deep dive into TinkerPop3 this weekend. I was looking for a very generic solution (to involve GremlinServer at the least, but hopefully using RemoteGraph/RemoteConnection for any server that can accept graph traversals, not a Titan one in particular). Also I wanted to abstract

Re: EvaluateXPath and xml namespace

2016-05-22 Thread Matt Burgess
ww.centricconsulting.com | @Centric <https://twitter.com/centric> > > On Sat, May 21, 2016 at 9:08 PM, Matt Burgess <mattyb...@gmail.com> wrote: > >> Hong, >> >> The use of a default namespace makes the XPath more tricky, as the >> namespace technically

Re: EvaluateXPath and xml namespace

2016-05-21 Thread Matt Burgess
Hong, The use of a default namespace makes the XPath trickier, as the namespace technically exists as a prefix although it is not visible in the document. As an example, I used this sample content: http://cp.com/rules/client;> Hello In order to get the value "Hello", I had to use
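The sample document here did not survive archiving intact, so the sketch below uses an invented document with the same default namespace. It shows the idea with Python's `xml.etree.ElementTree`, which supports only a limited XPath subset: an element in a default namespace is matched by binding that namespace to a prefix of your choosing (here `c`, arbitrary) and using the prefix in the path.

```python
import xml.etree.ElementTree as ET

# Invented sample; only the namespace URI comes from the message.
xml = '<client xmlns="http://cp.com/rules/client"><name>Hello</name></client>'
root = ET.fromstring(xml)

print(root.find("name"))  # None: an unprefixed name does not match a namespaced element

ns = {"c": "http://cp.com/rules/client"}
print(root.find("c:name", ns).text)  # Hello
```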

Re: SelectHiveQL HiveConnectionPool issues

2016-05-09 Thread Matt Burgess
Your URL has a scheme of "mysql"; try replacing it with "hive2", and maybe also set the port explicitly: jdbc:hive2://:1/default If that doesn't work, can you check whether there is an error/stack trace in logs/nifi-app.log? Regards, Matt On Mon, May 9, 2016 at 12:04 PM, Mike Harding

Re: How to extract multiple json properties/fields into processor properties?

2016-05-11 Thread Matt Burgess
Are you using UpdateAttribute to fill in HTTP header attributes? In any case, I think InvokeHTTP will be a solution. Regards, Matt > On May 11, 2016, at 6:15 PM, Keith Lim wrote: > > Thanks Brian, that works. I have a follow up question. I want to use the > update

Re: SplitJson configuration question

2016-05-11 Thread Matt Burgess
I believe $.* should work to split at the root. > On May 11, 2016, at 5:23 PM, Igor Kravzov wrote: > > Looks like am missing something. How to configure SplitJson to split array > like bellow to individual JSON files. Basically split on "root" of array. > > [{ >

Re: JSON Schema

2016-05-17 Thread Matt Burgess
Madhu, This is a good idea for a processor (a ValidateJson, like the existing ValidateXml processor); I've written it up in Jira [1]. In the meantime, here's a Groovy script you could use in ExecuteScript; you just need to download the two JAR dependencies ([2] and [3]) and add them to your Module

Re: QueryDatabaseTable errors

2016-05-12 Thread Matt Burgess
We can probably do better with the error displayed in the bulletin, maybe by propagating the message from the cause to the RuntimeException or something. > On May 12, 2016, at 6:31 PM, Thad Guidry wrote: > > The odbc6.jar is in the classpath already ( dropped it into

Re: PutElasticsearch error

2016-05-06 Thread Matt Burgess
Pierre is correct. > On May 6, 2016, at 5:20 PM, Pierre Villard > wrote: > > Hi Igor, > > I believe ES processor uses port 9300 (transport port) and not 9200 port > (http port) > > Pierre. > > 2016-05-06 23:16 GMT+02:00 Igor Kravzov

Re: ExecuteScript Processor Performance

2016-05-02 Thread Matt Burgess
Madhu, In addition to Joe's suggestions: ExecuteScript currently allows only one task at a time, which is a pretty bad bottleneck if you are dealing with lots of throughput. However, I have written up a Jira [1] for this and issued a PR [2] to fix it; feel free to try that out and/or

Re: Lua usage in ExecuteScript Processor

2016-05-04 Thread Matt Burgess
>> >>> >>> luajava.LuaState.openLibs() >>> luajava.LuaState.LdoFile("common_log_format.lua"); >>> >>> >>> On Wed, Apr 20, 2016 at 4:29 PM, Madhukar Thota >>> <madhukar.th...@gmail.com> wrote: >>>> >

Re: Is it possible to call a HIVE table from a ExecuteScript Processor?

2016-04-28 Thread Matt Burgess
ou have an idea when you plan to issue the PR for > this? > > > Cheers, > Mike > > On Tue, 26 Apr 2016 at 14:47, Matt Burgess <mattyb...@gmail.com> wrote: >> >> Hive doesn't work with ExecuteSQL as its JDBC driver does not support >> all the JDBC API

Re: JsonSplit Question/Help

2016-07-26 Thread Matt Burgess
Sven, You can use the SplitJson processor with a JSONPath value of $.twitter.hashtags; it will create a new flow file for each hashtag. Then you can use EvaluateJsonPath to get the text value from each of the flow files. Regards, Matt On Tue, Jul 26, 2016 at 5:09 PM, Sven Davison
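A Python sketch of the two steps, with each list element standing in for one split flow file. The tweet structure (`twitter.hashtags` as an array of objects with a `text` field) is assumed from the message, not taken from a real payload.

```python
import json

tweet = json.loads('{"twitter": {"hashtags": [{"text": "nifi"}, {"text": "apache"}]}}')

# SplitJson with $.twitter.hashtags: one new "flow file" per array element.
splits = [json.dumps(tag) for tag in tweet["twitter"]["hashtags"]]

# EvaluateJsonPath with $.text on each split.
texts = [json.loads(s)["text"] for s in splits]
print(texts)  # ['nifi', 'apache']
```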

Re: export from Teradata

2016-07-14 Thread Matt Burgess
Dima, There was a discussion a little while ago on how to get the SQL processors working with Teradata (http://mail-archives.apache.org/mod_mbox/nifi-users/201605.mbox/%3CCAEXY4srXZkb2pMGiOFGs%3DrSc_mHCFx%2BvjW32RjPhz_K1pMr%2B%2Bg%40mail.gmail.com%3E). It looks like it involves making a fat JAR to

Re: MergeContent with varying number of entries in bins.

2016-08-10 Thread Matt Burgess
Michael, There are a handful of examples of ExecuteScript using JavaScript and/or Jython on my blog (http://funnifi.blogspot.com) and in other locations: JavaScript: http://funnifi.blogspot.com/2016/03/executescript-json-to-json-revisited.html

Re: ExecuteSQL question

2016-08-03 Thread Matt Burgess
per docs)? And > then finally write text file back to file system to be picked up next time? > Thanks > Conrad > > On 03/08/2016, 14:02, "Matt Burgess" <mattyb...@gmail.com> wrote: > >Conrad, > >Is it possible to add a view (materialized or not) to the

Re: ExecuteSQL question

2016-08-03 Thread Matt Burgess
Conrad, Is it possible to add a view (materialized or not) to the RDBMS? That view could take care of the denormalization, and then QueryDatabaseTable could point at the view. The DB would take care of the push-down filters, which is functionally like having a QueryDatabaseTable for each table

Re: Json routing

2016-07-07 Thread Matt Burgess
It will be fixed in 0.7.0 [1]. Also, you could use InvokeScriptedProcessor to replace both the ExecuteScript and RouteOnAttribute, since the scripted processor can define the relationships and provide the logic to extract the arbitrary JSON keys. Regards, Matt [1]

Re: Json routing

2016-07-07 Thread Matt Burgess
ly.. > >> On Thu, Jul 7, 2016 at 11:45 AM, Matt Burgess <mattyb...@gmail.com> wrote: >> It will be fixed in 0.7.0 [1]. Also you could use >> InvokeScriptedProcessor to replace both the ExecuteScript and >> RouteOnAttribute, since the scripted processor ca

Re: PutCassandraQL failing on ISO-8601-formatted timestamp

2016-07-01 Thread Matt Burgess
rgs.3.value' > Value: '6.7' > Key: 'j.id' > Value: 'temp3' > Key: 'j.ts' > Value: '2016-06-30T20:04:36Z' > Key: 'j.value' > Value: '6.7' > -- > INSERT INTO test.test2 (sensor, ts, value) VALUES(?,?,

Re: ExecuteProcess (fetch output)

2016-07-03 Thread Matt Burgess
A single dot will match a single character, so I think you'll need ".*". Also, ExtractText might be looking for a capture group, so you may need "(.*)". If that doesn't handle multiple lines, I think there's a processor property for that. Sorry, I'm not at my computer, so I can't confirm. > On Jul 3, 2016,
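These are standard regular-expression semantics, and Java and Python agree on them. A small Python demonstration: `.` consumes one character, `(.*)` captures a run, and `.` stops at a line break unless DOTALL mode is on. ExtractText has a corresponding property, which I believe is labeled "Enable DOTALL Mode"; check the processor documentation for the exact name. The sample output string is invented.

```python
import re

output = "line one\nline two"

single = re.match(r".", output).group(0)         # one character: l
greedy = re.match(r"(.*)", output).group(1)      # "line one": '.' stops at the newline
dotall = re.match(r"(?s)(.*)", output).group(1)  # whole text, newline included

print(single, greedy, repr(dotall), sep=" | ")
```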

Re: v0.* QueryDatabaseTable vs v1 GenerateTableFetch

2016-08-15 Thread Matt Burgess
Peter, Another difference between the two (besides the paging) is that QueryDatabaseTable executes SQL and GenerateTableFetch generates SQL. With the paging capability (which with Remote Process Groups enables distributed fetch a la Sqoop), you're likely correct that GTF will replace /
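To illustrate "generates SQL" versus "executes SQL", here is a Python sketch that only produces paged query strings, the kind of output that could then be distributed and executed elsewhere. The table, column, page size and the `ORDER BY ... LIMIT ... OFFSET` dialect are assumptions; the SQL GenerateTableFetch actually emits depends on its configured database type.

```python
from typing import List

def generate_page_queries(table: str, order_column: str,
                          row_count: int, page_size: int) -> List[str]:
    """Return one SELECT per page; nothing is executed here."""
    queries = []
    for offset in range(0, row_count, page_size):
        queries.append(
            f"SELECT * FROM {table} ORDER BY {order_column} "
            f"LIMIT {page_size} OFFSET {offset}"
        )
    return queries

for q in generate_page_queries("orders", "id", row_count=25, page_size=10):
    print(q)
# SELECT * FROM orders ORDER BY id LIMIT 10 OFFSET 0
# SELECT * FROM orders ORDER BY id LIMIT 10 OFFSET 10
# SELECT * FROM orders ORDER BY id LIMIT 10 OFFSET 20
```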

Re: v0.* QueryDatabaseTable vs v1 GenerateTableFetch

2016-08-15 Thread Matt Burgess
Oops, sorry, I had replied before I saw this :) > On Aug 15, 2016, at 11:15 PM, Peter Wicks (pwicks) wrote: > > Oh, disregard :) I misread GenerateTableFetch as being an actual data fetch > vs a query builder. > > From: Peter Wicks (pwicks) > Sent: Monday, August 15, 2016

Re: Writing back through a python stream callback when the flowfile content is a mix of character and binary

2017-02-02 Thread Matt Burgess
James, If you'd rather work with the inputStream as bytes, you don't need the IOUtils.toString() call, and I'm not sure what a UTF-8 charset would do to your mixed data. You can wrap any of the *InputStream decorators around the inputStream object, such as DataInputStream [1] to read various
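For comparison, reading a mixed text/binary layout without applying a charset to the whole stream looks like this in Python, using `struct` with big-endian (`>`) formats, the same byte order `java.io.DataInputStream` reads. The record layout (a 2-byte length, a UTF-8 header, then a raw binary payload) and the helper name are made-up examples, not the actual format from the thread.

```python
import io
import struct

def read_record(stream: io.BufferedIOBase):
    """Read a hypothetical record: >H header length, UTF-8 header, rest as raw bytes."""
    (header_len,) = struct.unpack(">H", stream.read(2))  # 2-byte big-endian unsigned
    header = stream.read(header_len).decode("utf-8")    # only this slice is text
    payload = stream.read()                             # binary part stays bytes
    return header, payload

data = struct.pack(">H", 5) + b"hello" + b"\x00\xff\x10"
header, payload = read_record(io.BytesIO(data))
print(header, payload)  # hello b'\x00\xff\x10'
```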

Re: Validating an array of objects using ConvertJSONToAvro

2017-02-03 Thread Matt Burgess
Bas, Sorry for the late reply; I should've mentioned sooner that I am looking into this issue. From your description it seems like ConvertJSONToAvro should be able to handle this kind of thing; if I can't find a schema that fits and instead confirm it is a bug/improvement, I will write up a Jira

Re: Writing back through a python stream callback when the flowfile content is a mix of character and binary

2017-02-03 Thread Matt Burgess
James, I haven't had a chance to dig into this yet, but one thing I noticed about your script was an issue identified by Bryan Rosander (NiFi committer and all-around good guy :) as the probable cause of the TypeError, namely the calling of bytearray() after encode() (the latter of which already

Re: send contents of web page to a remote nifi instance

2017-02-08 Thread Matt Burgess
Mohammed, HandleHttpRequest [1] allows you to specify the listening port as well as Allowed Paths. Using the hostname/IP of the NiFi instance, along with the Listening Port and Allowed Paths, creates an endpoint to which you can issue HTTP commands (GET, PUT, POST -- all can be allowed or denied

Re: Flowfile handling in C# is possible or not?

2017-02-08 Thread Matt Burgess
Prabhu, There are a couple of ways I can think of for NiFi to communicate with an external application: 1) The InvokeHTTP processor [1] can send the flow file content as the payload and any number of flow file attributes as HTTP headers (you can specify a regular expression for which

Re: Scientific Notation conversion?

2017-01-26 Thread Matt Burgess
Sven, Are your values Strings or numbers? Meaning does the JSON look like: { "a": "2.1234567891E10" } or { "a" : 2.1234567891E10 } If the latter, would the output field ("a" or "new_a" or whatever) have to remain a number, or is a String ok? I think most applications/libraries will default
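If a plain (non-scientific) String is acceptable for the output field, the conversion itself is simple. A Python sketch using `decimal.Decimal` to render either form without an exponent; the helper name is hypothetical. Note that a JSON number has already been parsed as a float before the conversion, hence the trailing `.0` in the second case.

```python
import json
from decimal import Decimal

def to_plain(value) -> str:
    """Render a number or numeric string without scientific notation."""
    return format(Decimal(str(value)), "f")

doc_string = json.loads('{"a": "2.1234567891E10"}')
doc_number = json.loads('{"a": 2.1234567891E10}')

print(to_plain(doc_string["a"]))  # 21234567891
print(to_plain(doc_number["a"]))  # 21234567891.0
```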

Re: Data extraction for 100 columns is possible in NiFi?

2017-01-30 Thread Matt Burgess
Prabhu, I agree with Mark; if you want to use ExecuteScript for this, I have an example of splitting fields (using a bar | delimiter, but you can change to comma) [1]. If you have quoted values that can contain commas, then like Mark said you may want to look at writing a custom processor, or
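On the point about quoted values: a real CSV parser handles commas inside quotes, which a naive split does not. A Python sketch of the difference, with an invented sample line:

```python
import csv
import io

text = 'id,name,comment\n1,"Smith, John","said ""hi"""\n'

naive = text.splitlines()[1].split(",")
parsed = list(csv.reader(io.StringIO(text)))[1]

print(len(naive))  # 4: the quoted comma was treated as a delimiter
print(parsed)      # ['1', 'Smith, John', 'said "hi"']
```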

Re: Execute script and python

2017-02-21 Thread Matt Burgess
org> wrote: > Hey Brian, > > One good resource around the ExecuteScript processor is Matt Burgess' blog > [1]. Matt wrote the ExecuteScript processor and has a bunch of how-to guides > there. > > [1] http://funnifi.blogspot.com/ > > Thanks, > Bryan > > On Tue,

Re: RemoveDistributedMapCache

2017-02-13 Thread Matt Burgess
tOfTablesToSyncronize -> DetectDuplicte (tableName, with no age > Off) ->CreteTableIfNotExists -> IncrementalLoadData –> > RemoveDistributedMapCache (tableName) > > > > Unfortunately there isn’t the processor RemoveDistributedMapCache, I could > handle this, thanks to Matt B

Re: RemoveDistributedMapCache

2017-02-13 Thread Matt Burgess
t > process the same table at the same time, what I wish achieve is a > synchronized process for each table. > > Regards > Carlos > > -Original Message- > From: Matt Burgess [mailto:mattyb...@apache.org] > Sent: segunda-feira, 13 de Fevereiro de 2017 18:2

Re: Query related to ExecuteScript

2016-08-18 Thread Matt Burgess
,759 WARN [NiFi Web Server-111] >> o.e.jetty.util.thread.QueuedThreadPool Unexpected thread death: >> org.eclipse.jetty.util.thread.QueuedThreadPool$3@54b057d5 in NiFi Web >> Server{STARTED,8<=13<=200,i=4,q=0} >> 2016-08-18 20:49:42,759 ERROR [NiFi Web Server-111] org.apach

Re: adding dependencies like jdbc drivers to the build

2016-08-22 Thread Matt Burgess
All, I took a shot at adding the ability to specify multiple URLs, files, and folders to the DBCPConnectionPool configuration (NIFI-2604). The branch is here if you'd like to build and try: https://github.com/mattyb149/nifi/tree/NIFI-2604 The property name, description, etc. has changed, which

Re: Query related to ExecuteScript

2016-08-17 Thread Matt Burgess
If you need an input flow file, you're probably better off with ExecuteStreamCommand than ExecuteScript for this use case. ExecuteStreamCommand is much like ExecuteProcess, but it accepts input flow files. Regards, Matt > On Aug 17, 2016, at 6:49 PM, koustav choudhuri
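Conceptually, ExecuteStreamCommand pipes the incoming flow file's content to the command's standard input and captures standard output as the outgoing content. A rough Python sketch of that contract, using the current Python interpreter as the external command so it runs anywhere; the helper name and the uppercasing one-liner are placeholders.

```python
import subprocess
import sys

def run_with_content(command, content: bytes) -> bytes:
    """Feed `content` to the command's stdin and return its stdout."""
    completed = subprocess.run(command, input=content, capture_output=True, check=True)
    return completed.stdout

cmd = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
print(run_with_content(cmd, b"hello nifi"))  # b'HELLO NIFI'
```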

Re: Re: Re: new Nifi Processors

2017-03-02 Thread Matt Burgess
Subject: Re: Re: new Nifi Processors > > Basically the GPL license puts restrictions on how one can distribute in > practical terms. Meaning your work may live under GPL license as long as > it's not part of the official package. End users will have to download your > NAR themselve

Re: new Nifi Processors

2017-02-28 Thread Matt Burgess
Uwe G has made his processors available (thank you!) via his own repo rather than the official Apache NiFi repo; this may be directly related to your point about licensing. Having said that, he is of course at liberty to license those separate processors as he sees fit (assuming it is also in accordance

Re: Need to read a small local file into a flow file property

2016-08-24 Thread Matt Burgess
Chris, Are you looking to have a flow file that also has its own content as an attribute? With EvaluateJsonPath, are you taking in the entire document? If so, you could use ExtractText with a regex that captures all text and puts it in an attribute; I believe the content of the flow file is

Re: Processor to enrich attribute from external service

2016-09-02 Thread Matt Burgess
Manish, Some of the queries in those processors could bring back lots of data, and putting them into an attribute could cause memory issues. Another concern is when the result is binary data, such as ExecuteSQL returning an Avro file. And since the return of these is a collection of records,

Re: Processor to enrich attribute from external service

2016-09-02 Thread Matt Burgess
. This would avoid using additional “Extract” type processor. All the > downstream processor can simply work with “jsonPath” for additional lookup > inside the attribute. > > > > Regards, > > Manish > > > > From: Matt Burgess [mailto:mattyb...@gmail.com] > Sent: Fr

Re: Dynamic property in QueryDatabaseTable

2016-09-08 Thread Matt Burgess
Ravisankar, The dynamic property needs a specific name, generally of the form initial.maxvalue.{max_value_column}. So if you have a max value column called last_updated, you will want to add a dynamic property called initial.maxvalue.last_updated and set its value to whatever you
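A small Python sketch of how such a property name is formed and how an initial max value would narrow the first query. The table name, the `>` comparison and the query shape are illustrative assumptions; QueryDatabaseTable builds its own SQL, and string-built SQL like this is for illustration only.

```python
from typing import Dict, Optional

def initial_max_value_property(column: str) -> str:
    """Dynamic property name of the form initial.maxvalue.{max_value_column}."""
    return f"initial.maxvalue.{column}"

def first_query(table: str, column: str, properties: Dict[str, str]) -> str:
    start: Optional[str] = properties.get(initial_max_value_property(column))
    if start is None:
        return f"SELECT * FROM {table}"  # no initial value: fetch everything
    return f"SELECT * FROM {table} WHERE {column} > '{start}'"

props = {"initial.maxvalue.last_updated": "2016-09-01 00:00:00"}
print(first_query("events", "last_updated", props))
# SELECT * FROM events WHERE last_updated > '2016-09-01 00:00:00'
```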

Re: Posting input files to NiFi using REST

2016-09-04 Thread Matt Burgess
James, For simple calls that return immediately, ListenHTTP probably works fine. For more flexible and powerful processing of HTTP requests (and responses), you might be better off with HandleHttpRequest and HandleHttpResponse. There is an example of this under Hello_NiFi_Web_Service [1].

Re: Appending files in Hadoop with PutHDFS ...

2016-09-07 Thread Matt Burgess
ge data and periodically write one big file) but the latency > it introduces is not acceptable. What are some other options that we can try? > > Suyog Kulkarni > suyog_kulka...@csx.com > > > -Original Message- > From: Matt Burgess [mailto:mattyb...@apache.org] > Se

Re: PutHiveQL and Hive Connection Pool with HDInsight

2016-09-29 Thread Matt Burgess
Manish, According to [1], status 72 means a bad URL; perhaps you need a transportMode and/or httpPath parameter in the URL (as described in the post)? Regards, Matt [1] https://community.hortonworks.com/questions/23864/hive-http-transport-mode-problem.html On Thu, Sep 29, 2016 at 9:06 AM,

Re: PutHiveQL and Hive Connection Pool with HDInsight

2016-09-30 Thread Matt Burgess
somehdiclustername.azurehdinsight.net:443/ > somedbname;ssl=true?hive.server2.transport.mode=http; > hive.server2.thrift.http.path=/hive2. > > But, I was getting *java.lang.NoSuchFieldError: INSTANCE: > java.lang.NoSuchFieldError: INSTANCE*. > > > > I will try again with

Re: JoltTransformJSON error

2016-10-05 Thread Matt Burgess
I'm not near my computer, but my knee-jerk reaction is that all the jolt-app-demo transforms are actually Chain transforms, some (like your example) with a single transform inside (like a Shift). Try removing the array brackets if you're selecting a Shift transform, or choose Chain and keep them

Re: Nifi for java Program

2016-10-04 Thread Matt Burgess
Selvam, Are you looking to run an external Java program (like running "java -jar MyCode.jar" from the command line)? If so, you can use the ExecuteProcess [1] or ExecuteStreamCommand [2] processor(s). If you are looking to call code from a JAR directly, you could use the ExecuteScript processor

Re: Routeonattribute

2016-10-08 Thread Matt Burgess
Selvam, Are those two branches meeting at the same RouteOnAttribute (aka filterpoint)? If so, I'm assuming you'd like the GetFile/ExtractText to inform the RouteOnAttribute processor how to handle the flow files coming in from the other branch (please correct me if I've misunderstood). In

Re: SelectHiveQL Error

2016-10-06 Thread Matt Burgess
Dan, That is a catch-all error returned when (in this case, probably) something is misconfigured. Are there more error lines below that in the log? The driver class and all its dependencies are present in the Hive NAR, so there is likely an underlying error that, while being propagated up, returns the

Re: SelectHiveQL Error

2016-10-06 Thread Matt Burgess
apache.hive.jdbc.HiveDriver' for connect URL >> 'jdbc:hive://server:port/default' >> at >> org.apache.commons.dbcp.BasicDataSource.createConnectionFactory(BasicDataSource.java:1452) >> ~[commons-dbcp-1.4.jar:1.4] >> at >> org.apache.commons.

Re: SelectHiveQL Error

2016-10-07 Thread Matt Burgess
e2://, I get a different error set of >> errors. >> >> >> >> >Error getting Hive connection >> >> >org.apache.commons.dbcp.SQLNestedException: Cannot create >> > PoolableConnectionFactory (Could not open client transport with JDBC Uri: >

Re: PutHiveQL Multiple Ordered Statements

2016-09-23 Thread Matt Burgess
Peter, Since each of your statements ends with a semicolon, I would think you could use SplitText with Enable Multiline Mode and a delimiter of ';' to get flow files containing a single statement apiece, then route those to a single PutHiveQL. Not sure what the exact regex would look like, but on
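A Python sketch of splitting a script on semicolons while recording each statement's position, so order can be restored downstream. NiFi's split processors write a `fragment.index` attribute for a similar purpose, though the exact attributes vary by processor. This naive split ignores semicolons inside string literals, which a real script may contain; the function name and sample script are hypothetical.

```python
from typing import List, Tuple

def split_statements(script: str) -> List[Tuple[int, str]]:
    """Return (index, statement) pairs; empty fragments are dropped."""
    statements = [part.strip() for part in script.split(";") if part.strip()]
    return list(enumerate(statements))

script = """
CREATE TABLE t (id INT);
INSERT INTO t VALUES (1);
DROP TABLE staging;
"""
for index, stmt in split_statements(script):
    print(index, stmt)
# 0 CREATE TABLE t (id INT)
# 1 INSERT INTO t VALUES (1)
# 2 DROP TABLE staging
```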

Re: PutHiveQL Multiple Ordered Statements

2016-09-23 Thread Matt Burgess
ld get > it to work; however I am not really sure how to apply the correct priority > attribute to the correct split. Does split already apply a split index? (I > haven't checked) > > Thanks, > Peter > > -Original Message- > From: Matt Burgess [mailto:mattyb
