Re: A bag of groovy questions regarding the ExecuteScript processor

2017-10-04 Thread Matt Burgess
Giovanni, I second all of Andy's answers; they are spot-on. The each() construct is "safe" in the sense that you will be working with one flow file at a time, but remember that there is only one "session". If you throw an Exception from inside the each(), then it will be caught by Execu

Re: A bag of groovy questions regarding the ExecuteScript processor

2017-10-05 Thread Matt Burgess
; } > > That way the fat jar will be much smaller but still executable by NiFi. > Without that a 15 KB jar ends up being an 8 MB fat jar. > > on-the-fly-reload) I'd rather hack the API than doing that :) Are there any > pointers/examples for this InvokeScriptedProcessor? It

Re: FTPS

2017-10-05 Thread Matt Burgess
Austin, There is an open Jira (NIFI-2278 [1]) to add this support to the existing processor(s), I believe if Apache Commons Net is used for the clients then we would just need to create an FTPSClient [2] instead of an FTPClient. In order to support prototyping such things, Apache Commons Net was a

Version 1.2.0 of nifi-script-tester released

2017-10-05 Thread Matt Burgess
All, I've just released version 1.2.0 of the nifi-script-tester [1], a utility that lets you test your Groovy, Jython, and Javascript scripts for use in the NiFi ExecuteScript processor. Here are the new features: - Upgraded code to NiFi 1.4.0 - Added support for incoming flow file attributes F

Re: convert avro schema to another schema

2017-10-11 Thread Matt Burgess
If you know the input and output schemas, you should be able to use UpdateRecord for this. You would have a user-defined property for each output field (including flattened field names), whose value would be the RecordPath to the field in the map. I believe for any fields that are in the input sche

Re: FTPS

2017-10-11 Thread Matt Burgess
ement this in nifi > > On Thu, Oct 5, 2017 at 3:08 PM, Matt Burgess wrote: >> >> Austin, >> >> There is an open Jira (NIFI-2278 [1]) to add this support to the >> existing processor(s), I believe if Apache Commons Net is used for the >> clients then we

Re: FTPS

2017-10-11 Thread Matt Burgess
7 at 1:08 PM, Matt Burgess wrote: >> >> Austin, >> >> Sorry I lost track of this thread. If you have a command-line FTPS >> client, then you can configure ExecuteProcess or ExecuteStreamCommand >> to run the same command you would from a shell. For the scripted

Re: CSV to XML in NiFi using ScriptedRecordSetWriter

2017-10-12 Thread Matt Burgess
Kiran, There is an example of a Groovy script for an XML writer [1] in the unit tests for ScriptedRecordSetWriter, this should be a pretty good place to get started, but please let us know if you have any questions or issues in making it work. Regards, Matt [1] https://github.com/apache/nifi/bl

Re: CSV to XML in NiFi using ScriptedRecordSetWriter

2017-10-13 Thread Matt Burgess
wants to ingest CSV. On the old version of nifi I would have used > TransformXml and XSLT to achieve this, should I still go down that route or > can you point me in the direction of a xml Scripted reader example? > > Kiran > > > > ---- Original Message >

Re: Avro timestamp problem

2017-10-19 Thread Matt Burgess
Uwe, I think you are running into either AVRO-2065 [1] or its related issue AVRO-1891 [2]. Hopefully they will fix it for 1.8.3 and we can upgrade after it is released. In the meantime, try a schema with just a union between null and long, then use QueryRecord to filter out all the records whose

Re: Python example of setState, getState?

2017-10-20 Thread Matt Burgess
I have an example (albeit a trivial one) of this in my ExecuteScript Cookbook post [1]. As far as a separate workaround, I can't tell from the description what you need to do differently than ListFile. It starts with no state, lists all the files, saves the time of the newest file in state, then

Re: PutElasticsearchHttp array

2017-10-20 Thread Matt Burgess
Pat, Are you trying to put the whole array in as a single document, or are you trying to put each element of the array in as a separate document? If the former, you could use ReplaceText to put the array into a JSON object. If the latter, you can use SplitJson to split the array into individual e

Re: PutElasticsearchHttp array

2017-10-20 Thread Matt Burgess
Pat, If you match the entire text, you should be able to do something like the following as the replacement: { "array": $1 } I didn't try this, but I think it should put the array into a JSON object. Although an array may be a valid JSON "object", I don't think Elasticsearch accepts them as such
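The effect of that replacement can be tried outside NiFi. What follows is only a rough Python sketch of the same idea, not the ReplaceText processor itself: the sample array is made up, and Python writes the back-reference as \g<1> where the replacement above uses $1.

```python
import json
import re


def wrap_array(content):
    # (?s) lets "." span newlines so the pattern matches the entire text;
    # group 1 is the whole original content.
    # count=1 guards against a second, empty match at the end of the text.
    return re.sub(r"(?s)^(.*)$", r'{ "array": \g<1> }', content, count=1)


# Hypothetical flow file content: a bare JSON array.
original = '[{"id": 1}, {"id": 2}]'
wrapped = wrap_array(original)

# The result parses as a JSON object with a single "array" field.
document = json.loads(wrapped)
print(wrapped)
```

The wrapped text is a single JSON object, which is the shape a document store generally expects.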

Re: Back pressure deadlock

2017-10-23 Thread Matt Burgess
Perhaps a quick(ish) win would be to implement a DeadlockDetectionReportingTask, where you could specify processor IDs (or names but that can get dicey) and it would monitor those processors for incoming connections that all have backpressure applied, and that the processor has not run for X amount

Re: Select Query With Aliases

2017-10-23 Thread Matt Burgess
Austin, What version of NiFi are you using? There was an issue with aliases (at least for MySQL) before NiFi 1.1.0, fixed by NIFI-3064 [1]. Also what database and driver version are you using? Since NiFi 1.1.0, we are following the JDBC 4 spec which says the drivers (when getColumnLabel is calle

Re: Select Query With Aliases

2017-10-23 Thread Matt Burgess
it says that spaces are an illegal character) > or is it possible for me to erase the first row and replace it with the > column names that I want. I am converting the avro schema into a csv. > > On Mon, Oct 23, 2017 at 3:03 PM, Matt Burgess wrote: >> >> Austin, >> >

Re: UpdateAttribute is missing JAXB dependency

2017-10-28 Thread Matt Burgess
Leandro, NiFi does not yet work with Java 9 as far as I know, your version was compiled against (and is intended to run against) Java 8. Regards, Matt > On Oct 28, 2017, at 3:36 PM, Leandro Lourenço > wrote: > > Hi, > > I'm having a strange issue with UpdateAttribute processor. > '' is inv

Re: Creating a record schema that has a date or timestamp field

2017-11-01 Thread Matt Burgess
Mike, Which Record Readers/Writers are you using? Do they have the option for "Date Format", and if so, are they filled in? Date Format defaults to empty, and the doc says it "[s]pecifies the format to use when reading/writing Date fields. If not specified, Date fields will be assumed to be number

Re: Reading flowfile in a stream callback

2017-11-03 Thread Matt Burgess
ode('utf-8'))) > > that omits the encoding, like so: > > outputStream.write(bytearray(some_binary)) ? > > Thank you very much in advance. -Jim > > On Thu, Nov 2, 2017 at 8:26 PM, Andy LoPresto wrote: >> >> James, >> >> The Python API should

Re: Enrichment flow using ScriptedLookup

2017-11-05 Thread Matt Burgess
Eric, Because LookupService implements ControllerService, you must implement initialize(ControllerServiceInitializationContext context), which Andy's script provides an empty body for. However that context object has a method called getLogger() on it, so you can override the initialize() method an

Re: ValidateRecord Processor

2017-11-05 Thread Matt Burgess
Seems like ValidateRecord might make a good two-birds-with-one-stone replacement for ConvertRecord :) -Matt > On Nov 5, 2017, at 3:46 PM, Mark Payne wrote: > > Hey Paul, > > That is accurate - the Record Writer chosen will not affect the validation > process. > The way that the processor wo

Re: Replace Text

2017-11-08 Thread Matt Burgess
Austin, If your data is not coming from something like ExecuteSQL (which Bryan mentioned) but you are defining a schema for it, there are a couple of options. First, what format is your data in? If CSV, you can configure a CSVReader to use your schema and ignore the header, effectively renaming th

Re: Output from PostHTTP

2017-11-08 Thread Matt Burgess
Jim, The content of the flow file is the body of the outgoing POST, so you could query provenance for the PostHTTP processor, find the associated flow file(s), and (if the content is still available in the content repository) retrieve the content. Also the resolved URL for the POST (after evaluati

Re: Output from PostHTTP

2017-11-08 Thread Matt Burgess
he flowfile content. How do I set > that attribute to be my flowfile content? > > The challenge I seem to be having is that the service is not a NiFi flow. > How do I feed the content body to it? > > On Wed, Nov 8, 2017 at 9:41 AM, Matt Burgess wrote: >> >> Jim,

Re: Nifi : 504 Gateway Time-Out Error

2017-11-09 Thread Matt Burgess
Aruna, The reason you can no longer log in is due to the Out Of Memory Error occurring on the JVM running NiFi. I believe you will need to restart NiFi in order to reconnect. For the original issue, PutDatabaseRecord uses the JDBC driver to set up a prepared statement along with rows of values. E

Re: Nifi : 504 Gateway Time-Out Error

2017-11-09 Thread Matt Burgess
you have given, which one is recommended? > > > > *From:* Matt Burgess [mailto:mattyb...@apache.org] > *Sent:* Thursday, November 09, 2017 12:14 PM > *To:* users@nifi.apache.org > *Subject:* Re: Nifi : 504 Gateway Time-Out Error > > > > Aruna, > > >

Re: csv to sql

2017-11-09 Thread Matt Burgess
Austin, Yes, that's exactly what PutDatabaseRecord is for. It is kind of like ConvertJSONToSQL + PutSQL, but it uses the record reader of your choice, so it doesn't have to be JSON. You can set up a CSVReader and a DBCPConnectionPool pointing at your PostgreSQL DB, set the verb (INSERT, e.g.) and

Re: Wait only if flagged?

2017-11-13 Thread Matt Burgess
Peter, I haven't tried this, but my knee-jerk reaction is to switch the roles of the "wait" and "success" relationships. Maybe you can send the "wait" relationship downstream and route the "success" one back to Wait. Then when the flag is "cleared", the flow files will start going to the "success"

Re: How to get DBCP service inside ScriptedLookupService

2017-11-14 Thread Matt Burgess
Eric, So I just learned a lot about the bowels of the context and initialization framework while digging into this issue, and needless to say we will need a better way of making this available to scripts. Here's some info: 1) The ControllerServiceInitializationContext object passed into initialize

Re: How to get DBCP service inside ScriptedLookupService

2017-11-14 Thread Matt Burgess
are multiple with > the same name. In fact, over 10 different iterations you could get 10 > different services instead of always > getting the same service. > > So I guess the question is: Is there a reason that the typical approach of > identifying the service in a > Property

Re: NIFI 1.4.0 - PutMongo - How to use composite key for "Update Query key" parameter ?

2017-11-15 Thread Matt Burgess
Thomas, You can file an Improvement or New Feature Jira [1] asking for the enhancement. Ironically Avro 1.4.0's Schema.parse() method does allow the dollar sign, but we use Avro 1.8.x now which is apparently more strict. I am toying around with a PegasusSchemaRegistry using the PegasusSchemaParse

Re: NIFI 1.4.0 - PutMongo - How to use composite key for "Update Query key" parameter ?

2017-11-16 Thread Matt Burgess
ght want to be careful about that because Avro 1.8 added the support for > logical types and removing that could break parts of the Record API like the > date/timestamp functionality. > > On Wed, Nov 15, 2017 at 3:41 PM, Matt Burgess wrote: >> >> Thomas, >> >>

Re: NIFI 1.4.0 - PutMongo - How to use composite key for "Update Query key" parameter ?

2017-11-16 Thread Matt Burgess
I wrote up an Improvement Jira to add the property to Validate Field Names to AvroSchemaRegistry: https://issues.apache.org/jira/browse/NIFI-4612 -Matt On Thu, Nov 16, 2017 at 9:15 AM, Matt Burgess wrote: > Mike, > > That's a very good point, I guess that would have to be emulate

Re: Is it possible to import class from NAR bundle in scripted processor?

2017-11-20 Thread Matt Burgess
The other NARs are not immediately available to the scripting NAR, and in general you usually have to put your processor in the same NAR as the base class, or put the base class and interfaces into an API JAR and share that somehow. IMO there's a little too much voodoo to try and make it work with

Re: Someone recommend a good Avro studdy guide for newbie?

2017-11-22 Thread Matt Burgess
Eric, If you're looking for examples on implementing a scripted record reader/writer, you can see the unit test examples [1] or Drew Lim's blog [2]. However I suspect you are looking to leverage an existing RecordReader/Writer from a scripted processor such as ExecuteScript or InvokeScriptedProce

Re: Hyphenated Tables and Columns names

2017-12-06 Thread Matt Burgess
Alberto, What version of NiFi are you using? As of version 1.1.0, QueryDatabaseTable has a "Normalize Table/Column Names" property that you can set to true, and it will replace all Avro-illegal characters with underscores. Regards, Matt On Wed, Dec 6, 2017 at 12:06 PM, Alberto Bengoa wrote: >
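To get a feel for what that normalization does to a hyphenated name, here is a small Python approximation. It is a sketch and not NiFi's own code: the function name and sample names are made up, and the leading-digit handling is an assumption based on the Avro rule that a name must start with a letter or underscore.

```python
import re


def normalize_avro_name(name):
    # Replace every character outside [A-Za-z0-9_] with an underscore.
    normalized = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # Assumption: prefix an underscore when the name starts with a digit,
    # because Avro names may not begin with one.
    if normalized and normalized[0].isdigit():
        normalized = "_" + normalized
    return normalized


# Hypothetical table and column names.
table_name = normalize_avro_name("order-items")
column_name = normalize_avro_name("2017-totals")
print(table_name, column_name)
```

Note that two different source names can collapse to the same normalized name (for example "a-b" and "a_b"), so it is worth checking for collisions.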

Re: Nifi: how to transfer only last file from the flowFile list?

2017-12-07 Thread Matt Burgess
Sally, I don't think you want a FlowFileFilter here, as your smaller flow files will remain in the queue while the large enough ones get processed. Here's a script that I think does what you want it to, but please let me know if I've misunderstood your intent: def ffList = session.get(1000) def l

Re: ConvertJSONToSQL empty VALUES fields

2017-12-08 Thread Matt Burgess
Alberto, This came up the other day as well, the generated SQL is a Prepared Statement, which allows the code to use the same statement but then just set different values based on "parameters". In this case the values for the parameters are stored in "positional" flow file attributes for the state
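As a rough illustration of how a parameterized statement and positional attributes fit together, here is a Python sketch that uses sqlite3 as a stand-in for the JDBC connection. The table, the values, and the helper are made up; the sql.args.N.type / sql.args.N.value attribute names and the java.sql.Types codes (4 = INTEGER, 12 = VARCHAR) follow the convention ConvertJSONToSQL and PutSQL use.

```python
import sqlite3

# Hypothetical flow file content: the SQL carries "?" placeholders, no values.
statement = "INSERT INTO items (id, name) VALUES (?, ?)"

# Hypothetical flow file attributes holding the values, one pair per position.
attributes = {
    "sql.args.1.type": "4",    # java.sql.Types.INTEGER
    "sql.args.1.value": "42",
    "sql.args.2.type": "12",   # java.sql.Types.VARCHAR
    "sql.args.2.value": "widget",
}


def positional_values(attrs):
    # Collect values in positional order: sql.args.1, sql.args.2, ...
    values = []
    index = 1
    while "sql.args.%d.value" % index in attrs:
        raw = attrs["sql.args.%d.value" % index]
        jdbc_type = attrs.get("sql.args.%d.type" % index)
        # Only INTEGER is converted in this sketch; everything else stays text.
        values.append(int(raw) if jdbc_type == "4" else raw)
        index += 1
    return values


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.execute(statement, positional_values(attributes))
rows = conn.execute("SELECT id, name FROM items").fetchall()
print(rows)
```

So an apparently "empty" VALUES clause is expected: the question marks are filled in from the attributes when the statement is executed.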

Re: ListS3 Processor Error

2017-12-11 Thread Matt Burgess
Aruna, The index and type for Elasticsearch are kinds of partitioning that can help the users organize data, but definitely help in indexing and searching data. Types are not always required, but an index is. Imagine you are trying to store a bunch of tweets from a Twitter feed (or firehose) into

Re: create a sql table if it does not exist

2017-12-11 Thread Matt Burgess
Tina, What database are you using? Do you have the ability to call CREATE TABLE IF NOT EXISTS? If so, you could add those to your other sql statements and send them either to PutSQL individually (but you'll probably want a Prioritizer on the connection between ExecuteScript -> PutSQL or an Enforce
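If the database does support CREATE TABLE IF NOT EXISTS, the statement can be issued repeatedly without harm. Here is a minimal Python sketch using sqlite3 (which supports that clause) as a stand-in for the real database; the table name and columns are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table definition.
ddl = "CREATE TABLE IF NOT EXISTS events (id INTEGER, payload TEXT)"

conn.execute(ddl)                                        # creates the table
conn.execute("INSERT INTO events VALUES (1, 'first')")
conn.execute(ddl)                                        # table exists: no error, no data loss

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)
```

The ordering concern still applies: the CREATE statement has to reach the database before the statements that depend on the table.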

Re: Using threads inside nifi ExecuteScript processos

2017-12-19 Thread Matt Burgess
Sally, Although it may be possible to do some multithreading inside ExecuteScript, it is not really designed for this kind of thing; rather it is usually for short code blocks to perform some transformation or parse some format that existing NiFi processors do not (yet) handle. In briefly looking

Re: How to get controller service in Script Executor

2017-12-21 Thread Matt Burgess
Kui, The getControllerService() method requires a controller service (CS) identifier, not the name (because names are not necessarily unique). To get the CS by name, you have to get the list of all CSs and match on the name. I have an example in Groovy on my blog [1], but here is a similar one por
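The list-and-match step can be sketched on its own in plain Python. This is an illustration, not the example from the blog: FakeLookup and the service names are made up so the snippet runs outside NiFi, while getControllerServiceIdentifiers() and getControllerServiceName() are the ControllerServiceLookup methods the real lookup object exposes.

```python
def find_service_id_by_name(lookup, service_type, wanted_name):
    # Walk every controller service identifier of the given type and
    # return the first one whose display name matches.
    for service_id in lookup.getControllerServiceIdentifiers(service_type):
        if lookup.getControllerServiceName(service_id) == wanted_name:
            return service_id
    return None


class FakeLookup(object):
    """Hypothetical stand-in for NiFi's ControllerServiceLookup."""

    def __init__(self, names_by_id):
        self._names_by_id = names_by_id

    def getControllerServiceIdentifiers(self, service_type):
        return list(self._names_by_id)

    def getControllerServiceName(self, service_id):
        return self._names_by_id[service_id]


lookup = FakeLookup({"id-1": "MyDBCP", "id-2": "OtherPool"})
found = find_service_id_by_name(lookup, object, "OtherPool")
missing = find_service_id_by_name(lookup, object, "NoSuchService")
print(found, missing)
```

In a real script the lookup would presumably come from context.controllerServiceLookup, and the identifier found this way is what getControllerService() expects. Because names are not unique, this returns only the first match.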

Re: use record reader to read text file and get line counts

2017-12-22 Thread Matt Burgess
Tina, You could use SplitText with a very large value for Line Split Count (larger than any of your files would contain), and you will get the same flow file out but with an attribute called "text.line.count" that contains the number of lines in the file. Regards, Matt On Fri, Dec 22, 2017 at 1

Re: [EXTERNAL EMAIL]Re: Kerberos hive failure to renew tickets

2018-01-10 Thread Matt Burgess
Apache Hive 1.2.1, or a version from a vendor? Also are you using Apache NiFi or a version from a vendor? On Wed, Jan 10, 2018 at 11:07 AM, Georg Heiler wrote: > Hive is 1.2.1 > On Wed, Jan 10, 2018 at 17:04, Joe Witt wrote: >> >> Interesting. Not what I thought it might have been. >> >> Can y

Re: [EXTERNAL EMAIL]Re: Kerberos hive failure to renew tickets

2018-01-10 Thread Matt Burgess
Georg, are you seeing the same stack trace as Jonathan? Or something different? On Wed, Jan 10, 2018 at 11:11 AM, Schneider, Jonathan wrote: > HDF 3.0.0? > > > > *Jonathan Schneider* > > Hadoop/UNIX Administrator, STSC > > SCL Health > > 17501 W. 98

Re: [EXTERNAL EMAIL]Re: Kerberos hive failure to renew tickets

2018-01-10 Thread Matt Burgess
To Joe's point, this may not be an issue in the upcoming 1.5.0 release as it may have been fixed under [1]. Regards, Matt [1] https://issues.apache.org/jira/browse/NIFI-3472 On Wed, Jan 10, 2018 at 11:14 AM, Georg Heiler wrote: > Regarding the stack trace I will clarify tomorrow. But it is pre

Re: How to resolve failure to coerce to byte[] in ExecuteScript python script

2018-01-23 Thread Matt Burgess
Jim, If you only need to read the contents of the flow file and not modify them, then you don't have to use session.write() to send the original content out, you can just use session.read(FlowFile, InputStreamCallback) instead. I'm not sure why that write fails sometimes and not others, I suspect

Re: Stateful

2018-02-06 Thread Matt Burgess
Austin, Can you create a (non-)materialized view from that query? If so then QueryDatabaseTable could work. If not, then try QueryRecord after ExecuteSQL (after adding s.txn_time, I didn't see it in the query), I think you can add a "max_txn_time" field to the schema and do something like SELECT *

Re: Integrate new flowFile generation out of an ExecuteScript processor running a python script

2018-02-07 Thread Matt Burgess
Jim, You can use session.create() to create a new FlowFile from within your script. You don't need a parent, or to transfer input->output, or even write any content to the output FlowFile for your use case. After flowFile = session.create(), you can do flowFile = session.putAttribute(flowFile, "ab

Re: ConvertRecord

2018-02-08 Thread Matt Burgess
Austin, What version of NiFi are you using? I'm wondering if you're running into [1] (fixed in 1.3.0), or [2] (fixed in 1.4.0), or something else. You may want to change the types to be "optional", meaning a union between null and the intended type. So for "PracticeId", try: {"name": "PracticeId"
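For reference, turning a field into an "optional" one just means wrapping its type in a union with null. A small Python sketch of that transformation follows; the helper name and the "int" type for PracticeId are assumptions for illustration, since the actual field type is not shown here.

```python
import json


def make_optional(field):
    # Copy the field and replace its type with a union of null and that type.
    # Listing "null" first allows a null default.
    optional = dict(field)
    optional["type"] = ["null", field["type"]]
    optional["default"] = None
    return optional


# Hypothetical field definition; "int" is an assumed type.
field = {"name": "PracticeId", "type": "int"}
optional_field = make_optional(field)

# json.dumps renders None as null, which is what the Avro schema needs.
schema_fragment = json.dumps(optional_field)
print(schema_fragment)
```

With this shape a record that has no value for the field can still be read and written instead of failing on the missing value.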

Re: Object not recognized in ExecuteScript

2018-02-12 Thread Matt Burgess
Jim, In this case I don't think it's as much that the modules aren't being found, rather that the datetime module in Jython returns java.sql.Timestamp (Java) objects, rather than Jython/Python datetime objects, and the former do not support the methods/attributes of the latter, including timetuple

Re: JoltJsonTransform question

2018-02-13 Thread Matt Burgess
Austin, This one works for your sample data but doesn't extend to more fields unless you keep repeating the pattern in the spec: [ { "operation": "shift", "spec": { "@Place1": "TestArray[0].Place", "@Holder1": "TestArray[0].Holder", "@Place2": "TestArray[1].Place",

Re: Sending Arguments to Scripts from ExecuteScript

2018-02-18 Thread Matt Burgess
Jim, I don't think that's possible because I don't think the user-defined properties are guaranteed to be in a particular order (insertion order in the processor config dialog, e.g.), but someone please correct me if I'm wrong. If true, then we wouldn't be able to have the user-defined properties

Re: PutHiveStreaming NullPointerException error

2018-02-19 Thread Matt Burgess
Mike, Joe is correct, in order for Apache NiFi to interact with HDP Hive, the Hive client dependencies need to be swapped out, as HDP Hive 1.x components are not 100% compatible with Apache Hive 1.x components. This can be done (in general) while building NiFi with Maven, by using a vendor profile

Re: PutHiveStreaming NullPointerException error

2018-02-19 Thread Matt Burgess
ipTests –e > > > Regards, > Mike > >> -Original Message- >> From: Matt Burgess [mailto:mattyb...@apache.org] >> Sent: Monday, February 19, 2018 2:30 PM >> To: users@nifi.apache.org >> Subject: Re: PutHiveStreaming NullPointerException error >> >

Re: Get YYYY from java.sql.Timestamp (jython engine, ExecuteScript)

2018-02-22 Thread Matt Burgess
Jim, Instead of "import java.util Calendar" try "from java.util import Calendar". You have the right approach (using calendar) to get the year, but always be aware of any timezone issues (are the input timestamps UTC? If not you might need to make an adjustment to the calendar for the timezone).

Re: Question

2018-02-23 Thread Matt Burgess
Márcio, I believe you are running into this bug [1], which seems it will be fixed in Jython 2.7.2, but that version has not been released yet (looks like it's in alpha). When/if they release jython-shaded 2.7.2, I will upgrade the library in the scripting bundle. Regards, Matt [1] http://bugs.j

Re: Additional configuration properties for DBCPConnectionPool

2018-02-23 Thread Matt Burgess
Tim, What version of NiFi are you using? As of 1.1.0 [1], you can specify a Validation Query on the DBCPConnectionPool, this is used by DBCP to validate that a connection is "good" before offering it to the client. For idle/timed-out connections, the validation query should fail and DBCP should at

Re: Additional configuration properties for DBCPConnectionPool

2018-02-23 Thread Matt Burgess
Tim, We can certainly expose some DBCP properties as processor properties, we'd have to enumerate them explicitly since the user-defined ones are used for the connection. Please feel free to write up an Improvement Jira [1] to cover whichever properties you'd like to see added to DBCPConnectionPoo

Re: Attribute level interlinked CSV file import

2018-02-26 Thread Matt Burgess
Mausam, You could use PutFile to store off the Category CSV, then you can use LookupRecord with either a CSVRecordLookupService or a SimpleCsvLookupService, the former is for fetching multiple fields from the lookup, the latter is for a single value lookup. You'll also use a CSVReader to read in t

Re: Atlas and NiFi integration help

2018-02-28 Thread Matt Burgess
Mike, There is a nifi-atlas-bundle in NiFi with a NAR that includes the ReportLineageToAtlas reporting task, but IIRC it is so large that it is not included in the default assembly. Instead there is an "include-atlas" profile that can be activated when building the assembly, and that should include

Re: A "proxy" question from the irc channel

2015-11-08 Thread Matt Burgess
Would we have more participation with something like a Slack team? Apache Drill has one with #user and #dev channels, seems to work pretty well and has various integrations with other tools (email, GitHub, Jira, etc.) Sent from my iPhone > On Nov 8, 2015, at 12:41 PM, Tony Kurc wrote: > > The

Re: A "proxy" question from the irc channel

2015-11-08 Thread Matt Burgess
cting people. > >> On Sun, Nov 8, 2015 at 12:53 PM, Matt Burgess wrote: >> Would we have more participation with something like a Slack team? Apache >> Drill has one with #user and #dev channels, seems to work pretty well and >> has various integrations with other tool

Re: Nifi service fail to start - Removed custom processor

2015-11-10 Thread Matt Burgess
Perhaps the UI could have a placeholder for a missing processor, and the context menu or body could include more details as far as which processor it was looking for. Sent from my iPhone > On Nov 10, 2015, at 8:30 PM, Chakrader Dewaragatla > wrote: > > Thank you. I wish this process get simp

Re: Nifi service fail to start - Removed custom processor

2015-11-10 Thread Matt Burgess
t [1]? > > Thanks > Joe > > [1] https://issues.apache.org/jira/browse/NIFI-1052 > >> On Tue, Nov 10, 2015 at 9:18 PM, Matt Burgess wrote: >> Perhaps the UI could have a placeholder for a missing processor, and the >> context menu or body could include more details

Re: Expression language

2015-11-12 Thread Matt Burgess
Not sure if it would prove useful but I've started messing around with the Aho-Corasick algorithm in the hopes of the user being able to paste in some sample data and getting a regex out. If the data is "regular", the user wouldn't need to know an expression language, they would just need a rep

Re: Expression language

2015-11-12 Thread Matt Burgess
u've got ideas on > how to provide a more intuitive play - yes please. You will find an > implementation of aho corasick under the standard processors > (ScanContent) and the associated library under search tools. > Amazingly fast. > > Thanks! > Joe > >> On Thu,

Re: Mapping from JMS -> JSON -> SQL

2016-01-30 Thread Matt Burgess
If the JMS source is actual JSON then you can use EvaluateJsonPath (or SplitJson for arrays), you can craft the attributes to match the arguments to PutSQL and have a prepared statement within... I think :) ExecuteScript will be for those times you just can't presently connect the dots with exi

Re: Generate URL based on different conditions

2016-02-16 Thread Matt Burgess
Here's a Gist template that uses Joe's approach of RouteOnAttribute then UpdateAttribute to generate URLs with the use case you described: https://gist.github.com/mattyb149/8fd87efa1338a70c On Tue, Feb 16, 2016 at 9:51 PM, Joe Witt wrote: > Jeff, > > For each of the input files could it be t

Re: Using Apache Nifi and Tika to extract content from pdf

2016-02-20 Thread Matt Burgess
I have a blog post on how to do this with NiFi using a Groovy script in the ExecuteScript (new in 0.5.0) processor using PDFBox instead of Tika: http://funnifi.blogspot.com/2016/02/executescript-extract-text-metadata.html?m=1 Jython is also supported but can't yet use Java libraries (it uses Jyt

Re: Using Apache Nifi and Tika to extract content from pdf

2016-02-20 Thread Matt Burgess
can write up an improvement Jira with the initial findings. Regards, Matt > On Feb 20, 2016, at 2:18 PM, Russell Whitaker > wrote: > > Don't forget Clojure as well. > > Russell Whitaker > Sent from my iPhone > >> On Feb 20, 2016, at 7:44 AM, Matt Burges

Re: Using Apache Nifi and Tika to extract content from pdf

2016-02-20 Thread Matt Burgess
te(flowFile, {inputStream, outputStream -> > doc = PDDocument.load(inputStream) > info = doc.getDocumentInformation() > s.writeText(doc, new OutputStreamWriter(outputStream)) > } as StreamCallback > ) > > Thanks for your help. > > BR > Ralf >

Re: Using Apache Nifi and Tika to extract content from pdf

2016-02-21 Thread Matt Burgess
umentation? Or where can I find some info? > > Sorry for all my questions. > > BR and thanks. > > Ralf > > > On 20.02.2016 at 22:27, Matt Burgess wrote: > > I will update the blog to make these clearer. I used PDFBox 1.8.10 so > I'm not sure what e

Re: Auto installation of template

2016-02-26 Thread Matt Burgess
Michael, I don't think you can put a template into conf/templates and have it be picked up, I tried a couple of things and it looks like the system manages the things put in there. For the REST API, the documentation for uploading a template is missing because there are two ways to use POST and t

Re: javascript executescript processor

2016-03-01 Thread Matt Burgess
Mike, I have a blog containing a few posts on how to use ExecuteScript and InvokeScriptedProcessor: http://funnifi.blogspot.com One contains an example using Javascript to get data from Hazelcast and update flowfile attributes: http://funnifi.blogspot.com/2016/02/executescript-using-modules.html

Re: javascript executescript processor

2016-03-02 Thread Matt Burgess
I'm looking for - much appreciated! >> >> Thanks, >> Mike >> >> On Tue, 1 Mar 2016 at 18:13, Matt Burgess wrote: >> >>> Mike, >>> >>> I have a blog containing a few posts on how to use ExecuteScript and >>> InvokeScript

Re: javascript executescript processor

2016-03-02 Thread Matt Burgess
ss/nifi/processors/NiFiUtils.java > > Sent from my iPhone > >> On Mar 2, 2016, at 1:40 PM, Matt Burgess wrote: >> >> Ask and ye shall receive ;) I realize most of my examples are in Groovy so >> it was a good idea to do some non-trivial stuff in another language, th

Re: ExecuteSQL Extract database tables multiple times.

2016-03-04 Thread Matt Burgess
Currently ExecuteSQL will put all available rows into a single flow file. There is a Jira case (https://issues.apache.org/jira/browse/NIFI-1251) to allow the user to break up the result set into flow files containing a specified number of records. I'm not sure why you get 26 flow files, although i

Re: ExecuteSQL and NiFi 0.5.1 - Error org.apache.avro.SchemaParseException: Empty name

2016-03-05 Thread Matt Burgess
That's on me, that commit went into 0.5.0 and looks like a negative logic error. I thought I had unit tested it but I guess not :( Sent from my iPhone > On Mar 5, 2016, at 6:57 PM, Bryan Bende wrote: > > I think this is a legitimate bug that was introduced in 0.5.0. > > I created this ticket: h

Re: ExecuteSQL and NiFi 0.5.1 - Error org.apache.avro.SchemaParseException: Empty name

2016-03-05 Thread Matt Burgess
Actually on second thought it's not negative logic, it should be checking against tableNameFromMeta. Sent from my iPhone > On Mar 5, 2016, at 6:57 PM, Bryan Bende wrote: > > I think this is a legitimate bug that was introduced in 0.5.0. > > I created this ticket: https://issues.apache.org/jira/

Re: ExecuteSQL Extract database tables multiple times.

2016-03-07 Thread Matt Burgess
a8b9-a77d0be27273] failed to process >>> due to org.apache.avro.SchemaParseException: Empty name; rolling back >>> session: org.apache.avro.SchemaParseException: Empty name >>> >>> 10:30:02 CET ERROR >>> ExecuteSQL[id=d32x32d7-c477-4b3b-a8b9-a77d0be27273] Pr

Re: ExecuteSQL Extract database tables multiple times.

2016-03-07 Thread Matt Burgess
, > > thanks for the reply. Does this fix also solve the issue with Microsoft > SQL Server? > Is there an estimate of when such a fix will be available to the public? > > Thanks for your help. > > BR > Ralf > > > On 07.03.2016 at 15:15, Matt Burgess wrote:

Re: javascript executescript processor

2016-03-07 Thread Matt Burgess
Looks like on Rhino you need a different syntax to import stuff: http://docs.oracle.com/javase/7/docs/technotes/guides/scripting/programmer_guide/#jsengine On Mon, Mar 7, 2016 at 11:26 AM, Matt Burgess wrote: > So that's weird since you're running NiFi on Java already and tr

Re: javascript executescript processor

2016-03-07 Thread Matt Burgess
t, > > Thanks for doing this - I've just tried to run the template and I get the > reference error: "Java" is not defined. I have JAVA_HOME set on my Ubuntu > machine - just wondering if there's a new config setting I'm missing perhaps? > > Mike > >

Re: javascript executescript processor

2016-03-07 Thread Matt Burgess
va classes > in a script before. > > > > On 7 March 2016 at 17:03, Mike Harding wrote: > >> aaa ok cool. Given that org.apache.nifi.processor.io.StreamCallback is >> an interface do I need to include the underlying classes? >> >> On 7 March 2016 at 16

Re: DetectDuplicate : java.net.ConnectException

2016-03-19 Thread Matt Burgess
Arathi, You'll need to add another Controller Service, one of type DistributedMapCacheServer, set up on port 4557 (to match your DistributedMapCacheClientService), and enable/start it. Then you should be able to connect successfully. Regards, Matt On Thu, Mar 17, 2016 at 4:15 PM, Arathi Maddula

Re: NiFi: command-line interface ?

2016-03-20 Thread Matt Burgess
Dmitry, With regards to nifi-client (I am the author), that exception occurs when the flow has been changed externally and the shell has not recognized it. What is the result of the following command? nifi.currentVersion If it is -1, then I recommend restarting the shell. It should be a non-negativ

Re: NiFi: command-line interface ?

2016-03-20 Thread Matt Burgess
). I've got Gradle 2.3 whose version > option's output states Groovy at 2.3.9. > - Dmitry > > > >> On Sun, Mar 20, 2016 at 3:23 PM, Matt Burgess wrote: >> Dmitry, >> >> With regards to nifi-client (I am the author), that exce

Re: NiFi: command-line interface ?

2016-03-20 Thread Matt Burgess
- Dmitry > >> On Sun, Mar 20, 2016 at 3:49 PM, Matt Burgess wrote: >> Hmm looks like it is working properly, not sure why you're getting the 409 >> Conflict. I will look into it more. >> >> I also wanted to mention that you can make use of the nifi-client

Re: NiFi: command-line interface ?

2016-03-20 Thread Matt Burgess
t in a valid state. > Returning Conflict response. > > Not sure why the state is "not valid". The GetFile processor seems fine to > me. All the processors in the flow are currently stopped. GetFile has input > files. I would assume this should be OK. > > > >

Re: Help on creating that flow that requires processing attributes in a flow content but need to preserve the original flow content

2016-03-21 Thread Matt Burgess
One way (in NiFi 0.5.0+) is to use the ExecuteScript processor, which gives you full control over the session and flowfile(s). For example, if you had JSON in your "kafka.key" attribute such as {"data": {"myKey": "myValue"}}, you could use the following Groovy script to parse out the value of th
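The Groovy script from this message is cut off in the archive snippet, so it is not reproduced here. As a rough, standalone illustration of just the parsing step (not the NiFi session or flow file API), the sketch below pulls the nested value out of the example JSON in plain Python. The helper name `extract_value` is hypothetical and not part of NiFi or the original message.

```python
import json


def extract_value(attribute_json, *path):
    """Parse a JSON string and walk the given keys, returning the value found.

    Hypothetical helper for illustration only; raises KeyError if a key is missing.
    """
    node = json.loads(attribute_json)
    for key in path:
        node = node[key]
    return node


# The example attribute value quoted in the message above.
kafka_key = '{"data": {"myKey": "myValue"}}'

# Walk data -> myKey to reach the nested value.
value = extract_value(kafka_key, "data", "myKey")
print(value)
```

In an actual ExecuteScript script, the attribute would be read from the flow file rather than hard-coded; that part depends on the truncated original and is deliberately left out.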

Re: CSV/delimited to Parquet conversion via Nifi

2016-03-21 Thread Matt Burgess
Edmon, NIFI-1663 [1] was created to add ORC support to NiFi. If you have a target dataset that has been created with Parquet format, I think you can use ConvertCSVToAvro then StoreInKiteDataset to get flow files in Parquet format into Hive, HDFS, etc. Others in the community know a lot more about

Re: CSV/delimited to Parquet conversion via Nifi

2016-03-22 Thread Matt Burgess
e extra transform could be expensive. >> >>> On Mar 21, 2016 9:39 PM, "Dmitry Goldenberg" >>> wrote: >>> Since NiFi has ConvertJsonToAvro and ConvertCsvToAvro processors, would it >>> make sense to add a feature request for a ConvertJsonToParquet pro

Re: what is the PutElasticsearch Identifier Attribute for?

2016-03-23 Thread Matt Burgess
The Identifier Attribute property should contain the name of a Flow File attribute, which in turn contains the ID of the document to be put into Elasticsearch. Unfortunately it is a required property so having ES auto-generate it is not yet supported [1]. If you don't care what the ID is but need

Re: How to add python modules ?

2016-03-23 Thread Matt Burgess
Madhukar, Glad to hear you found a solution, I was just replying when your email came in. Although in ExecuteScript you have chosen "python" as the script engine, it is actually Jython that is being used to interpret the scripts, not your installed version of Python. The first line (shebang) is

Re: How to add python modules ?

2016-03-24 Thread Matt Burgess
ble > > -Madhu > > On Thu, Mar 24, 2016 at 12:34 AM, Madhukar Thota > wrote: > >> Hi Matt, >> >> Thank you for the input. I updated my config as you suggested and it >> worked like charm and also big thankyou for nice article. i used your >> articl

Re: REST Interface

2016-03-26 Thread Matt Burgess
The REST API is at /nifi-api, not /nifi. The doc is somewhere, but I am guessing we can do more to announce that in the relevant docs, thanks! Where do you think it would be helpful to add such reference(s)? Thanks, Matt > On Mar 26, 2016, at 2:47 PM, Uwe Geercken wrote: > > Just a quick one: I

Re: String conversion to Int, float double

2016-03-28 Thread Matt Burgess
Sounds good to me. I presume the processor would still put all attributes in the JSON content, but would use any dynamic properties solely for type coercion? Anything not listed would be treated like a String as it is now (to preserve current behavior). We'd need to document the possible values

Re: Common Attributes (FileSize)

2016-03-28 Thread Matt Burgess
Radhakrishna, The "fileSize" attribute should be available for every flow file. Can you describe how you are finding the default set of attributes and which ones you are finding? To test, I generated a flow file with the text "Hello" in it, then sent that to a LogAttribute processor, and got the

Re: EvaluateJsonPath and Json Field Name Starting with @ as the First Character

2016-03-29 Thread Matt Burgess
Hong, I was able to use EvaluateJsonPath with eventClass $.event.@class and the attribute had the correct value (see output from LogAttribute below): -- Standard FlowFile Attributes Key: 'entryDate' Value: 'Tue Mar 29 08:41:36 EDT 2016' Key: 'lineag
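To show why a field name beginning with "@" needs no special treatment in a simple dotted path such as $.event.@class, here is a minimal Python sketch. It is not the JsonPath library that EvaluateJsonPath uses: `resolve_dotted_path` is a hypothetical helper that handles only plain `$.a.b` paths, and the sample document (including the value `com.example.MyEvent`) is invented for illustration.

```python
import json


def resolve_dotted_path(document, path):
    """Resolve a simple '$.a.b' style path against parsed JSON.

    Hypothetical helper: supports only dot-separated object keys,
    with no filters, wildcards, or array indexes.
    """
    if not path.startswith("$."):
        raise ValueError("only simple '$.a.b' paths are supported in this sketch")
    node = document
    for key in path[2:].split("."):
        # '@class' is treated as an ordinary key; the '@' is part of the name.
        node = node[key]
    return node


# Invented sample document with a key that starts with '@'.
sample = json.loads('{"event": {"@class": "com.example.MyEvent", "id": 7}}')

event_class = resolve_dotted_path(sample, "$.event.@class")
print(event_class)
```

A full JsonPath implementation can give "@" a meaning inside filter expressions, so behaviour in more complex paths may differ from this sketch.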
