Re: How to insert an ISO date record using PutMongo processor?

2016-06-28 Thread Thad Guidry
I've added my additional comments.

Looks good enough now.


Thad
+ThadGuidry <https://www.google.com/+ThadGuidry>

On Tue, Jun 28, 2016 at 8:20 AM, Bryan Bende  wrote:

> Thanks Thad. I created this JIRA:
> https://issues.apache.org/jira/browse/NIFI-2135
>
> If I said anything wrong there please let me know, or add to the JIRA.
>
>
>
>
> On Tue, Jun 28, 2016 at 8:59 AM, Thad Guidry  wrote:
>
>> Yes, you want updateOne()
>> https://docs.mongodb.com/manual/reference/method/db.collection.updateOne/
>> Not replaceOne()
>> https://docs.mongodb.com/manual/reference/method/db.collection.replaceOne/
>>
>> However, also take a look at the upsert option, which does a replace if
>> there is NO MATCH:
>> https://api.mongodb.com/java/3.2/com/mongodb/client/model/UpdateOptions.html
>> which basically corresponds to this kind of handling:
>> https://docs.mongodb.com/manual/reference/method/db.collection.update/
>>
>>
>> Thad
>> +ThadGuidry <https://www.google.com/+ThadGuidry>
>>
>> On Tue, Jun 28, 2016 at 7:26 AM, Bryan Bende  wrote:
>>
>>> I'm not very familiar with MongoDB, but I can see that when using
>>> "update" mode PutMongo does this:
>>>
>>> else {
>>>     // update
>>>     final boolean upsert = context.getProperty(UPSERT).asBoolean();
>>>     final String updateKey = context.getProperty(UPDATE_QUERY_KEY).getValue();
>>>     final Document query = new Document(updateKey, doc.get(updateKey));
>>>
>>>     collection.replaceOne(query, doc, new UpdateOptions().upsert(upsert));
>>>     logger.info("updated {} into MongoDB", new Object[] { flowFile });
>>> }
>>>
>>> I'm wondering if that collection.replaceOne() is the problem; I see
>>> there is a collection.updateOne() which sounds more correct here.
>>>
>>> If someone with more MongoDB experience could verify this I would be
>>> happy to open a JIRA and get this changed.
>>>
>>> -Bryan
>>>
>>> On Tue, Jun 28, 2016 at 5:32 AM, Asanka Sanjaya Herath <
>>> angal...@gmail.com> wrote:
>>>
>>>> Hi Bryan,
>>>>
>>>> Your suggestion worked fine. I have another question. This is not
>>>> related to the subject, but it is related to the PutMongo processor. How can I
>>>> use the PutMongo processor to add a new key-value pair to an existing
>>>> document? The flow file contains the document object Id. I have set the 'mode'
>>>> property to 'update', the 'upsert' property to false, and the 'update query key'
>>>> property to '_id'. The flow file content is something like this.
>>>>
>>>> {
>>>> _id:ObjectId(577216f0154b943fe8068079)
>>>> expired:true
>>>> }
>>>>
>>>> Instead of inserting the 'expired:true', it replaces the whole document
>>>> with the given one. So is there a way to insert the new key-value pair into
>>>> the document without replacing the whole document in MongoDB using the PutMongo
>>>> processor? Your help regarding this is highly appreciated.
>>>>
>>>>
>>>>
>>>> On Mon, Jun 27, 2016 at 6:43 PM, Asanka Sanjaya Herath <
>>>> angal...@gmail.com> wrote:
>>>>
>>>>> Hi Bryan,
>>>>> Thank you for the input. That really helps. I'll try that.
>>>>>
>>>>> On Mon, Jun 27, 2016 at 6:31 PM, Bryan Bende  wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Right now AttributesToJson does treat everything as strings. I
>>>>>> remember a previous discussion about adding support for different types,
>>>>>> but I can't find a JIRA that references this.
>>>>>>
>>>>>> One option to work around this could be to use ReplaceText to
>>>>>> construct the JSON, instead of AttributesToJson. You could set the
>>>>>> Replacement Value property to something like:
>>>>>>
>>>>>> {
>>>>>>   "dataSourceId" : "${datasource}",
>>>>>>   "filename" : "${filename}",
>>>>>>   "sent_date" : ${sent_date},
>>>>>>   "uuid" : "${uuid}",
>>>>>>   "originalSource" : "${originalsource}"
>>>>>> }
>&g

Re: How to insert an ISO date record using PutMongo processor?

2016-06-28 Thread Thad Guidry
Yes, you want updateOne()
https://docs.mongodb.com/manual/reference/method/db.collection.updateOne/
Not replaceOne()
https://docs.mongodb.com/manual/reference/method/db.collection.replaceOne/

However, also take a look at the upsert option, which does a replace if
there is NO MATCH:
https://api.mongodb.com/java/3.2/com/mongodb/client/model/UpdateOptions.html
which basically corresponds to this kind of handling:
https://docs.mongodb.com/manual/reference/method/db.collection.update/
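
For what it's worth, here is a rough Groovy sketch (not the actual NiFi code;
field handling and names are assumptions) of what an updateOne()-based merge
could look like with the 3.x MongoDB Java driver:

import com.mongodb.client.MongoCollection
import com.mongodb.client.model.UpdateOptions
import org.bson.Document

// Hypothetical sketch: merge the incoming fields into the matched document
// instead of replacing it wholesale.
void updateInsteadOfReplace(MongoCollection<Document> collection, Document doc,
                            String updateKey, boolean upsert) {
    Document query = new Document(updateKey, doc.get(updateKey))
    Document fieldsToSet = new Document(doc)
    fieldsToSet.remove('_id')   // _id is immutable, so don't try to $set it
    // $set only touches the listed fields; replaceOne() would overwrite the whole document
    collection.updateOne(query, new Document('$set', fieldsToSet),
                         new UpdateOptions().upsert(upsert))
}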


Thad
+ThadGuidry 

On Tue, Jun 28, 2016 at 7:26 AM, Bryan Bende  wrote:

> I'm not very familiar with MongoDB, but I can see that when using "update"
> mode PutMongo does this:
>
> else {
>     // update
>     final boolean upsert = context.getProperty(UPSERT).asBoolean();
>     final String updateKey = context.getProperty(UPDATE_QUERY_KEY).getValue();
>     final Document query = new Document(updateKey, doc.get(updateKey));
>
>     collection.replaceOne(query, doc, new UpdateOptions().upsert(upsert));
>     logger.info("updated {} into MongoDB", new Object[] { flowFile });
> }
>
> I'm wondering if that collection.replaceOne() is the problem; I see there
> is a collection.updateOne() which sounds more correct here.
>
> If someone with more MongoDB experience could verify this I would be happy
> to open a JIRA and get this changed.
>
> -Bryan
>
> On Tue, Jun 28, 2016 at 5:32 AM, Asanka Sanjaya Herath  > wrote:
>
>> Hi Bryan,
>>
>> Your suggestion worked fine. I have another question. This is not
>> related to the subject, but it is related to the PutMongo processor. How can I
>> use the PutMongo processor to add a new key-value pair to an existing
>> document? The flow file contains the document object Id. I have set the 'mode'
>> property to 'update', the 'upsert' property to false, and the 'update query key'
>> property to '_id'. The flow file content is something like this.
>>
>> {
>> _id:ObjectId(577216f0154b943fe8068079)
>> expired:true
>> }
>>
>> Instead of inserting the 'expired:true', it replaces the whole document with
>> the given one. So is there a way to insert the new key-value pair into
>> the document without replacing the whole document in MongoDB using the PutMongo
>> processor? Your help regarding this is highly appreciated.
>>
>>
>>
>> On Mon, Jun 27, 2016 at 6:43 PM, Asanka Sanjaya Herath <
>> angal...@gmail.com> wrote:
>>
>>> Hi Bryan,
>>> Thank you for the input. That really helps. I'll try that.
>>>
>>> On Mon, Jun 27, 2016 at 6:31 PM, Bryan Bende  wrote:
>>>
 Hello,

 Right now AttributesToJson does treat everything as strings. I remember
 a previous discussion about adding support for different types, but I can't
 find a JIRA that references this.

 One option to work around this could be to use ReplaceText to construct
 the JSON, instead of AttributesToJson. You could set the Replacement Value
 property to something like:

 {
   "dataSourceId" : "${datasource}",
   "filename" : "${filename}",
   "sent_date" : ${sent_date},
   "uuid" : "${uuid}",
   "originalSource" : "${originalsource}"
 }

 Of course using the appropriate attribute names.

 Another option is that in the upcoming 0.7.0 release, there is a new
 processor to transform JSON using JOLT. With that processor you may be able
 to take the output of AttributesToJson and apply a transform that converts
 the date field to remove the quotes.

 Hope that helps.

 -Bryan

 On Mon, Jun 27, 2016 at 8:16 AM, Asanka Sanjaya Herath <
 angal...@gmail.com> wrote:

> I'm trying to insert a flow file into MongoDB which has a date record as
> an attribute. First I sent that flow file through an AttributesToJSON
> processor, so that all attributes are now converted to a JSON document in
> the flow file body. When I insert that flow file into MongoDB using the PutMongo
> processor, it saves the "sent_date" attribute as a String. I want this to
> be saved as an ISO date object.
>
> My flow file looked like this.
>
> {
>   "dataSourceId" : "",
>   "filename" : "979f7bc5-a395-4396-9625-69fdb2c806c6",
>   "sent_date" : "Mon Jan 18 04:50:50 IST 2016",
>   "uuid" : "77a5ef56-8b23-40ee-93b5-78c6323e0e1c",
>   "originalSource" : "ImportedZip"
> }
>
> Then I prepend "ISODate" to "sent_date" attribute using another
> processor. So now my flow file content looks like this.
> {
>   "dataSourceId" : "",
>   "filename" : "979f7bc5-a395-4396-9625-69fdb2c806c6",
>   "sent_date" : "ISODate('Mon Jan 18 04:50:50 IST 2016')",
>   "uuid" : "77a5ef56-8b23-40ee-93b5-78c6323e0e1c",
>   "originalSource" : "ImportedZip"
> }
>
> But it is still saved as a string in MongoDB, because of the double
> quotation marks. Is there a way to remove those double quotation marks
> when converting with the AttributesToJSON processor?
>
> Any h

Re: Processor Question

2016-06-06 Thread Thad Guidry
Hi Joe and others,

I see a high-level problem with searchable documentation for Processors in
two areas.

1.  The current new-processor-dialog shown during Add Processor has a
search (filter) capability, but it only searches through the Tags and Type,
not Property Names, PropertyDescriptor, or even AllowableValue's
description text.  That's a shame, because doing a search for "line" brings
up nothing.

Solution:  I think a more comprehensive Search/Filter that also searches
PropertyDescriptors and AllowableValues would be much better, showing an
expanded Add Processor dialog that also displays the description text for
the Properties and AllowableValues.

Wanted Position: Make the dialog show more complete info for a Processor
when a user clicks on a processor.

2. AllowableValue descriptions are not ideally displayed on the docs page,
but instead are hidden in a small blue ? circle next to each value title.
For example,
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.RouteText/index.html
has a Matching Strategy with an AllowableValue of Satisfies Expression, but key
information about it having variables Line and LineNo is hidden completely
and should not be. It's part of the full documentation, and all
documentation should be shown to a user visiting the docs pages.

Solution:  Add a sub-table under the Allowable Values column on nifi-docs with a
column for "Allowable Values description" next to each AllowableValue (just
as the source code does).

I have added these comments to the JIRA issue for an enhanced filter on Add
Processor here:
https://issues.apache.org/jira/browse/NIFI-1115

-Thad

Thad
+ThadGuidry <https://www.google.com/+ThadGuidry>

On Mon, Jun 6, 2016 at 10:47 AM, Joe Percivall 
wrote:

> For number one, you can also use RouteText[1] with the matching strategy
> "Satisfies Expression". Then as a dynamic property use this expression
> "${lineNo:le(10)}". This will route the first 10 lines to the "matched"
> relationship (assuming "Route to each matching Property Name" is not
> selected). This option also allows you to route those unmatched lines
> elsewhere if you need to (if not, just auto-terminate the "unmatched"
> relationship).
>
> Then for number two, instead of ReplaceText, you could also use RouteText.
> Set the matching strategy to "Matches Regular Expression". Then set the
> dynamic property to match everything and end with "unambiguously" (an
> example being "((\w|\W)*unambiguously)"). This will route all the text that
> matches the Regex apart from the end of the file and gives you the option
> to route the ending text differently if needed.
>
> [1]
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.RouteText/index.html
>
>
> Joe- - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: joeperciv...@yahoo.com
>
>
>
> On Sunday, June 5, 2016 4:41 AM, Leslie Hartman  wrote:
>
>
>
> Matthew:
>
> The ModifyBytes processor would be the best if it would allow one to
> specify the bytes to keep. I could calculate the number of bytes to
> delete, but when I try to place a variable in the End Offset it says it
> is not in the expected format.
>
> As for SegmentContent and SplitText, I have tried both of these. The
> problem is that they just take the original file and split it into a bunch
> of little files. So if I wanted, say, 256 bytes of a 30 MB file, after
> running out of memory it would give me 125,829,119 files to get rid of.
>
> For the 2nd case ReplaceText should work; I'm just having problems
> getting the correct syntax. If someone could provide an example of the
> correct syntax I would appreciate it.
>
> Thank You.
>
> Leslie Hartman
>
>
> Matthew Clarke wrote:
>
> You may also want to look at using the modifyBytes processor for number 1.
> >
> >On Jun 4, 2016 1:49 PM, "Thad Guidry"  wrote:
> >
> >For your 1st case, you can use either SegmentContent by your 256 bytes
> (or perhaps you can even use SplitText)
> >>
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SegmentContent/index.html
> >>
> >>
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitText/index.html
> >>
> >>
> >>
> >>For your 2nd case, you can use ReplaceText
> >>
> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ReplaceText/index.html
> >>
> >>
> >>
> >>Thad
> >>+ThadGuidry
> >>
> >>
>


Re: Processor Question

2016-06-04 Thread Thad Guidry
For your 1st case, you can use SegmentContent with your 256 bytes (or
perhaps you can even use SplitText):
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SegmentContent/index.html
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitText/index.html

For your 2nd case, you can use ReplaceText
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ReplaceText/index.html

Thad
+ThadGuidry 


Re: Which processor to use to cleanly convert xml to json?

2016-06-01 Thread Thad Guidry
Keith,

Hopefully you are aware of some of the pitfalls that you might run into
with that approach.  But it might be good enough for your particular use
case :)

From the org.json.XML documentation:

Convert a well-formed (but not necessarily valid) XML string into a
JSONObject. Some information may be lost in this transformation because
JSON is a data format and XML is a document format. XML uses elements,
attributes, and content text, while JSON uses unordered collections of
name/value pairs and arrays of values. JSON does not like to
distinguish between elements and attributes. Sequences of similar elements
are represented as JSONArrays. Content text may be placed in a "content"
member. Comments, prologs, DTDs, and <![CDATA[ ]]> are ignored.
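
As a quick illustration of that API, a minimal Groovy sketch (the dependency
coordinates and the sample XML are assumptions, not from the thread):

@Grab('org.json:json:20160212')
import org.json.XML

// Convert a small XML fragment into a JSONObject and print it.
// Note how the repeated <rate> elements become a JSON array and the
// attribute is folded into an ordinary name/value pair.
def xml = '<company name="xyz"><rate>0.02</rate><rate>0.03</rate></company>'
def json = XML.toJSONObject(xml)
println json.toString(2)   // pretty-print with an indent of 2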

Thad
+ThadGuidry 


Re: Which processor to use to cleanly convert xml to json?

2016-06-01 Thread Thad Guidry
You can use the ExecuteScript processor with Groovy to easily slurp XML and
then build the JSON.

http://stackoverflow.com/questions/23374652/xml-to-json-with-groovy-xmlslurper-and-jsonbuilder

http://funnifi.blogspot.com/2016/02/executescript-explained-split-fields.html
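
For reference, a minimal ExecuteScript-style Groovy sketch along those lines
(the XML structure and field names are made up, and error handling is omitted):

import org.apache.nifi.processor.io.StreamCallback
import groovy.json.JsonBuilder
import java.nio.charset.StandardCharsets

def flowFile = session.get()
if (flowFile == null) return

// Slurp the incoming XML, rebuild it as JSON, and replace the flow file content.
flowFile = session.write(flowFile, { inputStream, outputStream ->
    def root = new XmlSlurper().parse(inputStream)   // e.g. <order id="1"><item>book</item></order>
    def json = new JsonBuilder([id: root.@id.toString(), item: root.item.text()])
    outputStream.write(json.toPrettyString().getBytes(StandardCharsets.UTF_8))
} as StreamCallback)

session.transfer(flowFile, REL_SUCCESS)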

Thad
+ThadGuidry 


Re: Guidance for NiFi output streaming

2016-05-26 Thread Thad Guidry
Why use a processor to do the filtering work ?  Why filter at all ?  What
if you just kept flowing and updating ?

Why not just store the value into SQL or some database and perform an
Update using the device_id in the where clause ?

Choosing a database that supports JSON natively will let you query and get
JSON output from it, like PostgreSQL, MySQL, MongoDB, SQL Server, etc.

​(Or you could also explore storing the last 100 receives into the DB,
using a sequence generator that gets reset every X, built with Groovy
and the ExecuteScript processor, essentially having a rolling last 100 that
gets continuously updated.)
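
A bare-bones Groovy sketch of that "just update by device_id" idea (the table,
columns, and connection details here are purely illustrative):

@Grab('mysql:mysql-connector-java:5.1.38')
import groovy.sql.Sql

// Update the row for a device, or insert it if it does not exist yet.
def sql = Sql.newInstance('jdbc:mysql://localhost:3306/telemetry', 'user', 'secret',
                          'com.mysql.jdbc.Driver')
try {
    def updated = sql.executeUpdate(
        'UPDATE readings SET payload = ?, updated_at = NOW() WHERE device_id = ?',
        ['{"temp": 21.5}', 'device-42'])
    if (updated == 0) {
        sql.executeInsert('INSERT INTO readings (device_id, payload) VALUES (?, ?)',
                          ['device-42', '{"temp": 21.5}'])
    }
} finally {
    sql.close()
}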

Thad
+ThadGuidry 


Re: Guidance for NiFi output streaming

2016-05-26 Thread Thad Guidry
BTW, the idea previously is a Batching Processor, similar to what Spring
Batch and other data tools provide out of the box.  Not sure if NiFi
already has that concept in one of the processors, or if you have to
resort to Groovy or the ExecuteScript processor.

Thad
+ThadGuidry 


Re: FileSize

2016-05-18 Thread Thad Guidry
On Tue, May 17, 2016 at 10:20 AM, Joe Witt  wrote:

> Madhu,
>
> Absolutely.  You can use MergeContent, for example, to pack together a
> bunch of smaller files to create a larger bundle.  I'd recommend if
> you will bundle 10s of thousands or hundreds of thousands or more of
> things that you use two MergeContent processors together where the
> first one merges at most 10,000 items and then the second merges the
> next 10,000 or so.  Hope that helps
>
> Joe
>
>
​
​Hi Joe,

I think this is the kind of information that should be captured in a "Best
Practices with NiFi" page within the Wiki.  How about I help with maintaining a
section like that?  (The nuances that are too long to explain with
a help tip within the tool, or that get glossed over in the official documentation,
would go into that section.)

I signed up on your wiki, but it looks like I need access to edit and create
pages?  (BTW, I also maintain the OpenRefine wiki.)

Thad
+ThadGuidry 


Re: QueryDatabaseTable errors

2016-05-12 Thread Thad Guidry
The ojdbc6.jar is in the classpath already (dropped it into NiFi /lib).

The URL is not set because of that (and because it shows non-bold, so I
assume it is optional if I have already set it in the classpath).

Logs say:
Caused by: java.sql.SQLRecoverableException: IO Error: Invalid connection
string format, a valid format is: "//host[:port][/service_name]"

So there's my problem...I'll fix it and use the correct Oracle service-name
style syntax rather than the SID.
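
For anyone hitting the same thing, the two Oracle thin-driver URL shapes look
roughly like this (host, port, and names are placeholders):

# SID style (what I had):
jdbc:oracle:thin:@myhost:1521:ORCL

# Service-name style (what the driver was asking for):
jdbc:oracle:thin:@//myhost:1521/orcl.example.com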

Thanks Joe,

Thad
+ThadGuidry 


QueryDatabaseTable errors

2016-05-12 Thread Thad Guidry
I wired up a simple DBCP Connection Pool to my Oracle instance.

Added a QueryDatabaseTable processor and started it, but keep getting
errors (yes this is from DEBUG level...not very helpful):

QueryDatabaseTable[id=f8d7a846-7440-44ce-a683-d5dcb290d948]
QueryDatabaseTable[id=f8d7a846-7440-44ce-a683-d5dcb290d948] failed to
invoke @OnScheduled method due to java.lang.RuntimeException: Failed while
executing one of processor's OnScheduled task.; processor will not be
scheduled to run for 3 milliseconds: java.lang.RuntimeException: Failed
while executing one of processor's OnScheduled task.

[image: Inline image 1]

Thoughts ?

Thad
+ThadGuidry 


Re: nifi processor to parse+update the current json on the fly

2016-04-08 Thread Thad Guidry
Yeap, I think Informatica's DataStage plugins have that ability also, to
let the user know its not streaming, but filling and emptying, filling and
emptying.

Dunno about IBM's :)

Thad
+ThadGuidry 


Re: nifi processor to parse+update the current json on the fly

2016-04-08 Thread Thad Guidry
Frank's work utilizes the Jolt spec (Apache 2 license), which is a great way
to handle JSON-to-JSON transforms in my opinion.

Jolt is not a good fit for process or rules logic (use Groovy or Java, etc.), but
for transforming JSON in a declarative way, Jolt beats the pants off
of anything else out there. It's not stream based and can consume
memory when your JSON payload size is huge, like 300 MB JSON files, but it is
fine for most JSON payloads in the wild.

"Two things to be aware of :

   1. Jolt is not "stream" based, so if you have a very large Json document
   to transform you need to have enough memory to hold it.
   2. The transform process will create and discard a lot of objects, so
   the garbage collector will have work to do.
   ​"​

A few more details about how it can be used are mentioned on its official
page here:
http://bazaarvoice.github.io/jolt/

A demo of Jolt to see how you can transform Json to Json (click the
Transform button):
http://jolt-demo.appspot.com/#ritwickgupta

Here's the rough performance of Jolt in 2013, where an 80k JSON file is
shifted in about 5 secs (the author's notes on this slide are interesting):
https://docs.google.com/presentation/d/1sAiuiFC4Lzz4-064sg1p8EQt2ev0o442MfEbvrpD1ls/edit#slide=id.g9ac79e71_01
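
If you want to poke at Jolt outside the demo page, here is a tiny Groovy sketch
of the Chainr API (the spec and input are toy examples, and the dependency
version is a guess):

@Grab('com.bazaarvoice.jolt:jolt-core:0.0.21')
@Grab('com.bazaarvoice.jolt:json-utils:0.0.21')
import com.bazaarvoice.jolt.Chainr
import com.bazaarvoice.jolt.JsonUtils

// A one-operation "shift" spec that renames company -> vendor.name
// (unmatched fields such as "rate" are dropped by shift).
def spec  = JsonUtils.jsonToList('[ { "operation": "shift", "spec": { "company": "vendor.name" } } ]')
def input = JsonUtils.jsonToObject('{ "company": "xyz", "rate": [0.02, 0.03, 0.04] }')

def output = Chainr.fromSpec(spec).transform(input)
println JsonUtils.toJsonString(output)   // {"vendor":{"name":"xyz"}}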

Thad
+ThadGuidry 


Re: nifi processor to parse+update the current json on the fly

2016-04-07 Thread Thad Guidry
Philippe,

I would encourage you to just use Groovy with JsonSlurper in the
ExecuteScript processor.  It's a blazingly fast parser, actually.

http://groovy-lang.org/json.html

http://docs.groovy-lang.org/latest/html/gapi/groovy/json/JsonSlurper.html
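
Something like this minimal ExecuteScript sketch, for example (the attribute and
field names are invented):

import org.apache.nifi.processor.io.InputStreamCallback
import groovy.json.JsonSlurper

def flowFile = session.get()
if (flowFile == null) return

// Parse the flow file content as JSON and copy one field out to an attribute.
def parsed = null
session.read(flowFile, { inputStream ->
    parsed = new JsonSlurper().parse(inputStream)
} as InputStreamCallback)

flowFile = session.putAttribute(flowFile, 'company.name', parsed?.company?.toString() ?: '')
session.transfer(flowFile, REL_SUCCESS)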

Thad
+ThadGuidry 


Re: Approaches to Array in Json with Nifi?

2016-04-06 Thread Thad Guidry
​Does it store as an attribute if you change the return type from
auto-detect to JSON ?

*Return Type*  auto-detect

   - auto-detect
   - json
   - scalar

Indicates the desired return type of the JSON Path expressions. Selecting
'auto-detect' will set the return type to 'json' for a Destination of
'flowfile-content', and 'scalar' for a Destination of 'flowfile-attribute'.

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.EvaluateJsonPath/index.html

​Depending on your use case you might want to store those rate values
as content rather than attributes, but it depends on what you're trying to
accomplish.  You are not limited to changing your flow with just
EvaluateJsonPath... you can probably add an additional processor after
it to do further filtering or setting of attributes, rather than trying to
just do it all in one shot with the EvaluateJsonPath processor.

Thad
+ThadGuidry 


Re: Approaches to Array in Json with Nifi?

2016-04-06 Thread Thad Guidry
$.rate[*]

Sorry, I forgot to tell you your answer.

Thad
+ThadGuidry 


Re: Approaches to Array in Json with Nifi?

2016-04-06 Thread Thad Guidry
Internally it uses Jayway JsonPath (which is a port of Stefan Goessner's
JsonPath)

There are many online JsonPath testers for you to use.

My preferred one, which does realtime updates as you type, is
http://www.jsonquerytool.com/
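
To sanity-check an expression locally before wiring it into EvaluateJsonPath,
here is a small Groovy sketch against Jayway directly (the dependency version is
an assumption):

@Grab('com.jayway.jsonpath:json-path:2.0.0')
import com.jayway.jsonpath.JsonPath

def json = '{"company":"xyz", "rate":[0.02, 0.03, 0.04]}'

// "$.rate[*]" returns the whole array rather than a single scalar.
List rates = JsonPath.read(json, '$.rate[*]')
println rates                              // [0.02, 0.03, 0.04]
println JsonPath.read(json, '$.company')   // xyz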

Thad
+ThadGuidry 

On Wed, Apr 6, 2016 at 11:33 AM, Hong Li 
wrote:

> Let's say we have object and array in Json as
>
> {"company":"xyz", "rate":[0.02, 0.03, 0.04]}
>
>
> With Nifi processor EvaluateJsonPath, we may get the individual values of
> the object and array such as
>
> companyValue = $.company
> firstRateValue = $.rate[0]
>
>
> What would you recommend if I need to capture all of the three values of
> the rate?
>
> Thanks.
> Hong
>
>
> *Hong Li*
>
> *Centric Consulting*
>
> *In Balance*
> (888) 781-7567 office
> (614) 296-7644 mobile
> www.centricconsulting.com | @Centric 
>


Re: PutHDFS and LZ4 compression ERROR

2016-03-30 Thread Thad Guidry
Oh gosh, you're right...lol...too much OS mixing on my fingertips.

Let me try the build of the LZ4 jar again.

Thanks Matt,

Thad
+ThadGuidry 


Re: Having on processor block while another one is running

2016-03-30 Thread Thad Guidry
Also to note, if you're not plugged into the Data Management industry...

Workflow Orchestration is also sometimes called Process Management, where
there are specific tools and frameworks to deal with that scope on multiple
levels.
You may have heard of a specific kind of Process Management called Business Process
Management (BPM), and there are other frameworks and tools that help
with that. BPMN2 is a standard within that space.  I myself use a framework
called Activiti: http://activiti.org/

Thad
+ThadGuidry 


Re: PutHDFS and LZ4 compression ERROR

2016-03-30 Thread Thad Guidry
Evidently, on Windows the PATH environment variable should also have the
path to your native libraries, so that java.library.path can find them.

I added the path to my .so file to my PATH environment variable...yet I
still get the error:

2016-03-30 10:38:16,181 ERROR [Timer-Driven Process Thread-1]
o.apache.nifi.processors.hadoop.PutHDFS
java.lang.RuntimeException: native lz4 library not available
at
org.apache.hadoop.io.compress.Lz4Codec.getCompressorType(Lz4Codec.java:125)
~[hadoop-common-2.6.2.jar:na]
at
org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
~[hadoop-common-2.6.2.jar:na]
at
org.apache.hadoop.io.compress.CompressionCodec$Util.createOutputStreamWithCodecPool(CompressionCodec.java:131)
~[hadoop-common-2.6.2.jar:na]
at
org.apache.hadoop.io.compress.Lz4Codec.createOutputStream(Lz4Codec.java:87)
~[hadoop-common-2.6.2.jar:na]
at
org.apache.nifi.processors.hadoop.PutHDFS$1.process(PutHDFS.java:279)
~[nifi-hdfs-processors-0.6.0.jar:0.6.0]
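
One avenue still on my list (an assumption about how NiFi's bootstrap.conf passes
extra JVM arguments, not something I have verified yet) is to point
java.library.path straight at the folder holding the native library, e.g. an
extra numbered entry in conf/bootstrap.conf:

# hypothetical example; the path is wherever the native lz4 library actually lives
java.arg.15=-Djava.library.path=C:\native\lz4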

​Don't just engage Mr. Sulu ...stay for breakfast. :)​

Thad
+ThadGuidry 


Re: PutHDFS and LZ4 compression ERROR

2016-03-30 Thread Thad Guidry
Yes of course, on Windows 7 in my User environment variables, I set
LD_LIBRARY_PATH=C:\Program Files\Java\jdk1.8.0_74\jre\lib\amd64

which has the .so file.

ENGAGE! Mr. Sulu :)

Thad
+ThadGuidry 


Re: PutHDFS and LZ4 compression ERROR

2016-03-30 Thread Thad Guidry
Ah, that's really helpful.

Looks like I did have the native library in the built .jar file

\lz4-1.3-SNAPSHOT\net\jpountz\util\win32\amd64\liblz4-java.so

but placing that .so file in my C:\Program
Files\Java\jdk1.8.0_74\jre\lib\amd64 folder results in the same missing lz4
native NiFi errors

Ideas ?

Thad
+ThadGuidry 


Re: PutHDFS and LZ4 compression ERROR

2016-03-30 Thread Thad Guidry
My bad...there is, in the app log...

2016-03-30 09:39:27,709 INFO [Write-Ahead Local State Provider Maintenance]
org.wali.MinimalLockingWriteAheadLog
org.wali.MinimalLockingWriteAheadLog@7615666e checkpointed with 8 Records
and 0 Swap Files in 69 milliseconds (Stop-the-world time = 6 milliseconds,
Clear Edit Logs time = 4 millis), max Transaction ID 23
2016-03-30 09:39:31,979 INFO [pool-16-thread-1]
o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile
Repository
2016-03-30 09:39:32,380 INFO [pool-16-thread-1]
org.wali.MinimalLockingWriteAheadLog
org.wali.MinimalLockingWriteAheadLog@174f0d06 checkpointed with 3 Records
and 0 Swap Files in 400 milliseconds (Stop-the-world time = 273
milliseconds, Clear Edit Logs time = 74 millis), max Transaction ID 9785
2016-03-30 09:39:32,380 INFO [pool-16-thread-1]
o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile
Repository with 3 records in 400 milliseconds
2016-03-30 09:39:32,523 ERROR [Timer-Driven Process Thread-9]
o.apache.nifi.processors.hadoop.PutHDFS
PutHDFS[id=765efcb2-5ab0-4a72-a86f-71865dec264d] Failed to write to HDFS
due to java.lang.RuntimeException: native lz4 library not available:
java.lang.RuntimeException: native lz4 library not available
2016-03-30 09:39:32,525 ERROR [Timer-Driven Process Thread-9]
o.apache.nifi.processors.hadoop.PutHDFS
java.lang.RuntimeException: native lz4 library not available
at
org.apache.hadoop.io.compress.Lz4Codec.getCompressorType(Lz4Codec.java:125)
~[hadoop-common-2.6.2.jar:na]
at
org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
~[hadoop-common-2.6.2.jar:na]
at
org.apache.hadoop.io.compress.CompressionCodec$Util.createOutputStreamWithCodecPool(CompressionCodec.java:131)
~[hadoop-common-2.6.2.jar:na]
at
org.apache.hadoop.io.compress.Lz4Codec.createOutputStream(Lz4Codec.java:87)
~[hadoop-common-2.6.2.jar:na]
at
org.apache.nifi.processors.hadoop.PutHDFS$1.process(PutHDFS.java:279)
~[nifi-hdfs-processors-0.6.0.jar:0.6.0]
at
org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:1807)
~[na:na]
at
org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:1778)
~[na:na]
at
org.apache.nifi.processors.hadoop.PutHDFS.onTrigger(PutHDFS.java:270)
~[nifi-hdfs-processors-0.6.0.jar:0.6.0]
at
org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
[nifi-api-0.6.0.jar:0.6.0]
at
org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1057)
[nifi-framework-core-0.6.0.jar:0.6.0]
at
org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136)
[nifi-framework-core-0.6.0.jar:0.6.0]
at
org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
[nifi-framework-core-0.6.0.jar:0.6.0]
at
org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:123)
[nifi-framework-core-0.6.0.jar:0.6.0]
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[na:1.8.0_74]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
[na:1.8.0_74]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
[na:1.8.0_74]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
[na:1.8.0_74]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[na:1.8.0_74]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[na:1.8.0_74]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_74]
2016-03-30 09:39:34,273 ERROR [Timer-Driven Process Thread-5]
o.apache.nifi.processors.hadoop.PutHDFS
PutHDFS[id=765efcb2-5ab0-4a72-a86f-71865dec264d] Failed to write to HDFS
due to java.lang.RuntimeException: native lz4 library not available:
java.lang.RuntimeException: native lz4 library not available
2016-03-30 09:39:34,274 ERROR [Timer-Driven Process Thread-5]
o.apache.nifi.processors.hadoop.PutHDFS
java.lang.RuntimeException: native lz4 library not available
at
org.apache.hadoop.io.compress.Lz4Codec.getCompressorType(Lz4Codec.java:125)
~[hadoop-common-2.6.2.jar:na]
at
org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
~[hadoop-common-2.6.2.jar:na]
at
org.apache.hadoop.io.compress.CompressionCodec$Util.createOutputStreamWithCodecPool(CompressionCodec.java:131)
~[hadoop-common-2.6.2.jar:na]
at
org.apache.hadoop.io.compress.Lz4Codec.createOutputStream(Lz4Codec.java:87)
~[hadoop-common-2.6.2.jar:na]
at
org.apache.nifi.processors.hadoop.PutHDFS$1.process(PutHDFS.java:279)
~[nifi-hdfs-processors-0.6.0.jar:0.6.0]
at
org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:1807)
~[na:na]
at
org.apache.nifi.con

Re: PutHDFS and LZ4 compression ERROR

2016-03-30 Thread Thad Guidry
Joe,

There is no more additional output, even when I set to DEBUG level.

09:36:32 CDT
ERROR
765efcb2-5ab0-4a72-a86f-71865dec264d

PutHDFS[id=765efcb2-5ab0-4a72-a86f-71865dec264d] Failed to write to
HDFS due to java.lang.RuntimeException: native lz4 library not
available: java.lang.RuntimeException: native lz4 library not
available

09:36:34 CDT
ERROR
765efcb2-5ab0-4a72-a86f-71865dec264d

PutHDFS[id=765efcb2-5ab0-4a72-a86f-71865dec264d] Failed to write to
HDFS due to java.lang.RuntimeException: native lz4 library not
available: java.lang.RuntimeException: native lz4 library not
available

09:36:35 CDT
ERROR
765efcb2-5ab0-4a72-a86f-71865dec264d

PutHDFS[id=765efcb2-5ab0-4a72-a86f-71865dec264d] Failed to write to
HDFS due to java.lang.RuntimeException: native lz4 library not
available: java.lang.RuntimeException: native lz4 library not
available


Thad
+ThadGuidry 


Re: PutHDFS and LZ4 compression ERROR

2016-03-29 Thread Thad Guidry
Sure Joe,

I'll get that for you tomorrow morning.


Thad
+ThadGuidry <https://www.google.com/+ThadGuidry>

On Tue, Mar 29, 2016 at 10:00 PM, Joe Witt  wrote:

> Thad,
>
> Can you share the full stack trace that should be present in the log
> with that?  There is clearly a bit of Java code attempting to load the
> native library and unable to find it.  Placing the jar file in the
> classpath which contains the native library may well not be enough
> because loading the native libraries requires specific settings.
>
> Thanks
> Joe
>
> On Tue, Mar 29, 2016 at 12:24 PM, Thad Guidry 
> wrote:
> > I get an error:
> >
> > 13:04:51 CDT
> > ERROR
> > 765efcb2-5ab0-4a72-a86f-71865dec264d
> >
> > PutHDFS[id=765efcb2-5ab0-4a72-a86f-71865dec264d] Failed to write to HDFS
> due
> > to java.lang.RuntimeException: native lz4 library not available:
> > java.lang.RuntimeException: native lz4 library not available
> >
> > even though I built successfully LZ4 https://github.com/jpountz/lz4-java
> > for my Windows 7 64bit using Ant, Ivy, and Mingw-w64
> > and placed that built lz4-1.3-SNAPSHOT.jar into the nifi/lib folder
> > and where it is getting picked up by NiFi bootstrap.Command just fine.
> >
> > yet the error persists.
> >
> > I'm wondering if
> >
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/PutHDFS.java#L279
> >
> > might actually not be using the Java port but instead the JNI binding or
> > some such as described here
> > https://github.com/jpountz/lz4-java#implementations ?
> >
> > Thad
> > +ThadGuidry
>


PutHDFS and LZ4 compression ERROR

2016-03-29 Thread Thad Guidry
I get an error:

13:04:51 CDT
ERROR
765efcb2-5ab0-4a72-a86f-71865dec264d

PutHDFS[id=765efcb2-5ab0-4a72-a86f-71865dec264d] Failed to write to HDFS
due to java.lang.RuntimeException: native lz4 library not available:
java.lang.RuntimeException: native lz4 library not available

even though I successfully built LZ4 https://github.com/jpountz/lz4-java
for my Windows 7 64-bit using Ant, Ivy, and Mingw-w64
and placed that built lz4-1.3-SNAPSHOT.jar into the nifi/lib folder,
where it is getting picked up by NiFi bootstrap.Command just fine.

yet the error persists.

I'm wondering if
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/PutHDFS.java#L279

might actually not be using the Java port but instead the JNI binding or
some such as described here
https://github.com/jpountz/lz4-java#implementations ?

Thad
+ThadGuidry 


Flowing from Local NiFi to a Remote HDFS

2016-03-28 Thread Thad Guidry
If I want to run a local NiFi instance that uses a PutHDFS processor to
flow to a remote HDFS...how is this accomplished?  It seems as though
PutHDFS expects both a NiFi service and an HDFS service running on the
same machine?  What if I don't want to run NiFi on my Hadoop cluster?

Thad
+ThadGuidry 


Re: Re: Multiple dataflow jobs management(lots of jobs)

2016-03-13 Thread Thad Guidry
Yan,

Pentaho Kettle (PDI) can also certainly handle your needs. But using 10K
jobs to accomplish this is not the proper way to set up Pentaho.  Also,
using MySQL to store the metadata is where you made a wrong choice.
PostgreSQL with data silos on SSD drives would be a better choice, while
properly doing async config [1] and other necessary steps for high writes.
(Don't keep Pentaho's Table output commit levels at their default of 10k
rows when you're processing millions of rows!) For Oracle 11g or PostgreSQL,
where I need 30-second time-slice windows for the metadata logging and where I
have less than 1k of data on average per row, I typically will
choose 200k rows or more in Pentaho's Table output commit option.

I would suggest you contact Pentaho for some ad hoc support or hire some
consultants to help you learn more, or to set up properly for your use case.
For free, you can also just do a web search on "Pentaho best practices".
There's a lot to learn from industry experts who already have used these
tools and know their quirks.

[1]
http://www.postgresql.org/docs/9.5/interactive/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-ASYNC-BEHAVIOR


Thad
+ThadGuidry 

On Sat, Mar 12, 2016 at 11:00 AM, 刘岩  wrote:

> Hi Aldrin
>
> some additional information.
>
> it's a typical ETL offloading user case
>
> Each extraction job should focus on 1 table and 1 table only.  Data will
> be written to HDFS; this is similar to database staging.
>
> The reason why we need to focus on 1 table for each job is because a
> database error or disconnection might occur during the extraction; if
> it's running as a script-like extraction job with expression language,
> then it's hard to re-run or skip that table or tables.
>
> Once the extraction is done, a trigger-like action will do the data
> cleansing.  This is similar to the ODS layer of data warehousing.
>
> If the data quality has passed the quality check, then it will be marked
> as cleaned. Otherwise, it will return to the previous step and redo the data
> extraction, or send an alert/email to the system administrator.
>
> If a certain number of tables are all cleaned and checked, then it will
> call some transforming processor to do the transforming, then push the
> data into a data warehouse (Hive in our case).
>
>
> Thank you very much
>
> Yan Liu
>
> Hortonworks Service Division
>
> Richinfo, Shenzhen, China (PR)
> 13/03/2016
>
> Original message
> *From:* "刘岩" 
> *To:* users  
> *Cc:* dev  
> *Sent:* 2016-03-13 00:12:27
> *Subject:* Re: Re: Multiple dataflow jobs management (lots of jobs)
>
>
> Hi Aldrin
>
> Currently we need to extract 60K tables per day, and the time window is
> limited to 8 hours.  This means that we need to run jobs concurrently,
> and we need a general description of what's going on with all those 60K job
> flows so we can take further actions.
>
> We have tried Kettle and Talend. Talend is IDE-based, so not what we
> are looking for, and Kettle crashed because MySQL cannot handle
> Kettle's metadata with 10K jobs.
>
> So we want to use NiFi; this is really the product that we are looking
> for, but the missing piece here is a DataFlow jobs admin page, so we can
> have multiple NiFi instances running on different nodes while monitoring the
> jobs in one page.  If it can integrate with the Ambari metrics API, then we
> can develop an Ambari View for NiFi jobs monitoring just like the HDFS View and
> Hive View.
>
>
> Thank you very much
>
> Yan Liu
>
> Hortonworks Service Division
>
> Richinfo, Shenzhen, China (PR)
> 06/03/2016
>
>
> Original message
> *From:* Aldrin Piri  
> *To:* users 
> *Cc:* dev  
> *Sent:* 2016-03-11 02:27:11
> *Subject:* Re: Multiple dataflow jobs management (lots of jobs)
>
> Hi Yan,
>
> We can get more into details and particulars if needed, but have you
> experimented with expression language?  I could see a Cron driven approach
> which covers your periodic efforts that feeds some number of ExecuteSQL
> processors (perhaps one for each database you are communicating with) each
> having a table.  This would certainly cut down on the need for the 30k
> processors on a one-to-one basis with a given processor.
>
> In terms of monitoring the dataflows, could you describe what else you are
> searching for beyond the graph view?  NiFi tries to provide context for the
> flow of data but is not trying to be a sole monitoring tool; we can give
> information on a processor basis, but do not delve into specifics.  There
> is a summary view for the overall flow where you can monitor stats about
> the components and connections in the system. We support interoperation
> with monitoring systems via push (ReportingTask) and pull (REST API [2])
> semantics.
>
> Any other details beyond your list of how this all interoperates might
> shed some more light on what you are trying to accomplish.  It seems like
> NiFi should be able to help with this.  With some additional information we
> may be ab

Re: Re: List Files

2016-03-04 Thread Thad Guidry
It should currently be possible to use the ExecuteScript processor (with the
Groovy language) and set those needed user variables like paths, etc. using
whatever Java system properties you wish to set and retrieve later.

In Groovy scripts you can do this:

System.setProperty("socksProxyPort","2003")
System.setProperty("socksProxyHost","localhost")
System.setProperty("ourPathToHell","DonaldTrump")

Thad
+ThadGuidry 


Mapping from JMS -> JSON -> SQL

2016-01-30 Thread Thad Guidry
Any best practices on how to map fields from a JMS source into JSON (for
some validation stuff perhaps with ExecuteScript and Groovy) and finally
into MySQL with NiFi ?

How do folks usually do field mapping in NiFi between an input and an output,
in general?

Thad
+ThadGuidry