Re: Keep Files

2015-11-16 Thread Adam Taft
Oooh, neat idea Salvatore.  +1 to creativity.  Really interesting.

Adam

On Mon, Nov 16, 2015 at 6:25 AM, Salvatore Papa wrote:

> If you're on a Linux system, an alternative I've used in the past is to
> create another directory, full of symlinks pointing to the original
> directory.
>
> As an example, assuming you have a directory: /data/input_files/ full of
> files, create a directory /data/input_links/, and from that new directory,
> do: "ln -s ../input_files/* ./"
>
> Now in NiFi, use the original GetFile processor, configured with
> /data/input_links/, and set Keep Source File to False. When the GetFile
> processor picks up the file, it'll read the contents and create a flowfile
> by following the symlink, delete the symlink, and the original file will
> remain in /data/input_files.
>
> On Mon, Nov 16, 2015 at 12:00 PM, Adam Taft  wrote:
>
> > Also, as a potential work-around, it's possible to use GetFile with
> > "delete" mode and then somewhere in your flow, use PutFile to place the
> > file back down into a "complete" directory.  i.e. something like:
> >
> > /path/incoming  <- use GetFile to pick up files here
> > /path/complete  <- use PutFile to place files here after processing
> >
> > As a variation of the above, if you need the files consistently in the
> > same directory, you could configure GetFile to only pick up certain file
> > patterns.  In this way, you could rename a file after it has been
> > processed:
> >
> > /path/incoming  <- use GetFile to pick up files named $filename.new
> > /path/incoming  <- rename file (using UpdateAttribute) to
> > $filename.complete and use PutFile to place files here after rename
> >
> > Hope that gives you some possible alternatives.
> >
> > Adam
> >
> >
> >
> > On Sat, Nov 14, 2015 at 10:49 AM, Mark Petronic wrote:
> >
> > > Keep: yes, there is a parameter to configure that. Read once: no. But
> > > there is a set of processors in the works to address that: ListFile and
> > > FetchFile. ListFile will return the list of files that have changed
> > > since the last time the files were read; it is stateful. FetchFile can
> > > then take a list and fetch them, and I would assume it would have a
> > > parameter for keep= like GetFile. Not sure of the status of the changes;
> > > I have not checked recently, but see:
> > > https://issues.apache.org/jira/browse/NIFI-631
> > >
> > > Mark
> > >
> > > On Fri, Nov 13, 2015 at 8:55 AM, plj  wrote:
> > >
> > > > Is there a way for GetFile to not delete a file but only read it
> > > > once?  I have a directory with files in it.  I only want the new files
> > > > that are added to the directory to be processed.  It seems that if I
> > > > set GetFile to not delete the files, the same files get read over and
> > > > over.
> > > >
> > > >
> > > > thoughts?
> > > >
> > > >
> > > >
> > > > --
> > > > View this message in context:
> > > > http://apache-nifi-developer-list.39713.n7.nabble.com/Keep-Files-tp4864.html
> > > > Sent from the Apache NiFi Developer List mailing list archive at
> > > > Nabble.com.
> > > >
> > >
> >
>
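
The symlink work-around described in this thread can also be set up from code rather than with "ln -s". The following is only a minimal sketch under assumptions: the class and method names (SymlinkStaging, linkNewFiles) are hypothetical and not part of NiFi, and it relies on a file system that supports symbolic links.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

/** Hypothetical helper (not a NiFi class): mirrors a directory as symlinks. */
final class SymlinkStaging {

    private SymlinkStaging() {
    }

    /**
     * Creates one symlink in linkDir for every regular file in sourceDir that
     * has no entry of the same name in linkDir yet.
     *
     * @return the number of links created by this call
     */
    static int linkNewFiles(final Path sourceDir, final Path linkDir) {
        int created = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(sourceDir)) {
            Files.createDirectories(linkDir);
            for (final Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip sub-directories and other non-file entries
                }
                final Path link = linkDir.resolve(file.getFileName());
                // NOFOLLOW_LINKS: an existing link counts even if its target is gone
                if (Files.exists(link, LinkOption.NOFOLLOW_LINKS)) {
                    continue;
                }
                Files.createSymbolicLink(link, file.toAbsolutePath());
                created++;
            }
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        return created;
    }
}
```

Note that this sketch keeps no record of links that GetFile has already consumed and deleted, so calling it a second time would re-link (and therefore re-read) those files. It is meant to be run once per batch, like the one-off "ln -s" command in the message above.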


Re: Release wrangling: 1 week until our hopeful 0.4.0 release

2015-11-16 Thread Joe Percivall
Per NIFI-1165: a discussion is occurring on the ticket: 
https://issues.apache.org/jira/browse/NIFI-1165


Overall, most of the issues are identified and pending a fix from Mark, Oleg 
and me. The issues were encountered on two different Windows 8 machines by me 
and on Windows 2012 R2 by Mark. My configuration is Maven 3.3.3 and Java 
1.8.0_45 (on the machine I have in front of me). 

Should have a patch resolving the issues in the next couple of days.

Joe

- - - - - - Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com




On Monday, November 16, 2015 10:42 AM, Sean Busbey  wrote:



Re: NIFI-1165, I also have a Windows 7 laptop I can test on (though it is
low-powered).


On Sun, Nov 15, 2015 at 10:48 AM, Aldrin Piri  wrote:

> I have another set of eyes for NIFI-748.  Will do so now.
>
> On Sun, Nov 15, 2015 at 10:39 AM, Tony Kurc  wrote:
>
> > For those not watching commits@nifi
> > I need another set of eyes on the review for NIFI-748
> >
> > On Sun, Nov 15, 2015 at 8:33 AM, Joe Witt  wrote:
> >
> > > NIFI-1082 (this should move to next release unless a resolution is
> > > imminent)
> > >
> > > NIFI-1108 (move to next release)
> > >
> > > NIFI-1139 (recommend moving to 0.5.0)
> > >
> > > NIFI-1164 (this should get fixed now - it makes builds unreliable)
> > >
> > > NIFI-1165 (should tackle now.  have a windows laptop i can build on)
> > >
> > > Thanks for pushing tony.
> > >
> > > On Sun, Nov 15, 2015 at 8:18 AM, Tony Kurc  wrote:
> > > > Update:
> > > >
> > > > Presumably fixed by NIFI-1086 (Joe Percivall). Reviewed, awaiting
> > > > revision
> > > > NIFI-61
> > > > NIFI-812
> > > > NIFI-980
> > > > NIFI-1009
> > > > NIFI-1086
> > > > NIFI-1133
> > > >
> > > > Multiple Auths (Matt Gilman) no patch yet, making progress
> > > > NIFI-655
> > > >
> > > > Provenance Search Improvement (Oleg Zhurakousky) PR in, being reviewed
> > > > by Tony Kurc
> > > > NIFI-748
> > > >
> > > > Create a Getting Started Guide (Mark Payne) Review complete, being
> > > > merged in by Tony Kurc
> > > > NIFI-973
> > > >
> > > > Line ending fix (Tony Kurc) Finger hovering over "go" button
> > > > NIFI-1054
> > > >
> > > > ExecuteStreamCommand (Joe Percivall). Reviewed, awaiting revision
> > > > NIFI-1081
> > > >
> > > > Provenance repository search (Mark Payne) original patch reverted, new
> > > > patch in development? Move to 0.5.0?
> > > > NIFI-1082
> > > >
> > > > Scrub code looking for @InputRequirement consistency (Mark Payne) - not
> > > > sure how to attack this one
> > > > NIFI-1108
> > > >
> > > > * NEW *
> > > > LogAttribute processor fix (Oleg Zhurakousky) - trivial fix? but
> > > > breaking change? I recommend moving to 0.5.0
> > > > NIFI-1139
> > > >
> > > > * NEW *
> > > > Race condition Fix (Oleg Zhurakousky) - assigned but no patch. Move to
> > > > 0.5.0 or 0.4.1?
> > > > NIFI-1164
> > > >
> > > > * NEW *
> > > > Build on windows failing (Joe Percivall) - assigned but no patch. I can
> > > > dig in. (I submitted a request for MSDN, but it can take 8 weeks)
> > > > NIFI-1165
> > > >
> > > >
> > > >
> > > > On Fri, Nov 13, 2015 at 10:22 AM, Aldrin Piri wrote:
> > > >
> > > >> Scanned through and removed the 0.4.0 tagging for State Management.
> > > >>
> > > >> Thanks for the suggestion.
> > > >>
> > > >> On Fri, Nov 13, 2015 at 10:10 AM, Sean Busbey wrote:
> > > >>
> > > >> > Has anyone had a chance to do a pass through Feature Proposals to
> > > >> > move out any that aren't going to make 0.4.0?
> > > >> >
> > > >> > https://cwiki.apache.org/confluence/display/NIFI/NiFi+Feature+Proposals
> > > >> >
> > > >> > On Thu, Nov 12, 2015 at 8:13 AM, Tony Kurc wrote:
> > > >> >
> > > >> > > https://issues.apache.org/jira/browse/NIFI-61 - awaiting an answer
> > > >> > > before patch can be completed
> > > >> > > https://issues.apache.org/jira/browse/NIFI-655 - Based on feature
> > > >> > > branch activity, is close?
> > > >> > > https://issues.apache.org/jira/browse/NIFI-696 - awaiting a patch
> > > >> > > marking method as deprecated (assigned to me, but if someone else
> > > >> > > wants to take it and I review, that's cool too)
> > > >> > > https://issues.apache.org/jira/browse/NIFI-812 - a bit confused
> > > >> > > about this one. patch in NIFI-1086 will close this?
> > > >> > > https://issues.apache.org/jira/browse/NIFI-973 - awaiting review?
> > > >> > > https://issues.apache.org/jira/browse/NIFI-980 - (see 812 confusion)
> > > >> > > presumably closed when NIFI-1086 is closed
> > > >> > > https://issues.apache.org/jira/browse/NIFI-1009 (same!)
> > > >> > > https://issues.apache.org/jira/browse/NIFI-1054 I'm at the ready to
> > > >> > > submit a patch at the 11th hour. 

[GitHub] nifi pull request: NIFI-1146 Allow GetKafka to be configured with ...

2015-11-16 Thread naveenmadhire
Github user naveenmadhire commented on the pull request:

https://github.com/apache/nifi/pull/125#issuecomment-157086151
  
Closing the pull request, as I've messed up the commits. I will open a 
new one soon. Sorry for the trouble.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] nifi pull request: NIFI-1146 Allow GetKafka to be configured with ...

2015-11-16 Thread naveenmadhire
Github user naveenmadhire closed the pull request at:

https://github.com/apache/nifi/pull/125


[GitHub] nifi pull request: NIFI-748 Fixed logic around handling partial qu...

2015-11-16 Thread olegz
Github user olegz commented on the pull request:

https://github.com/apache/nifi/pull/123#issuecomment-157029989
  
@trkurc @joewitt @apiri 
Guys, please see the latest commit. I didn't squash it, so it's easier to read 
and see what's been addressed. In summary:
1. Since, based on the latest comment from Joe, it appears that we all agree 
that DocReader is not really public, I kept the dead constructor out and also 
made DocReader package private.
2. Based on Tony's point, I added the Document sorting logic back. At least it 
will ensure that previous behavior is maintained. 
See the commit message for more details.


[GitHub] nifi pull request: NIFI-1164 decreased the chances of race conditi...

2015-11-16 Thread olegz
GitHub user olegz opened a pull request:

https://github.com/apache/nifi/pull/126

NIFI-1164 decreased the chances of race condition

Removed checks for 'if (getState() != ControllerServiceState.DISABLED)' 
from StandardControllerServiceNode.verifyCanEnable(..) operations based on the 
discussion that we had in NIFI-1143, where an 'enablable' service is one 
that is not ENABLED or ENABLING.
On top of that, the actual state check is redundant since it is going to be 
checked again when isValid() is invoked.
Cleaned up the code in 
StandardControllerServiceProvider.enableReferencingServices(..) since:
1. It had the same check ordering issue on service state between ENABLING 
and ENABLED as was described in NIFI-1143.
2. Removed redundant recursiveReferences computation
3. There were two loops iterating over the same collection, so merged them 
into one
4. Removed redundant state check in the loop since it would be checked 
again as part of 'verifyCanEnable'
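
The 'enablable' rule quoted in this description can be written down as a tiny predicate. This is only an illustrative sketch: the enum and class below are hypothetical stand-ins, not the actual ControllerServiceState or StandardControllerServiceNode code, and the set of states listed is an assumption.

```java
/** Hypothetical stand-in for the service states named in the description. */
enum SketchServiceState {
    DISABLED, DISABLING, ENABLING, ENABLED
}

/** Hypothetical helper, not the real verifyCanEnable(..) implementation. */
final class EnableCheck {

    private EnableCheck() {
    }

    /** A service counts as "enablable" when it is neither ENABLED nor ENABLING. */
    static boolean isEnablable(final SketchServiceState state) {
        return state != SketchServiceState.ENABLED
                && state != SketchServiceState.ENABLING;
    }
}
```

Read this way, the rule is looser than the removed "must be DISABLED" check: any state other than ENABLED or ENABLING passes.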

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/olegz/nifi NIFI-1164

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/126.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #126


commit 54c5c0397c6d45d34c7c75e5fe44984dcb765ea4
Author: Oleg Zhurakousky 
Date:   2015-11-16T18:18:43Z

NIFI-1164 decreased the chances of race condition




[GitHub] nifi pull request: NIFI-1146 Allow GetKafka to be configured with ...

2015-11-16 Thread markap14
Github user markap14 commented on the pull request:

https://github.com/apache/nifi/pull/125#issuecomment-157140757
  
@naveenmadhire no trouble at all :) Looking forward to the new pull request.


[GitHub] nifi pull request: NIFI-1107 - Create new PutS3ObjectMultipart pro...

2015-11-16 Thread jskora
Github user jskora commented on a diff in the pull request:

https://github.com/apache/nifi/pull/121#discussion_r44967382
  
--- Diff: nifi-nar-bundles/nifi-aws-bundle/nifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/s3/PutS3ObjectMultipart.java ---
@@ -0,0 +1,550 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.aws.s3;
+
+import com.amazonaws.AmazonClientException;
+import com.amazonaws.services.s3.AmazonS3;
+import com.amazonaws.services.s3.AmazonS3Client;
+import com.amazonaws.services.s3.model.AccessControlList;
+import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
+import com.amazonaws.services.s3.model.CompleteMultipartUploadResult;
+import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
+import com.amazonaws.services.s3.model.InitiateMultipartUploadResult;
+import com.amazonaws.services.s3.model.ObjectMetadata;
+import com.amazonaws.services.s3.model.PartETag;
+import com.amazonaws.services.s3.model.StorageClass;
+import com.amazonaws.services.s3.model.UploadPartRequest;
+import com.amazonaws.services.s3.model.UploadPartResult;
+import org.apache.nifi.annotation.behavior.DynamicProperty;
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.ReadsAttribute;
+import org.apache.nifi.annotation.behavior.WritesAttribute;
+import org.apache.nifi.annotation.behavior.WritesAttributes;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.SeeAlso;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.flowfile.attributes.CoreAttributes;
+import org.apache.nifi.processor.DataUnit;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processor.io.InputStreamCallback;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.stream.io.BufferedInputStream;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.concurrent.TimeUnit;
+
+@SeeAlso({FetchS3Object.class, PutS3Object.class, DeleteS3Object.class})
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"Amazon", "S3", "AWS", "Archive", "Put", "Multi", "Multipart", "Upload"})
+@CapabilityDescription("Puts FlowFiles to an Amazon S3 Bucket using the MultipartUpload API method.  " +
+        "This upload consists of three steps 1) initiate upload, 2) upload the parts, and 3) complete the upload.\n" +
+        "Since the intent for this processor involves large files, the processor saves state locally after each step " +
+        "so that an upload can be resumed without having to restart from the beginning of the file.\n" +
+        "The AWS libraries default to using standard AWS regions but the 'Endpoint Override URL' allows this to be " +
+        "overridden.")
+@DynamicProperty(name = "The name of a User-Defined Metadata field to add to the S3 Object",
+        value = "The value of a User-Defined Metadata field to add to the S3 Object",
+        description = "Allows user-defined metadata to be added to the S3 object as key/value pairs",
+        supportsExpressionLanguage = true)
+@ReadsAttribute(attribute = "filename", description = "Uses the FlowFile's filename as the filename for 

[GitHub] nifi pull request: NIFI-1146 Allow GetKafka to be configured with ...

2015-11-16 Thread naveenmadhire
GitHub user naveenmadhire opened a pull request:

https://github.com/apache/nifi/pull/127

NIFI-1146  Allow GetKafka to be configured with auto.offset.reset to 
"largest" or "smallest"

Pull request with changes. 
@markap14 I removed the writeAttributes ones, since there is no need to 
write the auto.offset to the flowfile attribute. I also modified the 
description. Please check to see if the description is fine.

Screenshots after the build:


![image](https://cloud.githubusercontent.com/assets/8851548/11192575/7f28294a-8c67-11e5-9800-7c85af4ce038.png)



![image](https://cloud.githubusercontent.com/assets/8851548/11192584/87e334c6-8c67-11e5-8ec6-ada91bf6f834.png)
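
For readers unfamiliar with the setting this pull request exposes: auto.offset.reset is a consumer property of the Kafka high-level consumer of that era, and "largest" and "smallest" are the two values named in the title. A minimal sketch of validating and passing it along could look like the following. The class and method names are hypothetical and are not taken from the GetKafka patch.

```java
import java.util.Properties;

/** Hypothetical helper, not the GetKafka implementation. */
final class KafkaOffsetResetConfig {

    static final String AUTO_OFFSET_RESET = "auto.offset.reset";

    private KafkaOffsetResetConfig() {
    }

    /**
     * Builds consumer properties carrying the requested reset policy.
     * Only the two values named in the pull request title are accepted.
     */
    static Properties consumerProperties(final String offsetReset) {
        if (!"largest".equals(offsetReset) && !"smallest".equals(offsetReset)) {
            throw new IllegalArgumentException(
                    "auto.offset.reset must be \"largest\" or \"smallest\", got: " + offsetReset);
        }
        final Properties props = new Properties();
        props.setProperty(AUTO_OFFSET_RESET, offsetReset);
        return props;
    }
}
```

With "smallest" a consumer that has no stored offset starts from the oldest available message; with "largest" it starts from new messages only.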


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/naveenmadhire/nifi NIFI-1146

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/127.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #127


commit b954ca620e619e7961e6a7a58122b844f30862da
Author: Naveen Madhire 
Date:   2015-11-16T17:59:52Z

NIFI-1146 Allow GetKafka to be configured with auto.offset.reset to largest 
or smallest

commit 03a54bf2d593e07ab602f6a9425d0231a273ba5a
Author: Naveen Madhire 
Date:   2015-11-16T19:32:17Z

Changes after review




[GitHub] nifi pull request: NIFI-1146 Allow GetKafka to be configured with ...

2015-11-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nifi/pull/127


[GitHub] nifi pull request: NIFI-1146 Allow GetKafka to be configured with ...

2015-11-16 Thread markap14
Github user markap14 commented on the pull request:

https://github.com/apache/nifi/pull/127#issuecomment-157179517
  
@naveenmadhire - code looks good. Builds without problem, and testing on my 
Kafka instance shows the expected results. Nice work! And thanks for the 
contribution. On behalf of the NiFi community, let me welcome you as our newest 
contributor!


[GitHub] nifi pull request: NIFI-748 Fixed logic around handling partial qu...

2015-11-16 Thread markap14
Github user markap14 commented on the pull request:

https://github.com/apache/nifi/pull/123#issuecomment-157182575
  
@trkurc  Personally, I have exactly 0 qualms about changing it to package 
private. If I chose to take some random util class from a release of Apache 
Tomcat, for example, and depended on it, I should certainly not be surprised if 
that class changed from release to release (including incremental releases). I 
don't think this is any different.


[GitHub] nifi pull request: NIFI-1164 decreased the chances of race conditi...

2015-11-16 Thread olegz
Github user olegz closed the pull request at:

https://github.com/apache/nifi/pull/126


[GitHub] nifi pull request: NIFI-1164 decreased the chances of race conditi...

2015-11-16 Thread olegz
Github user olegz commented on the pull request:

https://github.com/apache/nifi/pull/126#issuecomment-157173562
  
Pulling back, see comments in JIRA


Re: EvaluateJsonPath error: Unable to return a scalar value for the expression

2015-11-16 Thread Aldrin Piri
The documentation is a little unclear and light.  Made a ticket [1] to
clarify how these properties are interpreted.

Thanks!

[1] https://issues.apache.org/jira/browse/NIFI-1177

On Mon, Nov 16, 2015 at 3:52 PM, Sumanth Chinthagunta wrote:

> Thanks Aldrin.
> it works after I changed Return Type to JSON.
>
> > On Nov 16, 2015, at 12:47 PM, Aldrin Piri  wrote:
> >
> > Sumo,
> >
> > The scalar option has the processor looking for the resultant value of
> > the expression to provide a non-Map/List representation of the targeted
> > expression.  In this case, if you change the property to json, it should
> > work as anticipated.  The property itself is more of a validation of the
> > data that is being extracted (in that it is an object/array or a simple
> > value).
> >
> > On Mon, Nov 16, 2015 at 3:20 PM, Sumanth Chinthagunta wrote:
> >
> >> I am trying to extract data into an attribute using EvaluateJsonPath.
> >> When the JsonPath returns a complex type, I get the error: Unable to
> >> return a scalar value for the expression $['data'] for FlowFile 152.
> >> Evaluated value was {id=1…..}. Transferring to failure
> >>
> >> data  -   $.data  <—  Error
> >> id  -  $.data.id   <— works
> >> {
> >>     "database": "test",
> >>     "table": "guests",
> >>     "type": "insert",
> >>     "ts": 1446422524,
> >>     "xid": 1800,
> >>     "commit": true,
> >>     "data": {
> >>         "reg_date": "2015-11-02 00:02:04",
> >>         "firstname": "sumo",
> >>         "id": 1,
> >>         "lastname": "demo"
> >>     }
> >> }
> >>
> >> Is it possible to extract a JSON object from a FlowFile using
> >> EvaluateJsonPath? If not, please advise what options I have.
> >>
> >> Thanks
> >> Sumo
>
>
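
The scalar versus json distinction explained in this thread comes down to whether the evaluated value is a Map/List or a simple value. A minimal sketch of that check, using hypothetical names and plain Java collections rather than the EvaluateJsonPath source:

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not taken from EvaluateJsonPath. */
final class ReturnTypeCheck {

    private ReturnTypeCheck() {
    }

    /** A value is treated as scalar when it is neither a Map (object) nor a List (array). */
    static boolean isScalar(final Object evaluated) {
        return !(evaluated instanceof Map) && !(evaluated instanceof List);
    }
}
```

Under this reading, $.data.id in the example above yields the number 1 and passes a scalar check, while $.data yields an object and only fits a json return type.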


[GitHub] nifi pull request: NIFI-1123 Adds expression language support to D...

2015-11-16 Thread trkurc
Github user trkurc commented on the pull request:

https://github.com/apache/nifi/pull/116#issuecomment-157213254
  
@jskora what would you expect the following unit tests to do?

```
@Test
public void testInvalidRegex() {
    final TestRunner runner = TestRunners.newTestRunner(new UpdateAttribute());
    // "(" is not a valid regular expression
    runner.setProperty("Delete Attributes Expression", "(");

    final Map<String, String> attributes = new HashMap<>();
    attributes.put("attribute.1", "value.1");

    runner.enqueue(new byte[0], attributes);

    runner.run();
}

@Test
public void testInvalidRegexInAttribute() {
    final TestRunner runner = TestRunners.newTestRunner(new UpdateAttribute());
    // the expression only becomes an invalid regex once ${butter} is evaluated
    runner.setProperty("Delete Attributes Expression", "${butter}");

    final Map<String, String> attributes = new HashMap<>();
    attributes.put("butter", "(");

    runner.enqueue(new byte[0], attributes);

    runner.run();
}
```



[GitHub] nifi pull request: NIFI-1123 Adds expression language support to D...

2015-11-16 Thread trkurc
Github user trkurc commented on the pull request:

https://github.com/apache/nifi/pull/116#issuecomment-157218572
  
I think what I'm getting at is that having a property that requires 
the expression to evaluate to a valid regex might mean it is time for a failure 
relationship for this processor (a breaking change?)


Re: Common scheduler and add-hock thread creation

2015-11-16 Thread Oleg Zhurakousky
Taking liberties - so let me throw out a few examples. I am sure you'd agree that 
thread creation and management is expensive, hard, and error prone, hence 
the newer java.util.concurrent and all the goodies in it. 
- There is a patch currently in the queue that creates a new Thread() and then 
starts it. Is it necessary? Could we reuse a thread from the common pool?
- We have many places where we call Thread.sleep(..) and in fact sleep for a 
considerable amount of time. That thread lies dormant when it could actually 
be doing something. Is it necessary?

Cheers
Oleg


> On Nov 16, 2015, at 7:52 PM, Tony Kurc  wrote:
> 
> the issue with a best practices guide on this subject is it will be
> dominated by edge cases. The common case should be "don't produce any
> threads".
> 
> That being said, I commented on a jira somewhere about LinkedBlockingQueues
> used in so many producer/consumer style processors and possibly needing a
> library to have some consistency in using those queues in a consistent
> thread safe manner.
> 
> Also, I'm not quite sure of what you mean by taking liberties?
> 
> 
> 
> 
> 
> 
> On Mon, Nov 16, 2015 at 7:39 PM, Oleg Zhurakousky <
> ozhurakou...@hortonworks.com> wrote:
> 
>> Guys
>> 
>> I am noticing many modules where we have things like "new
>> Thread(..).start()", creation of new executors and schedulers,
>> Thread.sleep(..), etc. I am sure many would agree that taking such
>> liberties with Threads will have consequences (not IF but WHEN).
>> On several threads several of us mentioned a "must read" for anyone who is
>> getting into concurrent code -
>> http://ptgmedia.pearsoncmg.com/images/9780321349606/samplepages/9780321349606.pdf
>> and indeed we can/should definitely grab some best practices from this book.
>> 
>> At least we can start with this: what's our strategy around thread management
>> for NAR developers? Basically, should or should not a user create Threads,
>> Executors, Schedulers, etc.?
>> 
>> Cheers
>> Oleg
>> 



[GitHub] nifi pull request: NIFI-1123 Adds expression language support to D...

2015-11-16 Thread jskora
Github user jskora commented on the pull request:

https://github.com/apache/nifi/pull/116#issuecomment-157265254
  
Interesting.  Before I enabled EL support, it had a 
REGULAR_EXPRESSION_VALIDATOR which would have caught those.  Is there a way to 
validate a property that supports expression language to verify that the output 
will be an attribute key?
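
One way to picture the gap being discussed: a property validator only sees the raw property text, so once expression language is involved the regex has to be checked again after the expression has been evaluated for a particular FlowFile. A minimal sketch of that run-time check, with hypothetical names and no NiFi API involved:

```java
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical helper, not part of UpdateAttribute. */
final class EvaluatedRegex {

    private EvaluatedRegex() {
    }

    /**
     * Compiles the already-evaluated expression value. An empty result means
     * the value was missing or not a valid regular expression, which is the
     * case a caller would have to handle explicitly.
     */
    static Optional<Pattern> tryCompile(final String evaluatedValue) {
        if (evaluatedValue == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(Pattern.compile(evaluatedValue));
        } catch (final PatternSyntaxException e) {
            return Optional.empty();
        }
    }
}
```

What a processor should do with the empty case (for example, route the FlowFile somewhere other than success) is exactly the open question in this thread; the sketch only shows where the check would have to sit.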


[GitHub] nifi pull request: NIFI-1123 Adds expression language support to D...

2015-11-16 Thread apiri
Github user apiri commented on the pull request:

https://github.com/apache/nifi/pull/116#issuecomment-157265218
  
@trkurc I am onboard with what you are driving toward. I had created an 
issue around the same thought process. 
https://issues.apache.org/jira/browse/NIFI-813

The core issue is that EL, when introduced, provided ways for processors to 
fail in ways previously unanticipated.


[GitHub] nifi pull request: NIFI-1123 Adds expression language support to D...

2015-11-16 Thread trkurc
Github user trkurc commented on the pull request:

https://github.com/apache/nifi/pull/116#issuecomment-157266419
  
@jskora as @apiri pointed out, I think that we may have to live with the 
problem until NIFI-813 is resolved. I tried wrapping my head around a validator 
that handled the first test case, and that hurt. The second one didn't hurt, 
because validation happens too early to catch it.


[GitHub] nifi pull request: NIFI-1123 Adds expression language support to D...

2015-11-16 Thread trkurc
Github user trkurc commented on the pull request:

https://github.com/apache/nifi/pull/116#issuecomment-157266895
  
@apiri you have some history with this - what is the greater evil: no EL 
support, or possibly breaking? This already appears to be a processor with 
a "beware of EL" sign up? 




Re: Common scheduler and add-hock thread creation

2015-11-16 Thread Joe Witt
So back in the day...

Here is the thought process behind how it works today at a high level
and taking some generalities.  Developers of extensions, and that
primarily means processors, begin process sessions.  In a process
session a processor can access, create, destroy zero or more flow
files and route them to relationships.  They do not dictate how often
they run or when they run.  The Flow Controller does that.  When it
decides to invoke them it does so by calling the appropriate method.
The thread given in that call is the thread they can use to operate on
that process session.  When they're done with that session, they should be
well behaved and give the thread back to the controller.  That is
it.  They have no control over threads because generally they don't
need them.
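
To make the model above concrete, here is a minimal sketch in Python rather
than NiFi's Java API. Every name in it (FlowController, ProcessSession,
UppercaseProcessor, on_trigger) is a hypothetical stand-in chosen for
illustration, not the real framework classes. It only shows the division of
labor: the controller owns the threads and decides when to call, and the
processor works on its session using the thread it was handed.

```python
from concurrent.futures import ThreadPoolExecutor


class ProcessSession:
    """Hypothetical stand-in for a session: records where flow files are routed."""

    def __init__(self):
        self.routed = []  # list of (relationship, flow_file) pairs

    def create(self, content):
        return {"content": content}

    def transfer(self, flow_file, relationship):
        self.routed.append((relationship, flow_file))


class UppercaseProcessor:
    """Hypothetical processor: creates no threads, uses only the one it is given."""

    def on_trigger(self, session, incoming):
        flow_file = session.create(incoming.upper())
        session.transfer(flow_file, "success")


class FlowController:
    """Owns the thread pool and decides when a processor runs."""

    def __init__(self, max_threads=2):
        self._pool = ThreadPoolExecutor(max_workers=max_threads)

    def trigger(self, processor, incoming):
        session = ProcessSession()
        # A worker thread is lent to the processor for this one call only.
        future = self._pool.submit(processor.on_trigger, session, incoming)
        future.result()  # the thread goes back to the pool once on_trigger returns
        return session.routed

    def shutdown(self):
        self._pool.shutdown()


def run_demo():
    controller = FlowController()
    try:
        return controller.trigger(UppercaseProcessor(), "hello")
    finally:
        controller.shutdown()
```

The processor never sees the pool, which is the point: scheduling stays a
controller concern.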

Now, some processors are special and they may be written by a
developer that needs greater control of their own threading model,
like web servers for instance.  That is ok but it is also outside of
what is described above.  It is really 'in addition to' what is
described above.  The framework supported path for dealing with
FlowFiles (which is what NiFi is for) is only as above.  It is 'ok'
for these special cases but so far nothing practical has risen to the
level of it needing a framework resolution.  There have been glimmers
but nothing that has really shown to need a resolution as far as
threading goes.  We've considered having different managed thread
pools and then operators could assign a given component on the flow to
those pools.  This way they can preserve a pool for 'sources' vs
'mid-stream' vs 'delivery' processors for example.  Again, this never
reached the level of needing a framework solution.

There have also been cases where folks want to have processors operate
and they do not do *anything* with FlowFiles at all.  These are for
what is known as the 'NiFi-As-A-Fancy-Cron' tool pattern.  We don't
need to support this one.

Now I can definitely conceive of ways to build processors or flows
which will create difficulty in NiFi.  I am ok with that personally.

Thanks
Joe


On Mon, Nov 16, 2015 at 8:50 PM, Oleg Zhurakousky
 wrote:
> Tony, thanks for your input. At least we have some discussion going. See in 
> line for the rest.
>
>> On Nov 16, 2015, at 8:22 PM, Tony Kurc  wrote:
>>
>> so, I believe threads in a processor in nifi are much, much easier than
>> general threading in many other applications. There are defined boundaries
>> on when a processor is built and torn down. Pretty much any state in the
>> middle is up to the processor. you know when resources need to be stood up.
>> you know when they need to be torn down.
> Generally true, and I’d agree there is not much one can do to stop users doing 
> what they want to do regardless of how damaging it may be to the rest of the 
> system.
>>
>> Because threads have a localized scope, I'm not sure a global pool would be
>> a help. If a processor needs higher throughput or shorter latency, now, the
>> problem is generally isolated and there is a nice little cream center to
>> optimize. If you're blocked on a global pool of threads because some other
>> processor consumed all the threads in a pool, well, suddenly, your
>> performance is no longer a localized problem.
>>
> This argument is argumentative ;)
> 1. What if I’ve saturated all my cores in my localized Processor’s thread 
> pool with things like while (true){}? Then it really doesn’t matter what the 
> rest of the framework does, the system is hosed. So blockage in this case 
> comes from what we could call a malicious processor and not the global thread pool. 
> So, in the end it's a bit of a general discipline question ;)
> 2. So in this case one of the best practices could be taken right from 
> Brian’s book that states that tasks should be as short lived as possible. Any 
> repeats and retries should be handled by rerunning/rescheduling a task 
> instead of spinning in a loop inside the task. So with a global Scheduler 
> exposed via context or something that each Processor, Service etc. sees we 
> can have a shared Thread pool. We can even have ControllerService as 
> ThreadPools.
> Yes, that would take some serious code review and general discipline from the 
> developers but the benefit would be proportional as well.
>
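
The "reschedule instead of spin" practice from point 2 above can be sketched
roughly as follows. This is an illustration in Python using the standard
library's sched module as a stand-in for a shared scheduler; it is not a NiFi
API, and make_retrying_task, the attempt counter, and the delay value are all
assumptions made for the example. Each attempt is a short task that either
succeeds or asks the scheduler to run it again later, so no task ever loops
while holding a thread.

```python
import sched
import time


def make_retrying_task(scheduler, action, max_attempts, delay_seconds, log):
    """Build a short-lived task that reschedules itself on failure."""

    def attempt(n):
        ok = action(n)
        log.append((n, ok))
        if not ok and n < max_attempts:
            # Hand control back to the scheduler rather than looping here.
            scheduler.enter(delay_seconds, 1, attempt, argument=(n + 1,))

    return attempt


def run_demo():
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    log = []

    def action(n):
        # Hypothetical unit of work that only succeeds on the third attempt.
        return n >= 3

    task = make_retrying_task(
        scheduler, action, max_attempts=5, delay_seconds=0.01, log=log
    )
    scheduler.enter(0, 1, task, argument=(1,))
    scheduler.run()  # returns once no scheduled events remain
    return log
```

Between attempts the scheduler is free to run other work, which is what a
spinning loop inside the task would prevent.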
>> because the common case is "don't use threads" (not everyone is going to
>> build a complex service, contribute to the core framework or need threads
>> in their processor) I actually think code review is a good way to shake out
>> some poor decisions. Because optimizing the threads in a processor for a
>> use case is a specialized task (the processor writer knows the critical
>> sections and bottlenecks), I'm not sure whether there are massive strides
>> that can be made, but I could be wrong. And we'll always have a weird edge
>> case of some library that wants to do threads its own way that we're trying
>> to integrate.
>>
>> My guess is a lot of 

[GitHub] nifi pull request: NIFI-1123 Adds expression language support to D...

2015-11-16 Thread jskora
Github user jskora commented on the pull request:

https://github.com/apache/nifi/pull/116#issuecomment-157272302
  
@trkurc, this is not a problem fix but an enhancement based on discussion 
that occurred on [NIFI-641|https://issues.apache.org/jira/browse/NIFI-641].  It 
seemed easy enough, but the validation issues probably outweigh the value.

Postponing until [NIFI-813|https://issues.apache.org/jira/browse/NIFI-813] 
provides a means to handle potential errors, but if mixing Regex and Expression 
Language can only be handled with a Failure relationship, maybe it's better to 
not mix them and allow separate properties so the Regex can be validated 
and the EL expression can produce a list of attributes that can be ignored if 
they don't exist.   Thoughts?

 until they can be added without adding risk.




Re: Keep Files

2015-11-16 Thread Salvatore Papa
If you're on a Linux system, an alternative I've used in the past is to
create another directory, full of symlinks pointing to the original
directory.

As an example, assuming you have a directory: /data/input_files/ full of
files, create a directory /data/input_links/, and from that new directory,
do: "ln -s ../input_files/* ./"

Now in NiFi, use the original GetFile processor, configured with
/data/input_links/, and set Keep Source File to False. When the GetFile
processor picks up the file, it'll read the contents and create a flowfile
by following the symlink, delete the symlink, and the original file will
remain in /data/input_files.
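
The effect described above can be checked outside NiFi with a small Python
sketch. It assumes a POSIX system where symlinks are permitted, uses a
temporary directory instead of /data, and links with absolute targets rather
than the relative "../input_files/*" form. The helper names link_directory
and consume_links are made up for the example; consume_links only mimics the
read-then-delete behavior and is not how GetFile is implemented.

```python
import os
import tempfile


def link_directory(source_dir, link_dir):
    """Create one symlink in link_dir for every entry in source_dir."""
    os.makedirs(link_dir, exist_ok=True)
    for name in sorted(os.listdir(source_dir)):
        link = os.path.join(link_dir, name)
        if not os.path.lexists(link):
            os.symlink(os.path.join(source_dir, name), link)


def consume_links(link_dir):
    """Read each file through its symlink, then delete only the symlink."""
    contents = {}
    for name in sorted(os.listdir(link_dir)):
        link = os.path.join(link_dir, name)
        with open(link, "r") as handle:  # open() follows the symlink
            contents[name] = handle.read()
        os.remove(link)  # removes the link itself, not the file it points to
    return contents


def run_demo():
    with tempfile.TemporaryDirectory() as root:
        source = os.path.join(root, "input_files")
        links = os.path.join(root, "input_links")
        os.makedirs(source)
        with open(os.path.join(source, "a.txt"), "w") as handle:
            handle.write("alpha")
        link_directory(source, links)
        consumed = consume_links(links)
        return consumed, sorted(os.listdir(source)), os.listdir(links)
```

Note that files added to the source directory later still need new symlinks,
so the linking step has to be repeated for them.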

On Mon, Nov 16, 2015 at 12:00 PM, Adam Taft  wrote:

> Also, as a potential work-around, it's possible to use GetFile with
> "delete" mode and then somewhere in your flow, use PutFile to place the
> file back down into a "complete" directory.  i.e. something like:
>
> /path/incoming  <- use GetFile to pick up files here
> /path/complete  <- use PutFile to place files here after processing
>
> As a variation of the above, if you need the files consistently in the same
> directory, you could configure GetFile to only pick up certain file
> patterns.  In this way, you could rename a file after it has been
> processed:
>
> /path/incoming  <- use GetFile to pick up files named $filename.new
> /path/incoming  <- rename file (using UpdateAttribute) to
> $filename.complete and use PutFile to place files here after rename
>
> Hope that gives you some possible alternatives.
>
> Adam
>
>
>
> On Sat, Nov 14, 2015 at 10:49 AM, Mark Petronic 
> wrote:
>
> > Keep, yes. There is a parameter to configure that. Read once. No. But there
> > is a set of processors in the works to address that. ListFile and
> > FetchFile. ListFile will return the list of files that have changed since
> > the last time the files were read - it is stateful. FetchFile can then take
> > a list and fetch them, and I would assume it would have a parameter for
> > keep= like GetFile. Not sure of the status of the changes - have
> > not checked recently but see:
> > https://issues.apache.org/jira/browse/NIFI-631
> >
> > Mark
> >
> > On Fri, Nov 13, 2015 at 8:55 AM, plj  wrote:
> >
> > > Is there a way for GetFile to not delete a file but only read it once?  I
> > > have a directory with files in it.  I only want the new files that are
> > > added to the directory to be processed.  It seems that if I set GetFile
> > > to not delete the files, the same files get read over and over.
> > >
> > >
> > > thoughts?
> > >
> > >
> > >
> > > --
> > > View this message in context:
> > >
> >
> http://apache-nifi-developer-list.39713.n7.nabble.com/Keep-Files-tp4864.html
> > > Sent from the Apache NiFi Developer List mailing list archive at
> > > Nabble.com.
> > >
> >
>
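
The stateful listing idea Mark describes in the quoted message (return only
files that are new or changed since the last listing) can be sketched as
below. This is a guess at the general technique in Python, not the ListFile
implementation tracked in NIFI-631; the class name StatefulLister and the
choice to key on file name and modification time are assumptions for
illustration, and the state here lives only in memory.

```python
import os
import tempfile


class StatefulLister:
    """Remember each file's modification time and report only changes."""

    def __init__(self, directory):
        self.directory = directory
        self._seen = {}  # file name -> last observed mtime in nanoseconds

    def list_new(self):
        fresh = []
        for name in sorted(os.listdir(self.directory)):
            path = os.path.join(self.directory, name)
            if not os.path.isfile(path):
                continue
            mtime = os.stat(path).st_mtime_ns
            if self._seen.get(name) != mtime:
                self._seen[name] = mtime
                fresh.append(name)
        return fresh


def run_demo():
    with tempfile.TemporaryDirectory() as root:
        lister = StatefulLister(root)
        with open(os.path.join(root, "a.txt"), "w") as handle:
            handle.write("alpha")
        first = lister.list_new()   # a.txt has not been seen before
        second = lister.list_new()  # nothing changed since the first listing
        with open(os.path.join(root, "b.txt"), "w") as handle:
            handle.write("beta")
        third = lister.list_new()   # only the newly added file
        return first, second, third
```

Because the files are never deleted or moved, this addresses the original
question of reading each file once while keeping it in place.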