Re: Keep Files
Oooh, neat idea Salvatore. +1 to creativity. Really interesting.

Adam

On Mon, Nov 16, 2015 at 6:25 AM, Salvatore Papa wrote:

> If you're on a linux system, a alternative i've used in the past is to
> create another directory, full of symlinks pointing to the original
> directory.
>
> As an example, assuming you have a directory: /data/input_files/ full of
> files, create a directory /data/input_links/, and from that new directory,
> do: "ln -s ../input_files/* ./"
>
> Now in NiFi, use the original GetFile processor, configured with
> /data/input_links/, and set Keep Source File to False. When the GetFile
> processor picks up the file, it'll read the contents and create a flowfile
> by following the symlink, delete the symlink, and the original file will
> remain in /data/input_files.
>
> On Mon, Nov 16, 2015 at 12:00 PM, Adam Taft wrote:
>
> > Also, as a potential work-around, it's possible to use GetFile with
> > "delete" mode and then somewhere in your flow, use PutFile to place the
> > file back down into a "complete" directory. i.e. something like:
> >
> > /path/incoming <- use GetFile to pick up files here
> > /path/complete <- use PutFile to place files here after processing
> >
> > As a variation of the above, if you need the files consistently in the same
> > directory, you could configure GetFile to only pick up certain file
> > patterns. In this way, you could rename a file after it has been
> > processed:
> >
> > /path/incoming <- use GetFile to pick up files named $filename.new
> > /path/incoming <- rename file (using UpdateAttribute) to
> > $filename.complete and use PutFile to place files here after rename
> >
> > Hope that gives you some possible alternatives.
> >
> > Adam
> >
> > On Sat, Nov 14, 2015 at 10:49 AM, Mark Petronic wrote:
> >
> > > Keep, yes, There is a parameter to configure that. Read once. No. But there
> > > is a set of processors in the works to address that. ListFile and
> > > FetchFile. ListFile will return the list of files that have changed since
> > > the last time the files were read - it is stateful. FetchFile can then take
> > > a list and fetch them, and I would assume it would have a parameter for
> > > keep= like GetFile. Not sure of the status of the changes - have
> > > not checked recently but see:
> > > https://issues.apache.org/jira/browse/NIFI-631
> > >
> > > Mark
> > >
> > > On Fri, Nov 13, 2015 at 8:55 AM, plj wrote:
> > >
> > > > Is there a way for GetFile to not delete a file but only read it once? I
> > > > have a directory with files in it. I only want the new files that are
> > > > added
> > > > to the to be processed. It seems that if I set GetFile to not delete the
> > > > files, the same files get read over and over.
> > > >
> > > > thoughts?
> > > >
> > > > --
> > > > View this message in context:
> > > > http://apache-nifi-developer-list.39713.n7.nabble.com/Keep-Files-tp4864.html
> > > > Sent from the Apache NiFi Developer List mailing list archive at
> > > > Nabble.com.
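
For anyone who would rather script the symlink staging than run "ln -s" by hand, here is a rough sketch of the same idea using only java.nio.file. The class and method names are hypothetical and nothing here is part of NiFi; it just mirrors the one-shot command from Salvatore's mail, assuming a filesystem that allows symlinks.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

/** Hypothetical helper (not a NiFi class) mirroring "ln -s ../input_files/* ./". */
final class SymlinkStaging {

    /**
     * Creates a symlink in linksDir for every regular file currently in sourceDir,
     * skipping names that already have a link. Returns the number of links created.
     */
    static int linkAll(Path sourceDir, Path linksDir) {
        int created = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(sourceDir)) {
            Files.createDirectories(linksDir);
            for (Path file : files) {
                Path link = linksDir.resolve(file.getFileName());
                boolean linkExists = Files.exists(link, LinkOption.NOFOLLOW_LINKS);
                if (Files.isRegularFile(file) && !linkExists) {
                    // The link points at the absolute path, so deleting the link
                    // never touches the original file.
                    Files.createSymbolicLink(link, file.toAbsolutePath());
                    created++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return created;
    }
}
```

Note that this does not remember which files were already consumed: running it again after GetFile has deleted a link would recreate that link, so it only fits the one-time staging described above.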
Re: Release wrangling: 1 week until our hopeful 0.4.0 release
Per NIFI-1165: a discussion is occurring on the ticket:
https://issues.apache.org/jira/browse/NIFI-1165

Overall, most of the issues are identified and pending a fix from Mark, Oleg and me. The issues were encountered on two different Windows 8 machines by me and on Windows 2012 R2 by Mark. My configuration is Maven 3.3.3 and Java 1.8.0_45 (on the machine I have in front of me).

Should have a patch resolving the issues in the next couple of days.

Joe

- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com

On Monday, November 16, 2015 10:42 AM, Sean Busbey wrote:

re: NIFI-1165

I also have a windows 7 laptop I can test on. (though it is low power)

On Sun, Nov 15, 2015 at 10:48 AM, Aldrin Piri wrote:

> I have another set of eyes for NIFI-748. Will do so now.
>
> On Sun, Nov 15, 2015 at 10:39 AM, Tony Kurc wrote:
>
> > For those not watching commits@nifi
> > I need another set of eyes on the review for NIFI-748
> >
> > On Sun, Nov 15, 2015 at 8:33 AM, Joe Witt wrote:
> >
> > > NIFI-1082 (this should move to next release unless a resolution is
> > > imminent)
> > >
> > > NIFI-1108 (move to next release)
> > >
> > > NIFI-1139 (recommend moving to 0.5.0)
> > >
> > > NIFI-1164 (this should get fixed now - it makes builds unreliable)
> > >
> > > NIFI-1165 (should tackle now. have a windows laptop i can build on)
> > >
> > > Thanks for pushing tony.
> > >
> > > On Sun, Nov 15, 2015 at 8:18 AM, Tony Kurc wrote:
> > > > Update:
> > > >
> > > > Presumably fixed by NIFI-1086 (Joe Percivall). Reviewed, awaiting revision
> > > > NIFI-61
> > > > NIFI-812
> > > > NIFI-980
> > > > NIFI-1009
> > > > NIFI-1086
> > > > NIFI-1133
> > > >
> > > > Multiple Auths (Matt Gilman) no patch yet, making progress
> > > > NIFI-655
> > > >
> > > > Provenance Search Improvement (Oleg Zhurakousky) PR in, being reviewed by
> > > > Tony Kurc
> > > > NIFI-748
> > > >
> > > > Create a Getting Started Guide (Mark Payne) Review complete, being merged
> > > > in by Tony Kurc
> > > > NIFI-973
> > > >
> > > > Line ending fix (Tony Kurc) Finger hovering over "go" button
> > > > NIFI-1054
> > > >
> > > > ExecuteStreamCommand (Joe Percivall). Reviewed, awaiting revision
> > > > NIFI-1081
> > > >
> > > > Provenance repository search (Mark Payne) original patch reverted, new
> > > > patch in development? Move to 0.5.0?
> > > > NIFI-1082
> > > >
> > > > Scrub code looking for @InputRequirement consistency (Mark Payne) - not
> > > > sure how to attack this one
> > > > NIFI-1108
> > > >
> > > > * NEW *
> > > > LogAttribute processor fix (Oleg Zhurakousky) - trivial fix? but breaking
> > > > change? I recommend moving to 0.5.0
> > > > NIFI-1139
> > > >
> > > > * NEW *
> > > > Race condition Fix (Oleg Zhurakousky) - assigned but no patch. Move to
> > > > 0.5.0 or 0.4.1?
> > > > NIFI-1164
> > > >
> > > > * NEW *
> > > > Build on windows failing (Joe Percivall) - assigned but no patch. I can dig
> > > > in. (I submitted a requst for MSDN, but can take 8 weeks)
> > > > NIFI-1165
> > > >
> > > > On Fri, Nov 13, 2015 at 10:22 AM, Aldrin Piri wrote:
> > > >
> > > >> Scanned through and removed the 0.4.0 tagging for State Management.
> > > >>
> > > >> Thanks for the suggestion.
> > > >>
> > > >> On Fri, Nov 13, 2015 at 10:10 AM, Sean Busbey wrote:
> > > >>
> > > >> > Has anyone had a chance to do a pass through Feature Proposals to move out
> > > >> > any that aren't going to make 0.4.0?
> > > >> >
> > > >> > https://cwiki.apache.org/confluence/display/NIFI/NiFi+Feature+Proposals
> > > >> >
> > > >> > On Thu, Nov 12, 2015 at 8:13 AM, Tony Kurc wrote:
> > > >> >
> > > >> > > https://issues.apache.org/jira/browse/NIFI-61 - awaiting an answer before
> > > >> > > patch can be completed
> > > >> > > https://issues.apache.org/jira/browse/NIFI-655 - Based on feature branch
> > > >> > > activity, is close?
> > > >> > > https://issues.apache.org/jira/browse/NIFI-696 - awaiting a patch marking
> > > >> > > method as deprecated (assigned to me, but if someone else wants to take it
> > > >> > > and I review, thats cool too)
> > > >> > > https://issues.apache.org/jira/browse/NIFI-812 - a bit confused about this
> > > >> > > one. patch in NIFI-1086 will close this?
> > > >> > > https://issues.apache.org/jira/browse/NIFI-973 - awaiting review?
> > > >> > > https://issues.apache.org/jira/browse/NIFI-980 - (see 812 confusion)
> > > >> > > presumably closed when NIFI-1086 is closed
> > > >> > > https://issues.apache.org/jira/browse/NIFI-1009 (same!)
> > > >> > > https://issues.apache.org/jira/browse/NIFI-1054 I'm at the ready to submit
> > > >> > > a patch at the 11th hour.
[GitHub] nifi pull request: NIFI-1146 Allow GetKafka to be configured with ...
Github user naveenmadhire commented on the pull request: https://github.com/apache/nifi/pull/125#issuecomment-157086151 Closing the pull request. As I've messed up with the commits. I will open a new one soon. Sorry for the trouble. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] nifi pull request: NIFI-1146 Allow GetKafka to be configured with ...
Github user naveenmadhire closed the pull request at: https://github.com/apache/nifi/pull/125
[GitHub] nifi pull request: NIFI-748 Fixed logic around handling partial qu...
Github user olegz commented on the pull request: https://github.com/apache/nifi/pull/123#issuecomment-157029989

@trkurc @joewitt @apiri Guys, please see the latest commit. Didn't squash it, so it's easier to read and see what's been addressed. In summary:

1. Since, based on the latest comment from Joe, it appears that we all agree that DocReader is not really public, I kept the dead constructor out and also made DocReader package private.
2. Based on Tony's point, added the Document sorting logic back. At least it will ensure that previous behavior is maintained.

See commit message for more details
[GitHub] nifi pull request: NIFI-1164 decreased the chances of race conditi...
GitHub user olegz opened a pull request: https://github.com/apache/nifi/pull/126

NIFI-1164 decreased the chances of race condition

Removed checks for 'if (getState() != ControllerServiceState.DISABLED)' from StandardControllerServiceNode.verifyCanEnable(..) operations based on the discussion that we had in NIFI-1143 where 'enablable' service is the one that is not ENABLED or ENABLING. On top of that the actual state check is redundant since it is going to be checked again when isValid() is invoked.

Cleaned up the code in StandardControllerServiceProvider.enableReferencingServices(..) since:
1. It had the same check ordering issue on service state between ENABLING and ENABLED as was described in NIFI-1143.
2. Removed redundant recursiveReferences computation
3. There was two loops iterating over the same collection, so merged that into one
4. Removed redundant state check in the loop since it would be checked again as part of 'verifyCanEnable'

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/olegz/nifi NIFI-1164

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/126.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #126

commit 54c5c0397c6d45d34c7c75e5fe44984dcb765ea4
Author: Oleg Zhurakousky
Date: 2015-11-16T18:18:43Z

NIFI-1164 decreased the chances of race condition

Removed checks for 'if (getState() != ControllerServiceState.DISABLED)' from StandardControllerServiceNode.verifyCanEnable(..) operations based on the discussion that we had in NIFI-1143 where 'enablable' service is the one that is not ENABLED or ENABLING. On top of that the actual state check is redundant since it is going to be checked again when isValid() is invoked.

Cleaned up the code in StandardControllerServiceProvider.enableReferencingServices(..) since:
1. It had the same check ordering issue on service state between ENABLING and ENABLED as was described in NIFI-1143.
2. Removed redundant recursiveReferences computation
3. There was two loops iterating over the same collection, so merged that into one
4. Removed redundant state check in the loop since it would be checked again as part of 'verifyCanEnable'
[GitHub] nifi pull request: NIFI-1146 Allow GetKafka to be configured with ...
Github user markap14 commented on the pull request: https://github.com/apache/nifi/pull/125#issuecomment-157140757

@naveenmadhire no trouble at all :) Looking forward to the new pull request.
[GitHub] nifi pull request: NIFI-1107 - Create new PutS3ObjectMultipart pro...
Github user jskora commented on a diff in the pull request: https://github.com/apache/nifi/pull/121#discussion_r44967382

--- Diff: nifi-nar-bundles/nifi-aws-bundle/nifi-aws-processors/src/main/java/org/apache/nifi/processors/aws/s3/PutS3ObjectMultipart.java ---
@@ -0,0 +1,550 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.nifi.processors.aws.s3;
+
+import com.amazonaws.AmazonClientException;
+import com.amazonaws.services.s3.AmazonS3;
+import com.amazonaws.services.s3.AmazonS3Client;
+import com.amazonaws.services.s3.model.AccessControlList;
+import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
+import com.amazonaws.services.s3.model.CompleteMultipartUploadResult;
+import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
+import com.amazonaws.services.s3.model.InitiateMultipartUploadResult;
+import com.amazonaws.services.s3.model.ObjectMetadata;
+import com.amazonaws.services.s3.model.PartETag;
+import com.amazonaws.services.s3.model.StorageClass;
+import com.amazonaws.services.s3.model.UploadPartRequest;
+import com.amazonaws.services.s3.model.UploadPartResult;
+import org.apache.nifi.annotation.behavior.DynamicProperty;
+import org.apache.nifi.annotation.behavior.InputRequirement;
+import org.apache.nifi.annotation.behavior.InputRequirement.Requirement;
+import org.apache.nifi.annotation.behavior.ReadsAttribute;
+import org.apache.nifi.annotation.behavior.WritesAttribute;
+import org.apache.nifi.annotation.behavior.WritesAttributes;
+import org.apache.nifi.annotation.documentation.CapabilityDescription;
+import org.apache.nifi.annotation.documentation.SeeAlso;
+import org.apache.nifi.annotation.documentation.Tags;
+import org.apache.nifi.components.PropertyDescriptor;
+import org.apache.nifi.flowfile.FlowFile;
+import org.apache.nifi.flowfile.attributes.CoreAttributes;
+import org.apache.nifi.processor.DataUnit;
+import org.apache.nifi.processor.ProcessContext;
+import org.apache.nifi.processor.ProcessSession;
+import org.apache.nifi.processor.exception.ProcessException;
+import org.apache.nifi.processor.io.InputStreamCallback;
+import org.apache.nifi.processor.util.StandardValidators;
+import org.apache.nifi.stream.io.BufferedInputStream;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.concurrent.TimeUnit;
+
+@SeeAlso({FetchS3Object.class, PutS3Object.class, DeleteS3Object.class})
+@InputRequirement(Requirement.INPUT_REQUIRED)
+@Tags({"Amazon", "S3", "AWS", "Archive", "Put", "Multi", "Multipart", "Upload"})
+@CapabilityDescription("Puts FlowFiles to an Amazon S3 Bucket using the MultipartUpload API method. " +
+        "This upload consists of three steps 1) initiate upload, 2) upload the parts, and 3) complete the upload.\n" +
+        "Since the intent for this processor involves large files, the processor saves state locally after each step " +
+        "so that an upload can be resumed without having to restart from the beginning of the file.\n" +
+        "The AWS libraries default to using standard AWS regions but the 'Endpoint Override URL' allows this to be " +
+        "overridden.")
+@DynamicProperty(name = "The name of a User-Defined Metadata field to add to the S3 Object",
+        value = "The value of a User-Defined Metadata field to add to the S3 Object",
+        description = "Allows user-defined metadata to be added to the S3 object as key/value pairs",
+        supportsExpressionLanguage = true)
+@ReadsAttribute(attribute = "filename", description = "Uses the FlowFile's filename as the filename for
[GitHub] nifi pull request: NIFI-1146 Allow GetKafka to be configured with ...
GitHub user naveenmadhire opened a pull request: https://github.com/apache/nifi/pull/127

NIFI-1146 Allow GetKafka to be configured with auto.offset.reset to "largest" or "smallest"

Pull request with changes.

@markap14 I removed the writeAttributes ones, since there is no need to write the auto.offset to the flowfile attribute. I also modified the description. Please check to see if the description is fine,

Screenshots after the build,

![image](https://cloud.githubusercontent.com/assets/8851548/11192575/7f28294a-8c67-11e5-9800-7c85af4ce038.png)
![image](https://cloud.githubusercontent.com/assets/8851548/11192584/87e334c6-8c67-11e5-8ec6-ada91bf6f834.png)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/naveenmadhire/nifi NIFI-1146

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nifi/pull/127.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #127

commit b954ca620e619e7961e6a7a58122b844f30862da
Author: Naveen Madhire
Date: 2015-11-16T17:59:52Z

NIFI-1146 Allow GetKafka to be configured with auto.offset.reset to largest or smallest

commit 03a54bf2d593e07ab602f6a9425d0231a273ba5a
Author: Naveen Madhire
Date: 2015-11-16T19:32:17Z

Changes after review
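
For readers unfamiliar with the setting named in the pull request title, the sketch below shows roughly what passing auto.offset.reset through to a consumer configuration could look like. It is not the code from the pull request; the class and method names are made up for illustration, and only the property key and its two values ("largest", "smallest") come from the title above.

```java
import java.util.Properties;

/** Hypothetical illustration only; this is not the GetKafka implementation. */
final class OffsetResetConfig {

    /**
     * Builds consumer properties carrying the auto.offset.reset setting,
     * rejecting anything other than the two values named in the PR title.
     */
    static Properties consumerProperties(String autoOffsetReset) {
        boolean allowed = "largest".equals(autoOffsetReset) || "smallest".equals(autoOffsetReset);
        if (!allowed) {
            throw new IllegalArgumentException(
                    "auto.offset.reset must be \"largest\" or \"smallest\", got: " + autoOffsetReset);
        }
        Properties props = new Properties();
        props.setProperty("auto.offset.reset", autoOffsetReset);
        return props;
    }
}
```

Validating the value up front keeps a typo in the processor configuration from silently reaching the consumer.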
[GitHub] nifi pull request: NIFI-1146 Allow GetKafka to be configured with ...
Github user asfgit closed the pull request at: https://github.com/apache/nifi/pull/127
[GitHub] nifi pull request: NIFI-1146 Allow GetKafka to be configured with ...
Github user markap14 commented on the pull request: https://github.com/apache/nifi/pull/127#issuecomment-157179517

@naveenmadhire - code looks good. Builds without problem, and testing on my Kafka instance shows the expected results. Nice work! And thanks for the contribution. On behalf of the NiFi community, let me welcome you as our newest contributor!
[GitHub] nifi pull request: NIFI-748 Fixed logic around handling partial qu...
Github user markap14 commented on the pull request: https://github.com/apache/nifi/pull/123#issuecomment-157182575

@trkurc Personally, I have exactly 0 qualms about changing it to package private. If I choose to take some random util class from a release of Apache Tomcat, for example, and depended on it, I should certainly not be surprised if that class changes from release to release (including incremental releases) - I don't think this is any different.
[GitHub] nifi pull request: NIFI-1164 decreased the chances of race conditi...
Github user olegz closed the pull request at: https://github.com/apache/nifi/pull/126
[GitHub] nifi pull request: NIFI-1164 decreased the chances of race conditi...
Github user olegz commented on the pull request: https://github.com/apache/nifi/pull/126#issuecomment-157173562

Pulling back, see comments in JIRA
Re: EvaluateJsonPath error: Unable to return a scalar value for the expression
The documentation is a little unclear and light. Made a ticket [1] to clarify how these properties are interpreted.

Thanks!

[1] https://issues.apache.org/jira/browse/NIFI-1177

On Mon, Nov 16, 2015 at 3:52 PM, Sumanth Chinthagunta wrote:

> Thanks Aldrin.
> it works after I changed Return Type to JSON.
>
> > On Nov 16, 2015, at 12:47 PM, Aldrin Piri wrote:
> >
> > Sumo,
> >
> > The scalar option has the processor looking for the resultant value of the
> > expression to provide a non-Map/List representation of the targeted
> > expression. In this case, if you change the property to json, it should
> > work as anticipated. The property itself is more of a validation of the
> > data that is being extracted (in that it is an object/array or a simple
> > value).
> >
> > On Mon, Nov 16, 2015 at 3:20 PM, Sumanth Chinthagunta wrote:
> >
> >> I am trying to extract data into attribute using EvaluateJsonPath. when
> >> what JsonPath return complex type, I am getting error: Unable to return a
> >> scalar value for the expression $['data'] for FlowFile 152. Evaluated value
> >> was {id=1…..}. Transferring to failure
> >>
> >> data - $.data <— Error
> >> id - $.data.id <— works
> >> {
> >>    "database": "test",
> >>    "table": "guests",
> >>    "type": "insert",
> >>    "ts": 1446422524,
> >>    "xid": 1800,
> >>    "commit": true,
> >>    "data": {
> >>        "reg_date": "2015-11-02 00:02:04",
> >>        "firstname": "sumo",
> >>        "id": 1,
> >>        "lastname": "demo"
> >>    }
> >> }
> >>
> >> if it possible to extract JSON object from FlowFile using EvaluateJsonPath?
> >> if not please advice what options I have.
> >>
> >> Thanks
> >> Sumo
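
To make the scalar-versus-json distinction from Aldrin's explanation a bit more concrete, here is a small sketch of the kind of check he describes: a value counts as scalar only if it is not a Map or a List. This is an illustration written for this thread, not the processor's actual source, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the "non-Map/List" check described above. */
final class ReturnTypeCheck {

    /** A value is treated as scalar when it is neither an object (Map) nor an array (List). */
    static boolean isScalar(Object evaluated) {
        return !(evaluated instanceof Map) && !(evaluated instanceof List);
    }

    public static void main(String[] args) {
        // Stand-in for the "data" object from the JSON in the original question.
        Map<String, Object> data = new LinkedHashMap<>();
        data.put("id", 1);
        data.put("firstname", "sumo");

        System.out.println("$.data scalar?    " + isScalar(data));           // an object, so not scalar
        System.out.println("$.data.id scalar? " + isScalar(data.get("id"))); // a simple value
    }
}
```

Under this reading, $.data fails with a scalar return type because it evaluates to an object, while $.data.id passes; switching the return type to json is what lets the object through.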
[GitHub] nifi pull request: NIFI-1123 Adds expression language support to D...
Github user trkurc commented on the pull request: https://github.com/apache/nifi/pull/116#issuecomment-157213254

@jskora what would you expect the following unit tests to do?

```
@Test
public void testInvalidRegex() {
    final TestRunner runner = TestRunners.newTestRunner(new UpdateAttribute());
    // The property value itself is not a valid regex.
    runner.setProperty("Delete Attributes Expression", "(");

    final Map<String, String> attributes = new HashMap<>();
    attributes.put("attribute.1", "value.1");
    runner.enqueue(new byte[0], attributes);

    runner.run();
}

@Test
public void testInvalidRegexInAttribute() {
    final TestRunner runner = TestRunners.newTestRunner(new UpdateAttribute());
    // The expression is valid, but it evaluates to an invalid regex at runtime.
    runner.setProperty("Delete Attributes Expression", "${butter}");

    final Map<String, String> attributes = new HashMap<>();
    attributes.put("butter", "(");
    runner.enqueue(new byte[0], attributes);

    runner.run();
}
```
[GitHub] nifi pull request: NIFI-1123 Adds expression language support to D...
Github user trkurc commented on the pull request: https://github.com/apache/nifi/pull/116#issuecomment-157218572

I think what I'm getting at is that I think having a property that requires the expression to evaluate to a valid regex might mean it is time for a failure relationship for this processor (a breaking change?)
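
As a rough picture of what a failure path could mean here, the sketch below compiles the evaluated value at runtime and reports "failure" when it is not a valid regex, instead of letting the exception escape. Everything in it is an assumption made for illustration: the class, the "success"/"failure" labels, and the choice to match whole attribute names are not taken from UpdateAttribute.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical sketch; not the UpdateAttribute implementation. */
final class DeleteAttributesSketch {

    /**
     * Removes attributes whose names fully match the evaluated regex.
     * Returns "failure" if the evaluated value is not a valid regex, else "success".
     */
    static String route(String evaluatedExpression, Map<String, String> attributes) {
        final Pattern pattern;
        try {
            pattern = Pattern.compile(evaluatedExpression);
        } catch (PatternSyntaxException e) {
            // Only detectable at runtime when the regex comes from expression language.
            return "failure";
        }
        attributes.keySet().removeIf(name -> pattern.matcher(name).matches());
        return "success";
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new HashMap<>();
        attributes.put("attribute.1", "value.1");
        System.out.println(route("(", attributes));
        System.out.println(route("attribute\\..*", attributes));
    }
}
```

Both of the test cases quoted earlier in this pull request would end up on the failure path in a design like this, since "(" cannot be compiled regardless of whether it was typed directly or came from an attribute.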
Re: Common scheduler and add-hock thread creation
Taking liberties - so let me throw out a few examples. I am sure you’d agree that Thread creation and management is expensive, hard and error prone, hence the new java.util.concurrent and all the goodies in it.

- There is a patch currently in the queue where there is a creation of new Thread() and then starting it. Is it necessary? Could we reuse the thread from the common pool?
- We have many places where we have Thread.sleep(..) and in fact do sleep a considerable amount of time. That thread lies dormant where it could actually be doing something. Is it necessary?

Cheers
Oleg

> On Nov 16, 2015, at 7:52 PM, Tony Kurc wrote:
>
> the issue with a best practices guide on this subject is it will be
> dominated by edge cases. The common case should be "don't produce any
> threads".
>
> That being said, I commented on a jira somewhere about LinkedBlockingQueues
> used in so many producer/consumer style processors and possibly needing a
> library to have some consistency in using those queues in a consistent
> thread safe manner.
>
> Also, I'm not quite sure of what you mean by taking liberties?
>
> On Mon, Nov 16, 2015 at 7:39 PM, Oleg Zhurakousky <
> ozhurakou...@hortonworks.com> wrote:
>
>> Guys
>>
>> I am noticing many modules where we have things like "new
>> Thread(..).start()”, creation of new executors and schedulers,
>> Thread.sleep(..) etc.,. I am sure many would agree that taking such
>> liberties with Threads will have consequences (not IF but WHEN)
>> On several threads several of us mentioned a “must read” for anyone who is
>> getting into concurrent code -
>> http://ptgmedia.pearsoncmg.com/images/9780321349606/samplepages/9780321349606.pdf
>> and indeed we can/should definitely grab some best practices from this book.
>>
>> At least we can start from what’s our strategy around thread management
>> for NAR developers? Basically should/should not a user create Threads,
>> Executors, Schedulers etc.
>>
>> Cheers
>> Oleg
>>
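
The two bullets above can be illustrated with plain java.util.concurrent: instead of a dedicated thread that sleeps between attempts, a short task is handed to a shared ScheduledExecutorService and re-submitted with a delay if it needs to run again. This is only a sketch under that assumption; the class and method names are invented here and nothing like this is claimed to exist in NiFi.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: retry by rescheduling a short task rather than sleeping in a loop. */
final class RescheduledRetry {

    /**
     * Runs attempt on the shared scheduler until it returns true or maxAttempts is reached.
     * The returned future completes with the number of the successful attempt.
     */
    static CompletableFuture<Integer> retry(ScheduledExecutorService scheduler,
                                            BooleanSupplier attempt,
                                            int maxAttempts,
                                            long delayMillis) {
        CompletableFuture<Integer> result = new CompletableFuture<>();
        schedule(scheduler, attempt, 1, maxAttempts, delayMillis, 0L, result);
        return result;
    }

    private static void schedule(ScheduledExecutorService scheduler, BooleanSupplier attempt,
                                 int attemptNo, int maxAttempts, long delayMillis,
                                 long initialDelay, CompletableFuture<Integer> result) {
        scheduler.schedule(() -> {
            try {
                if (attempt.getAsBoolean()) {
                    result.complete(attemptNo);
                } else if (attemptNo >= maxAttempts) {
                    result.completeExceptionally(
                            new IllegalStateException("gave up after " + attemptNo + " attempts"));
                } else {
                    // No Thread.sleep: the pool thread is released until the next attempt is due.
                    schedule(scheduler, attempt, attemptNo + 1, maxAttempts,
                            delayMillis, delayMillis, result);
                }
            } catch (RuntimeException e) {
                result.completeExceptionally(e);
            }
        }, initialDelay, TimeUnit.MILLISECONDS);
    }
}
```

Between attempts the scheduler thread is free to run other tasks, which is the difference from a thread parked in Thread.sleep(..). Whether such a scheduler should be shared framework-wide or owned by each processor is exactly the open question in this thread.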
[GitHub] nifi pull request: NIFI-1123 Adds expression language support to D...
Github user jskora commented on the pull request: https://github.com/apache/nifi/pull/116#issuecomment-157265254

Interesting. Before I enabled EL support, it had a REGULAR_EXPRESSION_VALIDATOR which would have caught those. Is there a way to validate a property that supports expression language to verify that the output will be an attribute key?
[GitHub] nifi pull request: NIFI-1123 Adds expression language support to D...
Github user apiri commented on the pull request: https://github.com/apache/nifi/pull/116#issuecomment-157265218

@trkurc I am onboard with what you are driving toward. I had created an issue around the same thought process. https://issues.apache.org/jira/browse/NIFI-813 The core issue is that EL, when introduced, provided ways for processors to fail in ways previously unanticipated.
[GitHub] nifi pull request: NIFI-1123 Adds expression language support to D...
Github user trkurc commented on the pull request: https://github.com/apache/nifi/pull/116#issuecomment-157266419

@jskora as @apiri pointed out, i think that we may have to live with the problem until NIFI-813 is resolved. I tried wrapping my head around a validator that handled the first test case, and that hurt. The second one didn't hurt, because a validator is too early to catch.
[GitHub] nifi pull request: NIFI-1123 Adds expression language support to D...
Github user trkurc commented on the pull request: https://github.com/apache/nifi/pull/116#issuecomment-157266895

@apiri you have some history with this - what is a greater evil, no el support or possibly breaking? this is already a processor that appears to have a "beware of el" sign up already?
Re: Common scheduler and add-hock thread creation
So back in the day... Here is the thought process behind how it works today at a high level and taking some generalities. Developers of extensions, and that primarily means processors, begin process sessions. In a process session a processor can access, create, destroy zero or more flow files and route them to relationships. They do not dictate how often they run or when they run. The Flow Controller does that. When it decides to invoke them it does so by calling the appropriate method. The thread given in that call is the thread they can use to operate on that process session. When they're done with that session be a good behaved entity and give the thread back to the controller. That is it. They have no control over threads because generally they don't need them. Now, some processors are special and they may be written by a developer that needs greater control of their own threading model, like web servers for instance. That is ok but it is also outside of what is described above. It is really 'in addition to' what is described above. The framework supported path for dealing with FlowFiles (which is what NiFi is for) is only as above. It is 'ok' for these special cases but so far nothing practical has risen to the level of it needing a framework resolution. There have been glimmers but nothing that has really shown to need a resolution as far as threading goes. We've considered having different managed thread pools and then operators could assign a given component on the flow to those pools. This way they can preserve a pool for 'sources' vs 'mid-stream' vs 'delivery' processors for example. Again, this never reached the level of needing a framework solution. There have also been cases where folks want to have processors operate and they do not do *anything* with FlowFiles at all. These are for what is known as the 'NiFi-As-A-Fancy-Cron' tool pattern. We don't need to support this one. 
Now I can definitely conceive of ways to build processors or flows which will create difficulty in NiFi. I am ok with that personally.

Thanks
Joe

On Mon, Nov 16, 2015 at 8:50 PM, Oleg Zhurakousky wrote:
> Tony, thanks for your input. At least we have some discussion going. See
> inline for the rest.
>
>> On Nov 16, 2015, at 8:22 PM, Tony Kurc wrote:
>>
>> so, I believe threads in a processor in nifi are much, much easier than
>> general threading in many other applications. There are defined boundaries
>> on when a processor is built and torn down. Pretty much any state in the
>> middle is up to the processor. you know when resources need to be stood up.
>> you know when they need to be torn down.
> Generally true and I’d agree there is not much one can do to stop users doing
> what they want to do regardless of how damaging it may be to the rest of the
> system
>>
>> Because threads have a localized scope, I'm not sure a global pool would be
>> a help. If a processor needs higher throughput or shorter latency, now, the
>> problem is generally isolated and there is a nice little cream center to
>> optimize. If you're blocked on a global pool of threads because some other
>> processor consumed all the threads in a pool, well, suddenly, your
>> performance is no longer a localized problem.
>>
> This argument is argumentative ;)
> 1. What if I’ve saturated all my cores in my localized Processor’s thread
> pool with things like while (true){}? Then it really doesn’t matter what the
> rest of the framework does, the system is hosed. So blockage in this case
> comes from let’s just call it malicious processor and not global thread pool.
> So, in the end it’s a bit of a general discipline question ;)
> 2. So in this case one of the best practices could be taken right from
> Brian’s book that states that tasks should be as short lived as possible. Any
> repeats and retries should be handled by rerunning/rescheduling a task
> instead of spinning in the loop inside of task. So with global Scheduler
> exposed via context or something that each Processor, Service etc. sees we
> can have a shared Thread pool. We can even have ControllerService as
> ThreadPools.
> Yes, that would take some serious code review and general discipline from the
> developers but the benefit would be proportional as well.
>
>> because the common case is "don't use threads" (not everyone is going to
>> build a complex service, contribute to the core framework or need threads
>> in their processor) I actually think code review is a good way to shake out
>> some poor decisions. because optimizing the threads in a processor for a
>> use case a specialized task (the processor writer knows the critical
>> sections and bottlenecks), I'm not sure whether there are massive strides
>> that can be made, but I could be wrong. And we'll always have a weird edge
>> case of some library that wants to do threads its own way that we're trying
>> to integrate.
>>
>> My guess is a lot of
[GitHub] nifi pull request: NIFI-1123 Adds expression language support to D...
Github user jskora commented on the pull request: https://github.com/apache/nifi/pull/116#issuecomment-157272302 @trkurc, this is not a problem fix but an enhancement based on discussion that occurred on [NIFI-641|https://issues.apache.org/jira/browse/NIFI-641]. It seemed easy enough, but the validation issues probably outweigh the value. Postponing until [NIFI-813|https://issues.apache.org/jira/browse/NIFI-813] provides a means to handle potential errors, but if mixing Regex and Expression Language can only be handled with a Failure relationship, maybe it's better not to mix them and to allow separate properties, so the Regex can be validated and the EL expression can produce a list of attributes that can be ignored if they don't exist. Thoughts? until they can be added without adding risk.
Re: Keep Files
If you're on a Linux system, an alternative I've used in the past is to create another directory full of symlinks pointing to the files in the original directory.

As an example, assuming you have a directory /data/input_files/ full of files, create a directory /data/input_links/, and from that new directory, do: "ln -s ../input_files/* ./"

Now in NiFi, use the original GetFile processor, configured with /data/input_links/, and set Keep Source File to False. When the GetFile processor picks up a file, it'll read the contents and create a flowfile by following the symlink, delete the symlink, and the original file will remain in /data/input_files.

On Mon, Nov 16, 2015 at 12:00 PM, Adam Taft wrote:
> Also, as a potential work-around, it's possible to use GetFile with
> "delete" mode and then somewhere in your flow, use PutFile to place the
> file back down into a "complete" directory. i.e. something like:
>
> /path/incoming <- use GetFile to pick up files here
> /path/complete <- use PutFile to place files here after processing
>
> As a variation of the above, if you need the files consistently in the same
> directory, you could configure GetFile to only pick up certain file
> patterns. In this way, you could rename a file after it has been
> processed:
>
> /path/incoming <- use GetFile to pick up files named $filename.new
> /path/incoming <- rename file (using UpdateAttribute) to
> $filename.complete and use PutFile to place files here after rename
>
> Hope that gives you some possible alternatives.
>
> Adam
>
> On Sat, Nov 14, 2015 at 10:49 AM, Mark Petronic wrote:
>
> > Keep, yes, There is a parameter to configure that. Read once. No. But there
> > is a set of processors in the works to address that. ListFile and
> > FetchFile. ListFile will return the list of files that have changed since
> > the last time the files were read - it is stateful. FetchFile can then take
> > a list and fetch them, and I would assume it would have a parameter for
> > keep= like GetFile. Not sure of the status of the changes - have
> > not checked recently but see:
> > https://issues.apache.org/jira/browse/NIFI-631
> >
> > Mark
> >
> > On Fri, Nov 13, 2015 at 8:55 AM, plj wrote:
> >
> > > Is there a way for GetFile to not delete a file but only read it once? I
> > > have a directory with files in it. I only want the new files that are
> > > added to the to be processed. It seems that if I set GetFile to not
> > > delete the files, the same files get read over and over.
> > >
> > > thoughts?
> > >
> > > --
> > > View this message in context:
> > > http://apache-nifi-developer-list.39713.n7.nabble.com/Keep-Files-tp4864.html
> > > Sent from the Apache NiFi Developer List mailing list archive at
> > > Nabble.com.
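For anyone who wants to try the symlink workaround from the top of this message outside of NiFi first, here is a rough sketch in Python. It only simulates the behaviour described above (read through the link, then delete the link); it does not call GetFile or any NiFi API. The helper names `make_links` and `consume_links` are made up for illustration, and the example runs in a temporary directory instead of /data so it is safe to execute.

```python
import os
import tempfile
from pathlib import Path


def make_links(input_dir: Path, links_dir: Path) -> None:
    """Create one symlink in links_dir for each regular file in input_dir."""
    links_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(input_dir.iterdir()):
        if not src.is_file():
            continue
        link = links_dir / src.name
        if not link.is_symlink() and not link.exists():
            # Relative target, similar to `ln -s ../input_files/* ./`
            link.symlink_to(os.path.relpath(src, links_dir))


def consume_links(links_dir: Path) -> dict:
    """Simulate a delete-after-read pickup: read through each link, then remove it."""
    contents = {}
    for link in sorted(links_dir.iterdir()):
        contents[link.name] = link.read_bytes()  # follows the symlink
        link.unlink()  # removes only the link, never its target
    return contents


with tempfile.TemporaryDirectory() as root:
    input_dir = Path(root) / "input_files"
    links_dir = Path(root) / "input_links"
    input_dir.mkdir()
    (input_dir / "a.txt").write_text("alpha")
    (input_dir / "b.txt").write_text("beta")

    make_links(input_dir, links_dir)
    picked_up = consume_links(links_dir)

    originals_left = sorted(p.name for p in input_dir.iterdir())
    links_left = list(links_dir.iterdir())
```

Because `make_links` skips names that already have a link, it can be rerun safely, but it would also recreate links for files that were already consumed. Anything that reruns it periodically would need its own record of what has been linked before, which the sketch does not attempt.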