Thanks Joe,

The use case is that I'm receiving data without knowing which character set it is encoded in. The --mime-encoding option gives its best guess at the character set rather than the content type.
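This sketch runs the same character-set check against the data itself rather than against a path on disk. It rests on two assumptions. First, ExecuteStreamCommand passes the FlowFile content to the command's standard input. Second, the installed `file` treats `-` as "read standard input". The helper name `run_on_content` is hypothetical, and Python's `subprocess` only stands in for the processor here.

```python
import subprocess
from typing import Sequence


def run_on_content(content: bytes, cmd: Sequence[str]) -> str:
    """Run `cmd` with `content` on its stdin and return its trimmed stdout.

    Hypothetical helper: it stands in for a processor that streams the
    FlowFile content to the command, so no temporary copy on disk is needed.
    """
    result = subprocess.run(
        list(cmd),
        input=content,           # bytes written to the child's stdin
        stdout=subprocess.PIPE,  # capture what the command prints
        stderr=subprocess.PIPE,
        check=True,              # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.decode("utf-8", errors="replace").strip()


# Assumption: the installed `file` accepts "-" as "read standard input".
FILE_ENCODING_CMD = ["file", "-b", "--mime-encoding", "-"]
```

For example, `run_on_content(data, FILE_ENCODING_CMD)` returns whatever charset name `file` prints for `data`. In the flow, the equivalent would be to configure ExecuteStreamCommand with `file` as the command and `-` in place of `${filename}`. Assuming the processor's `;`-delimited Command Arguments property, that would look like `-b;--mime-encoding;-`. Check the processor documentation for your NiFi version before relying on this.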
The ListFile approach sounds interesting, but I wonder whether I even need it. I don't want to leave the files in place; I just want to run an external command on them as part of the data flow. Is there a way to run an external command against the physical file, for example /opt/nifi/somedir/12345.uuid? Would that path be available in an attribute somewhere? It seems wasteful to make an extra copy of the file just to run a read-only command on it and then delete it. If ListFile is still the right way to go, please let me know.

On Fri, Nov 20, 2015 at 6:45 AM, Joe Witt <[email protected]> wrote:
> For identifying the mime type you may have sufficient results with the existing processor 'IdentifyMimeType' which you can put into the flow.
>
> For better logic around identifying files to pull but first calling an external command to learn more about them the upcoming ListFile/FetchFile combo that comes from this JIRA [1] might give you better flexibility.
>
> [1] https://issues.apache.org/jira/browse/NIFI-631
>
> Thanks
> Joe
>
> On Fri, Nov 20, 2015 at 12:08 AM, Charlie Frasure <[email protected]> wrote:
> > Thanks everyone for the help. The trouble started a few processors earlier in an ExecuteStreamCommand on ${filename} with the result of "file not found". I had originally set my GetFile processor to not remove files, but recently changed that. Now it seems that my ExecuteStreamCommand may not be the best way to accomplish this.
> >
> > The command that gets executed is: file -b --mime-encoding ${filename}
> > in the working directory: ${absolute.path}
> >
> > Now that the file is no longer in the source directory when the processor fires, the command is broken. I could PutFile somewhere temporarily; is there a better way?
> >
> > On Thu, Nov 19, 2015 at 10:33 PM, Joe Witt <[email protected]> wrote:
> >>
> >> Charlie,
> >>
> >> The fact that this is confusing is something we agree should be more clear and we will improve.
> >> We're tackling it based on what is mentioned here [1].
> >>
> >> [1] https://cwiki.apache.org/confluence/display/NIFI/Interactive+Queue+Management
> >>
> >> Thanks
> >> Joe
> >>
> >> On Thu, Nov 19, 2015 at 10:30 PM, Corey Flowers <[email protected]> wrote:
> >> > These guys are right. The file to look in for the uuid is the nifi-app.log.
> >> > Also if you wanted to see what the processor itself was doing, you could right click on the processor, get its uuid and while it is running, run (assuming it is on Linux):
> >> >
> >> > tail -F nifi-app.log | grep uuid
> >> >
> >> > This will just scroll the logs for that specific processor and will show you what it is doing. It should also tell you specific file names and uuids of the failing files.
> >> >
> >> > Hope that helps! Have a great night and good luck!
> >> >
> >> > Sent from my iPhone
> >> >
> >> > On Nov 19, 2015, at 9:27 PM, Juan Sequeiros <[email protected]> wrote:
> >> >
> >> > You can also check the NiFi logs for a searchable id or for what the previous processor ID produced to help search provenance.
> >> >
> >> > On Nov 19, 2015 21:22, "Bryan Bende" <[email protected]> wrote:
> >> >>
> >> >> Charlie,
> >> >>
> >> >> The behavior you described usually means that the processor encountered an unexpected error which was thrown back to the framework which rolls back the processing of that flow file and leaves it in the queue, as opposed to an error it expected where it would usually route to a failure relationship.
> >> >>
> >> >> Is the id that you see in the bulletin a uuid?
> >> >>
> >> >> There should still be some provenance events for this FlowFile from the previous points in the flow. If it looks like the uuid of the FlowFile, that should be searchable from provenance using the search button on the right.
> >> >>
> >> >> Let us know if we can help more.
> >> >>
> >> >> -Bryan
> >> >>
> >> >> On Thu, Nov 19, 2015 at 9:10 PM, Charlie Frasure <[email protected]> wrote:
> >> >>>
> >> >>> I have a question on troubleshooting a flow. I've built a flow with no exception routing, just trying to process the expected values first. When a file exposes a problem with the logic in my flow, it queues up prior to the flow that is raising the bulletin.
> >> >>>
> >> >>> In the bulletin, I can see an id, but can't tell which file it is. Data provenance doesn't seem to help as it passed the flow on the last processor, but hasn't been logged (to my knowledge) on the next one.
> >> >>>
> >> >>> Is there a way to match the bulletin back to a file without creating a route for failed files?
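Corey's `tail -F nifi-app.log | grep uuid` follows the log live. For an after-the-fact search, the sketch below applies the same filter once. It also pulls out every uuid-shaped token on a matching line, which is one way to find the FlowFile uuid to paste into the provenance search Bryan describes. Several parts are illustrative only: the function names, the log location `logs/nifi-app.log`, and the assumption that ids in the log use the standard 8-4-4-4-12 hex form. None of these come from NiFi's documentation.

```python
import re
from typing import Iterable, Iterator, List

# Assumption: ids in nifi-app.log use the standard 8-4-4-4-12 hex uuid form.
UUID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)


def lines_mentioning(lines: Iterable[str], wanted_id: str) -> Iterator[str]:
    """Yield lines that contain `wanted_id`, ignoring case (a one-shot grep)."""
    needle = wanted_id.lower()
    for line in lines:
        if needle in line.lower():
            yield line.rstrip("\n")


def ids_in(line: str) -> List[str]:
    """Return every uuid-shaped token on a log line, in order of appearance."""
    return UUID_RE.findall(line)


def search_log(path: str, wanted_id: str) -> None:
    """Print each matching line with the uuids it mentions (hypothetical helper)."""
    with open(path, encoding="utf-8", errors="replace") as log:
        for line in lines_mentioning(log, wanted_id):
            print(line)
            print("    ids:", ", ".join(ids_in(line)))
```

A call such as `search_log("logs/nifi-app.log", "<id from the bulletin>")` would list every log line that mentions the bulletin's id, together with the other ids on that line. The path here is an assumption relative to the NiFi install directory.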
