Hello,

The UnpackContent processor is for dealing with archive formats (tar, zip, etc.).
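To illustrate the difference between an archive format and a compression format outside of NiFi, here is a minimal Python sketch. It is not how UnpackContent or CompressContent are implemented; the helper names `is_plain_tar` and `decompress_lines` and the sample data are made up for this example. It only shows that gzip bytes fail a plain tar header check, and that they yield ordinary text lines once decompressed.

```python
import gzip
import io
import tarfile


def is_plain_tar(data: bytes) -> bool:
    """Return True if `data` opens as an uncompressed tar archive (hypothetical helper)."""
    try:
        # Mode "r:" forces plain tar and disables transparent decompression,
        # so gzip bytes are rejected when the tar header is parsed.
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:"):
            return True
    except tarfile.ReadError:
        return False


def decompress_lines(data: bytes) -> list:
    """Gunzip `data` and split the resulting text into lines (hypothetical helper)."""
    return gzip.decompress(data).decode("utf-8").splitlines()


# Stand-in for a gzip-compressed text file; the content is invented.
sample = gzip.compress(b"line one\nline two\n")

if __name__ == "__main__":
    print("looks like a plain tar:", is_plain_tar(sample))
    print("gzip magic bytes present:", sample[:2] == b"\x1f\x8b")
    for line in decompress_lines(sample):
        print(line)
```

The failed tar check is the same kind of problem as an archive reader being handed a gzip stream: the bytes have to be decompressed before anything else can interpret them.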
If your file is in a compression format (as is the case with the part-00002.gz file), then you first need to run it through 'CompressContent' in 'decompress' mode. You can even run it through 'IdentifyMimeType' first and set up a flow to handle arbitrarily complicated layers of compression/archive structures.

So for this case:
- GetHDFS (or ListHDFS and FetchHDFS)
- CompressContent (in decompress mode)

Now you have your text-oriented file ready to be dealt with. If you want to deal with each line individually, you can use:
- SplitText (line split count of 1)

Thanks
Joe

On Tue, Aug 11, 2015 at 9:27 PM, 彭光裕 <[email protected]> wrote:
> hi,
>
> I have a compressed file obtained from the GetHDFS processor that I want
> to unpack using the UnpackContent processor. I have already set the
> UnpackContent processor's packaging format property to 'tar', but the
> error below always occurs.
>
> The error log is attached below (Unable to unpack StandardFlowFileRecord):
>
> 2015-08-11 07:10:52,291 ERROR [Timer-Driven Process Thread-4] o.a.n.processors.standard.UnpackContent UnpackContent[id=b90c65e1-b97f-3b4b-9e37-6223afa1ef03] Unable to unpack StandardFlowFileRecord[uuid=85b7d53b-3183-4c48-9160-b2e714b5eaa8,claim=1439248247840-1,offset=0,name=part-00002.gz,size=59212170] due to org.apache.nifi.processor.exception.ProcessException: IOException thrown from UnpackContent[id=b90c65e1-b97f-3b4b-9e37-6223afa1ef03]: java.io.IOException: Error detected parsing the header; routing to failure: org.apache.nifi.processor.exception.ProcessException: IOException thrown from UnpackContent[id=b90c65e1-b97f-3b4b-9e37-6223afa1ef03]: java.io.IOException: Error detected parsing the header
>
> My compressed file is named part-00002.gz, and you can access the file
> here: https://dl.dropboxusercontent.com/u/24808937/part-00002.gz
>
> Any advice would be welcome. Please help me solve this problem,
> thank you!
>
> Roland
