Jim,

I don't really recall the history of that specific processor, but what it can handle is just a function of what it was coded for and the libraries it uses. I'm sure supporting the older format required some other library. That said, I think we should consider removing that component in 2.x and instead favor the ExcelReader [1]. It has the same noted limitation, but I'm sure that can be addressed.
[1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-poi-nar/1.23.2/org.apache.nifi.excel.ExcelReader/index.html

Thanks

On Mon, Sep 25, 2023 at 4:02 AM James McMahon wrote:
> A couple of random questions about ConvertExcelToCSVProcessor:
>
> Why does this processor only handle the xlsx Excel file format? From the
> Description for ConvertExcelToCSVProcessor: "*This processor is
> currently only capable of processing .xlsx (XSSF 2007 OOXML file format)
> Excel documents and not older .xls (HSSF '97(-2007) file format) documents.*"
> I ask because it seems unfortunate to have to develop a separate, distinct
> flow path to handle the .xls files that this native processor cannot. Why
> was it that handling of xls Excel files was not baked into
> ConvertExcelToCSVProcessor too? Do later releases lift this limitation?
>
> What is it about this processor that required including the word Processor
> in its name? It seems redundant and inconsistent with the naming convention
> used for the majority of the other processors. I figure there was an
> interesting reason behind this, and so wanted to ask.
>
> I am using a slightly older version of NiFi. Does this limitation go away
> in later versions?
>
> On Mon, Sep 25, 2023 at 3:23 AM Chris Sampson wrote:
>
>> I completely missed the fact that this was an external Python conversion
>> script run through ExecuteStreamCommand, but as Matt says, that will be
>> catered for in the new NiFi versions.
>>
>> From a quick look, although I've not tested to confirm, it appears both
>> the existing ConvertExcelToCSVProcessor and CSVRecordSetWriter (which can
>> now be paired with the relatively new ExcelReader, e.g. in a ConvertRecord
>> processor) will set the result flowfile's mime.type attribute to
>> text/csv, which would allow the expected downstream content viewer
>> behaviour.
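Jim's .xls vs .xlsx question comes down to the two formats being different containers: legacy .xls (HSSF) workbooks are OLE2 compound documents, while .xlsx (XSSF, OOXML) workbooks are ZIP archives of XML parts, so each needs its own parsing library. Below is a minimal, hypothetical content sniffer (the function name and the CSV heuristic are assumptions, not NiFi code) that tells the two apart by their leading magic bytes and otherwise guesses text/csv or text/plain. As far as I know, IdentifyMimeType does this far more thoroughly via Apache Tika; this sketch only illustrates why content that has been converted to CSV should no longer carry the .xlsx mime.type.

```python
# Hypothetical, simplified content sniffer; NOT how NiFi's IdentifyMimeType works.

# Legacy .xls (HSSF) workbooks are OLE2 compound documents with this signature.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# .xlsx (XSSF, OOXML) workbooks are ZIP archives, which begin with "PK\x03\x04".
ZIP_MAGIC = b"PK\x03\x04"

XLS_MIME = "application/vnd.ms-excel"
XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"


def sniff_mime_type(content: bytes) -> str:
    """Guess a MIME type from the leading bytes of some flowfile content."""
    if content.startswith(OLE2_MAGIC):
        # Note: .doc and .ppt share this signature; a real detector would
        # inspect the OLE2 directory before deciding the file is a workbook.
        return XLS_MIME
    if content.startswith(ZIP_MAGIC):
        # Any ZIP matches; a real detector would look for xl/workbook.xml inside.
        return XLSX_MIME
    try:
        text = content.decode("utf-8")
    except UnicodeDecodeError:
        return "application/octet-stream"
    # Crude assumption: decodable text with a comma on its first line is CSV.
    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    return "text/csv" if "," in first_line else "text/plain"


if __name__ == "__main__":
    converted = b'"Table 1.  Estimated Monthly Sales",Unnamed: 1,Unnamed: 2\n'
    print(sniff_mime_type(converted))  # text/csv
```

A flowfile that has passed through a conversion script still starts with plain text bytes, so any content-based check reports a text type even though the inherited mime.type attribute still says .xlsx.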
>>
>> On Mon, 25 Sept 2023, 06:54 Matt Burgess wrote:
>>
>>> I added MIME Type properties to ExecuteProcess and ExecuteStreamCommand
>>> so you can set it explicitly if you want [1]. They will be in the 1.24.0
>>> and 2.0 releases.
>>>
>>> Regards,
>>> Matt
>>>
>>> [1] https://issues.apache.org/jira/browse/NIFI-12011
>>>
>>>
>>> On Mon, Sep 25, 2023 at 1:41 AM Joe Witt wrote:
>>>
>>>> Chris
>>>>
>>>> Yep. Though this case was ExecuteStreamCommand, so following it with
>>>> UpdateAttribute as you mention, or with IdentifyMimeType, would do the
>>>> trick.
>>>>
>>>> Thanks
>>>>
>>>> On Sun, Sep 24, 2023 at 10:30 PM Chris Sampson wrote:
>>>>
>>>>> An UpdateAttribute could also be used to update the mime.type, e.g. to
>>>>> text/csv.
>>>>>
>>>>> I'd think the CSV record writer should probably do this automatically
>>>>> though, so it may be worth a Jira to correct that (I'm reasonably sure
>>>>> the existing JSON and Avro writers do that, for example).
>>>>>
>>>>> On Sun, 24 Sept 2023, 23:52 James McMahon wrote:
>>>>>
>>>>>> That was it. I was missing the forest for the trees, yet again <lol>.
>>>>>> I do all the hard work and then forget to IdentifyMimeType at the end.
>>>>>> Thanks very much Joe.
>>>>>> Jim
>>>>>>
>>>>>> On Sun, Sep 24, 2023 at 6:30 PM Joe Witt wrote:
>>>>>>
>>>>>>> Jim,
>>>>>>>
>>>>>>> Before you try to view it you can likely run it through
>>>>>>> IdentifyMimeType. As you note, the conversion from XLS to CSV
>>>>>>> happens, but we still see a mime type of
>>>>>>> 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
>>>>>>> so that is likely causing it to not even attempt to display. So after
>>>>>>> your Python script execution, run the data through IdentifyMimeType
>>>>>>> and then you can likely view it just fine.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Sun, Sep 24, 2023 at 3:21 PM James McMahon wrote:
>>>>>>>
>>>>>>>> I sure can, Joe. Here they are:
>>>>>>>>
>>>>>>>> RouteOnAttribute.Route: isExcel
>>>>>>>> execution.command: /usr/bin/python3
>>>>>>>> execution.command.args: /opt/nifi/config_resources/scripts/excelToCSV.py
>>>>>>>> execution.error: Empty string set
>>>>>>>> execution.status: 0
>>>>>>>> filename: Alltables.csv
>>>>>>>> hash.value.md5: b48840c161b645a0169e622dcb8f5083
>>>>>>>> hash.value.sha256: 4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>>>>>>>> isChild: false
>>>>>>>> mime.extension: .xlsx
>>>>>>>> mime.type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
>>>>>>>> parent.MD5: b48840c161b645a0169e622dcb8f5083
>>>>>>>> parent.SHA256: 4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>>>>>>>> path: ./
>>>>>>>> s3.bucket: rampart-raw-data
>>>>>>>> s3.encryptionStrategy: SSE_S3
>>>>>>>> s3.etag: b48840c161b645a0169e622dcb8f5083
>>>>>>>> s3.isLatest: true
>>>>>>>> s3.lastModified: 1672701227000
>>>>>>>> s3.length: 830934
>>>>>>>> s3.owner: b34a7aa80a4130503fee2e8d4c2b674e154af3c4db69db9a4e3bff8a47cc92d1
>>>>>>>> s3.sseAlgorithm: AES256
>>>>>>>> s3.storeClass: STANDARD
>>>>>>>> s3.version: null
>>>>>>>> sourcing.MD5: b48840c161b645a0169e622dcb8f5083
>>>>>>>> sourcing.SHA256: 4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>>>>>>>> sourcing.sourceMD5: b48840c161b645a0169e622dcb8f5083
>>>>>>>> sourcing.sourceSHA256: 4847ac157fd30d6f2e53cb3c4e879ae063d498709da2686c6f61ba6019456afa
>>>>>>>> triage.datatype: excel
>>>>>>>> uuid: d72ec2e9-cfbd-435e-9954-4f7fae55c550
>>>>>>>>
>>>>>>>> Thanks for any help.
>>>>>>>> Perhaps my data is there but I simply can't render it in the Viewer?
>>>>>>>> Jim
>>>>>>>>
>>>>>>>> On Sun, Sep 24, 2023 at 6:08 PM Joe Witt wrote:
>>>>>>>>
>>>>>>>>> Jim,
>>>>>>>>>
>>>>>>>>> If a content type attribute exists and is not a type NiFi
>>>>>>>>> understands, it will not be able to render it. Can you show what
>>>>>>>>> flowfile attributes are present at the point you attempt to view it?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> On Sun, Sep 24, 2023 at 3:03 PM James McMahon wrote:
>>>>>>>>>
>>>>>>>>>> Hello. I have converted incoming Excel files to CSV. I'd like to
>>>>>>>>>> look at the result, but when I select my flowfiles from the output
>>>>>>>>>> queue, I can only select "View as hex", and I cannot get the
>>>>>>>>>> display to show me the records in the form I expect. Viewing them
>>>>>>>>>> using the hex display is not helpful.
>>>>>>>>>>
>>>>>>>>>> How can I fix this viewing issue?
>>>>>>>>>>
>>>>>>>>>> Here is an example of what I can see:
>>>>>>>>>>
>>>>>>>>>> 0x00000000 22 54 61 62 6C 65 20 31 2E 20 20 45 73 74 69 6D "Table 1.  Estim
>>>>>>>>>> 0x00000010 61 74 65 64 20 4D 6F 6E 74 68 6C 79 20 53 61 6C ated Monthly Sal
>>>>>>>>>> 0x00000020 65 73 20 61 6E 64 20 49 6E 76 65 6E 74 6F 72 69 es and Inventori
>>>>>>>>>> 0x00000030 65 73 20 66 6F 72 20 4D 61 6E 75 66 61 63 74 75 es for Manufactu
>>>>>>>>>> 0x00000040 72 65 72 73 2C 20 52 65 74 61 69 6C 65 72 73 2C rers, Retailers,
>>>>>>>>>> 0x00000050 20 61 6E 64 20 4D 65 72 63 68 61 6E 74 20 57 68  and Merchant Wh
>>>>>>>>>> 0x00000060 6F 6C 65 73 61 6C 65 72 73 22 2C 55 6E 6E 61 6D olesalers",Unnam
>>>>>>>>>> 0x00000070 65 64 3A 20 31 2C 55 6E 6E 61 6D 65 64 3A 20 32 ed: 1,Unnamed: 2
>>>>>>>>>> 0x00000080 2C 55 6E 6E 61 6D 65 64 3A 20 33 2C 55 6E 6E 61 ,Unnamed: 3,Unna
>>>>>>>>>> 0x00000090 6D 65 64 3A 20 34 2C 55 6E 6E 61 6D 65 64 3A 20 med: 4,Unnamed:
>>>>>>>>>> 0x000000A0 35 2C 55 6E 6E 61 6D 65 64 3A 20 36 2C 55 6E 6E 5,Unnamed: 6,Unn
>>>>>>>>>> 0x000000B0 61 6D 65 64 3A
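The hex view in Jim's first message already shows ordinary CSV text, which is consistent with Joe's diagnosis that only the stale mime.type attribute stopped the content viewer from rendering it. The sketch below decodes the dump rows quoted above (byte values copied verbatim; the dump itself is truncated mid-field) and parses the header record with Python's standard csv module. The helper names `decode_hex_rows` and `header_fields` are hypothetical.

```python
import csv
import io

# Byte rows copied from the hex view in the thread (offsets 0x00 through 0xB0).
HEX_ROWS = [
    "22 54 61 62 6C 65 20 31 2E 20 20 45 73 74 69 6D",
    "61 74 65 64 20 4D 6F 6E 74 68 6C 79 20 53 61 6C",
    "65 73 20 61 6E 64 20 49 6E 76 65 6E 74 6F 72 69",
    "65 73 20 66 6F 72 20 4D 61 6E 75 66 61 63 74 75",
    "72 65 72 73 2C 20 52 65 74 61 69 6C 65 72 73 2C",
    "20 61 6E 64 20 4D 65 72 63 68 61 6E 74 20 57 68",
    "6F 6C 65 73 61 6C 65 72 73 22 2C 55 6E 6E 61 6D",
    "65 64 3A 20 31 2C 55 6E 6E 61 6D 65 64 3A 20 32",
    "2C 55 6E 6E 61 6D 65 64 3A 20 33 2C 55 6E 6E 61",
    "6D 65 64 3A 20 34 2C 55 6E 6E 61 6D 65 64 3A 20",
    "35 2C 55 6E 6E 61 6D 65 64 3A 20 36 2C 55 6E 6E",
    "61 6D 65 64 3A",  # the quoted dump stops here
]


def decode_hex_rows(rows: list[str]) -> str:
    """Join hex-dump rows into bytes and decode them as UTF-8 text."""
    # bytes.fromhex skips the spaces between byte pairs.
    return b"".join(bytes.fromhex(row) for row in rows).decode("utf-8")


def header_fields(text: str) -> list[str]:
    """Parse the first CSV record, so quoted commas stay inside one field."""
    return next(csv.reader(io.StringIO(text)))


if __name__ == "__main__":
    text = decode_hex_rows(HEX_ROWS)
    print(text)
    for field in header_fields(text):
        print(repr(field))
```

The quoted table title comes back as a single field despite its embedded commas, followed by the `Unnamed: 1` to `Unnamed: 6` headers and a truncated `Unnamed:` where the dump ends, so the content is readable CSV once the viewer is told it is text/csv.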
