Thanks for looking into this, Tim! I just created a ticket in JIRA:
https://issues.apache.org/jira/browse/TIKA-2761

Nick

> On Oct 17, 2018, at 8:21 AM, Tim Allison <[email protected]> wrote:
>
> Nick,
>   I'm sorry for my delay.  The XHTMLContentHandler writes everything
> that is in the Metadata object when the parser writes the first
> "content" element, and in the MP3Parser, this is the <h1> element,
> which is written before the sample rate is added to the Metadata
> object.  Any metadata that is added afterwards does not show up in the
> xhtml, but is retrievable from the Metadata object.
>   This is one of the limitations of a streaming write.  As I look at
> the code of the MP3Parser, I _think_ it would be trivial to write the
> metadata before writing any content, and it wouldn't get in the way of
> a streaming parse because the parser reads the whole file and caches
> the content as it goes -- only writing once it has finished reading
> the file.
>   Please open a ticket on our JIRA, and I'll take care of it.
>
>      Best,
>
>          Tim
>
> On Sun, Oct 14, 2018 at 11:08 PM Nick Sincaglia <[email protected]>
> wrote:
>>
>> I was wondering if anyone might have some insights on why the XML output
>> does not contain some of the technical file information that the JSON and
>> text version does. Is this something that can be fixed? Could someone
>> suggest a way to go about identifying the root cause and fixing it?
>>
>> Thanks,
>>
>> Nick
>>
>> On Oct 8, 2018, at 9:31 PM, Nick Sincaglia <[email protected]> wrote:
>>
>> I am using the Tika 1.19 GUI to extract metadata from an .mp3 file. The
>> sample rate is available and I am able to access it, but only as a string
>> or as part of a JSON document. I am working in XML and would like to use
>> XML as a content handler. But when the metadata is returned as
>> ‘structured text’ (XML) the sample rate is not returned. I have tried
>> using Tika 1.19 in a Maven project and experimented with different
>> contentHandlers and the same issue occurs. I cannot seem to get the
>> sample rate returned in an XML doc, but I am able to access the data from
>> the metadata object itself. If the metadata is returned as a string, the
>> sample rate is there; if it is returned as XML, the sample rate is not
>> returned. I am wondering what I am doing wrong or misunderstanding.
>> Perhaps an issue with the parser or contentHandler that is used?
>>
>>
>> Tika 1.19 ‘Metadata’ view (sample rate is available):
>>
>> Author: Glee Cast
>> Content-Length: 8251946
>> Content-Type: audio/mpeg
>> X-Parsed-By: org.apache.tika.parser.DefaultParser
>> X-Parsed-By: org.apache.tika.parser.mp3.Mp3Parser
>> X-TIKA:digest:MD5: e0bdf3a0e171fca838604f9baad46612
>> X-TIKA:digest:SHA256:
>> ea1e4aa998f2c6e80139fa100c62fc1ee17652cf702cd484532b90183e7c5cc0
>> channels: 2
>> creator: Glee Cast
>> dc:creator: Glee Cast
>> dc:title: Rehab (Glee Cast Version)
>> meta:author: Glee Cast
>> resourceName: USQX90900223_A4_T7.mp3
>> samplerate: 44100
>> title: Rehab (Glee Cast Version)
>> version: MPEG 3 Layer III Version 1
>> xmpDM:album: Glee: The Music, The Complete Season One
>> xmpDM:artist: Glee Cast
>> xmpDM:audioChannelType: Stereo
>> xmpDM:audioCompressor: MP3
>> xmpDM:audioSampleRate: 44100
>> xmpDM:duration: 206301.296875
>> xmpDM:genre:
>> xmpDM:logComment: XXX -
>> (P) 2009 Twentieth Century Fox Television - USQX90900223
>> xmpDM:releaseDate:
>> xmpDM:trackNumber: 4
>>
>>
>> Tika 1.19 ‘Structured Text’ view (no sample rate):
>>
>> <?xml version="1.0" encoding="UTF-8"?><html
>> xmlns="http://www.w3.org/1999/xhtml">
>> <head>
>> <meta name="xmpDM:genre" content=""/>
>> <meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser"/>
>> <meta name="X-Parsed-By" content="org.apache.tika.parser.mp3.Mp3Parser"/>
>> <meta name="creator" content="Glee Cast"/>
>> <meta name="xmpDM:album" content="Glee: The Music, The Complete Season One"/>
>> <meta name="xmpDM:releaseDate" content=""/>
>> <meta name="meta:author" content="Glee Cast"/>
>> <meta name="xmpDM:artist" content="Glee Cast"/>
>> <meta name="X-TIKA:digest:SHA256"
>> content="ea1e4aa998f2c6e80139fa100c62fc1ee17652cf702cd484532b90183e7c5cc0"/>
>> <meta name="dc:creator" content="Glee Cast"/>
>> <meta name="xmpDM:audioCompressor" content="MP3"/>
>> <meta name="resourceName" content="USQX90900223_A4_T7.mp3"/>
>> <meta name="xmpDM:logComment" content="XXX - (P) 2009 Twentieth Century
>> Fox Television - USQX90900223"/>
>> <meta name="dc:title" content="Rehab (Glee Cast Version)"/>
>> <meta name="Author" content="Glee Cast"/>
>> <meta name="Content-Length" content="8251946"/>
>> <meta name="X-TIKA:digest:MD5" content="e0bdf3a0e171fca838604f9baad46612"/>
>> <meta name="Content-Type" content="audio/mpeg"/>
>> <title>Rehab (Glee Cast Version)</title>
>> </head>
>> <body><h1>Rehab (Glee Cast Version)</h1>
>> <p>Glee Cast</p>
>> <p>Glee: The Music, The Complete Season One, track 4</p>
>> <p>206301.3</p>
>> <p>XXX - (P) 2009 Twentieth Century Fox Television - USQX90900223</p>
>> </body></html>
>>
>>
>> Tika 1.19 Recursive JSON view (the sample rate is there):
>>
>> [
>> {
>> "Author": "Glee Cast",
>> "Content-Type": "audio/mpeg",
>> "X-Parsed-By": [
>> "org.apache.tika.parser.DefaultParser",
>> "org.apache.tika.parser.mp3.Mp3Parser"
>> ],
>> "X-TIKA:content": "Rehab (Glee Cast Version)\nGlee Cast\nGlee: The Music,
>> The Complete Season One, track 4\n206301.3\nXXX - \n(P) 2009 Twentieth
>> Century Fox Television - USQX90900223\n",
>> "X-TIKA:digest:MD5": "e0bdf3a0e171fca838604f9baad46612",
>> "X-TIKA:digest:SHA256":
>> "ea1e4aa998f2c6e80139fa100c62fc1ee17652cf702cd484532b90183e7c5cc0",
>> "X-TIKA:parse_time_millis": "86",
>> "channels": "2",
>> "creator": "Glee Cast",
>> "dc:creator": "Glee Cast",
>> "dc:title": "Rehab (Glee Cast Version)",
>> "meta:author": "Glee Cast",
>> "samplerate": "44100",
>> "title": "Rehab (Glee Cast Version)",
>> "version": "MPEG 3 Layer III Version 1",
>> "xmpDM:album": "Glee: The Music, The Complete Season One",
>> "xmpDM:artist": "Glee Cast",
>> "xmpDM:audioChannelType": "Stereo",
>> "xmpDM:audioCompressor": "MP3",
>> "xmpDM:audioSampleRate": "44100",
>> "xmpDM:duration": "206301.296875",
>> "xmpDM:genre": "",
>> "xmpDM:logComment": "XXX - \n(P) 2009 Twentieth Century Fox Television -
>> USQX90900223",
>> "xmpDM:releaseDate": "",
>> "xmpDM:trackNumber": "4"
>> }
>> ]
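
For anyone reading this thread later, the streaming limitation Tim describes can be illustrated with a toy model. The sketch below does not use Tika at all: ToyXhtmlWriter, render, and StreamingHeadSketch are hypothetical names made up for this example, and the real XHTMLContentHandler is more involved. It only assumes what the explanation above says, namely that the head is written from whatever is in the metadata at the moment the first content element is emitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StreamingHeadSketch {

    /** Toy stand-in for a streaming XHTML writer. This is NOT Tika's XHTMLContentHandler. */
    static final class ToyXhtmlWriter {
        private final Map<String, String> metadata;
        private final StringBuilder out = new StringBuilder();
        private boolean headWritten = false;

        ToyXhtmlWriter(Map<String, String> metadata) {
            this.metadata = metadata;
        }

        /** Emits one content element; the first call also flushes the head. */
        void element(String name, String text) {
            if (!headWritten) {
                // The head is written once, from the metadata as it is right now.
                out.append("<head>");
                for (Map.Entry<String, String> e : metadata.entrySet()) {
                    // No XML escaping here; the sample values are plain ASCII.
                    out.append("<meta name=\"").append(e.getKey())
                       .append("\" content=\"").append(e.getValue()).append("\"/>");
                }
                out.append("</head><body>");
                headWritten = true;
            }
            out.append('<').append(name).append('>').append(text)
               .append("</").append(name).append('>');
        }

        String finish() {
            return out + "</body>";
        }
    }

    /**
     * Simulates a parser that sets "samplerate" either before or after
     * writing its first content element (the h1), and returns the output.
     */
    static String render(Map<String, String> metadata, boolean sampleRateBeforeContent) {
        metadata.put("dc:title", "Rehab (Glee Cast Version)");
        ToyXhtmlWriter xhtml = new ToyXhtmlWriter(metadata);
        if (sampleRateBeforeContent) {
            metadata.put("samplerate", "44100");
        }
        xhtml.element("h1", "Rehab (Glee Cast Version)");
        if (!sampleRateBeforeContent) {
            // Too late for the head, but still stored in the metadata map.
            metadata.put("samplerate", "44100");
        }
        return xhtml.finish();
    }

    public static void main(String[] args) {
        Map<String, String> late = new LinkedHashMap<>();
        System.out.println(render(late, false));
        System.out.println("samplerate in metadata: " + late.get("samplerate"));

        Map<String, String> early = new LinkedHashMap<>();
        System.out.println(render(early, true));
    }
}
```

In the "late" ordering the value is missing from the head yet still present in the map, which matches the symptom reported above; setting the metadata before the first content element is the ordering change suggested in Tim's reply.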
