Nick,
Sorry for the delay. The XHTMLContentHandler writes everything
that is in the Metadata object at the moment the parser writes the first
"content" element. In the Mp3Parser, that is the <h1> element,
which is written before the sample rate is added to the Metadata
object. Any metadata added after that point does not show up in the
XHTML, but it is still retrievable from the Metadata object.
This is one of the limitations of a streaming write. Looking at
the code of the Mp3Parser, I _think_ it would be trivial to set all of
the metadata before writing any content. That wouldn't get in the way of
a streaming parse, because the parser already reads the whole file and
caches the content as it goes, and only writes once it has finished
reading the file.
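To make the ordering problem concrete, here is a minimal, self-contained
sketch in Java. It does not use Tika's actual classes: SketchHandler,
parseContentFirst and parseMetadataFirst are hypothetical names made up
for illustration, and the only assumption carried over from above is
that the <head> is emitted once, when the first content element is
written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StreamingMetadataSketch {

    /** Hypothetical stand-in for a streaming XHTML handler (not Tika's class). */
    static class SketchHandler {
        private final Map<String, String> metadata;
        private final StringBuilder out = new StringBuilder();
        private boolean headWritten = false;

        SketchHandler(Map<String, String> metadata) {
            this.metadata = metadata;
        }

        // Emit <head> exactly once, using whatever metadata exists right now.
        // No XML escaping is done; the sample values do not need it.
        private void writeHeadIfNeeded() {
            if (headWritten) {
                return;
            }
            out.append("<head>");
            for (Map.Entry<String, String> e : metadata.entrySet()) {
                out.append("<meta name=\"").append(e.getKey())
                   .append("\" content=\"").append(e.getValue()).append("\"/>");
            }
            out.append("</head><body>");
            headWritten = true;
        }

        // Write one content element; the first call triggers the <head>.
        void element(String name, String text) {
            writeHeadIfNeeded();
            out.append('<').append(name).append('>').append(text)
               .append("</").append(name).append('>');
        }

        String finish() {
            writeHeadIfNeeded();
            return out.append("</body>").toString();
        }
    }

    // Ordering described above: content is written before the sample rate is set.
    public static String parseContentFirst() {
        Map<String, String> md = new LinkedHashMap<>();
        md.put("dc:title", "Rehab (Glee Cast Version)");
        SketchHandler handler = new SketchHandler(md);
        handler.element("h1", md.get("dc:title"));
        md.put("samplerate", "44100"); // too late: <head> is already written
        return handler.finish();
    }

    // Proposed ordering: set all metadata first, then write content.
    public static String parseMetadataFirst() {
        Map<String, String> md = new LinkedHashMap<>();
        md.put("dc:title", "Rehab (Glee Cast Version)");
        md.put("samplerate", "44100");
        SketchHandler handler = new SketchHandler(md);
        handler.element("h1", md.get("dc:title"));
        return handler.finish();
    }

    public static void main(String[] args) {
        System.out.println(parseContentFirst().contains("samplerate"));  // prints false
        System.out.println(parseMetadataFirst().contains("samplerate")); // prints true
    }
}
```

In both variants the map itself ends up holding "samplerate", which
matches what you see: the value is in the Metadata object either way,
and only the serialized <head> differs.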
Please open a ticket on our JIRA, and I'll take care of it.
Best,
Tim
On Sun, Oct 14, 2018 at 11:08 PM Nick Sincaglia <[email protected]> wrote:
>
> I was wondering if anyone might have some insight into why the XML output does
> not contain some of the technical file information that the JSON and text
> versions do. Is this something that can be fixed? Could someone suggest a
> way to go about identifying the root cause and fixing it?
>
> Thanks,
>
> Nick
>
> On Oct 8, 2018, at 9:31 PM, Nick Sincaglia <[email protected]> wrote:
>
> I am using the Tika 1.19 GUI to extract metadata from an .mp3 file. The
> sample rate is available and I am able to access it, but only as a string or as
> part of a JSON document. I am working in XML and would like to use an XML
> content handler. But when the metadata is returned as ‘structured text’ (XML),
> the sample rate is not returned. I have tried using Tika 1.19 in a Maven
> project and experimented with different contentHandlers, and the same issue
> occurs. I cannot seem to get the sample rate returned in an XML doc, but I am
> able to access the data from the metadata object itself. If the metadata is
> returned as a string, the sample rate is there; if it is returned as XML, the
> sample rate is missing. I am wondering what I am doing wrong or
> misunderstanding. Perhaps it is an issue with the parser or contentHandler that
> is used?
>
>
>
> Tika 1.19 ‘Metadata’ view (sample rate is available):
>
>
>
> Author: Glee Cast
> Content-Length: 8251946
> Content-Type: audio/mpeg
> X-Parsed-By: org.apache.tika.parser.DefaultParser
> X-Parsed-By: org.apache.tika.parser.mp3.Mp3Parser
> X-TIKA:digest:MD5: e0bdf3a0e171fca838604f9baad46612
> X-TIKA:digest:SHA256: ea1e4aa998f2c6e80139fa100c62fc1ee17652cf702cd484532b90183e7c5cc0
> channels: 2
> creator: Glee Cast
> dc:creator: Glee Cast
> dc:title: Rehab (Glee Cast Version)
> meta:author: Glee Cast
> resourceName: USQX90900223_A4_T7.mp3
> samplerate: 44100
> title: Rehab (Glee Cast Version)
> version: MPEG 3 Layer III Version 1
> xmpDM:album: Glee: The Music, The Complete Season One
> xmpDM:artist: Glee Cast
> xmpDM:audioChannelType: Stereo
> xmpDM:audioCompressor: MP3
> xmpDM:audioSampleRate: 44100
> xmpDM:duration: 206301.296875
> xmpDM:genre:
> xmpDM:logComment: XXX -
> (P) 2009 Twentieth Century Fox Television - USQX90900223
> xmpDM:releaseDate:
> xmpDM:trackNumber: 4
>
>
>
>
>
> Tika 1.19 ‘Structured Text’ view (no sample rate):
>
>
>
> <?xml version="1.0" encoding="UTF-8"?><html
> xmlns="http://www.w3.org/1999/xhtml">
> <head>
> <meta name="xmpDM:genre" content=""/>
> <meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser"/>
> <meta name="X-Parsed-By" content="org.apache.tika.parser.mp3.Mp3Parser"/>
> <meta name="creator" content="Glee Cast"/>
> <meta name="xmpDM:album" content="Glee: The Music, The Complete Season One"/>
> <meta name="xmpDM:releaseDate" content=""/>
> <meta name="meta:author" content="Glee Cast"/>
> <meta name="xmpDM:artist" content="Glee Cast"/>
> <meta name="X-TIKA:digest:SHA256"
> content="ea1e4aa998f2c6e80139fa100c62fc1ee17652cf702cd484532b90183e7c5cc0"/>
> <meta name="dc:creator" content="Glee Cast"/>
> <meta name="xmpDM:audioCompressor" content="MP3"/>
> <meta name="resourceName" content="USQX90900223_A4_T7.mp3"/>
> <meta name="xmpDM:logComment" content="XXX - (P) 2009 Twentieth Century Fox Television - USQX90900223"/>
> <meta name="dc:title" content="Rehab (Glee Cast Version)"/>
> <meta name="Author" content="Glee Cast"/>
> <meta name="Content-Length" content="8251946"/>
> <meta name="X-TIKA:digest:MD5" content="e0bdf3a0e171fca838604f9baad46612"/>
> <meta name="Content-Type" content="audio/mpeg"/>
> <title>Rehab (Glee Cast Version)</title>
> </head>
> <body><h1>Rehab (Glee Cast Version)</h1>
> <p>Glee Cast</p>
> <p>Glee: The Music, The Complete Season One, track 4</p>
> <p>206301.3</p>
> <p>XXX - (P) 2009 Twentieth Century Fox Television - USQX90900223</p>
> </body></html>
>
>
>
> Tika 1.19 Recursive JSON view (the sample rate is there):
>
>
>
> [
> {
> "Author": "Glee Cast",
> "Content-Type": "audio/mpeg",
> "X-Parsed-By": [
> "org.apache.tika.parser.DefaultParser",
> "org.apache.tika.parser.mp3.Mp3Parser"
> ],
> "X-TIKA:content": "Rehab (Glee Cast Version)\nGlee Cast\nGlee: The Music, The Complete Season One, track 4\n206301.3\nXXX - \n(P) 2009 Twentieth Century Fox Television - USQX90900223\n",
> "X-TIKA:digest:MD5": "e0bdf3a0e171fca838604f9baad46612",
> "X-TIKA:digest:SHA256":
> "ea1e4aa998f2c6e80139fa100c62fc1ee17652cf702cd484532b90183e7c5cc0",
> "X-TIKA:parse_time_millis": "86",
> "channels": "2",
> "creator": "Glee Cast",
> "dc:creator": "Glee Cast",
> "dc:title": "Rehab (Glee Cast Version)",
> "meta:author": "Glee Cast",
> "samplerate": "44100",
> "title": "Rehab (Glee Cast Version)",
> "version": "MPEG 3 Layer III Version 1",
> "xmpDM:album": "Glee: The Music, The Complete Season One",
> "xmpDM:artist": "Glee Cast",
> "xmpDM:audioChannelType": "Stereo",
> "xmpDM:audioCompressor": "MP3",
> "xmpDM:audioSampleRate": "44100",
> "xmpDM:duration": "206301.296875",
> "xmpDM:genre": "",
> "xmpDM:logComment": "XXX - \n(P) 2009 Twentieth Century Fox Television - USQX90900223",
> "xmpDM:releaseDate": "",
> "xmpDM:trackNumber": "4"
> }
> ]
>
>
>