Re: [DISCUSS] XMPBox

2021-03-28 Thread Guillaume Bailleul
Hi all,

When we wrote xmpbox, we tried to keep compatibility with the previous
jempbox. It had some limitations.

So some years ago, when I needed a more open XMP implementation, I wrote
xemph [1]. Feel free to have a look; I can rework it if needed (I suspect
it has an invalid dependency).

Regards,

[1] https://github.com/gbm-bailleul/xemph

Guillaume



Re: [DISCUSS] XMPBox

2021-03-28 Thread Andreas Lehmkuehler

On 28.03.21 at 19:27, sahy...@fileaffairs.de wrote:
> On Sunday, 28.03.2021 at 18:47 +0200, Tilman Hausherr wrote:
> > On 28.03.2021 at 18:44, sahy...@fileaffairs.de wrote:
> > > On Sunday, 28.03.2021 at 16:36 +0200, Tilman Hausherr wrote:
> > > > I don't have an opinion on XMP because I don't use it.
> > > As XMP is needed for getting/setting metadata, especially since
> > > PDF 2.0, there needs to be support for it - not necessarily from us
> > > directly, i.e. we could integrate a different lib.
> > >
> > > I'll revert the work done in PDFBOX-5128 and we can get back to it
> > > after 3.0 - WDYT?
> >
> > No, why revert? As far as I understand it, it makes it possible to
> > still parse XMPs with non-standard schemas, so that people can
> > retrieve the standard stuff. That is very useful.
>
> It's still very limited. I can keep it, but as long as the XMP doesn't
> conform to the (strict) initial parsing rules, parsing will still fail.
> The idea behind reverting was to gain time to work on it (if we decide
> to do so), or otherwise to keep it in the state it was in before, i.e.
> targeted at PDF/A-1-conforming XMP.


I'm going to start a vote about the future of preflight after the release of
the first RC for 3.0.0. Depending on the outcome, we should think about a vote
on the future of xmpbox as well.


Let us see what happens and decide afterwards.

Andreas



-
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org



Re: [DISCUSS] XMPBox

2021-03-28 Thread sahy...@fileaffairs.de
On Sunday, 28.03.2021 at 18:47 +0200, Tilman Hausherr wrote:
> On 28.03.2021 at 18:44, sahy...@fileaffairs.de wrote:
> > On Sunday, 28.03.2021 at 16:36 +0200, Tilman Hausherr wrote:
> > > I don't have an opinion on XMP because I don't use it.
> > As XMP is needed for getting/setting metadata, especially since
> > PDF 2.0, there needs to be support for it - not necessarily from us
> > directly, i.e. we could integrate a different lib.
> >
> > I'll revert the work done in PDFBOX-5128 and we can get back to it
> > after 3.0 - WDYT?
>
> No, why revert? As far as I understand it, it makes it possible to
> still parse XMPs with non-standard schemas, so that people can
> retrieve the standard stuff. That is very useful.

It's still very limited. I can keep it, but as long as the XMP doesn't
conform to the (strict) initial parsing rules, parsing will still fail.
The idea behind reverting was to gain time to work on it (if we decide
to do so), or otherwise to keep it in the state it was in before, i.e.
targeted at PDF/A-1-conforming XMP.
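
To make the distinction concrete, here is a minimal sketch (plain Java,
hypothetical names, not XMPBox code) of conformance checking done as a
separate pass over metadata that has already been parsed, instead of
rejecting the input inside the parser. The namespace set is only an
illustrative subset, and a real check would also have to honour PDF/A
extension schema descriptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: PDF/A-1 schema checking as a pass over already-parsed XMP. */
class PdfA1SchemaCheck {
    // Illustrative subset of the namespaces predefined for PDF/A-1; NOT a complete list.
    static final Set<String> PREDEFINED = Set.of(
            "http://purl.org/dc/elements/1.1/",
            "http://ns.adobe.com/xap/1.0/",
            "http://ns.adobe.com/pdf/1.3/",
            "http://www.aiim.org/pdfa/ns/id/");

    /**
     * Input: namespace URI -> (property name -> value), as some generic parser might produce.
     * Returns one message per namespace outside the predefined set; empty means no findings.
     */
    static List<String> check(Map<String, Map<String, String>> parsedSchemas) {
        List<String> findings = new ArrayList<>();
        for (String namespace : parsedSchemas.keySet()) {
            if (!PREDEFINED.contains(namespace)) {
                // Reported, not thrown: the parsed metadata stays usable by the caller.
                findings.add("namespace not predefined for PDF/A-1: " + namespace);
            }
        }
        return findings;
    }
}
```

With this split, a caller who only wants to read metadata ignores the
findings, while a validator treats a non-empty list as a failure.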

BR
Maruan


-- 
-- 
Maruan Sahyoun

FileAffairs GmbH
Josef-Schappe-Straße 21
40882 Ratingen

Tel: +49 (2102) 89497 88
Fax: +49 (2102) 89497 91
sahy...@fileaffairs.de
www.fileaffairs.de

Geschäftsführer: Maruan Sahyoun
Handelsregister: 

Re: [DISCUSS] XMPBox

2021-03-28 Thread Tilman Hausherr

On 28.03.2021 at 18:44, sahy...@fileaffairs.de wrote:
> On Sunday, 28.03.2021 at 16:36 +0200, Tilman Hausherr wrote:
> > I don't have an opinion on XMP because I don't use it.
> As XMP is needed for getting/setting metadata, especially since PDF 2.0,
> there needs to be support for it - not necessarily from us directly, i.e.
> we could integrate a different lib.
>
> I'll revert the work done in PDFBOX-5128 and we can get back to it after
> 3.0 - WDYT?



No, why revert? As far as I understand it, it makes it possible to still
parse XMPs with non-standard schemas, so that people can retrieve the
standard stuff. That is very useful.
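
As a rough illustration of that difference (a hypothetical sketch with
made-up names, not the actual PDFBOX-5128 change): a strict reader gives up
on the first property from an unknown schema, while a lenient one skips it
and still returns the properties from the schemas it knows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of strict versus lenient handling of unknown schemas. */
class UnknownSchemaPolicy {
    /** A single XMP property, reduced to namespace, name and text value. */
    static final class Property {
        final String namespace;
        final String name;
        final String value;

        Property(String namespace, String name, String value) {
            this.namespace = namespace;
            this.name = name;
            this.value = value;
        }
    }

    // Stand-in for the set of schemas the reader knows about (assumption for the example).
    static final Set<String> KNOWN = Set.of("http://purl.org/dc/elements/1.1/");

    /** Keeps properties from known schemas; unknown ones abort (strict) or are skipped (lenient). */
    static List<Property> read(List<Property> raw, boolean strict) {
        List<Property> kept = new ArrayList<>();
        for (Property property : raw) {
            if (KNOWN.contains(property.namespace)) {
                kept.add(property);
            } else if (strict) {
                throw new IllegalStateException("unknown schema: " + property.namespace);
            }
            // lenient mode: the unknown property is dropped and reading continues
        }
        return kept;
    }
}
```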


Tilman







Re: [DISCUSS] XMPBox

2021-03-28 Thread sahy...@fileaffairs.de
On Sunday, 28.03.2021 at 16:36 +0200, Tilman Hausherr wrote:
> I don't have an opinion on XMP because I don't use it.

As XMP is needed for getting/setting metadata, especially since PDF 2.0,
there needs to be support for it - not necessarily from us directly, i.e.
we could integrate a different lib.

I'll revert the work done in PDFBOX-5128 and we can get back to it after
3.0 - WDYT?

BR
Maruan


-- 
-- 
Maruan Sahyoun






Re: [DISCUSS] XMPBox

2021-03-28 Thread Tilman Hausherr

I don't have an opinion on XMP because I don't use it.

Re preflight, I agree with you. It was great but it has hit a dead end, 
and VeraPDF is better because it is more flexible.


Tilman




Re: [DISCUSS] XMPBox

2021-03-28 Thread Andreas Lehmkuehler

On 28.03.21 at 15:00, sahy...@fileaffairs.de wrote:
> Fellow colleagues,
>
> There was some discussion about the ability of XMPBox to parse
> arbitrary XMP, which led to PDFBOX-5128.
>
> After digging into the code and reading through the various specs for
> XMP and PDF/A, my conclusion is that XMPBox in its current
> implementation is too restricted from the start: by default (although
> there is a way around it) it only supports parsing the predefined XMP
> schemas, restricted to the ones defined in PDF/A-1, and it also does
> some validation in the parsing phase.

Exactly the point where I stopped some time ago, when trying to just expand
the parser ;-)

> To get to an implementation for arbitrary XMP, that needs to change,
> with the validation for PDF/A-1 put on top. We could use the existing
> implementation in a generalized way, use an existing Java XMP parser
> such as Adobe's XMPCore, or approach it in a layered fashion
> (XML -> RDF -> XMP) with supporting libs for that.
>
> The other option would be to keep XMPBox as is and, for general-purpose
> use, add a general parser to the project or simply refer to XMPCore.
>
> That leads me to the question: what is the benefit of having a
> general-purpose (ASL-licensed) XMP lib as part of PDFBox? Thoughts?

It replaced JempBox when preflight was added to PDFBox; that said, the reason
was more or less historical.

I myself never needed the XMP stuff. It is used by TIKA and preflight, and
maybe others.

I have to admit that I have already thought about the future of preflight.
I had planned to bring up that topic after releasing 3.0.0, but why wait?

Preflight is part of PDFBox but is practically not maintained. Preflight
support is limited to PDF/A-1b and I don't see anybody who plans to extend
it. VeraPDF has a lot more to offer and is open source as well, so maybe a
better alternative ...

How about removing preflight with 4.0.0? This would remove the one and only
hard dependency on XMPBox, so that it would be easier to decide if we really
need to maintain our own XMP lib.



Andreas




Re: [DISCUSS] XMPBox

2021-03-28 Thread sahy...@fileaffairs.de
Quick addition: I'm happy to put in the work if we think it's worth the
effort.

Maruan


-- 
-- 
Maruan Sahyoun






[DISCUSS] XMPBox

2021-03-28 Thread sahy...@fileaffairs.de
Fellow colleagues,

There was some discussion about the ability of XMPBox to parse
arbitrary XMP, which led to PDFBOX-5128.

After digging into the code and reading through the various specs for
XMP and PDF/A, my conclusion is that XMPBox in its current
implementation is too restricted from the start: by default (although
there is a way around it) it only supports parsing the predefined XMP
schemas, restricted to the ones defined in PDF/A-1, and it also does
some validation in the parsing phase.

To get to an implementation for arbitrary XMP, that needs to change,
with the validation for PDF/A-1 put on top. We could use the existing
implementation in a generalized way, use an existing Java XMP parser
such as Adobe's XMPCore, or approach it in a layered fashion
(XML -> RDF -> XMP) with supporting libs for that.
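
A minimal sketch of the layered idea, using only the JDK's DOM parser as
the XML layer. Everything here is an assumption for illustration: the class
and method names are made up, it is not XMPBox or XMPCore code, and arrays
and structures are simply flattened to text. The point is that no layer
needs to know a schema in advance.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: XML layer -> RDF layer -> XMP layer, without predefined schemas. */
class LayeredXmpSketch {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Returns namespace URI -> (property local name -> text value). */
    static Map<String, Map<String, String>> parse(String xmpPacket) {
        Document doc;
        try {
            // XML layer: namespace-aware DOM parsing, no schema knowledge at all.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xmpPacket.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XMP packet as XML", e);
        }
        Map<String, Map<String, String>> schemas = new LinkedHashMap<>();
        // RDF layer: each rdf:Description carries properties as attributes or child elements.
        NodeList descriptions = doc.getElementsByTagNameNS(RDF_NS, "Description");
        for (int i = 0; i < descriptions.getLength(); i++) {
            Element description = (Element) descriptions.item(i);
            NamedNodeMap attributes = description.getAttributes();
            for (int a = 0; a < attributes.getLength(); a++) {
                collect(schemas, attributes.item(a));
            }
            for (Node child = description.getFirstChild(); child != null;
                    child = child.getNextSibling()) {
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    collect(schemas, child);
                }
            }
        }
        return schemas;
    }

    /** XMP layer: group a property under its namespace URI, whatever that namespace is. */
    private static void collect(Map<String, Map<String, String>> schemas, Node property) {
        String ns = property.getNamespaceURI();
        if (ns == null || RDF_NS.equals(ns) || XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(ns)) {
            return; // skip unqualified attributes, rdf:about and xmlns declarations
        }
        // Arrays and structures are flattened to their text; a real XMP layer would model them.
        schemas.computeIfAbsent(ns, key -> new LinkedHashMap<>())
               .put(property.getLocalName(), property.getTextContent().trim());
    }

    public static void main(String[] args) {
        String packet = "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">"
                + "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\">"
                + "<rdf:Description rdf:about=\"\" xmlns:ex=\"http://example.com/ns/custom/\">"
                + "<ex:project>Alpha</ex:project>"
                + "</rdf:Description></rdf:RDF></x:xmpmeta>";
        System.out.println(parse(packet));
    }
}
```

The made-up `ex` namespace is read like any other; whether it is acceptable
for PDF/A-1 would be a question for a validation step run on the result.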

The other option would be to keep XMPBox as is and, for general-purpose
use, add a general parser to the project or simply refer to XMPCore.

That leads me to the question: what is the benefit of having a
general-purpose (ASL-licensed) XMP lib as part of PDFBox? Thoughts?

BR
 
-- 
-- 
Maruan Sahyoun


