Re: Pasting from pdf

2011-05-27 Thread Andrew Parsloe

On 26/05/2011 11:21 p.m., Trevor Jenkins wrote:

On Thu, May 26, 2011 at 11:26 AM, Julio Rojas <jcredbe...@gmail.com> wrote:

BTW, Andrew, how did you manage to copy from a PDF and paste into
Abiword while keeping the format? I was curious, so I tried with a simple
PDF, but it only pasted plain text.


I tried something similar. Had a LyX-generated PDF file open, copied the
content and pasted it into an OpenOffice.org document. The pasted
version had some of the PDF markup with it, font size mostly but not
much more. This was under Mac OS X with Adobe Reader.

Regards, Trevor.

<>< Re: deemed!


I did some experiments -- created a LyX document and emphasised a few
words here and there, then created a pdf. In the pdf, Ctrl-A, Ctrl-C,
then in Abiword or my dear old Word 95, Ctrl-V. No italics. That's
strange. So I went back to LyX and played about a bit. The critical
thing seems to be to choose a definite non-default font. If, for
instance, I choose Latin Modern and go through the same procedure, then
the paste into Word or Abiword retains the emphasis. Revert to the
default font, or Computer Modern, and the emphasis is not pasted.


Andrew


Re: Pasting from pdf

2011-05-26 Thread Richard Heck
On 05/26/2011 05:25 PM, Andrew Parsloe wrote:
>
> These have been interesting responses. I was unaware of the complexity
> of the issue. In fact I'm not after minutely detailed reproduction of
> the original character style -- fingerpainting -- but was hoping that
> emphasised text in the original might be preserved as emphasised text
> after pasting, the same for bolding, as it is in Word, Abiword, Open
> Office, irrespective of what happens to the font, text size and so on.
>
But of course this is fingerpainting. Emphasized text is being
translated as /italic/, and bold text is being translated as *bold*.
Italic is not emphasized, and bold is not strong. That is: This
information is coming from what fonts are used, not from anything
structural.
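
To make the point concrete, here is a minimal Python sketch of the kind of guess a pasting application could be making. It is an assumption for illustration only, not how Word, Abiword or any PDF reader is known to work; the function name and the sample font names are hypothetical. All it can recover is "italic" or "bold" from the font name, never "emphasis" or "strong".

```python
def style_from_font_name(font_name: str) -> set[str]:
    """Guess presentational styles from a font name alone (hypothetical heuristic)."""
    name = font_name.lower()
    styles = set()
    # Only the look of the text is visible here, not the author's intent.
    if "italic" in name or "oblique" in name:
        styles.add("italic")
    if "bold" in name:
        styles.add("bold")
    return styles


# Illustrative font names; a name that does not spell out the style
# gives this heuristic nothing to work with.
for sample in ("LMRoman10-Italic", "LMRoman10-Bold", "CMTI10"):
    print(sample, sorted(style_from_font_name(sample)))
```

Whether something like this has any bearing on the Latin Modern versus Computer Modern difference reported in this thread has not been checked.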

rh



Re: Pasting from pdf

2011-05-26 Thread Andrew Parsloe

On 26/05/2011 10:26 p.m., Julio Rojas wrote:

Just what does bold-italic-numbered-size 14-roman actually mean in
the specific context in which it occurs; in one document it might be
a procedural rendition of a section head but in another the same
author might intend it to be emphasis (rendered simply as bold).


This is the exact question one should answer before asking why what
Andrew wants is not the best path to follow. Obviously, Andrew's
documents follow some layout specifically designed for some journal, so
for him this answer is straightforward. Nonetheless, for LyX it doesn't
matter, as it will be impossible to get this information from the PDF.

BTW, Andrew, how did you manage to copy from a PDF and paste into
Abiword while keeping the format? I was curious, so I tried with a simple PDF,
but it only pasted plain text.

Regards.
-
Julio Rojas
jcredbe...@gmail.com 


I'm working in Windows, so it was Ctrl-A, Ctrl-C in the pdf, then Ctrl-V 
in Abiword. Voila! Text with underlining, bold and italics (emphasis) 
preserved.




On Thu, May 26, 2011 at 12:05 PM, Trevor Jenkins <bslwann...@gmail.com> wrote:

On Thu, May 26, 2011 at 10:31 AM, Julio Rojas <jcredbe...@gmail.com> wrote:

Making some text Bold-Italic-Numbered-Size 14-Roman can be
done in LyX, but it is much better to say that such text is in a
Section environment, because when changes on the document style
are made, you only have to make them once, and not throughout
the document, at every instance of said format. Finger painting,
what you need, is not the way to go in Lyx/Latex. It should be
avoided like the plague.

I hope you see the need for this approach. Regards.


Many years ago (a couple of decades at least) there was a paper
presented at a SigDoc conference, I think it was, on some initial
attempts to infer structural markup from procedural. This project
must have been sometime in the late 1980s (certainly post 1986 and
the ISO's publication of ISO 8879 SGML) and the early 1990s. I place
it in that time frame because of SGML being an International
Standard and the governmental procurement processes that mandate the
use of standards over ad hoc solutions, which procedural markup is,
and system suppliers not wishing to lose business because they could not
deal with the new published standard. At this moment I can't lay my
hands on my copy of those proceedings but I do recall that the
procedural editor being used was WPS-PLUS.

Although I was never involved in that project it did appear to be a
sensible way to convert from old-fashioned procedural to modern
structural markup. I jotted down some OPS-5-like rules in an attempt
to create an expert system that would ease this conversion. And in
moments of craziness return to them adding more and more special
cases to cover how people use procedural stuff. I also got stuck on
the input phase of dealing with the multiplicity of formats being
spewed out by Microsoft Word let alone all the other proprietary
word processing and DTP formats that exist.

Someone else tried a similar project that converted groff/troff
files from a limited set of ms/mm/etc macros to a more structural
form. It may also have included some TeX conversion too. This would
also have been in the early 1990s. I thought it was Eric Raymond but
there's nothing on his web site about it now. My memory is that the
person abandoned the task quite quickly because of the complexity of
the task. Just what does bold-italic-numbered-size 14-roman actually
mean in the specific context in which it occurs; in one document it
might be a procedural rendition of a section head but in another the
same author might intend it to be emphasis (rendered simply as bold).

However, there is a solution to the original poster's request.
Include all the original markup from the pasted in document under a
TeX escape. It won't be pretty and as with the work-arounds already
being used will require manual intervention to convert to LaTeX's
pseudo structural markup scheme.

Regards, Trevor.

<>< Re: deemed!


These have been interesting responses. I was unaware of the complexity 
of the issue. In fact I'm not after minutely detailed reproduction of 
the original character style -- fingerpainting -- but was hoping that 
emphasised text in the original might be preserved as emphasised text 
after pasting, the same for bolding, as it is in Word, Abiword, Open 
Office, irrespective of what happens to the font, text size and so on.


Andrew


Re: Pasting from pdf

2011-05-26 Thread Trevor Jenkins
On Thu, May 26, 2011 at 11:26 AM, Julio Rojas wrote:

> BTW, Andrew, how did you manage to copy from a PDF and paste into Abiword
> while keeping the format? I was curious, so I tried with a simple PDF, but it
> only pasted plain text.
>

I tried something similar. Had a LyX-generated PDF file open, copied the
content and pasted it into an OpenOffice.org document. The pasted version
had some of the PDF markup with it, font size mostly but not much more. This
was under Mac OS X with Adobe Reader.

Regards, Trevor.

<>< Re: deemed!


Re: Pasting from pdf

2011-05-26 Thread Julio Rojas
>
> Just what does bold-italic-numbered-size 14-roman actually mean in the
> specific context in which it occurs; in one document it might be a
> procedural rendition of a section head but in another the same author might
> intend it to be emphasis (rendered simply as bold).
>
>
This is the exact question one should answer before asking why what Andrew
wants is not the best path to follow. Obviously, Andrew's documents follow
some layout specifically designed for some journal, so for him this answer
is straightforward. Nonetheless, for LyX it doesn't matter, as it will be
impossible to get this information from the PDF.

BTW, Andrew, how did you manage to copy from a PDF and paste into Abiword
while keeping the format? I was curious, so I tried with a simple PDF, but it
only pasted plain text.

Regards.
-
Julio Rojas
jcredbe...@gmail.com


On Thu, May 26, 2011 at 12:05 PM, Trevor Jenkins wrote:

> On Thu, May 26, 2011 at 10:31 AM, Julio Rojas wrote:
>
>> Making some text Bold-Italic-Numbered-Size 14-Roman can be done in LyX,
>> but it is much better to say that such text is in a Section environment,
>> because when changes on the document style are made, you only have to make
>> them once, and not throughout the document, at every instance of said
>> format. Finger painting, what you need, is not the way to go in Lyx/Latex.
>> It should be avoided like the plague.
>>
>> I hope you see the need for this approach. Regards.
>>
>
> Many years ago (a couple of decades at least) there was a paper presented
> at a SigDoc conference, I think it was, on some initial attempts to infer
> structural markup from procedural. This project must have been sometime in
> the late 1980s (certainly post 1986 and the ISO's publication of ISO 8879
> SGML) and the early 1990s. I place it in that time frame because of SGML
> being an International Standard and the governmental procurement processes
> that mandate the use of standards over ad hoc solutions, which procedural
> markup is, and system suppliers not wishing to lose business because they
> could not deal with the new published standard. At this moment I can't lay my
> hands on my copy of those proceedings but I do recall that the procedural
> editor being used was WPS-PLUS.
>
> Although I was never involved in that project it did appear to be a
> sensible way to convert from old-fashioned procedural to modern structural
> markup. I jotted down some OPS-5-like rules in an attempt to create an
> expert system that would ease this conversion. And in moments of craziness
> return to them adding more and more special cases to cover how people use
> procedural stuff. I also got stuck on the input phase of dealing with the
> multiplicity of formats being spewed out by Microsoft Word let alone all the
> other proprietary word processing and DTP formats that exist.
>
> Someone else tried a similar project that converted groff/troff files from
> a limited set of ms/mm/etc macros to a more structural form. It may also
> have included some TeX conversion too. This would also have been in the
> early 1990s. I thought it was Eric Raymond but there's nothing on his web
> site about it now. My memory is that the person abandoned the task quite
> quickly because of the complexity of the task. Just what does
> bold-italic-numbered-size 14-roman actually mean in the specific context in
> which it occurs; in one document it might be a procedural rendition of a
> section head but in another the same author might intend it to be emphasis
> (rendered simply as bold).
>
> However, there is a solution to the original poster's request. Include all
> the original markup from the pasted in document under a TeX escape. It won't
> be pretty and as with the work-arounds already being used will require
> manual intervention to convert to LaTeX's pseudo structural markup scheme.
>
> Regards, Trevor.
>
> <>< Re: deemed!
>


Re: Pasting from pdf

2011-05-26 Thread Trevor Jenkins
On Thu, May 26, 2011 at 10:31 AM, Julio Rojas wrote:

> Making some text Bold-Italic-Numbered-Size 14-Roman can be done in LyX,
> but it is much better to say that such text is in a Section environment,
> because when changes on the document style are made, you only have to make
> them once, and not throughout the document, at every instance of said
> format. Finger painting, what you need, is not the way to go in Lyx/Latex.
> It should be avoided like the plague.
>
> I hope you see the need for this approach. Regards.
>

Many years ago (a couple of decades at least) there was a paper presented at
a SigDoc conference, I think it was, on some initial attempts to infer
structural markup from procedural. This project must have been sometime in
the late 1980s (certainly post 1986 and the ISO's publication of ISO 8879
SGML) and the early 1990s. I place it in that time frame because of SGML
being an International Standard and the governmental procurement processes
that mandate the use of standards over ad hoc solutions, which procedural
markup is, and system suppliers not wishing to lose business because they
could not deal with the new published standard. At this moment I can't lay my
hands on my copy of those proceedings but I do recall that the procedural
editor being used was WPS-PLUS.

Although I was never involved in that project it did appear to be a sensible
way to convert from old-fashioned procedural to modern structural markup. I
jotted down some OPS-5-like rules in an attempt to create an expert system
that would ease this conversion. And in moments of craziness return to them
adding more and more special cases to cover how people use procedural stuff.
I also got stuck on the input phase of dealing with the multiplicity of
formats being spewed out by Microsoft Word let alone all the other
proprietary word processing and DTP formats that exist.

Someone else tried a similar project that converted groff/troff files from
a limited set of ms/mm/etc macros to a more structural form. It may also
have included some TeX conversion too. This would also have been in the
early 1990s. I thought it was Eric Raymond but there's nothing on his web
site about it now. My memory is that the person abandoned the task quite
quickly because of the complexity of the task. Just what does
bold-italic-numbered-size 14-roman actually mean in the specific context in
which it occurs; in one document it might be a procedural rendition of a
section head but in another the same author might intend it to be emphasis
(rendered simply as bold).
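
A small Python sketch of why such rules get stuck on exactly this question. The two rules and every name in it are made up for illustration; they are not the OPS-5-like rules mentioned above, nor taken from either project. The point is only that one and the same run of formatting can satisfy more than one rule, and nothing in the formatting says which reading the author meant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Procedural formatting of one run of text (hypothetical model)."""
    bold: bool
    italic: bool
    numbered: bool
    size_pt: int


def candidate_roles(run: Run) -> list[str]:
    """Return every structural role the made-up rules allow for this run."""
    roles = []
    # Rule 1 (assumed): large, bold, numbered text looks like a section head.
    if run.bold and run.numbered and run.size_pt >= 14:
        roles.append("section heading")
    # Rule 2 (assumed): any bold or italic text might be emphasis.
    if run.bold or run.italic:
        roles.append("emphasis")
    return roles or ["plain text"]


# The bold-italic-numbered-size 14 case: both rules fire.
print(candidate_roles(Run(bold=True, italic=True, numbered=True, size_pt=14)))
```

Adding more special cases only adds more rules that can fire at once; choosing between them needs knowledge of the document that the formatting itself does not carry.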

However, there is a solution to the original poster's request. Include all
the original markup from the pasted in document under a TeX escape. It won't
be pretty and as with the work-arounds already being used will require
manual intervention to convert to LaTeX's pseudo structural markup scheme.

Regards, Trevor.

<>< Re: deemed!


Re: Pasting from pdf

2011-05-26 Thread Julio Rojas
I believe, dear Andrew, this is philosophical in nature. LyX is based on
LaTeX, so text is text, independently of how it looks in a PDF. Text changes
its attributes depending on the environment in which it is included, and that
information, I believe, would be impossible to get from a PDF via copy-paste.

Making some text Bold-Italic-Numbered-Size 14-Roman can be done in LyX,
but it is much better to say that such text is in a Section environment,
because when changes on the document style are made, you only have to make
them once, and not throughout the document, at every instance of said
format. Finger painting, what you need, is not the way to go in Lyx/Latex.
It should be avoided like the plague.

I hope you see the need for this approach. Regards.
-
Julio Rojas
jcredbe...@gmail.com


On Thu, May 26, 2011 at 11:17 AM, Andrew Parsloe wrote:

> When I copy text from a pdf to the clipboard and paste into Word 95, Open
> Office or Abiword, the pasted text retains underlining, bolding and emphasis
> (italics). When I paste into LyX these are lost. It would be a real
> enhancement if LyX were to retain these styles. Is there a deep reason for
> LyX's failure here (or the even deeper reason of lack of an interested
> developer)?
>
> At present, I have some journals to edit. The publisher sent them to me as
> pdf documents. I find I can copy a pdf and paste into Abiword, then save
> from Abiword as a Latex document, open that in Notepad++, find-&-replace all
> the extra spacing commands, and finally import that document into LyX. I get
> there in the end, but it's a roundabout process.
>
> Andrew
>


Pasting from pdf

2011-05-26 Thread Andrew Parsloe
When I copy text from a pdf to the clipboard and paste into Word 95, 
Open Office or Abiword, the pasted text retains underlining, bolding and 
emphasis (italics). When I paste into LyX these are lost. It would be a 
real enhancement if LyX were to retain these styles. Is there a deep 
reason for LyX's failure here (or the even deeper reason of lack of an 
interested developer)?


At present, I have some journals to edit. The publisher sent them to me 
as pdf documents. I find I can copy a pdf and paste into Abiword, then 
save from Abiword as a Latex document, open that in Notepad++, 
find-&-replace all the extra spacing commands, and finally import that 
document into LyX. I get there in the end, but it's a roundabout process.
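
The find-&-replace step could be scripted instead of done by hand in Notepad++. What follows is a minimal Python sketch under stated assumptions: the list of spacing commands is a guess (which commands Abiword's LaTeX export actually writes has not been checked and should be read off the exported file), and the function name is hypothetical.

```python
import re

# Assumed spacing commands; adjust after inspecting the exported .tex file.
SPACING_RE = re.compile(
    r"\\[vh]space\*?\{[^{}]*\}"                               # \vspace{..}, \hspace*{..}
    r"|\\(?:bigskip|medskip|smallskip|noindent)\b[ \t]*"      # bare spacing commands
)


def strip_spacing(latex: str) -> str:
    """Remove the assumed spacing commands and tidy the leftover whitespace."""
    cleaned = SPACING_RE.sub("", latex)
    # Drop trailing blanks left where a command used to be.
    cleaned = re.sub(r"[ \t]+\n", "\n", cleaned)
    # Collapse runs of blank lines into a single paragraph break.
    return re.sub(r"\n{3,}", "\n\n", cleaned)


sample = "\\vspace{12pt}\\emph{word} text\\bigskip\n\n\n\nNext paragraph"
print(strip_spacing(sample))
```

Markup such as \emph{...} is left alone, so the emphasis that survived the paste into Abiword is still there when the cleaned file is imported into LyX.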


Andrew