Re: [PATCH] Make HTML Export Work (Bugs 3090, 3047, etc)
On Thu, Jun 14, 2007 at 04:06:16PM -0400, Richard Heck wrote:

> I've added two scripts: a dir_copy.py script, which simply copies the entire temporary directory over to a subdirectory of the intended output directory, and a tex4html_copy.py script, which copies only .png, .html, and .css files, these (I'm pretty sure) being the only kinds of files generated by htlatex. What happens in the end, then, is that if you open /path/to/file/LyXFile.lyx and do File>Export>HTML, you end up with a (possibly new) directory /path/to/file/LyXFile.html.LyXconv/ with all the relevant files in it, rather than, say, scattered across /path/to/file/, which would make it a hassle to move them to a webserver.

When the HTML converter is not htlatex, why don't you simply take a snapshot of the files that are in the temp dir just before calling the converter, put their names in a file, say FilesToNotBeCopied, and then use an html_copy.py script that copies only those files that are not listed in FilesToNotBeCopied?

You may also want to check whether one of the `files' created by the converter is instead a directory and properly copy that, too. And then remove the `files' that get copied, of course.

-- 
Enrico
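Editorial sketch of the snapshot idea described above, under stated assumptions: the function names `snapshot` and `copy_new_entries` are hypothetical, the snapshot is held in memory rather than written to a FilesToNotBeCopied file, and `dirs_exist_ok` requires Python 3.8 or later. It is not the script from the patch. A name-based snapshot like this cannot detect a pre-existing file that the converter overwrites.

```python
import os
import shutil


def snapshot(directory):
    """Return the set of entry names present in directory right now."""
    return set(os.listdir(directory))


def copy_new_entries(temp_dir, dest_dir, before):
    """Copy every entry of temp_dir whose name is not in `before`.

    Directories created by the converter are copied recursively.
    Returns the sorted list of copied entry names.
    """
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(temp_dir)):
        if name in before:
            continue  # existed before the converter ran
        src = os.path.join(temp_dir, name)
        dst = os.path.join(dest_dir, name)
        if os.path.isdir(src):
            shutil.copytree(src, dst, dirs_exist_ok=True)
        else:
            shutil.copy2(src, dst)
        copied.append(name)
    return copied
```

The intended order of use would be: call `snapshot(temp_dir)`, run the converter, then call `copy_new_entries(temp_dir, dest_dir, before)`.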
Re: [PATCH] Make HTML Export Work (Bugs 3090, 3047, etc)
Enrico Forestieri wrote:

> On Thu, Jun 14, 2007 at 04:06:16PM -0400, Richard Heck wrote:
>
> > I've added two scripts: a dir_copy.py script, which simply copies the entire temporary directory over to a subdirectory of the intended output directory, and a tex4html_copy.py script, which copies only .png, .html, and .css files, these (I'm pretty sure) being the only kinds of files generated by htlatex. What happens in the end, then, is that if you open /path/to/file/LyXFile.lyx and do File>Export>HTML, you end up with a (possibly new) directory /path/to/file/LyXFile.html.LyXconv/ with all the relevant files in it, rather than, say, scattered across /path/to/file/, which would make it a hassle to move them to a webserver.
>
> When the HTML converter is not htlatex, why don't you simply take a snapshot of the files that are in the temp dir just before calling the converter, put their names in a file, say FilesToNotBeCopied, and then use an html_copy.py script that copies only those files that are not listed in FilesToNotBeCopied?

Yes, we discussed this before, and I thought about that, but there are two problems. One is that we don't know that none of the files generated by the HTML converter overwrite files that are already present. I don't know that this would be a common problem, but it's possible. I had proposed checking the timestamps to avoid this problem, but that turned out to be useless because of the granularity of the timestamps.

The other is that it involves messing with Converters.cpp, which is what I was kind of trying not to do. And we don't want to check there what the converter is, so we'd have to generate this file all the time. I guess there could be a special flag for that, but that just seems so messy. The better solution would be for me to find out what latex2html generates, then write a special script for it.

> You may also want to check whether one of the `files' created by the converter is instead a directory and properly copy that, too.

I'll add that to the dir_copy.py program. It actually means I can just use copytree(), so everything gets simpler.

> And then remove the `files' that get copied, of course.

Yes, of course, if we go that way.

Richard

-- 
==
Richard G Heck, Jr
Professor of Philosophy
Brown University
http://frege.brown.edu/heck/
==
Get my public key from http://sks.keyserver.penguin.de
Hash: 0x1DE91F1E66FFBDEC
Learn how to sign your email using Thunderbird and GnuPG at:
http://dudu.dyn.2-h.org/nist/gpg-enigmail-howto
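Editorial sketch of what a copytree()-based directory copy might look like. This is an assumption about the shape of dir_copy.py, not its actual contents: the function name `copy_tree` is hypothetical, and merging into an existing destination via `dirs_exist_ok=True` (Python 3.8+) is a choice made here for illustration.

```python
import os
import shutil


def copy_tree(src_dir, dest_dir):
    """Recursively copy src_dir to dest_dir, subdirectories included.

    With dirs_exist_ok=True an existing destination is merged into,
    so files from an earlier export that are no longer generated
    would be left in place.
    """
    if not os.path.isdir(src_dir):
        raise ValueError("not a directory: %s" % src_dir)
    shutil.copytree(src_dir, dest_dir, dirs_exist_ok=True)
    return dest_dir
```

Because shutil.copytree() recurses on its own, no separate check for converter-created subdirectories is needed, which matches the "everything gets simpler" remark.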
Re: [PATCH] Make HTML Export Work (Bugs 3090, 3047, etc)
On Thu, Jun 14, 2007 at 06:49:17PM -0400, Richard Heck wrote:

> Enrico Forestieri wrote:
>
> > On Thu, Jun 14, 2007 at 04:06:16PM -0400, Richard Heck wrote:
> >
> > > I've added two scripts: a dir_copy.py script, which simply copies the entire temporary directory over to a subdirectory of the intended output directory, and a tex4html_copy.py script, which copies only .png, .html, and .css files, these (I'm pretty sure) being the only kinds of files generated by htlatex. What happens in the end, then, is that if you open /path/to/file/LyXFile.lyx and do File>Export>HTML, you end up with a (possibly new) directory /path/to/file/LyXFile.html.LyXconv/ with all the relevant files in it, rather than, say, scattered across /path/to/file/, which would make it a hassle to move them to a webserver.
> >
> > When the HTML converter is not htlatex, why don't you simply take a snapshot of the files that are in the temp dir just before calling the converter, put their names in a file, say FilesToNotBeCopied, and then use an html_copy.py script that copies only those files that are not listed in FilesToNotBeCopied?
>
> Yes, we discussed this before, and I thought about that, but there are two problems. One is that we don't know that none of the files generated by the HTML converter overwrite files that are already present. I don't know that this would be a common problem, but it's possible. I had proposed checking the timestamps to avoid this problem, but that turned out to be useless because of the granularity of the timestamps.

On POSIX systems the granularity is 1 second; on Windows with FAT it is 2 seconds. So, what about creating a file, taking its timestamp, waiting for 2 seconds, and then calling the converter?

> The other is that it involves messing with Converters.cpp, which is what I was kind of trying not to do. And we don't want to check there what the converter is, so we'd have to generate this file all the time. I guess there could be a special flag for that, but that just seems so messy. The better solution would be for me to find out what latex2html generates, then write a special script for it.

This is wrong, as you also have to take into account tth and hevea, and I am sure that a user could use some other converter that you don't know about.

> > You may also want to check whether one of the `files' created by the converter is instead a directory and properly copy that, too.
>
> I'll add that to the dir_copy.py program. It actually means I can just use copytree(), so everything gets simpler.

Yes, simpler, but this way you are going to copy a lot of trash, and I am not sure that a casual user is able to sort out the mess.

-- 
Enrico
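Editorial sketch of the marker-file idea suggested above: create a file, record its modification time, wait longer than the coarsest expected timestamp granularity, run the converter, and afterwards treat anything with a newer modification time as converter output. The names `make_marker`, `modified_since`, and the marker file name are hypothetical, and the 2-second default simply mirrors the figure quoted in the message.

```python
import os
import time


def make_marker(temp_dir, wait_seconds=2.0):
    """Create a marker file, return its mtime, then wait.

    The wait is meant to ensure that anything written afterwards
    gets a strictly later timestamp, even on filesystems with
    coarse timestamp granularity.
    """
    marker = os.path.join(temp_dir, "timestamp_marker")  # hypothetical name
    with open(marker, "w"):
        pass
    stamp = os.path.getmtime(marker)
    time.sleep(wait_seconds)
    return stamp


def modified_since(temp_dir, stamp):
    """Return sorted names of entries modified strictly after stamp."""
    return sorted(
        name
        for name in os.listdir(temp_dir)
        if os.path.getmtime(os.path.join(temp_dir, name)) > stamp
    )
```

The converter would be run between the two calls. Unlike a name-based snapshot, this comparison also picks up pre-existing files that the converter rewrites, at the cost of the fixed delay.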
Re: [PATCH] Make HTML Export Work (Bugs 3090, 3047, etc)
Richard Heck wrote:

> The attached patch should finally make HTML export work properly.

As I wrote in bug 3090, the bug no longer appears with MiKTeX 2.6 and tex4ht, but to ensure that it works with all configurations, your solution is the right one.

Some annotations:

- htlatex is always invoked, which unnecessarily costs a lot of time: open Intro.lyx and view it as HTML, then view it again, and htlatex is run again although nothing has changed in the document in the meantime. (When viewing the document as PDF, for example, pdflatex is correctly not invoked for the second view.) I don't know why this happens only when viewing HTML.

- When I export Intro.lyx, the result is stored on the hard disk under this folder name: Intro.html.LyXconv. I would prefer to name it just Intro.html, or is there a special reason for the current name?

- As viewing HTML works now, we should think about a button for HTML in the view toolbar.

- (
  > +# author Angus Leeming
  > +# author Georg Baum
  > +# author Richard Heck

  Aren't you only the author of the two new scripts, or did the three of you work on a solution over the last few days and I missed it?
  )

All in all, a nice solution. It would be good if the first issue could be fixed, but in any case I give my OK to put it in.

Many thanks for your perseverance with this issue!

best regards
Uwe
Re: [PATCH] Make HTML Export Work (Bugs 3090, 3047, etc)
Enrico Forestieri wrote:

> On Thu, Jun 14, 2007 at 06:49:17PM -0400, Richard Heck wrote:
>
> > Enrico Forestieri wrote:
> >
> > > On Thu, Jun 14, 2007 at 04:06:16PM -0400, Richard Heck wrote:
> > >
> > > > I've added two scripts: a dir_copy.py script, which simply copies the entire temporary directory over to a subdirectory of the intended output directory, and a tex4html_copy.py script, which copies only .png, .html, and .css files, these (I'm pretty sure) being the only kinds of files generated by htlatex. What happens in the end, then, is that if you open /path/to/file/LyXFile.lyx and do File>Export>HTML, you end up with a (possibly new) directory /path/to/file/LyXFile.html.LyXconv/ with all the relevant files in it, rather than, say, scattered across /path/to/file/, which would make it a hassle to move them to a webserver.
> > >
> > > When the HTML converter is not htlatex, why don't you simply take a snapshot of the files that are in the temp dir just before calling the converter, put their names in a file, say FilesToNotBeCopied, and then use an html_copy.py script that copies only those files that are not listed in FilesToNotBeCopied?
> >
> > Yes, we discussed this before, and I thought about that, but there are two problems. One is that we don't know that none of the files generated by the HTML converter overwrite files that are already present. I don't know that this would be a common problem, but it's possible. I had proposed checking the timestamps to avoid this problem, but that turned out to be useless because of the granularity of the timestamps.
>
> On POSIX systems the granularity is 1 second; on Windows with FAT it is 2 seconds. So, what about creating a file, taking its timestamp, waiting for 2 seconds, and then calling the converter?

This seems an awful waste of time. I suppose if we were doing this in the background it wouldn't be so bad, but that's not how it's presently done.

But anyway, there's another and to my mind fatal problem, the one that got me going in this direction in the first place. Suppose you View HTML before you Export HTML. Then all the files that the converter will generate are already present, and nothing will be exported. I'm sure there's some way to work around that, but, again, it seems to me that it's getting very messy for what is, in reality, a very special case.

> > The other is that it involves messing with Converters.cpp, which is what I was kind of trying not to do. And we don't want to check there what the converter is, so we'd have to generate this file all the time. I guess there could be a special flag for that, but that just seems so messy. The better solution would be for me to find out what latex2html generates, then write a special script for it.
>
> This is wrong, as you also have to take into account tth and hevea, and I am sure that a user could use some other converter that you don't know about.

Right, of course. But we can check for the ones we do know about and take appropriate action. That said, since we're just copying on the basis of extensions, maybe there should be an extra argument for that, and then we don't have to write extra scripts. The copier just becomes:

    python -tt ext_copier.py -e png,css,html $$i $$o

What do you think?

Richard
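Editorial sketch of how an extension-driven copier along the lines of the proposed `ext_copier.py -e png,css,html` might be structured. Everything beyond the `-e` option and its comma-separated value is an assumption: the function names are hypothetical, and the two positional arguments are treated here as source and destination directories for simplicity, which may differ from what LyX actually substitutes for $$i and $$o.

```python
import argparse
import os
import shutil


def copy_by_extension(src_dir, dest_dir, extensions):
    """Copy regular files in src_dir whose extension is in `extensions`.

    Extensions are given without a leading dot ("png", "css", ...)
    and compared case-insensitively. Returns the sorted copied names.
    """
    wanted = {"." + ext.lower().lstrip(".") for ext in extensions}
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if os.path.isfile(src) and os.path.splitext(name)[1].lower() in wanted:
            shutil.copy2(src, os.path.join(dest_dir, name))
            copied.append(name)
    return copied


def main(argv):
    """Parse '-e ext1,ext2 SRC_DIR DEST_DIR' and perform the copy."""
    parser = argparse.ArgumentParser(description="Copy files by extension.")
    parser.add_argument("-e", dest="extensions", default="html",
                        help="comma-separated list of extensions to copy")
    parser.add_argument("src_dir")
    parser.add_argument("dest_dir")
    args = parser.parse_args(argv)
    return copy_by_extension(args.src_dir, args.dest_dir,
                             args.extensions.split(","))
```

A script entry point would call `main(sys.argv[1:])`. With one generic copier, supporting another converter becomes a matter of changing the `-e` list in the converter definition rather than writing a new script.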
Re: [PATCH] Make HTML Export Work (Bugs 3090, 3047, etc)
Uwe Stöhr wrote:

> Richard Heck wrote:
>
> > The attached patch should finally make HTML export work properly.
>
> As I wrote in bug 3090, the bug no longer appears with MiKTeX 2.6 and tex4ht, but to ensure that it works with all configurations, your solution is the right one.
>
> Some annotations:
>
> - htlatex is always invoked, which unnecessarily costs a lot of time: open Intro.lyx and view it as HTML, then view it again, and htlatex is run again although nothing has changed in the document in the meantime. (When viewing the document as PDF, for example, pdflatex is correctly not invoked for the second view.) I don't know why this happens only when viewing HTML.

Because caching doesn't work properly in that case. I think the problem is that the caching mechanism basically assumes that only one file is being generated. File a new bug report about this, I'd say.

> - When I export Intro.lyx, the result is stored on the hard disk under this folder name: Intro.html.LyXconv. I would prefer to name it just Intro.html, or is there a special reason for the current name?

I was worried about collisions with Intro.html, should it exist. The idea was to use an easily identifiable directory name that is unlikely to be present otherwise. I didn't want to have it be Intro.html.1181865626 (seconds since the epoch) or whatever, though that would be safer.

> - As viewing HTML works now, we should think about a button for HTML in the view toolbar.

Absolutely. Not to mention a shortcut, if there isn't one.

> > +# author Angus Leeming
> > +# author Georg Baum
> > +# author Richard Heck
>
> Aren't you only the author of the two new scripts, or did the three of you work on a solution over the last few days and I missed it?

They're just adaptations of the old ones. Very simple ones. So I'm borrowing.

> All in all, a nice solution.

Thanks!

Richard