Re: extracting text from docx files

2011-08-11 Thread Polytropon
On Tue, 9 Aug 2011 21:16:11 +0200, Christian Barthel wrote:
 On Tue, Aug 09, 2011 at 02:36:32PM +0100, Anton Shterenlikht wrote:
  I often receive information in *.docx format
  from my MS using colleagues. Sometimes I can
  ask for a pdf (or similar) instead, but not always.
 
 You have a lot of nice options: 
 - Force them to use BSD/Linux ;)
 - explain to them why docx is shit!
 - don't read it

I also suggest combining this with reading the following
article:

http://en.nothingisreal.com/wiki/Please_don't_send_me_Microsoft_Word_documents

It's very polite and precise about why using DOC files
is generally a bad idea. It can be easily concluded that
it also applies to DOCX files.

The document also discusses alternatives.



-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Re: extracting text from docx files

2011-08-11 Thread Anton Shterenlikht
On Thu, Aug 11, 2011 at 12:14:51PM +0200, Polytropon wrote:
 On Tue, 9 Aug 2011 21:16:11 +0200, Christian Barthel wrote:
  On Tue, Aug 09, 2011 at 02:36:32PM +0100, Anton Shterenlikht wrote:
   I often receive information in *.docx format
   from my MS using colleagues. Sometimes I can
   ask for a pdf (or similar) instead, but not always.
  
  You have a lot of nice options: 
  - Force them to use BSD/Linux ;)
  - explain to them why docx is shit!
  - don't read it
 
 I also suggest combining this with reading the following
 article:
 
 http://en.nothingisreal.com/wiki/Please_don't_send_me_Microsoft_Word_documents
 
 It's very polite and precise about why using DOC files
 is generally a bad idea. It can be easily concluded that
 it also applies to DOCX files.
 
 The document also discusses alternatives.

That's not my war. It's not going to achieve
much, me telling all our admin and academic
staff that what they were taught throughout
their career might not be the ideal, or even
the only, tool in the universe.
Sometimes I can request pdf, sometimes I fail.

I also sometimes try to get pdf from various
UK govt departments. Sometimes they only
make documents available in MS formats.
Again, sometimes they respond well, but
mostly, they ignore my requests.

By the way, I tried abiword, and it couldn't
open my docx.

-- 
Anton Shterenlikht
Room 2.6, Queen's Building
Mech Eng Dept
Bristol University
University Walk, Bristol BS8 1TR, UK
Tel: +44 (0)117 331 5944
Fax: +44 (0)117 929 4423


Re: extracting text from docx files

2011-08-11 Thread Ruben de Groot

There are several docx converters online (google docx2pdf). Haven't tried them
though. LibreOffice handles docx quite well.

On Thu, Aug 11, 2011 at 12:22:22PM +0100, Anton Shterenlikht typed:
 On Thu, Aug 11, 2011 at 12:14:51PM +0200, Polytropon wrote:
  On Tue, 9 Aug 2011 21:16:11 +0200, Christian Barthel wrote:
   On Tue, Aug 09, 2011 at 02:36:32PM +0100, Anton Shterenlikht wrote:
    I often receive information in *.docx format
    from my MS using colleagues. Sometimes I can
    ask for a pdf (or similar) instead, but not always.
   
   You have a lot of nice options: 
   - Force them to use BSD/Linux ;)
   - explain to them why docx is shit!
   - don't read it
  
  I also suggest combining this with reading the following
  article:
  
  http://en.nothingisreal.com/wiki/Please_don't_send_me_Microsoft_Word_documents
  
  It's very polite and precise about why using DOC files
  is generally a bad idea. It can be easily concluded that
  it also applies to DOCX files.
  
  The document also discusses alternatives.
 
 That's not my war. It's not going to achieve
 much, me telling all our admin and academic
 staff that what they were taught throughout
 their career might not be the ideal, or even
 the only, tool in the universe.
 Sometimes I can request pdf, sometimes I fail.
 
 I also sometimes try to get pdf from various
 UK govt departments. Sometimes they only
 make documents available in MS formats.
 Again, sometimes they respond well, but
 mostly, they ignore my requests.
 
 By the way, I tried abiword, and it couldn't
 open my docx.
 
 -- 
 Anton Shterenlikht
 Room 2.6, Queen's Building
 Mech Eng Dept
 Bristol University
 University Walk, Bristol BS8 1TR, UK
 Tel: +44 (0)117 331 5944
 Fax: +44 (0)117 929 4423


extracting text from docx files

2011-08-09 Thread Anton Shterenlikht
I often receive information in *.docx format
from my MS-using colleagues. Sometimes I can
ask for a pdf (or similar) instead, but not always.

Usually I unzip a docx and then search
through all *.xml files to find the
useful data. However, I can't find any
xml styles to use, so I have to convert
the relevant xml file(s) to plain text
by hand. I wonder if anybody can suggest
a better way. Perhaps there's something
in ports that can help.
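
The manual unzip-and-search routine above can be sketched in a few lines of Python using only the standard library. This is a minimal sketch, not a complete converter. It assumes the body text is in word/document.xml inside the archive and that paragraphs and text runs use the usual WordprocessingML namespace. Neither detail is stated in this thread, and the function name docx_to_text is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumption: the standard WordprocessingML namespace used in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the body text of a .docx file, one line per paragraph.

    `path` may be a file name or a file-like object (hypothetical helper).
    """
    # A .docx file is a zip archive; the main body is assumed to be
    # stored in the member word/document.xml.
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    lines = []
    for paragraph in root.iter(W + "p"):
        # The visible text sits in w:t elements inside each w:p paragraph.
        runs = [node.text or "" for node in paragraph.iter(W + "t")]
        lines.append("".join(runs))
    return "\n".join(lines)
```

Calling docx_to_text on a file name and printing the result gives plain text without any formatting. Headers, footnotes and tables are not handled specially, and text inside nested paragraphs (text boxes, for example) may show up twice.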

Many thanks
Anton


-- 
Anton Shterenlikht
Room 2.6, Queen's Building
Mech Eng Dept
Bristol University
University Walk, Bristol BS8 1TR, UK
Tel: +44 (0)117 331 5944
Fax: +44 (0)117 929 4423


Re: extracting text from docx files

2011-08-09 Thread Rod Person
On Tue, 9 Aug 2011 14:36:32 +0100
Anton Shterenlikht me...@bristol.ac.uk wrote:

 Usually I unzip a docx and then search
 through all *xml  files to find the
 useful data. However, I can't find any
 xml styles to use, so I have to convert
 the relevant xml file(s) to plain text
 by hand. I wonder if anybody can suggest
 a better way. Perhaps there's something
 in ports that can help.

You could try this for just plain text conversion
http://docx2txt.sourceforge.net/

-- 
Rod


Re: extracting text from docx files

2011-08-09 Thread Anton Shterenlikht
On Tue, Aug 09, 2011 at 09:40:26AM -0400, Rod Person wrote:
 On Tue, 9 Aug 2011 14:36:32 +0100
 Anton Shterenlikht me...@bristol.ac.uk wrote:
 
  Usually I unzip a docx and then search
  through all *xml  files to find the
  useful data. However, I can't find any
  xml styles to use, so I have to convert
  the relevant xml file(s) to plain text
  by hand. I wonder if anybody can suggest
  a better way. Perhaps there's something
  in ports that can help.
 
 You could try this for just plain text conversion
 http://docx2txt.sourceforge.net/

Thank you
Anton

-- 
Anton Shterenlikht
Room 2.6, Queen's Building
Mech Eng Dept
Bristol University
University Walk, Bristol BS8 1TR, UK
Tel: +44 (0)117 331 5944
Fax: +44 (0)117 929 4423


Re: extracting text from docx files

2011-08-09 Thread Kurt Buff
On Tue, Aug 9, 2011 at 06:36, Anton Shterenlikht me...@bristol.ac.uk wrote:
 I often receive information in *.docx format
 from my MS using colleagues. Sometimes I can
 ask for a pdf (or similar) instead, but not always.

 Usually I unzip a docx and then search
 through all *xml  files to find the
 useful data. However, I can't find any
 xml styles to use, so I have to convert
 the relevant xml file(s) to plain text
 by hand. I wonder if anybody can suggest
 a better way. Perhaps there's something
 in ports that can help.

My installation of OpenOffice 3.3 on my Win7 machine will open a
Winword 2010 .docx file.

I'm guessing it will do the same on FreeBSD, but I don't have an
install with a GUI running at the moment.

Kurt


Re: extracting text from docx files

2011-08-09 Thread Matthias Apitz
On Tuesday, August 09, 2011 at 10:25:30AM -0700, Kurt Buff wrote:

 My installation of OpenOffice 3.3 on my Win7 machine will open a
 Winword 2010 .docx file.
 
 I'm guessing it will do the same on FreeBSD, but I don't have an
 install with a GUI running at the moment.

It does, using OpenOffice 3.4.0 in 9-CURRENT.

matthias
-- 
Matthias Apitz
t +49-89-61308 351 - f +49-89-61308 399 - m +49-170-4527211
e g...@unixarea.de - w http://www.unixarea.de/


Re: extracting text from docx files

2011-08-09 Thread Christian Barthel
On Tue, Aug 09, 2011 at 02:36:32PM +0100, Anton Shterenlikht wrote:
 I often receive information in *.docx format
 from my MS using colleagues. Sometimes I can
 ask for a pdf (or similar) instead, but not always.

You have a lot of nice options: 
- Force them to use BSD/Linux ;)
- explain to them why docx is shit!
- don't read it

 
 Usually I unzip a docx and then search
 through all *xml  files to find the
 useful data. However, I can't find any
 xml styles to use, so I have to convert
 the relevant xml file(s) to plain text
 by hand. I wonder if anybody can suggest
 a better way. Perhaps there's something
 in ports that can help.

But if you really, really need to read docx, you can try the web
application from Microsoft. A few months ago, I also got a lot of docx
files and opened them with the Microsoft web app; this worked for me to
extract the information...

More information: 
http://office.microsoft.com/en-us/web-apps/

The downside: you have to sign up for a Microsoft service :(

cheers

-- 
Christian Barthel 
Public-Key: http://bc.user-mode.org/bc.asc 
Mail: b...@nyx.user-mode.org
Web: http://bc.user-mode.org


Re: extracting text from docx files

2011-08-09 Thread Antonio Olivares
 But if you really, really need to read docx, you can try the web
 application from Microsoft. A few months ago, I got also a lot of docx
 and I opened it with the microsoft web app; this worked for me to extract
 the information...

 More information:
 http://office.microsoft.com/en-us/web-apps/

 The downside:  you have to sign up on a microsoft service :(


You can also use LibreOffice. It is in the ports system :)

Without installing anything, Google Docs also opens *.docx files, if
needed. There are other options too, but it depends on whether Anton
wants to install something or just view and extract the text.

Regards,

Antonio


Re: extracting text from docx files

2011-08-09 Thread Christian Barthel
On Tue, Aug 09, 2011 at 02:57:51PM -0500, Antonio Olivares wrote:
  But if you really, really need to read docx, you can try the web
  application from Microsoft. A few months ago, I got also a lot of docx
  and I opened it with the microsoft web app; this worked for me to extract
  the information...
 
  More information:
  http://office.microsoft.com/en-us/web-apps/
 
  The downside: you have to sign up on a microsoft service :(
 
 
 Can also use libreoffice.  It is in the ports system :)

Sure. But libreoffice is a matter of opinion. *I* would never ever
install this bloated, buggy software product @_@ 

But I must admit that I am very spoiled: vim + LaTeX _rocks_ 

 
 Without installing anything, Google Docs also opens *.docx files, if
 needed. There are other options too, but it depends on what Anton
 wants to install* or just view*  extract?

I have a google account but I never used Google Docs. Nice to know...

 
 Regards,
 
 Antonio

-- 
Christian Barthel 
Public-Key: http://bc.user-mode.org/bc.asc 
Mail: b...@nyx.user-mode.org
Web: http://bc.user-mode.org


Re: extracting text from docx files

2011-08-09 Thread Alejandro Imass
On Tue, Aug 9, 2011 at 3:57 PM, Antonio Olivares
olivares14...@gmail.com wrote:
 But if you really, really need to read docx, you can try the web
 application from Microsoft. A few months ago, I got also a lot of docx
 and I opened it with the microsoft web app; this worked for me to extract
 the information...


Just a thought here, but if docx is XML, why not find or build some
XSLT that extracts what you need into another format?
You probably already have libxml2 and libxslt on your system, along with
the command-line utility xsltproc.
There are probably existing XSLT stylesheets that transform to RTF and plain text.
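
To show what such a transformation might look like, here is a rough sketch of the same element-to-text mapping. Note that it is written in Python with the standard library, not in XSLT: each entry in RULES plays the role of one XSLT template. The member name word/document.xml, the namespace, and the element names w:t, w:tab, w:br and w:p are assumptions about the docx layout and do not come from this thread. The names RULES, transform and docx_to_text_rules are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumption: the standard WordprocessingML namespace.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# One rule per element, comparable to one XSLT template per match pattern.
RULES = {
    W + "t": lambda node: node.text or "",   # literal text run
    W + "tab": lambda node: "\t",            # tab character
    W + "br": lambda node: "\n",             # line break inside a paragraph
}


def transform(node):
    """Apply the rules to `node` and its descendants in document order."""
    rule = RULES.get(node.tag)
    if rule is not None:
        return rule(node)
    # No rule for this element: descend into its children, as an XSLT
    # stylesheet would do with apply-templates.
    text = "".join(transform(child) for child in node)
    # End every paragraph with a newline.
    if node.tag == W + "p":
        text += "\n"
    return text


def docx_to_text_rules(path):
    """Extract plain text from a .docx file using the rules above."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return transform(root)
```

Elements without a rule contribute only what their children produce, so markup such as run properties is dropped silently. Adding support for another element means adding one entry to RULES.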

--
Alejandro Imass


Re: extracting text from docx files

2011-08-09 Thread Anton Shterenlikht
On Tue, Aug 09, 2011 at 02:57:51PM -0500, Antonio Olivares wrote:
  But if you really, really need to read docx, you can try the web
  application from Microsoft. A few months ago, I got also a lot of docx
  and I opened it with the microsoft web app; this worked for me to extract
  the information...
 
  More information:
  http://office.microsoft.com/en-us/web-apps/
 
  The downside: you have to sign up on a microsoft service :(
 
 
 Can also use libreoffice.  It is in the ports system :)
 
 Without installing anything, Google Docs also opens *.docx files, if
 needed. There are other options too, but it depends on what Anton
 wants to install* or just view*  extract?

Well.. I don't really want to install anything
just to read docx. So probably something as
small as possible. libreoffice (even if it's in ports,
which I dearly love) looks like a monster of
a package, so I'm not sure.

Thanks anyway


-- 
Anton Shterenlikht
Room 2.6, Queen's Building
Mech Eng Dept
Bristol University
University Walk, Bristol BS8 1TR, UK
Tel: +44 (0)117 331 5944
Fax: +44 (0)117 929 4423


Re: extracting text from docx files

2011-08-09 Thread Antonio Olivares
 Well.. I don't really want to install anything
 just to read docx. So probably something as
 small as possible. libreoffice (even if it's in ports,
 which I dearly love) looks like a monster of
 a package, so I'm not sure.

 Thanks anyway



abiword is a word processor that opens docx files, and is in the ports :)
You are welcome to check it out :)  I mentioned libreoffice because it
is a full suite but it is BIG :(

It is not a MONSTER :)

Regards,

Antonio


Re: extracting text from docx files

2011-08-09 Thread Warren Block

On Tue, 9 Aug 2011, Anton Shterenlikht wrote:


 Well.. I don't really want to install anything
 just to read docx. So probably something as
 small as possible. libreoffice (even if it's in ports,
 which I dearly love) looks like a monster of
 a package, so I'm not sure.


Although still relatively large, OpenOffice has fewer dependencies than 
LibreOffice.  My system has OO.o 3.3 installed, and 'make missing' shows 
seventeen new dependencies needed by LibreOffice.



Re: extracting text from docx files

2011-08-09 Thread Chris Hill

On Tue, 9 Aug 2011, Anton Shterenlikht wrote:


 On Tue, Aug 09, 2011 at 02:57:51PM -0500, Antonio Olivares wrote:
   But if you really, really need to read docx, you can try the web
   application from Microsoft. A few months ago, I got also a lot of docx
   and I opened it with the microsoft web app; this worked for me to extract
   the information...
  
   More information:
   http://office.microsoft.com/en-us/web-apps/
  
   The downside: you have to sign up on a microsoft service :(
  
  Can also use libreoffice.  It is in the ports system :)
  
  Without installing anything, Google Docs also opens *.docx files, if
  needed. There are other options too, but it depends on what Anton
  wants to install* or just view*  extract?
 
 Well.. I don't really want to install anything
 just to read docx. So probably something as
 small as possible. libreoffice (even if it's in ports,
 which I dearly love) looks like a monster of
 a package, so I'm not sure.


Maybe an online service? If you don't have too many to convert at one 
time, and there's nothing secret in them, you could try 
http://www.doc2pdf.net/ - I've never used it, so caveat clicktor.


--
Chris Hill   ch...@monochrome.org
** [ Busy Expunging / ]