Re: [PHP] storing searching docs

2012-12-15 Thread tamouse mailing lists
On Dec 13, 2012 4:50 PM, Jim Giner jim.gi...@albanyhandball.com wrote:

> Thanks for all the posts. After reading and googling all afternoon, I
> think the best approach for me is:
>
> Create two macros in Word (done!) to export each of my .doc files to .txt
> and .pdf formats.
>
> Create an SQL table to hold the .txt contents of my .doc files, along with
> a reference to the meeting date and the name of the corresponding .pdf file.
>
> Upload my two sets of files with an FTP client and then use a script to
> load the table with my .txt file data.


Why not use php to upload the set of files?
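For the "use a script to load the table with my .txt file data" step quoted above, a minimal PHP sketch could look like the following. It assumes several things the thread does not specify: a hypothetical `meeting_minutes` table with `meeting_date`, `pdf_file` and `body` columns, an existing PDO connection, and a hypothetical naming scheme where each export is named after its meeting date (e.g. `2012-12-13.txt` next to `2012-12-13.pdf`).

```php
<?php
// Sketch: bulk-load Word's .txt exports into a database table.
// Assumptions (hypothetical, not from the thread): files are named YYYY-MM-DD.txt,
// the matching PDF has the same base name, and the table is
// meeting_minutes(meeting_date DATE, pdf_file VARCHAR, body TEXT).

/** Extract a valid YYYY-MM-DD date from a file name, or return null. */
function meetingDateFromFilename(string $file): ?string
{
    if (!preg_match('/(\d{4})-(\d{2})-(\d{2})/', basename($file), $m)) {
        return null;
    }
    // Reject impossible dates such as 2012-02-30.
    return checkdate((int) $m[2], (int) $m[3], (int) $m[1])
        ? "{$m[1]}-{$m[2]}-{$m[3]}"
        : null;
}

/** Insert one row per .txt file found in $txtDir; returns the number of rows inserted. */
function loadMinutes(PDO $db, string $txtDir): int
{
    $stmt = $db->prepare(
        'INSERT INTO meeting_minutes (meeting_date, pdf_file, body) VALUES (?, ?, ?)'
    );
    $count = 0;
    foreach (glob(rtrim($txtDir, '/') . '/*.txt') ?: [] as $txtFile) {
        $date = meetingDateFromFilename($txtFile);
        if ($date === null) {
            continue; // skip files that don't follow the naming scheme
        }
        $pdfFile = basename($txtFile, '.txt') . '.pdf';
        // Word's plain-text export may not be UTF-8; convert first if your table expects it.
        $stmt->execute([$date, $pdfFile, file_get_contents($txtFile)]);
        $count++;
    }
    return $count;
}

// Usage (hypothetical DSN and credentials):
// $db = new PDO('mysql:host=localhost;dbname=minutes', $user, $pass,
//               [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
// echo loadMinutes($db, '/path/to/txt'), " files loaded\n";
```

Once the files are on the server (by FTP or otherwise), the loader is run once for the backlog and again for each new document.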

> Now I just need a couple of scripts. The first would let a user locate a
> file and bring up the PDF when he wants to read about a meeting. The second
> would accept user input (search words), perform a query against the
> textual data, and present some kind of results: probably a listing
> containing a reference to the meeting date and a string of TBD length
> showing the matching text for each occurrence, i.e., something like n chars
> before and after the match so the user can see its context.
>
> Sizes: a 28 KB .doc file grows to 142 KB in .pdf format and is only 5 KB in
> .txt format. (Actually, if I "print" the .doc as a PDF instead of using
> Word's File > Save As, the resulting PDF is only 70 KB. Might need a
> new macro!)
>
> Thanks again!


 --
 PHP General Mailing List (http://www.php.net/)
 To unsubscribe, visit: http://www.php.net/unsub.php
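The second script Jim describes (search words in, a listing with n characters of context around each match) could be sketched in plain PHP as below. This is a byte-based sketch that suits ASCII or single-byte text; for UTF-8 you would swap in `mb_stripos`, `mb_substr` and `mb_strlen`. The function name and the `...` markers are illustrative choices, not anything from the thread.

```php
<?php
/**
 * Return one snippet per case-insensitive occurrence of $term in $text,
 * with up to $n characters of context on each side. "..." marks trimmed ends.
 */
function contextSnippets(string $text, string $term, int $n = 40): array
{
    $snippets = [];
    if ($term === '') {
        return $snippets; // an empty needle would "match" at every offset
    }
    $textLen = strlen($text);
    $termLen = strlen($term);
    $offset = 0;
    while (($pos = stripos($text, $term, $offset)) !== false) {
        $start = max(0, $pos - $n);
        $end = min($textLen, $pos + $termLen + $n);
        $snippets[] = ($start > 0 ? '...' : '')
            . substr($text, $start, $end - $start)
            . ($end < $textLen ? '...' : '');
        $offset = $pos + $termLen; // continue searching after this match
    }
    return $snippets;
}

// contextSnippets('The board met Monday. The board approved the budget.', 'BOARD', 4)
// returns ['The board met...', '...The board app...']
```

Each returned snippet can be printed next to the meeting date and a link to the corresponding PDF.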



Re: [PHP] storing searching docs

2012-12-15 Thread tamouse mailing lists
On Dec 13, 2012 4:50 PM, Jim Giner jim.gi...@albanyhandball.com wrote:

> Sizes: a 28 KB .doc file grows to 142 KB in .pdf format and is only 5 KB in
> .txt format. (Actually, if I "print" the .doc as a PDF instead of using
> Word's File > Save As, the resulting PDF is only 70 KB. Might need a
> new macro!)


PDF might be better looking than this, but how big is an HTML doc exported
from Word?




Re: [PHP] storing searching docs

2012-12-15 Thread tamouse mailing lists
On Dec 15, 2012 7:29 AM, tamouse mailing lists tamouse.li...@gmail.com
wrote:


> PDF might be better looking than this, but how big is an HTML doc
> exported from Word?

Sorry for the disjointed replies; it's still early...

You could export just the HTML, upload it, and have your script strip the
HTML so both formats are available, i.e., plain text for indexing and HTML
for presentation... or even, say, run the HTML through pandoc and produce
Markdown...
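A minimal sketch of the "strip the HTML to get plain text for indexing" idea, using only PHP built-ins. Word's HTML export carries `<style>` blocks and conditional comments, so the sketch drops `<style>`/`<script>` contents first (`strip_tags()` alone would keep their text) and turns common block-level tags into spaces so paragraphs don't run together. It assumes UTF-8 input (Word may export Windows-1252, which you would convert first); the function name is hypothetical, and this is not a full HTML-to-text converter.

```php
<?php
/** Reduce an HTML document (e.g. Word's "Save as Web Page" output) to searchable plain text. */
function htmlToPlainText(string $html): string
{
    // Remove <style> and <script> elements together with their contents.
    $html = preg_replace('#<(style|script)\b[^>]*>.*?</\1>#is', ' ', $html);
    // Replace common block-level tags with a space so adjacent paragraphs stay separated.
    $html = preg_replace('#</?(p|div|br|h[1-6]|li|tr|td|table)\b[^>]*>#i', ' ', $html);
    // Drop the remaining tags; strip_tags() also removes HTML comments.
    $text = strip_tags($html);
    // Decode entities such as &amp; and &nbsp; (the latter becomes U+00A0).
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $text = str_replace("\xC2\xA0", ' ', $text); // non-breaking space -> normal space
    // Collapse runs of ASCII whitespace into single spaces.
    return trim(preg_replace('/[ \t\r\n\f]+/', ' ', $text));
}
```

The stripped text goes into the database for searching, while the original HTML file stays on disk for display.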

As I say, it's early, so these might be bad ideas, but it's how I'd
approach it.




Re: [PHP] storing searching docs

2012-12-15 Thread Jim Giner

On 12/15/2012 8:26 AM, tamouse mailing lists wrote:

> Why not use php to upload the set of files?


'Cause I don't know how PHP could do such a thing. The only way I know of
is through a 'file' input on an HTML page, which is a PIA since I would have
to do it for each file. With an FTP client I can just drag and drop the
files in 10 seconds. In the future, as I add additional docs one at a
time, I'll have a simple HTML form for doing that.





Re: [PHP] storing searching docs

2012-12-15 Thread Jim Giner

On 12/15/2012 8:29 AM, tamouse mailing lists wrote:

> PDF might be better looking than this, but how big is an HTML doc exported
> from Word?





Word generates very many, many words (!) when creating an HTML doc. Not
a good HTML generator at all.





Re: [PHP] storing searching docs

2012-12-15 Thread Ashley Sheridan
On Sat, 2012-12-15 at 12:21 -0500, Jim Giner wrote:

> 'Cause I don't know how PHP could do such a thing. The only way I know of
> is through a 'file' input on an HTML page, which is a PIA since I would have
> to do it for each file. With an FTP client I can just drag and drop the
> files in 10 seconds. In the future, as I add additional docs one at a
> time, I'll have a simple HTML form for doing that.


I believe Chrome supports drag and drop for file inputs now. I do know
that Chrome and Firefox support multiple uploads from one form element
without the need for things like Uploadify.
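Building on Ash's point: with a single `<input type="file" name="docs[]" multiple>` element, browsers that support the `multiple` attribute send all selected files in one POST, and PHP groups them per attribute (`$_FILES['docs']['name'][0]`, `$_FILES['docs']['tmp_name'][0]`, ...). A minimal sketch that reshapes that array and saves the files follows; the field name `docs`, the target directory, and the `.pdf`/`.txt` whitelist are assumptions for illustration. PHP's `max_file_uploads`, `upload_max_filesize` and `post_max_size` settings still cap how much one request can carry.

```php
<?php
// Matching form (hypothetical field name "docs"):
// <form method="post" enctype="multipart/form-data">
//   <input type="file" name="docs[]" multiple>
//   <input type="submit" value="Upload">
// </form>

/** Turn PHP's per-attribute $_FILES layout into a list of per-file arrays. */
function normalizeUploads(array $field): array
{
    if (!is_array($field['name'])) {
        return [$field]; // single <input type="file" name="docs"> upload
    }
    $files = [];
    foreach ($field['name'] as $i => $name) {
        $files[] = [
            'name'     => $name,
            'type'     => $field['type'][$i] ?? '',
            'tmp_name' => $field['tmp_name'][$i] ?? '',
            'error'    => $field['error'][$i] ?? UPLOAD_ERR_NO_FILE,
            'size'     => $field['size'][$i] ?? 0,
        ];
    }
    return $files;
}

/** Move successfully uploaded .pdf/.txt files into $destDir; returns the saved file names. */
function saveUploads(array $field, string $destDir): array
{
    $saved = [];
    foreach (normalizeUploads($field) as $file) {
        $name = basename($file['name']); // never trust client-supplied paths
        $ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
        if ($file['error'] !== UPLOAD_ERR_OK || !in_array($ext, ['pdf', 'txt'], true)) {
            continue;
        }
        // move_uploaded_file() refuses anything that did not arrive via HTTP POST.
        if (move_uploaded_file($file['tmp_name'], rtrim($destDir, '/') . '/' . $name)) {
            $saved[] = $name;
        }
    }
    return $saved;
}

// Usage in the form's target script (hypothetical directory):
// if (isset($_FILES['docs'])) { print_r(saveUploads($_FILES['docs'], '/path/to/docs')); }
```

This keeps drag-and-drop-style bulk selection in the browser while the server side stays a plain PHP script.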

Thanks,
Ash
http://www.ashleysheridan.co.uk




Re: [PHP] storing searching docs

2012-12-15 Thread tamouse mailing lists
On Sat, Dec 15, 2012 at 11:21 AM, Jim Giner
jim.gi...@albanyhandball.com wrote:
> 'Cause I don't know how PHP could do such a thing. The only way I know of
> is through a 'file' input on an HTML page, which is a PIA since I would have
> to do it for each file. With an FTP client I can just drag and drop the
> files in 10 seconds. In the future, as I add additional docs one at a
> time, I'll have a simple HTML form for doing that.




Yeah, bulk upload is a bigger problem. I was thinking just the
one-at-a-time thing.




Re: [PHP] storing searching docs

2012-12-15 Thread tamouse mailing lists
On Sat, Dec 15, 2012 at 11:22 AM, Jim Giner
jim.gi...@albanyhandball.com wrote:
> Word generates very many, many words (!) when creating an HTML doc. Not a
> good HTML generator at all.




I think my next email talked about sending the HTML through pandoc to
make a plain-text file, perhaps in Markdown, which could be the thing
you save, and then running it through a Markdown filter to produce (a
much, much leaner) HTML.
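The pandoc round trip can be driven from PHP by shelling out. A minimal sketch, assuming the `pandoc` binary is installed on the host and that `exec()` is permitted there (both are assumptions; shared hosts often allow neither). The function names are hypothetical; `-f`, `-t` and `-o` are pandoc's from-format, to-format and output-file options.

```php
<?php
/** Build a shell command that converts $inFile to $outFile with pandoc. */
function pandocCommand(string $inFile, string $outFile, string $from = 'html', string $to = 'markdown'): string
{
    // escapeshellarg() quotes every argument so file names can't inject shell syntax.
    return sprintf(
        'pandoc -f %s -t %s -o %s %s',
        escapeshellarg($from),
        escapeshellarg($to),
        escapeshellarg($outFile),
        escapeshellarg($inFile)
    );
}

/** Run the conversion; returns true when pandoc exits with status 0. */
function runPandoc(string $inFile, string $outFile, string $from = 'html', string $to = 'markdown'): bool
{
    exec(pandocCommand($inFile, $outFile, $from, $to) . ' 2>&1', $output, $status);
    return $status === 0;
}

// Word HTML -> Markdown (the lean copy you keep), then Markdown -> lean HTML for display
// (hypothetical file names):
// runPandoc('2012-12-13.htm', '2012-12-13.md');
// runPandoc('2012-12-13.md', '2012-12-13.html', 'markdown', 'html');
```

The Markdown copy also doubles as reasonably clean plain text for a search index.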




Re: [PHP] storing searching docs

2012-12-15 Thread Jim Giner
I think I'm good with text in the DB for search capability and the PDF for
pure display.

jg
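For the "PDF for pure display" half, a minimal sketch of a script that streams a stored PDF to the browser. It assumes the PDFs sit in one directory and the request names a file (for example the `pdf_file` value a search returned); the directory, parameter and function names are hypothetical. `basename()` keeps a crafted name like `../../etc/passwd` from escaping that directory.

```php
<?php
/** Resolve a requested PDF name to a path inside $dir, or null if it is not an existing .pdf there. */
function safePdfPath(string $dir, string $requested): ?string
{
    $name = basename($requested); // drop any directory components from the request
    if (strtolower(pathinfo($name, PATHINFO_EXTENSION)) !== 'pdf') {
        return null;
    }
    $path = rtrim($dir, '/') . '/' . $name;
    return is_file($path) ? $path : null;
}

/** Send the PDF so the browser displays it inline. */
function sendPdf(string $path): void
{
    header('Content-Type: application/pdf');
    header('Content-Length: ' . filesize($path));
    header('Content-Disposition: inline; filename="' . basename($path) . '"');
    readfile($path);
}

// Usage, e.g. show_pdf.php?file=2012-12-13.pdf (hypothetical paths):
// $path = safePdfPath('/path/to/pdfs', $_GET['file'] ?? '');
// if ($path === null) { http_response_code(404); exit('Not found'); }
// sendPdf($path);
```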


On Dec 15, 2012, at 5:31 PM, tamouse mailing lists tamouse.li...@gmail.com 
wrote:

> I think my next email talked about sending the HTML through pandoc to
> make a plain-text file, perhaps in Markdown, which could be the thing
> you save, and then running it through a Markdown filter to produce (a
> much, much leaner) HTML.




Re: [PHP] storing searching docs

2012-12-13 Thread Jim Giner

Thanks for the input, gentlemen. Two opposing viewpoints!

I understand the concept of using files for the docs and a table to
locate and identify them. But I am of the opinion that modern DBs handle
very large objects (which these docs are NOT!) much more easily than they
did years ago, so I am still leaning that way. It will certainly make my
search process easier!

More comments, anyone?




Re: [PHP] storing searching docs

2012-12-13 Thread Matijn Woudt
On Thu, Dec 13, 2012 at 3:10 PM, Jim Giner jim.gi...@albanyhandball.com wrote:

> Thanks for the input, gentlemen. Two opposing viewpoints!
>
> I understand the concept of using files for the docs and a table to
> locate and identify them. But I am of the opinion that modern DBs handle
> very large objects (which these docs are NOT!) much more easily than they
> did years ago, so I am still leaning that way. It will certainly make my
> search process easier!
>
> More comments, anyone?


I'm not sure if there's much difference between large text fields and
blobs, but I had a database (MySQL) with rows that had one blob each of
5-10 MB. At around 200-300 rows the database was pretty slow. After
reaching about 2000 rows, it was terrible. Opening the database with
phpMyAdmin (which just executes a SELECT with LIMIT 1, 30) took around 6
seconds. Doing an ORDER BY on one of the other columns took a few
minutes. I tried both InnoDB and MyISAM for storage, but that didn't make
much of a difference.

So it depends on how large your docs are, I guess.

- Matijn


Re: [PHP] storing searching docs

2012-12-13 Thread Jim Giner

On 12/13/2012 9:19 AM, Matijn Woudt wrote:

> So it depends on how large your docs are, I guess.
>
> - Matijn

My docs are very small: two-hour meetings, usually 4 typed pages, so
approx. 8 KB of real data each. I don't think storage is much of a
concern here. The actual .doc files are around 28 KB, and when
converted to RTF they grow to 44 KB, still not very large.


Will this be a concern?




Re: [PHP] storing searching docs

2012-12-13 Thread Matijn Woudt
On Thu, Dec 13, 2012 at 3:32 PM, Jim Giner jim.gi...@albanyhandball.com wrote:

> My docs are very small: two-hour meetings, usually 4 typed pages, so
> approx. 8 KB of real data each. I don't think storage is much of a
> concern here. The actual .doc files are around 28 KB, and when
> converted to RTF they grow to 44 KB, still not very large.
>
> Will this be a concern?


That, of course, also depends on how many you are planning to store. I
guess a few hundred will be OK, but beyond that I'm not so sure.

- Matijn
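To put Matijn's rule of thumb in numbers, here is a rough comparison using only figures from this thread (decimal units; the 300 documents are an illustrative stand-in for "a few hundred", not Jim's actual count):

```latex
\[
\begin{aligned}
\text{Matijn: } & 2000 \times (5 \text{ to } 10)\ \text{MB} = 10 \text{ to } 20\ \text{GB} \\
\text{Jim: }    & 300 \times 8\ \text{KB} = 2400\ \text{KB} = 2.4\ \text{MB} \\
\text{ratio: }  & \frac{10\ \text{GB}}{2.4\ \text{MB}} = \frac{10\,000\ \text{MB}}{2.4\ \text{MB}} \approx 4\,200
\end{aligned}
\]
```

Even at the low end, Matijn's slow table held a few thousand times more data than a few hundred sets of minutes would, which suggests (but does not prove) that Jim's volume is unlikely to hit the same wall.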


Re: [PHP] storing searching docs

2012-12-13 Thread Bastien


Bastien Koert

On 2012-12-13, at 9:10 AM, Jim Giner jim.gi...@albanyhandball.com wrote:

> Thanks for the input, gentlemen. Two opposing viewpoints!
>
> I understand the concept of using files for the docs and a table to
> locate and identify them. But I am of the opinion that modern DBs handle
> very large objects (which these docs are NOT!) much more easily than they
> did years ago, so I am still leaning that way. It will certainly make my
> search process easier!
>
> More comments, anyone?

I moved away from storing blobs in the DB. I noticed significant slowness
after the DB grew to about 12 GB in MySQL. Backups are also affected, as
they take longer. This was an older MySQL version, but it affected my MS
SQL Server the same way.

Nowadays it's files in the file system and data in the DB. One thing you
could consider is reading the contents of the documents into a DB field,
storing just the text to allow full-text search.
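Bastien's "files in the file system, text in the DB" layout might look like this in MySQL, with a FULLTEXT index on the text column. A sketch only: the table and column names are hypothetical, the schema uses MyISAM because FULLTEXT on InnoDB only arrived in MySQL 5.6, and natural-language mode ignores very short words and stop words by default.

```php
<?php
// Hypothetical schema: the PDF stays on disk; only its file name and the plain text go in the DB.
const MINUTES_SCHEMA = <<<'SQL'
CREATE TABLE meeting_minutes (
    id           INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    meeting_date DATE         NOT NULL,
    pdf_file     VARCHAR(255) NOT NULL,
    body         TEXT         NOT NULL,
    FULLTEXT KEY ft_body (body)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
SQL;

/** Full-text search; returns rows ordered by relevance, best first. */
function searchMinutes(PDO $db, string $words): array
{
    // Two placeholders, because a named placeholder can't be reused with native prepares.
    $stmt = $db->prepare(
        'SELECT meeting_date, pdf_file, body,
                MATCH(body) AGAINST (:q1 IN NATURAL LANGUAGE MODE) AS score
           FROM meeting_minutes
          WHERE MATCH(body) AGAINST (:q2 IN NATURAL LANGUAGE MODE)
          ORDER BY score DESC'
    );
    $stmt->execute([':q1' => $words, ':q2' => $words]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Usage (hypothetical DSN and credentials):
// $db = new PDO('mysql:host=localhost;dbname=minutes;charset=utf8', $user, $pass,
//               [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
// $db->exec(MINUTES_SCHEMA);
// foreach (searchMinutes($db, 'budget') as $row) {
//     echo $row['meeting_date'], ' ', $row['pdf_file'], "\n";
// }
```

The `body` column of each hit can then be cut down to a few context snippets around the search words before display.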

Bastien



Re: [PHP] storing searching docs

2012-12-13 Thread Jim Giner

On 12/13/2012 10:56 AM, Bastien wrote:



> Nowadays it's files in the file system and data in the DB. One thing you
> could consider is reading the contents of the documents into a DB field,
> storing just the text to allow full-text search.
>
> Bastien

A very clever idea! I like it: the best of both worlds. Can you sum
up a method for getting the text out of the .doc (or .rtf) files so that
I can automate the process for my past and future documents?

Is there a single PHP function that would accomplish this?




Re: [PHP] storing searching docs

2012-12-13 Thread Matijn Woudt
On Thu, Dec 13, 2012 at 5:13 PM, Jim Giner jim.gi...@albanyhandball.com wrote:

> A very clever idea! I like it: the best of both worlds. Can you sum up a
> method for getting the text out of the .doc (or .rtf) files so that I can
> automate the process for my past and future documents?
> Is there a single PHP function that would accomplish this?


There's no built-in function for such stuff. .doc files are quite tricky to
parse, but RTF files can be parsed pretty easily. One project is PHPRtfLite
[1], which provides you an API for doing this.

- Matijn

[1] http://sourceforge.net/projects/phprtf/
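Since there's no built-in, here is a rough sketch of pulling plain text out of an RTF file with a small hand-written scanner (as far as I know, PHPRtfLite is mainly aimed at generating RTF, so this does not use it). It is deliberately naive, not a complete RTF parser: it skips `{\*...}` destinations and a few common non-text groups (font and colour tables, stylesheet, info, pictures, headers and footers), maps `\par`/`\line` to newlines and `\tab` to tabs, passes `\'hh` bytes through in the document's code page (usually Windows-1252), and does not decode `\uN` Unicode escapes. The function name is hypothetical.

```php
<?php
/** Naive RTF-to-text extraction (sketch; not a full RTF parser). */
function rtfToText(string $rtf): string
{
    // Groups whose contents are not document text.
    $skipWords = ['fonttbl', 'colortbl', 'stylesheet', 'info', 'pict', 'header', 'footer'];
    $out = '';
    $stack = [];   // "skip" state saved at each '{'
    $skip = false; // true while inside a group we ignore
    $len = strlen($rtf);

    for ($i = 0; $i < $len; $i++) {
        $c = $rtf[$i];
        if ($c === '{') { $stack[] = $skip; continue; }
        if ($c === '}') { $skip = array_pop($stack) ?? false; continue; }
        if ($c === "\r" || $c === "\n") { continue; } // raw line breaks are not text in RTF
        if ($c !== '\\') {
            if (!$skip) { $out .= $c; }
            continue;
        }
        $next = $rtf[$i + 1] ?? '';
        if ($next === '\\' || $next === '{' || $next === '}') { // escaped literal character
            if (!$skip) { $out .= $next; }
            $i++;
        } elseif ($next === "'") {                              // \'hh: one byte given in hex
            if (!$skip) { $out .= chr(hexdec(substr($rtf, $i + 2, 2))); }
            $i += 3;
        } elseif ($next === '*') {                              // \* marks an ignorable destination
            $skip = true;
            $i++;
        } elseif (preg_match('/\G\\\\([a-z]+)(-?\d+)? ?/i', $rtf, $m, 0, $i)) {
            $word = strtolower($m[1]);                          // control word, e.g. \par or \f0
            if (in_array($word, $skipWords, true)) {
                $skip = true;
            } elseif (!$skip && ($word === 'par' || $word === 'line')) {
                $out .= "\n";
            } elseif (!$skip && $word === 'tab') {
                $out .= "\t";
            }
            $i += strlen($m[0]) - 1; // the loop's $i++ moves past the control word
        } else {
            $i++; // other control symbols (\~, \-, ...) are dropped
        }
    }
    return trim($out);
}

// Usage (hypothetical file name): echo rtfToText(file_get_contents('2012-12-13.rtf'));
```

For real-world files, expect to extend the skip list as you meet other non-text groups.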


Re: [PHP] storing searching docs

2012-12-13 Thread Ashley Sheridan
On Thu, 2012-12-13 at 18:41 +0100, Matijn Woudt wrote:

> There's no built-in function for such stuff. .doc files are quite tricky to
> parse, but RTF files can be parsed pretty easily. One project is PHPRtfLite
> [1], which provides you an API for doing this.
>
> - Matijn
>
> [1] http://sourceforge.net/projects/phprtf/


As well as RTF, the OpenDocument format is easy to read from PHP. Essentially
it's just a bunch of XML files zipped up. Images are kept in the archive
too, which is also a handy way to retrieve thumbnails of docs!
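Ash's point can be sketched as follows: an .odt file is a ZIP archive whose main text lives in `content.xml`, with paragraphs in `<text:p>` and headings in `<text:h>`. This minimal sketch assumes PHP's zip extension (`ZipArchive`) is available on the host and handles only the most common inline elements; the function names are hypothetical.

```php
<?php
/** Turn an OpenDocument content.xml string into plain text (sketch, not a full ODF reader). */
function odtXmlToText(string $xml): string
{
    $xml = preg_replace('#<text:s\b[^>]*/>#', ' ', $xml);          // <text:s/> = space (its text:c count is ignored)
    $xml = preg_replace('#<text:tab\b[^>]*/>#', "\t", $xml);        // tab
    $xml = preg_replace('#<text:line-break\b[^>]*/>#', "\n", $xml); // manual line break
    $xml = preg_replace('#</text:(p|h)>#', "\n", $xml);             // end of paragraph or heading
    $text = strip_tags($xml);                                        // drop the remaining markup
    return trim(html_entity_decode($text, ENT_QUOTES | ENT_XML1, 'UTF-8'));
}

/** Read the text of an .odt file; requires PHP's zip extension (ZipArchive). */
function readOdtText(string $path): ?string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return null;
    }
    $xml = $zip->getFromName('content.xml'); // the document body lives in content.xml
    $zip->close();
    return $xml === false ? null : odtXmlToText($xml);
}

// Usage (hypothetical file name): echo readOdtText('2012-12-13.odt');
```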

Thanks,
Ash
http://www.ashleysheridan.co.uk




Re: [PHP] storing searching docs

2012-12-13 Thread Bastien Koert
On Thu, Dec 13, 2012 at 12:41 PM, Matijn Woudt tijn...@gmail.com wrote:
> There's no built-in function for such stuff. .doc files are quite tricky to
> parse, but RTF files can be parsed pretty easily. One project is PHPRtfLite
> [1], which provides you an API for doing this.
>
> - Matijn
>
> [1] http://sourceforge.net/projects/phprtf/


There is a Stack Overflow thread,
http://stackoverflow.com/questions/188452/reading-writing-a-ms-word-file-in-php
that has some discussion on reading those files with Antiword
(http://www.winfield.demon.nl/).

-- 

Bastien

Cat, the other other white meat




Re: [PHP] storing searching docs

2012-12-13 Thread Jim Giner

On 12/13/2012 2:40 PM, Bastien Koert wrote:

On Thu, Dec 13, 2012 at 12:41 PM, Matijn Woudt tijn...@gmail.com wrote:

On Thu, Dec 13, 2012 at 5:13 PM, Jim Giner jim.gi...@albanyhandball.comwrote:


On 12/13/2012 10:56 AM, Bastien wrote:




Bastien Koert

On 2012-12-13, at 9:10 AM, Jim Giner jim.gi...@albanyhandball.com
wrote:

  Thanks for the input gentlemen.  Two opposing viewpoints!


I understand the concept of using files for the docs and a table to
locate and identify them. But I am of the opinion that modern dbs can
handle very large objects (which these docs are NOT!) much more easily
than they could years ago, so I am still leaning that way. It will
certainly make my search process easier!

More comments anyone?




I got away from storing blobs in the db. I noticed significant slowness
after the db grew to about 12 GB in MySQL. Backups are also affected, as
they take longer. This was an older MySQL, but it also affected my MSSQL
server the same way.

Nowadays it's files into the file system and data into the db. One thing
you could consider is reading the contents of the files into a db field,
storing just the text to allow full-text search.

Bastien

  A very clever idea!  I like it - the best of both worlds.  Can you sum
up a method for getting the text out of the .doc (or .rtf) files so that I
can automate the process for my past and future documents?
Is there a single php function that would accomplish this?



There's no built-in function for such stuff. .doc files are quite tricky
to parse, but .rtf files can be parsed pretty easily. One project is
PHPRtfLite [1], which provides you an API for doing this.

- Matijn

[1] http://sourceforge.net/projects/phprtf/
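To give a feel for why RTF is the easier format, here is a naive extractor sketch written from scratch; it is not PHPRtfLite's API. RTF is plain text made of `{...}` groups, backslash control words such as `\par`, and ordinary characters, so a single pass that drops control words and metadata groups recovers most of the text. The function name `rtfToText` and the list of skipped destinations are assumptions for illustration, and the sketch ignores `\u` Unicode escapes and table structure.

```php
<?php
// Naive RTF-to-text sketch, NOT a complete RTF parser: \u Unicode escapes,
// tables and most control symbols are simply dropped, and \'hh escapes are
// emitted as raw bytes (usually Windows-1252). Often enough for a search index.
function rtfToText(string $rtf): string
{
    // Destinations whose contents are metadata rather than document text
    // (an illustrative list, not an exhaustive one).
    $skipWords = ['fonttbl', 'colortbl', 'stylesheet', 'info', 'pict',
                  'header', 'footer', 'listtable', 'listoverridetable'];
    $out = '';
    $stack = [];     // saved "skip" flags, one per open { group
    $skip = false;   // true while inside a group we are ignoring
    $len = strlen($rtf);
    for ($i = 0; $i < $len; $i++) {
        $c = $rtf[$i];
        if ($c === '{') {
            $stack[] = $skip;                                   // enter group
        } elseif ($c === '}') {
            $skip = $stack === [] ? false : array_pop($stack);  // leave group
        } elseif ($c === '\\') {
            $next = $rtf[$i + 1] ?? '';
            if ($next === '\\' || $next === '{' || $next === '}') {
                if (!$skip) {
                    $out .= $next;                              // escaped literal
                }
                $i++;
            } elseif ($next === "'") {
                if (!$skip) {                                   // \'hh: one byte in hex
                    $out .= chr(hexdec(substr($rtf, $i + 2, 2)));
                }
                $i += 3;
            } elseif ($next === '*') {
                $skip = true;                                   // ignorable destination
                $i++;
            } elseif (preg_match('/[a-zA-Z]/', $next)) {
                // Control word: letters, optional signed number, optional space.
                preg_match('/\G([a-zA-Z]+)(-?\d+)? ?/', $rtf, $m, 0, $i + 1);
                $i += strlen($m[0]);
                $word = $m[1];
                if (in_array($word, $skipWords, true)) {
                    $skip = true;
                } elseif (!$skip && ($word === 'par' || $word === 'line')) {
                    $out .= "\n";
                } elseif (!$skip && $word === 'tab') {
                    $out .= "\t";
                }
            } else {
                $i++;                                           // other control symbols
            }
        } elseif ($c !== "\r" && $c !== "\n" && !$skip) {
            $out .= $c;                                         // ordinary text
        }
    }
    return $out;
}
```

Raw line breaks in an RTF file are not text, which is why only `\par` and `\line` produce newlines here.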



There is 
http://stackoverflow.com/questions/188452/reading-writing-a-ms-word-file-in-php
which has some discussion on reading those files with Antiword
(http://www.winfield.demon.nl/)

But I can't get Antiword. I'm running Windows, while my host runs
Linux, and there aren't any Linux binaries available for download to
put onto my host (assuming that I could do that!). Or am I missing
something?





Re: [PHP] storing searching docs

2012-12-13 Thread Jim Giner
Thanks for all the posts.  After reading and googling all afternoon, I 
think the best approach for me is:


Create two macros in Word (done!) to export each of my .doc files to 
.txt and .pdf formats.


Create a SQL table to hold the .txt contents of my .doc files, along
with a reference to the meeting date and the name of the corresponding
.pdf file.


Upload my two sets of files with an FTP client and then use a script to
load the table with my .txt file data.
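One way the loading script might look, as a sketch under assumptions the thread does not state: each meeting's exports are named after the meeting date (hypothetical `YYYY-MM-DD.txt` and `YYYY-MM-DD.pdf`), they sit in one uploaded directory, and the table is a hypothetical `minutes` table with `meeting_date`, `pdf_file` and `body` columns. It uses PDO prepared statements, so quotes inside the minutes cannot break the SQL.

```php
<?php
// Sketch of the "load the table" step. ASSUMPTIONS, not from the thread:
// exports are named YYYY-MM-DD.txt / YYYY-MM-DD.pdf, and the table is e.g.
//   CREATE TABLE minutes (
//       meeting_date DATE PRIMARY KEY,
//       pdf_file     VARCHAR(255) NOT NULL,
//       body         MEDIUMTEXT NOT NULL
//   );

// One row per date-named .txt file that has a matching .pdf next to it.
function collectMinutes(string $dir): array
{
    $rows = [];
    foreach (glob(rtrim($dir, '/') . '/*.txt') ?: [] as $txtPath) {
        $date = basename($txtPath, '.txt');
        if (!preg_match('/^\d{4}-\d{2}-\d{2}$/', $date)) {
            continue;                       // not named after a meeting date
        }
        $pdf = $date . '.pdf';
        if (!is_file(dirname($txtPath) . '/' . $pdf)) {
            continue;                       // PDF not uploaded yet
        }
        $rows[] = [
            'meeting_date' => $date,
            'pdf_file'     => $pdf,
            'body'         => file_get_contents($txtPath),
        ];
    }
    return $rows;
}

// Insert the rows with a prepared statement (works with pdo_mysql).
function loadMinutes(PDO $pdo, array $rows): int
{
    // Make a failed insert throw instead of failing silently
    // (silent was PDO's default before PHP 8).
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $stmt = $pdo->prepare(
        'INSERT INTO minutes (meeting_date, pdf_file, body) VALUES (?, ?, ?)'
    );
    foreach ($rows as $row) {
        $stmt->execute([$row['meeting_date'], $row['pdf_file'], $row['body']]);
    }
    return count($rows);
}
```

Usage might be `loadMinutes(new PDO($dsn, $user, $pass), collectMinutes('/path/to/uploads'))` with your own DSN, credentials and path. Running it twice will hit the primary key for dates already loaded; MySQL's `INSERT ... ON DUPLICATE KEY UPDATE` is one way around that.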


Now I just need a couple of scripts: one to let a user locate a file and
bring up the PDF when he wants to read about a meeting, and a second to
accept user input (search words), perform a query against the textual
data and present some kind of results - probably a listing containing a
reference to the meeting date and a string (length TBD) showing each
occurrence of the match, i.e. something like n chars in front of and
after the match so the user can see its context.
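The context-excerpt part of the search results could be produced in PHP once the matching rows are fetched. A minimal sketch, assuming a plain case-insensitive substring match on a single search term (no stemming, no phrase handling); `findSnippets` and the default of 40 characters are hypothetical choices standing in for the "n chars" of the plan.

```php
<?php
// Return every case-insensitive occurrence of $needle in $text with up to
// $context characters on each side, like a search-results excerpt.
// Byte-based: for UTF-8 text the mb_* functions would avoid cutting a
// multi-byte character in half.
function findSnippets(string $text, string $needle, int $context = 40): array
{
    $snippets = [];
    if ($needle === '') {
        return $snippets;
    }
    $offset = 0;
    while (($pos = stripos($text, $needle, $offset)) !== false) {
        $start = max(0, $pos - $context);
        $end   = min(strlen($text), $pos + strlen($needle) + $context);
        // Collapse line breaks so each hit fits on one line of the results.
        $snippet = preg_replace('/\s+/', ' ', substr($text, $start, $end - $start));
        $snippets[] = ($start > 0 ? '...' : '') . $snippet
                    . ($end < strlen($text) ? '...' : '');
        $offset = $pos + strlen($needle);   // continue after this match
    }
    return $snippets;
}
```

For example, `findSnippets($row['body'], 'hose', 30)` would give one excerpt per occurrence, ready to list under that row's meeting date.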


Sizes: a 28 KB .doc file grows to 142 KB in .pdf format and is only 5 KB
in .txt format. (Actually, if I 'print' the .doc as a PDF instead of using
Word's File > Save As, the resulting PDF is only 70 KB. Might need a new
macro!)


Thanks again!




Re: [PHP] storing searching docs

2012-12-13 Thread Jim Lucas

On 12/13/2012 02:49 PM, Jim Giner wrote:

Thanks for all the posts. After reading and googling all afternoon, I
think the best approach for me is:

Create two macros in Word (done!) to export each of my .doc files to
.txt and .pdf formats.

Create a SQL table to hold the .txt contents of my .doc files, along
with a reference to the meeting date and the name of the corresponding
.pdf file.

Upload my two sets of files with an FTP client and then use a script to
load the table with my .txt file data.

Now I just need a couple of scripts: one to let a user locate a file and
bring up the PDF when he wants to read about a meeting, and a second to
accept user input (search words), perform a query against the textual
data and present some kind of results - probably a listing containing a
reference to the meeting date and a string (length TBD) showing each
occurrence of the match, i.e. something like n chars in front of and
after the match so the user can see its context.

Sizes: a 28 KB .doc file grows to 142 KB in .pdf format and is only 5 KB
in .txt format. (Actually, if I 'print' the .doc as a PDF instead of using
Word's File > Save As, the resulting PDF is only 70 KB. Might need a new
macro!)

Thanks again!



A few years ago I wrote a script that extracts the plain text from a
.doc file.


http://www.cmsws.com/examples/applications/word2_/convert.php

If you look in the directory, you will see a few example files.

You can view them like this.

.../convert.php?filename=test_building.doc

Replace test_building.doc with any of the other .doc files from the
directory listing to see its contents.


I currently have it set to 64-bit-wide rows. It shows you some nice
patterns in the MS Word format.


The source of the convert.php script is viewable as well:

http://www.cmsws.com/examples/applications/word2_/convert.phps

I have thought about extending this even further to figure out the
layout and text formatting, but it hasn't gotten much attention for
quite some time now.
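For anyone without antiword, a much cruder trick than a real .doc parser is to scan the file's raw bytes for runs of printable characters, since Word 97-2003 files store body text either as single bytes or as UTF-16LE. This sketch is a swapped-in heuristic, not how convert.php works: it also returns font names and other junk, loses accented characters, and does not preserve document order between the two kinds of runs. `printableRuns` and the minimum run length of 4 are assumptions.

```php
<?php
// Very crude heuristic, NOT how convert.php works: pull runs of printable
// characters out of a .doc file's raw bytes, scanning for both single-byte
// and UTF-16LE text. Expect junk (font names, metadata) alongside real text.
function printableRuns(string $bytes, int $minLen = 4): array
{
    $runs = [];
    // Single-byte runs of printable ASCII, tabs and line breaks.
    preg_match_all('/[\x20-\x7E\t\r\n]{' . $minLen . ',}/', $bytes, $single);
    foreach ($single[0] as $run) {
        $runs[] = $run;
    }
    // UTF-16LE runs limited to the ASCII range: each character is followed by
    // a \x00 byte, so deleting the zero bytes gives back plain text.
    preg_match_all('/(?:[\x20-\x7E\t\r\n]\x00){' . $minLen . ',}/', $bytes, $wide);
    foreach ($wide[0] as $run) {
        $runs[] = str_replace("\x00", '', $run);
    }
    return $runs;
}
```

Something like `implode("\n", printableRuns(file_get_contents('minutes.doc')))` (placeholder file name) would then be rough material for a full-text index, not a faithful transcript.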


Hope it helps.

--
Jim Lucas

http://www.cmsws.com/
http://www.cmsws.com/examples/




[PHP] storing searching docs

2012-12-12 Thread Jim Giner

Slightly off-topic, perhaps, but I'm looking for general input here.

New idea for a project - save the minutes of my firehouse meetings into
a MySQL table and build a UI to search them for words and such. The
docs are currently written in Word. My simplistic idea is to convert
them to something other than Word format and then store them in a
field of a MySQL record, with the meeting date as a key field.
Of course, having them online, I should also allow viewing them as
documents in something close to their original (?) format.


Any ideas - pro or con - on this idea?




Re: [PHP] storing searching docs

2012-12-12 Thread Paul M Foster
On Wed, Dec 12, 2012 at 01:00:41PM -0500, Jim Giner wrote:

 Slightly off-topic, perhaps, but I'm looking for general input here.
 
 New idea for a project - save the minutes of my firehouse meetings
 into a MySQL table and build a UI to search them for words and such.
 The docs are currently written in Word. My simplistic idea is to
 convert them to something other than Word format and then store them
 in a field of a MySQL record, with the meeting date as a key field.
 Of course, having them online, I should also allow viewing them as
 documents in something close to their original (?) format.
 
 Any ideas - pro or con - on this idea?

First off, I'd convert them to RTF (Rich Text Format). Word format is
too ephemeral (= self-incompatible). RTF is a lowest common denominator
which can be converted to a variety of other formats. And RTF is a
standardized format that both Word and things like OpenOffice
understand. Meeting minutes don't call for a very complicated layout
(something RTF isn't that good with). I would suggest HTML format, but
Word is notoriously atrocious at faithfully converting its own formats
into HTML. The result is horrid.

Second, you've hit on one of my pet peeves. Never never store huge
blocks of text in SQL files. It slows them down and there's no real
reason for it. There's no reason to force a DBMS to schlep around
massive clumps of text or binary data. That's what disk file systems are
for. Store the target data in a file and store a reference to the
location of the data in the SQL database. Or perhaps use a NoSQL
solution. I don't know much about the internals of NoSQL systems, but I
would hope that the metadata about the text objects would be stored
separately from the payload (text object).
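A sketch of the retrieval side of that design, under assumptions: the database row holds only the PDF's file name (a hypothetical `pdf_file` column), the files live in one storage directory, and the script must refuse names that would escape that directory, since the stored name ends up in a filesystem path. `resolveStoredFile` and `sendPdf` are illustrative names.

```php
<?php
// Sketch of the "data in a file, reference in the database" retrieval side.
// ASSUMPTION: the table stores only a PDF file name (e.g. a pdf_file column)
// and all PDFs live in one storage directory.

// Map a stored name to a real file inside $storageDir, or null if it is
// missing or the name tries to escape the directory (e.g. "../../etc/passwd").
function resolveStoredFile(string $storageDir, string $storedName): ?string
{
    $base = realpath($storageDir);
    if ($base === false) {
        return null;
    }
    // basename() drops any directory part smuggled into the stored name.
    $path = realpath($base . DIRECTORY_SEPARATOR . basename($storedName));
    if ($path === false || !is_file($path)
        || strpos($path, $base . DIRECTORY_SEPARATOR) !== 0) {
        return null;
    }
    return $path;
}

// Stream the PDF to the browser; must run before any other output is sent.
function sendPdf(string $path): void
{
    header('Content-Type: application/pdf');
    header('Content-Length: ' . filesize($path));
    header('Content-Disposition: inline; filename="' . basename($path) . '"');
    readfile($path);
}
```

A viewer script would look up `pdf_file` for the requested meeting, call `resolveStoredFile()`, answer with a 404 on null, and otherwise call `sendPdf()`.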

Paul

-- 
Paul M. Foster
http://noferblatz.com
http://quillandmouse.com




Re: [PHP] storing searching docs

2012-12-12 Thread Maciek Sokolewicz

On 12-12-2012 21:03, Paul M Foster wrote:

Second, you've hit on one of my pet peeves. Never never store huge
blocks of text in SQL files. It slows them down and there's no real
reason for it. There's no reason to force a DBMS to schlep around
massive clumps of text or binary data. That's what disk file systems are
for. Store the target data in a file and store a reference to the
location of the data in the SQL database. Or perhaps use a NoSQL
solution. I don't know much about the internals of NoSQL systems, but I
would hope that the metadata about the text objects would be stored
separately from the payload (text object).

Paul



I actually disagree on this point. In the past, storing such data in a
database would make the entire database system extremely slow and would
eat up memory. These days, most database systems can be (or already are)
optimized not to do this anymore.


One positive aspect of storing such data in a database is the ability to 
search using full-text searches. For example, you could use the Sphinx 
Search Engine, which integrates into MySQL very well. It makes searching 
for specific words, phrases, etc. very simple and VERY fast.


So in this case, storing it in a database WOULD actually be a good idea IMO.
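Sphinx is one option; MySQL's own FULLTEXT index is another that needs no extra daemon, and it is what this sketch uses instead. It assumes a hypothetical `minutes` table (shown in the comment) and BOOLEAN MODE, where a leading `+` makes a word required. `booleanQuery` and `searchMinutes` are illustrative names.

```php
<?php
// Sketch using MySQL's built-in FULLTEXT search instead of Sphinx.
// ASSUMED schema (hypothetical table and column names):
//   CREATE TABLE minutes (
//       meeting_date DATE PRIMARY KEY,
//       pdf_file     VARCHAR(255) NOT NULL,
//       body         MEDIUMTEXT NOT NULL,
//       FULLTEXT KEY ft_body (body)
//   ) ENGINE=MyISAM;   -- InnoDB supports FULLTEXT only from MySQL 5.6

// Turn "pump  truck" into "+pump +truck" so BOOLEAN MODE requires every word;
// boolean operator characters typed by the user are stripped first.
function booleanQuery(string $input): string
{
    $clean = preg_replace('/[+\-<>()~*"@]+/', ' ', $input);
    $words = preg_split('/\s+/', trim($clean), -1, PREG_SPLIT_NO_EMPTY);
    return implode(' ', array_map(function (string $word): string {
        return '+' . $word;
    }, $words));
}

// Return matching meetings, newest first (body included so excerpts can be built).
function searchMinutes(PDO $pdo, string $input): array
{
    $query = booleanQuery($input);
    if ($query === '') {
        return [];
    }
    $stmt = $pdo->prepare(
        'SELECT meeting_date, pdf_file, body FROM minutes
          WHERE MATCH(body) AGAINST (? IN BOOLEAN MODE)
          ORDER BY meeting_date DESC'
    );
    $stmt->execute([$query]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

Words shorter than the server's minimum indexed length (`ft_min_word_len`, 4 by default for MyISAM) are not indexed, so very short search terms may not behave as expected.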

- Tul




Re: [PHP] storing searching docs

2012-12-12 Thread Maciek Sokolewicz

On 12-12-2012 21:40, Maciek Sokolewicz wrote:

On 12-12-2012 21:03, Paul M Foster wrote:

Second, you've hit on one of my pet peeves. Never never store huge
blocks of text in SQL files. It slows them down and there's no real
reason for it. There's no reason to force a DBMS to schlep around
massive clumps of text or binary data. That's what disk file systems are
for. Store the target data in a file and store a reference to the
location of the data in the SQL database. Or perhaps use a NoSQL
solution. I don't know much about the internals of NoSQL systems, but I
would hope that the metadata about the text objects would be stored
separately from the payload (text object).

Paul



I actually disagree on this point. In the past, storing such data in a
database would make the entire database system extremely slow and would
eat up memory. These days, most database systems can be (or already are)
optimized not to do this anymore.

One positive aspect of storing such data in a database is the ability to
search using full-text searches. For example, you could use the Sphinx
Search Engine, which integrates into MySQL very well. It makes searching
for specific words, phrases, etc. very simple and VERY fast.

So in this case, storing it in a database WOULD actually be a good idea
IMO.

- Tul


Actually, I have to revise that one. You could also store the documents
locally in files and feed them into the searchd daemon manually.
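One way to do the manual feeding, sketched under assumptions: in Sphinx it is the `indexer` tool (rather than `searchd` itself) that builds disk indexes, and an `xmlpipe2` source lets it read documents from a command's output. The matching sphinx.conf source would have `type = xmlpipe2` and an `xmlpipe_command` that runs a PHP script printing this feed. The file naming (`YYYY-MM-DD.txt`), the `content` field, the `meeting_date` attribute and the `xmlpipe2Feed` name are all hypothetical.

```php
<?php
// Sketch of an xmlpipe2 feed for Sphinx's indexer. ASSUMPTIONS: one
// YYYY-MM-DD.txt file per meeting, a full-text field named "content" and a
// timestamp attribute named "meeting_date" (all hypothetical names).
function xmlpipe2Feed(array $files): string
{
    $esc = function (string $s): string {
        // Drop control characters XML 1.0 forbids (e.g. form feeds from page
        // breaks), then escape &, <, > and quotes. ENT_SUBSTITUTE keeps
        // non-UTF-8 input from being dropped entirely.
        $s = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $s);
        return htmlspecialchars($s, ENT_XML1 | ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    };
    $xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<sphinx:docset>\n"
         . "<sphinx:schema>\n"
         . "<sphinx:field name=\"content\"/>\n"
         . "<sphinx:attr name=\"meeting_date\" type=\"timestamp\"/>\n"
         . "</sphinx:schema>\n";
    $id = 0;
    foreach ($files as $path) {
        $id++;                              // document ids: unique and non-zero
        $when = (int) strtotime(basename($path, '.txt'));
        $xml .= "<sphinx:document id=\"$id\">\n"
              . '<content>' . $esc((string) file_get_contents($path)) . "</content>\n"
              . "<meeting_date>$when</meeting_date>\n"
              . "</sphinx:document>\n";
    }
    return $xml . "</sphinx:docset>\n";
}

// The script named by xmlpipe_command would just print the feed, e.g.
// echo xmlpipe2Feed(glob('/path/to/minutes/*.txt'));
```

If the .txt exports are Windows-1252 rather than UTF-8, converting them (for example with `iconv()`) before escaping would keep accented characters intact.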


