Re: [Trisquel-users] finding particular pages within PDFs

2014-10-01 Thread adel . afzal
Oh man, that is so awesome!

Re: [Trisquel-users] finding particular pages within PDFs

2014-09-07 Thread adel . afzal
I imagine that there's a lot that people can do with tools like this. I opened a documentation page for it here: https://trisquel.info/en/wiki/information-processing . I don't know whether Information Processing is the most appropriate name. If there's a better name, please open a new page,

Re: [Trisquel-users] finding particular pages within PDFs

2014-09-07 Thread adel . afzal
My first run of the script was successful, and I now have a basename-matches PDF. I renamed the match file to word-word-#ofhits.pdf. I ran the script a second time, on that basename-matches PDF (word-word-#ofhits.pdf), to achieve the "and" functionality. Unfortunately, on this second run,

Re: [Trisquel-users] finding particular pages within PDFs

2014-09-07 Thread adel . afzal
Trisquel comes with programs like mv and pdfunite, right? Do most GNU users (like me until very recently) have them on the computer and not use them? Or do popular GUI programs depend on these kinds of programs to do things (like export to PDF in LibreOffice)?

Re: [Trisquel-users] finding particular pages within PDFs

2014-09-06 Thread adel . afzal
Thanks for the tip; I hadn't thought about how to structure the search in this way. I will review how I'm doing my searches with this in mind.

Re: [Trisquel-users] finding particular pages within PDFs

2014-09-06 Thread adel . afzal
My largest set of PDFs is 80 files. In that set, some PDFs are as big as ~20 MB, some are only ~500 KB. In the newest version of pdf-page-grep, the number of matching pages is restricted to 1021, right? I can search my PDFs in smaller groups if this is the case.

Re: [Trisquel-users] finding particular pages within PDFs

2014-09-06 Thread adel . afzal
I think that I understand -- pdfjam lets the computer group the matches without first creating an individual PDF for each page-match. I will read the new script to spot the differences and to try to understand how you did it.

Re: [Trisquel-users] finding particular pages within PDFs

2014-09-04 Thread adel . afzal
I get an error message at the end of the run, and there doesn't seem to be a matches file in my working folder. Maybe 500+ megs of PDFs is too much. I did a few test runs with a few 10-15 page PDFs, and that seemed to work. I/O Error: Couldn't open file '/tmp/pdf-page-grep.PHgWDa-1022': Too

Re: [Trisquel-users] finding particular pages within PDFs

2014-09-03 Thread adel . afzal
Oh, I understand now -- when I read "pipe grep" earlier, I thought that "pipe" referred to a script instruction or terminal command that I didn't know yet. You're right; no need to separate out the pages. I just have to pipe grep with a second set of words to achieve "and".
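A minimal sketch of the "and" described above: chaining two grep calls, so the second one only sees lines that already passed the first. The sample text and word groups are made up for illustration, and this matches line by line rather than page by page; how pdf-page-grep itself combines the patterns is not shown in this preview.

```shell
# Hypothetical sample text standing in for text extracted from a PDF.
sample='a green truck passed by
the sky was blue
a fast car'

# First grep keeps lines with any word from the first group;
# second grep keeps those that also contain a word from the second group.
# -E enables alternation with |, -i ignores case.
both=$(printf '%s\n' "$sample" | grep -E -i 'car|truck|bus' | grep -E -i 'blue|red|green')

printf '%s\n' "$both"
```

Only the first sample line contains a word from each group, so it is the only one that survives both filters.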

Re: [Trisquel-users] finding particular pages within PDFs

2014-09-03 Thread adel . afzal
The script writes the basename-matches.pdf file to the same folder where the script and PDFs are, right? I can't find that matches file. Is it a problem if my PDFs have spaces in the name? (Particularly the last PDF, that the script uses to create the matches.pdf file name)
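Whether pdf-page-grep itself copes with spaces is not stated in this preview. As general shell background, here is a small sketch of why unquoted variables break on file names with spaces; the file name and the count_args helper are hypothetical.

```shell
# Hypothetical file name containing spaces.
name='my last report.pdf'

# Helper that reports how many arguments it received.
count_args() { echo "$#"; }

unquoted=$(count_args $name)     # word splitting: three separate arguments
quoted=$(count_args "$name")     # one argument, spaces preserved

# basename with a suffix strips the extension; quoting keeps the name intact.
base=$(basename "$name" .pdf)

printf '%s %s %s\n' "$unquoted" "$quoted" "$base-matches.pdf"
```

A script that builds its output name from an unquoted variable would see three words instead of one file name.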

Re: [Trisquel-users] finding particular pages within PDFs

2014-09-02 Thread adel . afzal
I tried running the script again today, but am having trouble. When I cd to the directory where pdf-page-grep is, and enter pdf-page-grep, the terminal tells me that there is no such command. I tried moving the script to the directory in my PATH variable; I'm not sure that this went well.

Re: [Trisquel-users] finding particular pages within PDFs

2014-09-02 Thread mikko . viinamaki
That means there's no such command in your $PATH. If the script is in the research folder and executable, you need to run it with ./scriptname.
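A small sketch of both ways to run a script, assuming a throwaway directory and a hypothetical script called hello-script (neither is from the thread):

```shell
# Work in a throwaway directory; "hello-script" is a hypothetical script name.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a tiny script and make it executable.
printf '%s\n' '#!/bin/sh' 'echo "script ran"' > hello-script
chmod +x hello-script

# The current directory is normally not in $PATH, so use an explicit ./ prefix.
via_dot=$(./hello-script)

# Alternatively, add the script's directory to $PATH for this shell session.
PATH="$workdir:$PATH"
via_path=$(hello-script)

printf '%s\n%s\n' "$via_dot" "$via_path"
```

The PATH change here only lasts for the current shell session.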

Re: [Trisquel-users] finding particular pages within PDFs

2014-09-02 Thread adel . afzal
Right! Thank you! I moved the script to the directory in my PATH variable like MB suggested. But I'm not sure that it worked properly. I'll do a little research about that and try again more carefully.

Re: [Trisquel-users] finding particular pages within PDFs

2014-09-02 Thread adel . afzal
Is it possible to search for pages that contain words -- at least one word from each of two groups? For example: First group of ORs: car, truck, bus, bicycle, or motorcycle and Second group of ORs: blue, red, green, purple, or beige So a good hit could have the word green and truck on

Re: [Trisquel-users] finding particular pages within PDFs

2014-09-02 Thread adel . afzal
I just thought of something. I could use pdf-page-grep to do a first pass with my first group of ORs. Then I could split the matches file into single-page PDFs. And then use a new set of ORs on those single-page PDFs. This would be like having an And in the search. Is there an automatic

Re: [Trisquel-users] finding particular pages within PDFs

2014-08-31 Thread magicbanana
There actually was a problem with the input PDFs: if they were not in the working directory, the script was crashing. Also, the output pages were not in the correct order (the order in which the user gave the PDFs). Finally, the script was returning 0 even if no page matched the patterns (the

Re: [Trisquel-users] finding particular pages within PDFs

2014-08-31 Thread adel . afzal
I want to learn shell scripting now and will read those comments carefully -- thanks.

Re: [Trisquel-users] finding particular pages within PDFs

2014-08-31 Thread gramex
Where's the license for the script?

Re: [Trisquel-users] finding particular pages within PDFs

2014-08-31 Thread magicbanana
Oops! I added those lines: # Distributed under the terms of the GNU General Public License v3 # AUTHOR: Magic Banana # e-mail: lc...@dcc.ufmg.br

Re: [Trisquel-users] finding particular pages within PDFs

2014-08-31 Thread magicbanana
I simplified the script: http://dcc.ufmg.br/~lcerf/utilities/pdf-page-grep It is now closer to my original proposal, since it extracts the individual pages with matches and, in the end, joins them all. Besides basic POSIX commands (such as 'grep' and 'awk'), the script now only relies on

Re: [Trisquel-users] finding particular pages within PDFs

2014-08-30 Thread adel . afzal
In case someone with a similar situation finds this page -- here's how to run a script: 1. Open the terminal, and type: cd [directory where your script is] Example: cd /home/username/Desktop/research/ Put the PDF files in the same directory 2. Then type the following, to give yourself

Re: [Trisquel-users] finding particular pages within PDFs

2014-08-30 Thread adel . afzal
It took me a little while to figure out what I was looking at ... thank you so much! I'm running the script now, and it's finding pages! This is so cool. I'm going to PM you about that beer. Also, thanks a lot Legimet. You guys are the best.

Re: [Trisquel-users] finding particular pages within PDFs

2014-08-30 Thread magicbanana
It is true that you have to make the script executable. You can do that with 'chmod +x' or from a graphical file browser (in Nautilus: right click, Properties, Permissions tab, a box to check). If you plan to use the script frequently, you had better move it to a directory listed in your

Re: [Trisquel-users] finding particular pages within PDFs

2014-08-30 Thread magicbanana
The script now considers that the arguments that start with - (e.g., -F or --ignore-case) are options for 'grep'. I put the script on my website: http://dcc.ufmg.br/~lcerf/en/utilities.html#pdf-page-grep
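One way such a split could look, as a sketch only: the actual code of pdf-page-grep is not shown here. The split_args helper and the sample arguments are hypothetical, and joining with spaces is a simplification that would not survive arguments containing spaces.

```shell
# Hypothetical helper: split arguments into grep options (leading "-") and the rest.
split_args() {
    opts=''
    rest=''
    for arg in "$@"; do
        case $arg in
            -*) opts="$opts $arg" ;;   # treated as an option for grep
            *)  rest="$rest $arg" ;;   # pattern or file name
        esac
    done
    # Trim the leading space added by the concatenation above.
    opts=${opts# }
    rest=${rest# }
}

split_args -F --ignore-case truck a.pdf
printf 'options: %s\nothers: %s\n' "$opts" "$rest"
```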

Re: [Trisquel-users] finding particular pages within PDFs

2014-08-29 Thread adel . afzal
MB, having the text would be way more useful than the PDF pages! Thanks for recommending pdftotext and the -layout option. I have some questions -- could you help me break this process down into smaller steps? I looked up pdfjam's split command online -- I think that it may be a little

Re: [Trisquel-users] finding particular pages within PDFs

2014-08-29 Thread legimet . calc
'for file in pages/*' is a for loop. That means that it will execute the body of the loop once for each file in the directory pages/, setting the variable file to the filename each time. 'if pdftotext $file - | grep -i regexps': the 'pdftotext $file -' part outputs the text of the pdf to
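The loop-plus-pipeline pattern can be tried without any PDFs. In this sketch, plain-text files stand in for the per-page PDFs and 'cat' stands in for 'pdftotext "$file" -'. The file names, the pattern and the added -q option are assumptions for illustration.

```shell
# Throwaway directory with plain-text stand-ins for the per-page files.
# In the thread the files are PDFs read with 'pdftotext "$file" -';
# 'cat' is used here only so the sketch runs without any PDFs.
workdir=$(mktemp -d)
mkdir "$workdir/pages"
printf 'A Green Truck\n' > "$workdir/pages/page-1.txt"
printf 'nothing relevant\n' > "$workdir/pages/page-2.txt"

matches=''
for file in "$workdir"/pages/*; do
    # The pipeline's exit status is grep's: 0 if a line matched, so 'if' succeeds.
    # -i ignores case, -q suppresses output.
    if cat "$file" | grep -i -q 'truck'; then
        matches="$matches $(basename "$file")"
    fi
done
matches=${matches# }

printf '%s\n' "$matches"
```

Quoting "$file" keeps the loop working when a file name contains spaces.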

[Trisquel-users] finding particular pages within PDFs

2014-08-28 Thread adel . afzal
Is there a way to search PDF files for keywords, and then create new PDFs that contain only the pages that contain those keywords? I'd like to search the PDFs for words with various combinations of "and" and "or". Is this possible?