Re: [Orgmode] Searching inside of attachments (pdf, odt)?
Hi,

On 13.10.09 19:09, Samuel Wales wrote:

Have you tried the agenda search feature yet? If not, perhaps trying it first will help ground the discussion.

OK, I had another look at the org agenda search feature, and I agree that it would be much smarter to use the org features that are already implemented, to go the org way. But I must confess I do not know how to push the attachments to pdf2txt, make the org agenda search the resulting text files, and link back to the corresponding org task in the org file. Any ideas?

Karl

___
Emacs-orgmode mailing list
Remember: use `Reply All' to send replies to the list.
Emacs-orgmode@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-orgmode
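One way to approach the "link back to the corresponding org task" part is to recover the entry ID from the attachment's path. The sketch below is only an illustration and rests on an assumption: that attachments live in a directory named after the entry's ID, split as <first two characters>/<rest of the ID>. Check that against your own attachment setup before relying on it. The function names are hypothetical.

```python
from pathlib import PurePosixPath


def id_from_attachment_path(path: str) -> str:
    """Recover the entry ID from an attachment path.

    Assumed layout (not verified against any particular Org version):
    <attach-root>/<first 2 chars of ID>/<rest of ID>/<file name>
    """
    parts = PurePosixPath(path).parts
    if len(parts) < 3:
        raise ValueError(f"path too short to contain an ID: {path}")
    prefix, rest = parts[-3], parts[-2]
    return prefix + rest


def org_id_link(path: str, description: str) -> str:
    """Build an Org-style id: link pointing at the entry that owns the file."""
    return f"[[id:{id_from_attachment_path(path)}][{description}]]"


if __name__ == "__main__":
    example = "data/38/7HJGJD78-758GZFHF87-JKHKJ57dfd9/article.pdf"
    print(org_id_link(example, "Title of the article"))
```

A text search over the extracted files would then report each hit together with such a link, so that following it jumps to the task.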
Re: [Orgmode] Searching inside of attachments (pdf, odt)?
Hi Samuel,

Samuel Wales samolog...@gmail.com wrote:

My idea is to use ordinary agenda search like this:

1) agenda search displays the headline that has the attachment.
2) org uses an alist to determine the correct textifier according to extension, e.g. '((.pdf . pdf2text)).
3) agenda searches normally (as if the contents of the attachment were body text).

Correct me if I'm wrong, but your approach is to search inside (an) already identified attachment(s)? I'd like to find attachments by searching inside the whole set of attachments.

I have many articles (pdf-files) to deal with. When I write a report on a particular topic, I have to find the articles that are relevant to the topic I'm working on at the moment.

If we use the standard textifiers, the procedure will probably get very slow if there are many attachments. I think using an index would be a good idea.

To describe what I'm looking for: my first step is to create an entry for each article, define tags (describing the content) and add some notes.

* Title of the article :tag:tag:tag:
  :PROPERTIES:
  :Attachments: article.pdf
  :ID: 387HJGJD78-758GZFHF87-JKHKJ57dfd9
  :END:
  - Very good explanation of X.
  - New view on Y.

But it would be much more powerful to be able not only to find an entry by searching for tags but also to search inside the attachments.

I'm not a programmer, so sorry if my ideas are stupid. ;-) But I think the following questions have to be answered:

1) Is there a tool like Lucene that can index pdf-files as they are stored by orgmode (directory structure)?
2) Is it possible to send a query to this tool from within emacs?
3) Is it possible to import the answer of the tool into emacs and combine it with orgmode so that the result looks somehow like this:

   Search string 'XX' found in file 'article.pdf' attached to task 'Title of the article'.

   A click on the name of the attachment should open the pdf-file in the pdf-reader; a click on the task name should show the task in the org-buffer.
Karl
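To make the index idea concrete, here is a toy inverted index in Python. It is not Lucene and makes no attempt at what Lucene does (ranking, stemming, persistence); it only shows why a lookup in an index is cheap compared with re-reading every attachment. It assumes the text has already been extracted from each PDF by some external tool, and all names in it are hypothetical. The result line follows the format asked for in question 3.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Build a word -> {(task title, attachment name)} mapping.

    documents: iterable of (task_title, attachment_name, extracted_text).
    """
    index = defaultdict(set)
    for title, attachment, text in documents:
        for word in tokenize(text):
            index[word].add((title, attachment))
    return index


def search(index, term):
    """Return one result line per attachment containing the single word term."""
    hits = sorted(index.get(term.lower(), ()))
    return [
        f"Search string '{term}' found in file '{attachment}' "
        f"attached to task '{title}'."
        for title, attachment in hits
    ]


if __name__ == "__main__":
    docs = [
        ("Title of the article", "article.pdf", "A very good explanation of X."),
        ("Another article", "other.pdf", "A new view on Y."),
    ]
    for line in search(build_index(docs), "explanation"):
        print(line)
```

Turning the file name and task title in each result line into clickable links would be the job of the Emacs side and is not covered here.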
Re: [Orgmode] Searching inside of attachments (pdf, odt)?
FWIW, I think this might be handled more easily if all that happened were a grep on the attachments or directories. The usual grep interface can be used, and then it becomes a fast, general-purpose data-mining extension. I can see it being used to search a codebase or website for a text string.

I guess it could be further refined with some kind of dispatcher, like the file dispatcher that invokes a specific tool to view an attachment, except that it uses an attachment-specific search, or defaults to grep if the file is not Emacs-editable. Possibly an extension of the current file:text-file::in buffer search that uses this grep (or whatever) if it comes up against something that is not Emacs-editable.

An added bonus of a search-dispatcher approach: it would give users the chance to extend the search to whatever tools and file formats they are using, without those having to become core to org.

Just my 2 eurocents' worth,
Tim.

2009/10/13 Karl Maihofer ignora...@gmx.de:

[...] I'd like to find attachments by searching inside the whole set of attachments. [...]
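The dispatcher idea can be sketched roughly as follows. This is an illustration in Python rather than anything that exists in org: a registry maps file extensions to search functions, and anything without a registered searcher falls back to a plain line-by-line regexp search. The registry and all function names are hypothetical.

```python
import re
from pathlib import Path


def grep_plain_text(path, pattern):
    """Default searcher: return (line number, line) pairs matching pattern."""
    regex = re.compile(pattern)
    hits = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if regex.search(line):
                hits.append((number, line.rstrip("\n")))
    return hits


# Hypothetical registry: file extension -> function(path, pattern) -> hits.
SEARCHERS = {}


def register_searcher(extension, function):
    """Let a user plug in a format-specific search without touching the core."""
    SEARCHERS[extension.lower()] = function


def search_file(path, pattern):
    """Dispatch on the file extension, defaulting to the plain-text search."""
    searcher = SEARCHERS.get(Path(path).suffix.lower(), grep_plain_text)
    return searcher(path, pattern)
```

A user with PDF attachments would register a function for ".pdf" that runs whatever extraction tool is installed and then searches its output; the core only needs to know the registry.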
Re: [Orgmode] Searching inside of attachments (pdf, odt)?
Hi,

My idea is to keep it simple at first. Everybody will come up with great ways to integrate with his favorite IR tool. Here I want to focus on the org interface.

The org interface can be the same as any other agenda search, with all the same controls.

The back end can use special-purpose textifiers like pdf2text (or whatever) or general-purpose textifiers from IR tools. Doesn't matter.

Later, the mechanism can get more fancy if desired. But first, we should implement existing behavior.

I often move things to attachments merely because they are large. I don't want search to work differently just because I did that. Search should IMO work the same as it does for outline bodies.

This includes regexp syntax. If we use anything other than Emacs, we risk one regexp syntax for attachments and another for outline bodies. That makes me shudder.

Later, we can use the fancier IR tools, or use reverse indexes. But not everybody has IR tools installed, and reverse indexes might be premature optimization.

If you're worried about speed, this is a perfect, simple application for caching. I'd try it before concluding that it is too slow. If it is, we have a good foundation into which we can hook your favorite IR.

I don't think there's a downside to achieving compatibility and full agenda integration first, then only after that doing the fancy stuff.

Have you tried the agenda search feature yet? If not, perhaps trying it first will help ground the discussion.
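The caching suggestion could look something like the sketch below, written in Python only for illustration. It assumes the textifier is some callable that takes a path and returns the extracted text (how it does that is left open), and it re-runs the textifier only when the file's modification time has changed. The class and method names are hypothetical. Note that it caches text only; the search itself would still run in Emacs, so the regexp syntax stays the same as for outline bodies.

```python
import os


class TextCache:
    """Cache textified attachment contents, keyed by path and modification time."""

    def __init__(self, textify):
        # textify: callable(path) -> str, e.g. a wrapper around a PDF-to-text tool.
        self._textify = textify
        self._entries = {}  # path -> (mtime in nanoseconds, extracted text)

    def text_for(self, path):
        """Return the extracted text, re-running the textifier only if needed."""
        mtime = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]
        text = self._textify(path)
        self._entries[path] = (mtime, text)
        return text
```

With such a cache, only the first search after an attachment changes pays the extraction cost; later searches read the stored text.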
[Orgmode] Searching inside of attachments (pdf, odt)?
Hi,

does anyone use something like Lucene[*] with orgmode to search inside attachments like pdf- and odt-files?

At the moment I use org for time-planning and a stand-alone Confluence wiki for knowledge management (which uses Lucene to index attachments). My knowledge management mainly consists of a large amount of pdf-files. If I could search inside attachments with org, I could perhaps switch to an Emacs-only solution. That would be awesome.

Kind regards,
Karl

[*] http://en.wikipedia.org/wiki/Lucene