Re: [Orgmode] Searching inside of attachments (pdf, odt)?

2009-10-14 Thread Karl Maihofer

Hi,

On 13.10.09 19:09, Samuel Wales wrote:

Have you tried the agenda search feature yet?  If not, perhaps trying
it first will help ground the discussion.


OK, I had another look at the org agenda search feature and I agree that 
it would be much smarter to use the already implemented org features - 
to go the org-way.


But I must confess I do not know how to push the attachments to pdf2txt, 
make the org agenda search the text-files and link back to the 
corresponding org task in the org-file.


Any ideas?

Karl


___
Emacs-orgmode mailing list
Remember: use `Reply All' to send replies to the list.
Emacs-orgmode@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-orgmode


Re: [Orgmode] Searching inside of attachments (pdf, odt)?

2009-10-13 Thread Karl Maihofer

Hi Samuel,

Samuel Wales samolog...@gmail.com wrote:

My idea is to use ordinary agenda search like this:
  1) agenda search displays the headline that has the
     attachment.
  2) org uses an alist to determine the correct textifier
     according to extension.  e.g. '((.pdf . pdf2text)).
  3) agenda searches normally (as if the contents of the
     attachment were body text).
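
A minimal sketch of the extension-to-textifier table from step 2, written in Python purely for illustration (org itself would use an Emacs Lisp alist). The table contents are assumptions: "pdf2text" is just the name used in this thread, and "odt2txt" stands in for whatever converter is installed. The function only builds the command line; it does not run anything.

```python
from pathlib import Path

# Hypothetical table: file extension -> command that prints plain text on stdout.
# Replace the command names with the converters actually installed.
TEXTIFIERS = {
    ".pdf": ["pdf2text"],
    ".odt": ["odt2txt"],
}

def textifier_command(attachment):
    """Return the argv list for textifying `attachment`, or None if no textifier is known."""
    ext = Path(attachment).suffix.lower()
    command = TEXTIFIERS.get(ext)
    if command is None:
        return None
    return command + [str(attachment)]
```

The returned list could be handed to a subprocess call, and the resulting text searched exactly like body text, which is the point of step 3.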


Correct me if I'm wrong, but your approach is to search inside one or
more attachments that have already been identified?

I'd like to find attachments by searching inside the whole set of
attachments. I have many articles (PDF files) to deal with. When I
write a report on a particular topic, I have to find the articles that
are relevant to the topic I'm working on at the moment.

If we use the standard textifiers the procedure will probably get very
slow if there are many attachments. I think using an index would be a
good idea.
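
To make the index idea concrete, here is a rough sketch of a tiny inverted index in Python. It assumes the attachments have already been converted to plain text by some textifier; the function names `build_index` and `lookup` are made up for this example, and a real tool like Lucene does far more (ranking, stemming, persistence).

```python
import re
from collections import defaultdict

def build_index(texts):
    """Map each lowercased word to the set of attachment names containing it.

    `texts` maps attachment name -> already extracted plain text.
    """
    index = defaultdict(set)
    for name, text in texts.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return index

def lookup(index, query):
    """Return the sorted attachment names that contain every word of `query`."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)
```

The index is built once, so each query costs a few dictionary lookups instead of re-reading every attachment.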

To describe what i'm looking for:
My first step is to create an entry for each article, define tags
(describing the content) and add some notes.

* Title of the article   :tag:tag:tag:
  :PROPERTIES:
  :Attachments: article.pdf
  :ID: 387HJGJD78-758GZFHF87-JKHKJ57dfd9
  :END:
  - Very good explanation of X.
  - New view on Y.

But it would be much more powerful to be able not only to find an
entry by searching for tags but to search inside the attachments.

I'm not a programmer, so sorry if my ideas are stupid. ;-) But I think
the following questions have to be answered:

1) Is there a tool like Lucene that can index pdf-files as they are
   stored by orgmode (directory structure)?
2) Is it possible to send a query to this tool from within emacs?
3) Is it possible to import the answer of the tool into emacs and
   combine it with orgmode so that the result looks somehow like this:
   Search string 'XX' found in file 'article.pdf' attached to task
   'Title of the article'. A click on the name of the attachment
   should open the pdf-file in the pdf-reader; a click on the task
   name should show the task in the org-buffer.
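
The result line described in question 3 could be produced with ordinary org links, roughly as sketched below in Python. The sketch assumes the search tool already reports, for each hit, the attachment path and the ID of the entry it belongs to; `format_hit` and its parameters are hypothetical names. It relies only on org's `file:` and `id:` link types, so which program opens the PDF depends on the local org configuration.

```python
def format_hit(query, attachment_path, attachment_name, entry_id, entry_title):
    """Return one org-formatted result line with two clickable links."""
    return (
        f"Search string '{query}' found in file "
        f"[[file:{attachment_path}][{attachment_name}]] attached to task "
        f"[[id:{entry_id}][{entry_title}]]"
    )
```

Inserted into an org buffer, the first link opens the attachment and the second jumps to the entry carrying that ID.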

Karl








Re: [Orgmode] Searching inside of attachments (pdf, odt)?

2009-10-13 Thread Tim O'Callaghan
FWIW

I think this would be easier to handle if the search were simply a
grep over the attachments or their directories.

The usual grep interface can be used and then it becomes a fast
general purpose data mining extension.

I can see it being used to search a codebase or website for a text string.

I guess it could be further refined with some kind of dispatcher,
like the file dispatcher that invokes a specific tool to view an
attachment, except that it would invoke an attachment-specific search
and default to grep if the file is not Emacs-editable.

Possibly it could be an extension of the current
file:text-file::in-buffer search that falls back to grep (or whatever)
when it comes up against something that is not Emacs-editable.

An added bonus of a search dispatcher type approach: it would give
users the chance to extend the search into whatever tool(s)/file
format(s) they are using without having to become core to org.
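
A small sketch of what such a search dispatcher might look like, in Python for illustration only. All names (`SEARCHERS`, `register`, `grep_search`, `search`) are invented for this example; the fallback imitates grep by matching a regexp against each line of a text file.

```python
import re
from pathlib import Path

SEARCHERS = {}  # extension -> search function, filled in through register()

def register(extension):
    """Decorator that lets users plug in a searcher for one file extension."""
    def decorator(func):
        SEARCHERS[extension.lower()] = func
        return func
    return decorator

def grep_search(path, pattern):
    """Default searcher: return the lines of a text file that match `pattern`."""
    regex = re.compile(pattern)
    with open(path, encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in handle if regex.search(line)]

def search(path, pattern):
    """Use the searcher registered for the file's extension, else the grep-like default."""
    searcher = SEARCHERS.get(Path(path).suffix.lower(), grep_search)
    return searcher(path, pattern)
```

A user could then add support for a new format with `@register(".pdf")` on their own function, without touching the dispatcher itself.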

Just my 2eurocents worth:

Tim.


Re: [Orgmode] Searching inside of attachments (pdf, odt)?

2009-10-13 Thread Samuel Wales
Hi,

My idea is to keep it simple at first.  Everybody will come
up with great ways to integrate with his favorite IR tool.

Here I want to focus on the org interface.

The org interface can be the same as any other agenda
search, with all the same controls.  The back end can use
special-purpose textifiers like pdf2text (or whatever) or
general-purpose textifiers from IR tools.  Doesn't matter.

Later, the mechanism can get more fancy if desired.  But
first, we should implement existing behavior.  I often move
things to attachments merely because they are large.  I
don't want search to work differently just because I did
that.  Search should IMO work the same as it does for
outline bodies.

This includes regexp syntax.  If we use anything other than
Emacs, we risk one regexp syntax for attachments and another
for outline bodies.  That makes me shudder.

Later, we can use the fancier IR tools, or use reverse
indexes.  But not everybody has IR tools installed, and
reverse indexes might be premature optimization.

If you're worried about speed, this is a perfect, simple
application for caching.  I'd try it before concluding that
it is too slow.  If it is, we have a good foundation into
which we can hook your favorite IR.
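
As a rough illustration of the caching idea, here is a sketch in Python that keeps the extracted text in memory and re-runs the textifier only when the attachment's modification time changes. The names `cached_text` and `textify` are hypothetical; `textify` stands for any function that turns a file path into plain text.

```python
import os

_cache = {}  # path -> (modification time, extracted text)

def cached_text(path, textify):
    """Return textify(path), calling it again only if the file's mtime has changed."""
    mtime = os.path.getmtime(path)
    entry = _cache.get(path)
    if entry is not None and entry[0] == mtime:
        return entry[1]
    text = textify(path)
    _cache[path] = (mtime, text)
    return text
```

With this in place, the first search pays the conversion cost once per attachment and later searches run on the cached text.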

I don't think there's a downside to achieving compatibility
and full agenda integration first, then only after that
doing the fancy stuff.

Have you tried the agenda search feature yet?  If not, perhaps trying
it first will help ground the discussion.




[Orgmode] Searching inside of attachments (pdf, odt)?

2009-10-12 Thread Karl Maihofer

Hi,

Does anyone use something like Lucene[*] with orgmode to search inside
attachments such as pdf- and odt-files? At the moment I use org for
time-planning and a stand-alone Confluence wiki for knowledge
management (which uses Lucene to index attachments). My knowledge
management mainly consists of a large number of pdf-files. If I could
search inside attachments with org, I could perhaps switch to an
Emacs-only solution. That would be awesome.


Kind regards,
Karl

[*] http://en.wikipedia.org/wiki/Lucene





