[O] building tagcloud datastructure in elisp

2012-09-12 Thread Marcelo de Moraes Serpa
Hi list,

How hard would it be to parse a bunch of org files and build an elisp data
structure (Hash?) that represents a tagcloud? All tags in all headlines and
subtrees should be taken into account (for all org files that are parsed).
Could I use org-element to help me parse this or is there a better way?

I'm just learning the org API, and I've only done a bunch of elisp hacks,
so any insight would be greatly appreciated!

Thanks,

- Marcelo.


Re: [O] building tagcloud datastructure in elisp

2012-09-12 Thread Eric Schulte
Marcelo de Moraes Serpa <celose...@gmail.com> writes:

> Hi list,
>
> How hard would it be to parse a bunch of org files and build an elisp data
> structure (Hash?) that represents a tagcloud? All tags in all headlines and
> subtrees should be taken into account (for all org files that are parsed).
> Could I use org-element to help me parse this or is there a better way?
>
> I'm just learning the org API, and I've only done a bunch of elisp hacks,
> so any insight would be greatly appreciated!
>
> Thanks,
>
> - Marcelo.

My favorite method of getting word frequencies from text files is the
following.  Sometimes it is easier to just treat Org-mode files as text
files rather than to use elisp.

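# -*- shell-script -*-
many=20   # how many of the most popular words to print
tr -cs A-Za-z '\n' < org-file.org |    # one word per line
  tr A-Z a-z |                          # lowercase everything
  sort |
  uniq -c |                             # "  count word"
  sort -rn |                            # most frequent first
  sed "${many}q" |                      # keep the top $many lines
  sed 's/^ *//' |                       # strip the padding uniq -c adds
  sed 's/\([^ ]*\) \([^ ]*\)/\2:\1/' |  # "count word" -> "word:count"
  paste -s -d ' ' -                     # join into one space-separated line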

Adapted from http://www.leancrew.com/all-this/2011/12/more-shell-less-egg/
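Along the same lines, a pipeline that counts only the tags on Org
headlines (rather than every word) might look like the sketch below.
org_tag_freq is a hypothetical helper, not part of Org; it assumes tags
sit in the usual :tag1:tag2: group at the end of a headline and are made
of letters, digits, _, @, # and %, and it ignores tag inheritance and
#+FILETAGS.

#+begin_src sh
# Hypothetical helper (not part of Org): count the tags written on Org
# headlines in the given files (or stdin).  Direct tags only: no tag
# inheritance, no #+FILETAGS.  Prints "tag:count", most frequent first.
org_tag_freq() {
  # Keep headlines that end in a ":tag1:tag2:" group; reduce each to it.
  grep -hE '^\*+ .*[[:space:]]:[[:alnum:]_@#%:]+:[[:space:]]*$' "$@" |
    sed -E 's/.*[[:space:]](:[[:alnum:]_@#%:]+:)[[:space:]]*$/\1/' |
    tr ':' '\n' |          # one tag per line, plus empty fields
    grep -v '^$' |         # drop the empty fields
    sort |
    uniq -c |              # "  count tag"
    sort -k1,1rn -k2,2 |   # most frequent first, ties alphabetically
    awk '{ print $2 ":" $1 }'
}
#+end_src

For example, org_tag_freq ~/org/*.org prints one tag:count pair per
line; piping that through paste -s -d ' ' - gives the single-line
word:count format used by the word-count pipeline.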

Best,

-- 
Eric Schulte
http://cs.unm.edu/~eschulte



Re: [O] building tagcloud datastructure in elisp

2012-09-12 Thread Jonathan Leech-Pepin
Hello Marcelo,

On 12 September 2012 14:41, Marcelo de Moraes Serpa <celose...@gmail.com> wrote:
> Hi list,
>
> How hard would it be to parse a bunch of org files and build an elisp data
> structure (Hash?) that represents a tagcloud? All tags in all headlines and
> subtrees should be taken into account (for all org files that are parsed).
> Could I use org-element to help me parse this or is there a better way?
>
> I'm just learning the org API, and I've only done a bunch of elisp hacks, so
> any insight would be greatly appreciated!

I'm learning as well, mostly by providing a feature I could use, or by
seeing a problem I find interesting and deciding I want to find a
solution to it.

> Thanks,
>
> - Marcelo.

Org-element doesn't seem to include tag inheritance when providing the
tags of a given headline, so counting inherited tags becomes slightly
more complex.

The following should provide what you want:

#+begin_src emacs-lisp
(require 'cl-lib)
(require 'org-element)

(defun zin/org-tag-cloud-freq (&optional inherit file)
  "Return an alist containing tag and frequency.

When INHERIT is given, the frequency of a tag includes the number
of subheadings (to indicate tag inheritance).  FILE allows for an
arbitrary file to be retrieved and used for tag counting."
  (interactive "P")
  ;; Visit FILE (without switching windows) or use the current buffer.
  (with-current-buffer (if file (find-file-noselect file) (current-buffer))
    (let* ((source (org-element-parse-buffer 'headline))
           ;; One (TAGS COUNT) entry per headline.  :tags only holds the
           ;; tags written on the headline itself, not inherited ones.
           (tags (org-element-map source 'headline
                   (lambda (headline)
                     (let ((tags (org-element-property :tags headline))
                           (count (if inherit
                                      ;; The headline plus all its subheadings.
                                      (length (org-element-map headline
                                                  'headline #'identity))
                                    1)))
                       (list tags count)))))
           taglist
           result)
      ;; Expand each (TAGS COUNT) entry into (TAG COUNT) pairs...
      (setq taglist
            (mapcar (lambda (s)
                      (when (car s)
                        (cl-loop for item in (car s)
                                 collect (list item (cadr s)))))
                    tags))
      ;; ...and flatten them into one list.
      (setq taglist
            (cl-loop for item in taglist append item))
      ;; Sum the counts per tag into an alist of (TAG . FREQUENCY).
      (dolist (tag taglist)
        (let* ((tagitem (car tag))
               (tagcount (cadr tag))
               (sofar (assoc tagitem result)))
          (if sofar
              (setcdr sofar (+ tagcount (cdr sofar)))
            (push (cons tagitem tagcount) result))))
      (when (called-interactively-p 'any)
        (message "%s" result))
      result)))

(defun zin/org-tag-freq-list (files &optional inherit)
  "Return a single alist of tag counts summed over FILES.

Each file in FILES is processed by `zin/org-tag-cloud-freq'."
  (let (result)
    (dolist (file files result)
      (dolist (tag (zin/org-tag-cloud-freq inherit file))
        (let* ((tagitem (car tag))
               (tagcount (cdr tag))
               (sofar (assoc tagitem result)))
          (if sofar
              (setcdr sofar (+ tagcount (cdr sofar)))
            (push (cons tagitem tagcount) result)))))))
#+end_src

The dolist loop for counting the tags themselves comes from
http://stackoverflow.com/questions/6050033/elegant-way-to-count-items.
There may be a cleaner way to obtain the list of tags and associated
counts, but this provides the values.
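The alist bookkeeping in that loop (look the tag up with assoc, then
either add to the existing pair with setcdr or push a new one) is what
an associative array does in a single step.  For comparison with the
shell approach earlier in the thread, a hypothetical merge_tag_counts
helper could sum per-file tag:count lists the way zin/org-tag-freq-list
sums per-file alists.  It assumes one tag:count pair per line and tags
that contain no colons.

#+begin_src sh
# Hypothetical helper: merge "tag:count" lines (e.g. one list per Org
# file) into "tag:total" lines, most frequent first.  The awk array total
# plays the role of the alist: "+=" creates a missing entry or adds to an
# existing one, like the push and setcdr branches of the elisp loop.
merge_tag_counts() {
  awk -F: 'NF == 2 { total[$1] += $2 }
           END { for (t in total) print t ":" total[t] }' "$@" |
    sort -t: -k2,2rn -k1,1
}
#+end_src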

The first function works on the current Org buffer (or on FILE, if
given) and returns an alist of (TAG . COUNT) pairs, while the second
does the same for a list of Org files (for example org-agenda-files)
and sums the counts across them.
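With INHERIT set, zin/org-tag-cloud-freq counts each tagged headline
once for itself and once more for every subheading below it, at any
depth.  Outside Emacs, a comparable number can be approximated with awk
by remembering the tags of the ancestors still open at each outline
level.  This is a hypothetical sketch (org_tag_freq_inherited is not an
Org function) assuming headline tags are a trailing :tag1:tag2: group of
ASCII letters, digits, _, @, # and %.  It ignores #+FILETAGS, and it
counts a tag at most once per headline, so it can report smaller numbers
than the elisp when a subheading repeats a tag already set on an
ancestor.

#+begin_src sh
# Hypothetical helper: count each tag once per headline that carries it
# directly or inherits it from an ancestor headline.  "tag:count" output.
org_tag_freq_inherited() {
  awk '
    /^\*+ / {
      level = index($0, " ") - 1       # number of leading stars
      own = ""
      if (match($0, /[ \t]:[A-Za-z0-9_@#%:]+:[ \t]*$/)) {
        own = substr($0, RSTART + 1)   # ":tag1:tag2:" plus trailing blanks
        sub(/[ \t]+$/, "", own)
      }
      # A new headline closes every deeper subtree still open and
      # replaces the previous headline at its own level.
      for (l = level + 1; l <= maxlevel; l++) tags[l] = ""
      tags[level] = own
      maxlevel = level
      # Count each tag once per headline: own tags plus ancestor tags.
      split("", seen)                  # portable way to empty an array
      for (l = 1; l <= level; l++) {
        n = split(tags[l], parts, ":")
        for (i = 1; i <= n; i++) {
          t = parts[i]
          if (t != "" && !(t in seen)) {
            seen[t] = 1
            count[t]++
          }
        }
      }
    }
    END { for (t in count) print t ":" count[t] }
  ' "$@" | sort -t: -k2,2rn -k1,1
}
#+end_src

On a file where :work: sits on a top-level headline with three
subheadings below it, this reports work:4, matching what the INHERIT
option counts for that tree.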

I hope this helps.

Regards,

--
Jon