official orgmode parser

2020-09-15 Thread Przemysław Kamiński

Hello,

I oftentimes find myself needing to parse org files with some external 
tools (to generate reports for customers or sum up clock times for given 
month, etc). Looking through the list


https://orgmode.org/worg/org-tools/

and having tested some of these, I must say they are lacking. The 
Haskell ones seem to be done best, but then the compile overhead of 
Haskell and difficulty in embedding this into other languages is a drawback.


I think it might benefit the community when such an official parser 
would exist (and maybe could be hooked into org mode directly).


I was thinking picking some scheme like chicken or guile, which could be 
later easily embedded into C or whatever. Then use that parser in org 
mode itself. This way some important part of org mode would be outside 
of the small world of elisp.


This is just an idea, what do you think? :)

Best,
Przemek



Re: official orgmode parser

2020-09-15 Thread Przemysław Kamiński

On 9/15/20 11:03 AM, Tim Cross wrote:


Przemysław Kamiński  writes:


Hello,

I oftentimes find myself needing to parse org files with some external
tools (to generate reports for customers or sum up clock times for given
month, etc). Looking through the list

https://orgmode.org/worg/org-tools/

and having tested some of these, I must say they are lacking. The
Haskell ones seem to be done best, but then the compile overhead of
Haskell and difficulty in embedding this into other languages is a drawback.

I think it might benefit the community when such an official parser
would exist (and maybe could be hooked into org mode directly).

I was thinking picking some scheme like chicken or guile, which could be
later easily embedded into C or whatever. Then use that parser in org
mode itself. This way some important part of org mode would be outside
of the small world of elisp.

This is just an idea, what do you think? :)



The problem with this idea is maintenance. It is also partly why
external tools are not terribly reliable/good. Org mode is constantly
being enhanced and improved. It is very hard for external tools to keep
pace with org-mode development, so they soon get out of date or stop
working correctly.

Org mode IS an elisp application. This is the main goal. The reason it
works so well is because elisp is largely a DSL that focuses on text
manipulation and is therefore ideally suited for a text based organiser.

This means if you want to implement parsing of org files in any
other language, there is a lot of fundamental functionality which will
need to be implemented that is not necessary when using elisp as it is
already built-in. Not only that, it is also 'battle hardened' and well
tested. The other problem would be in selecting another language which
behaves consistently across all the platforms Emacs and org-mode are
supported on. As org-mode is a standard part of Emacs, it also needs to
be implemented in something which is also available on all the platforms
emacs is on without needing the user to install additional software.

The other issue is that you would need another skill in order to
maintain/extend org-mode. In addition to elisp, you will also need to
know whatever the parser implementation language is.

A third negative is that if the parser was in a different language to
elisp, the interface between the rest of org mode (in elisp) and the
parser would become an issue. At the moment, there are far fewer
barriers as it is all elisp. However, if part of the system is in
another language, you are now restricted to whatever defined interface
exists. This would likely also have performance issues and overheads
associated with translating from one format to another etc.

So, in short, the chances of org mode using a parser written in
something other than elisp is pretty close to 0. This leaves you with 2
options -

1. Implement another external tool which can parse org-files. As
mentioned above, this is a non-trivial task and will likely be difficult
to maintain. Probably not the best first choice.

2. Provide some details about your workflow where you believe you need
to use external tools to process the org-files. It is very likely there
are alternative approaches to give you the result you want, but without
the need to do external parsing of org-files. There aren't sufficient
details in the examples you mention to provide any specific details.
However, I have used org-mode for reporting, invoicing, time tracking,
documentation, issue/request tracking, project planning and project
management and never needed to parse my org files with an external tool.
I have exported the data in different formats which have then been
processed by other tools and I have tweaked my setup to support various
enterprise/corporate standards or requirements (logos, corporate
colours, report formats, etc). Sometimes these tweaks are trivial and
others require more extensive effort. Often, others have had to do
something the same or similar and have working examples etc.

So my recommendation is post some messages to this list with details on
what you need to try and do and see what others can suggest. I would
keep each post to a single item rather than one long post with multiple
requests. From watching this list, I've often seen someone post a "How
can I ..." question only to get the answer "Oh, that is already
built-in, just do .". Org is a large application with lots of
sophisticated power that isn't always obvious from just reading the
manual.




So, I keep clock times for work in org mode, this is very handy. 
However, my customers require that I use their service to provide the 
times. They do offer API. So basically I'm using elisp to parse org, 
make API calls, and at the same time generate CSV reports with a Python 
interop with org babel (because my elisp is just too bad to do that). If 
I had access to some org parser, I'd pick a language that would be more 
comfortable f

Re: official orgmode parser

2020-09-15 Thread Przemysław Kamiński

On 9/15/20 11:55 AM, Russell Adams wrote:

On Tue, Sep 15, 2020 at 11:17:57AM +0200, Przemysław Kamiński wrote:

Org mode IS an elisp application. This is the main goal. The reason it
works so well is because elisp is largely a DSL that focuses on text
manipulation and is therefore ideally suited for a text based organiser.


So, I keep clock times for work in org mode, this is very handy.
However, my customers require that I use their service to provide the
times. They do offer API. So basically I'm using elisp to parse org,
make API calls, and at the same time generate CSV reports with a Python
interop with org babel (because my elisp is just too bad to do
that).


Please consider this is a very specialized use case.


If I had access to some org parser, I'd pick a language that would
be more comfortable for me to get the job done. I guess it can all
be done in elisp, however this is just a tool for me alone and I
have limited time resources on hacking things for myself :)


Maintainer time is limited too. Maintaining a parser library outside
of Emacs would be difficult for the reasons already given. I'd
encourage you to pick up some more Elisp, which I am also trying to
do.


Anyways, my parser needs aren't that sophisticated: just parse the file,
return headings with clock drawers. I tried the common lisp library but
got frustrated after fiddling with it for couple of hours.


If it's that small you could always do that in Python with regexps for
your usage if you're more comfortable in Python. Org's plain text
format means you can read it with anything. I suspect grep might even
pull headlines and clocks successfully.



I haven't looked at the elisp parser much, but I do wonder if someone
couldn't write an exporter that exports a programmatic version of your
org file data (ie: to xml). Then other tools could ingest those xml
files. That'd certainly be a contrib module and not in the core, but
might be worth your while to explore the idea if you really want to
work with Org data outside of Emacs.


--
Russell Adams                            rlad...@adamsinfoserv.com

PGP Key ID: 0x1160DCB3   http://www.adamsinfoserv.com/

Fingerprint: 1723 D8CA 4280 1EC9 557F  66E8 1154 E018 1160 DCB3



There's the org-json (or ox-json) package but for some reason I wasn't 
able to run it successfully. I guess export to S-exps would be best 
here. But yes I'll check that out.


Przemek



Re: official orgmode parser

2020-09-23 Thread Przemysław Kamiński

On 9/23/20 10:09 AM, Bastien wrote:

Hi Przemysław,

Przemysław Kamiński  writes:


I oftentimes find myself needing to parse org files with some external
tools (to generate reports for customers or sum up clock times for
given month, etc). Looking through the list

https://orgmode.org/worg/org-tools/


Can you help on making the above page more useful to anyone?

Perhaps we can have a separate worg page just for parsers, reporting
the ones that seem to fully work.

I disagree that a parser is too difficult to maintain because Org is
a moving target.  Org core syntax is not moving anymore, a parser can
reasonably target it.  That's what is done with the Ruby parser, in
use in this small project called github.com :)

So I'd say:

- let's enhance Worg's documentation
- yes, please go for enhancing parsing tools

I don't think we need official tools.  The official Org parser exists,
it is Org itself.

Thanks,



Hello Bastien,

Thank you for your remarks.

I updated the README, hopefully it's more usable now.

Przemek



Re: official orgmode parser

2020-09-16 Thread Przemysław Kamiński

On 9/16/20 9:56 AM, Ihor Radchenko wrote:

Wow, another awesomewm user here; could you share your code?


Are you interested in something particular about awesome WM integration?

I am using simple textbox widgets to show currently clocked in task and
weighted summary of clocked time. See the attachments.

Best,
Ihor




Marcin Borkowski  writes:


On 2020-09-15, at 11:17, Przemysław Kamiński  wrote:


So, I keep clock times for work in org mode, this is very
handy. However, my customers require that I use their service to
provide the times. They do offer API. So basically I'm using elisp to
parse org, make API calls, and at the same time generate CSV reports
with a Python interop with org babel (because my elisp is just too bad
to do that). If I had access to some org parser, I'd pick a language
that would be more comfortable for me to get the job done. I guess it
can all be done in elisp, however this is just a tool for me alone and
I have limited time resources on hacking things for myself :)


I was in the exact same situation - I use Org-mode clocking, and we use
Toggl at our company, so I wrote a simple tool to fire API requests to
Toggl on clock start/cancel/end: https://github.com/mbork/org-toggl
It's a bit more than 200 lines of Elisp, so you might try to look into
it and adapt it to whatever tool your employer is using.


Another one is generating total hours report for day/week/month to put
into my awesomewm toolbar. I ended up using orgstat
https://github.com/volhovM/orgstat
however the author is creating his own DSL in YAML and I guess things
were much better off if it all stayed in some Scheme :)


Wow, another awesomewm user here; could you share your code?

Best,

--
Marcin Borkowski
http://mbork.pl



I don't have interesting code, just standard awesomewm setup. I run 
periodic script to output data computed by orgstat and show it in the 
taskbar (uses the shellout_widget).


However what Ihor presented is interesting. Do you use similar approach 
with shellout and 'emacs -batch' to show currently running task or you 
'push' data from emacs to show it in the taskbar?


P.



Re: official orgmode parser

2020-09-17 Thread Przemysław Kamiński

On 9/17/20 3:18 AM, Ihor Radchenko wrote:

So basically this is what this thread is about. One needs a working
Emacs instance and work in "push" mode to export any Org data. This
requires dealing with temporary files, as described above, and some
ad-hoc formats to keep whatever data I need to pull from org.



"Pull" mode would be preferred. I could then, say, write a script in
Guile, execute 'emacs -batch' to export org data (I'm ok with that),
then parse the S-expressions to get what I need.


My choice to use "push" mode is just for performance reasons. Nothing
prevents you from writing a function called from emacs --batch that
converts parsed org data into whatever format your Guile script prefers.
That function may be either on Emacs side or on Guile side. Probably,
Emacs has more capabilities when dealing with s-expressions though.

You can even directly push the information from Emacs to API server.
You may find https://github.com/tkf/emacs-request useful for this task.

Finally, you may also consider clock tables to create clock summaries
using existing org-mode functionality. The tables can be named and
accessed using any programming language via babel.

Best,
Ihor


Przemysław Kamiński  writes:


On 9/16/20 2:02 PM, Ihor Radchenko wrote:

However what Ihor presented is interesting. Do you use similar approach
with shellout and 'emacs -batch' to show currently running task or you
'push' data from emacs to show it in the taskbar?


I prefer to avoid querying emacs too often for performance reasons.
Instead, I only update the clocking info when I clock in/out in emacs.
Then, the clocked in time is dynamically updated by independent bash
script.

The scheme is the following:
1. org clock in/out in Emacs trigger writing clocking info into
 ~/.org-clock-in status file
2. bash script periodically monitors the file and calculates the clocked
 in time according to the contents and time from last modification
3. the script updates simple textbox widget using awesome-client
4. the script also warns me (notify-send) when the weighted clocked in
 time is negative (meaning that I should switch to some more
 productive activity)
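
Step 2 of the scheme above could look roughly like the following. This is only a sketch in Python rather than bash, and it assumes a status-file layout that the thread does not spell out: the file holds just the task name, and its modification time marks the clock-in. The function name `clock_status` is hypothetical.

```python
import os
import time


def clock_status(path, now=None):
    """Return 'task (H:MM)' for the status file, or '' if nothing is clocked in."""
    try:
        with open(path, encoding="utf-8") as fh:
            task = fh.read().strip()
        started = os.path.getmtime(path)  # assumed to be the clock-in time
    except FileNotFoundError:
        return ""
    if not task:
        return ""
    now = time.time() if now is None else now
    minutes = max(0, int(now - started)) // 60
    return f"{task} ({minutes // 60}:{minutes % 60:02d})"
```

The returned string is what a periodic script would hand to the taskbar widget. How it is delivered (awesome-client, shellout, or something else) is left out here.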

Best,
Ihor



OK so this is what I got so far
https://gitlab.com/cgenie/org-parse
I stole the simple test.org file from ox-json test suite.
Guile seems 

Re: official orgmode parser

2020-09-16 Thread Przemysław Kamiński

On 9/15/20 2:37 PM, to...@tuxteam.de wrote:

On Tue, Sep 15, 2020 at 01:15:56PM +0200, Przemysław Kamiński wrote:

[...]


There's the org-json (or ox-json) package but for some reason I
wasn't able to run it successfully. I guess export to S-exps would
be best here. But yes I'll check that out.


If that's your route, perhaps the "Org element API" [1] might be
helpful. Especially `org-element-parse-buffer' gives you a Lisp
data structure which is supposed to be a parse of your Org buffer.

From there to S-expression can be trivial (e.g. `print' or `pp'),
depending on what you want to do.

Walking the structure should be nice in Lisp, too.

The topic of (non-Emacs) parsing of Org comes up regularly, and
there is a good (but AFAIK not-quite-complete) Org syntax spec
in Worg [2], but there are a couple of difficulties to be mastered
before such a thing can become really enjoyable and useful.

The loose specification of Org's format (arguably its second
or third strongest asset, the first two being its incredible
community and Emacs itself) is something which makes this
problem "interesting". People have invented lots of usages
which might be broken should Org change to a strict formal
spec. You don't want to break those people.

But yes, perhaps some day someone nails it. Perhaps it's you :)

Cheers

[1] https://orgmode.org/worg/dev/org-element-api.html
[2] https://orgmode.org/worg/dev/org-syntax.html

  - t



So I looked at (pp (org-element-parse-buffer)) however it does print out 
recursive stuff which other schemes have trouble parsing.


My code looks more or less like this:

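(defun org-parse (f)
  ;; Parse the Org file F and pretty-print its element tree.
  (with-temp-buffer
    (find-file f)
    (let* ((parsed (org-element-parse-buffer))
           (all (append org-element-all-elements org-element-all-objects))
           ;; strip-parent (described below) clears each item's :parent.
           (mapped (org-element-map parsed all
                     (lambda (item)
                       (strip-parent item)))))
      (pp mapped))))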


strip-parent is basically (plist-put props :parent nil) for elements 
properties. However it turns out there are more recursive objects, like


:title
  #("Headline 1" 0 10
(:parent
 (headline #2
   (section

So I'm wondering do I have to do it by hand for all cases or is there 
some way to output only a simple AST without those nested objects?


Best,
Przemek



Re: official orgmode parser

2020-09-16 Thread Przemysław Kamiński

On 9/16/20 2:02 PM, Ihor Radchenko wrote:

However what Ihor presented is interesting. Do you use similar approach
with shellout and 'emacs -batch' to show currently running task or you
'push' data from emacs to show it in the taskbar?


I prefer to avoid querying emacs too often for performance reasons.
Instead, I only update the clocking info when I clock in/out in emacs.
Then, the clocked in time is dynamically updated by independent bash
script.

The scheme is the following:
1. org clock in/out in Emacs trigger writing clocking info into
~/.org-clock-in status file
2. bash script periodically monitors the file and calculates the clocked
in time according to the contents and time from last modification
3. the script updates simple textbox widget using awesome-client
4. the script also warns me (notify-send) when the weighted clocked in
time is negative (meaning that I should switch to some more
productive activity)

Best,
Ihor


So basically this is what this thread is about. One needs a working 
Emacs instance and work in "push" mode to export any Org data. This 
requires dealing with temporary files, as described above, and some 
ad-hoc formats to keep whatever data I need to pull from org.


"Pull" mode would be preferred. I could then, say, write a script in 
Guile, execute 'emacs -batch' to export org data (I'm ok with that), 
then parse the S-expressions to get what I need.
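
The consuming side of such a "pull" setup might be sketched as follows, in Python rather than Guile. It assumes the Emacs side has already printed plain data (lists, strings, integers, symbols) with no propertized strings or circular references; the names `Symbol` and `read_sexp` are illustrative only.

```python
import re

# One token: a paren, a double-quoted string, or a bare atom.
TOKEN_RE = re.compile(r'''\s*(?:([()])|"((?:[^"\\]|\\.)*)"|([^\s()"]+))''')


class Symbol(str):
    """A Lisp symbol or keyword, kept distinct from string literals."""


def read_sexp(text):
    """Parse S-expressions into nested Python lists; returns the top-level forms."""
    stack = [[]]  # stack[0] collects top-level forms
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        pos = m.end()
        paren, string, atom = m.groups()
        if paren == "(":
            stack.append([])
        elif paren == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        elif string is not None:
            # Undo backslash escapes such as \" and \\.
            stack[-1].append(re.sub(r"\\(.)", r"\1", string))
        elif re.fullmatch(r"-?\d+", atom):
            stack[-1].append(int(atom))
        else:
            stack[-1].append(Symbol(atom))
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]
```

The input text would come from the standard output of an 'emacs -batch' run. That invocation is not shown, since the export function on the Emacs side is up to the user.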


P.



Re: official orgmode parser

2020-10-26 Thread Przemysław Kamiński
I'm no expert in parsing but I would expect org's parser to be quite 
similar to the multitude of markdown or CommonMark [1] parsers. There 
isn't that much difference in syntax, except maybe org is more versatile 
and has more syntax elements, like drawers.


Searching for "EBNF Markdown" I stumbled upon [2].

[1] https://commonmark.org/
[2] http://roopc.net/posts/2014/markdown-cfg/

On 10/26/20 10:00 PM, Tom Gillespie wrote:

Here is an attempt to clarify my own confusion around the nested
structures in org. In short: each node in the headline tree and the
plain list tree can be parsed using the EBNF, the nesting level cannot,
which means that certain useful operations such as folding, require
additional rules beyond the grammar. More in line. Best!
Tom


Do you need to? This is valid as an entire Org file, I think:

*** foo
* bar
* baz

And that can be represented in EBNF. I'm not aware of places where behavior is 
indent-level specific, except inline tasks, and that edge case can be 
represented.


You are correct, and as long as the heading depth doesn't change some
interpretation then this is a non-issue. The reason I mentioned this
though is because it means that you cannot determine how to correctly
fold an org file from the grammar alone.

To make sure I understand. It is possible to determine the number of
leading stars (and thus the level), but I think that it is not
possible to identify the end of a section.
For example

* a
*** b
** c
* d

You can parse out a 1, b 3, c 2, d 1, but if you want to be able to
nest b and c inside a but not nest d inside a, then you need a stack
in there somewhere. You can't have a rule such as

section : headline content
content : text | section

because the parse would incorrectly nest sections at the same level,
you would have to write

section-level-1 : headline-1 content-1
content-1 : text | section-level-2-n

but since we have an arbitrary number of levels the grammar would have
to be infinite. This is only if you want your grammar to be able to
encode that the content of sections can include other more deeply
nested sections, which in this context we almost certainly do not (as
you point out).
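
The "stack in there somewhere" can be sketched as a post-processing step over the flat headline list. This is only an illustration of the idea, not how Org itself does it; `nest_headlines` is a hypothetical name.

```python
import re

HEADLINE_RE = re.compile(r"^(\*+)\s+(.*)$")


def nest_headlines(text):
    """Build a tree of {'level', 'title', 'children'} dicts from Org text."""
    root = {"level": 0, "title": None, "children": []}
    stack = [root]  # path from the root to the most recent headline
    for line in text.splitlines():
        m = HEADLINE_RE.match(line)
        if m is None:
            continue  # body text is ignored in this sketch
        node = {"level": len(m.group(1)), "title": m.group(2), "children": []}
        # Close every open section at the same or a deeper level.
        while stack[-1]["level"] >= node["level"]:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]
```

For the a/b/c/d example above this nests b and c under a and leaves d at the top level, even though b skips a level.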


There is a similar issue with the indentation level in
order to correctly interpret plain lists.


list ::= ('+' string newline)+ sublist?
sublist ::= (indent list)+

I think this captures lists?


Ah yes, I see my mistake here. In order for this to work the parser
has to implement significant whitespace, so whitespace cannot be
parsed into a single token. I think everything works out after that.
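
One common way to implement significant whitespace is to have the tokenizer turn changes in indentation into explicit INDENT and DEDENT tokens, which the `indent` in the grammar above can then match. A minimal sketch, assuming only '+' bullets and space indentation (`list_tokens` is a hypothetical name):

```python
def list_tokens(text):
    """Return ITEM/INDENT/DEDENT tokens for a '+' plain list."""
    tokens = []
    widths = [0]  # stack of indentation widths currently open
    for line in text.splitlines():
        stripped = line.lstrip(" ")
        if not stripped.startswith("+ "):
            continue  # non-list lines are ignored in this sketch
        width = len(line) - len(stripped)
        if width > widths[-1]:
            widths.append(width)
            tokens.append(("INDENT",))
        else:
            while width < widths[-1]:
                widths.pop()
                tokens.append(("DEDENT",))
            if width != widths[-1]:
                raise ValueError(f"inconsistent indentation: {line!r}")
        tokens.append(("ITEM", stripped[2:]))
    # Close any sublists still open at the end of the input.
    tokens.extend(("DEDENT",) for _ in widths[1:])
    return tokens
```

With the indentation made explicit like this, the remaining list grammar no longer needs to count spaces.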


Definitely not able to be represented in EBNF, unless as you say {name} is a 
limited vocabulary.


Darn those pesky open sets!