Announcing bean-query: a SQL query language for Beancount

2015-01-04 Thread Martin Blais
Update for Beancount:
Today I merged the 'shell' branch of Beancount into its default branch.
This provides the bean-query tool in the main code that gets installed by
default. This now provides a fourth method to extract data from a Beancount
ledger file:

1. bean-web: The web interface
2. bean-report: Report generation using dedicated routines (like Ledger)
3. bean-query: SQL query language to specify reports
4. scripting: You write your own report using the loader library and data
structures

High-level documentation for the corresponding SQL variant is available from
http://furius.ca/beancount/doc/query

The original design/motivation doc is here:
furius.ca/beancount/doc/proposal-query

Note that this is just the first revision of this tool and will mature over
the coming months, but I felt it was in a good enough state to let others
start tinkering with it. About half or to 3/4 of the proposal is
implemented. Feedback and bug reports are welcome at this point.

Cheers,

-- 

--- 
You received this message because you are subscribed to the Google Groups 
Ledger group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to ledger-cli+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: A Query Language for Beancount

2014-10-19 Thread Martin Blais
 if you want to be perverse, otherwise you put
 them next to each other or blob them, your choice, but not a problem that
 says only a plain text file makes this better because I haven't thought
 about it enough


I don't understand what you're trying to say.


contemporary dbs can easily split out stuff from a blob better than you
 spending years refining some transactional thing that only you care about,
 promise


Years? Yesterday alone I wrote a pretty complete SQL parser and today I'm
writing an interpreter for it.

Here:
https://bitbucket.org/blais/beancount/src/tip/src/python/beancount/query/query_parser.py?at=shell

You need to understand that despite its ubiquity and familiarity, SQL isn't
a silver bullet for everything. It is by no means perfect. Its does not
express hierarchical data formats well, which is partly why object
databases like MongoDB are enjoying a fair amount of popularity (not so
long ago all the rage was XML-based datastores with better defined
schemas). You can also see traces of this in all the ways that people
writing an access language for hierarchical formats have had to get
creative in order to provide access to nested and repeated fields, e.g.
http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/36632.pdf
Also see how the different SQL database implementations each have their own
warts and limitations.

It is entirely reasonable to extend SQL and propose specializations for
particular use cases--everyone does it. You seem to think that SQL is some
sort of superior technology that has solved all data access specification
problems. SQL is a way to specify access to your data, but it is still the
case that application writers need to occasionally combine the output of
multiple queries with some code to perform some operations. I think that
with minimal extensions I can define a variant that works well for our
little accounting database.





  If you have multiple tables, you end up having to do joins and
 deduplication.


 joins yes, deduplication no

 why do you think deduplication would be necessary?  genuinely puzzled


(See detailed explanation above with example about what I called
deduplication.)



 With a single table, you have duplication of information.


 nope, a single table could be just as simple as a plain text file

  At the moment I output separate tables for transactions and postings,
 which
 can be joined, but it still doesn't allow you to make great queries that
 involve the kind of filtering that is convenient for generating, e.g.
 balance sheets.


 as above [1] , you are incorrect


(I think I expressed myself clearly enough with the above.)



 The Beancount Query Language will specifically address these issues by:
 - allowing filtering to occur at two levels in a single query:
  * filtering at the entries level (with predicates on postings allowed),
 plus


 free with SQL

   * filtering at the postings level
 - support inventory arithmetic and aggregation functions on inventory
 objects and rendering functions (e.g., COST()).
 - support flattening of multiple lots (to produce lists of holdings each
 with their cost basis)


 free with SQL if you know what you are doing


I explained the problems above.

If you prefer to use straight-up SQL, bean-sql creates the corresponding
database for you and you can use that if you wish. Ledger has an equivalent
conversion to SQL (I haven't looked at the tables it produces but it's fair
to assume there's more than one table). You're welcome to copy (within the
constraints of GPL) and change the script to make it output its tables
differently if you wish, this should be simple. In fact, we welcome this
kind of experimentation. Show us how you can generate all the same reports
using SQL only.



 - explicitly support the close operation as syntax, that inserts
 transactions to clear income statement accounts to equity and removes past
 history


 Ah, now you see, I disagree with that from an accounting and mathematical
 POV.  I explicitly don't want that.  Transactions are a continuum over
 time, reporting on them may be necessary because one or other government
 says so but does what actually happened change because I have to state a
 position on a date to an authority ?  I say that is a report not a
 transaction, my accounts close when I die.


I think I might need to explain more clearly what I mean by close here.
You read this wrong. I'm referring to previous discussions that involved
closing the year. The process of closing in Beancount refers to

- the insertion of transactions at a particular date that transfer the
balances of the Income and Expenses accounts to an Equity account (previous
earnings) to zero them out

- the insertion of a conversion entry to account for changes due to
currency conversion (to address the same problem you've been referring to
as Trading accounts)

- the removal of previous transactions before the given date and
simultaneous insertion of opening balances

Re: A Query Language for Beancount

2014-10-19 Thread Wm...
Sun, 19 Oct 2014 15:23:10 
CAK21+hMYzN3BXaCJPV++RxZCm=h7fuxsnxwgqjujpkld7z-...@mail.gmail.com

Martin Blais bl...@furius.ca wrote...

Wm:

I'm not sure I'm getting this.  Surely most of the objects (point in time,
something, amount of that something, who had it before and who had it
afterwards) are easily defined and represented with a bit of glue saying
that-one-over-there and this-one-here belong together, let's call it a
transaction ?



This is a simple and common database-related problem.

You have at least two tables:
One table contains information attached to the transaction.
The other table contains information attached to the posting.

Let's illustrate: let us simplify things grossly and say that transactions
have only

 (entry-id, date, flag, payee)

fields and that postings have only

 (posting-id, entry-id, account, amount, currency)

fields.

You can join the two tables on some id (easily doable) to obtain a table of
postings plus their corresponding entry data.
If so, you get rows of

 (entry-id, date, flag, payee, posting-id, account, amount, currency)
 ...

The rub here is that you have duplication of the entry information. For
example, given the following transaction:

 2014-09-12 * WALMART
   Assets:Checking-45.00 USD
   Expenses:Toys45.00 USD

You would obtain

 (1000, 2014-09-12, *, WALMART, 2, Assets:Checking, -45.00,
USD)
 (1000, 2014-09-12, *, WALMART, 20001, Expenses:Toys, 45.00, USD)


Only if you throw in unnecessary stuff, a query need not include junk


Do you not see that the entry fields are duplicated?


Nope, they aren't duplicated


Ok, so you might think, this is not much of a problem.


It is worse than that, I see no duplication at all.  I see legs / splits 
/ whatever you want to call them of a transaction.



Well, you _can_ generate many meaningful reports this way, but not all
(read on). This is conceptually how Ledger works AFAIK: the filter applies
against the list of postings, as they appear in the file (Ledgerfolk:
please correct me if I'm wrong, this is how I understand it to be).


STOP: you are arguing beancount think vs ledger-cli think

I am not part of that argument!


Deduplication

The problem I referred to as deduplication is that of combining rows of the
same transaction. This is how you would normally want to render a journal's
entries. For example:

 2014-09-12 * WALMART
   Assets:Checking-45.00 USD
   Expenses:Toys15.00 USD
   Expenses:Toys30.00 USD

This should ideally render as a single entry in a journal for
Expenses:Toys. Beancount's journal rendering routines do this
automatically. A GROUP BY for all entry columns for a result set on a list
of postings would be awkward.


Puzzled: why do you think SQL would struggle with this?

I admire your thinking in the sense that you seem to ignore input from 
the real world but don't understand why you can't see that a query can 
produce a result.  Is it possible you are unfamiliar with SQL? -- a yes 
or no will do



Operations on Inventories

A more difficult problem is that the Ledgerverse systems admit any number
and type of units to be stored and accumulated in association with an
account. Accounts don't have associated currencies. This means that summing
the changes defined by postings is not a simple arithmetic operation.


yup, you can have sums of something if you want, all you have to do is 
say there is a something associated with a number



Inventory objects are more like mappings that contains associations of
(currency, cost, acquisition-date) to a number of units. Adding a new
position has one of two effects: it either creates a new lot, or if there
is already a lot for that (currency, cost, acquisition-date), it adds units
to that lot. Reducing a position is handled differently by Ledger and
Beancount:


... and don't we know it!  Do you think I'd be here if I didn't already 
know about the differences?  FFS, man!



in Ledger, the operation works the same as with the addition of
new lots, except with a negative number. It allows positive and negative
lots in the same inventory. This naturally supports other methods of
booking (you can average all the cost bases, for example) but has the
unfortunate effect of admitting impossible situations, such as an inventory
with a negative cost basis.


Not impossible if you have bought stuff that is yet to be delivered. 
Ordinary business.



Reducing a position in Beancount is more
strict:


we know that, I am asking why are you going to all this trouble?  I 
think it is something personal because it doesn't reflect ordinary 
commerce



it requires that an existing lot exist to be removed from.


nope


Therefore, it never allows for both positive and negative numbers for the
same lot. This creates some complications of its own, in particular when
attempting to book positions at average cost (the lots of an inventory need
to be reaggregated) but has the advantage of preventing impossible
situations and resulting only in 

Re: A Query Language for Beancount

2014-10-18 Thread Martin Blais
On Sat, Oct 18, 2014 at 12:45 PM, Wm... tcn...@tarrcity.demon.co.uk wrote:

 Thu, 2 Oct 2014 23:37:09 CAK21+hMx7wgt0imEtHp_WTFuYaqLTPJ+OwFHM+wjPS_
 77sn...@mail.gmail.com
 Martin Blais bl...@furius.ca wrote...

 [not sub'd to hledger so dropped from ng list]

  It's always important to me to state the reasons why this is not possible
 as a regular table: filtering has to be operable on complete transactions
 (with all legs), and operations on inventories of commodities have to be
 possible. These are the reasons that we cannot just spit out the data to
 some SQLite table and then use that (I'd like to see someone argue the
 opposite).


 Given that you seem to write your thoughts out I've missed the doc where
 you say what the reasons are it can't be done in SQLite (or postgres or
 mysql) tables.  Bear in mind that GnuCash uses any of the 3 mentioned dbs
 as storage methods alongside XML (cf plain text file).


I'm not so much interested in the storage method; a relational database of
XML can represent the internals of Beancount alright, that's easy, the
issue is making simple queries on tree-structured data that produces
reports suitable for accounting, with operations on inventories of
positions.




 Your own recently mentioned implementation (bean-sql) creates multiple
 tables but it could actually be done in one table if you simply regard
 SQLite3 (the overhead and fuss to new users mitigates against the others)
 as a very efficient storage method and just take it from there.


No you can't.
How would you represent the two-level tree structure of transactions -
1:N - postings with
- a single relational table, and
- no duplication (otherwise it's a derived form and not suitable for
representation)
?
If you have multiple tables, you end up having to do joins and
deduplication.
With a single table, you have duplication of information.
At the moment I output separate tables for transactions and postings, which
can be joined, but it still doesn't allow you to make great queries that
involve the kind of filtering that is convenient for generating, e.g.
balance sheets.

The Beancount Query Language will specifically address these issues by:
- allowing filtering to occur at two levels in a single query:
  * filtering at the entries level (with predicates on postings allowed),
plus
  * filtering at the postings level
- support inventory arithmetic and aggregation functions on inventory
objects and rendering functions (e.g., COST()).
- support flattening of multiple lots (to produce lists of holdings each
with their cost basis)
- explicitly support the close operation as syntax, that inserts
transactions to clear income statement accounts to equity and removes past
history

I claim that with the features above you can explicitly model all the
queries being done in Ledger, HLedger and Beancount.




 Within a few minutes you will start seeing your data and wanting to make
 some impression on it like I want to put the transaction and transaction
 lines seperately ... which is what you did :)


I don't understand this sentence.








 --
 Wm...

 --
 You received this message because you are subscribed to the Google Groups
 Beancount group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to beancount+unsubscr...@googlegroups.com.
 To post to this group, send email to beanco...@googlegroups.com.
 To view this discussion on the web visit https://groups.google.com/d/
 msgid/beancount/tLKufVTCdqQUFwvv%40tarrcity.demon.co.uk.

 For more options, visit https://groups.google.com/d/optout.


-- 

--- 
You received this message because you are subscribed to the Google Groups 
Ledger group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to ledger-cli+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: A Query Language for Beancount

2014-10-03 Thread Martin Blais
On Thu, Oct 2, 2014 at 5:51 PM, Simon Michael si...@joyful.com wrote:

 On 9/28/14 10:02 PM, Martin Blais wrote:

 The ideas are a little bit fresh, so this proposal might feel a bit
 stream of consciousness and needs a bit more polishing.

 ...

 https://docs.google.com/document/d/1d88MkHqxiVdF8XSQBT1QQpOKEOt6O
 C1P9ZoF3u86DwI/edit?usp=sharing


 I haven't much to add today, except this is a very nice discussion -
 thanks.

 Your Gmail-style query language describes hledger's current query
 language well.


I'm assuming you mean this: http://hledger.org/manual#query-arguments
Thanks for pointing it out, I hadn't noticed, it's indeed quite similar.

I think I lean heavily on the SQL version though, for the following reasons:
- I like the explicit nature of the transaction filter vs. postings filter
being in two separate location.
- With aggregate operations defined on inventory objects, I'm able to fold
both the journal and balance operations into a single syntax. This gets rid
of the concept of report altogether, which is nice. There would be no
more reports, just a query expression syntax (and perhaps some shortcuts to
generate the reports using a prebaked query).
- I like to be able to specify the desired output columns. Along with
aggregations, this will allow it to replace all the custom holdings reports
I had been creating lately.

It's always important to me to state the reasons why this is not possible
as a regular table: filtering has to be operable on complete transactions
(with all legs), and operations on inventories of commodities have to be
possible. These are the reasons that we cannot just spit out the data to
some SQLite table and then use that (I'd like to see someone argue the
opposite).




A more powerful SQL-ish query language does seem like a good idea,
 especially one based on the nice Google Charts api linked in your comments.


I'm not sure about the chart API, that's an idea that has been pointed out
to me.

-- 

--- 
You received this message because you are subscribed to the Google Groups 
Ledger group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to ledger-cli+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: A Query Language for Beancount

2014-10-02 Thread Simon Michael

On 9/28/14 10:02 PM, Martin Blais wrote:

The ideas are a little bit fresh, so this proposal might feel a bit
stream of consciousness and needs a bit more polishing.

...

https://docs.google.com/document/d/1d88MkHqxiVdF8XSQBT1QQpOKEOt6OC1P9ZoF3u86DwI/edit?usp=sharing


I haven't much to add today, except this is a very nice discussion - thanks.

Your Gmail-style query language describes hledger's current query 
language well.


A more powerful SQL-ish query language does seem like a good idea, 
especially one based on the nice Google Charts api linked in your comments.



--

--- 
You received this message because you are subscribed to the Google Groups Ledger group.

To unsubscribe from this group and stop receiving emails from it, send an email 
to ledger-cli+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: A Query Language for Beancount

2014-10-01 Thread Alexandre Rademaker

I am still thinking a lot about this idea of a general query language for 
transactions. Not clear to me if the best abstraction is relational one 
(tables, SQL like) or a graph based, more like SPARQL or 
https://www.youtube.com/watch?v=QFaH6IvbPiw approach.

I remember that I read something in the past about a general mapping from 
transactions to graph.

[]s
Alexandre


On Sep 29, 2014, at 5:02 AM, Martin Blais bl...@furius.ca wrote:

Hi,

I've been doing some thinking about creating a query language for Beancount for 
a while, but until a few days ago I was left profoundly unsatisfied with the 
design I had in mind, but I think I've finally nailed down an idea that would 
allow me to remove all report types (i.e., no more balances (Ledger: bal), no 
more journal (Ledger: register)): by combining filtering of transactions with 
filtering of postings, as two separate clauses, in a SQL-like syntax where you 
get to specify the desired output columns and inventory aggregations. The 
target of the FROM clause is not a table name, but a set of filters applied to 
full transactions, and the WHERE clause applies to postings.

The ideas are a little bit fresh, so this proposal might feel a bit stream of 
consciousness and needs a bit more polishing.  See the Two-Phase Filtering 
section of this doc:

  
https://docs.google.com/document/d/1d88MkHqxiVdF8XSQBT1QQpOKEOt6OC1P9ZoF3u86DwI/edit?usp=sharing

I'd love some feedback and your thoughts (as always you can comment on the doc 
directly).

If this works as I imagine it would, I would remove support for all other 
report types, or perhaps create aliases for them to canned select queries.


-- 

--- 
You received this message because you are subscribed to the Google Groups 
Ledger group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to ledger-cli+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


A Query Language for Beancount

2014-09-29 Thread Martin Blais
Hi,

I've been doing some thinking about creating a query language for Beancount
for a while, but until a few days ago I was left profoundly unsatisfied
with the design I had in mind, but I think I've finally nailed down an idea
that would allow me to remove all report types (i.e., no more balances
(Ledger: bal), no more journal (Ledger: register)): by combining
filtering of transactions with filtering of postings, as two separate
clauses, in a SQL-like syntax where you get to specify the desired output
columns and inventory aggregations. The target of the FROM clause is not a
table name, but a set of filters applied to full transactions, and the
WHERE clause applies to postings.

The ideas are a little bit fresh, so this proposal might feel a bit stream
of consciousness and needs a bit more polishing.  See the Two-Phase
Filtering section of this doc:


https://docs.google.com/document/d/1d88MkHqxiVdF8XSQBT1QQpOKEOt6OC1P9ZoF3u86DwI/edit?usp=sharing

I'd love some feedback and your thoughts (as always you can comment on the
doc directly).

If this works as I imagine it would, I would remove support for all other
report types, or perhaps create aliases for them to canned select queries.

-- 

--- 
You received this message because you are subscribed to the Google Groups 
Ledger group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to ledger-cli+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.