Announcing bean-query: a SQL query language for Beancount
Update for Beancount: Today I merged the 'shell' branch of Beancount into its default branch. This provides the bean-query tool in the main code that gets installed by default. This now provides a fourth method to extract data from a Beancount ledger file: 1. bean-web: The web interface 2. bean-report: Report generation using dedicated routines (like Ledger) 3. bean-query: SQL query language to specify reports 4. scripting: You write your own report using the loader library and data structures High-level documentation for the corresponding SQL variant is available from http://furius.ca/beancount/doc/query The original design/motivation doc is here: furius.ca/beancount/doc/proposal-query Note that this is just the first revision of this tool and will mature over the coming months, but I felt it was in a good enough state to let others start tinkering with it. About half or to 3/4 of the proposal is implemented. Feedback and bug reports are welcome at this point. Cheers, -- --- You received this message because you are subscribed to the Google Groups Ledger group. To unsubscribe from this group and stop receiving emails from it, send an email to ledger-cli+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: A Query Language for Beancount
if you want to be perverse, otherwise you put them next to each other or blob them, your choice, but not a problem that says only a plain text file makes this better because I haven't thought about it enough I don't understand what you're trying to say. contemporary dbs can easily split out stuff from a blob better than you spending years refining some transactional thing that only you care about, promise Years? Yesterday alone I wrote a pretty complete SQL parser and today I'm writing an interpreter for it. Here: https://bitbucket.org/blais/beancount/src/tip/src/python/beancount/query/query_parser.py?at=shell You need to understand that despite its ubiquity and familiarity, SQL isn't a silver bullet for everything. It is by no means perfect. Its does not express hierarchical data formats well, which is partly why object databases like MongoDB are enjoying a fair amount of popularity (not so long ago all the rage was XML-based datastores with better defined schemas). You can also see traces of this in all the ways that people writing an access language for hierarchical formats have had to get creative in order to provide access to nested and repeated fields, e.g. http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/36632.pdf Also see how the different SQL database implementations each have their own warts and limitations. It is entirely reasonable to extend SQL and propose specializations for particular use cases--everyone does it. You seem to think that SQL is some sort of superior technology that has solved all data access specification problems. SQL is a way to specify access to your data, but it is still the case that application writers need to occasionally combine the output of multiple queries with some code to perform some operations. I think that with minimal extensions I can define a variant that works well for our little accounting database. If you have multiple tables, you end up having to do joins and deduplication. joins yes, deduplication no why do you think deduplication would be necessary? genuinely puzzled (See detailed explanation above with example about what I called deduplication.) With a single table, you have duplication of information. nope, a single table could be just as simple as a plain text file At the moment I output separate tables for transactions and postings, which can be joined, but it still doesn't allow you to make great queries that involve the kind of filtering that is convenient for generating, e.g. balance sheets. as above [1] , you are incorrect (I think I expressed myself clearly enough with the above.) The Beancount Query Language will specifically address these issues by: - allowing filtering to occur at two levels in a single query: * filtering at the entries level (with predicates on postings allowed), plus free with SQL * filtering at the postings level - support inventory arithmetic and aggregation functions on inventory objects and rendering functions (e.g., COST()). - support flattening of multiple lots (to produce lists of holdings each with their cost basis) free with SQL if you know what you are doing I explained the problems above. If you prefer to use straight-up SQL, bean-sql creates the corresponding database for you and you can use that if you wish. Ledger has an equivalent conversion to SQL (I haven't looked at the tables it produces but it's fair to assume there's more than one table). You're welcome to copy (within the constraints of GPL) and change the script to make it output its tables differently if you wish, this should be simple. In fact, we welcome this kind of experimentation. Show us how you can generate all the same reports using SQL only. - explicitly support the close operation as syntax, that inserts transactions to clear income statement accounts to equity and removes past history Ah, now you see, I disagree with that from an accounting and mathematical POV. I explicitly don't want that. Transactions are a continuum over time, reporting on them may be necessary because one or other government says so but does what actually happened change because I have to state a position on a date to an authority ? I say that is a report not a transaction, my accounts close when I die. I think I might need to explain more clearly what I mean by close here. You read this wrong. I'm referring to previous discussions that involved closing the year. The process of closing in Beancount refers to - the insertion of transactions at a particular date that transfer the balances of the Income and Expenses accounts to an Equity account (previous earnings) to zero them out - the insertion of a conversion entry to account for changes due to currency conversion (to address the same problem you've been referring to as Trading accounts) - the removal of previous transactions before the given date and simultaneous insertion of opening balances
Re: A Query Language for Beancount
Sun, 19 Oct 2014 15:23:10 CAK21+hMYzN3BXaCJPV++RxZCm=h7fuxsnxwgqjujpkld7z-...@mail.gmail.com Martin Blais bl...@furius.ca wrote... Wm: I'm not sure I'm getting this. Surely most of the objects (point in time, something, amount of that something, who had it before and who had it afterwards) are easily defined and represented with a bit of glue saying that-one-over-there and this-one-here belong together, let's call it a transaction ? This is a simple and common database-related problem. You have at least two tables: One table contains information attached to the transaction. The other table contains information attached to the posting. Let's illustrate: let us simplify things grossly and say that transactions have only (entry-id, date, flag, payee) fields and that postings have only (posting-id, entry-id, account, amount, currency) fields. You can join the two tables on some id (easily doable) to obtain a table of postings plus their corresponding entry data. If so, you get rows of (entry-id, date, flag, payee, posting-id, account, amount, currency) ... The rub here is that you have duplication of the entry information. For example, given the following transaction: 2014-09-12 * WALMART Assets:Checking-45.00 USD Expenses:Toys45.00 USD You would obtain (1000, 2014-09-12, *, WALMART, 2, Assets:Checking, -45.00, USD) (1000, 2014-09-12, *, WALMART, 20001, Expenses:Toys, 45.00, USD) Only if you throw in unnecessary stuff, a query need not include junk Do you not see that the entry fields are duplicated? Nope, they aren't duplicated Ok, so you might think, this is not much of a problem. It is worse than that, I see no duplication at all. I see legs / splits / whatever you want to call them of a transaction. Well, you _can_ generate many meaningful reports this way, but not all (read on). This is conceptually how Ledger works AFAIK: the filter applies against the list of postings, as they appear in the file (Ledgerfolk: please correct me if I'm wrong, this is how I understand it to be). STOP: you are arguing beancount think vs ledger-cli think I am not part of that argument! Deduplication The problem I referred to as deduplication is that of combining rows of the same transaction. This is how you would normally want to render a journal's entries. For example: 2014-09-12 * WALMART Assets:Checking-45.00 USD Expenses:Toys15.00 USD Expenses:Toys30.00 USD This should ideally render as a single entry in a journal for Expenses:Toys. Beancount's journal rendering routines do this automatically. A GROUP BY for all entry columns for a result set on a list of postings would be awkward. Puzzled: why do you think SQL would struggle with this? I admire your thinking in the sense that you seem to ignore input from the real world but don't understand why you can't see that a query can produce a result. Is it possible you are unfamiliar with SQL? -- a yes or no will do Operations on Inventories A more difficult problem is that the Ledgerverse systems admit any number and type of units to be stored and accumulated in association with an account. Accounts don't have associated currencies. This means that summing the changes defined by postings is not a simple arithmetic operation. yup, you can have sums of something if you want, all you have to do is say there is a something associated with a number Inventory objects are more like mappings that contains associations of (currency, cost, acquisition-date) to a number of units. Adding a new position has one of two effects: it either creates a new lot, or if there is already a lot for that (currency, cost, acquisition-date), it adds units to that lot. Reducing a position is handled differently by Ledger and Beancount: ... and don't we know it! Do you think I'd be here if I didn't already know about the differences? FFS, man! in Ledger, the operation works the same as with the addition of new lots, except with a negative number. It allows positive and negative lots in the same inventory. This naturally supports other methods of booking (you can average all the cost bases, for example) but has the unfortunate effect of admitting impossible situations, such as an inventory with a negative cost basis. Not impossible if you have bought stuff that is yet to be delivered. Ordinary business. Reducing a position in Beancount is more strict: we know that, I am asking why are you going to all this trouble? I think it is something personal because it doesn't reflect ordinary commerce it requires that an existing lot exist to be removed from. nope Therefore, it never allows for both positive and negative numbers for the same lot. This creates some complications of its own, in particular when attempting to book positions at average cost (the lots of an inventory need to be reaggregated) but has the advantage of preventing impossible situations and resulting only in
Re: A Query Language for Beancount
On Sat, Oct 18, 2014 at 12:45 PM, Wm... tcn...@tarrcity.demon.co.uk wrote: Thu, 2 Oct 2014 23:37:09 CAK21+hMx7wgt0imEtHp_WTFuYaqLTPJ+OwFHM+wjPS_ 77sn...@mail.gmail.com Martin Blais bl...@furius.ca wrote... [not sub'd to hledger so dropped from ng list] It's always important to me to state the reasons why this is not possible as a regular table: filtering has to be operable on complete transactions (with all legs), and operations on inventories of commodities have to be possible. These are the reasons that we cannot just spit out the data to some SQLite table and then use that (I'd like to see someone argue the opposite). Given that you seem to write your thoughts out I've missed the doc where you say what the reasons are it can't be done in SQLite (or postgres or mysql) tables. Bear in mind that GnuCash uses any of the 3 mentioned dbs as storage methods alongside XML (cf plain text file). I'm not so much interested in the storage method; a relational database of XML can represent the internals of Beancount alright, that's easy, the issue is making simple queries on tree-structured data that produces reports suitable for accounting, with operations on inventories of positions. Your own recently mentioned implementation (bean-sql) creates multiple tables but it could actually be done in one table if you simply regard SQLite3 (the overhead and fuss to new users mitigates against the others) as a very efficient storage method and just take it from there. No you can't. How would you represent the two-level tree structure of transactions - 1:N - postings with - a single relational table, and - no duplication (otherwise it's a derived form and not suitable for representation) ? If you have multiple tables, you end up having to do joins and deduplication. With a single table, you have duplication of information. At the moment I output separate tables for transactions and postings, which can be joined, but it still doesn't allow you to make great queries that involve the kind of filtering that is convenient for generating, e.g. balance sheets. The Beancount Query Language will specifically address these issues by: - allowing filtering to occur at two levels in a single query: * filtering at the entries level (with predicates on postings allowed), plus * filtering at the postings level - support inventory arithmetic and aggregation functions on inventory objects and rendering functions (e.g., COST()). - support flattening of multiple lots (to produce lists of holdings each with their cost basis) - explicitly support the close operation as syntax, that inserts transactions to clear income statement accounts to equity and removes past history I claim that with the features above you can explicitly model all the queries being done in Ledger, HLedger and Beancount. Within a few minutes you will start seeing your data and wanting to make some impression on it like I want to put the transaction and transaction lines seperately ... which is what you did :) I don't understand this sentence. -- Wm... -- You received this message because you are subscribed to the Google Groups Beancount group. To unsubscribe from this group and stop receiving emails from it, send an email to beancount+unsubscr...@googlegroups.com. To post to this group, send email to beanco...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/ msgid/beancount/tLKufVTCdqQUFwvv%40tarrcity.demon.co.uk. For more options, visit https://groups.google.com/d/optout. -- --- You received this message because you are subscribed to the Google Groups Ledger group. To unsubscribe from this group and stop receiving emails from it, send an email to ledger-cli+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: A Query Language for Beancount
On Thu, Oct 2, 2014 at 5:51 PM, Simon Michael si...@joyful.com wrote: On 9/28/14 10:02 PM, Martin Blais wrote: The ideas are a little bit fresh, so this proposal might feel a bit stream of consciousness and needs a bit more polishing. ... https://docs.google.com/document/d/1d88MkHqxiVdF8XSQBT1QQpOKEOt6O C1P9ZoF3u86DwI/edit?usp=sharing I haven't much to add today, except this is a very nice discussion - thanks. Your Gmail-style query language describes hledger's current query language well. I'm assuming you mean this: http://hledger.org/manual#query-arguments Thanks for pointing it out, I hadn't noticed, it's indeed quite similar. I think I lean heavily on the SQL version though, for the following reasons: - I like the explicit nature of the transaction filter vs. postings filter being in two separate location. - With aggregate operations defined on inventory objects, I'm able to fold both the journal and balance operations into a single syntax. This gets rid of the concept of report altogether, which is nice. There would be no more reports, just a query expression syntax (and perhaps some shortcuts to generate the reports using a prebaked query). - I like to be able to specify the desired output columns. Along with aggregations, this will allow it to replace all the custom holdings reports I had been creating lately. It's always important to me to state the reasons why this is not possible as a regular table: filtering has to be operable on complete transactions (with all legs), and operations on inventories of commodities have to be possible. These are the reasons that we cannot just spit out the data to some SQLite table and then use that (I'd like to see someone argue the opposite). A more powerful SQL-ish query language does seem like a good idea, especially one based on the nice Google Charts api linked in your comments. I'm not sure about the chart API, that's an idea that has been pointed out to me. -- --- You received this message because you are subscribed to the Google Groups Ledger group. To unsubscribe from this group and stop receiving emails from it, send an email to ledger-cli+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: A Query Language for Beancount
On 9/28/14 10:02 PM, Martin Blais wrote: The ideas are a little bit fresh, so this proposal might feel a bit stream of consciousness and needs a bit more polishing. ... https://docs.google.com/document/d/1d88MkHqxiVdF8XSQBT1QQpOKEOt6OC1P9ZoF3u86DwI/edit?usp=sharing I haven't much to add today, except this is a very nice discussion - thanks. Your Gmail-style query language describes hledger's current query language well. A more powerful SQL-ish query language does seem like a good idea, especially one based on the nice Google Charts api linked in your comments. -- --- You received this message because you are subscribed to the Google Groups Ledger group. To unsubscribe from this group and stop receiving emails from it, send an email to ledger-cli+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: A Query Language for Beancount
I am still thinking a lot about this idea of a general query language for transactions. Not clear to me if the best abstraction is relational one (tables, SQL like) or a graph based, more like SPARQL or https://www.youtube.com/watch?v=QFaH6IvbPiw approach. I remember that I read something in the past about a general mapping from transactions to graph. []s Alexandre On Sep 29, 2014, at 5:02 AM, Martin Blais bl...@furius.ca wrote: Hi, I've been doing some thinking about creating a query language for Beancount for a while, but until a few days ago I was left profoundly unsatisfied with the design I had in mind, but I think I've finally nailed down an idea that would allow me to remove all report types (i.e., no more balances (Ledger: bal), no more journal (Ledger: register)): by combining filtering of transactions with filtering of postings, as two separate clauses, in a SQL-like syntax where you get to specify the desired output columns and inventory aggregations. The target of the FROM clause is not a table name, but a set of filters applied to full transactions, and the WHERE clause applies to postings. The ideas are a little bit fresh, so this proposal might feel a bit stream of consciousness and needs a bit more polishing. See the Two-Phase Filtering section of this doc: https://docs.google.com/document/d/1d88MkHqxiVdF8XSQBT1QQpOKEOt6OC1P9ZoF3u86DwI/edit?usp=sharing I'd love some feedback and your thoughts (as always you can comment on the doc directly). If this works as I imagine it would, I would remove support for all other report types, or perhaps create aliases for them to canned select queries. -- --- You received this message because you are subscribed to the Google Groups Ledger group. To unsubscribe from this group and stop receiving emails from it, send an email to ledger-cli+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
A Query Language for Beancount
Hi, I've been doing some thinking about creating a query language for Beancount for a while, but until a few days ago I was left profoundly unsatisfied with the design I had in mind, but I think I've finally nailed down an idea that would allow me to remove all report types (i.e., no more balances (Ledger: bal), no more journal (Ledger: register)): by combining filtering of transactions with filtering of postings, as two separate clauses, in a SQL-like syntax where you get to specify the desired output columns and inventory aggregations. The target of the FROM clause is not a table name, but a set of filters applied to full transactions, and the WHERE clause applies to postings. The ideas are a little bit fresh, so this proposal might feel a bit stream of consciousness and needs a bit more polishing. See the Two-Phase Filtering section of this doc: https://docs.google.com/document/d/1d88MkHqxiVdF8XSQBT1QQpOKEOt6OC1P9ZoF3u86DwI/edit?usp=sharing I'd love some feedback and your thoughts (as always you can comment on the doc directly). If this works as I imagine it would, I would remove support for all other report types, or perhaps create aliases for them to canned select queries. -- --- You received this message because you are subscribed to the Google Groups Ledger group. To unsubscribe from this group and stop receiving emails from it, send an email to ledger-cli+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.