Re: Beancount with large journals

2019-02-13 Thread Stefano Zacchiroli
On Sun, Feb 10, 2019 at 11:07:03PM -0500, Martin Blais wrote:
> You can view the breakdown in time with the -v option to bean-check:

You've probably already thought about that, so out of curiosity: how
much of this is potentially parallelizable, as an avenue for "easily"
getting a performance boost? I guess not much, due to either I/O
constraints or the GIL, right? I'm curious about whether
validation, booking, and plugins might be made parallelizable in the
future.

-- 
Stefano Zacchiroli . z...@upsilon.cc . upsilon.cc/zack . . o . . . o . o
Computer Science Professor . CTO Software Heritage . . . . . o . . . o o
Former Debian Project Leader & OSI Board Director  . . . o o o . . . o .
« the first rule of tautology club is the first rule of tautology club »



Re: Beancount with large journals

2019-02-13 Thread mick . phillips
Thanks for the info, Martin.

On my laptop, most of the time is spent in the parser and the validator.
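For comparing runs, the `bean-check -v` timing lines can be ranked with a few lines of Python. This is only a rough sketch: it assumes the "Operation: '...'" / "Time: N ms" layout shown in the quoted output below, and the names `TIMING_RE`, `parse_timings` and `slowest` are made up for illustration, not part of Beancount. Note that some entries (e.g. 'parse', 'run_transformations', the loader total) are totals that include the sub-steps listed before them, so the ranking mixes phases and their parts.

```python
import re

# Assumed layout: "Operation: 'name'" followed, on the same line or the next
# one (optionally behind a "> " quote marker), by "Time: N ms".
TIMING_RE = re.compile(r"Operation:\s*'([^']+)'\s*(?:>\s*)?Time:\s*(\d+)\s*ms")


def parse_timings(log_text):
    """Return (operation, milliseconds) pairs in the order they were logged."""
    return [(name, int(ms)) for name, ms in TIMING_RE.findall(log_text)]


def slowest(timings, n=3):
    """Return the n entries with the largest reported time, slowest first."""
    return sorted(timings, key=lambda item: item[1], reverse=True)[:n]


# A few lines copied from the quoted output, used as sample input.
SAMPLE = """\
INFO: Operation: 'parse'
Time:  755 ms
INFO: Operation: 'booking'
Time: 1219 ms
INFO: Operation: 'run_transformations'
Time: 1470 ms
INFO: Operation: 'beancount.ops.validate'
 Time:  450 ms
"""

if __name__ == "__main__":
    for name, ms in slowest(parse_timings(SAMPLE)):
        print(f"{ms:6d} ms  {name}")
```

Saving the verbose output to a file and passing its contents to `parse_timings` should work the same way, provided the log layout matches the assumption above.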

The heads-up about the conversions is good to know. Fortunately, the 
account with the largest number of transactions has no conversions to worry 
about (they're microloans, so it's all the same currency) - I can probably 
aggregate those without any headaches.
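The kind of aggregation I have in mind is roughly the following. It is a sketch in plain Python and deliberately does not use Beancount's API: the `(date, account, amount, currency)` tuples, the account name and the "GBP" currency are placeholders for illustration. It only makes sense for the single-currency, no-conversion case described above, since summing amounts would lose cost and price information otherwise.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal


def aggregate_monthly(postings):
    """Sum amounts per (year, month, account, currency).

    `postings` is an iterable of (date, account, amount, currency) tuples;
    this is a hypothetical shape, not a Beancount data structure.
    """
    totals = defaultdict(Decimal)  # Decimal() is Decimal('0')
    for day, account, amount, currency in postings:
        totals[(day.year, day.month, account, currency)] += amount
    return dict(sorted(totals.items()))


# Placeholder data: three microloan movements in one account and currency.
EXAMPLE = [
    (date(2019, 1, 3), "Assets:Loans", Decimal("25.00"), "GBP"),
    (date(2019, 1, 17), "Assets:Loans", Decimal("10.50"), "GBP"),
    (date(2019, 2, 1), "Assets:Loans", Decimal("-5.25"), "GBP"),
]

if __name__ == "__main__":
    for (year, month, account, currency), total in aggregate_monthly(EXAMPLE).items():
        print(f"{year}-{month:02d} {account} {total} {currency}")
```

Each monthly total could then be written out as one summary transaction in place of the individual entries for that month.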

Thanks for the link to the Selinger tutorial: tracking multiple currencies 
is something I'd been thinking about ahead of a move later this year, so it 
is very useful.

On Monday, 11 February 2019 04:07:17 UTC, Martin Blais wrote:
>
> On Sun, Feb 10, 2019 at 11:34 AM > 
> wrote:
>
>> Hi.
>>
>> I've been using Beancount and fava to report on microinvestment 
>> transactions. I'm hitting serious performance issues, as the journal for a 
>> single account is approaching 11 MB. (This is no criticism of either fava or 
>> Beancount, as I think this use case is probably far beyond their intended 
>> usage.)
>>
>
> Big file. My entire history is around 4MB now, and it's starting to bother 
> me (even with the cache).
>
>
> * Are there any big performance hits I could avoid (e.g. does relying on 
>> auto-posting have a significant impact)?
>>
>
> I don't think so, though never say never: a pointed performance sprint by 
> someone who can profile C / Python well might yield some savings.
> I've been thinking about rewriting all of beancount.core in C++, but 
> that's not going to happen tomorrow (I'm resisting; I have very few 
> cycles in my personal time as of late), and I'd also have to reimplement 
> the plugins (see below).
>
> You can view the breakdown in time with the -v option to bean-check:
> $ bean-check -v $L
> INFO: Operation: 'beancount.parser.parser.parse_file'             Time:  732 ms
> INFO: Operation: 'beancount.parser.parser.parse_file'             Time:    7 ms
> INFO: Operation: 'beancount.parser.parser'                        Time:  740 ms
> INFO: Operation: 'parse'                                          Time:  755 ms
> INFO: Operation: 'booking'                                        Time: 1219 ms
> INFO: Operation: 'beancount.ops.pad'                              Time:  125 ms
> INFO: Operation: 'beancount.ops.documents'                        Time:  128 ms
> INFO: Operation: 'beancount.plugins.ira_contribs'                 Time:   21 ms
> INFO: Operation: 'beancount.plugins.implicit_prices'              Time:  171 ms
> INFO: Operation: 'beancount.plugins.sellgains'                    Time:   23 ms
> INFO: Operation: 'beancount.plugins.check_closing'                Time:   18 ms
> INFO: Operation: 'washsales.commissions'                          Time:   29 ms
> INFO: Operation: 'beancount.plugins.check_commodity'              Time:   31 ms
> INFO: Operation: 'beancount.plugins.commodity_attr'               Time:    4 ms
> INFO: Operation: 'office.options'                                 Time:    5 ms
> INFO: Operation: 'office.share_caroline'                          Time:   19 ms
> INFO: Operation: 'beancount.plugins.divert_expenses'              Time:    7 ms
> INFO: Operation: 'beancount.ops.balance'                          Time:  616 ms
> INFO: Operation: 'run_transformations'                            Time: 1470 ms
> INFO: Operation: 'function: validate_open_close'                  Time:    6 ms
> INFO: Operation: 'function: validate_active_accounts'             Time:   38 ms
> INFO: Operation: 'function: validate_currency_constraints'        Time:   25 ms
> INFO: Operation: 'function: validate_duplicate_balances'          Time:    8 ms
> INFO: Operation: 'function: validate_duplicate_commodities'       Time:    4 ms
> INFO: Operation: 'function: validate_documents_paths'             Time:    5 ms
> INFO: Operation: 'function: validate_check_transaction_balances'  Time:  264 ms
> INFO: Operation: 'function: validate_data_types'                  Time:  100 ms
> INFO: Operation: 'beancount.ops.validate'                         Time:  450 ms
> INFO: Operation: 'beancount.loader (total)'                       Time: 4529 ms
>
> That's on a ~4MB file running on my little Intel NUC.
> As you can see, the parsing, booking, and plugins (transformations) code 
> are the big hitters.
>
>  
>
>> * Does anyone know of any tools out there for aggregating journal entries 
>> into summary journals (or has anyone had any success using Beancount's API 
>> to do this)?
>>
>
> I m