Re: Your ingest workflow?

2022-02-04 Thread Daryl Manning
Marvin,

I use ledger-cli (not beancount), but I get almost completely automated 
categorization via Reckon, a Bayesian predictor that infers the current 
category from past categorizations. It is super handy and has reduced my 
importing and categorizing workload by an order of magnitude. 
Blog post below. The author of the Ruby gem is very responsive and helpful. YMMV.

https://daryl.wakatara.com/tracking-your-finances-with-reckon-and-ledger/
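
To make "Bayesian predictor of current category from past categories" concrete, here is a minimal naive Bayes sketch in Python. It is not Reckon's implementation (Reckon is a Ruby gem and its internals are not described here); the function names, sample descriptions, and account names are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train(history):
    """history: iterable of (description, category) pairs from past entries."""
    word_counts = defaultdict(Counter)  # category -> word -> count
    category_counts = Counter()         # category -> number of transactions
    for description, category in history:
        category_counts[category] += 1
        word_counts[category].update(description.lower().split())
    return word_counts, category_counts


def predict(description, word_counts, category_counts):
    """Return the category with the highest log posterior for a description."""
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total = sum(category_counts.values())
    best_category, best_score = None, -math.inf
    for category, n in category_counts.items():
        score = math.log(n / total)  # prior: how common the category is
        denominator = sum(word_counts[category].values()) + len(vocabulary) + 1
        for word in description.lower().split():
            # Laplace smoothing so an unseen word does not rule out a category.
            score += math.log((word_counts[category][word] + 1) / denominator)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical past categorizations.
history = [
    ("ACME MARKET MAIN ST", "Expenses:Groceries"),
    ("ACME MARKET DOWNTOWN", "Expenses:Groceries"),
    ("CITY FUEL STATION", "Expenses:Car:Fuel"),
]
model = train(history)
print(predict("ACME MARKET AIRPORT", *model))
```

The more categorized history there is, the better the word statistics get, which is why this kind of predictor improves as the ledger grows.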

ciao !
Daryl.



Re: Your ingest workflow?

2021-10-19 Thread Martin Michlmayr
* Marvin Ritter  [2021-10-19 00:42]:
> I'm once more troubled by changes in the statements provided by one of my 
> banks. It's probably easy to fix but I thought I'd use the opportunity to 
> learn what others do.

Just curious, because you posted this to the ledger mailing list, talk
about the beancount ingest API, and speak about importing into your
"ledger": are you using the beancount ingest API and then producing
ledger entries, or do you use beancount, in which case this should have
been posted to the beancount list?

Several aspects you describe (e.g. downloading statements) apply
equally to ledger and beancount but others (creating entries) are
quite different.
-- 
Martin Michlmayr
https://www.cyrius.com/

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"Ledger" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to ledger-cli+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/ledger-cli/YW6InDfkQRzi0zbG%40jirafa.cyrius.com.


Your ingest workflow?

2021-10-19 Thread Marvin Ritter
Hi,

I'm once more troubled by changes in the statements provided by one of my 
banks. It's probably easy to fix but I thought I'd use the opportunity to 
learn what others do.

My current workflow:

1. Manually download statements (most financial institutions provide me with 
CSV files but I also have 1-2 PDFs). When possible I always download all 
transactions for the current year and replace my current file. This is easy, 
but logging into ~5 banks, clicking through the menus and downloading the 
correct file takes a while. This is one of the reasons I update my ledger 
less often than I would like to.
2. Run importers for all statement files for the current year. I 
implemented the `beancount.ingest.importer.ImporterProtocol` interface for 
my banks and mostly this just works. I manually map each statement file to 
its importer, since automatic identification wasn't always reliable. While it 
mostly works, this is the part where things are most likely to fail, because 
a bank changed its statement format or an importer has a bug.
3. Merge new entries with my ledger. I just rely on knowing the latest 
transaction per account in my ledger and only adding newer entries.
4. Manually categorize. This works well. I also rely on a small plugin to 
find transactions between my accounts and mark one of them as a duplicate 
of the other.

The above workflow works but it could be smoother. So I wonder what others 
do and what I could learn from it.

In general I think that Beancount is awesome (and from what I read v3 will 
be even better). And instead of cooking my own solution I would rather 
contribute to existing solutions to make the import process smoother for 
everyone. One way I see is to recommend best practices (or a single 
workflow) and encourage people to collect importers (and maybe also tools 
for fetching statements). A bit of consistency might save us all some time 
here. Wouldn't your ingest workflow be covered by the steps below?

1. Fetch statements

The first step is always to fetch the transaction history from the 
financial institution. Automating this would be nice but seems like a lot of work 
(websites keep changing and come in many languages) and can easily become a 
security risk. We shouldn't encourage users to store their passwords in 
plaintext. I think we should allow multiple paths here:

- User manually downloads CSV or PDF files. When banks provide multiple 
formats we should document which format is expected in the next step.
- Automate download by scraping the bank website 
(https://github.com/jbms/finance-dl seems like a good approach). This is 
nice for users with many accounts and who know what they are doing.
- Use APIs. The only example I know is Wise, which provides a nice API to 
securely fetch the list of transactions by uploading a public key and 
keeping a token in an environment variable.
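
As a rough illustration of the "token in an environment variable" idea from the last bullet, the sketch below builds an authenticated request without storing the secret in any file. It is not Wise's actual API: the variable name `BANK_API_TOKEN`, the URL, and the bearer-token header are assumptions, and the public-key step is not shown.

```python
import os
import urllib.request


def build_request(url, token_variable="BANK_API_TOKEN"):
    """Build an authenticated request; the token only lives in the environment."""
    token = os.environ.get(token_variable)
    if not token:
        raise RuntimeError(f"Set {token_variable} before fetching statements")
    # Assumption: the API accepts a standard bearer token header.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


# Usage with a placeholder URL (nothing is sent until urlopen is called):
# request = build_request("https://api.example.com/statements")
# with urllib.request.urlopen(request) as response:
#     statement_bytes = response.read()
```

Failing early when the variable is missing keeps a misconfigured run from silently falling back to a password prompt or a plaintext file.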

2. Parse statements into Beancount entries

I think the current `Importer` interface works. In my current workflow I 
don't use the ability to identify and sort files but that might change. And 
the rest is just a function mapping from the file with statements to a list 
of Beancount entries.

The latter is the biggest trouble right now. We are missing a repository 
that provides importers for the majority of financial institutions out 
there. Implementing one importer is easy but keeping 5-10 importers up to 
date is a lot of work. I think this maintenance could be shared by 
collecting importers into a single repository. This is the main issue I 
would like to solve.

Luckily I think performance is not critical here. Files are usually small 
and I only care about new files.
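
The "function mapping from the file with statements to a list of Beancount entries" can be sketched without any dependencies. This is not the `ImporterProtocol` interface itself: the `Entry` class, the CSV column names (`date`, `description`, `amount`), and the default currency are assumptions for illustration, and the output is Beancount-syntax text rather than the library's entry objects.

```python
import csv
import datetime
import io
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Entry:
    date: datetime.date
    account: str
    narration: str
    amount: Decimal
    currency: str


def parse_statement(csv_file, account, currency="EUR"):
    """Map one statement file to a list of entries, oldest first."""
    entries = []
    for row in csv.DictReader(csv_file):
        entries.append(Entry(
            date=datetime.date.fromisoformat(row["date"]),
            account=account,
            narration=row["description"].strip(),
            amount=Decimal(row["amount"]),  # Decimal avoids float rounding
            currency=currency,
        ))
    return sorted(entries, key=lambda entry: entry.date)


def to_beancount(entry):
    """Render an entry flagged '!' with the second posting left to add by hand."""
    return (
        f'{entry.date} ! "{entry.narration}"\n'
        f"  {entry.account}  {entry.amount} {entry.currency}\n"
    )


sample = io.StringIO("date,description,amount\n2021-10-18,Grocery store,-12.50\n")
for entry in parse_statement(sample, "Assets:Bank:Checking"):
    print(to_beancount(entry))
```

When a bank changes its export, only the column names inside `parse_statement` need to change, which is the part a shared repository of importers would maintain.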

3. Merge with ledger

This is again tricky. I don't want to overwrite any transactions in the 
ledger and I don't want to create duplicates. The best solution I found so 
far:

Find the most recent transaction for each of my accounts in my ledger. Take 
all entries from the previous step that are newer than the transaction and 
append them to my ledger.

This works reasonably well.
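
A minimal sketch of this merge rule, assuming entries are simple records with a date and an account (the `Entry` tuple and function name are hypothetical, not part of Beancount):

```python
import datetime
from collections import namedtuple

Entry = namedtuple("Entry", "date account narration")


def select_new_entries(ledger_entries, imported_entries):
    """Keep imported entries dated after the newest ledger entry per account."""
    latest = {}  # account -> date of its most recent ledger entry
    for entry in ledger_entries:
        if entry.account not in latest or entry.date > latest[entry.account]:
            latest[entry.account] = entry.date
    return [
        entry for entry in imported_entries
        # Accounts that are not in the ledger yet keep all their entries.
        if entry.account not in latest or entry.date > latest[entry.account]
    ]


ledger = [Entry(datetime.date(2021, 10, 5), "Assets:Bank:Checking", "Rent")]
imported = [
    Entry(datetime.date(2021, 10, 5), "Assets:Bank:Checking", "Rent"),
    Entry(datetime.date(2021, 10, 7), "Assets:Bank:Checking", "Grocery store"),
]
print(select_new_entries(ledger, imported))
```

The strict "newer than" comparison never duplicates an entry, but it also skips a transaction that the bank posts later with the same date as the newest ledger entry, so importing only after a day has fully settled is the safer habit.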

4. Categorize

I agree with 
https://beancount.github.io/docs/importing_external_data.html#automatic-categorization
here.

I have a script that adds some tags to the transactions for common 
transactions that I do (same grocery store twice a week) but overall I 
don't mind adding the second account and a comment manually.
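
A tagging script of this kind can be as small as a keyword table. The sketch below is an illustration, not the actual script: the keywords, tag names, and function names are made up, and it only appends Beancount `#tag` tokens to a transaction's first line.

```python
# Hypothetical keyword -> tag rules for recurring transactions.
RULES = {
    "grocery": "groceries",
    "railway": "commute",
}


def tags_for(text, rules=RULES):
    """Return the sorted tags whose keyword occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return sorted({tag for keyword, tag in rules.items() if keyword in lowered})


def tag_header(header_line, rules=RULES):
    """Append '#tag' tokens to the first line of a transaction."""
    return header_line + "".join(f" #{tag}" for tag in tags_for(header_line, rules))


print(tag_header('2021-10-18 ! "Grocery store"'))
```

Lines that match no rule are returned unchanged, so the second account and any comment can still be added by hand afterwards.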

5. Commit

This is optional. I keep my whole ledger (raw statements + beancount files) 
under version control. This makes it super easy to revert to the last 
commit if something above went wrong.


As said, I'm mostly curious to hear what others do and how we can leverage 
potential overlap.

Regards,
Marvin
