Re: CSL-JSON support for =parsebib=

2021-05-08 Thread Joost Kremers


On Sat, May 08 2021, András Simonyi wrote:
> this is just to +1 this on my part as well. Although unadvertised,
> citeproc-org basically already supports CSL-JSON bibliographies, and
> it would be fantastic if other components of the Emacs
> citation/bibliography infrastructure also did. BTW, would CSL-JSON
> support in =parsebib= mean that there is hope for having CSL-support
> in Ebib too?

Yes, that is the plan. No promises on an ETA, but it's high on my to-do list.

-- 
Joost Kremers
Life has its moments



Re: CSL-JSON support for =parsebib=

2021-05-08 Thread Denis Maier
Hi,

well, this is what I asked Joost in the first place. Adjusting parsebib is
part of the efforts to make that possible.

Denis

Re: CSL-JSON support for =parsebib=

2021-05-08 Thread András Simonyi
Dear All,

this is just to +1 this on my part as well. Although unadvertised,
citeproc-org basically already supports CSL-JSON bibliographies, and
it would be fantastic if other components of the Emacs
citation/bibliography infrastructure also did. BTW, would CSL-JSON
support in =parsebib= mean that there is hope for having CSL-support
in Ebib too?

best regards,
András

On Fri, 7 May 2021 at 18:23, Titus von der Malsburg wrote:
>
>
> On 2021-05-07 Fri 16:47, Joost Kremers wrote:
> > On Fri, May 07 2021, Titus von der Malsburg wrote:
> >>> Apparently, =json-parse-{buffer|string}= then gives you a symbol with a
> >>> space in it...
> >>
> >> I now see that symbol names “can contain any characters whatever” [1]. But
> >> many characters need to be escaped (like spaces) which isn’t pretty.
> >
> > Agreed. But if you pass such a symbol to =symbol-name= or to
> > =(format "%s")=, the escape character is removed, so when it comes to
> > displaying those symbols to users, it shouldn't matter much.
> >
> > Note, though, that the keys in CSL-JSON don't seem to contain any spaces or
> > other weird characters. There are just lower case a-z and dash, that's all.
>
> I agree that weird characters are unlikely to be an issue.
> Nonetheless, strings seem slightly more future-proof.  Funky unicode stuff is 
> now appearing everywhere (I’ve seen emoji being used for variable names) and 
> the situation could be different a couple of years down the line.
>
> >>> This works for the Elisp library =json.el=, but Emacs 27 can be compiled
> >>> with native JSON support, which, however, doesn't provide this option,
> >>> unfortunately.
> >>
> >> I see. In this case it might make sense to propose string keys as a
> >> feature for json.c. The key is a string anyway at some point during
> >> parsing, so avoiding the conversion to symbol may actually be the best way
> >> to speed things up.
> >
> > True. I'll ask on emacs-devel. Personally, I'd prefer strings, too, but I'm
> > a bit hesitant about doing the conversion myself, esp. given that in Ebib,
> > all the keys would need to be converted back before I can save a file.
>
> Sure, converting all keys in parsebib is not attractive.
>
> >>> That would be easy to support, but IMHO is better handled in
> >>> bibtex-completion: just parse the buffer and then call =gethash= on the
> >>> resulting hash table. Or what use-case do you have in mind?
> >>
> >> One use case: bibtex-completion drops fields that aren’t needed early on
> >> to save memory and CPU cycles. (Some people work with truly enormous
> >> bibliographies, like crypto.bib with ~60K entries.) But this means that we
> >> sometimes have to read an individual entry again if we need more fields
> >> that were dropped earlier. In this case I’d like to be able to read just
> >> one entry without having to reparse the complete bibliography.
> >
> > Makes sense. For .bib sources, this should be fairly easy to do. For .json,
> > I can't really say how easy it would be. It's not difficult to find the
> > entry key in the buffer, but from there you'd have to be able to find the
> > start of the entry in order to parse it. Currently, I don't know how to do
> > that.
>
> Not a big deal.  Since it’s just about individual entries and the code isn’t 
> super central, we can easily hack something.
>
> >>>> - Functions for resolving strings and cross-references.
> > [...]
> >>> parsebib has a lower-level API and a higher-level API, and the latter does
> >>> essentially what you suggest here. I thought bibtex-completion was already
> >>> using it...
> >>
> >> Nope. I think the high-level API didn’t exist when I wrote my code in 2014.
> >
> > No, it didn't. I seem to remember, though, that you gave me the idea for the
> > higher-level API, which is probably why I assumed you were using it.
> >
> > So that part of =parsebib= hasn't been tested much... (Ebib doesn't use it,
> > either). If you do decide to start using it, please test it and report any
> > issues you find. And let me know if I can help with testing.
>
> The organically grown parsing code in bibtex-completion has been bugging
> me for a while.  So I'm keen on rewriting this.  But I may not get to it 
> until the summer.  I'll keep you posted when I start working on it.
>
>   Titus
>
>



Re: CSL-JSON support for =parsebib=

2021-05-07 Thread Titus von der Malsburg


On 2021-05-07 Fri 16:47, Joost Kremers wrote:
> On Fri, May 07 2021, Titus von der Malsburg wrote:
>>> Apparently, =json-parse-{buffer|string}= then gives you a symbol with a
>>> space in it...
>>
>> I now see that symbol names “can contain any characters whatever” [1]. But
>> many characters need to be escaped (like spaces) which isn’t pretty.
>
> Agreed. But if you pass such a symbol to =symbol-name= or to =(format "%s")=,
> the escape character is removed, so when it comes to displaying those symbols
> to users, it shouldn't matter much.
>
> Note, though, that the keys in CSL-JSON don't seem to contain any spaces or
> other weird characters. There are just lower case a-z and dash, that's all.

I agree that weird characters are unlikely to be an issue.  Nonetheless,
strings seem slightly more future-proof.  Funky unicode stuff is now appearing 
everywhere (I’ve seen emoji being used for variable names) and the situation 
could be different a couple of years down the line.

>>> This works for the Elisp library =json.el=, but Emacs 27 can be compiled
>>> with native JSON support, which, however, doesn't provide this option,
>>> unfortunately.
>>
>> I see. In this case it might make sense to propose string keys as a feature
>> for json.c. The key is a string anyway at some point during parsing, so
>> avoiding the conversion to symbol may actually be the best way to speed
>> things up.
>
> True. I'll ask on emacs-devel. Personally, I'd prefer strings, too, but I'm a
> bit hesitant about doing the conversion myself, esp. given that in Ebib, all
> the keys would need to be converted back before I can save a file.

Sure, converting all keys in parsebib is not attractive.

>>> That would be easy to support, but IMHO is better handled in
>>> bibtex-completion: just parse the buffer and then call =gethash= on the
>>> resulting hash table. Or what use-case do you have in mind?
>>
>> One use case: bibtex-completion drops fields that aren’t needed early on to
>> save memory and CPU cycles. (Some people work with truly enormous
>> bibliographies, like crypto.bib with ~60K entries.) But this means that we
>> sometimes have to read an individual entry again if we need more fields that
>> were dropped earlier. In this case I’d like to be able to read just one entry
>> without having to reparse the complete bibliography.
>
> Makes sense. For .bib sources, this should be fairly easy to do. For .json, I
> can't really say how easy it would be. It's not difficult to find the entry
> key in the buffer, but from there you'd have to be able to find the start of
> the entry in order to parse it. Currently, I don't know how to do that.

Not a big deal.  Since it’s just about individual entries and the code isn’t 
super central, we can easily hack something.

>>>> - Functions for resolving strings and cross-references.
> [...]
>>> parsebib has a lower-level API and a higher-level API, and the latter does
>>> essentially what you suggest here. I thought bibtex-completion was already
>>> using it...
>>
>> Nope. I think the high-level API didn’t exist when I wrote my code in 2014.
>
> No, it didn't. I seem to remember, though, that you gave me the idea for the
> higher-level API, which is probably why I assumed you were using it.
>
> So that part of =parsebib= hasn't been tested much... (Ebib doesn't use it,
> either). If you do decide to start using it, please test it and report any
> issues you find. And let me know if I can help with testing.

The organically grown parsing code in bibtex-completion has been bugging me
for a while.  So I'm keen on rewriting this.  But I may not get to it until the 
summer.  I'll keep you posted when I start working on it.

  Titus




Re: CSL-JSON support for =parsebib=

2021-05-07 Thread Joost Kremers


On Fri, May 07 2021, Titus von der Malsburg wrote:
>> Apparently, =json-parse-{buffer|string}= then gives you a symbol with a space
>> in it...
>
> I now see that symbol names “can contain any characters whatever” [1]. But
> many characters need to be escaped (like spaces) which isn’t pretty.

Agreed. But if you pass such a symbol to =symbol-name= or to =(format "%s")=,
the escape character is removed, so when it comes to displaying those symbols to
users, it shouldn't matter much.
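
To illustrate the point about escaping, here is a minimal sketch. The key
name "container title" and the function name =my/symbol-forms= are made up
for the example; real CSL-JSON keys contain no spaces.

```elisp
(defun my/symbol-forms (name)
  "Return three string renderings of the symbol named NAME.
Hypothetical helper, for illustration only."
  (let ((key (intern name)))
    (list (format "%S" key)     ; read syntax, special characters escaped
          (format "%s" key)     ; display form, no escape character
          (symbol-name key))))  ; the plain name as a string

;; (my/symbol-forms "container title")
;; => ("container\\ title" "container title" "container title")
```

Only the read syntax (what =prin1= or "%S" produces) carries the backslash.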

Note, though, that the keys in CSL-JSON don't seem to contain any spaces or
other weird characters. There are just lower case a-z and dash, that's all.

>> This works for the Elisp library =json.el=, but Emacs 27 can be compiled with
>> native JSON support, which, however, doesn't provide this option,
>> unfortunately.
>
> I see. In this case it might make sense to propose string keys as a feature
> for json.c. The key is a string anyway at some point during parsing, so
> avoiding the conversion to symbol may actually be the best way to speed
> things up.

True. I'll ask on emacs-devel. Personally, I'd prefer strings, too, but I'm a
bit hesitant about doing the conversion myself, esp. given that in Ebib, all the
keys would need to be converted back before I can save a file.

>> That would be easy to support, but IMHO is better handled in
>> bibtex-completion:
>> just parse the buffer and then call =gethash= on the resulting hash table. Or
>> what use-case do you have in mind?
>
> One use case: bibtex-completion drops fields that aren’t needed early on to
> save memory and CPU cycles. (Some people work with truly enormous
> bibliographies, like crypto.bib with ~60K entries.) But this means that we
> sometimes have to read an individual entry again if we need more fields that
> were dropped earlier. In this case I’d like to be able to read just one entry
> without having to reparse the complete bibliography.

Makes sense. For .bib sources, this should be fairly easy to do. For .json, I
can't really say how easy it would be. It's not difficult to find the entry key
in the buffer, but from there you'd have to be able to find the start of the
entry in order to parse it. Currently, I don't know how to do that.
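
One possible approach, as a rough sketch only: search for the "id" member,
move out to the enclosing brace, and parse just that object. The function
name =my/csl-entry-at-key= is hypothetical and not part of =parsebib=. The
sketch assumes that the native JSON parser is available, that the buffer's
syntax table treats braces as parentheses and double quotes as string
delimiters (as fundamental-mode does), and that the "id" value is a string
sitting directly inside the entry object.

```elisp
(defun my/csl-entry-at-key (key)
  "Return the CSL-JSON entry whose \"id\" is KEY as an alist, or nil.
Searches the current buffer.  Hypothetical helper, not part of parsebib."
  (save-excursion
    (goto-char (point-min))
    (let ((case-fold-search nil))
      (when (re-search-forward
             (format "\"id\"[ \t\n]*:[ \t\n]*\"%s\"" (regexp-quote key))
             nil t)
        ;; Point is just after the key string, inside the entry's object.
        ;; Move out to the enclosing opening brace ...
        (backward-up-list)
        ;; ... and parse only the one object that starts there.
        (json-parse-buffer :object-type 'alist)))))
```

The rest of the buffer is never parsed, so the cost should not depend on the
number of entries beyond the initial search.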

>>> - Functions for resolving strings and cross-references.
[...]
>> parsebib has a lower-level API and a higher-level API, and the latter does
>> essentially what you suggest here. I thought bibtex-completion was already
>> using it...
>
> Nope. I think the high-level API didn’t exist when I wrote my code in 2014.

No, it didn't. I seem to remember, though, that you gave me the idea for the
higher-level API, which is probably why I assumed you were using it.

So that part of =parsebib= hasn't been tested much... (Ebib doesn't use it,
either). If you do decide to start using it, please test it and report any
issues you find. And let me know if I can help with testing.


-- 
Joost Kremers
Life has its moments



Re: CSL-JSON support for =parsebib=

2021-05-07 Thread Titus von der Malsburg


On 2021-05-07 Fri 14:34, Joost Kremers wrote:
> Hi Titus,
>
> On Fri, May 07 2021, Titus von der Malsburg wrote:
>> I’m the maintainer of bibtex-completion, helm-bibtex, and ivy-bibtex. My
>> name is actually Titus, not Theo ;)
>
> :$ (I do apologise!)
>
>> Regarding the symbols vs. string issue: I don’t have a strong opinion, but
>> personally tend to favor a conservative solution that avoids breaking changes.
>> First, it’s difficult to predict how switching to symbols is going to affect
>> other software including custom code written by users. Second, JSON key names
>> can contain spaces and other weird stuff.
>
> Apparently, =json-parse-{buffer|string}= then gives you a symbol with a space 
> in it...

I now see that symbol names “can contain any characters whatever” [1].  But 
many characters need to be escaped (like spaces) which isn’t pretty.

>> So strings are perhaps a more natural
>> choice anyway. (It appears that you can actually configure the JSON parser to
>> use strings instead of symbols. See variable `json-key-type`.)
>
> This works for the Elisp library =json.el=, but Emacs 27 can be compiled with
> native JSON support, which, however, doesn't provide this option, 
> unfortunately.

I see.  In this case it might make sense to propose string keys as a feature 
for json.c.  The key is a string anyway at some point during parsing, so 
avoiding the conversion to symbol may actually be the best way to speed things 
up.

>> Finally,
>> it’s not necessarily clear that avoiding the conversion to strings saves
>> sufficiently many CPU cycles to justify the effort.
>
> I can simply try it out. Shouldn't be difficult to code up.
>
>> Regarding support for CSL-JSON: bibtex-completion is currently very
>> BibTeX-oriented and uses fairly low-level parsing functions from parsebib. We
>> could add similar support for CSL-JSON
>
> I'm afraid that won't be possible, because the CSL-JSON support in parsebib
> isn't low-level. ;-) There's basically just a single function that gives you
> all the entries in the buffer and that's it.
>
>> Some rough ideas for such an API (just for illustration):
>> - A function that returns all entries in a .bib or CSL-JSON file.
>
> Those already exist... ;-) For JSON, that's basically the only option, because
> the actual parsing isn't handled by parsebib. For BibTeX, such a function has
> existed for some time now.

Wasn’t aware.  Fantastic!

>> - A function that returns an entry with a specific key (or multiple entries).
>
> That would be easy to support, but IMHO is better handled in
> bibtex-completion: just parse the buffer and then call =gethash= on the
> resulting hash table. Or what use-case do you have in mind?

One use case: bibtex-completion drops fields that aren’t needed early on to 
save memory and CPU cycles.  (Some people work with truly enormous 
bibliographies, like crypto.bib with ~60K entries.)  But this means that we 
sometimes have to read an individual entry again if we need more fields that 
were dropped earlier.  In this case I’d like to be able to read just one entry 
without having to reparse the complete bibliography. 

>> - Functions for resolving strings and cross-references.
>
> This, too, is something that parsebib already does.

OMG, bibtex-completion is doing this as well, but I’d be happy to get rid of 
this code.

> parsebib has a lower-level API and a higher-level API, and the latter does
> essentially what you suggest here. I thought bibtex-completion was already 
> using it...

Nope. I think the high-level API didn’t exist when I wrote my code in 2014.

Seems like there’s quite a bit of potential for streamlining bibtex-completion.
Now I just need a week to work on it.  :)

  Titus


[1] https://www.gnu.org/software/emacs/manual/html_node/elisp/Symbol-Type.html



Re: CSL-JSON support for =parsebib=

2021-05-07 Thread Joost Kremers
Hi Titus,

On Fri, May 07 2021, Titus von der Malsburg wrote:
> I’m the maintainer of bibtex-completion, helm-bibtex, and ivy-bibtex. My
> name is actually Titus, not Theo ;)

:$ (I do apologise!)

> Regarding the symbols vs. string issue: I don’t have a strong opinion, but
> personally tend to favor a conservative solution that avoids breaking changes.
> First, it’s difficult to predict how switching to symbols is going to affect
> other software including custom code written by users. Second, JSON key names
> can contain spaces and other weird stuff.

Apparently, =json-parse-{buffer|string}= then gives you a symbol with a space 
in it...

> So strings are perhaps a more natural
> choice anyway. (It appears that you can actually configure the JSON parser to
> use strings instead of symbols. See variable `json-key-type`.)

This works for the Elisp library =json.el=, but Emacs 27 can be compiled with
native JSON support, which, however, doesn't provide this option, unfortunately.
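
A small sketch of the difference, using a made-up one-field object. All
=my/...= names are hypothetical. Note that the native parser does use string
keys when objects are returned as hash tables (its default); it is only for
alists that the keys are always symbols.

```elisp
(require 'json)

(defvar my/sample "{\"container-title\": \"Some Journal\"}"
  "Made-up one-field CSL-JSON object, for illustration only.")

;; json.el: bind `json-key-type' to get string keys in the alist.
(defun my/read-with-string-keys (text)
  (let ((json-object-type 'alist)
        (json-key-type 'string))
    (json-read-from-string text)))
;; (my/read-with-string-keys my/sample)
;; => (("container-title" . "Some Journal"))

;; Native parser, alist: the keys are always symbols.
(defun my/parse-native-alist (text)
  (json-parse-string text :object-type 'alist))
;; (my/parse-native-alist my/sample)
;; => ((container-title . "Some Journal"))

;; Native parser, hash table: the keys are strings.
(defun my/parse-native-hash (text)
  (json-parse-string text :object-type 'hash-table))
;; (gethash "container-title" (my/parse-native-hash my/sample))
;; => "Some Journal"
```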

> Finally,
> it’s not necessarily clear that avoiding the conversion to strings saves
> sufficiently many CPU cycles to justify the effort.

I can simply try it out. Shouldn't be difficult to code up.

> Regarding support for CSL-JSON: bibtex-completion is currently very
> BibTeX-oriented and uses fairly low-level parsing functions from parsebib. We
> could add similar support for CSL-JSON

I'm afraid that won't be possible, because the CSL-JSON support in parsebib
isn't low-level. ;-) There's basically just a single function that gives you all
the entries in the buffer and that's it.

> Some rough ideas for such an API (just for illustration):
> - A function that returns all entries in a .bib or CSL-JSON file.

Those already exist... ;-) For JSON, that's basically the only option, because
the actual parsing isn't handled by parsebib. For BibTeX, such a function has
existed for some time now.

> - A function that returns an entry with a specific key (or multiple entries).

That would be easy to support, but IMHO is better handled in bibtex-completion:
just parse the buffer and then call =gethash= on the resulting hash table. Or
what use-case do you have in mind?

> - Functions for resolving strings and cross-references.

This, too, is something that parsebib already does.

parsebib has a lower-level API and a higher-level API, and the latter does
essentially what you suggest here. I thought bibtex-completion was already 
using it...


-- 
Joost Kremers
Life has its moments



Re: CSL-JSON support for =parsebib=

2021-05-07 Thread Bruce D'Arcus
On Fri, May 7, 2021 at 8:52 AM Titus von der Malsburg
 wrote:

> It might be more elegant to have a higher-level API in parsebib.  This API 
> could perhaps even abstract away from the underlying format (BibTeX, 
> CSL-JSON, or others in the future?).  This would substantially simplify 
> matters in bibtex-completion, but would also enable many other cool uses of 
> parsebib.

Just wanted to +1 this!

Bruce



Re: CSL-JSON support for =parsebib=

2021-05-07 Thread Titus von der Malsburg



Hi all,

I’m the maintainer of bibtex-completion, helm-bibtex, and ivy-bibtex.  My name 
is actually Titus, not Theo ;)

Cool to see that the ecosystem around academic writing in org mode is 
developing so nicely.  I use org mode for this purpose every single working day 
and it’s amazing already.  I have to confess, though, that I haven’t been 
keeping up with recent developments.  I just saw the recent thread about the 
citation syntax.  (Thanks to Bruce D’Arcus for pointing me to it.)  Is there a 
good place where I can read up on the current efforts and plans regarding 
citations, bibliographies and so on (I mean other than reading the last couple 
of months of the mailing list archive)?

Regarding the symbols vs. string issue:  I don’t have a strong opinion, but 
personally tend to favor a conservative solution that avoids breaking changes.
First, it’s difficult to predict how switching to symbols is going to affect 
other software including custom code written by users.  Second, JSON key names 
can contain spaces and other weird stuff.  So strings are perhaps a more 
natural choice anyway.  (It appears that you can actually configure the JSON 
parser to use strings instead of symbols.  See variable `json-key-type`.)  
Third, as you say, it would also be nice to maintain compatibility with 
bibtex.el.  Finally, it’s not necessarily clear that avoiding the conversion to 
strings saves sufficiently many CPU cycles to justify the effort.  (But this 
may be a non-issue anyway, if the JSON parser can return strings directly.)

Having said that, I’d be happy to merge a PR that implements the switch to
symbols in bibtex-completion if that’s the consensus.  It touches a substantial
number of lines, but should nonetheless be relatively straightforward.

Regarding support for CSL-JSON: bibtex-completion is currently very 
BibTeX-oriented and uses fairly low-level parsing functions from parsebib.  We 
could add similar support for CSL-JSON but things would become messy.  (It’s 
already a bit ugly, I have to say, which is entirely my fault.)  It might be 
more elegant to have a higher-level API in parsebib.  This API could perhaps 
even abstract away from the underlying format (BibTeX, CSL-JSON, or others in 
the future?).  This would substantially simplify matters in bibtex-completion, 
but would also enable many other cool uses of parsebib.

Some rough ideas for such an API (just for illustration):
- A function that returns all entries in a .bib or CSL-JSON file.
- A function that returns an entry with a specific key (or multiple entries).
- Functions for resolving strings and cross-references.

So much for now.

  Titus


On 2021-05-07 Fri 11:17, Joost Kremers wrote:
> Hi,
>
> [Cc-ing Theo von der Malsburg]
>
> Now that Org is getting support for Citeproc, it could be useful to add
> support for the CSL-JSON format for bibliographic data to Emacs. Therefore,
> after a friendly request from Denis Maier, I have added support for this
> format to the =parsebib= library.
>
> Since =parsebib= is used by =bibtex-completion=, which in turn is used by
> =bibtex-actions=, =helm-bibtex=, =ivy-bibtex=, =org-ref= and
> =org-roam-bibtex=, this is a first step in making bibliographic data in
> =.json= format directly available to Org users, without the need of any
> BibTeX conversion.
>
> [Boy, look at me doing the marketing speak! :D ]
>
> Anyway, this really is the first step. =bibtex-completion= will need to be
> modified in order to make use of the new functionality, and the same may be
> true of the packages based on it.
>
> At this point, the new code isn't merged into =master= yet. It is available in
> the =wip/csl= branch of =parsebib='s Github repo:
>
> https://github.com/joostkremers/parsebib/tree/wip/csl
>
> The README has most of the details. I appreciate any and all comments,
> suggestions and tips.
>
> For those maintaining packages based on =parsebib=, I have at least one
> question: currently, =parsebib= returns a BibTeX entry in the form of an alist
> of =(<field> . <value>)= pairs, where both =<field>= and =<value>= are
> strings. A CSL-JSON entry is returned as an alist, but the =<field>= names are
> symbols, not strings.
>
> It would be extremely impractical to return the JSON data with strings as
> field names, because the JSON parsing libraries in Emacs return symbols, so
> converting them would take time. Plus, those libraries also expect symbols
> when serialising Elisp data to JSON. (Which I intend to make use of in Ebib
> later on.)
>
> It would be easier to modify the BibTeX output to return field names as
> symbols. I originally chose strings, because that's what =bibtex.el= uses,
> making it a little easier to integrate with it.
>
> So the question: would it be helpful to make this change to the BibTeX data,
> so that the data from both sources uses the same format? Or would it be better
> to keep it as it is, even if that means that BibTeX data and JSON data aren't
> compatible?
>
> TIA
>
> Joost

Re: CSL-JSON support for =parsebib=

2021-05-07 Thread Bruce D'Arcus
On Fri, May 7, 2021 at 7:30 AM Joost Kremers wrote:

> Now that Org is getting support for Citeproc, it could be useful to add
> support for the CSL-JSON format for bibliographic data to Emacs. Therefore,
> after a friendly request from Denis Maier, I have added support for this
> format to the =parsebib= library.

Nice!

...

> So the question: would it be helpful to make this change to the BibTeX data,
> so that the data from both sources uses the same format?

Just as a general point, this.

From my perspective as =bibtex-actions= developer, it's not a problem
given I don't have a lot of code that accesses that data directly. And
I'd rather be able to support both import formats without hassle.

Titus may have other views, of course, given how much
=bibtex-completion= does work directly with that data.

Bruce



CSL-JSON support for =parsebib=

2021-05-07 Thread Joost Kremers
Hi,

[Cc-ing Theo von der Malsburg]

Now that Org is getting support for Citeproc, it could be useful to add support
for the CSL-JSON format for bibliographic data to Emacs. Therefore, after a
friendly request from Denis Maier, I have added support for this format to the
=parsebib= library.

Since =parsebib= is used by =bibtex-completion=, which in turn is used by
=bibtex-actions=, =helm-bibtex=, =ivy-bibtex=, =org-ref= and =org-roam-bibtex=,
this is a first step in making bibliographic data in =.json= format directly
available to Org users, without the need of any BibTeX conversion.

[Boy, look at me doing the marketing speak! :D ]

Anyway, this really is the first step. =bibtex-completion= will need to be
modified in order to make use of the new functionality, and the same may be true
of the packages based on it.

At this point, the new code isn't merged into =master= yet. It is available in
the =wip/csl= branch of =parsebib='s Github repo:

https://github.com/joostkremers/parsebib/tree/wip/csl

The README has most of the details. I appreciate any and all comments,
suggestions and tips.

For those maintaining packages based on =parsebib=, I have at least one
question: currently, =parsebib= returns a BibTeX entry in the form of an alist
of =(<field> . <value>)= pairs, where both =<field>= and =<value>= are strings.
A CSL-JSON entry is returned as an alist, but the =<field>= names are symbols,
not strings.
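
To make the difference concrete, here is a small sketch with made-up entries.
The variable and function names are hypothetical, and real entries contain
more fields than shown here.

```elisp
;; Made-up entries, for illustration only.
(defvar my/bibtex-entry '(("title" . "A Title") ("publisher" . "Some Press"))
  "A BibTeX-style entry: field names are strings.")
(defvar my/csl-entry '((title . "A Title") (publisher . "Some Press"))
  "A CSL-JSON-style entry: field names are symbols.")

;; String keys must be compared with `equal' ...
(defun my/bibtex-field (field entry)
  (cdr (assoc field entry)))

;; ... while symbol keys work with the default `eq'-based lookup.
(defun my/csl-field (field entry)
  (alist-get field entry))

;; Converting one format to the other means a pass over every field.
(defun my/string-keys-to-symbols (entry)
  (mapcar (lambda (pair) (cons (intern (car pair)) (cdr pair))) entry))

;; (my/bibtex-field "title" my/bibtex-entry)     => "A Title"
;; (my/csl-field 'title my/csl-entry)            => "A Title"
;; (my/string-keys-to-symbols my/bibtex-entry)
;; => ((title . "A Title") (publisher . "Some Press"))
```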

It would be extremely impractical to return the JSON data with strings as field
names, because the JSON parsing libraries in Emacs return symbols, so converting
them would take time. Plus, those libraries also expect symbols when serialising
Elisp data to JSON. (Which I intend to make use of in Ebib later on.)

It would be easier to modify the BibTeX output to return field names as symbols.
I originally chose strings, because that's what =bibtex.el= uses, making it a
little easier to integrate with it.

So the question: would it be helpful to make this change to the BibTeX data, so
that the data from both sources uses the same format? Or would it be better to
keep it as it is, even if that means that BibTeX data and JSON data aren't
compatible?

TIA

Joost


-- 
Joost Kremers
Life has its moments