Re: Yet another browser extension for capturing notes - LinkRemark

2022-01-19 Thread Samuel Wales
just a quick fwiw before i try to reply to the longer message by max.
my own suggestion is modest for metadata, [even for science papers and
things with funny web construction].  just title like org-capture
extension.  no need to cite in my case.

my needs for saving and restoring, however, are more fancy.  something
like achieving a 1:1 mapping from firefox selected tabs, or a tree
style tabs extension tree, to their counterparts in org, even when
those counterparts have notes and such.  this might include marking
the org version as deleted/doneified merely by closing tab in
firefox.  vice-versa would be straightforward.  so it's really a "get
organized and don't get confused by having both firefox and org" kinda
thing.



Re: Yet another browser extension for capturing notes - LinkRemark

2022-01-19 Thread Ihor Radchenko
András Simonyi  writes:

>> As a side note, citeproc-el currently has poor performance on large org
>> files. It is unusable for me.
>
> Could you elaborate? In theory, oc-cs.el's performance should depend
> only on the number of citations (as opposed to the size of the Org
> document) and be in the same ballpark as pandoc's citeproc. It'd be
> interesting to know the details since I plan to work on speeding up
> citeproc-el's rendering, although you are the first one to actually
> complain :-).

There is no doubt why I complain: a 15 MB "bibliography" file.

The oc-csl.el performance depends on the size of the Org document during
caching stage. Moreover, every time I change the Org document, caching
is repeated. Every time I open the file using oc-csl.el, caching is
repeated. Every time I revert file using oc-csl.el, caching is repeated.

I think that the easiest solution for citeproc would be not calling
org-bibtex-headline on every single headline, but using regexp search
for "BTYPE" property.
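To illustrate the idea (outside Emacs, and only as a rough sketch): instead of visiting every headline, search for the property line itself and walk back to the heading only for the hits. The function names below are made up for the example, and the snippet assumes ol-bibtex style entries that store the entry type in a `:BTYPE:` property.

```python
import re

# Property line such as ":BTYPE: article" (assumed ol-bibtex layout).
BTYPE_RE = re.compile(r"^[ \t]*:BTYPE:[ \t]+(\S+)[ \t]*$",
                      re.MULTILINE | re.IGNORECASE)
# An Org heading line: one or more stars, whitespace, then the title.
HEADING_RE = re.compile(r"\*+[ \t]+(.*)")


def heading_above(org_text, pos):
    """Return the title of the nearest heading above offset pos, or None."""
    end = pos
    while end > 0:
        start = org_text.rfind("\n", 0, end - 1) + 1
        line = org_text[start:end].rstrip("\n")
        match = HEADING_RE.match(line)
        if match:
            return match.group(1)
        end = start
    return None


def find_bibtex_headings(org_text):
    """Return (heading title, BTYPE value) pairs found by regexp search."""
    return [(heading_above(org_text, m.start()), m.group(1))
            for m in BTYPE_RE.finditer(org_text)]
```

The cost is then driven by the number of `:BTYPE:` hits rather than by the number of headlines in the file.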

Best,
Ihor



Re: Yet another browser extension for capturing notes - LinkRemark

2022-01-19 Thread András Simonyi
Dear All,

On Wed, 19 Jan 2022 at 10:56, Ihor Radchenko  wrote:

> As a side note, citeproc-el currently has poor performance on large org
> files. It is unusable for me.

Could you elaborate? In theory, oc-csl.el's performance should depend
only on the number of citations (as opposed to the size of the Org
document) and be in the same ballpark as pandoc's citeproc. It'd be
interesting to know the details since I plan to work on speeding up
citeproc-el's rendering, although you are the first one to actually
complain :-).

best wishes,
András



Re: Yet another browser extension for capturing notes - LinkRemark

2022-01-19 Thread Ihor Radchenko
András Simonyi  writes:

> Just wanted to note that the CSL-based export processor, oc-csl.el,
> already supports this: you can add an Org file as a bibliography, cite
> items described by ol-bibtex style headings and export the citations.

Thanks for telling! oc-csl is tricky because it relies on an external
library. So, it's hard to know what it can do and what it cannot do.

As a side note, citeproc-el currently has poor performance on large org
files. It is unusable for me.

> It'd be very nice indeed if other built-in processors supported the
> format too (e.g., "basic"). As for external ones, the CSL-based
> activation processor I wrote
> (https://github.com/andras-simonyi/org-cite-csl-activate) also
> supports it

Interesting. By the way, I recommend using composition instead of
display property for rendering. See prettify-symbols-mode.

Best,
Ihor



Re: Yet another browser extension for capturing notes - LinkRemark

2022-01-19 Thread András Simonyi
Dear All,

On Wed, 19 Jan 2022 at 04:24, Ihor Radchenko  wrote:

> > Scientific papers require more work, it is necessary to make them
> > available to org-cite somehow. Some nerds use quite peculiar blog
> > engines and strange setting of metadata. So shopping on some sites might
> > work better than other cases.
>
> I have plans to implement something called oc-org.el The plan is
> using ol-bibtex-compatible Org headings as a source of citations.

Just wanted to note that the CSL-based export processor, oc-csl.el,
already supports this: you can add an Org file as a bibliography, cite
items described by ol-bibtex style headings and export the citations.
It'd be very nice indeed if other built-in processors supported the
format too (e.g., "basic"). As for external ones, the CSL-based
activation processor I wrote
(https://github.com/andras-simonyi/org-cite-csl-activate) also
supports it and there are plans to add support to Citar as well
(through parsebib); see the discussion at
https://github.com/bdarcus/citar/issues/397.

best wishes,
András

> Best,
> Ihor
>



Re: Yet another browser extension for capturing notes - LinkRemark

2022-01-18 Thread Ihor Radchenko
Max Nikulin  writes:

> Scientific papers require more work, it is necessary to make them 
> available to org-cite somehow. Some nerds use quite peculiar blog 
> engines and strange setting of metadata. So shopping on some sites might 
> work better than other cases.

I have plans to implement something called oc-org.el The plan is
using ol-bibtex-compatible Org headings as a source of citations.

Best,
Ihor



Re: Yet another browser extension for capturing notes - LinkRemark

2022-01-18 Thread Max Nikulin

On 18/01/2022 12:43, Samuel Banya wrote:
> Not sure if it helps, but you could also use the w3m browser's mentality
> of just keeping an HTML file that contains all of your bookmarks. I'm
> sure there's probably even a way to use 'eww' in the same fashion too.
>
> Maybe even making your own personal wiki of a webring of sorts would
> help too.
>
> I don't personally bookmark anything anymore but just store links on a
> webring on my site.


Actually, Samuel Wales added more details to his message posted a year
ago. I started that thread to announce the LinkRemark browser extension
(https://github.com/maxnikulin/linkremark), and it was me who tried to
revive the thread a month ago.

The idea is to store bookmarks in an Org file, and a bookmark should be
more than just a URL and page title. A rich "bookmark" should have more
metadata and may have user comments.

In eww you can likely use org-store-link or org-capture directly.
An example of a project that extracts metadata:
https://github.com/yantar92/org-capture-ref

Isn't Org mode better than any wiki? At least in some aspects.




Re: Yet another browser extension for capturing notes - LinkRemark

2022-01-18 Thread Max Nikulin

Samuel,

since a significant part of your message is dedicated to capturing tab
groups, I should ask whether you have tried the version of the LinkRemark
add-on currently available from the browser extension catalogues:


- https://addons.mozilla.org/firefox/addon/linkremark/
- https://chrome.google.com/webstore/detail/mgmcoaemjnaehlliifkgljdnbpedihoe

Groups of tabs or selected (highlighted) tabs are supported in
Chromium. Firefox has no built-in tab groups, but it is still possible
to capture selected tabs.


Your feature requests:
- Clean up URLs. I have had this idea, but I have not started
implementing it. Maybe URLs should be sent to another extension
that excels at this task. If you have comments on which add-ons are
great and which work rather poorly, the suggestions may be helpful.
- Deduplicate URLs from tab groups. It requires some work to merge
selected text, links, or nested frames from each tab. The complication
is that some sites use internal navigation not reflected in the location,
so the same URL may have completely different content. Some sites declare
their top pages as canonical URLs, so some measures against false
positives are required. Currently the extension can check whether a URL is
already present in Org files. This requires the
https://github.com/maxnikulin/burl helper application, which is at the
proof-of-concept stage.
- Restore a set of tabs. It requires some elisp code to iterate over a
subtree and pick the first "Link URL" or "URL" from the description lists.
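The "restore a set of tabs" item can be sketched outside Emacs too. The snippet below is only an illustration in Python, not the elisp mentioned above. It assumes that each captured tab is an Org heading whose body contains description list items written as `- URL :: ...` or `- Link URL :: ...`, with the value being either a bare URL or a bracket link. That layout and the function name are assumptions made for the example.

```python
import re

HEADING_RE = re.compile(r"^\*+[ \t]+")
# Description list item named "Link URL" or "URL" (assumed capture layout).
ITEM_RE = re.compile(r"^[ \t]*[-+][ \t]+(Link URL|URL)[ \t]+::[ \t]+(.*\S)")
URL_RE = re.compile(r"https?://[^\s\]\[<>]+")


def urls_to_restore(org_text):
    """Return the first "Link URL"/"URL" item of every heading, in order."""
    urls = []
    found_in_entry = True  # ignore any text before the first heading
    for line in org_text.splitlines():
        if HEADING_RE.match(line):
            found_in_entry = False
            continue
        if found_in_entry:
            continue
        item = ITEM_RE.match(line)
        if item:
            url = URL_RE.search(item.group(2))
            if url:
                urls.append(url.group(0))
                found_in_entry = True
    return urls
```

The resulting list could then be handed to the browser, for example with `webbrowser.open_new_tab(url)` from the Python standard library for each URL.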


Currently I am thinking about some interface changes, since sometimes I
just want to check whether some URL is already in my notes. I would prefer
to avoid adding more context menu items.


Additional details are inline.

On 17/01/2022 09:29, Samuel Wales wrote:
> On 12/26/20, Maxim Nikulin wrote:
>> On 26/12/2020, Samuel Wales wrote:
>>
>>> [... i can imagine great things possible with such extensions. for
>>> example, you could have sets of tabs, selected by right click in
>>> firefox, to save to a bunch of org entries.  then you could load that
>>> particular set of entries into firefox whenever you want.
>
> interesting.  i do note tab selection features in recent firefox-esr
> and i was just assuming something like that.


There is no ready-to-use recipe for loading saved tabs, but saving
should work to some extent.



>>> You can do this with the "Copy all URLs" extension (ID:
>>> djdmadneanknadilpjiknlnanaolmbfk). Use this as the custom format (note
>>> the linebreak):
>>
>> I am almost sure that similar extension should exist for Firefox as well.
>
> i think this is for copying all tabs, not selected ones.
>
> ...
>
> also i think this extension does not exist any more in firefox.


I have not tried them:
- https://github.com/piroor/copy-selected-tabs-to-clipboard/
- https://github.com/yorkxin/copy-as-markdown


>> - Are you going to capture reviews of "rice cookers" that could be
>> considered as ordinary pages or you are going to save items from online
>> stores?
>>
>> ...
>>
>> Could you inspect head element of pages in
>> your favorite stores contains desired metadata using page source or
>> inspect element tools?
>
> my web knowledge is too limited to understand your question, but i am
> just hoping it would capture ordinary amazon links, review sites, and
> so on.


It seems that the quality of metadata on marketplaces like Amazon depends
heavily on the particular seller. The extension attempts to treat some data
specially if the page has microdata or JSON-LD with the schema.org Product
type. If I remember correctly, Amazon does not expose a canonical link
explicitly.
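For readers who have not looked at such pages: JSON-LD is a `<script type="application/ld+json">` element holding a JSON document. A rough Python sketch of pulling out objects whose `@type` is `Product` might look as follows. It is not the code of the extension; the class and function names are invented for the example, and nested `@graph` containers are deliberately ignored.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the parsed content of application/ld+json script elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # broken metadata is common; skip it silently


def find_products(html):
    """Return top-level JSON-LD objects whose @type includes "Product"."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    products = []
    for doc in collector.documents:
        for item in doc if isinstance(doc, list) else [doc]:
            if not isinstance(item, dict):
                continue
            types = item.get("@type")
            if "Product" in (types if isinstance(types, list) else [types]):
                products.append(item)
    return products
```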



>>> [now if i can only debug the extra-blank-lines-in-capture problem.]
>>
>> Fully agree that it is really annoying. It is among high priority items
>> in my TODO list.
>
> we might be talking about different things.  i am referring to
> something in org that adds blank lines when my particular org capture
> templates are used.


See info "(org) Template elements"
https://orgmode.org/manual/Template-elements.html

:empty-lines, :empty-lines-after, :empty-lines-before

However, I cannot say that I really understand their meaning. Actually, I
do not mind having an empty line before the next heading when refile is
completed. My impression is that it depends on the number of empty lines
at the end of the capture buffer. I usually add some comments to captured
pages.


On 18/01/2022 08:03, Samuel Wales wrote:
> my amazon example was silly and confusing.  the point isn't shopping
> for something; it's anything.  science papers, news outlets, nerd
> blogs.

Scientific papers require more work: it is necessary to make them
available to org-cite somehow. Some nerds use quite peculiar blog
engines and strange metadata settings. So shopping on some sites might
work better than in other cases.





Re: Yet another browser extension for capturing notes - LinkRemark

2022-01-17 Thread Samuel Banya
Not sure if it helps, but you could also use the w3m browser's mentality of 
just keeping an HTML file that contains all of your bookmarks. I'm sure there's 
probably even a way to use 'eww' in the same fashion too.

Maybe even making your own personal wiki of a webring of sorts would help too.

I don't personally bookmark anything anymore but just store links on a webring 
on my site.

Hope this helps.

Sam

On Mon, Jan 17, 2022, at 8:03 PM, Samuel Wales wrote:
> my amazon example was silly and confusing.  the point isn't shopping
> for something; it's anything.  science papers, news outlets, nerd
> blogs.
> 
> On 1/16/22, Samuel Wales  wrote:
> > more below.
> >
> > On 12/26/20, Maxim Nikulin  wrote:
> >> On 26/12/2020, Samuel Wales wrote:
> >>
> >>> [... i can imagine great things possible with such extensions. for
> >>> example, you could have sets of tabs, selected by right click in
> >>> firefox, to save to a bunch of org entries.  then you could load that
> >>> particular set of entries into firefox whenever you want.  and you
> >>> could keep notes on each page and move the entries wherever you want.
> >>> this would be useful for such things as "i am researching rice
> >>> cookers; these are my tabs, but i don't want them cluttering firefox
> >>> and i want them with my org notes and to make notes on them and will
> >>> re-load them into firefox when i want to revisit".]
> >>
> >> It should be possible since some tab management extension were used in
> >> mozilla to evaluate if webextensions are mature enough and if support of
> >> XUL add-ons could be dropped. On the other hand do not expect such
> >> feature soon. A kind of semi-blocker is absence of automatic tests to
> >> run before every release, and it will require a lot of time.
> >
> > interesting.  i do note tab selection features in recent firefox-esr
> > and i was just assuming something like that.
> >
> >>
> >> In the meanwhile, have you looked at the following comment?
> >> https://github.com/sprig/org-capture-extension/issues/12#issuecomment-323569334
> >> alphapapa commented Aug 20, 2017
> >>
> >>> You can do this with the "Copy all URLs" extension (ID:
> >>> djdmadneanknadilpjiknlnanaolmbfk). Use this as the custom format (note
> >>> the linebreak):
> >>>
> >>> [[$url][$title]]
> >>
> >> I am almost sure that similar extension should exist for Firefox as well.
> >
> > i think this is for copying all tabs, not selected ones.  so a
> > workaround for my idea would be to have a fresh firefox window
> > dedicated to rice cookers and then save them all.  bit it does not
> > save over existing canonical location for each url or similar.
> >
> > which would be needed for my idea so as to not have duplicates etc.
> >
> > also i think this extension does not exist any more in firefox.  i
> > used to use it for storing as org links.  but it was just to store
> > links in case firefox screwed up session restore.  which it usually
> > does.  for that purpose, i use one that does not save as orglinks.
> >
> >>
> >> Some points should be clarified in my opinion
> >>
> >> - Do you expect that metadata should be captured in addition to URLs and
> >> titles? Browsers can unload some tabs making page content unavailable.
> >
> > i wouldn't need this i think.  i'd want page title, just as in
> > ordinary org links, but in principle that can be assumed from the
> > existing org entry if exists, and if not exists and you are capturing,
> > the page is already loaded.  so i think not a metadata issue.
> >
> >> - Are you going to capture reviews of "rice cookers" that could be
> >> considered as ordinary pages or you are going to save items from online
> >> stores? I do not current state of affairs but I have heard about some
> >> activity for special metadata that allows search engines to display
> >> products in a special way. Could you inspect head element of pages in
> >> your favorite stores contains desired metadata using page source or
> >> inspect element tools?
> >
> > my web knowledge is too limited to understand your question, but i am
> > just hoping it would capture ordinary amazon links, review sites, and
> > so on.  and i never use js if i can avoid it so i'm expecting pretty
> > normal website stuff i think.  so i'm flexible.
> >
> > [of course, amazon per se links might need cleaning or uniquification
> > of some type for finding the version in org maybe, or maybe for
> > improving privacy by removing amazon's data about you in the url, but
> > that might not even need any special amazon link knowledge.
> > [fanciness might look for the amazon id, if implementer willing or
> > somethign exists for that.]]
> >
> >> - Should tab group be captured as single Org heading or it should be a
> >> tree with a section per tab? I am not sure that capture will have no
> >> problem with subtree. Certainly Emacs interface for org-protocol +
> >> capture are not suitable for sending each tab as a separate link.
> >> Another option is to create nested 

Re: Yet another browser extension for capturing notes - LinkRemark

2022-01-17 Thread Samuel Wales
my amazon example was silly and confusing.  the point isn't shopping
for something; it's anything.  science papers, news outlets, nerd
blogs.

On 1/16/22, Samuel Wales  wrote:
> more below.
>
> On 12/26/20, Maxim Nikulin  wrote:
>> On 26/12/2020, Samuel Wales wrote:
>>
>>> [... i can imagine great things possible with such extensions. for
>>> example, you could have sets of tabs, selected by right click in
>>> firefox, to save to a bunch of org entries.  then you could load that
>>> particular set of entries into firefox whenever you want.  and you
>>> could keep notes on each page and move the entries wherever you want.
>>> this would be useful for such things as "i am researching rice
>>> cookers; these are my tabs, but i don't want them cluttering firefox
>>> and i want them with my org notes and to make notes on them and will
>>> re-load them into firefox when i want to revisit".]
>>
>> It should be possible since some tab management extension were used in
>> mozilla to evaluate if webextensions are mature enough and if support of
>> XUL add-ons could be dropped. On the other hand do not expect such
>> feature soon. A kind of semi-blocker is absence of automatic tests to
>> run before every release, and it will require a lot of time.
>
> interesting.  i do note tab selection features in recent firefox-esr
> and i was just assuming something like that.
>
>>
>> In the meanwhile, have you looked at the following comment?
>> https://github.com/sprig/org-capture-extension/issues/12#issuecomment-323569334
>> alphapapa commented Aug 20, 2017
>>
>>> You can do this with the "Copy all URLs" extension (ID:
>>> djdmadneanknadilpjiknlnanaolmbfk). Use this as the custom format (note
>>> the linebreak):
>>>
>>> [[$url][$title]]
>>
>> I am almost sure that similar extension should exist for Firefox as well.
>
> i think this is for copying all tabs, not selected ones.  so a
> workaround for my idea would be to have a fresh firefox window
> dedicated to rice cookers and then save them all.  bit it does not
> save over existing canonical location for each url or similar.
>
> which would be needed for my idea so as to not have duplicates etc.
>
> also i think this extension does not exist any more in firefox.  i
> used to use it for storing as org links.  but it was just to store
> links in case firefox screwed up session restore.  which it usually
> does.  for that purpose, i use one that does not save as orglinks.
>
>>
>> Some points should be clarified in my opinion
>>
>> - Do you expect that metadata should be captured in addition to URLs and
>> titles? Browsers can unload some tabs making page content unavailable.
>
> i wouldn't need this i think.  i'd want page title, just as in
> ordinary org links, but in principle that can be assumed from the
> existing org entry if exists, and if not exists and you are capturing,
> the page is already loaded.  so i think not a metadata issue.
>
>> - Are you going to capture reviews of "rice cookers" that could be
>> considered as ordinary pages or you are going to save items from online
>> stores? I do not current state of affairs but I have heard about some
>> activity for special metadata that allows search engines to display
>> products in a special way. Could you inspect head element of pages in
>> your favorite stores contains desired metadata using page source or
>> inspect element tools?
>
> my web knowledge is too limited to understand your question, but i am
> just hoping it would capture ordinary amazon links, review sites, and
> so on.  and i never use js if i can avoid it so i'm expecting pretty
> normal website stuff i think.  so i'm flexible.
>
> [of course, amazon per se links might need cleaning or uniquification
> of some type for finding the version in org maybe, or maybe for
> improving privacy by removing amazon's data about you in the url, but
> that might not even need any special amazon link knowledge.
> [fanciness might look for the amazon id, if implementer willing or
> somethign exists for that.]]
>
>> - Should tab group be captured as single Org heading or it should be a
>> tree with a section per tab? I am not sure that capture will have no
>> problem with subtree. Certainly Emacs interface for org-protocol +
>> capture are not suitable for sending each tab as a separate link.
>> Another option is to create nested lists, anyway org formatter in my
>> extension need improvements. Are you expecting headings subtree or
>> nested lists?
>
> the status quo is that there is nothing, so using lists would be a
> huge improvement and work great.  but fanciness by using org sections
> if poss [i assume this means header and metadata and content and maybe
> descendents] could be more flexible.
>
>>
>>> [now if i can only debug the extra-blank-lines-in-capture problem.]
>>
>> Fully agree that it is really annoying. It is among high priority items
>> in my TODO list.
>
> we might be talking about different thinks.  i am referring to
> something in org that adds 

Re: Yet another browser extension for capturing notes - LinkRemark

2022-01-16 Thread Samuel Wales
more below.

On 12/26/20, Maxim Nikulin  wrote:
> On 26/12/2020, Samuel Wales wrote:
>
>> [... i can imagine great things possible with such extensions. for
>> example, you could have sets of tabs, selected by right click in
>> firefox, to save to a bunch of org entries.  then you could load that
>> particular set of entries into firefox whenever you want.  and you
>> could keep notes on each page and move the entries wherever you want.
>> this would be useful for such things as "i am researching rice
>> cookers; these are my tabs, but i don't want them cluttering firefox
>> and i want them with my org notes and to make notes on them and will
>> re-load them into firefox when i want to revisit".]
>
> It should be possible since some tab management extension were used in
> mozilla to evaluate if webextensions are mature enough and if support of
> XUL add-ons could be dropped. On the other hand do not expect such
> feature soon. A kind of semi-blocker is absence of automatic tests to
> run before every release, and it will require a lot of time.

interesting.  i do note tab selection features in recent firefox-esr
and i was just assuming something like that.

>
> In the meanwhile, have you looked at the following comment?
> https://github.com/sprig/org-capture-extension/issues/12#issuecomment-323569334
> alphapapa commented Aug 20, 2017
>
>> You can do this with the "Copy all URLs" extension (ID:
>> djdmadneanknadilpjiknlnanaolmbfk). Use this as the custom format (note
>> the linebreak):
>>
>> [[$url][$title]]
>
> I am almost sure that similar extension should exist for Firefox as well.

i think this is for copying all tabs, not selected ones.  so a
workaround for my idea would be to have a fresh firefox window
dedicated to rice cookers and then save them all.  but it does not
save over existing canonical location for each url or similar.

which would be needed for my idea so as to not have duplicates etc.

also i think this extension does not exist any more in firefox.  i
used to use it for storing as org links.  but it was just to store
links in case firefox screwed up session restore.  which it usually
does.  for that purpose, i use one that does not save as orglinks.

>
> Some points should be clarified in my opinion
>
> - Do you expect that metadata should be captured in addition to URLs and
> titles? Browsers can unload some tabs making page content unavailable.

i wouldn't need this i think.  i'd want page title, just as in
ordinary org links, but in principle that can be assumed from the
existing org entry if exists, and if not exists and you are capturing,
the page is already loaded.  so i think not a metadata issue.

> - Are you going to capture reviews of "rice cookers" that could be
> considered as ordinary pages or you are going to save items from online
> stores? I do not current state of affairs but I have heard about some
> activity for special metadata that allows search engines to display
> products in a special way. Could you inspect head element of pages in
> your favorite stores contains desired metadata using page source or
> inspect element tools?

my web knowledge is too limited to understand your question, but i am
just hoping it would capture ordinary amazon links, review sites, and
so on.  and i never use js if i can avoid it so i'm expecting pretty
normal website stuff i think.  so i'm flexible.

[of course, amazon per se links might need cleaning or uniquification
of some type for finding the version in org maybe, or maybe for
improving privacy by removing amazon's data about you in the url, but
that might not even need any special amazon link knowledge.
[fanciness might look for the amazon id, if implementer willing or
something exists for that.]]
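[a generic clean-up of this kind, with no site-specific knowledge, could be sketched roughly as below. the list of parameter names is only an assumption for the example, not something any of the extensions mentioned here is known to use, and the function name is made up.]

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Query parameters treated as tracking data (an assumed, incomplete list).
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def clean_url(url):
    """Drop well-known tracking parameters, keep everything else as is."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in TRACKING_NAMES
        and not name.startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
```

[using the cleaned form as the key when looking for an existing org entry would reduce duplicates that differ only in tracking data.]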

> - Should tab group be captured as single Org heading or it should be a
> tree with a section per tab? I am not sure that capture will have no
> problem with subtree. Certainly Emacs interface for org-protocol +
> capture are not suitable for sending each tab as a separate link.
> Another option is to create nested lists, anyway org formatter in my
> extension need improvements. Are you expecting headings subtree or
> nested lists?

the status quo is that there is nothing, so using lists would be a
huge improvement and work great.  but fanciness by using org sections
if poss [i assume this means header and metadata and content and maybe
descendents] could be more flexible.

>
>> [now if i can only debug the extra-blank-lines-in-capture problem.]
>
> Fully agree that it is really annoying. It is among high priority items
> in my TODO list.

we might be talking about different things.  i am referring to
something in org that adds blank lines when my particular org capture
templates are used.  i think it is outside all of the hooks that are
available for org capture so not fixable using those.

recent org might fix it dunno.  i am limited in computer use so i have
not tried to debug it further.  just delete the extra lines.

>
> 

Re: Yet another browser extension for capturing notes - LinkRemark

2020-12-27 Thread Maxim Nikulin

On 26/12/2020 20:49, Ihor Radchenko wrote:
> Maxim Nikulin  writes:

I have reordered some parts of the discussion.


> Also, do you pass any of the parsed metadata to org-protocol? If you
> do, it would be trivial to get it into capture templates on Elisp
> (and org-capture-ref) side.


I decided that a capture could be too complicated to fit into the simple
query parameters of org-protocol, e.g. it could be a chain of frames. That
is why I implemented just the simple option of title + body (the URL is
available, but it is contained in the body). I am considering generating a
tree of headings in some cases.
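For reference, the "simple query parameters" are those of an org-protocol capture URL: template key, url, title, and body. A small sketch of building such a URL in Python follows. The function name and the template key used in the example are hypothetical, and this is not how the extension itself assembles the URL.

```python
from urllib.parse import quote, urlencode


def org_protocol_capture_url(template, url, title, body=""):
    """Build an org-protocol capture URL with percent-encoded parameters."""
    query = urlencode(
        {"template": template, "url": url, "title": title, "body": body},
        quote_via=quote,  # encode spaces as %20 rather than "+"
    )
    return "org-protocol://capture?" + query
```

Everything has to be flattened into these few strings, which is why a structured capture such as a chain of frames does not fit well.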


On the other hand, almost all captured data is available to a native
messaging backend:
https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/Native_messaging
A dumb example is included in the sources. It is in Python, but you could
use any other language. The protocol is just streaming JSON with the
message size sent in binary form. I have added JSON-RPC to let the native
messaging host report errors and to avoid ambiguity in attributing a
response to a particular request. I do not think that setting up an
org-protocol handler is harder than adding a manifest for a native messaging
backend. Native messaging should even be a bit safer, since a weird
org-protocol message cannot be hidden behind innocent link text.
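A minimal sketch of that framing, for illustration only (this is not the example shipped with the extension): each message is a 4-byte length in native byte order followed by UTF-8 encoded JSON. The "ping" method and the function names are made up for the example. The error code is the standard JSON-RPC 2.0 "Method not found" value.

```python
import json
import struct


def read_message(stream):
    """Read one length-prefixed JSON message; return None at end of input."""
    header = stream.read(4)
    if len(header) < 4:
        return None
    (size,) = struct.unpack("=I", header)  # 32-bit length, native byte order
    return json.loads(stream.read(size).decode("utf-8"))


def write_message(stream, message):
    """Write one JSON message preceded by its length."""
    payload = json.dumps(message).encode("utf-8")
    stream.write(struct.pack("=I", len(payload)))
    stream.write(payload)
    stream.flush()


def handle_request(request):
    """Answer a JSON-RPC 2.0 request; the id ties the response to it."""
    if request.get("method") == "ping":  # hypothetical method name
        return {"jsonrpc": "2.0", "id": request.get("id"), "result": "pong"}
    return {
        "jsonrpc": "2.0",
        "id": request.get("id"),
        "error": {"code": -32601, "message": "Method not found"},
    }
```

A real host would loop over `read_message(sys.stdin.buffer)` and reply with `write_message(sys.stdout.buffer, ...)`.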


I think it should be no problem to call emacsclient from such an
application. Isn't that enough for customization? Do you still need raw
HTML? Currently I am trying to avoid customization inside the extension,
since it is harder to keep a history of settings changes in git.
Extensions are quite isolated from the host. Also, I do not think that
something like mustache/handlebars templates would be warmly welcomed by
Emacs users.



>> I do not have clear vision how to use collected data for queries.
>> Certainly I want to have more human-friendly representation than BibTeX
>> entries (maybe in addition to machine-parsable data) adjacent to my notes.
>
> So far, I found author, website name, publication year, title, and
> resource type useful. My standard capture template for links is:
>
> *  [] () Title


I see that my current choice to prefer og:title or twitter:title for the
heading is far from being optimal; even the head/title text is usually
better. However, I was writing about a somewhat more detailed two- or
three-line representation. Often I prefer a kind of "card" representation
to a table/columns view.

Concerning queries, see below.


> Completely agree here. That's why I directly reuse the current DOM state
> from qutebrowser in my own setup. However, extension for qutebrowser was
> easy to write for me as it can be simply a bash script. I know nothing
> about Firefox/Chrome extensions and I do not know javascript.


It is too easy to underquote some variable reference in bash and get
something unexpected executed. Almost any other scripting language is safer
in this sense.



>> From my point of view, you should be happy with any of projects you
>> mentioned below. Are all of them have some problems critical for you?
>
> They are all javascript, except one (unicontent), which can be easily
> replaced with built-in Elisp libraries (dom.el).


I mean running them using a very thin wrapper that generates metadata in
a form easily parsable in Emacs.



> Another idea would be providing a callback from elisp to browser (I am
> not sure if it is possible). org-capture-ref has a mechanism to check if
> the link was captured in the past. If the link is already captured, the
> information about the link location and todo-state can be messaged back
> to the browser.
>
> Example message (only qutebrowser is supported now):
>
> Bookmark not saved!
> Already captured into org-capture-ref:TODO maxnikulin [Github] linkremark:
> LinkRemark - page or link notes with context


Why should it be a callback from elisp? From my point of view, it is
the extension that should initiate a query whether a particular URL has
been captured already. I have realized that in my drafts I even have a
native messaging backend that can filter matching URLs from a text file. It
was intended to autocomplete URLs typed in the browser location bar,
using a text file as a kind of bookmark storage, but it could be adapted
for checks similar to yours.
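As a rough illustration of such a check (not the draft backend mentioned above), a helper could index every URL found in a notes file once and then answer "already captured?" queries from that index. The function names are hypothetical, and the match is exact, so URLs that differ only in tracking parameters would not be recognized.

```python
import re

URL_RE = re.compile(r"https?://[^\s\]\[<>]+")


def index_urls(notes_text):
    """Map every URL in the text to the line numbers where it occurs."""
    index = {}
    for number, line in enumerate(notes_text.splitlines(), start=1):
        for url in URL_RE.findall(line):
            index.setdefault(url, []).append(number)
    return index


def already_captured(url, index):
    """Return the line numbers where url was seen (empty list if never)."""
    return index.get(url, [])
```

The line numbers give a location to report back to the browser, similar to the example message quoted above.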


Though it is better to get a link to the heading with the URL (e.g.
CUSTOM_ID), so additional links or quotes can be added and linked to the
"main" entry. I have not tested whether such a query using emacsclient is
fast enough. I have seen a thread on the Language Server Protocol but have
not checked whether that protocol supports such queries.


I especially like the idea of references to existing headings, because it
avoids cluttering context menus with options to capture a link
without page metadata in addition to the existing ones.





Re: Yet another browser extension for capturing notes - LinkRemark

2020-12-26 Thread Ihor Radchenko
Maxim Nikulin  writes:

> I just inspected pages on several sites using developer tools and added
> code that handles noticed elements.

I see. I basically did the same, except some minimal support for
OpenGraph (though I stopped when I saw that even YouTube is not
following the standard, except the most basic fields).

> The only force to add some formal data is "share" buttons. Maybe some
> guides for web developers from social networks or search engines could
> be more useful than formal references, but I have not had a closer
> look.

It is also consistent with what I saw.  fields seems
to be very common.

>> Also, org-capture-ref does not really force the user to put BiBTeX into
>> the capture. Individual metadata fields are available using
>> org-capture-ref-get-bibtex-field (which extracts data from internal
>> alist structure). It's just that I mostly had BiBTeX in mind (with
>> distant goal of supporting export to LaTeX) for my use-cases.
>
> I do not have clear vision how to use collected data for queries. 
> Certainly I want to have more human-friendly representation than BibTeX 
> entries (maybe in addition to machine-parsable data) adjacent to my notes.

So far, I found author, website name, publication year, title, and
resource type useful. My standard capture template for links is:

*  [] () Title

Example:

* dash-docs-el [Github] Dash-Docs-El Helm-Dash: Browse Dash Docsets Inside Emacs

Such headlines can be easily searched later, especially when I also add
some #keywords manually.

> Personally, I would prefer to avoid http queries from Emacs. Sometimes 
> it is better to have current DOM state, not page source, that is why I 
> decided to gather data inside browser, despite security fences that are 
> placed quite strangely in some cases.

Completely agree here. That's why I directly reuse the current DOM state
from qutebrowser in my own setup. However, extension for qutebrowser was
easy to write for me as it can be simply a bash script. I know nothing
about Firefox/Chrome extensions and I do not know javascript.

On the other hand, having an ability to get html is still useful in my
case (Emacs package) when the capture is not done from browser. For
example, I often capture links from elfeed - http query from Emacs is
useful then.

> From my point of view, you should be happy with any of the projects you 
> mentioned below. Do all of them have problems that are critical for you?

They are all javascript, except one (unicontent), which can be easily
replaced with built-in Elisp libraries (dom.el).

>> Finally, would you be interested to join efforts on metadata parsing?
>
> Could you, please, share a bit more details on your ideas? 

> Technically it should be possible to push e.g. raw 
> document.head.innerHTML to any external metadata parser using native 
> messaging (to deal with sites requiring authorization). However it could 
> cause an alarm during review before publication of the extension to the 
> browser catalogues.

That's unfortunate. Pushing raw html/dom is what I had in mind when
talking about joining efforts.

Another idea would be providing a callback from elisp to browser (I am
not sure if it is possible). org-capture-ref has a mechanism to check if
the link was captured in the past. If the link is already captured, the
information about the link location and todo-state can be messaged back
to the browser.

Example message (only qutebrowser is supported now):

Bookmark not saved!
Already captured into org-capture-ref:TODO maxnikulin [Github] linkremark: 
LinkRemark - page or link notes with context

> There is some room for improvement, but I do not think that quality of
> metadata for ordinary sites could be dramatically better. The case
> that is not handled at all is scientific publications, unfortunately
> currently I have quite little interest in it. Definitely results
> should be stored in some structured format such as BibTeX. I have seen
> huge  elements describing even all references. Certainly such
> lists are not for general-purpose notes (at least without explicit
> request from the user), they should be handled by some bibliography
> software to display citation graphs in the local library. On the other
> hand it is not a problem to feed such data to some tool using native
> messaging protocol. I have no idea if various publishers provide such
> data in a uniform way, I just hope that pressure from citation indices
> and bibliography management software has positive influence on
> standardization.

I think https://github.com/microlinkhq/metascraper#core-rules can be
used for ideas. It has generic parsing apart from site-specific rules.

For the scientific publications, the key point is usually getting
DOI/ISBN. Then, most of the metadata can be obtained using standard API
of doi.org or various ISBN databases. In addition, reference data is
generally available in OpenCitations.net (they also have all kinds of
web APIs).
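
As an illustration of the doi.org route, here is a small Python sketch that relies on DOI content negotiation: the resolver is asked for BibTeX through the Accept header. It assumes that the registration agency behind a given DOI serves the application/x-bibtex media type, which is not guaranteed for every DOI. The function names are hypothetical, and this is not how org-capture-ref does it.

```python
import urllib.parse
import urllib.request


def bibtex_request(doi):
    """Build a request asking the DOI resolver for a BibTeX record."""
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": "application/x-bibtex"})


def fetch_bibtex(doi, timeout=10):
    """Perform the request; raises urllib.error.URLError on failure."""
    with urllib.request.urlopen(bibtex_request(doi), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Building the request separately from sending it makes it possible to inspect the URL and headers without network access. The DOI still has to be found on the page first, e.g. in a citation_doi meta element when the publisher provides one.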

Also, do you pass any of the parsed metadata 

Re: Yet another browser extension for capturing notes - LinkRemark

2020-12-26 Thread Maxim Nikulin

On 25/12/2020, Ihor Radchenko wrote:


Reading through the code, I can see that you are familiar with metadata
conventions. Do you know good references about what og: metadata is
commonly used? I looked through the official OpenGraph specification,
but popular websites appear to ignore most of the conventions.


I just inspected pages on several sites using developer tools and added
code that handles noticed elements.

I have not tried to find any resources on metadata (OK, once I searched 
for LD+JSON, essentially the outcome was the link to schema.org that I 
have seen in data already). Looking into page source, I realized that 
almost nobody cares if the site has metadata of appropriate quality. I 
think, search engines are advanced enough to work without metadata and 
even decrease page rank if something suspicious was added by SEO. The 
only force to add some formal data is "share" buttons. Maybe some guides 
for web developers from social networks or search engines could be more 
useful than formal references, but I have not had a closer look.



Also, org-capture-ref does not really force the user to put BiBTeX into
the capture. Individual metadata fields are available using
org-capture-ref-get-bibtex-field (which extracts data from internal
alist structure). It's just that I mostly had BiBTeX in mind (with
distant goal of supporting export to LaTeX) for my use-cases.


I do not have clear vision how to use collected data for queries. 
Certainly I want to have more human-friendly representation than BibTeX 
entries (maybe in addition to machine-parsable data) adjacent to my notes.


Personally, I would prefer to avoid http queries from Emacs. Sometimes 
it is better to have current DOM state, not page source, that is why I 
decided to gather data inside browser, despite security fences that are 
placed quite strangely in some cases.


From my point of view, you should be happy with any of the projects you 
mentioned below. Do all of them have problems that are critical for you?


Technically it should be possible to push e.g. raw 
document.head.innerHTML to any external metadata parser using native 
messaging (to deal with sites requiring authorization). However it could 
cause an alarm during review before publication of the extension to the 
browser catalogues.



Finally, would you be interested to join efforts on metadata parsing?


Could you, please, share a bit more details on your ideas? There is some 
room for improvement, but I do not think that quality of metadata for 
ordinary sites could be dramatically better. The case that is not 
handled at all is scientific publications; unfortunately, I currently 
have rather little interest in it. Definitely results should be stored in 
some structured format such as BibTeX. I have seen huge  elements 
describing even all references. Certainly such lists are not for 
general-purpose notes (at least without explicit request from the user), 
they should be handled by some bibliography software to display citation 
graphs in the local library. On the other hand it is not a problem to 
feed such data to some tool using the native messaging protocol. I have no 
idea whether various publishers provide such data in a uniform way; I just 
hope that pressure from citation indices and bibliography management 
software has a positive influence on standardization.


I am not going to blow up the code with recipes for particular sites. 
However, I realize that some special cases still should be handled. I am 
not ready to adopt the user script model used by 
Greasemonkey/Violentmonkey/Tampermonkey. I believe it is better to 
create dedicated extension(s) that either add or overwrite existing 
meta elements or allow querying gathered data using the sendMessage 
webextensions interface. By the way, scripts for the above-mentioned 
extensions could be used as well. It should alleviate cases when some 
site with insane metadata is important for a particular user.



P.S. Some links I collected myself when working on org-capture-ref. They
might also be of interest for you:

- https://github.com/ageitgey/node-unfluff
- https://github.com/gabceb/node-metainspector
- https://github.com/wikimedia/html-metadata
- https://github.com/microlinkhq/metascraper
- https://github.com/hboisgibault/unicontent


Thank you for the links. I should have a closer look at those projects. 
E.g. I considered itemprop="author" elements but postponed 
implementation of such features. For some reason I did not even try to 
find existing projects for metadata extraction. Maybe I still hope that 
a quite simple implementation could handle most of the cases.





Re: Yet another browser extension for capturing notes - LinkRemark

2020-12-26 Thread Maxim Nikulin

On 26/12/2020, Samuel Wales wrote:


[... i can imagine great things possible with such extensions. for
example, you could have sets of tabs, selected by right click in
firefox, to save to a bunch of org entries.  then you could load that
particular set of entries into firefox whenever you want.  and you
could keep notes on each page and move the entries wherever you want.
this would be useful for such things as "i am researching rice
cookers; these are my tabs, but i don't want them cluttering firefox
and i want them with my org notes and to make notes on them and will
re-load them into firefox when i want to revisit".]


It should be possible, since some tab management extensions were used at 
Mozilla to evaluate whether webextensions were mature enough and whether 
support of XUL add-ons could be dropped. On the other hand, do not expect 
such a feature soon. A kind of semi-blocker is the absence of automatic 
tests to run before every release, and writing them will require a lot of 
time.


In the meanwhile, have you looked at the following comment?
https://github.com/sprig/org-capture-extension/issues/12#issuecomment-323569334
alphapapa commented Aug 20, 2017


You can do this with the "Copy all URLs" extension (ID:
djdmadneanknadilpjiknlnanaolmbfk). Use this as the custom format (note
the linebreak):

[[$url][$title]]


I am almost sure that a similar extension should exist for Firefox as well.

Some points should be clarified, in my opinion:

- Do you expect that metadata should be captured in addition to URLs and 
titles? Browsers can unload some tabs, making page content unavailable.
- Are you going to capture reviews of "rice cookers" that could be 
considered ordinary pages, or are you going to save items from online 
stores? I do not know the current state of affairs, but I have heard about 
some activity around special metadata that allows search engines to display 
products in a special way. Could you check, using page source or the 
inspect element tools, whether the head element of pages in your favorite 
stores contains the desired metadata?
- Should a tab group be captured as a single Org heading, or should it be a 
tree with a section per tab? I am not sure that capture will have no 
problems with a subtree. Certainly the Emacs interface of org-protocol + 
capture is not suitable for sending each tab as a separate link. 
Another option is to create nested lists; anyway, the Org formatter in my 
extension needs improvements. Are you expecting a headings subtree or 
nested lists?
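
To show the difference between the two layouts from the last point, here is a short Python sketch that formats the same set of tabs either as a subtree or as a plain list under one heading. It is only an illustration with hypothetical function names, not the formatter of the extension. Square brackets in titles are crudely replaced with parentheses, which is an assumption made to keep the link syntax intact.

```python
def org_link(url, title):
    """Format an Org link; brackets in the title are replaced, not escaped."""
    safe = title.replace("[", "(").replace("]", ")")
    return f"[[{url}][{safe}]]"


def tabs_as_subtree(group_title, tabs, level=1):
    """One heading for the group and one child heading per tab."""
    stars = "*" * level
    lines = [f"{stars} {group_title}"]
    lines += [f"{stars}* {org_link(url, title)}" for url, title in tabs]
    return "\n".join(lines) + "\n"


def tabs_as_list(group_title, tabs, level=1):
    """One heading for the group and one list item per tab."""
    lines = ["*" * level + " " + group_title]
    lines += [f"- {org_link(url, title)}" for url, title in tabs]
    return "\n".join(lines) + "\n"
```

With a subtree, each tab can get its own properties, notes and TODO state. A list is more compact, but notes on a single page have to live inside the list item.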



[now if i can only debug the extra-blank-lines-in-capture problem.]


Fully agree that it is really annoying. It is among high priority items 
in my TODO list.


Accidentally I pressed =C-x C-o= and discovered 
[[help:delete-blank-lines]]. innerText is not exactly the same as 
selection range toString, but the rules could work in a similar way. 
Table rows, floating and absolutely positioned elements require 
newlines. Such elements are often abused by designers.

https://html.spec.whatwg.org/multipage/dom.html#dom-innertext
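
Until the selection text is generated by better rules, extra blank lines can be squeezed after the fact. The following Python sketch is a rough analogue of running delete-blank-lines over the whole captured text, not the fix planned for the extension. It strips trailing whitespace, collapses runs of blank lines into a single one and trims blank lines at both ends. The function name is hypothetical.

```python
import re


def squeeze_blank_lines(text):
    """Collapse runs of blank lines and trim leading/trailing ones."""
    # Whitespace-only lines count as blank, so strip line ends first.
    lines = [line.rstrip() for line in text.splitlines()]
    joined = "\n".join(lines)
    # Three or more newlines in a row mean at least two blank lines.
    joined = re.sub(r"\n{3,}", "\n\n", joined)
    return joined.strip("\n") + "\n"
```

Paragraph breaks survive because a single blank line is left untouched.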




Re: Yet another browser extension for capturing notes - LinkRemark

2020-12-25 Thread Samuel Wales
maxim, it is great to see new work in this area.  thanks for sharing.

russell, i use the org-capture extension for firefox, which is on the
firefox extensions site.  it is for if you want a different set of
data captured [it uses your org capture template].  it works well for
me.

[not a suggestion for maxim to integrate into everything; ignore
please.  i can imagine great things possible with such extensions. for
example, you could have sets of tabs, selected by right click in
firefox, to save to a bunch of org entries.  then you could load that
particular set of entries into firefox whenever you want.  and you
could keep notes on each page and move the entries wherever you want.
this would be useful for such things as "i am researching rice
cookers; these are my tabs, but i don't want them cluttering firefox
and i want them with my org notes and to make notes on them and will
re-load them into firefox when i want to revisit".]

[now if i can only debug the extra-blank-lines-in-capture problem.]


On 12/25/20, Russell Adams  wrote:
> On Fri, Dec 25, 2020 at 07:44:22PM +0700, Maxim Nikulin wrote:
>> I am experimenting with a browser add-on that is intended
>> to be a bridge between browser and Org mode.
>> In the family of Org mode capture helpers it is among ones
>> that adds web page metadata to the note.
>> Source code repository: https://github.com/maxnikulin/linkremark
>
> That's a really neat idea!
>
> I hadn't previously considered having a Firefox plugin to capture
> information. Now I must look!
>
> --
> Russell Adams    rlad...@adamsinfoserv.com
>
> PGP Key ID: 0x1160DCB3   http://www.adamsinfoserv.com/
>
> Fingerprint:1723 D8CA 4280 1EC9 557F  66E8 1154 E018 1160 DCB3
>
>


-- 
The Kafka Pandemic

Please learn what misopathy is.
https://thekafkapandemic.blogspot.com/2013/10/why-some-diseases-are-wronged.html



Re: Yet another browser extension for capturing notes - LinkRemark

2020-12-25 Thread Russell Adams
On Fri, Dec 25, 2020 at 07:44:22PM +0700, Maxim Nikulin wrote:
> I am experimenting with a browser add-on that is intended
> to be a bridge between browser and Org mode.
> In the family of Org mode capture helpers it is among ones
> that adds web page metadata to the note.
> Source code repository: https://github.com/maxnikulin/linkremark

That's a really neat idea!

I hadn't previously considered having a Firefox plugin to capture
information. Now I must look!

--
Russell Adams    rlad...@adamsinfoserv.com

PGP Key ID: 0x1160DCB3   http://www.adamsinfoserv.com/

Fingerprint:1723 D8CA 4280 1EC9 557F  66E8 1154 E018 1160 DCB3



Re: Yet another browser extension for capturing notes - LinkRemark

2020-12-25 Thread Ihor Radchenko
Maxim Nikulin  writes:

> I am experimenting with a browser add-on that is intended
> to be a bridge between browser and Org mode.
> In the family of Org mode capture helpers it is among ones
> that adds web page metadata to the note.
> Source code repository: https://github.com/maxnikulin/linkremark

The author of org-capture-ref here.

Reading through the code, I can see that you are familiar with metadata
conventions. Do you know good references about what og: metadata is
commonly used? I looked through the official OpenGraph specification,
but popular websites appear to ignore most of the conventions.

Also, org-capture-ref does not really force the user to put BiBTeX into
the capture. Individual metadata fields are available using
org-capture-ref-get-bibtex-field (which extracts data from internal
alist structure). It's just that I mostly had BiBTeX in mind (with
distant goal of supporting export to LaTeX) for my use-cases.

Finally, would you be interested to join efforts on metadata parsing? (I
hope this question does not qualify as "integrate this extension to
everything").

P.S. Some links I collected myself when working on org-capture-ref. They
might also be of interest for you:

- https://github.com/ageitgey/node-unfluff
- https://github.com/gabceb/node-metainspector
- https://github.com/wikimedia/html-metadata
- https://github.com/microlinkhq/metascraper
- https://github.com/hboisgibault/unicontent

Best,
Ihor





Yet another browser extension for capturing notes - LinkRemark

2020-12-25 Thread Maxim Nikulin

I am experimenting with a browser add-on that is intended
to be a bridge between the browser and Org mode.
In the family of Org mode capture helpers it is among the ones
that add web page metadata to the note.
Source code repository: https://github.com/maxnikulin/linkremark

Examples

Link:

--->8---
Link: Karl Voit: UOMF: Managing web bookmarks with Org Mode
  :PROPERTIES:
  :DATE_ADDED: [2020-12-25 18:06]
  :END:

- Link URL :: [[https://karl-voit.at/2014/08/10/bookmarks-with-orgmode/]]
- Link text :: Karl Voit: UOMF: Managing web bookmarks with Org Mode

On the page

- URL :: [[https://alphapapa.github.io/org-almanac/]]
- title :: org-almanac
- author :: Adam Porter
- referrer :: [[https://www.google.com/]]
---8<---

Page:

--->8---
public voit
  :PROPERTIES:
  :DATE_ADDED: [2020-12-25 18:11]
  :URL_IMAGE: http://Karl-Voit.at/images/public-voit_T_logo_200x200.png
  :END:

- URL :: [[https://karl-voit.at/2014/08/10/bookmarks-with-orgmode/]]
- title :: public voit
- author :: Karl Voit
- published_time :: 2014-08-10T17:13+01:00
- referrer :: [[https://alphapapa.github.io/org-almanac/]]

#+begin_quote
In my notes.org file, I collect all kind of snippets, knowledge, ideas, 
how-tos, and such stuff.

#+end_quote
---8<---

It is not really ready for the wild web, though
I believe it is already possible to get a general impression
and even use it for pages where specially crafted data
are rather unlikely. Due to the early development stage,
there is no stability promise yet.

The extension has not been published to catalogues of browser extensions.
A signed version for Firefox can be found in the "releases" section
on GitHub: 
https://github.com/maxnikulin/linkremark/releases/download/v0.1/linkremark-0.1-fx.xpi

For Chrome/Chromium it can be loaded as an unpacked
extension. Just clone the code and create a symlink
to =manifest-chrome.json= named =manifest.json=.

=README.org= file contains a bit more details,
so visit [[https://github.com/maxnikulin/linkremark]]
or just clone this repository.

The mailing list has been quite noisy for the last couple of months,
so, please, do not post lengthy proposals on how to integrate
this extension into everything in response.
The gift is crafted quite roughly, glue has not fully cured,
so do not be surprised if you are stuck trying to adapt it
for your habits.

Merry Christmas and Happy New Year!