Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-09-26 Thread Bastien
Hi,

I lost track of the discussion in this thread.  

My understanding is that there is no agreement on what feature 
should be added to Org regarding "smarter interactive table import"
and a patch is not yet ready for the suggested feature.

I'm marking this as canceled on updates.orgmode.org but feel free
to correct me if needed.

Thanks,

-- 
 Bastien



Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-06-09 Thread Maxim Nikulin

On 06/06/2021 00:50, Utkarsh Singh wrote:

On 2021-06-05, 19:40 +0700, Maxim Nikulin wrote:


This line should be added to org-table.el, but notice "rx ‘not’ syntax
error: (or ". This logical operators have a bit different meaning for
regular expression.


I just "tested" most of them by M-x eval-buffer on
the patched file and then checked the output of `with-temp-buffer'.


I get "Eager macro-expansion failure: (error "rx ‘not’ syntax error: (or 
10 32)")" even in response to M-x eval-buffer.



After reading FAQ about multiple lines in table field I don't think this
PATCH makes much sense as my main driving force for this patch was to
simplify regular expression in `org-table-convert-region' and somehow
add newlines support.


You may check whether csv-mode have got support of newline inside cell 
already and try to sell your parser to them.


There are regular discussions how wonderful it would be to extend org 
table syntax. In my opinion org spreadsheet is overloaded already. On 
the other hand, newline in table cell sometimes is really useful. I have 
seen an idea to put some macro to single-line cell that expands to 
format-specific representation during export ({{{nl}}} =>  in the 
case of HTML). Such macro does not require extension of syntax, so it 
may be viable at least in user configuration.



P.S. Have you read "Structure and Interpretation of Computer Programs"

I also tried Racket after reading:


SICP is dedicated to general concepts rather than to particular 
language, that is why I consider it as a "special case".






Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-06-05 Thread Utkarsh Singh
On 2021-06-05, 19:40 +0700, Maxim Nikulin  wrote:

> This line should be added to org-table.el, but notice "rx ‘not’ syntax 
> error: (or ". This logical operators have a bit different meaning for 
> regular expression.
>
> Nicolas suggested you to use `list' to avoid `eval'. Certainly `eval' 
> should be avoided whenever possible, but there is alternative with "`" 
> and "," instead of "'". Though I often prefer `list'.

Done.

>> Can you also share test.org so that I can test locally?
>
> I can reproduce it with an empty org file. Emacs-26.3. So I am curious 
> if your tested your patch at all.

Yes you are right I was not testing my patches using 'emacs -Q' and I am
apologetic about it.  I just "tested" most of them by M-x eval-buffer on
the patched file and then checked the output of `with-temp-buffer'.

>> Now on the topic of CSV parser, I have succesfully implemented a CSV
>> parser in less than 65 LOC which also preserves newline character but I
>> am facing a dilemma on how should I represent it as a Org table.  For
>> ex:
> ...
>> 6,"Cell with new
>> Line",6.28
>
> Due to
> https://orgmode.org/worg/org-faq.html#Tables
> "Will there ever be support for multiple lines in a table field? No."
> (there are may be some tricks in Org manual) any variant is acceptable, 
> even current one. Since during export split cell is considered as 
> different rows, it is responsibility of the user to preprocess original 
> file and import it in a form suitable for Org.
>
> However independently of chosen variant, it would be great to get list 
> of warnings with ambiguous lines similar to compiler errors. Such 
> feature is not implemented in Org yet, the most close is `org-lint' 
> result (its behavior differs from compiler errors).

After reading FAQ about multiple lines in table field I don't think this
PATCH makes much sense as my main driving force for this patch was to
simplify regular expression in `org-table-convert-region' and somehow
add newlines support.

I also think `org-table-convert-region' is doing a fine job and I should
keep my "I want to contribute" habit in check.

> I expect complexity O(number of lines in result). I do not like 
> excessive usage of `dolist'. I would try slicing subrows using `mapcar' 
> while there is at least one non-nil element.

Yes! Currently I am trying to sequentially read Elisp manual and I have
completed the section on 'Mapping Function' yesterday.  I think it's a
really nice language feature that I can make use of.

> P.S. Have you read "Structure and Interpretation of Computer Programs" 
> by Abelson, Sussman, and Sussman?
> https://sarabander.github.io/sicp/html/index.xhtml
> or https://mitpress.mit.edu/sites/default/files/sicp/index.html
> While reading, it is necessary to realize that elisp is not scheme, tail 
> recursion should be avoided in elisp and there are over differences.

Thank you! I have seen most of the decision on Lisp includes SICP but I
am currently stopping myself to Elisp manual to learn something useful
about this timeless editor and learn thing in a step-by-step manner.

I also tried Racket after reading:

https://dustycloud.org/blog/racket-is-an-acceptable-python
https://beautifulracket.com/

But stopped myself for the reason same as above.

-- 
Utkarsh Singh
http://utkarshsingh.xyz



Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-06-05 Thread Maxim Nikulin

On 04/06/2021 11:04, Utkarsh Singh wrote:

On 2021-06-02, 23:44 +0700, Maxim Nikulin wrote:


On 02/06/2021 22:08, Utkarsh Singh wrote:

;;;###autoload
   (defun org-table-import (file separator)
@@ -955,12 +971,13 @@ lines.  It can have the following values:
   - integer When a number, use that many spaces, or a TAB, as field separator.
   - regexp  When a regular expression, use it to match the separator."
 (interactive (list (read-file-name "Import file: ")
- (prefix-numeric-value current-prefix-arg)))
+ current-prefix-arg))


It seems, prefix argument works now. Let me remind that file name
completion was working better before your change.


If you are still going to update this patch, I recommend you to check if 
`list' form of `interactive' declaration is supported by Emacs 24.3. 
Recently Kyle had to almost revert one my patch. Fortunately a simpler 
alternative supported by old Emacs versions were already discussed.



I have noticed a couple of error messages. Unsure what is going wrong, I
hope, command to launch emacs is correct (-Q -L ~/src/org-mode/lisp
test.org).

At startup:
Eager macro-expansion failure: (error "rx ‘not’ syntax error: (or 10 44)")

In response to M-x org-table-import:
rx-check-not: rx ‘not’ syntax error: (or 10 44)


Is `rx' library loading correctly? If no then you can try:

+(require 'rx)


This line should be added to org-table.el, but notice "rx ‘not’ syntax 
error: (or ". This logical operators have a bit different meaning for 
regular expression.


Nicolas suggested you to use `list' to avoid `eval'. Certainly `eval' 
should be avoided whenever possible, but there is alternative with "`" 
and "," instead of "'". Though I often prefer `list'.




Can you also share test.org so that I can test locally?


I can reproduce it with an empty org file. Emacs-26.3. So I am curious 
if your tested your patch at all.



Now on the topic of CSV parser, I have succesfully implemented a CSV
parser in less than 65 LOC which also preserves newline character but I
am facing a dilemma on how should I represent it as a Org table.  For
ex:

...

6,"Cell with new
Line",6.28


Due to
https://orgmode.org/worg/org-faq.html#Tables
"Will there ever be support for multiple lines in a table field? No."
(there are may be some tricks in Org manual) any variant is acceptable, 
even current one. Since during export split cell is considered as 
different rows, it is responsibility of the user to preprocess original 
file and import it in a form suitable for Org.


However independently of chosen variant, it would be great to get list 
of warnings with ambiguous lines similar to compiler errors. Such 
feature is not implemented in Org yet, the most close is `org-lint' 
result (its behavior differs from compiler errors).



2. Represent every newline as distinct cell.  I was trying to implement
this method but the algorithm that I came up with requires time
complexity of O(n^2) to just print each row which is too much for large
CSV files.

(defun org-table--print-row (row)
   (let ((maxlines 1)
 list1 list2 tmp)
 ;; get maxlines
 (dolist (i row)


I expect complexity O(number of lines in result). I do not like 
excessive usage of `dolist'. I would try slicing subrows using `mapcar' 
while there is at least one non-nil element.


P.S. Have you read "Structure and Interpretation of Computer Programs" 
by Abelson, Sussman, and Sussman?

https://sarabander.github.io/sicp/html/index.xhtml
or https://mitpress.mit.edu/sites/default/files/sicp/index.html
While reading, it is necessary to realize that elisp is not scheme, tail 
recursion should be avoided in elisp and there are over differences.





Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-06-03 Thread Utkarsh Singh
On 2021-06-02, 23:44 +0700, Maxim Nikulin  wrote:

> On 02/06/2021 22:08, Utkarsh Singh wrote:
>> ;;;###autoload
>>   (defun org-table-import (file separator)
>> @@ -955,12 +971,13 @@ lines.  It can have the following values:
>>   - integer When a number, use that many spaces, or a TAB, as field 
>> separator.
>>   - regexp  When a regular expression, use it to match the separator."
>> (interactive (list (read-file-name "Import file: ")
>> - (prefix-numeric-value current-prefix-arg)))
>> + current-prefix-arg))
>
> It seems, prefix argument works now. Let me remind that file name 
> completion was working better before your change.
>
> I have noticed a couple of error messages. Unsure what is going wrong, I 
> hope, command to launch emacs is correct (-Q -L ~/src/org-mode/lisp 
> test.org).
>
> At startup:
> Eager macro-expansion failure: (error "rx ‘not’ syntax error: (or 10 44)")
>
> In response to M-x org-table-import:
> rx-check-not: rx ‘not’ syntax error: (or 10 44)
>

Is `rx' library loading correctly? If no then you can try:

diff --git a/lisp/org-table.el b/lisp/org-table.el
index 20144c91b..9278f91bb 100644
--- a/lisp/org-table.el
+++ b/lisp/org-table.el
@@ -38,6 +38,7 @@
 (require 'org-macs)
 (require 'org-compat)
 (require 'org-keys)
+(require 'rx)

Can you also share test.org so that I can test locally?

>> Currently I am trying to refactor CSV parsing by applying techniques
>> used in pcsv library (MELPA package) which I think you will also enjoy
>> to play with!
>
> I do not know opinion of Org maintainers. Personally I believe that org 
> can take advantage of Emacs core features or of other packages if they 
> are available and fallback to minimal implementation otherwise. Unsure 
> whether borrowing code from pcsv can cause license issues. Current CSV 
> parser is not perfect but it works reasonably well.

Yes, I think you are right about how Org should make use of existing
feature rather than including everyting with itself but for now I am
considering this as an opportunity to learn how Org and Emacs work and
will leave it upto the maintainers if they find anything useful to be
including into Org itself.

Now on the topic of CSV parser, I have succesfully implemented a CSV
parser in less than 65 LOC which also preserves newline character but I
am facing a dilemma on how should I represent it as a Org table.  For
ex:

Row with newline as CSV data:

6,"Cell with new
Line",6.28

Row with newline as Lisp data:

("6" "Cell with new\nLine" "6.28")

Row with newline a Org table:

| 6 | Cell with new  |   6.28 |
|   | Line   ||

Rows in Org mode are separated using a newline-character which makes
"Cell with new" and "Line" as different cells which is different from
how Libreoffice Calc represents it.  Libreoffice Calc has a much
powerfull representation of a cell which is out of scope for any plain
text tables.  Possible solutions for this problem are:

1. Replace newline with a space character.  What will happen when
imported line as a really long line?

2. Represent every newline as distinct cell.  I was trying to implement
this method but the algorithm that I came up with requires time
complexity of O(n^2) to just print each row which is too much for large
CSV files.

(defun org-table--print-row (row)
  (let ((maxlines 1)
list1 list2 tmp)
;; get maxlines
(dolist (i row)
  (if (not (stringp i))
  (push (list i) list1)
(push (string-lines i) list1)
(setq tmp (length (car list1)))
(when (< maxlines tmp)
  (setq maxlines tmp
;; normalize row
(dolist (i list1)
  (setq tmp (length i))
  (when (< tmp maxlines)
(setq i (append i (make-list (- maxlines tmp) ""
  (push i list2))
;; print row
(dolist (i list2)
  (setq tmp (point))
  (dolist (j i) <-- Printing is still not complete

Thank you,
Utkarsh Singh

-- 
Utkarsh Singh
http://utkarshsingh.xyz



Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-06-02 Thread Maxim Nikulin

On 02/06/2021 22:08, Utkarsh Singh wrote:

;;;###autoload
  (defun org-table-import (file separator)
@@ -955,12 +971,13 @@ lines.  It can have the following values:
  - integer When a number, use that many spaces, or a TAB, as field separator.
  - regexp  When a regular expression, use it to match the separator."
(interactive (list (read-file-name "Import file: ")
- (prefix-numeric-value current-prefix-arg)))
+ current-prefix-arg))


It seems, prefix argument works now. Let me remind that file name 
completion was working better before your change.


I have noticed a couple of error messages. Unsure what is going wrong, I 
hope, command to launch emacs is correct (-Q -L ~/src/org-mode/lisp 
test.org).


At startup:
Eager macro-expansion failure: (error "rx ‘not’ syntax error: (or 10 44)")

In response to M-x org-table-import:
rx-check-not: rx ‘not’ syntax error: (or 10 44)


Currently I am trying to refactor CSV parsing by applying techniques
used in pcsv library (MELPA package) which I think you will also enjoy
to play with!


I do not know opinion of Org maintainers. Personally I believe that org 
can take advantage of Emacs core features or of other packages if they 
are available and fallback to minimal implementation otherwise. Unsure 
whether borrowing code from pcsv can cause license issues. Current CSV 
parser is not perfect but it works reasonably well.





Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-06-02 Thread Utkarsh Singh
On 2021-06-02, 19:06 +0700, Maxim Nikulin  wrote:

> Notice that for "C-u M-x org-table-import" heuristics is not necessary, 
> the separator is specified explicitly. I see that your intention was to 
> improve user interface of org-table-import, but actually you broke it by 
> the "org-table-import: add file prompt" patch. I have not tried it, but 
> my expectation is that user prompt can be customized while keeping all 
> other things working. Maybe the docstring should be updated to make it 
> more friendly to novices (like me while reviewing your patch).

Thanks for pointing out the error, locally you can fix it by (patches
for patches):

;;;###autoload
 (defun org-table-import (file separator)
@@ -955,12 +971,13 @@ lines.  It can have the following values:
 - integer When a number, use that many spaces, or a TAB, as field separator.
 - regexp  When a regular expression, use it to match the separator."
   (interactive (list (read-file-name "Import file: ")
- (prefix-numeric-value current-prefix-arg)))
+ current-prefix-arg))

Currently I am trying to refactor CSV parsing by applying techniques
used in pcsv library (MELPA package) which I think you will also enjoy
to play with!

-- 
Utkarsh Singh
http://utkarshsingh.xyz



Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-06-02 Thread Maxim Nikulin

On 02/06/2021 00:46, Utkarsh Singh wrote:

Org 9.4.5+patches M-x org-table-import

| 1,Word,66.3e-35 ||  |   |
| 2,Unquoted  | cell,2.7   |  |   |
| 3,"Quoted   | cell",3.14 |  |   |
| 4,"Cell | ""with | quotes""",2021-06-01 |   |
| 5,"Next | cell   | is   | empty","" |
| 6,"Cell | with   | new  |   |
| Line",6.28  ||  |   |

So my personal conclusion is that CSV file is imported incorrectly in
both cases: with guessed separator and with explicitly requested through
prefix argument. Completion works a bit worse too.


Currently `org-table-guess-separator' returns "," (COMMA as string) and
`org-table-covert-region' uses '(4) to represent COMMA as separator
which is the main cause of breakdown in importing.

To make importing work well we have to:

+ Guess right separator (`org-table-guess-separator')
+ Parse CSV with this separator (`org-table-covert-region')

As far as I can tell "guessing part" works well and now we just have to
make parser work well with new separators.


Notice that for "C-u M-x org-table-import" heuristics is not necessary, 
the separator is specified explicitly. I see that your intention was to 
improve user interface of org-table-import, but actually you broke it by 
the "org-table-import: add file prompt" patch. I have not tried it, but 
my expectation is that user prompt can be customized while keeping all 
other things working. Maybe the docstring should be updated to make it 
more friendly to novices (like me while reviewing your patch).







Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-06-01 Thread Utkarsh Singh
Hi Maxim,

First of all I would like to thank you for testing out patches and
taking time to write a detailed report about it.

On 2021-06-01, 23:23 +0700, Maxim Nikulin  wrote:

> On 17/05/2021 12:29, Bastien wrote:
>> Utkarsh Singh  writes:
>>> For now can you review the patches I proposed earlier in this
>>> thread?
>> 
>> Not until both you and Maxim are confident this is useful, complete
>> and predictable.
>
> I have too many points to object to consider my opinion as objective.
>
> Org 9.4.5+patches M-x org-table-import
>
> | 1,Word,66.3e-35 ||  |   |
> | 2,Unquoted  | cell,2.7   |  |   |
> | 3,"Quoted   | cell",3.14 |  |   |
> | 4,"Cell | ""with | quotes""",2021-06-01 |   |
> | 5,"Next | cell   | is   | empty","" |
> | 6,"Cell | with   | new  |   |
> | Line",6.28  ||  |   |
>
> So my personal conclusion is that CSV file is imported incorrectly in 
> both cases: with guessed separator and with explicitly requested through 
> prefix argument. Completion works a bit worse too.

Currently `org-table-guess-separator' returns "," (COMMA as string) and
`org-table-covert-region' uses '(4) to represent COMMA as separator
which is the main cause of breakdown in importing.

To make importing work well we have to:

+ Guess right separator (`org-table-guess-separator')
+ Parse CSV with this separator (`org-table-covert-region')

As far as I can tell "guessing part" works well and now we just have to
make parser work well with new separators.

-- 
Utkarsh Singh
http://utkarshsingh.xyz



Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-06-01 Thread Maxim Nikulin

On 17/05/2021 12:29, Bastien wrote:

Utkarsh Singh  writes:

For now can you review the patches I proposed earlier in this
thread?


Not until both you and Maxim are confident this is useful, complete
and predictable.


I have too many points to object to consider my opinion as objective.

I am unsure if colon as a separator (passwd "db") is widely used case. I 
was surprised that it is impossible to implement locale-aware detection 
of semicolon as separator due to limitations of Emacs that is not 
friendly in respect to internationalization of number formatting.


Bastien, I do not know how much tests you prefer to have in Org. Nicolas 
asked for extensive tests 
https://orgmode.org/list/875z07jx6n@nicolasgoaziou.fr I think, 
before starting work on tests, it is necessary to decide if the patches 
are acceptable in general.


Personally, I have realized that I would prefer to have anything related 
to CSV (besides very basic features) in a dedicated package, e.g. in 
csv-mode. It might define a special yank handler for copy-paste to 
org-mode. Unsure if the author of csv-mode agrees with my point of view. 
Another option is to pass files through python code to take advantage of 
more advanced heuristics.


On 15/05/2021 18:09, https://orgmode.org/list/87im3kdzi5@gmail.com 
Utkarsh Singh wrote:

--- a/lisp/org-table.el
+++ b/lisp/org-table.el
@@ -954,7 +954,8 @@ lines.  It can have the following values:
 - (64)Prompt for a regular expression as field separator.
 - integer When a number, use that many spaces, or a TAB, as field separator.
 - regexp  When a regular expression, use it to match the separator."
-  (interactive "f\nP")
+  (interactive (list (read-file-name "Import file: ")
+ (prefix-numeric-value current-prefix-arg)))
   (when (and (called-interactively-p 'any)


Sending patches, I am afraid to break something I was not aware about.

I have read docstrings for the modified functions and tried to import 
the following file "tbl.csv" with some CSV features. LibreOffice imports 
it correctly, it even normalizes 66.3e-35 to 6.63e-34 (I can not say 
that I always appreciate such silent modifications).


1,Word,66.3e-35
2,Unquoted cell,2.7
3,"Quoted cell",3.14
4,"Cell ""with quotes""",2021-06-01
5,"Next cell is empty",""
6,"Cell with new
Line",6.28

My optimistic expectation was (OK, I did not believe I got such result):

| 1 | Word   |   66.3e-35 |
| 2 | Unquoted cell  |2.7 |
| 3 | Quoted cell|   3.14 |
| 4 | Cell "with quotes" | 2021-06-01 |
| 5 | Next cell is empty ||
| 6 | Cell with new  |   6.28 |
|   | Line   ||

Org 9.1.9 M-x org-table-import
and C-u M-x org-table-import
actual results are close enough to my expectations

| 1 | Word   |   66.3e-35 |
| 2 | Unquoted cell  |2.7 |
| 3 | Quoted cell|   3.14 |
| 4 | Cell "with quotes" | 2021-06-01 |
| 5 | Next cell is empty ||
| 6 | "Cell with new ||
| Line" | 6.28   ||

M-x org-table-import RET tbl RET
completes file name to tbl.csv

Org 9.4.5+patches M-x org-table-import

| 1,Word,66.3e-35 ||  |   |
| 2,Unquoted  | cell,2.7   |  |   |
| 3,"Quoted   | cell",3.14 |  |   |
| 4,"Cell | ""with | quotes""",2021-06-01 |   |
| 5,"Next | cell   | is   | empty","" |
| 6,"Cell | with   | new  |   |
| Line",6.28  ||  |   |

Org 9.4.5+patches C-u M-x org-table-import

| 1,Word,66.3e-35 |
| 2,Unquoted cell,2.7 |
| 3,"Quoted cell",3.14|
| 4,"Cell ""with quotes""",2021-06-01 |
| 5,"Next cell is empty",""   |
| 6,"Cell with new|
| Line",6.28  |

M-x org-table-import RET tbl RET
complains that file name extension is not txt, tsv or csv.

So my personal conclusion is that CSV file is imported incorrectly in 
both cases: with guessed separator and with explicitly requested through 
prefix argument. Completion works a bit worse too.


One more note concerning locale support.

On 18/05/2021 22:05, Utkarsh Singh wrote:
> On 2021-05-18, 19:31 +0700, Maxim Nikulin wrote:
The question may be risen in emacs-devel but I am unsure if I will 
participate in discussion.


Why?


I am aware of some problems related to localization but I do not have 
consistent vision what API emacs should have. I have no idea what 
information is available in Windows. That is why I expect that 
discussion may be time consuming while I am not sure that someone will 
be ready to implement new features.





Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-05-18 Thread Utkarsh Singh
On 2021-05-18, 19:31 +0700, Maxim Nikulin  wrote:

> The question may be risen in emacs-devel but I am unsure if I will 
> participate in discussion.

Why?

>> Can you test this function:
>> 
>> (defun org-table--comma-as-decimal-sep ()
>>"Return nil or 2 if separator is dot or comma respectively."
>>(string-search "," (format "%f" 10)))
>
> No, it does not work. `format' always uses dot. It is reasonable when 
> e.g. during writing a config file or during data exchange when locales 
> must be ignored.
>
> I was too optimistic. I did not expect that support of locales are so 
> poor in Emcacs. I do not see any traces of localeconv(3) in sources that 
> would allow to get value of decimal_point directly.
>
> Numbers are forced to use "C" locale and I have not noticed any way to 
> override it. Initial settings:
>
> http://git.savannah.gnu.org/cgit/emacs.git/tree/src/emacs.c#n1490
>
> http://git.savannah.gnu.org/cgit/emacs.git/tree/src/emacs.c#n2861
>
> setlocale (LC_NUMERIC, "C");
>

Hmm, so this means that Elisp cannot read something like this '10,1' as
floating point number.

>> To test I am using: $ LANG=de_DE.UTF-8 emacs -Q
>> 
>> But I am getting this as warning:
>> (process:1787): Gtk-WARNING **: 15:40:49.375: Locale not supported by C 
>> library.
>>  Using the fallback 'C' locale.
>
> You get this error due to you have not generated this locale. On debian 
> & ubuntu
>
> dpkg-reconfigure locales
>
> allows to select desired locales and performs all necessary actions.

Thanks!  I have fixed it now.

-- 
Utkarsh Singh
http://utkarshsingh.xyz



Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-05-18 Thread Maxim Nikulin
It seems, there is no reliable way to work with numbers in a 
locale-aware way in emacs.  I am still against hard-coded list of 
locales. Requirement to customize a variable is rather inconvenient. 
Considering such properties as a part of translation is a little better 
but I prefer to avoid it.


The question may be risen in emacs-devel but I am unsure if I will 
participate in discussion.


On 18/05/2021 17:24, Utkarsh Singh wrote:

On 2021-05-16, 23:24 +0700, Maxim Nikulin wrote:


On 14/05/2021 21:54, Utkarsh Singh wrote:

On 2021-05-13, 00:08 +0700, Maxim Nikulin wrote:


Comma is decimal separator for es_ES, de_DE, ru_RU, etc. The point is
that order in which separator candidates are tried should depend on
active locale.


I am willing to work on this problem but before this can you identify
any other locale with similar problem or a resource from where I can
information's about locale.


I do not think list of locales should be hard coded. I am not familiar
with elisp enough to tell you if locale properties such as decimal
separator are exposed through some function. I expect, it is quite
probable. As a last measure, some number, e.g. 0.5 or 1.5 could be
formatted using a locale-aware function and result string could be
checked if it contains ",".


Can you test this function:

(defun org-table--comma-as-decimal-sep ()
   "Return nil or 2 if separator is dot or comma respectively."
   (string-search "," (format "%f" 10)))


No, it does not work. `format' always uses dot. It is reasonable when 
e.g. during writing a config file or during data exchange when locales 
must be ignored.


I was too optimistic. I did not expect that support of locales are so 
poor in Emcacs. I do not see any traces of localeconv(3) in sources that 
would allow to get value of decimal_point directly.


Numbers are forced to use "C" locale and I have not noticed any way to 
override it. Initial settings:


http://git.savannah.gnu.org/cgit/emacs.git/tree/src/emacs.c#n1490

http://git.savannah.gnu.org/cgit/emacs.git/tree/src/emacs.c#n2861

setlocale (LC_NUMERIC, "C");


To test I am using:
$ LANG=de_DE.UTF-8 emacs -Q

But I am getting this as warning:
(process:1787): Gtk-WARNING **: 15:40:49.375: Locale not supported by C library.
Using the fallback 'C' locale.


You get this error due to you have not generated this locale. On debian 
& ubuntu


dpkg-reconfigure locales

allows to select desired locales and performs all necessary actions.




Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-05-18 Thread Utkarsh Singh
On 2021-05-16, 23:24 +0700, Maxim Nikulin  wrote:

> On 14/05/2021 21:54, Utkarsh Singh wrote:
>> On 2021-05-13, 00:08 +0700, Maxim Nikulin wrote:
>> 
>>> Comma is decimal separator for es_ES, de_DE, ru_RU, etc. The point is
>>> that order in which separator candidates are tried should depend on
>>> active locale.
>>>
>> I am willing to work on this problem but before this can you identify
>> any other locale with similar problem or a resource from where I can
>> information's about locale.
>
> I do not think list of locales should be hard coded. I am not familiar 
> with elisp enough to tell you if locale properties such as decimal 
> separator are exposed through some function. I expect, it is quite 
> probable. As a last measure, some number, e.g. 0.5 or 1.5 could be 
> formatted using a locale-aware function and result string could be 
> checked if it contains ",".

Can you test this function:

(defun org-table--comma-as-decimal-sep ()
  "Return nil or 2 if separator is dot or comma respectively."
  (string-search "," (format "%f" 10)))

To test I am using:
$ LANG=de_DE.UTF-8 emacs -Q

But I am getting this as warning:
(process:1787): Gtk-WARNING **: 15:40:49.375: Locale not supported by C library.
Using the fallback 'C' locale.
-- 
Utkarsh Singh
http://utkarshsingh.xyz



Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-05-17 Thread Utkarsh Singh
On 2021-05-17, 07:29 +0200, Bastien  wrote:

> Hi Utkarsh,
>
> Utkarsh Singh  writes:
>
>> For now can you review the patches I proposed earlier in this
>> thread?
>
> Not until both you and Maxim are confident this is useful, complete
> and predictable.

Ok.

> Also, if you resend the patch for review, please use a proper commit
> message: https://orgmode.org/worg/org-contribute.html#commit-messages

Thanks for the guidance.  I was just trying to fit commit message in 50
words but next time I will keep this in mind.

-- 
Utkarsh Singh
http://utkarshsingh.xyz



Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-05-17 Thread Utkarsh Singh
On 2021-05-16, 23:24 +0700, Maxim Nikulin  wrote:

> On 14/05/2021 21:54, Utkarsh Singh wrote:
>> On 2021-05-13, 00:08 +0700, Maxim Nikulin wrote:
>> 
>>> Comma is decimal separator for es_ES, de_DE, ru_RU, etc. The point is
>>> that order in which separator candidates are tried should depend on
>>> active locale.
>>>
>> I am willing to work on this problem but before this can you identify
>> any other locale with similar problem or a resource from where I can
>> information's about locale.
>
> I do not think list of locales should be hard coded. I am not familiar 
> with elisp enough to tell you if locale properties such as decimal 
> separator are exposed through some function. I expect, it is quite 
> probable. As a last measure, some number, e.g. 0.5 or 1.5 could be 
> formatted using a locale-aware function and result string could be 
> checked if it contains ",".

I will go through the Elisp manual to find solution for this problem.
If you now any other better resource fell free to mail me.

-- 
Utkarsh Singh
http://utkarshsingh.xyz



Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-05-16 Thread Bastien
Hi Utkarsh,

Utkarsh Singh  writes:

> For now can you review the patches I proposed earlier in this
> thread?

Not until both you and Maxim are confident this is useful, complete
and predictable.

Also, if you resend the patch for review, please use a proper commit
message: https://orgmode.org/worg/org-contribute.html#commit-messages

Thanks,

-- 
 Bastien



Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-05-16 Thread Maxim Nikulin

On 14/05/2021 21:54, Utkarsh Singh wrote:

On 2021-05-13, 00:08 +0700, Maxim Nikulin wrote:


Comma is decimal separator for es_ES, de_DE, ru_RU, etc. The point is
that order in which separator candidates are tried should depend on
active locale.


I am willing to work on this problem but before this can you identify
any other locale with similar problem or a resource from where I can
information's about locale.


I do not think list of locales should be hard coded. I am not familiar 
with elisp enough to tell you if locale properties such as decimal 
separator are exposed through some function. I expect, it is quite 
probable. As a last measure, some number, e.g. 0.5 or 1.5 could be 
formatted using a locale-aware function and result string could be 
checked if it contains ",".






Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-05-15 Thread Utkarsh Singh
On 2021-05-15, 12:30 +0200, Bastien  wrote:

> Hi Utkarsh,
>
> Utkarsh Singh  writes:
>
>> For now can you review the patches I proposed earlier in this
>> thread?
>
> Do these patches provide a complete and predictable solution?
> If so, can you merge them into a single patch against master?
>

Here are the patches:

I have separated out 'adding of prompt' to patch2 as it doesn't
have to do anything with separator guessing part.

>From 659e5e95bd8e024ac4730cf410a565b9e975b669 Mon Sep 17 00:00:00 2001
From: Utkarsh Singh 
Date: Sat, 15 May 2021 16:30:36 +0530
Subject: [PATCH 1/2] org-table-convert-region: move out separator-guessing

1. Move separator guessing code to org-table-guess-separator (new
function).
2. Add semicolon, colon and SPACE to the list of know separator
(separator which we can guess).
---
 lisp/org-table.el | 49 ---
 1 file changed, 38 insertions(+), 11 deletions(-)

diff --git a/lisp/org-table.el b/lisp/org-table.el
index cc69542..d3242da 100644
--- a/lisp/org-table.el
+++ b/lisp/org-table.el
@@ -837,6 +837,39 @@ SIZE is a string Columns x Rows like for example \"3x2\"."
   (goto-char pos))
 (org-table-align)))
 
+(defun org-table-guess-separator (beg0 end0)
+  "Guess separator for region BEG0 to END0.
+
+List of preferred separator (in order of preference):
+comma, TAB, semicolon, colon or SPACE.
+
+Search for a line which doesn't contain a separator if found
+search again using next preferred separator or else return
+separator as string."
+  (let* ((beg (save-excursion
+(goto-char (min beg0 end0))
+(skip-chars-forward " \t\n")
+(if (eobp) (point) (line-beginning-position
+	 (end (save-excursion
+(goto-char (max beg0 end0))
+(skip-chars-backward " \t\n" beg)
+(if (= beg (point)) (point) (line-end-position
+ (sep-regexp
+  (list (list ","  (rx bol (1+ (not (or ?\n ?,))) eol))
+		(list "\t" (rx bol (1+ (not (or ?\n ?\t))) eol))
+		(list ";"  (rx bol (1+ (not (or ?\n ?\;))) eol))
+		(list ":"  (rx bol (1+ (not (or ?\n ?:))) eol))
+		(list " "  (rx bol (1+ (not (or ?\n ?\s))) eol)
+(unless (= beg end)
+  (save-excursion
+(goto-char beg)
+(catch :found
+  (pcase-dolist (`(,sep ,regexp) sep-regexp)
+(save-excursion
+  (unless (re-search-forward regexp end t)
+(throw :found sep
+  nil)
+
 ;;;###autoload
 (defun org-table-convert-region (beg0 end0  separator)
   "Convert region to a table.
@@ -853,10 +886,7 @@ following values:
 integer  When a number, use that many spaces, or a TAB, as field separator
 regexp   When a regular expression, use it to match the separator
 nil  When nil, the command tries to be smart and figure out the
- separator in the following way:
- - when each line contains a TAB, assume TAB-separated material
- - when each line contains a comma, assume CSV material
- - else, assume one or more SPACE characters as separator."
+ separator using `org-table-guess-seperator'."
   (interactive "r\nP")
   (let* ((beg (min beg0 end0))
 	 (end (max beg0 end0))
@@ -873,13 +903,10 @@ nil  When nil, the command tries to be smart and figure out the
   (if (bolp) (backward-char 1) (end-of-line 1))
   (setq end (point-marker))
   ;; Get the right field separator
-  (unless separator
-	(goto-char beg)
-	(setq separator
-	  (cond
-	   ((not (re-search-forward "^[^\n\t]+$" end t)) '(16))
-	   ((not (re-search-forward "^[^\n,]+$" end t)) '(4))
-	   (t 1
+  (when (and (not separator)
+ (not (setq separator
+(org-table-guess-separator beg end
+(user-error "Failed to guess separator"))
   (goto-char beg)
   (if (equal separator '(4))
 	  (while (< (point) end)
-- 
2.31.1

>From 328d59dd2d2fc797e5819f43b40c96ffa45cd62f Mon Sep 17 00:00:00 2001
From: Utkarsh Singh 
Date: Sat, 15 May 2021 16:32:32 +0530
Subject: [PATCH 2/2] org-table-import: add file prompt

---
 lisp/org-table.el | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lisp/org-table.el b/lisp/org-table.el
index d3242da..20144c9 100644
--- a/lisp/org-table.el
+++ b/lisp/org-table.el
@@ -954,7 +954,8 @@ lines.  It can have the following values:
 - (64)Prompt for a regular expression as field separator.
 - integer When a number, use that many spaces, or a TAB, as field separator.
 - regexp  When a regular expression, use it to match the separator."
-  (interactive "f\nP")
+  (interactive (list (read-file-name "Import file: ")
+ (prefix-numeric-value current-prefix-arg)))
   (when (and (called-interactively-p 'any)
 	 (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file))
  (not (yes-or-no-p "The file's extension is not .txt, .tsv or .csv.  Import? ")))
-- 

Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-05-15 Thread Bastien
Hi Utkarsh,

Utkarsh Singh  writes:

> For now can you review the patches I proposed earlier in this
> thread?

Do these patches provide a complete and predictable solution?
If so, can you merge them into a single patch against master?

Thanks,

-- 
 Bastien



Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-05-15 Thread Utkarsh Singh
On 2021-05-15, 11:13 +0200, Bastien  wrote:

> Hi Utkarsh and Maxim,
>
> Utkarsh Singh  writes:
>
>> On 2021-05-13, 00:08 +0700, Maxim Nikulin  wrote:
>>
>>> Comma is decimal separator for es_ES, de_DE, ru_RU, etc. The point is
>>> that order in which separator candidates are tried should depend on
>>> active locale.
>>>
>>> I do not insist that interactive preview or smarter approach to guess
>>> separator have to be implemented. 
>
> I don't think the interactive preview would be that useful.  For the
> smarter approach, I think it will always be somewhat fragile, I don't
> think it is worth the addional complexity in Org's code.

Ok.

For now can you review the patches I proposed earlier in this thread?

Thank you,
Utkarsh Singh

-- 
Utkarsh Singh
http://utkarshsingh.xyz



Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-05-15 Thread Bastien
Hi Utkarsh and Maxim,

Utkarsh Singh  writes:

> On 2021-05-13, 00:08 +0700, Maxim Nikulin  wrote:
>
>> Comma is decimal separator for es_ES, de_DE, ru_RU, etc. The point is
>> that order in which separator candidates are tried should depend on
>> active locale.
>>
>> I do not insist that interactive preview or smarter approach to guess
>> separator have to be implemented. 

I don't think the interactive preview would be that useful.  For the
smarter approach, I think it will always be somewhat fragile, I don't
think it is worth the addional complexity in Org's code.

> I am willing to work on this problem but before this can you identify
> any other locale with similar problem or a resource from where I can
> information's about locale.
>
> I would also like to hear from the maintainers on what the think about
> the issue and are they willing for an workaround?

There you have it.  Perhaps you can continue to implement the feature
the way you want it, and if it feels complete and predictable, submit
it again?

Thanks,

-- 
 Bastien



Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-05-14 Thread Utkarsh Singh
On 2021-05-13, 00:08 +0700, Maxim Nikulin  wrote:

> Comma is decimal separator for es_ES, de_DE, ru_RU, etc. The point is
> that order in which separator candidates are tried should depend on
> active locale.
>
> I do not insist that interactive preview or smarter approach to guess
> separator have to be implemented. Feel free to disregard my
> comments. I am just not sure whether you are aware of limitations for
> noticeable part of users.

I am willing to work on this problem but before this can you identify
any other locale with similar problem or a resource from where I can
information's about locale.

I would also like to hear from the maintainers on what the think about
the issue and are they willing for an workaround?

-- 
Utkarsh Singh
http://utkarshsingh.xyz



Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-05-12 Thread Maxim Nikulin

On 11/05/2021 01:36, Utkarsh Singh wrote:

What do we mean by interactive preview?  Does this mean that we should
present a user with a list of possible delimiters using minibuffer?


I mean something like the dialog that LibreOffice shows on opening of a 
csv file. There are various options and table preview that allows to 
check if options are selected correctly before dropping to usual 
spreadsheet interface. I have no idea what is the best way to implement 
something like this in emacs. Likely it is out of scope of the discussed 
patch. I do not know what is your use cases. My intention was to show 
that CSV import could be quite cumbersome.


Some users believe that CSV is a reliable portable format and expect 
support of most features from Excel. Actually there are plenty of 
dialects and no way to determine which one was used (e.g. no header that 
defines field or string separator).



For ex (please review my usage of alist):

#+begin_src elisp
(defvar my-separator-alist '(("comma" . ",")
 ("tab" . "\t")
 ("semicolon" . ";")
 ("colon" . ":")
 ("space" . " ")))

(defun my/table-import (file separator)
   (interactive (list (read-file-name "Import CSV file: " nil nil t)
 (cdr (assoc (completing-read "Separator: " 
my-separator-alist)
 my-separator-alist
   (org-table-import file separator))
#+end_src


With such function it is necessary to open file at first to see what 
separator is used inside.


My comments were related to the "guess" part of your patch. Comma is 
tried first. Consider the following file


A;1,2;3,4;5,6
B;7,8;9,1;11,12
C;13,14;15,16;17,18

Decimal separator is ",". Field separator is ";" but there are plenty of 
"," in each row.


LANG=fr_FR.UTF-8 python3 -c "import locale as l; l.setlocale(l.LC_ALL, 
''); print(l.format_string('%.2f', 123456.789))"

123456,79

Comma is decimal separator for es_ES, de_DE, ru_RU, etc. The point is 
that order in which separator candidates are tried should depend on 
active locale.


I do not insist that interactive preview or smarter approach to guess 
separator have to be implemented. Feel free to disregard my comments. I 
am just not sure whether you are aware of limitations for noticeable 
part of users.





Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-05-10 Thread Utkarsh Singh
Hi Maxim,

Sorry for the late reply!

On 2021-04-28, 23:38 +0700, Maxim Nikulin  wrote:

> On 28/04/2021 15:37, Utkarsh Singh wrote:
>> +List of preferred separator (in order of preference):
>> +comma, TAB, semicolon, colon or SPACE.
> I will hardly be using this feature heavily, so I do not insist that
> the following must be taken into account. Just some considerations...
>
> There are locales where comma is used as decimal separator 23,5
> (e.g. ru_RU.UTF-8). Office software and applications oriented to
> office users often use semicolon as field separator. There are still
> may be plenty of numbers with fractional part, so with commas. Likely
> ";" should be tried at first for such locales.
>
> However the same user may have enough CSV files that are really "comma
> separated", e.g. results of numerical simulation where localization is
> intentionally ignored, data obtained from some equipment, etc.
>
> Some files (e.g. downloaded bank statements) may be in legacy 8-bit
> encoding instead of UTF-8.
>
> As a result, sometimes the only convenient way is to try various
> options with interactive preview.

What do we mean by interactive preview?  Does this mean that we should
present a user with a list of possible delimiters using minibuffer?

If you are suggesting this then rather trying 'various options' based on
locals you can use =org-table-import= function directly as it accepts
separator as it's argument.

For ex (please review my usage of alist):

#+begin_src elisp
(defvar my-separator-alist '(("comma" . ",")
 ("tab" . "\t")
 ("semicolon" . ";")
 ("colon" . ":")
 ("space" . " ")))

(defun my/table-import (file separator)
  (interactive (list (read-file-name "Import CSV file: " nil nil t)
 (cdr (assoc (completing-read "Separator: " 
my-separator-alist)
 my-separator-alist
  (org-table-import file separator))
#+end_src

-- 
Utkarsh Singh
http://utkarshsingh.xyz



Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-04-28 Thread Maxim Nikulin

On 28/04/2021 15:37, Utkarsh Singh wrote:

+List of preferred separator (in order of preference):
+comma, TAB, semicolon, colon or SPACE.
I will hardly be using this feature heavily, so I do not insist that the 
following must be taken into account. Just some considerations...


There are locales where comma is used as decimal separator 23,5 (e.g. 
ru_RU.UTF-8). Office software and applications oriented to office users 
often use semicolon as field separator. There are still may be plenty of 
numbers with fractional part, so with commas. Likely ";" should be tried 
at first for such locales.


However the same user may have enough CSV files that are really "comma 
separated", e.g. results of numerical simulation where localization is 
intentionally ignored, data obtained from some equipment, etc.


Some files (e.g. downloaded bank statements) may be in legacy 8-bit 
encoding instead of UTF-8.


As a result, sometimes the only convenient way is to try various options 
with interactive preview.





bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-04-28 Thread Utkarsh Singh
Hi,

On 2021-04-27, 22:21 +0200, Nicolas Goaziou  wrote:

>> + When using org-table-import interactively if we failed to guess
>> separator then we will be left with a user-error message and an
>> 'unconverted table'.  We can make use of 'temp-buffer' to import our
>> file after successfully conversion.
>
> I'm not sure to understand what you mean.

Note: I will advice you to apply patch no. 2 before trying out the
following example.

1. Download the attached CSV file.  We can call this example.csv

2. Go to *scratch* buffer.

3. Use 'M-x org-table-import' to import example.csv as org-table.

You will see even thought org-table-guess-separator failed in guessing
separator we are still left with unconverted region added to our buffer.

>> + Conversion part of org-table-convert-region make a distinction between
>> '(4) (comma separator) and rest of the separator we should either string
>> version of comma as AND condition or rewrite to simplify it.
>
> Ditto. But it can be the object of another patch. Let's concentrate on
> `org-table-guess-separator' first.
>
>> I am willing to do these possible changes but currently waiting for your
>> review for org-table-guess-separator as there can be more serious bugs
>> lurking around on my code which I am considering base for these
>> changes.
>
> You should definitely write tests for this function. Here's a start:
>
> (ert-deftest test-org-table/guess-separator ()
>   "Test `test-org-table/guess-separator'."
>   ;; Test space separator.
>   (should
>(equal " "
>   (org-test-with-temp-text "a b\nc d"
> (org-table-guess-separator (point-min) (point-max)
>   (should
>(equal " "
>   (org-test-with-temp-text "a b\nc d"
> (org-table-guess-separator (point-min) (point-max)
>   ;; Test "inverted" region.
>   (should
>(equal " "
>   (org-test-with-temp-text "a b\nc d"
> (org-table-guess-separator (point-max) (point-min)
>   ;; Do not error on empty region.
>   (should-not
>(org-test-with-temp-text ""
>  (org-table-guess-separator (point-max) (point-min
>   (should-not
>(org-test-with-temp-text "   \n"
>  (org-table-guess-separator (point-max) (point-min)
>

I will surely do more testing.

I would also like to simplify the condition for guessing SPACE as
separator due to following cases:

+ field1 'this is field2' 'this is field3' :: In this case we still have
SPACE inside quote (' in this case).

+ Since SPACE is our last valid separator I think searching for a line
which doesn't contains space is more than enough.

Required patch:

>From 6b112927de73c43edfd08254217808ebff42772a Mon Sep 17 00:00:00 2001
From: Utkarsh Singh 
Date: Wed, 28 Apr 2021 10:26:46 +0530
Subject: [PATCH 1/3] org-table.el (org-table-import): add yes-and-no prompt

Add a yes and no prompt for files which don't have .txt, .tsv OR .csv
as file extensions.
---
 lisp/org/org-table.el | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lisp/org/org-table.el b/lisp/org/org-table.el
index 0e93fb271f..e0b2be6892 100644
--- a/lisp/org/org-table.el
+++ b/lisp/org/org-table.el
@@ -938,7 +938,8 @@ org-table-import
 - regexp  When a regular expression, use it to match the separator."
   (interactive "f\nP")
   (when (and (called-interactively-p 'any)
-	 (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file)))
+	 (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file))
+ (not (yes-or-no-p "File does not have .txt, .tsv or .csv as extension.  Do you still want to continue? ")))
 (user-error "Cannot import such file"))
   (unless (bolp) (insert "\n"))
   (let ((beg (point))
-- 
2.31.1

>From 9bb017cfc8284075e04faf5496ed560ba48d5bbc Mon Sep 17 00:00:00 2001
From: Utkarsh Singh 
Date: Wed, 28 Apr 2021 10:42:32 +0530
Subject: [PATCH 2/3] org-table.el (org-table-convert-region): move out
 separator-guessing

1. Move separator guessing code to org-table-guess-separator (new
function).
2. Add semicolon, colon and SPACE to the list of know separator
(separator which we can guess).
---
 lisp/org/org-table.el | 49 +--
 1 file changed, 38 insertions(+), 11 deletions(-)

diff --git a/lisp/org/org-table.el b/lisp/org/org-table.el
index e0b2be6892..295f7a9b90 100644
--- a/lisp/org/org-table.el
+++ b/lisp/org/org-table.el
@@ -846,6 +846,39 @@ org-table-create
   (goto-char pos))
 (org-table-align)))
 
+(defun org-table-guess-separator (beg0 end0)
+  "Guess separator for region BEG0 to END0.
+
+List of preferred separator (in order of preference):
+comma, TAB, semicolon, colon or SPACE.
+
+Search for a line which doesn't contain a separator if found
+search again using next preferred separator or else return
+separator as string."
+  (let* ((beg (save-excursion
+(goto-char (min beg0 end0))
+

bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-04-27 Thread Nicolas Goaziou
Hello,

Utkarsh Singh  writes:

> I am attaching my patch which also include my previous suggestion of
> including yes-or-no prompt to org-table-import to allow file which don't
> have csv, tsv or txt as extension.  

I suggest to make several patches. Do not try to stuff as many changes
as possible in a single one, please.

> + When using org-table-import interactively if we failed to guess
> separator then we will be left with a user-error message and an
> 'unconverted table'.  We can make use of 'temp-buffer' to import our
> file after successfully conversion.

I'm not sure to understand what you mean.

> + Conversion part of org-table-convert-region make a distinction between
> '(4) (comma separator) and rest of the separator we should either string
> version of comma as AND condition or rewrite to simplify it.

Ditto. But it can be the object of another patch. Let's concentrate on
`org-table-guess-separator' first.

> I am willing to do these possible changes but currently waiting for your
> review for org-table-guess-separator as there can be more serious bugs
> lurking around on my code which I am considering base for these
> changes.

You should definitely write tests for this function. Here's a start:

(ert-deftest test-org-table/guess-separator ()
  "Test `test-org-table/guess-separator'."
  ;; Test space separator.
  (should
   (equal " "
  (org-test-with-temp-text "a b\nc d"
(org-table-guess-separator (point-min) (point-max)
  (should
   (equal " "
  (org-test-with-temp-text "a b\nc d"
(org-table-guess-separator (point-min) (point-max)
  ;; Test "inverted" region.
  (should
   (equal " "
  (org-test-with-temp-text "a b\nc d"
(org-table-guess-separator (point-max) (point-min)
  ;; Do not error on empty region.
  (should-not
   (org-test-with-temp-text ""
 (org-table-guess-separator (point-max) (point-min
  (should-not
   (org-test-with-temp-text "   \n"
 (org-table-guess-separator (point-max) (point-min)

> +  (end (save-excursion
> +(goto-char (max beg end0))

This should be beg0 instead of beg above.

> + (sep-regexp '((","  (rx bol (1+ (not (or ?\n ?,))) eol))
> +("\t" (rx bol (1+ (not (or ?\n ?\t))) eol))
> +(";"  (rx bol (1+ (not (or ?\n ?\;))) eol))
> +(":"  (rx bol (1+ (not (or ?\n ?:))) eol))
> +(" "  (rx bol (1+ (not (or ?' ?\" ))
> + (not (or ?\s ?\;))
> + (not (or ?' ?\"))) eol

Use
 (sep-regexp
  (list (list ","  (rx bol (1+ (not (or ?\n ?,))) eol))
(list "\t" (rx bol (1+ (not (or ?\n ?\t))) eol))
(list ";"  (rx bol (1+ (not (or ?\n ?\;))) eol))
(list ":"  (rx bol (1+ (not (or ?\n ?:))) eol))
(list " "  (rx bol (1+ (not (or ?' ?\" ))
   (not (or ?\s ?\;))
   (not (or ?' ?\")))
   eol

so you don't need eval below, and rx forms become constants when
compiled.

> + sep)

This `sep' binding can be removed.

> +(unless (= beg end)
> +  (save-excursion
> +(goto-char beg)
> +(catch :found
> +  (pcase-dolist (`(,sep ,regexp) sep-regexp)
> +(save-excursion
> +  (unless (re-search-forward (eval regexp) end t)

You can drop the `eval'.

>(when (and (called-interactively-p 'any)
> -  (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file)))
> +  (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file))
> + (not (yes-or-no-p "File does not havs .txt .txt .csv as 
> extension.  Do you still want to continue? ")))

"does not have" and ".txt" -> ".tsv" I guess.

Also please provide a patch with a commit message, possibly using `git
format-patch'.

Thanks!

Regards,
-- 
Nicolas Goaziou