Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-06-05 Thread Utkarsh Singh
On 2021-06-05, 19:40 +0700, Maxim Nikulin  wrote:

> This line should be added to org-table.el, but notice "rx ‘not’ syntax 
> error: (or ". This logical operators have a bit different meaning for 
> regular expression.
>
> Nicolas suggested you to use `list' to avoid `eval'. Certainly `eval' 
> should be avoided whenever possible, but there is alternative with "`" 
> and "," instead of "'". Though I often prefer `list'.

Done.

>> Can you also share test.org so that I can test locally?
>
> I can reproduce it with an empty org file. Emacs-26.3. So I am curious 
> if your tested your patch at all.

Yes you are right I was not testing my patches using 'emacs -Q' and I am
apologetic about it.  I just "tested" most of them by M-x eval-buffer on
the patched file and then checked the output of `with-temp-buffer'.

>> Now on the topic of CSV parser, I have succesfully implemented a CSV
>> parser in less than 65 LOC which also preserves newline character but I
>> am facing a dilemma on how should I represent it as a Org table.  For
>> ex:
> ...
>> 6,"Cell with new
>> Line",6.28
>
> Due to
> https://orgmode.org/worg/org-faq.html#Tables
> "Will there ever be support for multiple lines in a table field? No."
> (there are may be some tricks in Org manual) any variant is acceptable, 
> even current one. Since during export split cell is considered as 
> different rows, it is responsibility of the user to preprocess original 
> file and import it in a form suitable for Org.
>
> However independently of chosen variant, it would be great to get list 
> of warnings with ambiguous lines similar to compiler errors. Such 
> feature is not implemented in Org yet, the most close is `org-lint' 
> result (its behavior differs from compiler errors).

After reading FAQ about multiple lines in table field I don't think this
PATCH makes much sense as my main driving force for this patch was to
simplify regular expression in `org-table-convert-region' and somehow
add newlines support.

I also think `org-table-convert-region' is doing a fine job and I should
keep my "I want to contribute" habit in check.

> I expect complexity O(number of lines in result). I do not like 
> excessive usage of `dolist'. I would try slicing subrows using `mapcar' 
> while there is at least one non-nil element.

Yes! Currently I am trying to sequentially read Elisp manual and I have
completed the section on 'Mapping Function' yesterday.  I think it's a
really nice language feature that I can make use of.

> P.S. Have you read "Structure and Interpretation of Computer Programs" 
> by Abelson, Sussman, and Sussman?
> https://sarabander.github.io/sicp/html/index.xhtml
> or https://mitpress.mit.edu/sites/default/files/sicp/index.html
> While reading, it is necessary to realize that elisp is not scheme, tail 
> recursion should be avoided in elisp and there are over differences.

Thank you! I have seen most of the decision on Lisp includes SICP but I
am currently stopping myself to Elisp manual to learn something useful
about this timeless editor and learn thing in a step-by-step manner.

I also tried Racket after reading:

https://dustycloud.org/blog/racket-is-an-acceptable-python
https://beautifulracket.com/

But stopped myself for the reason same as above.

-- 
Utkarsh Singh
http://utkarshsingh.xyz



Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-06-03 Thread Utkarsh Singh
On 2021-06-02, 23:44 +0700, Maxim Nikulin  wrote:

> On 02/06/2021 22:08, Utkarsh Singh wrote:
>> ;;;###autoload
>>   (defun org-table-import (file separator)
>> @@ -955,12 +971,13 @@ lines.  It can have the following values:
>>   - integer When a number, use that many spaces, or a TAB, as field 
>> separator.
>>   - regexp  When a regular expression, use it to match the separator."
>> (interactive (list (read-file-name "Import file: ")
>> - (prefix-numeric-value current-prefix-arg)))
>> + current-prefix-arg))
>
> It seems, prefix argument works now. Let me remind that file name 
> completion was working better before your change.
>
> I have noticed a couple of error messages. Unsure what is going wrong, I 
> hope, command to launch emacs is correct (-Q -L ~/src/org-mode/lisp 
> test.org).
>
> At startup:
> Eager macro-expansion failure: (error "rx ‘not’ syntax error: (or 10 44)")
>
> In response to M-x org-table-import:
> rx-check-not: rx ‘not’ syntax error: (or 10 44)
>

Is `rx' library loading correctly? If no then you can try:

diff --git a/lisp/org-table.el b/lisp/org-table.el
index 20144c91b..9278f91bb 100644
--- a/lisp/org-table.el
+++ b/lisp/org-table.el
@@ -38,6 +38,7 @@
 (require 'org-macs)
 (require 'org-compat)
 (require 'org-keys)
+(require 'rx)

Can you also share test.org so that I can test locally?

>> Currently I am trying to refactor CSV parsing by applying techniques
>> used in pcsv library (MELPA package) which I think you will also enjoy
>> to play with!
>
> I do not know opinion of Org maintainers. Personally I believe that org 
> can take advantage of Emacs core features or of other packages if they 
> are available and fallback to minimal implementation otherwise. Unsure 
> whether borrowing code from pcsv can cause license issues. Current CSV 
> parser is not perfect but it works reasonably well.

Yes, I think you are right about how Org should make use of existing
feature rather than including everyting with itself but for now I am
considering this as an opportunity to learn how Org and Emacs work and
will leave it upto the maintainers if they find anything useful to be
including into Org itself.

Now on the topic of CSV parser, I have succesfully implemented a CSV
parser in less than 65 LOC which also preserves newline character but I
am facing a dilemma on how should I represent it as a Org table.  For
ex:

Row with newline as CSV data:

6,"Cell with new
Line",6.28

Row with newline as Lisp data:

("6" "Cell with new\nLine" "6.28")

Row with newline a Org table:

| 6 | Cell with new  |   6.28 |
|   | Line   ||

Rows in Org mode are separated using a newline-character which makes
"Cell with new" and "Line" as different cells which is different from
how Libreoffice Calc represents it.  Libreoffice Calc has a much
powerfull representation of a cell which is out of scope for any plain
text tables.  Possible solutions for this problem are:

1. Replace newline with a space character.  What will happen when
imported line as a really long line?

2. Represent every newline as distinct cell.  I was trying to implement
this method but the algorithm that I came up with requires time
complexity of O(n^2) to just print each row which is too much for large
CSV files.

(defun org-table--print-row (row)
  (let ((maxlines 1)
list1 list2 tmp)
;; get maxlines
(dolist (i row)
  (if (not (stringp i))
  (push (list i) list1)
(push (string-lines i) list1)
(setq tmp (length (car list1)))
(when (< maxlines tmp)
  (setq maxlines tmp
;; normalize row
(dolist (i list1)
  (setq tmp (length i))
  (when (< tmp maxlines)
    (setq i (append i (make-list (- maxlines tmp) ""
  (push i list2))
;; print row
(dolist (i list2)
  (setq tmp (point))
  (dolist (j i) <-- Printing is still not complete

Thank you,
Utkarsh Singh

-- 
Utkarsh Singh
http://utkarshsingh.xyz



Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-06-02 Thread Utkarsh Singh
On 2021-06-02, 19:06 +0700, Maxim Nikulin  wrote:

> Notice that for "C-u M-x org-table-import" heuristics is not necessary, 
> the separator is specified explicitly. I see that your intention was to 
> improve user interface of org-table-import, but actually you broke it by 
> the "org-table-import: add file prompt" patch. I have not tried it, but 
> my expectation is that user prompt can be customized while keeping all 
> other things working. Maybe the docstring should be updated to make it 
> more friendly to novices (like me while reviewing your patch).

Thanks for pointing out the error, locally you can fix it by (patches
for patches):

;;;###autoload
 (defun org-table-import (file separator)
@@ -955,12 +971,13 @@ lines.  It can have the following values:
 - integer When a number, use that many spaces, or a TAB, as field separator.
 - regexp  When a regular expression, use it to match the separator."
   (interactive (list (read-file-name "Import file: ")
- (prefix-numeric-value current-prefix-arg)))
+ current-prefix-arg))

Currently I am trying to refactor CSV parsing by applying techniques
used in pcsv library (MELPA package) which I think you will also enjoy
to play with!

-- 
Utkarsh Singh
http://utkarshsingh.xyz



Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-06-01 Thread Utkarsh Singh
Hi Maxim,

First of all I would like to thank you for testing out patches and
taking time to write a detailed report about it.

On 2021-06-01, 23:23 +0700, Maxim Nikulin  wrote:

> On 17/05/2021 12:29, Bastien wrote:
>> Utkarsh Singh  writes:
>>> For now can you review the patches I proposed earlier in this
>>> thread?
>> 
>> Not until both you and Maxim are confident this is useful, complete
>> and predictable.
>
> I have too many points to object to consider my opinion as objective.
>
> Org 9.4.5+patches M-x org-table-import
>
> | 1,Word,66.3e-35 ||  |   |
> | 2,Unquoted  | cell,2.7   |  |   |
> | 3,"Quoted   | cell",3.14 |  |   |
> | 4,"Cell | ""with | quotes""",2021-06-01 |   |
> | 5,"Next | cell   | is   | empty","" |
> | 6,"Cell | with   | new  |   |
> | Line",6.28  ||  |   |
>
> So my personal conclusion is that CSV file is imported incorrectly in 
> both cases: with guessed separator and with explicitly requested through 
> prefix argument. Completion works a bit worse too.

Currently `org-table-guess-separator' returns "," (COMMA as string) and
`org-table-covert-region' uses '(4) to represent COMMA as separator
which is the main cause of breakdown in importing.

To make importing work well we have to:

+ Guess right separator (`org-table-guess-separator')
+ Parse CSV with this separator (`org-table-covert-region')

As far as I can tell "guessing part" works well and now we just have to
make parser work well with new separators.

-- 
Utkarsh Singh
http://utkarshsingh.xyz



Re: [PATCH] org-table.el: Fix usage of user-error

2021-05-31 Thread Utkarsh Singh


Ping!

-- 
Utkarsh Singh
http://utkarshsingh.xyz



Re: [PATCH] org-table.el: Fix usage of user-error

2021-05-23 Thread Utkarsh Singh
On 2021-05-23, 14:24 +0530, Utkarsh Singh  wrote:

Hi,

This patch tries to fix usage of `user-error' in
`org-table-convert-region' which is currently _not_ taking benefit of
non-local exit of `user-error' function and using `if' to carry out it's
operations.

Please note that this is just an aesthetic improvement.

Thank you,
Utkarsh Singh

-- 
Utkarsh Singh
http://utkarshsingh.xyz



[PATCH] org-table.el: Fix usage of user-error

2021-05-23 Thread Utkarsh Singh
>From 96b1a0b6095ecefed0f7103ea1e08e325452d3a6 Mon Sep 17 00:00:00 2001
From: Utkarsh Singh 
Date: Sun, 23 May 2021 13:48:33 +0530
Subject: [PATCH] org-table.el: Fix usage of user-error

* lisp/org-table.el (org-table-convert-region): Don't use `if' because
  Elisp has no concept of continuable errors.
---
 lisp/org-table.el | 88 +++
 1 file changed, 44 insertions(+), 44 deletions(-)

diff --git a/lisp/org-table.el b/lisp/org-table.el
index cc69542f9..3e5563573 100644
--- a/lisp/org-table.el
+++ b/lisp/org-table.el
@@ -861,52 +861,52 @@ nil  When nil, the command tries to be smart and figure out the
   (let* ((beg (min beg0 end0))
 	 (end (max beg0 end0))
 	 re)
-(if (> (count-lines beg end) org-table-convert-region-max-lines)
-	(user-error "Region is longer than `org-table-convert-region-max-lines' (%s) lines; not converting"
-		org-table-convert-region-max-lines)
-  (when (equal separator '(64))
-	(setq separator (read-regexp "Regexp for field separator")))
-  (goto-char beg)
-  (beginning-of-line 1)
-  (setq beg (point-marker))
-  (goto-char end)
-  (if (bolp) (backward-char 1) (end-of-line 1))
-  (setq end (point-marker))
-  ;; Get the right field separator
-  (unless separator
-	(goto-char beg)
-	(setq separator
-	  (cond
-	   ((not (re-search-forward "^[^\n\t]+$" end t)) '(16))
-	   ((not (re-search-forward "^[^\n,]+$" end t)) '(4))
-	   (t 1
+(when (> (count-lines beg end) org-table-convert-region-max-lines)
+  (user-error "Region is longer than `org-table-convert-region-max-lines' (%s) lines; not converting"
+		  org-table-convert-region-max-lines))
+(when (equal separator '(64))
+  (setq separator (read-regexp "Regexp for field separator")))
+(goto-char beg)
+(beginning-of-line 1)
+(setq beg (point-marker))
+(goto-char end)
+(if (bolp) (backward-char 1) (end-of-line 1))
+(setq end (point-marker))
+;; Get the right field separator
+(unless separator
   (goto-char beg)
-  (if (equal separator '(4))
-	  (while (< (point) end)
-	;; parse the csv stuff
+  (setq separator
 	(cond
-	 ((looking-at "^") (insert "| "))
-	 ((looking-at "[ \t]*$") (replace-match " |") (beginning-of-line 2))
-	 ((looking-at "[ \t]*\"\\([^\"\n]*\\)\"")
-	  (replace-match "\\1")
-	  (if (looking-at "\"") (insert "\"")))
-	 ((looking-at "[^,\n]+") (goto-char (match-end 0)))
-	 ((looking-at "[ \t]*,") (replace-match " | "))
-	 (t (beginning-of-line 2
-	(setq re (cond
-		  ((equal separator '(4)) "^\\|\"?[ \t]*,[ \t]*\"?")
-		  ((equal separator '(16)) "^\\|\t")
-		  ((integerp separator)
-		   (if (< separator 1)
-		   (user-error "Number of spaces in separator must be >= 1")
-		 (format "^ *\\| *\t *\\| \\{%d,\\}" separator)))
-		  ((stringp separator)
-		   (format "^ *\\|%s" separator))
-		  (t (error "This should not happen"
-	(while (re-search-forward re end t)
-	  (replace-match "| " t t)))
-  (goto-char beg)
-  (org-table-align
+	 ((not (re-search-forward "^[^\n\t]+$" end t)) '(16))
+	 ((not (re-search-forward "^[^\n,]+$" end t)) '(4))
+	 (t 1
+(goto-char beg)
+(if (equal separator '(4))
+	(while (< (point) end)
+	  ;; parse the csv stuff
+	  (cond
+	   ((looking-at "^") (insert "| "))
+	   ((looking-at "[ \t]*$") (replace-match " |") (beginning-of-line 2))
+	   ((looking-at "[ \t]*\"\\([^\"\n]*\\)\"")
+	(replace-match "\\1")
+	(if (looking-at "\"") (insert "\"")))
+	   ((looking-at "[^,\n]+") (goto-char (match-end 0)))
+	   ((looking-at "[ \t]*,") (replace-match " | "))
+	   (t (beginning-of-line 2
+  (setq re (cond
+		((equal separator '(4)) "^\\|\"?[ \t]*,[ \t]*\"?")
+		((equal separator '(16)) "^\\|\t")
+		((integerp separator)
+		 (if (< separator 1)
+		 (user-error "Number of spaces in separator must be >= 1")
+		   (format "^ *\\| *\t *\\| \\{%d,\\}" separator)))
+		((stringp separator)
+		 (format "^ *\\|%s" separator))
+		(t (error "This should not happen"
+  (while (re-search-forward re end t)
+	(replace-match "| " t t)))
+(goto-char beg)
+(org-table-align)))
 
 ;;;###autoload
 (defun org-table-import (file separator)
-- 
2.31.1

-- 
Utkarsh Singh
http://utkarshsingh.xyz


Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-05-18 Thread Utkarsh Singh
On 2021-05-18, 19:31 +0700, Maxim Nikulin  wrote:

> The question may be risen in emacs-devel but I am unsure if I will 
> participate in discussion.

Why?

>> Can you test this function:
>> 
>> (defun org-table--comma-as-decimal-sep ()
>>"Return nil or 2 if separator is dot or comma respectively."
>>(string-search "," (format "%f" 10)))
>
> No, it does not work. `format' always uses dot. It is reasonable when 
> e.g. during writing a config file or during data exchange when locales 
> must be ignored.
>
> I was too optimistic. I did not expect that support of locales are so 
> poor in Emcacs. I do not see any traces of localeconv(3) in sources that 
> would allow to get value of decimal_point directly.
>
> Numbers are forced to use "C" locale and I have not noticed any way to 
> override it. Initial settings:
>
> http://git.savannah.gnu.org/cgit/emacs.git/tree/src/emacs.c#n1490
>
> http://git.savannah.gnu.org/cgit/emacs.git/tree/src/emacs.c#n2861
>
> setlocale (LC_NUMERIC, "C");
>

Hmm, so this means that Elisp cannot read something like this '10,1' as
floating point number.

>> To test I am using: $ LANG=de_DE.UTF-8 emacs -Q
>> 
>> But I am getting this as warning:
>> (process:1787): Gtk-WARNING **: 15:40:49.375: Locale not supported by C 
>> library.
>>  Using the fallback 'C' locale.
>
> You get this error due to you have not generated this locale. On debian 
> & ubuntu
>
> dpkg-reconfigure locales
>
> allows to select desired locales and performs all necessary actions.

Thanks!  I have fixed it now.

-- 
Utkarsh Singh
http://utkarshsingh.xyz



Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-05-18 Thread Utkarsh Singh
On 2021-05-16, 23:24 +0700, Maxim Nikulin  wrote:

> On 14/05/2021 21:54, Utkarsh Singh wrote:
>> On 2021-05-13, 00:08 +0700, Maxim Nikulin wrote:
>> 
>>> Comma is decimal separator for es_ES, de_DE, ru_RU, etc. The point is
>>> that order in which separator candidates are tried should depend on
>>> active locale.
>>>
>> I am willing to work on this problem but before this can you identify
>> any other locale with similar problem or a resource from where I can
>> information's about locale.
>
> I do not think list of locales should be hard coded. I am not familiar 
> with elisp enough to tell you if locale properties such as decimal 
> separator are exposed through some function. I expect, it is quite 
> probable. As a last measure, some number, e.g. 0.5 or 1.5 could be 
> formatted using a locale-aware function and result string could be 
> checked if it contains ",".

Can you test this function:

(defun org-table--comma-as-decimal-sep ()
  "Return nil or 2 if separator is dot or comma respectively."
  (string-search "," (format "%f" 10)))

To test I am using:
$ LANG=de_DE.UTF-8 emacs -Q

But I am getting this as warning:
(process:1787): Gtk-WARNING **: 15:40:49.375: Locale not supported by C library.
Using the fallback 'C' locale.
-- 
Utkarsh Singh
http://utkarshsingh.xyz



Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-05-17 Thread Utkarsh Singh
On 2021-05-17, 07:29 +0200, Bastien  wrote:

> Hi Utkarsh,
>
> Utkarsh Singh  writes:
>
>> For now can you review the patches I proposed earlier in this
>> thread?
>
> Not until both you and Maxim are confident this is useful, complete
> and predictable.

Ok.

> Also, if you resend the patch for review, please use a proper commit
> message: https://orgmode.org/worg/org-contribute.html#commit-messages

Thanks for the guidance.  I was just trying to fit commit message in 50
words but next time I will keep this in mind.

-- 
Utkarsh Singh
http://utkarshsingh.xyz



Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-05-17 Thread Utkarsh Singh
On 2021-05-16, 23:24 +0700, Maxim Nikulin  wrote:

> On 14/05/2021 21:54, Utkarsh Singh wrote:
>> On 2021-05-13, 00:08 +0700, Maxim Nikulin wrote:
>> 
>>> Comma is decimal separator for es_ES, de_DE, ru_RU, etc. The point is
>>> that order in which separator candidates are tried should depend on
>>> active locale.
>>>
>> I am willing to work on this problem but before this can you identify
>> any other locale with similar problem or a resource from where I can
>> information's about locale.
>
> I do not think list of locales should be hard coded. I am not familiar 
> with elisp enough to tell you if locale properties such as decimal 
> separator are exposed through some function. I expect, it is quite 
> probable. As a last measure, some number, e.g. 0.5 or 1.5 could be 
> formatted using a locale-aware function and result string could be 
> checked if it contains ",".

I will go through the Elisp manual to find solution for this problem.
If you now any other better resource fell free to mail me.

-- 
Utkarsh Singh
http://utkarshsingh.xyz



Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-05-15 Thread Utkarsh Singh
On 2021-05-15, 12:30 +0200, Bastien  wrote:

> Hi Utkarsh,
>
> Utkarsh Singh  writes:
>
>> For now can you review the patches I proposed earlier in this
>> thread?
>
> Do these patches provide a complete and predictable solution?
> If so, can you merge them into a single patch against master?
>

Here are the patches:

I have separated out 'adding of prompt' to patch2 as it doesn't
have to do anything with separator guessing part.

>From 659e5e95bd8e024ac4730cf410a565b9e975b669 Mon Sep 17 00:00:00 2001
From: Utkarsh Singh 
Date: Sat, 15 May 2021 16:30:36 +0530
Subject: [PATCH 1/2] org-table-convert-region: move out separator-guessing

1. Move separator guessing code to org-table-guess-separator (new
function).
2. Add semicolon, colon and SPACE to the list of know separator
(separator which we can guess).
---
 lisp/org-table.el | 49 ---
 1 file changed, 38 insertions(+), 11 deletions(-)

diff --git a/lisp/org-table.el b/lisp/org-table.el
index cc69542..d3242da 100644
--- a/lisp/org-table.el
+++ b/lisp/org-table.el
@@ -837,6 +837,39 @@ SIZE is a string Columns x Rows like for example \"3x2\"."
   (goto-char pos))
 (org-table-align)))
 
+(defun org-table-guess-separator (beg0 end0)
+  "Guess separator for region BEG0 to END0.
+
+List of preferred separator (in order of preference):
+comma, TAB, semicolon, colon or SPACE.
+
+Search for a line which doesn't contain a separator if found
+search again using next preferred separator or else return
+separator as string."
+  (let* ((beg (save-excursion
+(goto-char (min beg0 end0))
+(skip-chars-forward " \t\n")
+(if (eobp) (point) (line-beginning-position
+	 (end (save-excursion
+(goto-char (max beg0 end0))
+(skip-chars-backward " \t\n" beg)
+(if (= beg (point)) (point) (line-end-position
+ (sep-regexp
+  (list (list ","  (rx bol (1+ (not (or ?\n ?,))) eol))
+		(list "\t" (rx bol (1+ (not (or ?\n ?\t))) eol))
+		(list ";"  (rx bol (1+ (not (or ?\n ?\;))) eol))
+		(list ":"  (rx bol (1+ (not (or ?\n ?:))) eol))
+		(list " "  (rx bol (1+ (not (or ?\n ?\s))) eol)
+(unless (= beg end)
+  (save-excursion
+(goto-char beg)
+(catch :found
+  (pcase-dolist (`(,sep ,regexp) sep-regexp)
+(save-excursion
+  (unless (re-search-forward regexp end t)
+(throw :found sep
+  nil)
+
 ;;;###autoload
 (defun org-table-convert-region (beg0 end0  separator)
   "Convert region to a table.
@@ -853,10 +886,7 @@ following values:
 integer  When a number, use that many spaces, or a TAB, as field separator
 regexp   When a regular expression, use it to match the separator
 nil  When nil, the command tries to be smart and figure out the
- separator in the following way:
- - when each line contains a TAB, assume TAB-separated material
- - when each line contains a comma, assume CSV material
- - else, assume one or more SPACE characters as separator."
+ separator using `org-table-guess-seperator'."
   (interactive "r\nP")
   (let* ((beg (min beg0 end0))
 	 (end (max beg0 end0))
@@ -873,13 +903,10 @@ nil  When nil, the command tries to be smart and figure out the
   (if (bolp) (backward-char 1) (end-of-line 1))
   (setq end (point-marker))
   ;; Get the right field separator
-  (unless separator
-	(goto-char beg)
-	(setq separator
-	  (cond
-	   ((not (re-search-forward "^[^\n\t]+$" end t)) '(16))
-	   ((not (re-search-forward "^[^\n,]+$" end t)) '(4))
-	   (t 1
+  (when (and (not separator)
+ (not (setq separator
+(org-table-guess-separator beg end
+(user-error "Failed to guess separator"))
   (goto-char beg)
   (if (equal separator '(4))
 	  (while (< (point) end)
-- 
2.31.1

>From 328d59dd2d2fc797e5819f43b40c96ffa45cd62f Mon Sep 17 00:00:00 2001
From: Utkarsh Singh 
Date: Sat, 15 May 2021 16:32:32 +0530
Subject: [PATCH 2/2] org-table-import: add file prompt

---
 lisp/org-table.el | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lisp/org-table.el b/lisp/org-table.el
index d3242da..20144c9 100644
--- a/lisp/org-table.el
+++ b/lisp/org-table.el
@@ -954,7 +954,8 @@ lines.  It can have the following values:
 - (64)Prompt for a regular expression as field separator.
 - integer When a number, use that many spaces, or a TAB, as field separator.
 - regexp  When a regular expression, use it to match the separator."
-  (interactive "f\nP")
+  (interactive (list (read-file-name "Import file: ")
+ (prefix-numeric-value curre

Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-05-15 Thread Utkarsh Singh
On 2021-05-15, 11:13 +0200, Bastien  wrote:

> Hi Utkarsh and Maxim,
>
> Utkarsh Singh  writes:
>
>> On 2021-05-13, 00:08 +0700, Maxim Nikulin  wrote:
>>
>>> Comma is decimal separator for es_ES, de_DE, ru_RU, etc. The point is
>>> that order in which separator candidates are tried should depend on
>>> active locale.
>>>
>>> I do not insist that interactive preview or smarter approach to guess
>>> separator have to be implemented. 
>
> I don't think the interactive preview would be that useful.  For the
> smarter approach, I think it will always be somewhat fragile, I don't
> think it is worth the addional complexity in Org's code.

Ok.

For now can you review the patches I proposed earlier in this thread?

Thank you,
Utkarsh Singh

-- 
Utkarsh Singh
http://utkarshsingh.xyz



Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-05-14 Thread Utkarsh Singh
On 2021-05-13, 00:08 +0700, Maxim Nikulin  wrote:

> Comma is decimal separator for es_ES, de_DE, ru_RU, etc. The point is
> that order in which separator candidates are tried should depend on
> active locale.
>
> I do not insist that interactive preview or smarter approach to guess
> separator have to be implemented. Feel free to disregard my
> comments. I am just not sure whether you are aware of limitations for
> noticeable part of users.

I am willing to work on this problem but before this can you identify
any other locale with similar problem or a resource from where I can
information's about locale.

I would also like to hear from the maintainers on what the think about
the issue and are they willing for an workaround?

-- 
Utkarsh Singh
http://utkarshsingh.xyz



Re: bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-05-10 Thread Utkarsh Singh
Hi Maxim,

Sorry for the late reply!

On 2021-04-28, 23:38 +0700, Maxim Nikulin  wrote:

> On 28/04/2021 15:37, Utkarsh Singh wrote:
>> +List of preferred separator (in order of preference):
>> +comma, TAB, semicolon, colon or SPACE.
> I will hardly be using this feature heavily, so I do not insist that
> the following must be taken into account. Just some considerations...
>
> There are locales where comma is used as decimal separator 23,5
> (e.g. ru_RU.UTF-8). Office software and applications oriented to
> office users often use semicolon as field separator. There are still
> may be plenty of numbers with fractional part, so with commas. Likely
> ";" should be tried at first for such locales.
>
> However the same user may have enough CSV files that are really "comma
> separated", e.g. results of numerical simulation where localization is
> intentionally ignored, data obtained from some equipment, etc.
>
> Some files (e.g. downloaded bank statements) may be in legacy 8-bit
> encoding instead of UTF-8.
>
> As a result, sometimes the only convenient way is to try various
> options with interactive preview.

What do we mean by interactive preview?  Does this mean that we should
present a user with a list of possible delimiters using minibuffer?

If you are suggesting this then rather trying 'various options' based on
locals you can use =org-table-import= function directly as it accepts
separator as it's argument.

For ex (please review my usage of alist):

#+begin_src elisp
(defvar my-separator-alist '(("comma" . ",")
 ("tab" . "\t")
 ("semicolon" . ";")
 ("colon" . ":")
 ("space" . " ")))

(defun my/table-import (file separator)
  (interactive (list (read-file-name "Import CSV file: " nil nil t)
     (cdr (assoc (completing-read "Separator: " 
my-separator-alist)
 my-separator-alist
  (org-table-import file separator))
#+end_src

-- 
Utkarsh Singh
http://utkarshsingh.xyz



Re: bug#47820: [PATCH] org-table-import: Give option to import interactively even if filename doesn't match

2021-05-01 Thread Utkarsh Singh
On 2021-05-01, 18:06 +0200, Bastien  wrote:

> I let you and Nicolas move forward on separators guessing, because so
> far I'm not really convinved Org should try to be too clever here.

Yes! Nicolas has also convinced me on that front.

There was some misunderstanding on my side as I was considering Elisp
(my first Lisp) to be a general purpose language and Org tables it's CSV
library.  Even though due to the very nature of Lisp most of this
possible but Org is not a right place for this.

-- 
Utkarsh Singh
http://utkarshsingh.xyz



Re: bug#47820: [PATCH] org-table-import: Give option to import interactively even if filename doesn't match

2021-05-01 Thread Utkarsh Singh
Hi Bastien,

On 2021-05-01, 10:51 +0200, Bastien  wrote:

> Hi Utkarsh,
>
> Utkarsh Singh  writes:
>
>> Can we give small flexibility in choosing the filename interactively for
>> `org-table-import'?  Currently org-table-import will just throw an error
>> when file name doesn't have .txt, .csv or .tsv as extension.
>
> Applied in master as commit 7c99d1555, thanks.
>
> Please see the commit changelog for future patches:
> https://code.orgmode.org/bzg/org-mode/commit/7c99d1555
>
> Also https://orgmode.org/worg/org-contribute.html#commit-messages give
> more detailed directions.

Actually, I proposed a revised patch in the following tread (which also
include more patches building up on this patch):
https://orgmode.org/list/87zgxpwqa7@gmail.com/T/#m38de818718a71f670fbc21b7abb8c789ef485181

Btw, thank you for this and I will try to keep original tread updated
from next time.

-- 
Utkarsh Singh
http://utkarshsingh.xyz



bug#47885: [PATCH] org-table-import: Make it more smarter for interactive use

2021-04-28 Thread Utkarsh Singh
Hi,

On 2021-04-27, 22:21 +0200, Nicolas Goaziou  wrote:

>> + When using org-table-import interactively if we failed to guess
>> separator then we will be left with a user-error message and an
>> 'unconverted table'.  We can make use of 'temp-buffer' to import our
>> file after successfully conversion.
>
> I'm not sure to understand what you mean.

Note: I will advice you to apply patch no. 2 before trying out the
following example.

1. Download the attached CSV file.  We can call this example.csv

2. Go to *scratch* buffer.

3. Use 'M-x org-table-import' to import example.csv as org-table.

You will see even thought org-table-guess-separator failed in guessing
separator we are still left with unconverted region added to our buffer.

>> + Conversion part of org-table-convert-region make a distinction between
>> '(4) (comma separator) and rest of the separator we should either string
>> version of comma as AND condition or rewrite to simplify it.
>
> Ditto. But it can be the object of another patch. Let's concentrate on
> `org-table-guess-separator' first.
>
>> I am willing to do these possible changes but currently waiting for your
>> review for org-table-guess-separator as there can be more serious bugs
>> lurking around on my code which I am considering base for these
>> changes.
>
> You should definitely write tests for this function. Here's a start:
>
> (ert-deftest test-org-table/guess-separator ()
>   "Test `test-org-table/guess-separator'."
>   ;; Test space separator.
>   (should
>(equal " "
>   (org-test-with-temp-text "a b\nc d"
> (org-table-guess-separator (point-min) (point-max)
>   (should
>(equal " "
>   (org-test-with-temp-text "a b\nc d"
> (org-table-guess-separator (point-min) (point-max)
>   ;; Test "inverted" region.
>   (should
>(equal " "
>   (org-test-with-temp-text "a b\nc d"
> (org-table-guess-separator (point-max) (point-min)
>   ;; Do not error on empty region.
>   (should-not
>(org-test-with-temp-text ""
>  (org-table-guess-separator (point-max) (point-min
>   (should-not
>(org-test-with-temp-text "   \n"
>  (org-table-guess-separator (point-max) (point-min)
>

I will surely do more testing.

I would also like to simplify the condition for guessing SPACE as
separator due to following cases:

+ field1 'this is field2' 'this is field3' :: In this case we still have
SPACE inside quote (' in this case).

+ Since SPACE is our last valid separator I think searching for a line
which doesn't contains space is more than enough.

Required patch:

>From 6b112927de73c43edfd08254217808ebff42772a Mon Sep 17 00:00:00 2001
From: Utkarsh Singh 
Date: Wed, 28 Apr 2021 10:26:46 +0530
Subject: [PATCH 1/3] org-table.el (org-table-import): add yes-and-no prompt

Add a yes and no prompt for files which don't have .txt, .tsv OR .csv
as file extensions.
---
 lisp/org/org-table.el | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lisp/org/org-table.el b/lisp/org/org-table.el
index 0e93fb271f..e0b2be6892 100644
--- a/lisp/org/org-table.el
+++ b/lisp/org/org-table.el
@@ -938,7 +938,8 @@ org-table-import
 - regexp  When a regular expression, use it to match the separator."
   (interactive "f\nP")
   (when (and (called-interactively-p 'any)
-	 (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file)))
+	 (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file))
+ (not (yes-or-no-p "File does not have .txt, .tsv or .csv as extension.  Do you still want to continue? ")))
 (user-error "Cannot import such file"))
   (unless (bolp) (insert "\n"))
   (let ((beg (point))
-- 
2.31.1

>From 9bb017cfc8284075e04faf5496ed560ba48d5bbc Mon Sep 17 00:00:00 2001
From: Utkarsh Singh 
Date: Wed, 28 Apr 2021 10:42:32 +0530
Subject: [PATCH 2/3] org-table.el (org-table-convert-region): move out
 separator-guessing

1. Move separator guessing code to org-table-guess-separator (new
function).
2. Add semicolon, colon and SPACE to the list of know separator
(separator which we can guess).
---
 lisp/org/org-table.el | 49 +--
 1 file changed, 38 insertions(+), 11 deletions(-)

diff --git a/lisp/org/org-table.el b/lisp/org/org-table.el
index e0b2be6892..295f7a9b90 100644
--- a/lisp/org/org-table.el
+++ b/lisp/org/org-table.el
@@ -846,6 +846,39 @@ org-table-create
   (goto-char pos))
 (org-table-align)))
 
+(defun org-table-guess-separator (beg0 end

Re: [PATCH] org-table-import: Make it more smarter for interactive use

2021-04-22 Thread Utkarsh Singh
 beg)
   (if (equal separator '(4))
-	  (while (< (point) end)
+  (while (< (point) end)
 	;; parse the csv stuff
 	(cond
 	 ((looking-at "^") (insert "| "))
@@ -905,7 +936,7 @@ org-table-convert-region
 	(setq re (cond
 		  ((equal separator '(4)) "^\\|\"?[ \t]*,[ \t]*\"?")
 		  ((equal separator '(16)) "^\\|\t")
-		  ((integerp separator)
+	  ((integerp separator)
 		   (if (< separator 1)
 		   (user-error "Number of spaces in separator must be >= 1")
 		 (format "^ *\\| *\t *\\| \\{%d,\\}" separator)))
@@ -921,12 +952,8 @@ org-table-convert-region
 (defun org-table-import (file separator)
   "Import FILE as a table.
 
-The command tries to be smart and figure out the separator in the
-following way:
-
-- when each line contains a TAB, assume TAB-separated material;
-- when each line contains a comma, assume CSV material;
-- else, assume one or more SPACE characters as separator.
+The command tries to be smart and figure out the separator using
+`org-table-guess-seperator'.
 
 When non-nil, SEPARATOR specifies the field separator in the
 lines.  It can have the following values:
@@ -938,7 +965,8 @@ org-table-import
 - regexp  When a regular expression, use it to match the separator."
   (interactive "f\nP")
   (when (and (called-interactively-p 'any)
-	 (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file)))
+	 (not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file))
+ (not (yes-or-no-p "File does not havs .txt .txt .csv as extension.  Do you still want to continue? ")))
 (user-error "Cannot import such file"))
   (unless (bolp) (insert "\n"))
   (let ((beg (point))

-- 
Utkarsh Singh
http://utkarshsingh.xyz


Re: [PATCH] org-table-import: Make it more smarter for interactive use

2021-04-20 Thread Utkarsh Singh
Hi,

On 2021-04-20, 15:40 +0200, Nicolas Goaziou  wrote:

> For the problem we're trying to solve, this sounds like over-engineering
> to me. Do we want so badly to guess a separator?

Earlier I took is as an assignment to learn Elisp but now I don't think
we should increase complexity this much.

> Thinking again about it, this needs extra care, as end0 might end up on
> an empty line. You tried to avoid this in your first function, but
> I think this was not sufficient either. Actually, beg0 could also start
> on an empty line.
>
> This needs to be tested extensively, but as a first approximation,
> I think `beg' needs to be defined as:
>
>   (save-excursion
> (goto-char (min beg0 end0))
> (skip-chars-forward " \t\n")
> (if (eobp) (point) (line-beginning-position)))
>
> and `end' as
>
>   (save-excursion
> (goto-char (max beg end0))
> (skip-chars-backward " \t\n" beg)
> (if (= beg (point)) (point) (line-end-position)))
>
> Then you need to bail out if beg = end.
>
>>   (sep-rexp '((","  "^[^\n,]+$")
>
> sep-rexp -> sep-regexp
>
>>   ("\t" "^[^\n\t]+$")
>>   (";"  "^[^\n;]+$")
>>   (":"  "^[^\n:]+$")
>>   (" "  "^\\([^'\"][^\n\s][^'\"]\\)+$")))
>
> At this point, I suggest to use `rx' macro instead.
>
> I suggest this (yes, I like pattern-matching, `car' and `cdr' are so
> 80's) instead:
>
>   (save-excursion
> (goto-char beg)
> (catch :found
>   (pcase-dolist (`(,sep ,regexp) sep-regexp)
> (save-excursion
>   (unless (re-search-forward regexp end t)
>   (throw :found sep
> nil))
>

Thanks! I was not aware of pcase-dolist function.

Function after doing the necessary changes:

(defun org-table-guess-separator (beg0 end0)
  "Guess separator for `org-table-convert-region' for region BEG0 to END0.

List of preferred separator:
comma, TAB, semicolon, colon or SPACE.

If region contains a line which doesn't contain the required
separator then discard the separator and search again using next
separator."
  (let* ((beg (save-excursion
(goto-char (min beg0 end0))
(skip-chars-forward " \t\n")
(if (eobp) (point) (line-beginning-position
 (end (save-excursion
(goto-char (max beg end0))
(skip-chars-backward " \t\n" beg)
(if (= beg (point)) (point) (line-end-position
 (sep-regexp '((","  (rx bol (1+ (not (or ?\n ?,))) eol))
   ("\t" (rx bol (1+ (not (or ?\n ?\t))) eol))
   (";"  (rx bol (1+ (not (or ?\n ?\;))) eol))
   (":"  (rx bol (1+ (not (or ?\n ?:))) eol))
   (" "  (rx bol (1+ (not (or ?' ?\" ))
 (not (or ?\s ?\;))
 (not (or ?' ?\"))) eol
 sep)
(unless (= beg end)
  (save-excursion
(goto-char beg)
(catch :found
  (pcase-dolist (`(,sep ,regexp) sep-regexp)
(save-excursion
  (unless (re-search-forward (eval regexp) end t)
(throw :found sep
  nil)

> Again all this needs to extensively tested, as there are a lot of
> dangers lurking around.

Summary of things that still requires a review:

+ Setting boundary right

+ When using SPACE as separator is it sufficient to check for all for
all non quoted SPACE's?

-- 
Utkarsh Singh
http://utkarshsingh.xyz



Re: [PATCH] org-table-import: Make it more smarter for interactive use

2021-04-19 Thread Utkarsh Singh
ts its goal is the delimiter
For performance reasons, the data is evaluated in chunks, so it can
try and evaluate the smallest portion of the data possible, evaluating
additional chunks as necessary.
"""

I tried to do similar in Elisp but currently facing some issues due to
my inexperience in functional programming.  Also moving the 'guessing'
part out the function may lead to development of even better algorithm
than Python counterpart.

Modified version of concerned function:

(defun org-table-guess-separator (beg0 end0)
  "Guess separator for `org-table-convert-region' for region BEG0 to END0.

List of preferred separator:
comma, TAB, semicolon, colon or SPACE.

If region contains a line which doesn't contain the required
separator then discard the separator and search again using next
separator."
  (let* ((beg (save-excursion
(goto-char (min beg0 end0))
(line-beginning-position)))
 (end (save-excursion
(goto-char (max beg0 end0))
(line-end-position)))
 (sep-rexp '((","  "^[^\n,]+$")
 ("\t" "^[^\n\t]+$")
 (";"  "^[^\n;]+$")
 (":"  "^[^\n:]+$")
 (" "  "^\\([^'\"][^\n\s][^'\"]\\)+$")))
 (tmp (car sep-rexp))
 sep)
(save-excursion
  (goto-char beg)
  (while (and (not sep)
  (if (save-excursion
(not (re-search-forward (nth 1 tmp) end t)))
  (setq sep (nth 0 tmp))
(setq sep-rexp (cdr sep-rexp))
(setq tmp (car sep-rexp)
sep)))

Version without using iteration:

(defun org-table-guess-separator (beg0 end0)
  "Guess separator for `org-table-convert-region' for region BEG0 to END0.

List of preferred separator:
COMMA, TAB, SEMICOLON, COLON or SPACE.

If region contains a line which doesn't contain the required
separator then discard the separator and search again using next
separator."
  (let ((beg (save-excursion
       (goto-char (min beg0 end0))
   (line-beginning-position)))
(end (save-excursion
   (goto-char (max beg0 end0))
   (line-end-position
(save-excursion
  (goto-char beg)
  (cond
   ((save-excursion (not (re-search-forward "^[^\n,]+$" end t))) ",")
   ((save-excursion (not (re-search-forward "^[^\n\t]+$" end t))) "\t")
   ((save-excursion (not (re-search-forward "^[^\n;]+$" end t))) ";")
   ((save-excursion (not (re-search-forward "^[^\n:]+$" end t))) ":")
   ((save-excursion (not (re-search-forward "^\\([^'\"][^\n\s][^'\"]\\)+$" 
end t))) " ")
   (t nil)

--
Utkarsh Singh
http://utkarshsingh.xyz



[PATCH] org-table-import: Make it more smarter for interactive use

2021-04-18 Thread Utkarsh Singh
 (interactive "r\nP")
   (let* ((beg (min beg0 end0))
 (end (max beg0 end0))
@@ -881,14 +907,9 @@ org-table-convert-region
   (goto-char end)
   (if (bolp) (backward-char 1) (end-of-line 1))
   (setq end (point-marker))
-  ;; Get the right field separator
-  (unless separator
-   (goto-char beg)
-   (setq separator
- (cond
-  ((not (re-search-forward "^[^\n\t]+$" end t)) '(16))
-  ((not (re-search-forward "^[^\n,]+$" end t)) '(4))
-  (t 1
+  (if (and (not separator)
+   (not (setq separator (org-table-guess-separator beg end
+  (error "Unable to guess suitable separator."))
   (goto-char beg)
   (if (equal separator '(4))
  (while (< (point) end)
@@ -921,12 +942,8 @@ org-table-convert-region
 (defun org-table-import (file separator)
   "Import FILE as a table.
 
-The command tries to be smart and figure out the separator in the
-following way:
-
-- when each line contains a TAB, assume TAB-separated material;
-- when each line contains a comma, assume CSV material;
-- else, assume one or more SPACE characters as separator.
+The command tries to be smart and figure out the separator using
+`org-table-guess-seperator'.
 
 When non-nil, SEPARATOR specifies the field separator in the
 lines.  It can have the following values:

-- 
Utkarsh Singh
http://utkarshsingh.xyz



[PATCH] org-table-import: Give option to import interactively even if filename doesn't match

2021-04-16 Thread Utkarsh Singh
Hi,

Can we give small flexibility in choosing the filename interactively for
`org-table-import'?  Currently org-table-import will just throw an error
when file name doesn't have .txt, .csv or .tsv as extension.

This patch tries to add a simple yes-and-no to let user choose if they
want to continue importing or not.

diff --git a/lisp/org/org-table.el b/lisp/org/org-table.el
index 0e93fb271f..ab66859d6a 100644
--- a/lisp/org/org-table.el
+++ b/lisp/org/org-table.el
@@ -938,7 +938,8 @@ org-table-import
 - regexp  When a regular expression, use it to match the separator."
   (interactive "f\nP")
   (when (and (called-interactively-p 'any)
-(not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file)))
+(not (string-match-p (rx "." (or "txt" "tsv" "csv") eos) file))
+ (not (yes-or-no-p "File doesn't have .txt, .tsv or .csv as 
extension.  Do you still want to continue? ")))
 (user-error "Cannot import such file"))
   (unless (bolp) (insert "\n"))
   (let ((beg (point))

-- 
Utkarsh Singh
http://utkarshsingh.xyz