[issue26331] Tokenizer: allow underscores for grouping in numeric literals

2016-08-15 Thread Georg Brandl

Georg Brandl added the comment:

@Serhiy/anyone: can I get another review, so that we can commit this in time 
for beta? Thanks!

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



2016-05-22 Thread Georg Brandl

Georg Brandl added the comment:

Thanks for the detailed review, Serhiy! Next try incoming.

--
Added file: http://bugs.python.org/file42939/numeric_underscores_final_v8.diff

2016-05-21 Thread Eric V. Smith

Eric V. Smith added the comment:

I've created issue 27080 to track the formatting part of this.

--

2016-05-18 Thread Stefan Krah

Stefan Krah added the comment:

Thanks, Georg! The decimal parts look good to me. I understand that
people wonder about the relaxed rules for Decimal -- we have discussed
that here:

https://mail.python.org/pipermail/python-dev/2016-March/143557.html


I don't think that it will be a problem in practice.

--

2016-05-18 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Added comments on Rietveld.

--

2016-05-17 Thread Georg Brandl

Georg Brandl added the comment:

Thanks Eric!

Serhiy, do you want to do a review? The v6/v7 patches are based on your 
"strict" patch with the constructor changes adapted from v4.

New version v7 addresses the review comments from Stefan and Martin.

--
Added file: http://bugs.python.org/file42887/numeric_underscores_final_v7.diff

2016-05-15 Thread Eric V. Smith

Eric V. Smith added the comment:

Yes, I'll read PEP 515 and work on the formatting.

--

2016-05-15 Thread Georg Brandl

Georg Brandl added the comment:

Note: the changes for format()ting ("_" as thousands separator) are still 
missing. Eric, would you consider doing this part?
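For reference, here is a small illustration of the formatting half as it behaves in Python 3.6 and later, where the format specification mini-language accepts `_` as a grouping option. It assumes a 3.6+ interpreter; the numbers are arbitrary examples.

```python
# Requires Python 3.6+ (PEP 515 added "_" as a grouping option in format specs).
n = 1234567

print(format(n, "_"))            # 1_234_567
print(f"{n:_d}")                 # 1_234_567
print(f"{1234567.891:_.2f}")     # 1_234_567.89

# For the b, o, x and X presentation types, "_" groups every 4 digits.
print(format(0xDEADBEEF, "_x"))  # dead_beef
print(format(255, "_b"))         # 1111_1111
```

Grouping by four for binary, octal and hex mirrors how such literals are usually written, for example `0xdead_beef`.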

--
nosy: +eric.smith

2016-05-14 Thread Georg Brandl

Changes by Georg Brandl:

Removed file: http://bugs.python.org/file41892/numeric_underscores_v2.diff

2016-05-14 Thread Georg Brandl

Changes by Georg Brandl:

Removed file: http://bugs.python.org/file42852/numeric_underscores_final_v5.diff

2016-05-14 Thread Georg Brandl

Changes by Georg Brandl:

Removed file: http://bugs.python.org/file41888/numeric_underscores.diff

2016-05-14 Thread Georg Brandl

Changes by Georg Brandl:

Removed file: http://bugs.python.org/file41894/numeric_underscores_v3_full.diff

2016-05-14 Thread Georg Brandl

Changes by Georg Brandl:

Added file: http://bugs.python.org/file42854/numeric_underscores_final_v6.diff

2016-05-14 Thread Georg Brandl

Georg Brandl added the comment:

New patch; implements the accepted version of the PEP. I added the additional 
tests, thanks Stefan!

--
Added file: http://bugs.python.org/file42852/numeric_underscores_final_v5.diff

2016-03-19 Thread Stefan Behnel

Stefan Behnel added the comment:

Ah, thanks. Here's my implementation then:

https://github.com/cython/cython/pull/499/files

It seems that tests for valid complex literals are missing. I've added these to 
the end of the list:

'1_00_00.5j',
'1_00_00.5e5',
'1_00_00j',
'1_00_00e5_1',
'1e1_0',
'.1_4',
'.1_4e1',
'.1_4j',
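A quick way to check literals like these against an interpreter that implements the accepted PEP (Python 3.6 or later is assumed) is to evaluate each one with `ast.literal_eval` and confirm that removing the underscores does not change the value:

```python
import ast

# The literals listed above; all of them should be accepted under PEP 515.
valid = ['1_00_00.5j', '1_00_00.5e5', '1_00_00j', '1_00_00e5_1',
         '1e1_0', '.1_4', '.1_4e1', '.1_4j']

for text in valid:
    value = ast.literal_eval(text)  # raises SyntaxError if the literal is rejected
    # Underscores are purely visual, so the value must match the plain spelling.
    assert value == ast.literal_eval(text.replace("_", "")), text
    print(f"{text!r:16} -> {value!r}")
```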

--

2016-03-19 Thread Georg Brandl

Georg Brandl added the comment:

The last patch isn't up to date with the PEP; Serhiy's patch is the closest one.

--

2016-03-19 Thread Stefan Behnel

Stefan Behnel added the comment:

Nice one. While reimplementing it for Cython, I noticed that the grammar 
described in the PEP isn't exactly as it's implemented, though. The grammar says

digit (["_"] digit)*

whereas the latest patch (v4) says

`digit` (`digit` | "_")*

and also implements it that way. The former doesn't allow underscores at the 
end of a literal.

And the regexes in tokenize.py seem happy to accept "0x___", for example. Is 
that intended?
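To make the difference between the two productions concrete, here is a sketch that transcribes each into a regular expression. These regexes are illustrative translations of the grammar lines quoted above, not the ones in tokenize.py. Note that the PEP production also rejects consecutive underscores, not only trailing ones.

```python
import re

# PEP text: digit (["_"] digit)*   -> every underscore must be followed by a digit
PEP_DIGITPART = re.compile(r"[0-9](?:_?[0-9])*")
# v4 patch: digit (digit | "_")*   -> any mix of digits and underscores after the first digit
V4_DIGITPART = re.compile(r"[0-9](?:[0-9]|_)*")

for text in ["1_000", "1__000", "1000_"]:
    pep_ok = bool(PEP_DIGITPART.fullmatch(text))
    v4_ok = bool(V4_DIGITPART.fullmatch(text))
    print(f"{text:8} PEP grammar: {pep_ok}  v4 patch: {v4_ok}")
```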

--
nosy: +scoder

2016-02-13 Thread Georg Brandl

Georg Brandl added the comment:

Raymond, you've also worked on Decimal - do you have an opinion on allowing 
underscores in Decimal(string) conversions?

--
nosy: +rhettinger

2016-02-13 Thread Georg Brandl

Georg Brandl added the comment:

Hm. On the one hand there is a spec, so it can be argued that underscores don't 
belong to Decimal.

On the other hand, if we get Decimal literals at one point, there will be a 
strong argument for allowing underscores in them as in all other number 
literals.

Although supporting them in strings can also be added at that time.

--

2016-02-13 Thread Stefan Krah

Stefan Krah added the comment:

I still wonder about the complexity of all this for decimal. We now have two 
grammars on top of each other, this being the actual one for decimal:

  http://speleotrove.com/decimal/daconvs.html


For string conversions I'd prefer a lax way (similar to OCaml) that would 
somehow be specified in terms of preprocessing, same as the leading/trailing 
whitespace removal. Short of "ignore all underscores" it isn't easy though.
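The "ignore all underscores" option can be pictured as a preprocessing step layered in front of the existing decimal string grammar. The helper below (`lax_decimal`) is hypothetical and only illustrates that idea; it is not an API of the decimal module.

```python
from decimal import Decimal

def lax_decimal(text: str) -> Decimal:
    """Hypothetical lax conversion: strip surrounding whitespace, drop every
    underscore, then parse the remainder with the normal Decimal grammar."""
    return Decimal(text.strip().replace("_", ""))

print(lax_decimal(" 1_000.5 "))          # 1000.5
print(lax_decimal("__12.___e___101_"))   # the OCaml-style example, parsed as "12.e101"
```

The appeal is that the specification stays trivial; the cost is that clearly malformed inputs like the second one are silently accepted.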

--

2016-02-13 Thread Stefan Krah

Stefan Krah added the comment:

> Georg Brandl added the comment:
> 
> Thanks, I hadn't looked at cdecimal yet - I was planning to ask you to do the 
> necessary changes there :)

Oh, well. :)

> But there are a few versions of this (e.g. converting unicode digits to 
> ASCII) scattered throughout the codebase, it would make sense to consolidate 
> on this occasion.

Yes, actually I have to look at the _decimal version again, it contains
some optimizations that may only work for _decimal:

  https://hg.python.org/cpython/file/default/Modules/_decimal/_decimal.c#l1943

I *did* optimize it for speed at the time, I hope general functions won't be
slower.

--

2016-02-13 Thread Georg Brandl

Georg Brandl added the comment:

Thanks, I hadn't looked at cdecimal yet - I was planning to ask you to do the 
necessary changes there :)

But there are a few versions of this (e.g. converting unicode digits to ASCII) 
scattered throughout the codebase, it would make sense to consolidate on this 
occasion.

--

2016-02-13 Thread Stefan Krah

Stefan Krah added the comment:

Correction: The explanation of the functions should be reversed.

--

2016-02-13 Thread Stefan Krah

Stefan Krah added the comment:

If the string conversions stay, may I suggest two functions:

  1) PyUnicode_NumericAsAscii()
  2) PyUnicode_NumericAsAsciiWS()

The first one eliminates only underscores, the second one both
underscores and leading/trailing whitespace.

Decimal must support both:

  https://hg.python.org/cpython/file/default/Modules/_decimal/_decimal.c#l1890

--

2016-02-13 Thread Georg Brandl

Georg Brandl added the comment:

It's mostly for consistency. For example, ``int(x, 0)`` is defined by the docs 
as "interpret x as in a literal".  Other bases have special cases as well, e.g. 
"0x" is accepted by base 16.

In the current version of the conversions, the string is scanned for "_" before 
doing the more expensive allocation+copy.
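The consistency argument can be seen directly in how `int()` treats prefixes and base 0. The following assumes Python 3.6 or later, where the string conversions follow the same underscore rules as literals:

```python
# An explicit base accepts the matching prefix, just as a literal would.
print(int("0x1f", 16))    # 31

# Base 0 means "interpret the string as an integer literal", so
# prefixes and PEP 515 underscores are both honored...
print(int("1_000", 0))    # 1000
print(int("0o1_7", 0))    # 15

# ...and so is the literal rule that forbids leading zeros in decimals.
try:
    int("010", 0)
except ValueError as exc:
    print("rejected:", exc)
```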

--

2016-02-13 Thread Stefan Krah

Stefan Krah added the comment:

I like the feature for literals, but I'm not sure about conversions from 
string. It slows down the conversion for (IMO) a very small benefit.

Other languages allow it, but I've never attempted to use the feature:

$ ocaml
OCaml version 4.02.1

# float_of_string "__12.___e___101_";;
- : float = 1.2e+102
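For contrast, a sketch of how `float()` behaves under the strict rules that were eventually adopted (Python 3.6 or later assumed): underscores are accepted only between digits, so the OCaml-style string is rejected.

```python
# Underscores between digits, including inside the exponent, are accepted.
print(float("1_2.5e1_0"))   # 125000000000.0

# Leading, trailing, doubled, or dot-adjacent underscores are not.
try:
    float("__12.___e___101_")
except ValueError as exc:
    print("rejected:", exc)
```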

--
nosy: +skrah

2016-02-11 Thread Mark Dickinson

Changes by Mark Dickinson:

--
nosy: +mark.dickinson

2016-02-11 Thread Georg Brandl

Georg Brandl added the comment:

New patch with minimal doc updates.

--
Added file: http://bugs.python.org/file41896/numeric_underscores_v4_full.diff

2016-02-11 Thread Georg Brandl

Georg Brandl added the comment:

This patch includes int(), float(), complex() operations, as well as _pydecimal.

--
Added file: http://bugs.python.org/file41894/numeric_underscores_v3_full.diff

2016-02-11 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Proposed patch implements strict underscore rules. The implementation is no 
more complex.

--
Added file: http://bugs.python.org/file41893/numeric_underscores_strict.patch

2016-02-11 Thread Georg Brandl

Georg Brandl added the comment:

New patch matching revision of PEP.

--
Added file: http://bugs.python.org/file41892/numeric_underscores_v2.diff

2016-02-10 Thread Petr Viktorin

Petr Viktorin added the comment:

Regarding the patch: if trailing underscores are not allowed,
`0 if 1_else 1` should be illegal.
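Under the accepted rules (Python 3.6 or later assumed), this is indeed how it turned out: the tokenizer rejects an underscore that is not followed by a digit, so `1_else` never reaches the parser as a number followed by a keyword. A minimal check:

```python
import ast

def parses(source: str) -> bool:
    """True if `source` parses as a single Python expression."""
    try:
        ast.parse(source, mode="eval")
    except SyntaxError:
        return False
    return True

print(parses("0 if 1 else 1"))    # True
print(parses("0 if 1_else 1"))    # False: "1_" is not a valid literal
```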

--
nosy: +encukou

2016-02-10 Thread Georg Brandl

Georg Brandl added the comment:

PEP 515 is written up and posted to python-dev.

--

2016-02-10 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

C++14 uses the same strict rule as Ada, but uses apostrophes instead of 
underscores. [1]

Thus there are two groups of languages, implementing strict or lenient rules:

* Strict: Ada, C++, Java, C#, Ruby, Julia, Perl (as documented), Swift (textual 
description).
* Lenient: D, Rust, Perl (actually), Swift (grammar productions).

[1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3499.html

--

2016-02-10 Thread Yury Selivanov

Yury Selivanov added the comment:

> I prefer simpler and more strict rule:
> * Underscores are allowed only between digits in numeric literals.

+1.  But in any case we need a PEP for this change.

--
nosy: +yselivanov

2016-02-10 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

* Java: only between digits. [1]
* Julia: only between digits. [2] (not well specified)
* C# 7.0 (proposal): only between digits, but adjacent underscores allowed. [3]
* Ada: only between digits. [4] (strong but very simple rules)
* D: very much like proposed patch, but trailing underscores allowed. [5]
* Perl 5: only between digits as documented (23__500 is not legal), but 
actually more lenient. [6]

[1] https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html
[2] http://docs.julialang.org/en/release-0.4/manual/integers-and-floating-point-numbers/
[3] https://github.com/dotnet/roslyn/issues/216
[4] http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html#2.4
[5] http://dlang.org/spec/lex.html#integerliteral
[6] http://perldoc.perl.org/perldata.html#Scalar-value-constructors

--

2016-02-10 Thread Georg Brandl

Georg Brandl added the comment:

It sure is more strict, but I don't think it's simpler (and it's definitely not 
simpler to implement).

(Also 1_j is pretty nice, I wouldn't want to lose that.)

We can also check what other languages do.

* Rust: very much like this, but trailing underscores allowed.
* Perl 5: same as here, but underscores after dot and trailing underscores 
allowed.
* Ruby: only between digits.

* Swift: the grammar productions say it's basically the same as Rust.  The 
textual description says "between digits".

--

2016-02-10 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I prefer simpler and more strict rule:

* Underscores are allowed only between digits in numeric literals.

Thus 1__2, 12_, 1_.2, 1_e2, 1e_2, 1_j, 0x_12 are not allowed.

It is easier to make the rule more lenient later, if needed.
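For comparison with what eventually shipped, the examples above can be run through the compiler of a Python 3.6+ interpreter. All of them are rejected except `0x_12`: the accepted PEP 515 grammar allows a single underscore directly after a base prefix, so that one case was relaxed relative to the rule proposed here.

```python
def accepted(source: str) -> bool:
    """True if this interpreter's tokenizer and parser accept `source`."""
    try:
        compile(source, "<literal>", "eval")
    except SyntaxError:
        return False
    return True

# The examples the strict rule above would forbid.
for text in ["1__2", "12_", "1_.2", "1_e2", "1e_2", "1_j", "0x_12"]:
    print(f"{text:6} {accepted(text)}")
```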

--
nosy: +serhiy.storchaka

2016-02-10 Thread Ethan Furman

Changes by Ethan Furman:

--
nosy: +ethan.furman

2016-02-10 Thread Georg Brandl

New submission from Georg Brandl:

As discussed on python-ideas: 
https://mail.python.org/pipermail/python-ideas/2016-February/038354.html

The rules are: 
Underscores are allowed anywhere in numeric literals, except:

* at the beginning of a literal (obviously)
* at the end of a literal
* directly after a dot (since the underscore could start an attribute name)
* directly after a sign in exponents (for consistency with leading signs)
* in the middle of the "0x", "0o" or "0b" base specifiers

Currently this only touches literals, not the inputs of int() or float().  
Whether they should accept this syntax is debatable (I'd vote no).

Otherwise missing: doc updates.

Review question: is PyMem_RawStrdup/RawFree the right API to use here?

--
components: Interpreter Core
files: numeric_underscores.diff
keywords: patch
messages: 260026
nosy: georg.brandl
priority: normal
severity: normal
stage: patch review
status: open
title: Tokenizer: allow underscores for grouping in numeric literals
type: enhancement
versions: Python 3.6
Added file: http://bugs.python.org/file41888/numeric_underscores.diff
