Re: Extract data

2018-05-14 Thread Steven D'Aprano
On Tue, 15 May 2018 11:53:47 +0530, mahesh d wrote:

> Hii.
> 
> I have a folder. In that folder there are some .txt files and some .msg
> files. My requirement is to read the contents of those files and extract
> the data in them.

The answer to this question is the same as the answer to your previous 
question "Extract" sent earlier.


Use the glob module, or the os.listdir function, to get a list of files 
matching the extension you want, then read each file.

Do you know how to open and read the contents of a file?

with open("filename.txt", "r") as f:
    data = f.read()


What you do with the data is then up to you.
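Putting the two steps together, here is a minimal sketch that globs one extension in a folder and reads each matching file. The function name `read_files`, the UTF-8 encoding and the choice to return a dict are assumptions for illustration. The `.msg` files may not be plain text at all, so this sketch is only meant for text files such as `.txt`.

```python
import glob
import os


def read_files(folder, extension=".txt", encoding="utf-8"):
    """Return {path: text} for every file in `folder` ending in `extension`.

    Hypothetical helper: assumes the matching files are text in `encoding`.
    """
    # glob.escape protects folder names that contain *, ? or [ characters.
    pattern = os.path.join(glob.escape(folder), "*" + extension)
    contents = {}
    for path in sorted(glob.glob(pattern)):
        # `as f` binds the open file object so it can be read.
        with open(path, "r", encoding=encoding) as f:
            contents[path] = f.read()
    return contents
```

For example, `read_files("some_folder")` would return the text of every `.txt` file directly inside `some_folder`, keyed by path; what to extract from each string is still up to you.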



-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Extract data

2018-05-14 Thread mahesh d
Hii.

I have a folder. In that folder there are some .txt files and some .msg files.
My requirement is to read the contents of those files and extract the data in them.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: spurious BadDrawable error when running test_tk

2018-05-14 Thread dieter
Matthias Kievernagel  writes:
> I changed some detail in the tkinter library,
> so I'm running the tk test like this:
>
> ./python -m test -u gui -v test_tk
>
> Approximately every 2 or 3 runs I get a BadDrawable error
> from the X server, most of the time at the end after
> the last test finished successfully.
> As this also happens when I run the test on the unmodified CPython sources
> I suspect it is a shortcoming of my non-mainstream setup.
> I use ctwm with interactive window placement,
> so running the test involves a lot of clicks.
> Does anyone know if the errors are to be expected
> or should it work nonetheless?

Your error description suggests a race condition.

Like other (apparently) non-deterministic error conditions,
race conditions can be very hard to detect and diagnose. They may occur only
in very rare situations and in special contexts.

My feeling (!) is that "test_tk" has a flaw in the teardown of
the test setup which may in rare cases cause this race condition.
I would not be too worried about this.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Extract

2018-05-14 Thread Cameron Simpson

On 15May2018 07:26, mahesh d  wrote:

> I have a directory. In that folder .msg files . How can I extract those
> files.


You can get the filenames from the directory with the os.listdir function or 
with the glob.glob function. If you mean "extract the contents of those files" 
instead of just finding their names, you would need to know about the data 
format within those files, which you have not described.


See the Python docs here:

 https://docs.python.org/3/

and look up the "os" and "glob" modules for the functions mentioned above.
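As a small illustration of the os.listdir route, here is a sketch that only finds the `.msg` files; it does not read them, because their data format is unknown. The helper name `msg_file_paths` is hypothetical, and the case-insensitive match is an assumption about how the files might be named.

```python
import os


def msg_file_paths(folder):
    """Return sorted full paths of the .msg files directly inside `folder`."""
    paths = []
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        # Match .msg case-insensitively and skip directories that end in .msg.
        if name.lower().endswith(".msg") and os.path.isfile(path):
            paths.append(path)
    return sorted(paths)
```

Each returned path can then be opened once you know what kind of data the files contain.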

Cheers,
Cameron Simpson 
--
https://mail.python.org/mailman/listinfo/python-list


Extract

2018-05-14 Thread mahesh d
Hii
I have a directory. In that folder there are .msg files. How can I extract those
files?


Thanks & regards

Mahesh
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: seeking deeper (language theory) reason behind Python design choice

2018-05-14 Thread Steven D'Aprano
On Mon, 14 May 2018 18:20:13 -0500, Python wrote:

> I am hardly perfect.

Have you tried just wanting to be perfect more?


Look, we get it: it is possible to improve the quality of your code by 
paying attention to what you do, by proof-reading, testing, code reviews, 
warnings, linters, etc. We're not all doomed to be careless coders. I 
agree completely.

Also agree completely that assignment in expressions is sometimes useful.

Also agree that *with care and good management* it is possible to reduce 
the error rate from assignment-expressions to a manageable level -- even 
if assignment is spelled "=". Not to *zero*, but some non-zero manageable 
level.

But you miss the point that even if = versus == errors are picked up by 
code reviews or tests, they are still software bugs. Your *process* 
(testing and reviews) picked up the bug before it went into production, 
but *the bug still was made*.

A mere typo is not a bug if the compiler flags it before the code runs. 
It's just a typo.

So instead of congratulating yourself over how you never make the = 
versus == bug, you ought to be sheepishly realising how often you make 
it, but fortunately you have the processes in place to catch it before it 
reaches production.

Now remember that not every programmer works in large teams with pair 
programming, code reviews, test driven development, automatic buildbots 
to catch errors, etc.

Now remember that in 1991 when Guido made the decision to ban = as an 
expression, those concepts didn't even exist. There were no Python 
linters, and no reason to imagine that there ever would be. Guido didn't 
know that Python would become one of the top 10 most used languages. For 
all he knew, version 1.0 could be the final release.

By 1991 there had already been *decades* of experience with C proving 
that the "=" assignment syntax is dangerously confusable with == and a 
total bug magnet when allowed as expressions as well, so it was perfectly 
reasonable to ban it from the language.
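Python's grammar has no way to write `=` inside an `if` condition, so the C-style typo is rejected before the code ever runs. The helper below (a hypothetical name, built only on the built-in `compile()`) makes that visible by asking the compiler whether a snippet is refused:

```python
def rejects_as_syntax_error(source):
    """Return True if Python refuses to compile `source`."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return True
    return False


# In C, `if (x = 5)` compiles (compilers may warn) and silently assigns.
# In Python, `=` in a condition is a syntax error, so the typo cannot slip through.
print(rejects_as_syntax_error("if x = 5:\n    pass\n"))   # True
print(rejects_as_syntax_error("if x == 5:\n    pass\n"))  # False
```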

There's nothing you can do with assignment expressions that can't be done 
*almost* as easily with assignment statements. It's often a matter of mere 
personal preference: do I want to write this as a single line or two?
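To make the "single line or two" trade-off concrete, here is a sketch of the same loop written both ways. The statement form works in every Python 3 version; the expression form uses `:=`, the spelling PEP 572 eventually adopted for Python 3.8, so it needs 3.8 or later. The function names are illustrative only.

```python
import io


def read_chunks_statement(stream, size):
    """Collect fixed-size chunks using an ordinary assignment statement."""
    chunks = []
    while True:
        chunk = stream.read(size)   # assignment statement on its own line
        if not chunk:               # and the test on the next line
            break
        chunks.append(chunk)
    return chunks


def read_chunks_expression(stream, size):
    """The same loop with an assignment expression (Python 3.8+ only)."""
    chunks = []
    while chunk := stream.read(size):   # assign and test in one expression
        chunks.append(chunk)
    return chunks


print(read_chunks_statement(io.StringIO("abcdefg"), 3))   # ['abc', 'def', 'g']
print(read_chunks_expression(io.StringIO("abcdefg"), 3))  # ['abc', 'def', 'g']
```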

And as the discussions over PEP 572 prove, the choice about allowing 
assignment expressions is *not easy*. Not only is there the matter of 
whether or not to allow it, but what spelling to use, and what scope the 
assignment should operate in.

And if you think that last one is the most trivial, in fact with list 
comprehensions and generator expressions, it will probably end up being 
the most controversial of all the questions.

And the most likely to sink the proposal.
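On the scoping question: under the rules PEP 572 finally adopted (Python 3.8+), a `:=` target inside a comprehension binds in the enclosing function's scope, while the comprehension's own loop variable stays private. A minimal sketch, with a hypothetical function name:

```python
def doubled_with_last(numbers):
    """Return the doubled values plus the last one computed (numbers must be non-empty)."""
    # `n` is local to the comprehension, but the `:=` target `last` is bound
    # in this function's scope, so it is still available after the loop.
    doubled = [(last := n * 2) for n in numbers]
    return doubled, last


print(doubled_with_last([1, 2, 3]))  # ([2, 4, 6], 6)
```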



-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: pylint/pyreverse with Python3

2018-05-14 Thread Rich Shepard

On Mon, 14 May 2018, summerra...@gmail.com wrote:


> I'm having the same issue; can you give an example command line for
> python2 and python3 specific installation?


Summerrae,

  All my development is now strictly Python3. The installation depends on
your OS and distribution. For a basic installation (ignoring dependencies
which you'll discover when you try to run it), type

    python setup.py install

or

    python3 setup.py install

from within the directory with the egg, wheel, or source code.
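Since each installed Python version has its own site-packages, it can help to confirm which interpreter is actually running and whether it can import the package. This sketch relies only on `sys.executable`, `sys.version_info` and `importlib.util.find_spec`; the helper name `describe_interpreter` is hypothetical.

```python
import importlib.util
import sys


def describe_interpreter(package="pylint"):
    """Report which Python is running and whether it can import `package`.

    Hypothetical helper: the answer can differ between `python` and `python3`
    because each interpreter looks in its own site-packages.
    """
    spec = importlib.util.find_spec(package)  # None if the package is not importable
    return {
        "executable": sys.executable,
        "version": "%d.%d" % sys.version_info[:2],
        "package_found": spec is not None,
        "package_location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    print(describe_interpreter())
```

Saved as, say, `which_python.py` (a hypothetical file name), it can be run once with `python` and once with `python3` to see which of them has pylint installed.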

HTH,

Rich
--
https://mail.python.org/mailman/listinfo/python-list


Re: seeking deeper (language theory) reason behind Python design choice

2018-05-14 Thread Chris Angelico
On Tue, May 15, 2018 at 9:20 AM, Python  wrote:
> I'm well acquainted with that phenomenon, though I daresay that if
> you proofread your own product you will often find your mistakes.  You
> just won't always.  But, I never said review it right after you wrote
> it, and in fact I don't do that (well, I do reread it if it seems
> something potentially concerning).  Rather I review it when I'm about
> to check it in, which for anything non-trivial is generally days
> later, after it's been tested (which implies the tests were written).
> I find my own bugs very often (but not nearly as often as I'd like).

Where does your code sit during those days? What happens if multiple
people make changes to the same files in parallel - do you deal with
merge conflicts all the time simply because you don't want to push
code in a timely manner?

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: seeking deeper (language theory) reason behind Python design choice

2018-05-14 Thread Python
On Mon, May 14, 2018 at 12:02:47PM -0600, Ian Kelly wrote:
> On Mon, May 14, 2018 at 9:38 AM, Python  wrote:
> > Absolutely correct.  If you're not doing THOROUGH code reviews, and
> > not thoroughly testing your code, your job is only half done.  You
> > should be your own first reviewer, and then have a second someone
> > competent review it after you do.
> 
> One should never be their own "first reviewer" because it may lead to
> the mindset that a "second reviewer" is unnecessary.

I went on to say that the second reviewer was required (i.e. this
should be considered a required part of good process).  If you decide
this is your process and it's required then it's required, regardless
of what you personally think.

> You're about as likely to notice the glaring bugs in the code that
> you just wrote as you are to notice the missing or misspelled words
> in the sentence you just penned.

I'm well acquainted with that phenomenon, though I daresay that if
you proofread your own product you will often find your mistakes.  You
just won't always.  But, I never said review it right after you wrote
it, and in fact I don't do that (well, I do reread it if it seems
something potentially concerning).  Rather I review it when I'm about
to check it in, which for anything non-trivial is generally days
later, after it's been tested (which implies the tests were written).
I find my own bugs very often (but not nearly as often as I'd like).

I am hardly perfect.  What I am is thorough.  I would argue that is
required to write quality software.  If your goal is not to write
quality software, then none of this matters.  And if it is your goal,
you shouldn't need to care about this, because you'll either get it
right through whatever process you have, or you'll avoid it entirely
if you don't think your process is adequate.  Your choice.  Or not, if
the language decides you can't be responsible enough to make that
choice for yourself.

> That said, when I'm doing a code review, my focus is on all of the
> following things:
> 
> * Design: does this code make sense for what it's trying to accomplish?
> * Functionality: does the code work as intended?
> * Readability: can I understand it, and will others understand it later?
> * Complexity: could this code be simpler?
> * Tests: does the code include good tests?

These all seem fine, but if you're missing extremely well-known
pitfalls, then... I'll prefer a different code reviewer. :)  Maybe you
should consider adding that to your list.

> The existence of subtle bugs is just one of the things that I'm
> thinking about, so from my perspective, the more the compiler can help
> with this, the better.

I don't disagree with that, except that I don't consider this a subtle
bug, largely on account of its aforementioned status as well-known
pitfall.  But neither do I consider it damning, obviously.

> In C, if I miss a misplaced '=' then the code will do the wrong
> thing. 

Better yet, the compiler should warn you about it (which I believe it
does).  And you should be compiling with warnings. 

> In Python, I don't even have to worry about it, and I like it that
> way. 

If you're so concerned about making that mistake, YOU CAN CHOOSE TO
NOT USE THAT CONSTRUCT.  It doesn't need to be a decision forced on
you by the compiler.

As we've learned in this thread, there is a PEP for implementing this
feature which Guido apparently approves, so it can't be all that evil
after all, can it?

> So when you say that '=' as an expression should be supported
> because you think it's useful, and anyway those sorts of bugs will
> be caught by code reviews, the way that reads to me is:

Actually if you read all of what I wrote, you'll know that what I said
was assignment as an expression should be allowed, and if there were
to be a different operator to express that to avoid confusion, that
would be just fine with me.  But I don't think that need be a
condition.

> "'=' as an expression should be supported because it's convenient to
> me, and I don't believe I write bugs

Convenience is sort of Python's gig, isn't it?  "Make simple things
easy, make hard things possible."

> and if I do it doesn't matter because my time is more important than that
> of the person who reviews my code."

Circumstantially, that may actually be true... or it may not.  Depends
on who you're working with, their roles and seniority, relative skill
and compensation, and probably other factors.  Most groups have such
a hierarchy, even if it is largely implied rather than stated.
HOWEVER it hardly matters if the construct in question is accepted
practice.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: object types, mutable or not?

2018-05-14 Thread Ben Finney
Steven D'Aprano  writes:

> Five years ago, the President Of the United States of America, or POTUS 
> for short, referred to Barack Obama. Today, it refers to Donald Trump. 
> This didn't happen by mutating a single person (an object) from a 
> youngish black-skinned man to an oldish orange-skinned man. It happened 
> by re-assigning the name from Obama to Trump.

This is a good analogy.

> Likewise when you re-assign a variable name from one value (an object) to 
> another, the original object doesn't change. You just make the name refer 
> to a different object, while the first goes on its merry way (probably to 
> be collected by the garbage collector and the memory reclaimed).

Hopefully I am not the only one who wondered how closely the analogy to
POTUS extends into this description.
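The difference between rebinding a name and mutating an object can be seen with a list and the `is` operator. This is a minimal illustration; the function names are made up for the example.

```python
def rebind():
    """Rebinding: the name moves to a new object; the old object is untouched."""
    original = ["Obama"]
    potus = original          # both names refer to the same list object
    potus = ["Trump"]         # 'potus' now refers to a different, new list
    return original, potus, potus is original


def mutate():
    """Mutation: the one shared object changes, so every name sees the change."""
    original = ["Obama"]
    potus = original
    potus[0] = "Trump"        # modifies the list that both names refer to
    return original, potus, potus is original


print(rebind())  # (['Obama'], ['Trump'], False)
print(mutate())  # (['Trump'], ['Trump'], True)
```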

-- 
 \  “I can picture in my mind a world without war, a world without |
  `\   hate. And I can picture us attacking that world, because they'd |
_o__)   never expect it.” —Jack Handey |
Ben Finney

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: pylint/pyreverse with Python3

2018-05-14 Thread summerrae78
I'm having the same issue; can you give an example command line for python2 and 
python3 specific installation? 

Thanks!

On Sunday, May 13, 2018 at 7:12:29 PM UTC-7, Terry Reedy wrote:
> On 5/13/2018 1:01 PM, Rich Shepard wrote:
> >    Installed here is pylint-1.7.1 and python-3.6.5. When I try to run
> > pyreverse (and pylint) on python3 source code it fails because it finds
> > only the python-2.7 site-package and not the python-3.6 site-package.
> 
> >    If you have learned how to run pylint/pyreverse on python3 code please
> > share your knowledge with me.
> 
> You have to install a package in /site-packages for each version you 
> want to run it with.
> 
> Then you have to make sure you run the version you intend to run.
> 
> 
> -- 
> Terry Jan Reedy

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python-list Digest, Vol 176, Issue 16

2018-05-14 Thread Paul Moore
On 14 May 2018 at 20:02, Paul  wrote:
> 1) I understand the added cost of verifying the sequence.  However, this
> appears to be a one-time cost.  E.G., if I submit this,
>
> random.choices(lm, cum_weights=[25, 26, 36, 46, 136], k=400)
>
> then the code will do an O(n log n) operation 400 times.
>
> If verification was added, then the code would do an O(n log n)
> operation 400 times, plus an O(n) operation done *one* time.   So, I'm not
> sure that this would be a significant efficiency hit (except in rare cases).

That's a good point. But as I don't have any need myself for
random.choices with a significant population size (the only case where
this matters) I'll leave it to those who do use the functionality to
decide on that point.

Regards,
Paul
-- 
https://mail.python.org/mailman/listinfo/python-list


ANN: DIPY 0.14.0

2018-05-14 Thread Eleftherios Garyfallidis
Hello all!

We are excited to announce a new *major release* of Diffusion Imaging in
Python (DIPY).

*DIPY 0.14 (Tuesday, 1st May 2018)*

This release received contributions from *24 developers*. A warm thank you
to each one of you for your contribution.

The complete release notes are available at:
http://nipy.org/dipy/release0.14.html

*Highlights* of this release include:


   - *RecoBundles*: anatomically relevant segmentation of bundles
   - New super fast clustering algorithm: *QuickBundlesX*
   - New tracking algorithm: *Particle Filtering Tracking*.
   - New tracking algorithm: *Probabilistic Residual Bootstrap Tracking*.
   - New API for reading, saving and processing tractograms.
   - Fiber ORientation Estimated using Continuous Axially Symmetric Tensors
   (*FORECAST*).
   - New command line interfaces.
   - Deprecated fvtk (old visualization framework).
   - A range of new visualization improvements.
   - *Large documentation update*.

To upgrade, run the following command in your terminal:

*pip install --upgrade dipy*

or

*conda install -c conda-forge dipy*

This version of DIPY depends on recent versions of nibabel (2.1.0+).

For any questions go to http://dipy.org, or send an e-mail to
neuroimag...@python.org  or ask a question to our interactive chat room
available at https://gitter.im/nipy/dipy

On behalf of the DIPY developers,
Eleftherios Garyfallidis, Ariel Rokem, Serge Koudoro
http://dipy.org/developers.html
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: f-string anomaly

2018-05-14 Thread Lele Gaifax
Ken Kundert  writes:

> Lele,
> I am using Python3.6. d has to be an object of mydict.

My bad, sorry, I completely missed the premise :-|.

ciao, lele.
-- 
nickname: Lele Gaifax | When I live off what I thought yesterday,
real: Emanuele Gaifas | I will begin to fear those who copy me.
l...@metapensiero.it  | -- Fortunato Depero, 1929.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: f-string anomaly

2018-05-14 Thread Ken Kundert
Lele,
I am using Python 3.6. d has to be an object of mydict.

Here is the code that exhibits the problem:

    import sys, os
    from inform import error, os_error

    class mydict(dict):
        def __format__(self, template):
            print('Template:', template)
            return ', '.join(template.format(v, k=k, v=v) for k, v in self.items())


    d = mydict(bob='239-8402', ted='371-8567', carol='891-5810',
               alice='552-2219')

    print('Using format():')
    print('Email: {0:{{k}}: {{v}}}'.format(d))
    print()
    print('Using f-string:')
    print(f'Email: {d:{{k}} {{v}}}')
    print()
    print('Using f-string:')
    k=6
    v=9
    print(f'Email: {d:{{k}} {{v}}}')

The result is:

NameError: name 'k' is not defined

-Ken





On 05/14/2018 12:24 PM, Lele Gaifax wrote:
> Ken Kundert  writes:
> 
>> Lele,
>> I'm afraid I was unclear. The ... in the code snippet was intended
>> to imply that these lines were appended to the end of the original code,
>> where d was defined.
> 
> Ok, but then I get a different behaviour:
> 
> Python 3.6.5 (default, May 11 2018, 13:30:17) 
> [GCC 7.3.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> k=1
> >>> v=2
> >>> d=3
> >>> print(f'Email: {d:{{k}} {{v}}}')
> Traceback (most recent call last):
>   File "", line 1, in 
> ValueError: Invalid format specifier
> 
> Which Python version are you using?
> 
> ciao, lele.
> 

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: f-string anomaly

2018-05-14 Thread MRAB

On 2018-05-14 20:24, Lele Gaifax wrote:

> Ken Kundert  writes:
> 
>> Lele,
>> I'm afraid I was unclear. The ... in the code snippet was intended
>> to imply that these lines were appended to the end of the original code,
>> where d was defined.
> 
> Ok, but then I get a different behaviour:
> 
> Python 3.6.5 (default, May 11 2018, 13:30:17)
> [GCC 7.3.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> k=1
> >>> v=2
> >>> d=3
> >>> print(f'Email: {d:{{k}} {{v}}}')
> Traceback (most recent call last):
>   File "", line 1, in 
> ValueError: Invalid format specifier
> 
> Which Python version are you using?


You need to look at the original post for the value of 'd':

    class mydict(dict):
        def __format__(self, template):
            print('Template:', template)
            return ', '.join(template.format(v, k=k, v=v) for k, v in self.items())


    d = mydict(bob='239-8402', ted='371-8567', carol='891-5810',
               alice='552-2219')
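Given a `mydict` class like the one above, one way to avoid doubled braces inside an f-string's format spec is to keep the template in a variable and pass it through a nested replacement field, which both `str.format()` and f-strings support. This is a sketch of a workaround, not an explanation of the anomaly; the `template` variable is illustrative, the debugging `print` is dropped, and only two entries are used to keep the output short.

```python
class mydict(dict):
    def __format__(self, template):
        # The format spec arrives here as the template, e.g. '{k}: {v}'.
        return ', '.join(template.format(v, k=k, v=v) for k, v in self.items())


d = mydict(bob='239-8402', ted='371-8567')
template = '{k}: {v}'

# The template is supplied through a nested replacement field instead of
# being spelled out with doubled braces inside the spec.
print('Email: {0:{1}}'.format(d, template))  # Email: bob: 239-8402, ted: 371-8567
print(f'Email: {d:{template}}')              # Email: bob: 239-8402, ted: 371-8567
```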
--
https://mail.python.org/mailman/listinfo/python-list


Re: f-string anomaly

2018-05-14 Thread Lele Gaifax
Ken Kundert  writes:

> Lele,
> I'm afraid I was unclear. The ... in the code snippet was intended
> to imply that these lines were appended to the end of the original code,
> where d was defined.

Ok, but then I get a different behaviour:

Python 3.6.5 (default, May 11 2018, 13:30:17) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> k=1
>>> v=2
>>> d=3
>>> print(f'Email: {d:{{k}} {{v}}}')
Traceback (most recent call last):
  File "", line 1, in 
ValueError: Invalid format specifier

Which Python version are you using?

ciao, lele.
-- 
nickname: Lele Gaifax | When I live off what I thought yesterday,
real: Emanuele Gaifas | I will begin to fear those who copy me.
l...@metapensiero.it  | -- Fortunato Depero, 1929.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: random.choices() Suggest that the code confirm that cum_weights sequence is in ascending order

2018-05-14 Thread Paul
forgot to edit the subject.  Sorry.
  paul c.

On Mon, May 14, 2018 at 12:02 PM, Paul  wrote:

> Hello all,
>Thanks for the thoughtful (and non-snarky) replies.
>
> First, a suggestion for a documentation change:
>
> To this paragraph:
>
> *If neither weights nor cum_weights are specified, selections are made
> with equal probability. If a weights sequence is supplied, it must be the
> same length as the population sequence. It is a TypeError to specify
> both weights and cum_weights.*
> Add this sentence:
>
> "A cum_weights sequence, if supplied, must be in strictly-ascending order,
> else incorrect results will be (silently) returned."
>
> Secondly, about the cost of verifying the sequence:
>
> 1) I understand the added cost of verifying the sequence.  However, this
> appears to be a one-time cost.  E.G., if I submit this,
>
> random.choices(lm, cum_weights=[25, 26, 36, 46, 136], k=400)
>
> then the code will do an O(n log n) operation 400 times.
>
> If verification was added, then the code would do an O(n log n)
> operation 400 times, plus an O(n) operation done *one* time.   So, I'm not
> sure that this would be a significant efficiency hit (except in rare cases).
>
> 2) Paul Moore wrote:
>
> > So the people who *really* need cum_weights are those
> > who have the cumulative weights already, and cannot
> > afford an O(n) precalculation step.
>
>
> I agree with the "already have the cum_weights" argument.  Based on
> my point #1, I'm not convinced about the "can't afford" argument.
>
> 3) A minor point.  The documentation also says: "so supplying the
> cumulative weights saves work."  However, this is work done (once, as noted
> above) by a computer rather than work done (even if aided by a computer)
> by a human, so I'd vote for having the computer do it. :)
>
>
> To conclude, I would still lean slightly toward having the code enforce
> the 'strictly-ascending sequence' requirement.  However, given that a)
> improving the documentation is much more doable and that, b) in some cases,
> the addition of an order O(n) step might be significant, I'd be more than
> happy if the documentation could be improved (as suggested).
>
> thanks
>   Paul Czyzewki
>
> PS.  I see the issue which steven.daprano opened.  Thanks, Steven.
> However, I'm not sure what's appropriate in terms of updating that issue,
> or even if I have permission to update it, so I'd appreciate if someone
> would add this response to the issue. Thanks.
>
>
>
>> From: Paul Moore 
>> To: "Steven D'Aprano" 
>> Cc: Python 
>> Bcc:
>> Date: Mon, 14 May 2018 14:35:34 +0100
>> Subject: Re: random.choices() Suggest that the code confirm that
>> cum_weights sequence is in ascending order
>> On 14 May 2018 at 14:07, Steven D'Aprano
>>  wrote:
>> > On Mon, 14 May 2018 12:59:28 +0100, Paul Moore wrote:
>> >
>> >> The problem is that supplying cum_weights allows the code to run in
>> >> O(log n) by using bisection. This is significantly faster on large
>> >> populations. Adding a test that the cumulative weights are nondecreasing
>> >> would add an O(n) step to the code.
>> >>
>> >> So while I understand the OP's problem, I don't think it's soluble
>> >> without making the cum_weights argument useless in practice.
>> >
>> > How does O(N) make it "useless"? There are lots of O(N) algorithms, even
>> > O(N**2) and O(2**N) which are nevertheless still useful.
>>
>> Well, I've never seen an actual use case for this argument (I can't
>> think of a case where I'd even have cumulative weights rather than
>> weights, and obviously calculating the cumulative weights from the
>> actual weights is what we're trying to avoid). And if you have
>> cum_weights and O(n) is fine, then calculating weights from
>> cum_weights is acceptable (although pointless, as it simply duplicates
>> work). So the people who *really* need cum_weights are those who have
>> the cumulative weights already, and cannot afford an O(n)
>> precalculation step.
>>
>> But yes, clearly in itself an O(n) algorithm isn't useless. And
>> agreed, in most cases whether random.choices() is O(n) or O(log n) is
>> irrelevant in practice.
>>
>> Paul
>>
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python-list Digest, Vol 176, Issue 16

2018-05-14 Thread Paul
Hello all,
   Thanks for the thoughtful (and non-snarky) replies.

First, a suggestion for a documentation change:

To this paragraph:

*If neither weights nor cum_weights are specified, selections are made with
equal probability. If a weights sequence is supplied, it must be the same
length as the population sequence. It is a TypeError to specify
both weights and cum_weights.*
Add this sentence:

"A cum_weights sequence, if supplied, must be in strictly-ascending order,
else incorrect results will be (silently) returned."

Secondly, about the cost of verifying the sequence:

1) I understand the added cost of verifying the sequence.  However, this
appears to be a one-time cost.  E.G., if I submit this,

random.choices(lm, cum_weights=[25, 26, 36, 46, 136], k=400)

then the code will do an O(n log n) operation 400 times.

If verification was added, then the code would do an O(n log n)
operation 400 times, plus an O(n) operation done *one* time.   So, I'm not
sure that this would be a significant efficiency hit (except in rare cases).

2) Paul Moore wrote:

> So the people who *really* need cum_weights are those
> who have the cumulative weights already, and cannot
> afford an O(n) precalculation step.


I agree with the "already have the cum_weights" argument.  Based on my
point #1, I'm not convinced about the "can't afford" argument.

3) A minor point.  The documentation also says: "so supplying the
cumulative weights saves work."  However, this is work done (once, as noted
above) by a computer rather than work done (even if aided by a computer)
by a human, so I'd vote for having the computer do it. :)


To conclude, I would still lean slightly toward having the code enforce the
'strictly-ascending sequence' requirement.  However, given that a)
improving the documentation is much more doable and that, b) in some cases,
the addition of an order O(n) step might be significant, I'd be more than
happy if the documentation could be improved (as suggested).
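The proposed check could be sketched as a thin wrapper around `random.choices()`: one linear pass over `cum_weights`, after which `random.choices()` does a bisection (O(log n)) for each of the k selections. The wrapper name `checked_choices` is hypothetical, and it tests for a non-decreasing sequence (the condition Paul Moore described) rather than a strictly ascending one, since equal neighbouring values simply give an item zero weight.

```python
import random


def checked_choices(population, cum_weights, k=1):
    """Validate cum_weights once (O(n)), then delegate to random.choices."""
    cum_weights = list(cum_weights)
    if len(cum_weights) != len(population):
        raise ValueError("cum_weights must be the same length as population")
    # One linear pass: each cumulative weight must be >= the one before it.
    for earlier, later in zip(cum_weights, cum_weights[1:]):
        if later < earlier:
            raise ValueError("cum_weights must be non-decreasing")
    # random.choices then bisects into cum_weights once per selection.
    return random.choices(population, cum_weights=cum_weights, k=k)
```

With this wrapper, `checked_choices(lm, [25, 26, 36, 46, 136], k=400)` pays the O(n) check once, while an out-of-order list such as `[5, 3, 10]` raises `ValueError` instead of silently skewing the results.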

thanks
  Paul Czyzewki

PS.  I see the issue which steven.daprano opened.  Thanks, Steven.
However, I'm not sure what's appropriate in terms of updating that issue,
or even if I have permission to update it, so I'd appreciate if someone
would add this response to the issue. Thanks.



> From: Paul Moore 
> To: "Steven D'Aprano" 
> Cc: Python 
> Bcc:
> Date: Mon, 14 May 2018 14:35:34 +0100
> Subject: Re: random.choices() Suggest that the code confirm that
> cum_weights sequence is in ascending order
> On 14 May 2018 at 14:07, Steven D'Aprano
>  wrote:
> > On Mon, 14 May 2018 12:59:28 +0100, Paul Moore wrote:
> >
> >> The problem is that supplying cum_weights allows the code to run in
> >> O(log n) by using bisection. This is significantly faster on large
> >> populations. Adding a test that the cumulative weights are nondecreasing
> >> would add an O(n) step to the code.
> >>
> >> So while I understand the OP's problem, I don't think it's soluble
> >> without making the cum_weights argument useless in practice.
> >
> > How does O(N) make it "useless"? There are lots of O(N) algorithms, even
> > O(N**2) and O(2**N) which are nevertheless still useful.
>
> Well, I've never seen an actual use case for this argument (I can't
> think of a case where I'd even have cumulative weights rather than
> weights, and obviously calculating the cumulative weights from the
> actual weights is what we're trying to avoid). And if you have
> cum_weights and O(n) is fine, then calculating weights from
> cum_weights is acceptable (although pointless, as it simply duplicates
> work). So the people who *really* need cum_weights are those who have
> the cumulative weights already, and cannot afford an O(n)
> precalculation step.
>
> But yes, clearly in itself an O(n) algorithm isn't useless. And
> agreed, in most cases whether random.choices() is O(n) or O(log n) is
> irrelevant in practice.
>
> Paul
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: f-string anomaly

2018-05-14 Thread Ken Kundert
Lele,
I'm afraid I was unclear. The ... in the code snippet was intended
to imply that these lines were appended to the end of the original code,
where d was defined.

-Ken

On 05/14/2018 12:30 AM, Lele Gaifax wrote:
> Ken Kundert  writes:
> 
>> I tried adding k and v to the local namespace:
>>
>> ...
>> k = 6
>> v = 9
>> print(f'Email: {d:{{k}} {{v}}}')
>>
>> I still got:
>>
>> NameError: name 'k' is not defined
> 
> This is not what I get:
> 
> Python 3.6.5 (default, May 11 2018, 13:30:17) 
> [GCC 7.3.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> k=1
> >>> v=2
> >>> print(f'this is {{k}} and {{v}}')
> this is {k} and {v}
> >>> print(f'this is {k} and {v}')
> this is 1 and 2
> >>> print(f'Email: {d:{{k}} {{v}}}')
> Traceback (most recent call last):
>   File "", line 1, in 
> NameError: name 'd' is not defined
> 
> ciao, lele.
> 

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: seeking deeper (language theory) reason behind Python design choice

2018-05-14 Thread Ian Kelly
On Mon, May 14, 2018 at 9:38 AM, Python  wrote:
> Absolutely correct.  If you're not doing THOROUGH code reviews, and
> not thoroughly testing your code, your job is only half done.  You
> should be your own first reviewer, and then have a second someone
> competent review it after you do.

One should never be their own "first reviewer" because it may lead to
the mindset that a "second reviewer" is unnecessary. You're about as
likely to notice the glaring bugs in the code that you just wrote as
you are to notice the missing or misspelled words in the sentence you
just penned. Why? Because you just wrote it, and as a result you
believe that you know what it says, and you'll simply fail to process
the fact that it actually says something different.

That said, when I'm doing a code review, my focus is on all of the
following things:

* Design: does this code make sense for what it's trying to accomplish?
* Functionality: does the code work as intended?
* Readability: can I understand it, and will others understand it later?
* Complexity: could this code be simpler?
* Tests: does the code include good tests?

The existence of subtle bugs is just one of the things that I'm
thinking about, so from my perspective, the more the compiler can help
with this, the better. In C, if I miss a misplaced '=' then the code
will do the wrong thing. In Python, I don't even have to worry about
it, and I like it that way. So when you say that '=' as an expression
should be supported because you think it's useful, and anyway those
sorts of bugs will be caught by code reviews, the way that reads to me
is:

"'=' as an expression should be supported because it's convenient to
me, and I don't believe I write bugs, and if I do it doesn't matter
because my time is more important than that of the person who reviews my
code."
-- 
https://mail.python.org/mailman/listinfo/python-list


spurious BadDrawable error when running test_tk

2018-05-14 Thread Matthias Kievernagel
Dear list,

I changed some detail in the tkinter library,
so I'm running the tk test like this:

./python -m test -u gui -v test_tk

Approximately every 2 or 3 runs I get a BadDrawable error
from the X server, most of the time at the end after
the last test finished successfully.
As this also happens when I run the test on the unmodified CPython sources
I suspect it is a shortcoming of my non-mainstream setup.
I use ctwm with interactive window placement,
so running the test involves a lot of clicks.
Does anyone know if the errors are to be expected
or should it work nonetheless?
I haven't found anything on b.p.o. about this.

Thanks for any insights,
Matthias Kievernagel


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: seeking deeper (language theory) reason behind Python design choice

2018-05-14 Thread Ian Kelly
On Mon, May 14, 2018 at 9:20 AM, Python  wrote:
> On Sun, May 13, 2018 at 02:42:48PM +1000, Chris Angelico wrote:
>> On Sun, May 13, 2018 at 2:31 PM, Python  wrote:
>> >> Yes, and I'd go further: I *am* too stupid to get this right.
>> >
>> > No, you are not.  Do you ever say "dog" when you mean "dot" instead?
>> > Do you ever say "dad" when you mean "mom" instead?  Internalize that
>> > "=" is "equals" (or "assigns" if you prefer) and "==" is "is equal to"
>> > then use those phrases in your head when you're thinking about which
>> > one you need in your code, and I'm pretty sure you'll stop making this
>> > mistake.  It may help that the phrase with twice as many syllables
>> > represents the operator that has twice as many characters.  Eventually
>> > it becomes second nature, like not calling Dad "Mom."
>>
>> Rght, of course. Because prevention of bugs is just a matter of
>> wanting to.
>
> Preventing *certain classes* of bugs, mainly botching syntax, is mostly
> just a matter of wanting to, like a piano virtuoso who can play
> complicated pieces night after night flawlessly.  It just takes focus
> and practice.  Preventing the = vs. == bug is nowhere near as
> complex or difficut as La Campanella, so you don't even need to be a
> virtuoso.  You just have to be mindful and careful.

I'm reminded of the first bullet point of step 6 in this article,
which just crossed my inbox this morning:

https://www.e4developer.com/2018/05/13/how-to-write-horrible-java/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: seeking deeper (language theory) reason behind Python design choice

2018-05-14 Thread Steven D'Aprano
On Mon, 14 May 2018 10:20:06 -0500, Python wrote:

> Preventing *certain classes* of bugs, mainly botching syntax, is mostly
> just a matter of wanting to, 

That comment is very ignorant of the mental processes involved in both 
language processing and typing, two skills used in programming. You can't 
prevent errors merely, or even "mostly", by wanting not to make errors.


> like a piano virtuoso who can play
> complicated pieces night after night flawlessly.

Right up until the moment that they make a mistake, which they do.

Virtuosos suffer from fatigue or injuries, they have slumps, they have 
bad days, they often cannot reproduce the same performance (every 
performance is unique since they are not robots that can repeat every 
minute motion over and over again) and they make mistakes. "Flawlessly" 
does not mean without flaw, it is mere hyperbole.

https://www.telegraph.co.uk/culture/music/classicalconcertreviews/10878171/Khatia-Buniatishvili-Queen-Elizabeth-Hall-review-sorely-disappointing.html


> It just takes focus
> and practice.  Preventing the = vs. == bug is nowhere near as complex or
> difficut as La Campanella, so you don't even need to be a virtuoso.  You
> just have to be mindful and careful.

Botched syntax is a form of botched spelling.

https://mail.python.org/pipermail/python-list/2018-May/733040.html

Maybe you just didn't want to spell "pposted" or "lenghty" correctly?


À propos of nothing, I used to know somebody who seriously used to argue 
that his spelling mistakes were deliberate. Not as a self-deprecating 
joke. He literally tried to convince people that whenever he spelled 
something incorrectly, it was a deliberate choice for "irony" or 
"rhetorical effect" or "my own personal reasons". He fooled nobody.

Very sad, the lengths people will go to to fool themselves into believing 
that they have 100% control over each and every one of their actions. 
Just sayin'.



-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: seeking deeper (language theory) reason behind Python design choice

2018-05-14 Thread Python
On Sun, May 13, 2018 at 09:46:48PM +1000, Chris Angelico wrote:
> > I expect that these days it will be rare, since most C compilers would
> > default to warning about it if you run with warnings enabled.
> 
> That assumes that you regularly run with warnings enabled. While that
> might seem like a no-brainer, unfortunately it isn't. With the number
> of C compilers out there, it's hard to make sure your code compiles
> cleanly with -Wall on every one of them; and if there's a spew of
> warnings, one more isn't going to be noticed. So for a large codebase,
> it's entirely possible that it WON'T regularly be compiled with
> warnings enabled.

As it happens, my team does compile with -Wall -Werror at all times in
every project (though we also rely on some third-party libraries as
dependencies, which we cannot build that way). But I do agree with your point...

> Warnings certainly help, but they're not a complete solution.

Absolutely correct.  If you're not doing THOROUGH code reviews, and
not thoroughly testing your code, your job is only half done.  You
should be your own first reviewer, and then have a second someone
competent review it after you do.  You should also be your own first
tester, and then have someone competent test it after you.  In both
cases, ideally the "someone competent" would be a team of someones,
though that's not always practical.  But I believe this process is
absolutely essential to producing non-trivial quality software.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: seeking deeper (language theory) reason behind Python design choice

2018-05-14 Thread Python
On Sun, May 13, 2018 at 11:05:48AM +, Steven D'Aprano wrote:
> On Sat, 12 May 2018 21:42:13 -0500, Python wrote:
> 
> > Responding to this further would essentially just require me to
> > reiterate what I already wrote--I won't do that.  I'll simply maintain
> > that in my rather lenghty experience, this mistake has actually been
> > rather rare and has to my knowledge *never* caused a support issue
> > requiring a bug fix to production code in projects I've been associated
> > with.  It's a useful construction whose detriment has, IMO, been
> > completely overblown.
> 
> I already linked to the attempt to install a backdoor in the Linux kernel 
> with this, but even for accidental errors, thirty seconds on the CVE 
> database finds at least one real-world example:
> 
> https://www.cvedetails.com/cve/CVE-2009-4633/
> 
> I expect that these days it will be rare, since most C compilers would 
> default to warning about it if you run with warnings enabled.

A couple of anecdotes is a long way from making the case.
Either the code was not reviewed or the reviewer was careless.  And
I'm not saying it never happens, I'm saying it's not any worse than
any other possible bug, and far less common in practice than plenty of
other classes of bugs.  I'm also not saying that you couldn't use a
different operator that's less likely to cause confusion.  I *am*
saying this is a useful feature that I find myself wanting very often.
Obviously since there's a PEP about a way to provide exactly this
feature, plenty of people consider it a worthwhile feature to have.

Yes, bugs happen.  Eliminating useful constructs from the language is
not a good way of dealing with that problem.
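For readers arriving later: the PEP referred to here is presumably PEP 572, which was accepted after this thread and added assignment expressions to Python 3.8. It uses a separate `:=` operator rather than turning `=` into an expression, so the `if x = 5:` typo is still a syntax error. A minimal sketch of the construct (requires Python 3.8+; the `find_ids` helper and its input lines are hypothetical):

```python
import re


def find_ids(lines):
    """Collect the numbers from lines of the form 'id=<digits>'."""
    found = []
    for line in lines:
        # ':=' binds the match object and tests it in a single expression
        if (m := re.match(r"id=(\d+)", line)) is not None:
            found.append(int(m.group(1)))
    return found


print(find_ids(["foo", "id=42", "bar", "id=7"]))  # [42, 7]
```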

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: seeking deeper (language theory) reason behind Python design choice

2018-05-14 Thread Python
On Sun, May 13, 2018 at 02:42:48PM +1000, Chris Angelico wrote:
> On Sun, May 13, 2018 at 2:31 PM, Python  wrote:
> >> Yes, and I'd go further: I *am* too stupid to get this right.
> >
> > No, you are not.  Do you ever say "dog" when you mean "dot" instead?
> > Do you ever say "dad" when you mean "mom" instead?  Internalize that
> > "=" is "equals" (or "assigns" if you prefer) and "==" is "is equal to"
> > then use those phrases in your head when you're thinking about which
> > one you need in your code, and I'm pretty sure you'll stop making this
> > mistake.  It may help that the phrase with twice as many syllables
> > represents the operator that has twice as many characters.  Eventually
> > it becomes second nature, like not calling Dad "Mom."
> 
> Rght, of course. Because prevention of bugs is just a matter of
> wanting to. 

Preventing *certain classes* of bugs, mainly botching syntax, is mostly
just a matter of wanting to, like a piano virtuoso who can play
complicated pieces night after night flawlessly.  It just takes focus
and practice.  Preventing the = vs. == bug is nowhere near as 
complex or difficut as La Campanella, so you don't even need to be a
virtuoso.  You just have to be mindful and careful.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: random.choices() Suggest that the code confirm that cum_weights sequence is in ascending order

2018-05-14 Thread Paul Moore
On 14 May 2018 at 14:07, Steven D'Aprano
 wrote:
> On Mon, 14 May 2018 12:59:28 +0100, Paul Moore wrote:
>
>> The problem is that supplying cum_weights allows the code to run in
>> O(log n) by using bisection. This is significantly faster on large
>> populations. Adding a test that the cumulative weights are nondecreasing
>> would add an O(n) step to the code.
>>
>> So while I understand the OP's problem, I don't think it's soluble
>> without making the cum_weights argument useless in practice.
>
> How does O(N) make it "useless"? There are lots of O(N) algorithms, even
> O(N**2) and O(2**N) which are nevertheless still useful.

Well, I've never seen an actual use case for this argument (I can't
think of a case where I'd even have cumulative weights rather than
weights, and obviously calculating the cumulative weights from the
actual weights is what we're trying to avoid). And if you have
cum_weights and O(n) is fine, then calculating weights from
cum_weights is acceptable (although pointless, as it simply duplicates
work). So the people who *really* need cum_weights are those who have
the cumulative weights already, and cannot afford an O(n)
precalculation step.
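A sketch of the case Paul describes, where the cumulative weights already exist and are reused across many draws, so the O(n) accumulation is paid once rather than on every call. The population, weights and seed are made up for illustration; a random.Random instance's choices() method accepts the same weights/cum_weights/k arguments as random.choices.

```python
import random
from itertools import accumulate

population = ["red", "green", "blue"]
weights = [10, 5, 1]                       # hypothetical relative weights
cum_weights = list(accumulate(weights))    # [10, 15, 16], computed once: O(n)

rng = random.Random(12345)  # seeded so the runs are repeatable
batches = []
for _ in range(3):
    # each selection only bisects into cum_weights: O(log n) per pick
    batches.append(rng.choices(population, cum_weights=cum_weights, k=5))
print(batches)
```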

But yes, clearly in itself an O(n) algorithm isn't useless. And
agreed, in most cases whether random.choices() is O(n) or O(log n) is
irrelevant in practice.

Paul
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: random.choices() Suggest that the code confirm that cum_weights sequence is in ascending order

2018-05-14 Thread Steven D'Aprano
On Mon, 14 May 2018 12:59:28 +0100, Paul Moore wrote:

> The problem is that supplying cum_weights allows the code to run in
> O(log n) by using bisection. This is significantly faster on large
> populations. Adding a test that the cumulative weights are nondecreasing
> would add an O(n) step to the code.
> 
> So while I understand the OP's problem, I don't think it's soluble
> without making the cum_weights argument useless in practice.

How does O(N) make it "useless"? There are lots of O(N) algorithms, even 
O(N**2) and O(2**N) which are nevertheless still useful.

Besides, might this be "premature optimization"? I get it that everyone 
wants their code to be faster rather than unnecessarily slower, but how 
often is random.choices() the bottleneck in your application?



> Better
> documentation might be worthwhile (although I don't personally find the
> current docs confusing, so suggestions for improvements would be
> helpful).

Indeed.

Hopefully the OP is still reading and is willing to sign up on the bug 
tracker.



-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: random.choices() Suggest that the code confirm that cum_weights sequence is in ascending order

2018-05-14 Thread Paul Moore
On 14 May 2018 at 13:53, Chris Angelico  wrote:
> On Mon, May 14, 2018 at 10:49 PM, Paul Moore  wrote:
>> On 14 May 2018 at 13:27, Chris Angelico  wrote:
>>> On Mon, May 14, 2018 at 9:59 PM, Paul Moore  wrote:
>>>> The problem is that supplying cum_weights allows the code to run in
>>>> O(log n) by using bisection. This is significantly faster on large
>>>> populations. Adding a test that the cumulative weights are
>>>> nondecreasing would add an O(n) step to the code.

>>>
>>> Hang on - are the 'n' and 'log n' there referring to the same n?
>>
>> Yes. The number of elements in the sample population (which is the
>> same as the number of entries in the weights/cum_weights arrays).
>
> Okay, cool. Thanks. I was a little confused as to whether the weights
> were getting grouped up or not. Have seen too many cases where someone
> panics about an O(n²) on a tiny n that's unrelated to the important
> O(n) on a huge n :)

Yeah, for all of *my* uses of the functions in random, n is so small
as to make all this irrelevant. But when I looked into how cum_weights
worked, I realised it's aimed at people passing significant sized data
sets. And they would probably be hit hard by a change from O(log n) to
O(n).

One thing I always liked about C++ was the way the standard library
documented a lot of the O(n) properties of the operations. It not only
made it easier to know what was costly and what wasn't, it also made
it much clearer what functions were intended for use on large data
sets. I sort of miss that information in Python - not least because
functions like random.choices are often a lot faster than I'd naively
expect.

Paul
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: random.choices() Suggest that the code confirm that cum_weights sequence is in ascending order

2018-05-14 Thread Steven D'Aprano
On Mon, 14 May 2018 22:27:24 +1000, Chris Angelico wrote:

> On Mon, May 14, 2018 at 9:59 PM, Paul Moore  wrote:
>> The problem is that supplying cum_weights allows the code to run in
>> O(log n) by using bisection. This is significantly faster on large
>> populations. Adding a test that the cumulative weights are
>> nondecreasing would add an O(n) step to the code.
>>
>>
> Hang on - are the 'n' and 'log n' there referring to the same n?

Yes -- the number of values you are choosing from, hence the number of 
weights.

If there are N values (and N weights), an upfront check would need to 
look at all N of them in the worst case that they were already in non-
descending order. (Of course it can bail out early if the check fails.)

Whereas the choice itself can use bisect to do a binary search of the 
cumulative weights, which on average takes only log N comparisons.
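The upfront check Steven describes is a single pass over adjacent pairs. A minimal sketch (the function name is made up; random.choices itself does not perform this check in the versions discussed here). Using islice avoids copying the list, so a bad sequence really is rejected as soon as the first out-of-order pair is found, while a valid one has to be scanned completely, which is the O(n) worst case.

```python
from itertools import islice


def is_nondecreasing(seq):
    """Return True if every element is >= the one before it.

    all() stops at the first False, so an invalid sequence is rejected
    early; a valid one must be scanned to the end (O(n) worst case).
    """
    return all(a <= b for a, b in zip(seq, islice(seq, 1, None)))


print(is_nondecreasing([1, 3, 3, 10]))  # True
print(is_nondecreasing([5, 1, 10]))     # False
```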


-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: seeking deeper (language theory) reason behind Python design choice

2018-05-14 Thread Chris Angelico
On Mon, May 14, 2018 at 10:49 PM, Rhodri James  wrote:
> On 13/05/18 05:31, Python wrote:
>>
>> No, you are not.  Do you ever say "dog" when you mean "dot" instead?
>> Do you ever say "dad" when you mean "mom" instead?
>
>
> One of my aunts used to muddle family names all the time.  She once called
> me by my sister's name; one would have thought the beard was a clue to that
> one.
>
> Similarly  my mother once asked my sister to "Get the thingummy off the
> whatsit."  The alarming thing was that my sister understood this and handed
> her the correct object.

That's alarming to you? It's pretty normal in my family. I think we
all developed mindreading abilities as young children, or something.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: random.choices() Suggest that the code confirm that cum_weights sequence is in ascending order

2018-05-14 Thread Chris Angelico
On Mon, May 14, 2018 at 10:49 PM, Paul Moore  wrote:
> On 14 May 2018 at 13:27, Chris Angelico  wrote:
>> On Mon, May 14, 2018 at 9:59 PM, Paul Moore  wrote:
>>> The problem is that supplying cum_weights allows the code to run in
>>> O(log n) by using bisection. This is significantly faster on large
>>> populations. Adding a test that the cumulative weights are
>>> nondecreasing would add an O(n) step to the code.
>>>
>>
>> Hang on - are the 'n' and 'log n' there referring to the same n?
>
> Yes. The number of elements in the sample population (which is the
> same as the number of entries in the weights/cum_weights arrays).

Okay, cool. Thanks. I was a little confused as to whether the weights
were getting grouped up or not. Have seen too many cases where someone
panics about an O(n²) on a tiny n that's unrelated to the important
O(n) on a huge n :)

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: pylint/pyreverse with Python3 [RESOLVED]

2018-05-14 Thread Rich Shepard

On Sun, 13 May 2018, Terry Reedy wrote:

>>> You have to install a package in /site-packages for each version you want
>>> to run it with.
>>
>> Terry,
>>
>>   It is installed in the site-packages/ directory for both 2.7 and 3.6.
>
> Then you have to make sure you run the version you intend to run.

  Since both /usr/bin/pylint and /usr/bin/pyreverse are python scripts I can
change the first line to call python3 rather than python. I should have
looked at this before writing.

Thanks,

Rich

--
https://mail.python.org/mailman/listinfo/python-list


Re: seeking deeper (language theory) reason behind Python design choice

2018-05-14 Thread Rhodri James

On 13/05/18 05:31, Python wrote:
> No, you are not.  Do you ever say "dog" when you mean "dot" instead?
> Do you ever say "dad" when you mean "mom" instead?

One of my aunts used to muddle family names all the time.  She once 
called me by my sister's name; one would have thought the beard was a 
clue to that one.

Similarly my mother once asked my sister to "Get the thingummy off the 
whatsit."  The alarming thing was that my sister understood this and 
handed her the correct object.

So yes, actually we do make that kind of error all the time.  Moreover, 
it's very hard to notice *in your own code* because you read what you 
meant, not what you wrote.  Ask any author about proof-reading, and 
they'll tell you to get someone else to do it.

--
Rhodri James *-* Kynesim Ltd
--
https://mail.python.org/mailman/listinfo/python-list


Re: random.choices() Suggest that the code confirm that cum_weights sequence is in ascending order

2018-05-14 Thread Paul Moore
On 14 May 2018 at 13:27, Chris Angelico  wrote:
> On Mon, May 14, 2018 at 9:59 PM, Paul Moore  wrote:
>> The problem is that supplying cum_weights allows the code to run in
>> O(log n) by using bisection. This is significantly faster on large
>> populations. Adding a test that the cumulative weights are
>> nondecreasing would add an O(n) step to the code.
>>
>
> Hang on - are the 'n' and 'log n' there referring to the same n?

Yes. The number of elements in the sample population (which is the
same as the number of entries in the weights/cum_weights arrays). See
https://github.com/python/cpython/blob/master/Lib/random.py#L382 for
details, but basically calculating cum_weights from weights costs
O(n), and locating the right index into the population by doing a
bisection search (bisect.bisect) on the cum_weights sequence costs
O(log n). Using the cum_weights argument rather than the weights
argument skips the O(n) step.
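The two steps Paul describes can be sketched as follows. This is a simplified illustration, not the actual CPython source: the function name and the explicit `u` argument (a uniform value in [0.0, 1.0), passed in so the result is deterministic) are made up. Building the cumulative weights is the O(n) part; the bisection is the O(log n) part.

```python
from bisect import bisect
from itertools import accumulate


def weighted_pick(population, weights, u):
    """Pick one item from population using weights and a uniform u in [0, 1)."""
    cum_weights = list(accumulate(weights))   # O(n)
    total = cum_weights[-1]
    # Find the first cumulative weight greater than u * total: O(log n).
    # hi=len-1 keeps the index in range if rounding makes u * total
    # reach the last cumulative weight.
    index = bisect(cum_weights, u * total, 0, len(cum_weights) - 1)
    return population[index]


print([weighted_pick("abc", [1, 2, 7], u) for u in (0.05, 0.25, 0.5)])  # ['a', 'b', 'c']
```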

If it's possible to check that cum_weights is nondecreasing in O(log
n) time (either directly here, or in bisect.bisect), then the check
wouldn't affect the algorithmic complexity of that case (it would
affect the constants, but I assume we don't care too much about that).
But I don't know of a way of doing that.

Improving the documentation is of course free of runtime cost. And
making it clear that "you should only use cum_weights if you know what
you're doing, and in particular it doesn't cost you O(n) to work them
out" would seem entirely reasonable to me.

Paul
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pandas cat.categories.isin list, is this a bug?

2018-05-14 Thread Matt Ruffalo
On 2018-05-14 07:05, zljubi...@gmail.com wrote:
> Hi,
>
> I have dataframe with CRM_assetID column as category dtype:
>
> df.info()
>
> <class 'pandas.core.frame.DataFrame'>
> RangeIndex: 1435952 entries, 0 to 1435951
> Data columns (total 75 columns):
> startTime    1435952 non-null object
> CRM_assetID  1435952 non-null category
>
> searching a dataframe for each of three categories:
>
> df[df.CRM_assetID == 'V1254748'].shape
> (35, 75)
> df[df.CRM_assetID == 'V805722'].shape
> (45, 75)
> df[df.CRM_assetID == 'V1105400'].shape
> (34, 75)
>
>
> len(df.CRM_assetID.cat.categories.isin(['V1254748', 'V805722', 'V1105400']))
>
> Why this len is not equal to 114 (35 + 45 + 34)?
>
> Regards.

Hello-

First, this is a general Python group; not everyone here is necessarily
an expert in or user of Pandas. In the future you might have more
success with the pydata mailing list/group.

When you say that `len(df.CRM_assetID.cat.categories.isin(['V1254748',
'V805722', 'V1105400']))` is not equal to 114, it would be helpful to
say what this length actually is.

Your usage of `df.CRM_assetID.cat.categories` refers to the *unique
categories in that column*, not the actual values in that column.
Presumably you have more categories in that column than the three you
are checking with `isin`, since you are checking the length of a boolean
vector that signifies whether each distinct category is in that list.
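A small sketch of the distinction Matt draws, using a made-up Series rather than the OP's data. `cat.categories.isin(...)` returns one boolean per distinct category, which is why its length (55418 in the OP's follow-up) is the number of categories. `Series.isin(...)` returns one boolean per row, and summing that mask counts the matching rows. On the OP's DataFrame the analogue would presumably be `df.CRM_assetID.isin([...]).sum()`.

```python
import pandas as pd

# Hypothetical data: 6 rows, 4 distinct categories (V1..V4)
s = pd.Series(["V1", "V2", "V1", "V3", "V4", "V2"], dtype="category")
wanted = ["V1", "V2"]

per_category = s.cat.categories.isin(wanted)  # one flag per distinct category
print(len(per_category), int(per_category.sum()))  # 4 2

per_row = s.isin(wanted)  # one flag per row
print(len(per_row), int(per_row.sum()))  # 6 4
```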

MMR...
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: random.choices() Suggest that the code confirm that cum_weights sequence is in ascending order

2018-05-14 Thread Chris Angelico
On Mon, May 14, 2018 at 9:59 PM, Paul Moore  wrote:
> The problem is that supplying cum_weights allows the code to run in
> O(log n) by using bisection. This is significantly faster on large
> populations. Adding a test that the cumulative weights are
> nondecreasing would add an O(n) step to the code.
>

Hang on - are the 'n' and 'log n' there referring to the same n?

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pandas cat.categories.isin list, is this a bug?

2018-05-14 Thread zljubisic
On Monday, 14 May 2018 13:05:24 UTC+2, zlju...@gmail.com  wrote:
> Hi,
> 
> I have dataframe with CRM_assetID column as category dtype:
> 
> df.info()
> 
> <class 'pandas.core.frame.DataFrame'>
> RangeIndex: 1435952 entries, 0 to 1435951
> Data columns (total 75 columns):
> startTime    1435952 non-null object
> CRM_assetID  1435952 non-null category
> 
> searching a dataframe for each of three categories:
> 
> df[df.CRM_assetID == 'V1254748'].shape
> (35, 75)
> df[df.CRM_assetID == 'V805722'].shape
> (45, 75)
> df[df.CRM_assetID == 'V1105400'].shape
> (34, 75)
> 
> 
> len(df.CRM_assetID.cat.categories.isin(['V1254748', 'V805722', 'V1105400']))
> 
> Why this len is not equal to 114 (35 + 45 + 34)?
> 
> Regards.

I forgot to copy result of:

len(df.CRM_assetID.cat.categories.isin(['V1254748', 'V805722', 'V1105400'])) 

which is 55418.
-- 
https://mail.python.org/mailman/listinfo/python-list


Pandas cat.categories.isin list, is this a bug?

2018-05-14 Thread zljubisic
Hi,

I have dataframe with CRM_assetID column as category dtype:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1435952 entries, 0 to 1435951
Data columns (total 75 columns):
startTime    1435952 non-null object
CRM_assetID  1435952 non-null category

searching a dataframe for each of three categories:

df[df.CRM_assetID == 'V1254748'].shape
(35, 75)
df[df.CRM_assetID == 'V805722'].shape
(45, 75)
df[df.CRM_assetID == 'V1105400'].shape
(34, 75)


len(df.CRM_assetID.cat.categories.isin(['V1254748', 'V805722', 'V1105400']))

Why this len is not equal to 114 (35 + 45 + 34)?

Regards.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: object types, mutable or not?

2018-05-14 Thread Steven D'Aprano
On Sun, 13 May 2018 13:02:01 -0700, Mike McClain wrote:

[...]
> It appears to me as if an object's type is totally mutable and solely
> dependant on assignment.
> 
> >>> obj = 'a1b2'
> >>> type (obj)
> 
> >>> obj = list(obj)
> >>> type (obj)
> 

> At what level does my understanding break down?

Mistaking a name (which refers to an object) for the object itself.

Five years ago, the President of the United States of America, or POTUS 
for short, referred to Barack Obama. Today, it refers to Donald Trump. 
This didn't happen by mutating a single person (an object) from a 
youngish black-skinned man to an oldish orange-skinned man. It happened 
by re-assigning the name from Obama to Trump.

Likewise when you re-assign a variable name from one value (an object) to 
another, the original object doesn't change. You just make the name refer 
to a different object, while the first goes on its merry way (probably to 
be collected by the garbage collector and the memory reclaimed).
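Mike's session can be extended to make the distinction visible. The sketch below keeps a second name bound to the original object, then rebinds `obj`. The name `original` is only for illustration, and the type outputs shown are Python 3's.

```python
obj = 'a1b2'
original = obj           # a second name bound to the same str object
print(obj is original)   # True: two names, one object

obj = list(obj)          # rebinds 'obj' to a new list; the str is untouched
print(type(original))    # <class 'str'>
print(type(obj))         # <class 'list'>
print(obj is original)   # False: the names now refer to different objects
print(original)          # a1b2
```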


-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: random.choices() Suggest that the code confirm that cum_weights sequence is in ascending order

2018-05-14 Thread Paul Moore
The problem is that supplying cum_weights allows the code to run in
O(log n) by using bisection. This is significantly faster on large
populations. Adding a test that the cumulative weights are
nondecreasing would add an O(n) step to the code.

So while I understand the OP's problem, I don't think it's soluble
without making the cum_weights argument useless in practice. Better
documentation might be worthwhile (although I don't personally find
the current docs confusing, so suggestions for improvements would be
helpful).

Paul


On 14 May 2018 at 12:36, Steven D'Aprano
 wrote:
> Hi Paul, and welcome!
>
> On Sun, 13 May 2018 17:48:47 -0700, Paul wrote:
>
>> Hi,
>>   I just learned how to use random.choices().
> [...]
>> Consequently, I specified 'cum_weights' with a sequence which wasn't in
>> ascending order.  I got back k results but I determined that they
>> weren't correct (eg, certain population values were never returned).
>>
>>   Since the non-ascending sequence, which I had supplied, could not
>> possibly be valid input, why isn't this checked (and an error returned)?
>>  Returning incorrect results (which could be hard to spot as being
>> incorrect) is much more dangerous.  Also, checking that the list is in
>> ascending order need only be done once, and seems like it would be
>> inexpensive.
>
> Sounds like a reasonable feature request to me.
>
>
> https://bugs.python.org/issue33494
>
>
>
> --
> Steve
>
> --
> https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: random.choices() Suggest that the code confirm that cum_weights sequence is in ascending order

2018-05-14 Thread Steven D'Aprano
Hi Paul, and welcome!

On Sun, 13 May 2018 17:48:47 -0700, Paul wrote:

> Hi,
>   I just learned how to use random.choices().  
[...]
> Consequently, I specified 'cum_weights' with a sequence which wasn't in
> ascending order.  I got back k results but I determined that they
> weren't correct (eg, certain population values were never returned).
> 
>   Since the non-ascending sequence, which I had supplied, could not
> possibly be valid input, why isn't this checked (and an error returned)?
>  Returning incorrect results (which could be hard to spot as being
> incorrect) is much more dangerous.  Also, checking that the list is in
> ascending order need only be done once, and seems like it would be
> inexpensive.

Sounds like a reasonable feature request to me.
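The symptom the OP describes is easy to reproduce with made-up data. With cum_weights=[5, 1, 10], which is not in ascending order, the bisection that random.choices performs (as discussed in this thread) can only land on index 0 or index 2. As a result, 'b' is never returned and no error is raised. The exact outcome depends on that bisection-based implementation.

```python
import random

population = ["a", "b", "c"]
bad_cum_weights = [5, 1, 10]  # not ascending, so not valid cumulative weights

rng = random.Random(0)  # seeded so the run is repeatable
results = rng.choices(population, cum_weights=bad_cum_weights, k=1000)
# 'b' never appears in the results, and no exception is raised
print(sorted(set(results)))
```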


https://bugs.python.org/issue33494



-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: f-string anomaly

2018-05-14 Thread Thomas Jollans
On 2018-05-14 04:08, Terry Reedy wrote:
> On 5/13/2018 3:22 PM, Ken Kundert wrote:
> 
> Please do not double post.
> 
>> I am seeing an unexpected difference between the behavior of the string
>> format method and f-strings.
> 
> Read
> https://docs.python.org/3/reference/lexical_analysis.html#formatted-string-literals
> carefully.
> 
>> Here is an example:
>>
>>     import sys, os
>>     from inform import error, os_error
>>
>>     class mydict(dict):
>>         def __format__(self, template):
>>             print('Template:', template)
>>             return ', '.join(template.format(v, k=k, v=v) for k, v in self.items())
>>
>>
>>     d = mydict(bob='239-8402', ted='371-8567', carol='891-5810', alice='552-2219')
>>
>>     print('Using format():')
>>     print('Email: {0:{{k}}: {{v}}}'.format(d))
>>     print()
>>     print('Using f-string:')
>>     print(f'Email: {d:{{k}} {{v}}}')
>>     print()
>>     print('Using f-string:')
>>     print(f'Email: {d:{{k}} {{v}}}', k=6, v=9)
>>
>>
>> It generates the following response:
>>
>>     Using format():
>>     Template: {k}: {v}
>>     Email: bob: 239-8402, ted: 371-8567, carol: 891-5810, alice: 552-2219
>>
>>     Using f-string:
>>     Traceback (most recent call last):
>>       File "tryit", line 18, in <module>
>>         print(f'Email: {d:{{k}} {{v}}}')
>>     NameError: name 'k' is not defined
> 
> This is what I expected.
> 
>> Essentially I am using a format string as the template that indicates
>> how to format each member of a dictionary, {{k}} should interpolate the
>> key and {{v}} interpolates the value.  This format string is embedded
>> inside another format string, so the braces are doubled up so that they
>> will be ignored by the outer format string.
> 
> "The parts of the string outside curly braces are treated literally,
> except that any doubled curly braces '{{' or '}}' are replaced with the
> corresponding single curly brace. " note 'outside'

It only accidentally works with the format method. It, too, does not
support escaped curly brackets in format specifiers, but they'll slip
through IFF all of them are matched:

>>> d = mydict(bob='239-8402', ted='371-8567', carol='891-5810',
... alice='552-2219')
>>> 'Email: {0:{{k}}: {{v}}}'.format(d)
Template: {k}: {v}
'Email: bob: 239-8402, ted: 371-8567, carol: 891-5810, alice: 552-2219'
>>> 'Email: {0:{{k}}  {{v}}}'.format(d)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: unmatched '{' in format spec
>>> 'Email: {0:{{k}}  {{v}}}'.format(d)
Template: {k} {{}} {v}
'Email: bob {} 239-8402, ted {} 371-8567, carol {} 891-5810, alice {} 552-2219'
>>>
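One way to sidestep the escaping problem in both forms is to keep the per-item template in a variable and pass it as a nested replacement field in the format spec, so that no braces need escaping at all. This is a sketch using a simplified version of Ken's `mydict`: the unused `inform` import is dropped, the template is applied with keyword arguments only, and the variable name `spec` is made up.

```python
class mydict(dict):
    def __format__(self, template):
        # 'template' is the format spec, e.g. '{k}: {v}'
        return ', '.join(template.format(k=k, v=v) for k, v in self.items())


d = mydict(bob='239-8402', ted='371-8567')
spec = '{k}: {v}'  # per-item template, kept out of the outer string

print('Email: {0:{1}}'.format(d, spec))  # Email: bob: 239-8402, ted: 371-8567
print(f'Email: {d:{spec}}')              # Email: bob: 239-8402, ted: 371-8567
```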


> 
>> This idea seems to work okay when using the format() method. You can see
>> I added a print statement inside __format__ that shows that the method
>> is being called.
>>
>> However, trying the same idea with f-strings results in a NameError.  It
>> appears that the escaping does not work when used within the template.
>> It appears the error occurs before __format__ is called (there is no
>> output from the print function).
>>
>> Does anybody know why the format() method would work in this case but
>> the f-string would not?
> 
> All names in the expression are resolved in the local namespace of the f
> string.  There are other differences.  Nesting can only be one level deep.
> 
>> Is this a bug in f-strings?
> 
> Not to me.
> 
> 

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: f-string anomaly

2018-05-14 Thread Lele Gaifax
Ken Kundert  writes:

> I tried adding k and v to the local namespace:
>
> ...
> k = 6
> v = 9
> print(f'Email: {d:{{k}} {{v}}}')
>
> I still got:
>
> NameError: name 'k' is not defined

This is not what I get:

Python 3.6.5 (default, May 11 2018, 13:30:17) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> k=1
>>> v=2
>>> print(f'this is {{k}} and {{v}}')
this is {k} and {v}
>>> print(f'this is {k} and {v}')
this is 1 and 2
>>> print(f'Email: {d:{{k}} {{v}}}')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'd' is not defined

ciao, lele.
-- 
nickname: Lele Gaifax | When I live on what I thought yesterday
real: Emanuele Gaifas | I will begin to fear those who copy me.
l...@metapensiero.it  | -- Fortunato Depero, 1929.

-- 
https://mail.python.org/mailman/listinfo/python-list