from:"Jen Kris via Python-list"

How to write list of integers to file with struct.pack_into?

2023-10-03 Thread Jen Kris via Python-list

My previous message just went up -- sorry for the mangled formatting.  Here it 
is properly formatted:

I want to write a list of 64-bit integers to a binary file.  Every example I 
have seen in my research converts it to .txt, but I want it in binary.  I wrote 
this code, based on some earlier work I have done:

    buf = bytes((len(qs_array)) * 8)
    for offset in range(len(qs_array)):
        item_to_write = bytes(qs_array[offset])
        struct.pack_into(buf, "

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: How to write list of integers to file with struct.pack_into?

2023-10-02 Thread Jen Kris via Python-list

Dieter, thanks for your comment that:

* In your code, `offset` is `0`, `1`, `2`, ...
but it should be `0 *8`, `1 * 8`, `2 * 8`, ...

But you concluded with essentially the same solution proposed by MRAB, so that 
would obviate the need to write item by item because it writes the whole buffer 
at once.  

Thanks for your help.  


Oct 2, 2023, 17:47 by die...@handshake.de:

> Jen Kris wrote at 2023-10-2 00:04 +0200:
> >I want to write a list of 64-bit integers to a binary file.  Every example I 
> >have seen in my research converts it to .txt, but I want it in binary.  I 
> >wrote this code, based on some earlier work I have done:
> >
> >buf = bytes((len(qs_array)) * 8)
> >for offset in range(len(qs_array)):
> >    item_to_write = bytes(qs_array[offset])
> >    struct.pack_into(buf,"
> >
> >But I get the error "struct.error: embedded null character."
>
> You made a lot of errors:
>
>  * the signature of `struct.pack_into` is
>  `(format, buffer, offset, v1, v2, ...)`.
>  Especially: `format` is the first, `buffer` the second argument
>
>  * In your code, `offset` is `0`, `1`, `2`, ...
>  but it should be `0 *8`, `1 * 8`, `2 * 8`, ...
>
>  * The `vi` should be something which fits with the format:
>  integers in your case. But you pass bytes.
>
> Try `struct.pack_into(" instead of your loop.
>
>
> Next time: carefully read the documentation and think carefully
> about the types involved.
>

-- 
https://mail.python.org/mailman/listinfo/python-list

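Dieter's three points can be put together in a short sketch of the corrected loop. This is only an illustration: the sample values in `qs_array` are made up, and a `bytearray` is used because `pack_into` needs a writable buffer (the point MRAB makes elsewhere in this thread).

```python
import struct

qs_array = [1, 2**40, 2**64 - 1]  # hypothetical sample data

# pack_into writes in place, so the buffer must be mutable (bytearray, not bytes).
buf = bytearray(len(qs_array) * 8)

for index, value in enumerate(qs_array):
    # Argument order is (format, buffer, offset, v1, ...).
    # The offset is counted in bytes, so item number `index` starts at index * 8.
    # The value is passed as an int, matching the "Q" (unsigned 64-bit) format.
    struct.pack_into("<Q", buf, index * 8, value)

print(buf.hex())
```

The `<` in the format selects little-endian byte order with no padding, so each item occupies exactly 8 bytes.
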
Re: How to write list of integers to file with struct.pack_into?

2023-10-02 Thread Jen Kris via Python-list

Thanks very much, MRAB.  I just tried that and it works.  What frustrated me is 
that every research example I found writes integers as strings.  That works -- 
sort of -- but it requires re-casting each string to integer when reading the 
file.  If I'm doing binary work I don't want the extra overhead, and it's more 
difficult yet if I'm using the Python integer output in a C program.  Your 
solution solves those problems.  



Oct 2, 2023, 17:11 by python-list@python.org:

> On 2023-10-01 23:04, Jen Kris via Python-list wrote:
>
>>
>> I want to write a list of 64-bit integers to a binary file. Every example I 
>> have seen in my research converts it to .txt, but I want it in binary.  I 
>> wrote this code, based on some earlier work I have done:
>>
>> buf = bytes((len(qs_array)) * 8)
>> for offset in range(len(qs_array)):
>>     item_to_write = bytes(qs_array[offset])
>>     struct.pack_into(buf,"
>>
>> But I get the error "struct.error: embedded null character."
>>
>> Maybe there's a better way to do this?
>>
>>
> You can't pack into a 'bytes' object because it's immutable.
>
> The simplest solution I can think of is:
>
> buf = struct.pack("<%sQ" % len(qs_array), *qs_array)

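For completeness, here is a sketch of the full round trip using MRAB's one-call approach: pack the whole list, write it to a file, and read it back. The sample values and the file name `qs_array.bin` are made up for illustration, and the file is placed in a temporary directory so the example does not touch anything else.

```python
import os
import struct
import tempfile

qs_array = [3, 1_000_000_007, 2**63]  # hypothetical sample data

# One format string for the whole list, e.g. "<3Q" for three values.
fmt = "<%dQ" % len(qs_array)
buf = struct.pack(fmt, *qs_array)

path = os.path.join(tempfile.mkdtemp(), "qs_array.bin")  # hypothetical file name
with open(path, "wb") as f:  # "wb": binary mode, no text encoding involved
    f.write(buf)

# Reading back: the item count follows from the file size (8 bytes per item).
with open(path, "rb") as f:
    data = f.read()

count = len(data) // 8
restored = list(struct.unpack("<%dQ" % count, data))
print(restored)
```

With the `<Q` format each value is stored as 8 bytes, little-endian and unsigned, with no padding between items, which is the layout a C program would need to assume when reading the file.
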
How to write list of integers to file with struct.pack_into?

2023-10-02 Thread Jen Kris via Python-list


I want to write a list of 64-bit integers to a binary file.  Every example I have 
seen in my research converts it to .txt, but I want it in binary.  I wrote this 
code, based on some earlier work I have done:

    buf = bytes((len(qs_array)) * 8)
    for offset in range(len(qs_array)):
        item_to_write = bytes(qs_array[offset])
        struct.pack_into(buf,"

Re: How does a method of a subclass become a method of the base class?

2023-03-27 Thread Jen Kris via Python-list


Thanks to everyone who answered this question.  Your answers have helped a lot. 
 

Jen


Mar 27, 2023, 14:12 by m...@wichmann.us:

> On 3/26/23 17:53, Jen Kris via Python-list wrote:
>
>> I’m asking all these questions because I have worked in a procedural style 
>> for many years, with class work limited to only simple classes, but now I’m 
>> studying classes in more depth. The three answers I have received today, 
>> including yours, have helped a lot.
>>
>
> Classes in Python don't work quite like they do in many other languages.
>
> You may find a lightbulb if you listen to Raymond Hettinger talk about them:
>
> https://dailytechvideo.com/raymond-hettinger-pythons-class-development-toolkit/
>
> I'd also advise that benchmarks often do very strange things to set up the 
> scenario they're trying to test, so a benchmark sure wouldn't be my first place 
> to look when learning a new piece of Python. I don't know if it was the first 
> place, but I thought this was worth a mention.
>
>

Re: How does a method of a subclass become a method of the base class?

2023-03-26 Thread Jen Kris via Python-list


Cameron, 

Thanks for your reply.  You are correct about the class definition lines – e.g. 
class EqualityConstraint(BinaryConstraint).  I didn’t post all of the code 
because this program is over 600 lines long.  It's DeltaBlue in the Python 
benchmark suite.  

I’ve done some more work since this morning, and now I see what’s happening.  
But it gave rise to another question, which I’ll ask at the end. 

The call chain starts at

    EqualityConstraint(prev, v, Strength.REQUIRED) 

The class EqualityConstraint is a subclass of BinaryConstraint.  The entire 
class code is:

    class EqualityConstraint(BinaryConstraint):
        def execute(self):
            self.output().value = self.input().value

Because EqualityConstraint is a subclass of BinaryConstraint, the init method 
of BinaryConstraint is called first.  During that initialization (I showed the 
call chain in my previous message), it calls choose_method.  When I inspect the 
code at "self.choose_method(mark):" in PyCharm, it shows:

    >

As EqualityConstraint is a subclass of BinaryConstraint it has bound the choose 
method from BinaryConstraint, apparently during the BinaryConstraint init 
process, and that’s the one it uses.  So that answers my original question. 

But that brings up a new question.  I can create a class instance with x = 
BinaryConstraint(), but what happens when I have a line like 
"EqualityConstraint(prev, v, Strength.REQUIRED)"? Is it because the only method 
of EqualityConstraint is execute(self)?  Is execute a special function like a 
class __init__?  I’ve done research on that but I haven’t found an answer. 

I’m asking all these questions because I have worked in a procedural style for 
many years, with class work limited to only simple classes, but now I’m 
studying classes in more depth. The three answers I have received today, 
including yours, have helped a lot. 

Thanks very much. 

Jen


Mar 26, 2023, 22:45 by c...@cskk.id.au:

> On 26Mar2023 22:36, Jen Kris  wrote:
>
>> At the final line it calls "satisfy" in the Constraint class, and that line 
>> calls choose_method in the BinaryConstraint class.  Just as Peter Holzer 
>> said, it requires a call to "satisfy." 
>>
>> My only remaining question is, did it select the choose_method in the 
>> BinaryConstraint class instead of the choose_method in the UrnaryConstraint 
>> class because of "super(BinaryConstraint, self).__init__(strength)" in step 
>> 2 above? 
>>
>
> Basically, no.
>
> You've omitted the "class" lines of the class definitions, and they define 
> the class inheritance, _not_ "__init__". The "__init__" method just 
> initialises the state of the new object (which has already been created). 
> The:
>
>  super(BinaryConstraint, self).__init__(strength)
>
> line simply calls the appropriate superclass "__init__" with the "strength" 
> parameter to do that aspect of the initialisation.
>
> You haven't cited the line which calls the "choose_method" method, but I'm 
> imagining it calls "choose_method" like this:
>
>  self.choose_method(...)
>
> That searches for the "choose_method" method based on the method resolution 
> order of the object "self". So if "self" was an instance of 
> "EqualityConstraint", and I'm guessing about its class definition, assuming 
> this:
>
>  class EqualityConstraint(BinaryConstraint):
>
> Then a call to "self.choose_method" would look for a "choose_method" method 
> first in the EqualityConstraint class and then via the BinaryConstraint 
> class. I'm also assuming UrnaryConstraint is not in that class ancestry i.e. 
> not an ancestor of BinaryConstraint, for example.
>
> The first method found is used.
>
> In practice, when you define a class like:
>
>  class EqualityConstraint(BinaryConstraint):
>
> the complete class ancestry (the additional classes from which BinaryConstraint 
> inherits) gets flattened into a "method resolution order" list of classes to 
> inspect in order, and that is stored as the ".__mro__" field on the new class 
> (EqualityConstraint). You can look at it directly as 
> "EqualityConstraint.__mro__".
>
> So looking up:
>
>  self.choose_method()
>
> looks for a "choose_method" method on the classes in "type(self).__mro__".
>
> Cheers,
> Cameron Simpson 

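Cameron's description of the method resolution order can be checked with a small sketch. The classes below are heavily simplified stand-ins that only borrow the names from the thread; they are not the DeltaBlue code. The sketch also bears on the question about `EqualityConstraint(prev, v, Strength.REQUIRED)`: `execute` is an ordinary method, and because `EqualityConstraint` defines no `__init__` of its own, the lookup through the MRO finds `BinaryConstraint.__init__`.

```python
class Constraint:
    def __init__(self, strength):
        self.strength = strength

    def satisfy(self, mark):
        # Looked up on type(self).__mro__, not on Constraint alone.
        return self.choose_method(mark)


class BinaryConstraint(Constraint):
    def __init__(self, v1, v2, strength):
        super().__init__(strength)
        self.v1 = v1
        self.v2 = v2

    def choose_method(self, mark):
        return "BinaryConstraint.choose_method"


class EqualityConstraint(BinaryConstraint):
    # No __init__ here, so BinaryConstraint.__init__ is inherited.
    def execute(self):
        return "EqualityConstraint.execute"


# Placeholder string arguments stand in for prev, v and Strength.REQUIRED.
c = EqualityConstraint("prev", "v", "REQUIRED")
mro_names = [cls.__name__ for cls in type(c).__mro__]
print(mro_names)
print(c.satisfy(0))
```
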
Re: How does a method of a subclass become a method of the base class?

2023-03-26 Thread Jen Kris via Python-list


Based on your explanations, I went through the call chain and now I understand 
better how it works, but I have a follow-up question at the end.    

This code comes from the DeltaBlue benchmark in the Python benchmark suite. 

1
The call chain starts in a non-class program with the following call:

EqualityConstraint(prev, v, Strength.REQUIRED)

2
EqualityConstraint is a subclass of BinaryConstraint, so first it calls the 
__init__ method of BinaryConstraint:

 def __init__(self, v1, v2, strength):
    super(BinaryConstraint, self).__init__(strength)
    self.v1 = v1
    self.v2 = v2
    self.direction = Direction.NONE
    self.add_constraint()

3
At the final line shown above it calls add_constraint in the Constraint class, 
the base class of BinaryConstraint:

  def add_constraint(self):
    global planner
    self.add_to_graph()
    planner.incremental_add(self)

4
At planner.incremental_add it calls incremental_add in the Planner class 
because planner is a global instance of the Planner class: 

    def incremental_add(self, constraint):
        mark = self.new_mark()
        overridden = constraint.satisfy(mark)

At the final line it calls "satisfy" in the Constraint class, and that line 
calls choose_method in the BinaryConstraint class.  Just as Peter Holzer said, 
it requires a call to "satisfy." 

My only remaining question is, did it select the choose_method in the 
BinaryConstraint class instead of the choose_method in the UrnaryConstraint 
class because of "super(BinaryConstraint, self).__init__(strength)" in step 2 
above? 

Thanks for helping me clarify that. 

Jen



Mar 26, 2023, 18:55 by hjp-pyt...@hjp.at:

> On 2023-03-26 19:43:44 +0200, Jen Kris via Python-list wrote:
>
>> The base class:
>>
>>
>> class Constraint(object):
>>
> [...]
>
>> def satisfy(self, mark):
>>     global planner
>>     self.choose_method(mark)
>>
>> The subclass:
>>
>> class UrnaryConstraint(Constraint):
>>
> [...]
>
>>     def choose_method(self, mark):
>>         if self.my_output.mark != mark and \
>>                 Strength.stronger(self.strength, self.my_output.walk_strength):
>>             self.satisfied = True
>>         else:
>>             self.satisfied = False
>>
>> The base class Constraint doesn’t have a "choose_method" class method,
>> but it’s called as self.choose_method(mark) on the final line of
>> Constraint shown above. 
>>
>> My question is:  what makes "choose_method" a method of the base
>> class,
>>
>
> Nothing. choose_method isn't a method of the base class.
>
>> called as self.choose_method instead of
>> UrnaryConstraint.choose_method?  Is it super(UrnaryConstraint,
>> self).__init__(strength) or just the fact that Constraint is its base
>> class? 
>>
>
> This works only if satisfy() is called on a subclass of Constraint which
> actually implements this method.
>
> If you do something like
>
> x = UrnaryConstraint()
> x.satisfy(whatever)
>
> Then x is a member of class UrnaryConstraint and will have a
> choose_method() method which can be called.
>
>
>> Also, this program also has a class BinaryConstraint that is also a
>> subclass of Constraint and it also has a choose_method class method
>> that is similar but not identical:
>>
> ...
>
>> When called from Constraint, it uses the one at UrnaryConstraint.  How
>> does it know which one to use? 
>>
>
> By inspecting self. If you call x.satisfy() on an object of class
> UrnaryConstraint, then self.choose_method will be the choose_method from
> UrnaryConstraint. If you call it on an object of class BinaryConstraint,
> then self.choose_method will be the choose_method from BinaryConstraint.
>
>  hp
>
> PS: Pretty sure there's one "r" too many in UrnaryConstraint.
>
> -- 
>  _  | Peter J. Holzer| Story must make more sense than reality.
> |_|_) ||
> | |   | h...@hjp.at |-- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |   challenge!"
>


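Peter's point, that the method is chosen by inspecting `self`, can be reproduced in a few lines. The classes here are stripped-down stand-ins with made-up return values, not the benchmark code.

```python
class Constraint:
    def satisfy(self, mark):
        # Constraint defines no choose_method; the lookup starts at type(self).
        return self.choose_method(mark)


class UrnaryConstraint(Constraint):
    def choose_method(self, mark):
        return "unary"


class BinaryConstraint(Constraint):
    def choose_method(self, mark):
        return "binary"


# The same satisfy() code dispatches to a different choose_method per object.
results = [c.satisfy(1) for c in (UrnaryConstraint(), BinaryConstraint())]
print(results)

# On a plain Constraint there is nothing to find, so the call fails.
try:
    Constraint().satisfy(1)
except AttributeError as exc:
    base_error = type(exc).__name__
    print(base_error, exc)
```
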
Re: How does a method of a subclass become a method of the base class?

2023-03-26 Thread Jen Kris via Python-list

Thanks to Richard Damon and Peter Holzer for your replies.  I'm working through 
the call chain to understand better so I can post a followup question if 
needed.  

Thanks again.

Jen


Mar 26, 2023, 19:21 by rich...@damon-family.org:

> On 3/26/23 1:43 PM, Jen Kris via Python-list wrote:
>
>> The base class:
>>
>>
>> class Constraint(object):
>>
>>     def __init__(self, strength):
>>         super(Constraint, self).__init__()
>>         self.strength = strength
>>
>>     def satisfy(self, mark):
>>         global planner
>>         self.choose_method(mark)
>>
>> The subclass:
>>
>> class UrnaryConstraint(Constraint):
>>
>>     def __init__(self, v, strength):
>>         super(UrnaryConstraint, self).__init__(strength)
>>         self.my_output = v
>>         self.satisfied = False
>>         self.add_constraint()
>>
>>     def choose_method(self, mark):
>>         if self.my_output.mark != mark and \
>>                 Strength.stronger(self.strength, self.my_output.walk_strength):
>>             self.satisfied = True
>>         else:
>>             self.satisfied = False
>>
>> The base class Constraint doesn’t have a "choose_method" class method, but 
>> it’s called as self.choose_method(mark) on the final line of Constraint 
>> shown above.
>>
>> My question is:  what makes "choose_method" a method of the base class, 
>> called as self.choose_method instead of UrnaryConstraint.choose_method?  Is 
>> it super(UrnaryConstraint, self).__init__(strength) or just the fact that 
>> Constraint is its base class?
>>
>> Also, this program also has a class BinaryConstraint that is also a subclass 
>> of Constraint and it also has a choose_method class method that is similar 
>> but not identical:
>>
>> def choose_method(self, mark):
>>     if self.v1.mark == mark:
>>         if (self.v2.mark != mark and
>>                 Strength.stronger(self.strength, self.v2.walk_strength)):
>>             self.direction = Direction.FORWARD
>>         else:
>>             self.direction = Direction.BACKWARD
>>
>> When called from Constraint, it uses the one at UrnaryConstraint.  How does 
>> it know which one to use?
>>
>> Thanks,
>>
>> Jen
>>
>
> Perhaps the key point to remember is that when looking up the methods on an 
> object, those methods are part of the object as a whole, not particularly 
> "attached" to a given class. When creating the subclass typed object, first 
> the most base class part is built, and all the methods of that class are put 
> into the object, then the next level, and so on, and if a duplicate method is 
> found, it just overwrites the connection. Then when the object is used, we 
> see if there is a method by that name to use, so methods in the base can find 
> methods in subclasses to use.
>
> Perhaps a more modern approach would be to use the concept of an "abstract 
> base" which allows the base to indicate that a derived class needs to define 
> certain abstract methods, (If you need that sort of support, not defining a 
> method might just mean the subclass doesn't support some optional behavior 
> defined by the base)
>
> -- 
> Richard Damon
>

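Richard's "abstract base" suggestion could look roughly like the sketch below, using the standard `abc` module. The class bodies are simplified placeholders, and `Incomplete` is a hypothetical subclass added only to show what happens when the abstract method is left undefined.

```python
from abc import ABC, abstractmethod


class Constraint(ABC):
    def satisfy(self, mark):
        return self.choose_method(mark)

    @abstractmethod
    def choose_method(self, mark):
        """Subclasses must decide how the constraint is satisfied."""


class UrnaryConstraint(Constraint):
    def choose_method(self, mark):
        return mark != 0  # placeholder logic


class Incomplete(Constraint):  # hypothetical: forgets to define choose_method
    pass


ok = UrnaryConstraint().satisfy(1)

# Instantiating a class that still has abstract methods raises TypeError,
# so the omission is reported at construction time rather than at call time.
try:
    Incomplete()
except TypeError:
    rejected = True
else:
    rejected = False

print(ok, rejected)
```
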
How does a method of a subclass become a method of the base class?

2023-03-26 Thread Jen Kris via Python-list


The base class:


class Constraint(object):

    def __init__(self, strength):
        super(Constraint, self).__init__()
        self.strength = strength

    def satisfy(self, mark):
        global planner
        self.choose_method(mark)

The subclass:

class UrnaryConstraint(Constraint):

    def __init__(self, v, strength):
        super(UrnaryConstraint, self).__init__(strength)
        self.my_output = v
        self.satisfied = False
        self.add_constraint()

    def choose_method(self, mark):
        if self.my_output.mark != mark and \
                Strength.stronger(self.strength, self.my_output.walk_strength):
            self.satisfied = True
        else:
            self.satisfied = False

The base class Constraint doesn’t have a "choose_method" class method, but it’s 
called as self.choose_method(mark) on the final line of Constraint shown above. 

My question is:  what makes "choose_method" a method of the base class, called 
as self.choose_method instead of UrnaryConstraint.choose_method?  Is it 
super(UrnaryConstraint, self).__init__(strength) or just the fact that 
Constraint is its base class? 

Also, this program also has a class BinaryConstraint that is also a subclass of 
Constraint and it also has a choose_method class method that is similar but not 
identical:

def choose_method(self, mark):
    if self.v1.mark == mark:
        if (self.v2.mark != mark and
                Strength.stronger(self.strength, self.v2.walk_strength)):
            self.direction = Direction.FORWARD
        else:
            self.direction = Direction.BACKWARD

When called from Constraint, it uses the one at UrnaryConstraint.  How does it 
know which one to use? 

Thanks,

Jen



Re: How to escape strings for re.finditer?

2023-02-28 Thread Jen Kris via Python-list


I wrote my previous message before reading this.  Thank you for the test you 
ran -- it answers the question of performance.  You show that re.finditer is 
30x faster, so that certainly recommends that over a simple loop, which 
introduces looping overhead.  


Feb 28, 2023, 05:44 by li...@tompassin.net:

> On 2/28/2023 4:33 AM, Roel Schroeven wrote:
>
>> Op 28/02/2023 om 3:44 schreef Thomas Passin:
>>
>>> On 2/27/2023 9:16 PM, avi.e.gr...@gmail.com wrote:
>>>
>>>> And, just for fun, since there is nothing wrong with your code, this minor 
>>>> change is terser:
>>>>
>>>> >>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>>>> >>> for match in re.finditer(re.escape('abc_degree + 1'), example):
>>>> ...     print(match.start(), match.end())
>>>> ...
>>>> 4 18
>>>> 26 40
>>>
>>> Just for more fun :) -
>>>
>>> Without knowing how general your expressions will be, I think the following 
>>> version is very readable, certainly more readable than regexes:
>>>
>>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>>> KEY = 'abc_degree + 1'
>>>
>>> for i in range(len(example)):
>>>     if example[i:].startswith(KEY):
>>>         print(i, i + len(KEY))
>>> # prints:
>>> 4 18
>>> 26 40
>>>
>> I think it's often a good idea to use a standard library function instead of 
>> rolling your own. The issue becomes less clear-cut when the standard library 
>> doesn't do exactly what you need (as here, where re.finditer() uses regular 
>> expressions while the use case only uses simple search strings). Ideally 
>> there would be a str.finditer() method we could use, but in the absence of 
>> that I think we still need to consider using the almost-but-not-quite 
>> fitting re.finditer().
>>
>> Two reasons:
>>
>> (1) I think it's clearer: the name tells us what it does (though of course 
>> we could solve this in a hand-written version by wrapping it in a suitably 
>> named function).
>>
>> (2) Searching for a string in another string, in a performant way, is not as 
>> simple as it first appears. Your version works correctly, but slowly. In 
>> some situations it doesn't matter, but in other cases it will. For better 
>> performance, string searching algorithms jump ahead either when they found a 
>> match or when they know for sure there isn't a match for some time (see e.g. 
>> the Boyer–Moore string-search algorithm). You could write such a more 
>> efficient algorithm, but then it becomes more complex and more error-prone. 
>> Using a well-tested existing function becomes quite attractive.
>>
>
> Sure, it all depends on what the real task will be.  That's why I wrote 
> "Without knowing how general your expressions will be". For the example 
> string, it's unlikely that speed will be a factor, but who knows what target 
> strings and keys will turn up in the future?
>
>> To illustrate the difference in performance, I did a simple test (using the 
>> paragraph above as test text):
>>
>>      import re
>>      import timeit
>>
>>      def using_re_finditer(key, text):
>>          matches = []
>>          for match in re.finditer(re.escape(key), text):
>>              matches.append((match.start(), match.end()))
>>          return matches
>>
>>
>>      def using_simple_loop(key, text):
>>          matches = []
>>          for i in range(len(text)):
>>              if text[i:].startswith(key):
>>                  matches.append((i, i + len(key)))
>>          return matches
>>
>>
>>      CORPUS = """Searching for a string in another string, in a performant way, is
>>      not as simple as it first appears. Your version works correctly, but slowly.
>>      In some situations it doesn't matter, but in other cases it will. For better
>>      performance, string searching algorithms jump ahead either when they found a
>>      match or when they know for sure there isn't a match for some time (see e.g.
>>      the Boyer–Moore string-search algorithm). You could write such a more
>>      efficient algorithm, but then it becomes more complex and more error-prone.
>>      Using a well-tested existing function becomes quite attractive."""
>>      KEY = 'in'
>>      print('using_simple_loop:',
>>            timeit.repeat(stmt='using_simple_loop(KEY, CORPUS)',
>>                          globals=globals(), number=1000))
>>      print('using_re_finditer:',
>>            timeit.repeat(stmt='using_re_finditer(KEY, CORPUS)',
>>                          globals=globals(), number=1000))
>>
>> This does 5 runs of 1000 repetitions each, and reports the time in seconds 
>> for each of those runs.
>> Result on my machine:
>>
>>      using_simple_loop: [0.1395295020792, 0.1306313000456, 0.1280345001249,
>>                          0.1318618002423, 0.1308461032626]
>>      using_re_finditer: [0.00386140005233, 0.00406190124297, 0.00347899970256,
>>                          0.00341310216218, 0.003732001273]
>>
>> We find that in this test re.finditer() is more than 30 times faster 
>> (despite the overhead of regular expressions).
>>
>> While speed isn't everything in programming,

Re: How to escape strings for re.finditer?

2023-02-28 Thread Jen Kris via Python-list


Using str.startswith is a cool idea in this case.  But is it better than regex 
for performance or reliability?  Regex syntax is not a model of simplicity, but 
in my simple case it's not too difficult.  


Feb 27, 2023, 18:52 by li...@tompassin.net:

> On 2/27/2023 9:16 PM, avi.e.gr...@gmail.com wrote:
>
>> And, just for fun, since there is nothing wrong with your code, this minor 
>> change is terser:
>>
>> >>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>> >>> for match in re.finditer(re.escape('abc_degree + 1'), example):
>> ...     print(match.start(), match.end())
>> ...
>> 4 18
>> 26 40
>>
>
> Just for more fun :) -
>
> Without knowing how general your expressions will be, I think the following 
> version is very readable, certainly more readable than regexes:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> KEY = 'abc_degree + 1'
>
> for i in range(len(example)):
>     if example[i:].startswith(KEY):
>         print(i, i + len(KEY))
> # prints:
> 4 18
> 26 40
>
> If you may have variable numbers of spaces around the symbols, OTOH, the 
> whole situation changes and then regexes would almost certainly be the best 
> approach.  But the regular expression strings would become harder to read.

RE: How to escape strings for re.finditer?

2023-02-28 Thread Jen Kris via Python-list

The code I sent is correct, and it runs here.  Maybe you received it with a 
carriage return removed, but on my copy after posting, it is correct:

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1')
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

One question:  several people have made suggestions other than regex (not the 
terser regex example you showed below).  Is there a reason why regex is not 
preferred to, for example, a list comp?  Performance?  Reliability?  



  


Feb 27, 2023, 18:16 by avi.e.gr...@gmail.com:

> Jen,
>
> Can you see what SOME OF US see as ASCII text? We can help you better if we 
> get code that can be copied and run as-is.
>
>  What you sent is not terse. It is wrong. It will not run on any python 
> interpreter because you somehow lost a carriage return and indent.
>
> This is what you sent:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> find_string = re.escape('abc_degree + 1') for match in 
> re.finditer(find_string, example):
>  print(match.start(), match.end())
>
> This is code indented properly:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> find_string = re.escape('abc_degree + 1')
> for match in re.finditer(find_string, example):
>     print(match.start(), match.end())
>
> Of course I am sure you wrote and ran code more like the latter version but 
> somewhere in your copy/paste process, 
>
> And, just for fun, since there is nothing wrong with your code, this minor 
> change is terser:
>
> >>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> >>> for match in re.finditer(re.escape('abc_degree + 1'), example):
> ...     print(match.start(), match.end())
> ...
> 4 18
> 26 40
>
> But note that once you use regular expressions (though not in your case), you 
> might match multiple things that are far from identical, such as two 
> repeated words of any kind in any case, including "and and" and "so so", or 
> words that have multiple doubled letters, as in the stereotypical 
> bookkeeper. In those cases, you may want more than offsets: you may also want 
> to show the exact text that matched, or even some characters before and/or 
> after for context.
>
>
> -Original Message-
> From: Python-list  On 
> Behalf Of Jen Kris via Python-list
> Sent: Monday, February 27, 2023 8:36 PM
> To: Cameron Simpson 
> Cc: Python List 
> Subject: Re: How to escape strings for re.finditer?
>
>
> I haven't tested it either but it looks like it would work.  But for this 
> case I prefer the relative simplicity of:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> find_string = re.escape('abc_degree + 1') for match in 
> re.finditer(find_string, example):
>  print(match.start(), match.end())
>
> 4 18
> 26 40
>
> I don't insist on terseness for its own sake, but it's cleaner this way. 
>
> Jen
>
>
> Feb 27, 2023, 16:55 by c...@cskk.id.au:
>
>> On 28Feb2023 01:13, Jen Kris  wrote:
>>
>>> I went to the re module because the specified string may appear more than 
>>> once in the string (in the code I'm writing).
>>>
>>
>> Sure, but writing a `finditer` for plain `str` is pretty easy (untested):
>>
>>  pos = 0
>>  while True:
>>      found = s.find(substring, pos)
>>      if found < 0:
>>          break
>>      start = found
>>      end = found + len(substring)
>>      ... do whatever with start and end ...
>>      pos = end
>>
>> Many people go straight to the `re` module whenever they're looking for 
>> strings. It is often cryptic, error-prone overkill. Just something to keep in 
>> mind.
>>
>> Cheers,
>> Cameron Simpson 

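The case mentioned in the quoted reply above, where a real pattern matches different texts and the offsets alone are not enough, might look like the following sketch. The sample sentence and the doubled-word pattern are made up for illustration.

```python
import re

text = "It was so so easy, and and the bookkeeper agreed."  # hypothetical sample

# \b(\w+)\s+\1\b matches a word followed by the same word again;
# IGNORECASE lets "So so" count as a repeat as well.
doubled_word = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

hits = []
for m in doubled_word.finditer(text):
    # Keep the offsets, the exact matched text, and a little context around it.
    context = text[max(0, m.start() - 5):m.end() + 5]
    hits.append((m.start(), m.end(), m.group(0), context))

for hit in hits:
    print(hit)
```
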
Re: How to escape strings for re.finditer?

2023-02-27 Thread Jen Kris via Python-list


I haven't tested it either but it looks like it would work.  But for this case 
I prefer the relative simplicity of:

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1')
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

4 18
26 40

I don't insist on terseness for its own sake, but it's cleaner this way.  

Jen


Feb 27, 2023, 16:55 by c...@cskk.id.au:

> On 28Feb2023 01:13, Jen Kris  wrote:
>
>> I went to the re module because the specified string may appear more than 
>> once in the string (in the code I'm writing).
>>
>
> Sure, but writing a `finditer` for plain `str` is pretty easy (untested):
>
>  pos = 0
>  while True:
>      found = s.find(substring, pos)
>      if found < 0:
>          break
>      start = found
>      end = found + len(substring)
>      ... do whatever with start and end ...
>      pos = end
>
> Many people go straight to the `re` module whenever they're looking for 
> strings. It is often cryptic, error-prone overkill. Just something to keep in 
> mind.
>
> Cheers,
> Cameron Simpson 

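Cameron's untested loop can be wrapped in a generator so that it is used much like `re.finditer`. This is a sketch only: the name `find_all` is made up here, and the guard against an empty substring is an addition that is not in the original loop.

```python
def find_all(text, substring):
    """Yield (start, end) for each non-overlapping occurrence of substring."""
    if not substring:
        return  # an empty substring would otherwise loop forever
    pos = 0
    while True:
        found = text.find(substring, pos)
        if found < 0:
            break
        end = found + len(substring)
        yield found, end
        pos = end  # resume the search after the previous match


example = 'X - abc_degree + 1 + qq + abc_degree + 1'
spans = list(find_all(example, 'abc_degree + 1'))
print(spans)
```

For the example string from the thread this yields the same two spans that the `re.finditer` version printed, (4, 18) and (26, 40).
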
Re: How to escape strings for re.finditer?

2023-02-27 Thread Jen Kris via Python-list


string.count() only tells me there are N instances of the string; it does not 
say where they begin and end, as does re.finditer.  

Feb 27, 2023, 16:20 by bobmellow...@gmail.com:

> Would string.count() work for you then?
>
> On Mon, Feb 27, 2023 at 5:16 PM Jen Kris via Python-list 
> <python-list@python.org> wrote:
>
>>
>> I went to the re module because the specified string may appear more than 
>> once in the string (in the code I'm writing).  For example: 
>>  
>>  a = "X - abc_degree + 1 + qq + abc_degree + 1"
>>  b = "abc_degree + 1"
>>  q = a.find(b)
>>  
>>  print(q)
>>  4
>>  
>>  So it correctly finds the start of the first instance, but not the second 
>> one.  The re code finds both instances.  If I knew that the substring 
>> occurred only once then the str.find would be best.  
>>  
>>  I changed my re code after MRAB's comment, it now works.  
>>  
>>  Thanks much.  
>>  
>>  Jen
>>  
>>  
>>  Feb 27, 2023, 15:56 by c...@cskk.id.au:
>>  
>>  > On 28Feb2023 00:11, Jen Kris <jenk...@tutanota.com> wrote:
>>  >
>>  >> When matching a string against a longer string, where both strings have 
>> spaces in them, we need to escape the spaces. 
>>  >>
>>  >> This works (no spaces):
>>  >>
>>  >> import re
>>  >> example = 'abcdefabcdefabcdefg'
>>  >> find_string = "abc"
>>  >> for match in re.finditer(find_string, example):
>>  >>     print(match.start(), match.end())
>>  >>
>>  >> That gives me the start and end character positions, which is what I 
>> want. 
>>  >>
>>  >> However, this does not work:
>>  >>
>>  >> import re
>>  >> example = re.escape('X - cty_degrees + 1 + qq')
>>  >> find_string = re.escape('cty_degrees + 1')
>>  >> for match in re.finditer(find_string, example):
>>  >>     print(match.start(), match.end())
>>  >>
>>  >> I’ve tried several other attempts based on my research, but still no 
>> match. 
>>  >>
>>  >
>>  > You need to print those strings out. You're escaping the _example_ 
>> string, which would make it:
>>  >
>>  >  X - cty_degrees \+ 1 \+ qq
>>  >
>>  > because `+` is a special character in regexps and so `re.escape` escapes 
>> it. But you don't want to mangle the string you're searching! After all, the 
>> text above does not contain the string `cty_degrees + 1`.
>>  >
>>  > My secondary question is: if you're escaping the thing you're searching 
>> _for_, then you're effectively searching for a _fixed_ string, not a 
>> pattern/regexp. So why on earth are you using regexps to do your searching?
>>  >
>>  > The `str` type has a `find(substring)` function. Just use that! It'll be 
>> faster and the code simpler!
>>  >
>>  > Cheers,
>>  > Cameron Simpson <c...@cskk.id.au>
>>  > -- 
>>  > https://mail.python.org/mailman/listinfo/python-list
>>  >
>>  
>>  -- 
>>  https://mail.python.org/mailman/listinfo/python-list
>>
>
>
> -- 
>  Listen to my CD at http://www.mellowood.ca/music/cedars
> Bob van der Poel ** Wynndel, British Columbia, CANADA **
> EMAIL: b...@mellowood.ca
> WWW:   http://www.mellowood.ca
>


Re: How to escape strings for re.finditer?

2023-02-27 Thread Jen Kris via Python-list


I went to the re module because the specified string may appear more than once 
in the string (in the code I'm writing).  For example:  

a = "X - abc_degree + 1 + qq + abc_degree + 1"
b = "abc_degree + 1"
q = a.find(b)

print(q)
4

So it correctly finds the start of the first instance, but not the second one.  
The re code finds both instances.  If I knew that the substring occurred only 
once then the str.find would be best.  

I changed my re code after MRAB's comment, and it now works.  
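
For a fixed substring that may occur more than once, one option is to call 
str.find repeatedly, passing the end of the previous hit as the start 
position.  A minimal sketch using the strings from this message; the helper 
name find_all is made up for illustration, and it reports non-overlapping 
occurrences only:

```python
def find_all(text, substring):
    """Return (start, end) pairs for each non-overlapping occurrence."""
    if not substring:            # an empty substring would never advance
        return []
    spans = []
    pos = 0
    while True:
        found = text.find(substring, pos)
        if found == -1:          # str.find returns -1 when nothing is left
            break
        end = found + len(substring)
        spans.append((found, end))
        pos = end                # resume the search after this occurrence
    return spans


a = "X - abc_degree + 1 + qq + abc_degree + 1"
b = "abc_degree + 1"
print(find_all(a, b))
```

This gives the same start and end positions that re.finditer reports for an 
escaped pattern, without involving the re module.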

Thanks much.  

Jen


Feb 27, 2023, 15:56 by c...@cskk.id.au:

> On 28Feb2023 00:11, Jen Kris  wrote:
>
>> When matching a string against a longer string, where both strings have 
>> spaces in them, we need to escape the spaces. 
>>
>> This works (no spaces):
>>
>> import re
>> example = 'abcdefabcdefabcdefg'
>> find_string = "abc"
>> for match in re.finditer(find_string, example):
>>     print(match.start(), match.end())
>>
>> That gives me the start and end character positions, which is what I want. 
>>
>> However, this does not work:
>>
>> import re
>> example = re.escape('X - cty_degrees + 1 + qq')
>> find_string = re.escape('cty_degrees + 1')
>> for match in re.finditer(find_string, example):
>>     print(match.start(), match.end())
>>
>> I’ve tried several other attempts based on my research, but still no match. 
>>
>
> You need to print those strings out. You're escaping the _example_ string, 
> which would make it:
>
>  X - cty_degrees \+ 1 \+ qq
>
> because `+` is a special character in regexps and so `re.escape` escapes it. 
> But you don't want to mangle the string you're searching! After all, the text 
> above does not contain the string `cty_degrees + 1`.
>
> My secondary question is: if you're escaping the thing you're searching 
> _for_, then you're effectively searching for a _fixed_ string, not a 
> pattern/regexp. So why on earth are you using regexps to do your searching?
>
> The `str` type has a `find(substring)` function. Just use that! It'll be 
> faster and the code simpler!
>
> Cheers,
> Cameron Simpson 
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>


Re: How to escape strings for re.finditer?

2023-02-27 Thread Jen Kris via Python-list

Yes, that's it.  I don't know how long it would have taken to find that detail 
with research through the voluminous re documentation.  Thanks very much.  
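
A short sketch of the fix as I understand it from MRAB's reply: re.escape is 
applied to the pattern only, and the text being searched is left as it is.  
The strings are the ones from the original question; the variable name 
matches is only for illustration.

```python
import re

example = 'X - cty_degrees + 1 + qq'          # searched text, left untouched
find_string = re.escape('cty_degrees + 1')    # only the pattern is escaped

matches = [(m.start(), m.end()) for m in re.finditer(find_string, example)]
print(matches)
```

Escaping the searched text as well inserts backslashes into it, so the 
literal text "cty_degrees + 1" is no longer present and nothing matches.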

Feb 27, 2023, 15:47 by pyt...@mrabarnett.plus.com:

> On 2023-02-27 23:11, Jen Kris via Python-list wrote:
>
>> When matching a string against a longer string, where both strings have 
>> spaces in them, we need to escape the spaces.
>>
>> This works (no spaces):
>>
>> import re
>> example = 'abcdefabcdefabcdefg'
>> find_string = "abc"
>> for match in re.finditer(find_string, example):
>>      print(match.start(), match.end())
>>
>> That gives me the start and end character positions, which is what I want.
>>
>> However, this does not work:
>>
>> import re
>> example = re.escape('X - cty_degrees + 1 + qq')
>> find_string = re.escape('cty_degrees + 1')
>> for match in re.finditer(find_string, example):
>>      print(match.start(), match.end())
>>
>> I’ve tried several other attempts based on my research, but still no match.
>>
>> I don’t have much experience with regex, so I hoped a reg-expert might help.
>>
> You need to escape only the pattern, not the string you're searching.
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>


How to escape strings for re.finditer?

2023-02-27 Thread Jen Kris via Python-list

When matching a string against a longer string, where both strings have spaces 
in them, we need to escape the spaces.  

This works (no spaces):

import re
example = 'abcdefabcdefabcdefg'
find_string = "abc"
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

That gives me the start and end character positions, which is what I want. 

However, this does not work:

import re
example = re.escape('X - cty_degrees + 1 + qq')
find_string = re.escape('cty_degrees + 1')
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

I’ve tried several other attempts based on my research, but still no match. 

I don’t have much experience with regex, so I hoped a reg-expert might help. 

Thanks,

Jen


Re: To clarify how Python handles two equal objects

2023-01-14 Thread Jen Kris via Python-list

Yes, in fact I asked my original question – "I discovered something about 
Python array handling that I would like to clarify" -- because I saw that 
Python did it that way.  



Jan 14, 2023, 15:51 by ros...@gmail.com:

> On Sun, 15 Jan 2023 at 10:32, Jen Kris via Python-list
>  wrote:
>
>> The situation I described in my original post is limited to a case such as x 
>> = y ... the assignment can be done simply by "x" taking the pointer to "y" 
>> rather than moving all the data from "y" into the memory buffer for "x"
>>
>
> It's not simply whether it *can* be done. It, in fact, *MUST* be done
> that way. The ONLY meaning of "x = y" is that you now have a name "x"
> which refers to whatever object is currently found under the name "y".
> This is not an optimization, it is a fundamental of Python's object
> model. This is true regardless of what kind of object this is; every
> object must behave this way.
>
> ChrisA
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>


RE: To clarify how Python handles two equal objects

2023-01-14 Thread Jen Kris via Python-list

Avi, 

Your comments go farther afield than my original question, but you made some 
interesting additional points.  For example, I sometimes work with the C API 
and sys.getrefcount may be helpful in deciding when to INCREF and DECREF.  But 
that’s another issue. 
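
As a rough illustration of the sys.getrefcount point, here is a sketch that 
compares counts relative to each other rather than relying on exact values.  
The exact numbers are a CPython implementation detail and can differ between 
versions, so only the direction of the change is checked; the names data and 
alias are made up for the example.

```python
import sys

data = [1.0, 2.0, 3.0]            # a fresh list object
before = sys.getrefcount(data)    # includes the temporary reference of the call

alias = data                      # a second name bound to the same object
after = sys.getrefcount(data)

print(before, after)              # exact values depend on the interpreter
print(after > before)             # binding another name raised the count
```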

The situation I described in my original post is limited to a case such as x = 
y where both "x" and "y" are arrays – whether they are lists in Python, or from 
the array module – and the question in a compiled C extension is whether the 
assignment can be done simply by "x" taking the pointer to "y" rather than 
moving all the data from "y" into the memory buffer for "x" which, for a wide 
array, would be much more time consuming than just moving a pointer.  The other 
advantage to doing it that way is if, as in my case, we perform a math 
operation on any element in "x" then Python expects the same change to be 
reflected in "y."  If I don’t use the same pointers then I would have to 
perform that operation twice – once for "x" and once  for "y" – in addition to 
the expense of moving all the data. 

The answers I got from this post confirmed that I can use the pointer if "y" 
is not re-defined to something else during the lifespan of "x."  If it is then 
"x" has to be restored to its original pointer.  I did it that way, and 
helpfully the compiler did not overrule me. 
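
The Python-level behaviour described above can be sketched as follows, here 
with the array module.  The values are arbitrary and the variable names 
shared, y_sees_change and still_shared exist only for this illustration; it 
shows what Python itself does, not what any particular C extension must do.

```python
from array import array

y = array('d', [1.5, 2.5, 3.5])
x = y                             # x and y are two names for one array
x[0] *= 2                         # in-place change through x
shared = x is y                   # True: nothing was copied
y_sees_change = y[0]              # 3.0: the change is visible through y

y = array('d', [9.0, 9.0, 9.0])   # y is rebound to a new array
still_shared = x is y             # False: x still names the original array
print(shared, y_sees_change, still_shared, list(x))
```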


Jan 13, 2023, 18:41 by avi.e.gr...@gmail.com:

> Jen,
>
> This may not be on target but I was wondering about your needs in this 
> category. Are all your data in a form where all in a cluster are the same 
> object type, such as floating point?
>
> Python has features designed to allow you to get multiple views on such 
> objects such as memoryview that can be used to say see an array as a matrix 
> of n rows by m columns, or m x n, or any other combo. And of course the 
> fuller numpy package has quite a few features.
>
> However, as you note, there is no guarantee that any reference to the data 
> may not shift away from it unless you build fairly convoluted logic or data 
> structures such as having an object that arranges to do something when you 
> try to remove it, such as tinkering with the __del__ method as well as 
> whatever method is used to try to set it to a new value. I guess that might 
> make sense for something like asynchronous programming including when setting 
> locks so multiple things cannot overlap when being done.
>
> Anyway, some of the packages like numpy are optimized in many ways but if you 
> want to pass a subset of sorts to make processing faster, I suspect you could 
> do things like pass a memoryview but it might not be faster than what you 
> build albeit probably more reliable and portable.
>
> I note another odd idea that others may have mentioned, with caution.
>
> If you load the sys module, you can CAREFULLY use code like this.
>
> a="Something Unique"
> sys.getrefcount(a)
> 2
>
> Note if a==1 you will get some huge number of references and this is 
> meaningless. The 2 above is because asking about how many references also 
> references it.
>
> So save what ever number you have and see what happens when you make a second 
> reference or a third, and what happens if you delete or alter a reference:
>
> a="Something Unique"
> sys.getrefcount(a)
> 2
> b = a
> sys.getrefcount(a)
> 3
> sys.getrefcount(b)
> 3
> c = b
> d = a
> sys.getrefcount(a)
> 5
> sys.getrefcount(d)
> 5
> del(a)
> sys.getrefcount(d)
> 4
> b = "something else"
> sys.getrefcount(d)
> 3
>
> So, in theory, you could carefully write your code to CHECK the reference 
> count had not changed but there remain edge cases where a removed reference 
> is replaced by yet another new reference and you would have no idea.
>
> Avi
>
>
> -Original Message-
> From: Python-list  On 
> Behalf Of Jen Kris via Python-list
> Sent: Wednesday, January 11, 2023 1:29 PM
> To: Roel Schroeven 
> Cc: python-list@python.org
> Subject: Re: To clarify how Python handles two equal objects
>
> Thanks for your comments.  After all, I asked for clarity so it’s not 
> pedantic to be precise, and you’re helping to clarify. 
>
> Going back to my original post,
>
> mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ]
> arr1 = mx1[2]
>
> Now if I write "arr1[1] += 5" then both arr1 and mx1[2][1] will be changed 
> because while they are different names, they are assigned the same memory 
> location (pointer).  Similarly, if I write "mx1[2][1] += 5" then again both 
> names will be updated. 
>
> That’s what I meant by "an operation on one is an operat

Re: To clarify how Python handles two equal objects

2023-01-13 Thread Jen Kris via Python-list

Bob, 

Your examples show a and b separately defined.  My example is where the 
definition is a=1; b = a.  But I'm only interested in arrays.  I would not rely 
on this for integers, and there's not likely to be any real cost savings there. 
  


Jan 13, 2023, 08:45 by b...@mellowood.ca:

> It seems to me that the entire concept of relying on python's idea of 
> where an object is stored is just plain dangerous. A most simple example 
> might be:
>   >>> a=1
>   >>> b=1
>   >>> a is b
>   True
>   >>> a=1234
>   >>> b=1234
>   >>> a is b
>   False
>
> Not sure what happens if you manipulate the data referenced by 'b' in the 
> first example thinking you are changing something referred to by 'a' ... but 
> you might be smart to NOT think that you know.
>
>
>
> On Fri, Jan 13, 2023 at 9:00 AM Jen Kris via Python-list
> <python-list@python.org> wrote:
>
>>
>> Avi,
>>  
>>  Thanks for your comments.  You make a good point. 
>>  
>>  Going back to my original question, and using your slice() example: 
>>  
>>  middle_by_two = slice(5, 10, 2)
>>  nums = [n for n in range(12)]
>>  q = nums[middle_by_two]
>>  x = id(q)
>>  b = q
>>  y = id(b)
>>  
>>  If I assign "b" to "q", then x and y match – they point to the same memory 
>> until "b" OR "q" are  reassigned to something else.  If "q" changes during 
>> the lifetime of "b" then it’s not safe to use the pointer to "q" for "b", as 
>> in:
>>  
>>  nums = [n for n in range(2, 14)]
>>  q = nums[middle_by_two]
>>  x = id(q)
>>  y = id(b)
>>  
>>  Now "x" and "y" are different, as we would expect.  So when writing a spot 
>> speed up in a compiled language, you can see in the Python source if either 
>> is reassigned, so you’ll know how to handle it.  The motivation behind my 
>> question was that in a compiled extension it’s faster to borrow a pointer 
>> than to move an entire array if it’s possible, but special care must be 
>> taken. 
>>  
>>  Jen
>>  
>>  
>>  
>>  Jan 12, 2023, 20:51 by avi.e.gr...@gmail.com:
>>  
>>  > Jen,
>>  >
>>  > It is dangerous territory you are treading as there are times all or 
>> parts of objects are copied, or changed in place or the method you use to 
>> make a view is not doing quite what you want.
>>  >
>>  > As an example, you can create a named slice such as:
>>  >
>>  >  middle_by_two = slice(5, 10, 2)
>>  >
>>  > The above is not in any sense pointing at anything yet. But given a long 
>> enough list or other such objects, it will take items (starting at index 0) 
>> starting with item that are at indices 5 then 7 then 9  as in this:
>>  >
>>  >  nums = [n for n in range(12)]
>>  >  nums[middle_by_two]
>>  >
>>  > [5, 7, 9]
>>  >
>>  > The same slice will work on anything else:
>>  >
>>  >  list('abcdefghijklmnopqrstuvwxyz')[middle_by_two]
>>  > ['f', 'h', 'j']
>>  >
>>  > So although you may think the slice is bound to something, it is not. It 
>> is an object that only later is briefly connected to whatever you want to 
>> apply it to.
>>  >
>>  > If I later change nums, above, like this:
>>  >
>>  >  nums = [-3, -2, -1] + nums
>>  >  nums
>>  > [-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>>  >  nums[middle_by_two]
>>  > [2, 4, 6]
>>  >
>>  > In the example, you can forget about whether we are talking about 
>> pointers directly or indirectly or variable names and so on. Your "view" 
>> remains valid ONLY as long as you do not change either the slice or the 
>> underlying object you are applying to -- at least not the items you want to 
>> extract.
>>  >
>>  > Since my example inserted three new items at the start using negative 
>> numbers for illustration, you would need to adjust the slice by making a new 
>> slice designed to fit your new data. The example below created an adjusted 
>> slice that adds 3 to the start and stop settings of the previous slice while 
>> copying the step value and then it works on the elongated object:
>>  >
>>  >  middle_by_two_adj = slice(middle_by_two.start + 3, middle_by_two.stop + 
>> 3, middle_by_two.step)
>>  >  nums[middle_by_two_adj]
>>  > [5, 7, 9]
>>  >
>>  >

RE: To clarify how Python handles two equal objects

2023-01-13 Thread Jen Kris via Python-list


Avi,

Thanks for your comments.  You make a good point. 

Going back to my original question, and using your slice() example: 

middle_by_two = slice(5, 10, 2)
nums = [n for n in range(12)]
q = nums[middle_by_two]
x = id(q)
b = q
y = id(b)

If I set "b = q", then x and y match – both names point to the same memory 
until "b" OR "q" is reassigned to something else.  If "q" changes during the 
lifetime of "b" then it’s not safe to use the pointer to "q" for "b", as in:

nums = [n for n in range(2, 14)]
q = nums[middle_by_two]
x = id(q)
y = id(b)

Now "x" and "y" are different, as we would expect.  So when writing a spot 
speed up in a compiled language, you can see in the Python source if either is 
reassigned, so you’ll know how to handle it.  The motivation behind my question 
was that in a compiled extension it’s faster to borrow a pointer than to move 
an entire array if it’s possible, but special care must be taken. 
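
One detail worth spelling out in the example above: slicing a list builds a 
new list, so "q" is a copy of part of "nums" rather than a view into it, 
while "b = q" only adds a second name for that copy.  A small sketch, where 
the names same_before, seen_via_b, nums_unchanged and same_after are made up 
for illustration:

```python
middle_by_two = slice(5, 10, 2)
nums = list(range(12))
q = nums[middle_by_two]      # list slicing builds a new list: [5, 7, 9]
b = q                        # b is another name for that new list

same_before = b is q         # True: one object, two names
q[0] = 500                   # in-place change through q
seen_via_b = b[0]            # 500, because b is the same object
nums_unchanged = nums[5]     # 5: the slice was a copy, not a view of nums

nums = list(range(2, 14))
q = nums[middle_by_two]      # q is rebound to another new list: [7, 9, 11]
same_after = b is q          # False: b still names the earlier list
print(same_before, seen_via_b, nums_unchanged, same_after, b, q)
```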

Jen



Jan 12, 2023, 20:51 by avi.e.gr...@gmail.com:

> Jen,
>
> It is dangerous territory you are treading as there are times all or parts of 
> objects are copied, or changed in place or the method you use to make a view 
> is not doing quite what you want.
>
> As an example, you can create a named slice such as:
>
>  middle_by_two = slice(5, 10, 2)
>
> The above is not in any sense pointing at anything yet. But given a long 
> enough list or other such object, it will take the items (counting from 
> index 0) at indices 5, then 7, then 9, as in this:
>
>  nums = [n for n in range(12)]
>  nums[middle_by_two]
>
> [5, 7, 9]
>
> The same slice will work on anything else:
>
>  list('abcdefghijklmnopqrstuvwxyz')[middle_by_two]
> ['f', 'h', 'j']
>
> So although you may think the slice is bound to something, it is not. It is 
> an object that only later is briefly connected to whatever you want to apply 
> it to.
>
> If I later change nums, above, like this:
>
>  nums = [-3, -2, -1] + nums
>  nums
> [-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>  nums[middle_by_two]
> [2, 4, 6]
>
> In the example, you can forget about whether we are talking about pointers 
> directly or indirectly or variable names and so on. Your "view" remains valid 
> ONLY as long as you do not change either the slice or the underlying object 
> you are applying to -- at least not the items you want to extract.
>
> Since my example inserted three new items at the start using negative numbers 
> for illustration, you would need to adjust the slice by making a new slice 
> designed to fit your new data. The example below created an adjusted slice 
> that adds 3 to the start and stop settings of the previous slice while 
> copying the step value and then it works on the elongated object:
>
>  middle_by_two_adj = slice(middle_by_two.start + 3, middle_by_two.stop + 3, 
> middle_by_two.step)
>  nums[middle_by_two_adj]
> [5, 7, 9]
>
> A suggestion is that whenever you are not absolutely sure that the contents 
> of some data structure will not change without your participation, then don't 
> depend on various kinds of aliases to keep the contents synchronized. Make a 
> copy, perhaps a deep copy, and make sure the only thing ever changing it is 
> your code and later, if needed, copy the result back to any other data 
> structure. Of course, if anything else is accessing the result in the 
> original in between, it won't work.
>
> Just FYI, a similar analysis applies to uses of the numpy and pandas and 
> other modules if you get some kind of object holding indices to a series such 
> as integers or Booleans and then later try using it after the number of items 
> or rows or columns have changed. Your indices no longer match.
>
> Avi
>
> -Original Message-
> From: Python-list  On 
> Behalf Of Jen Kris via Python-list
> Sent: Wednesday, January 11, 2023 1:29 PM
> To: Roel Schroeven 
> Cc: python-list@python.org
> Subject: Re: To clarify how Python handles two equal objects
>
> Thanks for your comments.  After all, I asked for clarity so it’s not 
> pedantic to be precise, and you’re helping to clarify. 
>
> Going back to my original post,
>
> mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ]
> arr1 = mx1[2]
>
> Now if I write "arr1[1] += 5" then both arr1 and mx1[2][1] will be changed 
> because while they are different names, they are assigned the same memory 
> location (pointer).  Similarly, if I write "mx1[2][1] += 5" then again both 
> names will be updated. 
>
> That’s what I meant by "an operation on one is an operation on the other."  
> To be more precise, an operation on one name will

Re: To clarify how Python handles two equal objects

2023-01-11 Thread Jen Kris via Python-list

Thanks for your comments.  After all, I asked for clarity so it’s not pedantic 
to be precise, and you’re helping to clarify.  

Going back to my original post,

mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ]
arr1 = mx1[2]

Now if I write "arr1[1] += 5" then both arr1 and mx1[2][1] will be changed 
because while they are different names, they are assigned the same memory 
location (pointer).  Similarly, if I write "mx1[2][1] += 5" then again both 
names will be updated. 

That’s what I meant by "an operation on one is an operation on the other."  To 
be more precise, an operation on one name will be reflected in the other name.  
The difference is in the names,  not the pointers.  Each name has the same 
pointer in my example, but operations can be done in Python using either name. 
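
To make that concrete, here is a minimal sketch using the same mx1 and arr1.  
Each += is a single operation on a single list object, and the result is 
visible through both names:

```python
mx1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
arr1 = mx1[2]

arr1[1] += 5          # one operation on one list object
mx1[2][1] += 5        # the same object again, reached through the other name

print(arr1 is mx1[2]) # both names still refer to one list
print(arr1, mx1[2])
```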




Jan 11, 2023, 09:13 by r...@roelschroeven.net:

> Op 11/01/2023 om 16:33 schreef Jen Kris via Python-list:
>
>> Yes, I did understand that.  In your example, "a" and "b" are the same 
>> pointer, so an operation on one is an operation on the other (because 
>> they’re the same memory block).
>>
>
> Sorry if you feel I'm being overly pedantic, but your explanation "an 
> operation on one is an operation on the other (because they’re the same 
> memory block)" still feels a bit misguided. "One" and "other" still make it 
> sound like there are two objects, and "an operation on one" and "an operation 
> on the other" make it sound like there are two operations.
> Sometimes it doesn't matter if we're a bit sloppy for sake of simplicity or 
> convenience, sometimes we really need to be precise. I think this is a case 
> where we need to be precise.
>
> So, to be precise: there is only one object, with possible multiple names to 
> it. We can change the object, using one of the names. That is one and only 
> one operation on one and only one object. Since the different names refer to 
> the same object, that change will of course be visible through all of them.
> Note that 'name' in that sentence doesn't just refer to variables (mx1, arr1, 
> ...) but also things like indexed lists (mx1[0], mx1[0][0], ...), loop 
> variables, function arguments.
>
> The correct mental model is important here, and I do think you're on track or 
> very close to it, but the way you phrase things does give me that nagging 
> feeling that you still might be just a bit off.
>
> -- 
> "Peace cannot be kept by force. It can only be achieved through 
> understanding."
>  -- Albert Einstein
>
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>


Re: To clarify how Python handles two equal objects

2023-01-11 Thread Jen Kris via Python-list

Yes, I did understand that.  In your example, "a" and "b" are the same pointer, 
so an operation on one is an operation on the other (because they’re the same 
memory block).  My issue in Python came up because Python can dynamically 
change one or the other to a different object (memory block) so I have to be 
aware of that when handling this kind of situation. 


Jan 10, 2023, 17:31 by greg.ew...@canterbury.ac.nz:

> On 11/01/23 11:21 am, Jen Kris wrote:
>
>> where one object derives from another object (a = b[0], for example), any 
>> operation that would alter one will alter the other.
>>
>
> I think you're still confused. In C terms, after a = b[0], a and b[0]
> are pointers to the same block of memory. If you change that block of
> memory, then of course you will see the change through either pointer.
>
> Here's a rough C translation of some of your Python code:
>
> /* mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ] */
> int **mx1 = (int **)malloc(3 * sizeof(int *));
> mx1[0] = (int *)malloc(3 * sizeof(int));
> mx1[0][0] = 1;
> mx1[0][1] = 2;
> mx1[0][2] = 3;
> mx1[1] = (int *)malloc(3 * sizeof(int));
> mx1[1][0] = 4;
> mx1[1][1] = 5;
> mx1[1][2] = 6;
> mx1[2] = (int *)malloc(3 * sizeof(int));
> mx1[2][0] = 7;
> mx1[2][1] = 8;
> mx1[2][2] = 9;
>
> /* arr1 = mx1[2] */
> int *arr1 = mx1[2];
>
> /* arr1 = [ 10, 11, 12 ] */
> arr1 = (int *)malloc(3 * sizeof(int));
> arr1[0] = 10;
> arr1[1] = 11;
> arr1[2] = 12;
>
> Does that help your understanding?
>
> -- 
> Greg
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>


Re: To clarify how Python handles two equal objects

2023-01-10 Thread Jen Kris via Python-list

There are cases where NumPy would be the best choice, but that wasn’t the case 
here with what the loop was doing.  

To sum up what I learned from this post, where one object derives from another 
object (a = b[0], for example), any operation that would alter one will alter 
the other.  When either is assigned to something else, then they no longer 
point to the same memory location and they’re once again independent.   I hope 
the word "derives" sidesteps the semantic issue of whether they are "equal."    
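
One caveat on "assigned to something else", sketched below with made-up 
values: for lists, "a += [...]" extends the existing object in place and 
keeps the sharing, whereas "a = a + [...]" builds a new list and rebinds the 
name.  The names still_shared and shared_after_plus are only for 
illustration.

```python
b = [[1, 2, 3], [4, 5, 6]]
a = b[0]

a += [99]                         # list += extends in place; a is still b[0]
still_shared = a is b[0]          # True

a = a + [100]                     # + builds a new list and rebinds a
shared_after_plus = a is b[0]     # False
print(still_shared, shared_after_plus, a, b[0])
```

So when translating a loop, it may matter which of the two forms the Python 
source uses, not only whether a plain assignment appears.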

Thanks to all who replied to this post.  

Jen


Jan 10, 2023, 13:59 by li...@tompassin.net:

> Just to add a possibly picky detail to what others have said, Python does not 
> have an "array" type.  It has a "list" type, as well as some other, not 
> necessarily mutable, sequence types.
>
> If you want to speed up list and matrix operations, you might use NumPy.  Its 
> arrays and matrices are heavily optimized for fast processing and provide 
> many useful operations on them.  No use calling out to C code yourself when 
> NumPy has been refining that for many years.
>
> On 1/10/2023 4:10 PM, MRAB wrote:
>
>> On 2023-01-10 20:41, Jen Kris via Python-list wrote:
>>
>>>
>>> Thanks for your comments.  I'd like to make one small point.  You say:
>>>
>>> "Assignment in Python is a matter of object references. It's not
>>> "conform them as long as they remain equal". You'll have to think in
>>> terms of object references the entire way."
>>>
>>> But where they have been set to the same object, an operation on one will 
>>> affect the other as long as they are equal (in Python).  So I will have to 
>>> conform them in those cases because Python will reflect any math operation 
>>> in both the array and the matrix.
>>>
>> It's not a 2D matrix, it's a 1D list containing references to 1D lists, each 
>> of which contains references to Python ints.
>>
>> In CPython, references happen to be pointers, but that's just an 
>> implementation detail.
>>
>>>
>>>
>>> Jan 10, 2023, 12:28 by ros...@gmail.com:
>>>
>>>> On Wed, 11 Jan 2023 at 07:14, Jen Kris via Python-list
>>>>  wrote:
>>>>
>>>>>
>>>>> I am writing a spot speedup in assembly language for a short but 
>>>>> computation-intensive Python loop, and I discovered something about 
>>>>> Python array handling that I would like to clarify.
>>>>>
>>>>> For a simplified example, I created a matrix mx1 and assigned the array 
>>>>> arr1 to the third row of the matrix:
>>>>>
>>>>> mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ]
>>>>> arr1 = mx1[2]
>>>>>
>>>>> The pointers to these are now the same:
>>>>>
>>>>> ida = id(mx1[2])   # 140260325306880
>>>>> idb = id(arr1)     # 140260325306880
>>>>>
>>>>> That’s great because when I encounter this in assembly or C, I can just 
>>>>> borrow the pointer to row 3 for the array arr1, on the assumption that 
>>>>> they will continue to point to the same object.  Then when I do any math 
>>>>> operations in arr1 it will be reflected in both arrays because they are 
>>>>> now pointing to the same array:
>>>>>
>>>>
>>>> That's not an optimization; what you've done is set arr1 to be a
>>>> reference to that object.
>>>>
>>>>> But on the next iteration we assign arr1 to something else:
>>>>>
>>>>> arr1 = [ 10, 11, 12 ]
>>>>> idc = id(arr1)     # 140260325308160
>>>>> idd = id(mx1[2])   # 140260325306880
>>>>>
>>>>> Now arr1 is no longer equal to mx1[2], and any subsequent operations in 
>>>>> arr1 will not affect mx1.
>>>>>
>>>>
>>>> Yep, you have just set arr1 to be a completely different object.
>>>>
>>>>> So where I’m rewriting some Python code in a low level language, I can’t 
>>>>> assume that the two objects are equal because that equality will not 
>>>>> remain if either is reassigned.  So if I do some operation on one array I 
>>>>> have to conform the two arrays for as long as they remain equal, I can’t 
>>>>> just do it in one operation because I can’t rely on the objects remaining 
>>>>> equal.
>>>>>
>>>>> Is my understanding of this correct?  Is there anything I’m missing?
>>>>>
>>>>
>>>> Assignment in Python is a matter of object references. It's not
>>>> "conform them as long as they remain equal". You'll have to think in
>>>> terms of object references the entire way.
>>>>
>>>> ChrisA
>>>> -- 
>>>> https://mail.python.org/mailman/listinfo/python-list
>>>>
>
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>


Re: To clarify how Python handles two equal objects

2023-01-10 Thread Jen Kris via Python-list


Thanks for your comments.  I'd like to make one small point.  You say:

"Assignment in Python is a matter of object references. It's not
"conform them as long as they remain equal". You'll have to think in
terms of object references the entire way."

But where they have been set to the same object, an operation on one will 
affect the other as long as they are equal (in Python).  So I will have to 
conform them in those cases because Python will reflect any math operation in 
both the array and the matrix.  



Jan 10, 2023, 12:28 by ros...@gmail.com:

> On Wed, 11 Jan 2023 at 07:14, Jen Kris via Python-list
>  wrote:
>
>>
>> I am writing a spot speedup in assembly language for a short but 
>> computation-intensive Python loop, and I discovered something about Python 
>> array handling that I would like to clarify.
>>
>> For a simplified example, I created a matrix mx1 and assigned the array arr1 
>> to the third row of the matrix:
>>
>> mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ]
>> arr1 = mx1[2]
>>
>> The pointers to these are now the same:
>>
>> ida = id(mx1[2])   # 140260325306880
>> idb = id(arr1)     # 140260325306880
>>
>> That’s great because when I encounter this in assembly or C, I can just 
>> borrow the pointer to row 3 for the array arr1, on the assumption that they 
>> will continue to point to the same object.  Then when I do any math 
>> operations in arr1 it will be reflected in both arrays because they are now 
>> pointing to the same array:
>>
>
> That's not an optimization; what you've done is set arr1 to be a
> reference to that object.
>
>> But on the next iteration we assign arr1 to something else:
>>
>> arr1 = [ 10, 11, 12 ]
>> idc = id(arr1)     # 140260325308160
>> idd = id(mx1[2])   # 140260325306880
>>
>> Now arr1 is no longer equal to mx1[2], and any subsequent operations in arr1 
>> will not affect mx1.
>>
>
> Yep, you have just set arr1 to be a completely different object.
>
>> So where I’m rewriting some Python code in a low level language, I can’t 
>> assume that the two objects are equal because that equality will not remain 
>> if either is reassigned.  So if I do some operation on one array I have to 
>> conform the two arrays for as long as they remain equal, I can’t just do it 
>> in one operation because I can’t rely on the objects remaining equal.
>>
>> Is my understanding of this correct?  Is there anything I’m missing?
>>
>
> Assignment in Python is a matter of object references. It's not
> "conform them as long as they remain equal". You'll have to think in
> terms of object references the entire way.
>
> ChrisA


To clarify how Python handles two equal objects

2023-01-10 Thread Jen Kris via Python-list

I am writing a spot speedup in assembly language for a short but 
computation-intensive Python loop, and I discovered something about Python 
array handling that I would like to clarify.  

For a simplified example, I created a matrix mx1 and assigned the array arr1 to 
the third row of the matrix:

mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ]
arr1 = mx1[2]

The pointers to these are now the same:

ida = id(mx1[2])  # 140260325306880
idb = id(arr1)  # 140260325306880

That’s great because when I encounter this in assembly or C, I can just borrow 
the pointer to row 3 for the array arr1, on the assumption that they will 
continue to point to the same object.  Then when I do any math operations in 
arr1 it will be reflected in both arrays because they are now pointing to the 
same array:

arr1[0] += 2
print(mx1[2])  # [9, 8, 9]
print(arr1)  # [9, 8, 9]

Now mx1 looks like this:

[ 1, 2, 3 ]
[ 4, 5, 6 ]
[ 9, 8, 9 ]

and it stays that way for remaining iterations.  

But on the next iteration we assign arr1 to something else:

arr1 = [ 10, 11, 12 ]
idc = id(arr1)  # 140260325308160
idd = id(mx1[2])  # 140260325306880

Now arr1 is no longer equal to mx1[2], and any subsequent operations in arr1 
will not affect mx1.  So where I’m rewriting some Python code in a low level 
language, I can’t assume that the two objects are equal because that equality 
will not remain if either is reassigned.  So if I do some operation on one 
array I have to conform the two arrays for as long as they remain equal, I 
can’t just do it in one operation because I can’t rely on the objects remaining 
equal. 

Is my understanding of this correct?  Is there anything I’m missing? 

Thanks very much. 

Jen
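
The two cases in this message, mutating through a shared reference and rebinding a name, can be sketched as follows. This is an illustration using the names from the message, not code from the actual loop:

```python
mx1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
arr1 = mx1[2]
shared_before = arr1 is mx1[2]      # both names refer to one list object

arr1[0] += 2                        # in-place mutation of that one object
row_after_mutation = list(mx1[2])   # snapshot of the matrix row

arr1 = [10, 11, 12]                 # rebinding: arr1 now names a new list
shared_after = arr1 is mx1[2]       # the old row still exists, only in mx1

arr1[0] += 1                        # affects the new list only
print(row_after_mutation, mx1[2], arr1)
```

Seen from a low-level language, nothing needs to be "conformed": a mutation goes through whatever pointer arr1 currently holds, and an assignment to arr1 simply replaces that pointer.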



Re: Debugging Python C extensions with GDB

2022-11-14 Thread Jen Kris via Python-list

Thanks for your reply.  Victor's article didn't mention ctypes extensions, so I 
wanted to post a question before I build from source.  


Nov 14, 2022, 14:32 by ba...@barrys-emacs.org:

>
>
>> On 14 Nov 2022, at 19:10, Jen Kris via Python-list  
>> wrote:
>>
>> In September 2021, Victor Stinner wrote “Debugging Python C extensions with 
>> GDB” 
>> (https://developers.redhat.com/articles/2021/09/08/debugging-python-c-extensions-gdb#getting_started_with_python_3_9).
>>  
>>
>> My question is:  with Python 3.9+, can I debug into a C extension written in 
>> pure C and called from ctypes  -- that is not written using the C_API?
>>
>
> Yes.
>
> Just put a breakpoint on the function in the c library that you want to debug.
> You can set the breakpoint before a .so is loaded.
>
> Barry
>
>>
>> Thanks. 
>>
>> Jen
>>
>>
>>
>> -- 
>> https://mail.python.org/mailman/listinfo/python-list
>>
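
A minimal sketch of the setup Barry describes, assuming a POSIX system. The C library's abs stands in for a function from your own shared library; with your own code you would load your .so and break on your own function name instead:

```python
import ctypes
import ctypes.util

# Assumption: POSIX. If find_library returns None, CDLL(None) opens the
# running process, which still exposes the C library's symbols.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

result = libc.abs(-5)   # plain C function, no Python C API involved
print(result)

# To stop inside the C function under GDB (script name is a placeholder):
#   gdb --args python3 this_script.py
#   (gdb) break abs     # accept a pending breakpoint if GDB offers one
#   (gdb) run
```

For source-level stepping, the shared library itself needs to be built with debug information (for example with -g); a debug build of Python is not required just to break in the C function.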


Debugging Python C extensions with GDB

2022-11-14 Thread Jen Kris via Python-list

In September 2021, Victor Stinner wrote “Debugging Python C extensions with 
GDB” 
(https://developers.redhat.com/articles/2021/09/08/debugging-python-c-extensions-gdb#getting_started_with_python_3_9).
  

My question is:  with Python 3.9+, can I debug into a C extension written in 
pure C and called from ctypes  -- that is not written using the C_API? 

Thanks. 

Jen




Re: PyObject_CallFunctionObjArgs segfaults

2022-09-30 Thread Jen Kris via Python-list


That's great.  It clarifies things a lot for me, particularly re ref count for 
new references.  I would have had trouble if I didn't decref it twice.  

Thanks very much once again.  
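
The bookkeeping can be observed from Python itself. The sketch below uses ctypes.pythonapi to call Py_IncRef and Py_DecRef, the function forms of the Py_INCREF and Py_DECREF macros; it is only an illustration of the counting, not something the original C code does:

```python
import ctypes
import sys

inc = ctypes.pythonapi.Py_IncRef
dec = ctypes.pythonapi.Py_DecRef
for fn in (inc, dec):
    fn.argtypes = [ctypes.py_object]
    fn.restype = None

obj = object()                  # creating it gives us one owned reference
base = sys.getrefcount(obj)     # absolute value varies; use it as a baseline

inc(ctypes.py_object(obj))      # an extra INCREF on top of the new reference
after_incref = sys.getrefcount(obj) - base

dec(ctypes.py_object(obj))      # the DECREF that pays for that extra INCREF
after_decref = sys.getrefcount(obj) - base

print(after_incref, after_decref)
```

Each INCREF creates an obligation for exactly one DECREF. An object returned as a "new reference" already carries one such obligation, which is why adding your own INCREF means two DECREFs in total.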


Sep 30, 2022, 12:18 by pyt...@mrabarnett.plus.com:

> On 2022-09-30 17:02, Jen Kris wrote:
>
>>
>> Thanks very much for your detailed reply.  I have a few followup questions.
>>
>> You said, “Some functions return an object that has already been incref'ed 
>> ("new reference"). This occurs when it has either created a new object (the 
>> refcount will be 1) or has returned a pointer to an existing object (the 
>> refcount will be > 1 because it has been incref'ed).  Other functions return 
>> an object that hasn't been incref'ed. This occurs when you're looking up 
>> something, for example, looking at a member of a list or the value of an 
>> attribute.”
>>
>> In the official docs some functions show “Return value: New reference” and 
>> others do not.  Is there any reason why I should not just INCREF on every 
>> new object, regardless of whether it’s a new reference or not, and DECREF 
>> when I am finished with it?  The answer at 
>> https://stackoverflow.com/questions/59870703/python-c-extension-need-to-py-incref-a-borrowed-reference-if-not-returning-it-to
>>  says “With out-of-order execution, the INCREF/DECREF are basically free 
>> operations, so performance is no reason to leave them out.”  Doing so means 
>> I don’t have to check each object to see if it needs to be INCREF’d or not, 
>> and that is a big help.
>>
> It's OK to INCREF them, provided that you DECREF them when you no longer need 
> them, and remember that if it's a "new reference" you'd need to DECREF it 
> twice.
>
>> Also:
>>
>> What is a borrowed reference, and how does it affect reference counting?  
>> According to https://jayrambhia.com/blog/pythonc-api-reference-counting, 
>> “Use Py_INCREF on a borrowed PyObject pointer you already have. This 
>> increments the reference count on the object, and obligates you to dispose 
>> of it properly.”  So I guess it’s yes, but I’m confused by “pointer you 
>> already have.”
>>
>
> A borrowed reference is when it hasn't been INCREFed.
>
> You can think of INCREFing as a way of indicating ownership, which is often 
> shared ownership (refcount > 1). When you're borrowing a reference, you're 
> using it temporarily, but not claiming ownership. When the last owner 
> releases its ownership (DECREF reduces the refcount to 0), the object can be 
> garbage collected.
>
> When, say, you get an object from a list with PyList_GetItem, it won't have 
> been INCREFed. You're using it temporarily, just borrowing a reference. 
> (Attribute lookups with PyObject_GetAttr or PyObject_GetAttrString, by 
> contrast, return a new reference.)
>
>>
>> What does it mean to steal a reference?  If a function steals a reference 
>> does it have to decref it without incref (because it’s stolen)?
>>
> When a function steals a reference, it's claiming ownership but not 
> INCREFing it.
>
>>
>> Finally, you said:
>>
>> if (pMod_random == 0x0){
>>     PyErr_Print();
>> Leaks here because of the refcount
>>
>> Assuming pMod_random is not null, why would this leak?
>>
> It's pName_random that's the leak.
>
> PyUnicode_FromString("random") will either create and return a new object for 
> the string "random" (refcount == 1) or return a reference to an existing 
> object (refcount > 1). You need to DECREF it before returning from the 
> function.
>
> Suppose it created a new object. You call the function, it creates an object, 
> you use it, then return from the function. The object still exists, but 
> there's no reference to it. Now call the function again. It creates another 
> object, you use it, then return from the function. You now have 2 objects 
> with no reference to them.
>
>> Thanks again for your input on this question.
>>
>> Jen
>>
>>
>>
>> Sep 29, 2022, 17:33 by pyt...@mrabarnett.plus.com:
>>
>>  On 2022-09-30 01:02, MRAB wrote:
>>
>>  On 2022-09-29 23:41, Jen Kris wrote:
>>
>>
>>  I just solved this C API problem, and I’m posting the
>>  answer to help anyone else who might need it.
>>
>>  [snip]
>>
>>  What I like to do is write comments that state which variables
>>  hold a reference, followed by '+' if it's a new reference
>>  (incref'ed) and '?' if it could be null. '+?' means that it's
>>  probably a new reference but could be null. Once I know that it's
>>  not null, I can remove the '?', and once I've decref'ed it (if
>>  required) and no longer need it, I remove it from the comment.
>>
>>  Clearing up references, as soon as they're not needed, helps to
>>  keep the number of current references more manageable.
>>
>>
>>  int64_t Get_LibModules(int64_t * return_array) {
>>  PyObject * pName_random = PyUnicode_FromString("random");
>>  //> pName_random+?
>>  if (!pName_random) {
>>  PyErr_Print();
>>  return 1;
>>  }
>>
>>  //> pName_random+
>>  PyObject * pMod_random = PyImport_Import(pName_random);
>>  //> pName_random+ pMod_random+?
>>  Py_DECREF(pName_random);
>>  //> pMod_random+?
>>  if

Re: PyObject_CallFunctionObjArgs segfaults

2022-09-30 Thread Jen Kris via Python-list


Thanks very much for your detailed reply.  I have a few followup questions.  

You said, “Some functions return an object that has already been incref'ed 
("new reference"). This occurs when it has either created a new object (the 
refcount will be 1) or has returned a pointer to an existing object (the 
refcount will be > 1 because it has been incref'ed).  Other functions return an 
object that hasn't been incref'ed. This occurs when you're looking up 
something, for example, looking at a member of a list or the value of an 
attribute.” 

In the official docs some functions show “Return value: New reference” and 
others do not.  Is there any reason why I should not just INCREF on every new 
object, regardless of whether it’s a new reference or not, and DECREF when I am 
finished with it?  The answer at 
https://stackoverflow.com/questions/59870703/python-c-extension-need-to-py-incref-a-borrowed-reference-if-not-returning-it-to
 says “With out-of-order execution, the INCREF/DECREF are basically free 
operations, so performance is no reason to leave them out.”  Doing so means I 
don’t have to check each object to see if it needs to be INCREF’d or not, and 
that is a big help. 

Also: 

What is a borrowed reference, and how does it affect reference counting?  
According to https://jayrambhia.com/blog/pythonc-api-reference-counting, “Use 
Py_INCREF on a borrowed PyObject pointer you already have. This increments the 
reference count on the object, and obligates you to dispose of it properly.”  
So I guess it’s yes, but I’m confused by “pointer you already have.” 

What does it mean to steal a reference?  If a function steals a reference does 
it have to decref it without incref (because it’s stolen)?

Finally, you said:

if (pMod_random == 0x0){
    PyErr_Print();
Leaks here because of the refcount

Assuming pMod_random is not null, why would this leak? 
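
What a leaked reference looks like can be sketched from Python. The example assumes CPython and uses ctypes.pythonapi.Py_IncRef to add a reference that is never released; Payload, balanced and leaky are hypothetical names for illustration:

```python
import ctypes
import gc
import weakref

ctypes.pythonapi.Py_IncRef.argtypes = [ctypes.py_object]
ctypes.pythonapi.Py_IncRef.restype = None


class Payload:
    """Hypothetical stand-in for an object created inside a C function."""


def balanced():
    obj = Payload()
    return weakref.ref(obj)   # the only strong reference ends with the call


def leaky():
    obj = Payload()
    # Extra reference that nobody ever releases (deliberate, for the demo).
    ctypes.pythonapi.Py_IncRef(ctypes.py_object(obj))
    return weakref.ref(obj)


freed = balanced()
leaked = leaky()
gc.collect()
print(freed() is None, leaked() is None)
```

The leaked object stays alive even though no variable refers to it any more, and the garbage collector cannot reclaim it. A new reference that is never DECREF'ed behaves the same way, and each call to the function adds another such object.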

Thanks again for your input on this question. 

Jen



Sep 29, 2022, 17:33 by pyt...@mrabarnett.plus.com:

> On 2022-09-30 01:02, MRAB wrote:
>
>> On 2022-09-29 23:41, Jen Kris wrote:
>>
>>>
>>> I just solved this C API problem, and I’m posting the answer to help anyone 
>>> else who might need it.
>>>
> [snip]
>
> What I like to do is write comments that state which variables hold a 
> reference, followed by '+' if it's a new reference (incref'ed) and '?' if it 
> could be null. '+?' means that it's probably a new reference but could be 
> null. Once I know that it's not null, I can remove the '?', and once I've 
> decref'ed it (if required) and no longer need it, I remove it from the 
> comment.
>
> Clearing up references, as soon as they're not needed, helps to keep the 
> number of current references more manageable.
>
>
> int64_t Get_LibModules(int64_t * return_array) {
>  PyObject * pName_random = PyUnicode_FromString("random");
>  //> pName_random+?
>  if (!pName_random) {
>  PyErr_Print();
>  return 1;
>  }
>
>  //> pName_random+
>  PyObject * pMod_random = PyImport_Import(pName_random);
>  //> pName_random+ pMod_random+?
>  Py_DECREF(pName_random);
>  //> pMod_random+?
>  if (!pMod_random) {
>  PyErr_Print();
>  return 1;
>  }
>
>  //> pMod_random+
>  PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed");
>  //> pMod_random+ pAttr_seed?
>  if (!pAttr_seed) {
>  Py_DECREF(pMod_random);
>  PyErr_Print();
>  return 1;
>  }
>
>  //> pMod_random+ pAttr_seed
>  PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random, 
> "randrange");
>  //> pMod_random+ pAttr_seed pAttr_randrange?
>  Py_DECREF(pMod_random);
>  //> pAttr_seed pAttr_randrange?
>  if (!pAttr_randrange) {
>  PyErr_Print();
>  return 1;
>  }
>
>  //> pAttr_seed pAttr_randrange
>  return_array[0] = (int64_t)pAttr_seed;
>  return_array[1] = (int64_t)pAttr_randrange;
>
>  return 0;
> }
>
> int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1) {
>  PyObject * value_ptr = PyLong_FromLong(value_1);
>  //> value_ptr+?
>  if (!value_ptr) {
>  PyErr_Print();
>  return 1;
>  }
>
>  //> value_ptr+
>  PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed, value_ptr, 
> NULL);
>  //> value_ptr+ p_seed_calc+?
>  Py_DECREF(value_ptr);
>  //> p_seed_calc+?
>  if (!p_seed_calc) {
>  PyErr_Print();
>  return 1;
>  }
>
>  //> p_seed_calc+
>  Py_DECREF(p_seed_calc);
>  return 0;
> }
>
> int64_t C_API_12(PyObject * pAttr_randrange, Py_ssize_t value_1) {
>  PyObject * value_ptr = PyLong_FromLong(value_1);
>  //> value_ptr+?
>  if (!value_ptr) {
>  PyErr_Print();
>  return 1;
>  }
>
>  //> value_ptr+
>  PyObject * p_randrange_calc = PyObject_CallFunctionObjArgs(pAttr_randrange, 
> value_ptr, NULL);
>  //> value_ptr+ p_randrange_calc+?
>  Py_DECREF(value_ptr);
>  //> p_randrange_calc+?
>  if (!p_randrange_calc) {
>  PyErr_Print();
>  return 1;
>  }
>
>  //Prepare return values
>  //> p_randrange_calc+
>  long return_val = PyLong_AsLong(p_randrange_calc);
>  Py_DECREF(p_randrange_calc);
>
>  return return_val;
> }
>


Re: PyObject_CallFunctionObjArgs segfaults

2022-09-29 Thread Jen Kris via Python-list


I just solved this C API problem, and I’m posting the answer to help anyone 
else who might need it.  

The errors were:

(1) we must call Py_INCREF on each object when it’s created.

(2) in C_API_2 (see below) we don’t cast value_1 as I did before with PyObject 
* value_ptr = (PyObject * )value_1.  Instead we use PyObject * value_ptr = 
PyLong_FromLong(value_1);

(3) The argument list passed to PyObject_CallFunctionObjArgs must be terminated with NULL.

Here’s the revised code:

First we load the modules, and increment the reference to each object: 

int64_t Get_LibModules(int64_t * return_array)
{
PyObject * pName_random = PyUnicode_FromString("random");
PyObject * pMod_random = PyImport_Import(pName_random);

Py_INCREF(pName_random);
Py_INCREF(pMod_random);

if (pMod_random == 0x0){
PyErr_Print();
return 1;}

PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed");
PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random, "randrange");

Py_INCREF(pAttr_seed);
Py_INCREF(pAttr_randrange);

return_array[0] = (int64_t)pAttr_seed;
return_array[1] = (int64_t)pAttr_randrange;

return 0;
}

Next we call a program to initialize the random number generator with 
random.seed(), and increment the reference to its return value p_seed_calc:

int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1)
{
PyObject * value_ptr = PyLong_FromLong(value_1);
PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed, value_ptr, 
NULL);

// _

if (p_seed_calc == 0x0){
    PyErr_Print();
    return 1;}

Py_INCREF(p_seed_calc);

return 0;
}

Now we call another program to get a random number:

int64_t C_API_12(PyObject * pAttr_randrange, Py_ssize_t value_1)
{
PyObject * value_ptr = PyLong_FromLong(value_1);
PyObject * p_randrange_calc = PyObject_CallFunctionObjArgs(pAttr_randrange, 
value_ptr, NULL);

if (p_randrange_calc == 0x0){
    PyErr_Print();
    return 1;}

//Prepare return values
long return_val = PyLong_AsLong(p_randrange_calc);

return return_val;
}

That returns 28, which is what I get from the Python command line. 

Thanks again to MRAB for helpful comments. 

Jen


Sep 29, 2022, 15:31 by pyt...@mrabarnett.plus.com:

> On 2022-09-29 21:47, Jen Kris wrote:
>
>> To update my previous email, I found the problem, but I have a new problem.
>>
>> Previously I cast PyObject * value_ptr = (PyObject * )value_1 but that's not 
>> correct.  Instead I used PyObject * value_ptr = PyLong_FromLong(value_1) and 
>> that works.  HOWEVER, while PyObject_CallFunctionObjArgs does work now, it 
>> returns -1, which is not the right answer for random.seed.  I use "long 
>> return_val = PyLong_AsLong(p_seed_calc);" to convert it to a long.
>>
> random.seed returns None, so when you call PyObject_CallFunctionObjArgs it 
> returns a new reference to Py_None.
>
> If you then pass to PyLong_AsLong a reference to something that's not a 
> PyLong, it'll set an error and return -1.
>
>> So my question is: why do I get -1 as the return value?  When I query 
>> p_seed_calc, I get:
>>
>> (gdb) p p_seed_calc
>> $2 = (PyObject *) 0x769be120 <_Py_NoneStruct>
>>
> Exactly. It's Py_None, not a PyLong.
>
>> Thanks again.
>>
>> Jen
>>
>>
>>
>>
>> Sep 29, 2022, 13:02 by python-list@python.org:
>>
>>  Thanks very much to @MRAB for taking time to answer.  I changed my
>>  code to conform to your answer (as best I understand your comments
>>  on references), but I still get the same error.  My comments
>>  continue below the new code immediately below.
>>
>>  int64_t Get_LibModules(int64_t * return_array)
>>  {
>>  PyObject * pName_random = PyUnicode_FromString("random");
>>  PyObject * pMod_random = PyImport_Import(pName_random);
>>
>>  Py_INCREF(pName_random);
>>  Py_INCREF(pMod_random);
>>
>>  if (pMod_random == 0x0){
>>  PyErr_Print();
>>  return 1;}
>>
>>  PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed");
>>  PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random,
>>  "randrange");
>>
>>  Py_INCREF(pAttr_seed);
>>  Py_INCREF(pAttr_randrange);
>>
>>  return_array[0] = (int64_t)pAttr_seed;
>>  return_array[1] = (int64_t)pAttr_randrange;
>>
>>  return 0;
>>  }
>>
>>  int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1)
>>  {
>>  PyObject * value_ptr = (PyObject * )value_1;
>>  PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed,
>>  value_ptr, NULL);
>>
>>  if (p_seed_calc == 0x0){
>>      PyErr_Print();
>>      return 1;}
>>
>>  //Prepare return values
>>  long return_val = PyLong_AsLong(p_seed_calc)

Re: PyObject_CallFunctionObjArgs segfaults

2022-09-29 Thread Jen Kris via Python-list

To update my previous email, I found the problem, but I have a new problem.  

Previously I cast PyObject * value_ptr = (PyObject * )value_1 but that's not 
correct.  Instead I used PyObject * value_ptr = PyLong_FromLong(value_1) and 
that works.  HOWEVER, while PyObject_CallFunctionObjArgs does work now, it 
returns -1, which is not the right answer for random.seed.  I use "long 
return_val = PyLong_AsLong(p_seed_calc);" to convert it to a long.  

So my question is: why do I get -1 as the return value?  When I query 
p_seed_calc, I get:

(gdb) p p_seed_calc
$2 = (PyObject *) 0x769be120 <_Py_NoneStruct>

Thanks again.

Jen
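
The -1 can be reproduced from Python, which may make the cause easier to see. This sketch calls PyLong_AsLong through ctypes.pythonapi; because ctypes checks the Python error indicator after such calls, the error that accompanies the -1 in C shows up here as an exception:

```python
import ctypes
import random

seed_result = random.seed(1234)          # random.seed returns None

as_long = ctypes.pythonapi.PyLong_AsLong
as_long.argtypes = [ctypes.py_object]
as_long.restype = ctypes.c_long

converted = as_long(ctypes.py_object(42))    # a real int converts normally

try:
    as_long(ctypes.py_object(seed_result))   # None is not an int
    conversion_failed = False
except TypeError:
    conversion_failed = True                 # in C: returns -1, error set

print(seed_result, converted, conversion_failed)
```

In C, a -1 from PyLong_AsLong is ambiguous on its own, so the usual pattern is to follow it with a PyErr_Occurred() check. For random.seed there is nothing to convert at all, since the call only succeeds or fails.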




Sep 29, 2022, 13:02 by python-list@python.org:

> Thanks very much to @MRAB for taking time to answer.  I changed my code to 
> conform to your answer (as best I understand your comments on references), 
> but I still get the same error.  My comments continue below the new code 
> immediately below.  
>
> int64_t Get_LibModules(int64_t * return_array)
> {
> PyObject * pName_random = PyUnicode_FromString("random");
> PyObject * pMod_random = PyImport_Import(pName_random);
>
> Py_INCREF(pName_random);
> Py_INCREF(pMod_random);
>
> if (pMod_random == 0x0){
> PyErr_Print();
> return 1;}
>
> PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed");
> PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random, "randrange");
>
> Py_INCREF(pAttr_seed);
> Py_INCREF(pAttr_randrange);
>
> return_array[0] = (int64_t)pAttr_seed;
> return_array[1] = (int64_t)pAttr_randrange;
>
> return 0;
> }
>
> int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1)
> {
> PyObject * value_ptr = (PyObject * )value_1;
> PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed, value_ptr, 
> NULL);
>
> if (p_seed_calc == 0x0){
>     PyErr_Print();
>     return 1;}
>
> //Prepare return values
> long return_val = PyLong_AsLong(p_seed_calc);
>
> return return_val;
> }
>
> So I incremented the reference to all objects in Get_LibModules, but I still 
> get the same segfault at PyObject_CallFunctionObjArgs.  Unfortunately, 
> reference counting is not well documented so I’m not clear what’s wrong. 
>
>
>
>
> Sep 29, 2022, 10:06 by pyt...@mrabarnett.plus.com:
>
>> On 2022-09-29 16:54, Jen Kris via Python-list wrote:
>>
>>> Recently I completed a project where I used PyObject_CallFunctionObjArgs 
>>> extensively with the NLTK library from a program written in NASM, with no 
>>> problems.  Now I am on a new project where I call the Python random 
>>> library.  I use the same setup as before, but I am getting a segfault with 
>>> random.seed.
>>>
>>> At the start of the NASM program I call a C API program that gets PyObject 
>>> pointers to “seed” and “randrange” in the same way as I did before:
>>>
>>> int64_t Get_LibModules(int64_t * return_array)
>>> {
>>> PyObject * pName_random = PyUnicode_FromString("random");
>>> PyObject * pMod_random = PyImport_Import(pName_random);
>>>
>> Both PyUnicode_FromString and PyImport_Import return new references or null 
>> pointers.
>>
>>> if (pMod_random == 0x0){
>>> PyErr_Print();
>>>
>>
>> You're leaking a reference here (pName_random).
>>
>>> return 1;}
>>>
>>> PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed");
>>> PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random, 
>>> "randrange");
>>>
>>> return_array[0] = (int64_t)pAttr_seed;
>>> return_array[1] = (int64_t)pAttr_randrange;
>>>
>>
>> You're leaking 2 references here (pName_random and pMod_random).
>>
>>> return 0;
>>> }
>>>
>>> Later in the same program I call a C API program to call random.seed:
>>>
>>> int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1)
>>> {
>>> PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed, value_1);
>>>
>>
>> It's expecting all of the arguments to be PyObject*, but value_1 is 
>> Py_ssize_t instead of PyObject* (a pointer to a _Python_ int).
>>
>> The argument list must end with a null pointer.
>>
>> It returns a new reference or a null pointer.
>>
>>>
>>> if (p_seed_calc == 0x0){
>>>      PyErr_Print();
>>>      return 1;}
>>>
>>> //Prepare return values
>>> long return_val = PyLong_AsLong(p_seed_calc);
>>>
>> You're leaking a reference here (p_seed_calc).
>>
>>>

Re: PyObject_CallFunctionObjArgs segfaults

2022-09-29 Thread Jen Kris via Python-list

Thanks very much to @MRAB for taking time to answer.  I changed my code to 
conform to your answer (as best I understand your comments on references), but 
I still get the same error.  My comments continue below the new code 
immediately below.  

int64_t Get_LibModules(int64_t * return_array)
{
PyObject * pName_random = PyUnicode_FromString("random");
PyObject * pMod_random = PyImport_Import(pName_random);

Py_INCREF(pName_random);
Py_INCREF(pMod_random);

if (pMod_random == 0x0){
PyErr_Print();
return 1;}

PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed");
PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random, "randrange");

Py_INCREF(pAttr_seed);
Py_INCREF(pAttr_randrange);

return_array[0] = (int64_t)pAttr_seed;
return_array[1] = (int64_t)pAttr_randrange;

return 0;
}

int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1)
{
PyObject * value_ptr = (PyObject * )value_1;
PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed, value_ptr, 
NULL);

if (p_seed_calc == 0x0){
    PyErr_Print();
    return 1;}

//Prepare return values
long return_val = PyLong_AsLong(p_seed_calc);

return return_val;
}

So I incremented the reference to all objects in Get_LibModules, but I still 
get the same segfault at PyObject_CallFunctionObjArgs.  Unfortunately, 
reference counting is not well documented so I’m not clear what’s wrong. 




Sep 29, 2022, 10:06 by pyt...@mrabarnett.plus.com:

> On 2022-09-29 16:54, Jen Kris via Python-list wrote:
>
>> Recently I completed a project where I used PyObject_CallFunctionObjArgs 
>> extensively with the NLTK library from a program written in NASM, with no 
>> problems.  Now I am on a new project where I call the Python random library. 
>>  I use the same setup as before, but I am getting a segfault with 
>> random.seed.
>>
>> At the start of the NASM program I call a C API program that gets PyObject 
>> pointers to “seed” and “randrange” in the same way as I did before:
>>
>> int64_t Get_LibModules(int64_t * return_array)
>> {
>> PyObject * pName_random = PyUnicode_FromString("random");
>> PyObject * pMod_random = PyImport_Import(pName_random);
>>
> Both PyUnicode_FromString and PyImport_Import return new references or null 
> pointers.
>
>> if (pMod_random == 0x0){
>> PyErr_Print();
>>
>
> You're leaking a reference here (pName_random).
>
>> return 1;}
>>
>> PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed");
>> PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random, 
>> "randrange");
>>
>> return_array[0] = (int64_t)pAttr_seed;
>> return_array[1] = (int64_t)pAttr_randrange;
>>
>
> You're leaking 2 references here (pName_random and pMod_random).
>
>> return 0;
>> }
>>
>> Later in the same program I call a C API program to call random.seed:
>>
>> int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1)
>> {
>> PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed, value_1);
>>
>
> It's expecting all of the arguments to be PyObject*, but value_1 is 
> Py_ssize_t instead of PyObject* (a pointer to a _Python_ int).
>
> The argument list must end with a null pointer.
>
> It returns a new reference or a null pointer.
>
>>
>> if (p_seed_calc == 0x0){
>>      PyErr_Print();
>>      return 1;}
>>
>> //Prepare return values
>> long return_val = PyLong_AsLong(p_seed_calc);
>>
> You're leaking a reference here (p_seed_calc).
>
>> return return_val;
>> }
>>
>> The first program correctly imports “random” and gets pointers to “seed” and 
>> “randrange.”  I verified that the same pointer is correctly passed into 
>> C_API_2, and the seed value (1234) is passed as  Py_ssize_t value_1.  But I 
>> get this segfault:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x764858d5 in _Py_INCREF (op=0x4d2) at ../Include/object.h:459
>> 459 ../Include/object.h: No such file or directory.
>>
>> So I tried Py_INCREF in the first program:
>>
>> Py_INCREF(pMod_random);
>> Py_INCREF(pAttr_seed);
>>
>> Then I moved Py_INCREF(pAttr_seed) to the second program.  Same segfault.
>>
>> Finally, I initialized “random” and “seed” in the second program, where they 
>> are used.  Same segfault.
>>
>> The segfault refers to Py_INCREF, so this seems to do with reference 
>> counting, but Py_INCREF didn’t solve it.
>>
>> I’m using Python 3.8 on Ubuntu.
>>
>> Thanks for any ideas on how to solve this.
>>
>> Jen
>>
>


PyObject_CallFunctionObjArgs segfaults

2022-09-29 Thread Jen Kris via Python-list

Recently I completed a project where I used PyObject_CallFunctionObjArgs 
extensively with the NLTK library from a program written in NASM, with no 
problems.  Now I am on a new project where I call the Python random library.  I 
use the same setup as before, but I am getting a segfault with random.seed.  

At the start of the NASM program I call a C API program that gets PyObject 
pointers to “seed” and “randrange” in the same way as I did before:

int64_t Get_LibModules(int64_t * return_array)
{
PyObject * pName_random = PyUnicode_FromString("random");
PyObject * pMod_random = PyImport_Import(pName_random);

if (pMod_random == 0x0){
PyErr_Print();
return 1;}

PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed");
PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random, "randrange");

return_array[0] = (int64_t)pAttr_seed;
return_array[1] = (int64_t)pAttr_randrange;

return 0;
}

Later in the same program I call a C API program to call random.seed:

int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1)
{
PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed, value_1);

if (p_seed_calc == 0x0){
    PyErr_Print();
    return 1;}

//Prepare return values
long return_val = PyLong_AsLong(p_seed_calc);

return return_val;
}

The first program correctly imports “random” and gets pointers to “seed” and 
“randrange.”  I verified that the same pointer is correctly passed into 
C_API_2, and the seed value (1234) is passed as  Py_ssize_t value_1.  But I get 
this segfault:

Program received signal SIGSEGV, Segmentation fault.
0x764858d5 in _Py_INCREF (op=0x4d2) at ../Include/object.h:459
459 ../Include/object.h: No such file or directory.

So I tried Py_INCREF in the first program: 

Py_INCREF(pMod_random);
Py_INCREF(pAttr_seed);

Then I moved Py_INCREF(pAttr_seed) to the second program.  Same segfault.

Finally, I initialized “random” and “seed” in the second program, where they 
are used.  Same segfault. 

The segfault refers to Py_INCREF, so this seems to do with reference counting, 
but Py_INCREF didn’t solve it.   
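
One detail in the GDB output above is worth checking: the op=0x4d2 that _Py_INCREF was asked to dereference. A quick calculation (not part of the original code) suggests it is not a pointer at all:

```python
fault_address = 0x4d2     # "op" value reported by GDB in _Py_INCREF
seed_value = 1234         # the seed passed as Py_ssize_t value_1

address_is_seed = fault_address == seed_value
print(hex(seed_value), address_is_seed)
```

If the two match, the crash is consistent with the raw integer being treated as a PyObject *, that is, with the argument rather than the reference counts being the problem.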

I’m using Python 3.8 on Ubuntu. 

Thanks for any ideas on how to solve this. 

Jen


Re: Problem slicing a list with the C API

2022-03-12 Thread Jen Kris via Python-list


Thanks for PySequence_InPlaceConcat, so when I need to extend I'll know what to 
use.  But my previous email was based on incorrect information from several SO 
posts that claimed only the extend method will work to add tuples to a list.  I 
found that's wrong -- even my own Python code uses the append method.  But my 
PyList_Append is not doing the job so that's where I'm looking now.  

Thanks very much for your reply.

Mar 12, 2022, 15:36 by ros...@gmail.com:

> On Sun, 13 Mar 2022 at 10:30, Jen Kris  wrote:
>
>>
>>
>> Chris, you were right to focus on the list pDictData itself.   As I said, 
>> that is a list of 2-tuples, but I added each of the 2-tuples with 
>> PyList_Append, but you can only append a tuple to a list with the extend 
>> method.  However, there is no append method in the C API as far as I can 
>> tell -- hence pDictData is empty.  I tried with PyList_SetItem but that 
>> doesn't work.  Do you know of a way to "extend" a list in the C API?
>>
>
> Hmm. Not entirely sure I understand the question.
>
> In Python, a list has an append method, which takes any object (which
> may be a tuple) and adds that object to the end of the list:
>
> >>> x = ["spam", "ham"]
> >>> x.append((1,2))
> >>> x
> ['spam', 'ham', (1, 2)]
>
> A list also has an extend method, which takes any sequence (that also
> includes tuples), and adds *the elements from it* to the end of the
> list:
>
> >>> x = ["spam", "ham"]
> >>> x.extend((1,2))
> >>> x
> ['spam', 'ham', 1, 2]
>
> The append method corresponds to PyList_Append, as you mentioned. It
> should be quite happy to append a tuple, and will add the tuple
> itself, not the contents of it. So when you iterate over the list,
> you'll get tuples.
>
> Extending a list can be done with the sequence API. In Python, you can
> write extend() as +=, indicating that you're adding something onto the
> end:
>
> >>> x = ["spam", "ham"]
> >>> x += (1, 2)
> >>> x
> ['spam', 'ham', 1, 2]
>
> This corresponds to PySequence_InPlaceConcat, so if that's the
> behaviour you want, that would be the easiest way to do it.
>
> Based on your other comments, I would suspect that appending the
> tuples is probably what you want here?
>
> ChrisA

Re: Problem slicing a list with the C API

2022-03-12 Thread Jen Kris via Python-list


Chris, you were right to focus on the list pDictData itself.  As I said, that 
is a list of 2-tuples.  I added each of the 2-tuples with PyList_Append, but 
you can only append a tuple to a list with the extend method.  However, there 
is no extend function in the C API as far as I can tell -- hence pDictData is 
empty.  I tried PyList_SetItem but that doesn't work.  Do you know of a way 
to "extend" a list in the C API?  
Thanks very much.  

Jen



Mar 12, 2022, 13:57 by ros...@gmail.com:

> On Sun, 13 Mar 2022 at 08:54, Jen Kris  wrote:
>
>>
>>
>> pDictData, despite the name, is a list of 2-tuples where each 2-tuple is a 
>> dictionary object and a string.
>>
>
> Ah, gotcha. In that case, yeah, slicing it will involve referencing
> the tuples all the way down the line (adding to their refcounts, so if
> there's a borked one, kaboom).
>
> ChrisA

Re: Problem slicing a list with the C API

2022-03-12 Thread Jen Kris via Python-list


pDictData, despite the name, is a list of 2-tuples where each 2-tuple is a 
dictionary object and a string.  

Mar 12, 2022, 13:41 by ros...@gmail.com:

> On Sun, 13 Mar 2022 at 08:25, Jen Kris via Python-list
>  wrote:
>
>> PyObject* slice = PySlice_New(PyLong_FromLong(0), half_slice, 0);
>> PyObject* subdata_a = PyObject_GetItem(pDictdata, slice);
>>
>> On the final line (subdata_a) I get a segfault.  I know that the second 
>> parameter of  PyObject_GetItem is a “key” and I suspect that’s where the 
>> problem comes from, but I don’t understand what a key is in this context.
>>
>
> The key is simply whatever would be in the square brackets in Python
> code, so that part looks fine.
>
> But dictionaries aren't usually subscripted with slices, so I'm a bit
> confused as to what's going on here. What exactly is
> dictdata/pDictdata?
>
> Have you confirmed that pDictdata (a) isn't NULL, (b) is the object
> you intend it to be, and (c) contains the objects you expect it to?
> The segfault might not be from the slice object itself, it might be
> from actually iterating over the thing being sliced and touching all
> its elements. For instance, if dictdata is actually a list, that call
> will be constructing a new list with references to the same elements,
> so if one of them is broken (maybe NULL), it'll break badly.
>
> ChrisA

Re: Problem slicing a list with the C API

2022-03-12 Thread Jen Kris via Python-list


Thanks to you both.  I am going to implement PySequence_GetSlice now.  If I 
have trouble then, per comments from Chris Angelico, I will iterate through 
pDictData to verify it because I haven't done that.  It is not null, however.  

 Jen


Mar 12, 2022, 13:40 by pyt...@mrabarnett.plus.com:

> On 2022-03-12 21:24, Jen Kris via Python-list wrote:
>
>> I have a C API project where I have to slice a list into two parts.   
>> Unfortunately the documentation on the slice objects is not clear enough for 
>> me to understand how to do this, and I haven’t found enough useful info 
>> through research.  The list contains tuple records where each tuple consists 
>> of a dictionary object and a string.
>>
>> The relevant part of the Python code is:
>>
>> half_slice = int(len(dictdata) * 0.5)
>> subdata_a = dictdata[half_slice:]
>> subdata_b = dictdata[:half_slice]
>>
>> This is what I’ve done so far with the C API:
>>
>> int64_t Calc_Slices(PyObject* pDictdata, int64_t records_count)
>> {
>> long round_half = records_count * 0.5;
>> PyObject* half_slice = PyLong_FromLong(round_half);
>>
>> PyObject* slice = PySlice_New(PyLong_FromLong(0), half_slice, 0);
>> PyObject* subdata_a = PyObject_GetItem(pDictdata, slice);
>>
>> return 0;
>> }
>>
>> On the final line (subdata_a) I get a segfault.  I know that the second 
>> parameter of  PyObject_GetItem is a “key” and I suspect that’s where the 
>> problem comes from, but I don’t understand what a key is in this context.
>>
>> The code shown above omits error handling but none of the objects leading up 
>> to the final line is null, they all succeed.
>>
>> Thanks for any ideas.
>>
> Use PySequence_GetSlice to slice the list.
>
> Also, why use floats when you can use integers?
>
>  long round_half = records_count / 2;
>
> (In Python that would be half_slice = len(dictdata) // 2.)

Problem slicing a list with the C API

2022-03-12 Thread Jen Kris via Python-list

I have a C API project where I have to slice a list into two parts.   
Unfortunately the documentation on the slice objects is not clear enough for me 
to understand how to do this, and I haven’t found enough useful info through 
research.  The list contains tuple records where each tuple consists of a 
dictionary object and a string.  

The relevant part of the Python code is:

half_slice = int(len(dictdata) * 0.5)
subdata_a = dictdata[half_slice:]
subdata_b = dictdata[:half_slice]

This is what I’ve done so far with the C API:

int64_t Calc_Slices(PyObject* pDictdata, int64_t records_count)
{
    long round_half = records_count * 0.5;
    PyObject* half_slice = PyLong_FromLong(round_half);

    // Slice object equivalent to dictdata[0:half_slice]
    PyObject* slice = PySlice_New(PyLong_FromLong(0), half_slice, NULL);
    PyObject* subdata_a = PyObject_GetItem(pDictdata, slice);

    return 0;
}

On the final line (subdata_a) I get a segfault.  I know that the second 
parameter of  PyObject_GetItem is a “key” and I suspect that’s where the 
problem comes from, but I don’t understand what a key is in this context. 

The code shown above omits error handling but none of the objects leading up to 
the final line is null, they all succeed. 

Thanks for any ideas.

Jen
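
For reference, a minimal Python sketch of what the "key" is when a list is sliced. It assumes a made-up stand-in for dictdata (a list of (dict, str) tuples); the names key, subdata_a and subdata_b follow the Python code above, and nothing here is C API code.

```python
# Hypothetical stand-in for dictdata: a list of (dict, str) tuples.
dictdata = [({"id": i}, "label%d" % i) for i in range(5)]

half_slice = len(dictdata) // 2   # integer division, no float needed

# dictdata[:half_slice] is shorthand for indexing with a slice object.
# That slice object is the "key" handed to the subscript operation.
key = slice(0, half_slice)
subdata_b = dictdata[key]
subdata_a = dictdata[half_slice:]

assert subdata_b == dictdata[:half_slice]
assert subdata_b + subdata_a == dictdata
```

The two halves together contain every record exactly once, including when the length is odd.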


Re: C API PyObject_CallFunctionObjArgs returns incorrect result

2022-03-07 Thread Jen Kris via Python-list

Thanks to MRAB and Chris Angelico for your help.  Here is how I implemented the 
string conversion, and it works correctly now for a library call that needs a 
list converted to a string (error handling not shown):

// Join the list items with a space, like " ".join(sentence) in Python
PyObject* separator = PyUnicode_FromString(" ");
PyObject* str_join = PyUnicode_Join(separator, pSentence);
Py_DECREF(separator);
PyObject* pNltk_WTok = PyObject_GetAttrString(pModule_mstr, "word_tokenize");
// The argument list must end with a NULL pointer
PyObject* pWTok = PyObject_CallFunctionObjArgs(pNltk_WTok, str_join, NULL);

That produces what I need (this is the REPR of pWTok):

"['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']"

Thanks again to both of you. 

Jen
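
For reference, a minimal Python sketch of why the two conversions differ. It assumes the sentence is the token list shown in this thread; as_str and joined are illustrative names. str() on a list gives the list's printed form, which is what PyObject_Str produces, while " ".join gives the plain sentence, which is what PyUnicode_Join with a space separator produces.

```python
sentence = ['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']

as_str = str(sentence)        # printed form of the list, quotes and commas included
joined = " ".join(sentence)   # the items separated by single spaces

assert as_str == "['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']"
assert joined == "[ Emma by Jane Austen 1816 ]"
```

Only the joined form is suitable input for a tokenizer that expects running text.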


Mar 7, 2022, 11:03 by pyt...@mrabarnett.plus.com:

> On 2022-03-07 17:05, Jen Kris wrote:
>
>> Thank you MRAB for your reply.
>>
>> Regarding your first question, pSentence is a list.  In the nltk library, 
>> nltk.word_tokenize takes a string, so we convert sentence to string before 
>> we call nltk.word_tokenize:
>>
>> >>> sentence = " ".join(sentence)
>> >>> pt = nltk.word_tokenize(sentence)
>> >>> print(sentence)
>> [ Emma by Jane Austen 1816 ]
>>
>> But with the C API it looks like this:
>>
>> PyObject *pSentence = PySequence_GetItem(pSents, sent_count);
>> PyObject* str_sentence = PyObject_Str(pSentence); // Convert to string
>>
>> // See what str_sentence looks like:
>> PyObject* repr_str = PyObject_Repr(str_sentence);
>> PyObject* str_str = PyUnicode_AsEncodedString(repr_str, "utf-8", "~E~");
>> const char *bytes_str = PyBytes_AS_STRING(str_str);
>> printf("REPR_String: %s\n", bytes_str);
>>
>> REPR_String: "['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']"
>>
>> So the two string representations are not the same – or at least the   
>> PyUnicode_AsEncodedString is not the same, as each item is surrounded by 
>> single quotes.
>>
>> Assuming that the conversion to bytes object for the REPR is an accurate 
>> representation of str_sentence, it looks like I need to strip the quotes 
>> from str_sentence before “PyObject* pWTok = 
>> PyObject_CallFunctionObjArgs(pNltk_WTok, str_sentence, 0).”
>>
>> So my questions now are (1) is there a C API function that will convert a 
>> list to a string exactly the same way as ‘’.join, and if not then (2) how 
>> can I strip characters from a string object in the C API?
>>
> Your Python code is joining the list with a space as the separator.
>
> The equivalent using the C API is:
>
>     PyObject* separator;
>     PyObject* joined;
>
>     separator = PyUnicode_FromString(" ");
>     joined = PyUnicode_Join(separator, pSentence);
>     Py_DECREF(separator);
>
>>
>> Mar 6, 2022, 17:42 by pyt...@mrabarnett.plus.com:
>>
>>  On 2022-03-07 00:32, Jen Kris via Python-list wrote:
>>
>>  I am using the C API in Python 3.8 with the nltk library, and
>>  I have a problem with the return from a library call
>>  implemented with PyObject_CallFunctionObjArgs.
>>
>>  This is the relevant Python code:
>>
>>  import nltk
>>  from nltk.corpus import gutenberg
>>  fileids = gutenberg.fileids()
>>  sentences = gutenberg.sents(fileids[0])
>>  sentence = sentences[0]
>>  sentence = " ".join(sentence)
>>  pt = nltk.word_tokenize(sentence)
>>
>>  I run this at the Python command prompt to show how it works:
>>
>>  >>> sentence = " ".join(sentence)
>>  >>> pt = nltk.word_tokenize(sentence)
>>  >>> print(pt)
>>  ['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']
>>  >>> type(pt)
>>  <class 'list'>
>>
>>  This is the relevant part of the C API code:
>>
>>  PyObject* str_sentence = PyObject_Str(pSentence);
>>  // nltk.word_tokenize(sentence)
>>  PyObject* pNltk_WTok = PyObject_GetAttrString(pModule_mstr,
>>  "word_tokenize");
>>  PyObject* pWTok = PyObject_CallFunctionObjArgs(pNltk_WTok,
>>  str_sentence, 0);
>>
>>  (where pModule_mstr is the nltk library).
>>
>>  That should produce a list with a length of 7 that looks like
>>  it does on the command line version shown above:
>>
>>  ['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']
>>
>>  But instead the C API produces a list with a length of 24, and
>>  the REPR looks like this:
>>
>>  '[\'[\', "\'", \'[\', "\'", \',\', "\'Emma", "\'", \',\',
>>  "\'by", "\'", \',\', "\'Jane", "\'", \',\', "\'Austen", "\'",
>>  \',\', "\'1816", "\'", \',\', "\'", \']\', "\'", \']\']'
>>
>>  I also tried this with PyObject_CallMethodObjArgs and
>>  PyObject_Call without success.
>>
>>  Thanks for any help on this.
>>
>>  What is pSentence? Is it what you think it is?
>>  To me it looks like it's either the list:
>>
>>  ['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']
>>
>>  or that list as a string:
>>
>>  "['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']"
>>
>>  and that's what you're tokenising.
>>

Re: C API PyObject_CallFunctionObjArgs returns incorrect result

2022-03-07 Thread Jen Kris via Python-list


The PyObject str_sentence is a string representation of a list.  I need to 
convert the list to a string like "".join because that's what the library call 
takes.  
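
For reference, a minimal Python sketch of the "string, then representation of that string" sequence Chris asks about below. The shortened token list and the names as_str and as_repr are illustrative only.

```python
sentence = ['Emma', 'by', 'Jane']

as_str = str(sentence)    # string form of the list
as_repr = repr(as_str)    # representation of that string: adds outer quotes

assert as_str == "['Emma', 'by', 'Jane']"
assert as_repr == '"' + as_str + '"'
```

The outer double quotes in the REPR output come from taking repr of a string; the inner single quotes come from converting the list with str.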


Mar 7, 2022, 09:09 by ros...@gmail.com:

> On Tue, 8 Mar 2022 at 04:06, Jen Kris via Python-list
>  wrote:
>
>> But with the C API it looks like this:
>>
>> PyObject *pSentence = PySequence_GetItem(pSents, sent_count);
>> PyObject* str_sentence = PyObject_Str(pSentence);  // Convert to string
>>
>> PyObject* repr_str = PyObject_Repr(str_sentence);
>>
>
> You convert it to a string, then take the representation of that. Is
> that what you intended?
>
> ChrisA

Re: C API PyObject_CallFunctionObjArgs returns incorrect result

2022-03-07 Thread Jen Kris via Python-list

Thank you MRAB for your reply.

Regarding your first question, pSentence is a list.  In the nltk library, 
nltk.word_tokenize takes a string, so we convert sentence to string before we 
call nltk.word_tokenize:

>>> sentence = " ".join(sentence)
>>> pt = nltk.word_tokenize(sentence)
>>> print(sentence)
[ Emma by Jane Austen 1816 ]

But with the C API it looks like this:

PyObject *pSentence = PySequence_GetItem(pSents, sent_count);
PyObject* str_sentence = PyObject_Str(pSentence);  // Convert to string

// See what str_sentence looks like:
PyObject* repr_str = PyObject_Repr(str_sentence);  
PyObject* str_str = PyUnicode_AsEncodedString(repr_str, "utf-8", "~E~");  
const char *bytes_str = PyBytes_AS_STRING(str_str);
printf("REPR_String: %s\n", bytes_str); 

REPR_String: "['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']"

So the two string representations are not the same – or at least the 
PyUnicode_AsEncodedString output is not the same, as each item is surrounded by 
single quotes. 

Assuming that the conversion to bytes object for the REPR is an accurate 
representation of str_sentence, it looks like I need to strip the quotes from 
str_sentence before “PyObject* pWTok = PyObject_CallFunctionObjArgs(pNltk_WTok, 
str_sentence, 0).”   

So my questions now are (1) is there a C API function that will convert a list 
to a string exactly the same way as " ".join, and if not then (2) how can I 
strip characters from a string object in the C API? 

Thanks.



Mar 6, 2022, 17:42 by pyt...@mrabarnett.plus.com:

> On 2022-03-07 00:32, Jen Kris via Python-list wrote:
>
>> I am using the C API in Python 3.8 with the nltk library, and I have a 
>> problem with the return from a library call implemented with 
>> PyObject_CallFunctionObjArgs.
>>
>> This is the relevant Python code:
>>
>> import nltk
>> from nltk.corpus import gutenberg
>> fileids = gutenberg.fileids()
>> sentences = gutenberg.sents(fileids[0])
>> sentence = sentences[0]
>> sentence = " ".join(sentence)
>> pt = nltk.word_tokenize(sentence)
>>
>> I run this at the Python command prompt to show how it works:
>>
>> >>> sentence = " ".join(sentence)
>> >>> pt = nltk.word_tokenize(sentence)
>> >>> print(pt)
>> ['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']
>> >>> type(pt)
>> <class 'list'>
>>
>> This is the relevant part of the C API code:
>>
>> PyObject* str_sentence = PyObject_Str(pSentence);
>> // nltk.word_tokenize(sentence)
>> PyObject* pNltk_WTok = PyObject_GetAttrString(pModule_mstr, "word_tokenize");
>> PyObject* pWTok = PyObject_CallFunctionObjArgs(pNltk_WTok, str_sentence, 0);
>>
>> (where pModule_mstr is the nltk library).
>>
>> That should produce a list with a length of 7 that looks like it does on the 
>> command line version shown above:
>>
>> ['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']
>>
>> But instead the C API produces a list with a length of 24, and the REPR 
>> looks like this:
>>
>> '[\'[\', "\'", \'[\', "\'", \',\', "\'Emma", "\'", \',\', "\'by", "\'", 
>> \',\', "\'Jane", "\'", \',\', "\'Austen", "\'", \',\', "\'1816", "\'", 
>> \',\', "\'", \']\', "\'", \']\']'
>>
>> I also tried this with PyObject_CallMethodObjArgs and PyObject_Call without 
>> success.
>>
>> Thanks for any help on this.
>>
> What is pSentence? Is it what you think it is?
> To me it looks like it's either the list:
>
>  ['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']
>
> or that list as a string:
>
>  "['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']"
>
> and that's what you're tokenising.

C API PyObject_CallFunctionObjArgs returns incorrect result

2022-03-06 Thread Jen Kris via Python-list

I am using the C API in Python 3.8 with the nltk library, and I have a problem 
with the return from a library call implemented with 
PyObject_CallFunctionObjArgs.  

This is the relevant Python code:

import nltk
from nltk.corpus import gutenberg
fileids = gutenberg.fileids()
sentences = gutenberg.sents(fileids[0])
sentence = sentences[0]
sentence = " ".join(sentence)
pt = nltk.word_tokenize(sentence)

I run this at the Python command prompt to show how it works:
>>> sentence = " ".join(sentence)
>>> pt = nltk.word_tokenize(sentence)
>>> print(pt)
['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']
>>> type(pt)
<class 'list'>

This is the relevant part of the C API code:

PyObject* str_sentence = PyObject_Str(pSentence);  
// nltk.word_tokenize(sentence)  
PyObject* pNltk_WTok = PyObject_GetAttrString(pModule_mstr, "word_tokenize");
PyObject* pWTok = PyObject_CallFunctionObjArgs(pNltk_WTok, str_sentence, 0);

(where pModule_mstr is the nltk library). 

That should produce a list with a length of 7 that looks like it does on the 
command line version shown above:

['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']

But instead the C API produces a list with a length of 24, and the REPR looks 
like this:

'[\'[\', "\'", \'[\', "\'", \',\', "\'Emma", "\'", \',\', "\'by", "\'", \',\', 
"\'Jane", "\'", \',\', "\'Austen", "\'", \',\', "\'1816", "\'", \',\', "\'", 
\']\', "\'", \']\']'

I also tried this with PyObject_CallMethodObjArgs and PyObject_Call without 
success. 

Thanks for any help on this. 

Jen


Re: C API - How to return a Dictionary as a Dictionary type

2022-02-14 Thread Jen Kris via Python-list

Yes, that works.  This is my first day with C API dictionaries.  Now that 
you've explained it, it makes perfect sense.  Thanks much.  

Jen
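
For reference, a minimal Python sketch of the point Chris makes below: the dictionary is already a dictionary, and only its items view is list-like. The keys and values are the ones from this thread; the name items is illustrative.

```python
this_dict = {"key1": "data_01", "key2": "data_02"}

# Roughly what PyDict_Items gives back: a list of (key, value) tuples.
items = list(this_dict.items())
assert items == [("key1", "data_01"), ("key2", "data_02")]

# The object itself is still a dict, and its repr has the curly braces.
assert isinstance(this_dict, dict)
assert repr(this_dict) == "{'key1': 'data_01', 'key2': 'data_02'}"
```

So the dict object itself can be passed to a function that expects a dictionary, without any conversion.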


Feb 14, 2022, 17:24 by ros...@gmail.com:

> On Tue, 15 Feb 2022 at 12:07, Jen Kris via Python-list
>  wrote:
>
>>
>> I created a dictionary with the Python C API and assigned two keys and 
>> values:
>>
>> PyObject* this_dict = PyDict_New();
>> const char *key = "key1";
>> char *val = "data_01";
>> PyObject* val_p = PyUnicode_FromString(val);
>> int r = PyDict_SetItemString(this_dict, key, val_p);
>>
>> // Add another k-v pair
>> key = "key2";
>> val = "data_02";
>> val_p = PyUnicode_FromString(val);
>> r = PyDict_SetItemString(this_dict, key, val_p);
>>
>> I need to retrieve the entire dictionary to be passed to a library function 
>> that expects a dictionary.  I used  PyDict_Items:
>>
>> PyObject* pdi = PyDict_Items(this_dict);
>> PyObject* str_untagd = PyObject_Str(pdi);
>> PyObject* repr_utd = PyObject_Repr(str_untagd);
>> PyObject* str_utd = PyUnicode_AsEncodedString(repr_utd, "utf-8", "~E~");
>> const char *bytes_d = PyBytes_AS_STRING(str_utd);
>> printf("REPR_UnTag: %s\n", bytes_d);
>>
>> but as the docs say (https://docs.python.org/3.8/c-api/dict.html), that 
>> returns a PyListObject, not a dictionary enclosed with curly braces:
>>
>> "[('key1', 'data_01'), ('key2', 'data_02')]"
>>
>> My question is, how can I get the dictionary as a dictionary type, enclosed 
>> with curly braces?  I found PyObject_GenericGetDict 
>> (https://docs.python.org/3.8/c-api/object.html) but I haven't found any 
>> documentation or explanation of how it works.
>>
>> Is PyObject_GenericGetDict what I need, or is there another way to do it?
>>
>
> Not sure what you mean. The dict is already a dict. If you refer to
> this_dict, it is a dict, right?
>
> If you need the string representation of that, you should be able to
> call PyObject_Repr just as you are, but call it on the dict, not on
> the dict items.
>
> ChrisA

C API - How to return a Dictionary as a Dictionary type

2022-02-14 Thread Jen Kris via Python-list

I created a dictionary with the Python C API and assigned two keys and values:

PyObject* this_dict = PyDict_New();
const char *key = "key1";
const char *val = "data_01";
PyObject* val_p = PyUnicode_FromString(val);
int r = PyDict_SetItemString(this_dict, key, val_p);
Py_DECREF(val_p);  // PyDict_SetItemString does not steal the reference

// Add another k-v pair
key = "key2";
val = "data_02";
val_p = PyUnicode_FromString(val);
r = PyDict_SetItemString(this_dict, key, val_p);
Py_DECREF(val_p);

I need to retrieve the entire dictionary to be passed to a library function 
that expects a dictionary.  I used  PyDict_Items:

PyObject* pdi = PyDict_Items(this_dict);
PyObject* str_untagd = PyObject_Str(pdi);
PyObject* repr_utd = PyObject_Repr(str_untagd);
PyObject* str_utd = PyUnicode_AsEncodedString(repr_utd, "utf-8", "~E~");  
const char *bytes_d = PyBytes_AS_STRING(str_utd);
printf("REPR_UnTag: %s\n", bytes_d);

but as the docs say (https://docs.python.org/3.8/c-api/dict.html), that returns 
a PyListObject, not a dictionary enclosed with curly braces: 

"[('key1', 'data_01'), ('key2', 'data_02')]"

My question is, how can I get the dictionary as a dictionary type, enclosed 
with curly braces?  I found PyObject_GenericGetDict 
(https://docs.python.org/3.8/c-api/object.html) but I haven't found any 
documentation or explanation of how it works. 

Is PyObject_GenericGetDict what I need, or is there another way to do it?


Thanks,

Jen



Re: C API PyObject_Call segfaults with string

2022-02-10 Thread Jen Kris via Python-list

Thank you for that suggestion.  It allowed me to replace six lines of code with 
one.  :)
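
For reference, a minimal Python sketch of the round trip MRAB describes below: encoding a string to UTF-8 bytes and building a new string from those bytes gives back an equal string, so the original object can be passed directly. The names file_id, encoded and decoded are illustrative.

```python
file_id = "austen-emma.txt"

encoded = file_id.encode("utf-8", "strict")   # str -> bytes
decoded = encoded.decode("utf-8")             # bytes -> str again

assert isinstance(encoded, bytes)
assert decoded == file_id
```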


Feb 10, 2022, 12:43 by pyt...@mrabarnett.plus.com:

> On 2022-02-10 20:00, Jen Kris via Python-list wrote:
>
>> With the help of PyErr_Print() I have it solved.  Here is the final code 
>> (the part relevant to sents):
>>
>>     Py_ssize_t listIndex = 0;
>>     pListItem = PyList_GetItem(pFileIds, listIndex);
>>     pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
>>     pListStr = PyBytes_AS_STRING(pListStrE); // Borrowed pointer
>>
>>     // Then:  sentences = gutenberg.sents(fileid) - this is a sequence item
>>     PyObject *c_args = Py_BuildValue("s", pListStr);
>>     PyObject *args_tuple = PyTuple_New(1);
>>     PyTuple_SetItem(args_tuple, 0, c_args);
>>
>>     pSents = PyObject_CallObject(pSentMod, args_tuple);
>>
>>     if ( pSents == 0x0){
>>     PyErr_Print();
>>     return return_value; }
>>
>> As you mentioned yesterday, CallObject needs a tuple, so that was the 
>> problem.  Now it works.
>>
>> You also asked why I don't just use pListStrE.  I tried that and got a long 
>> error message from PyErr_Print.  I'm not far enough along in my C_API work 
>> to understand why, but it doesn't work.
>>
>> Thanks very much for your help on this.
>>
> You're encoding a Unicode string to a UTF-8 bytestring:
>
>  pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
>
> then pointing to the bytes of that UTF-8 bytestring:
>
>  pListStr = PyBytes_AS_STRING(pListStrE); // Borrowed pointer
>
> then making a Unicode string from those UTF-8 bytes:
>
>  PyObject *c_args = Py_BuildValue("s", pListStr);
>
> You might as well just use the original Unicode string!
>
> Try this instead:
>
>  Py_ssize_t listIndex = 0;
>  pListItem = PyList_GetItem(pFileIds, listIndex);
>  //> pListItem?
>
>  pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem, 0);
>  //> pSents+?
>
>  if (pSents == 0x0){
>  PyErr_Print();
>  return return_value;
>  }
>
>>
>>
>> Feb 9, 2022, 17:40 by songofaca...@gmail.com:
>>
>>> On Thu, Feb 10, 2022 at 10:37 AM Jen Kris  wrote:
>>>
>>>>
>>>> I'm using Python 3.8 so I tried your second choice:
>>>>
>>>> pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem);
>>>>
>>>> but pSents is 0x0.  pSentMod and pListItem are valid pointers.
>>>>
>>>
>>> It means an exception happened.
>>> If you are writing a Python/C function, return NULL (e.g. `if (pSents ==
>>> NULL) return NULL`).
>>> Then Python will show the exception and traceback for you.
>>>

Re: C API PyObject_Call segfaults with string

2022-02-10 Thread Jen Kris via Python-list

Hi and thanks very much for your comments on reference counting.  Since I'm new 
to the C_API that will help a lot.  I know that reference counting is one of 
the difficult issues with the C API.  

I just posted a reply to Inada Naoki showing how I solved the problem I posted 
yesterday.  

Thanks much for your help.

Jen


Feb 9, 2022, 18:43 by pyt...@mrabarnett.plus.com:

> On 2022-02-10 01:37, Jen Kris via Python-list wrote:
>
>> I'm using Python 3.8 so I tried your second choice:
>>
>> pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem);
>>
>> but pSents is 0x0.  pSentMod and pListItem are valid pointers.
>>
> 'PyObject_CallFunction' looks like a good one to use:
>
> """PyObject* PyObject_CallFunction(PyObject *callable, const char *format, ...)
>
> Call a callable Python object callable, with a variable number of C 
> arguments. The C arguments are described using a Py_BuildValue() style format 
> string. The format can be NULL, indicating that no arguments are provided.
> """
>
> [snip]
>
> What I do is add comments to keep track of what objects I have references to 
> at each point and whether they are new references or could be NULL.
>
> For example:
>
>  pName = PyUnicode_FromString("nltk.corpus");
>  //> pName+?
>
> This means that 'pName' contains a reference, '+' means that it's a new 
> reference, and '?' means that it could be NULL (usually due to an exception, 
> but not always) so I need to check it.
>
> Continuing in this vein:
>
>  pModule = PyImport_Import(pName);
>  //> pName+? pModule+?
>
>  pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
>  //> pName+? pModule+? pSubMod+?
>  pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
>  //> pName+? pModule+? pSubMod+? pFidMod+?
>  pSentMod = PyObject_GetAttrString(pSubMod, "sents");
>  //> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+?
>
>  pFileIds = PyObject_CallObject(pFidMod, 0);
>  //> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+? pFileIds+?
>  pListItem = PyList_GetItem(pFileIds, listIndex);
>  //> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+? pFileIds+? pListItem?
>  pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
>  //> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+? pFileIds+? pListItem? pListStrE+?
>
> As you can see, there's a lot of leaked references building up.
>
> Note how after:
>
>  pListItem = PyList_GetItem(pFileIds, listIndex);
>
> the addition is:
>
>  //> pListItem?
>
> This means that 'pListItem' contains a borrowed (not new) reference, but 
> could be NULL.
>
> I find it easiest to DECREF as soon as I no longer need the reference, and to 
> remove a name from the list as soon as I no longer need it (DECREFing where 
> necessary).
>
> For example:
>
>  pName = PyUnicode_FromString("nltk.corpus");
>  //> pName+?
>  if (!pName)
>  goto error;
>  //> pName+
>  pModule = PyImport_Import(pName);
>  //> pName+ pModule+?
>  Py_DECREF(pName);
>  //> pModule+?
>  if (!pModule)
>  goto error;
>  //> pModule+
>
> I find that doing this greatly reduces the chances of getting the reference 
> counting wrong, and I can remove the comments once I've finished the function 
> I'm writing.

Re: C API PyObject_Call segfaults with string

2022-02-10 Thread Jen Kris via Python-list

With the help of PyErr_Print() I have it solved.  Here is the final code (the 
part relevant to sents):

   Py_ssize_t listIndex = 0;
   pListItem = PyList_GetItem(pFileIds, listIndex);
   pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
   pListStr = PyBytes_AS_STRING(pListStrE); // Borrowed pointer

   // Then:  sentences = gutenberg.sents(fileid) - this is a sequence item
   PyObject *c_args = Py_BuildValue("s", pListStr);
   PyObject *args_tuple = PyTuple_New(1);
   PyTuple_SetItem(args_tuple, 0, c_args);

   pSents = PyObject_CallObject(pSentMod, args_tuple);

   if ( pSents == 0x0){
   PyErr_Print();
   return return_value; }

As you mentioned yesterday, CallObject needs a tuple, so that was the problem.  
Now it works.  

You also asked why I don't just use pListStrE.  I tried that and got a long 
error message from PyErr_Print.  I'm not far enough along in my C_API work to 
understand why, but it doesn't work.  

Thanks very much for your help on this.  

Jen
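
For reference, a minimal Python sketch of calling a function with an argument tuple, which is the shape PyObject_CallObject expects. The function sents here is a hypothetical stand-in, not the real gutenberg.sents, and the other names are illustrative.

```python
def sents(fileid):
    # Hypothetical stand-in for gutenberg.sents
    return "sentences of " + fileid

fileid = "austen-emma.txt"

not_a_tuple = (fileid)    # parentheses alone do not make a tuple
args_tuple = (fileid,)    # the trailing comma does

assert isinstance(not_a_tuple, str)
assert isinstance(args_tuple, tuple) and len(args_tuple) == 1

# Call with the argument tuple unpacked into positional arguments.
result = sents(*args_tuple)
assert result == "sentences of austen-emma.txt"
```

A single argument still has to be wrapped in a one-element tuple before the call.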


Feb 9, 2022, 17:40 by songofaca...@gmail.com:

> On Thu, Feb 10, 2022 at 10:37 AM Jen Kris  wrote:
>
>>
>> I'm using Python 3.8 so I tried your second choice:
>>
>> pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem);
>>
>> but pSents is 0x0.  pSentMod and pListItem are valid pointers.
>>
>
> It means an exception happened.
> If you are writing a Python/C function, return NULL (e.g. `if (pSents ==
> NULL) return NULL`).
> Then Python will show the exception and traceback for you.
>
> -- 
> Inada Naoki  
>


Re: C API PyObject_Call segfaults with string

2022-02-09 Thread Jen Kris via Python-list

I'll do that and post back tomorrow.  The office is closing and I have to leave 
now (I'm in Seattle).  Thanks again for your help.  


Feb 9, 2022, 17:40 by songofaca...@gmail.com:

> On Thu, Feb 10, 2022 at 10:37 AM Jen Kris  wrote:
>
>>
>> I'm using Python 3.8 so I tried your second choice:
>>
>> pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem);
>>
>> but pSents is 0x0.  pSentMod and pListItem are valid pointers.
>>
>
> It means an exception happened.
> If you are writing a Python/C function, return NULL (e.g. `if (pSents ==
> NULL) return NULL`).
> Then Python will show the exception and traceback for you.
>
> -- 
> Inada Naoki  
>


Re: C API PyObject_Call segfaults with string

2022-02-09 Thread Jen Kris via Python-list

I'm using Python 3.8 so I tried your second choice:

pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem);

but pSents is 0x0.  pSentMod and pListItem are valid pointers.  


Feb 9, 2022, 17:23 by songofaca...@gmail.com:

> // https://docs.python.org/3/c-api/call.html#c.PyObject_CallOneArg
> // This function is only for one arg. Python >= 3.9 is required.
> pSents = PyObject_CallOneArg(pSentMod, pListItem);
>
> Or
>
> // https://docs.python.org/3/c-api/call.html#c.PyObject_CallFunctionObjArgs
> // This function can call function with multiple arguments. Can be
> used with Python <3.9 too.
> pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem);
>
> On Thu, Feb 10, 2022 at 10:15 AM Jen Kris  wrote:
>
>>
>> Right you are.  In that case should I use Py_BuildValue and convert to tuple 
>> (because it won't return a tuple for a one-arg), or should I just convert 
>> pListStr to tuple?  Thanks for your help.
>>
>>
>> Feb 9, 2022, 17:08 by songofaca...@gmail.com:
>>
>> On Thu, Feb 10, 2022 at 10:05 AM Jen Kris  wrote:
>>
>>
>> Thanks for your reply.
>>
>> I eliminated the DECREF and now it doesn't segfault but it returns 0x0. Same 
>> when I substitute pListStrE for pListStr. pListStr contains the string 
>> representation of the fileid, so it seemed like the one to use. According to 
>> http://web.mit.edu/people/amliu/vrut/python/ext/buildValue.html, 
>> Py_BuildValue "builds a tuple only if its format string contains two or more 
>> format units" and that doc contains examples.
>>
>>
>> Yes, and PyObject_Call accepts a tuple, not a str.
>>
>>
>> https://docs.python.org/3/c-api/call.html#c.PyObject_Call
>>
>>
>> Feb 9, 2022, 16:52 by songofaca...@gmail.com:
>>
>> On Thu, Feb 10, 2022 at 9:42 AM Jen Kris via Python-list
>>  wrote:
>>
>>
>> I have everything finished down to the last line (sentences = 
>> gutenberg.sents(fileid)) where I use PyObject_Call to call gutenberg.sents, 
>> but it segfaults. The fileid is a string -- the first fileid in this corpus 
>> is "austen-emma.txt."
>>
>> pName = PyUnicode_FromString("nltk.corpus");
>> pModule = PyImport_Import(pName);
>>
>> pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
>> pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
>> pSentMod = PyObject_GetAttrString(pSubMod, "sents");
>>
>> pFileIds = PyObject_CallObject(pFidMod, 0);
>> pListItem = PyList_GetItem(pFileIds, listIndex);
>> pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
>> pListStr = PyBytes_AS_STRING(pListStrE);
>> Py_DECREF(pListStrE);
>>
>>
>> HERE.
>> PyBytes_AS_STRING() returns pointer in the pListStrE Object.
>> So Py_DECREF(pListStrE) makes pListStr a dangling pointer.
>>
>>
>> // sentences = gutenberg.sents(fileid)
>> PyObject *c_args = Py_BuildValue("s", pListStr);
>>
>>
>> Why do you encode pListStrE?
>> Why don't you use just pListStrE?
>>
>> PyObject *NullPtr = 0;
>> pSents = PyObject_Call(pSentMod, c_args, NullPtr);
>>
>>
>> c_args must be a tuple, but you passed a unicode object here.
>> Read https://docs.python.org/3/c-api/arg.html#c.Py_BuildValue
>>
>> The final line segfaults:
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x76e4e8d5 in _PyEval_EvalCodeWithName ()
>> from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0
>>
>> My guess is the problem is in Py_BuildValue, which returns a pointer but it 
>> may not be constructed correctly. I also tried it with "O" and it doesn't 
>> segfault but it returns 0x0.
>>
>> I'm new to using the C API. Thanks for any help.
>>
>> Jen
>>
>>
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>>
>>
>> Bests,
>>
>> --
>> Inada Naoki 
>>
>>
>>
>> --
>> Inada Naoki 
>>
>
>
> -- 
> Inada Naoki  
>

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: C API PyObject_Call segfaults with string

2022-02-09 Thread Jen Kris via Python-list

Right you are.  In that case should I use Py_BuildValue and convert to tuple 
(because it won't return a tuple for a one-arg), or should I just convert 
pListStr to tuple?  Thanks for your help.  


Feb 9, 2022, 17:08 by songofaca...@gmail.com:

> On Thu, Feb 10, 2022 at 10:05 AM Jen Kris  wrote:
>
>>
>> Thanks for your reply.
>>
>> I eliminated the DECREF and now it doesn't segfault but it returns 0x0.  
>> Same when I substitute pListStrE for pListStr.  pListStr contains the string 
>> representation of the fileid, so it seemed like the one to use.  According 
>> to  http://web.mit.edu/people/amliu/vrut/python/ext/buildValue.html, 
>> Py_BuildValue "builds a tuple only if its format string contains two or more 
>> format units" and that doc contains examples.
>>
>
> Yes, and PyObject_Call accepts a tuple, not a str.
>
>
> https://docs.python.org/3/c-api/call.html#c.PyObject_Call
>
>>
>> Feb 9, 2022, 16:52 by songofaca...@gmail.com:
>>
>> On Thu, Feb 10, 2022 at 9:42 AM Jen Kris via Python-list
>>  wrote:
>>
>>
>> I have everything finished down to the last line (sentences = 
>> gutenberg.sents(fileid)) where I use PyObject_Call to call gutenberg.sents, 
>> but it segfaults. The fileid is a string -- the first fileid in this corpus 
>> is "austen-emma.txt."
>>
>> pName = PyUnicode_FromString("nltk.corpus");
>> pModule = PyImport_Import(pName);
>>
>> pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
>> pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
>> pSentMod = PyObject_GetAttrString(pSubMod, "sents");
>>
>> pFileIds = PyObject_CallObject(pFidMod, 0);
>> pListItem = PyList_GetItem(pFileIds, listIndex);
>> pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
>> pListStr = PyBytes_AS_STRING(pListStrE);
>> Py_DECREF(pListStrE);
>>
>>
>> HERE.
>> PyBytes_AS_STRING() returns pointer in the pListStrE Object.
>> So Py_DECREF(pListStrE) makes pListStr a dangling pointer.
>>
>>
>> // sentences = gutenberg.sents(fileid)
>> PyObject *c_args = Py_BuildValue("s", pListStr);
>>
>>
>> Why do you encode pListStrE?
>> Why don't you use just pListStrE?
>>
>> PyObject *NullPtr = 0;
>> pSents = PyObject_Call(pSentMod, c_args, NullPtr);
>>
>>
>> c_args must be a tuple, but you passed a unicode object here.
>> Read https://docs.python.org/3/c-api/arg.html#c.Py_BuildValue
>>
>> The final line segfaults:
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x76e4e8d5 in _PyEval_EvalCodeWithName ()
>> from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0
>>
>> My guess is the problem is in Py_BuildValue, which returns a pointer but it 
>> may not be constructed correctly. I also tried it with "O" and it doesn't 
>> segfault but it returns 0x0.
>>
>> I'm new to using the C API. Thanks for any help.
>>
>> Jen
>>
>>
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>>
>>
>> Bests,
>>
>> --
>> Inada Naoki 
>>
>
>
> -- 
> Inada Naoki  
>


Re: C API PyObject_Call segfaults with string

2022-02-09 Thread Jen Kris via Python-list

Thanks for your reply.  

I eliminated the DECREF and now it doesn't segfault but it returns 0x0.  Same 
when I substitute pListStrE for pListStr.  pListStr contains the string 
representation of the fileid, so it seemed like the one to use.  According to  
http://web.mit.edu/people/amliu/vrut/python/ext/buildValue.html, Py_BuildValue 
"builds a tuple only if its format string contains two or more format units" 
and that doc contains examples. 
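
For reference, the tuple requirement can be seen at the Python level too.  This 
is only a sketch: sents below is a hypothetical stand-in for gutenberg.sents, 
not nltk itself.  On the C side, as far as I understand the Py_BuildValue docs, 
a parenthesized format such as "(s)" is what forces a tuple of size one. 

```python
def sents(fileid):
    """Hypothetical stand-in for gutenberg.sents: just echoes its argument."""
    return f"sentences of {fileid}"

fileid = "austen-emma.txt"

not_a_tuple = (fileid)      # parentheses alone: still a plain str
one_tuple = (fileid,)       # trailing comma: a tuple of length 1

# Passing an argument tuple is what PyObject_Call(func, args, NULL) expects.
result = sents(*one_tuple)
print(type(not_a_tuple).__name__, type(one_tuple).__name__, result)
```

The one-argument case is where the difference shows up, since a format with a 
single unit gives back the bare object rather than a tuple. 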


Feb 9, 2022, 16:52 by songofaca...@gmail.com:

> On Thu, Feb 10, 2022 at 9:42 AM Jen Kris via Python-list
>  wrote:
>
>>
>> I have everything finished down to the last line (sentences = 
>> gutenberg.sents(fileid)) where I use  PyObject_Call to call gutenberg.sents, 
>> but it segfaults.  The fileid is a string -- the first fileid in this corpus 
>> is "austen-emma.txt."
>>
>> pName = PyUnicode_FromString("nltk.corpus");
>> pModule = PyImport_Import(pName);
>>
>> pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
>> pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
>> pSentMod = PyObject_GetAttrString(pSubMod, "sents");
>>
>> pFileIds = PyObject_CallObject(pFidMod, 0);
>> pListItem = PyList_GetItem(pFileIds, listIndex);
>> pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
>> pListStr = PyBytes_AS_STRING(pListStrE);
>> Py_DECREF(pListStrE);
>>
>
> HERE.
> PyBytes_AS_STRING() returns pointer in the pListStrE Object.
> So Py_DECREF(pListStrE) makes pListStr a dangling pointer.
>
>>
>> // sentences = gutenberg.sents(fileid)
>> PyObject *c_args = Py_BuildValue("s", pListStr);
>>
>
> Why do you encode pListStrE?
> Why don't you use just pListStrE?
>
>> PyObject *NullPtr = 0;
>> pSents = PyObject_Call(pSentMod, c_args, NullPtr);
>>
>
> c_args must be a tuple, but you passed a unicode object here.
> Read https://docs.python.org/3/c-api/arg.html#c.Py_BuildValue
>
>
>> The final line segfaults:
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x76e4e8d5 in _PyEval_EvalCodeWithName ()
>>  from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0
>>
>> My guess is the problem is in Py_BuildValue, which returns a pointer but it 
>> may not be constructed correctly.  I also tried it with "O" and it doesn't 
>> segfault but it returns 0x0.
>>
>> I'm new to using the C API.  Thanks for any help.
>>
>> Jen
>>
>>
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>>
>
> Bests,
>
> -- 
> Inada Naoki  
>


C API PyObject_Call segfaults with string

2022-02-09 Thread Jen Kris via Python-list

This is a follow-on to a question I asked yesterday, which was answered by 
MRAB.   I'm using the Python C API to load the Gutenberg corpus from the nltk 
library and iterate through the sentences.  The Python code I am trying to 
replicate is:

from nltk.corpus import gutenberg
for i, fileid in enumerate(gutenberg.fileids()):
    sentences = gutenberg.sents(fileid)
    etc

I have everything finished down to the last line (sentences = 
gutenberg.sents(fileid)) where I use  PyObject_Call to call gutenberg.sents, 
but it segfaults.  The fileid is a string -- the first fileid in this corpus is 
"austen-emma.txt."  

pName = PyUnicode_FromString("nltk.corpus");
pModule = PyImport_Import(pName);

pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
pSentMod = PyObject_GetAttrString(pSubMod, "sents");

pFileIds = PyObject_CallObject(pFidMod, 0);
pListItem = PyList_GetItem(pFileIds, listIndex);
pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
pListStr = PyBytes_AS_STRING(pListStrE);
Py_DECREF(pListStrE);

// sentences = gutenberg.sents(fileid)
PyObject *c_args = Py_BuildValue("s", pListStr);  
PyObject *NullPtr = 0;
pSents = PyObject_Call(pSentMod, c_args, NullPtr);

The final line segfaults:
Program received signal SIGSEGV, Segmentation fault.
0x76e4e8d5 in _PyEval_EvalCodeWithName ()
   from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0

My guess is the problem is in Py_BuildValue, which returns a pointer but it may 
not be constructed correctly.  I also tried it with "O" and it doesn't segfault 
but it returns 0x0. 

I'm new to using the C API.  Thanks for any help. 

Jen



Re: Can't get iterator in the C API

2022-02-09 Thread Jen Kris via Python-list

Thank you for clarifying that.  Now on to getting the iterator from the method. 
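
A small Python sketch of the distinction, using a hypothetical Corpus class in 
place of nltk's gutenberg reader (the class and its file list are assumptions 
for illustration).  In the C API the equivalent would presumably be to call 
pFidMod first, e.g. with PyObject_CallObject(pFidMod, 0), and hand the result 
to PyObject_GetIter. 

```python
class Corpus:
    """Hypothetical stand-in for nltk's gutenberg reader."""

    def fileids(self):
        return ["austen-emma.txt", "austen-persuasion.txt"]

gutenberg = Corpus()

try:
    iter(gutenberg.fileids)          # the bound method itself is not iterable
    method_is_iterable = True
except TypeError:
    method_is_iterable = False

# Call the method first, then iterate over what it returns.
file_ids = list(iter(gutenberg.fileids()))
print(method_is_iterable, file_ids)
```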
 

Jen


Feb 8, 2022, 18:10 by pyt...@mrabarnett.plus.com:

> On 2022-02-09 01:12, Jen Kris via Python-list wrote:
>
>> I am using the Python C API to load the Gutenberg corpus from the nltk 
>> library and iterate through the sentences.  The Python code I am trying to 
>> replicate is:
>>
>> from nltk.corpus import gutenberg
>> for i, fileid in enumerate(gutenberg.fileids()):
>>      sentences = gutenberg.sents(fileid)
>>      etc
>>
>> where gutenberg.fileids is, of course, iterable.
>>
>> I use the following C API code to import the module and get pointers:
>>
>> int64_t Call_PyModule()
>> {
>>      PyObject *pModule, *pName, *pSubMod, *pFidMod, *pFidSeqIter,*pSentMod;
>>
>>      pName = PyUnicode_FromString("nltk.corpus");
>>      pModule = PyImport_Import(pName);
>>
>>      if (pModule == 0x0){
>>      PyErr_Print();
>>      return 1; }
>>
>>      pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
>>      pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
>>      pSentMod = PyObject_GetAttrString(pSubMod, "sents");
>>
>>      pFidIter = PyObject_GetIter(pFidMod);
>>      int ckseq_ok = PySeqIter_Check(pFidMod);
>>      pFidSeqIter  = PySeqIter_New(pFidMod);
>>
>>      return 0;
>> }
>>
>> pSubMod, pFidMod and pSentMod all return valid pointers, but the iterator 
>> lines return zero:
>>
>> pFidIter = PyObject_GetIter(pFidMod);
>> int ckseq_ok = PySeqIter_Check(pFidMod);
>> pFidSeqIter  = PySeqIter_New(pFidMod);
>>
>> So the C API thinks gutenberg.fileids is not iterable, but it is.  What am I 
>> doing wrong?
>>
> Look at your Python code. You have "gutenberg.fileids()", so the 'fileids' 
> attribute is not an iterable itself, but a method that you need to call to 
> get the iterable.
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>


Can't get iterator in the C API

2022-02-08 Thread Jen Kris via Python-list

I am using the Python C API to load the Gutenberg corpus from the nltk library 
and iterate through the sentences.  The Python code I am trying to replicate is:

from nltk.corpus import gutenberg
for i, fileid in enumerate(gutenberg.fileids()):
    sentences = gutenberg.sents(fileid)
    etc

where gutenberg.fileids is, of course, iterable. 

I use the following C API code to import the module and get pointers:

int64_t Call_PyModule()
{
    PyObject *pModule, *pName, *pSubMod, *pFidMod, *pFidSeqIter,*pSentMod;

    pName = PyUnicode_FromString("nltk.corpus");
    pModule = PyImport_Import(pName);

    if (pModule == 0x0){
    PyErr_Print();
    return 1; }

    pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
    pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
    pSentMod = PyObject_GetAttrString(pSubMod, "sents");

    pFidIter = PyObject_GetIter(pFidMod);
    int ckseq_ok = PySeqIter_Check(pFidMod);
    pFidSeqIter  = PySeqIter_New(pFidMod);

    return 0;
}

pSubMod, pFidMod and pSentMod all return valid pointers, but the iterator lines 
return zero: 

pFidIter = PyObject_GetIter(pFidMod);
int ckseq_ok = PySeqIter_Check(pFidMod);
pFidSeqIter  = PySeqIter_New(pFidMod);

So the C API thinks gutenberg.fileids is not iterable, but it is.  What am I 
doing wrong?



Re: Data unchanged when passing data to Python in multiprocessing shared memory

2022-02-02 Thread Jen Kris via Python-list

An ASCII string will not work.  If you convert 32894 to an ascii string you 
will have five bytes, but you need four.  In my original post I showed the C 
program I used to convert any 32-bit number to 4 bytes.  
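
To illustrate the size difference, here is a minimal sketch comparing the text 
form with a fixed-width binary form.  It uses struct.pack with the ">I" format 
(4-byte unsigned, big endian) as one possible way to get four bytes on the 
Python side; that choice is mine and not something from the C program. 

```python
import struct

value = 32894

as_text = str(value).encode("ascii")   # one byte per decimal digit
as_binary = struct.pack(">I", value)   # 4-byte unsigned int, big endian

print(len(as_text), len(as_binary))
print(struct.unpack(">I", as_binary)[0])
```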


Feb 2, 2022, 10:16 by python-list@python.org:

> I applaud trying to find the right solution but wonder if a more trivial 
> solution is even being considered. It ignores big and little endians and just 
> converts your data into another form and back.
>
> If all you want to do is send an integer that fits in 32 bits or 64 bits, why 
> not convert it to a character string in a form that both machines will see 
> the same way and when read back, convert it back to an integer? 
>
> As long as both sides see the same string, this can be done in reasonable time 
> and portably.
>
> Or am I missing something? Is "1234" not necessarily seen in the same order, 
> or "1.234e3" or whatever?
>
> Obviously, if the mechanism is heavily used and multiple sides keep reading 
> and even writing the same memory location, this is not ideal. But having 
> different incompatible processors looking at the same memory is also not.
>
> -Original Message-
> From: Dennis Lee Bieber 
> To: python-list@python.org
> Sent: Wed, Feb 2, 2022 12:30 am
> Subject: Re: Data unchanged when passing data to Python in multiprocessing 
> shared memory
>
>
> On Wed, 2 Feb 2022 00:40:22 +0100 (CET), Jen Kris 
> declaimed the following:
>
>> breakup = int.from_bytes(byte_val, "big")
>> print("this is breakup " + str(breakup))
>>
>> Python prints:  this is breakup 32894
>>
>> Note that I had to switch from little endian to big endian.  Python is 
>> little endian by default, but in this case it's big endian.
>
> Look at the struct module. I'm pretty certain it has flags for big or
> little end, or system native (that, or run your integers through the
> various "network byte order" functions that I think C and Python both
> support).
>
> https://www.gta.ufrj.br/ensino/eel878/sockets/htonsman.html
>
>> However, if anyone on this list knows how to pass data from a non-Python 
>> language to Python in multiprocessing.shared_memory please let me (and the 
>> list) know.
>
> MMU cache lines not writing through to RAM? Can't find
> anything on Google to force a cache flush. Can you test on a
> different OS? (Windows vs Linux)
>
> -- 
> Wulfraed                 Dennis Lee Bieber         AF6VN
> wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>


Re: Data unchanged when passing data to Python in multiprocessing shared memory

2022-02-02 Thread Jen Kris via Python-list

It's not clear to me from the struct module whether it can actually auto-detect 
endianness.  I think it must be specified, just as I had to do with 
int.from_bytes().  In my case endianness was dictated by how the four bytes 
were populated, starting with the zero bytes on the left.  
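
A short sketch of how the struct format prefixes behave, as far as I can tell 
from the module documentation: ">" and "<" fix the byte order explicitly, while 
"=" means the native order of the machine running the code.  None of them 
inspects the data to guess how it was written. 

```python
import struct
import sys

value = 32894

big = struct.pack(">I", value)      # explicit big endian
little = struct.pack("<I", value)   # explicit little endian
native = struct.pack("=I", value)   # native byte order, standard size

print(big.hex(), little.hex(), native.hex(), sys.byteorder)

# "=" follows the machine the code runs on; it does not look at the data.
expected_native = big if sys.byteorder == "big" else little
```

So the reader still has to know which order the writer used, the same as with 
int.from_bytes(). 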


Feb 1, 2022, 21:30 by wlfr...@ix.netcom.com:

> On Wed, 2 Feb 2022 00:40:22 +0100 (CET), Jen Kris 
> declaimed the following:
>
>> breakup = int.from_bytes(byte_val, "big")
>> print("this is breakup " + str(breakup))
>>
>> Python prints:  this is breakup 32894
>>
>> Note that I had to switch from little endian to big endian.  Python is 
>> little endian by default, but in this case it's big endian.
>
> Look at the struct module. I'm pretty certain it has flags for big or
> little end, or system native (that, or run your integers through the
> various "network byte order" functions that I think C and Python both
> support).
>
> https://www.gta.ufrj.br/ensino/eel878/sockets/htonsman.html
>
>> However, if anyone on this list knows how to pass data from a non-Python 
>> language to Python in multiprocessing.shared_memory please let me (and the 
>> list) know.
>
> MMU cache lines not writing through to RAM? Can't find
> anything on Google to force a cache flush. Can you test on a
> different OS? (Windows vs Linux)
>
> -- 
> Wulfraed                 Dennis Lee Bieber         AF6VN
> wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>


Re: Data unchanged when passing data to Python in multiprocessing shared memory

2022-02-01 Thread Jen Kris via Python-list

Barry, thanks for your reply.  

On the theory that it is not yet possible to pass data from a non-Python 
language to Python with multiprocessing.shared_memory, I bypassed the problem 
by attaching 4 bytes to my FIFO pipe message from NASM to Python:

byte_val = v[10:14]

where v is the message read from the FIFO.  Then:

breakup = int.from_bytes(byte_val, "big")
print("this is breakup " + str(breakup))

Python prints:  this is breakup 32894

Note that I had to switch from little endian to big endian.  Python is little 
endian by default, but in this case it's big endian.  

However, if anyone on this list knows how to pass data from a non-Python 
language to Python in multiprocessing.shared_memory please let me (and the 
list) know.  

Thanks.  


Feb 1, 2022, 14:20 by ba...@barrys-emacs.org:

>
>
>> On 1 Feb 2022, at 20:26, Jen Kris via Python-list  
>> wrote:
>>
>> I am using multiprocessing.shared_memory to pass data between NASM and 
>> Python.  The shared memory is created in NASM before Python is called.  
>> Python connects to the shm:  shm_00 = 
>> shared_memory.SharedMemory(name='shm_object_00',create=False). 
>>
>> I have used shared memory at other points in this project to pass text data 
>> from Python back to NASM with no problems.  But now this time I need to pass 
>> a 32-bit integer (specifically 32,894) from NASM to Python. 
>>
>> First I convert the integer to bytes in a C program linked into NASM:
>>
>>  unsigned char bytes[4];
>>  unsigned long int_to_convert = 32894;
>>
>>  bytes[0] = (int_to_convert >> 24) & 0xFF;
>>  bytes[1] = (int_to_convert >> 16) & 0xFF;
>>  bytes[2] = (int_to_convert >> 8) & 0xFF;
>>  bytes[3] = int_to_convert & 0xFF;
>>  memcpy(outbuf, bytes, 4);
>>
>> where outbuf is a pointer to the shared memory.  On return from C to NASM, I 
>> verify that the first four bytes of the shared memory contain what I want, 
>> and they are 0, 0, -128, 126, which is binary 00000000 00000000 10000000 
>> 01111110, and that's correct (32,894). 
>>
>> Next I send a message to Python through a FIFO to read the data from shared 
>> memory.  Python uses the following code to read the first four bytes of the 
>> shared memory:
>>
>>  byte_val = shm_00.buf[:4]
>>  print(shm_00.buf[0])
>>  print(shm_00.buf[1])
>>  print(shm_00.buf[2])
>>  print(shm_00.buf[3])
>>
>> But the bytes show as 40 39 96 96, which is exactly what the first four 
>> bytes of this shared memory contained before I called C to overwrite them 
>> with the bytes 0, 0, -128, 126.  So Python does not see the updated bytes, 
>> and naturally int.from_bytes(byte_val, "little") does not return the result 
>> I want. 
>>
>> I know that Python refers to shm_00.buf, using the buffer protocol.  Is that 
>> the reason that Python can't see the data that has been updated by another 
>> language? 
>>
>> So my question is, how can I alter the data in shared memory in a non-Python 
>> language to pass back to Python?
>>
>
> Maybe you need to use a memory barrier to force the data to be seen by 
> another cpu?
> Maybe use shm lock operation to sync both sides?
> Googling I see people talking about using stdatomic.h for this.
>
> But I am far from clear what you would need to do.
>
> Barry
>
>>
>> Thanks,
>>
>> Jen
>>
>> -- 
>> https://mail.python.org/mailman/listinfo/python-list
>>


Data unchanged when passing data to Python in multiprocessing shared memory

2022-02-01 Thread Jen Kris via Python-list

I am using multiprocessing.shared_memory to pass data between NASM and Python. 
 The shared memory is created in NASM before Python is called.  Python connects 
to the shm:  shm_00 = 
shared_memory.SharedMemory(name='shm_object_00',create=False).  

I have used shared memory at other points in this project to pass text data 
from Python back to NASM with no problems.  But now this time I need to pass a 
32-bit integer (specifically 32,894) from NASM to Python. 

First I convert the integer to bytes in a C program linked into NASM:

    unsigned char bytes[4];
    unsigned long int_to_convert = 32894;

    bytes[0] = (int_to_convert >> 24) & 0xFF;
    bytes[1] = (int_to_convert >> 16) & 0xFF;
    bytes[2] = (int_to_convert >> 8) & 0xFF;
    bytes[3] = int_to_convert & 0xFF;
    memcpy(outbuf, bytes, 4);

where outbuf is a pointer to the shared memory.  On return from C to NASM, I 
verify that the first four bytes of the shared memory contain what I want, and 
they are 0, 0, -128, 126, which is binary 00000000 00000000 10000000 01111110, 
and that's correct (32,894). 
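
For anyone checking the arithmetic, the same shifts and masks can be mirrored 
in Python.  This is a sketch of the byte layout only, not the C program; the 
variable names are made up here.  It also shows why the third byte appears as 
-128 when viewed as a signed 8-bit value. 

```python
value = 32894

# Same shifts and masks as the C code: most significant byte first.
unsigned_bytes = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]

# The same bytes viewed as signed 8-bit values (0x80 shows up as -128).
signed_bytes = [b - 256 if b > 127 else b for b in unsigned_bytes]

bits = " ".join(f"{b:08b}" for b in unsigned_bytes)
rebuilt = int.from_bytes(bytes(unsigned_bytes), "big")

print(unsigned_bytes, signed_bytes, bits, rebuilt)
```

Because the most significant byte is stored first, the value only comes back 
correctly when the bytes are read as big endian. 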

Next I send a message to Python through a FIFO to read the data from shared 
memory.  Python uses the following code to read the first four bytes of the 
shared memory:

    byte_val = shm_00.buf[:4]
    print(shm_00.buf[0])
    print(shm_00.buf[1])
    print(shm_00.buf[2])
    print(shm_00.buf[3])

But the bytes show as 40 39 96 96, which is exactly what the first four bytes 
of this shared memory contained before I called C to overwrite them with the 
bytes 0, 0, -128, 126.  So Python does not see the updated bytes, and naturally 
int.from_bytes(byte_val, "little") does not return the result I want. 

I know that Python refers to shm_00.buf, using the buffer protocol.  Is that the 
reason that Python can't see the data that has been updated by another 
language? 

So my question is, how can I alter the data in shared memory in a non-Python 
language to pass back to Python? 
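
As a point of comparison, here is a Python-only sketch with two handles on the 
same block.  It does not reproduce the NASM/C setup: both handles live in one 
Python process and the block name is generated by Python, which are assumptions 
made so the example is self-contained.  It only suggests that shm.buf behaves 
as a live view of the memory rather than a snapshot. 

```python
from multiprocessing import shared_memory

# One handle creates the block; in the real setup this happens outside Python.
writer = shared_memory.SharedMemory(create=True, size=16)
# A second handle attaches by name, as the Python side does with create=False.
reader = shared_memory.SharedMemory(name=writer.name, create=False)

try:
    before = bytes(reader.buf[:4])
    writer.buf[:4] = (32894).to_bytes(4, "big")   # overwrite through handle 1
    after = bytes(reader.buf[:4])                 # read through handle 2
    value = int.from_bytes(after, "big")
    print(before, after, value)
finally:
    reader.close()
    writer.close()
    writer.unlink()
```

If that holds, stale bytes on the Python side may point to something other than 
the buffer protocol, for example the two sides not sharing the same object name 
or the read happening before the write. 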

Thanks,

Jen


Re: Python child process in while True loop blocks parent

2021-12-08 Thread Jen Kris via Python-list

I started this post on November 29, and there have been helpful comments since 
then from Barry Scott, Cameron Simpson, Peter Holzer and Chris Angelico.  
Thanks to all of you.  

I've found a solution that works for my purpose, and I said earlier that I 
would post the solution I found. If anyone has a better solution I would 
appreciate any feedback. 

To recap, I'm using a pair of named pipes for IPC between C and Python.  Python 
runs as a child process after fork-execv.  The Python program continues to run 
concurrently in a while True loop, and responds to requests from C at 
intervals, and continues to run until it receives a signal from C to exit.  C 
sends signals to Python, then waits to receive data back from Python.  My 
problem was that C was blocked when Python started. 

The solution was twofold:  (1) for Python to run concurrently it must be a 
multiprocessing loop (from the multiprocessing module), and (2) Python must 
terminate its write strings with \n, or read will block in C waiting for 
something that never comes.  The multiprocessing module sidesteps the GIL; 
without multiprocessing the GIL will block all other threads once Python 
starts. 

Originally I used epoll() on the pipes.  Cameron Simpson and Barry Scott advised 
against epoll, and for this case they are right.  Blocking pipes work here, and 
epoll is too much overhead for watching on a single file descriptor. 

This is the Python code now:

#!/usr/bin/python3
from multiprocessing import Process
import os

print("Python is running")

child_pid = os.getpid()
print('child process id:', child_pid)

def f(a, b):

    print("Python now in function f")

    pr = os.open('/tmp/Pipe_01', os.O_RDONLY)
    print("File Descriptor1 Opened " + str(pr))
    pw = os.open('/tmp/Pipe_02', os.O_WRONLY)
    print("File Descriptor2 Opened " + str(pw))

    while True:

        v = os.read(pr,64)
        print("Python read from pipe pr")
        print(v)

        if v == b'99':
            os.close(pr)
            os.close(pw)
            print("Python is terminating")
            os._exit(os.EX_OK)

        # os.read returns bytes, so compare against a bytes literal
        if v != b"Send child PID":
            os.write(pw, b"OK message received\n")
            print("Python wrote back")

if __name__ == '__main__':
    a = 0
    b = 0
    p = Process(target=f, args=(a, b,))
    p.start()
    p.join()

The variables a and b are not currently used in the body, but they will be 
later. 

This is the part of the C code that communicates with Python:

    fifo_fd1 = open(fifo_path1, O_WRONLY);
    fifo_fd2 = open(fifo_path2, O_RDONLY);

    status_write = write(fifo_fd1, py_msg_01, sizeof(py_msg_01));
    if (status_write < 0) perror("write");

    status_read = read(fifo_fd2, fifo_readbuf, sizeof(py_msg_01));
    if (status_read < 0) perror("read");
    printf("C received message 1 from Python\n");
    printf("%.*s",(int)buf_len, fifo_readbuf);

    status_write = write(fifo_fd1, py_msg_02, sizeof(py_msg_02));
    if (status_write < 0) perror("write");

    status_read = read(fifo_fd2, fifo_readbuf, sizeof(py_msg_02));
    if (status_read < 0) perror("read");
    printf("C received message 2 from Python\n");
    printf("%.*s",(int)buf_len, fifo_readbuf);

    // Terminate Python multiprocessing
    printf("C is sending exit message to Python\n");
    status_write = write(fifo_fd1, py_msg_03, 2);

    printf("C is closing\n");
    close(fifo_fd1);
    close(fifo_fd2);

Screen output:

Python is running
child process id: 5353
Python now in function f
File Descriptor1 Opened 6
Thread created 0
File Descriptor2 Opened 7
Process ID: 5351
Parent Process ID: 5351
I am the parent
Core joined 0
I am the child
Python read from pipe pr
b'Hello to Python from C\x00\x00'
Python wrote back
C received message 1 from Python
OK message received
Python read from pipe pr
b'Message to Python 2\x00\x00'
Python wrote back
C received message 2 from Python
OK message received
C is sending exit message to Python
C is closing
Python read from pipe pr
b'99'
Python is terminating

Python runs on a separate thread (created with pthreads) because I want the 
flexibility of using this same basic code as a stand-alone .exe, or for a C 
extension from Python called with ctypes.  If I use it as a C extension then I 
want the Python code on a separate thread because I can't have two instances of 
the Python interpreter running on one thread, and one instance will already be 
running on the main thread, albeit "suspended" by the call from ctypes. 

So that's my solution:  (1) Python multiprocessing module; (2) Python strings 
written to the pipe must be terminated with \n. 
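
A sketch of newline framing from the reading side, written in Python for 
brevity.  An anonymous os.pipe() stands in for the two FIFOs, and read_line is 
a hypothetical helper, not part of my program.  It assumes only one message is 
in flight at a time, as in the request/reply exchange above. 

```python
import os

def read_line(fd, chunk_size=64):
    """Read from fd until a newline arrives; return the line without it."""
    buf = bytearray()
    while not buf.endswith(b"\n"):
        chunk = os.read(fd, chunk_size)
        if not chunk:            # writer closed the pipe
            break
        buf.extend(chunk)
    return bytes(buf).rstrip(b"\n")

read_fd, write_fd = os.pipe()     # anonymous pipe standing in for the FIFOs
os.write(write_fd, b"OK message ")
os.write(write_fd, b"received\n")
line = read_line(read_fd)
os.close(write_fd)
os.close(read_fd)
print(line)
```

The delimiter gives the reader an unambiguous end-of-message marker, so it does 
not depend on how the writes happen to be split. 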

Thanks again to all who commented. 



Dec 6, 2021, 13:33 by ba...@barrys-emacs.org:

>
>
>
>> On 6 Dec 2021, at 21:05, Jen Kris <jenk...@tutanota.com> wrote:
>>
>> Here is what I don't understand from what you said.  "The child process is 
>> created with a single thread—the one that called fork()."  To me that 
>> implies that the thread that called fork() is the same

Re: Python child process in while True loop blocks parent

2021-12-06 Thread Jen Kris via Python-list

Here is what I don't understand from what you said.  "The child process is 
created with a single thread—the one that called fork()."  To me that implies 
that the thread that called fork() is the same thread as the child process.  I 
guess you're talking about the distinction between logical threads and physical 
threads.  
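
One way to look at this from Python is to fork and have the child report its 
own process id, native thread id and thread count.  This is a sketch for a 
Unix-like system; thread_info_in_forked_child is a made-up name, and whether 
the thread id equals the process id is something I would only expect on Linux. 

```python
import os
import threading

def thread_info_in_forked_child():
    """Fork; the child reports (pid, native thread id, thread count)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report through the pipe, then exit immediately so it never
        # runs any of the parent's remaining code.
        try:
            os.close(read_fd)
            report = "{} {} {}".format(
                os.getpid(), threading.get_native_id(), threading.active_count()
            )
            os.write(write_fd, report.encode())
        finally:
            os._exit(0)
    # Parent: collect the child's report and reap the child.
    os.close(write_fd)
    data = os.read(read_fd, 128)
    os.close(read_fd)
    os.waitpid(pid, 0)
    child_pid, child_tid, n_threads = (int(x) for x in data.decode().split())
    return pid, child_pid, child_tid, n_threads

fork_pid, child_pid, child_tid, n_threads = thread_info_in_forked_child()
print(fork_pid, child_pid, child_tid, n_threads)
```

The child is a new process with its own id, and it starts out with one thread 
of its own. 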

But the main issue is your suggestion that I should call fork-execv from the 
thread that runs the main C program, not from a separate physical pthread.  
That would certainly eliminate the overhead of creating a new pthread. 

I am working now to finish this, and I will try your suggestion of calling 
fork-execv from the "main" thread.  When I reply back next I can give you a 
complete picture of what I'm doing. 

Your comments, and those of Peter Holzer and Chris Angelico, are most 
appreciated. 




Dec 6, 2021, 10:37 by ba...@barrys-emacs.org:

>
>
>> On 6 Dec 2021, at 17:09, Jen Kris via Python-list  
>> wrote:
>>
>> I can't find any support for your comment that "Fork creates a new
>> process and therefore also a new thread."  From the Linux man pages 
>> https://www.man7.org/linux/man-pages/man2/fork.2.html, "The child process is 
>> created with a single thread—the one that called fork()."
>>
>
> You just quoted the evidence!
>
> All new processes on unix (maybe all OSes) only ever have one thread when 
> they start.
> The thread-id of the first thread is the same as the process-id and referred 
> to as the main thread.
>
>>
>> I have a one-core one-thread instance at Digital Ocean available running 
>> Ubuntu 18.04.  I can fork and create a new process on it, but it doesn't 
>> create a new thread because it doesn't have one available.
>>
>
>
> By that logic it can only run one process...
>
> It has one hardware core that supports one hardware thread.
> Linux can create as many software threads as it likes.
>
>> You may also want to see "Forking vs Threading" 
>> (https://www.geekride.com/fork-forking-vs-threading-thread-linux-kernel), 
>> "Fork vs Thread" 
>> (https://medium.com/obscure-system/fork-vs-thread-38e09ec099e2), and "Linux 
>> process and thread" (https://zliu.org/post/linux-process-and-thread) ("This 
>> means that to create a normal process fork() is used that further calls 
>> clone() with appropriate arguments while to create a thread or LWP, a 
>> function from pthread library calls clone() with relevant flags. So, the main 
>> difference is generated by using different flags that can be passed to 
>> clone() function (to be exact, it is a system call)"). 
>>
>> You may be confused by the fact that threads are called light-weight 
>> processes.
>>
>
> No Peter and I are not confused.
>
>>
>> Or maybe I'm confused :)
>>
>
> Yes you are confused.
>
>>
>> If you have other information, please let me know.  Thanks.
>>
>
> Please get the book I recommended, or another that covers systems programming 
> on unix, and have a read.
>
> Barry
>
>>
>> Jen
>>
>>
>> Dec 5, 2021, 18:08 by hjp-pyt...@hjp.at:
>>
>>> On 2021-12-06 00:51:13 +0100, Jen Kris via Python-list wrote:
>>>
>>>> The C program creates two threads (using pthreads), one for itself and
>>>> one for the child process.  On creation, the second pthread is pointed
>>>> to a C program that calls fork-execv to run the Python program.  That
>>>> way Python runs on a separate thread. 
>>>>
>>>
>>> I think you have the relationship between processes and threads
>>> backwards. A process consists of one or more threads. Fork creates a new
>>> process and therefore also a new thread.
>>>
>>> hp
>>>
>>> -- 
>>> _  | Peter J. Holzer| Story must make more sense than reality.
>>> |_|_) ||
>>> | |   | h...@hjp.at |-- Charles Stross, "Creative writing
>>> __/   | http://www.hjp.at/ |   challenge!"
>>>
>>
>> -- 
>> https://mail.python.org/mailman/listinfo/python-list
>>


Re: Python child process in while True loop blocks parent

2021-12-06 Thread Jen Kris via Python-list

I can't find any support for your comment that "Fork creates a new
process and therefore also a new thread."  From the Linux man pages 
https://www.man7.org/linux/man-pages/man2/fork.2.html, "The child process is 
created with a single thread—the one that called fork()." 

I have a one-core one-thread instance at Digital Ocean available running Ubuntu 
18.04.  I can fork and create a new process on it, but it doesn't create a new 
thread because it doesn't have one available. 

You may also want to see "Forking vs Threading" 
(https://www.geekride.com/fork-forking-vs-threading-thread-linux-kernel), "Fork 
vs Thread" (https://medium.com/obscure-system/fork-vs-thread-38e09ec099e2), and 
"Linux process and thread" (https://zliu.org/post/linux-process-and-thread) 
("This means that to create a normal process fork() is used that further calls 
clone() with appropriate arguments while to create a thread or LWP, a function 
from pthread library calls clone() with relevant flags. So, the main difference 
is generated by using different flags that can be passed to clone() function (to 
be exact, it is a system call)"). 

You may be confused by the fact that threads are called light-weight processes. 

Or maybe I'm confused :)

If you have other information, please let me know.  Thanks. 

Jen

Dec 5, 2021, 18:08 by hjp-pyt...@hjp.at:

> On 2021-12-06 00:51:13 +0100, Jen Kris via Python-list wrote:
>
>> The C program creates two threads (using pthreads), one for itself and
>> one for the child process.  On creation, the second pthread is pointed
>> to a C program that calls fork-execv to run the Python program.  That
>> way Python runs on a separate thread. 
>>
>
> I think you have the relationship between processes and threads
> backwards. A process consists of one or more threads. Fork creates a new
> process and therefore also a new thread.
>
>  hp
>
> -- 
>  _  | Peter J. Holzer| Story must make more sense than reality.
> |_|_) ||
> | |   | h...@hjp.at |-- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |   challenge!"
>


Re: Python child process in while True loop blocks parent

2021-12-05 Thread Jen Kris via Python-list

>>> Maybe "Advanced Programming in the UNIX Environment" would be helpful?
>>>
>>> https://www.amazon.co.uk/Programming-Environment-Addison-Wesley-Professional-Computing-dp-0321637739/dp/0321637739/ref=dp_ob_image_bk
>>>
>>> It's a great book and covers a wide range of Unix systems programming 
>>> topics.
>>>
>>> Have you created a small C program that just does the fork and exec of a 
>>> python program to test out your assumptions?
>>> If not I recommend that you do.
>>>
>>> Barry
>>>
>>>
>>>
>>>>
>>>>
>>>>
>>>> Nov 30, 2021, 11:42 by ba...@barrys-emacs.org:
>>>>
>>>>>
>>>>>
>>>>>
>>>>>> On 29 Nov 2021, at 22:31, Jen Kris <jenk...@tutanota.com> wrote:
>>>>>>
>>>>>> Thanks to you and Cameron for your replies.  The C side has an epoll_ctl 
>>>>>> set, but no event loop to handle it yet.  I'm putting that in now with a 
>>>>>> pipe write in Python-- as Cameron pointed out that is the likely source 
>>>>>> of blocking on C.  The pipes are opened as rdwr in Python because that's 
>>>>>> nonblocking by default.  The child will become more complex, but not in 
>>>>>> a way that affects polling.  And thanks for the tip about the c-string 
>>>>>> termination. 
>>>>>>
>>>>>>
>>>>>
>>>>> flags is a bit mask. You say it's BLOCKing by not setting os.O_NONBLOCK.
>>>>> You should not use O_RDWR when you only need O_RDONLY access or only 
>>>>> O_WRONLY access.
>>>>>
>>>>> You may find
>>>>>
>>>>> man 2 open
>>>>>
>>>>> useful to understand in detail what is behind os.open().
>>>>>
>>>>> Barry
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> Nov 29, 2021, 14:12 by ba...@barrys-emacs.org:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On 29 Nov 2021, at 20:36, Jen Kris via Python-list 
>>>>>>>> <python-list@python.org> wrote:
>>>>>>>>
>>>>>>>> I have a C program that forks to create a child process and uses 
>>>>>>>> execv to call a Python program.  The Python program communicates with 
>>>>>>>> the parent process (in C) through a FIFO pipe monitored with epoll(). 
>>>>>>>>
>>>>>>>> The Python child process is in a while True loop, which is intended to 
>>>>>>>> keep it running while the parent process proceeds, and perform 
>>>>>>>> functions for the C program only at intervals when the parent sends 
>>>>>>>> data to the child -- similar to a daemon process. 
>>>>>>>>
>>>>>>>> The C process writes to its end of the pipe and the child process 
>>>>>>>> reads it, but then the child process continues to loop, thereby 
>>>>>>>> blocking the parent. 
>>>>>>>>
>>>>>>>> This is the Python code:
>>>>>>>>
>>>>>>>> #!/usr/bin/python3
>>>>>>>> import os
>>>>>>>> import select
>>>>>>>>
>>>>>>>> #Open the named pipes
>>>>>>>> pr = os.open('/tmp/Pipe_01', os.O_RDWR)
>>>>>>>>
>>>>>>> Why open rdwr if you are only going to read the pipe?
>>>>>>>
>>>>>>>> pw = os.open('/tmp/Pipe_02', os.O_RDWR)
>>>>>>>>
>>>>>>> Only need to open for write.
>>>>>>>
>>>>>>>>
>>>>>>>> ep = select.epoll(-1)
>>>>>>>> ep.register(pr, select.EPOLLIN)
>>>>>>>>
>>>>>>>
>>>>>>> Is the only thing that the child does this:
>>>>>>> 1. Read message from pr
>>>>>>> 2. Process message
>>>>>>> 3. Write result to pw.
>>>>>>> 4. Loop from 1
>>>>>>>
>>>>>>> If so as Cameron said you do not need to worry about the poll.
>>>>>>> Do you plan for the child to become more complex?
>>>>>>>
>>>>>>>>
>>>>>>>> while True:
>>>>>>>>
>>>>>>>>     events = ep.poll(timeout=2.5, maxevents=-1)
>>>>>>>>     #events = ep.poll(timeout=None, maxevents=-1)
>>>>>>>>
>>>>>>>>     print("child is looping")
>>>>>>>>
>>>>>>>>     for fileno, event in events:
>>>>>>>>         print("Python fileno")
>>>>>>>>         print(fileno)
>>>>>>>>         print("Python event")
>>>>>>>>         print(event)
>>>>>>>>         v = os.read(pr,64)
>>>>>>>>         print("Pipe value")
>>>>>>>>         print(v)
>>>>>>>>
>>>>>>>> The child process correctly receives the signal from ep.poll and 
>>>>>>>> correctly reads the data in the pipe, but then it continues looping.  
>>>>>>>> For example, when I put in a timeout:
>>>>>>>>
>>>>>>>> child is looping
>>>>>>>> Python fileno
>>>>>>>> 4
>>>>>>>> Python event
>>>>>>>> 1
>>>>>>>> Pipe value
>>>>>>>> b'10\x00'
>>>>>>>>
>>>>>>> The C code does not need to write a 0 byte at the end.
>>>>>>> I assume the 0 is from the end of a C string.
>>>>>>> UDS messages have a length.
>>>>>>> In the C just write 2 bytes in this case.
>>>>>>>
>>>>>>> Barry
>>>>>>>
>>>>>>>> child is looping
>>>>>>>> child is looping
>>>>>>>>
>>>>>>>> That suggests that a while True loop is not the right thing to do in 
>>>>>>>> this case.  My question is, what type of process loop is best for this 
>>>>>>>> situation?  The multiprocessing, asyncio and subprocess libraries are 
>>>>>>>> very extensive, and it would help if someone could suggest the best 
>>>>>>>> alternative for what I am doing here. 
>>>>>>>>
>>>>>>>> Thanks very much for any ideas. 
>>>>>>>>
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> https://mail.python.org/mailman/listinfo/python-list
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>
>>


Re: Python child process in while True loop blocks parent

2021-12-05 Thread Jen Kris via Python-list

Thanks for your comments.  

I put the Python program on its own pthread, and call a small C program to 
fork-execv to call the Python program as a child process.  I revised the Python 
program to be a multiprocessing loop using the Python multiprocessing module.  
That bypasses the GIL and allows Python to run concurrently with C.  So far so 
good.  

Next I will use Linux pipes, not Python multiprocessing pipes, for IPC between 
Python and C.  Multiprocessing pipes are (as far as I can tell) only for communication 
between two Python processes.  I will have the parent thread send a signal 
through the pipe to the child process to exit when the parent thread is ready 
to exit, then call wait() to finalize the child process.  
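
A rough sketch of that shutdown sequence, written in Python for brevity although the real parent is C.  The inline child script, the b"quit" message and the exit status 7 are hypothetical placeholders, and an anonymous pipe stands in for whatever pipe the real programs share:

```python
import os
import sys

# Hypothetical child: loops on the pipe until it sees b"quit" or end-of-file.
CHILD_SOURCE = """
import os, sys
fd = int(sys.argv[1])
while True:
    msg = os.read(fd, 64)
    if not msg or b"quit" in msg:
        break
sys.exit(7)
"""


def run_child_until_quit():
    """Fork, exec a Python child, tell it to quit, and reap it."""
    read_fd, write_fd = os.pipe()
    os.set_inheritable(read_fd, True)  # keep the read end open across execv
    pid = os.fork()
    if pid == 0:
        try:
            os.close(write_fd)
            os.execv(sys.executable,
                     [sys.executable, "-c", CHILD_SOURCE, str(read_fd)])
        finally:
            os._exit(127)  # only reached if execv failed

    os.close(read_fd)
    # ... the parent would do its own work here while the child waits ...
    os.write(write_fd, b"quit")
    os.close(write_fd)
    _, status = os.waitpid(pid, 0)  # finalize the child so it is not a zombie
    return os.WEXITSTATUS(status)
```

The parent keeps running between fork() and the final write because the child is a separate process; nothing in the child holds a lock the parent needs.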

I will reply back when it's finished and post the code so you can see what I 
have done.  

Thanks again.  

Jen


Dec 4, 2021, 09:22 by ba...@barrys-emacs.org:

>
>
>> On 1 Dec 2021, at 16:01, Jen Kris <jenk...@tutanota.com> wrote:
>>
>> Thanks for your comment re blocking.  
>>
>> I removed pipes from the Python and C programs to see if it blocks without 
>> them, and it does.
>>
>> It looks now like the problem is not pipes.
>>
>
> Ok.
>
>
>> I use fork() and execv() in C to run Python in a child process, but the 
>> Python process blocks
>>
>
> Use strace on the parent process to see what is happening.
> You will need to use the option to follow subprocesses so that you can see 
> what goes on in the python process.
>
> See man strace and the --follow-forks and --output-separately options.
> That will allow you to find the blocking system call that your code is making.
>
>
>> because fork() does not create a new thread, so the Python global 
>> interpreter lock (GIL) prevents the C program from running once Python 
>> starts.
>>
>
> Not sure why you think this.
>
>
>>   So the solution appears to be run Python in a separate thread, which I can 
>> do with pthread create.
>>
>>   See "Thread State and the Global Interpreter Lock" 
>> https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock
>>   and the sections below that "Non-Python created threads" and "Cautions 
>> about fork()." 
>>
>
> I take it you mean that in the parent you think that using pthreads will 
> affect python after the exec() call?
> It does not. After exec() the process has one main thread created by the kernel 
> and a new address space as defined by the /usr/bin/python.
> The only state that is inherited from the parent are open file descriptors, 
> the current working directory and security state like UID, GID.
>
>
>> I'm working on that today and I hope all goes well :) 
>>
>
> You seem to be missing background information on how processes work.
> Maybe "Advanced Programming in the UNIX Environment" would be helpful?
>
> https://www.amazon.co.uk/Programming-Environment-Addison-Wesley-Professional-Computing-dp-0321637739/dp/0321637739/ref=dp_ob_image_bk
>
> It's a great book and covers a wide range of Unix systems programming topics.
>
> Have you created a small C program that just does the fork and exec of a 
> python program to test out your assumptions?
> If not I recommend that you do.
>
> Barry
>
>
>
>>
>>
>>
>> Nov 30, 2021, 11:42 by ba...@barrys-emacs.org:
>>
>>>
>>>
>>>
>>>> On 29 Nov 2021, at 22:31, Jen Kris <jenk...@tutanota.com> wrote:
>>>>
>>>> Thanks to you and Cameron for your replies.  The C side has an epoll_ctl 
>>>> set, but no event loop to handle it yet.  I'm putting that in now with a 
>>>> pipe write in Python-- as Cameron pointed out that is the likely source of 
>>>> blocking on C.  The pipes are opened as rdwr in Python because that's 
>>>> nonblocking by default.  The child will become more complex, but not in a 
>>>> way that affects polling.  And thanks for the tip about the c-string 
>>>> termination. 
>>>>
>>>>
>>>
>>> flags is a bit mask. You say it's BLOCKing by not setting os.O_NONBLOCK.
>>> You should not use O_RDWR when you only need O_RDONLY access or only 
>>> O_WRONLY access.
>>>
>>> You may find
>>>
>>> man 2 open
>>>
>>> useful to understand in detail what is behind os.open().
>>>
>>> Barry
>>>
>>>
>>>
>>>
>>>>
>>>>
>>>> Nov 29, 2021, 14:12 b

Re: Python child process in while True loop blocks parent

2021-12-01 Thread Jen Kris via Python-list

Thanks for your comment re blocking.  

I removed pipes from the Python and C programs to see if it blocks without 
them, and it does.  It looks now like the problem is not pipes.  I use fork() 
and execv() in C to run Python in a child process, but the Python process 
blocks because fork() does not create a new thread, so the Python global 
interpreter lock (GIL) prevents the C program from running once Python starts.  
So the solution appears to be run Python in a separate thread, which I can do 
with pthread create.  See "Thread State and the Global Interpreter Lock" 
https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock
 and the sections below that "Non-Python created threads" and "Cautions about 
fork()." 

I'm working on that today and I hope all goes well :) 



Nov 30, 2021, 11:42 by ba...@barrys-emacs.org:

>
>
>
>> On 29 Nov 2021, at 22:31, Jen Kris <jenk...@tutanota.com> wrote:
>>
>> Thanks to you and Cameron for your replies.  The C side has an epoll_ctl 
>> set, but no event loop to handle it yet.  I'm putting that in now with a 
>> pipe write in Python-- as Cameron pointed out that is the likely source of 
>> blocking on C.  The pipes are opened as rdwr in Python because that's 
>> nonblocking by default.  The child will become more complex, but not in a 
>> way that affects polling.  And thanks for the tip about the c-string 
>> termination. 
>>
>>
>
> flags is a bit mask. You say it's BLOCKing by not setting os.O_NONBLOCK.
> You should not use O_RDWR when you only need O_RDONLY access or only O_WRONLY 
> access.
>
> You may find
>
> man 2 open
>
> useful to understand in detail what is behind os.open().
>
> Barry
>
>
>
>
>>
>>
>> Nov 29, 2021, 14:12 by ba...@barrys-emacs.org:
>>
>>>
>>>
>>>> On 29 Nov 2021, at 20:36, Jen Kris via Python-list 
>>>> <python-list@python.org> wrote:
>>>>
>>>> I have a C program that forks to create a child process and uses execv to 
>>>> call a Python program.  The Python program communicates with the parent 
>>>> process (in C) through a FIFO pipe monitored with epoll(). 
>>>>
>>>> The Python child process is in a while True loop, which is intended to 
>>>> keep it running while the parent process proceeds, and perform functions 
>>>> for the C program only at intervals when the parent sends data to the 
>>>> child -- similar to a daemon process. 
>>>>
>>>> The C process writes to its end of the pipe and the child process reads 
>>>> it, but then the child process continues to loop, thereby blocking the 
>>>> parent. 
>>>>
>>>> This is the Python code:
>>>>
>>>> #!/usr/bin/python3
>>>> import os
>>>> import select
>>>>
>>>> #Open the named pipes
>>>> pr = os.open('/tmp/Pipe_01', os.O_RDWR)
>>>>
>>> Why open rdwr if you are only going to read the pipe?
>>>
>>>> pw = os.open('/tmp/Pipe_02', os.O_RDWR)
>>>>
>>> Only need to open for write.
>>>
>>>>
>>>> ep = select.epoll(-1)
>>>> ep.register(pr, select.EPOLLIN)
>>>>
>>>
>>> Is the only thing that the child does this:
>>> 1. Read message from pr
>>> 2. Process message
>>> 3. Write result to pw.
>>> 4. Loop from 1
>>>
>>> If so as Cameron said you do not need to worry about the poll.
>>> Do you plan for the child to become more complex?
>>>
>>>>
>>>> while True:
>>>>
>>>>     events = ep.poll(timeout=2.5, maxevents=-1)
>>>>     #events = ep.poll(timeout=None, maxevents=-1)
>>>>
>>>>     print("child is looping")
>>>>
>>>>     for fileno, event in events:
>>>>         print("Python fileno")
>>>>         print(fileno)
>>>>         print("Python event")
>>>>         print(event)
>>>>         v = os.read(pr,64)
>>>>         print("Pipe value")
>>>>         print(v)
>>>>
>>>> The child process correctly receives the signal from ep.poll and correctly 
>>>> reads the data in the pipe, but then it continues looping.  For example, 
>>>> when I put in a timeout:
>>>>
>>>> child is looping
>>>> Python fileno
>>>> 4
>>>> Python event
>>>> 1
>>>> Pipe value
>>>> b'10\x00'
>>>>
>>> The C code does not need to write a 0 byte at the end.
>>> I assume the 0 is from the end of a C string.
>>> UDS messages have a length.
>>> In the C just write 2 bytes in this case.
>>>
>>> Barry
>>>
>>>> child is looping
>>>> child is looping
>>>>
>>>> That suggests that a while True loop is not the right thing to do in this 
>>>> case.  My question is, what type of process loop is best for this 
>>>> situation?  The multiprocessing, asyncio and subprocess libraries are very 
>>>> extensive, and it would help if someone could suggest the best alternative 
>>>> for what I am doing here. 
>>>>
>>>> Thanks very much for any ideas. 
>>>>
>>>>
>>>> -- 
>>>> https://mail.python.org/mailman/listinfo/python-list
>>>>
>>
>>


Re: Python child process in while True loop blocks parent

2021-11-29 Thread Jen Kris via Python-list

Thanks to you and Cameron for your replies.  The C side has an epoll_ctl set, 
but no event loop to handle it yet.  I'm putting that in now with a pipe write 
in Python-- as Cameron pointed out that is the likely source of blocking on C.  
The pipes are opened as rdwr in Python because that's nonblocking by default.  
The child will become more complex, but not in a way that affects polling.  And 
thanks for the tip about the c-string termination. 
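
One way to check the blocking behaviour discussed here, sketched under the assumption of Linux (where opening a FIFO with O_RDWR returns without waiting for a peer; POSIX leaves that case undefined).  The temporary FIFO and the function name are made up for the example:

```python
import os
import tempfile


def fifo_blocking_modes():
    """Return (is_blocking, is_blocking) for an O_RDWR and an O_NONBLOCK open."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_fifo")
        os.mkfifo(path)
        # O_RDWR: open() returns at once on Linux, but reads will still block.
        rdwr = os.open(path, os.O_RDWR)
        # O_NONBLOCK is a separate bit that has to be OR-ed into the flags.
        nonblock = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        try:
            return os.get_blocking(rdwr), os.get_blocking(nonblock)
        finally:
            os.close(rdwr)
            os.close(nonblock)
```

The first descriptor stays in blocking mode, so os.read() on it sleeps until data arrives; only the second has O_NONBLOCK set.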



Nov 29, 2021, 14:12 by ba...@barrys-emacs.org:

>
>
>> On 29 Nov 2021, at 20:36, Jen Kris via Python-list  
>> wrote:
>>
>> I have a C program that forks to create a child process and uses execv to 
>> call a Python program.  The Python program communicates with the parent 
>> process (in C) through a FIFO pipe monitored with epoll(). 
>>
>> The Python child process is in a while True loop, which is intended to keep 
>> it running while the parent process proceeds, and perform functions for the 
>> C program only at intervals when the parent sends data to the child -- 
>> similar to a daemon process. 
>>
>> The C process writes to its end of the pipe and the child process reads it, 
>> but then the child process continues to loop, thereby blocking the parent. 
>>
>> This is the Python code:
>>
>> #!/usr/bin/python3
>> import os
>> import select
>>
>> #Open the named pipes
>> pr = os.open('/tmp/Pipe_01', os.O_RDWR)
>>
> Why open rdwr if you are only going to read the pipe?
>
>> pw = os.open('/tmp/Pipe_02', os.O_RDWR)
>>
> Only need to open for write.
>
>>
>> ep = select.epoll(-1)
>> ep.register(pr, select.EPOLLIN)
>>
>
> Is the only thing that the child does this:
> 1. Read message from pr
> 2. Process message
> 3. Write result to pw.
> 4. Loop from 1
>
> If so as Cameron said you do not need to worry about the poll.
> Do you plan for the child to become more complex?
>
>>
>> while True:
>>
>>     events = ep.poll(timeout=2.5, maxevents=-1)
>>     #events = ep.poll(timeout=None, maxevents=-1)
>>
>>     print("child is looping")
>>
>>     for fileno, event in events:
>>         print("Python fileno")
>>         print(fileno)
>>         print("Python event")
>>         print(event)
>>         v = os.read(pr,64)
>>         print("Pipe value")
>>         print(v)
>>
>> The child process correctly receives the signal from ep.poll and correctly 
>> reads the data in the pipe, but then it continues looping.  For example, 
>> when I put in a timeout:
>>
>> child is looping
>> Python fileno
>> 4
>> Python event
>> 1
>> Pipe value
>> b'10\x00'
>>
> The C code does not need to write a 0 byte at the end.
> I assume the 0 is from the end of a C string.
> UDS messages have a length.
> In the C just write 2 bytes in this case.
>
> Barry
>
>> child is looping
>> child is looping
>>
>> That suggests that a while True loop is not the right thing to do in this 
>> case.  My question is, what type of process loop is best for this situation? 
>>  The multiprocessing, asyncio and subprocess libraries are very extensive, 
>> and it would help if someone could suggest the best alternative for what I 
>> am doing here. 
>>
>> Thanks very much for any ideas. 
>>
>>
>> -- 
>> https://mail.python.org/mailman/listinfo/python-list
>>
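
The read, process, write loop Barry lists above can be sketched without epoll at all, since a blocking os.read() already sleeps until the parent writes.  This is only an illustration: serve(), its handle callback and the FIFO paths passed to it are hypothetical, and it assumes the parent opens the two FIFOs in the same order as the child.

```python
import os


def serve(read_path, write_path, handle):
    """Read messages from one FIFO and reply on another until the writer closes."""
    # Opening O_RDONLY blocks until the other side opens the FIFO for writing.
    pr = os.open(read_path, os.O_RDONLY)
    # Opening O_WRONLY blocks until the other side opens the FIFO for reading.
    pw = os.open(write_path, os.O_WRONLY)
    try:
        while True:
            data = os.read(pr, 64)  # sleeps in the kernel until data arrives
            if not data:            # b"" means the writer closed its end
                break
            # Drop a trailing NUL if the C side sent a terminated string.
            os.write(pw, handle(data.rstrip(b"\x00")))
    finally:
        os.close(pr)
        os.close(pw)
```

While the child is asleep inside os.read() it uses no CPU, so there is no busy "child is looping" output between messages.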


Python child process in while True loop blocks parent

2021-11-29 Thread Jen Kris via Python-list

I have a C program that forks to create a child process and uses execv to call 
a Python program.  The Python program communicates with the parent process (in 
C) through a FIFO pipe monitored with epoll().  

The Python child process is in a while True loop, which is intended to keep it 
running while the parent process proceeds, and perform functions for the C 
program only at intervals when the parent sends data to the child -- similar to 
a daemon process. 

The C process writes to its end of the pipe and the child process reads it, but 
then the child process continues to loop, thereby blocking the parent. 

This is the Python code:

#!/usr/bin/python3
import os
import select

#Open the named pipes
pr = os.open('/tmp/Pipe_01', os.O_RDWR)
pw = os.open('/tmp/Pipe_02', os.O_RDWR)

ep = select.epoll(-1)
ep.register(pr, select.EPOLLIN)

while True:

    events = ep.poll(timeout=2.5, maxevents=-1)
    #events = ep.poll(timeout=None, maxevents=-1)

    print("child is looping")

    for fileno, event in events:
        print("Python fileno")
        print(fileno)
        print("Python event")
        print(event)
        v = os.read(pr,64)
        print("Pipe value")
        print(v)

The child process correctly receives the signal from ep.poll and correctly 
reads the data in the pipe, but then it continues looping.  For example, when I 
put in a timeout:

child is looping
Python fileno
4
Python event
1
Pipe value
b'10\x00'
child is looping
child is looping

That suggests that a while True loop is not the right thing to do in this case. 
 My question is, what type of process loop is best for this situation?  The 
multiprocessing, asyncio and subprocess libraries are very extensive, and it 
would help if someone could suggest the best alternative for what I am doing 
here. 

Thanks very much for any ideas. 


