Re: C API PyObject_Call segfaults with string

2022-02-10 Thread Jen Kris via Python-list
Thank you for that suggestion.  It allowed me to replace six lines of code with 
one.  :)


Feb 10, 2022, 12:43 by pyt...@mrabarnett.plus.com:

> On 2022-02-10 20:00, Jen Kris via Python-list wrote:
>
>> With the help of PyErr_Print() I have it solved.  Here is the final code 
>> (the part relevant to sents):
>>
>>     Py_ssize_t listIndex = 0;
>>     pListItem = PyList_GetItem(pFileIds, listIndex);
>>     pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
>>     pListStr = PyBytes_AS_STRING(pListStrE); // Borrowed pointer
>>
>>     // Then:  sentences = gutenberg.sents(fileid) - this is a sequence item
>>     PyObject *c_args = Py_BuildValue("s", pListStr);
>>     PyObject *args_tuple = PyTuple_New(1);
>>     PyTuple_SetItem(args_tuple, 0, c_args);
>>
>>     pSents = PyObject_CallObject(pSentMod, args_tuple);
>>
>>     if ( pSents == 0x0){
>>     PyErr_Print();
>>     return return_value; }
>>
>> As you mentioned yesterday, CallObject needs a tuple, so that was the 
>> problem.  Now it works.
>>
>> You also asked why I don't just use pListStrE.  I tried that and got a long 
>> error message from PyErr_Print.  I'm not far enough along in my C_API work 
>> to understand why, but it doesn't work.
>>
>> Thanks very much for your help on this.
>>
> You're encoding a Unicode string to a UTF-8 bytestring:
>
>  pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
>
> then pointing to the bytes of that UTF-8 bytestring:
>
>  pListStr = PyBytes_AS_STRING(pListStrE); // Borrowed pointer
>
> then making a Unicode string from those UTF-8 bytes:
>
>  PyObject *c_args = Py_BuildValue("s", pListStr);
>
> You might as well just use the original Unicode string!
>
> Try this instead:
>
>  Py_ssize_t listIndex = 0;
>  pListItem = PyList_GetItem(pFileIds, listIndex);
>  //> pListItem?
>
>  pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem, NULL);
>  //> pSents+?
>
>  if (pSents == 0x0){
>  PyErr_Print();
>  return return_value;
>  }
>

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: C API PyObject_Call segfaults with string

2022-02-10 Thread MRAB

On 2022-02-10 20:00, Jen Kris via Python-list wrote:
>
> With the help of PyErr_Print() I have it solved.  Here is the final code (the
> part relevant to sents):
>
>     Py_ssize_t listIndex = 0;
>     pListItem = PyList_GetItem(pFileIds, listIndex);
>     pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
>     pListStr = PyBytes_AS_STRING(pListStrE); // Borrowed pointer
>
>     // Then:  sentences = gutenberg.sents(fileid) - this is a sequence item
>     PyObject *c_args = Py_BuildValue("s", pListStr);
>     PyObject *args_tuple = PyTuple_New(1);
>     PyTuple_SetItem(args_tuple, 0, c_args);
>
>     pSents = PyObject_CallObject(pSentMod, args_tuple);
>
>     if ( pSents == 0x0){
>     PyErr_Print();
>     return return_value; }
>
> As you mentioned yesterday, CallObject needs a tuple, so that was the problem.
> Now it works.
>
> You also asked why I don't just use pListStrE.  I tried that and got a long
> error message from PyErr_Print.  I'm not far enough along in my C_API work to
> understand why, but it doesn't work.
>
> Thanks very much for your help on this.


You're encoding a Unicode string to a UTF-8 bytestring:

pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");

then pointing to the bytes of that UTF-8 bytestring:

pListStr = PyBytes_AS_STRING(pListStrE); // Borrowed pointer

then making a Unicode string from those UTF-8 bytes:

PyObject *c_args = Py_BuildValue("s", pListStr);

You might as well just use the original Unicode string!

Try this instead:

   Py_ssize_t listIndex = 0;
   pListItem = PyList_GetItem(pFileIds, listIndex);
   //> pListItem?

   pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem, NULL);
   //> pSents+?

   if (pSents == 0x0){
   PyErr_Print();
   return return_value;
   }






Re: C API PyObject_Call segfaults with string

2022-02-10 Thread Jen Kris via Python-list
Hi and thanks very much for your comments on reference counting.  Since I'm new 
to the C_API that will help a lot.  I know that reference counting is one of 
the difficult issues with the C API.  

I just posted a reply to Inada Naoki showing how I solved the problem I posted 
yesterday.  

Thanks much for your help.

Jen


Feb 9, 2022, 18:43 by pyt...@mrabarnett.plus.com:

> On 2022-02-10 01:37, Jen Kris via Python-list wrote:
>
>> I'm using Python 3.8 so I tried your second choice:
>>
>> pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem);
>>
>> but pSents is 0x0.  pSentMod and pListItem are valid pointers.
>>
> 'PyObject_CallFunction' looks like a good one to use:
>
> """PyObject* PyObject_CallFunction(PyObject *callable, const char *format, 
> ...)
>
> Call a callable Python object callable, with a variable number of C 
> arguments. The C arguments are described using a Py_BuildValue() style format 
> string. The format can be NULL, indicating that no arguments are provided.
> """
>
> [snip]
>
> What I do is add comments to keep track of what objects I have references to 
> at each point and whether they are new references or could be NULL.
>
> For example:
>
>  pName = PyUnicode_FromString("nltk.corpus");
>  //> pName+?
>
> This means that 'pName' contains a reference, '+' means that it's a new 
> reference, and '?' means that it could be NULL (usually due to an exception, 
> but not always) so I need to check it.
>
> Continuing in this vein:
>
>  pModule = PyImport_Import(pName);
>  //> pName+? pModule+?
>
>  pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
>  //> pName+? pModule+? pSubMod+?
>  pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
>  //> pName+? pModule+? pSubMod+? pFidMod+?
>  pSentMod = PyObject_GetAttrString(pSubMod, "sents");
>  //> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+?
>
>  pFileIds = PyObject_CallObject(pFidMod, 0);
>  //> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+? pFileIds+?
>  pListItem = PyList_GetItem(pFileIds, listIndex);
>  //> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+? pFileIds+? pListItem?
>  pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
>  //> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+? pFileIds+? pListItem? pListStrE+?
>
> As you can see, there are a lot of leaked references building up.
>
> Note how after:
>
>  pListItem = PyList_GetItem(pFileIds, listIndex);
>
> the addition is:
>
>  //> pListItem?
>
> This means that 'pListItem' contains a borrowed (not new) reference, but 
> could be NULL.
>
> I find it easiest to DECREF as soon as I no longer need the reference and
> to remove a name from the list as soon as I no longer need it (DECREFing it
> where appropriate).
>
> For example:
>
>  pName = PyUnicode_FromString("nltk.corpus");
>  //> pName+?
>  if (!pName)
>  goto error;
>  //> pName+
>  pModule = PyImport_Import(pName);
>  //> pName+ pModule+?
>  Py_DECREF(pName);
>  //> pModule+?
>  if (!pModule)
>  goto error;
>  //> pModule+
>
> I find that doing this greatly reduces the chances of getting the reference 
> counting wrong, and I can remove the comments once I've finished the function 
> I'm writing.


Re: C API PyObject_Call segfaults with string

2022-02-10 Thread Jen Kris via Python-list
With the help of PyErr_Print() I have it solved.  Here is the final code (the 
part relevant to sents):

   Py_ssize_t listIndex = 0;
   pListItem = PyList_GetItem(pFileIds, listIndex);
   pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
   pListStr = PyBytes_AS_STRING(pListStrE); // Borrowed pointer

   // Then:  sentences = gutenberg.sents(fileid) - this is a sequence item
   PyObject *c_args = Py_BuildValue("s", pListStr);
   PyObject *args_tuple = PyTuple_New(1);
   PyTuple_SetItem(args_tuple, 0, c_args);

   pSents = PyObject_CallObject(pSentMod, args_tuple);

   if ( pSents == 0x0){
   PyErr_Print();
   return return_value; }

As you mentioned yesterday, CallObject needs a tuple, so that was the problem.  
Now it works.  

You also asked why I don't just use pListStrE.  I tried that and got a long 
error message from PyErr_Print.  I'm not far enough along in my C_API work to 
understand why, but it doesn't work.  

Thanks very much for your help on this.  

Jen


Feb 9, 2022, 17:40 by songofaca...@gmail.com:

> On Thu, Feb 10, 2022 at 10:37 AM Jen Kris  wrote:
>
>>
>> I'm using Python 3.8 so I tried your second choice:
>>
>> pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem);
>>
>> but pSents is 0x0.  pSentMod and pListItem are valid pointers.
>>
>
> It means an exception happened.
> If you are writing a Python/C function, return NULL (e.g. `if (pSents ==
> NULL) return NULL`).
> Then Python shows the exception and traceback for you.
>
> -- 
> Inada Naoki  
>



Re: C API PyObject_Call segfaults with string

2022-02-09 Thread MRAB

On 2022-02-10 01:37, Jen Kris via Python-list wrote:
>
> I'm using Python 3.8 so I tried your second choice:
>
> pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem);
>
> but pSents is 0x0.  pSentMod and pListItem are valid pointers.


'PyObject_CallFunction' looks like a good one to use:

"""PyObject* PyObject_CallFunction(PyObject *callable, const char *format, ...)

Call a callable Python object callable, with a variable number of C
arguments. The C arguments are described using a Py_BuildValue() style
format string. The format can be NULL, indicating that no arguments are
provided.
"""

[snip]

What I do is add comments to keep track of what objects I have 
references to at each point and whether they are new references or could 
be NULL.


For example:

pName = PyUnicode_FromString("nltk.corpus");
//> pName+?

This means that 'pName' contains a reference, '+' means that it's a new 
reference, and '?' means that it could be NULL (usually due to an 
exception, but not always) so I need to check it.


Continuing in this vein:

pModule = PyImport_Import(pName);
//> pName+? pModule+?

pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
//> pName+? pModule+? pSubMod+?
pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
//> pName+? pModule+? pSubMod+? pFidMod+?
pSentMod = PyObject_GetAttrString(pSubMod, "sents");
//> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+?

pFileIds = PyObject_CallObject(pFidMod, 0);
//> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+? pFileIds+?

pListItem = PyList_GetItem(pFileIds, listIndex);
//> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+? pFileIds+? pListItem?

pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
//> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+? pFileIds+? pListItem? pListStrE+?


As you can see, there are a lot of leaked references building up.

Note how after:

pListItem = PyList_GetItem(pFileIds, listIndex);

the addition is:

//> pListItem?

This means that 'pListItem' contains a borrowed (not new) reference, but 
could be NULL.


I find it easiest to DECREF as soon as I no longer need the reference
and to remove a name from the list as soon as I no longer need it
(DECREFing it where appropriate).


For example:

pName = PyUnicode_FromString("nltk.corpus");
//> pName+?
if (!pName)
goto error;
//> pName+
pModule = PyImport_Import(pName);
//> pName+ pModule+?
Py_DECREF(pName);
//> pModule+?
if (!pModule)
goto error;
//> pModule+

I find that doing this greatly reduces the chances of getting the 
reference counting wrong, and I can remove the comments once I've 
finished the function I'm writing.



Re: C API PyObject_Call segfaults with string

2022-02-09 Thread Jen Kris via Python-list
I'll do that and post back tomorrow.  The office is closing and I have to leave 
now (I'm in Seattle).  Thanks again for your help.  


Feb 9, 2022, 17:40 by songofaca...@gmail.com:

> On Thu, Feb 10, 2022 at 10:37 AM Jen Kris  wrote:
>
>>
>> I'm using Python 3.8 so I tried your second choice:
>>
>> pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem);
>>
>> but pSents is 0x0.  pSentMod and pListItem are valid pointers.
>>
>
> It means an exception happened.
> If you are writing a Python/C function, return NULL (e.g. `if (pSents ==
> NULL) return NULL`).
> Then Python shows the exception and traceback for you.
>
> -- 
> Inada Naoki  
>



Re: C API PyObject_Call segfaults with string

2022-02-09 Thread Inada Naoki
On Thu, Feb 10, 2022 at 10:37 AM Jen Kris  wrote:
>
> I'm using Python 3.8 so I tried your second choice:
>
> pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem);
>
> but pSents is 0x0.  pSentMod and pListItem are valid pointers.
>

It means an exception happened.
If you are writing a Python/C function, return NULL (e.g. `if (pSents ==
NULL) return NULL`).
Then Python shows the exception and traceback for you.

-- 
Inada Naoki  


Re: C API PyObject_Call segfaults with string

2022-02-09 Thread Jen Kris via Python-list
I'm using Python 3.8 so I tried your second choice:

pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem);

but pSents is 0x0.  pSentMod and pListItem are valid pointers.  


Feb 9, 2022, 17:23 by songofaca...@gmail.com:

> // https://docs.python.org/3/c-api/call.html#c.PyObject_CallOneArg
> // This function is only for one arg. Python >= 3.9 is required.
> pSents = PyObject_CallOneArg(pSentMod, pListItem);
>
> Or
>
> // https://docs.python.org/3/c-api/call.html#c.PyObject_CallFunctionObjArgs
> // This function can call a function with multiple arguments. It can be
> // used with Python < 3.9 too. The argument list must end with NULL.
> pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem, NULL);
>
> On Thu, Feb 10, 2022 at 10:15 AM Jen Kris  wrote:
>
>>
>> Right you are.  In that case should I use Py_BuildValue and convert to tuple 
>> (because it won't return a tuple for a one-arg), or should I just convert 
>> pListStr to tuple?  Thanks for your help.
>
>
> -- 
> Inada Naoki  
>



Re: C API PyObject_Call segfaults with string

2022-02-09 Thread Inada Naoki
// https://docs.python.org/3/c-api/call.html#c.PyObject_CallOneArg
// This function is only for one arg. Python >= 3.9 is required.
pSents = PyObject_CallOneArg(pSentMod, pListItem);

Or

// https://docs.python.org/3/c-api/call.html#c.PyObject_CallFunctionObjArgs
// This function can call a function with multiple arguments. It can be
// used with Python < 3.9 too. The argument list must end with NULL.
pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem, NULL);

On Thu, Feb 10, 2022 at 10:15 AM Jen Kris  wrote:
>
> Right you are.  In that case should I use Py_BuildValue and convert to tuple 
> (because it won't return a tuple for a one-arg), or should I just convert 
> pListStr to tuple?  Thanks for your help.


-- 
Inada Naoki  


Re: C API PyObject_Call segfaults with string

2022-02-09 Thread Jen Kris via Python-list
Right you are.  In that case should I use Py_BuildValue and convert to tuple 
(because it won't return a tuple for a one-arg), or should I just convert 
pListStr to tuple?  Thanks for your help.  


Feb 9, 2022, 17:08 by songofaca...@gmail.com:

> On Thu, Feb 10, 2022 at 10:05 AM Jen Kris  wrote:
>
>>
>> Thanks for your reply.
>>
>> I eliminated the DECREF and now it doesn't segfault but it returns 0x0.  
>> Same when I substitute pListStrE for pListStr.  pListStr contains the string 
>> representation of the fileid, so it seemed like the one to use.  According 
>> to  http://web.mit.edu/people/amliu/vrut/python/ext/buildValue.html, 
>> Py_BuildValue "builds a tuple only if its format string contains two or more
>> format units" and that doc contains examples.
>>
>
> Yes, and PyObject_Call accepts a tuple, not a str.
>
>
> https://docs.python.org/3/c-api/call.html#c.PyObject_Call
>
>
>
> -- 
> Inada Naoki  
>



Re: C API PyObject_Call segfaults with string

2022-02-09 Thread Inada Naoki
On Thu, Feb 10, 2022 at 10:05 AM Jen Kris  wrote:
>
> Thanks for your reply.
>
> I eliminated the DECREF and now it doesn't segfault but it returns 0x0.  Same 
> when I substitute pListStrE for pListStr.  pListStr contains the string 
> representation of the fileid, so it seemed like the one to use.  According to 
>  http://web.mit.edu/people/amliu/vrut/python/ext/buildValue.html, 
> Py_BuildValue "builds a tuple only if its format string contains two or more
> format units" and that doc contains examples.
>

Yes, and PyObject_Call accepts a tuple, not a str.


https://docs.python.org/3/c-api/call.html#c.PyObject_Call



-- 
Inada Naoki  


Re: C API PyObject_Call segfaults with string

2022-02-09 Thread Jen Kris via Python-list
Thanks for your reply.  

I eliminated the DECREF and now it doesn't segfault but it returns 0x0.  Same 
when I substitute pListStrE for pListStr.  pListStr contains the string 
representation of the fileid, so it seemed like the one to use.  According to  
http://web.mit.edu/people/amliu/vrut/python/ext/buildValue.html, Py_BuildValue
"builds a tuple only if its format string contains two or more format units" 
and that doc contains examples. 


Feb 9, 2022, 16:52 by songofaca...@gmail.com:

> On Thu, Feb 10, 2022 at 9:42 AM Jen Kris via Python-list
>  wrote:
>
>>
>> I have everything finished down to the last line (sentences = 
>> gutenberg.sents(fileid)) where I use  PyObject_Call to call gutenberg.sents, 
>> but it segfaults.  The fileid is a string -- the first fileid in this corpus 
>> is "austen-emma.txt."
>>
>> pName = PyUnicode_FromString("nltk.corpus");
>> pModule = PyImport_Import(pName);
>>
>> pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
>> pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
>> pSentMod = PyObject_GetAttrString(pSubMod, "sents");
>>
>> pFileIds = PyObject_CallObject(pFidMod, 0);
>> pListItem = PyList_GetItem(pFileIds, listIndex);
>> pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
>> pListStr = PyBytes_AS_STRING(pListStrE);
>> Py_DECREF(pListStrE);
>>
>
> HERE.
> PyBytes_AS_STRING() returns a pointer into the pListStrE object.
> So Py_DECREF(pListStrE) makes pListStr a dangling pointer.
>
>>
>> // sentences = gutenberg.sents(fileid)
>> PyObject *c_args = Py_BuildValue("s", pListStr);
>>
>
> Why do you encode pListStrE?
> Why don't you just use pListStrE?
>
>> PyObject *NullPtr = 0;
>> pSents = PyObject_Call(pSentMod, c_args, NullPtr);
>>
>
> c_args must be a tuple, but you passed a unicode object here.
> Read https://docs.python.org/3/c-api/arg.html#c.Py_BuildValue
>
>
>> The final line segfaults:
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x76e4e8d5 in _PyEval_EvalCodeWithName ()
>>  from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0
>>
>> My guess is the problem is in Py_BuildValue, which returns a pointer but it 
>> may not be constructed correctly.  I also tried it with "O" and it doesn't 
>> segfault but it returns 0x0.
>>
>> I'm new to using the C API.  Thanks for any help.
>>
>> Jen
>>
>>
>
> Bests,
>
> -- 
> Inada Naoki  
>



Re: C API PyObject_Call segfaults with string

2022-02-09 Thread Inada Naoki
On Thu, Feb 10, 2022 at 9:42 AM Jen Kris via Python-list
 wrote:
>
> I have everything finished down to the last line (sentences = 
> gutenberg.sents(fileid)) where I use  PyObject_Call to call gutenberg.sents, 
> but it segfaults.  The fileid is a string -- the first fileid in this corpus 
> is "austen-emma.txt."
>
> pName = PyUnicode_FromString("nltk.corpus");
> pModule = PyImport_Import(pName);
>
> pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
> pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
> pSentMod = PyObject_GetAttrString(pSubMod, "sents");
>
> pFileIds = PyObject_CallObject(pFidMod, 0);
> pListItem = PyList_GetItem(pFileIds, listIndex);
> pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
> pListStr = PyBytes_AS_STRING(pListStrE);
> Py_DECREF(pListStrE);

HERE.
PyBytes_AS_STRING() returns a pointer into the pListStrE object.
So Py_DECREF(pListStrE) makes pListStr a dangling pointer.

>
> // sentences = gutenberg.sents(fileid)
> PyObject *c_args = Py_BuildValue("s", pListStr);

Why do you encode pListStrE?
Why don't you just use pListStrE?

> PyObject *NullPtr = 0;
> pSents = PyObject_Call(pSentMod, c_args, NullPtr);
>

c_args must be a tuple, but you passed a unicode object here.
Read https://docs.python.org/3/c-api/arg.html#c.Py_BuildValue


> The final line segfaults:
> Program received signal SIGSEGV, Segmentation fault.
> 0x76e4e8d5 in _PyEval_EvalCodeWithName ()
>from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0
>
> My guess is the problem is in Py_BuildValue, which returns a pointer but it 
> may not be constructed correctly.  I also tried it with "O" and it doesn't 
> segfault but it returns 0x0.
>
> I'm new to using the C API.  Thanks for any help.
>
> Jen
>
>

Bests,

-- 
Inada Naoki  


C API PyObject_Call segfaults with string

2022-02-09 Thread Jen Kris via Python-list
This is a follow-on to a question I asked yesterday, which was answered by 
MRAB.   I'm using the Python C API to load the Gutenberg corpus from the nltk 
library and iterate through the sentences.  The Python code I am trying to 
replicate is:

from nltk.corpus import gutenberg
for i, fileid in enumerate(gutenberg.fileids()):
    sentences = gutenberg.sents(fileid)
    ...  # etc.

I have everything finished down to the last line (sentences = 
gutenberg.sents(fileid)) where I use  PyObject_Call to call gutenberg.sents, 
but it segfaults.  The fileid is a string -- the first fileid in this corpus is 
"austen-emma.txt."  

pName = PyUnicode_FromString("nltk.corpus");
pModule = PyImport_Import(pName);

pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
pSentMod = PyObject_GetAttrString(pSubMod, "sents");

pFileIds = PyObject_CallObject(pFidMod, 0);
pListItem = PyList_GetItem(pFileIds, listIndex);
pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
pListStr = PyBytes_AS_STRING(pListStrE);
Py_DECREF(pListStrE);

// sentences = gutenberg.sents(fileid)
PyObject *c_args = Py_BuildValue("s", pListStr);  
PyObject *NullPtr = 0;
pSents = PyObject_Call(pSentMod, c_args, NullPtr);

The final line segfaults:
Program received signal SIGSEGV, Segmentation fault.
0x76e4e8d5 in _PyEval_EvalCodeWithName ()
   from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0

My guess is the problem is in Py_BuildValue, which returns a pointer but it may 
not be constructed correctly.  I also tried it with "O" and it doesn't segfault 
but it returns 0x0. 

I'm new to using the C API.  Thanks for any help. 

Jen

