Re: C API PyObject_Call segfaults with string
Thank you for that suggestion. It allowed me to replace six lines of code
with one. :)

Feb 10, 2022, 12:43 by pyt...@mrabarnett.plus.com:

> You're encoding a Unicode string to a UTF-8 bytestring:
>
>     pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
>
> then pointing to the bytes of that UTF-8 bytestring:
>
>     pListStr = PyBytes_AS_STRING(pListStrE); // Borrowed pointer
>
> then making a Unicode string from those UTF-8 bytes:
>
>     PyObject *c_args = Py_BuildValue("s", pListStr);
>
> You might as well just use the original Unicode string!
>
> Try this instead:
>
>     Py_ssize_t listIndex = 0;
>     pListItem = PyList_GetItem(pFileIds, listIndex);
>     //> pListItem?
>
>     pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem, NULL);
>     //> pSents+?
>
>     if (pSents == 0x0){
>         PyErr_Print();
>         return return_value;
>     }

--
https://mail.python.org/mailman/listinfo/python-list
Re: C API PyObject_Call segfaults with string
On 2022-02-10 20:00, Jen Kris via Python-list wrote:

> With the help of PyErr_Print() I have it solved. Here is the final code
> (the part relevant to sents):
>
>     Py_ssize_t listIndex = 0;
>     pListItem = PyList_GetItem(pFileIds, listIndex);
>     pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
>     pListStr = PyBytes_AS_STRING(pListStrE); // Borrowed pointer
>
>     // Then: sentences = gutenberg.sents(fileid) - this is a sequence item
>     PyObject *c_args = Py_BuildValue("s", pListStr);
>     PyObject *args_tuple = PyTuple_New(1);
>     PyTuple_SetItem(args_tuple, 0, c_args);
>
>     pSents = PyObject_CallObject(pSentMod, args_tuple);
>
>     if ( pSents == 0x0){
>         PyErr_Print();
>         return return_value; }
>
> As you mentioned yesterday, CallObject needs a tuple, so that was the
> problem. Now it works.
>
> You also asked why I don't just use pListStrE. I tried that and got a long
> error message from PyErr_Print. I'm not far enough along in my C_API work
> to understand why, but it doesn't work.
>
> Thanks very much for your help on this.

You're encoding a Unicode string to a UTF-8 bytestring:

    pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");

then pointing to the bytes of that UTF-8 bytestring:

    pListStr = PyBytes_AS_STRING(pListStrE); // Borrowed pointer

then making a Unicode string from those UTF-8 bytes:

    PyObject *c_args = Py_BuildValue("s", pListStr);

You might as well just use the original Unicode string!

Try this instead:

    Py_ssize_t listIndex = 0;
    pListItem = PyList_GetItem(pFileIds, listIndex);
    //> pListItem?

    pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem, NULL);
    //> pSents+?

    if (pSents == 0x0){
        PyErr_Print();
        return return_value;
    }
Re: C API PyObject_Call segfaults with string
Hi and thanks very much for your comments on reference counting. Since I'm
new to the C_API that will help a lot. I know that reference counting is
one of the difficult issues with the C API.

I just posted a reply to Inada Naoki showing how I solved the problem I
posted yesterday.

Thanks much for your help.

Jen

Feb 9, 2022, 18:43 by pyt...@mrabarnett.plus.com:

> 'PyObject_CallFunction' looks like a good one to use:
>
> """PyObject* PyObject_CallFunction(PyObject *callable, const char *format, ...)
>
> Call a callable Python object callable, with a variable number of C
> arguments. The C arguments are described using a Py_BuildValue() style
> format string. The format can be NULL, indicating that no arguments are
> provided.
> """
>
> [snip]
>
> What I do is add comments to keep track of what objects I have references
> to at each point and whether they are new references or could be NULL.
>
> For example:
>
>     pName = PyUnicode_FromString("nltk.corpus");
>     //> pName+?
>
> This means that 'pName' contains a reference, '+' means that it's a new
> reference, and '?' means that it could be NULL (usually due to an
> exception, but not always) so I need to check it.
>
> Continuing in this vein:
>
>     pModule = PyImport_Import(pName);
>     //> pName+? pModule+?
>
>     pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
>     //> pName+? pModule+? pSubMod+?
>     pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
>     //> pName+? pModule+? pSubMod+? pFidMod+?
>     pSentMod = PyObject_GetAttrString(pSubMod, "sents");
>     //> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+?
>
>     pFileIds = PyObject_CallObject(pFidMod, 0);
>     //> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+? pFileIds+?
>     pListItem = PyList_GetItem(pFileIds, listIndex);
>     //> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+? pFileIds+? pListItem?
>     pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
>     //> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+? pFileIds+? pListItem? pListStrE+?
>
> As you can see, there's a lot of leaked references building up.
>
> Note how after:
>
>     pListItem = PyList_GetItem(pFileIds, listIndex);
>
> the addition is:
>
>     //> pListItem?
>
> This means that 'pListItem' contains a borrowed (not new) reference, but
> could be NULL.
>
> I find it easiest to DECREF as soon as I no longer need the reference and
> remove a name from the list as soon as I no longer need it (and have
> DECREFed it where required).
>
> For example:
>
>     pName = PyUnicode_FromString("nltk.corpus");
>     //> pName+?
>     if (!pName)
>         goto error;
>     //> pName+
>     pModule = PyImport_Import(pName);
>     //> pName+ pModule+?
>     Py_DECREF(pName);
>     //> pModule+?
>     if (!pModule)
>         goto error;
>     //> pModule+
>
> I find that doing this greatly reduces the chances of getting the
> reference counting wrong, and I can remove the comments once I've finished
> the function I'm writing.
Re: C API PyObject_Call segfaults with string
With the help of PyErr_Print() I have it solved. Here is the final code
(the part relevant to sents):

    Py_ssize_t listIndex = 0;
    pListItem = PyList_GetItem(pFileIds, listIndex);
    pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
    pListStr = PyBytes_AS_STRING(pListStrE); // Borrowed pointer

    // Then: sentences = gutenberg.sents(fileid) - this is a sequence item
    PyObject *c_args = Py_BuildValue("s", pListStr);
    PyObject *args_tuple = PyTuple_New(1);
    PyTuple_SetItem(args_tuple, 0, c_args);

    pSents = PyObject_CallObject(pSentMod, args_tuple);

    if ( pSents == 0x0){
        PyErr_Print();
        return return_value; }

As you mentioned yesterday, CallObject needs a tuple, so that was the
problem. Now it works.

You also asked why I don't just use pListStrE. I tried that and got a long
error message from PyErr_Print. I'm not far enough along in my C_API work
to understand why, but it doesn't work.

Thanks very much for your help on this.

Jen

Feb 9, 2022, 17:40 by songofaca...@gmail.com:

> On Thu, Feb 10, 2022 at 10:37 AM Jen Kris wrote:
>
>> I'm using Python 3.8 so I tried your second choice:
>>
>>     pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem);
>>
>> but pSents is 0x0. pSentMod and pListItem are valid pointers.
>
> It means an exception happened.
> If you are writing a Python/C function, return NULL (e.g. `if (pSents ==
> NULL) return NULL`).
> Then Python shows the exception and traceback for you.
>
> --
> Inada Naoki
Re: C API PyObject_Call segfaults with string
On 2022-02-10 01:37, Jen Kris via Python-list wrote:

> I'm using Python 3.8 so I tried your second choice:
>
>     pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem);
>
> but pSents is 0x0. pSentMod and pListItem are valid pointers.

'PyObject_CallFunction' looks like a good one to use:

"""PyObject* PyObject_CallFunction(PyObject *callable, const char *format, ...)

Call a callable Python object callable, with a variable number of C
arguments. The C arguments are described using a Py_BuildValue() style
format string. The format can be NULL, indicating that no arguments are
provided.
"""

[snip]

What I do is add comments to keep track of what objects I have references
to at each point and whether they are new references or could be NULL.

For example:

    pName = PyUnicode_FromString("nltk.corpus");
    //> pName+?

This means that 'pName' contains a reference, '+' means that it's a new
reference, and '?' means that it could be NULL (usually due to an
exception, but not always) so I need to check it.

Continuing in this vein:

    pModule = PyImport_Import(pName);
    //> pName+? pModule+?

    pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
    //> pName+? pModule+? pSubMod+?
    pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
    //> pName+? pModule+? pSubMod+? pFidMod+?
    pSentMod = PyObject_GetAttrString(pSubMod, "sents");
    //> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+?

    pFileIds = PyObject_CallObject(pFidMod, 0);
    //> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+? pFileIds+?
    pListItem = PyList_GetItem(pFileIds, listIndex);
    //> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+? pFileIds+? pListItem?
    pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
    //> pName+? pModule+? pSubMod+? pFidMod+? pSentMod+? pFileIds+? pListItem? pListStrE+?

As you can see, there's a lot of leaked references building up.

Note how after:

    pListItem = PyList_GetItem(pFileIds, listIndex);

the addition is:

    //> pListItem?

This means that 'pListItem' contains a borrowed (not new) reference, but
could be NULL.

I find it easiest to DECREF as soon as I no longer need the reference and
remove a name from the list as soon as I no longer need it (and have
DECREFed it where required).

For example:

    pName = PyUnicode_FromString("nltk.corpus");
    //> pName+?
    if (!pName)
        goto error;
    //> pName+
    pModule = PyImport_Import(pName);
    //> pName+ pModule+?
    Py_DECREF(pName);
    //> pModule+?
    if (!pModule)
        goto error;
    //> pModule+

I find that doing this greatly reduces the chances of getting the
reference counting wrong, and I can remove the comments once I've finished
the function I'm writing.
Re: C API PyObject_Call segfaults with string
I'll do that and post back tomorrow. The office is closing and I have to
leave now (I'm in Seattle).

Thanks again for your help.

Feb 9, 2022, 17:40 by songofaca...@gmail.com:

> On Thu, Feb 10, 2022 at 10:37 AM Jen Kris wrote:
>
>> I'm using Python 3.8 so I tried your second choice:
>>
>>     pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem);
>>
>> but pSents is 0x0. pSentMod and pListItem are valid pointers.
>
> It means an exception happened.
> If you are writing a Python/C function, return NULL (e.g. `if (pSents ==
> NULL) return NULL`).
> Then Python shows the exception and traceback for you.
>
> --
> Inada Naoki
Re: C API PyObject_Call segfaults with string
On Thu, Feb 10, 2022 at 10:37 AM Jen Kris wrote:
>
> I'm using Python 3.8 so I tried your second choice:
>
>     pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem);
>
> but pSents is 0x0. pSentMod and pListItem are valid pointers.
>

It means an exception happened.
If you are writing a Python/C function, return NULL (e.g. `if (pSents ==
NULL) return NULL`).
Then Python shows the exception and traceback for you.

--
Inada Naoki
Re: C API PyObject_Call segfaults with string
I'm using Python 3.8 so I tried your second choice:

    pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem);

but pSents is 0x0. pSentMod and pListItem are valid pointers.

Feb 9, 2022, 17:23 by songofaca...@gmail.com:

> // https://docs.python.org/3/c-api/call.html#c.PyObject_CallOneArg
> // This function is only for one arg. Python >= 3.9 is required.
> pSents = PyObject_CallOneArg(pSentMod, pListItem);
>
> Or
>
> // https://docs.python.org/3/c-api/call.html#c.PyObject_CallFunctionObjArgs
> // This function can call a function with multiple arguments. Can be
> // used with Python <3.9 too.
> pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem);
>
> --
> Inada Naoki
Re: C API PyObject_Call segfaults with string
// https://docs.python.org/3/c-api/call.html#c.PyObject_CallOneArg
// This function is only for one arg. Python >= 3.9 is required.
pSents = PyObject_CallOneArg(pSentMod, pListItem);

Or

// https://docs.python.org/3/c-api/call.html#c.PyObject_CallFunctionObjArgs
// This function can call a function with multiple arguments. Can be
// used with Python <3.9 too.
pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem);

On Thu, Feb 10, 2022 at 10:15 AM Jen Kris wrote:
>
> Right you are. In that case should I use Py_BuildValue and convert to a
> tuple (because it won't return a tuple for a one-arg), or should I just
> convert pListStr to a tuple? Thanks for your help.
>

--
Inada Naoki
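One detail worth spelling out for the second option: the documentation for PyObject_CallFunctionObjArgs says the arguments must be followed by NULL. The sketch below is a simplified model in plain C, not CPython's implementation, and `count_until_null` is a made-up name. It shows how a NULL-terminated variadic function finds the end of its arguments, which is why leaving the terminator off makes it read past the real ones.

```c
#include <stdarg.h>
#include <stddef.h>

/* Simplified model of a NULL-terminated variadic API: count the pointer
 * arguments up to, but not including, the NULL sentinel. */
static size_t count_until_null(const void *first, ...)
{
    size_t n = 0;
    va_list ap;

    va_start(ap, first);
    /* Nothing tells the function how many arguments were passed; the
     * loop stops only when it sees NULL. */
    for (const void *p = first; p != NULL; p = va_arg(ap, const void *))
        n++;
    va_end(ap);
    return n;
}
```

With the real API the call from this thread would be written `PyObject_CallFunctionObjArgs(pSentMod, pListItem, NULL)`.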
Re: C API PyObject_Call segfaults with string
Right you are. In that case should I use Py_BuildValue and convert to a
tuple (because it won't return a tuple for a one-arg), or should I just
convert pListStr to a tuple? Thanks for your help.

Feb 9, 2022, 17:08 by songofaca...@gmail.com:

> On Thu, Feb 10, 2022 at 10:05 AM Jen Kris wrote:
>
>> Thanks for your reply.
>>
>> I eliminated the DECREF and now it doesn't segfault but it returns 0x0.
>> Same when I substitute pListStrE for pListStr. pListStr contains the
>> string representation of the fileid, so it seemed like the one to use.
>> According to
>> http://web.mit.edu/people/amliu/vrut/python/ext/buildValue.html,
>> Py_BuildValue "builds a tuple only if its format string contains two or
>> more format units" and that doc contains examples.
>
> Yes, and PyObject_Call accepts a tuple, not a str.
>
> https://docs.python.org/3/c-api/call.html#c.PyObject_Call
>
> --
> Inada Naoki
Re: C API PyObject_Call segfaults with string
On Thu, Feb 10, 2022 at 10:05 AM Jen Kris wrote:
>
> Thanks for your reply.
>
> I eliminated the DECREF and now it doesn't segfault but it returns 0x0.
> Same when I substitute pListStrE for pListStr. pListStr contains the
> string representation of the fileid, so it seemed like the one to use.
> According to
> http://web.mit.edu/people/amliu/vrut/python/ext/buildValue.html,
> Py_BuildValue "builds a tuple only if its format string contains two or
> more format units" and that doc contains examples.
>

Yes, and PyObject_Call accepts a tuple, not a str.

https://docs.python.org/3/c-api/call.html#c.PyObject_Call

--
Inada Naoki
Re: C API PyObject_Call segfaults with string
Thanks for your reply.

I eliminated the DECREF and now it doesn't segfault but it returns 0x0.
Same when I substitute pListStrE for pListStr. pListStr contains the
string representation of the fileid, so it seemed like the one to use.
According to
http://web.mit.edu/people/amliu/vrut/python/ext/buildValue.html,
Py_BuildValue "builds a tuple only if its format string contains two or
more format units" and that doc contains examples.

Feb 9, 2022, 16:52 by songofaca...@gmail.com:

> On Thu, Feb 10, 2022 at 9:42 AM Jen Kris via Python-list wrote:
>
>> I have everything finished down to the last line (sentences =
>> gutenberg.sents(fileid)) where I use PyObject_Call to call
>> gutenberg.sents, but it segfaults. The fileid is a string -- the first
>> fileid in this corpus is "austen-emma.txt."
>>
>> pName = PyUnicode_FromString("nltk.corpus");
>> pModule = PyImport_Import(pName);
>>
>> pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
>> pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
>> pSentMod = PyObject_GetAttrString(pSubMod, "sents");
>>
>> pFileIds = PyObject_CallObject(pFidMod, 0);
>> pListItem = PyList_GetItem(pFileIds, listIndex);
>> pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
>> pListStr = PyBytes_AS_STRING(pListStrE);
>> Py_DECREF(pListStrE);
>
> HERE.
> PyBytes_AS_STRING() returns a pointer into the pListStrE object.
> So Py_DECREF(pListStrE) makes pListStr a dangling pointer.
>
>> // sentences = gutenberg.sents(fileid)
>> PyObject *c_args = Py_BuildValue("s", pListStr);
>
> Why do you encode pListStrE?
> Why don't you use just pListStrE?
>
>> PyObject *NullPtr = 0;
>> pSents = PyObject_Call(pSentMod, c_args, NullPtr);
>
> c_args must be a tuple, but you passed a unicode object here.
> Read https://docs.python.org/3/c-api/arg.html#c.Py_BuildValue
>
>> The final line segfaults:
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x76e4e8d5 in _PyEval_EvalCodeWithName ()
>>    from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0
>>
>> My guess is the problem is in Py_BuildValue, which returns a pointer but
>> it may not be constructed correctly. I also tried it with "O" and it
>> doesn't segfault but it returns 0x0.
>>
>> I'm new to using the C API. Thanks for any help.
>>
>> Jen
>
> Bests,
>
> --
> Inada Naoki
Re: C API PyObject_Call segfaults with string
On Thu, Feb 10, 2022 at 9:42 AM Jen Kris via Python-list wrote:
>
> I have everything finished down to the last line (sentences =
> gutenberg.sents(fileid)) where I use PyObject_Call to call
> gutenberg.sents, but it segfaults. The fileid is a string -- the first
> fileid in this corpus is "austen-emma.txt."
>
> pName = PyUnicode_FromString("nltk.corpus");
> pModule = PyImport_Import(pName);
>
> pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
> pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
> pSentMod = PyObject_GetAttrString(pSubMod, "sents");
>
> pFileIds = PyObject_CallObject(pFidMod, 0);
> pListItem = PyList_GetItem(pFileIds, listIndex);
> pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
> pListStr = PyBytes_AS_STRING(pListStrE);
> Py_DECREF(pListStrE);

HERE.
PyBytes_AS_STRING() returns a pointer into the pListStrE object.
So Py_DECREF(pListStrE) makes pListStr a dangling pointer.

>
> // sentences = gutenberg.sents(fileid)
> PyObject *c_args = Py_BuildValue("s", pListStr);

Why do you encode pListStrE?
Why don't you use just pListStrE?

> PyObject *NullPtr = 0;
> pSents = PyObject_Call(pSentMod, c_args, NullPtr);
>

c_args must be a tuple, but you passed a unicode object here.
Read https://docs.python.org/3/c-api/arg.html#c.Py_BuildValue

> The final line segfaults:
> Program received signal SIGSEGV, Segmentation fault.
> 0x76e4e8d5 in _PyEval_EvalCodeWithName ()
>    from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0
>
> My guess is the problem is in Py_BuildValue, which returns a pointer but
> it may not be constructed correctly. I also tried it with "O" and it
> doesn't segfault but it returns 0x0.
>
> I'm new to using the C API. Thanks for any help.
>
> Jen

Bests,

--
Inada Naoki
C API PyObject_Call segfaults with string
This is a follow-on to a question I asked yesterday, which was answered by
MRAB. I'm using the Python C API to load the Gutenberg corpus from the
nltk library and iterate through the sentences. The Python code I am
trying to replicate is:

    from nltk.corpus import gutenberg
    for i, fileid in enumerate(gutenberg.fileids()):
        sentences = gutenberg.sents(fileid)
        etc

I have everything finished down to the last line (sentences =
gutenberg.sents(fileid)) where I use PyObject_Call to call
gutenberg.sents, but it segfaults. The fileid is a string -- the first
fileid in this corpus is "austen-emma.txt."

    pName = PyUnicode_FromString("nltk.corpus");
    pModule = PyImport_Import(pName);

    pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
    pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
    pSentMod = PyObject_GetAttrString(pSubMod, "sents");

    pFileIds = PyObject_CallObject(pFidMod, 0);
    pListItem = PyList_GetItem(pFileIds, listIndex);
    pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
    pListStr = PyBytes_AS_STRING(pListStrE);
    Py_DECREF(pListStrE);

    // sentences = gutenberg.sents(fileid)
    PyObject *c_args = Py_BuildValue("s", pListStr);
    PyObject *NullPtr = 0;
    pSents = PyObject_Call(pSentMod, c_args, NullPtr);

The final line segfaults:

    Program received signal SIGSEGV, Segmentation fault.
    0x76e4e8d5 in _PyEval_EvalCodeWithName ()
       from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0

My guess is the problem is in Py_BuildValue, which returns a pointer but
it may not be constructed correctly. I also tried it with "O" and it
doesn't segfault but it returns 0x0.

I'm new to using the C API. Thanks for any help.

Jen