[issue27826] Null-pointer dereference in tuplehash() function

2016-08-22 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Closing as won't fix.  It is impractical to make marshal resilient against 
bytecode hacks and it is likewise impractical to put a NULL pointer check 
in-front of every dereference in the language.

--
resolution:  -> wont fix
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue27826] Null-pointer dereference in tuplehash() function

2016-08-22 Thread STINNER Victor

STINNER Victor added the comment:

> And it is hard to protect from such situation in marshal.c.

Python doesn't validate marshal nor bytecode. It's a deliberate choice to get 
best performances.

--
nosy: +haypo

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue27826] Null-pointer dereference in tuplehash() function

2016-08-21 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

The simplest example:

import marshal
t = [],
t[0].append(t)
b = marshal.dumps(t)
b = bytearray(b)
b[2] = b'<'[0]
marshal.loads(b)

Create a recursive tuple containing a list containing a reference to original 
tuple. Marshal it and replace TYPE_LIST ('[') by TYPE_SET ('<'). Now marshalled 
data contains a recursive tuple containing a set containing a reference to 
original tuple. When a tuple is added to a set, it still is not initialized, 
and hash is calculated on a uninitialized tuple.

I believe it is not possible to create such structure without hacking marhal 
data or using C API. And it is hard to protect from such situation in marshal.c.

--
nosy: +serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue27826] Null-pointer dereference in tuplehash() function

2016-08-21 Thread Raymond Hettinger

Raymond Hettinger added the comment:

It may be better to focus on Python/marshal.c to see if there are ways to make 
it more robust (at least checking to see if all of the n entries allocated in a 
container were actually filled).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue27826] Null-pointer dereference in tuplehash() function

2016-08-21 Thread Raymond Hettinger

Raymond Hettinger added the comment:

> but it might be better to avoid a crash here.

I'm reluctant to introduce changes like this, especially in the middle of a 
loop.  This code and code like it has been nonproblematic for Python's 26 year 
history.  The code throughout tupleobject.c assumes well-formed tuples (for 
example, tuplecontains and tupleitem do not have NULL checks before 
dereferencing).  At some point, the C code just has to trust its own structure 
invariants and people who run marshal or pickle have to trust their inputs.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue27826] Null-pointer dereference in tuplehash() function

2016-08-21 Thread Raymond Hettinger

Changes by Raymond Hettinger :


--
assignee:  -> rhettinger
nosy: +rhettinger

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue27826] Null-pointer dereference in tuplehash() function

2016-08-21 Thread Artem Smotrakov

Changes by Artem Smotrakov :


--
keywords: +patch
Added file: http://bugs.python.org/file44184/tuplehash.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue27826] Null-pointer dereference in tuplehash() function

2016-08-21 Thread Artem Smotrakov

New submission from Artem Smotrakov:

A null-pointer dereference may happen while deserialization incorrect data with 
marshal.loads() function.

Here is a test which reproduces this (see also attached 
marshal_tuplehash_null_dereference.py):


import marshal

value = (   # tuple1
"this is a string", #string1
[
1,  # int1
2,  # int2
3,  # int3
4   # int4
],
(   #tuple2
"more tuples",  #string2
1.0,# float1
2.3,# float2
4.5 # float3
),
"this is yet another string"
)

dump = marshal.dumps(value)

data = bytearray(dump)
data[10] = 40
data[4] = 16
data[103] = 143
data[97] = 245
data[78] = 114
data[35] = 188

marshal.loads(bytes(data))



This code modifies the serialized data with the following:
- update type of 'int2' element to TYPE_SET, 'int3' element becomes a length of 
the set
- update 'float3' element to TYPE_REF which points to tuple1

Here is a stack trace reported by ASan:

ASAN:SIGSEGV
=
==20296==ERROR: AddressSanitizer: SEGV on unknown address 0x0008 (pc 
0x00582064 bp 0x7ffc9e581310 sp 0x7ffc9e5812f0 T0)
#0 0x582063 in PyObject_Hash Objects/object.c:769
#1 0x5a3662 in tuplehash Objects/tupleobject.c:358
#2 0x5820ae in PyObject_Hash Objects/object.c:771
#3 0x5a3662 in tuplehash Objects/tupleobject.c:358
#4 0x5820ae in PyObject_Hash Objects/object.c:771
#5 0x58fac8 in set_add_key Objects/setobject.c:422
#6 0x59a85c in PySet_Add Objects/setobject.c:2323
#7 0x760d9d in r_object Python/marshal.c:1310
#8 0x76029d in r_object Python/marshal.c:1223
#9 0x760015 in r_object Python/marshal.c:1195
#10 0x7621dc in read_object Python/marshal.c:1465
#11 0x7639be in marshal_loads Python/marshal.c:1767
#12 0x577ff3 in PyCFunction_Call Objects/methodobject.c:109
#13 0x708a05 in call_function Python/ceval.c:4744
#14 0x6fb5a7 in PyEval_EvalFrameEx Python/ceval.c:3256
#15 0x70276f in _PyEval_EvalCodeWithName Python/ceval.c:4050
#16 0x70299f in PyEval_EvalCodeEx Python/ceval.c:4071
#17 0x6e07d7 in PyEval_EvalCode Python/ceval.c:778
#18 0x432354 in run_mod Python/pythonrun.c:980
#19 0x431e5b in PyRun_FileExFlags Python/pythonrun.c:933
#20 0x42e929 in PyRun_SimpleFileExFlags Python/pythonrun.c:396
#21 0x42caba in PyRun_AnyFileExFlags Python/pythonrun.c:80
#22 0x45f995 in run_file Modules/main.c:319
#23 0x4619c8 in Py_Main Modules/main.c:777
#24 0x41d258 in main Programs/python.c:69
#25 0x7f374629babf in __libc_start_main 
(/lib/x86_64-linux-gnu/libc.so.6+0x20abf)
#26 0x41ce28 in _start 

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV Objects/object.c:769 PyObject_Hash
==20296==ABORTING



What happens when it tries to read int2 element:
- int2 element is now a set of length 3
- add int4 element to the set
- add tuple2 element
-- when it adds an element to a set, it calculates a hash of the element
-- when it calculates a hash of a tuple, it calculates hashes of all elements 
of the tuple
-- while calculating a hash of tuple2, it calculates a hash of tuple1 since 
#float3 now is a TYPE_REF which points to tuple1
-- but tuple1 is not complete yet: length of tuple1 is 4, but only string1 was 
added to it
-- tuplehash() function reads a length of a tuple, and then calls 
PyObject_Hash() for each element
-- but it doesn't check if all elements were added to the tuple
-- as a result, a null-pointer dereference happens in tuplehash() while reading 
second element of tuple1

https://hg.python.org/cpython/file/tip/Objects/tupleobject.c#l347

...
static Py_hash_t
tuplehash(PyTupleObject *v)
{
Py_uhash_t x;  /* Unsigned for defined overflow behavior. */
Py_hash_t y;
Py_ssize_t len = Py_SIZE(v);<= for tuple1 it returns 4, but tuple1 
contains only one element (string1)
PyObject **p;
Py_uhash_t mult = _PyHASH_MULTIPLIER;
x = 0x345678UL;
p = v->ob_item;
while (--len >= 0) {
y = PyObject_Hash(*p++);<= null-pointer dereference happens here 
while reading second element
...


I could reproduce it with python3.5, and latest build of 
https://hg.python.org/cpython (Aug 20th, 2016).

Here is a simple patch which updates tuplehash() to check "p" for null:

diff -r 6e6aa2054824 Objects/tupleobject.c
--- a/Objects/tupleobject.c Sat Aug 20 21:22:03 2016 +0300
+++ b/Objects/tupleobject.c Sat Aug 20 23:17:16 2016 -0700
@@ -355,7 +355,13 @@
 x = 0x345678UL;
 p = v->ob_item;
 while (--len >= 0) {
-y = PyObject_Hash(*p++);
+PyObject *next = *p++;
+if (next == NULL) {
+PyErr_SetString(PyExc_TypeError,
+"Cannot compute a hash, tuple seems to be 
invalid");
+return -1;
+}
+y = PyObject_Hash(next);
 if (y == -1)
 return -1;
 x = (x ^ y) * mult;


According to