[Numpy-discussion] Using numpy on hadoop streaming: ImportError: cannot import name multiarray
Hi all,

For one of my projects I am using NLTK for POS tagging, which internally loads an 'english.pickle' file. I managed to package the NLTK library together with these pickle files and make them available to the mapper and reducer of a Hadoop streaming job using the -file option.

However, when NLTK tries to load that pickle file, it fails with a numpy error, since the cluster I am running this job on does not have numpy installed. I also don't have root access, so I can't install numpy or any other package on the cluster. The only way is to package the Python modules so that they are available to the mapper and reducer, which I managed to do. But now the problem is that when numpy is imported, it imports multiarray by default (as seen in __init__.py), and this is where I get the error:

  File "/usr/lib64/python2.6/pickle.py", line 1370, in load
    return Unpickler(file).load()
  File "/usr/lib64/python2.6/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib64/python2.6/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/usr/lib64/python2.6/pickle.py", line 1124, in find_class
    __import__(module)
  File "numpy.mod/numpy/__init__.py", line 170, in <module>
  File "numpy.mod/numpy/add_newdocs.py", line 13, in <module>
  File "numpy.mod/numpy/lib/__init__.py", line 8, in <module>
  File "numpy.mod/numpy/lib/type_check.py", line 11, in <module>
  File "numpy.mod/numpy/core/__init__.py", line 6, in <module>
ImportError: cannot import name multiarray

I tried copying the numpy directory from my local machine, which contains multiarray.pyd, to the cluster to make it available to the mapper and reducer, but this didn't help.

Any input on how to resolve this (keeping the constraint that I cannot install anything on the cluster machines)?

Thanks!

--
Regards,
Kartik Perisetla

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
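A possible diagnostic for the report above, offered as a sketch rather than a confirmed cause: a .pyd file is a Windows extension module, while the traceback paths (/usr/lib64/python2.6) point to a Linux cluster, which would need a matching shared object built for that platform and Python version. The helper name loadable_extensions and the stand-in directory are hypothetical. The code targets Python 3; on Python 2.6 the suffix list would come from imp.get_suffixes() instead.

```python
import importlib.machinery
import os
import tempfile


def loadable_extensions(package_dir, module_name, suffixes=None):
    """Return the files in package_dir that could be loaded as the
    compiled module `module_name` for the given extension suffixes."""
    if suffixes is None:
        # Suffixes accepted by the interpreter running this script.
        suffixes = importlib.machinery.EXTENSION_SUFFIXES
    found = []
    for suffix in suffixes:
        candidate = os.path.join(package_dir, module_name + suffix)
        if os.path.isfile(candidate):
            found.append(os.path.basename(candidate))
    return found


# Demo with a stand-in directory that holds only a Windows-style binary,
# mimicking a numpy/core folder copied from a Windows machine.
with tempfile.TemporaryDirectory() as core_dir:
    open(os.path.join(core_dir, "multiarray.pyd"), "wb").close()
    linux_hits = loadable_extensions(core_dir, "multiarray", suffixes=[".so"])
    windows_hits = loadable_extensions(core_dir, "multiarray", suffixes=[".pyd"])

print("usable with a '.so' suffix:", linux_hits)
print("usable with a '.pyd' suffix:", windows_hits)
```

Run on a cluster node against the shipped numpy/core directory with the default suffixes, an empty result would suggest that the packaged binary does not match the node's platform.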
[Numpy-discussion] Aligned / configurable memory allocation
Hello, I apologize for pinging the list, but I was wondering if there was interest in either of https://github.com/numpy/numpy/pull/5457 (make array data aligned by default) or https://github.com/numpy/numpy/pull/5470 (make the array data allocator configurable)? Regards Antoine.
Re: [Numpy-discussion] Silent Broadcasting considered harmful
Chris Barker chris.bar...@noaa.gov wrote:
> The strongest use-case seems to be for teaching that involves linear algebra concepts, not real production code.

Not really. SymPy is a better teaching tool.

Some find A*B easier to read than dot(A,B). But with the @ operator in Python 3.5 it does not have a use case at all.

Sturla
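To make the three spellings concrete, here is a minimal sketch, assuming NumPy is installed and Python 3.5 or later for the @ operator. The values of A and B are made up for illustration.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# On plain arrays, * is elementwise multiplication.
elementwise = A * B

# Matrix product on plain arrays: the function call and the @ operator.
via_dot = np.dot(A, B)
via_matmul = A @ B

# On the matrix class, * already means the matrix product.
via_matrix = np.asarray(np.asmatrix(A) * np.asmatrix(B))

print(elementwise)
print(via_matmul)
```

With @ available on plain arrays, the readable infix spelling no longer depends on the matrix class.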
Re: [Numpy-discussion] Silent Broadcasting considered harmful
On Feb 10, 2015 1:03 AM, cjw c...@ncf.ca wrote:
> On 09-Feb-15 2:34 AM, Stefan Reiterer wrote:
>> OK, those are indeed some good reasons to keep the status quo, especially since performance is crucial for numpy. It's a dilemma: using the matrix class for linear algebra would be the correct way for such things, but the matrix API is not as powerful and beautiful as that of arrays. On the other hand, arrays are beautiful, but not exactly intended for linear algebra. So maybe the better way would be not to add warnings to broadcasting operations, but to overhaul the matrix class to make it more attractive for numerical linear algebra(?)
>
> +1 I hope that this will be explored. @ could still be used by those who wish to remain in the array world.

What about splitting it off into a scikit, or at least some sort of separate package? If there is sufficient interest in it, it can be maintained there. If not, at least people can use it as-is. But there would not be any expectation going forward that the rest of numpy has to work well with it.
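For readers who have not run into the behaviour this thread is named after, here is a small sketch, assuming NumPy is installed. The helper checked_subtract is hypothetical and only illustrates what an opt-in strict check could look like in user code; it is not a NumPy function or a proposal from this thread.

```python
import numpy as np


def checked_subtract(a, b):
    """Subtract two arrays, refusing to broadcast (hypothetical helper)."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape:
        raise ValueError(
            "shape mismatch: %r vs %r" % (a.shape, b.shape))
    return a - b


row = np.arange(3.0)                    # shape (3,)
column = np.arange(3.0).reshape(3, 1)   # shape (3, 1)

# Plain subtraction broadcasts silently to a (3, 3) result.
silent = row - column

# The strict helper reports the mismatch instead.
error_message = None
try:
    checked_subtract(row, column)
except ValueError as exc:
    error_message = str(exc)

print(silent.shape)
print(error_message)
```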
Re: [Numpy-discussion] Aligned / configurable memory allocation
On 10 Feb 2015 09:11, Antoine Pitrou solip...@pitrou.net wrote:
> Hello, I apologize for pinging the list, but I was wondering if there was interest in either of https://github.com/numpy/numpy/pull/5457 (make array data aligned by default) or https://github.com/numpy/numpy/pull/5470 (make the array data allocator configurable)?

I'm not a fan of the configurable allocator. It adds new public APIs for us to support, and makes switching to using Python's own memory allocation APIs more complex. The feature is intrinsically dangerous, because newly installed deallocators must be able to handle memory allocated by the previous allocator. (AFAICT the included test case can crash the test process if you get unlucky and GC runs during it?) And no one has articulated any compelling argument for why we need this configurability.

Regarding the aligned allocation patch, I think the problem is just that none of us have any way to evaluate it. I'd feel a lot more comfortable with some solid numbers showing the costs and benefits on old and new systems.

-n
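As a starting point for gathering such numbers, the sketch below shows how alignment can be inspected and forced from Python without either patch, assuming NumPy is installed. The helper aligned_empty and the 64-byte figure are illustrative assumptions; the over-allocate-and-offset trick is a common user-level workaround, not what the pull requests implement.

```python
import numpy as np


def aligned_empty(n, dtype=np.float64, align=64):
    """Return an uninitialised 1-D array of n items whose data pointer
    is a multiple of `align` bytes (hypothetical helper)."""
    dtype = np.dtype(dtype)
    nbytes = n * dtype.itemsize
    # Over-allocate raw bytes, then skip ahead to the first aligned address.
    raw = np.empty(nbytes + align, dtype=np.uint8)
    offset = (-raw.ctypes.data) % align
    return raw[offset:offset + nbytes].view(dtype)


default = np.empty(10, dtype=np.float64)
aligned = aligned_empty(10, np.float64, align=64)

# Address modulo 64: whatever the allocator returned vs. the forced case.
default_remainder = default.ctypes.data % 64
aligned_remainder = aligned.ctypes.data % 64

print("default allocation, address % 64:", default_remainder)
print("forced alignment,   address % 64:", aligned_remainder)
```

Timing the same kernel on arrays produced both ways would be one way to compare aligned and default allocations on a given system.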
Re: [Numpy-discussion] Aligned / configurable memory allocation
On 10 Feb 2015 13:10, Antoine Pitrou solip...@pitrou.net wrote:
> On Tue, 10 Feb 2015 11:26:22 -0800 Nathaniel Smith n...@pobox.com wrote:
>> On 10 Feb 2015 09:11, Antoine Pitrou solip...@pitrou.net wrote:
>>> Hello, I apologize for pinging the list, but I was wondering if there was interest in either of https://github.com/numpy/numpy/pull/5457 (make array data aligned by default) or https://github.com/numpy/numpy/pull/5470 (make the array data allocator configurable)?
>>
>> I'm not a fan of the configurable allocator. It adds new public APIs for us to support, and makes switching to using Python's own memory allocation APIs more complex. The feature is intrinsically dangerous, because newly installed deallocators must be able to handle memory allocated by the previous allocator. (AFAICT the included test case can crash the test process if you get unlucky and GC runs during it?)
>
> It's taken care of in the patch.

Ah, I see -- I missed that you added an allocator field to PyArrayObject. That does reduce my objections to the patch. But I'm still not sure what problems this is solving exactly.

Also, if we do decide to add a deallocation callback to PyArrayObject, then I think we should take advantage of the opportunity to also make life easier for C API users who need a custom callback on a case-by-case basis and currently have to jump through hoops using ->base.

-n
Re: [Numpy-discussion] Aligned / configurable memory allocation
On 10.02.2015 22:33, Nathaniel Smith wrote:
> On 10 Feb 2015 13:10, Antoine Pitrou solip...@pitrou.net wrote:
>> On Tue, 10 Feb 2015 11:26:22 -0800 Nathaniel Smith n...@pobox.com wrote:
>>> On 10 Feb 2015 09:11, Antoine Pitrou solip...@pitrou.net wrote:
>>>> Hello, I apologize for pinging the list, but I was wondering if there was interest in either of https://github.com/numpy/numpy/pull/5457 (make array data aligned by default) or https://github.com/numpy/numpy/pull/5470 (make the array data allocator configurable)?
>>>
>>> I'm not a fan of the configurable allocator. It adds new public APIs for us to support, and makes switching to using Python's own memory allocation APIs more complex. The feature is intrinsically dangerous, because newly installed deallocators must be able to handle memory allocated by the previous allocator. (AFAICT the included test case can crash the test process if you get unlucky and GC runs during it?)
>>
>> It's taken care of in the patch.

Unfortunately it also breaks the ABI on two fronts: by adding a new member to the public array struct, which needs initializing by users who do not go through the API, and by removing the ability to use free on array pointers. Neither is a particularly large break, but they are breaks nonetheless.

At least for the first issue we should (as for the proposed dtype and ufunc changes) apply a more generic break: hiding the new internal members in a new private structure that embeds the public structure unchanged.

The second issue can probably be ignored, though we could retain it for POSIX/C11, as those standards wisely decided to make aligned pointers freeable with free. That, on the other hand, costs us efficient calloc and realloc (standards committees are weird sometimes ...)

> Ah, I see -- I missed that you added an allocator field to PyArrayObject. That does reduce my objections to the patch. But I'm still not sure what problems this is solving exactly.
>
> Also, if we do decide to add a deallocation callback to PyArrayObject then I think we should take advantage of the opportunity to also make life easier for C API users who need a custom callback on a case-by-case basis and currently have to jump through hoops using ->base.
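The idea of a private structure that embeds the public one unchanged can be sketched with ctypes. This is only an illustration of the layout argument: the struct names and fields below are hypothetical stand-ins, not the real PyArrayObject definition.

```python
import ctypes


class PublicArray(ctypes.Structure):
    # Stand-in for the existing public struct; its layout must not change.
    _fields_ = [
        ("data", ctypes.c_void_p),
        ("nd", ctypes.c_int),
        ("flags", ctypes.c_int),
    ]


class PrivateArray(ctypes.Structure):
    # New internal members are appended after the embedded public part.
    _fields_ = [
        ("public", PublicArray),
        ("allocator", ctypes.c_void_p),  # hypothetical new member
    ]


def as_public(private):
    """Reinterpret a PrivateArray as the PublicArray it starts with."""
    pointer = ctypes.cast(ctypes.pointer(private), ctypes.POINTER(PublicArray))
    return pointer.contents


private = PrivateArray()
private.public.nd = 2

# Old code that only knows PublicArray still sees the same fields
# at the same offsets.
legacy_view = as_public(private)
public_offset = PrivateArray.public.offset

print(public_offset, legacy_view.nd)
print(ctypes.sizeof(PublicArray), ctypes.sizeof(PrivateArray))
```

Because the public part sits at offset zero with an unchanged size, code compiled against the old struct keeps reading valid fields, while the library alone touches the trailing members.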
Re: [Numpy-discussion] Silent Broadcasting considered harmful
On Tue, Feb 10, 2015 at 5:40 PM, Chris Barker chris.bar...@noaa.gov wrote:
> On Tue, Feb 10, 2015 at 12:28 AM, Todd toddr...@gmail.com wrote:
>>> So maybe the better way would be not to add warnings to broadcasting operations, but to overhaul the matrix class to make it more attractive for numerical linear algebra(?)
>>
>> What about splitting it off into a scikit, or at least some sort of separate package? If there is sufficient interest in it, it can be maintained there. If not, at least people can use it as-is. But there would not be any expectation going forward that the rest of numpy has to work well with it.
>
> Well, splitting it off is a good idea,

It's not; that would be a massive backwards-compatibility break. Just leave it as is, and write this discussion up in a FAQ so we won't keep going in circles on this topic.

Ralf

> seeing as how it hasn't gotten much love. But if the rest of numpy does not work well with it, then it becomes even less useful.
Re: [Numpy-discussion] Silent Broadcasting considered harmful
Chris Barker chris.bar...@noaa.gov wrote:
> Well, splitting it off is a good idea, seeing as how it hasn't gotten much love. But if the rest of numpy does not work well with it, then it becomes even less useful.

PEP 3118 takes care of that.

Sturla
Re: [Numpy-discussion] Using numpy on hadoop streaming: ImportError: cannot import name multiarray
Thanks David. But do I need to install virtualenv on every node in the Hadoop cluster? I am not sure whether the same namenodes are assigned to each of my Hadoop jobs. How should I proceed in that scenario?

Thanks for your inputs.

Kartik

On Feb 11, 2015 1:56 AM, Daπid davidmen...@gmail.com wrote:
> On 11 February 2015 at 03:38, Kartik Kumar Perisetla kartik.p...@gmail.com wrote:
>> Also, I don't have root access thus, can't install numpy or any other package on cluster
>
> You can create a virtualenv, and install packages on it without needing root access. To minimize trouble, you can ensure it uses the system packages when available. Here are instructions on how to install it:
>
> https://stackoverflow.com/questions/9348869/how-to-install-virtualenv-without-using-sudo
> http://opensourcehacker.com/2012/09/16/recommended-way-for-sudo-free-installation-of-python-software-with-virtualenv/
>
> This does not require root access, but it is probably good to check with the sysadmins to make sure they are fine with it.
>
> /David.
Re: [Numpy-discussion] Using numpy on hadoop streaming: ImportError: cannot import name multiarray
On 11 February 2015 at 03:38, Kartik Kumar Perisetla kartik.p...@gmail.com wrote:
> Also, I don't have root access thus, can't install numpy or any other package on cluster

You can create a virtualenv, and install packages on it without needing root access. To minimize trouble, you can ensure it uses the system packages when available. Here are instructions on how to install it:

https://stackoverflow.com/questions/9348869/how-to-install-virtualenv-without-using-sudo
http://opensourcehacker.com/2012/09/16/recommended-way-for-sudo-free-installation-of-python-software-with-virtualenv/

This does not require root access, but it is probably good to check with the sysadmins to make sure they are fine with it.

/David.
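For reference, a minimal sketch of the same idea using the venv module from the Python 3 standard library; this is an assumption about a newer interpreter, since the cluster in this thread runs Python 2.6, where the separate virtualenv tool from the links above would be needed. The function name create_user_env and the scratch directory are hypothetical.

```python
import os
import tempfile
import venv


def create_user_env(env_dir):
    """Create an environment in a user-writable directory that also sees
    the system-wide packages. Returns the path of its config file."""
    # with_pip=False keeps the sketch independent of ensurepip; in real use
    # with_pip=True would be the usual choice so packages can be installed.
    venv.create(env_dir, system_site_packages=True, with_pip=False)
    return os.path.join(env_dir, "pyvenv.cfg")


# Demo in a throwaway directory; no root access is involved.
with tempfile.TemporaryDirectory() as scratch:
    cfg_path = create_user_env(os.path.join(scratch, "env"))
    with open(cfg_path) as handle:
        cfg_text = handle.read()

print(cfg_text)
```

The system_site_packages=True argument corresponds to the advice to use the system packages when available.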
[Numpy-discussion] Matrix Class
It seems to be agreed that there are weaknesses in the existing NumPy matrix class.

Some problems are illustrated below.

I'll try to put together some suggestions over the coming weeks and would appreciate comments.

Colin W.

Test Script:

    from __future__ import print_function
    from numpy import asmatrix as mat  # numpy.mat is an alias of asmatrix (removed in NumPy 2.0)

    if __name__ == '__main__':
        a = mat([4, 5, 6])                 # Good
        print('a: ', a)
        b = mat([4, '5', 6])               # Not the expected result
        print('b: ', b)
        c = mat([[4, 5, 6], [7, 8]])       # Wrongly accepted as rectangular
        print('c: ', c)
        d = mat([[1, 2, 3]])
        try:
            d[0, 1] = 'b'                  # Correctly flagged, not numeric
        except ValueError:
            print("d[0, 1]= 'b' # Correctly flagged, not numeric", ' ValueError')
        print('d: ', d)

Result:

    *** Python 2.7.9 (default, Dec 10 2014, 12:28:03) [MSC v.1500 64 bit (AMD64)] on win32. ***
    a: [[4 5 6]]
    b: [['4' '5' '6']]
    c: [[[4, 5, 6] [7, 8]]]
    d[0, 1]= 'b' # Correctly flagged, not numeric ValueError
    d: [[1 2 3]]
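The two complaints in the script above (strings silently accepted, ragged rows accepted) can be expressed as explicit checks. The following is only a sketch of what stricter validation might look like, using the standard library alone; check_matrix_rows is a hypothetical helper, not part of NumPy and not a proposal made in this message.

```python
from numbers import Number


def check_matrix_rows(rows):
    """Validate nested-list matrix input (hypothetical helper).

    Raises ValueError for ragged rows and TypeError for non-numeric
    entries; returns the rows as a list of lists otherwise."""
    rows = [list(row) for row in rows]
    widths = set(len(row) for row in rows)
    if len(widths) > 1:
        raise ValueError("rows have differing lengths: %s" % sorted(widths))
    for row in rows:
        for value in row:
            if not isinstance(value, Number):
                raise TypeError("non-numeric entry: %r" % (value,))
    return rows


def outcome(rows):
    """Return 'ok' or the name of the exception the check raised."""
    try:
        check_matrix_rows(rows)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return "ok"


# The inputs mirror cases a, b and c from the test script.
results = {
    "a": outcome([[4, 5, 6]]),
    "b": outcome([[4, "5", 6]]),
    "c": outcome([[4, 5, 6], [7, 8]]),
}
print(results)
```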