Warren et al:
On Wed, Mar 7, 2012 at 7:49 AM, Warren Weckesser
warren.weckes...@enthought.com wrote:
If you are set up with Cython to build extension modules,
I am
and you don't mind
testing an unreleased and experimental reader,
and I don't.
you can try the text reader
that I'm working
On Tue, Mar 20, 2012 at 5:59 PM, Chris Barker chris.bar...@noaa.gov wrote:
Warren et al:
On Wed, Mar 7, 2012 at 7:49 AM, Warren Weckesser
warren.weckes...@enthought.com wrote:
If you are set up with Cython to build extension modules,
I am
and you don't mind
testing an unreleased and
On Tue, Mar 6, 2012 at 4:45 PM, Chris Barker chris.bar...@noaa.gov wrote:
On Thu, Mar 1, 2012 at 10:58 PM, Jay Bourque jayv...@gmail.com wrote:
1. Loading text files using loadtxt/genfromtxt needs a significant
performance boost (I think at least an order of magnitude increase in
Hi,
mmap can give a speedup in some cases, but a slowdown in others, so care
must be taken when using it. For example, the speed difference between
read and mmap is not the same when the file is local as when it is
on NFS. On NFS, you need to read bigger chunks to make it worthwhile.
Another
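Frédéric's local-vs-NFS caveat is easy to probe with a microbenchmark sketch like the following (the ~1 MB file and the 1 MB chunk size are arbitrary choices for illustration; real NFS numbers of course require timing on an actual NFS mount):

```python
import mmap
import os
import tempfile
import time

# Make a throwaway ~1 MB local file to compare the two approaches on.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"0123456789" * 100_000)

# Plain buffered read() in 1 MB chunks.
t0 = time.perf_counter()
n_read = 0
with open(path, "rb") as f:
    chunk = f.read(1 << 20)
    while chunk:
        n_read += len(chunk)
        chunk = f.read(1 << 20)
read_time = time.perf_counter() - t0

# mmap: let the OS page the file in as we touch it.
t0 = time.perf_counter()
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        n_mmap = len(bytes(m))  # touch every page
mmap_time = time.perf_counter() - t0

os.remove(path)
print(n_read, n_mmap, read_time, mmap_time)
```

On a warm local cache the two are usually close; the interesting comparison is the same script pointed at an NFS path with cold caches.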
Frédéric Bastien writes:
Hi,
mmap can give a speedup in some cases, but a slowdown in others, so care
must be taken when using it. For example, the speed difference between
read and mmap is not the same when the file is local as when it is
on NFS. On NFS, you need to read bigger chunks to
In an effort to build a consensus of what numpy's New and Improved text
file readers should look like, I've put together a short list of the main
points discussed in this thread so far:
1. Loading text files using loadtxt/genfromtxt needs a significant
performance boost (I think at least an
Excerpts from Nathaniel Smith's message of Tue Feb 28 17:22:16 -0500 2012:
Even for binary, there are pathological cases, e.g. 1) reading a random
subset of nearly all rows. 2) reading a single column when rows are
small. In case 2 you will only go this route in the first place if you
Excerpts from Erin Sheldon's message of Wed Feb 29 10:11:51 -0500 2012:
Actually, for numpy.memmap you will read the whole file if you try to
grab a single column and read a large fraction of the rows. Here is an
That should have been: ...read *all* the rows.
-e
--
Erin Scott Sheldon
On Wed, Feb 29, 2012 at 15:11, Erin Sheldon erin.shel...@gmail.com wrote:
Excerpts from Nathaniel Smith's message of Tue Feb 28 17:22:16 -0500 2012:
Even for binary, there are pathological cases, e.g. 1) reading a random
subset of nearly all rows. 2) reading a single column when rows are
On Wed, Feb 29, 2012 at 3:11 PM, Erin Sheldon erin.shel...@gmail.com wrote:
Excerpts from Nathaniel Smith's message of Tue Feb 28 17:22:16 -0500 2012:
Even for binary, there are pathological cases, e.g. 1) reading a random
subset of nearly all rows. 2) reading a single column when rows are
Excerpts from Nathaniel Smith's message of Wed Feb 29 13:17:53 -0500 2012:
On Wed, Feb 29, 2012 at 3:11 PM, Erin Sheldon erin.shel...@gmail.com wrote:
Excerpts from Nathaniel Smith's message of Tue Feb 28 17:22:16 -0500 2012:
Even for binary, there are pathological cases, e.g. 1) reading a
On Wed, Feb 29, 2012 at 7:57 PM, Erin Sheldon erin.shel...@gmail.com wrote:
Excerpts from Nathaniel Smith's message of Wed Feb 29 13:17:53 -0500 2012:
On Wed, Feb 29, 2012 at 3:11 PM, Erin Sheldon erin.shel...@gmail.com
wrote:
Excerpts from Nathaniel Smith's message of Tue Feb 28 17:22:16
[Re-adding the list to the To: field, after it got dropped accidentally]
On Tue, Feb 28, 2012 at 12:28 AM, Erin Sheldon erin.shel...@gmail.com wrote:
Excerpts from Nathaniel Smith's message of Mon Feb 27 17:33:52 -0500 2012:
On Mon, Feb 27, 2012 at 6:02 PM, Erin Sheldon erin.shel...@gmail.com
Hi All -
I've added the relevant code to my numpy fork here
https://github.com/esheldon/numpy
The python module and c file are at /numpy/lib/recfile.py and
/numpy/lib/src/_recfile.c Access from python is numpy.recfile
See below for the doc string for the main class, Recfile. Some example
Erin Sheldon writes:
[...]
This was why I essentially wrote my own memmap like interface with
recfile, the code I'm converting. It allows working with columns and
rows without loading large chunks of memory.
[...]
This sounds like at any point in time you only have one part of the array
Excerpts from Jay Bourque's message of Mon Feb 27 00:24:25 -0500 2012:
Hi Erin,
I'm the one Travis mentioned earlier about working on this. I was planning on
diving into it this week, but it sounds like you may have some code already
that
fits the requirements? If so, I would be
From: Jay Bourque jayv...@gmail.com
To: numpy-discussion@scipy.org
Sent: Monday, 27 February 2012 2:24
Subject: Re: [Numpy-discussion] Possible roadmap addendum: building better
text file readers
Erin Sheldon erin.sheldon at gmail.com writes:
Excerpts from
On 2/27/2012 10:10 AM, Paulo Jabardo wrote:
I have a few features that I believe would make text file easier for many
people. In some countries (most?) the decimal separator in real numbers is
not a point but a comma.
I think it would be very useful that the decimal separator be specified
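Until a decimal-separator option exists in the readers themselves, a workaround sketch for decimal commas (the sample data and the ';' field separator are invented for illustration):

```python
import io

import numpy as np

# Sample data using a decimal comma and ';' as the field separator,
# as is common in many European locales.
raw = "1,5;2,25\n3,75;4,0\n"

# Workaround: swap ',' for '.' before handing the text to loadtxt.
# Safe here because the field separator is ';', so every comma in the
# stream is a decimal mark.
arr = np.loadtxt(io.StringIO(raw.replace(",", ".")), delimiter=";")
print(arr)
```

A real `decimal=','` keyword would avoid the extra pass over the text, which is part of Paulo's point.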
On Mon, Feb 27, 2012 at 2:44 PM, Erin Sheldon erin.shel...@gmail.com wrote:
What I've got is a solution for writing and reading structured arrays to
and from files, both in text files and binary files. It is written in C
and python. It allows reading arbitrary subsets of the data efficiently
for a while.
Paulo
From: Alan G Isaac alan.is...@gmail.com
To: Discussion of Numerical Python numpy-discussion@scipy.org
Sent: Monday, 27 February 2012 12:53
Subject: Re: [Numpy-discussion] Possible roadmap addendum: building better text
file readers
Excerpts from Nathaniel Smith's message of Mon Feb 27 12:07:11 -0500 2012:
On Mon, Feb 27, 2012 at 2:44 PM, Erin Sheldon erin.shel...@gmail.com wrote:
What I've got is a solution for writing and reading structured arrays to
and from files, both in text files and binary files. It is written
On 2/27/2012 1:00 PM, Paulo Jabardo wrote:
First of all '.' isn't international notation
That is in fact a standard designation.
http://en.wikipedia.org/wiki/Decimal_mark#Influence_of_calculators_and_computers
Alan Isaac
On 27.02.2012 19:07, Alan G Isaac wrote:
On 2/27/2012 1:00 PM, Paulo Jabardo wrote:
First of all '.' isn't international notation
That is in fact a standard designation.
http://en.wikipedia.org/wiki/Decimal_mark#Influence_of_calculators_and_computers
ISO specifies comma to be used in
On 2/27/2012 2:28 PM, Pauli Virtanen wrote:
ISO specifies comma to be used in international standards
(ISO/IEC Directives, part 2 / 6.6.8.1):
http://isotc.iso.org/livelink/livelink?func=ll&objId=10562502&objAction=download
I do not think you are right.
I think that is a presentational
Hi,
On Mon, Feb 27, 2012 at 2:43 PM, Alan G Isaac alan.is...@gmail.com wrote:
On 2/27/2012 2:28 PM, Pauli Virtanen wrote:
ISO specifies comma to be used in international standards
(ISO/IEC Directives, part 2 / 6.6.8.1):
On 2/27/2012 2:47 PM, Matthew Brett wrote:
Maybe we can just agree it is an important option to have rather than
an unimportant one,
It depends on what you mean by option.
If you mean there should be conversion tools
from other formats to a specified supported
format, then I agree.
If you
Hi,
On Mon, Feb 27, 2012 at 2:58 PM, Pauli Virtanen p...@iki.fi wrote:
Hi,
On 27.02.2012 20:43, Alan G Isaac wrote:
On 2/27/2012 2:28 PM, Pauli Virtanen wrote:
ISO specifies comma to be used in international standards
(ISO/IEC Directives, part 2 / 6.6.8.1):
The architecture of this system should separate the iteration across the I/O
from the transformation *on* the data. It should also allow the ability to
plug-in different transformations at a low-level --- some thought should go
into the API of the low-level transformation. Being able to
On Thu, Feb 23, 2012 at 2:19 PM, Warren Weckesser
warren.weckes...@enthought.com wrote:
On Thu, Feb 23, 2012 at 2:08 PM, Travis Oliphant tra...@continuum.io wrote:
This is actually on my short-list as well --- it just didn't make it to
the list.
In fact, we have someone starting work on it
On Sun, Feb 26, 2012 at 5:23 PM, Warren Weckesser
warren.weckes...@enthought.com wrote:
I haven't pushed it to the extreme, but the big example (in the examples/
directory) is a 1 gig text file with 2 million rows and 50 fields in each
row. This is read in less than 30 seconds (but that's with
On Sun, Feb 26, 2012 at 1:00 PM, Nathaniel Smith n...@pobox.com wrote:
On Sun, Feb 26, 2012 at 5:23 PM, Warren Weckesser
warren.weckes...@enthought.com wrote:
I haven't pushed it to the extreme, but the big example (in the
examples/
directory) is a 1 gig text file with 2 million rows and
On Feb 26, 2012, at 1:16 PM, Warren Weckesser wrote:
For anyone benchmarking software like this, be sure to clear the disk cache
before each run. In linux:
$ sync
$ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
It is also a good idea to run a disk-cache enabled test too, just to better
On Sun, Feb 26, 2012 at 7:16 PM, Warren Weckesser
warren.weckes...@enthought.com wrote:
On Sun, Feb 26, 2012 at 1:00 PM, Nathaniel Smith n...@pobox.com wrote:
For this kind of benchmarking, you'd really rather be measuring the
CPU time, or reading byte streams that are already in memory. If you
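Nathaniel's suggestion — feed the parser bytes that are already in memory and measure CPU time, so disk speed and the page cache drop out entirely — can be sketched like this (the array sizes are arbitrary):

```python
import io
import time

import numpy as np

# Build a CSV entirely in memory so disk speed and the page cache
# cannot affect the measurement.
rows, cols = 20_000, 5
text = "\n".join(",".join(str(i * cols + j) for j in range(cols))
                 for i in range(rows))

t0 = time.process_time()          # CPU time, not wall-clock
arr = np.loadtxt(io.StringIO(text), delimiter=",")
cpu_seconds = time.process_time() - t0
print(arr.shape, cpu_seconds)
```

This measures the parser itself; the cache-dropping recipe above is still the right tool when the I/O path is what you want to characterize.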
On Sun, Feb 26, 2012 at 1:49 PM, Nathaniel Smith n...@pobox.com wrote:
On Sun, Feb 26, 2012 at 7:16 PM, Warren Weckesser
warren.weckes...@enthought.com wrote:
On Sun, Feb 26, 2012 at 1:00 PM, Nathaniel Smith n...@pobox.com wrote:
For this kind of benchmarking, you'd really rather be
On Feb 26, 2012, at 1:49 PM, Nathaniel Smith wrote:
On Sun, Feb 26, 2012 at 7:16 PM, Warren Weckesser
warren.weckes...@enthought.com wrote:
On Sun, Feb 26, 2012 at 1:00 PM, Nathaniel Smith n...@pobox.com wrote:
For this kind of benchmarking, you'd really rather be measuring the
CPU time, or
On Sun, Feb 26, 2012 at 7:58 PM, Warren Weckesser
warren.weckes...@enthought.com wrote:
Right, I got that. Sorry if the placement of the notes about how to clear
the cache seemed to imply otherwise.
OK, cool, np.
Clearing the disk cache is very important for getting meaningful,
repeatable
On Sun, Feb 26, 2012 at 3:00 PM, Nathaniel Smith n...@pobox.com wrote:
On Sun, Feb 26, 2012 at 7:58 PM, Warren Weckesser
warren.weckes...@enthought.com wrote:
Right, I got that. Sorry if the placement of the notes about how to
clear
the cache seemed to imply otherwise.
OK, cool, np.
Excerpts from Warren Weckesser's message of Sun Feb 26 16:22:35 -0500 2012:
Yes, thanks! I'm working on a mmap version now. I'm very curious to see
just how much of an improvement it can give.
FYI, memmap is generally an incomplete solution for numpy arrays; it
only understands rows, not
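The row-major layout Erin is pointing at can be illustrated with a small sketch (the dtype and file contents are made up): selecting one field of a structured memmap gives a view whose stride is the full record size, so scanning a single column still touches every page of the file.

```python
import os
import tempfile

import numpy as np

# A structured file is laid out record-by-record on disk.
dtype = np.dtype([("a", "f8"), ("b", "f8")])
fd, path = tempfile.mkstemp()
os.close(fd)
np.zeros(100, dtype=dtype).tofile(path)

m = np.memmap(path, dtype=dtype, mode="r")
col = m["b"]              # a strided view into the mapping, not a copy

# The view strides by the full 16-byte record even though the field
# itself is only 8 bytes wide.
stride = col.strides[0]
itemsize = dtype.itemsize
print(stride, itemsize)

del col, m
os.remove(path)
```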
Excerpts from Erin Sheldon's message of Sun Feb 26 17:35:00 -0500 2012:
Excerpts from Warren Weckesser's message of Sun Feb 26 16:22:35 -0500 2012:
Yes, thanks! I'm working on a mmap version now. I'm very curious to see
just how much of an improvement it can give.
FYI, memmap is
Erin Sheldon erin.sheldon at gmail.com writes:
Excerpts from Wes McKinney's message of Sat Feb 25 15:49:37 -0500 2012:
That may work-- I haven't taken a look at the code but it is probably
a good starting point. We could create a new repo on the pydata GitHub
org
On Fri, Feb 24, 2012 at 9:07 AM, Erin Sheldon erin.shel...@gmail.com wrote:
Excerpts from Travis Oliphant's message of Thu Feb 23 15:08:52 -0500 2012:
This is actually on my short-list as well --- it just didn't make it to the
list.
In fact, we have someone starting work on it this week. It
Excerpts from Wes McKinney's message of Sat Feb 25 15:49:37 -0500 2012:
That may work-- I haven't taken a look at the code but it is probably
a good starting point. We could create a new repo on the pydata GitHub
org (http://github.com/pydata) and use that as our point of
collaboration. I will
I will just let Jay know that he should coordinate with you. It would be
helpful for him to have someone to collaborate with on this.
I'm looking forward to seeing your code. Definitely don't hold back on our
account. We will adapt to whatever you can offer.
Best regards,
-Travis
On
Excerpts from Travis Oliphant's message of Thu Feb 23 15:08:52 -0500 2012:
This is actually on my short-list as well --- it just didn't make it to the
list.
In fact, we have someone starting work on it this week. It is his
first project so it will take him a little time to get up to speed
dear all,
I haven't read all 180 e-mails, but I didn't see this on Travis's
initial list.
All of the existing flat file reading solutions I have seen are
not suitable for many applications, and they compare very unfavorably
to tools present in other languages, like R. Here are some of the
main
Hi,
On 23.02.2012 20:32, Wes McKinney wrote:
[clip]
To be clear: I'm going to do this eventually whether or not it
happens in NumPy because it's an existing problem for heavy
pandas users. I see no reason why the code can't emit structured
arrays, too, so we might as well have a common
This is actually on my short-list as well --- it just didn't make it to the
list.
In fact, we have someone starting work on it this week. It is his first
project so it will take him a little time to get up to speed on it, but he will
contact Wes and work with him and report progress to this
On Thu, Feb 23, 2012 at 3:08 PM, Travis Oliphant tra...@continuum.io wrote:
This is actually on my short-list as well --- it just didn't make it to the
list.
In fact, we have someone starting work on it this week. It is his first
project so it will take him a little time to get up to speed
On Thu, Feb 23, 2012 at 2:08 PM, Travis Oliphant tra...@continuum.io wrote:
This is actually on my short-list as well --- it just didn't make it to
the list.
In fact, we have someone starting work on it this week. It is his first
project so it will take him a little time to get up to speed
Wes -
I designed the recfile package to fill this need. It might be a start.
Some features:
- the ability to efficiently read any subset of the data without
loading the whole file.
- reads directly into a recarray, so no overheads.
- object oriented interface, mimicking
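The recfile code itself isn't reproduced here, but the underlying trick — seeking straight to an arbitrary subset of fixed-width binary records instead of loading the whole file — can be sketched with plain NumPy (the record layout and row range are invented for illustration; recfile's actual interface may differ):

```python
import os
import tempfile

import numpy as np

# A fixed-width binary file of records lets us seek straight to any row.
dtype = np.dtype([("x", "f8"), ("y", "i4")])
full = np.zeros(1000, dtype=dtype)
full["x"] = np.arange(1000, dtype="f8")
full["y"] = np.arange(1000, dtype="i4")

fd, path = tempfile.mkstemp()
os.close(fd)
full.tofile(path)

# Read only rows 500..509: seek past the first 500 records, then pull
# ten of them. The other 990 rows are never read from disk.
with open(path, "rb") as f:
    f.seek(500 * dtype.itemsize)
    subset = np.fromfile(f, dtype=dtype, count=10)

os.remove(path)
print(subset["x"][0], subset["y"][-1])
```

For text files the same idea needs a row index (offsets of line starts), since records are no longer fixed-width — which is part of what makes a dedicated reader worthwhile.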
On Thu, Feb 23, 2012 at 3:19 PM, Warren Weckesser
warren.weckes...@enthought.com wrote:
On Thu, Feb 23, 2012 at 2:08 PM, Travis Oliphant tra...@continuum.io
wrote:
This is actually on my short-list as well --- it just didn't make it to
the list.
In fact, we have someone starting work on it
On Thu, Feb 23, 2012 at 3:23 PM, Erin Sheldon erin.shel...@gmail.com wrote:
Wes -
I designed the recfile package to fill this need. It might be a start.
Some features:
- the ability to efficiently read any subset of the data without
loading the whole file.
- reads directly
On Thursday 23 February 2012 21:24:28, Wes McKinney wrote:
That would indeed be great. Reading large files is a real pain whatever the
python method used.
BTW, could you tell us what you mean by large files?
cheers,
Éric.
Sweet, between this, Continuum folks, and me and my guys I think we
Excerpts from Wes McKinney's message of Thu Feb 23 15:24:44 -0500 2012:
On Thu, Feb 23, 2012 at 3:23 PM, Erin Sheldon erin.shel...@gmail.com wrote:
I designed the recfile package to fill this need. It might be a start.
Can you relicense as BSD-compatible?
If required, that would be fine with
On 23/02/2012 20:32, Wes McKinney wrote:
If anyone wants to get involved in this particular problem right
now, let me know!
Hi Wes,
I'm totally out of the implementations issues you described, but I have
some million-lines-long CSV files so that I experience some slowdown
when loading those.
On 23/02/2012 21:08, Travis Oliphant wrote:
I think loadtxt is now the 3rd or 4th text-reading interface I've seen in
NumPy.
Ok, now I understand why I got confused ;-)
--
Pierre
On Thu, Feb 23, 2012 at 3:31 PM, Éric Depagne e...@depagne.org wrote:
On Thursday 23 February 2012 21:24:28, Wes McKinney wrote:
That would indeed be great. Reading large files is a real pain whatever the
python method used.
BTW, could you tell us what you mean by large files?
cheers,
Excerpts from Wes McKinney's message of Thu Feb 23 15:45:18 -0500 2012:
Reasonably wide CSV files with hundreds of thousands to millions of
rows. I have a separate interest in JSON handling but that is a
different kind of problem, and probably just a matter of forking
ultrajson and having it
On Thu, Feb 23, 2012 at 3:55 PM, Erin Sheldon erin.shel...@gmail.com wrote:
Excerpts from Wes McKinney's message of Thu Feb 23 15:45:18 -0500 2012:
Reasonably wide CSV files with hundreds of thousands to millions of
rows. I have a separate interest in JSON handling but that is a
different kind
On Thu, Feb 23, 2012 at 04:07:04PM -0500, Wes McKinney wrote:
In this last case for example, around 500 MB of RAM is taken up for an
array that should only be about 80-90MB. If you're a data scientist
working in Python, this is _not good_.
But why, oh why, are people storing big data in CSV?
But why, oh why, are people storing big data in CSV?
Well, that's what scientists do :-)
Éric.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
An AZERTY keyboard is worth two
On Thu, Feb 23, 2012 at 21:09, Gael Varoquaux
gael.varoqu...@normalesup.org wrote:
On Thu, Feb 23, 2012 at 04:07:04PM -0500, Wes McKinney wrote:
In this last case for example, around 500 MB of RAM is taken up for an
array that should only be about 80-90MB. If you're a data scientist
working in
Excerpts from Wes McKinney's message of Thu Feb 23 16:07:04 -0500 2012:
That's pretty good. That's faster than pandas's csv-module+Cython
approach almost certainly (but I haven't run your code to get a read
on how much my hardware makes a difference), but that's not shocking
at all:
In [1]:
On Thu, Feb 23, 2012 at 3:14 PM, Robert Kern robert.k...@gmail.com wrote:
On Thu, Feb 23, 2012 at 21:09, Gael Varoquaux
gael.varoqu...@normalesup.org wrote:
On Thu, Feb 23, 2012 at 04:07:04PM -0500, Wes McKinney wrote:
In this last case for example, around 500 MB of RAM is taken up for an
On Thu, Feb 23, 2012 at 4:20 PM, Erin Sheldon erin.shel...@gmail.com wrote:
Excerpts from Wes McKinney's message of Thu Feb 23 16:07:04 -0500 2012:
That's pretty good. That's faster than pandas's csv-module+Cython
approach almost certainly (but I haven't run your code to get a read
on how much
On 23/02/2012 22:38, Benjamin Root wrote:
labmate/officemate/advisor is using Excel...
... or an industrial partner with its windows-based software that can
export (when it works) some very nice field data from a proprietary
Honeywell data logger.
CSV data is better than no data ! (and better
For convenience, here's a link to the mailing list thread on this topic
from a couple months ago:
http://thread.gmane.org/gmane.comp.python.numeric.general/47094 .
Drew
Like others on this list, I've also been a bit confused by the proliferation of
numpy interfaces for reading text. Would it be an idea to create some sort of
object oriented solution for this purpose?
reader = np.FileReader('my_file.txt')
reader.loadtxt() # for backwards compat.; np.loadtxt could
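One possible shape for that idea, sketched minimally (FileReader is the poster's hypothetical name, not an existing NumPy class; the delegation shown here is an assumption about how the backwards-compatible path might work):

```python
import io

import numpy as np

class FileReader:
    """Hypothetical object-oriented wrapper over numpy's text readers."""

    def __init__(self, fileobj, delimiter=None):
        self._fileobj = fileobj
        self._delimiter = delimiter

    def loadtxt(self, **kwargs):
        # Backwards-compatible path: simply delegate to np.loadtxt.
        return np.loadtxt(self._fileobj, delimiter=self._delimiter, **kwargs)

reader = FileReader(io.StringIO("1 2\n3 4\n"))
arr = reader.loadtxt()
print(arr)
```

The appeal of one object is that delimiter, dtype, and locale options would be configured once and shared by every reading method hung off it.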