Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-03-20 Thread Warren Weckesser
On Tue, Mar 20, 2012 at 5:59 PM, Chris Barker wrote: > Warren et al: > > On Wed, Mar 7, 2012 at 7:49 AM, Warren Weckesser > wrote: > > If you are set up with Cython to build extension modules, > > I am > > > and you don't mind > > testing an unreleased and experimental reader, > > and I don't. >

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-03-20 Thread Chris Barker
Warren et al: On Wed, Mar 7, 2012 at 7:49 AM, Warren Weckesser wrote: > If you are set up with Cython to build extension modules, I am > and you don't mind > testing an unreleased and experimental reader, and I don't. > you can try the text reader > that I'm working on: https://github.com/Warr

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-03-07 Thread Warren Weckesser
On Tue, Mar 6, 2012 at 4:45 PM, Chris Barker wrote: > On Thu, Mar 1, 2012 at 10:58 PM, Jay Bourque wrote: > > > 1. Loading text files using loadtxt/genfromtxt needs a significant > > performance boost (I think at least an order-of-magnitude increase in > > performance is very doable based on what

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-03-06 Thread Chris Barker
On Thu, Mar 1, 2012 at 10:58 PM, Jay Bourque wrote: > 1. Loading text files using loadtxt/genfromtxt needs a significant > performance boost (I think at least an order-of-magnitude increase in > performance is very doable based on what I've seen with Erin's recfile code) > 2. Improved memory usag

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-03-02 Thread Lluís
Frédéric Bastien writes: > Hi, > mmap can give a speedup in some cases, but a slowdown in others. So care > must be taken when using it. For example, the speed difference between > read and mmap is not the same when the file is local as when it is > on NFS. On NFS, you need to read bigger chunks to

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-03-02 Thread Frédéric Bastien
Hi, mmap can give a speedup in some cases, but a slowdown in others. So care must be taken when using it. For example, the speed difference between read and mmap is not the same when the file is local as when it is on NFS. On NFS, you need to read bigger chunks to make it worthwhile. Another examp

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-03-01 Thread Jay Bourque
In an effort to build a consensus on what numpy's New and Improved text file readers should look like, I've put together a short list of the main points discussed in this thread so far: 1. Loading text files using loadtxt/genfromtxt needs a significant performance boost (I think at least an or

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-29 Thread Ralf Gommers
On Wed, Feb 29, 2012 at 7:57 PM, Erin Sheldon wrote: > Excerpts from Nathaniel Smith's message of Wed Feb 29 13:17:53 -0500 2012: > > On Wed, Feb 29, 2012 at 3:11 PM, Erin Sheldon > wrote: > > > Excerpts from Nathaniel Smith's message of Tue Feb 28 17:22:16 -0500 > 2012: > > >> > Even for binary,

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-29 Thread Erin Sheldon
Excerpts from Nathaniel Smith's message of Wed Feb 29 13:17:53 -0500 2012: > On Wed, Feb 29, 2012 at 3:11 PM, Erin Sheldon wrote: > > Excerpts from Nathaniel Smith's message of Tue Feb 28 17:22:16 -0500 2012: > >> > Even for binary, there are pathological cases, e.g. 1) reading a random > >> > sub

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-29 Thread Nathaniel Smith
On Wed, Feb 29, 2012 at 3:11 PM, Erin Sheldon wrote: > Excerpts from Nathaniel Smith's message of Tue Feb 28 17:22:16 -0500 2012: >> > Even for binary, there are pathological cases, e.g. 1) reading a random >> > subset of nearly all rows.  2) reading a single column when rows are >> > small.  In c

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-29 Thread Robert Kern
On Wed, Feb 29, 2012 at 15:11, Erin Sheldon wrote: > Excerpts from Nathaniel Smith's message of Tue Feb 28 17:22:16 -0500 2012: >> > Even for binary, there are pathological cases, e.g. 1) reading a random >> > subset of nearly all rows.  2) reading a single column when rows are >> > small.  In cas

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-29 Thread Erin Sheldon
Excerpts from Erin Sheldon's message of Wed Feb 29 10:11:51 -0500 2012: > Actually, for numpy.memmap you will read the whole file if you try to > grab a single column and read a large fraction of the rows. Here is an That should have been: "...read *all* the rows". -e -- Erin Scott Sheldon Broo

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-29 Thread Erin Sheldon
Excerpts from Nathaniel Smith's message of Tue Feb 28 17:22:16 -0500 2012: > > Even for binary, there are pathological cases, e.g. 1) reading a random > > subset of nearly all rows.  2) reading a single column when rows are > > small.  In case 2 you will only go this route in the first place if you

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-28 Thread Erin Sheldon
Hi All - I've added the relevant code to my numpy fork here https://github.com/esheldon/numpy The python module and c file are at /numpy/lib/recfile.py and /numpy/lib/src/_recfile.c Access from python is numpy.recfile See below for the doc string for the main class, Recfile. Some example

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-28 Thread Nathaniel Smith
[Re-adding the list to the To: field, after it got dropped accidentally] On Tue, Feb 28, 2012 at 12:28 AM, Erin Sheldon wrote: > Excerpts from Nathaniel Smith's message of Mon Feb 27 17:33:52 -0500 2012: >> On Mon, Feb 27, 2012 at 6:02 PM, Erin Sheldon wrote: >> > Excerpts from Nathaniel Smith's

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-27 Thread Travis Oliphant
The architecture of this system should separate the iteration across the I/O from the transformation *on* the data. It should also allow the ability to plug in different transformations at a low level --- some thought should go into the API of the low-level transformation. Being able to mem
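To make the separation described above concrete, here is a minimal pure-Python sketch, not the design anyone in the thread actually implemented. The names `iter_fields` and `transform_rows` are made up for illustration: one layer only iterates over the input and splits fields, the other applies a pluggable converter per column.

```python
import io
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple


def iter_fields(stream: Iterable[str], delimiter: str = ",") -> Iterator[List[str]]:
    """I/O layer (hypothetical name): yield the raw string fields of each non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield line.split(delimiter)


def transform_rows(
    rows: Iterable[Sequence[str]],
    converters: Sequence[Callable[[str], object]],
) -> Iterator[Tuple[object, ...]]:
    """Transformation layer (hypothetical name): apply one converter per column."""
    for fields in rows:
        yield tuple(conv(field) for conv, field in zip(converters, fields))


# The I/O layer knows nothing about types; the converters are plugged in here.
text = io.StringIO("1,2.5,a\n2,3.5,b\n")
records = list(transform_rows(iter_fields(text), [int, float, str]))
print(records)
```

Because the two layers only meet at "an iterable of field lists", the same converters could in principle sit on top of a memory-mapped or chunked reader without changes.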

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-27 Thread Matthew Brett
Hi, On Mon, Feb 27, 2012 at 2:58 PM, Pauli Virtanen wrote: > Hi, > > 27.02.2012 20:43, Alan G Isaac wrote: >> On 2/27/2012 2:28 PM, Pauli Virtanen wrote: >>> ISO specifies comma to be used in international standards >>> (ISO/IEC Directives, part 2 / 6.6.8.1): >>> >>> http://isotc.iso.org/live

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-27 Thread Pauli Virtanen
Hi, 27.02.2012 20:43, Alan G Isaac wrote: > On 2/27/2012 2:28 PM, Pauli Virtanen wrote: >> ISO specifies comma to be used in international standards >> (ISO/IEC Directives, part 2 / 6.6.8.1): >> >> http://isotc.iso.org/livelink/livelink?func=ll&objId=10562502&objAction=download > > I do not t

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-27 Thread Alan G Isaac
On 2/27/2012 2:47 PM, Matthew Brett wrote: > Maybe we can just agree it is an important option to have rather than > an unimportant one, It depends on what you mean by "option". If you mean there should be conversion tools from other formats to a specified supported format, then I agree. If you

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-27 Thread Matthew Brett
Hi, On Mon, Feb 27, 2012 at 2:43 PM, Alan G Isaac wrote: > On 2/27/2012 2:28 PM, Pauli Virtanen wrote: >> ISO specifies comma to be used in international standards >> (ISO/IEC Directives, part 2 / 6.6.8.1): >> >> http://isotc.iso.org/livelink/livelink?func=ll&objId=10562502&objAction=download > >

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-27 Thread Alan G Isaac
On 2/27/2012 2:28 PM, Pauli Virtanen wrote: > ISO specifies comma to be used in international standards > (ISO/IEC Directives, part 2 / 6.6.8.1): > > http://isotc.iso.org/livelink/livelink?func=ll&objId=10562502&objAction=download I do not think you are right. I think that is a presentational req

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-27 Thread Pauli Virtanen
27.02.2012 19:07, Alan G Isaac wrote: > On 2/27/2012 1:00 PM, Paulo Jabardo wrote: >> First of all '.' isn't international notation > > That is in fact a standard designation. > http://en.wikipedia.org/wiki/Decimal_mark#Influence_of_calculators_and_computers ISO specifies comma to be used in

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-27 Thread Alan G Isaac
On 2/27/2012 1:00 PM, Paulo Jabardo wrote: > First of all '.' isn't international notation That is in fact a standard designation. http://en.wikipedia.org/wiki/Decimal_mark#Influence_of_calculators_and_computers Alan Isaac ___ NumPy-Discussion mailing

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-27 Thread Erin Sheldon
Excerpts from Nathaniel Smith's message of Mon Feb 27 12:07:11 -0500 2012: > On Mon, Feb 27, 2012 at 2:44 PM, Erin Sheldon wrote: > > What I've got is a solution for writing and reading structured arrays to > > and from files, both in text files and binary files.  It is written in C > > and python

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-27 Thread Paulo Jabardo
[...] kind of stopped my transition from R to python for a while. Paulo From: Alan G Isaac To: Discussion of Numerical Python Sent: Monday, February 27, 2012 12:53 Subject: Re: [Numpy-discussion] Possible roadmap addendum: building better text fi

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-27 Thread Nathaniel Smith
On Mon, Feb 27, 2012 at 2:44 PM, Erin Sheldon wrote: > What I've got is a solution for writing and reading structured arrays to > and from files, both in text files and binary files.  It is written in C > and python.  It allows reading arbitrary subsets of the data efficiently > without reading in

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-27 Thread Alan G Isaac
On 2/27/2012 10:10 AM, Paulo Jabardo wrote: > I have a few features that I believe would make text file easier for many > people. In some countries (most?) the decimal separator in real numbers is > not a point but a comma. > I think it would be very useful that the decimal separator be specified
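For what it's worth, a configurable decimal separator along the lines Paulo suggests could look something like the sketch below. It is plain Python, and `parse_float` and `read_table` are hypothetical helper names, not existing numpy functions. The only assumptions are that the field delimiter differs from the decimal mark and that there are no thousands separators.

```python
def parse_float(token: str, decimal: str = ".") -> float:
    """Convert a numeric token whose decimal mark is `decimal` (hypothetical helper)."""
    if decimal != ".":
        token = token.replace(decimal, ".")
    return float(token)


def read_table(lines, delimiter=";", decimal=","):
    """Parse delimited lines of numbers into a list of rows of floats."""
    if delimiter == decimal:
        raise ValueError("delimiter and decimal separator must differ")
    table = []
    for line in lines:
        if line.strip():
            table.append([parse_float(tok.strip(), decimal) for tok in line.split(delimiter)])
    return table


# Semicolon-separated fields with decimal commas, as commonly exported
# by spreadsheets in comma-decimal locales.
rows = read_table(["1,5;2,25", "3,0;4,75"])
print(rows)
```

A conversion step like this is easy in Python but costs a string copy per field, which is one reason to want the option inside a fast low-level reader rather than bolted on afterwards.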

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-27 Thread Paulo Jabardo
[...] entire file AFAICT. Paulo From: Jay Bourque To: numpy-discussion@scipy.org Sent: Monday, February 27, 2012 2:24 Subject: Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers Erin Sheldon writes: > > Excer

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-27 Thread Erin Sheldon
Excerpts from Jay Bourque's message of Mon Feb 27 00:24:25 -0500 2012: > Hi Erin, > > I'm the one Travis mentioned earlier about working on this. I was planning on > diving into it this week, but it sounds like you may have some code already > that > fits the requirements? If so, I would be ava

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-27 Thread Lluís
Erin Sheldon writes: [...] > This was why I essentially wrote my own memmap like interface with > recfile, the code I'm converting. It allows working with columns and > rows without loading large chunks of memory. [...] This sounds like at any point in time you only have one part of the array map

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-26 Thread Jay Bourque
Erin Sheldon writes: > > Excerpts from Wes McKinney's message of Sat Feb 25 15:49:37 -0500 2012: > > That may work-- I haven't taken a look at the code but it is probably > > a good starting point. We could create a new repo on the pydata GitHub > > org (http://github.com/pydata) and

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-26 Thread Erin Sheldon
Excerpts from Erin Sheldon's message of Sun Feb 26 17:35:00 -0500 2012: > Excerpts from Warren Weckesser's message of Sun Feb 26 16:22:35 -0500 2012: > > Yes, thanks! I'm working on a mmap version now. I'm very curious to see > > just how much of an improvement it can give. > > FYI, memmap is g

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-26 Thread Erin Sheldon
Excerpts from Warren Weckesser's message of Sun Feb 26 16:22:35 -0500 2012: > Yes, thanks! I'm working on a mmap version now. I'm very curious to see > just how much of an improvement it can give. FYI, memmap is generally an incomplete solution for numpy arrays; it only understands rows, not co
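A small illustration of the row-versus-column point, using only the stdlib `mmap` and `struct` modules rather than numpy.memmap. The record layout is made up: a 4-byte int followed by an 8-byte float, packed with no padding. When records are stored row by row, pulling out a single column still lands on every page of the mapping, because consecutive values are only one record apart.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<id")   # made-up layout: int32 id + float64 value, 12 bytes
VALUE_OFFSET = 4                # the float64 starts right after the int32
N_ROWS = 1000

# Write a small binary file of fixed-size records.
with tempfile.NamedTemporaryFile(delete=False) as f:
    for i in range(N_ROWS):
        f.write(RECORD.pack(i, i * 0.5))
    path = f.name

with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Byte offset of the "value" column in every row (stride = RECORD.size).
        offsets = [row * RECORD.size + VALUE_OFFSET for row in range(N_ROWS)]
        values = [struct.unpack_from("<d", mm, off)[0] for off in offsets]
        # Which pages of the mapping did the column read start in?
        pages_touched = {off // mmap.PAGESIZE for off in offsets}
        total_pages = -(-len(mm) // mmap.PAGESIZE)  # ceiling division

os.remove(path)
print(len(pages_touched), "of", total_pages, "pages touched for one column")
```

With rows this small the column read touches every page, so the OS ends up reading the whole file anyway. Reading a sparse subset of rows is where a mapping can actually skip I/O.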

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-26 Thread Warren Weckesser
On Sun, Feb 26, 2012 at 3:00 PM, Nathaniel Smith wrote: > On Sun, Feb 26, 2012 at 7:58 PM, Warren Weckesser > wrote: > > Right, I got that. Sorry if the placement of the notes about how to > clear > > the cache seemed to imply otherwise. > > OK, cool, np. > > >> Clearing the disk cache is very

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-26 Thread Nathaniel Smith
On Sun, Feb 26, 2012 at 7:58 PM, Warren Weckesser wrote: > Right, I got that.  Sorry if the placement of the notes about how to clear > the cache seemed to imply otherwise. OK, cool, np. >> Clearing the disk cache is very important for getting meaningful, >> repeatable benchmarks in code where y

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-26 Thread Francesc Alted
On Feb 26, 2012, at 1:49 PM, Nathaniel Smith wrote: > On Sun, Feb 26, 2012 at 7:16 PM, Warren Weckesser > wrote: >> On Sun, Feb 26, 2012 at 1:00 PM, Nathaniel Smith wrote: >>> For this kind of benchmarking, you'd really rather be measuring the >>> CPU time, or reading byte streams that are alrea

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-26 Thread Warren Weckesser
On Sun, Feb 26, 2012 at 1:49 PM, Nathaniel Smith wrote: > On Sun, Feb 26, 2012 at 7:16 PM, Warren Weckesser > wrote: > > On Sun, Feb 26, 2012 at 1:00 PM, Nathaniel Smith wrote: > >> For this kind of benchmarking, you'd really rather be measuring the > >> CPU time, or reading byte streams that a

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-26 Thread Nathaniel Smith
On Sun, Feb 26, 2012 at 7:16 PM, Warren Weckesser wrote: > On Sun, Feb 26, 2012 at 1:00 PM, Nathaniel Smith wrote: >> For this kind of benchmarking, you'd really rather be measuring the >> CPU time, or reading byte streams that are already in memory. If you >> can process more MB/s than the drive
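One way to take the drive out of the measurement, as suggested above, is to parse a buffer that is already in memory and report CPU time next to wall-clock time. A rough sketch follows. The `parse` function is only a stand-in for whatever reader is being benchmarked, and the data is synthetic.

```python
import io
import time


def parse(stream):
    """Stand-in parser: split comma-separated lines into rows of floats."""
    return [[float(tok) for tok in line.split(",")] for line in stream]


# Synthetic in-memory input: 20000 rows of 5 numeric fields, no disk involved.
payload = "\n".join(
    ",".join(str(i + j / 10) for j in range(5)) for i in range(20000)
)
buf = io.StringIO(payload)

start_cpu = time.process_time()    # CPU time of this process only
start_wall = time.perf_counter()   # elapsed wall-clock time
rows = parse(buf)
cpu = time.process_time() - start_cpu
wall = time.perf_counter() - start_wall

mb = len(payload) / 1e6
print(f"{mb:.2f} MB parsed, cpu={cpu:.3f}s wall={wall:.3f}s")
```

If the wall time on a real file is much larger than the CPU time measured this way, the benchmark is mostly measuring the disk (or the cache state), not the parser.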

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-26 Thread Francesc Alted
On Feb 26, 2012, at 1:16 PM, Warren Weckesser wrote: > For anyone benchmarking software like this, be sure to clear the disk cache > before each run. In linux: > > $ sync > $ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches" > It is also a good idea to run a disk-cache enabled test too, just to b

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-26 Thread Warren Weckesser
On Sun, Feb 26, 2012 at 1:00 PM, Nathaniel Smith wrote: > On Sun, Feb 26, 2012 at 5:23 PM, Warren Weckesser > wrote: > > I haven't pushed it to the extreme, but the "big" example (in the > examples/ > > directory) is a 1 gig text file with 2 million rows and 50 fields in each > > row. This is r

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-26 Thread Nathaniel Smith
On Sun, Feb 26, 2012 at 5:23 PM, Warren Weckesser wrote: > I haven't pushed it to the extreme, but the "big" example (in the examples/ > directory) is a 1 gig text file with 2 million rows and 50 fields in each > row.  This is read in less than 30 seconds (but that's with a solid state > drive).

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-26 Thread Warren Weckesser
On Thu, Feb 23, 2012 at 2:19 PM, Warren Weckesser < warren.weckes...@enthought.com> wrote: > > On Thu, Feb 23, 2012 at 2:08 PM, Travis Oliphant wrote: > >> This is actually on my short-list as well --- it just didn't make it to >> the list. >> >> In fact, we have someone starting work on it this w

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-25 Thread Travis Oliphant
I will just let Jay know that he should coordinate with you. It would be helpful for him to have someone to collaborate with on this. I'm looking forward to seeing your code. Definitely don't hold back on our account. We will adapt to whatever you can offer. Best regards, -Travis On

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-25 Thread Erin Sheldon
Excerpts from Wes McKinney's message of Sat Feb 25 15:49:37 -0500 2012: > That may work-- I haven't taken a look at the code but it is probably > a good starting point. We could create a new repo on the pydata GitHub > org (http://github.com/pydata) and use that as our point of > collaboration. I w

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-25 Thread Wes McKinney
On Fri, Feb 24, 2012 at 9:07 AM, Erin Sheldon wrote: > Excerpts from Travis Oliphant's message of Thu Feb 23 15:08:52 -0500 2012: >> This is actually on my short-list as well --- it just didn't make it to the >> list. >> >> In fact, we have someone starting work on it this week.  It is his >> fir

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-24 Thread Erin Sheldon
Excerpts from Travis Oliphant's message of Thu Feb 23 15:08:52 -0500 2012: > This is actually on my short-list as well --- it just didn't make it to the > list. > > In fact, we have someone starting work on it this week. It is his > first project so it will take him a little time to get up to s

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Paul Anton Letnes
Like others on this list, I've also been a bit confused by the proliferation of numpy interfaces for reading text. Would it be an idea to create some sort of object-oriented solution for this purpose?

    reader = np.FileReader('my_file.txt')
    reader.loadtxt() # for backwards compat.; np.loadtxt could instantia
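To show the shape of the idea, here is a toy pure-Python version of such a reader object. `FileReader` is the name proposed in the message, not an existing NumPy class, and the constructor arguments and the `rows` method are assumptions made up for this sketch.

```python
import os
import tempfile


class FileReader:
    """Toy reader object (hypothetical; NumPy has no np.FileReader)."""

    def __init__(self, path, delimiter=None, comments="#"):
        self.path = path
        self.delimiter = delimiter   # None means split on any whitespace
        self.comments = comments

    def rows(self):
        """Yield the string fields of each line, skipping comments and blanks."""
        with open(self.path) as fh:
            for line in fh:
                line = line.split(self.comments, 1)[0].strip()
                if line:
                    yield line.split(self.delimiter)

    def loadtxt(self, dtype=float):
        """Return all rows converted with `dtype`, as a list of lists."""
        return [[dtype(tok) for tok in fields] for fields in self.rows()]


# Small demonstration on a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("# x y\n1 2\n3 4  # trailing comment\n")
    path = tmp.name

reader = FileReader(path)
data = reader.loadtxt()
os.remove(path)
print(data)
```

The appeal of the object is that parsing options live in one place, so functions like loadtxt could become thin wrappers that instantiate a reader and call one method.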

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Drew Frank
For convenience, here's a link to the mailing list thread on this topic from a couple months ago: http://thread.gmane.org/gmane.comp.python.numeric.general/47094 . Drew

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Pierre Haessig
On 23/02/2012 22:38, Benjamin Root wrote: > labmate/officemate/advisor is using Excel... ... or an industrial partner with its Windows-based software that can export (when it works) some very nice field data from a proprietary Honeywell data logger. CSV data is better than no data! (and better

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Wes McKinney
On Thu, Feb 23, 2012 at 4:20 PM, Erin Sheldon wrote: > Excerpts from Wes McKinney's message of Thu Feb 23 16:07:04 -0500 2012: >> That's pretty good. That's faster than pandas's csv-module+Cython >> approach almost certainly (but I haven't run your code to get a read >> on how much my hardware mak

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Benjamin Root
On Thu, Feb 23, 2012 at 3:14 PM, Robert Kern wrote: > On Thu, Feb 23, 2012 at 21:09, Gael Varoquaux > wrote: > > On Thu, Feb 23, 2012 at 04:07:04PM -0500, Wes McKinney wrote: > >> In this last case for example, around 500 MB of RAM is taken up for an > >> array that should only be about 80-90MB.

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Erin Sheldon
Excerpts from Wes McKinney's message of Thu Feb 23 16:07:04 -0500 2012: > That's pretty good. That's faster than pandas's csv-module+Cython > approach almost certainly (but I haven't run your code to get a read > on how much my hardware makes a difference), but that's not shocking > at all: > > In

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Robert Kern
On Thu, Feb 23, 2012 at 21:09, Gael Varoquaux wrote: > On Thu, Feb 23, 2012 at 04:07:04PM -0500, Wes McKinney wrote: >> In this last case for example, around 500 MB of RAM is taken up for an >> array that should only be about 80-90MB. If you're a data scientist >> working in Python, this is _not g

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Éric Depagne
> But why, oh why, are people storing big data in CSV? Well, that's what scientists do :-) Éric. > > G An AZERTY keyboard is worth

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Gael Varoquaux
On Thu, Feb 23, 2012 at 04:07:04PM -0500, Wes McKinney wrote: > In this last case for example, around 500 MB of RAM is taken up for an > array that should only be about 80-90MB. If you're a data scientist > working in Python, this is _not good_. But why, oh why, are people storing big data in CSV?
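The quoted message does not say where the extra memory goes. One plausible contributor (an assumption on my part, not something stated in the thread) is that values pass through individual Python objects before they reach a packed array. A quick CPython comparison of the two representations:

```python
import sys
from array import array

n = 100_000
boxed = [float(i) for i in range(n)]   # one Python float object per value
packed = array("d", boxed)             # contiguous 8-byte doubles

# List storage (pointers) plus every float object it references.
boxed_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(x) for x in boxed)
# Raw payload of the packed representation.
packed_bytes = packed.itemsize * len(packed)

print(f"boxed:  {boxed_bytes / 1e6:.1f} MB")
print(f"packed: {packed_bytes / 1e6:.1f} MB")
print(f"ratio:  {boxed_bytes / packed_bytes:.1f}x")
```

The exact ratio depends on the interpreter and platform, so no figure is given here. The point is only that a reader which builds lists of Python objects first will peak well above the size of the final array.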

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Wes McKinney
On Thu, Feb 23, 2012 at 3:55 PM, Erin Sheldon wrote: > Excerpts from Wes McKinney's message of Thu Feb 23 15:45:18 -0500 2012: >> Reasonably wide CSV files with hundreds of thousands to millions of >> rows. I have a separate interest in JSON handling but that is a >> different kind of problem, and

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Erin Sheldon
Excerpts from Wes McKinney's message of Thu Feb 23 15:45:18 -0500 2012: > Reasonably wide CSV files with hundreds of thousands to millions of > rows. I have a separate interest in JSON handling but that is a > different kind of problem, and probably just a matter of forking > ultrajson and having i

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Wes McKinney
On Thu, Feb 23, 2012 at 3:31 PM, Éric Depagne wrote: > On Thursday 23 February 2012 21:24:28, Wes McKinney wrote: >> > That would indeed be great. Reading large files is a real pain whatever the > python method used. > > BTW, could you tell us what you mean by large files? > > cheers, > Éric. Reas

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Pierre Haessig
On 23/02/2012 21:08, Travis Oliphant wrote: > I think loadtxt is now the 3rd or 4th "text-reading" interface I've seen in > NumPy. Ok, now I understand why I got confused ;-) -- Pierre

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Pierre Haessig
On 23/02/2012 20:32, Wes McKinney wrote: > If anyone wants to get involved in this particular problem right > now, let me know! Hi Wes, I'm far removed from the implementation issues you described, but I have some million-line CSV files, so I do experience "some slowdown" when loading tho

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Erin Sheldon
Excerpts from Wes McKinney's message of Thu Feb 23 15:24:44 -0500 2012: > On Thu, Feb 23, 2012 at 3:23 PM, Erin Sheldon wrote: > > I designed the recfile package to fill this need.  It might be a start. > Can you relicense as BSD-compatible? If required, that would be fine with me. -e > > > Exc

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Éric Depagne
On Thursday 23 February 2012 21:24:28, Wes McKinney wrote: > That would indeed be great. Reading large files is a real pain whatever the python method used. BTW, could you tell us what you mean by large files? cheers, Éric. > Sweet, between this, Continuum folks, and me and my guys I think we

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Wes McKinney
On Thu, Feb 23, 2012 at 3:23 PM, Erin Sheldon wrote: > Wes - > > I designed the recfile package to fill this need.  It might be a start. > > Some features: > >    - the ability to efficiently read any subset of the data without >      loading the whole file. >    - reads directly into a recarray,

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Wes McKinney
On Thu, Feb 23, 2012 at 3:19 PM, Warren Weckesser wrote: > > On Thu, Feb 23, 2012 at 2:08 PM, Travis Oliphant > wrote: >> >> This is actually on my short-list as well --- it just didn't make it to >> the list. >> >> In fact, we have someone starting work on it this week.  It is his first >> proje

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Erin Sheldon
Wes - I designed the recfile package to fill this need. It might be a start. Some features: - the ability to efficiently read any subset of the data without loading the whole file. - reads directly into a recarray, so no overheads. - object oriented interface, mimicking rec
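As a rough picture of what "read any subset without loading the whole file" means for text input, here is a streaming sketch in plain Python. It is not recfile's API; `read_subset` and its arguments are invented for illustration. Only the requested rows are split and converted, and reading stops after the last requested row.

```python
import io


def read_subset(stream, rows, columns, delimiter=","):
    """Return the given columns of the given row indices as tuples of floats.

    Hypothetical helper: rows are scanned sequentially (text lines have no
    fixed size), but only wanted rows are parsed and nothing else is kept.
    """
    wanted = set(rows)
    last = max(wanted) if wanted else -1
    out = []
    for i, line in enumerate(stream):
        if i > last:
            break                      # nothing further is needed
        if i in wanted:
            fields = line.rstrip("\n").split(delimiter)
            out.append(tuple(float(fields[c]) for c in columns))
    return out


data = io.StringIO("0,1,2\n10,11,12\n20,21,22\n30,31,32\n")
subset = read_subset(data, rows=[1, 3], columns=[0, 2])
print(subset)
```

For text files the scan over earlier lines is unavoidable without an index. With fixed-size binary records the reader can seek directly to each row, which is where most of the speed comes from.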

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Warren Weckesser
On Thu, Feb 23, 2012 at 2:08 PM, Travis Oliphant wrote: > This is actually on my short-list as well --- it just didn't make it to > the list. > > In fact, we have someone starting work on it this week. It is his first > project so it will take him a little time to get up to speed on it, but he >

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Wes McKinney
On Thu, Feb 23, 2012 at 3:08 PM, Travis Oliphant wrote: > This is actually on my short-list as well --- it just didn't make it to the > list. > > In fact, we have someone starting work on it this week.  It is his first > project so it will take him a little time to get up to speed on it, but he

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Travis Oliphant
This is actually on my short-list as well --- it just didn't make it to the list. In fact, we have someone starting work on it this week. It is his first project so it will take him a little time to get up to speed on it, but he will contact Wes and work with him and report progress to this l

Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Pauli Virtanen
Hi, 23.02.2012 20:32, Wes McKinney wrote: [clip] > To be clear: I'm going to do this eventually whether or not it > happens in NumPy because it's an existing problem for heavy > pandas users. I see no reason why the code can't emit structured > arrays, too, so we might as well have a common li

[Numpy-discussion] Possible roadmap addendum: building better text file readers

2012-02-23 Thread Wes McKinney
Dear all, I haven't read all 180 e-mails, but I didn't see this on Travis's initial list. None of the existing flat-file reading solutions I have seen is suitable for many applications, and they compare very unfavorably to tools present in other languages, like R. Here are some of the main is