Re: Question about file objects...

2009-12-03 Thread r0g
J wrote:
 Something that came up in class...
 
 when you are pulling data from a file using f.next(), the file is read
 one line at a time.
 
 What was explained to us is that Python iterates the file based on a
 carriage return as the delimiter.
 But what if you have a file that has one line of text, but that one
 line has 16,000 items that are comma delimited?
 
 Is there a way to read the file, one item at a time, delimited by
 commas WITHOUT having to read all 16,000 items from that one line,
 then split them out into a list or dictionary??
 
 Cheers
 Jeff
 


Generators are a good way of dealing with that sort of thing...

http://dalkescientific.com/writings/NBN/generators.html

Have the generator read in large chunks from the file in binary mode, then
use string searching/splitting to dole out records one at a time,
topping up the cache when needed.
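
Something along these lines, for example (untested sketch; the function
name, chunk size and text-mode reading are just illustrative choices):

def iter_fields(path, chunk_size=4096):
    # Read the file in fixed-size chunks and yield one comma-separated
    # field at a time, keeping any incomplete field in a buffer.
    buf = ''
    with open(path) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf += chunk
            while ',' in buf:
                field, buf = buf.split(',', 1)
                yield field
    if buf:
        yield buf  # whatever remains after the last comma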

Roger.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Question about file objects...

2009-12-03 Thread nn
On Dec 2, 6:56 pm, Terry Reedy tjre...@udel.edu wrote:
 J wrote:
  On Wed, Dec 2, 2009 at 09:27, nn prueba...@latinmail.com wrote:
  Is there a way to read the file, one item at a time, delimited by
  commas WITHOUT having to read all 16,000 items from that one line,
  then split them out into a list or dictionary??

  File iteration is a convenience since it is the most common case. If
  everything is on one line, you will have to handle record separators
  manually by using the .read(number_of_bytes) method on the file
  object and searching for the comma. If everything fits in memory the
  straightforward way would be to read the whole file with .read() and
  use .split(',') on the returned string. That should give you a nice
  list of everything.

  Agreed. The confusion came because the guy teaching said that
  iterating the file is delimited by a carriage return character...

 If he said exactly that, he is not exactly correct. File iteration looks
 for line ending character(s), which depends on the system or universal
 newline setting.

  which to me sounds like it's an arbitrary thing that can be changed...

  I was already thinking that I'd have to read it in small chunks and
  search for the delimiter i want...  and reading the whole file into a
  string and then splitting that would be nice, until the file is
  so large that it starts taking up significant amounts of memory.

  Anyway, thanks both of you for the explanations... I appreciate the help!

 I would not be surprised if a generic file chunk generator were posted
 somewhere. It would be a good entry for the Python Cookbook, if not
 there already.

 tjr

There should be but writing one isn't too difficult:

def chunker(file_obj):
    """Yield comma-separated fields from file_obj, reading 8 KB at a time."""
    parts = ['']
    while True:
        fdata = file_obj.read(8192)
        if not fdata:
            break
        # Prepend the leftover partial field from the previous chunk,
        # then split; the last element may still be incomplete.
        parts = (parts[-1] + fdata).split(',')
        for col in parts[:-1]:
            yield col
    yield parts[-1]  # the final field has no trailing comma
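
For example (the file name here is made up), it can be driven like this:

for field in chunker(open('big_one_liner.txt')):
    print(field)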

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Question about file objects...

2009-12-02 Thread nn
On Dec 2, 9:14 am, J dreadpiratej...@gmail.com wrote:
 Something that came up in class...

 when you are pulling data from a file using f.next(), the file is read
 one line at a time.

 What was explained to us is that Python iterates the file based on a
 carriage return as the delimiter.
 But what if you have a file that has one line of text, but that one
 line has 16,000 items that are comma delimited?

 Is there a way to read the file, one item at a time, delimited by
 commas WITHOUT having to read all 16,000 items from that one line,
 then split them out into a list or dictionary??

 Cheers
 Jeff

 --

 Ogden Nash - "The trouble with a kitten is that when it grows up,
 it's always a cat."
 - http://www.brainyquote.com/quotes/authors/o/ogden_nash.html

File iteration is a convenience since it is the most common case. If
everything is on one line, you will have to handle record separators
manually by using the .read(number_of_bytes) method on the file
object and searching for the comma. If everything fits in memory the
straightforward way would be to read the whole file with .read() and
use .split(',') on the returned string. That should give you a nice
list of everything.
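
For instance (the file name is just an example):

data = open('items.txt').read()
items = data.split(',')
print(len(items))   # e.g. 16000 for the file described above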
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Question about file objects...

2009-12-02 Thread Andre Engels
On Wed, Dec 2, 2009 at 3:14 PM, J dreadpiratej...@gmail.com wrote:
 Something that came up in class...

 when you are pulling data from a file using f.next(), the file is read
 one line at a time.

 What was explained to us is that Python iterates the file based on a
 carriage return as the delimiter.
 But what if you have a file that has one line of text, but that one
 line has 16,000 items that are comma delimited?

 Is there a way to read the file, one item at a time, delimited by
 commas WITHOUT having to read all 16,000 items from that one line,
 then split them out into a list or dictionary??

If f is a file object, f.read(1) will get the next byte of the file.
Get single-character strings that way until you arrive at a ',', then
concatenate what you have received before that.
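
Roughly like this (untested sketch, the name is mine; it returns '' once
the end of the file is reached):

def next_item(f):
    chars = []
    while True:
        c = f.read(1)
        if not c or c == ',':   # stop at end of file or at the delimiter
            break
        chars.append(c)
    return ''.join(chars)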

-- 
André Engels, andreeng...@gmail.com
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Question about file objects...

2009-12-02 Thread J
On Wed, Dec 2, 2009 at 09:27, nn prueba...@latinmail.com wrote:
 Is there a way to read the file, one item at a time, delimited by
 commas WITHOUT having to read all 16,000 items from that one line,
 then split them out into a list or dictionary??

 File iteration is a convenience since it is the most common case. If
 everything is on one line, you will have to handle record separators
 manually by using the .read(number_of_bytes) method on the file
 object and searching for the comma. If everything fits in memory the
 straightforward way would be to read the whole file with .read() and
 use .split(',') on the returned string. That should give you a nice
 list of everything.

Agreed. The confusion came because the guy teaching said that
iterating the file is delimited by a carriage return character...
which to me sounds like it's an arbitrary thing that can be changed...

I was already thinking that I'd have to read it in small chunks and
search for the delimiter i want...  and reading the whole file into a
string and then splitting that would be nice, until the file is
so large that it starts taking up significant amounts of memory.

Anyway, thanks both of you for the explanations... I appreciate the help!

Cheers
Jeff



-- 

Charles de Gaulle - "The better I get to know men, the more I find
myself loving dogs."
- http://www.brainyquote.com/quotes/authors/c/charles_de_gaulle.html
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Question about file objects...

2009-12-02 Thread Terry Reedy

J wrote:

On Wed, Dec 2, 2009 at 09:27, nn prueba...@latinmail.com wrote:

Is there a way to read the file, one item at a time, delimited by
commas WITHOUT having to read all 16,000 items from that one line,
then split them out into a list or dictionary??



File iteration is a convenience since it is the most common case. If
everything is on one line, you will have to handle record separators
manually by using the .read(number_of_bytes) method on the file
object and searching for the comma. If everything fits in memory the
straightforward way would be to read the whole file with .read() and
use .split(',') on the returned string. That should give you a nice
list of everything.


Agreed. The confusion came because the guy teaching said that
iterating the file is delimited by a carriage return character...


If he said exactly that, he is not exactly correct. File iteration looks 
for line ending character(s), which depends on the system or universal 
newline setting.
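
For instance, with Python 2's universal-newline mode (the file name is
made up), iteration yields the same lines whether the file uses '\n',
'\r' or '\r\n' endings:

for line in open('records.txt', 'rU'):
    print(repr(line))   # each line ends in '\n' regardless of the original ending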



which to me sounds like it's an arbitrary thing that can be changed...

I was already thinking that I'd have to read it in small chunks and
search for the delimiter i want...  and reading the whole file into a
string and then splitting that would be nice, until the file is
so large that it starts taking up significant amounts of memory.

Anyway, thanks both of you for the explanations... I appreciate the help!


I would not be surprised if a generic file chunk generator were posted 
somewhere. It would be a good entry for the Python Cookbook, if not 
there already.


tjr

--
http://mail.python.org/mailman/listinfo/python-list