Re: Question about file objects...
J wrote:
> Something that came up in class... When you are pulling data from a file using f.next(), the file is read one line at a time. What was explained to us is that Python iterates the file based on a carriage return as the delimiter. But what if you have a file that has one line of text, but that one line has 16,000 items that are comma delimited? Is there a way to read the file, one item at a time, delimited by commas, WITHOUT having to read all 16,000 items from that one line and then split them out into a list or dictionary?
>
> Cheers
> Jeff

Generators are a good way of dealing with that sort of thing:

http://dalkescientific.com/writings/NBN/generators.html

Have the generator read large chunks from the file in binary mode, then use string searching/splitting to dole out records one at a time, topping up the cache when needed.

Roger.
--
http://mail.python.org/mailman/listinfo/python-list
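Roger's suggestion could be sketched roughly as follows. This is only an illustration, not code from the thread: the name iter_records, its parameters, and the 64 KiB default chunk size are assumptions, and it is written for Python 3, where a file opened in binary mode returns bytes.

```python
import io


def iter_records(binary_file, delimiter=b",", chunk_size=65536):
    """Yield delimiter-separated records from a file opened in binary mode.

    Hypothetical helper: name, parameters and default chunk size are
    assumptions made for this sketch.
    """
    cache = b""  # unconsumed bytes carried over between reads
    while True:
        chunk = binary_file.read(chunk_size)
        if not chunk:
            break  # end of file
        cache += chunk  # top up the cache
        start = 0
        while True:
            end = cache.find(delimiter, start)
            if end == -1:
                break  # no complete record left in the cache
            yield cache[start:end]
            start = end + len(delimiter)
        cache = cache[start:]  # keep only the incomplete tail
    yield cache  # whatever follows the last delimiter


# A tiny chunk size forces records to straddle chunk boundaries.
demo = io.BytesIO(b"alpha,beta,gamma")
records = list(iter_records(demo, chunk_size=4))
```

Only the current chunk plus one incomplete record is held in memory at a time, so the size of the file itself does not matter.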
Re: Question about file objects...
On Dec 2, 6:56 pm, Terry Reedy tjre...@udel.edu wrote:
> J wrote:
>> On Wed, Dec 2, 2009 at 09:27, nn prueba...@latinmail.com wrote:
>>>> Is there a way to read the file, one item at a time, delimited by commas, WITHOUT having to read all 16,000 items from that one line and then split them out into a list or dictionary?
>>>
>>> File iteration is a convenience since it is the most common case. If everything is on one line, you will have to handle record separators manually by using the .read(number_of_bytes) method on the file object and searching for the comma. If everything fits in memory, the straightforward way would be to read the whole file with .read() and use .split(',') on the returned string. That should give you a nice list of everything.
>>
>> Agreed. The confusion came because the guy teaching said that iterating the file is delimited by a carriage return character...
>
> If he said exactly that, he is not exactly correct. File iteration looks for line ending character(s), which depends on the system or universal newline setting.
>
>> which to me sounds like it's an arbitrary thing that can be changed... I was already thinking that I'd have to read it in small chunks and search for the delimiter I want... and reading the whole file into a string and then splitting that would be nice, until the file is so large that it starts taking up significant amounts of memory.
>>
>> Anyway, thanks both of you for the explanations... I appreciate the help!
>
> I would not be surprised if a generic file chunk generator were posted somewhere. It would be a good entry for the Python Cookbook, if not there already.
>
> tjr

There should be, but writing one isn't too difficult:

    def chunker(file_obj):
        # parts[-1] always holds the incomplete item left over so far.
        parts = ['']
        while True:
            fdata = file_obj.read(8192)
            if not fdata:
                break
            # Prepend the leftover, then split on the comma.
            parts = (parts[-1] + fdata).split(',')
            # Everything except the last piece is a complete item.
            for col in parts[:-1]:
                yield col
        # At end of file the leftover is the final item.
        yield parts[-1]
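As a possible step toward the "generic file chunk generator" Terry mentions, the chunker above can be given a configurable delimiter and chunk size. The delimiter and chunk_size parameters are additions for this sketch and are not part of the posted code; io.StringIO stands in for a real file.

```python
import io


def chunker(file_obj, delimiter=",", chunk_size=8192):
    """Yield items separated by `delimiter`, reading `chunk_size` at a time.

    Same logic as the posted chunker; the two keyword parameters are
    assumptions added for illustration.
    """
    parts = [""]  # parts[-1] is the incomplete item carried between reads
    while True:
        fdata = file_obj.read(chunk_size)
        if not fdata:
            break
        parts = (parts[-1] + fdata).split(delimiter)
        for col in parts[:-1]:
            yield col
    yield parts[-1]


# A chunk size of 3 makes every read end in the middle of the data.
items = list(chunker(io.StringIO("10,20,30,40"), chunk_size=3))
```

Note that if the file ends with a newline, the last item yielded still carries it, so callers may want to strip each item.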
Re: Question about file objects...
On Dec 2, 9:14 am, J dreadpiratej...@gmail.com wrote:
> Something that came up in class... When you are pulling data from a file using f.next(), the file is read one line at a time. What was explained to us is that Python iterates the file based on a carriage return as the delimiter. But what if you have a file that has one line of text, but that one line has 16,000 items that are comma delimited? Is there a way to read the file, one item at a time, delimited by commas, WITHOUT having to read all 16,000 items from that one line and then split them out into a list or dictionary?
>
> Cheers
> Jeff

File iteration is a convenience since it is the most common case. If everything is on one line, you will have to handle record separators manually by using the .read(number_of_bytes) method on the file object and searching for the comma.

If everything fits in memory, the straightforward way would be to read the whole file with .read() and use .split(',') on the returned string. That should give you a nice list of everything.
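For the case where everything fits in memory, the .read() plus .split(',') approach might look like this. The helper name read_all_items and the temporary sample file are made up for the example.

```python
import os
import tempfile


def read_all_items(path, delimiter=","):
    """Read the whole file into memory and split it into a list of items."""
    with open(path) as f:
        return f.read().split(delimiter)


# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1,2,3,4")
    sample_path = tmp.name

all_items = read_all_items(sample_path)
os.remove(sample_path)
```

This is the simplest option, but the full file contents and the resulting list both live in memory at once, which is exactly the cost the original question wanted to avoid for very large files.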
Re: Question about file objects...
On Wed, Dec 2, 2009 at 3:14 PM, J dreadpiratej...@gmail.com wrote:
> Something that came up in class... When you are pulling data from a file using f.next(), the file is read one line at a time. What was explained to us is that Python iterates the file based on a carriage return as the delimiter. But what if you have a file that has one line of text, but that one line has 16,000 items that are comma delimited? Is there a way to read the file, one item at a time, delimited by commas, WITHOUT having to read all 16,000 items from that one line and then split them out into a list or dictionary?

If f is a file object, f.read(1) will get the next byte of the file. Get single-character strings that way until you arrive at a ',', then concatenate what you have received before that.

--
André Engels, andreeng...@gmail.com
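A minimal sketch of the f.read(1) idea, wrapped in a generator. The name items_by_char is invented here, and the code assumes Python 3, where read(1) on a text-mode file returns one character rather than one byte.

```python
import io


def items_by_char(file_obj, delimiter=","):
    """Yield items by reading one character at a time (hypothetical helper)."""
    current = []  # characters of the item being assembled
    while True:
        ch = file_obj.read(1)
        if not ch:
            break  # end of file
        if ch == delimiter:
            yield "".join(current)
            current = []
        else:
            current.append(ch)
    yield "".join(current)  # the item after the last delimiter


char_items = list(items_by_char(io.StringIO("ab,c,,d")))
```

This keeps memory use minimal, but it makes one read call per character, so for large files the chunked approaches discussed elsewhere in the thread are likely to be considerably faster.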
Re: Question about file objects...
On Wed, Dec 2, 2009 at 09:27, nn prueba...@latinmail.com wrote:
>> Is there a way to read the file, one item at a time, delimited by commas, WITHOUT having to read all 16,000 items from that one line and then split them out into a list or dictionary?
>
> File iteration is a convenience since it is the most common case. If everything is on one line, you will have to handle record separators manually by using the .read(number_of_bytes) method on the file object and searching for the comma. If everything fits in memory, the straightforward way would be to read the whole file with .read() and use .split(',') on the returned string. That should give you a nice list of everything.

Agreed. The confusion came because the guy teaching said that iterating the file is delimited by a carriage return character... which to me sounds like it's an arbitrary thing that can be changed... I was already thinking that I'd have to read it in small chunks and search for the delimiter I want... and reading the whole file into a string and then splitting that would be nice, until the file is so large that it starts taking up significant amounts of memory.

Anyway, thanks both of you for the explanations... I appreciate the help!

Cheers
Jeff

--
Charles de Gaulle - The better I get to know men, the more I find myself loving dogs. - http://www.brainyquote.com/quotes/authors/c/charles_de_gaulle.html
Re: Question about file objects...
J wrote:
> On Wed, Dec 2, 2009 at 09:27, nn prueba...@latinmail.com wrote:
>>> Is there a way to read the file, one item at a time, delimited by commas, WITHOUT having to read all 16,000 items from that one line and then split them out into a list or dictionary?
>>
>> File iteration is a convenience since it is the most common case. If everything is on one line, you will have to handle record separators manually by using the .read(number_of_bytes) method on the file object and searching for the comma. If everything fits in memory, the straightforward way would be to read the whole file with .read() and use .split(',') on the returned string. That should give you a nice list of everything.
>
> Agreed. The confusion came because the guy teaching said that iterating the file is delimited by a carriage return character...

If he said exactly that, he is not exactly correct. File iteration looks for line ending character(s), which depends on the system or universal newline setting.

> which to me sounds like it's an arbitrary thing that can be changed... I was already thinking that I'd have to read it in small chunks and search for the delimiter I want... and reading the whole file into a string and then splitting that would be nice, until the file is so large that it starts taking up significant amounts of memory.
>
> Anyway, thanks both of you for the explanations... I appreciate the help!

I would not be surprised if a generic file chunk generator were posted somewhere. It would be a good entry for the Python Cookbook, if not there already.

tjr
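To illustrate the point about line endings: the thread dates from 2009 and uses Python 2's f.next(), but in Python 3 the same behaviour can be observed through the newline argument of the built-in open(). The sample data and the helper lines_with are invented for this sketch.

```python
import os
import tempfile

# Write raw bytes that mix the three common line endings.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"one\rtwo\nthree\r\nfour")
    path = tmp.name


def lines_with(newline):
    """Return the lines produced by iterating the file with this newline mode."""
    with open(path, newline=newline) as f:
        return list(f)


universal = lines_with(None)   # universal newlines, endings translated to "\n"
untranslated = lines_with("")  # universal newlines, endings left as found
lf_only = lines_with("\n")     # only "\n" terminates a line

os.remove(path)
```

In Python 3 the newline argument accepts only None, '', '\n', '\r' and '\r\n', so file iteration cannot be told to split on commas; that is why the chunk-reading approaches are needed.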