[issue2078] CSV Sniffer does not function properly on single column .csv files

2008-04-12 Thread Skip Montanaro

Skip Montanaro [EMAIL PROTECTED] added the comment:

I can't see a great reason to change the behavior.  I've attached my
current patch for csv.py and test_csv.py in case someone else wants
to pick it up later.

--
keywords: +patch
priority:  -> low
resolution:  -> postponed
status: open -> closed
Added file: http://bugs.python.org/file10020/csv.diff

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2078
__
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue2078] CSV Sniffer does not function properly on single column .csv files

2008-03-29 Thread Amaury Forgeot d'Arc

Amaury Forgeot d'Arc [EMAIL PROTECTED] added the comment:

> It works entirely based on character frequencies.

Does it make sense to restrict delimiters to a reasonable set of
characters? The usual punctuation, spaces, tabs... what else?

--
nosy: +amaury.forgeotdarc




[issue2078] CSV Sniffer does not function properly on single column .csv files

2008-03-29 Thread Skip Montanaro

Skip Montanaro [EMAIL PROTECTED] added the comment:

>> It works entirely based on character frequencies.

Amaury> Does it make sense to restrict delimiters to a reasonable set of
Amaury> characters? The usual punctuation, spaces, tabs... what else?

There is an optional delimiters argument to the sniff() method which
defaults to None.  I would be happier if it defaulted to the usual suspects
(NeoOffice refuses to guess, but offers TAB, space, semicolon and comma as
the default separators when importing a CSV file - Excel seems to just
figure it out).  That would change the behavior though.  With no delimiter
set it's generally going to find something, and may just pick incorrectly.
With a delimiter set that contains no character from the file it's going to
raise an exception.  I'm not sure this would be a good tradeoff, and it
would break existing code.
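
To make the tradeoff concrete, here is a minimal Python 3 sketch (not taken from the attached patch). The names sniff_delimiter and USUAL_SUSPECTS are hypothetical; only csv.Sniffer().sniff(sample, delimiters) and csv.Error come from the csv module. What the unrestricted call returns depends on the sample and on the Python version, so the sketch does not assume a particular result for it.

```python
import csv

# Hypothetical candidate set (comma, tab, semicolon, space); the csv module
# itself defaults to delimiters=None.
USUAL_SUSPECTS = ",\t; "


def sniff_delimiter(sample, delimiters=None):
    """Return the sniffed delimiter, or None if the sniffer gives up."""
    try:
        return csv.Sniffer().sniff(sample, delimiters).delimiter
    except csv.Error:
        # Raised when no candidate delimiter could be determined.
        return None


single_column = "Sequence\nAAGINRDSL\nAAIANHQVL\n"

# None of the candidates occurs in the sample, so the sniffer raises
# csv.Error and the helper returns None.
restricted = sniff_delimiter(single_column, USUAL_SUSPECTS)

# Without a candidate set the sniffer may pick some character from the
# data, or may also give up.
unrestricted = sniff_delimiter(single_column)
```

With a restricted set the caller gets an explicit failure instead of a wrong guess, which is exactly the behavior change discussed above.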

Skip




[issue2078] CSV Sniffer does not function properly on single column .csv files

2008-03-28 Thread Skip Montanaro

Skip Montanaro [EMAIL PROTECTED] added the comment:

Jean-Philippe> You're right, it does seem that using f.read(1024) to
Jean-Philippe> feed the sniffer works OK in my case and allows me to
Jean-Philippe> instantiate the DictReader correctly...  Why that is I'm
Jean-Philippe> not sure though...

It works entirely based on character frequencies.  The more characters you
feed it the better it should be at guessing the correct delimiter.  In
particular, it pays attention to the frequency of the possible delimiters
per line and assumes the number of columns is the same for each line.
(Well, there's one place where it does use some knowledge of the structure
of a csv file, so my earlier assertion was incorrect.)  If you only feed it
one line it can't really use that frequency-per-line information.

Jean-Philippe> I was submitting the first line as I thought it was the
Jean-Philippe> right sample to provide the sniffer for it to sniff the
Jean-Philippe> correct dialect regardless of the file format and file
Jean-Philippe> content.

That's a good guess, but not quite spot on in this case.  In particular, the
character frequencies in the first line tend to be much different than the
other lines because it is usually a row of column headers, while the remainder
of the file (though not always ;-) is a table of numbers.
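
A minimal Python 3 sketch of feeding the sniffer more than the header line. The helper name sniff_from_start and the sample data are made up for illustration; the 1024-character sample size is the one suggested earlier in this thread.

```python
import csv
import io


def sniff_from_start(f, sample_size=1024):
    """Sniff a dialect from the first sample_size characters, then rewind."""
    sample = f.read(sample_size)
    f.seek(0)  # rewind so a reader created afterwards starts at the header
    return csv.Sniffer().sniff(sample)


# Illustrative two-column table: the comma occurs exactly once on every
# line, which is the per-line consistency the sniffer looks for.
table = io.StringIO("name,value\nalpha,1\nbeta,2\ngamma,3\n")
dialect = sniff_from_start(table)
```

Rewinding inside the helper keeps the sniffing step from consuming the rows that the reader is supposed to see.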

Skip




[issue2078] CSV Sniffer does not function properly on single column .csv files

2008-03-27 Thread Jean-Philippe Laverdure

Jean-Philippe Laverdure [EMAIL PROTECTED] added the comment:

Hello and sorry for the late reply.

Wolfgang: sorry about my misuse of the csv.DictReader constructor, that
was a mistake on my part. However, it still is not functioning as I
think it should/could.  Look at this:

Using this content:
Sequence
AAGINRDSL
AAIANHQVL

and this piece of code:
f = open(sys.argv[-1], 'r')
dialect = csv.Sniffer().sniff(f.readline())
f.seek(0)
reader = csv.DictReader(f, dialect=dialect)
for line in reader:
    print line

I get this result:
{'Sequen': 'AAGINRDSL', 'e': None}
{'Sequen': 'AAIANHQVL', 'e': None}

When I really should be getting this:
{'Sequence': 'AAGINRDSL'}
{'Sequence': 'AAIANHQVL'}

The fact is this code is in use in an application where users can submit
a .csv file produced by Excel for treatment.  The file must contain a
Sequence column since that is what the treatment is run on. Now I had
to make the following changes to my code to account for the fact that
some users submit a single column file (since only the Sequence column
is required for treatment):

f = open(sys.argv[-1], 'r')
try:
    dialect = csv.Sniffer().sniff(f.readline(), [',', '\t'])
    f.seek(0)
    reader = csv.DictReader(f, dialect=dialect)
except:
    print 'caught csv sniff() exception'
    f.seek(0)
    reader = csv.DictReader(f)
for line in reader:
    pass  # do what I need to do

This really feels like patching around a buggy implementation of the
Sniffer class.

I understand the issues raised by Skip in regards to figuring out a
delimiter at all costs...  But really, the Sniffer class should work
appropriately when a single column .csv file is submitted.




[issue2078] CSV Sniffer does not function properly on single column .csv files

2008-03-27 Thread Skip Montanaro

Skip Montanaro [EMAIL PROTECTED] added the comment:

Jean-Philippe> The fact is this code is in use in an application where
Jean-Philippe> users can submit a .csv file produced by Excel for
Jean-Philippe> treatment.  The file must contain a Sequence column
Jean-Philippe> since that is what the treatment is run on. Now I had to
Jean-Philippe> make the following changes to my code to account for the
Jean-Philippe> fact that some users submit a single column file (since
Jean-Philippe> only the Sequence column is required for treatment):

Jean-Philippe> f = open(sys.argv[-1], 'r')
Jean-Philippe> try:
Jean-Philippe>     dialect = csv.Sniffer().sniff(f.readline(), [',', '\t'])
Jean-Philippe>     f.seek(0)
Jean-Philippe>     reader = csv.DictReader(f, dialect=dialect)
Jean-Philippe> except:
Jean-Philippe>     print 'caught csv sniff() exception'
Jean-Philippe>     f.seek(0)
Jean-Philippe>     reader = csv.DictReader(f)
Jean-Philippe> for line in reader:
Jean-Philippe>     Do what I need to do

What exceptions are you catching?  Why are you only giving it a single line
of input as a sample?  What happens if you instead use f.read(1024) as the
sample?  When there is only a single column in the file and you give it a
delimiter set which doesn't include any characters in the file it (I think
correctly) raises an exception to tell you that it couldn't determine the
delimiter:

>>> import csv
>>> f = open("listB2Mforblast.csv")
>>> dialect = csv.Sniffer().sniff(f.read(1024))
>>> dialect.delimiter
''
>>> f.seek(0)
>>> dialect = csv.Sniffer().sniff(f.read(1024), ",\t :;")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/skip/local/lib/python2.6/csv.py", line 161, in sniff
    raise Error, "Could not determine delimiter"
_csv.Error: Could not determine delimiter

In that case, use csv.excel as the dialect.  It doesn't matter what you use
as the delimiter if it doesn't occur in the file, and if it can't figure out
the delimiter it's also not going to guess the quotechar.

>>> try:
...     dialect = csv.Sniffer().sniff(f.read(1024), ",\t :;")
... except csv.Error:
...     dialect = csv.excel
...
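
Put together as a self-contained Python 3 sketch: the function name open_dict_reader and its default candidate set are hypothetical, and the sample is the three-line file quoted in this thread. It reads a larger sample, restricts the candidates, and falls back to csv.excel on csv.Error.

```python
import csv
import io


def open_dict_reader(f, delimiters=",\t;", sample_size=1024):
    """Build a DictReader, using csv.excel when no delimiter can be sniffed."""
    sample = f.read(sample_size)
    f.seek(0)  # rewind so the reader sees the header row
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters)
    except csv.Error:
        # Single-column file: none of the candidates occurs in the sample,
        # so any delimiter that is absent from the data will do.
        dialect = csv.excel
    return csv.DictReader(f, dialect=dialect)


# The single-column sample from this thread.
single_column = io.StringIO("Sequence\nAAGINRDSL\nAAIANHQVL\n")
rows = list(open_dict_reader(single_column))
```

Catching csv.Error specifically, rather than using a bare except, keeps unrelated failures such as I/O errors visible.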

I personally don't much like the sniffer.  It doesn't use any knowledge of
the structure of a CSV file to guess the delimiter and quotechar (and those
are the only two parameters it does guess).  I would prefer if it just went
away, but folks use it so it's likely to remain in its current form for the
foreseeable future.

Skip




[issue2078] CSV Sniffer does not function properly on single column .csv files

2008-03-27 Thread Jean-Philippe Laverdure

Jean-Philippe Laverdure [EMAIL PROTECTED] added the comment:

Hi Skip,

You're right, it does seem that using f.read(1024) to feed the sniffer
works OK in my case and allows me to instantiate the DictReader
correctly...  Why that is I'm not sure though...

I was submitting the first line as I thought it was the right sample to
provide the sniffer for it to sniff the correct dialect regardless of
the file format and file content.

And yes, 'except csv.Error' is certainly a better way to trap my desired
exception... I guess I'm a bit of a n00b using Python.

Thanks for the help. 
Python really has a great community !




[issue2078] CSV Sniffer does not function properly on single column .csv files

2008-03-19 Thread Wolfgang Langner

Wolfgang Langner [EMAIL PROTECTED] added the comment:

In these cases it is not really possible to sniff the right delimiter.
To not allow digits or letters is not a good solution.
I think the current behavior is ok, and at this time I see no way to
improve it.




[issue2078] CSV Sniffer does not function properly on single column .csv files

2008-03-19 Thread Skip Montanaro

Skip Montanaro [EMAIL PROTECTED] added the comment:

Wolfgang> In these cases it is not really possible to sniff the right
Wolfgang> delimiter.  To not allow digits or letters is not a good
Wolfgang> solution.  I think the current behavior is ok, and at this time
Wolfgang> I see no way to improve it.

I mostly agree.  I'm waiting for the original submitter to chime in though.

Skip




[issue2078] CSV Sniffer does not function properly on single column .csv files

2008-03-18 Thread Skip Montanaro

Skip Montanaro [EMAIL PROTECTED] added the comment:

What do you think the delimiter should be for this csv file?

43.4e12
147483648
47483648

What about this one?

abcdef
bcdefg
cdefgh

And this?

abc8def
bcd8efg
cde8fgh

If I force the sniffer to not allow digits or letters as
delimiters I can get the sniffer to return comma as the
delimiter in all three cases.  I'm not certain that's
correct in the third case though.

--
assignee:  -> skip.montanaro
nosy: +skip.montanaro




[issue2078] CSV Sniffer does not function properly on single column .csv files

2008-03-17 Thread Wolfgang Langner

Wolfgang Langner [EMAIL PROTECTED] added the comment:

The sniffer returns a dialect that is not really correct, because the
delimiter is set to some value even though in this case there is no
delimiter.  Think of it as returning a random delimiter when there is
not really one.

But your usage of the DictReader is wrong. It has to be called with
csv.DictReader(file, dialect=dialect), and then it works in this example.
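
A small Python 3 illustration of why the keyword matters: the second positional parameter of csv.DictReader is fieldnames, so a dialect passed positionally is taken as the list of column names. The sample data is the three-line file quoted in this thread, and csv.excel stands in for a sniffed dialect.

```python
import csv
import io

sample = "Sequence\nAAGINRDSL\nAAIANHQVL\n"

# Signature: DictReader(f, fieldnames=None, restkey=None, restval=None,
#                       dialect='excel', ...)
# Passing the dialect by keyword leaves fieldnames as None, so the header
# row is read from the file.
reader = csv.DictReader(io.StringIO(sample), dialect=csv.excel)
rows = list(reader)
fieldnames = reader.fieldnames
```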

This could be closed.

--
nosy: +tds333
versions: +Python 2.6, Python 3.0




[issue2078] CSV Sniffer does not function properly on single column .csv files

2008-02-12 Thread Jean-Philippe Laverdure

Changes by Jean-Philippe Laverdure:


--
components: +Library (Lib) -Extension Modules
versions: +Python 2.4




[issue2078] CSV Sniffer does not function properly on single column .csv files

2008-02-12 Thread Jean-Philippe Laverdure

New submission from Jean-Philippe Laverdure:

When attempting to sniff() the dialect for the attached .csv file,
csv.Sniffer.sniff() returns an unusable dialect:

>>> import csv
>>> file = open('listB2Mforblast.csv', 'r')
>>> dialect = csv.Sniffer().sniff(file.readline())
>>> file.seek(0)
>>> file.readline()
>>> file.seek(0)
>>> reader = csv.DictReader(file, dialect)
>>> reader.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/soft/bioinfo/linux/python-2.5/lib/python2.5/csv.py", line 93, in next
    d = dict(zip(self.fieldnames, row))
TypeError: zip argument #1 must support iteration

However, this works fine:
>>> file.seek(0)
>>> reader = csv.DictReader(file)
>>> reader.next()
{'Sequence': 'AALENTHLL'}

If I use a 2 column file, sniff() works perfectly.
It only seems to have a problem with single column .csv files (which are
still .csv files in my opinion).

Thanks for looking into this.

--
components: Extension Modules
files: listB2Mforblast.csv
messages: 62319
nosy: jplaverdure
severity: normal
status: open
title: CSV Sniffer does not function properly on single column .csv files
type: behavior
versions: Python 2.5
Added file: http://bugs.python.org/file9416/listB2Mforblast.csv
