Re: issue in handling CSV data

2019-09-10 Thread Piet van Oostrum
Sharan Basappa  writes:

>> 
>> Note that the commas are within the quotes. I'd say Andrea is correct:
>> This is a tab-separated file, not a comma-separated file. But for some
>> reason all fields except the last end with a comma. 
>>

However, genfromtxt is not a full-fledged CSV parser. It does not obey quotes. 
So the commas inside the quotes ARE treated as separators.
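To make Piet's point concrete, here is a small stdlib sketch that does not call genfromtxt at all. It assumes, as described above, that genfromtxt splits on every comma without regard to quotes, and imitates that with str.split. The sample line is reconstructed from the hex dump posted further down in this thread.

```python
# Stand-in for genfromtxt(delimiter=','): split on every comma, ignoring quotes.
# Assumption: this mirrors the quote-unaware splitting described above.
line = '"address,"\t"length,"\t"81,"\t5c'

fields = line.split(',')
print(fields)       # ['"address', '"\t"length', '"\t"81', '"\t5c']

# Dropping the first two columns (as the two np.delete calls did) leaves
# exactly the strings seen in the printed array.
print(fields[2:])   # ['"\t"81', '"\t5c']
```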

> Hi Peter,
>
> I respectfully disagree that it is not comma-separated. Let me explain why.
> If you look at the following line in the code, it specifies comma as the 
> delimiter:
>
> 
> my_data = genfromtxt('constraints.csv', delimiter = ',', dtype=None)
> 
>
> Now, if you look at the printed output, it looks like this:
>
> ## 
> [['"\t"81' '"\t5c'] 
>  ['"\t"04' '"\t11'] 
>  ['"\t"e1' '"\t17'] 
>  ['"\t"6a' '"\t6c'] 
>  ['"\t"53' '"\t69'] 
>  ['"\t"98' '"\t87'] 
>  ['"\t"5c' '"\t4b'] 
> ## 

1) Where did the other fields (address, length) go?
>
> If you observe, the commas have disappeared. That, I think, is because
> it actually treated this as a CSV file.

2) As I said above, if you choose ',' as separator, these will disappear. 
Similarly, if you choose TAB as separator, the TABs will disappear. As the 
format is a strange mixture of the two, you can use either one. But if it were 
read with a real CSV reader that obeys the quote convention, then using ',' 
as separator would not work. Only TAB would work.
But in both cases you would have to do some pre- or post-processing to get the 
data as you want them.
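As an illustration of the pre-processing route, here is a minimal sketch, assuming every line has the four-field layout seen in the hex dump: delete the double quotes and commas from the raw line, after which only TABs are left as separators. `preprocess` is a hypothetical helper name, and this is only safe if no real value can ever contain a quote or a comma.

```python
def preprocess(line):
    """Remove quotes and commas, then split the remaining text on TABs."""
    cleaned = line.rstrip('\r\n').replace('"', '').replace(',', '')
    return cleaned.split('\t')


raw = '"address,"\t"length,"\t"81,"\t5c\r\n'
print(preprocess(raw))  # ['address', 'length', '81', '5c']
```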

> Anyway, I am checking to see if I can discard the tabs and process this.
> I will keep everyone posted.

-- 
Piet van Oostrum 
WWW: http://piet.vanoostrum.org/
PGP key: [8DAE142BE17999C4]
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: issue in handling CSV data

2019-09-10 Thread Gregory Ewing

Sharan Basappa wrote:
> Now, if you look at the printed output, it looks like this:
>
> ##
> [['"\t"81' '"\t5c']
>  ['"\t"04' '"\t11']
>  ['"\t"e1' '"\t17']
>  ['"\t"6a' '"\t6c']
>  ['"\t"53' '"\t69']
>  ['"\t"98' '"\t87']
>  ['"\t"5c' '"\t4b']
> ##

But now you have weird things such as unmatched single quotes.

It seems that whichever way you try to interpret it -- tab
delimited or comma delimited -- it doesn't entirely make sense.
That leads me to believe it has been corrupted somehow.
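One way to test the corruption theory is to validate every line against the layout the posted lines suggest, extracting the two hex values with a regular expression and reporting the line numbers that do not fit. The pattern below is an assumption based only on the sample lines in this thread, and the third sample line is invented to show a failure.

```python
import re

# Assumed layout: "address,"<TAB>"length,"<TAB>"XX,"<TAB>YY  (XX, YY in hex)
LINE_RE = re.compile(r'^"address,"\t"length,"\t"([0-9a-fA-F]+),"\t([0-9a-fA-F]+)$')


def parse_lines(lines):
    """Return (good_pairs, bad_line_numbers) for an iterable of text lines."""
    good, bad = [], []
    for lineno, line in enumerate(lines, start=1):
        m = LINE_RE.match(line.rstrip('\r\n'))
        if m:
            good.append((int(m.group(1), 16), int(m.group(2), 16)))
        else:
            bad.append(lineno)
    return good, bad


sample = ['"address,"\t"length,"\t"81,"\t5c\r\n',
          '"address,"\t"length,"\t"04,"\t11\r\n',
          'garbled line\r\n']            # invented bad line for illustration
print(parse_lines(sample))  # ([(129, 92), (4, 17)], [3])
```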

--
Greg


Re: issue in handling CSV data

2019-09-09 Thread Sharan Basappa
On Sunday, 8 September 2019 12:45:45 UTC-4, Peter J. Holzer  wrote:
> On 2019-09-08 05:41:07 -0700, Sharan Basappa wrote:
> > On Sunday, 8 September 2019 04:56:29 UTC-4, Andrea D'Amore  wrote:
> > > On Sun, 8 Sep 2019 at 02:19, Sharan Basappa  
> > > wrote:
> > > > As you can see, the string "\t"81 is causing the error.
> > > > It seems to be due to char "\t".
> > > 
> > > It is not clear what format you expect the file to be in.
> > > You say "it is CSV" so your actual payload seems to be a pair of three
> > > bytes (a tab and two hex digits in ASCII) per line.
> > 
> > The issue seems to be the presence of tabs along with the numbers in a single 
> > string. So, when I try to convert strings to numbers, it fails due to the 
> > presence of tabs.
> > 
> > Here is the hex dump:
> > 
> > 22 61 64 64 72 65 73 73 2c 22 09 22 6c 65 6e 67 
> > 74 68 2c 22 09 22 38 31 2c 22 09 35 63 0d 0a 22 
> > 61 64 64 72 65 73 73 2c 22 09 22 6c 65 6e 67 74 
> ...
> 
> This looks like this:
> 
> "address,"  "length,"   "81,"   5c
> "address,"  "length,"   "04,"   11
> "address,"  "length,"   "e1,"   17
> "address,"  "length,"   "6a,"   6c
> ...
> 
> Note that the commas are within the quotes. I'd say Andrea is correct:
> This is a tab-separated file, not a comma-separated file. But for some
> reason all fields except the last end with a comma. 
> 
> I would 
> 
> a) try to convince the person producing the file to clean up the mess
> 
> b) if that is not successful, use the csv module to read the file with
>    separator tab and then discard the trailing commas.
> 

Hi Peter,

I respectfully disagree that it is not comma-separated. Let me explain why.
If you look at the following line in the code, it specifies comma as the delimiter:


my_data = genfromtxt('constraints.csv', delimiter = ',', dtype=None)


Now, if you look at the printed output, it looks like this:

## 
[['"\t"81' '"\t5c'] 
 ['"\t"04' '"\t11'] 
 ['"\t"e1' '"\t17'] 
 ['"\t"6a' '"\t6c'] 
 ['"\t"53' '"\t69'] 
 ['"\t"98' '"\t87'] 
 ['"\t"5c' '"\t4b'] 
## 

If you observe, the commas have disappeared. That, I think, is because it 
actually treated this as a CSV file.

Anyway, I am checking to see if I can discard the tabs and process this.
I will keep everyone posted.


Re: issue in handling CSV data

2019-09-08 Thread Peter J. Holzer
On 2019-09-08 05:41:07 -0700, Sharan Basappa wrote:
> On Sunday, 8 September 2019 04:56:29 UTC-4, Andrea D'Amore  wrote:
> > On Sun, 8 Sep 2019 at 02:19, Sharan Basappa  
> > wrote:
> > > As you can see, the string "\t"81 is causing the error.
> > > It seems to be due to char "\t".
> > 
> > It is not clear what format you expect the file to be in.
> > You say "it is CSV" so your actual payload seems to be a pair of three
> > bytes (a tab and two hex digits in ASCII) per line.
> 
> The issue seems to be the presence of tabs along with the numbers in a single 
> string. So, when I try to convert strings to numbers, it fails due to the 
> presence of tabs.
> 
> Here is the hex dump:
> 
> 22 61 64 64 72 65 73 73 2c 22 09 22 6c 65 6e 67 
> 74 68 2c 22 09 22 38 31 2c 22 09 35 63 0d 0a 22 
> 61 64 64 72 65 73 73 2c 22 09 22 6c 65 6e 67 74 
...

This looks like this:

"address,"  "length,"   "81,"   5c
"address,"  "length,"   "04,"   11
"address,"  "length,"   "e1,"   17
"address,"  "length,"   "6a,"   6c
...

Note that the commas are within the quotes. I'd say Andrea is correct:
This is a tab-separated file, not a comma-separated file. But for some
reason all fields except the last end with a comma. 
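This reading can be checked by decoding the first line of the hex dump Sharan posted (0x22 is a double quote, 0x2c a comma, 0x09 a TAB, and 0d 0a is CR LF):

```python
# First line of the posted hex dump, up to and including 0d 0a.
dump = ("22 61 64 64 72 65 73 73 2c 22 09 22 6c 65 6e 67 "
        "74 68 2c 22 09 22 38 31 2c 22 09 35 63 0d 0a")

text = bytes.fromhex(dump).decode('ascii')   # fromhex ignores the spaces
print(repr(text))   # '"address,"\t"length,"\t"81,"\t5c\r\n'

# Splitting on TAB gives four fields; all but the last end in a comma.
print(text.rstrip('\r\n').split('\t'))
# ['"address,"', '"length,"', '"81,"', '5c']
```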

I would 

a) try to convince the person producing the file to clean up the mess

b) if that is not successful, use the csv module to read the file with
   separator tab and then discard the trailing commas.
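Option (b) could look roughly like this with the csv module; a sketch, not tested against the real file. io.StringIO holding two lines from the hex dump stands in for `open('constraints.csv', newline='')`, and `read_tab_fields` is a hypothetical name.

```python
import csv
import io


def read_tab_fields(f):
    """Read TAB-separated rows and drop the stray trailing comma on each field."""
    reader = csv.reader(f, delimiter='\t')   # quotes are honoured: "81," -> 81,
    return [[field.rstrip(',') for field in row] for row in reader]


# Stand-in for open('constraints.csv', newline='').
sample = io.StringIO('"address,"\t"length,"\t"81,"\t5c\r\n'
                     '"address,"\t"length,"\t"04,"\t11\r\n')
rows = read_tab_fields(sample)
print(rows)
# [['address', 'length', '81', '5c'], ['address', 'length', '04', '11']]
```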

hp


-- 
   _  | Peter J. Holzer| we build much bigger, better disasters now
|_|_) || because we have much more sophisticated
| |   | h...@hjp.at | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 




Re: issue in handling CSV data

2019-09-08 Thread Sharan Basappa
On Sunday, 8 September 2019 04:56:29 UTC-4, Andrea D'Amore  wrote:
> On Sun, 8 Sep 2019 at 02:19, Sharan Basappa  wrote:
> > This is the error:
> > my_data_3 = my_data_2.astype(np.float)
> > could not convert string to float: " "81
> 
> > As you can see, the string "\t"81 is causing the error.
> > It seems to be due to char "\t".
> 
> It is not clear what format you expect the file to be in.
> You say "it is CSV" so your actual payload seems to be a pair of three
> bytes (a tab and two hex digits in ASCII) per line.
> 
> Can you paste a hexdump of the first three lines of the input file and
> say what you expect to get once the data has been processed?

Andrea,

The issue seems to be the presence of tabs along with the numbers in a single 
string. So, when I try to convert strings to numbers, it fails due to the 
presence of tabs.

Here is the hex dump:

22 61 64 64 72 65 73 73 2c 22 09 22 6c 65 6e 67 
74 68 2c 22 09 22 38 31 2c 22 09 35 63 0d 0a 22 
61 64 64 72 65 73 73 2c 22 09 22 6c 65 6e 67 74 
68 2c 22 09 22 30 34 2c 22 09 31 31 0d 0a 22 61 
64 64 72 65 73 73 2c 22 09 22 6c 65 6e 67 74 68 
2c 22 09 22 65 31 2c 22 09 31 37 0d 0a 22 61 64 
64 72 65 73 73 2c 22 09 22 6c 65 6e 67 74 68 2c 
22 09 22 36 61 2c 22 09 36 63 0d 0a 22 61 64 64 
72 65 73 73 2c 22 09 22 6c 65 6e 67 74 68 2c 22 
09 22 35 33 2c 22 09 36 39 0d 0a 22 61 64 64 72 
65 73 73 2c 22 09 22 6c 65 6e 67 74 68 2c 22 09 
22 39 38 2c 22 09 38 37 0d 0a 22 61 64 64 72 65 
73 73 2c 22 09 22 6c 65 6e 67 74 68 2c 22 09 22 
35 63 2c 22 09 34 62 0d 0a 22 61 64 64 72 65 73 
73 2c 22 09 22 6c 65 6e 67 74 68 2c 22 09 22 32 
38 2c 22 09 33 36 0d 0a 22 61 64 64 72 65 73 73 
2c 22 09 22 6c 65 6e 67 74 68 2c 22 09 22 36 33 
2c 22 09 35 30 0d 0a 22 61 64 64 72 65 73 73 2c 
22 09 22 6c 65 6e 67 74 68 2c 22 09 22 32 34 2c 
22 09 32 31 0d 0a 22 61 64 64 72 65 73 73 2c 22 
09 22 6c 65 6e 67 74 68 2c 22 09 22 64 66 2c 22 
09 39 61 0d 0a 22 61 64 64 72 65 73 73 2c 22 09 
22 6c 65 6e 67 74 68 2c 22 09 22 61 62 2c 22 09 
62 39 0d 0a 22 61 64 64 72 65 73 73 2c 22 09 22 


Re: issue in handling CSV data

2019-09-08 Thread Andrea D'Amore
On Sun, 8 Sep 2019 at 02:19, Sharan Basappa  wrote:
> This is the error:
> my_data_3 = my_data_2.astype(np.float)
> could not convert string to float: " "81

> As you can see, the string "\t"81 is causing the error.
> It seems to be due to char "\t".

It is not clear what format you expect the file to be in.
You say "it is CSV" so your actual payload seems to be a pair of three
bytes (a tab and two hex digits in ASCII) per line.

Can you paste a hexdump of the first three lines of the input file and
say what you expect to get once the data has been processed?
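If no hexdump tool is at hand, Python can produce one. A sketch: read the first few lines in binary mode (so CR, LF, TAB and quote bytes arrive unaltered) and format them 16 bytes per row, like the dump posted in this thread. `hexdump_first_lines` is a hypothetical name, and `bytes.hex()` with a separator argument needs Python 3.8 or later.

```python
def hexdump_first_lines(path, n_lines=3, width=16):
    """Return hex rows (space-separated bytes) for the first n_lines of a file."""
    with open(path, 'rb') as f:              # binary mode: no newline translation
        raw = b''.join(f.readline() for _ in range(n_lines))
    return [raw[i:i + width].hex(' ') for i in range(0, len(raw), width)]
```

For example: `for row in hexdump_first_lines('constraints.csv'): print(row)`.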


-- 
Andrea


Re: issue in handling CSV data

2019-09-07 Thread Sharan Basappa
On Saturday, 7 September 2019 21:18:11 UTC-4, MRAB  wrote:
> On 2019-09-08 01:19, Sharan Basappa wrote:
> > I am trying to read a log file that is in CSV format.
> > 
> > The code snippet is below:
> > 
> > ###
> > import matplotlib.pyplot as plt
> > import seaborn as sns; sns.set()
> > import numpy as np
> > import pandas as pd
> > import os
> > import csv
> > from numpy import genfromtxt
> > 
> > # read the CSV and get into X array
> > os.chdir(r'D:\Users\sharanb\OneDrive - HCL Technologies Ltd\Projects\MyBackup\Projects\Initiatives\machine learning\programs\constraints')
> > X = []
> > #with open("constraints.csv", 'rb') as csvfile:
> > #reader = csv.reader(csvfile)
> > #data_as_list = list(reader)
> > #myarray = np.asarray(data_as_list)
> > 
> > my_data = genfromtxt('constraints.csv', delimiter = ',', dtype=None)
> > print (my_data)
> > 
> > my_data_1 = np.delete(my_data, 0, axis=1)
> > print (my_data_1)
> > 
> > my_data_2 = np.delete(my_data_1, 0, axis=1)
> > print (my_data_2)
> > 
> > my_data_3 = my_data_2.astype(np.float)
> > 
> > 
> > Here is what print(my_data_2) looks like:
> > ##
> > [['"\t"81' '"\t5c']
> >   ['"\t"04' '"\t11']
> >   ['"\t"e1' '"\t17']
> >   ['"\t"6a' '"\t6c']
> >   ['"\t"53' '"\t69']
> >   ['"\t"98' '"\t87']
> >   ['"\t"5c' '"\t4b']
> > ##
> > 
> > Finally, I am trying to get rid of the strings and get an array of numbers 
> > using NumPy's astype function. At this stage, I get an error.
> > 
> > This is the error:
> > my_data_3 = my_data_2.astype(np.float)
> > could not convert string to float: " "81
> > 
> > As you can see, the string "\t"81 is causing the error.
> > It seems to be due to char "\t".
> > 
> > I don't know how to resolve this.
> > 
> > Thanks for your help.
> > 
> Are you sure it's CSV (Comma-Separated Value) and not TSV (Tab-Separated 
> Value)?
> 
> Also the values look like hexadecimal to me. I think that 
> .astype(np.float) assumes that the values are decimal.
> 
> I'd probably start by reading them using the csv module, convert the 
> values to decimal, and then pass them on to numpy.

Yes, it is CSV. The commas are gone once csv.reader has processed the file.
The tabs are still there, though, and they seem to be causing the issue.

Thanks for your response


Re: issue in handling CSV data

2019-09-07 Thread MRAB

On 2019-09-08 01:19, Sharan Basappa wrote:
> I am trying to read a log file that is in CSV format.
>
> The code snippet is below:
>
> ###
> import matplotlib.pyplot as plt
> import seaborn as sns; sns.set()
> import numpy as np
> import pandas as pd
> import os
> import csv
> from numpy import genfromtxt
>
> # read the CSV and get into X array
> os.chdir(r'D:\Users\sharanb\OneDrive - HCL Technologies Ltd\Projects\MyBackup\Projects\Initiatives\machine learning\programs\constraints')
> X = []
> #with open("constraints.csv", 'rb') as csvfile:
> #reader = csv.reader(csvfile)
> #data_as_list = list(reader)
> #myarray = np.asarray(data_as_list)
>
> my_data = genfromtxt('constraints.csv', delimiter = ',', dtype=None)
> print (my_data)
>
> my_data_1 = np.delete(my_data, 0, axis=1)
> print (my_data_1)
>
> my_data_2 = np.delete(my_data_1, 0, axis=1)
> print (my_data_2)
>
> my_data_3 = my_data_2.astype(np.float)
>
>
> Here is what print(my_data_2) looks like:
> ##
> [['"\t"81' '"\t5c']
>   ['"\t"04' '"\t11']
>   ['"\t"e1' '"\t17']
>   ['"\t"6a' '"\t6c']
>   ['"\t"53' '"\t69']
>   ['"\t"98' '"\t87']
>   ['"\t"5c' '"\t4b']
> ##
>
> Finally, I am trying to get rid of the strings and get an array of numbers using 
> NumPy's astype function. At this stage, I get an error.
>
> This is the error:
> my_data_3 = my_data_2.astype(np.float)
> could not convert string to float: " "81
>
> As you can see, the string "\t"81 is causing the error.
> It seems to be due to char "\t".
>
> I don't know how to resolve this.
>
> Thanks for your help.

Are you sure it's CSV (Comma-Separated Value) and not TSV (Tab-Separated 
Value)?


Also the values look like hexadecimal to me. I think that 
.astype(np.float) assumes that the values are decimal.
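The hexadecimal point is easy to check with plain Python: float() reads decimal text, so '81' silently becomes 81.0 instead of 0x81 == 129, and '5c' is rejected outright, while int() with base 16 gives the intended values.

```python
tokens = ['81', '5c', '04', 'e1']

print(float('81'))                  # 81.0  (read as decimal, not 0x81)
try:
    float('5c')
except ValueError as exc:
    print('float failed:', exc)     # could not convert string to float: '5c'

values = [int(t, 16) for t in tokens]
print(values)                       # [129, 92, 4, 225]
```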


I'd probably start by reading them using the csv module, convert the 
values to decimal, and then pass them on to numpy.
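A sketch of that order of operations: read with the csv module using a TAB delimiter, convert the hex strings with int(..., 16) in Python, and only then build the NumPy array. Taking the values from columns 2 and 3 is an assumption based on the sample lines, and `load_hex_pairs` is a hypothetical name. The built-in float is used for the final cast because np.float was only an alias for it; it was deprecated in NumPy 1.20 and removed in 1.24.

```python
import csv
import io

import numpy as np


def load_hex_pairs(f):
    """Read TAB-separated rows; return the 3rd and 4th fields as an int array."""
    pairs = []
    for row in csv.reader(f, delimiter='\t'):
        if not row:                      # skip blank lines, if any
            continue
        # Assumed layout: label, label, hex value, hex value (trailing commas).
        hex_a, hex_b = (field.rstrip(',') for field in row[2:4])
        pairs.append((int(hex_a, 16), int(hex_b, 16)))
    return np.array(pairs, dtype=np.int64)


# Stand-in for open('constraints.csv', newline='').
sample = io.StringIO('"address,"\t"length,"\t"81,"\t5c\r\n'
                     '"address,"\t"length,"\t"04,"\t11\r\n')
data = load_hex_pairs(sample)
print(data)
print(data.astype(float))   # the cast to float now succeeds
```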



Re: issue in handling CSV data

2019-09-07 Thread Joel Goldstick
On Sat, Sep 7, 2019 at 8:28 PM Joel Goldstick  wrote:
>
> On Sat, Sep 7, 2019 at 8:21 PM Sharan Basappa  
> wrote:
> >
> > I am trying to read a log file that is in CSV format.
> >
> > The code snippet is below:
> >
> > ###
> > import matplotlib.pyplot as plt
> > import seaborn as sns; sns.set()
> > import numpy as np
> > import pandas as pd
> > import os
> > import csv
> > from numpy import genfromtxt
> >
> > # read the CSV and get into X array
> > os.chdir(r'D:\Users\sharanb\OneDrive - HCL Technologies Ltd\Projects\MyBackup\Projects\Initiatives\machine learning\programs\constraints')
> > X = []
> > #with open("constraints.csv", 'rb') as csvfile:
> > #reader = csv.reader(csvfile)
> > #data_as_list = list(reader)
> > #myarray = np.asarray(data_as_list)
> >
> > my_data = genfromtxt('constraints.csv', delimiter = ',', dtype=None)
> > print (my_data)
> >
> > my_data_1 = np.delete(my_data, 0, axis=1)
> > print (my_data_1)
> >
> > my_data_2 = np.delete(my_data_1, 0, axis=1)
> > print (my_data_2)
> >
> > my_data_3 = my_data_2.astype(np.float)
> > 
> >
> > Here is what print(my_data_2) looks like:
> > ##
> > [['"\t"81' '"\t5c']
> >  ['"\t"04' '"\t11']
> >  ['"\t"e1' '"\t17']
> >  ['"\t"6a' '"\t6c']
> >  ['"\t"53' '"\t69']
> >  ['"\t"98' '"\t87']
> >  ['"\t"5c' '"\t4b']
> > ##
> >
> > Finally, I am trying to get rid of the strings and get an array of numbers 
> > using NumPy's astype function. At this stage, I get an error.
> >
> > This is the error:
> > my_data_3 = my_data_2.astype(np.float)
> > could not convert string to float: " "81
> >
> > As you can see, the string "\t"81 is causing the error.
> > It seems to be due to char "\t".
> >
> > I don't know how to resolve this.
> >
> > Thanks for your help.
> >
>
> how about (strip(my_data_2).astype(np.float))
>
> I haven't used numpy, but if your theory is correct, this will clean
> up the string
>
Oops, I think I was careless in looking at your data, so this doesn't
seem like such a good idea.
> --
> Joel Goldstick
> http://joelgoldstick.com/blog
> http://cc-baseballstats.info/stats/birthdays



-- 
Joel Goldstick
http://joelgoldstick.com/blog
http://cc-baseballstats.info/stats/birthdays


Re: issue in handling CSV data

2019-09-07 Thread Joel Goldstick
On Sat, Sep 7, 2019 at 8:21 PM Sharan Basappa  wrote:
>
> I am trying to read a log file that is in CSV format.
>
> The code snippet is below:
>
> ###
> import matplotlib.pyplot as plt
> import seaborn as sns; sns.set()
> import numpy as np
> import pandas as pd
> import os
> import csv
> from numpy import genfromtxt
>
> # read the CSV and get into X array
> os.chdir(r'D:\Users\sharanb\OneDrive - HCL Technologies Ltd\Projects\MyBackup\Projects\Initiatives\machine learning\programs\constraints')
> X = []
> #with open("constraints.csv", 'rb') as csvfile:
> #reader = csv.reader(csvfile)
> #data_as_list = list(reader)
> #myarray = np.asarray(data_as_list)
>
> my_data = genfromtxt('constraints.csv', delimiter = ',', dtype=None)
> print (my_data)
>
> my_data_1 = np.delete(my_data, 0, axis=1)
> print (my_data_1)
>
> my_data_2 = np.delete(my_data_1, 0, axis=1)
> print (my_data_2)
>
> my_data_3 = my_data_2.astype(np.float)
> 
>
> Here is what print(my_data_2) looks like:
> ##
> [['"\t"81' '"\t5c']
>  ['"\t"04' '"\t11']
>  ['"\t"e1' '"\t17']
>  ['"\t"6a' '"\t6c']
>  ['"\t"53' '"\t69']
>  ['"\t"98' '"\t87']
>  ['"\t"5c' '"\t4b']
> ##
>
> Finally, I am trying to get rid of the strings and get an array of numbers using 
> NumPy's astype function. At this stage, I get an error.
>
> This is the error:
> my_data_3 = my_data_2.astype(np.float)
> could not convert string to float: " "81
>
> As you can see, the string "\t"81 is causing the error.
> It seems to be due to char "\t".
>
> I don't know how to resolve this.
>
> Thanks for your help.
>

how about (strip(my_data_2).astype(np.float))

I haven't used numpy, but if your theory is correct, this will clean
up the string
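The idea can be tried on plain Python strings (numpy.char.strip calls str.strip element-wise on an array). Without arguments, strip() only removes whitespace at the two ends, and these strings start with a double quote, so the TAB survives. Naming the characters to remove, here double quotes and TABs, does clean them.

```python
cells = ['"\t"81', '"\t5c', '"\t"04', '"\t11']

# Default strip(): only leading/trailing whitespace; the leading '"' blocks it.
print([c.strip() for c in cells])     # ['"\t"81', '"\t5c', '"\t"04', '"\t11']

# Strip double quotes and TABs explicitly, then parse as hexadecimal.
cleaned = [c.strip('"\t') for c in cells]
print(cleaned)                        # ['81', '5c', '04', '11']
print([int(c, 16) for c in cleaned])  # [129, 92, 4, 17]
```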


-- 
Joel Goldstick
http://joelgoldstick.com/blog
http://cc-baseballstats.info/stats/birthdays


issue in handling CSV data

2019-09-07 Thread Sharan Basappa
I am trying to read a log file that is in CSV format.

The code snippet is below:

###
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
import numpy as np
import pandas as pd
import os
import csv
from numpy import genfromtxt

# read the CSV and get into X array
os.chdir(r'D:\Users\sharanb\OneDrive - HCL Technologies Ltd\Projects\MyBackup\Projects\Initiatives\machine learning\programs\constraints')
X = []
#with open("constraints.csv", 'rb') as csvfile:
#reader = csv.reader(csvfile)
#data_as_list = list(reader)
#myarray = np.asarray(data_as_list)

my_data = genfromtxt('constraints.csv', delimiter = ',', dtype=None)
print (my_data)

my_data_1 = np.delete(my_data, 0, axis=1)
print (my_data_1)

my_data_2 = np.delete(my_data_1, 0, axis=1)
print (my_data_2)

my_data_3 = my_data_2.astype(np.float)


Here is what print(my_data_2) looks like:
##
[['"\t"81' '"\t5c']
 ['"\t"04' '"\t11']
 ['"\t"e1' '"\t17']
 ['"\t"6a' '"\t6c']
 ['"\t"53' '"\t69']
 ['"\t"98' '"\t87']
 ['"\t"5c' '"\t4b']
##

Finally, I am trying to get rid of the strings and get an array of numbers using 
NumPy's astype function. At this stage, I get an error.

This is the error:
my_data_3 = my_data_2.astype(np.float)
could not convert string to float: " "81 

As you can see, the string "\t"81 is causing the error.
It seems to be due to char "\t". 

I don't know how to resolve this.

Thanks for your help.
